News Public Opinion Data

This database systematically collects and organizes publicly available news text from 50 mainstream Chinese financial news websites. Its key fields are site Chinese name, publication time, section name, first title, title, last title, author, images, and body text. The data is updated in real time, with cumulative volume exceeding 36 million records by the end of 2025. It gives a comprehensive and timely picture of how online financial information spreads and how its content evolves, providing large-scale, structured, and timely foundational data for empirical research and applied analysis based on financial texts.
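
For readers who plan to load the data programmatically, a record with the fields listed above could be modeled roughly as follows. This is a minimal sketch: the attribute names, the dictionary keys, and the timestamp format are assumptions chosen for illustration, not the database's official schema.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class NewsRecord:
    """One news item. Attribute names are illustrative, not the official schema."""
    site_name: str                      # site Chinese name
    published_at: datetime              # publication time
    section: str                        # section name
    title: str                          # title
    body: str                           # body text
    first_title: Optional[str] = None   # first title, when present
    last_title: Optional[str] = None    # last title, when present
    author: Optional[str] = None
    images: List[str] = field(default_factory=list)


def parse_record(row: dict, time_format: str = "%Y-%m-%d %H:%M:%S") -> NewsRecord:
    """Build a NewsRecord from a raw row; the keys and time format are assumed."""
    return NewsRecord(
        site_name=row["site_name"],
        published_at=datetime.strptime(row["published_at"], time_format),
        section=row.get("section", ""),
        title=row["title"],
        body=row.get("body", ""),
        first_title=row.get("first_title") or None,
        last_title=row.get("last_title") or None,
        author=row.get("author") or None,
        images=list(row.get("images") or []),
    )


# Made-up row for illustration only; it is not taken from the database.
example = parse_record({
    "site_name": "示例财经网",
    "published_at": "2025-01-02 09:30:00",
    "section": "宏观",
    "title": "示例标题",
    "body": "示例正文。",
    "images": [],
})
print(example.published_at.date(), example.site_name, example.title)
```

Treating the optional fields as `None` when they are empty keeps missing values distinct from empty strings in later filtering.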

Key Features:

  • Coverage of Mainstream Financial Platforms, Representing Market Focus: Data sources include influential core financial websites such as EastMoney.com, Sina Finance, Hexun.com, and Cailian Press, effectively capturing the core context of China's financial online discourse and market information.
  • Strong Real-time Capability, Supporting Immediate Response to Market Dynamics: Financial websites update faster than traditional electronic newspapers. This database stays synchronized with its source sites, enabling research on how market news, emergencies, and policy releases spread in the moments after publication and on their short-term impacts on financial markets.
  • Large-scale, Domain-focused Data Suitable for Deep Mining and Modeling: With over 36 million records concentrated in the vertical financial domain, it provides high-quality, large-scale training corpora for developing domain-specific text analysis models (e.g., sentiment analysis, event extraction, topic classification).

Potential Applications:

  • Financial Market Microstructure Research: High-frequency news data supports precise analysis of how news popularity and sentiment orientation correlate with price volatility and trading volume changes in assets such as stocks, bonds, and futures. It is particularly suitable for event studies and high-frequency data analysis.
  • Financial Public Opinion Monitoring and Propagation Analysis: By tracking variations in headline phrasing, publication timing, and content focus across different financial websites (e.g., EastMoney.com, The Paper, Jiemian News), researchers can analyze information dissemination networks, opinion formation processes, and media agenda-setting.
  • Quantitative Investment and Information Factor Construction: The extensive text repository supports building quantitative factors based on news sentiment, topic popularity, or analyst viewpoints, providing data foundations for algorithmic trading and investment strategy development.
  • Financial Text Processing Technology Development and Validation: With its large scale, domain specificity, and clear structure, this dataset serves as an ideal experimental resource for developing and evaluating financial NLP tasks (e.g., financial entity recognition, automatic summarization, relation extraction).
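
As a rough illustration of the sentiment-factor idea in the list above, the sketch below scores headlines against a tiny word list and averages the scores by publication day. Everything here is an assumption made for the example: the two lexicons, the `(timestamp, title)` input shape, the timestamp format, and the sample headlines, which are made up rather than taken from the database. A real factor would rely on a much larger dictionary or a trained model.

```python
from collections import defaultdict
from datetime import datetime

# Tiny hypothetical lexicons, for illustration only.
POSITIVE = ("上涨", "增长", "利好")
NEGATIVE = ("下跌", "亏损", "利空")


def headline_score(title: str) -> float:
    """Return (pos - neg) / (pos + neg) over lexicon hits, or 0.0 with no hits."""
    pos = sum(title.count(word) for word in POSITIVE)
    neg = sum(title.count(word) for word in NEGATIVE)
    total = pos + neg
    return (pos - neg) / total if total else 0.0


def daily_sentiment(records):
    """Average headline scores per calendar day.

    records: iterable of (timestamp string, title) pairs; the timestamp
    format "%Y-%m-%d %H:%M:%S" is an assumption.
    """
    scores = defaultdict(list)
    for published_at, title in records:
        day = datetime.strptime(published_at, "%Y-%m-%d %H:%M:%S").date()
        scores[day].append(headline_score(title))
    return {day: sum(vals) / len(vals) for day, vals in sorted(scores.items())}


# Made-up headlines, not drawn from the database.
sample = [
    ("2025-01-02 09:30:00", "某公司业绩增长，股价上涨"),
    ("2025-01-02 14:00:00", "某板块早盘上涨，午后下跌"),
    ("2025-01-03 10:00:00", "某企业亏损扩大"),
]
factor = daily_sentiment(sample)
for day, value in factor.items():
    print(day, value)
```

A daily series like this could then be aligned with return or trading volume data for an event study; that step is left out of the sketch.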

The CnOpenData Chinese Financial News Text Database, continuously curated from public web sources, offers massive scale, real-time updates, vertical domain coverage, and complete structural information. It establishes a robust textual data infrastructure for academic research, industry analysis, policy evaluation, and technological innovation.


Time Range


Field Display


Sample Data


Related Literature

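  • Jiang Fuwei, Liu Yumin, and Meng Lingchao, 2024: "Large Language Models, Textual Sentiment, and Financial Markets" (《大语言模型、文本情绪与金融市场》), Management World (《管理世界》), No. 8.
  • Fan Xiaoyun, Wang Yedong, Wang Daoping, et al., 2022: "Heterogeneity in the Information Content of Financial Texts from Different Sources: Based on a Hybrid Textual Sentiment Measurement Method" (《不同来源金融文本信息含量的异质性分析——基于混合式文本情绪测度方法》), Management World (《管理世界》), No. 10.
  • Xu Xuechen and Tian Kan, 2021: "A New Method for Stock Index Prediction Based on Sentiment Analysis of Financial Texts" (《一种基于金融文本情感分析的股票指数预测新方法》), Journal of Quantitative & Technological Economics (《数量经济技术经济研究》), No. 12.
  • Zhang Zongxin and Wu Zhaoying, 2021: "Media Sentiment Contagion and Analyst Optimism Bias: Empirical Evidence Based on Machine-Learning Text Analysis" (《媒体情绪传染与分析师乐观偏差——基于机器学习文本分析方法的经验证据》), Management World (《管理世界》), No. 1.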

Data Update Frequency

Updated in real time