This database systematically collects and organizes publicly available news text from 46 mainstream Chinese financial news websites. Key fields include the site's Chinese name, publication time, section name, primary headline, headline, secondary headline, author, images, and main text. The data is updated in real time, with a cumulative volume exceeding 130 million entries by the end of 2025. It reflects, comprehensively and promptly, how online financial information spreads and how its content evolves, providing a large-scale, structured, and timely data foundation for empirical research and applied analysis based on financial texts.
Data Features:
- Comprehensive Coverage of Mainstream Financial Platforms, Representing Market Focus: Data sources include core financial websites with extensive influence among investors and markets, such as East Money, Sina Finance, Hexun, and Cailian Press. This enables effective capture of the core trajectory of Chinese financial online discourse and market information.
- Strong Real-time Capability, Supporting Instant Response Analysis of Market Dynamics: Compared to traditional newspapers, financial websites release information more rapidly. This database is updated in sync with its source sites and can be used to study how market news, breaking events, and policy releases spread in the short run, as well as their short-term impact on financial markets.
- Large-scale Data with Concentrated Themes, Suitable for Deep Mining and Modeling: With over 130 million entries focused exclusively on the financial vertical domain, it provides high-quality, large-scale training corpora for developing domain-specific text analysis models (e.g., sentiment analysis, event extraction, topic classification).
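
As a rough illustration of the record structure described above, the sketch below models one news entry in Python. The field names (`site_name`, `published_at`, `body`, and so on), the timestamp format, and the semicolon-separated image list are all assumptions made for this example; the actual column names and encodings in the database may differ.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class NewsRecord:
    # Hypothetical field names mirroring the fields listed in the description.
    site_name: str
    published_at: datetime
    section: Optional[str]
    primary_headline: Optional[str]
    headline: str
    secondary_headline: Optional[str]
    author: Optional[str]
    images: List[str] = field(default_factory=list)
    body: str = ""


def parse_record(row: dict) -> NewsRecord:
    """Build a NewsRecord from a dict of raw string values (assumed layout)."""
    return NewsRecord(
        site_name=row["site_name"],
        # Assumed timestamp format; adjust to the delivered data.
        published_at=datetime.strptime(row["published_at"], "%Y-%m-%d %H:%M:%S"),
        # Empty strings are treated as missing values.
        section=row.get("section") or None,
        primary_headline=row.get("primary_headline") or None,
        headline=row["headline"],
        secondary_headline=row.get("secondary_headline") or None,
        author=row.get("author") or None,
        # Assume image URLs arrive as one semicolon-separated string.
        images=[u for u in row.get("images", "").split(";") if u],
        body=row.get("body", ""),
    )
```

Treating optional fields as `None` rather than empty strings makes it easier to distinguish "not provided by the source site" from genuinely empty text during later filtering.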
Potential Application Scenarios:
- Financial Market Microstructure Research: Leveraging high-frequency news release data enables precise analysis of correlations between news popularity, sentiment orientation, and price volatility/trading volume changes of assets such as stocks, bonds, and futures. This is particularly suitable for event study methodology and high-frequency data analysis.
- Financial Public Opinion Monitoring and Dissemination Analysis: By tracking the headline phrasing, publication timelines, and content focus of the same event across different financial websites (e.g., East Money, The Paper, Jiemian), researchers can analyze dissemination networks, public opinion formation processes, and media agenda-setting in financial information.
- Quantitative Investment and Information Factor Construction: The massive text repository can be utilized to construct quantitative factors based on news sentiment, topic popularity, or analyst viewpoints, providing a data foundation for algorithmic trading and investment strategy development.
- Financial Text Processing Technology Development and Validation: With its large scale, domain specificity, and clear structure, this dataset serves as ideal experimental data for developing and evaluating financial-domain natural language processing (NLP) tasks such as financial entity recognition, automatic summarization, and relation extraction.
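
Two of the scenarios above, topic-popularity factors and cross-site dissemination timelines, can be sketched with a few lines of Python. This is a minimal illustration under stated assumptions, not a recommended methodology: the `(site, publication time, headline)` tuple layout, the function names, and the plain substring match are all hypothetical simplifications, and a real study would likely use proper event clustering and sentiment models.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, Iterable, List, Tuple

# Each item: (site name, publication time, headline). Illustrative layout only.
Article = Tuple[str, datetime, str]


def daily_mentions(articles: Iterable[Article], keyword: str) -> Dict[str, int]:
    """Count headlines containing `keyword` per calendar day (a crude popularity factor)."""
    counts: Counter = Counter()
    for _site, published_at, headline in articles:
        if keyword in headline:
            counts[published_at.date().isoformat()] += 1
    return dict(counts)


def first_coverage(articles: Iterable[Article], keyword: str) -> List[Tuple[str, datetime]]:
    """Return each site's earliest matching article time, ordered from first to last."""
    earliest: Dict[str, datetime] = {}
    for site, published_at, headline in articles:
        if keyword in headline:
            if site not in earliest or published_at < earliest[site]:
                earliest[site] = published_at
    return sorted(earliest.items(), key=lambda item: item[1])


# Made-up sample data, for demonstration only.
sample = [
    ("Site A", datetime(2025, 3, 1, 9, 0), "Central bank cuts rates"),
    ("Site B", datetime(2025, 3, 1, 9, 5), "Markets react as central bank cuts rates"),
    ("Site A", datetime(2025, 3, 2, 8, 0), "Rates outlook after the cut"),
    ("Site B", datetime(2025, 3, 1, 10, 0), "Earnings season begins"),
]

print(daily_mentions(sample, "cuts rates"))
print(first_coverage(sample, "cuts rates"))
```

The daily counts could be joined to return or volume series for an event study, while the ordered first-coverage list gives a simple view of which outlet broke a story and how quickly others followed.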
The CnOpenData Chinese Financial News Website Text Database is continuously compiled from publicly available online sources. With its massive scale, real-time updates, vertical domain coverage, and complete structured information, it provides a robust financial text data infrastructure for academic research, industry analysis, policy evaluation, and technological innovation.
Time Range
Field Display
Sample Data
Related Literature
- 姜富伟、刘雨旻、孟令超,2024:《大语言模型、文本情绪与金融市场》,《管理世界》第8期。
- 范小云、王业东、王道平等,2022:《不同来源金融文本信息含量的异质性分析——基于混合式文本情绪测度方法》,《管理世界》第10期。
- 许雪晨、田侃,2021:《一种基于金融文本情感分析的股票指数预测新方法》,《数量经济技术经济研究》第12期。
- 张宗新、吴钊颖,2021:《媒体情绪传染与分析师乐观偏差——基于机器学习文本分析方法的经验证据》,《管理世界》第1期。
Data Update Frequency
Real-time updates