This database collects publicly available news text from the electronic editions of 37 mainstream Chinese financial and general-interest newspapers. Its key fields include the Chinese site name, publication time, section name, primary headline, headline, secondary headline, author, images, and article body, so every article is stored in a consistent structure. The data is updated continuously in real time and had accumulated over 14.49 million news entries by the end of 2025, offering a large-scale, sustainable textual resource for observing China's financial discourse dynamics, market information dissemination, and media trends.
Data Characteristics:
- Extensive and Representative Sources: The data encompasses influential financial and general newspapers in China such as China Securities Journal, Shanghai Securities News, Securities Times, People's Daily, and Securities Daily, reflecting core voices in mainstream financial discourse.
- Long Time Span Supporting Longitudinal Research: The database contains over a decade of continuous observation data from 20 newspapers, suitable for long-term studies on macroeconomic trends, market cycles, and policy evolution.
- Real-time Updates Capturing Dynamic Changes: Data updates are synchronized with newspaper releases, enabling immediate tracking and analysis of market hotspots, public opinion events, and policy announcements. The data is well suited to modern text analysis methods, including natural language processing, sentiment analysis, and topic modeling.
Potential Application Scenarios:
- Financial Markets and Sentiment Analysis: Researchers can analyze market hotspot evolution and investor sentiment fluctuations through headlines and body text, while also examining immediate and lagged effects of news on stock prices and trading volumes using publication timestamps.
- Policy Impact and Media Communication Research: Long-term data supports content analysis of media framing and opinion shifts following national economic policy releases, as well as studies on reporting stances and communication patterns across different newspapers during major financial events.
- Text Mining and Computational Method Validation: The large-scale, domain-focused database serves as an ideal corpus for training and testing NLP models (e.g., text classification, entity recognition, summarization) in finance, while also facilitating empirical validation of computational social science methodologies.
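As a rough illustration of the sentiment and lagged-effect analysis described in the scenarios above, the following minimal Python sketch scores articles with a toy word list, averages the score by publication date, and pairs each day's tone with a later day's market return. The `Article` field names, the word lists, the sample articles, and the return figures are all assumptions made up for this example; they are not the database's actual schema or data, and a real study would use a validated lexicon or a trained model.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, datetime


@dataclass
class Article:
    # Hypothetical field names; the database's real column names may differ.
    newspaper: str
    published_at: datetime
    headline: str
    body: str


# Toy word lists for illustration only, not a validated sentiment lexicon.
POSITIVE = {"上涨", "增长", "利好"}
NEGATIVE = {"下跌", "亏损", "风险"}


def tone(text: str) -> int:
    """Return positive minus negative word-list hits in the text."""
    pos = sum(text.count(w) for w in POSITIVE)
    neg = sum(text.count(w) for w in NEGATIVE)
    return pos - neg


def daily_tone(articles):
    """Average headline-plus-body tone per publication date."""
    scores = defaultdict(list)
    for a in articles:
        scores[a.published_at.date()].append(tone(a.headline + " " + a.body))
    return {d: sum(v) / len(v) for d, v in sorted(scores.items())}


def lagged_pairs(tone_by_day, returns_by_day, lag=1):
    """Pair each day's tone with the return `lag` trading days later."""
    days = sorted(returns_by_day)
    pairs = []
    for i, d in enumerate(days):
        if d in tone_by_day and i + lag < len(days):
            pairs.append((tone_by_day[d], returns_by_day[days[i + lag]]))
    return pairs


# Made-up sample articles and returns, purely to exercise the functions.
articles = [
    Article("Example Daily", datetime(2024, 1, 2, 8, 0), "股市上涨", "市场增长明显"),
    Article("Example Daily", datetime(2024, 1, 2, 9, 0), "风险提示", "部分公司亏损"),
    Article("Example Daily", datetime(2024, 1, 3, 8, 0), "指数下跌", "风险上升"),
]
returns = {
    date(2024, 1, 2): 0.01,
    date(2024, 1, 3): -0.02,
    date(2024, 1, 4): 0.005,
}

tones = daily_tone(articles)
pairs = lagged_pairs(tones, returns, lag=1)
```

The resulting `pairs` list could then feed a correlation or regression of next-day returns on news tone. Setting `lag=0` would instead pair tone with the same day's return, which corresponds to the immediate effect.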
The CnOpenData Chinese Financial Newspaper Text Database is systematically compiled from publicly available sources. By aggregating content from mainstream Chinese financial news outlets in a continuous, comprehensive, and structured format, it combines macro-level temporal coverage with micro-level textual information, providing a robust data foundation for academic research, industry analysis, and decision-making support.
Time Range
Field Display
Sample Data
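Related Literature
- Jiang Fuwei, Liu Yumin, and Meng Lingchao, 2024: "Large Language Models, Text Sentiment, and Financial Markets", Journal of Management World, No. 8.
- Fan Xiaoyun, Wang Yedong, Wang Daoping, et al., 2022: "Heterogeneity in the Information Content of Financial Texts from Different Sources: Evidence from a Hybrid Text Sentiment Measurement Method", Journal of Management World, No. 10.
- Xu Xuechen and Tian Kan, 2021: "A New Method for Stock Index Forecasting Based on Sentiment Analysis of Financial Text", Journal of Quantitative & Technological Economics, No. 12.
- Zhang Zongxin and Wu Zhaoying, 2021: "Media Sentiment Contagion and Analysts' Optimism Bias: Empirical Evidence from Machine Learning Text Analysis Methods", Journal of Management World, No. 1.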
Data Update Frequency
Updated in real time