This database collects publicly available news text from the electronic editions of 37 leading domestic financial and general-interest newspapers and periodicals. Each record covers key fields such as Chinese site name, publication date, section name, primary headline, headline, sub-headline, author, images, and full-text content, so the news material is available in fully structured form. The database is updated continuously in real time and had accumulated over 14.49 million news entries by the end of 2025, offering a large-scale, sustainable textual resource for observing China's financial discourse dynamics, market information dissemination, and media trends.
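As a rough illustration of how one entry might be represented in code, the sketch below maps the fields listed above onto a Python record type. The class name, attribute names, dictionary keys, and the ISO date format are all assumptions made for illustration; the database's actual field names and formats may differ.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class NewsRecord:
    """One news entry. All attribute names are hypothetical."""
    site_name: str                 # Chinese site (publication) name
    publication_date: date
    section_name: str
    headline: str
    full_text: str
    primary_headline: Optional[str] = None
    sub_headline: Optional[str] = None
    authors: List[str] = field(default_factory=list)
    images: List[str] = field(default_factory=list)  # image references


def parse_record(row: dict) -> NewsRecord:
    """Build a NewsRecord from a dict that uses the hypothetical keys above.

    Assumes the date is an ISO string such as "2025-01-02".
    """
    return NewsRecord(
        site_name=row["site_name"],
        publication_date=date.fromisoformat(row["publication_date"]),
        section_name=row.get("section_name", ""),
        headline=row["headline"],
        full_text=row.get("full_text", ""),
        primary_headline=row.get("primary_headline") or None,
        sub_headline=row.get("sub_headline") or None,
        authors=list(row.get("authors", [])),
        images=list(row.get("images", [])),
    )
```

Optional fields default to empty values here because not every article can be expected to carry a sub-headline, an author, or images.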
Data Characteristics:
- Diverse, Representative Sources: The dataset covers influential financial and general publications such as China Securities Journal, Shanghai Securities News, Securities Times, People's Daily, and Securities Daily, reflecting core perspectives within mainstream financial discourse.
- Long Time Span Supporting Longitudinal Research: The dataset contains more than a decade of continuous observational data for 20 publications, supporting longitudinal studies of long-term themes such as macroeconomic trends, market cycles, and policy evolution.
- Synchronized Updates Capturing Dynamic Changes: Data updates follow the source publications' release schedules, enabling real-time tracking and analysis of market trends, public sentiment events, and policy announcements. The data also supports modern text analysis methods such as natural language processing, sentiment analysis, and topic modeling.
Potential Application Scenarios:
- Financial Market and Sentiment Analysis: Researchers may analyze headline and body text to track shifts in market focus and fluctuations in investor sentiment, or use publication timestamps to investigate the immediate and lagged effects of news on market indicators (e.g., stock prices, trading volumes).
- Policy Impact and Media Communication Studies: Longitudinal data supports content analysis of media framing and opinion leadership shifts following national economic policy releases, and enables examination of editorial stances and dissemination patterns across publications during major financial events.
- Text Mining and Computational Method Validation: The large-scale, domain-concentrated database can serve as a corpus for training and testing NLP models in finance (e.g., text classification, entity recognition, summarization), and provides empirical validation material for computational social science methodologies.
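The lagged-effect analysis mentioned under Financial Market and Sentiment Analysis can be sketched roughly as follows. This is a minimal illustration, not a recommended method: the tiny word lists, the function names, and the input shapes (date and headline pairs, plus a mapping from trading day to return) are all assumptions, and a real study would use a proper sentiment model and market data source.

```python
from collections import defaultdict
from statistics import mean

# Tiny illustrative lexicons (assumed, not part of the database).
POSITIVE = {"上涨", "增长", "利好"}
NEGATIVE = {"下跌", "亏损", "风险"}


def headline_tone(headline):
    """Return (pos - neg) / (pos + neg) keyword hits, or 0.0 if none match."""
    pos = sum(headline.count(w) for w in POSITIVE)
    neg = sum(headline.count(w) for w in NEGATIVE)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def daily_tone(records):
    """Average headline tone per day; records is an iterable of (date, headline)."""
    by_day = defaultdict(list)
    for day, headline in records:
        by_day[day].append(headline_tone(headline))
    return {day: mean(scores) for day, scores in by_day.items()}


def pearson(xs, ys):
    """Pearson correlation, or None if it is undefined for the inputs."""
    if len(xs) < 2:
        return None
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return None
    return cov / (vx * vy) ** 0.5


def lagged_correlation(tone_by_day, return_by_day, lag):
    """Correlate tone on trading day t with the return `lag` trading days later.

    The lag is counted in positions of the sorted trading days in return_by_day.
    """
    if lag < 0:
        raise ValueError("lag must be non-negative")
    days = sorted(return_by_day)
    tones, returns = [], []
    for i, day in enumerate(days):
        if day in tone_by_day and i + lag < len(days):
            tones.append(tone_by_day[day])
            returns.append(return_by_day[days[i + lag]])
    return pearson(tones, returns)
```

Calling `lagged_correlation` with `lag=0` gives the same-day association, while larger lags probe delayed effects. Counting lags in trading days rather than calendar days keeps weekends and holidays from distorting the alignment.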
The CnOpenData Chinese Financial Newspaper Text Database systematically integrates content from public sources. By continuously aggregating mainstream Chinese financial news in a structured manner, combining macro-level temporal breadth with micro-level textual detail, it establishes a robust data foundation for academic research, industry analysis, and decision-making support.
Temporal Coverage
Field Description
Sample Data
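Related Literature
- Jiang Fuwei, Liu Yumin, and Meng Lingchao, 2024: "Large Language Models, Textual Sentiment, and Financial Markets" (《大语言模型、文本情绪与金融市场》), Management World (《管理世界》), No. 8.
- Fan Xiaoyun, Wang Yedong, Wang Daoping, et al., 2022: "Heterogeneity in the Information Content of Financial Texts from Different Sources: Based on a Hybrid Text Sentiment Measurement Method" (《不同来源金融文本信息含量的异质性分析——基于混合式文本情绪测度方法》), Management World (《管理世界》), No. 10.
- Xu Xuechen and Tian Kan, 2021: "A New Method for Stock Index Prediction Based on Financial Text Sentiment Analysis" (《一种基于金融文本情感分析的股票指数预测新方法》), Journal of Quantitative & Technological Economics (《数量经济技术经济研究》), No. 12.
- Zhang Zongxin and Wu Zhaoying, 2021: "Media Sentiment Contagion and Analyst Optimism Bias: Empirical Evidence Based on Machine Learning Text Analysis Methods" (《媒体情绪传染与分析师乐观偏差——基于机器学习文本分析方法的经验证据》), Management World (《管理世界》), No. 1.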
Data Update Frequency
Real-time updates