Chinese listed companies publish daily market updates on major trading platforms and maintain interactive channels for communicating with investors. Investors across the country post and reply on each company's online stock forum (股吧), producing a very large text corpus on Chinese stock market investment. Researchers can use this investor interaction data to observe market judgments and individual sentiment along multiple dimensions.
Recent studies by Chinese and international scholars have found:
Media text sentiment can more accurately measure fluctuations in investor sentiment in China's stock market, demonstrating significant in-sample and out-of-sample predictive power for stock returns. It also exhibits notable predictive capabilities for key macroeconomic indicators, holding substantial academic and practical value.
— Jiang Fuwei, Meng Lingchao, Tang Guohao: Media Text Sentiment and Stock Return Prediction, China Economic Quarterly, 2021, Vol. 21, No. 4.
To facilitate academic research, CnOpenData has compiled count-based and content-based (sentiment analysis) statistics for both posts and replies on A-share stock forums. Fields include security code (证券代码), company name (公司名称), posting timeframe (发帖时间段), post count (发帖数量), positive/negative post ratios (正负面帖占比), and positive/negative post counts (正负面帖数量). This provides high-quality data support for related studies.
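As a rough illustration of how the count and ratio fields relate to one another, the sketch below aggregates per-post sentiment labels into the statistics named above. The function name `summarize_posts`, the label strings, and the choice of all posts (including neutral ones) as the ratio denominator are assumptions made for this example, not CnOpenData's documented method.

```python
from collections import Counter


def summarize_posts(labels):
    """Aggregate per-post sentiment labels into count and ratio fields.

    `labels` is an iterable of strings: "positive", "negative", or "neutral".
    Assumption: ratios are taken over all posts, neutral ones included.
    """
    labels = list(labels)
    counts = Counter(labels)
    total = len(labels)
    positive = counts["positive"]
    negative = counts["negative"]
    return {
        "post_count": total,
        "positive_count": positive,
        "negative_count": negative,
        # Guard against division by zero when a period has no posts.
        "positive_ratio": positive / total if total else 0.0,
        "negative_ratio": negative / total if total else 0.0,
    }


# Hypothetical labels for one security in one posting timeframe.
stats = summarize_posts(["positive", "negative", "positive", "neutral"])
```

Grouping labels by security code and posting timeframe before calling the function would yield one row per company and period, mirroring the layout of the fields described above.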
Data Features
- Temporal granularity: Covers stock forum posts and replies since 2008, with timestamps categorized into pre-market, morning/afternoon trading sessions, midday break, and post-market periods;
- Data volume: Contains 350 million main posts and 650 million replies, representing an ultra-large-scale textual corpus;
- Field richness: Incorporates not only temporal statistics of posts/replies but also sentiment analysis using the Chinese Financial Sentiment Dictionary, detailing annual positive, negative, and neutral post/reply counts and proportions for each company.
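The session categories in the first feature above could be assigned along the following lines. This is a minimal sketch that assumes the standard A-share continuous trading hours (09:30-11:30 and 13:00-15:00) as cut-offs. The boundaries actually used in the dataset are not stated here and may differ, and the function name `classify_session` is hypothetical.

```python
from datetime import datetime, time

# Assumed boundaries: standard A-share continuous trading hours.
# The dataset's own cut-offs may differ.
MORNING_OPEN = time(9, 30)
MORNING_CLOSE = time(11, 30)
AFTERNOON_OPEN = time(13, 0)
AFTERNOON_CLOSE = time(15, 0)


def classify_session(ts: datetime) -> str:
    """Map a post or reply timestamp to one of five intraday periods."""
    t = ts.time()
    if t < MORNING_OPEN:
        return "pre-market"
    if t < MORNING_CLOSE:
        return "morning session"
    if t < AFTERNOON_OPEN:
        return "midday break"
    if t < AFTERNOON_CLOSE:
        return "afternoon session"
    return "post-market"


# Hypothetical timestamp of a forum post.
session = classify_session(datetime(2020, 1, 2, 10, 15))
```

The sketch ignores weekends and exchange holidays. A fuller version would check a trading calendar before applying the intraday boundaries.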
Data Scale
Time Coverage
- Posting period: 1988-2022
- Reply period: 2007-2023
Field Demonstration
Sample Data
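Post Text Statistics for A-Share Listed Company Stock Forums (A股上市公司股吧发帖文本统计数据)
Reply Text Statistics for A-Share Listed Company Stock Forums (A股上市公司股吧回帖文本统计数据)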
References
- Zheng Jiandong, Lyu Xiaoliang, Lyu Bin, Guo Feng, 2022: Social Media Interactions and Capital Market Pricing Efficiency: Evidence from Big Data Analysis of Stock Forum Posts, The Journal of Quantitative & Technical Economics, No. 11.
- Yin Bichao, Kong Dongmin, Ji Mianmian, 2022: Does Retail Investor Activism Improve Audit Quality?, Accounting Research, No. 10.
- Fan Xiaoyun, Wang Yedong, Wang Daoping, Guo Wenxuan, Hu Xuanyi, 2022: Heterogeneity Analysis of Financial Text Information Sources: A Hybrid Sentiment Measurement Approach, Management World, No. 10.
- Zhu Mengnan, Liang Yuheng, Wu Zengming, 2020: Internet Information Networks and Stock Price Crash Risk: Public Supervision vs. Irrational Contagion, China Industrial Economics, No. 10.
- Sun Kunpeng, Wang Dan, Xiao Xing, 2020: Internet Information Environment Regulation and Social Media's Corporate Governance Role, Management World, No. 7.
- Wang Dan, Sun Kunpeng, Gao Hao, 2020: "Voting with Words" on Social Media and Its Impact on Management Earnings Forecasts, Journal of Financial Research, No. 11.
- Bu Hui, Xie Zheng, Li Jiahong, Wu Junjie, 2018: Investor Sentiment Derived from Stock Commentary and Its Market Impact, Journal of Management Sciences in China, No. 4.
- Chang, Yen-Cheng, Harrison G. Hong, Larissa Tiedens, Na Wang, Bin Zhao, 2015: Does Diversity Lead to Diverse Opinions? Evidence from Languages and Stock Markets, Rock Center for Corporate Governance Working Paper No. 168.
- Sheridan Titman, Chishen Wei, Bin Zhao, 2021: Corporate Actions and the Manipulation of Retail Investors in China: An Analysis of Stock Splits, Journal of Financial Economics.
Update Frequency
Annual Update