Test - Pending Data

  CnOpenData's Wikipedia Hourly Page View Time Series Data records hourly page view counts for frequently viewed Wikipedia pages throughout January 2024. Organized as a time series, the dataset lists page titles by language version (domain) together with their view counts for each of the 24 hours of every day. It supports in-depth analysis of Wikipedia browsing behavior, shifts in user interest, and access trends for specific pages.

Data Uniqueness

  • High Temporal Resolution with Full Monthly Coverage: This dataset provides page view data at hourly granularity for the full calendar month of January 2024. This resolution lets analysts track intraday fluctuations in attention, pinpoint the timing of traffic peaks (e.g., hour-by-hour reactions following news events), and analyze recurring cycles (e.g., daily or weekly patterns). Compared to publicly available datasets that provide only daily or monthly aggregates, it offers a clear advantage in temporal resolution, which supports micro-level behavioral research and near-real-time trend monitoring.
  • Focus on Popular Pages with High Data Value Density: The data is filtered so that each daily file includes only pages viewed at least 10 times, yielding a daily average of 5 to 6 million records. Each record therefore corresponds to a page (a topic, figure, or event) that drew measurable reader attention on that day. For studies of social trends, popular culture, or the global spread and impact of major news events within a specific period, the dataset provides pre-screened analysis subjects with a favorable signal-to-noise ratio, improving research efficiency and analytical depth.
  • Standardized Structure Across Language/Geographical Dimensions: The domain_code field identifies the Wikipedia subproject to which each record belongs. This standardized structure makes cross-lingual and cross-cultural comparisons straightforward, such as analyzing spatiotemporal differences in attention to the same international event among different language user groups, or exploring how active specific cultural themes are within their primary language communities.
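
As a minimal sketch of the peak-timing and cross-language comparisons described above, the Python example below works on a handful of made-up records. Only the domain_code field is named in this description; the record layout (domain_code, page_title, date, hour, views), the sample values, and the helper names peak_hour and views_by_domain are assumptions for illustration, not the dataset's actual schema.

```python
from collections import defaultdict

# Hypothetical record layout: (domain_code, page_title, date, hour, views).
# Only domain_code is named in the description; the rest is assumed.
# The values below are made up and are not taken from the dataset.
SAMPLE = [
    ("en", "Example_Page", "2024-01-01", 0, 120),
    ("en", "Example_Page", "2024-01-01", 1, 340),
    ("en", "Example_Page", "2024-01-01", 2, 95),
    ("de", "Example_Page", "2024-01-01", 0, 40),
    ("de", "Example_Page", "2024-01-01", 1, 25),
    ("en", "Other_Page", "2024-01-01", 1, 900),
]


def peak_hour(records, domain_code, page_title):
    """Return (date, hour, views) for the busiest hour of one page, or None."""
    rows = [r for r in records if r[0] == domain_code and r[1] == page_title]
    if not rows:
        return None
    best = max(rows, key=lambda r: r[4])
    return best[2], best[3], best[4]


def views_by_domain(records, page_title):
    """Sum the views of one page title separately for each domain_code."""
    totals = defaultdict(int)
    for domain_code, title, _date, _hour, views in records:
        if title == page_title:
            totals[domain_code] += views
    return dict(totals)


print(peak_hour(SAMPLE, "en", "Example_Page"))
print(views_by_domain(SAMPLE, "Example_Page"))
```

Page titles usually differ between language editions, so a real cross-language comparison would likely need a title mapping that this sketch does not provide.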

Data Application Value

  • Social Trends and Public Attention Research: Researchers can use this data to quantify what internet users worldwide were collectively interested in during January 2024. Tracking the view count time series of specific page titles supports empirical studies of how public attention forms, evolves, and decays, providing evidence for communication studies, sociology, and public policy research.
  • Web Traffic Prediction and Platform Operations Analysis: Internet companies and Wikipedia's own operations teams can use this data to build and validate web traffic forecasting models. Hourly time series are well suited as input for training machine learning models that predict future traffic peaks, and the resulting forecasts can inform server resource allocation and content recommendation strategies. The cyclical patterns in the data (such as daily and weekly cycles) are important for improving prediction accuracy.
  • Digital Humanities and Computational Social Science Exploration: This dataset offers rich empirical material for digital humanities and computational social science. Scholars can link page titles to the knowledge entities they describe in order to study the online influence of cultural phenomena and the present-day web attention given to historical figures or events, or compare multilingual data to analyze cultural biases and regional differences in how knowledge is consumed and disseminated.
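
To illustrate how the daily cycle mentioned above can feed a traffic forecast, the Python sketch below implements a seasonal-naive baseline: each hour is predicted to equal the value observed 24 hours earlier. This is a simple reference point against which a trained model could be compared, not the machine learning approach itself. The function names and the synthetic two-day series are assumptions for illustration; the numbers are made up and are not taken from the dataset.

```python
def seasonal_naive_forecast(series, period=24):
    """Forecast each point as the value observed one period earlier.

    The returned list is aligned with series[period:].
    """
    if len(series) <= period:
        raise ValueError("need more than one full period of history")
    return list(series[:-period])


def mean_absolute_error(actual, predicted):
    """Average absolute difference between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and equal in length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Synthetic two-day hourly series (made-up numbers, not real page views).
day_one = [10 + (h % 12) for h in range(24)]
day_two = [v + 2 for v in day_one]  # same daily shape, slightly higher level
hourly_views = day_one + day_two

# Predict day two from day one and measure the error against the actuals.
forecast = seasonal_naive_forecast(hourly_views, period=24)
error = mean_absolute_error(hourly_views[24:], forecast)
print(error)
```

A weekly cycle could be tested the same way by setting period=168, provided at least eight days of hourly history are available.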

  With its hourly temporal resolution, focus on popular pages, and standardized cross-lingual structure, this dataset gives academia and industry a granular foundation for analysis. It can be used to reveal the micro-dynamics of public attention, to support capacity planning for network infrastructure, and to underpin cross-cultural digital research.


Field Demonstration

Using fields from January 1, 2024 as an example


Sample Data

Using sample data from January 1, 2024 as an example


Data Update Frequency

Irregular updates