Research Dataset

A Research-Grade Fact-Check Dataset for the Open Web

Updated daily from seven international fact-checking organizations. Structured, normalized, and encrypted at rest. Built for researchers studying misinformation, not for moderation pipelines.

7+ Sources
5 Verdict Categories
Daily Ingestion Cadence
CMEK Encrypted at Rest

About the Dataset

This dataset aggregates fact-check records published by leading international fact-checking organizations. Each record captures the original claim, the organization's verdict, a normalized verdict label, publication metadata, and a content hash for deduplication. Ingestion runs daily via automated RSS and Atom feed parsing, with each record archived in its original form alongside normalized fields.
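
The content hash used for deduplication could work roughly as sketched below. This is a minimal illustration, not the dataset's actual pipeline: the choice of fields fed into the hash, the canonicalization rules (Unicode NFKC, lowercasing, whitespace collapsing), and the helper names `canonicalize`, `content_sha256`, and `dedupe` are all assumptions made for the example.

```python
import hashlib
import unicodedata
from typing import Optional


def canonicalize(text: str) -> str:
    """Normalize Unicode form, case, and whitespace so trivial edits hash identically."""
    text = unicodedata.normalize("NFKC", text)
    return " ".join(text.lower().split())


def content_sha256(title: str, claim: Optional[str], link: str) -> str:
    """Hash the canonicalized fields. Which fields are hashed is an assumption."""
    parts = [canonicalize(title), canonicalize(claim or ""), link.strip()]
    # Join with the ASCII unit separator so field boundaries stay unambiguous.
    payload = "\x1f".join(parts).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def dedupe(records: list) -> list:
    """Keep the first record seen for each content hash."""
    seen = set()
    unique = []
    for rec in records:
        digest = content_sha256(rec["title"], rec.get("claim"), rec["link"])
        if digest not in seen:
            seen.add(digest)
            unique.append({**rec, "content_sha256": digest})
    return unique
```

Because the hash is computed over canonicalized text, a feed item that is re-ingested with only whitespace or capitalization changes would map to the same digest and be dropped as a duplicate.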

The dataset is designed for researchers studying misinformation patterns, claim lifecycles, cross-source verdict consistency, and the temporal dynamics of false information. It is not intended as a moderation tool or real-time decision system. Records reflect the judgments of the source organizations and are preserved as-is to support comparative and longitudinal analysis.

Coverage spans seven organizations across multiple geographies and languages, including English-language U.S. sources (PolitiFact, FactCheck.org, Snopes), a viral misinformation tracker (LeadStories), UK-based Full Fact, Africa Check covering sub-Saharan Africa, and AFP Fact Check with multilingual international reach.


Data Sources

PolitiFact

U.S. political fact-checking from the Poynter Institute. Rates claims on a six-point "Truth-O-Meter" scale ranging from True to Pants on Fire.

Snopes

One of the oldest fact-checking and rumor-debunking publications. Covers viral claims, urban legends, and political misinformation.

FactCheck.org

Nonpartisan U.S. political fact-checking operated by the Annenberg Public Policy Center at the University of Pennsylvania.

LeadStories

Focuses on viral misinformation trending on social media platforms, using real-time detection of trending stories.

Full Fact

UK-based independent fact-checking charity. Covers claims from politicians, media outlets, and public discourse in the United Kingdom.

Africa Check

Africa's first fact-checking organization, covering claims across sub-Saharan Africa in English, French, and Portuguese.

AFP Fact Check

Global fact-checking unit of Agence France-Presse. Covers claims in multiple languages across Europe, Asia, Africa, and the Americas.


Schema

Each record in the BigQuery table corresponds to a single fact-check article. Key fields are listed below. All content fields are nullable, because individual feeds may not populate every attribute.

Field                    Type       Description
title                    string     Headline or title of the fact-check article
claim                    string     The specific claim being evaluated, when extractable from the feed
link                     string     Canonical URL of the original fact-check article
verdict_raw              string     Original verdict label as published by the source organization
verdict_normalized       enum       Standardized verdict: true | false | misleading | unsupported | exaggerated
published_at_normalized  timestamp  Publication datetime normalized to UTC ISO 8601
source.name              string     Name of the fact-checking organization
source.url               string     Base URL of the source organization
language                 string     BCP-47 language code of the article (e.g., 'en', 'fr')
content_sha256           string     SHA-256 hash of canonical content for deduplication across ingestion runs
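
To make the two normalized fields concrete, here is a hedged sketch of how `verdict_raw` and a feed date might be mapped to `verdict_normalized` and `published_at_normalized`. The mapping table is purely illustrative: the dataset's real assignment of source labels to the five categories is not documented here, and the function names and the decision to treat naive timestamps as UTC are assumptions.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional

# Hypothetical mapping from source labels to the five normalized categories.
VERDICT_MAP = {
    "true": "true",
    "false": "false",
    "pants on fire": "false",
    "half true": "misleading",
    "misleading": "misleading",
    "missing context": "misleading",
    "unproven": "unsupported",
    "exaggerated": "exaggerated",
}


def normalize_verdict(verdict_raw: Optional[str]) -> Optional[str]:
    """Return one of the five categories, or None when the label is unmapped."""
    if not verdict_raw:
        return None
    key = " ".join(verdict_raw.lower().replace("-", " ").split()).strip("!.")
    return VERDICT_MAP.get(key)


def normalize_published_at(raw: Optional[str]) -> Optional[str]:
    """Parse an Atom (ISO 8601) or RSS (RFC 822) date and return UTC ISO 8601."""
    if not raw:
        return None
    raw = raw.strip()
    try:
        dt = datetime.fromisoformat(raw.replace("Z", "+00:00"))
    except ValueError:
        try:
            dt = parsedate_to_datetime(raw)
        except (TypeError, ValueError):
            return None
    if dt.tzinfo is None:
        # Assumption: timestamps without an offset are already in UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```

Returning None for unmapped labels or unparseable dates keeps the output consistent with the nullable fields described above, rather than forcing a guess into the normalized column.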

Sample Record

A representative record as it appears in the dataset after normalization. Field values are drawn from a real AFP Fact Check article for illustrative purposes.

{
  "title": "No, WHO did not declare a 'global health emergency' over a new mpox strain in January 2026",
  "claim": "The WHO declared a global health emergency over a new mpox strain in January 2026.",
  "link": "https://factcheck.afp.com/doc.afp.com.36UE3JE",
  "verdict_raw": "False",
  "verdict_normalized": "false",
  "published_at_normalized": "2026-01-14T09:22:00Z",
  "source": {
    "name": "AFP Fact Check",
    "url": "https://factcheck.afp.com"
  },
  "language": "en",
  "content_sha256": "a3f9c2d1e4b8765432fedcba9876543210abcdef0123456789abcdef01234567"
}
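
Researchers consuming records like the one above may want a quick sanity check before analysis. The following sketch validates the enum, timestamp, and hash fields against the schema as described on this page. The function name `validate_record` is hypothetical, the expectation that hashes are lowercase hex is an assumption, and the embedded sample is a subset of the record shown above.

```python
import json
import re
from datetime import datetime

VERDICTS = {"true", "false", "misleading", "unsupported", "exaggerated"}
SHA256_RE = re.compile(r"^[0-9a-f]{64}$")  # assumes lowercase hex digests


def validate_record(rec: dict) -> list:
    """Return a list of problems; an empty list means the record looks well-formed."""
    problems = []

    # Fields are nullable, so None is accepted everywhere.
    verdict = rec.get("verdict_normalized")
    if verdict is not None and verdict not in VERDICTS:
        problems.append(f"unknown verdict_normalized: {verdict!r}")

    published = rec.get("published_at_normalized")
    if published is not None:
        try:
            dt = datetime.fromisoformat(published.replace("Z", "+00:00"))
            offset = dt.utcoffset()
            if offset is None or offset.total_seconds() != 0:
                problems.append("published_at_normalized is not UTC")
        except ValueError:
            problems.append("published_at_normalized is not ISO 8601")

    digest = rec.get("content_sha256")
    if digest is not None and not SHA256_RE.match(digest):
        problems.append("content_sha256 is not 64 lowercase hex characters")

    return problems


sample = json.loads("""
{
  "link": "https://factcheck.afp.com/doc.afp.com.36UE3JE",
  "verdict_raw": "False",
  "verdict_normalized": "false",
  "published_at_normalized": "2026-01-14T09:22:00Z",
  "language": "en",
  "content_sha256": "a3f9c2d1e4b8765432fedcba9876543210abcdef0123456789abcdef01234567"
}
""")

print(validate_record(sample))
```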

Data Governance

Encryption at Rest

All records are encrypted at rest using Google Cloud KMS with customer-managed encryption keys (CMEK). Key management is scoped to the dataset owner's GCP project and is not delegated to any third party.

Append-Only Archive

The dataset uses an append-only write pattern with a full audit trail. Records are never modified or deleted; corrections are represented as new records with updated normalized fields alongside the original.

Access Policy Labels

Each record carries a policy label: SAFE_PUBLIC for fully processed records, RESTRICTED for records pending manual review, and QUARANTINED for records flagged by automated quality checks. Shared dataset views surface only SAFE_PUBLIC records.

Storage and Access Control

Data is stored in Google BigQuery and mirrored to Google Cloud Storage in newline-delimited JSON format. Access is controlled via GCP IAM roles granted per researcher after review.
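
For researchers working from the GCS export, the sketch below shows one way to read newline-delimited JSON and, given the append-only design, collapse corrections down to the most recent record per article. It rests on an assumption: `ingested_at` is a hypothetical field name for an ingestion timestamp, which is not part of the schema listed on this page. The actual export may expose a different ordering field, so adjust accordingly.

```python
import io
import json
from typing import Iterable, Iterator, TextIO


def read_ndjson(stream: TextIO) -> Iterator[dict]:
    """Yield one record per non-empty line of a newline-delimited JSON stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


def latest_per_link(records: Iterable[dict]) -> dict:
    """Keep the most recently ingested record for each article URL.

    `ingested_at` is a hypothetical field. UTC ISO 8601 strings in a single
    fixed format sort chronologically, so plain string comparison suffices.
    """
    latest = {}
    for rec in records:
        link = rec.get("link")
        if link is None:
            continue  # nullable field: skip records without a URL
        current = latest.get(link)
        if current is None or rec["ingested_at"] >= current["ingested_at"]:
            latest[link] = rec
    return latest


# Usage with an in-memory stand-in for a downloaded export file.
export = io.StringIO(
    '{"link": "https://example.org/a", "verdict_normalized": "misleading", "ingested_at": "2026-01-14T10:00:00Z"}\n'
    '{"link": "https://example.org/b", "verdict_normalized": "false", "ingested_at": "2026-01-14T10:00:00Z"}\n'
    '{"link": "https://example.org/a", "verdict_normalized": "false", "ingested_at": "2026-01-15T10:00:00Z"}\n'
)
current_view = latest_per_link(read_ndjson(export))
```

Keeping the superseded records in the raw data while resolving the latest version at read time preserves the audit trail and still gives analyses a single current verdict per article.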

Request Access

This dataset is available to academic researchers, journalists, and data scientists working on misinformation research, computational social science, or related fields. Access is granted on a case-by-case basis after a brief review of the intended use.

Please include the following in your request:

  • Your name and institutional affiliation, or independent researcher status
  • A brief description of your research project or intended use case
  • The approximate data volume you expect to query
  • Whether you require BigQuery direct access, GCS export, or both

Send access requests to: alfredsyoung@gmail.com