Data journalism, the reporting practice built on collecting, analyzing, and presenting quantitative datasets, has become an established subfield of journalism over the past decade. The Panama Papers, the Pandora Papers, and many smaller investigations have shown that leaked databases, public records, and computational analysis can surface stories that traditional reporting methods cannot reach.
But data journalism is not merely traditional journalism with a spreadsheet. It introduces new epistemological questions (what counts as evidence when the evidence is an algorithm?), new ethical challenges (how do you protect sources when the "source" is a dataset?), and new risks (how do you report in a country where the government uses digital surveillance to identify journalists' sources?).
The Epistemology of Code
Abhishek and Graves (2024) examine what critical code studies reveals about the epistemology of data journalism. While programming code underlies much of data journalism, few studies focus on it—on the assumptions, choices, and interpretive frameworks that are embedded in the code that journalists write to analyze data.
The paper argues that code is not neutral infrastructure—it embodies epistemological choices about what questions to ask, what data to include, what cleaning and transformation to apply, and what visualizations to produce. Two journalists analyzing the same dataset may write different code and reach different conclusions—not because either is wrong, but because their analytical choices (which variables to prioritize, which outliers to exclude, which statistical methods to apply) reflect different journalistic judgments.
This insight has implications for transparency: publishing datasets alongside stories (a growing practice) is insufficient for reproducibility if the code that analyzed the data is not also published. And publishing the code requires data journalism skills that many traditional journalists—and many news editors—do not possess.
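The point about analytical choices can be made concrete with a minimal Python sketch. The figures below are invented for illustration and are not drawn from Abhishek and Graves (2024); the function names and the 60-minute cutoff are likewise hypothetical. The sketch shows three defensible readings of the same ten records.

```python
from statistics import mean, median

# Hypothetical data invented for this example: emergency response times in minutes.
response_minutes = [6, 7, 7, 8, 9, 9, 10, 11, 12, 95]


def analysis_a(values):
    """Journalist A: keep every record and report the mean."""
    return mean(values)


def analysis_b(values, cutoff=60):
    """Journalist B: treat values above `cutoff` as likely entry errors,
    drop them, and report the mean of what remains."""
    kept = [v for v in values if v <= cutoff]
    return mean(kept)


def analysis_c(values):
    """Journalist C: keep every record but report the median,
    which is less sensitive to extreme values."""
    return median(values)


result_a = analysis_a(response_minutes)
result_b = analysis_b(response_minutes)
result_c = analysis_c(response_minutes)

print(f"A (mean, all records):       {result_a:.1f} minutes")
print(f"B (mean, outlier excluded):  {result_b:.1f} minutes")
print(f"C (median, all records):     {result_c:.1f} minutes")
```

Each of the three figures is an honest summary of the same data. A reader can only tell which one a story reports, and why, if the code is available alongside the dataset.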
Digital Surveillance and Press Freedom
Alashry (2024) investigates how Arab authorities use digital surveillance to control investigative reporting. The study examines the extent of digital surveillance, who faces risks and threats, and how journalists seeking press freedom use tools and techniques to communicate securely.
The surveillance dimension is crucial because data journalism often involves accessing and analyzing information that governments prefer to keep hidden. In authoritarian and semi-authoritarian contexts, the tools data journalists rely on (encrypted communication, secure data transfer, anonymous tip lines) are also the tools of operational security, and governments increasingly deploy sophisticated surveillance technology to penetrate them.
The study documents how journalists use open-source tools, encrypted messaging, and VPN technology to protect their reporting—and how governments deploy malware, network monitoring, and legal compulsion to breach those protections. The arms race between journalistic security and state surveillance is ongoing, and the balance of advantage shifts with each technological generation.
Safety in Latin American Journalism
Mesquita, de-Lima-Santos, and Gonçalves (2025) examine the safety challenges facing small investigative news organizations in Latin America. These organizations are stepping into the watchdog role by investigating corruption scandals that larger outlets sometimes overlook. However, this work exposes both journalists and their organizations to significant risks.
The paper identifies "three spheres of safety" that investigative journalists must navigate: physical safety (threats of violence, kidnapping, assassination), digital safety (hacking, surveillance, doxxing), and legal safety (defamation lawsuits, SLAPP suits, regulatory harassment). In Latin America, all three spheres are simultaneously threatened—creating an environment where investigative journalism is both necessary (given corruption and institutional weakness) and dangerous.
Blockchain Journalism
Shilina (2025) proposes Decentralized Ledger Journalism (DLJ) as a distinct subfield within data journalism. Drawing on the unique affordances of blockchain and other distributed ledger technologies, DLJ offers potential solutions to journalism's trust crisis: immutable publication records (preventing retroactive content alteration), transparent sourcing (verifiable chains of evidence), and decentralized distribution (resistant to censorship).
The concept is speculative but addresses real problems. In an era of deepfakes and manipulated evidence, blockchain-based verification could provide an authentication layer that traditional journalism lacks. A story whose cryptographic hash is recorded on a blockchain cannot be altered after the fact without the change being detectable, which gives sources and audiences a permanent, verifiable record of what was published.
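The tamper-evidence idea behind immutable publication records can be sketched with a simple hash chain. This is an illustration under stated assumptions, not Shilina's (2025) proposal and not a real blockchain: it runs locally, has no consensus or distribution, and every name in it (`append_entry`, `verify_ledger`, `matches_published`, `GENESIS`) is hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def text_digest(text):
    """SHA-256 digest of a story's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def record_hash(record):
    """SHA-256 over a canonical (sorted-key) JSON encoding of a record."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_entry(ledger, story_id, text):
    """Append an entry that commits to the story text and to the previous entry."""
    prev_hash = ledger[-1]["entry_hash"] if ledger else GENESIS
    body = {
        "story_id": story_id,
        "text_hash": text_digest(text),
        "prev_hash": prev_hash,
    }
    entry = dict(body, entry_hash=record_hash(body))
    ledger.append(entry)
    return entry


def verify_ledger(ledger):
    """Return True only if every entry's hash and back-link are consistent."""
    prev_hash = GENESIS
    for entry in ledger:
        body = {k: v for k, v in entry.items() if k != "entry_hash"}
        if entry["prev_hash"] != prev_hash:
            return False
        if record_hash(body) != entry["entry_hash"]:
            return False
        prev_hash = entry["entry_hash"]
    return True


def matches_published(entry, text):
    """Check whether `text` is the version the ledger entry committed to."""
    return entry["text_hash"] == text_digest(text)


ledger = []
original = "Officials approved the contract on 3 March."
first = append_entry(ledger, "story-001", original)
append_entry(ledger, "story-002", "A follow-up report.")

intact_before = verify_ledger(ledger)
original_ok = matches_published(first, original)
edited_ok = matches_published(first, original.replace("3 March", "13 March"))

# Simulate a retroactive edit to the stored record itself.
ledger[0]["text_hash"] = text_digest("A quietly rewritten story.")
intact_after = verify_ledger(ledger)

print(intact_before, original_ok, edited_ok, intact_after)
```

The chain shows only that a record has not changed since it was written. It says nothing about whether the original story was accurate, which is one reason the trust claim remains uncertain.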
Claims and Evidence
| Claim | Evidence | Verdict |
|---|---|---|
| Code in data journalism embodies epistemological choices | Abhishek & Graves (2024): analytical choices in code shape journalistic conclusions | ✅ Supported |
| Digital surveillance threatens investigative journalism in authoritarian contexts | Alashry (2024): Arab authorities use sophisticated surveillance against journalists | ✅ Supported |
| Small news organizations are filling investigative gaps that large outlets leave | Mesquita et al. (2025): Latin American organizations investigate overlooked corruption | ✅ Supported |
| Blockchain can solve journalism's trust crisis | Shilina (2025): conceptually promising; no operational implementation at scale | ⚠️ Uncertain |
Implications
Data journalism has expanded the investigative reporter's toolkit from interviews and documents to datasets and algorithms. This expansion creates genuine new capabilities—but also new responsibilities: transparency about analytical methods, security against digital surveillance, and critical awareness of the epistemological assumptions embedded in code. The journalists who will thrive are those who combine traditional journalistic judgment (is this story important? is the evidence sufficient? who is harmed?) with computational skill (can I analyze this data responsibly? is my code replicable?).