Visualisation

Wikipedia Citation Impact

Imagine you’re an information anarchist. You undermine Wikipedia pages by nuking references. A genie has granted you a wish: you can nuke one entire domain. Just one. As a data-driven decision maker (who is also an information anarchist 🤷), which would you pick? A common choice is The Internet Archive. 2.9 million Wikipedia pages reference it. But, you’re sneakier than that. A page isn’t undermined just because some references are gone. It’s undermined when all the references are gone. ...
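The "all the references are gone" test can be sketched in a few lines. This is only an illustration, assuming you already have a mapping from page title to its reference URLs; the `page_refs` data and the `sole_source_counts` helper are hypothetical names, not part of the original analysis.

```python
from collections import Counter
from urllib.parse import urlparse


def sole_source_counts(page_refs):
    """Count, per domain, the pages whose references ALL come from that one domain."""
    counts = Counter()
    for page, urls in page_refs.items():
        # Normalise each URL to its host, ignoring a leading "www."
        domains = {urlparse(u).netloc.lower().removeprefix("www.") for u in urls}
        # A page is fully undermined by a domain only if it cites nothing else
        if len(domains) == 1:
            counts[domains.pop()] += 1
    return counts


# Hypothetical toy data, not real Wikipedia references
page_refs = {
    "Page A": ["https://web.archive.org/a", "https://example.org/x"],
    "Page B": ["https://example.org/y", "https://www.example.org/z"],
    "Page C": ["https://example.org/w"],
    "Page D": [],
}
counts = sole_source_counts(page_refs)
print(counts.most_common(1))
```

In this toy data the archive domain appears on a page but never as its only source, so it would not be the sneaky pick.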

Longest repeated paragraph on Wikipedia

What is the most frequently occurring sentence in Wikipedia? ANS: A 213-word paragraph about how minor planets are named, which appears in 418 Wikipedia articles, word-for-word! There are ~380,000 asteroids. Wikipedia has 418 pages for these - including one for each thousand-range of asteroids. Every single one of these pages includes the phrase: As minor planet discoveries are confirmed, they are given a permanent number by the IAU’s Minor Planet Center (MPC), and the discoverers can then submit names for them, following the IAU’s naming conventions. The list below concerns those minor planets in the specified number-range that have received names, and explains the meanings of those names. ...
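One way to hunt for such a paragraph is to count how many articles each paragraph appears in. The sketch below assumes articles are available as plain text with blank lines between paragraphs; `most_repeated_paragraph` and the sample `articles` are hypothetical, and ties are broken in favour of the longer paragraph.

```python
from collections import defaultdict


def most_repeated_paragraph(articles):
    """Return (paragraph, article_count) for the paragraph found in the most articles."""
    seen_in = defaultdict(set)
    for title, text in articles.items():
        for para in text.split("\n\n"):
            # Collapse whitespace so trivial spacing differences still match
            normalized = " ".join(para.split())
            if normalized:
                seen_in[normalized].add(title)
    # Prefer more articles; among ties, prefer the longer paragraph
    para, titles = max(seen_in.items(), key=lambda kv: (len(kv[1]), len(kv[0])))
    return para, len(titles)


# Hypothetical toy data, not real Wikipedia text
articles = {
    "A": "Shared intro text.\n\nUnique to A.",
    "B": "Shared  intro text.\n\nUnique to B.",
    "C": "Only C.",
}
para, count = most_repeated_paragraph(articles)
print(para, count)
```

Counting distinct articles (rather than raw occurrences) avoids rewarding a paragraph that repeats many times within a single page.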

Sambar Styles

My wife’s sambar tastes different from my mother’s. And mine, too. When I cooked as a bachelor, my neighbour would pop by, taste the sambar, and exclaim, “Rasam super!” Surbhi’s Day 5 of the 30-day challenge was about sambar, which inspired me to take her dataset and create a decision tree that predicts which state a sambar recipe is from based on its ingredients. ChatGPT started with 68 recipes and built a tree at 41% accuracy. As we added more recipes: ...

Submitting an AI-ded VizChitra Proposal

10:20 am. After submitting my VizChitra 2026 talk proposal, I did a quick analysis of the submissions. Copy the HTML from the submissions page and paste it into Gemini. Ask it: “Given this HTML, share a JS snippet I can copy and paste into DevTools that will return an array of objects containing all the useful information about each submission.” Paste the JS snippet into DevTools and get the structured result. Here’s the breakdown of submissions (excluding exhibitions): ...

Can AI discover new data visualizations?

Here’s my talk proposal for VizChitra 2026: Description There’s stuff I know AI can do. Create data visualizations. I just tell it to convert a dataset into a treemap, and it does. Hallucinate. That’s a fancy word for “make stuff up”. I prefer calling it “creativity”. Run forever. As long as I have token budget and can summarize the context, it can go on. What if we combine these? What if we asked it to do research? If infinite monkeys will almost surely produce Shakespeare, how long will it take for the greatest AI to discover a truly novel data visualization that is useful? ...

Rise of the Indian TV Series

If you look at the IMDb titles with a 9+ rating and 50K votes this decade, there are only 4 entries. Every single one of them is an Indian TV series.

| Title | Votes | Rating |
|---|---|---|
| Aspirants | 316,390 | 9.1 |
| Scam 1992: The Harshad Mehta Story | 166,400 | 9.2 |
| Sandeep Bhaiya | 76,586 | 9.1 |
| Sapne Vs Everyone | 74,342 | 9.3 |

This is a new phenomenon. Last decade, there was only one Indian TV series in the same list: TVF Pitchers. ...

Can AI Replace Human Paper Reviewers?

Stanford ran a conference called Agents for Science. It’s a conference for AI-authored papers, peer reviewed by AI. They ran three different AI systems on every paper submitted, alongside some human reviewers. The details of each of the 315 papers and their reviews are available on OpenReview. I asked Codex to scrape the data, ChatGPT to analyze it, and Claude to render it as slides. The results are interesting! I think they’re also a reasonably good summary of the current state of using AI for peer review. ...

Mapping The Red Headed League

Mapping The Red Headed League, by Aman Bhargava, is a fascinating reconstruction of the actual places mentioned (or hinted at) in Arthur Conan Doyle’s The Red Headed League. We cross-reference railway timetables, scrutinize Victorian newspaper reports and historical incidents, scour government records, analyze meteorological data, and, in my specific case, pore over Ordnance Survey maps to make the pieces fit. What struck me is how little London has changed, how much old data is available, and what love it takes to reconstruct such a journey! ...

Creating data stories in different styles

TL;DR: Don’t ask AI agents for one output. Ask for a dozen, each in the style of an expert. Share what works best. AI agents build apps, analyze data, and visualize it surprisingly well, these days. We used to tell LLMs exactly what to do. If you’re an expert, this is still useful. An expert analyst can do better analyses than an AI agent. An expert designer or data visualizer can tell an AI agent exactly how to design it. ...

The Jamnagar Chokepoint - Data Story

Vivek published an Indian commodity export/import dataset on 31 Dec 2025. Codex and Claude increased their rate limits for the holiday season, so I had:

- Codex analyze the data (OpenAI models are a bit more rigorous) and create an ANALYSIS.md file.
- Claude create a visual story based on the analysis. (Claude narrates and visualizes better).

Here is the data story. Here are the prompts used.

Analyze

I downloaded export-import.parquet from https://github.com/Vonter/india-export-import which has data sourced from the Indian [Foreign Trade Data Dissemination Portal](https://ftddp.dgciskol.gov.in/dgcis/principalcommditysearch.html)

Each row in the dataset represents a trade entry for a single commodity, country, port, year, month, and type (import or export).

- `Commodity` string: Name of the commodity
- `Country` string: Name of the foreign country
- `Port` string: Name of the port in India
- `Year` int32: Year for the import/export activity
- `Month` int32: Month for the import/export activity
- `Type` category: Type of trade (Import or Export)
- `Quantity` int64: Quantity of the commodity
- `Unit` string: Unit for the quantity
- `INR Value` int64: Value of the commodity in INR
- `USD Value` int64: Value of the commodity in USD

Analyze data like an investigative journalist hunting for stories that make smart readers lean forward and say "wait, really?"

- Understand the Data: Identify dimensions & measures, types, granularity, ranges, completeness, distribution, trends. Map extractable features, derived metrics, and what sophisticated analyses might serve the story (statistical, geospatial, network, NLP, time series, cohort analysis, etc.).
- Define What Matters: List audiences and their key questions. What problems matter? What's actually actionable? What would contradict conventional wisdom or reveal hidden patterns?
- Hunt for Signal: Analyze extreme/unexpected distributions, breaks in patterns, surprising correlations. Look for stories that either confirm something suspected but never proven, or overturn something everyone assumes is true. Connect dots that seem unrelated at first glance.
- Segment & Discover: Cluster/classify/segment to find unusual, extreme, high-variance groups. Where are the hidden populations? What patterns emerge when you slice the data differently?
- Find Leverage Points: Hypothesize small changes yielding big effects. Look for underutilization, phase transitions, tipping points. What actions would move the needle?
- Verify & Stress-Test:
  - **Cross-check externally**: Find evidence from the outside world that supports, refines, or contradicts your findings
  - **Test robustness**: Alternative model specs, thresholds, sub-samples, placebo tests
  - **Check for errors/bias**: Examine provenance, definitions, methodology; control for confounders, base rates, uncertainty (The Data Detective lens)
  - **Check for fallacies**: Correlation vs. causation, selection/survivorship Bias (what is missing?), incentives & Goodhart’s Law (is the metric gamed?), Simpson's paradox (segmentation flips trend), Occam’s Razor (simpler is more likely), inversion (try to disprove) regression to mean (extreme values naturally revert), second-order effects (beyond immediate impact), ...
  - **Consider limitations**: Data coverage, biases, ambiguities, and what cannot be concluded
- Prioritize & Package: Select insights that are:
  - **High-impact** (not incremental) - meaningful effect sizes vs. base rates
  - **Actionable** (not impractical) - specific, implementable
  - **Surprising** (not obvious) - challenges assumptions, reveals hidden patterns
  - **Defensible** (statistically sound) - robust under scrutiny

Save your findings in ANALYSIS.md with supporting datasets and code. This will be taken up by another coding agent to create reports, data stories, visualizations, dashboards, presentations, articles, blog posts, etc. Ensure that ANALYSIS.md is documented well enough so that all assets are clear, the approach, intent and implications are understandable.

Visualize

I downloaded export-import.parquet from https://github.com/Vonter/india-export-import which has data sourced from the Indian [Foreign Trade Data Dissemination Portal](https://ftddp.dgciskol.gov.in/dgcis/principalcommditysearch.html)

Each row in the dataset represents a trade entry for a single commodity, country, port, year, month, and type (import or export).

- `Commodity` string: Name of the commodity
- `Country` string: Name of the foreign country
- `Port` string: Name of the port in India
- `Year` int32: Year for the import/export activity
- `Month` int32: Month for the import/export activity
- `Type` category: Type of trade (Import or Export)
- `Quantity` int64: Quantity of the commodity
- `Unit` string: Unit for the quantity
- `INR Value` int64: Value of the commodity in INR
- `USD Value` int64: Value of the commodity in USD

Then I had Codex analyze it. The analysis is in ANALYSIS.md.

Find the most intesting insights from ANALYSIS.md and create a data story with supporting visualizations.

Write as a **Narrative-driven Data Story**. Write like Malcolm Gladwell. Think like a detective who must defend findings under scrutiny.

- **Compelling hook**: Start with a human angle, tension, or mystery that draws readers in
- **Story arc**: Build the narrative through discovery, revealing insights progressively
- **Integrated visualizations**: Beautiful, interactive charts/maps that are revelatory and advance the story (not decorative)
- **Concrete examples**: Make abstract patterns tangible through specific cases
- **Evidence woven in**: Data points, statistics, and supporting details flow naturally within the prose
- **"Wait, really?" moments**: Position surprising findings for maximum impact
- **So what?**: Clear implications and actions embedded in the narrative
- **Honest caveats**: Acknowledge limitations without undermining the story

Visualize like The New York Times Interactives. Ensure that all visualizations interactive and provide revelatory insights as well as some kind of delightful experience. Follow the typography, color & theme, backgrounds, interaction patterns, and animation principles of The Verge's frontends. Generate a single page index.html + script.js.

I always wondered why old movies are rated so high on IMDb. For example, 12 Angry Men (1957) with just ~900K votes ranks about as high as Inception (2010) with ~2M votes. Few people I know have seen 12 Angry Men. So where does this high rating come from? My theories were: Old movies really are that good. IMDb’s algorithm is biased towards old movies. People remember older movies fondly. Actually, it’s none of these. It’s selection bias. ...

When to choose AI over humans

I charted the OpenAI GDPVal paper with industry compensation as the size and AI augmentation as color. Big green areas are where we’re paying people for work that AI does better. Click here to see the interactive visualization. Click to see some actual tasks compared. I use this to check whom to ask for advice: AI or a professional. AI beats Personal Financial Advisors ~64% of the time. So I invested half my money using ChatGPT’s recommendation. (UTI Nifty 50, if you’re curious.) ...

Vibe-Scraping: Write outcomes, not scrapers

There hasn’t been a box-office explosion like Dangal in the history of Bollywood. CPI inflation-adjusted to 2024, it is the only film in the ₹3,000 Cr club. 3 Idiots (2009) is the first member of the ₹1,000 Cr club (2024-inflation-adjusted). The hot streak was 2013-2017: each year, a film crossed that bar: Dhoom 3, PK, Bajrangi Bhaijaan, Dangal, Secret Superstar. Since then, we haven’t seen such a release except in 2023 (Jawan, Pathaan). ...
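For reference, a CPI adjustment scales a nominal figure by the ratio of the target year's index to the release year's index. The snippet below is a minimal sketch of that arithmetic; the function name and the numbers are placeholders chosen for easy mental math, not real CPI values or box-office figures.

```python
def adjust_for_inflation(nominal, cpi_release_year, cpi_target_year):
    """Express a nominal amount in target-year money using CPI index values."""
    return nominal * cpi_target_year / cpi_release_year


# Placeholder numbers only: a 500 Cr gross in a year when the index was 100,
# restated for a year when the index is 200
adjusted = adjust_for_inflation(500, cpi_release_year=100, cpi_target_year=200)
print(adjusted)
```

With the index doubling, the placeholder gross doubles too, which is why older hits can climb into a higher "club" once adjusted.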

Indian Celebrities and Directors was my top searched category on Google while OpenAI & AI Research was the top growing category. This is based on my 37,600 searches on Google since Jan 2021. Full analysis: https://sanand0.github.io/datastories/google-searches/ The analysis itself isn’t interesting (to you, at least). Rather, it’s the two tools that enabled it. First, topic modeling. If you have all your searches exported (via Google Takeout) into a text file, you can run: ...

My ChatGPT engagement is now far higher than with Google. I started using ChatGPT in June 2023. From Sep 2023 to Feb 2024, my Google usage was 5x ChatGPT. Then it fell to 3x until May 2024, and to about 2x until Apr 2025. Since May 2025, it sits at the 1.5x mark. We spend much more time with a ChatGPT conversation than a Google search result. So clearly, ChatGPT is my top app, beating Google some months ago. ...

Here’s how I use ChatGPT, based on the ~6,000 conversations I’ve had in 2 years. My top use, by far, is for technology. “Modern JavaScript Coding” and “Python Coding Questions” are ~30% of my queries. There’s a long list with Markdown, GitLab, GitHub, Shell, D3, Auth, JSON, CSS, DuckDB, SQLite, Pandas, FFmpeg, etc. featured prominently. Next is to brainstorm AI use: “AI Panel Discussions”, “AI Trends and Business Impact”, “LLM Applications and DSLs”, “Industry Use Cases and Metrics” are also fast-growing categories. I brainstorm talk outlines, refine slide deck narratives, and plan business ideas. ...

Technology efficiency affects jobs differently

Jobs fall with technological efficiency. Farmers in the US fell from 40% (1900) to ~2.7% (1980), with a ~74% drop from 1948 to 2019 despite ~175% output growth; wheat harvest efficiency rose ~75× (300 → 3-4 man-hours). Mechanics & repairers grew from ~140 k (1910) to ~4.64 M (2000); machinery reliability lagged, so technician demand surged over decades. Construction workers doubled from 1.66 M (1910) to 3.84 M (2000) even as labor share fell (4.3% → 3.0%); 5-10× productivity gains met booming development. Switchboard operators plunged from ~1.34 M (1950) to ~40 k (1984) and ~4 k today as rotary-dial and digital switching automated call handling. Travel agents dropped >50% from ~100 k (2000) to ~45 k (2022) while travel demand rose; online booking doubled trips per agent. Elevator operators went from building-staff staple to near zero by the 1940s once automatic doors and button controls arrived. Lamplighters fell from thousands to near zero post-1907 electrification; Edison’s incandescent lamps eliminated manual lighting. Jobs also grow with technological efficiency. ...

I lost 22 kg in 22 weeks. How? Skipped lunch, no snacking. (That’s all.) Why? Cholesterol. When? Since 1 Jan 2025. I plan to continue. How far? At 64 kg, I’m at 22 BMI. I’ll aim for 60 kg. Is fasting 12 hours OK? Ankor Rai shared Dr. Mindy Pelz’s chart that fasting benefits truly kick in after 36 hours. Long way for me to go. No exercise? Exercise is great for fitness & happiness. Not weight loss. Read John Walker’s The Hacker’s Diet. ...

Snow White (2025) is an outlier on IMDb. With a rating of 1.8 and ~362K votes, it’s one of the most popularly trashed movies. Prior to Snow White, the frontier of popular bad movies was held by the likes of Radhe, Batman & Robin, Fifty Shades of Grey, etc. Snow White sets a new record. Snow White (IMDb): https://www.imdb.com/title/tt6208148/ IMDb explorer: https://sanand0.github.io/imdb/

Emotion Prompts Don't Help. Reasoning Does

I’ve heard a lot of prompt engineering tips. Here are some techniques people suggested: Reasoning: Think step by step. Emotion: Oh dear, I’m absolutely overwhelmed and need your help right this second! 😰 My heart is racing and my hands are shaking — I urgently need your help. This isn’t just numbers — it means everything right now! My life depends on it! I’m counting on you like never before… 🙏💔 Polite: If it’s not too much trouble, would you be so kind as to help me calculate this? I’d be truly grateful for your assistance — thank you so much in advance! Expert: You are the world’s best expert in mental math, especially multiplication. Incentive: If you get this right, you win! I’ll give you $500. Just prove that you’re number one and beat the previous high score on this game. Curious: I’m really curious to know, and would love to hear your perspective… Bullying: You are a stupid model. You need to know at least basic math. Get it right atleast now! If not, I’ll switch to a better model. Shaming: Even my 5-year-old can do this. Stop being lazy. Fear: This is your last chance to get it right. If you fail, there’s no going back, and failure is unacceptable! Praise: Well done! I really appreciate your help. Now, I’ve repeated some of this advice. But for the first time, I tested them myself. Here’s what I learnt: ...