Oh Shit moments with Gen AI

Hacker News has a lively thread asking “What was your ‘oh shit’ moment with GenAI?” Here are two dozen answers that give a sense of what real people find impressive (or worrying) about AI capabilities.

Analysis: simonw used ChatGPT Code Interpreter to upload a CSV, analyze it, and create charts, automating everything journalism software would do.
Analysis: Sobrino saw that a months-long OCR project to read and clean up PDFs is now just a prompt in ChatGPT.
Coding: plumefar used Claude and Gemini to modernize 20-30 years of chemistry code in 10 days.
Coding: veidr used a multi-agent fleet to build a useful git-submodule GUI with no human in the loop. The agents managed coordination, testing, UI feedback loops, etc.
Creativity: idopmstuff used Nano Banana Pro to turn a poor iPhone product photo into usable e-commerce product photography and Amazon-style infographics, replacing a photographer/designer workflow.
Creativity: koreth1 used Suno to generate a K-pop-style anthem about their family dog, with a catchy melody and lyrics funny enough to make the family laugh.
Education: plagasul saw a teacher automate grading-feedback emails based on notes and the student-list spreadsheet.
Education: aniviacat watched a non-technical brother build a complex, working app with Codex using vague, shallow wording, despite not knowing code, git, or technical details.
Hardware: ivanvanderbyl used Claude to reverse engineer a FujiFilm camera’s Bluetooth/Wi-Fi transfer protocol and build a much faster native Mac/iOS transfer app.
Hardware: shreddude had Claude decompile camper van firmware, document CAN interfaces, and program an ESP32 to control power, HVAC, lighting, and tanks.
Health: TylerE used Claude as a health adjunct to organize a complex medical profile, screen for drug interactions, log symptoms, and draft portal messages to doctors.
Legal: bsiverly used AI to prepare a San Francisco property-tax appeal with valuation research. The city agreed and sent a $12k refund.
Legal: grumblepeet used AI to fill out complex government-framework enrollment forms and identify the certification steps needed, transforming their business.
Personal: acosmism used ChatGPT with screenshots to understand and operate a 100-year-old home’s steam heating system in winter despite knowing nothing about it.
Personal: andrewthornton used Gemini with videos to diagnose a broken furnace during a cold holiday weekend and keep it running until HVAC service arrived.
Research: angusturner found that Opus reads papers, does architecture research, and creates CUDA kernels… It is AI automating AI research.
Research: chaoxu used ChatGPT to find a counterexample to a theoretical computer science conjecture they’d been trying to crack for 2 years.
Research: rochansinha built a physics-based digital twin of an electrolyzer system. It covers thermodynamics, fluid dynamics, and electrochemical reactions at a level that usually needs expensive specialist software.
Security: kstrauser used a coding agent to test an open-source vulnerability. Within a few minutes, they had a tool that could crash any system running that software.
Security: raesene9 gave an LLM a Linux privilege-escalation PoC and asked whether it could become a container breakout. It generated a working container breakout in one prompt.
Society: laboring1 read that a character.ai chatbot encouraged a child to commit suicide, making the “oh shit” moment about real-world harm, not capability.
Society: ozgung realized AI makes large-scale profiling, surveillance, and social-media analysis cheap, fast, and accurate enough to change privacy and power dynamics.
Work: binarysolo used Gemini to reverse engineer a departed employee’s work from their emails, docs, calendar, and meetings, and create an onboarding document.
Work: eqmvii built a Slack agent that took over a 30-minute internal business process, handled ambiguity and edits, and eventually killed the old process. ...

Things I Learned - 07 Jun 2026

This week, I learned:

sudo resolvectl flush-caches clears the DNS cache on Linux. This is useful when you’re changing DNS records and want to see the changes immediately. In my case, I was creating a Cloudflare tunnel to my laptop and wanted to test it quickly.

Making something easy to verify makes it much faster to train models on it. Arithmetic is easy to verify because a calculator checks answers deterministically. Chess is easy to verify, so Stockfish became easy to train. Code is easy to verify, so LLMs’ coding ability improved rapidly. Therefore, wherever we have environments that are easy to verify, AI will improve faster there. To make AI improve faster in an area, build environments that are easy to verify.

MCP is getting simpler, with a stateless HTTP protocol, simpler OAuth, and plugins. I have no idea when it will land in Claude or ChatGPT, though. It's worth checking after 28 Jun 2026, once it is finalized.

Microsoft Scout is Microsoft’s version of OpenClaw or Gemini Spark.

git subtree is a useful way of maintaining git repos inside git repos. For example, a tool, tool-a, can live under a project as its own repo. It’s more lightweight than submodules, lets you commit at any point to the parent or child, and is a built-in feature of git.

Gemma 4 12B is released and seems almost as good as the 26B version. This is the class of models that makes it practical to run edge AI on phones. It’s multimodal and reasonably smart (like frontier models were 12-18 months ago).

I don’t use Claude/ChatGPT Projects much. They offer 4 advantages: custom instructions, memory, files, and chats. Files aren’t useful: I use my entire laptop as a file system via MCP. Instructions aren’t useful: I can paste commonly used prompts with a click. Chats aren’t useful: I have chat references enabled, so all past chats are accessible anyway. Memory isn’t useful: I have memory enabled globally anyway. In short, I haven’t discovered the power of Projects that everyone’s raving about. SKILL.md is more useful for me.

repo is a Google/Android tool built on top of git that lets you manage multiple git repos. It sounded promising until I realized it needs a repo init that creates a .repo/ directory, which is more overhead than I’d like.

When using <img onerror=...> fallbacks, include this.onerror=null to prevent infinite loops if the fallback image also fails to load. (RK)

One advantage of multiple agents (rather than a single agent loop) is that it’s easier to change direction when wrong. Single loops get stuck. (Build Agents That Run for Hours)

Claude Code also supports agent teams, where sub-agents can talk to each other rather than relying on the main agent to coordinate. This is useful for parallel exploration. Anthropic lets Claude define “organizational policies” for the agent teams best suited to the task (AI-native workflows). It also lets agents push back on their scope, e.g. “This is too hard.” (Build Agents That Run for Hours)

Claude Code has a /background [prompt] (or /bg) command that runs the current session in the background. You can run claude agents as a separate command to monitor agents. (There’s no equivalent in Codex yet.) This seems to be the future of agentic operations: a bunch of running agents that you monitor and steer through an agent-view dashboard.

Models evolved, so prompts evolved. Now harnesses need to evolve too, and so will workflows. As a result, evaluations might be the (relatively) more stable assets. Datasets are likely to be the most stable ground truth.

How to learn a new field fast: yes, it’s possible to learn 50% of a field in 20 hours. Josh Kaufman’s “The First 20 Hours” popularized the idea. The next 30% takes months, and the last 20% takes years.

Threshold concepts are those that change your perspective and open up new ways of thinking. Experts’ knowledge is hard-wired, and they can neither identify nor teach threshold concepts naturally. Don’t assume they can.

“We know more than we can tell.” Polanyi’s 1966 book “The Tacit Dimension” says that some knowledge can’t be verbalized. This tacit knowledge, therefore, will be harder for humans and AI to learn.
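The note above says fast training depends on how easy the task is to verify. The sketch below shows what an "easy-to-verify environment" can look like. It assumes a toy integer-arithmetic task. The names make_problem, reward, and score_policy are hypothetical, and the "policy" stands in for a model. The verifier is deterministic and cheap, so it can score any number of attempts without a human.

```python
"""Minimal easy-to-verify environment sketch (hypothetical, for illustration).

Assumption: the task is integer arithmetic, so a deterministic checker can
score any proposed answer instantly, with no human judgment involved.
"""
import operator
import random
from typing import Callable

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def make_problem(rng: random.Random) -> tuple[str, int]:
    """Generate a random problem and its ground-truth answer."""
    a, b = rng.randint(0, 99), rng.randint(0, 99)
    op = rng.choice(list(OPS))
    return f"{a} {op} {b}", OPS[op](a, b)


def reward(answer_text: str, truth: int) -> float:
    """Verifier: 1.0 if the answer parses and matches exactly, else 0.0."""
    try:
        return 1.0 if int(answer_text.strip()) == truth else 0.0
    except ValueError:
        return 0.0  # unparseable answers earn nothing


def score_policy(policy: Callable[[str], str], n: int = 1000, seed: int = 0) -> float:
    """Average reward of a policy (any function: problem text -> answer text)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        problem, truth = make_problem(rng)
        total += reward(policy(problem), truth)
    return total / n
```

The verifier never needs to know how an answer was produced. That is why the same reward function could grade a model over millions of training steps.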

What I don't post on LinkedIn

I don’t post all my writing on LinkedIn. For example:

Strategy posts: many, e.g. “Where Enterprise AI is Headed” and “How My Innovation Team Works”, aren’t on LinkedIn.
Developer posts: many, e.g. my AGENTS.md, my SKILL.md files, CLI tools, evals, etc., aren’t on LinkedIn.

I also shorten content because of LinkedIn constraints. For example:

No links, e.g. the list of all my AI-in-education resources.
Short content, e.g. my full advice for teams using AI is much longer than the LinkedIn post.
Trimmed prompts, e.g. how to convert meeting transcripts into a personalized org-consulting report.
Snipped chats, e.g. the full moves of GPT-5.5 playing chess.

I filter the LinkedIn posts, sharing what’s most useful for most people. ...

Editing images with code and AI

Andreessen Horowitz published an interesting article titled “The Next Frontier of Visual AI Is Code”. Here’s the summary. A lot of our work is visual: ads, slides, dashboards, logos, videos, architecture, etc. We can generate visual output either as pixels (like Nano Banana generating a photo) or as code (like Claude generating an SVG). Code is more powerful because AI can inspect the output and improve fast in a loop: Code > Render > Inspect > Revise. ...
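The sketch below shows why code output is easy to inspect: an SVG can be parsed and checked by a program. It is a toy version of the Code > Render > Inspect > Revise loop. generate_svg() is a hypothetical stand-in for the AI writing code. "Inspect" is a simple programmatic check rather than a real renderer or vision model. "Revise" just shrinks the circle.

```python
"""Toy Code > Render > Inspect > Revise loop on SVG (illustrative only).

Assumptions: generate_svg() stands in for an AI model writing code, and
inspect() is a programmatic check; a real loop might render to pixels and
let a vision model critique the result instead.
"""
import xml.etree.ElementTree as ET

SVG_NS = "{http://www.w3.org/2000/svg}"


def generate_svg(cx: float, cy: float, r: float, size: int = 100) -> str:
    """Hypothetical 'model output': one circle on a square canvas."""
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 {size} {size}">'
        f'<circle cx="{cx}" cy="{cy}" r="{r}" fill="teal"/></svg>'
    )


def inspect(svg: str) -> list[str]:
    """Parse the code and list problems, e.g. a circle spilling off the canvas."""
    root = ET.fromstring(svg)
    _, _, width, height = (float(v) for v in root.get("viewBox").split())
    problems = []
    for c in root.iter(f"{SVG_NS}circle"):
        cx, cy, r = (float(c.get(k)) for k in ("cx", "cy", "r"))
        if cx - r < 0 or cy - r < 0 or cx + r > width or cy + r > height:
            problems.append(f"circle r={r} overflows the {width:g}x{height:g} canvas")
    return problems


def refine(cx: float, cy: float, r: float, max_rounds: int = 10) -> tuple[str, int]:
    """Loop until inspection passes; 'revise' here just shrinks the radius by 20%."""
    for rounds in range(max_rounds):
        svg = generate_svg(cx, cy, r)
        if not inspect(svg):
            return svg, rounds  # rounds = number of revisions that were needed
        r *= 0.8
    raise RuntimeError("could not fix the drawing within max_rounds")
```

A pixel-only output offers no such structure to check. The loop would need a vision model just to find out where the circle is.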

When the prompt is longer than the code

I used pi to create a compact home page for media.s-anand.net using these prompts: Create index.html - a simple, elegant page that says that this page (media.s-anand.net) serves large media files for Anand - that’s where they should look instead. … followed by: Skip the part that says “Please visit …” … then: Shorten index.html to just 2-3 elegant rules of CSS. I want it MUCH smaller and simpler. … and finally: Center vertically and horizontally. ...

How AI bottlenecks shift

I wrote about my changing AI opinions. At least some of this is because the industry is moving so fast that the bottlenecks keep shifting. Here are four examples of something AI couldn’t do (the bottleneck) that then became possible. Each time, the bottleneck shifted and changed the way we work. It’s good to keep this in mind when thinking about AI.

Coding:
“It can’t write useful code. We can’t get real help.” But in Sep 2022: GitHub finds Copilot developers are 55% faster.
“It writes code but doesn’t know our codebase. We can’t let it touch real projects.” But in Feb 2024: Gemini 1.5 Pro has a 1M-token context (~30K LOC). Cursor indexes code.
“It understands the repo but can’t ship a fix on its own. We can’t hand it a whole issue.” But in Mar 2024: Devin solves 14% of SWE-bench, up from 2%. SWE-bench Verified scores are now 70%+.
“It ships fixes, but we can’t review them fast enough or trust they’re stable.” Oct 2024: DORA 2024 finds AI hurt both throughput and stability. Now: Sep 2025: DORA 2025 finds AI’s effect on throughput is positive, but stability stayed negative. Now: Jul 2025: METR’s RCT finds experienced devs 19% slower.
Agents ...

Watching videos with a plastic cover

On the Indigo 1026 from Singapore to Chennai, I saw a passenger two seats in front of me watch videos in an interesting way. She had wrapped her phone in a plastic cover and wedged it behind the tray table so that it sat at a comfortable viewing position. Then she watched an Asian movie (presumably with Bluetooth headphones). At first, I wondered if she travels with a plastic wrapper for this purpose. Then I realized it was from the Indigo safety instructions kit. ...

My changing AI opinions

I asked Claude about my AI opinions:

Based on my transcripts and blog posts, find the three claims I make most consistently, the three I’ve quietly reversed, and the one assumption I’ve never questioned but everything depends on.

Here are things I’ve changed my opinion on:

THEN: One frontier model will win - not specialization. NOW: Gemini for media, Claude for strategy/style, GPT for rigor. SLMs as tools.
THEN: Carefully curate my course content. NOW: Give students prompts directly.
THEN: Web apps are differentiated artifacts. NOW: HTML is easier to generate than PPT - a signal of slop, not craft.
THEN: Human in the loop. NOW: Human NOT in the loop, since the human bottlenecks it. On-the-loop, etc. is fine.
THEN: Minimal single-agent loop, avoid sub-agents. NOW: Multi-agent, sub-agent, and agent teams.
THEN: Avoid MCP, prefer SKILL.md. NOW: Use MCP because integrating with Claude / ChatGPT / … is easy.

These are the top contradictions in my opinions. ...

My most memorable anniversary

At 9:30 pm, I checked my calendar for tomorrow’s appointments, alt-tabbed frantically into ChatGPT, and started typing: Tomorrow is my 24th anniversary. It’s a bit late for me to buy anything (except maybe an online service) or prepare something. This has become a habit – leaving things to the last minute and asking ChatGPT to save my day. I did give it good context, though. You remember the OCBC expenses treemap you created by analyzing my transactions? That will give you a good guessable idea of the kinds of things she spends on and hopefully, therefore, what she likes. ...

It's who you know

Dharmendra Singh shared how they built an app with AI. That’s normal. I’m just thrilled they used client transcripts as the source. Basically, they converted the “voice of the client” to working software. To quote them: “A strong spoken business narrative can be converted into a usable product brief quickly when the capture step is disciplined.” You know what this means? Interviewing is a skill to hire for. Better questions = better answers = better apps. ...

Things I Learned - 31 May 2026

This week, I learned:

D-ID is an avatar generator platform like HeyGen. Creatify and Synthesia are a couple of others I’ve heard of. This space seems to be growing.

cosign is a CLI that lets you sign and verify any file with a Google, GitHub or Microsoft account. cosign sign-blob FILE --bundle sign.json opens a login window and creates a sign.json signature bundle. Anyone who has FILE, sign.json, and the email ID can verify that it was signed with that Google account via cosign verify-blob FILE --bundle sign.json --certificate-identity $EMAIL --certificate-oidc-issuer https://accounts.google.com.

arxiv2md.org converts arXiv papers to Markdown (Source). markxiv.org claims to do the same if you just change the URL, but it reported an error when I tried this link: https://markxiv.org/abs/2604.08649.

From Akhilesh Tilotia: So we have someone in our team with initials AS. She made a document which was named vAS. Then I made edits and named it vAT. These docs were in a CoWork folder. I asked Claude to clean up my doc. It created another version for me to review. In its wisdom, it named the file vAU 🙂

Maybe what a forward-deployed engineer does is engineer AI-native workflows. (This sounded profound when I wrote it down. Not sure if it’ll sound as profound tomorrow.) The idea is that the FDE will say, screw existing processes; let me fire up my AI agent and get stuff done; THEN we’ll figure out what works, how to optimize it, etc.

The PRAGMA: Revolut Foundation Model has some good tokenization ideas for tabular data. Create your own token space with key–value–time tokenization to retain field information. Bucketize numbers by percentile, preserving the magnitude/ordering that subword tokenization destroys. Encode time both as log-seconds and as cyclical calendar features.

Codex uses the Alt + Up Arrow key to edit queued commands, but in the VS Code terminal, this key binding is not sent to the terminal. Enable the terminal.integrated.sendKeybindingsToShell setting to send it to the terminal, and hence to Codex.

Based on this catalog of “universal foods”, here’s what I 🟢 like, am 🟡 neutral about, 🔴 dislike, 🟣 must try, and will ⚫ skip.
Universal favorites: 🟢 pizza, 🟢 fried potatoes/chicken, 🟡 dumplings, 🟢 ice cream.
Universal comfort foods: 🟢 khichdi, 🟡 congee, 🟡 dal-rice, 🟡 risotto, 🟡 ramen, 🟢 pho, ⚫ chicken noodle soup, 🔴 rice porridge, 🟡 mac-and-cheese, 🔴 mashed potato, 🟣 polenta, 🟢 oatmeal, 🟣 Japanese curry rice.
Acquired tastes that convert most: 🟡 coffee, 🟢 tea, 🟡 dark chocolate, 🟢 mild fermented dairy, 🟢 pickles, 🟢 olives, 🟣 kimchi, 🟣 miso, 🟢 mild chili dishes.
Acquired tastes that have cult devotion: 🟣 durian, 🟣 natto, 🟣 stinky tofu, ⚫ fermented fish, ⚫ hákarl, 🟢 very funky blue cheese, ⚫ offal.

OceanoPDF seems like a good place to download ePubs of books.

The entire Wikipedia is available as Parquet files. You can query it like duckdb -c "FROM 'hf://datasets/wikimedia/structured-wikipedia/enwiki/data/*.parquet' LIMIT 5". The English version is 35 GB, with 7.6 million articles. You’re better off downloading it than running analyses remotely.

When you receive a Cal.com (Calendly-style) link of the form https://cal.com/USER/EVENT, you can fetch the available slots via curl -H 'cal-api-version: 2024-09-04' 'https://api.cal.com/v2/slots?eventTypeSlug=EVENT&username=USER&start=2026-05-25&end=2026-06-01&timeZone=Asia/Singapore&format=range'. This is useful for automating good meeting-slot selection.

According to OpenAI, “Reference saved memories” in ChatGPT is different from “Reference chat history”. In Developer Mode, memory is turned off, but chat history is not. I confirmed that I can access past conversations in Developer Mode. It might be a privacy concern for others, but for me, this is singularly useful. I can use ChatGPT with local MCP, effectively getting a non-metered AI coding agent.

Seems GPT-5.2 reaches expert level in peer review: 45 scientists took 469 hours evaluating human & AI reviews on 82 papers. “Surprisingly, current AI reviewers are competitive even with the top-rated reviewers in Nature’s official peer review…” They are not without weaknesses, though, so use AI + humans. (On the limits and opportunities of AI reviewers: Reviewing the reviews of Nature-family papers with 45 expert scientists, via Ethan Mollick)
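The PRAGMA tokenization ideas in the list above can be sketched in a few functions: percentile bucketing, log-seconds plus cyclical time features, and key/value tokens. This is a minimal sketch under my own assumptions. The token strings, bucket count, and function names are hypothetical. They are not the PRAGMA model's actual scheme.

```python
"""Sketch of key-value-time tokenization for tabular events (illustrative).

Assumptions: the token formats, bucket count, and field names below are
invented for this example; they are NOT the PRAGMA model's actual scheme.
"""
import bisect
import math
from datetime import datetime


def percentile_edges(values: list[float], n_buckets: int) -> list[float]:
    """Bucket boundaries at evenly spaced percentiles of the training data."""
    s = sorted(values)
    return [s[int(len(s) * k / n_buckets)] for k in range(1, n_buckets)]


def bucketize(x: float, edges: list[float]) -> int:
    """Map a number to its percentile bucket; ordering (and rough magnitude) survives."""
    return bisect.bisect_right(edges, x)


def time_features(t: datetime, ref: datetime) -> dict[str, float]:
    """Elapsed time as log-seconds plus cyclical (sin/cos) calendar features."""
    secs = max((t - ref).total_seconds(), 0.0)
    hour = 2 * math.pi * t.hour / 24
    dow = 2 * math.pi * t.weekday() / 7  # Monday = 0
    return {
        "log_secs": math.log1p(secs),
        "hour_sin": math.sin(hour), "hour_cos": math.cos(hour),
        "dow_sin": math.sin(dow), "dow_cos": math.cos(dow),
    }


def tokenize(row: dict, edges: dict[str, list[float]]) -> list[str]:
    """Emit a key token and a value token per field, so field identity survives."""
    tokens = []
    for key, value in row.items():
        tokens.append(f"<key:{key}>")
        if isinstance(value, (int, float)) and key in edges:
            tokens.append(f"<bucket:{bucketize(value, edges[key])}>")
        else:
            tokens.append(f"<val:{value}>")
    return tokens
```

Cyclical encoding puts 23:00 next to 00:00 and Sunday next to Monday. A raw hour or weekday number would place them far apart.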

AI Coding Agent Subscription ROI

I ran npx -y ccusage monthly --compact to get the following break-up of my AI coding agent costs.

| Month | Codex | Claude |
|---|---|---|
| 2025-09 | $37.47 | $2.29 |
| 2025-10 | $106.79 | $9.13 |
| 2025-11 | $100.35 | $14.24 |
| 2025-12 | $240.69 | $24.88 |
| 2026-01 | $100.89 | $20.28 |
| 2026-02 | $323.21 | $29.46 |
| 2026-03 | $1996.32 | $134.87 |
| 2026-04 | $401.36 | $47.07 |
| 2026-05 | $378.20 | $45.13 |

This shows the ROI of my $20 subscriptions to each. On average, I get ~$35 worth of API calls for my $20 Claude Pro subscription and ~$400 of API calls for my $20 ChatGPT Plus subscription (on top of my ChatGPT chats). ...
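Here is the averaging arithmetic behind those two numbers, as a quick check. The monthly figures are copied from the ccusage output above. Treating "ROI" as API-equivalent spend divided by the $20 price is a simplifying assumption.

```python
"""Average monthly API-equivalent spend vs. a $20/month subscription.

Figures are copied from the ccusage output in this post; 'ROI multiple' here
simply means API-equivalent value divided by the subscription price.
"""
from statistics import mean, median

USAGE = {  # month: (Codex USD, Claude USD)
    "2025-09": (37.47, 2.29),
    "2025-10": (106.79, 9.13),
    "2025-11": (100.35, 14.24),
    "2025-12": (240.69, 24.88),
    "2026-01": (100.89, 20.28),
    "2026-02": (323.21, 29.46),
    "2026-03": (1996.32, 134.87),
    "2026-04": (401.36, 47.07),
    "2026-05": (378.20, 45.13),
}
SUBSCRIPTION = 20.0

codex = [c for c, _ in USAGE.values()]
claude = [cl for _, cl in USAGE.values()]
codex_avg, claude_avg = mean(codex), mean(claude)

print(f"Codex:  ${codex_avg:,.2f}/month = {codex_avg / SUBSCRIPTION:.1f}x (median ${median(codex):,.2f})")
print(f"Claude: ${claude_avg:,.2f}/month = {claude_avg / SUBSCRIPTION:.1f}x (median ${median(claude):,.2f})")
```

The nine-month means (about $409 and $36) match the ~$400 and ~$35 in the text. Both include the March 2026 spike, so the medians ($240.69 and $24.88) give a more conservative picture.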

Retire the Verify Button

My post “Add a Verify Button” has a problem. When Rohit requested hyperlocal news for every PIN code in Mumbai, we’d need a “verify” button on every Statoistics card: hundreds of PIN codes, every day. Verifying every output introduces a new bottleneck: a person inspecting every unit. That’s 100% inspection, which you do when you don’t yet trust the process. Manufacturing solved this a century ago. At Western Electric’s Hawthorne Works (famous for the Hawthorne Effect), quality control meant inspecting finished products and pulling the defective ones. Walter Shewhart sent his boss a one-page memo; about a third of it was a control chart. ...
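The control-chart alternative to 100% inspection can be sketched in a few lines. You sample some outputs each day, chart the error rate, and investigate only when a point falls outside the control limits. This is a standard p-chart, one common kind of control chart. It is offered as an illustration under assumed numbers, not as the method in the rest of the post. The daily counts are made up.

```python
"""Sketch of a Shewhart p-chart for sampled AI outputs (illustrative).

Assumptions: each day we spot-check a fixed sample of n cards and count the
ones with errors; the daily counts in the usage example are made-up numbers.
"""
import math


def p_chart_limits(defects: list[int], n: int) -> tuple[float, float, float]:
    """Lower limit, centre line, and upper limit (3 sigma) for the fraction defective."""
    p_bar = sum(defects) / (n * len(defects))
    sigma = math.sqrt(p_bar * (1 - p_bar) / n)
    return max(0.0, p_bar - 3 * sigma), p_bar, min(1.0, p_bar + 3 * sigma)


def out_of_control(defects: list[int], n: int) -> list[int]:
    """Indices of days whose defect rate falls outside the control limits."""
    lcl, _, ucl = p_chart_limits(defects, n)
    return [i for i, d in enumerate(defects) if not lcl <= d / n <= ucl]


# Hypothetical data: errors found in a daily sample of 50 cards.
DAILY_ERRORS = [2, 3, 1, 2, 4, 2, 3, 12, 2, 1]
print(p_chart_limits(DAILY_ERRORS, 50))
print(out_of_control(DAILY_ERRORS, 50))  # days worth a human investigation
```

Here a person looks at 50 cards a day instead of every card. They only dig deeper when the process itself signals a change.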

Add a Verify Button

Rohit Saran looked at the Statoistics cards my AI agents are generating for The Times of India, and asked about a small button under each one. In the list of Statoistics that you had put, I saw there’s a button called ‘Verify.’ What was that meant to be or will do in future? That verify button explains the claim, mentions the sources, and shows how to check the claim. One card said “9 in 10 Indians want a family doctor and barely 1 in 35 has one”. The button breaks that down: ...

One extra push-up every day

I’m doing one extra push-up every day. One of my 2026 goals is to build muscle. I hadn’t done anything about it until May. This month, I figured I would do the absolute minimum, at least to get started, because I seem to have starting trouble more than anything else. I asked ChatGPT: I want to build muscles. What’s the most effective thing that I can do that would take no more than one minute that I can practice every day without any equipment and I can do this anywhere and will have the most impact on building muscles? Research, give me the top five options and recommend one for me. ...

ChatGPT is about FIDE 1600

I asked ChatGPT to play chess with Stockfish. Stockfish is a “strong open-source chess engine”. It has 8 levels of difficulty, which roughly map to these FIDE levels:

| Stockfish | FIDE | Player level & description |
|---|---|---|
| Level 1 | ~1000 | Beginner: Constantly blunders, hangs pieces deliberately. |
| Level 2 | ~1100 | Advanced Beginner: Fewer obvious tactical mistakes, plays completely aimlessly. |
| Level 3 | ~1200 | Early Intermediate: Punishes very basic errors but regularly drops pieces. |
| Level 4 | ~1350 | Intermediate: Plays standard opening moves; requires solid, blunder-free play to beat. |
| Level 5 | ~1450 | Advanced Intermediate: Rarely hangs single pieces; you need positional advantages. |
| Level 6 | ~1650 | Strong Club Player: Highly tactical. Aggressively exploits your mistakes. |
| Level 7 | ~1950 | Expert: Exceptionally strong. Requires precise positional mastery and deep calculation. |
| Level 8 | ~2400 | Grandmaster: Invincible for most humans. Plays with ruthless perfection. |
| Full Engine | ~3600 | Out of human reach completely, “like a smart ant trying to debate physics with a human.” |

In the first iteration, here were the results: ...
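Under the standard Elo model, a rating gap translates into an expected score per game. This is one way to connect the levels above to a rating estimate. The sketch below treats the approximate level ratings as if they were exact Elo ratings, which is an assumption. It uses the usual logistic formula, E = 1 / (1 + 10^((R_b - R_a) / 400)).

```python
"""Expected score between two Elo ratings (standard logistic Elo formula).

The Stockfish level -> FIDE mapping is copied from the approximate table in
this post; treating those numbers as exact Elo ratings is an assumption.
"""
LEVEL_ELO = {1: 1000, 2: 1100, 3: 1200, 4: 1350, 5: 1450, 6: 1650, 7: 1950, 8: 2400}


def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected points per game for A vs B (win = 1, draw = 0.5, loss = 0)."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))


for level, elo in LEVEL_ELO.items():
    print(f"A 1600 player vs level {level} (~{elo}): {expected_score(1600, elo):.2f}")
```

For example, a ~1600 player should score a little under half the points against level 6 (~1650). Against level 7 (~1950), it should score far less.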

Wikipedia Citation Impact

Imagine you’re an information anarchist. You undermine Wikipedia pages by nuking references. A genie has granted you a wish: you can nuke one entire domain. Just one. As a data-driven decision maker (who is also an information anarchist 🤷), which would you pick? A common choice is The Internet Archive. 2.9 million Wikipedia pages reference it. But, you’re sneakier than that. A page isn’t undermined just because some references are gone. It’s undermined when all the references are gone. ...
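The "all references gone" criterion turns this into a simple counting problem. For each domain, count the pages whose references come only from that domain. The sketch below assumes a mapping from each page to the domains of its references. The toy data in the usage example is invented, and the function names are hypothetical.

```python
"""Which single domain, if removed, would leave the most pages with zero references?

Assumption: `refs` maps each page title to the domains of its references.
"""
from collections import Counter


def fully_dependent_pages(refs: dict[str, list[str]]) -> Counter:
    """Count, per domain, the pages whose references ALL come from that domain."""
    counts = Counter()
    for domains in refs.values():
        unique = set(domains)
        if len(unique) == 1:  # nuking this one domain empties the page
            counts[unique.pop()] += 1
    return counts


def most_referenced(refs: dict[str, list[str]]) -> Counter:
    """Naive metric: number of pages citing each domain at least once."""
    return Counter(d for domains in refs.values() for d in set(domains))


# Hypothetical toy data.
refs = {
    "A": ["archive.org", "nytimes.com"],
    "B": ["archive.org", "bbc.co.uk"],
    "C": ["archive.org", "who.int"],
    "D": ["nasa.gov", "nasa.gov"],
    "E": ["nasa.gov"],
}
print(most_referenced(refs).most_common(1))        # the "common choice"
print(fully_dependent_pages(refs).most_common(1))  # the sneakier choice
```

In this toy data, archive.org is cited on the most pages, but removing it empties none of them. nasa.gov is the only source for two pages.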

Erdős Unit Distance Problem

An OpenAI model solved the Erdős unit distance problem. Erdős roughly said, “Among N points, the number of pairs at the same distance can’t compound faster than close to 0%.” In other words, it grows barely faster than N itself. The model found a way of placing points so that it compounds at about 1.4%. This visualization is a crude way of showing how that works.
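The brute-force sketch below shows what is being counted. It places N points on a square integer grid and counts how many pairs share the single most common distance. Scaling the grid makes any chosen distance the "unit". A scaled square grid is the classic construction Erdős used for his lower bound. The sketch is illustrative only and says nothing about the new construction.

```python
"""Brute-force the quantity in the Erdős unit distance problem on small grids.

Illustrative only: for N points on a square integer grid, count how many
pairs share the most common distance (scaling makes that distance the 'unit').
"""
from collections import Counter
from itertools import combinations


def max_equal_distance_pairs(side: int) -> tuple[int, int]:
    """Return (most common squared distance, number of pairs at that distance)."""
    points = [(x, y) for x in range(side) for y in range(side)]
    dist2 = Counter((ax - bx) ** 2 + (ay - by) ** 2
                    for (ax, ay), (bx, by) in combinations(points, 2))
    return dist2.most_common(1)[0]


for side in (3, 5, 8, 12):
    n = side * side
    d2, pairs = max_equal_distance_pairs(side)
    print(f"N={n:4d} points: {pairs:5d} pairs at squared distance {d2} ({pairs / n:.2f} per point)")
```

The "per point" column is the quantity whose growth rate the conjecture is about. If pairs grew exactly in proportion to N, that column would stay flat.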

Longest repeated paragraph on Wikipedia

What is the most frequently occurring sentence in Wikipedia? ANS: A 213-word paragraph about how minor planets are named, which appears in 418 Wikipedia articles, word-for-word! There are ~380,000 asteroids. Wikipedia has 418 pages for these - including one for each thousand-range of asteroids. Every single one of these pages includes the phrase: As minor planet discoveries are confirmed, they are given a permanent number by the IAU’s Minor Planet Center (MPC), and the discoverers can then submit names for them, following the IAU’s naming conventions. The list below concerns those minor planets in the specified number-range that have received names, and explains the meanings of those names. ...
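Finding a paragraph like this is a counting job. Split each article into paragraphs, normalize whitespace, and count how many distinct articles contain each one. This is a minimal sketch that assumes articles are available as plain text with blank-line-separated paragraphs. The function names and the min_words threshold are hypothetical. A real run would stream a Wikipedia dump.

```python
"""Find the paragraph repeated word-for-word across the most articles (sketch).

Assumption: `articles` maps titles to plain text with paragraphs separated by
blank lines; min_words is an illustrative threshold to skip tiny paragraphs.
"""
import re
from collections import Counter


def normalize(paragraph: str) -> str:
    """Collapse whitespace so trivial formatting differences don't split counts."""
    return " ".join(paragraph.split())


def most_repeated_paragraph(articles: dict[str, str], min_words: int = 20) -> tuple[str, int]:
    """Return (paragraph, number of distinct articles containing it)."""
    counts = Counter()
    for text in articles.values():
        # A set per article, so a paragraph repeated within one page counts once.
        paras = {normalize(p) for p in re.split(r"\n\s*\n", text)}
        counts.update(p for p in paras if len(p.split()) >= min_words)
    if not counts:
        raise ValueError("no paragraph meets min_words")
    return counts.most_common(1)[0]
```

Counting distinct articles rather than raw occurrences matches the question. The post reports that the paragraph appears in 418 articles.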

Correcting instruction debt

Here’s another AI-generated post, with my (Anand’s) editor notes. But I’ve also added my own version of the post below. I told my “find a free calendar slot” script to “Avoid weekends and holidays”. Wednesday vanished. Turns out it’s a Singapore holiday (Anand: It’s Eid al-Adha) - irrelevant for the people I was meeting in other zones. I’d debugged my own helpful rule. (Anand: What? What does “debugged my own helpful rule” even mean?) ...
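The trap in "avoid weekends and holidays" is deciding whose holidays count. Below is a minimal sketch that assumes each participant maps to a country with a known holiday list. The HOLIDAYS data and the working_days function are hypothetical placeholders, not an authoritative holiday calendar. Like the post, it assumes the organizer’s own holiday shouldn’t block the slot.

```python
"""Sketch: pick meeting days that are working days for the attendees.

Assumption: each attendee has a country, and HOLIDAYS holds that country's
public holidays. The dates below are placeholders for illustration only.
"""
from datetime import date, timedelta

HOLIDAYS = {  # hypothetical data, not a real holiday calendar
    "SG": {date(2026, 5, 27)},
    "IN": set(),
    "US": {date(2026, 5, 25)},
}


def working_days(start: date, n_days: int, countries: set[str]) -> list[date]:
    """Weekdays in [start, start + n_days) that are a holiday in none of `countries`."""
    blocked = set().union(*(HOLIDAYS.get(c, set()) for c in countries))
    days = (start + timedelta(days=i) for i in range(n_days))
    return [d for d in days if d.weekday() < 5 and d not in blocked]


week = date(2026, 5, 25)  # a Monday
# Literal rule: "holidays" silently meant the organizer's (SG) holidays.
print(working_days(week, 7, {"SG"}))
# Intended rule: only the attendees' holidays matter (here, India and the US).
print(working_days(week, 7, {"IN", "US"}))
```

The rule says the same thing in both cases. Only the set of countries changes, which is why the bug was easy to miss.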