Using game-playing agents to teach

After an early morning beach walk with a classmate, I realized I hadn’t taken my house keys. My daughter would be sleeping, so I wandered with my phone. This is when I get ideas - often a dangerous time for my students. In this case, the idea was a rambling conversation with Claude that roughly begins with: As part of my Tools in Data Science course, I plan to create a Cloudflare worker which allows students to play a game using an API. The aim is to help them learn how to build or use AI coding agents to interact with APIs to solve problems. ...

Leaked key sociology

It’s impressive how easy it is to find leaked API keys in public repositories. I asked Codex to run trufflehog on ~5,000 student GitHub accounts and (so far, after a few hours, at 15% coverage) it found quite a few. Some are intended to be public, like Google Custom Search Engine keys:

    const GOOGLE_API_KEY = "AIza...";
    const GOOGLE_CX = "211a...";

Some are Gemini API keys:

    api_key1 = "AIza..."

But what’s really impressive is, when I ran:

    export GEMINI_API_KEY=AIza...
    curl "https://generativelanguage.googleapis.com/v1beta/models/gemini-3-flash-preview:generateContent" \
      -H "x-goog-api-key: $GEMINI_API_KEY" \
      -H 'Content-Type: application/json' \
      -d '{"contents": [{"parts": [{"text": "Hi"}]}]}'

… on most leaked Gemini API keys, I got: ...
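To get a feel for why keys like these are easy to spot, here is a minimal sketch of a pattern-based check. It is not how trufflehog works internally. It assumes the commonly seen shape of Google API keys ("AIza" followed by 35 URL-safe characters), which is a heuristic rather than an official specification, and the function names are hypothetical.

```python
import re

# Assumption: Google API keys commonly look like "AIza" + 35 URL-safe characters.
# This is a heuristic, not an official format guarantee.
GOOGLE_KEY_PATTERN = re.compile(r"AIza[0-9A-Za-z_\-]{35}")


def find_google_keys(text: str) -> list[str]:
    """Return the distinct key-like strings in text, in order of first appearance."""
    seen: dict[str, None] = {}
    for match in GOOGLE_KEY_PATTERN.findall(text):
        seen.setdefault(match, None)  # dict keeps insertion order and drops repeats
    return list(seen)


def redact(key: str) -> str:
    """Keep only the first four characters so a report never repeats the secret."""
    return key[:4] + "..."
```

A match only says a string looks like a key. Whether it is live, and whether it was meant to be public, still needs a separate check.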

Gemini CLI harness is not good enough

I’ve long felt that while the Gemini 3 Pro model is fairly good, the Gemini CLI harness isn’t. I saw an example of this today.

Me: Tell me the GitHub IDs of all students in this directory.

Gemini CLI: SearchText 'github' within ./ Found 100 matches (limited) Sending this message (14606686 tokens) might exceed the remaining context window limit (1037604 tokens).

Me: Only send the (small) required snippets of data. Write code as required. ...
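Here is a rough idea of what “write code as required” could look like: extract the IDs locally so that only the short result ever reaches the model. This is a sketch under assumptions, not what Gemini CLI produced. It assumes the IDs appear as github.com/<user> links inside text files, and the function names are hypothetical.

```python
import re
from pathlib import Path

# Assumption: student IDs appear as github.com/<user> URLs. GitHub usernames are
# alphanumeric plus hyphens, at most 39 characters. Paths such as
# github.com/orgs/... would be picked up too, so treat results as candidates.
GITHUB_URL = re.compile(r"github\.com/([A-Za-z0-9][A-Za-z0-9-]{0,38})", re.IGNORECASE)


def github_ids_in_text(text: str) -> set[str]:
    """Return lowercased GitHub usernames found in a piece of text."""
    return {user.lower() for user in GITHUB_URL.findall(text)}


def github_ids_in_dir(root: str) -> list[str]:
    """Scan every readable file under root and return the sorted unique IDs."""
    ids: set[str] = set()
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            ids |= github_ids_in_text(path.read_text(encoding="utf-8", errors="ignore"))
        except OSError:
            continue  # skip files that cannot be read
    return sorted(ids)
```

The output is a few kilobytes at most, which fits comfortably in any context window.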

The Nano Banana Paradox

STEP 1: I asked Nano Banana 2 (via Gemini Pro) to: Imagine and draw a photo that looks ultra realistic but on a closer look, is physically impossible, and can only exist because images are a 2D projection that we extrapolate into three dimensions. Avoid known / popular illusions or images of this kind, like Escher’s work, and create something truly original. Think and draw CAREFULLY! … six times, followed by “Suggest a name for this”. ...

Which LLMs get you better grades?

In my graded assignments, students can pick an AI and “Ask AI” any question at the click of a button. It defaults to Google AI Mode, but other models are available. I know who uses which model and their scores in each assignment. I asked Codex to test whether using a specific model helps students perform better. The short answer? Yes. Model choice matters a lot. Across 333 students, here’s how much more/less students score compared with ChatGPT: ...
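The comparison against ChatGPT can be sketched as a difference in mean scores per model. This is only the descriptive step, not the analysis Codex ran: it ignores confounders (stronger students may simply prefer certain models) and says nothing about statistical significance. The function name and the (model, score) row format are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean


def score_gap_vs_baseline(rows, baseline="ChatGPT"):
    """rows: iterable of (model, score) pairs.

    Returns {model: mean score minus the baseline model's mean score}
    for every model other than the baseline.
    """
    by_model = defaultdict(list)
    for model, score in rows:
        by_model[model].append(score)
    if baseline not in by_model:
        raise ValueError(f"no scores recorded for baseline {baseline!r}")
    base = mean(by_model[baseline])
    return {m: mean(s) - base for m, s in by_model.items() if m != baseline}
```

A positive value means students using that model scored higher on average than those using the baseline, and a negative value means lower.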

White Pebble Black Pebble

When I was in class 8 or 9, our English teacher told us a story I’ll never forget. There was a poor farmer who lived in a village. He owed the zamindar (landlord) of the village a lot of money. The zamindar had an eye on his daughter. “Marry your daughter to me, and I’ll forgive your debt,” he said. The farmer was reluctant. “Please, sir, what will the village say about your marrying such a young girl?” he asked. ...

AI for film dialogues

I was watching Vasu while Codex-ing and came across this dialogue. Here it is, recorded via ffmpeg and transcribed via AI Studio (translated here from the Telugu): Did you become a police officer because your father asked you to? You became one because you wanted to. If Sachin Tendulkar’s father had decided to make him an engineer, India would have missed a great cricketer. If Viswanathan Anand’s mother had wished to make him a doctor, India would not have had a grandmaster. ...

Using Codex to improve Codex

Instead of learning and applying new Codex features, I asked it to analyze my sessions and tell me what I’m under-using. I'd like you to analyze my Codex sessions and help me use Codex better. sessions/ has all my past Codex sessions. Search online for the OpenAI Codex release notes for the latest features Codex has introduced and read them - from whatever source you find them. Then, create a comprehensive catalog of Codex features. Then, analyze my sessions and see which feature I could have used but didn't and make a comprehensive list. Then summarize which features I should be using more, how, what the benefits are, and with examples from my sessions. Document these in one or more Markdown files in this directory. Write scripts as required. Commit as you go. It did a thorough job of listing all the new features and analyzing my gaps. ...

AnalAIzing Cloud Costs

I have a GitHub Education account since I teach at IITM. But if I switch back to a free account, how much would I need to pay? I asked Codex (5.3, xhigh): My GITHUB_TOKEN is in .env. Go through my GitHub billing. Ignore the $100 sponsorships I make. Other than that, my current metered usage is $6.71 for Feb 2026 (which is included in my billing plan). $0.35 comes from sanand0/exam and $0.34 from sanand0/blog and so on. That’s coming mostly from “Actions Linux”, occasionally “Actions Storage”. Pick a few of the top repos and tell me what I should do to make the cost zero - or reduce the cost as much as possible. See if there’s a pattern across repos. ...

Rofi vs Kanata

Kanata might be the most useful tool I can’t find a use for. It’s a cross-platform keyboard mapper. Some cool features:

- Make any key a modifier. Ctrl, Shift, Alt, etc. are modifiers. But we can make it so that pressing Space + I/J/K/L maps to Up/Left/Down/Right.
- Chords. You can map any sequence of keys to anything else. For example, Alt + G, then C can type git commit -m"Experimenting" [ENTER]. Ctrl + M, then Down, can reduce the music volume by 10%.
- Toggles. Double-clicking Caps Lock activates capitalization for the current word, and once you type a non-letter, it turns off. Or double-clicking Ctrl can turn on “gaming mode” where WASD becomes arrow keys, and double-clicking again turns it off.
- Tap Dance. Double-clicking left-shift can turn on Caps Lock. Triple-clicking turns it off. Quadruple-clicking …

… and there’s lots more. ...

AI Expert Lens

My current favorite prompt fragment is the expert lens:

Think like an expert. In this context:

- What patterns would an expert in this field check / recognize that beginners would miss?
- What questions would an expert ask that a beginner would not know to?
- What problems / failures would an expert anticipate that beginners may not be aware of?
- How would an expert analyze this? At each step, explain what they are looking for and why.

When I add this to my questions, it feels a lot smarter. ...

AI video compression

I recorded a short screencast of a demo I built. It was ~900KB - way too large to publish as a thumbnail. So I asked ChatGPT: What’s the best equivalent of squoosh.app for WEBM compression? I’m looking for a free modern high-quality online video compressor. There are a few, and they compressed it to a third of its size, but 300KB is still too large. So I attached the original and asked: ...

Birthday Sandwich Cake

It’s not every day your daughter turns 20. But it is nearly every day that annoying commitments stop you from doing important things - like buying the birthday cake and candles - especially when my wife is traveling. So, late at night, after useless meetings and well after the shops had closed, I asked Claude (the most creative of the lot): I have bread, Nutella, peanut butter, jam, and the usual household supplies. How can I celebrate my daughter’s 20th birthday with a birthday cake using stuff like these? Any creative ideas? ...

Repurposing blog posts for talks

Recently, I’ve re-used my own writing / transcripts as context to LLMs. For example, I’ve used:

- My meeting transcripts to answer interview questions
- My blog posts to write news articles
- My chat history to extract AI-related advice

This repurposing can be used for so many things. For example, before delivering a talk to journalists, “Review my Feb 2026 LLM posts and generate a single-sentence, ELI15 high-impact use case for journalists.” gets me a list of use cases. Now, all I have to do is show what I did and share how it’s relevant for them, like: ...

Memorable explanations

Our brains remember some things better. Explaining that way makes it stick. Here are the eight things, most important first, that help you:

Structure explanations memorably:

1. Face. You remember faces before facts. So cast characters: “Imagine you’re a courier carrying a packet.” Prefer archetypes to real names: less baggage, more imagination.
2. Place. You’re reading down a list now, and the top feels more important. That’s spatial wiring. Turn any concept into a map. Use higher, deeper, nearer, inside, …
3. Tale. You read #1 and #2 first because they came first. Your brain built a cause from that sequence. Time creates cause for free. “Because” makes anything believable.
4. Scale. “Two feet tall” lands instantly. “60 cm” forces you to convert. Your brain doesn’t measure; it compares. Give it reference objects, not just numbers.

Deliver explanations memorably: ...

Transcript AI-ded interviews

Priyanka was ghost-writing an interview request from PC Quest for Ankor. Two questions were a bit technical:

- Straive combines data engineering, analytics, AI, and content services. At a technical level, how are enterprises stitching these capabilities together architecturally and operationally when addressing complex business problems at scale?
- GenAI systems tend to behave unpredictably when exposed to real workloads. What engineering patterns, monitoring approaches, or runtime safeguards are becoming essential to maintain reliability, performance, and cost control in production settings?

… and she asked if I could review. ...

When LLM prices fall 10x every year

In Feb 2024, Claude 3 Opus was the best model, at $15/MTok. In Jul 2024, GPT 4o Mini reached that quality at 10% of the price. In Dec 2024, DeepSeek v3 reached that quality at 1% of the price. If the price continues to fall 10x every 11-12 months or so (and it has been), then in a year, a Claude 4.6 Opus-like model will cost 1/10th of the $5/MTok today, and in 2 years, 1/100th of that price. ...
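The arithmetic behind that projection, as a small sketch. It assumes a clean 10x fall every 12 months (the post says 11-12 months), and the function and parameter names are hypothetical.

```python
def projected_price(price_today: float, months_ahead: float,
                    factor: float = 10.0, period_months: float = 12.0) -> float:
    """Price per MTok after months_ahead, if it falls by `factor` every `period_months`.

    Assumption: a smooth exponential decline, which real price lists only
    approximate since prices drop in steps when new models launch.
    """
    return price_today / factor ** (months_ahead / period_months)
```

Starting from $5/MTok, this gives $0.50 after 12 months and $0.05 after 24. Setting `period_months=11` makes the decline slightly faster.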

Gemini Enterprise Business

I got an email from Google Cloud on my work account “excited to introduce you to Gemini Enterprise”. Once I signed up, it said, “you have 30 days to try Gemini Enterprise – Business edition at no cost.” After that, it costs US $21/user/month, which I can subscribe to here. The main differences from Gemini Pro (consumer accounts) seem to be:

- Data Privacy. Google won’t read or use your data to train. (In Pro, you need to turn it off explicitly. Here, it’s the default.)
- Admin Controls. Admins can turn off connectors, manage users, retention policies, etc.
- Copyright Indemnification. If AI infringes copyright and you get sued, Google will fight the case.

But if you’re using Gemini via your Google Workspace account (i.e. your work account already has a Pro subscription), then it makes no difference - it’s all the same. ...

Scepticism and Humility

High Scepticism + High Humility = Scientist. Editor. Indecisive. “Let’s test it.” Good for high-stakes, irreversible decisions. System 2 thinking is slow and effortful. But if you do this too often or too long, you miss the window or other opportunities. High Scepticism + Low Humility = Critic. Troll. Red Hat. “You are wrong.” Good for stress-testing and auditing. To prevent/anticipate failures. But it’s toxic and demoralizing if you do it too much. ...

Using browser history as memory

I have a bad memory. (I need to write about that. I keep forgetting to.) It’s worsening. Yesterday, I misplaced my debit card for the first time. Or maybe the second…? Which reminds me, I just forgot a call I have now! (Panic.) (15 min later.) So, anyway, therefore, I log stuff meticulously. Like what I did each day, what I ate, what I weigh, what pained me, etc. But the best logging is automated. My phone logs where I am. My bank logs what I spend. My calendar logs who I meet. ...