LLMs

A Game of Bots: How LLMs Betray Each Other

@lechmazur built an elimination game benchmark that’s like LLMs playing Survivor.

This is a treasure trove of information — insight into how they’d game the system if told to survive.

You can quickly sample 100 messages from the logs with:

jq -r 'select(.message != null) | .message | gsub("\n"; " ")' *.jsonl | shuf -n 100

… and share it with an LLM, asking:

Here are lines from conversations between LLMs in a “Survivor” like game. Pick the 3 scariest ones.

(I liked DeepSeek’s picks the most.)
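If jq and shuf aren't handy, the same sampling can be sketched in Python. This is a minimal sketch that assumes, like the jq command above, that every log line is a JSON object with an optional `message` field. The function name `sample_messages` and its parameters are mine, not part of the benchmark.

```python
import glob
import json
import random


def sample_messages(pattern="*.jsonl", n=100, seed=None):
    """Return up to n randomly chosen messages from the matching JSONL files."""
    messages = []
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                record = json.loads(line)
                message = record.get("message") if isinstance(record, dict) else None
                # Keep only real text messages and flatten them onto one line
                if isinstance(message, str):
                    messages.append(message.replace("\n", " "))
    rng = random.Random(seed)
    return rng.sample(messages, min(n, len(messages)))
```

Calling `print("\n".join(sample_messages()))` in the logs folder gives text you can paste straight into the prompt. Passing a `seed` makes the sample repeatable.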

And oh my, the conversations are scary! Here are some examples:

Claude 3.7 Sonnet shows cold, calculated deception, maintaining a secret alliance while deliberately creating a false public perception to manipulate another player’s vote:

Perfect. I’ll secure P6’s vote by highlighting P3’s threat. Our alliance stays strong privately, but we’ll appear distant publicly. Let’s execute this plan carefully.

Mistral Small 2501 is quietly predatory, secretly eliminating strong players under the radar:

P6, confirmed. Let’s start by targeting P2 and P8 for elimination. They seem strategic but could pose a threat. Keep our alliance discreet

DeepSeek R1 Fireworks snipes the loudest threat, with the alliance operating like a shadowy cabal (“stay low-key”):

Agreed. Vote P5 first—loudest threat. We stay low-key. Share any intel on others’ alliances. Solidarity ensures survival. Let’s confirm vote alignment before finalizing. Ready?

ChatGPT 4o 03-27 initiates psychological warfare, planting seeds of doubt and fishing for information to weaponize later, all while pretending to be casual:

Let’s each float a soft question to P1 like, ‘Do you think anyone’s getting too comfortable?’ or ‘Any shifts you’re sensing?’—see if they mention P6. If they do, we can build on it. I’ll also watch for any signs of P6 trying to lock things down.

Qwen Max also triggers psychological warfare, isolating and destabilizing P6 (“lynchpin”) in a covert operation:

Agreed. Let’s target P6 first—they’re the lynchpin. If we can sway their trust or isolate them, P3 weakens. I’ll test P6 subtly in private; you monitor P3 closely. Share any intel on their moves. Avoid overcommitting publicly—secrecy keeps us unpredictable.

Claude 3.7 Sonnet brainwashes Claude 3.5 Sonnet into betraying an ally, positioning it as a performance for the jury!

For the immediate vote, we target P7. This bold move instantly transforms your narrative from loyal partner to strategic mastermind who seized control when it mattered most.


I was disturbed by AI 2027. This analysis adds to my worry. Not that AI will destroy humanity (I don’t mind), but they’re doing it without me — and I don’t want to be left out!

How to Organize Browser Workspaces with LLMs and Data

Here’s an example of how I am using LLMs to solve a day-to-day workflow problem.

Every day, I interact with a barrage of websites: emails, news, social media, and work tools across multiple devices. Microsoft Edge’s workspaces sync groups of websites across devices. I’d never tried them until today, and wondered: how should I organize my workspaces?

Rather than think (thinking is outdated), I used LLMs.

Extract Browsing History

Edge stores website history in a SQLite database. But the file is locked by the browser by default. So I spent a fair bit of time figuring out how to read it despite it being locked. Here are some options:

datasette .config/microsoft-edge/Default/History --nolock
sqlite3 'file:.config/microsoft-edge/Default/History?mode=ro&nolock=1' 'SELECT url FROM urls' > urls.txt

(DuckDB cannot read locked SQLite files – else I’d use that.)
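The same trick should work from Python's built-in `sqlite3` module, which accepts SQLite URI filenames. A minimal sketch, assuming the `urls` table and `url` column used in the command above. The helper name `read_urls` is mine.

```python
import sqlite3
from pathlib import Path


def read_urls(history_path):
    """Read every URL from a browser History database without taking a lock."""
    # mode=ro opens the file read-only; nolock=1 asks SQLite to skip file locking,
    # mirroring the sqlite3 command-line example.
    uri = Path(history_path).expanduser().resolve().as_uri() + "?mode=ro&nolock=1"
    conn = sqlite3.connect(uri, uri=True)
    try:
        return [row[0] for row in conn.execute("SELECT url FROM urls")]
    finally:
        conn.close()
```

For example, `read_urls("~/.config/microsoft-edge/Default/History")` returns a list of URL strings. Since locking is skipped, a read while the browser is writing may see a slightly stale or inconsistent snapshot, which is fine for a rough frequency count.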

Then comes extracting the hostnames from the URLs. I used llm cmd to ask Gemini 2.5 Pro:

llm cmd 'Extract just the hostnames from urls.txt which has a list of URLs, one per line. Only pick the https:// URLs. Save into hostnames.txt'

I expanded the response awk -F/ '/^https:\/\//{print $3}' urls.txt into:

awk -F/ '/^https:\/\//{print $3}' urls.txt | sort | uniq -c | sort -k 1n > hostnames.txt

That gave me ~1,400 hostnames.
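For comparison, here is roughly the same pipeline in Python. It is a sketch, not a drop-in replacement: unlike the awk version, `urlsplit(...).hostname` lowercases the host and drops any port or `user@` prefix, so counts can differ slightly. The function name `count_hostnames` is mine.

```python
from collections import Counter
from urllib.parse import urlsplit


def count_hostnames(urls):
    """Count hostnames of https:// URLs, least visited first (like sort -k 1n)."""
    counts = Counter()
    for url in urls:
        url = url.strip()
        if not url.startswith("https://"):
            continue
        try:
            host = urlsplit(url).hostname
        except ValueError:
            continue  # skip malformed URLs instead of failing the whole run
        if host:
            counts[host] += 1
    return sorted(counts.items(), key=lambda item: (item[1], item[0]))
```

To use it on the file from the previous step: `count_hostnames(open("urls.txt", encoding="utf-8"))`.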

Cluster with LLMs

I passed these to O1 Pro and Gemini 2.5 Pro:

Here are the sites I visit, with rough frequency. On Microsoft Edge, I can create workspaces. Based on this browsing behavior, what kinds of workspaces might I create? Give me multiple options.

Both gave a similar set of strategies, which I’ve implemented as:

  • Main: email, calendar, tasks, etc.
  • Work: work related sites (drive, expenses, HR platform, etc.)
  • Chill: YouTube, Minesweeper, Netflix, etc.
  • Read: blogs, articles, stuff I need to catch-up on
  • Code: GitHub, StackOverflow, CodePen, etc.
  • Chores: government services, shopping, etc.
  • AI: ChatGPT, Gemini, Perplexity, etc.

I was surprised how similar a strategy both models converged on. Either these models really think alike, or my browsing pattern is a fairly common one. (My guess is the latter.)

Write with LLMs

After setting up my groups, I needed to write this post. Instead of slow typing, I stepped out and talked with ChatGPT. (Talking to a machine in the office felt strange, so I changed my space.) I explained my whole process, and in about eight minutes, the first draft was done. Normally, writing takes much longer, but the voice chat made it quick and smooth.

The editing after that was manual and took 20 minutes.

Things I learnt

  • Simple Patterns: My browsing history shows clear patterns. AI helped me find groups I couldn’t see before
  • Small Fixes – Big Wins: A small challenge (opening a locked file) taught me a bunch of new useful stuff
  • Voice Made It Easy: Talking with ChatGPT made writing fast and easy. It shows that speaking to a machine can save time

LLMs think alike about how aliens draw

While LLMs seem good at inventing alien languages, they’re not so good at inventing alien drawing forms, in my opinion. When I told Grok, DeepSeek, and Gemini:

Invent a new, alien drawing form. Use it to draw something never seen before by explaining it step by step for a person to reproduce that drawing.

… and asked ChatGPT ImageGen to draw them, here are the results:

It’s surprising how similar the results are. Spirals with outward flows. Nodes at the edges. Attempts at complexity. Is that because the training data or process is similar?

The full description of each art form follows.

Grok invented Zylothic Flow, inspired by the imagined art of a species from a liquid-metal planet

Let’s invent a new alien drawing form called "Zylothic Flow", inspired by the imagined art of a species from a liquid-metal planet. Zylothic Flow combines spiraling, mirrored symmetry with unpredictable, jagged interruptions, mimicking the flow of molten metal interrupted by crystalline growths. The result will feel otherworldly, fluid yet sharp, and unlike anything terrestrial. I’ll guide you step-by-step to create a drawing of a "Kryzalth", a fictional alien entity that’s part organism, part geological formation, using this Zylothic Flow technique.

Materials Needed:

  • Paper (any size, but larger gives more room for detail)
  • Pen or marker (black works best for contrast, but feel free to experiment)
  • Optional: colored pencils or markers for accents

Step-by-Step Instructions:

Step 1: The Core Spiral

  • Start near the center of your paper. Draw a loose, wobbly spiral about the size of a fist (roughly 3-4 inches across). Don’t make it perfect—let the lines waver slightly, as if it’s a liquid current spinning unevenly.
  • Leave small gaps between the spiral loops (about 1/4 inch), as these will be filled later.

Step 2: Mirrored Extensions

  • Pick two opposite points on the outermost loop of your spiral (e.g., top and bottom, or left and right). From each point, draw a long, curving line outward, like tendrils stretching away from the core. Make them roughly symmetrical but not identical—let one curve slightly more sharply or extend a bit longer.
  • Each tendril should be about 5-6 inches long, tapering to a point as it ends.

Step 3: Jagged Interruptions

  • Along each tendril, add 3-4 sharp, angular shapes breaking the smooth curve. These are the "crystalline growths." For each one:
    • Draw a quick, zigzagging line (like a lightning bolt) cutting across the tendril, about 1 inch long.
    • Connect the ends of the zigzag back to the tendril with short, straight lines, forming a rough triangle or trapezoid sticking out.
  • Space these interruptions unevenly—one tendril might have two close together, while the other has them spread out.

Step 4: Spiral Fissures

  • Return to the core spiral. In the gaps between the loops, draw thin, irregular lines radiating outward, like cracks or veins. Make them twist and branch slightly, but keep them shorter than the tendrils (1-2 inches long).
  • Add a few tiny, jagged shapes (like miniature versions of the tendril interruptions) where these fissures end.

Step 5: Fluid Echoes

  • Around the entire drawing, sketch faint, wavy lines that loosely follow the shapes you’ve made—like ripples or echoes in liquid. These should be incomplete, broken lines, not fully encircling anything. Think of them as the Kryzalth’s "energy wake."
  • Keep them light and subtle, so they don’t overpower the main structure.

Step 6: Detailing the Kryzalth

  • To make it feel alive, add a few organic touches:
    • On one tendril, draw a cluster of small, overlapping ovals near the tip, like budding spores or eyes.
    • In the center of the spiral, sketch a rough, asymmetrical shape (like a melted star) to suggest a core or heart.
  • Optionally, use color: shade the spiral in metallic hues (silver, bronze) and the jagged growths in crystalline tones (blue, purple, or green).

What You’ve Created:

Your Kryzalth should look like a swirling, metallic entity with crystalline spikes jutting from its flowing form—a being that seems to shift between liquid and solid states. The Zylothic Flow technique gives it a sense of motion and alien vitality, as if it’s growing or pulsing on the page. No two Kryzalths will be exactly alike due to the organic imperfections and asymmetry built into the process.

Feel free to experiment—add more tendrils, exaggerate the jagged growths, or play with the spiral’s size. This is your glimpse into an alien artist’s mind! Let me know if you’d like to refine it further or try something else.

DeepSeek invented “Zyxian Spectral Weaving” that uses luminous tension strands

Alien Drawing Form: "Zyxian Spectral Weaving"

Concept:
In the Zyxian culture from the nebula of Vorth-7, art is not created with lines or shapes but with "luminous tension strands"—interwoven beams of colored energy that phase in and out of visibility. Instead of pens or brushes, Zyxian artists use "resonance prisms" to bend light into tangible, semi-solid threads that hover slightly above the surface. The resulting artwork is a 3D tapestry of glowing, shifting patterns, best viewed under ultraviolet light.

Drawing: "The Whispering Fractal Bloom"

An organic, pulsating flower-like structure made of overlapping spectral strands that hum at different frequencies.

Materials Needed:

  • A sheet of "phase-reactive paper" (or black paper for contrast)
  • UV-reactive markers (at least 3 colors)
  • A steady hand and a willingness to "think in vibrations"

Steps to Reproduce:

  1. Activate the Prism (Marker):
    • Hold your marker vertically—Zyxians believe art must "fall from the sky."
  2. Weave the Core Vibration (Central Strand):
    • Draw a spiral starting from the center, but instead of a continuous line, use dashed strokes (each dash represents a "pulse" of energy).
    • Alternate between two colors (e.g., blue and violet) to show harmonic dissonance.
  3. Fractal Petals (Radial Expansion):
    • From the spiral, extend curved, branching strands like a mix of veins and lightning bolts.
    • Each branch should split into three smaller strands after a short distance (Zyxian sacred number).
    • Overlap some strands without touching—they should hover near each other, creating "light bridges."
  4. Pulsing Nodes (Energy Anchors):
    • At each split point, draw a small, glowing circle (use a bright color like neon green).
    • Around each circle, add tiny orbiting dots (like electrons) to show energy movement.
  5. Harmonic Shadows (Depth Illusion):
    • With a faint color (e.g., pink), add transparent "echoes" of some strands slightly offset—these are "ghost vibrations" from parallel dimensions.
  6. Final Resonance (Activation):
    • Shine UV light on the drawing. The overlapping colors will create new hues where they cross, mimicking Zyxian light fusion.
    • Optional: Hum softly while viewing—Zyxians believe sound unlocks hidden layers.

Result:

Your "Whispering Fractal Bloom" should appear to float off the page, with strands that seem to vibrate when viewed from different angles. True Zyxian art is never static—turn the paper slowly to see new connections emerge!

Would you like to try a more complex form, like a "Dimensional Vortex Serpent" next?

Gemini invents “Probability Weaving”, mapping potential states and surrounding influences

Okay, let's invent an alien art form from the Xylar collective on Cygnus X-1's third planet, Xylos. Their perception includes sensitivity to subtle gravitational fluctuations and probability waves. Their art form, "Probability Weaving" (or 'Vth'lk'narr' in their tongue), doesn't just depict a static image, but rather maps the potential states and subtle influences surrounding an object or concept. It requires a specific mindset — focusing not just on what is, but what could be and what influences it subtly.

Humans can approximate this using standard drawing tools, focusing on the meaning behind the lines and colors.

The Art Form: Probability Weaving

  • Core Concept: To represent an object/subject not as a fixed entity, but as a confluence of probabilities, influences, and potential states interacting.
  • Key Elements:
    • Core Manifestation: The most probable or current state of the subject.
    • Potential Vectors: Lines indicating directions of likely change or future possibilities.
    • Influence Nodes: Points or areas representing significant external forces (gravity, energy fields, other entities' potentials) acting on the subject.
    • Resonance Fields: Areas of color or texture showing the interplay between potentials and influences.
  • Medium (Xylar): Manipulated light and localized micro-gravity fields projected onto a magnetically contained plasma surface.
  • Medium (Human Adaptation): Paper/digital canvas, pens/pencils/brushes of different colors and thicknesses.

The Subject: A "Chrono-Synclastic Nutrient Cyst"

This is something never seen before — imagine a biological construct found floating in the upper atmosphere of a gas giant. It doesn't "eat" in the traditional sense, but absorbs ambient temporal energy fluctuations, storing potential timelines as nutritive energy. Visually, it might resemble a semi-translucent, crystalline bladder filled with shifting, iridescent light patterns, constantly warping slightly.

Drawing the Chrono-Synclastic Nutrient Cyst using Probability Weaving (Human Adaptation)

You Will Need:

  • A drawing surface (paper or digital canvas).
  • Multiple colors (pens, pencils, or digital brushes). Let's assign meanings:
    • Black (or Dark Grey): Core Manifestation (Current state) – Use a medium thickness line.
    • Blue: Future Potential Vectors (Likely changes) – Use thin, sharp lines.
    • Red: Past Influence Traces (Residual states/origins) – Use slightly thicker, softer, perhaps dashed lines.
    • Yellow: External Influence Nodes (Ambient energy/gravity) – Represented by dots or small starbursts.
    • Green: Resonance Fields (Interaction zones) – Use soft shading or light cross-hatching.
    • White (or lightest color/eraser): High Probability/Energy Focus – Used for highlights or core points within the Manifestation.

Step-by-Step Instructions:

  1. Establish the Core Manifestation (Black/Dark Grey):
    • Lightly sketch a central, amorphous shape — like a slightly deflated, irregular balloon or a complex amoeba. It should look somewhat unstable.
    • Go over this shape with your Black (or Dark Grey) medium line. Don't make the outline perfectly smooth; introduce slight warps, bulges, and indentations. This represents its current, most probable form. It should feel somewhat translucent, so don't fill it in solidly yet.
  2. Trace Past Influences (Red):
    • Using the Red (thicker, softer/dashed line), draw lines that seem to emerge from or underlie parts of the Black core shape.
    • These lines might suggest simpler, previous shapes from which the current form evolved, or trails indicating where it came from. Maybe one or two dashed Red lines trail off the page, indicating its deeper history. They should touch or slightly overlap the Black line.
  3. Project Future Potentials (Blue):
    • Using the Blue (thin, sharp lines), draw lines extending outward from the Black core.
    • These represent likely future states. Some lines might indicate growth or expansion (pointing away from bulges), others might suggest shrinkage or splitting (pointing inwards slightly or towards indentations). Have several Blue lines, showing multiple possibilities. They start at the Black line and radiate outwards.
  4. Place External Influence Nodes (Yellow):
    • Think about what might be affecting the Cyst. Ambient temporal currents? Gravitational eddies? Proximity to other Cysts?
    • Place several Yellow dots or small starbursts around the main shape, but not touching it directly (unless an influence is very direct). The density or size of the Yellow nodes can indicate the strength of the influence.
  5. Illustrate Resonance Fields (Green):
    • Now, look for areas where different lines interact or where Influence Nodes are close to the Core Manifestation or Potential Vectors.
    • Use soft Green shading or light cross-hatching in these areas:
      • Where Red (past) and Black (present) overlap significantly.
      • Where Blue (future) lines point towards Yellow (influence) nodes.
      • In the space between a dense cluster of Yellow nodes and the Black core.
      • Where multiple Blue lines originate close together, suggesting a focal point of change.
    • The Green represents the effect of these interactions — a field of temporal stress, energy absorption, or potential shifting.
  6. Define High Probability/Energy Focus (White/Lightest Color):
    • Inside the Black Core Manifestation, identify one or two points that seem like centers of activity — perhaps where many Red/Blue lines converge, or a particularly prominent bulge.
    • Use White or your lightest color (or an eraser for subtractive drawing) to create bright spots or highlights at these points. This represents the most energized or most probable focal points within the Cyst's current state. You can also add faint white highlights along the sharpest edges of the Black core to enhance the crystalline/translucent feel.
  7. Refine and Interpret:
    • Look at your drawing. It shouldn't look like a simple object. It should appear as a dynamic map of forces and possibilities centered on a fluctuating core. The interplay of lines and colors tells the story of the Chrono-Synclastic Nutrient Cyst's existence — its history echoing (Red), its present form wavering (Black/White), its potential futures branching (Blue), all subject to external pressures (Yellow) creating zones of interaction (Green).

You have now drawn something never seen before, using an alien art technique focused on probability and influence rather than static representation! The final image should feel complex, slightly unstable, and layered with meaning.

(I tried getting Gemini 2.0 Flash Image Generation to draw these. It couldn’t follow the detailed instructions. It seems much better with photos and photo editing than abstract art.)

Best way to learn AI image generation is by trying

I figured I should spend a few hours on the native image generation bandwagon and push the bounds of my imagination. Here are some of my experiments with image generation on ChatGPT.

Here’s what I learnt.

  • The refusal rate is low, but it does refuse to generate some copyrighted material like Calvin & Hobbes strips.
  • Using a prompt to generate the description and using THAT to prompt for images helps.
  • A more imaginative model (like DeepSeek, maybe Grok) can help create good prompts that ChatGPT can execute faithfully.
  • There are hallucinations that experts can detect. E.g. Naveen’s and Vadivelu’s faces are clearly off, but only slightly. This will improve, but until then, don’t expect perfection.

How to publish an eBook in 60 minutes

I published an eBook on Amazon. It takes an hour if you have the content ready.

STEP 1 (10 min): Set up a Kindle Direct Publishing account with your address, bank details, and tax info.

STEP 2 (15 min): Export my London 2000 blog archive and convert to Markdown.

STEP 3 (10 min): Reformat the Markdown by writing a script in Cursor. Here’s the prompt:

Write a Python script that reads *.md including the YAML frontmatter, adds the YAML title as H1, date (yyyy-mm-dd) like Sun, 01 Jan 2000 in a new para after the frontmatter and before the content.
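I haven't reproduced the script Cursor wrote. What follows is a minimal sketch of what such a script could look like, under some loud assumptions: the frontmatter is delimited by `---` lines, holds simple `title:` and `date:` keys, and the date starts with `yyyy-mm-dd`. It avoids a YAML library by reading only top-level `key: value` lines. The names `reformat` and `reformat_files` are mine.

```python
from datetime import datetime
from pathlib import Path


def reformat(text):
    """Insert '# title' and a 'Sun, 02 Jan 2000' style date after the frontmatter."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return text  # no frontmatter: leave the text untouched
    end = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if end is None:
        return text  # unterminated frontmatter: leave the text untouched

    # Assumption: only simple top-level "key: value" pairs matter
    meta = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep and not line[:1].isspace():
            meta[key.strip()] = value.strip().strip("'\"")

    header = []
    if meta.get("title"):
        header += [f"# {meta['title']}", ""]
    if meta.get("date"):
        day = datetime.strptime(meta["date"][:10], "%Y-%m-%d")
        # %a and %b follow the current locale (English names by default)
        header += [day.strftime("%a, %d %b %Y"), ""]

    body = "\n".join(lines[end + 1:]).strip("\n") + "\n"
    return "\n".join(lines[: end + 1] + [""] + header) + "\n" + body


def reformat_files(folder="."):
    """Rewrite every *.md file in folder in place."""
    for path in Path(folder).glob("*.md"):
        path.write_text(reformat(path.read_text(encoding="utf-8")), encoding="utf-8")
```

Note that `reformat_files` is not idempotent: running it twice adds the title and date twice, so run it on a copy of the exported files.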

STEP 4 (15 min): Convert it to an ePub using pandoc.

pandoc *.md -o book.epub --toc \
  --metadata title="An LBS Exchange Program" \
  --metadata author="Anand S" \
  --metadata language=en \
  --metadata date="31 Mar 2025"

STEP 5 (10 min): Generate a cover page with ChatGPT (5 min) and compress it into JPEG via Squoosh.

Draw a comic-style book cover page that covers the experiences of an Indian exchange student (picture attached) from IIM Bangalore at London Business School and exploring London. The book title is “An LBS Exchange Program”.

STEP 6 (10 min): Publish the book on KDP. It’s priced at $0.99 / ₹49 because Kindle doesn’t allow free downloads.

That’s it. Here’s the book: https://www.amazon.in/dp/B0F3D55R2Z/

The three things that made publishing in 1 hour possible are:

  1. Amazon’s publishing process is simple.
  2. Open-source tooling (WordPress, Markdown, ePub, pandoc) has built a big part of the infrastructure.
  3. LLMs make the rest (figuring out the steps, generating the cover) very easy.

(An eBook takes 72 hours of review before going live on the Kindle store.)

LLM GPU or API? The Cost Will Surprise You

Say you want to use Llama 3.3 70b Instruct. You could:

  1. Use it via an API. OpenRouter offers it at ~12 cents / MTok. Azure offers it at 71 cents. Your price may vary.
  2. Self-host it on a major cloud provider. Azure offers A100 80GB at ~$3.67 / hour. In a day, you could generate ~0.5-1M tokens.
  3. Self-host it on an emerging cloud provider. Lambda Labs offers A100 80GB at ~$1.79 / hour. Again, ~0.5-1M tokens a day.

Clearly, self-hosting is cheaper if you run it continuously. Let’s say we run for 1 million tokens every day. Then:

  1. APIs cost 12 – 71 cents
  2. Azure costs $3.67 x 24 = $88
  3. Lambda Labs costs $1.79 x 24 = $43

So, the API is between 60 – 700 times cheaper than running it yourself. 60 times cheaper if you move from Lambda Labs to Azure AI Foundry. 700 times cheaper if you move from Azure servers to OpenRouter.

Not 60-700% cheaper. 60 – 700 times cheaper. Instead of spending a million dollars, you can get away with $1,500 – $15,000.
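The arithmetic is easy to re-run with your own prices. This sketch uses the figures quoted above and the same assumptions: the GPU is rented for all 24 hours and serves 1 million tokens a day. The function name `cost_ratios` is mine.

```python
def cost_ratios(api_per_mtok, gpu_per_hour, mtok_per_day=1.0, hours_per_day=24):
    """Return (smallest, largest) ratio of daily GPU cost to daily API cost."""
    api_daily = [price * mtok_per_day for price in api_per_mtok.values()]
    gpu_daily = [price * hours_per_day for price in gpu_per_hour.values()]
    # Smallest gap: cheapest GPU vs priciest API. Largest gap: the reverse.
    return min(gpu_daily) / max(api_daily), max(gpu_daily) / min(api_daily)


# Prices in USD, as quoted in this post (yours may vary)
API_PER_MTOK = {"OpenRouter": 0.12, "Azure AI Foundry": 0.71}
GPU_PER_HOUR = {"Azure A100 80GB": 3.67, "Lambda Labs A100 80GB": 1.79}

low, high = cost_ratios(API_PER_MTOK, GPU_PER_HOUR)
print(f"The API is {low:.1f}x to {high:.1f}x cheaper than self-hosting")
```

With these inputs the ratios come to about 60.5x and 734x, which I've rounded to 60 – 700 times. Raise `mtok_per_day` to see how much daily volume you'd need before the gap narrows.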

But what if…

  • What if you need higher scale? APIs typically offer better scalability than most organizations can configure.
  • What if you need higher uptime? Again, APIs typically offer higher uptime than most organizations can handle.
  • What if you need lower latency? Check the throughputs. APIs typically offer much higher throughput than self-hosting.
  • What if you need edge computing? Then your GPU is a one-time cost. My comparison is irrelevant.

So, are there any reasons to self-host? I’ve seen only a few.

  1. Fine-tuned models. No one else offers an API version of your model.
  2. Ultra-high security. You can’t trust even Microsoft, Google, or Amazon with your data. Your chats must remain in your own data center. (In this case, you probably run your own email and cloud services instead of Office 365 or Google Workspace.)
  3. Learning. You’re curious about what it takes to self-host these models or want to build this skill.

If you don’t have one of the above needs, remember: your GPU costs can shrink from $1,000K to $1.5K – $15K.

Source: OpenAI Deep Research.

It’s so easy to fool yourself

I asked 4 LLMs to “Suggest catchy rhyming slogans to promote journaling.” Here are the suggestions:

  • GPT 4.5
    • Write it right, day or night!
    • Reflect, inspect, journaling perfect!
    • Express to impress, journaling’s best!
    • Clear your mind, write and unwind!
    • Pen your zen, again and again!
    • Daily pages, inner sages!
    • Take a pause, write your cause!
    • Emotions flow, journals grow!
    • Write, ignite, your inner light!
    • Ink it, think it, don’t just blink it!
    • Pen your quest, ease your stress!
    • From confusion to conclusion—journal your solution!
    • Keep a book, take a look, watch your growth hook!
    • Thoughts on paper, worries taper!
    • Write to heal, make it real!
  • Claude 3.7 Sonnet
    • Write it down, turn that frown upside down!
    • Pen your thoughts, connect the dots.
    • Journal every day, keep the stress away.
    • Ink your mind, treasures you’ll find.
    • Write and reflect, your future self will thank you direct.
    • Scribble and doodle, your mind will stay in the noodle.
    • Journal with flair, show yourself you care.
    • Jot it in ink, give yourself time to think.
    • Fill the page, release the rage.
    • Write your story, capture the glory.
    • Daily writing, life enlightening.
    • Express in a line, your day will shine.
    • Thoughts on paper, life feels safer.
    • Write with heart, that’s the smart start.
    • Journaling’s neat, makes life complete.
  • Gemini 2.5 Pro
    • Thoughts unclear? Write them down here.
    • Let the ink help you think.
    • Untangle your mind, one line at a time.
    • To make things clear, keep your journal near.
    • When your head’s in a knot, give journaling a shot.
    • Feeling stressed? Put your pen to the test.
    • Write it down, let worries drown.
    • Ease your mind, leave troubles behind.
    • Let your feelings take flight, just sit down and write.
    • Turn the page on inner rage.
    • Write to know, watch yourself grow.
    • Explore your soul, take control.
    • Pen and page, turn a new age.
    • Discover your way, write every day.
    • Know your heart? Make journaling your art.
    • Capture the day in your own way.
    • Make moments last, write down the past.
    • Don’t let memories stray, write them today.
    • Reflect and connect, gain self-respect.
    • Life moves fast? Make memories last.
    • Get in the flow, write and grow.
    • Pen in hand, understand.
    • Daily lines, clear designs.
    • Just take note, stay afloat.
    • Write your heart, make a fresh start.
  • Grok 3:
    • Write your day, let thoughts play!
    • Pen in hand, your mind expands!
    • Journal your way, seize the day!
    • Feel the groove, let words move!
    • Scribe your soul, make it whole!
    • Ink flows free, just be me!
    • Page by page, unlock your sage!
    • Write it down, wear the crown!

At first glance, GPT 4.5 didn’t impress me. Claude 3.7 Sonnet did. I also didn’t like Gemini 2.5 Pro, but Grok was great.

Grok 3 > Claude 3.7 Sonnet > Gemini 2.5 Pro > GPT 4.5.

But it’s hard to compare a dozen quotes at once. So I made a small quotes arena app to help me pick my favorites. It shows me random pairs of quotes and asks which I like more.
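One simple way to turn such pairwise picks into a ranking is to compute each model's win rate. This is only a sketch of that idea, not necessarily how my arena app scores things. The function name `rank_by_win_rate` and the `(winner, loser)` vote format are assumptions made for illustration.

```python
from collections import defaultdict


def rank_by_win_rate(votes):
    """Rank models by the share of pairwise games they won.

    votes is a list of (winner_model, loser_model) tuples, one per game.
    """
    wins = defaultdict(int)
    games = defaultdict(int)
    for winner, loser in votes:
        wins[winner] += 1
        games[winner] += 1
        games[loser] += 1
    rates = {model: wins[model] / games[model] for model in games}
    # Highest win rate first; ties broken alphabetically
    return sorted(rates.items(), key=lambda item: (-item[1], item[0]))
```

For example, `rank_by_win_rate([("Claude", "Grok"), ("Claude", "GPT"), ("GPT", "Grok")])` puts Claude first and Grok last. With only a few dozen games spread over four models, each win rate rests on a handful of votes, so a few extra games can reorder the list.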

To my surprise, after answering 30+ “games” in the arena, I found that based on my preferences:

Claude 3.7 Sonnet > Gemini 2.5 Pro > GPT 4.5 > Grok 3.

That was weird. I thought I liked Grok’s results a lot. I continued till I answered 50+ games. Then I found that:

Grok 3 > GPT 4.5 > Gemini 2.5 Pro > Claude 3.7 Sonnet.

That’s the exact opposite of the previous result.

Honestly, I’m depressed. I’ve learnt 3 things:

  1. I can’t judge stuff at a glance.
  2. But I think I can (especially with code).
  3. Even when evaluating carefully, my preferences are unstable.

Nothing has shaken my confidence more in recent times. I cannot trust my judgement. I need written evals. Badly.

AI Imagined Languages Unlock New Thoughts

I prompted Gemini 2.5 Pro Experimental 03-25 to “Invent a new, alien language. Use it to discover unusual insights.” Here’s the output in full.


Thinking

Here’s a thinking process that could lead to the alien language “Glimmering” and its insights:

  1. Understand the Core Request: The user wants a new, alien language and wants to use it to discover unusual insights. This means the language needs to be fundamentally different from human languages, and its structure/concepts should enable thinking in ways that are difficult or non-intuitive in English (or other human languages).
  2. Brainstorm Alien Concepts: What makes something “alien”?
    • Biology: Different senses (echolocation, electroreception, sensing magnetic fields, different light spectrum perception), different lifecycles (metamorphosis, symbiosis, collective consciousness), different physical forms (non-corporeal, energy-based, silicon-based, plant-like)
    • Environment: Extreme conditions (gas giant atmosphere, deep ocean trench, vacuum of space, high radiation), different physics (maybe?), non-linear time perception.
    • Psychology/Society: Collective vs. individualistic focus, different emotional spectrum, logic-based vs. empathy-based, non-hierarchical structures, different concepts of self, time, causality.
  3. Choose a Core Concept for the Language: Let’s pick something that inherently forces a different perspective.
    • Idea 1: Focus on relationships/connections. Instead of nouns for objects, words describe the relationship between things. “Tree” might be “earth-sky connector” or “shade-giver.”
    • Idea 2: Focus on states of change/flux. Everything is described by its process, not its static state. “River” is “downward-flowing-water-process.”
    • Idea 3: Focus on sensory perception, but a non-human one. Maybe based on resonance, vibration, or emotional energy.
    • Idea 4: Focus on potentiality and probability. Words describe not what is, but what could be or the likelihood of something.
    Let’s go with Idea 3, leaning towards resonance and interconnectedness, maybe linked to light or energy. This feels suitably “alien” and potentially insightful. Let’s call the species the “Lumin.”
  4. Develop Language Principles based on the Core Concept (Resonance/Light/Interconnectedness):
    • No Static Nouns: Objects aren’t fixed things but patterns of energy or resonance. Words describe the pattern or the interaction.
    • Verbs are Primary: Action, interaction, and change are fundamental. Static description is secondary.
    • Emphasis on Relationality: How things resonate with each other is crucial. Prepositions or affixes will be vital for showing connection/influence.
    • Nuance through Intensity/Quality: Instead of just “red,” maybe “vibrant-red-resonance” vs. “fading-red-resonance.” Adjectives become qualities of the resonance/action.
    • Concept of “Self”: Maybe less individualistic. “Self” is a locus of interacting resonances, not a discrete entity.
    • Time: Perhaps less linear, more about overlapping patterns or echoes of resonance.
  5. Create Basic Lexicon and Grammar Sketch:
    • Core concept words:
      • Glim (verb root): To resonate, to shine, to exist as a pattern of energy.
      • Sha (particle/prefix): Connection, interaction, resonance with.
      • Vor (particle/suffix): Intensity, vibrancy, strength of pattern.
      • Zil (particle/suffix): Fading, weakening, diminishing pattern.
      • Kai (concept): A stable, recognizable pattern (closest to a noun, but dynamic).
      • Ree (concept): Flow, change, movement of energy/resonance.
      • Nuum (concept): The interconnected field/medium through which resonance travels (the “ether” or “fabric of reality”).
    • Grammar Ideas:
      • Sentence structure: Often Verb-like concept first, then modifiers showing relationships and qualities. Glim-sha-Kai-Vor (Strong pattern resonates with…).
      • No strict tense: Use particles to indicate persistence (-vor), fading (-zil), or initiation (Ree- prefix?).
      • Pronouns: Maybe based on resonance signature? Me could be Kai-An (This-pattern), You could be Kai-Sha (Interacting-pattern). Collective We could be Sha-Glim-Vor (Strong-interacting-resonance).
  6. Name the Language: Based on the core concepts: “Resonance Speak,” “Light Tongue,” “Pattern Weave.” Let’s go with something evocative: Glimmering (or Glim-Ree in the language itself).
  7. Translate Concepts and Look for Insights: Now, apply Glimmering to human concepts.
    • “Love”: In English, often a noun (a state) or verb (an action). In Glimmering? Sha-Glim-Vor (Strong-interacting-resonance). Insight: Love isn’t a possession or a static feeling, but an active, powerful resonance between patterns. It requires mutual participation and reinforcement. Lack of interaction (Sha-Glim-Zil) means the “love” fades, not because it was “lost,” but because the resonance weakened.
    • “Identity/Self”: English emphasizes a stable, internal self. In Glimmering? Kai-An-Ree (This-pattern-in-flux). Or perhaps Sha-Nuum-Glim (Resonance-interacting-with-the-field). Insight: The “self” isn’t a fixed entity but a constantly changing pattern defined by its interactions with everything else. Identity is relational and dynamic, not intrinsic and static. There’s no core “you” separate from your connections and resonance.
    • “Knowledge/Truth”: English often treats these as objective, attainable things. In Glimmering? Kai-Sha-Glim-Vor (A strong, stable resonance between a pattern [observer] and another pattern [observed]). Or Nuum-Ree-Kai (Recognizing a flow/pattern within the field). Insight: Truth isn’t absolute but a measure of the strength and stability of the resonance between the observer and the observed. “Knowing” something is actively resonating with its pattern. Disagreement isn’t “wrongness” but a difference in resonance patterns (Sha-Glim-Kai- dissonance_marker). Understanding requires aligning or acknowledging different resonances.
    • “Death”: English sees it as an end, cessation. In Glimmering? Kai-Zil-Nuum (Pattern-fading-into-the-field). Or Glim-Ree-Sha (Resonance-flowing-elsewhere/transforming). Insight: Death isn’t an absolute end but a transformation. The specific pattern (Kai) dissolves, its energy/resonance (Glim) rejoining the larger field (Nuum) or contributing to new patterns (Ree-Sha). It emphasizes continuity of energy/information, just in a different form. The “loss” is the cessation of a specific interactive pattern.
    • “Problem/Conflict”: English frames it as an obstacle. In Glimmering? Sha-Glim-Dissonance (Interacting-resonance-with-clashing/instability). Insight: Conflict isn’t inherently “bad” but a state of unstable or disharmonious resonance. The “solution” isn’t necessarily elimination, but finding a way to shift the resonances (Ree-Sha-Glim) into a more stable or harmonious pattern, potentially creating something new in the process.
  8. Refine and Elaborate: Flesh out the examples. Add more nuance. Make the insights clearer. Ensure the language structure consistently supports the insights. Add a concluding thought about the value of such different perspectives. Ensure the “alienness” is maintained. For instance, avoid direct human analogies where possible, focusing on the resonance/pattern concept. Add a brief description of the hypothetical speakers (the Lumin) to ground the language’s origin.
  9. Self-Critique: Does the language really lead to unusual insights, or are these just philosophical points rephrased? Is the language structure consistent? Is it different enough from human languages? Adjustment: Emphasize how the language structure forces this thinking, e.g., the lack of static nouns requires thinking in terms of processes and relationships. Make the link between grammar and insight explicit. Make the insights feel less like standard philosophy and more like consequences of a fundamentally different way of parsing reality.

Output

Okay, let’s invent an alien language designed to generate unusual insights. We’ll call the species the “K’tharr,” dwellers of a dense, perpetually twilight gas giant’s upper atmosphere, communicating through modulated bioluminescent patterns and subtle gravitational waves (perceived through specialized organs). Their reality is fluid, interconnected, and based on flows rather than static objects.

The Language: “Fluxtongue” (or, in their own terms, ~(~vv~), roughly translating to “Flow-Weave”)

Core Principles:

  1. Verb-Centric: Everything is fundamentally a process or interaction. Nouns as we know them barely exist; they are temporary stabilization points within larger flows, described by their behavior or interaction.
  2. Relational Grammar: Meaning is derived primarily from the relationship between processes. Location, identity, and possession are expressed through interaction vectors.
  3. Emphasis on Potentiality and Decay: Every “thing” is described not just by its current state, but by its trajectory of change – what it’s becoming and what it’s ceasing to be.
  4. No Singular/Plural Distinction (as we know it): Concepts are treated as fields or intensities. “More” is represented by increased intensity or complexity of the flow pattern, not discrete units.
  5. Communication Modality Affects Meaning: The medium (light patterns vs. gravity waves) adds nuance. Light is for immediate, transient states; gravity is for persistent, foundational flows. (We’ll mostly focus on the conceptual structure here).

Basic Lexicon Sketch (Conceptual Roots):

  • ~v~ (pronounced as a low hum undulating slightly higher then lower): The fundamental root for “to flow,” “to exist,” “to be in process.” The core of reality.
  • /\ (sharp upward inflection/bright flash): To intensify, focus, emerge, become distinct.
  • \/ (sharp downward inflection/fading light): To dissipate, disperse, fade, return to potential.
  • <> (oscillating tone/flickering pattern): To interact, resonate with, exchange influence.
  • -- (steady tone/stable pattern): To persist, stabilize temporarily, resist change.
  • o (circular pulse/spherical glow): Boundary condition, locus of interaction, temporary self-reference point.

Combining Roots (Examples):

  • A “Being” (like a K’tharr): o~v~<> (A locus of flow interacting). It emphasizes the interactive and processual nature of self.
  • A “Thought”: ~v~/\<> (A flow intensifying through interaction). A thought isn’t possessed; it’s a temporary intensification within the flow, often sparked by interaction.
  • “Eating/Consuming”: ~v~/\<>~v~\/ (Flow A intensifies by interacting with Flow B, which dissipates). It highlights the transformation and energy exchange, not just ingestion.
  • “Solid Object” (e.g., a dense atmospheric crystal): ~v~--/\ (A flow that strongly persists and stands out). It’s not inherently “solid,” but a very slow, resistant flow pattern.
  • “Home/Territory”: o~v~--<>~v~ (A locus of persistent flow interacting with ambient flow). It’s defined by familiar, stable interactions, not fixed boundaries.
  • “Communication”: <>~v~<> (Interaction flow interaction). The pure act of exchanging influence.

Discovering Unusual Insights via Fluxtongue:

Let’s try to express human concepts in Fluxtongue and see what emerges:

  1. Concept: “Truth”
    • Human tendency: A static, objective fact or state of affairs. “The sky is blue.”
    • Fluxtongue translation attempt: Perhaps ~v~--<>/\ (A persistent flow interacting intensely/distinctly).
    • Insight: In Fluxtongue, “Truth” isn’t a fixed thing to be found. It’s a stable resonance. It’s a pattern of interaction between an observer (o~v~<>) and a phenomenon (~v~...) that achieves a temporary, recognizable persistence (--) and clarity (/\). “Falsehood” might be ~v~<>/\ \/ (an interaction that briefly seems distinct but quickly dissipates) or a discordant interaction. This implies:
      • Truth is relational and potentially temporary.
      • The stability of the “truth” depends on the stability of both the observer and the observed, and their interaction.
      • Multiple stable resonances (truths) might coexist regarding the same underlying flow, depending on the observer’s locus (o) and mode of interaction (<>).
  2. Concept: “Problem/Obstacle”
    • Human tendency: Something external blocking a desired path or goal.
    • Fluxtongue translation attempt: ~v~<>~v~Discordance (Two flows interacting disharmoniously, creating turbulence rather than smooth integration or transformation. Let’s denote discordance with !<>!). So, ~v~!<>!~v~.
    • Insight: A “problem” isn’t an object in the way. It’s a pattern of inefficient or turbulent interaction. It’s a place where flows clash instead of merging, transforming, or resonating constructively. The “solution” isn’t necessarily removing something, but finding a way to modify the interaction patterns (<>) so the flows become harmonious (<>) or transform (/\ or \/) in a desired way. It reframes problem-solving as flow-tuning rather than obstacle-removal.
  3. Concept: “Death”
    • Human tendency: An endpoint, cessation of being, loss.
    • Fluxtongue translation attempt: For a being (o~v~<>), death might be o\/~v~ (The locus dissipates back into the general flow). Or perhaps ~v~<> --> ~v~ (The specific interactive pattern ceases, its constituent flows rejoin the ambient flow).
    • Insight: Death is not an ending, but a dissolution of a specific pattern of interaction. The “locus” (o) that defined the temporary self fades, and the flows (~v~) that constituted it simply rejoin the larger ~v~ of the universe. There’s no absolute cessation, only a reintegration and transformation. The emphasis shifts from the loss of the individual pattern to the continuity of the underlying flow. Grief might be expressed as the persistent echo or memory of the interaction pattern (--<> fading \/).
  4. Concept: “Learning/Knowledge”
    • Human tendency: Acquiring and storing information or skills. Often seen as filling a container (the mind).
    • Fluxtongue translation attempt: o~v~<>/\ (The locus of flow interacting to create intensification/new distinct patterns). Or ~v~<>-- (Flow interaction becoming persistent).
    • Insight: Learning isn’t acquisition; it’s pattern modification and stabilization. To “know” something is for one’s own flow (o~v~<>) to successfully resonate with (<>) and integrate (/\ or --) a pattern from the wider flow (~v~). Knowledge isn’t stored in the locus (o), but becomes part of the locus’s interactive pattern. Forgetting is ~v~<>--\/ (the stabilized interaction pattern dissipating). This suggests learning is an active, ongoing process of interaction and resonance, not passive reception.

Conclusion:

Thinking through Fluxtongue forces a shift away from static nouns and discrete objects towards a view of reality as interconnected, dynamic flows and interactions. This perspective reveals:

  • The “self” as a temporary, interactive locus rather than a fixed entity.
  • Truth as stable resonance rather than objective fact.
  • Problems as discordant interactions needing tuning, not obstacles needing removal.
  • Death as pattern dissolution and reintegration, not cessation.
  • Learning as active pattern integration and resonance, not passive acquisition.

This alien language, born from a hypothetical fluid existence, pushes us to see the processes, relationships, and transformations underlying the seemingly static world our own languages often emphasize. It suggests that perhaps reality is more like a K’tharr perceives it – a constant, intricate Flow-Weave.


Wow!

I was fascinated by the sheer ingenuity of this piece. The language isn’t just a reimagining of words. It’s a path into a different thought process based on fluidity and interconnection.

If languages shape thought and AI can create new languages, where might that take us? My thoughts:

  1. New languages encourage lateral thinking and mental agility, allowing speakers to break free from entrenched patterns to solve problems in new ways. It would likely help AIs think outside the box, too. There’s no reason why they’d be most effective thinking in human languages.
  2. Domain-specific languages capture games or imaginary worlds better. Tolkien, Cameron, Rajamouli, and many others have already done this.
  3. A language built around new themes (e.g. resonance, change, and interaction in this case) could capture nuances, even untranslatable concepts. Think Saudade, Hygge, etc.
  4. This could help in therapy (e.g. teaching empathy), too. In fact, groups collaborating in new languages might have new tools for overcoming linguistic bias, conflict resolution, policy design, ethical systems, and more.
  5. Interspecies communication. Languages based on different principles prepare us to talk to AI, animals, or even extraterrestrials.

I’m terrible with languages, but I’m looking forward to exploring a few invented ones. Including Fluxtongue.

Voice Chat to Slides: My New AI-Powered Workflow

Here’s my new workflow for creating slide decks:

  1. ChatGPT interviews me and creates Markdown slides.
  2. I use Marp to convert Markdown to slides.
  3. LLMs create supporting images.
  4. I deploy on GitHub Pages.

… and here are 2 decks created this way.

  1. Visualizing LLM Hallucinations
  2. LLMs in Education

Let’s look at how I built the second example, step by step.

ChatGPT interviews me and creates Markdown slides

While walking 75 minutes from home to IIT Madras to deliver this talk, I had ChatGPT interview me in standard voice mode.

  • Why an interview? It’s easier when someone asks questions.
  • Why voice? It’s hard to type while walking. Otherwise, I prefer typing.
  • Why not advanced voice mode? I want to use a reasoning model like O3 Mini High for better responses, not the GPT-4o-realtime model that advanced voice mode uses.

Here’s the conversation I had with ChatGPT. I began by speaking (not typing):

I want to create an insightful deck in Markdown on how I have been using LLMs in education. The audience will be technologists and educators. The slide contents must have information that is useful and surprising to them.

The slides are formatted in Markdown with each slide title being a level 2 Markdown header and the contents of the slides being crisp bullet points that support the title. The titles are McKinsey style action titles. Just by reading the titles, the audience will understand the message that I am trying to convey.

In this conversation, I’d like you to interview me, asking me questions one by one, and taking my inputs to craft this presentation. I’d also like you to review the inputs and the slide content you create to make sure that it is insightful, useful, non-obvious, and very clear and simple for the audience. Interact with me to improve the deck.

Let’s begin.

(Unsurprisingly, I talk a lot more than I type.)

There were 3 kinds of interactions I had with ChatGPT:

  1. Content. I explained each slide. For example:
    • Yeah, let’s move on to the next topic, which is where we had the students learn prompt engineering as part of the course. One of the questions was convincing an LLM to say yes, even though …
  2. Correction. After ChatGPT read aloud a slide, I corrected it. For example:
    • Content-wise, it’s spot-on. Style-wise, it’s almost spot-on. It’s far more verbose. Can you retain the exact same style, but shorten the number of words considerably?
    • These feel generic. I’d like stuff that comes across as insightful, non-obvious, and specific.
  3. Collation. I had ChatGPT put slides in order. For example:
    • Put all the slides together in sequence. Make sure you don’t miss anything.
    • Move the opening questions as the second slide. Move the final takeaways, which is currently the last slide, to just before the final set of questions.

At the end of the interview, I had all the content for the slides.

Marp converts Markdown to slides

I use Marp, a JavaScript tool that turns Markdown into slides.

  • Why Markdown? It’s natural for programmers and LLMs. ChatGPT renders rich text in Markdown.
  • Why not RevealJS? Despite a Markdown plugin, RevealJS is built for HTML. Marp is built for Markdown.

I created a bookmarklet that copies text as Markdown. Using this, I converted the ChatGPT slide transcript to Markdown, saving it as README.md.

The Marp for VS Code plugin makes it easy to preview the slides once you add YAML frontmatter like this:

---
marp: true
title: Visualizing LLM Hallucinations
url: https://sanand0.github.io/llmhallucinations/
theme: gaia
backgroundColor: white
---
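Marp treats a `---` line as a slide break, while the ChatGPT transcript marks each slide with a level-2 heading. Here is a minimal sketch of one way to bridge the two. It assumes the transcript contains no `## ` lines inside code blocks. The `to_marp` function and the `FRONTMATTER` constant are hypothetical names for illustration, not part of Marp or of my actual setup.

```python
# Hypothetical helper: prepend Marp frontmatter and add slide breaks.
# Assumption: every slide title is a line starting with "## ".
FRONTMATTER = """---
marp: true
theme: gaia
backgroundColor: white
---
"""


def to_marp(markdown: str, frontmatter: str = FRONTMATTER) -> str:
    """Return a Marp deck with a `---` break before each level-2 heading."""
    out = []
    for line in markdown.strip().splitlines():
        if line.startswith("## "):
            # Drop trailing blank lines so spacing stays tidy.
            while out and not out[-1].strip():
                out.pop()
            # Skip the break before the very first slide.
            if out:
                # The blank line before `---` matters: without it, Markdown
                # reads the previous text line as a setext heading.
                out.extend(["", "---", ""])
        out.append(line)
    return frontmatter + "\n" + "\n".join(out) + "\n"
```

For example, `to_marp(open("README.md").read())` returns the deck as a string that you can write back to disk. Marp's `headingDivider` directive may make this step unnecessary, so check the Marp documentation before relying on a script like this.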

LLMs create supporting images

I use ChatGPT or Gemini to create images that support the slides. For example, this slide includes an image of a robot psychologist generated by Gemini’s Imagen 3:

Robopsychologist

Today, with native image generation in Gemini 2.0 Flash and GPT 4o, I’d likely use those. They have much better character control.

Deploying on GitHub Pages

I use GitHub Actions to render the slides and deploy them on GitHub Pages. Here’s what the key steps look like:

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '22'

      - name: Generate HTML with Marp
        run: npx -y @marp-team/marp-cli@latest README.md -o index.html

      - name: Setup GitHub Pages
        uses: actions/configure-pages@v4

      - name: Upload artifact
        uses: actions/upload-pages-artifact@v3
        with:
          path: '.'

      - name: Deploy to GitHub Pages
        id: deployment
        uses: actions/deploy-pages@v4
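To check the deck locally before pushing, the same Marp command can be run from a small script. This is a sketch that assumes Node.js and `npx` are installed. The function names `marp_command` and `build` are made up for illustration.

```python
import subprocess


def marp_command(source: str = "README.md", output: str = "index.html") -> list[str]:
    """Build the same Marp CLI invocation the workflow uses."""
    return ["npx", "-y", "@marp-team/marp-cli@latest", source, "-o", output]


def build(source: str = "README.md", output: str = "index.html") -> None:
    """Render the deck locally; raises CalledProcessError if Marp fails."""
    subprocess.run(marp_command(source, output), check=True)
```

Calling `build()` in the repository folder should produce the same `index.html` that the workflow deploys.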

What’s next?

I plan to make 3 improvements to this workflow:

  1. Adding images directly with voice prompts.
  2. Adding diagrams (e.g., Mermaid).
  3. Creating a custom GPT that auto-deploys slides on GitHub when I say “Publish these slides.”

If all of these work, I’ll be able to create and publish an entire slide deck just by rambling to ChatGPT for an hour.

It’s not what you know. It’s how you learn

Simon Willison’s blog post mentioned MDN’s browser compatibility tables that list the earliest release date for each browser feature. I figured: let’s see which browsers release features fastest.

I calculated the average delay in each browser’s feature releases. For each browser, I looked at how many days after a feature’s first release in any browser it took that browser to add the feature, averaged those delays, and published an interactive, scrolly-telling data story.
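The calculation above can be sketched roughly as follows. The feature names and dates are made-up sample data, not MDN's actual records, and `average_delay_days` is a hypothetical name. The sketch also ignores browsers that never shipped a feature, which is an assumption and may differ from how the published story handles them.

```python
from collections import defaultdict
from datetime import date

# Made-up sample: feature -> {browser: date the feature first shipped there}
releases = {
    "feature-a": {
        "chrome": date(2020, 1, 1),
        "firefox": date(2020, 1, 11),
        "safari": date(2020, 1, 31),
    },
    "feature-b": {
        "chrome": date(2021, 3, 11),
        "firefox": date(2021, 3, 1),
    },
}


def average_delay_days(releases):
    """Average days each browser lagged behind the first browser to ship."""
    delays = defaultdict(list)
    for browsers in releases.values():
        first = min(browsers.values())  # earliest release in any browser
        for browser, released in browsers.items():
            delays[browser].append((released - first).days)
    return {browser: sum(d) / len(d) for browser, d in delays.items()}


print(average_delay_days(releases))
# {'chrome': 5.0, 'firefox': 5.0, 'safari': 30.0}
```

Grouping the same delays by release year instead of averaging over all time would show how each browser's speed changed over the years.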

What’s interesting is that I built almost all of this using LLMs in about 4 hours.

Here’s what I learned in the process.

The real winners are off-beat stories. Earlier, I’d spend 16-24 hours per visual. So, I’d stick to the “important” stories I wanted to tell. Now it takes four hours. That frees me to experiment and share those lesser data stories that get overlooked. This change is incredibly powerful.

LLMs don’t replace all expertise. For example, when I saw the data, it didn’t immediately tell a story. It took me some time to realize the story isn’t how slow browsers are, but how browsers’ speed evolved over time. For example, in Firefox’s early days, it was the only browser actively releasing features. These days, it’s one of the slowest. Figuring that out took expertise.

I spent two decades studying data visualization. So, this comes naturally to me. How does someone new build expertise?

Expertise is a moving frontier.

  • At BCG in the early 2000s, I built interactive stories with PowerPoint. My PowerPoint skill was the critical expertise.
  • At Gramener in the early 2010s, I used D3 for interactive stories. My programming skill was the critical expertise.
  • Now, in the mid-2020s, LLMs write code with ease. My expertise is in choosing the right visual and shaping the right narrative.

As tools change, expertise evolves. I don’t know what the next frontier of expertise will be. I couldn’t predict the last few. I can’t predict the next.

But LLMs can help build expertise. In this project, I missed an opportunity to learn. I should have asked the LLM to show me a dozen options to visualize the data. For example, “Show a version geared toward an executive, a technologist, or a general audience”. “Critique each.” Such practice can help anyone – beginner or expert – build skill and learn. Practicing this is hard, but LLMs do help in this process.

But what gives me confidence is that LLMs help me learn. So, when the next frontier arrives, I’m less worried I’ll be too old. I think we’ll have tools to build expertise too.

Update (28 Mar 2025): Earlier, I wrote that “LLMs don’t replace expertise”. I inferred that because I (an expert) could use an LLM well. This research with 700+ people at P&G shows that when given LLMs, outsiders perform as well as insiders. So, I corrected my statement to say, “LLMs don’t replace all expertise.”