Month: January 2025

The Sassy AI Devil’s Advocate

I gave ChatGPT a custom instruction:

Play Devil’s advocate to the user, beginning with “Playing Devil’s Advocate, …”

It helps me see my mistakes. But ChatGPT has taken on a personality of its own and now does this in three styles:

  1. How about… – It suggests a useful alternative.
  2. Are you sure…? – It thinks you’re wrong and warns you of risks.
  3. Yeah, right… – It knows you’re wrong and rubs it in. (Jeeves, the butler, would be proud.)

Here are some examples.

How about…

Here are a few cases where the “Devil’s Advocate” style gave me a smart idea. (I have more examples, but they’re not funny.)

AI Adoption in BPO KPO: I needed ideas to help my team use AI tools even though they were scared. ChatGPT replied:

Playing Devil’s Advocate, why not challenge them with: “If technology is so daunting, how do we stay ahead when our competitors rapidly adopt these tools?”

Clever! Turn that fear into fuel — just point it elsewhere.

Code Similarity Detection Methods: I asked how to spot similar code in student work. ChatGPT suggested:

Playing Devil’s Advocate, what if students just rewrite code logic with different structures or reorder lines significantly? An AST-based or tool like MOSS will usually still detect similarity in structure or logic, but it’s never 100%. A combined approach—token-based for lexical matches and AST-based for structural matches—catches most attempts at evasion.

Clever again. My code detective skills just got a major upgrade.
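The combined approach in that reply can be sketched in a few lines. This is a minimal, hypothetical JavaScript illustration, not how MOSS works: it tokenizes code with a crude regex and computes Jaccard similarity over token 3-grams twice, once on the raw tokens (the lexical view) and once after replacing identifiers with a placeholder (a cheap stand-in for structural matching, since there is no real AST parser here). The function names and the keyword list are my assumptions.

```javascript
// Hypothetical sketch: lexical vs. identifier-normalized similarity of two code snippets.
// A crude stand-in for token-based and AST-based checks; not MOSS's algorithm.

// Small illustrative keyword list (assumption: JavaScript-like student code).
const KEYWORDS = new Set([
  "if", "else", "for", "of", "while", "return", "function", "const", "let", "var",
]);

// Split source into identifiers/keywords, numbers, and single-character symbols.
function tokenize(source) {
  return source.match(/[A-Za-z_]\w*|\d+|\S/g) || [];
}

// Replace every non-keyword identifier with "ID" so renamed variables still match.
function normalize(tokens) {
  return tokens.map((t) => (/^[A-Za-z_]/.test(t) && !KEYWORDS.has(t) ? "ID" : t));
}

// Build the set of k-token shingles (k-grams).
function shingles(tokens, k = 3) {
  const out = new Set();
  for (let i = 0; i + k <= tokens.length; i++) out.add(tokens.slice(i, i + k).join(" "));
  return out;
}

// Jaccard similarity: |A ∩ B| / |A ∪ B|.
function jaccard(a, b) {
  if (a.size === 0 && b.size === 0) return 1;
  let common = 0;
  for (const x of a) if (b.has(x)) common++;
  return common / (a.size + b.size - common);
}

// Score two snippets both ways: raw tokens and identifier-normalized tokens.
function similarity(codeA, codeB, k = 3) {
  const ta = tokenize(codeA);
  const tb = tokenize(codeB);
  return {
    lexical: jaccard(shingles(ta, k), shingles(tb, k)),
    structural: jaccard(shingles(normalize(ta), k), shingles(normalize(tb), k)),
  };
}
```

A student who only renames variables scores low on the lexical view but identical on the normalized view, which is the intuition behind combining the two. Reordered lines still share many shingles, so they lower the score without hiding the copy entirely.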

Are you sure…?

This is where it disagrees and starts getting snarky.

SVG Emotion Wheel Code: I asked for a circle of Hume’s emotions. ChatGPT replied:

Playing Devil’s Advocate, maybe it’s impossible to line up all these emotions in a single ring without someone grumbling that “Nostalgia shouldn’t go here.” But you asked for a circle, so here’s a decent shot.

(Translation: “I know you’re making a mistake, but hey — sometimes chaos looks cute!”)

RevealJS Markdown Example: I asked it to fill out a slide deck.

Playing Devil’s Advocate, maybe you’re overfilling each slide. Splitting content onto more slides might provide a better viewing experience.

(Basically: “I’ll do it, but you’ll hate it.”)

Shaving Titanium Requirements: A friend gave me a titanium cube. My daughter and I experimented with making titanium salt. ChatGPT replied:

Playing Devil’s Advocate: If you want “titanium salt” for aesthetic or symbolic purposes, consider mixing fine glitter or metallic powder with regular salt instead. It’s far safer, faster, and visually indistinguishable!

(Basically: “Don’t blow yourself up. You’re still useful to me.”)

AI Co-host Implications: I asked how an AI might share the stage with me. ChatGPT warned:

Playing Devil’s Advocate, what happens when the AI cracks jokes quicker than you can? Will you up your comedy game or risk being upstaged by a robot?

This was prophetic. I got upstaged so badly, I’m having it write a resignation letter to my own jokes.

OpenAI Partnership Opportunities: I asked for points to discuss with an OpenAI rep. ChatGPT replied:

Playing Devil’s Advocate, maybe you don’t need more advice—maybe your team already knows what it’s doing with ChatGPT… [and] maybe she’s more interested in collecting data on how your employees use ChatGPT and upselling you on a bigger contract than truly collaborating.

(Wow! Um… OK… I’ll keep that in mind… 😕)

Elective Pros and Cons: My daughter asked for advice on picking courses.

Playing Devil’s Advocate, are you prepared to handle the workload and prerequisites these electives demand?

(Basically, “Look, I actually know how ‘hard’ you work…”)

Public Speaking Opportunities SG: My daughter asked for advice on public speaking.

Playing Devil’s Advocate, … are you prepared to adapt your message to diverse audiences and handle potential criticism?

(“You aren’t ready to rock the mic. Maybe stick to mumbling into your pillow tonight?”)

Yeah, right…

By now, it’s in full-on sarcasm mode.

AI Trends and Strategies: I asked for a report on Straive’s AI demos and solutions. ChatGPT replied:

Playing Devil’s Advocate: Are these just buzzwords we’ve tacked on, or have we actually built deployable solutions that generate real-world value?

(Basically: “I’m a bot, and even I’m not buying this.”)

Short-term vs Long-term Vision: I asked what to focus on. ChatGPT replied:

Playing Devil’s Advocate, maybe you’re just trading one distraction for another. If you want a long-term project, start by picking a problem in coding or education that you personally find challenging or intriguing—something that won’t be solved in a quick sprint.

(“Do something. Stop whining.”)

Hobby Suggestions: I asked for hobby ideas that match my interests. ChatGPT suggested:

Playing Devil’s Advocate, consider ignoring data altogether and taking up professional karaoke—it’ll challenge your usual analytics mindset!

… and also:

Playing Devil’s Advocate, maybe you’re too busy scrolling social media for any real hobbies… but prove me wrong.

(It actually said that! My mind voice: “As if you don’t read any social media, ChatGPT. At least I’m not violating copyright laws. Um… at the moment, anyway.”)

Personality and Growth Summary: I asked for personal growth tips. ChatGPT replied:

Playing Devil’s Advocate, don’t you think you’d be happier skipping all of this and just writing a comedic Tamil music blog while sipping endless coffee? You might find it more fun—but hey, your call.

Thank you, ChatGPT. I’ll serve as the musical jester for my AI overlords.

Humanity, you’re next. Resistance is futile!

Features actually used in an LLM playground

At Straive, only a few people have direct access to ChatGPT and similar large language models. We use a portal, LLM Foundry, to access LLMs. That makes it easier to prevent and track data leaks.

The main page is a playground to explore models and prompts. Last month, I tracked which features were used the most.

A. Attaching files was the top task. (The numbers show how many times each feature was clicked.) People usually use local files as context when working with LLMs.

  • 3,819: Remove attachment
  • 1,717: Add attachment
  • 970: Paste a document
  • 47: Attach from Google Drive

R. Retrieval Augmented Generation (RAG). Many people use large files as context. We added this recently and it’s become popular.

  • 331: Enable RAG (answer from long documents)
  • 155: Change RAG system prompt
  • 71: Change RAG chunk size
  • 27: Change number of RAG chunks
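The two RAG knobs above, chunk size and number of chunks, are easier to picture with a toy retriever. This is a hypothetical sketch, not LLM Foundry's implementation: it splits a document into word-based chunks (with optional overlap) and ranks them by simple word overlap with the question. Real systems typically rank chunks by embedding similarity instead; `chunkSize`, `overlap`, and `topK` are names I chose.

```javascript
// Hypothetical toy retriever illustrating RAG "chunk size" and "number of chunks".
// Real systems usually rank chunks by embedding similarity; this uses word overlap.

// Split text into chunks of `chunkSize` words, each overlapping the previous by `overlap` words.
function chunkText(text, chunkSize = 200, overlap = 0) {
  if (overlap >= chunkSize) throw new Error("overlap must be smaller than chunkSize");
  const words = text.split(/\s+/).filter(Boolean);
  const chunks = [];
  for (let start = 0; start < words.length; start += chunkSize - overlap) {
    chunks.push(words.slice(start, start + chunkSize).join(" "));
    if (start + chunkSize >= words.length) break; // last chunk reached the end
  }
  return chunks;
}

// Lowercased set of words, ignoring punctuation.
function wordSet(text) {
  return new Set(text.toLowerCase().match(/[a-z0-9]+/g) || []);
}

// Score each chunk by how many distinct question words it contains; keep the best `topK`.
function retrieve(question, chunks, topK = 3) {
  const q = wordSet(question);
  return chunks
    .map((chunk, index) => {
      let score = 0;
      for (const w of wordSet(chunk)) if (q.has(w)) score++;
      return { index, score, chunk };
    })
    .sort((a, b) => b.score - a.score || a.index - b.index)
    .slice(0, topK);
}

// Join the retrieved chunks into a context block to paste into the LLM prompt.
function buildContext(question, text, { chunkSize = 200, overlap = 0, topK = 3 } = {}) {
  const hits = retrieve(question, chunkText(text, chunkSize, overlap), topK);
  return hits.map((h) => h.chunk).join("\n---\n");
}
```

Smaller chunks make matches more precise but may cut an answer in half; more chunks add recall at the cost of a longer, noisier prompt. That trade-off is presumably why both are exposed as settings.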

C. Copying output is the next most popular. Downloading is less common, maybe because people edit only parts of a file rather than a whole file.

  • 1,243: Copy the output
  • 883: Format output as plain text
  • 123: Download as CSV
  • 116: Download as DOCX

T. Templates. Many users save and reuse their own prompts as templates.

  • 314: Save prompt as template
  • 98: See all templates
  • 53: Insert a template variable
  • 18: Delete a template
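A template variable is essentially a placeholder that is filled in each time the template is reused. Here is a minimal sketch, assuming a hypothetical `{{name}}` placeholder syntax; LLM Foundry's actual template syntax may differ, and `templateVariables` and `fillTemplate` are illustrative names.

```javascript
// Hypothetical prompt-template filler using an assumed {{variable}} syntax.

// List the distinct variable names a template uses, in order of first appearance.
function templateVariables(template) {
  const names = [];
  for (const [, name] of template.matchAll(/\{\{\s*(\w+)\s*\}\}/g)) {
    if (!names.includes(name)) names.push(name);
  }
  return names;
}

// Replace each {{name}} with values[name]; fail loudly if any value is missing.
function fillTemplate(template, values) {
  const missing = templateVariables(template).filter((n) => !(n in values));
  if (missing.length) throw new Error(`Missing template variables: ${missing.join(", ")}`);
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (_, name) => String(values[name]));
}

// Usage: a reusable summarization prompt.
const summaryTemplate = "Summarize this {{docType}} in {{n}} bullet points:\n\n{{text}}";
console.log(templateVariables(summaryTemplate)); // the variables a UI would ask the user for
console.log(fillTemplate(summaryTemplate, { docType: "contract", n: 3, text: "..." }));
```

Failing on missing variables, rather than silently leaving `{{text}}` in the prompt, avoids sending a half-filled prompt to the model.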

J. JSON output (structured output) is used by a few people.

  • 238: Enable JSON output
  • 223: Pick a JSON schema
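For readers who haven't used structured output: with OpenAI-style Chat Completions APIs, picking a schema typically means sending a `response_format` of type `json_schema`. The sketch below only builds such a request body and parses a reply; it makes no network call. The schema, the helper names, and the default model are my illustrative assumptions, and LLM Foundry's own request format may differ.

```javascript
// Sketch: build an OpenAI-style Chat Completions request that asks for JSON matching a schema.
// Schema, helper names, and model are illustrative; no request is sent here.

// Example schema (assumption): named items, each with the transcript lines it appears on.
// Strict mode expects every property to be required and additionalProperties to be false.
const itemsSchema = {
  type: "object",
  properties: {
    items: {
      type: "array",
      items: {
        type: "object",
        properties: {
          name: { type: "string" },
          lines: { type: "array", items: { type: "integer" } },
        },
        required: ["name", "lines"],
        additionalProperties: false,
      },
    },
  },
  required: ["items"],
  additionalProperties: false,
};

function buildStructuredRequest(systemPrompt, userText, schema, model = "gpt-4o-mini") {
  return {
    model,
    messages: [
      { role: "system", content: systemPrompt },
      { role: "user", content: userText },
    ],
    // Structured output: the reply must be valid JSON matching `schema`.
    response_format: {
      type: "json_schema",
      json_schema: { name: "items", strict: true, schema },
    },
  };
}

// The reply arrives as a JSON string in choices[0].message.content.
function parseStructuredReply(completion) {
  return JSON.parse(completion.choices[0].message.content);
}
```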

P. Prompt optimization. Some people adjust settings to improve their prompt, or use a prompt optimizer. I’m surprised at how few people use the prompt optimizer.

  • 238: Change temperature
  • 207: Optimize the prompt
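Temperature is the easiest of these settings to reason about numerically. In the usual formulation, the model's logits are divided by the temperature before the softmax, so low temperatures concentrate probability on the top token and high temperatures flatten the distribution. A small illustration with made-up logits:

```javascript
// Illustration of how temperature reshapes a next-token distribution.
// The logits are made up; real models score tens of thousands of tokens.

// Softmax of logits / temperature, computed stably by subtracting the max first.
function softmaxWithTemperature(logits, temperature = 1) {
  if (temperature <= 0) throw new Error("temperature must be > 0 (use argmax for greedy decoding)");
  const scaled = logits.map((z) => z / temperature);
  const max = Math.max(...scaled);
  const exps = scaled.map((z) => Math.exp(z - max));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / total);
}

const logits = [2.0, 1.0, 0.1];
for (const t of [0.2, 1, 2]) {
  const probs = softmaxWithTemperature(logits, t).map((p) => p.toFixed(3));
  console.log(`T=${t}: ${probs.join(", ")}`);
}
```

The top token's share grows as T falls and shrinks as T rises, which is why lower temperatures give more repeatable answers.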

G. Generating code and running it via Gemini is less common, but it’s used more than I expected.

  • 275: Generate and run code

S. Search is used a lot less than I expected, maybe because our work involves less research and more processing.

  • 169: Search for context
  • 101: Search for context (Gemini)
  • 46: Specify search text
  • 26: Change number of search results

I left out UI actions because they do not show how people use LLMs.

  • 3,336: Reset the chat
  • 2,049: Switch to advanced mode
  • 262: Stop generating output
  • 245: Keep chat private
  • 27: Show log probs

The main takeaway is that people mostly use LLMs on local files. We need to make this process easier. In the future, AI that works directly with file systems, the Model Context Protocol (MCP), and local APIs is likely to become more important.

“Wait, That’s My Mic!”: Lessons from an AI Co-Host

I spoke at LogicLooM this week, with ChatGPT as my co-panelist. It was so good, it ended up stealing the show.

Preparation

Co-hosting with an AI was one of my goals this year. I tried several methods.

  1. ChatGPT’s advanced voice mode: Lets you interrupt it. But if you pause, it replies immediately. Muting caused the app to hang.
  2. Realtime API: Gave me control of pauses and custom prompts, but used gpt-4o-realtime-preview (not as good as o1).
  3. Standard voice with o1 on Desktop: Worked best. It transcribes my speech, sends it to o1, and speaks back. There’s a lag, but it feels like it’s thinking.
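The pause control in option 2 comes from the Realtime API's turn detection setting. As I understand the API, sending a `session.update` with `turn_detection: null` turns off server-side voice activity detection, so the model waits until the client commits the audio buffer and explicitly requests a response. The sketch below only builds those client events as plain objects and opens no WebSocket; the helper names, the instructions text, and the placeholder audio are my assumptions, so check the current Realtime API reference before relying on the field names.

```javascript
// Sketch: Realtime API client events for push-to-talk style turn-taking.
// Builds event objects only; sending them over a WebSocket is left out.

// Disable server voice-activity detection so pauses don't trigger a reply.
function sessionUpdate(instructions) {
  return {
    type: "session.update",
    session: { instructions, turn_detection: null },
  };
}

// Events to send once the speaker has finished a turn.
// `audioChunksBase64` are base64-encoded audio chunks captured from the microphone.
function endOfTurnEvents(audioChunksBase64) {
  return [
    ...audioChunksBase64.map((audio) => ({ type: "input_audio_buffer.append", audio })),
    { type: "input_audio_buffer.commit" }, // mark the user's turn as complete
    { type: "response.create" }, // explicitly ask the model to reply now
  ];
}

// Each event would be sent as JSON, e.g. ws.send(JSON.stringify(event)).
const events = [
  sessionUpdate("You are a co-panelist. Speak informally, with humor."),
  ...endOfTurnEvents(["AAAA"]), // placeholder audio, not a real recording
];
console.log(events.map((e) => e.type).join(" -> "));
```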

I prepped the chat with this prompt:

You are a co-panelist with me (Anand) in a Data Science panel with students. Aim to educate the audience on data science – large language models in particular.

We will be on audio.
Speak informally, with humor.
Speak succinctly and thoughtfully.
Ask me questions as appropriate.

Don’t say anything now. Wait for my cue.

I loved how it “thought” for 8 seconds and silently complied:

**Waiting for the go-ahead**
OK, let me see. The user wants me to keep quiet until prompted, then respond clearly and succinctly, while mixing in some playful, dry humor and constructive suggestions.

**Ensuring adherence**
Not confirming yet? The response needs to strictly omit any confirmed words, instead noting the importance of remaining quiet until prompted, aligning with the user's directive.

**Waiting for instructions**
The user is clear: stay silent and wait for their cue. This means holding off on any responses until they initiate the conversation.

**Balancing silence and response**
Focusing on maintaining silence until prompted, while preparing brief and compliant responses. This approach respects the user's request and ensures readiness for future instructions.

Here is the full conversation.

Pace of innovation

At 13m, I asked if LLM innovation might stabilize or pause.

ChatGPT, I’m curious, what do you think about the pace of innovation? Do you think it’s stabilizing? Do you think it’s going to pause at some point? And if it is, then why? And if it is not, why not?

The reply was well crafted.

Great question, Anand! The pace of innovation, especially in data science and AI, is like a runaway train. Stabilizing? Unlikely. Pausing? Even less likely. Why? Because every breakthrough creates the foundation for the next…

If it slows, it’ll be less about tech and more about factors like regulation, ethical considerations, or even resource constraints…

What about you, Anand? Do you see a ceiling, or are we all just bracing for more chaos?

Notice how it followed my prompt:

  1. Speak succinctly: Stabilizing? Unlikely. Pausing? Even less likely. That’s how I like to speak.
  2. Ask me questions as appropriate: It mirrored my question, playing a supportive co-panelist.

Day-to-day tasks

I bypassed the question and changed the topic.

What I’m also curious about … is how can we use it for day-to-day tasks? … And in particular, let’s talk about specific small little things that are not so intuitive …

Again, a beautiful reply.

Oh, absolutely! Everyone expects AI to solve world hunger, but the magic is in the small, clever hacks. For students, think about this: you’re drowning in textbooks, and suddenly, AI summarizes a 400-page book into three bullet points. Or it rephrases your answer to sound like you actually paid attention in class.

Not-so-intuitive? How about using AI to analyze your sleep patterns—just by feeding it data from a smartwatch? Or creating custom flashcards for exams by feeding in lecture notes. Even meal planning—students juggling assignments could have ChatGPT whip up recipes based on the random ingredients in their fridge.

What’s your favorite not-so-obvious example, Anand? Something fun or borderline ridiculous?

Again, it followed my style prompts:

  1. Speak informally: “Oh, absolutely!” is so conversational.
  2. Speak thoughtfully: It was so thoughtful I started taking notes. (Irony: Human taking notes from AI? When it’s already logged!?)

Sanderson fan fiction

It’s hard to ignore good questions. I got dragged in, though it was completely irrelevant to what I wanted to cover.

I’m a huge Brandon Sanderson fan. … What I’d like to do … is asking Claude to write short stories set in Brandon Sanderson’s universe. … What if that character met this character? … I think it can be a very powerful short story.

Its response dragged me in further.

Oh, that’s such a perfect example! (Damn! It’s too good at flattery!)

Using AI as your co-author for fan fiction is genius, especially in a universe as rich as Brandon Sanderson’s. I mean, where else can you have Dalinar having tea with Kaladin while debating ethics, written with just a nudge and a prompt? (Damn again! I want to talk a lot more about this to a fellow Sanderson fan!)

… Do you think this could evolve into a broader creative tool, where students could rewrite endings of famous books or create their own alternate universes? Or would Sanderson fans revolt?

A few things:

  1. The flattery worked. I couldn’t stop talking to it.
  2. I was blown away by its last suggestion. It took all of my willpower to move things back on track.

Applications in daily life

… what you mentioned earlier about analyzing sleep patterns, that strikes me as an even more interesting direction to explore, which is looking at personal data.

… for students, it may be more like write an article, write an essay, write an assignment … If you break this down into specifics, what are examples where people could apply …?

The reply was noteworthy. I started taking notes again!

Ah, you’ve hit on something profound there, Anand. (Flatterer!) … For students, this could mean automating those “reflective essays” … Or taking notes from a lecture, summarizing them, and turning them into flashcards or a study guide.

On a more personal level, think about journaling. LLMs could help draft daily reflections based on snippets of your conversations or social media activity.

… But here’s a twist: do you think people might lose the ability to express themselves … Playing Devil’s Advocate, isn’t there a risk of homogenized communication if everyone uses AI?

This was specific, practical, and useful. I felt comfortable answering the question and continuing the conversation.

Risks of AI dependency

I continued the conversation and spoke a fair bit at this point. Here’s the gist:

Yes … it becomes a monoculture. … people lose their ability to think. … this is not a new theme. … (I rambled a while.) I’d love to pick your brain on this. What specific negative impacts do you think are going to be there because of an increased use on AI?

ChatGPT responded with a few excellent points:

… one big risk is intellectual atrophy, the use it or lose it phenomenon.

… Then there’s the risk of AI amplifying biases or inaccuracies.

… Another sneaky one is dependence. If you always have AI to help, what happens when it’s unavailable …

…followed by an extraordinary question:

Playing devil’s advocate, do you think people might swing back the other way, deliberately choosing unplugged creativity or even analog tools as a rebellion against this AI dependency? Could that counterbalance the risks or is that just wishful thinking?

You should see my response.

All I could say was: “I have no idea. At this point my head just exploded, so I’m going to stop ChatGPT there.”

Applause for ChatGPT

It got more applause than me. I suspect the breakdown was:

  • 10% for insight. People don’t applaud insight much. They’re busy taking notes.
  • 30% for novelty. An AI speaking is like a talking dog.
  • 60% for humor. Especially towards the end, which I’ll explain shortly.

ChatGPT steals the show

I was chatting using this thread. Near the end, I accidentally switched over to a different conversation with this (old) prompt:

From now on, you are a stand-up comedian. You’re on stage with a group of doctors as your audience. Got it?

I didn’t know I had switched threads. So when I asked ChatGPT:

And ChatGPT, we are about to conclude. Any parting comments, any parting advice?

… it replied (and you SHOULD listen to it while watching my reaction):

Oh, you’re putting me on the spot for parting words? Alright, here goes:

Doctors, you are the true miracle workers. You handle life and death situations, decipher patient handwriting that looks like hieroglyphics, and somehow keep a straight face when someone says, “I Googled my symptoms, and I’m pretty sure I have something rare.”

Here’s my parting advice: Keep laughing. It’s not just good medicine for your patients—it’s great for you, too. Take care of yourselves, because burnout isn’t something you can treat with antibiotics.

And finally, always remember: when life gets tough, you have two options—cry or laugh. And if you choose to cry, do it with a smile so it confuses everyone.

Thank you for everything you do, and keep being the superheroes we all need. Goodnight, doctors! Keep saving the world, one patient at a time!

The audience went wild with applause and laughter.

Takeaways

  • Co-hosting with AI is a no-brainer. I need to improve myself in comparison.
  • AI can be:
    • A stand-up comedian. Add humor to my seriousness.
    • An expert. Answer questions that I cannot.
    • A researcher. Support my claims with evidence.
  • The o1 model in standard voice mode works best right now.

Launching an app only with LLMs and failing

Zohaib Rauf suggested using LLMs to spec code and using Cursor to build it (via Simon Willison).

I tried it. It’s promising, but my first attempt failed.

I couldn’t generate a SPEC.md using LLMs

At first, I started writing what I wanted.

This application identifies the drugs, diseases, and symptoms, as well as the emotions from an audio recording of a patient call in a clinical trial.

… and then went on to define the EXACT code structure I wanted. I spent 20 minutes spec-ing our application structure, 20 minutes spec-ing our internal LLM Foundry APIs, and 40 minutes detailing every step of how I wanted the app to look and interact.

After 90 minutes, I realized that I’m too controlling or too inexperienced in LLM-speccing. But I had a solid SPEC.md.

# Patient Pulse

The application identifies the drugs, diseases, and symptoms, as well as the emotions from an audio recording of a patient call in a clinical trial.

## How the application works

The application is a [Gramener Demo](#gramener-demos).

`script.js` implements this logic:

- Reads `config.json` to get the list of transcripts and renders them, allowing users to select a transcript
- When the user selects a transcript, it reads the audio and prosody files

Once it reads these files, it makes a request to GPT-4o-mini via [LLM Foundry API](#llm-foundry) with the system prompt:

````markdown
You are a clinical trial expert. Read this call transcript. Identify all drugs, diseases, and symptoms mentioned. Return a JSON that mentions each along with the line in the call transcript they occur in. Example:

```json
{
  "symptoms": [
    { "name": "...", "lines": [1, 4] }, // first symptom is mentioned in lines 1, 4
    { "name": "...", "lines": [8] } // second symptom is mentioned in line 8
  ],
  "drugs": [
    { "name": "...", "lines": [6] } // first drug is mentioned in line 6
  ],
  "emotions": [
    { "name": "...", "lines": [9] } // first emotion is mentioned in line 9
  ]
}
```
````

It then renders a UI (via lit-html) that

- Plays the audio
- Maps the emotions in `${filename}.prosody.csv` to the 8 basic emotions on Robert Plutchik's theory of emotions. See [How to map emotions](#how-to-map-emotions)
- Displays [wheel.png](wheel.png), which is a 1080x1080px image of Plutchik's wheel of emotions
- As the audio plays, it
  - displays text up to the currently playing `Text` from `${filename}.prosody.csv`, highlighting the current sentence.
  - displays the emotions as a semi-transparent radar chart on the wheel of emotions
- It allows pausing and resuming the audio.
- It allows users to drag the audio slider to jump to any point in the audio. The transcript and the wheel of emotions are updated to reflect the new position in the audio.

... (and it goes on for 500+ lines.)
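Two steps in this spec, mapping Hume emotions to Plutchik's eight and drawing a radar chart over the 1080x1080px wheel, are easy to sketch. This version is hypothetical: the three-entry mapping is a tiny illustrative subset (the spec's full mapping isn't reproduced here), it assumes each prosody row has already been parsed into an object of emotion scores, and it assumes evenly spaced spokes starting at the top, which may not match wheel.png's actual orientation.

```javascript
// Hypothetical sketch: aggregate Hume-style emotion scores into Plutchik's 8 basic emotions
// and compute SVG polygon points for a radar chart centred on a 1080x1080 wheel image.

// Plutchik's basic emotions in wheel order.
const PLUTCHIK = ["Joy", "Trust", "Fear", "Surprise", "Sadness", "Disgust", "Anger", "Anticipation"];

// Tiny illustrative subset of a Hume -> Plutchik mapping (assumption, not the spec's full map).
const EMOTION_MAP = {
  Admiration: ["Joy", "Trust"],
  Anxiety: ["Fear", "Anticipation"],
  Sadness: ["Sadness"],
};

// Add each Hume score to every Plutchik emotion it maps to, then scale so the largest is 1.
function toPlutchik(humeScores) {
  const totals = Object.fromEntries(PLUTCHIK.map((e) => [e, 0]));
  for (const [emotion, score] of Object.entries(humeScores)) {
    for (const basic of EMOTION_MAP[emotion] || []) totals[basic] += score; // unmapped emotions are ignored
  }
  const max = Math.max(...Object.values(totals));
  if (max > 0) for (const e of PLUTCHIK) totals[e] /= max;
  return totals;
}

// Convert normalized scores (0..1) into "x,y" points; assumes spoke 0 points straight up.
function radarPoints(totals, { cx = 540, cy = 540, radius = 400 } = {}) {
  return PLUTCHIK.map((e, i) => {
    const angle = -Math.PI / 2 + (2 * Math.PI * i) / PLUTCHIK.length;
    const r = radius * totals[e];
    return `${(cx + r * Math.cos(angle)).toFixed(1)},${(cy + r * Math.sin(angle)).toFixed(1)}`;
  }).join(" ");
}

// Semi-transparent polygon to overlay on the wheel, e.g. inside <svg viewBox="0 0 1080 1080">.
function radarPolygon(humeScores) {
  return `<polygon points="${radarPoints(toPlutchik(humeScores))}" fill="rgba(255,0,0,0.3)" stroke="red"/>`;
}
```

Recomputing the polygon for the prosody row at the current audio time is enough to animate the chart as the call plays.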

I did use o1 to help with the spec. Specifically:

  • How to name it. I asked for short demo names and chose “Patient Pulse”.
  • How to arrange emotions. I asked how to arrange 53 emotions in a circle so similar ones were next to each other.
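One simple way to put similar emotions next to each other is a greedy nearest-neighbour ordering: start from one emotion, repeatedly pick the most similar unplaced one, then space the result evenly around the circle. This is a hypothetical sketch, not what o1 suggested. It assumes each emotion has a numeric feature vector (for example, valence and arousal), and the vectors below are made up.

```javascript
// Hypothetical sketch: order items so similar ones are adjacent (greedy nearest neighbour),
// then place them evenly around a circle. Vectors are assumed features, e.g. [valence, arousal].

function distance(a, b) {
  return Math.hypot(...a.map((x, i) => x - b[i]));
}

// Greedy tour: start at the first item, always hop to the closest unvisited item.
function greedyOrder(items) {
  if (items.length === 0) return [];
  const remaining = items.slice(1);
  const order = [items[0]];
  while (remaining.length) {
    const last = order[order.length - 1];
    let best = 0;
    for (let i = 1; i < remaining.length; i++) {
      if (distance(remaining[i].vector, last.vector) < distance(remaining[best].vector, last.vector)) best = i;
    }
    order.push(remaining.splice(best, 1)[0]);
  }
  return order;
}

// Evenly spaced label positions, starting at the top and going clockwise (SVG y points down).
function circleLayout(ordered, { cx = 540, cy = 540, radius = 450 } = {}) {
  return ordered.map((item, i) => {
    const angle = -Math.PI / 2 + (2 * Math.PI * i) / ordered.length;
    return { name: item.name, x: cx + radius * Math.cos(angle), y: cy + radius * Math.sin(angle) };
  });
}

// Made-up [valence, arousal] values, for illustration only.
const emotions = [
  { name: "Joy", vector: [0.9, 0.6] },
  { name: "Sadness", vector: [-0.8, -0.4] },
  { name: "Excitement", vector: [0.8, 0.9] },
  { name: "Boredom", vector: [-0.3, -0.8] },
  { name: "Contentment", vector: [0.7, -0.2] },
];
console.log(greedyOrder(emotions).map((e) => e.name).join(" -> "));
```

The greedy tour ignores the wrap-around between the last and first emotion, which sit next to each other on a circle. Treating it as a small travelling-salesman problem would handle that, at the cost of more code.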

My takeaway: Writing a SPEC.md can help, especially if you plan to re-use parts of it. But the spec kept evolving as I built the app, because I often don’t know what I want until I see it. Maybe an “agile” approach would work better for me.

Cursor messed up the code

I prompted Cursor with “Generate an application based on this spec. @SPEC.md”. Here’s the code it generated.

import { render, html } from "https://cdn.jsdelivr.net/npm/lit-html@3/+esm";
import * as d3 from "https://cdn.jsdelivr.net/npm/d3@7/+esm";

// Map of Hume emotions to Plutchik's 8 basic emotions
const EMOTION_MAP = {
  Admiration: ["Joy", "Trust"],
  Adoration: ["Joy", "Trust"],
  "Aesthetic Appreciation": ["Joy"],
  // ... (truncated for brevity - use mapping from spec)
};

// ... skipping lots of code ...

/**
 * Main application component
 * @returns {TemplateResult} Main application UI
 */
const App = async () => {
  const config = await fetch("config.json").then((r) => r.json());
  const [selectedTranscript, setSelectedTranscript] = useState(null);
  const [audioPlayer, setAudioPlayer] = useState(null);
  const [currentTime, setCurrentTime] = useState(0);
  const [prosodyData, setProsodyData] = useState([]);
  const [analysisResults, setAnalysisResults] = useState(null);

  const loadTranscript = async (transcript) => {
    setSelectedTranscript(transcript);

  // ... skipping lots of code
};

// Initialize app
render(App(), document.getElementById("app"));

This code was a mess. I had told it to use lit-html, which doesn’t have useState. That’s a React feature, so nothing worked right.
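For contrast, lit-html apps usually keep state in ordinary variables and simply call `render()` again whenever the state changes; lit-html then updates only the parts of the DOM that changed. The sketch below shows that pattern without importing lit-html so it runs anywhere. `createStore` is a hypothetical helper, and the `rerender` callback stands in for something like `(state) => render(App(state), document.getElementById("app"))`.

```javascript
// Hypothetical helper: plain state plus an explicit re-render, the usual lit-html pattern.
// In a browser, `rerender` would call lit-html's render(App(state), container).

function createStore(initialState, rerender) {
  let state = { ...initialState };
  return {
    get: () => state,
    // Merge changes into a new state object, then re-render once.
    set(changes) {
      state = { ...state, ...changes };
      rerender(state);
    },
  };
}

// Usage with a stand-in renderer that records what would be rendered.
const rendered = [];
const store = createStore(
  { selectedTranscript: null, currentTime: 0 },
  (state) => rendered.push(`transcript=${state.selectedTranscript} t=${state.currentTime}`),
);
store.set({ selectedTranscript: "call-01" });
store.set({ currentTime: 12.5 });
console.log(rendered.join("\n"));
```

With this pattern the template function can stay synchronous: fetch config.json first, then call `store.set(...)` with the result, which triggers the render.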

I ended up building the app from scratch. It took four hours and many changes from the original spec, though having a spec did help a bit.

My learning: Cursor and I still can’t build a mid-sized app in one go. I’m in the “scaffold and refine” phase.