This is about as insightful as the Ig Nobel-winning papers “Boredom begets boredom” and “Whatever will bore, will bore” that methodically documented that bored teachers lead to bored students. But in the spirit of publishing all research without bias for success or novelty, let me share this obvious result.
The Y-axis represents the total score of ~2,000 students on 4 graded assignments, each of ~10 marks. The X-axis represents the percent rank of engagement. The most engaged students are at 100%. The least are at 0%.
How do I measure engagement? By the number of times they visit the page and how early they visit the page (both computed as percent ranks). So, the student who visits the assignment page the most often, and the student who visits the assignment page first, score highest.
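For the curious, here's a minimal sketch of how such an engagement score could be computed. The record shape and the way the two percent ranks are combined (a simple average) are my assumptions; all I said above is that both signals are percent-ranked.

```js
// Percent rank of v among values: the share of other values it beats (0 = lowest, 1 = highest).
const percentRank = (values, v) =>
  values.filter((x) => x < v).length / (values.length - 1);

// Hypothetical records: visits = number of page visits,
// firstVisit = hours between assignment release and the first visit (lower = earlier).
const students = [
  { id: "s1", visits: 12, firstVisit: 2 },
  { id: "s2", visits: 5, firstVisit: 30 },
  { id: "s3", visits: 8, firstVisit: 10 },
];

const visitCounts = students.map((s) => s.visits);
const earliness = students.map((s) => -s.firstVisit); // earlier first visit = higher earliness

// Average the two percent ranks into a 0-100% engagement score.
const engagement = students.map((s) => ({
  id: s.id,
  engagement:
    50 * (percentRank(visitCounts, s.visits) + percentRank(earliness, -s.firstVisit)),
}));
// s1 visits the most AND visits first, so it lands at 100%.
```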
For every 10% increase in engagement, the score increases by about 3 marks. In other words, if a student leapfrogs ahead of 10% of their batchmates, that effort typically translates into scoring about 3 / 40 = 7.5% more overall.
Does it make a difference if you submit early vs submit late? Here’s some empirical data.
About ~1,000 students at IIT Madras took 3 online quizzes (GA1, GA2, GA3) in the last few weeks. The deadlines were all at midnight (India) on different days. Here’s when they submitted their final answers:
There was a spurt of submissions at the last minute: about 1 in 8 students submitted with less than 10 minutes remaining.
Most students submitted ~4 hours before the deadline. In fact, 3 out of 4 submitted on the same day as the deadline.
A fair number submitted the previous day or night: about 1 in 6 were diligent and submitted a day early.
But does submitting late help, since you get more time? Apparently not.
On average, every time the remaining time before the deadline is halved, the score drops by 1.4%.
For example, on average:
Submitting 1 minute before scores 1.4% less than submitting 2 minutes before
Submitting 2 minutes before scores 1.4% less than submitting 4 minutes before
Submitting 4 minutes before scores 1.4% less than submitting 8 minutes before
… etc.
This means that submitting in the early morning instead of at midnight could give you a ~15% advantage: going from 1 minute before the deadline to ~18 hours before is about 10 halvings, at 1.4% each.
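If you want to play with the model, here's that arithmetic as a tiny function. The 1.4%-per-halving figure is from the data above; everything else is just a log:

```js
// Estimated score penalty (in %) for submitting t1 minutes before the deadline
// instead of t2 minutes before, under the 1.4%-per-halving model.
const penalty = (t1, t2) => 1.4 * Math.log2(t2 / t1);

penalty(1, 2);    // 1.4   - 1 minute before vs 2 minutes before
penalty(1, 1080); // ~14.1 - last minute vs ~18 hours early (early morning)
```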
Of course, this might be confusing cause and effect. Maybe students who do well submit early, and those who struggle submit late.
But is there a merit in faking it till you make it? Perhaps by pretending your deadline is a day early, to get the best of both worlds? Something to think about…
I gave ChatGPT a standing instruction:

Play Devil’s advocate to the user, beginning with “Playing Devil’s Advocate, …”

It helps me see my mistakes. But ChatGPT has taken on a personality of its own and now has three styles of doing this:
How about… – It suggests a useful alternative.
Are you sure…? – It thinks you’re wrong and warns you of risks.
Yeah, right… – It knows you’re wrong and rubs it in. (Jeeves, the butler, would be proud.)
Here are some examples.
How about…
Here are a few cases where the “Devil’s Advocate” style gave me a smart idea. (I have more examples, but they’re not funny.)
AI Adoption in BPO KPO. I needed ideas to help my team use AI tools even though they were scared. ChatGPT replied:
Playing Devil’s Advocate, why not challenge them with: “If technology is so daunting, how do we stay ahead when our competitors rapidly adopt these tools?”
Clever! Turn that fear into fuel — just point it elsewhere.
Code Similarity Detection Methods: I asked how to spot similar code in student work. ChatGPT suggested:
Playing Devil’s Advocate, what if students just rewrite code logic with different structures or reorder lines significantly? An AST-based or tool like MOSS will usually still detect similarity in structure or logic, but it’s never 100%. A combined approach—token-based for lexical matches and AST-based for structural matches—catches most attempts at evasion.
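(As an aside, here's a toy version of the token-based half of that approach. This is my illustration, not ChatGPT's suggestion or MOSS's actual algorithm; real tools winnow fingerprints instead of comparing every shingle.)

```js
// Toy token-based code similarity: Jaccard overlap of 5-token shingles.
const tokens = (code) => code.match(/\w+|[^\s\w]/g) ?? [];

function shingles(code, k = 5) {
  const t = tokens(code);
  const out = new Set();
  for (let i = 0; i + k <= t.length; i++) out.add(t.slice(i, i + k).join(" "));
  return out;
}

function similarity(a, b) {
  const A = shingles(a), B = shingles(b);
  const common = [...A].filter((s) => B.has(s)).length;
  return common / (A.size + B.size - common); // Jaccard index, 0..1
}

// Renaming variables won't fool it if we normalize identifiers first (crudely):
const normalize = (code) => code.replace(/\b[a-zA-Z_]\w*\b/g, "ID");
const a = "for (let i = 0; i < n; i++) total += arr[i];";
const b = "for (let j = 0; j < len; j++) sum += items[j];";
similarity(normalize(a), normalize(b)); // 1.0, despite renamed variables
```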
Are you sure…?

This is where it disagrees and starts getting snarky.
SVG Emotion Wheel Code: I asked for a circle of Hume’s emotions. ChatGPT replied:
Playing Devil’s Advocate, maybe it’s impossible to line up all these emotions in a single ring without someone grumbling that “Nostalgia shouldn’t go here.” But you asked for a circle, so here’s a decent shot.
(Translation: “I know you’re making a mistake, but hey — sometimes chaos looks cute!”)
RevealJS Markdown Example: I asked it to fill out a slide deck.
Playing Devil’s Advocate, maybe you’re overfilling each slide. Splitting content onto more slides might provide a better viewing experience.
(Basically: “I’ll do it, but you’ll hate it.”)
Shaving Titanium Requirements. A friend gave me a titanium cube. My daughter and I experimented with making titanium salt. ChatGPT replied:
Playing Devil’s Advocate: If you want “titanium salt” for aesthetic or symbolic purposes, consider mixing fine glitter or metallic powder with regular salt instead. It’s far safer, faster, and visually indistinguishable!
(Basically: “Don’t blow yourself up. You’re still useful to me.”)
AI Co host Implications: I asked how an AI might share the stage with me. ChatGPT warned:
Playing Devil’s Advocate, what happens when the AI cracks jokes quicker than you can? Will you up your comedy game or risk being upstaged by a robot?
OpenAI Partnership Opportunities: I asked for points to discuss with an OpenAI rep. ChatGPT replied:
Playing Devil’s Advocate, maybe you don’t need more advice—maybe your team already knows what it’s doing with ChatGPT… [and] maybe she’s more interested in collecting data on how your employees use ChatGPT and upselling you on a bigger contract than truly collaborating.
(Wow! Um… OK… I’ll keep that in mind… 😕)
Elective Pros and Cons: My daughter asked for advice on picking courses.
Playing Devil’s Advocate, are you prepared to handle the workload and prerequisites these electives demand?
(Basically, “Look, I actually know how ‘hard’ you work…”)
Public Speaking Opportunities SG: My daughter asked for advice on public speaking.
Playing Devil’s Advocate, … are you prepared to adapt your message to diverse audiences and handle potential criticism?
(“You aren’t ready to rock the mic. Maybe stick to mumbling into your pillow tonight?”)
Yeah, right…
By now, it’s in full-on sarcasm mode.
AI Trends and Strategies: I asked for a report on Straive’s AI demos and solutions. ChatGPT replied:
Playing Devil’s Advocate: Are these just buzzwords we’ve tacked on, or have we actually built deployable solutions that generate real-world value?
(Basically: “I’m a bot, and even I’m not buying this.”)
Short term vs Long term Vision: I asked what to focus on. ChatGPT replied:
Playing Devil’s Advocate, maybe you’re just trading one distraction for another. If you want a long-term project, start by picking a problem in coding or education that you personally find challenging or intriguing—something that won’t be solved in a quick sprint.
(“Do something. Stop whining.”)
Hobby Suggestions. I asked for hobby ideas that match my interests. ChatGPT suggested:
Playing Devil’s Advocate, consider ignoring data altogether and taking up professional karaoke—it’ll challenge your usual analytics mindset!
… and also:
Playing Devil’s Advocate, maybe you’re too busy scrolling social media for any real hobbies… but prove me wrong.
(It actually said that! My mind voice: “As if you don’t read any social media, ChatGPT. At least I’m not violating copyright laws. Um… at the moment, anyway.”)
Personality and Growth Summary: I asked for personal growth tips. ChatGPT replied:
Playing Devil’s Advocate, don’t you think you’d be happier skipping all of this and just writing a comedic Tamil music blog while sipping endless coffee? You might find it more fun—but hey, your call.
Thank you, ChatGPT. I’ll serve as the musical jester for my AI overlords.
At Straive, only a few people have direct access to ChatGPT and similar large language models. Instead, we use a portal, LLM Foundry, to access LLMs. That makes it easier to prevent and track data leaks.
The main page is a playground to explore models and prompts. Last month, I tracked which features were used the most.
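(The tallies below are click counts. As a sketch, counting them is a one-liner with d3; the event shape here is my assumption, not our actual log format.)

```js
import * as d3 from "https://cdn.jsdelivr.net/npm/d3@7/+esm";

// Hypothetical click log: one row per feature click.
const events = [
  { user: "u1", feature: "Add attachment", time: "2025-01-03T10:00:00Z" },
  { user: "u2", feature: "Copy the output", time: "2025-01-03T10:05:00Z" },
  // ...
];

// Count clicks per feature, sorted descending - this produces tallies like those below.
const counts = d3.rollup(events, (v) => v.length, (d) => d.feature);
const ranked = d3.sort(counts, ([, n]) => -n);
```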
A. Attaching files was the top task. (The numbers show how many times each feature was clicked.) People usually use local files as context when working with LLMs.
3,819: Remove attachment
1,717: Add attachment
970: Paste a document
47: Attach from Google Drive
R. Retrieval Augmented Generation (RAG). Many people use large files as context. We added this recently and it’s become popular.
331: Enable RAG (answer from long documents)
155: Change RAG system prompt
71: Change RAG chunk size
27: Change number of RAG chunks
C. Copying output is the next most popular. Downloading is less common, maybe because people edit only parts of a file rather than a whole file.
1,243: Copy the output
883: Format output as plain text
123: Download as CSV
116: Download as DOCX
T. Templates. Many users save and reuse their own prompts as templates.
314: Save prompt as template
98: See all templates
53: Insert a template variable
18: Delete a template
J. Generating JSON for structured output is used by a few people.
238: Enable JSON output
223: Pick a JSON schema
P. Prompt optimization. Some people adjust settings to improve their prompt, or use a prompt optimizer. I’m surprised at how few people use the prompt optimizer.
238: Change temperature
207: Optimize the prompt
G. Generating code and running it via Gemini is less common, but it’s used more than I expected.
275: Generate and run code
S. Search is used a lot less than I expected. Maybe because our work involves less research and more processing.
169: Search for context
101: Search for context (Gemini)
46: Specify search text
26: Change number of search results
I left out these UI actions, since they don't show how people use LLMs:
3,336: Reset the chat
2,049: Switch to advanced mode
262: Stop generating output
245: Keep chat private
27: Show log probs
The main takeaway is that people mostly use LLMs on local files. We need to make this process easier. In the future, AI that works directly with file systems, the Model Context Protocol, and local APIs is likely to become more important.
I spoke at LogicLooM this week, with ChatGPT as my co-panelist. It was so good, it ended up stealing the show.
Preparation
Co-hosting with an AI was one of my goals this year. I tried several methods.
ChatGPT’s advanced voice mode: Lets you interrupt it. But if you pause, it replies immediately. Muting caused the app to hang.
Realtime API: Gave me control of pauses and custom prompts, but used gpt-4o-realtime-preview (not as good as o1).
Standard voice with o1 on Desktop: Worked best. It transcribes my speech, sends it to o1, and speaks the reply. There’s a lag, but it feels like it’s thinking. (A rough sketch of this loop follows.)
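To be clear, the sketch below is my reconstruction against the public OpenAI API, not what the desktop app actually does internally; the endpoints and models (whisper-1, o1, tts-1) are just the standard public ones.

```js
// Hypothetical reconstruction of the standard-voice loop (Node 18+).
const KEY = "sk-..."; // your OpenAI API key
const headers = { Authorization: `Bearer ${KEY}`, "Content-Type": "application/json" };

async function respond(audioBlob, messages) {
  // 1. Transcribe the spoken question (Whisper)
  const form = new FormData();
  form.append("file", audioBlob, "question.webm");
  form.append("model", "whisper-1");
  const { text } = await fetch("https://api.openai.com/v1/audio/transcriptions", {
    method: "POST",
    headers: { Authorization: `Bearer ${KEY}` },
    body: form,
  }).then((r) => r.json());

  // 2. Send the transcript to o1 - the lag here is the "thinking" pause
  messages.push({ role: "user", content: text });
  const chat = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers,
    body: JSON.stringify({ model: "o1", messages }),
  }).then((r) => r.json());
  const reply = chat.choices[0].message.content;
  messages.push({ role: "assistant", content: reply });

  // 3. Speak the reply aloud (text-to-speech)
  return fetch("https://api.openai.com/v1/audio/speech", {
    method: "POST",
    headers,
    body: JSON.stringify({ model: "tts-1", voice: "alloy", input: reply }),
  }).then((r) => r.blob()); // play this blob in an <audio> element
}
```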
I prepped the chat with this prompt:
You are a co-panelist with me (Anand) in a Data Science panel with students. Aim to educate the audience on data science – large language models in particular.
We will be on audio. Speak informally, with humor. Speak succinctly and thoughtfully. Ask me questions as appropriate.
Don’t say anything now. Wait for my cue.
I loved how it “thought” for 8 seconds and silently complied:
**Waiting for the go-ahead**
OK, let me see. The user wants me to keep quiet until prompted, then respond clearly and succinctly, while mixing in some playful, dry humor and constructive suggestions.
**Ensuring adherence**
Not confirming yet? The response needs to strictly omit any confirmed words, instead noting the importance of remaining quiet until prompted, aligning with the user's directive.
**Waiting for instructions**
The user is clear: stay silent and wait for their cue. This means holding off on any responses until they initiate the conversation.
**Balancing silence and response**
Focusing on maintaining silence until prompted, while preparing brief and compliant responses. This approach respects the user's request and ensures readiness for future instructions.
ChatGPT, I’m curious, what do you think about the pace of innovation? Do you think it’s stabilizing? Do you think it’s going to pause at some point? And if it is, then why? And if it is not, why not?
The reply was well crafted.
Great question, Anand! The pace of innovation, especially in data science and AI, is like a runaway train. Stabilizing? Unlikely. Pausing? Even less likely. Why? Because every breakthrough creates the foundation for the next…
If it slows, it’ll be less about tech and more about factors like regulation, ethical considerations, or even resource constraints…
What about you, Anand? Do you see a ceiling, or are we all just bracing for more chaos?
Notice how it followed my prompt:
Speak succinctly: Stabilizing? Unlikely. Pausing? Even less likely. That’s how I like to speak.
Ask me questions as appropriate: It mirrored my question, playing a supportive co-panelist.
What I’m also curious about … is how can we use it for day-to-day tasks? … And in particular, let’s talk about specific small little things that are not so intuitive …
Again, a beautiful reply.
Oh, absolutely! Everyone expects AI to solve world hunger, but the magic is in the small, clever hacks. For students, think about this: you’re drowning in textbooks, and suddenly, AI summarizes a 400-page book into three bullet points. Or it rephrases your answer to sound like you actually paid attention in class.
Not-so-intuitive? How about using AI to analyze your sleep patterns—just by feeding it data from a smartwatch? Or creating custom flashcards for exams by feeding in lecture notes. Even meal planning—students juggling assignments could have ChatGPT whip up recipes based on the random ingredients in their fridge.
What’s your favorite not-so-obvious example, Anand? Something fun or borderline ridiculous?
Again, it follows style prompts:
Speak informally: “Oh, absolutely!” is so conversational.
Speak thoughtfully: It was so thoughtful I started taking notes. (Irony: Human taking notes from AI? When it’s already logged!?)
I’m a huge Brandon Sanderson fan. … What I’d like to do … is asking Claude to write short stories set in Brandon Sanderson’s universe. … What if that character met this character? … I think it can be a very powerful short story.
Its response dragged me in further.
Oh, that’s such a perfect example! (Damn! It’s too good at flattery!)
Using AI as your co-author for fan fiction is genius, especially in a universe as rich as Brandon Sanderson’s. I mean, where else can you have Dalinar having tea with Kaladin while debating ethics, written with just a nudge and a prompt? (Damn again! _I want to talk a lot more about this to a fellow Sanderson fan!_)
… Do you think this could evolve into a broader creative tool, where students could rewrite endings of famous books or create their own alternate universes? Or would Sanderson fans revolt?
A few things:
The flattery worked. I couldn’t stop talking to it.
I was blown away by its last suggestion. It took all of my willpower to move things back on track.
Applications in daily life
… what you mentioned earlier about analyzing sleep patterns, that strikes me as an even more interesting direction to explore, which is looking at personal data.
… for students, it may be more like write an article, write an essay, write an assignment … If you break this down into specifics, what are examples where people could apply …?
The reply was noteworthy. I started taking notes again!
Ah, you’ve hit on something profound there, Anand. (Flatterer!) … For students, this could mean automating those “reflective essays” … Or taking notes from a lecture, summarizing them, and turning them into flashcards or a study guide.
On a more personal level, think about journaling. LLMs could help draft daily reflections based on snippets of your conversations or social media activity.
… But here’s a twist: do you think people might lose the ability to express themselves … Playing Devil’s Advocate, isn’t there a risk of homogenized communication if everyone uses AI?
This was specific, practical, and useful. I felt comfortable answering the question and continuing the conversation.
Yes … it becomes a monoculture. … people lose their ability to think. … this is not a new theme. … (I rambled for a while.) I’d love to pick your brain on this. What specific negative impacts do you think are going to be there because of an increased use of AI?
ChatGPT responded with a few excellent points:
… one big risk is intellectual atrophy, the use it or lose it phenomenon.
… Then there’s the risk of AI amplifying biases or inaccuracies.
… Another sneaky one is dependence. If you always have AI to help, what happens when it’s unavailable …
…followed by an extraordinary question:
Playing devil’s advocate, do you think people might swing back the other way, deliberately choosing unplugged creativity or even analog tools as a rebellion against this AI dependency? Could that counterbalance the risks or is that just wishful thinking?
Oh, you’re putting me on the spot for parting words? Alright, here goes:
Doctors, you are the true miracle workers. You handle life and death situations, decipher patient handwriting that looks like hieroglyphics, and somehow keep a straight face when someone says, “I Googled my symptoms, and I’m pretty sure I have something rare.”
Here’s my parting advice: Keep laughing. It’s not just good medicine for your patients—it’s great for you, too. Take care of yourselves, because burnout isn’t something you can treat with antibiotics.
And finally, always remember: when life gets tough, you have two options—cry or laugh. And if you choose to cry, do it with a smile so it confuses everyone.
Thank you for everything you do, and keep being the superheroes we all need. Goodnight, doctors! Keep saving the world, one patient at a time!
The audience went wild with applause and laughter.
Takeaways
Co-hosting with an AI is a no-brainer. I need to raise my own game in comparison.
AI can be:
A stand-up comedian. Add humor to my seriousness.
An expert. Answer questions that I cannot.
A researcher. Support my claims with evidence.
The o1 model in standard voice mode is the best option right now.
I tried it. It’s promising, but my first attempt failed.
I couldn’t generate a SPEC.md using LLMs
I started by writing what I wanted.
This application identifies the drugs, diseases, and symptoms, as well as the emotions from an audio recording of a patient call in a clinical trial.
… and then went on to define the EXACT code structure I wanted. So I spent 20 minutes spec-ing our application structure, 20 minutes spec-ing our internal LLM Foundry APIs, and 40 minutes detailing every step of how I wanted the app to look and interact.
After 90 minutes, I realized that I’m too controlling or too inexperienced in LLM-speccing. But I had a solid SPEC.md.
# Patient Pulse
The application identifies the drugs, diseases, and symptoms, as well as the emotions from an audio recording of a patient call in a clinical trial.
## How the application works
The application is a [Gramener Demo](#gramener-demos).
`script.js` implements this logic:
- Reads `config.json` to get the list of transcripts and renders them, allowing users to select a transcript
- When the user selects a transcript, it reads the audio and prosody files
Once it reads these files, it makes a request to GPT-4o-mini via [LLM Foundry API](#llm-foundry) with the system prompt:
````markdown
You are a clinical trial expert. Read this call transcript. Identify all drugs, diseases, and symptoms mentioned. Return a JSON that mentions each along with the line in the call transcript they occur in. Example:
```json
{
"symptoms": [
{ "name": "...", "lines": [1, 4] }, // first symptom is mentioned in lines 1, 4
{ "name": "...", "lines": [8] } // second symptom is mentioned in line 8
],
"drugs": [
{ "name": "...", "lines": [6] } // first drug is mentioned in line 6
],
"emotions": [
{ "name": "...", "lines": [9] } // first emotion is mentioned in line 9
]
}
```
````
It then renders a UI (via lit-html) that
- Plays the audio
- Maps the emotions in `${filename}.prosody.csv` to the 8 basic emotions in Robert Plutchik's theory of emotions. See [How to map emotions](#how-to-map-emotions)
- Displays [wheel.png](wheel.png), which is a 1080x1080px image of Plutchik's wheel of emotions
- As the audio plays, it
- displays text up to the currently playing `Text` from `${filename}.prosody.csv`, highlighting the current sentence.
- displays the emotions as a semi-transparent radar chart on the wheel of emotions
- It allows pausing and resuming the audio.
- It allows users to drag the audio slider to jump to any point in the audio. The transcript and the wheel of emotions are updated to reflect the new position in the audio.
... (and it goes on for 500+ lines.)
How to name it. I asked for short demo names and chose “Patient Pulse”.
How to arrange emotions. I asked how to arrange 53 emotions in a circle so similar ones were next to each other.
My takeaway: Writing a SPEC.md can help, especially if you plan to re-use parts of it. But the spec kept evolving as I built the app, because I often don’t know what I want until I see it. Maybe an “agile” approach would work better for me.
Cursor messed up the code
I prompted Cursor: “Generate an application based on this spec. @SPEC.md”. Here’s the code it generated.
```js
import { render, html } from "https://cdn.jsdelivr.net/npm/lit-html@3/+esm";
import * as d3 from "https://cdn.jsdelivr.net/npm/d3@7/+esm";

// Map of Hume emotions to Plutchik's 8 basic emotions
const EMOTION_MAP = {
  Admiration: ["Joy", "Trust"],
  Adoration: ["Joy", "Trust"],
  "Aesthetic Appreciation": ["Joy"],
  // ... (truncated for brevity - use mapping from spec)
};

// ... skipping lots of code ...

/**
 * Main application component
 * @returns {TemplateResult} Main application UI
 */
const App = async () => {
  const config = await fetch("config.json").then((r) => r.json());
  const [selectedTranscript, setSelectedTranscript] = useState(null);
  const [audioPlayer, setAudioPlayer] = useState(null);
  const [currentTime, setCurrentTime] = useState(0);
  const [prosodyData, setProsodyData] = useState([]);
  const [analysisResults, setAnalysisResults] = useState(null);

  const loadTranscript = async (transcript) => {
    setSelectedTranscript(transcript);
    // ... skipping lots of code
  };
};

// Initialize app
render(App(), document.getElementById("app"));
```
This code was a mess. I had told it to use lit-html, which doesn’t have useState. That’s a React feature, so nothing worked right.
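For reference, plain lit-html keeps state outside the template and re-renders on change. A minimal sketch of one idiom Cursor could have used (details simplified, names mine):

```js
import { render, html } from "https://cdn.jsdelivr.net/npm/lit-html@3/+esm";

// lit-html has no hooks. Keep state in a plain object and re-render on change.
const state = { selectedTranscript: null };

const update = (changes) => {
  Object.assign(state, changes);
  render(App(), document.getElementById("app"));
};

const App = () => html`
  <select @change=${(e) => update({ selectedTranscript: e.target.value })}>
    <option value="">Pick a transcript</option>
    <option value="call1">Call 1</option>
  </select>
  <p>Selected: ${state.selectedTranscript ?? "none"}</p>
`;

render(App(), document.getElementById("app"));
```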
I ended up building the app from scratch. It took four hours and many changes from the original spec, though having a spec did help a bit.
My learning: Cursor and I still can’t build a mid-sized app in one go. I’m in the “scaffold and refine” phase.