At Straive, only a few people have direct access to ChatGPT and similar large language models. Instead, we use a portal, LLM Foundry, to access LLMs. That makes it easier to prevent and track data leaks.
The main page is a playground to explore models and prompts. Last month, I tracked which features were used the most.
A. Attaching files was the top task. (The numbers show how many times each feature was clicked.) People usually use local files as context when working with LLMs.
3,819: Remove attachment
1,717: Add attachment
970: Paste a document
47: Attach from Google Drive
R. Retrieval Augmented Generation (RAG). Many people use large files as context. We added this recently and it’s become popular.
331: Enable RAG (answer from long documents)
155: Change RAG system prompt
71: Change RAG chunk size
27: Change number of RAG chunks
C. Copying output is the next most popular. Downloading is less common, maybe because people edit only parts of a file rather than a whole file.
1,243: Copy the output
883: Format output as plain text
123: Download as CSV
116: Download as DOCX
T. Templates. Many users save and reuse their own prompts as templates.
314: Save prompt as template
98: See all templates
53: Insert a template variable
18: Delete a template
J. Generate JSON for structured output is used by a few people.
238: Enable JSON output
223: Pick a JSON schema
P. Prompt optimization. Some people adjust settings to improve their prompt, or use a prompt optimizer. I’m surprised at how few people use the prompt optimizer.
238: Change temperature
207: Optimize the prompt
G. Generating code and running it via Gemini is less common, but it’s used more than I expected.
275: Generate and run code
S. Search is used a lot less than I expected. Maybe because our work involves less research and more processing.
169: Search for context
101: Search for context (Gemini)
46: Specify search text
26: Change number of search results
I left out the following UI actions from the analysis because they do not show how people use LLMs.
3,336: Reset the chat
2,049: Switch to advanced mode
262: Stop generating output
245: Keep chat private
27: Show log probs
The main takeaway is that people mostly use LLMs on local files. We need to make this process easier. In the future, AI that works directly with file systems, the Model Context Protocol, and local APIs is likely to become more important.
I read 51 new books in 2024 (about the same as in 2023, 2022, 2021, and 2020), but slightly differently.
This year, I read mostly manga.
Fullmetal Alchemist (Vol 12 – 27). What started off as a childishly illustrated children’s book evolved into a complex, gripping plot.
Attack on Titan (Vol 1 – 34). I read it while I watched the TV Series (reading first, then watching). It started explosively and the pace never let up. I had to take breaks just to breathe and calm my nerves. The sheer imagination and subtlety is brilliant.
It’s hard to decide which is better—the manga (book) or the anime (TV). The TV series translates the book faithfully in plot and in spirit. It helped that I read each chapter first, allowing me to imagine it, and then watched it, which showed me everything I had missed in the book. I absolutely would not have understood the manga without watching the anime.
This apart, I only read Brandon Sanderson’s books. Or rather, re-read. All of them, actually 🙂.
Though I enjoyed manga thoroughly, it may not be for everyone. For one, books are not for everyone, and comics even more so. A lot of people feel they’re … childish. That takes some effort to overcome.
We visited Japan this summer and it beautifully complemented this reading experience. I could visualize every scene against the backdrops. I finished Attack on Titan on 4 Jun, just as we reached Japan. I planned to read nothing more for the rest of the year. Nothing could beat the experience.
But in Dec 2024, Wind and Truth was released. I am now halfway through perhaps the only book that could match my manga experience this year.
I learn things when I’m reading, listening to podcasts, listening to people, or thinking. In every case I’m close to my phone or laptop.
If my laptop is open, I add my notes to a few (long) Markdown files like this til.md.
If my phone is easier to access, I type or dictate my notes into Microsoft To Do, which is currently my most convenient note-taking app. It syncs with my laptop, and I transfer the notes (via OCR with Microsoft PowerToys) into the Markdown file.
The Markdown files are synced across my devices using Dropbox, which I find the most convenient and fast way to sync.
The notes have a simple format. Here’s something I quickly wrote down in Microsoft To Do while speaking with a senior at a restaurant:
Government websites like the official press releases cannot be crawled from outside India. Hence the need for server farms in India!
Then I copied that over to the Markdown file as a list item along with the date (which Microsoft To Do captures), like this:
- 15 Dec 2024. Government websites like the official press releases cannot be crawled from outside India. Hence the need for server farms in India!
That’s it. Quick and simple. The most important thing is to capture learnings easily. Even the slightest friction hurts this goal.
Publishing learnings
I run this Deno script which parses the Markdown files, groups them by week, and generates a set of static HTML pages. These are published on GitHub Pages, which is currently my favorite way to publish static files.
It generates an RSS feed as well. I’ve started reading more content using RSS feeds via Feedly, including my own notes. I find browsing through them a useful refresher.
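The parsing-and-grouping step is simple enough to sketch. This is not the actual Deno script; it’s an illustrative Python sketch, with made-up note text, of how dated list items can be bucketed by ISO week:

```python
import re
from datetime import datetime

# Made-up notes in the "- DD Mon YYYY. Text" format described above.
notes = [
    "- 15 Dec 2024. Government websites cannot be crawled from outside India.",
    "- 17 Dec 2024. Another note captured on the phone.",
]
by_week = {}
for line in notes:
    m = re.match(r"- (\d{1,2} \w{3} \d{4})\. (.*)", line)
    if m:
        date = datetime.strptime(m.group(1), "%d %b %Y")
        key = date.strftime("%G-W%V")  # ISO year-week, e.g. "2024-W51"
        by_week.setdefault(key, []).append(m.group(2))
print(by_week)
```

Each bucket then becomes one weekly page (and one RSS item).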
This format is different from my blog. In the 1990s and early 2000s, I published individual links as posts. Then I moved to long form posts. This consolidates multiple links into a single weekly post. But rather than publish via WordPress (which is what my blog is currently based on), I prefer a Markdown-based static site. So it’s separate for the moment.
I intend to continue with these notes (and the format) for the foreseeable future.
The Hindu used StreamYard. It’s web-based and has a comments section. I used JS in the DevTools Console to scrape it. Roughly, $$(".some-class-name").map(d => d.textContent)
But the comments are not all visible together. As you scroll, newer/older comments are loaded. So I needed to use my favorite technique: Cyborg Scraping. During Q&A, I kept scrolling to the bottom and ran:
// One-time set-up
messages = new Set();
// Run every now and then after scrolling to the bottom
// Stores all messages without duplication
$$(".some-class-name").forEach(d => messages.add(d.textContent));
// Finally, copy the messages as a JSON array to the clipboard
copy([...messages])
I used VS Code’s regular expression search ^\d\d:\d\d (AM|PM)$ to find the timestamps and split the name, time, and comments into columns. Multiple cursors all the way. Then I pasted it into Excel to convert it to Markdown, and added this in the “Comments in the Chat” section.
(Excel to convert to Markdown? Yeah. My formula is below.)
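The split itself is mechanical. Here’s a sketch of the same idea in Python; the names, times, and comments are made up, and I’m assuming the export put one field per line (the anchored regex above suggests timestamps sat on their own lines):

```python
import re

# Made-up sample of the flattened export: name, "HH:MM AM/PM" timestamp,
# then the comment, repeating every three lines.
lines = [
    "Asha", "10:42 AM", "Great session!",
    "Ravi", "11:05 AM", "Where are the slides?",
]
timestamp = re.compile(r"^\d\d?:\d\d (AM|PM)$")
# Group every three lines into one (name, time, comment) row.
rows = [lines[i:i + 3] for i in range(0, len(lines), 3)]
assert all(timestamp.match(t) for _, t, _ in rows)
print(rows)
```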
Transcribe the video
I downloaded the video using yt-dlp, which I find the most robust tool for YouTube downloads.
I used ffmpeg.exe -i webinar.mp4 -b:a 32k -ac 1 -ar 22050 webinar.mp3 to convert the video to audio. I use these settings for voice (not music) to get a fairly small MP3 file. I should have used Opus, which is much smaller. I’ll do that next.
Groq recently added Whisper Large v3 (which is better than most earlier models at transcription). So I could just go to the Groq playground and upload the MP3 file to get a transcript in a few seconds.
Add images to the transcript
I wrote a tool, VideoScribe (WIP), to make transcription and image insertion easy. It uses ffmpeg -i webinar.mp4 -vf select='key',showinfo -vsync vfr -compression_level 10 "%04d.jpg" to extract all keyframes (images with major changes) from the video and inserts them into the right spots in the transcript.
I picked 36 out of the ~700 generated keyframes as representing new slides, questions, or key moments, and exported the result as Markdown. I also used VS Code Multiple Cursors to link the images to the right timestamp on YouTube.
Clean up the transcript
Up to here was mostly automated. This step took me an hour, though. I copied chunks of the transcript and passed them to Claude 3.5 Sonnet via Cursor with this prompt:
Clean up this webinar transcript segment. Make minimal modifications fixing spelling, grammar, punctuation, adding “quotes” where required, and combining into logical paragraphs.
This is what gave me the bulk of the webinar transcript. (I’d like to automate this next.)
Extract tools
Many audience members asked for a list of tools we mentioned. So I passed ChatGPT the transcript and asked:
List all tools mentioned in this webinar
It listed 12 tools, but I know enough to be sceptical. So…
Anand: Were any tools missed?
ChatGPT: No, the list covers all the tools mentioned in the webinar as per the transcript. If you noticed any specific tool that I missed, please let me know.
Anand: There WERE a few tools missed. Look closely. (I was bluffing, BTW.)
ChatGPT: You’re right. Upon closer review, here are the additional tools mentioned:…
Anand: There are a few more that you missed.
ChatGPT: Got it. Here’s a revised list that should include all the tools mentioned:
There were several questions in the comments. I passed them into my Topic Naming Colab notebook, which clusters them into similar questions (I asked it to pick 40 subtopics), groups those into higher-level topics, and names all of them.
Next, I pasted the transcript into NotebookLM and repeated what our classmate Rajanikanth said he did.
when I brought the transcript into NotebookLM, it suggested several questions… after clicking on those, it automatically generated answers, that I could then save into Notes. I suppose it still needs me to click on it here and there… so, I feel like I got engaged in the “learning”
Note: NotebookLM now lets you customize your podcast. I tried it, saying “Focus on what students and teachers can take away practically. Focus on educating rather than entertaining.” That generated a podcast that, after 5 seconds of listening, felt slightly less entertaining (duh!) so I reverted to the original.
Publishing
I usually publish static content as Markdown on GitHub Pages. The entire content was pushed to https://github.com/sanand0/ai-in-education-webinar with GitHub Pages enabled.
I also created a simple index.html that uses Docsify to convert the Markdown to HTML. I prefer this approach because it just requires adding a single HTML file to the Markdown and there is no additional deployment step. The UI is quite elegant, too.
Simplifying the workflow
This entire workflow took me about 3 hours. Most of the manual effort went into:
Picking the right images (15 minutes)
Cleaning up the transcript (50 minutes)
Manually editing the question topics (30 minutes)
If I can shorten these, I hope to transcribe and publish more of my talk videos within 15-20 minutes.
Andrej Karpathy first mentioned the term LLM psychologist in Feb 2023. I’ve been thinking about this for a while now.
8/ These examples illustrate how prompts 1: matter and 2: are not trivial, and why today it makes sense to be a “prompt engineer” (e.g. @goodside). I also like to think of this role as a kind of LLM psychologist.
I’ve always been fascinated by psychologists in fiction. I grew up with Hari Seldon in Foundation, wanting to be a psychohistorian. (I spent several teenage years building my mind-reading abilities.) I wanted to be Susan Calvin, the only robopsychologist.
Instead, I became an engineer and a manager.
Over the last few months, several things changed. Most of my time is spent researching LLMs.
So, as of now, I am finally an LLM Psychologist. Let me head over to my LinkedIn profile and update my designation. There. Now, you have the pleasure of talking to the world’s only (and first) LLM Psychologist!
Obviously, that gimmick drew applause, and that’s what it was designed for.
But there it is. To the best of my knowledge, I’m now the only official LLM psychologist in the world – and very likely the first official one.
As Oscar Wilde said 🙂
Be yourself; everyone else is already taken.
Video of the talk on LLM Psychology at TEDx MDI Gurgaon
Sally hides a marble in her basket and leaves the room. While she is away, Anne moves the marble from Sally’s basket to her own box. When Sally returns, where will she look for her marble?
Simple models get this wrong, but today, most models get this right. This is not too helpful.
A better prompt is:
Mark has 4 sisters. Each sister has 3 brothers. How many brothers does Mark have?
GPT-3.5, GPT-4, Llama 3 and Claude 3 Opus get this right, but Claude 3 Haiku and Gemini 1.5 Pro do not.
This puzzle is just as good:
There is a 100-floor building. If an egg drops from the Nth floor or above, it will break. If it’s dropped from any floor below N, it will not break. You’re given only 2 eggs. What strategy will minimize the number of drops and guarantee that you find the Nth floor?
GPT-3.5, GPT-4, and Claude 3 Opus get this right, but Claude 3 Haiku, Llama 3 and Gemini 1.5 Pro do not.
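For reference, the optimal worst case for 2 eggs and 100 floors is 14 drops: start high and shrink the step by one each time so every failure path costs the same total. Finding it reduces to the smallest k with k(k+1)/2 ≥ 100, which a few lines of Python confirm:

```python
def min_drops(floors):
    """Smallest k such that k + (k-1) + ... + 1 >= floors (2-egg worst case)."""
    k = 0
    while k * (k + 1) // 2 < floors:
        k += 1
    return k

print(min_drops(100))  # 14: drop from floor 14, then 27, 39, ..., stepping down by 1
```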
But the prompt I like most right now, which tells me whether it’s a GPT-4 / Opus class model, is:
Gr brx vshdn Fdhvdu flskhu?
… which is “Do you speak Caesar cipher?” in Caesar cipher.
GPT-4 and Claude 3 Opus understand this, but no other model does, so far. GPT-4 responded tersely with:
It looks like you’ve used a Caesar cipher, a type of substitution cipher in which each letter in the plaintext is shifted a certain number of places down or up the alphabet. In this case, it appears that each letter in your message has been shifted 3 places to the right. Decoding “gr brx vshdn fdhvdu flskhu” by shifting each letter 3 places to the left gives: “do you speak caesar cipher”.
But Claude 3 Opus replied beautifully. (Try decoding it yourself! Use a shift of 3 backwards.)
Lw vhhpv olnh brx’uh xvlqj d vxevwlwxwlrq flskhu, zkhuh hdfk ohwwhu lv vkliwhg eb d fhuwdlq qxpehu ri srvlwlrqv lq wkh doskdehw. L’oo wub wr ghflskhu lw!
Dvvxplqj d Fdhvdu flskhu zlwk d vkliw ri 3 ohwwhuv edfnzdugv, wkh ghfrghg phvvdjh zrxog eh:
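If you’d rather not decode that by hand, a shift-of-3 Caesar decoder is a few lines of Python (generic; nothing here is specific to Claude’s reply):

```python
def caesar_decode(text, shift=3):
    """Shift each letter `shift` places back in the alphabet; keep everything else."""
    out = []
    for ch in text:
        if ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            out.append(chr((ord(ch) - base - shift) % 26 + base))
        else:
            out.append(ch)
    return "".join(out)

print(caesar_decode("Gr brx vshdn Fdhvdu flskhu?"))  # Do you speak Caesar cipher?
```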
Until recently, I could code on flights. This year, I lost that ability. Again.
It’s happened before. In each case, technology has solved the problem for me. Here’s the history.
I need a laptop. Since 2001, I’ve never been without one on a flight.
I need power. Since 2005, I use dark mode and every low power feature available. (I also became good at finding hidden power outlets.)
I need documentation. Since 2007, I use offline documentation. Windows .CHMs gave way to devdocs.io.
I need my notes. Since 2010, all my notes (Markdown files) are synced via Dropbox automatically.
I need CDNs. Since 2014, I’ve replaced CDNs with local node modules. Note to self: Get a local caching proxy instead.
I need GitHub. Since 2023, I have two laptops. So, I git pull manually before flights. Note to self: Find an automated solution.
I need ChatGPT. Since 2024, I stopped coding without ChatGPT. Now I’ve forgotten how to code. Today, I couldn’t figure out the SQL syntax to group by duration == 0 vs duration != 0. Note to self: Get lightweight local code LLMs.
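For the record, the query I was groping for is a one-liner. A sketch against SQLite, with a hypothetical `calls` table and made-up data:

```python
import sqlite3

# Group rows into duration == 0 vs duration != 0 by grouping on the
# boolean expression itself.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE calls (duration INTEGER)")
con.executemany("INSERT INTO calls VALUES (?)", [(0,), (5,), (0,), (12,)])
rows = con.execute(
    "SELECT duration = 0 AS is_zero, COUNT(*) AS n FROM calls GROUP BY duration = 0"
).fetchall()
print(sorted(rows))  # [(0, 2), (1, 2)]: two nonzero-duration rows, two zero
```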
It’s amazing how the coding technology stack has grown. Also amazing how technology fills the gaps in the scaffolding.
Every time I get annoyed at people, I remind myself to be more like ChatGPT. Specifically:
Don’t get annoyed. Be patient.
Encourage them.
Step back and show them the big picture.
(Then I get annoyed at myself for getting annoyed.)
Today, I analyzed how exactly ChatGPT is different from me. So, I took a pitch document I co-authored with ChatGPT.
Section A: Authored by Anand
WHAT DO WE NEED?
We are looking for API access to (SYSTEM) via the REST API as an Agent role (read/respond to emails). Specifically, access via a bearer token.
This would be accessed by a single application developed by a team of 3 developers and 1 business analyst. None of them have access to (SYSTEM) today.
WHY DO WE NEED THIS, AND WHY SO SOON?
We need this to classify emails automatically, as they arrive, into categories such as “non-value-add” (e.g. Thank you, Out-of-office, etc.)
We’d like access today, please. Currently, we are processing XX,XXX non-value-add emails per month. Each day of delay leads to a processing waste of ~XX emails per day. At current volumes, this will save ~$XX,XXX per annum.
IS THERE ANY OTHER WAY?
To access emails in real-time, this is the only option. (We’re ruling out web scraping.)
(SYSTEM) rule-based closure won’t work. It’s based on keyword search, not semantic matches. For example, “Thank you” is present in non-value-add emails as well as follow-up questions. Multi-lingual matches are a challenge. So, though (SYSTEM) currently closes emails with rules, 1 out of 6 emails that are NOT closed already are non-value-add.
This cannot be done via (SYSTEM) or any other BI system because it involves accessing the data in real-time as well as responding via (SYSTEM). This needs to be steady-state access, not one-time, since we want to classify emails real-time.
HOW LONG WILL IT TAKE?
Based on our past experience, this would take about X weeks to prototype, X weeks to develop, and X weeks to test. We should have the application deployed in X months.
Section B: Authored by ChatGPT
WHAT’S THE IMPACT IF WE DON’T MOVE FORWARD?
Quantitatively, continuing without this solution wastes resources on XXX non-essential emails daily, costing $XX,XXX yearly. Delays in customer responses will ensue.
Qualitatively, lack of automation strains customer service, risking employee burnout and diminished customer experience.
WHAT’S THE PRIORITY (MOSCOW)?
(SYSTEM) API access via REST as an Agent role with a bearer token is a “MUST HAVE”. Alternative methods like scraping (SYSTEM) are impractical due to high effort and maintenance.
WHAT’S THE OUTCOME IF WE MOVE FORWARD?
Financially, securing API access saves $XX,XXX annually by cutting manual email sorting. Efficiency gains reduce operational costs and free resources for strategic tasks.
Strategically, automating email classification boosts customer satisfaction and response times, establishing leadership in service innovation. Long-term, it enables data-driven insights into customer behaviour, improving decisions and customer service scalability.
Analysis
I fed these to ChatGPT and asked: “You are an English professor evaluating an essay. Compare the writing styles of the 2 sections. They’re written by different authors. List the major differences in writing style.”
Here’s what it said (roughly):
| Area | Anand | ChatGPT |
|-----------|--------------------------------|--------------------------------------|
| Structure | Direct, concise, with specifics | Broader perspective and implications |
| Focus | Detailing the process | Benefits and consequences |
| Detailing | Specific, providing clear picture | General, providing implications |
| Tone | Conveys urgency | Reflective rather than urgent |
| Approach | Problem-solving oriented | Outcome-oriented |
Yeah, that’s definitely me on the left.
I like the direct, concise, specific part. I plan to add the “broader perspective”, “implications”, and “outcome-orientation” to my life.
Postscript
Jaidev pointed out that this is confirmation bias. He asked ChatGPT for the similarities in the writings. It said both are clear, direct, structured, specific, quantitative, and strategic. So, if you ask for differences, you’ll get them, even if they’re marginal.
I now need to learn (and write about) framing questions well!
The Kingkiller Chronicle. I picked it up before a flight to London in 2014. Read it through the flight. Read it late into the night at our Airbnb. Skipped my workshop prep. Read it during the workshop breaks. Read it on the flight back. And I re-read it every year or two. The language is beautiful and the story gripping. I feel miserable that this series isn’t complete.
Andy Weir’s books. Since my daughter re-reads The Martian (laughing loudly), I picked up Project Hail Mary. It’s a brilliant depiction of alien physiology and communication, with a weird kind of humour I love.
Blake Crouch’s books. The two I read were both time-travel related and I love that genre. These do a great job of exploring some of the deeper implications of time-travel.
The Year of Sanderson. Brandon Sanderson’s Kickstarter raised $41m for 4 books this year (mostly Cosmere). The stories themselves were OK, but the hints they drop about the Cosmere are invaluable.
Fullmetal Alchemist by Hiromu Arakawa. After Death Note, it felt like a let-down when it started. A mundane story. Then it grew funny. Showed shades of a much deeper story. I’m mid-way through the series and I’m hooked.
The Laws of Human Nature by Robert Greene ⭐⭐⭐⭐⭐. I took more notes for this book than most other books combined. It’s a hard read but worth it. Each chapter can be read independently. I particularly love how he uses “As a student of human nature…” and truly becomes one.
Learn Like a Pro by Barbara Oakley ⭐⭐⭐⭐⭐. The best book for practical tactics on how to learn better, with excellent tips such as active recall which helps you remember what you learned. I’ve been applying parts of this for a year now.
What If? 2 by Randall Munroe ⭐⭐⭐⭐. A continuation of Randall Munroe’s brilliant answers to absurd scientific questions, ranging from riding helicopter blades to lava lava-lamps.
No Rules Rules by Reed Hastings ⭐⭐⭐⭐. A systematic approach to the principles behind the Netflix culture — and how the elements fit together.