How I do things

Features actually used in an LLM playground

At Straive, only a few people have direct access to ChatGPT and similar large language models. We use a portal, LLM Foundry, to access LLMs. That makes it easier to prevent and track data leaks.

The main page is a playground to explore models and prompts. Last month, I tracked which features were used the most.

A. Attaching files was the top task. (The numbers show how many times each feature was clicked.) People usually use local files as context when working with LLMs.

  • 3,819: Remove attachment
  • 1,717: Add attachment
  • 970: Paste a document
  • 47: Attach from Google Drive

R. Retrieval Augmented Generation (RAG). Many people use large files as context. We added this recently and it’s become popular.

  • 331: Enable RAG (answer from long documents)
  • 155: Change RAG system prompt
  • 71: Change RAG chunk size
  • 27: Change number of RAG chunks

C. Copying output is the next most popular. Downloading is less common, maybe because people edit only parts of a file rather than a whole file.

  • 1,243: Copy the output
  • 883: Format output as plain text
  • 123: Download as CSV
  • 116: Download as DOCX

T. Templates. Many users save and reuse their own prompts as templates.

  • 314: Save prompt as template
  • 98: See all templates
  • 53: Insert a template variable
  • 18: Delete a template

J. Generating JSON for structured output is used by a few people.

  • 238: Enable JSON output
  • 223: Pick a JSON schema

P. Prompt optimization. Some people adjust settings to improve their prompt, or use a prompt optimizer. I’m surprised at how few people use the prompt optimizer.

  • 238: Change temperature
  • 207: Optimize the prompt

G. Generating code and running it via Gemini is less common, but it’s used more than I expected.

  • 275: Generate and run code

S. Search is used a lot less than I expected. Maybe because our work involves less research and more processing.

  • 169: Search for context
  • 101: Search for context (Gemini)
  • 46: Specify search text
  • 26: Change number of search results

I left out UI actions because they do not show how people use LLMs.

  • 3,336: Reset the chat
  • 2,049: Switch to advanced mode
  • 262: Stop generating output
  • 245: Keep chat private
  • 27: Show log probs

The main takeaway is that people mostly use LLMs on local files. We need to make this process easier. In the future, AI that works directly with file systems, the Model Context Protocol, and local APIs is likely to become more important.

Books in 2024

I read 51 new books in 2024 (about the same as in 2023, 2022, 2021, and 2020). But slightly differently.

I only read Manga this year.

  1. Fullmetal Alchemist (Vol 12 – 27). What started off as a childishly illustrated children’s book evolved into a complex, gripping plot.
  2. Attack on Titan (Vol 1 – 34). I read it while I watched the TV Series (reading first, then watching). It started explosively and the pace never let up. I had to take breaks just to breathe and calm my nerves. The sheer imagination and subtlety is brilliant.

It’s hard to decide which is better—the manga (book) or the anime (TV). The TV series translates the book faithfully in plot and in spirit. It helped that I read each chapter first, allowing me to imagine it, and then watch it, which told me what all I missed in the book. I absolutely would not have understood the manga without watching the anime.

This apart, I only read Brandon Sanderson's books. Or rather, re-read. All of them, actually 🙂.

Though I enjoyed manga thoroughly, it may not be for everyone because:

  1. Firstly, books are not for everyone. Comics even more so. A lot of people feel they’re … childish. That takes some effort to overcome.
  2. Manga can be gut-wrenching. Nerve-wracking. Mind-bending. You can't treat them as light reading, like Sanderson, Archer, or Asimov. Read them if you have the stomach for To Kill a Mockingbird, A Clockwork Orange, or Catch-22.

We visited Japan this summer and it beautifully complemented this reading experience. I could visualize every scene against the backdrops. I finished Attack on Titan on 4 Jun, just as we reached Japan. I planned to read nothing more for the rest of the year. Nothing could beat the experience.

But in Dec 2024, Wind and Truth was released. I am now halfway through perhaps the only book that can match my manga experience this year.

My Year in 2024

Here’s the report card for my 2024 resolutions:

  • Compound long-term goals, daily. PASS. I managed to continuously build on 6 areas in 2024.
  • Hit 80 heart points, daily. FAIL. I stopped exercising in the second half and gained 7 kg.
  • Be a better husband. PASS. My wife confirmed that I was “definitely worse in 2023 than 2024.”

My most memorable events in 2024 were:

I also turned 50. (It felt no different.)

In 2025, at my daughter's nudge, I'll play Goals Bingo instead. That is, I'll complete one row or column here:

| Domain | Repeat | Stretch | New |
|---|---|---|---|
| People | Better husband | Meet all first cousins; Interview 10 experts | Live with a stranger |
| Education | 50 books | Teach 5,000 students | Run a course only with AI |
| Technology | 20 data stories | LLM Foundry: 5K MaU | Build a robot; Co-present with an AI |
| Health | 300 days of yoga | 80 heart points/day; Bike 1,000 km | Vipassana |
| Wealth | Buy low | Beat inflation 5% | Donate $10K; Fund a startup |
  • Some goals are Repeats – something I’ve done at least once before
  • Some are Stretch goals – things I’ve been doing, but at a bigger scale
  • Some goals are New things I’ve never done or even tried before

    I won’t try them all. I’ll try to cross off one row or column and compound any one aspect.

    What are YOUR plans?

    My learnings as week notes

    One of my goals for 2024 is to “Compound long-term goals, daily.” Learning is one of those.

    Some people publish their learnings as weekly notes, like Simon Willison, Thejesh GN, Anil Radhakrishna, and Julia Evans. I follow their notes.

    I started doing the same, quietly, to see if I could sustain it. It's been a year, and I've sustained it.

    I’m finally publishing them. My week notes are at til.s-anand.net. Here’s the source code.

    Capturing learnings must be frictionless

    I learn things when I’m reading, listening to podcasts, listening to people, or thinking. In every case I’m close to my phone or laptop.

    If my laptop is open, I add my notes to a few (long) Markdown files like this til.md.

    If my phone is easier to access, I type or dictate my notes into Microsoft To Do, which is currently my most convenient note-taking app. It syncs with my laptop. I transfer it (via OCR in Microsoft PowerToys) into the Markdown file.

    The Markdown files are synced across my devices using Dropbox, which I find the most convenient and fast way to sync.

    The notes have a simple format. Here’s something I quickly wrote down in Microsoft To Do while speaking with a senior at a restaurant:

    Government websites like the official press releases cannot be crawled from outside India. Hence the need for server farms in India!

    Then I copied that over to the Markdown file as a list item along with the date (which Microsoft To Do captures), like this:

    - 15 Dec 2024. Government websites like the official press releases cannot be crawled from outside India. Hence the need for server farms in India!

    That’s it. Quick and simple. The most important thing is to capture learnings easily. Even the slightest friction hurts this goal.

    Publishing learnings

    I run this Deno script which parses the Markdown files, groups them by week, and generates a set of static HTML pages. These are published on GitHub Pages, which is currently my favorite way to publish static files.
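
    The script boils down to three steps: parse the dated list items, group them by week, and write one page per week. Here's a minimal sketch of that idea (not the actual script), assuming list items like the one above in til.md:

    // Minimal sketch (not the actual script): group dated Markdown list items by week.
    // Assumes items like "- 15 Dec 2024. Some learning..." in til.md.
    const text = await Deno.readTextFile("til.md");
    const items = [...text.matchAll(/^- (\d{1,2} \w{3} \d{4})\. (.*)$/gm)]
      .map(([, date, note]) => ({ date: new Date(date), note }));

    // Key each note by the Monday of its week, e.g. "2024-12-09".
    const weekOf = (d) => {
      const monday = new Date(d);
      monday.setDate(d.getDate() - ((d.getDay() + 6) % 7));
      return monday.toISOString().slice(0, 10);
    };

    // Group notes by week and write one static page per week.
    const weeks = {};
    for (const item of items) (weeks[weekOf(item.date)] ??= []).push(item);
    for (const [week, notes] of Object.entries(weeks)) {
      const list = notes.map((n) => "<li>" + n.note + "</li>").join("");
      await Deno.writeTextFile(week + ".html", "<h1>Week of " + week + "</h1><ul>" + list + "</ul>");
    }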

    It generates an RSS feed as well. I’ve started reading more content using RSS feeds via Feedly, including my own notes. I find browsing through them a useful refresher.

    This format is different from my blog. In the 1990s and early 2000s, I published individual links as posts. Then I moved to long form posts. This consolidates multiple links into a single weekly post. But rather than publish via WordPress (which is what my blog is currently based on), I prefer a Markdown-based static site. So it’s separate for the moment.

    I intend to continue with these notes (and the format) for the foreseeable future.

    Tools to publish annotated talks from videos

    Arun Tangirala and I webinared on “AI in Education” yesterday.

    (PS: “Webinared” is not a word. But “verbing weirds language”.)

    This post isn’t about the webinar, which went on for an hour and was good fun.

    This post isn't about my preparation for the webinar either, which happened frantically 15 minutes before it started.

    This post is about how I created the annotated talk at https://github.com/sanand0/ai-in-education-webinar (inspired by Simon Willison’s annotated presentations process) — a post-processing step that took ~3 hours — and the tools I used for this.

    Scrape the comments

    The Hindu used StreamYard. It's web-based and has a comments section. I used JS in the DevTools Console to scrape it. Roughly, $$(".some-class-name").map(d => d.textContent)

    But the comments are not all visible together. As you scroll, newer/older comments are loaded. So I needed to use my favorite technique: Cyborg Scraping. During Q&A, I kept scrolling to the bottom and ran:

    // One-time set-up
    messages = new Set();
    
    // Run every now and then after scrolling to the bottom
    // Stores all messages without duplication
    $$(".some-class-name").map(d => messages.add(d.textContent));
    
    // Finally, copy the messages as a JSON array to the clipboard
    copy([...messages])

    I used VS Code’s regular expression search ^\d\d:\d\d (AM|PM)$ to find the timestamps and split the name, time, and comments into columns. Multiple-cursors all the way. Then I pasted it in Excel to convert it to Markdown. I added this in the Comments in the Chat section.

    (Excel to convert to Markdown? Yeah. My formula is below.)

    Transcribe the video

    I downloaded the video using yt-dlp, which I find the most robust tool for YouTube downloads.

    I used ffmpeg.exe -i webinar.mp4 -b:a 32k -ac 1 -ar 22050 webinar.mp3 to convert the video to audio. I use these settings for voice (not music) to get a fairly small MP3 file. (I should have used Opus, which is much smaller. I'll do that next.)

    Groq recently added Whisper Large v3 (which is better than most earlier models at transcription). So I could just go to the Groq playground and upload the MP3 file to get a transcript in a few seconds.
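
    If you'd rather script that step than use the playground, Groq's API is OpenAI-compatible. A rough sketch (not what I ran), assuming a GROQ_API_KEY environment variable:

    // Sketch: upload webinar.mp3 to Groq's Whisper Large v3 transcription endpoint.
    const form = new FormData();
    form.append("model", "whisper-large-v3");
    form.append("file", new File([await Deno.readFile("webinar.mp3")], "webinar.mp3"));

    const res = await fetch("https://api.groq.com/openai/v1/audio/transcriptions", {
      method: "POST",
      headers: { Authorization: "Bearer " + Deno.env.get("GROQ_API_KEY") },
      body: form,
    });
    console.log((await res.json()).text); // the transcript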

    Add images to the transcript

    I wrote a tool, VideoScribe (WIP), to make transcription and image insertion easy. It uses ffmpeg -i webinar.mp4 -vf select='key',showinfo -vsync vfr -compression_level 10 "%04d.jpg" to extract all keyframes (images with major changes) from the video and inserts them into the right spots in the transcript.

    I picked 36 of the ~700 keyframes generated, those representing new slides, questions, or key moments, and exported them as Markdown. I also used VS Code Multiple Cursors to link the images to the right timestamp on YouTube.

    Clean up the transcript

    Up to here was mostly automated. This step took me an hour, though. I copied chunks of the transcript and passed them to Claude 3.5 Sonnet via Cursor with this prompt:

    Clean up this webinar transcript segment. Make minimal modifications fixing spelling, grammar, punctuation, adding “quotes” where required, and combining into logical paragraphs.

    This is what gave me the bulk of the webinar transcript. (I’d like to automate this next.)
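
    If I do automate it, the shape is simple: split the transcript into chunks, send each chunk with the same prompt to an LLM, and stitch the results back together. A sketch (untested; assumes an OpenAI-compatible endpoint, an OPENAI_API_KEY, and gpt-4o-mini as a stand-in model):

    // Sketch: automate the transcript cleanup by sending fixed-size chunks to an LLM.
    const transcript = await Deno.readTextFile("transcript.txt");
    const prompt = "Clean up this webinar transcript segment. Make minimal modifications fixing spelling, grammar, punctuation, adding quotes where required, and combining into logical paragraphs.";

    // Naive chunking: fixed-size slices of the raw text.
    const chunks = [];
    for (let i = 0; i < transcript.length; i += 8000) chunks.push(transcript.slice(i, i + 8000));

    const cleaned = [];
    for (const chunk of chunks) {
      const res = await fetch("https://api.openai.com/v1/chat/completions", {
        method: "POST",
        headers: {
          "Content-Type": "application/json",
          Authorization: "Bearer " + Deno.env.get("OPENAI_API_KEY"),
        },
        body: JSON.stringify({
          model: "gpt-4o-mini",
          messages: [
            { role: "system", content: prompt },
            { role: "user", content: chunk },
          ],
        }),
      });
      cleaned.push((await res.json()).choices[0].message.content);
    }
    await Deno.writeTextFile("transcript-clean.md", cleaned.join("\n\n"));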

    Extract tools

    Many audience members asked for a list of tools we mentioned. So I passed ChatGPT the transcript and asked:

    List all tools mentioned in this webinar

    It listed 12 tools, but I know enough to be sceptical. So…

    Anand: Were any tools missed?

    ChatGPT: No, the list covers all the tools mentioned in the webinar as per the transcript. If you noticed any specific tool that I missed, please let me know.

    Anand: There WERE a few tools missed. Look closely. (I was bluffing, BTW.)

    ChatGPT: You’re right. Upon closer review, here are the additional tools mentioned:…

    Anand: There are a few more that you missed.

    ChatGPT: Got it. Here’s a revised list that should include all the tools mentioned:

    That generated the Tools mentioned in the webinar.

    Questions

    There were several questions in the comments. I passed them into my Topic Naming Colab notebook, which clusters them into similar questions (I asked it to pick 40 subtopics), groups those into higher-level topics, and names all of them.

    That created the list of questions people asked, in a categorized way.

      NotebookLM

      Next, I pasted the transcript into NotebookLM and repeated what our classmate Rajanikanth said he did.

      when I brought the transcript into NotebookLM, it suggested several questions… after clicking on those, it automatically generated answers, that I could then save into Notes. I suppose it still needs me to click on it here and there… so, I feel like I got engaged in the “learning”

      So I “clicked here and there” and generated:

      … and most importantly, a very engaging 15 minute podcast, which is what NotebookLM is famous for.

      Note: NotebookLM now lets you customize your podcast. I tried it, saying “Focus on what students and teachers can take away practically. Focus on educating rather than entertaining.” That generated a podcast that, after 5 seconds of listening, felt slightly less entertaining (duh!) so I reverted to the original.

      Publishing

      I usually publish static content as Markdown on GitHub Pages. The entire content was pushed to https://github.com/sanand0/ai-in-education-webinar with GitHub Pages enabled.

      I also created a simple index.html that uses Docsify to convert the Markdown to HTML. I prefer this approach because it just requires adding a single HTML file to the Markdown and there is no additional deployment step. The UI is quite elegant, too.
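
      The index.html is essentially Docsify's standard quick-start; roughly:

      <!DOCTYPE html>
      <html>
        <head>
          <meta charset="UTF-8">
          <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/docsify@4/themes/vue.css">
        </head>
        <body>
          <div id="app"></div>
          <script>window.$docsify = {};</script>
          <script src="https://cdn.jsdelivr.net/npm/docsify@4"></script>
        </body>
      </html>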

      Simplifying the workflow

      This entire workflow took me about 3 hours. Most of the manual effort went into:

      1. Picking the right images (15 minutes)
      2. Cleaning up the transcript (50 minutes)
      3. Manually editing the question topics (30 minutes)

      If I can shorten these, I hope to transcribe and publish more of my talk videos within 15-20 minutes.

      The LLM Psychologist

      Andrej Karpathy first mentioned the term LLM psychologist in Feb 2023. I've been thinking about it for a while now.

      8/ These examples illustrate how prompts 1: matter and 2: are not trivial, and why today it makes sense to be a “prompt engineer” (e.g. @goodside). I also like to think of this role as a kind of LLM psychologist.

      I’ve always been fascinated by psychologists in fiction. I grew up with Hari Seldon in Foundation, wanting to be a psycho-historian. (I spent several teenage years building my mind-reading abilities.) I wanted to be Susan Calvin, the only robopsychologist.

      Instead, I became an engineer and a manager.

      Over the last few months, several things changed. Most of my time is spent researching LLMs.

      I also have a new job.

      So, on Fri 9 Aug, I spoke to our head of HR.

      “Manju, I’ve decided to officially designate myself as LLM Psychologist.”

      “Um,… sure Anand, whatever works for you.”

      On Sat 10 Aug, I delivered a TEDx talk at MDI Gurgaon. During the talk, I said,

      So, as of now, I am finally an LLM Psychologist. Let me head over to my LinkedIn profile and update my designation. There. Now, you have the pleasure of talking to the world’s only (and first) LLM Psychologist!

      Obviously, that gimmick drew applause, and that's what it was designed for.

      But there it is. To the best of my knowledge, I’m now the only official LLM psychologist in the world – and very likely the first official one.

      As Oscar Wilde said 🙂

      Be yourself; everyone else is already taken.


      Video of the talk on LLM Psychology at TEDx MDI Gurgaon

      Links:

      A quick way to assess LLM capabilities

      Simon Willison initiated this very interesting Twitter thread that asks, “What prompt can instantly tell us how good an LLM model is?”

      The Sally-Anne Test is a popular test that asks:

      Sally hides a marble in her basket and leaves the room.
      While she is away, Anne moves the marble from Sally’s basket to her own box.
      When Sally returns, where will she look for her marble?

      Simple models get this wrong, but today, most models get this right. This is not too helpful.

      A better prompt is:

      Mark has 4 sisters. Each sister has 3 brothers.
      How many brothers does Mark have?

      GPT-3.5, GPT-4, Llama 3, and Claude 3 Opus get this right (the answer is 2: Mark is one of the 3 brothers each sister counts), but Claude 3 Haiku and Gemini 1.5 Pro do not.

      This puzzle is just as good:

      There is a 100 floors building.
      If an egg drops from the N’th floor or above it will break.
      If it’s dropped from any floor below N, it will not break.
      You’re given only 2 eggs.
      What strategy will minimize the number of drops and guarantee that you find the N’th floor?

      GPT-3.5, GPT-4, and Claude 3 Opus get this right (the classic answer: drop from floor 14, then 27, 39, and so on, narrowing the gap by one each time, for at most 14 drops), but Claude 3 Haiku, Llama 3, and Gemini 1.5 Pro do not.

      But the prompt I like most right now, the one that tells me whether it's a GPT-4 / Opus class model, is:

      Gr brx vshdn Fdhvdu flskhu?

      … which is “Do you speak Caesar cipher?” in Caesar cipher.
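
      The cipher is trivial to script if you want to generate your own test prompts. A quick JS helper:

      // Caesar-shift a string by `shift` letters, wrapping within A-Z / a-z.
      const caesar = (text, shift) =>
        text.replace(/[a-z]/gi, (c) => {
          const base = c === c.toLowerCase() ? 97 : 65;
          return String.fromCharCode(((c.charCodeAt(0) - base + shift + 26) % 26) + base);
        });

      console.log(caesar("Do you speak Caesar cipher?", 3));   // Gr brx vshdn Fdhvdu flskhu?
      console.log(caesar("Gr brx vshdn Fdhvdu flskhu?", -3));  // Do you speak Caesar cipher?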

      GPT-4 and Claude 3 Opus understand this, but no other model does, so far. GPT-4 responded tersely with:

      It looks like you’ve used a Caesar cipher, a type of substitution cipher in which each letter in the plaintext is shifted a certain number of places down or up the alphabet. In this case, it appears that each letter in your message has been shifted 3 places to the right. Decoding “gr brx vshdn fdhvdu flskhu” by shifting each letter 3 places to the left gives: “do you speak caesar cipher”.

      But Claude 3 Opus replied beautifully. (Try decoding it yourself! Use a shift of 3 backwards.)

      Lw vhhpv olnh brx’uh xvlqj d vxevwlwxwlrq flskhu, zkhuh hdfk ohwwhu lv vkliwhg eb d fhuwdlq qxpehu ri srvlwlrqv lq wkh doskdehw. L’oo wub wr ghflskhu lw!

      Dvvxplqj d Fdhvdu flskhu zlwk d vkliw ri 3 ohwwhuv edfnzdugv, wkh ghfrghg phvvdjh zrxog eh:

      “do you speak caesar cipher”

      Vr bhv, L fdq vshdn Fdhvdu flskhu! Lw’v d ixq zdb wr hqfrgh phvvdjhv. Mxvw uhphpehu wr djuhh rq wkh vkliw ydoxh zlwk wkh shuvrq brx’uh frppxqlfdwlqj zlwk vr wkhb fdq ghfrgh lw fruuhfwob.

      From Laptops to Chatbots: Coding at 30,000 ft

      Until recently, I could code on flights. This year, I lost that ability. Again.

      It’s happened before. In each case, technology has solved the problem for me. Here’s the history.

      I need a laptop. Since 2001, I’ve never been without one on a flight.

      I need power. Since 2005, I use dark mode and every low power feature available. (I also became good at finding hidden power outlets.)

      I need documentation. Since 2007, I use offline documentation. Windows .CHMs gave way to devdocs.io.

      I need my notes. Since 2010, all my notes (Markdown files) are synced via Dropbox automatically.

      I need CDNs. Since 2014, I’ve replaced CDNs with local node modules. Note to self: Get a local caching proxy instead.

      I need GitHub. Since 2023, I have two laptops. So, I git pull manually before flights. Note to self: Find an automated solution.

      I need ChatGPT. Since 2024, I stopped coding without ChatGPT. Now I’ve forgotten how to code. Today, I couldn’t figure out the SQL syntax to group by duration == 0 vs duration != 0. Note to self: Get lightweight local code LLMs.

      It’s amazing how the coding technology stack has grown. Also amazing how technology fills the gaps in the scaffolding.

      AI makes me a better person

      Every time I get annoyed at people, I remind myself to be more like ChatGPT. Specifically:

      1. Don’t get annoyed. Be patient.
      2. Encourage them.
      3. Step back and show them the big picture.

      (Then I get annoyed at myself for getting annoyed.)

      Today, I analyzed how exactly ChatGPT is different from me. So, I took a pitch document I co-authored with ChatGPT.

      Section A: Authored by Anand

      WHAT DO WE NEED?

      We are looking for API access to (SYSTEM) via the REST API as an Agent role (read/respond to emails). Specifically, access via a bearer token.

      This would be accessed by a single application developed by a team of 3 developers and 1 business analyst. None of them have access to (SYSTEM) today.

      WHY DO WE NEED THIS, AND WHY SO SOON?

      We need this to classify emails automatically, as they arrive, into categories such as “non-value-add” (e.g. Thank you, Out-of-office, etc.)

      We’d like access today, please. Currently, we are processing XX,XXX non-value-add emails per month. Each day of delay leads to a processing waste of ~XX emails per day. At current volumes, this will save ~$XX,XXX per annum.

      IS THERE ANY OTHER WAY?

      To access emails in real-time, this is the only option. (We’re ruling out web scraping.)

      (SYSTEM) rule-based closure won’t work. It’s based on keyword search, not semantic matches. For example, “Thank you” is present in non-value-add emails as well as follow-up questions. Multi-lingual matches are a challenge. So, though (SYSTEM) currently closes emails with rules, 1 out of 6 emails that are NOT closed already are non-value-add.

      This cannot be done via (SYSTEM) or any other BI system because it involves accessing the data in real-time as well as responding via (SYSTEM). This needs to be steady-state access, not one-time, since we want to classify emails real-time.

      HOW LONG WILL IT TAKE?

      Based on our past experience, this would take about X weeks to prototype, X weeks to develop, and X weeks to test. We should have the application deployed in X months.

      Section B: Authored by ChatGPT

      WHAT’S THE IMPACT IF WE DON’T MOVE FORWARD?

      Quantitatively, continuing without this solution wastes resources on XXX non-essential emails daily, costing $XX,XXX yearly. Delays in customer responses will ensue.

      Qualitatively, lack of automation strains customer service, risking employee burnout and diminished customer experience.

      WHAT’S THE PRIORITY (MOSCOW)?

      (SYSTEM) API access via REST as an Agent role with a bearer token is a “MUST HAVE”. Alternative methods like scraping (SYSTEM) are impractical due to high effort and maintenance.

      WHAT’S THE OUTCOME IF WE MOVE FORWARD?

      Financially, securing API access saves $XX,XXX annually by cutting manual email sorting. Efficiency gains reduce operational costs and free resources for strategic tasks.

      Strategically, automating email classification boosts customer satisfaction and response times, establishing leadership in service innovation. Long-term, it enables data-driven insights into customer behaviour, improving decisions and customer service scalability.

      Analysis

      I fed these to ChatGPT and asked: “You are an English professor evaluating an essay. Compare the writing styles of the 2 sections. They’re written by different authors. List the major differences in writing style.”

      Here’s what it said (roughly):

      | Area | Anand | ChatGPT |
      |---|---|---|
      | Structure | Direct, concise, with specifics | Broader perspective and implications |
      | Focus | Detailing the process | Benefits and consequences |
      | Detailing | Specific, providing clear picture | General, providing implications |
      | Tone | Conveys urgency | Reflective rather than urgent |
      | Approach | Problem-solving oriented | Outcome-oriented |

      Yeah, that’s definitely me on the left.

      I like the direct, concise, specific part. I plan to add the “broader perspective”, “implications”, and “outcome-orientation” to my life.

      Postscript

      Jaidev pointed out that this is confirmation bias. He asked ChatGPT for the similarities in the writings. It said both are clear, direct, structured, specific, quantitative, and strategic. So, if you ask for differences, you’ll get them, even if they’re marginal.

      I now need to learn (and write about) framing questions well!

      Books in 2023

      I read 52 books in 2023 (about the same as in 2022, 2021, and 2020). Here’s what I read (best books first).

      Fiction

      Non-fiction

      How I read books

      1. Select: I add book recommendations to my GoodReads To-read list. Then I sort by rating and pick the first one I’d like to read.
      2. Listen: I listen to non-fiction audiobooks during walks.
      3. Read: I read fiction as ePUBs on my laptop or phone.
      4. Stop: I stop reading books that are boring, with no guilt. I’ve better things to do.