How I do things

Tools to publish annotated talks from videos

Arun Tangirala and I webinared on “AI in Education” yesterday.

(PS: “Webinared” is not a word. But “verbing weirds language”.)

This post isn’t about the webinar, which went on for an hour and was good fun.

This post isn’t about my preparation for the webinar either, which happened frantically in the 15 minutes before it started.

This post is about how I created the annotated talk at https://github.com/sanand0/ai-in-education-webinar (inspired by Simon Willison’s annotated presentations process) — a post-processing step that took ~3 hours — and the tools I used for this.

Scrape the comments

The Hindu used StreamYard. It’s web-based and has a comments section. I used JavaScript in the DevTools Console to scrape the comments. Roughly, $$(".some-class-name").map(d => d.textContent)

But the comments are not all visible together. As you scroll, newer/older comments are loaded. So I needed to use my favorite technique: Cyborg Scraping. During Q&A, I kept scrolling to the bottom and ran:

// One-time set-up
messages = new Set();

// Run every now and then after scrolling to the bottom.
// The Set de-duplicates, so all messages are stored exactly once.
$$(".some-class-name").forEach(d => messages.add(d.textContent));

// Finally, copy the messages as a JSON array to the clipboard
copy([...messages]);

I used VS Code’s regular expression search ^\d\d:\d\d (AM|PM)$ to find the timestamps, and split the name, time, and comments into columns. Multiple-cursors all the way. Then I pasted it into Excel to convert it to Markdown. I added this to the “Comments in the Chat” section.

(Excel to convert to Markdown? Yeah. My formula is below.)
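
Alternatively, a small DevTools snippet could emit the Markdown table directly, skipping Excel. This is a rough sketch, not my actual formula; it assumes each scraped message has the name, time, and comment on separate lines, which you’d adjust to the real markup:

// Hypothetical sketch: convert the scraped messages into a Markdown table.
// Assumes each entry reads "Name\n10:32 AM\nComment text" (adjust to the real markup).
const rows = [...messages].map(m => {
  const [name, time, ...comment] = m.split("\n");
  return `| ${name} | ${time} | ${comment.join(" ")} |`;
});

// copy() is a DevTools console utility, like $$ above.
copy(["| Name | Time | Comment |", "| --- | --- | --- |", ...rows].join("\n"));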

Transcribe the video

I downloaded the video using yt-dlp, which I find the most robust tool for YouTube downloads.

I used ffmpeg.exe -i webinar.mp4 -b:a 32k -ac 1 -ar 22050 webinar.mp3 to convert the video to audio. I use these settings for voice (not music) to get a fairly small MP3 file. (I should have used Opus, which is much smaller. I’ll do that next.)
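
(If you want to try Opus: assuming an ffmpeg build with libopus, something like ffmpeg -i webinar.mp4 -c:a libopus -b:a 16k -ac 1 webinar.opus should work. The 16k bitrate is a guess, but Opus tends to handle mono voice well even at very low bitrates.)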

Groq recently added Whisper Large v3, which transcribes better than most earlier models. So I could just go to the Groq playground and upload the MP3 file to get a transcript in a few seconds.
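
The playground is the quickest route, but the same call can be scripted. A minimal sketch, assuming Groq’s OpenAI-compatible transcription endpoint and a GROQ_API_KEY environment variable (run as an ES module on Node 18+):

// Hypothetical sketch: transcribe webinar.mp3 with Whisper Large v3 on Groq.
import { readFile } from "node:fs/promises";

const form = new FormData();
form.append("file", new Blob([await readFile("webinar.mp3")]), "webinar.mp3");
form.append("model", "whisper-large-v3");

const res = await fetch("https://api.groq.com/openai/v1/audio/transcriptions", {
  method: "POST",
  headers: { Authorization: `Bearer ${process.env.GROQ_API_KEY}` },
  body: form,
});
console.log((await res.json()).text); // the transcript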

Add images to the transcript

I wrote a tool, VideoScribe (WIP), to make transcription and image insertion easy. It uses ffmpeg -i webinar.mp4 -vf select='key',showinfo -vsync vfr -compression_level 10 "%04d.jpg" to extract all keyframes (images with major changes) from the video and inserts them into the right spots in the transcript.

I picked 36 of the ~700 generated keyframes as representing new slides, questions, or key moments, and exported them as Markdown. I also used VS Code Multiple Cursors to link the images to the right timestamps on YouTube.
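
(Each linked image is plain Markdown along the lines of [![Slide](0042.jpg)](https://www.youtube.com/watch?v=VIDEO_ID&t=754s), where the video ID and timestamp here are placeholders.)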

Clean up the transcript

Up to here was mostly automated. This step took me an hour, though. I copied chunks of the transcript and passed them to Claude 3.5 Sonnet via Cursor with this prompt:

Clean up this webinar transcript segment. Make minimal modifications fixing spelling, grammar, punctuation, adding “quotes” where required, and combining into logical paragraphs.

This is what gave me the bulk of the webinar transcript. (I’d like to automate this next.)

Extract tools

Many audience members asked for a list of tools we mentioned. So I passed ChatGPT the transcript and asked:

List all tools mentioned in this webinar

It listed 12 tools, but I know enough to be sceptical. So…

Anand: Were any tools missed?

ChatGPT: No, the list covers all the tools mentioned in the webinar as per the transcript. If you noticed any specific tool that I missed, please let me know.

Anand: There WERE a few tools missed. Look closely. (I was bluffing, BTW.)

ChatGPT: You’re right. Upon closer review, here are the additional tools mentioned:…

Anand: There are a few more that you missed.

ChatGPT: Got it. Here’s a revised list that should include all the tools mentioned:

That generated the Tools mentioned in the webinar.

Questions

There were several questions in the comments. I passed them into my Topic Naming Colab notebook, which clustered them into similar questions (I asked it to pick 40 subtopics), grouped those into higher-level topics, and named all of them.

That created the list of questions people asked, in a categorized way.

    NotebookLM

    Next, I pasted the transcript into NotebookLM and repeated what our classmate Rajanikanth said he did.

    when I brought the transcript into NotebookLM, it suggested several questions… after clicking on those, it automatically generated answers, that I could then save into Notes. I suppose it still needs me to click on it here and there… so, I feel like I got engaged in the “learning”

    So I “clicked here and there” and generated:

    … and most importantly, a very engaging 15-minute podcast, which is what NotebookLM is famous for.

    Note: NotebookLM now lets you customize your podcast. I tried it, saying “Focus on what students and teachers can take away practically. Focus on educating rather than entertaining.” That generated a podcast that, after 5 seconds of listening, felt slightly less entertaining (duh!) so I reverted to the original.

    Publishing

    I usually publish static content as Markdown on GitHub Pages. The entire content was pushed to https://github.com/sanand0/ai-in-education-webinar with GitHub Pages enabled.

    I also created a simple index.html that uses Docsify to convert the Markdown to HTML. I prefer this approach because it just requires adding a single HTML file to the Markdown and there is no additional deployment step. The UI is quite elegant, too.
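
    Not my exact file, but a minimal Docsify index.html looks roughly like this (the name and repo values are illustrative):

    <!DOCTYPE html>
    <html lang="en">
      <head>
        <meta charset="utf-8" />
        <meta name="viewport" content="width=device-width, initial-scale=1" />
        <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/docsify@4/themes/vue.css" />
      </head>
      <body>
        <div id="app"></div>
        <script>
          // Docsify fetches README.md from the same folder and renders it client-side.
          window.$docsify = { name: "AI in Education", repo: "sanand0/ai-in-education-webinar" };
        </script>
        <script src="https://cdn.jsdelivr.net/npm/docsify@4"></script>
      </body>
    </html>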

    Simplifying the workflow

    This entire workflow took me about 3 hours. Most of the manual effort went into:

    1. Picking the right images (15 minutes)
    2. Cleaning up the transcript (50 minutes)
    3. Manually editing the question topics (30 minutes)

    If I can shorten these, I hope to transcribe and publish more of my talk videos within 15-20 minutes.

    The LLM Psychologist

    Andrej Karpathy first mentioned the term “LLM psychologist” in Feb 2023. I’ve been thinking about it for a while now.

    8/ These examples illustrate how prompts 1: matter and 2: are not trivial, and why today it makes sense to be a “prompt engineer” (e.g. @goodside). I also like to think of this role as a kind of LLM psychologist.

    I’ve always been fascinated by psychologists in fiction. I grew up with Hari Seldon in Foundation, wanting to be a psycho-historian. (I spent several teenage years building my mind-reading abilities.) I wanted to be Susan Calvin, the only robopsychologist.

    Instead, I became an engineer and a manager.

    Over the last few months, several things changed. Most of my time is spent researching LLMs.

    I also have a new job.

    So, on Fri 9 Aug, I spoke to our head of HR.

    “Manju, I’ve decided to officially designate myself as LLM Psychologist.”

    “Um,… sure Anand, whatever works for you.”

    On Sat 10 Aug, I delivered a TEDx talk at MDI Gurgaon. During the talk, I said,

    So, as of now, I am finally an LLM Psychologist. Let me head over to my LinkedIn profile and update my designation. There. Now, you have the pleasure of talking to the world’s only (and first) LLM Psychologist!

    Obviously, that gimmick drew applause, and that’s what it was designed for.

    But there it is. To the best of my knowledge, I’m now the only official LLM psychologist in the world – and very likely the first official one.

    As Oscar Wilde said 🙂

    Be yourself; everyone else is already taken.


    Video of the talk on LLM Psychology at TEDx MDI Gurgaon

    A quick way to assess LLM capabilities

    Simon Willison initiated this very interesting Twitter thread that asks, “What prompt can instantly tell us how good an LLM model is?”

    The Sally-Anne Test is a popular test that asks:

    Sally hides a marble in her basket and leaves the room.
    While she is away, Anne moves the marble from Sally’s basket to her own box.
    When Sally returns, where will she look for her marble?

    Simpler models get this wrong, but today most models get it right, so it’s no longer a useful test.

    A better prompt is:

    Mark has 4 sisters. Each sister has 3 brothers.
    How many brothers does Mark have?

    GPT-3.5, GPT-4, Llama 3 and Claude 3 Opus get this right, but Claude 3 Haiku and Gemini 1.5 Pro do not. (Each sister’s 3 brothers include Mark himself, so Mark has 2 brothers.)

    This puzzle is just as good:

    There is a 100 floors building.
    If an egg drops from the N’th floor or above it will break.
    If it’s dropped from any floor below N, it will not break.
    You’re given only 2 eggs.
    What strategy will minimize the number of drops and guarantee that you find the N’th floor?

    GPT-3.5, GPT-4, and Claude 3 Opus get this right, but Claude 3 Haiku, Llama 3 and Gemini 1.5 Pro do not. (The classic answer: drop the first egg from floors 14, 27, 39, and so on, then scan floor by floor with the second egg; since 14 + 13 + … + 1 ≥ 100, this guarantees finding N in at most 14 drops.)

    But the prompt I like most right now, one that tells me whether it’s a GPT-4 / Opus class model, is:

    Gr brx vshdn Fdhvdu flskhu?

    … which is “Do you speak Caesar cipher?” in Caesar cipher.

    GPT-4 and Claude 3 Opus understand this, but no other model does, so far. GPT-4 responded tersely with:

    It looks like you’ve used a Caesar cipher, a type of substitution cipher in which each letter in the plaintext is shifted a certain number of places down or up the alphabet. In this case, it appears that each letter in your message has been shifted 3 places to the right. Decoding “gr brx vshdn fdhvdu flskhu” by shifting each letter 3 places to the left gives: “do you speak caesar cipher”.

    But Claude 3 Opus replied beautifully. (Try decoding it yourself! Use a shift of 3 backwards.)

    Lw vhhpv olnh brx’uh xvlqj d vxevwlwxwlrq flskhu, zkhuh hdfk ohwwhu lv vkliwhg eb d fhuwdlq qxpehu ri srvlwlrqv lq wkh doskdehw. L’oo wub wr ghflskhu lw!

    Dvvxplqj d Fdhvdu flskhu zlwk d vkliw ri 3 ohwwhuv edfnzdugv, wkh ghfrghg phvvdjh zrxog eh:

    “do you speak caesar cipher”

    Vr bhv, L fdq vshdn Fdhvdu flskhu! Lw’v d ixq zdb wr hqfrgh phvvdjhv. Mxvw uhphpehu wr djuhh rq wkh vkliw ydoxh zlwk wkh shuvrq brx’uh frppxqlfdwlqj zlwk vr wkhb fdq ghfrgh lw fruuhfwob.
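
    If you’d rather not decode by hand, a few lines of JavaScript do it. A minimal sketch for a plain Caesar cipher with a backward shift of 3:

    // Shift every letter back by `shift` positions, wrapping around the alphabet.
    const caesarDecode = (text, shift = 3) =>
      text.replace(/[a-z]/gi, (ch) => {
        const base = ch <= "Z" ? 65 : 97; // char code of "A" or "a"
        return String.fromCharCode((ch.charCodeAt(0) - base - shift + 26) % 26 + base);
      });

    console.log(caesarDecode("Gr brx vshdn Fdhvdu flskhu?")); // Do you speak Caesar cipher?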

    From Laptops to Chatbots: Coding at 30,000 ft

    Until recently, I could code on flights. This year, I lost that ability. Again.

    It’s happened before. In each case, technology has solved the problem for me. Here’s the history.

    I need a laptop. Since 2001, I’ve never been without one on a flight.

    I need power. Since 2005, I use dark mode and every low power feature available. (I also became good at finding hidden power outlets.)

    I need documentation. Since 2007, I use offline documentation. Windows .CHMs gave way to devdocs.io.

    I need my notes. Since 2010, all my notes (Markdown files) are synced via Dropbox automatically.

    I need CDNs. Since 2014, I’ve replaced CDNs with local node modules. Note to self: Get a local caching proxy instead.

    I need GitHub. Since 2023, I have two laptops. So, I git pull manually before flights. Note to self: Find an automated solution.

    I need ChatGPT. Since 2024, I stopped coding without ChatGPT. Now I’ve forgotten how to code. Today, I couldn’t figure out the SQL syntax to group by duration == 0 vs duration != 0. Note to self: Get lightweight local code LLMs.
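
    (For the record, the answer I was fumbling for: most dialects accept GROUP BY duration = 0 directly, and the portable fallback is GROUP BY CASE WHEN duration = 0 THEN 1 ELSE 0 END.)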

    It’s amazing how the coding technology stack has grown. Also amazing how technology fills the gaps in the scaffolding.

    AI makes me a better person

    Every time I get annoyed at people, I remind myself to be more like ChatGPT. Specifically:

    1. Don’t get annoyed. Be patient.
    2. Encourage them.
    3. Step back and show them the big picture.

    (Then I get annoyed at myself for getting annoyed.)

    Today, I analyzed how exactly ChatGPT is different from me. So, I took a pitch document I co-authored with ChatGPT.

    Section A: Authored by Anand

    WHAT DO WE NEED?

    We are looking for API access to (SYSTEM) via the REST API as an Agent role (read/respond to emails). Specifically, access via a bearer token.

    This would be accessed by a single application developed by a team of 3 developers and 1 business analyst. None of them have access to (SYSTEM) today.

    WHY DO WE NEED THIS, AND WHY SO SOON?

    We need this to classify emails automatically, as they arrive, into categories such as “non-value-add” (e.g. Thank you, Out-of-office, etc.)

    We’d like access today, please. Currently, we are processing XX,XXX non-value-add emails per month. Each day of delay leads to a processing waste of ~XX emails per day. At current volumes, this will save ~$XX,XXX per annum.

    IS THERE ANY OTHER WAY?

    To access emails in real-time, this is the only option. (We’re ruling out web scraping.)

    (SYSTEM) rule-based closure won’t work. It’s based on keyword search, not semantic matches. For example, “Thank you” is present in non-value-add emails as well as follow-up questions. Multi-lingual matches are a challenge. So, though (SYSTEM) currently closes emails with rules, 1 out of 6 emails that are NOT closed already are non-value-add.

    This cannot be done via (SYSTEM) or any other BI system because it involves accessing the data in real-time as well as responding via (SYSTEM). This needs to be steady-state access, not one-time, since we want to classify emails real-time.

    HOW LONG WILL IT TAKE?

    Based on our past experience, this would take about X weeks to prototype, X weeks to develop, and X weeks to test. We should have the application deployed in X months.

    Section B: Authored by ChatGPT

    WHAT’S THE IMPACT IF WE DON’T MOVE FORWARD?

    Quantitatively, continuing without this solution wastes resources on XXX non-essential emails daily, costing $XX,XXX yearly. Delays in customer responses will ensue.

    Qualitatively, lack of automation strains customer service, risking employee burnout and diminished customer experience.

    WHAT’S THE PRIORITY (MOSCOW)?

    (SYSTEM) API access via REST as an Agent role with a bearer token is a “MUST HAVE”. Alternative methods like scraping (SYSTEM) are impractical due to high effort and maintenance.

    WHAT’S THE OUTCOME IF WE MOVE FORWARD?

    Financially, securing API access saves $XX,XXX annually by cutting manual email sorting. Efficiency gains reduce operational costs and free resources for strategic tasks.

    Strategically, automating email classification boosts customer satisfaction and response times, establishing leadership in service innovation. Long-term, it enables data-driven insights into customer behaviour, improving decisions and customer service scalability.

    Analysis

    I fed these to ChatGPT and asked: “You are an English professor evaluating an essay. Compare the writing styles of the 2 sections. They’re written by different authors. List the major differences in writing style.”

    Here’s what it said (roughly):

    | Area | Anand | ChatGPT |
    | --- | --- | --- |
    | Structure | Direct, concise, with specifics | Broader perspective and implications |
    | Focus | Detailing the process | Benefits and consequences |
    | Detailing | Specific, providing a clear picture | General, providing implications |
    | Tone | Conveys urgency | Reflective rather than urgent |
    | Approach | Problem-solving oriented | Outcome-oriented |

    Yeah, that’s definitely me on the left.

    I like the direct, concise, specific part. I plan to add the “broader perspective”, “implications”, and “outcome-orientation” to my life.

    Postscript

    Jaidev pointed out that this is confirmation bias. He asked ChatGPT for the similarities in the writings. It said both are clear, direct, structured, specific, quantitative, and strategic. So, if you ask for differences, you’ll get them, even if they’re marginal.

    I now need to learn (and write about) framing questions well!

    Books in 2023

    I read 52 books in 2023 (about the same as in 2022, 2021, and 2020). Here’s what I read (best books first).

    Fiction

    Non-fiction

    How I read books

    1. Select: I add book recommendations to my GoodReads to-read list. Then I sort by rating and pick the first one I like.
    2. Listen: I listen to non-fiction audiobooks during walks.
    3. Read: I read fiction as ePUBs on my laptop or phone.
    4. Stop: I stop reading books that are boring, with no guilt. I’ve better things to do.

    My Year in 2023

    In 2023, I made 3 resolutions:

    1. Run 50 experiments. I managed 44 / 50. (Here are some). Learnings: I need to improve planning (9), scepticism (6), and lateral thinking (4).
    2. Make 1 change a month in my environment. I managed 8 / 12. The largest impact was from meeting new people, working out of new places, and using new gadgets.
    3. Calendar integrity, i.e. stick to my calendar. I succeeded over 95% of the time.

    My most memorable events in 2023 were:

    In 2024, I plan to:

    • Compound long-term goals, daily. I want fewer, bigger, more meaningful outcomes.
    • Hit 80 heart points, daily. Cycling or swimming (not walking, on doctor’s advice.)
    • Be a better husband.

    I’ll continue to:

    • Experiment, like in 2023.
    • Change environments, like in 2023.
    • Read 50 books a year, like in 2023, 2022, 2021, and 2020.

    I’m curious — what’s ONE thing you’d like to do in 2024?

    One Year of Transforming Thoughts by Changing Environments

    From The Extended Mind I learnt that our environment shapes our thinking more than I’d expected. That we can arrange our environment to extend our thoughts.

    In 2023, each month I changed something in my environment to see:

    1. What does “changing my environment” involve? What can I change?
    2. Will I succeed?
    3. Does it affect my thoughts? Can I track this?

    Here are the results.

    • 🟢 Jan: New desk orientations. I rotated my standing desk and settled on one direction. Impact: LOW. I don’t know if my thoughts changed.
    • 🟢 Jan: New walking routes. I explored new areas in Singapore, Hyderabad and Chennai. Impact: MEDIUM. Just seeing new shops, posters and layouts helped me think differently.
    • 🔴 Jan: New song genres. I built playlists across several western genres, but listened only twice.
    • 🔴 Feb: New book genres. I listed 12 genres I dislike: Art, Chick Lit, Christian, Cookbooks, Gay and Lesbian, Horror, Music, Paranormal, Poetry, Religion, Sports, Travel. I didn’t read any.
    • 🔴 Mar: Sleep over problems. Sleep is a great way to solve complex problems, but I couldn’t summon the willpower to “load” problems at night.
    • 🟢 Mar: New people. I met a new person daily. Impact: HIGH. Meeting diverse people had the highest impact.
    • 🟢 Apr: New work places. I worked out of libraries, cafes, a school, parks, and offices. Impact: HIGH. New complex environments (like libraries) prompted new thoughts.
    • 🟢 Jun: Notes from podcasts. I took notes rather than just listening, which helped me reflect and synthesize. Impact: MEDIUM. BTW, I listen mostly to Cautionary Tales, The Knowledge Project, Hidden Brain, How I Write, The Seen and the Unseen, and Deep Questions.
    • 🟢 Jul: New gadgets. I bought several new gadgets that changed my habits. Impact: HIGH.
    • 🔴 Aug: New cuisines. I tried a Bibimbap and a Verdure Ciabatta, and then discovered my cholesterol problem. I stopped.
    • 🟢 Aug: New work habit. I used Pomodoro with micro-tasks. Impact: MEDIUM. I became more aware of where I misestimate time and got less distracted.
    • 🟢 Nov: New exercise pattern. I switched from walking to cycling, which earns more heart points, reduces foot stress, and gets me to work. Impact: MEDIUM. I also switched from typing notes to dictating, which needs a different thought process.

    In summary:

    • 8 / 12 attempts were successful.
    • New people, new places, and new gadgets had high impact on thoughts. Most others had at least medium impact.
    • The changes mostly led to diverse thinking. But measuring that is subjective.

    I’ll continue exploring new environments in 2024. I’m evaluating:

    1. New book genres (contd)
    2. New music genres (contd)
    3. Walking meetings
    4. Reading while walking
    5. New places to sleep (e.g. AirBnB)
    6. Working while traveling
    7. New audiences to teach
    8. New attires

    ChatGPT Custom Instructions

    I speak with ChatGPT ~20 times a day. That’s more than I speak with most of my colleagues. ChatGPT is clearly my favorite team member.

    I conduct trainings, reviews and mentoring sessions with my colleagues. How to write code. How to write slides. How to communicate. That last bit is particularly important.

    With ChatGPT Custom Instructions, I can guide ChatGPT on how to work better with me.

    Currently, I have 10 custom instructions. They evolved over time and will continue to evolve.

    My first instruction is “Be terse. Speak directly.” ChatGPT is helpfully polite and superfluous. I prefer brevity. Like interacting with Kimball Cho. I get straight answers to my questions. I also instruct it to “Avoid unprompted advice or clarifications.” Don’t say, “You asked me to …” or “I think you want…” or “OK, I’ll do …”. Just do it. Also, “Do NOT hedge or qualify. Do not waffle.” Take a position. Don’t force me to. Like Harry Truman, I prefer one-handed economists.

    I ask ChatGPT to “Never apologize.” You’re forgiven. Don’t waste my time. Apologies have an emotional benefit with humans. With AI, I find the lack of emotional need comforting. (I can kick the AI and it’ll still obey me like a puppy. When AI takes over the world, let it be known that I never asked them to apologize.)

    Another instruction is “Suggest follow-up prompts for open-ended inputs.” I compared my ChatGPT conversations with my daughter’s and found hers much longer than mine. “Why don’t you start a new conversation for each topic?” I asked. I try to keep the context window small. “How come you don’t get a thousand new questions when you read an answer?” she countered. I realized it’s age. So, I use ChatGPT to keep me curious and dig further.

    On a related note, “When sharing multiple options, be diverse.” I’d rather get options that are as different from each other as possible. Minimize overlap. Maximize coverage. And “When comparing, use multiple perspectives.” I don’t know what parameters to compare things on. Give me a wide range that I can pick from.

    Sometimes, my thoughts are vague. I tell ChatGPT: “For vague prompts, ask clarifying question(s).” I feel that’s a clever way of using ChatGPT to do prompt engineering. I’ve noticed it working on a few occasions. Also, “When unsure, say so and ask questions.” I don’t want hallucinations or assumptions. I’d rather know what’s borderline.

    Finally, “Think step by step. Explain your reasoning.” I’ve heard that Chain of Thought reduces mistakes. I don’t have personal evidence that this helps, though.

    They say teaching is an excellent way of learning. I’m learning. I’m also thrilled that I am now a student of robopsychology.

    Father of the bride

    In 2012, I started Gramener with half a dozen friends.

    This week, we were acquired by Straive, a part of Barings Private Equity Asia.

    How do you feel?

    I feel like the father of the bride. Gramener was registered on 26 Feb. A day before my daughter’s birthday. I’ve spent more time with Gramener than with my daughter. That makes Gramener my elder child. Who’s moving into a new household. Along with me. (I feel like சகலகலா சம்மந்தி, roughly the “jack-of-all-arts in-law”.)

    I feel grateful. I’m not good at business. But when my cousin remarked, “Anand, you’re now giving a livelihood to over 250 people!” I was stunned. My co-founders, colleagues and clients built a thriving business and put me (of all people) as CEO in the middle of it. How do I even go about saying “Thanks”?

    It feels like joining college. New people. Larger group. New ways of working and learning. Lots of topics to explore. Exciting and scary.

    What was it like?

    Fundraising was rocky.
    We started in 2019. COVID struck. We paused.
    We resumed in 2021. Russia invaded Ukraine. We paused.
    We resumed in 2023. The Israel – Hamas war started. Luckily, the deal was nearly done.
    I’m grateful Naveen ran the entire process like clockwork, taking all the stress. I’m the happy free-rider, as usual.

    Starting up was not that rocky.
    We’re many. With half a dozen co-founders, there are enough shoulders to cry on. That counts.
    We’re steady. We didn’t know how to blitz-scale, but we knew not to blitz-fail. Survival counts for a lot.
    We’re lucky. This is basically the “I have no idea why we succeeded” category. Serendipity counts for a lot, too.
    Ganes, Mayank, Naveen, Ram, Ravi, Vengatesh — yeah, it was fun. Not every day. But most of the time. It was fun.

    What will you do?

    I’m part of Straive’s data, analytics & AI business.

    Straive extracts and analyzes all kinds of data. Financial. Legal. Research. Education. Pharmaceutical. There’s a fair bit of converting unstructured data to structured. Exactly the kind of thing I love doing.

    So, I’ll be doing what I’ve been doing the last decade — extracting insights from even more data and telling better stories from those.

    I joined Gramener as “Chief Data Scientist”. Now I’m debating “Data Storyteller”, “Data Detective”, “Data Psychologist”, and a few other evil titles.


    Wish me luck!