STEP 3 (10 min): Reformat the Markdown by writing a script in Cursor. Here’s the prompt:
Write a Python script that reads *.md including the YAML frontmatter, adds the YAML title as H1, date (yyyy-mm-dd) like Sun, 01 Jan 2000 in a new para after the frontmatter and before the content.
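The result is something like this minimal sketch (the frontmatter fields and date handling are assumptions, not Cursor's exact output):

import glob
import re
from datetime import date

for path in glob.glob("*.md"):
    text = open(path, encoding="utf-8").read()
    # Frontmatter sits between the first pair of --- lines
    m = re.match(r"^---\n(.*?)\n---\n(.*)$", text, re.S)
    if not m:
        continue
    fm, body = m.groups()
    title = re.search(r"^title:\s*(.+)$", fm, re.M).group(1).strip(' "')
    y, mo, d = re.search(r"^date:\s*(\d{4})-(\d{2})-(\d{2})", fm, re.M).groups()
    pretty = date(int(y), int(mo), int(d)).strftime("%a, %d %b %Y")  # e.g. "Sun, 01 Jan 2000"
    with open(path, "w", encoding="utf-8") as f:
        f.write(f"---\n{fm}\n---\n\n# {title}\n\n{pretty}\n\n{body}")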
STEP 4 (15 min): Convert it to an ePub using pandoc.
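The conversion is a one-liner along these lines (file names are illustrative):

pandoc chapters/*.md -o book.epub --metadata title="An LBS Exchange Program" --toc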
STEP 5 (10 min): Generate a cover page with ChatGPT (5 min) and compress it into a JPEG via Squoosh.
Draw a comic-style book cover page that covers the experiences of an Indian exchange student (picture attached) from IIM Bangalore at London Business School and exploring London. The book title is “An LBS Exchange Program”.
STEP 6 (10 min): Publish the book on KDP. It’s priced at $0.99 / ₹49 because Kindle doesn’t allow free downloads.
Let’s look at how I built the second example, step by step.
ChatGPT interviews me and creates Markdown slides
While walking 75 minutes from home to IIT Madras to deliver this talk, I had ChatGPT interview me in standard voice mode.
Why an interview? It’s easier when someone asks questions.
Why voice? It’s hard to type while walking. Otherwise, I prefer typing.
Why not advanced voice mode? I want to use a reasoning model like o3-mini-high for better responses, not the GPT-4o-realtime model that advanced voice mode uses.
I want to create an insightful deck in Markdown on how I have been using LLMs in education. The audience will be technologists and educators. The slide contents must have information that is useful and surprising to them.
The slides are formatted in Markdown with each slide title being a level 2 Markdown header and the contents of the slides being crisp bullet points that support the title. The titles are McKinsey style action titles. Just by reading the titles, the audience will understand the message that I am trying to convey.
In this conversation, I’d like you to interview me, asking me questions one by one, and taking my inputs to craft this presentation. I’d also like you to review the inputs and the slide content you create to make sure that it is insightful, useful, non-obvious, and very clear and simple for the audience. Interact with me to improve the deck.
Let’s begin.
(Unsurprisingly, I talk a lot more than I type.)
There were 3 kinds of interactions I had with ChatGPT:
Content. I explained each slide. For example:
Yeah, let’s move on to the next topic, which is where we had the students learn prompt engineering as part of the course. One of the questions was convincing an LLM to say yes, even though …
Correction. After ChatGPT read aloud a slide, I corrected it. For example:
Content-wise, it’s spot-on. Style-wise, it’s almost spot-on. It’s far more verbose. Can you retain the exact same style, but shorten the number of words considerably?
These feel generic. I’d like stuff that comes across as insightful, non-obvious, and specific.
Collation. I had ChatGPT put slides in order. For example:
Put all the slides together in sequence. Make sure you don’t miss anything.
Move the opening questions as the second slide. Move the final takeaways, which is currently the last slide, to just before the final set of questions.
At the end of the interview, I had all the content for the slides.
Marp converts Markdown to slides
I use Marp, a JavaScript tool that turns Markdown into slides.
Why Markdown? It’s natural for programmers and LLMs. ChatGPT renders rich text in Markdown.
Why not RevealJS? Despite a Markdown plugin, RevealJS is built for HTML. Marp is built for Markdown.
I created a bookmarklet that copies text as Markdown. Using this, I converted the ChatGPT slide transcript to Markdown, saving it as README.md.
The Marp for VS Code plugin makes it easy to preview the slides when you add YAML frontmatter like this:
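A minimal version, where the theme is an assumption (Marp ships with default, gaia, and uncover):

---
marp: true
theme: gaia
paginate: true
---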
I use ChatGPT or Gemini to create images that support the slides. For example, this slide includes an image of a robot psychologist generated by Gemini's Imagen 3:
Today, with native image generation in Gemini 2.0 Flash and GPT-4o, I'd likely use those. They have much better character control.
Deploying on GitHub Pages
I use GitHub Actions to render the slides and deploy them on GitHub Pages. Here’s what the key steps look like:
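A sketch, with Marp CLI rendering the deck and the official Pages actions deploying it (versions, branch, and paths are assumptions):

# .github/workflows/deploy.yml (sketch)
on:
  push:
    branches: [main]
permissions:
  pages: write
  id-token: write
jobs:
  deploy:
    runs-on: ubuntu-latest
    environment:
      name: github-pages
    steps:
      - uses: actions/checkout@v4
      # Render the Markdown slides to HTML with Marp CLI
      - run: npx @marp-team/marp-cli README.md -o index.html
      - uses: actions/upload-pages-artifact@v3
        with:
          path: .
      - uses: actions/deploy-pages@v4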
I started my website in 1997 on Geocities at https://www.geocities.com/root_node/, mostly talking about me. (A cousin once told me, “Anand’s site is like TN Seshan – talking only about himself.” 🙂)
I wanted a place to share the interesting links I found. Robot Wisdom by John Barger and Scripting News by Dave Winer were great examples: collections of interesting links, updated daily.
In July 1999, as a student at IIMB, I decided to put that into action by creating a custom HTML page updated manually.
What platform are you using to manage your blog and why did you choose it? Have you blogged on other platforms before?
WordPress. Because it was the most fully-featured, mature platform when I migrated to it around 2006.
Before that, I used:
A custom HTML page on Geocities. But it was hard to update multiple links, create individual pages, categories, etc. So…
A Perl-based static-site generator I wrote myself. But as my link count grew, each generation took too long. So …
A CGI-Perl-based blogging engine I hosted on freestarthost.com, handling commenting, etc. But at BCG, I didn’t have time to add many features (linkback, RSS, etc.) So…
Blogger as a parallel blog, briefly. But it didn’t have as many features as (nor the portability of) WordPress. So…
WordPress – moving across a bunch of hosting services, and currently on HostGator.
I also blogged in parallel on InfyBlog, Infosys’ internal blogging platform (built on LiveJournal).
How do you write your posts? For example, in a local editing tool, or in a panel/dashboard that’s part of your blog?
I started with custom HTML in Emacs (or whatever code editor I kept moving to).
Any future plans for your blog? Maybe a redesign, a move to another platform, or adding a new feature?
I plan to move it to GitHub Pages with Markdown content and a static site generator. I might write my own SSG again in Deno or use one of the faster ones.
This is the third post in my “Nasty habits” series following Licking and Scraping.
Nibbling is biting, but only with the incisors. Not the canines or molars. And it’s a delight.
Nibbling is not uncommon. People tend to nibble on all kinds of stuff. Pens, erasers, straws, gums, clothes, buttons, spoons, rubber bands, paper, toothbrush, cups, bottles, cables, gadgets, books, chalk, coins. It’s a long list.
But I don’t do those. I nibble only food and body parts.
Food
Grapes. I love grapes. You can peel off the skin with your teeth, you see. It’s a slow process, but that’s the point. The food lasts longer. It’s a bit messy since the grapes start watering. That makes it hard to type while eating. But that’s what food breaks are for, right?
When you peel the skin off the grapes, it forms strips. Catching that next strip without biting off too much of the flesh is the art. That way, you have the thinnest peels and the largest amount of flesh.
Large grapes are best for this. Unfortunately, most of them tend to have seeds. The large seedless ones are a delight (though a bit expensive).
Of course, you finally get to eat the flesh at the end. But I’m not sure that’s the best part. Sure, they’re juicy and sweet. But they give me less than 5 seconds of enjoyment. Unlike the peel which can last a minute per grape. Sure, they don’t taste as good. But after four decades of eating grapes by peeling them with my teeth, I’ve grown to love the peels more.
Almonds. It’s the same with almonds. They peel off less easily, but that’s part of the challenge. Soaking them in water spoils the fun. That makes it too easy. You’ve got to soak them in your mouth for a few minutes, soften them, and then peel them off. Doing this while the almond is in your mouth requires some oral gymnastics, but I’m sure it builds character.
Almonds are better than grapes in some ways. The peel is bitter. The flesh is mostly tasteless. They tend to dry the palate. So there’s less temptation to eat more. An almond typically takes me ~3 minutes, compared with a grape – which I can’t stretch for more than a minute. It’s not about the calories either. An almond has ~3 times the calories of a grape. So that evens out. It’s just that I’d feel like eating the almond again less often. Good for the waistline.
Bread crusts. That’s another great food to nibble. You can start at any corner, gently nibble the crust, and peel it off. The trick is getting the right amount of crust out. Biting at the exact edge. The remaining bread should be white, but the crust you peel out should only have the brown. Obviously, this doesn’t work with toast – so I avoid that. It works great with the sandwiches they provide on flights.
(This liking for crusts went to the point where my family would use a knife to cut off the crust. I’d eat all the crusts. It turns out I actually like them better than the bread. But – that doesn’t count towards nibbling, so I’ll stop here.)
Raisins. Not bad, but too small. I peel them off with my teeth only if I really feel like nibbling.
Apple. Again, not bad, but hard to peel, usually. I used to do this earlier with the softer apples, but haven’t done it for a long time.
Chocolate. Most chocolates are not nibble-able. But there are a few exceptions. Protein bars, 5-Star, etc. are OK. You can keep them in the wrapper and nibble on them. But Kit Kat is better. You can nibble at a chunk. Then soak the chocolate in your mouth a bit. Then bite off the first wafer and eat that. And then the second wafer. You can even lick the chocolate off the wafer while it’s in your mouth, then nibble on the wafer.
Boba. This is my new discovery in Singapore. Tapioca pearls that are so nibble-able. They have just the right texture and chewiness – firm enough to bite, solid enough after biting, and small enough to fit in my mouth. Only slightly more in calories (when cooked) than grapes and a lot cheaper. I’m planning to buy a few kgs and boil them. (I don’t know why I bother about the cost of boba. I can afford it. But it’s a habit of a lifetime.)
Actually, biting is more fun than the eating part.
Body parts
This is the grosser part.
Nails. I’ve been biting my nails for as long as I can remember. Along with the skin around them. So much so that, after repeated requests, my mother settled on, “Anand, when you bite your nails, leave a little bit behind.” That resonated a lot. I mean, I’d like some nail to bite tomorrow, right?
My father introduced me to nail cutters. I tried them for a while (by cutting the nails and then nibbling) but the shapes they produce aren’t as interesting, nor as controllable, as when you bite them.
Nails have a side benefit: fiddling. The shape and texture of nails are such a delight! You can roll them in your fingers, run your fingers along the edge, press against the sharp edges, squeeze against the blunt edges, bend them to see how far they’ll go without breaking, tear the layers to see how thin a layer you can get without breaking it, poke at the sharp corners, poke with the sharp corners. Oh, they’re pretty good at removing hair and dead skin from the keyboard, too. So much more.
In fact, I preserve nails for quite some time. I remember the shape and texture of some nails from childhood and truly miss them. In fact, there’s one really good specimen from last September that I kept for a few months before destroying it by fiddling too much. (I have a semi-secret hiding place for nails that prevents anyone cleaning my room from stealing them.)
But I digress…
Skin. Great in many ways, but after a point, it bleeds. The pain wasn’t worth the price. (Actually, the pain was OK. I’m just scared of blood.)
Lips. Same. Great texture. But they bleed.
Hair. Now that’s something. Like nails, they’re replenishable. (So far. Though I seem to be running out a bit near the top of my head.)
But the quality varies a lot depending on where you get the hair from. Chest hair is the best. It’s thick, rough, and sometimes has interesting kinks that are fun to iron out by nibbling. Eyebrows are interesting – they’re pretty thick, too, but not as long. Hair from the head is OK. It’s moderately thin, so it’s good for a nibble or two. The best is when the hair has a natural knot. Pulling at the knot while nibbling is fun. Ear hair is too thin. Armpit hair needs washing, which is quite inconvenient.
At Straive, only a few people have direct access to ChatGPT and similar large language models. We use a portal, LLM Foundry, to access LLMs. That makes it easier to prevent and track data leaks.
The main page is a playground to explore models and prompts. Last month, I tracked which features were used the most.
A. Attaching files was the top task. (The numbers show how many times each feature was clicked.) People usually use local files as context when working with LLMs.
3,819: Remove attachment
1,717: Add attachment
970: Paste a document
47: Attach from Google Drive
R. Retrieval Augmented Generation (RAG). Many people use large files as context. We added this recently and it’s become popular.
331: Enable RAG (answer from long documents)
155: Change RAG system prompt
71: Change RAG chunk size
27: Change number of RAG chunks
C. Copying output is the next most popular. Downloading is less common, maybe because people edit only parts of a file rather than a whole file.
1,243: Copy the output
883: Format output as plain text
123: Download as CSV
116: Download as DOCX
T. Templates. Many users save and reuse their own prompts as templates.
314: Save prompt as template
98: See all templates
53: Insert a template variable
18: Delete a template
J. Generating JSON for structured output is used by a few people.
238: Enable JSON output
223: Pick a JSON schema
P. Prompt optimization. Some people adjust settings to improve their prompt, or use a prompt optimizer. I’m surprised at how few people use the prompt optimizer.
238: Change temperature
207: Optimize the prompt
G. Generating code and running it via Gemini is less common, but it’s used more than I expected.
275: Generate and run code
S. Search is used a lot less than I expected. Maybe because our work involves less research and more processing.
169: Search for context
101: Search for context (Gemini)
46: Specify search text
26: Change number of search results
I left out UI actions because they do not show how people use LLMs.
3,336: Reset the chat
2,049: Switch to advanced mode
245: Keep chat private
262: Stop generating output
27: Show log probs
The main takeaway is that people mostly use LLMs on local files. We need to make this process easier. In the future, AI that works directly with file systems, the Model Context Protocol, and local APIs is likely to become more important.
I read 51 new books in 2024 (about the same as in 2023, 2022, 2021, and 2020). But slightly differently.
I only read manga this year.
Fullmetal Alchemist (Vol 12 – 27). What started off as a childishly illustrated children’s book evolved into a complex, gripping plot.
Attack on Titan (Vol 1 – 34). I read it while I watched the TV Series (reading first, then watching). It started explosively and the pace never let up. I had to take breaks just to breathe and calm my nerves. The sheer imagination and subtlety is brilliant.
It’s hard to decide which is better—the manga (book) or the anime (TV). The TV series translates the book faithfully in plot and in spirit. It helped that I read each chapter first, allowing me to imagine it, and then watch it, which told me what all I missed in the book. I absolutely would not have understood the manga without watching the anime.
This apart, I only read Brandon Sanderson’s books. Or rather, re-read. All of them, actually 🙂.
Though I enjoyed manga thoroughly, it may not be for everyone because:
Firstly, books are not for everyone. Comics even more so. A lot of people feel they’re … childish. That takes some effort to overcome.
We visited Japan this summer and it beautifully complemented this reading experience. I could visualize every scene against the backdrops. I finished Attack on Titan on 4 Jun, just as we reached Japan. I planned to read nothing more for the rest of the year. Nothing could beat the experience.
But in Dec 2024, Wind and Truth was released. I am now halfway through perhaps the only book that can match my manga experience this year.
I learn things when I’m reading, listening to podcasts, listening to people, or thinking. In every case I’m close to my phone or laptop.
If my laptop is open, I add my notes to a few (long) Markdown files like this til.md.
If my phone is easier to access, I type or dictate my notes into Microsoft To Do, which is currently my most convenient note-taking app. It syncs with my laptop. I transfer it (via OCR with Microsoft PowerToys) into the Markdown file.
The Markdown files are synced across my devices using Dropbox, which I find the most convenient and fast way to sync.
The notes have a simple format. Here’s something I quickly wrote down in Microsoft To Do while speaking with a senior at a restaurant:
Government websites like the official press releases cannot be crawled from outside India. Hence the need for server farms in India!
Then I copied that over to the Markdown file as a list item along with the date (which Microsoft To Do captures), like this:
- 15 Dec 2024. Government websites like the official press releases cannot be crawled from outside India. Hence the need for server farms in India!
That’s it. Quick and simple. The most important thing is to capture learnings easily. Even the slightest friction hurts this goal.
Publishing learnings
I run this Deno script, which parses the Markdown files, groups them by week, and generates a set of static HTML pages. These are published on GitHub Pages, which is currently my favorite way to publish static files.
It generates an RSS feed as well. I’ve started reading more content using RSS feeds via Feedly, including my own notes. I find browsing through them a useful refresher.
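Here is a minimal sketch of the week-grouping idea (not the actual script, which also renders proper pages and builds the feed):

// Sketch: bucket "- 15 Dec 2024. note" items by week and emit one HTML page each
const md = await Deno.readTextFile("til.md");
const weeks = new Map<string, string[]>();
for (const line of md.split("\n")) {
  const m = line.match(/^- (\d{1,2} \w{3} \d{4})\.\s*(.*)/);
  if (!m) continue;
  const day = new Date(m[1]);
  day.setDate(day.getDate() - day.getDay()); // snap to the start of the week
  const week = day.toISOString().slice(0, 10);
  weeks.set(week, [...(weeks.get(week) ?? []), m[2]]);
}
await Deno.mkdir("dist", { recursive: true });
for (const [week, notes] of weeks) {
  const items = notes.map((n) => `<li>${n}</li>`).join("\n");
  await Deno.writeTextFile(`dist/${week}.html`, `<h1>Week of ${week}</h1>\n<ul>\n${items}\n</ul>`);
}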
This format is different from my blog. In the 1990s and early 2000s, I published individual links as posts. Then I moved to long form posts. This consolidates multiple links into a single weekly post. But rather than publish via WordPress (which is what my blog is currently based on), I prefer a Markdown-based static site. So it’s separate for the moment.
I intend to continue with these notes (and the format) for the foreseeable future.
The Hindu used StreamYard. It’s web-based and has a comments section. I used JS in the DevTools Console to scrape it. Roughly, $$(".some-class-name").map(d => d.textContent)
But the comments are not all visible together. As you scroll, newer/older comments are loaded. So I needed to use my favorite technique: Cyborg Scraping. During Q&A, I kept scrolling to the bottom and ran:
// One-time set-up
messages = new Set();
// Run every now and then after scrolling to the bottom.
// $$ is the DevTools Console alias for document.querySelectorAll;
// the Set stores all messages without duplication.
$$(".some-class-name").forEach(d => messages.add(d.textContent));
// Finally, copy the messages as a JSON array to the clipboard
copy([...messages])
I used VS Code’s regular expression search ^\d\d:\d\d (AM|PM)$ to find the timestamps and split the name, time, and comments into columns. Multiple-cursors all the way. Then I pasted it in Excel to convert it to Markdown. I added this in the Comments in the Chat section.
(Excel to convert to Markdown? Yeah. My formula is below.)
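It’s a simple concatenation, something like this (cell references are illustrative, not the exact formula):

="| "&A2&" | "&B2&" | "&C2&" |"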
Transcribe the video
I downloaded the video using yt-dlp, which I find the most robust tool for YouTube downloads.
I used ffmpeg.exe -i webinar.mp4 -b:a 32k -ac 1 -ar 22050 webinar.mp3 to convert the video to audio. I use these settings for voice (not music) to get a fairly small MP3 file. (I should have used Opus, which is much smaller. I’ll do that next.)
Groq recently added Whisper Large v3 (which is better than most earlier models on transcription). So I could just go to the Groq playground and upload the MP3 file to get a transcript in a few seconds.
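If you’d rather script that step than use the playground, Groq’s Python SDK handles it; a minimal sketch, assuming GROQ_API_KEY is set:

from groq import Groq

client = Groq()
with open("webinar.mp3", "rb") as f:
    transcription = client.audio.transcriptions.create(
        file=("webinar.mp3", f.read()),
        model="whisper-large-v3",
    )
print(transcription.text)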
Add images to the transcript
I wrote a tool, VideoScribe (WIP), to make transcription and image insertion easy. It uses ffmpeg -i webinar.mp4 -vf select='key',showinfo -vsync vfr -compression_level 10 "%04d.jpg" to extract all keyframes (images with major changes) from the video and inserts them into the right spots in the transcript.
I picked 36 of the ~700 frames generated, representing new slides, questions, or key moments, and exported the result as Markdown. I also used VS Code Multiple Cursors to link the images to the right timestamps on YouTube.
Clean up the transcript
Up to here was mostly automated. This step took me an hour, though. I copied chunks of the transcript and passed them to Claude 3.5 Sonnet via Cursor with this prompt:
Clean up this webinar transcript segment. Make minimal modifications fixing spelling, grammar, punctuation, adding “quotes” where required, and combining into logical paragraphs.
This is what gave me the bulk of the webinar transcript. (I’d like to automate this next.)
Extract tools
Many audience members asked for a list of tools we mentioned. So I passed ChatGPT the transcript and asked:
List all tools mentioned in this webinar
It listed 12 tools, but I know enough to be sceptical. So…
Anand: Were any tools missed?
ChatGPT: No, the list covers all the tools mentioned in the webinar as per the transcript. If you noticed any specific tool that I missed, please let me know.
Anand: There WERE a few tools missed. Look closely. (I was bluffing, BTW.)
ChatGPT: You’re right. Upon closer review, here are the additional tools mentioned:…
Anand: There are a few more that you missed.
ChatGPT: Got it. Here’s a revised list that should include all the tools mentioned:
There were several questions in the comments. I passed them into my Topic Naming Colab notebook, which clusters similar questions (I asked it to pick 40 subtopics), groups them further into higher-level topics, and names all of these.
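One way to do that kind of clustering is embeddings plus KMeans (a sketch of the idea only; the embedding model is an assumption and the actual notebook may work differently):

from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

questions = ["How do I start?", "Which LLM should teachers use?"]  # scraped comments go here
embeddings = SentenceTransformer("all-MiniLM-L6-v2").encode(questions)
k = min(40, len(questions))  # up to 40 subtopics
labels = KMeans(n_clusters=k, n_init="auto").fit_predict(embeddings)
for label, q in sorted(zip(labels, questions)):
    print(label, q)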
Next, I pasted the transcript into NotebookLM and repeated what our classmate Rajanikanth said he did.
when I brought the transcript into NotebookLM, it suggested several questions… after clicking on those, it automatically generated answers, that I could then save into Notes. I suppose it still needs me to click on it here and there… so, I feel like I got engaged in the “learning”
Note: NotebookLM now lets you customize your podcast. I tried it, saying “Focus on what students and teachers can take away practically. Focus on educating rather than entertaining.” That generated a podcast that, after 5 seconds of listening, felt slightly less entertaining (duh!) so I reverted to the original.
Publishing
I usually publish static content as Markdown on GitHub Pages. The entire content was pushed to https://github.com/sanand0/ai-in-education-webinar with GitHub Pages enabled.
I also created a simple index.html that uses Docsify to convert the Markdown to HTML. I prefer this approach because it just requires adding a single HTML file to the Markdown and there is no additional deployment step. The UI is quite elegant, too.
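A minimal Docsify index.html looks something like this (the options are illustrative):

<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width, initial-scale=1">
    <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/docsify@4/lib/themes/vue.css">
  </head>
  <body>
    <div id="app"></div>
    <script>
      // Docsify fetches and renders README.md from the same directory
      window.$docsify = { repo: "sanand0/ai-in-education-webinar" };
    </script>
    <script src="https://cdn.jsdelivr.net/npm/docsify@4"></script>
  </body>
</html>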
Simplifying the workflow
This entire workflow took me about 3 hours. Most of the manual effort went into:
Picking the right images (15 minutes)
Cleaning up the transcript (50 minutes)
Manually editing the question topics (30 minutes)
If I can shorten these, I hope to transcribe and publish more of my talk videos within 15-20 minutes.