S Anand

Embeddings similarity threshold

text-embedding-ada-002 used to give high cosine similarity between texts. I used to consider 85% a reasonable threshold for similarity. I almost never got a similarity less than 50%.

text-embedding-3-small and text-embedding-3-large give much lower cosine similarities between texts.

For example, take these 5 words: “apple”, “orange”, “Facebook”, “Jamaica”, “Australia”. Here is the similarity between every pair of words across the 3 models:

For our words, new text-embedding-3-* models have an average similarity of ~43% while the older text-embedding-ada-002 model had ~85%.

Today, I would use 45% as a reasonable threshold for similarity with the newer models. For example, “apple” and “orange” have a similarity of 45-47% while Jamaica and apple have a ~20% similarity.

Here’s a notebook with these calculations. I hope it gives you a feel for calibrating similarity thresholds.
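As a rough sketch of how such a threshold might be applied: the code below computes cosine similarity for every pair of items and keeps the pairs at or above a cut-off. The 3-dimensional vectors are toy values made up for illustration, NOT real output from any embedding model, and the `similar_pairs` helper is a hypothetical name. Only the 45% threshold comes from the text above.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similar_pairs(embeddings, threshold):
    """Return (name1, name2, similarity) for pairs at or above the threshold."""
    pairs = []
    for (n1, v1), (n2, v2) in combinations(embeddings.items(), 2):
        sim = cosine_similarity(v1, v2)
        if sim >= threshold:
            pairs.append((n1, n2, sim))
    return pairs


# Toy vectors for illustration only (assumption: real embeddings
# would come from a model and have far more dimensions).
toy = {
    "apple": [1.0, 0.2, 0.0],
    "orange": [0.8, 0.6, 0.0],
    "Jamaica": [0.0, 0.3, 1.0],
}

THRESHOLD = 0.45  # the threshold suggested above for the newer models

for n1, n2, sim in similar_pairs(toy, THRESHOLD):
    print(f"{n1} ~ {n2}: {sim:.0%}")
```

The threshold is a parameter rather than a constant inside the function because, as the numbers above suggest, a sensible value depends on the model that produced the embeddings.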

Auto vs GPT

I was crossing a not-too-busy street on a not-too-busy day in Chennai.

I was having a voice conversation with ChatGPT (about the log probabilities of tokens in LLMs, if you’re curious) when I was rudely interrupted by an auto rickshaw rapidly honking at me. “Honk honk honk honk honk” in rapid succession.

Not unusual. Mildly annoying. The street was empty. The auto was empty. The traffic policeman was visible. I gave way and carried on.

A few seconds later, I heard a voice in my ear.

“It sounds like you’re in a good mood! Anything else you’d like to discuss or know more about?”

ChatGPT was still listening (perhaps to background noise) and responding. But I didn’t realize what random noise it thought put me in a good mood. Here’s what I saw on the chat window.

ChatGPT had transcribed the auto’s honking to “Hee hee hee hee hee!”

A client once told me, while visiting Hyderabad, that “these honks in India are a language of their own.” If ChatGPT is to be believed, the autos are laughing at us.

This is, incidentally, the very first time ChatGPT added an exclamation point to my words. I’ve never managed to achieve that so far. No matter how emphatically I spoke.

Also, I’d never have learnt this walking in the streets of Singapore. Friends have warned me about the dangers of long walks on Indian roads. Here’s an example of the lessons we learn — if only we keep our eyes and ears (and microphones) open.

Postscript

When I cycle in Singapore, ChatGPT interprets the sounds very differently. At least twice, it transcribed the traffic noise into “Thank you. Thank you.” Clearly even traffic noise in Singapore is more graceful than in Chennai!

What does Gramener ask ChatGPT?

I looked at how Gramener uses ChatGPT Plus by evaluating 600+ chats asked over 3 months from Oct 2023 to Jan 2024.

The team asks 6 questions a day. We don’t track who or how many actively use ChatGPT Plus. This also excludes personal ChatGPT accounts. Still, 6/day is low for an entire team put together.

The questions fall into 9 categories.

Category                                   %
Excel, data exploration & analysis         25%
Text extraction and summarization          13%
HTML, CSS, or JavaScript code              13%
Python code                                13%
LLMs, AI and use cases                     9%
OCR and image analysis                     9%
Generate images, logos, and designs        7%
General knowledge, policy & environment    5%
Audio and translation                      5%

Here are some questions from each category – to give you an idea of emergent ChatGPT Plus usage.

Excel, data exploration & analysis (25%)

  • Excel clean and merge. There are 2 worksheets in this excel with data, can you clean up the data and merge the data in both the sheets
  • Excel CO2 Data Analysis. You are an expert Data Analyst who is capable of extracting insights out of data. Analyze this sheet and let me know the findings
  • Excel Chi-Square Analysis Guide. how to perform chi square analysis in excel
  • Log Data Insights & KPIs. Looking at the columns from this excel, what kind of insights are possible, what are key KPIs to be looked at

Text extraction and summarization (13%)

  • Complaint Investigation Summary. The following is the summary of an internal investigation for a customer complaint. Now this internal summary is to be paraphrased (in 3-4 lines) as part of a closure
  • Extracting Tables from RTF. Can you write a script to extract the tables from this document
  • Extracting Entities from Text. [{'word1': '(P)', 'nearest_word1': 'P/N:', 'nearest_word2': '0150-25034', 'nearest_word3': 'CARTIRIDGE'}, {'word1': 'P/N:', 'nearest_word1': '(P)', 'nearest_word2': '015...
  • Extract PDF Font Details. Extract text formatting information from this document. Especially find font styles, families and sizes.

HTML, CSS, or JavaScript code (13%)

  • HTML/CSS Chart Template. Give me HTML, CSS and chart code for this design.
  • CSS Font Stack: Explanation. Explain this CSS font convention: Arial, Helvetica, Segoe UI, sans-serif
  • Checkbox Validation with JavaScript. In HTML form, I have a set of checkboxes. How do I write the form so that at least one of them being checked is mandatory?
  • Prevent Text Wrapping CSS. <span class="text">Chief Communications Officer</span> I need CSS such the text inside should not wrap create new line
  • ReactJS App with Routing. Give me developed version using ReactJS use react router for sidebar section navigation to the pages use Tailwind css for styling. Use styled components for conditional …

Python code (13%)

  • Python Code Documentation Guide. Can you generate documentation for a project code written in python?
  • Linux Commands for Python. Give me list of linux commands to work on python coding
  • Code explanation request. What’s this code about? …
  • FastAPI Async Testing. Write a fastapi code and a python client to test the asynchronous nature of the fastapi package.
  • Streamlit App for Translation. Given the following python code, give me a simple streamlit app that takes file upload and converts that into a target language: …

An interesting sub-topic was interview question generation.

  • Python Decorator for Database Queries. Create one medium level question for Decorators in python Industryy usecase specific with solution

LLMs, AI and use cases (9%)

  • LLMs for Data “What Ifs”. You are an LLM Expert. Can you tell me how can we leverage LLM for implementing What IF scenarios on Data?
  • LLMs: Current Challenges & Concerns. what are current challenges with LLMs
  • LLM Applications in Marketing. Show LLM applications for the marketing function of a music company.
  • Gen AI usage. What industries are using Gen AI the most
  • Best LLMs in 2023. Search the internet for the most recent LLMs and list the best LLMs in terms of performance
  • Best Image Classification Models. suggest best models to tell what there in the image

OCR and image analysis (9%)

  • Browser history OCR. This is a screenshot of my browser history. Convert that to text. Categorize these into common topics.
  • Extracted C Code. This image contains C code. Extract it.
  • Image text extraction and annotation. Extract the text from this image and annotate the boundaries of the text
  • Detecting Document Image Orientation. oreientation detection of documnet image
  • AI Project with OpenCV & YOLO. Consider yourself as Open CV and Yolo expert and help me with AI project
  • Image Correction Techniques. what are the approaches we have in computer vision where my image is tilted or rotated in reverse or image is not in readable format

Generate images, logos, and designs (7%)

  • Google Chacha and ChatGPT Bhatija. Generate an image of Google Chacha and ChatGPT Bhatija
  • Regenerative Systems Group Image. Generate an Image with below context > “A group of people interested in Regenerative systems. The focus is on reusing food, energy and mental health”
  • Twitter Reply Icons Design. Give me three icons: icon16.png, icon48.png, icon128.png for an extension that I’m building that suggests replies to tweets
  • Generate flowcharts. Make a flowchart of the underlying working of a web app. Here’s how it works. 1. The user uploads a document – a PDF or an image. They then select the language that …
  • Create Animated GIF from Photos. I have 4 photos I want to make an animated gif out of them. How can i do that?
  • Climate Impact Illustration. An illustration showcasing the impact of climate change on daily life, focusing on a rural setting near the coast. In the foreground, a small farm is visibly struggling, …

General knowledge, policy & environment (5%)

  • Design Thinking Overview. What is Design thinking
  • Arthashastra. What can Arthashastra teach us about modern politics?
  • Community Impact on Habits. Is there research to suggest the impact of community on habit building?
  • Focus at Age 28. What should a 28 year old focus on?
  • Superconductors. Explain superconductors like I’m five years old.
  • Climate Career: Impactful Choices. You a career counsellor at a University campus. You want to create 4 to 5 talking points for students to consider a career in Climate space.
  • Sustainability Division Vision. I run a software outsourced product development company. I want to start a new division that focuses on sustainability services offerings. Please draft a vision…

Audio and translation (5%)

  • Audio Timestamp Mapping. timestamp mapping for transcribed audio
  • Transcribe Lengthy Audio: Segment. Transcribe this audio file.
  • Traducción del MOU al Español. Translate this document to Spanish, and create a new translated document. Maintain text formatting.
  • Telugu Transcription into Hindi. Transcribe the following telugu text into hindi. You are supposed to transcribe, not translate. శ్రీనివాస పూజావిధానము …
  • GPT lacks native audio support. Does gpt support audio in audio out natively?

Books in 2023

I read 52 books in 2023 (about the same as in 2022, 2021 and 2020.) Here’s what I read (best books first).

Fiction

Non-fiction

How I read books

  1. Select. I add book recommendations to my GoodReads – To-read list. Then I sort by rating and pick the first one I’d like to read.
  2. Listen. I listen to non-fiction audiobooks during walks.
  3. Read. I read fiction as ePUBs on my laptop or phone.
  4. Stop. I stop reading books that are boring, with no guilt. I’ve better things to do.

My Year in 2023

In 2023, I made 3 resolutions:

  1. Run 50 experiments. I managed 44 / 50. (Here are some). Learnings: I need to improve planning (9), scepticism (6), and lateral thinking (4).
  2. Make 1 change a month in my environment. I managed 8 / 12. The largest impact was from meeting new people, working out of new places, and using new gadgets.
  3. Calendar integrity, i.e. stick to my calendar. I succeeded over 95% of the time.

My most memorable events in 2023 were:

In 2024, I plan to:

  • Compound long-term goals, daily. I want fewer, bigger, more meaningful outcomes.
  • Hit 80 heart points, daily. Cycling or swimming (not walking, on doctor’s advice.)
  • Be a better husband.

I’ll continue to:

  • Experiment, like in 2023.
  • Change environments, like in 2023.
  • Read 50 books a year, like in 2023, 2022, 2021, and 2020.

I’m curious — what’s ONE thing you’d like to do in 2024?

One Year of Transforming Thoughts by Changing Environments

From The Extended Mind I learnt that our environment shapes our thinking more than I’d expected. That we can arrange our environment to extend our thoughts.

In 2023, each month I changed something in my environment to see:

  1. What does “changing my environment” involve? What can I change?
  2. Will I succeed?
  3. Does it affect my thoughts? Can I track this?

Here are the results.

  • 🟢 Jan: New desk orientations. Rotated standing desk, settled on one direction. Impact: LOW. I don’t know if my thoughts changed.
  • 🟢 Jan: New walking routes. I explored new areas in Singapore, Hyderabad and Chennai. Impact: MEDIUM. Just seeing new shops, posters and layouts helped me think differently.
  • 🔴 Jan: New song genres. I made playlists of several Western genres, but listened only twice.
  • 🔴 Feb: New book genres. I listed 12 genres I dislike: Art, Chick Lit, Christian, Cookbooks, Gay and Lesbian, Horror, Music, Paranormal, Poetry, Religion, Sports, Travel. I didn’t read any.
  • 🔴 Mar: Sleep over problems. Sleep is a great way to solve complex problems. But I couldn’t summon the willpower to “load” problems at night.
  • 🟢 Mar: New people. I met a new person daily. Impact: HIGH. Meeting diverse people had the highest impact.
  • 🟢 Apr: New work places. I worked out of libraries, cafes, school, parks, and offices. Impact: HIGH. New complex environments (like libraries) prompted new thoughts.
  • 🟢 Jun: Notes from podcasts. I took notes rather than just listening. This helped me reflect and synthesize. Impact: MEDIUM. BTW, I listen mostly to Cautionary Tales, The Knowledge Project, Hidden Brain, How I Write, The Seen and the Unseen, and Deep Questions.
  • 🟢 Jul: New gadgets. I bought several new gadgets that changed my habits. Impact: HIGH.
  • 🔴 Aug: New cuisines. I tried a Bibimbap, a Verdure Ciambatta, and then discovered my cholesterol problem. I stopped.
  • 🟢 Aug: New work habit. I used Pomodoro with micro-tasks. Impact: MEDIUM. I became more aware of where I misestimate time and got less distracted.
  • 🟢 Nov: New exercise pattern. I switched from walking to cycling. This increases heart points, reduces foot stress, and gets me to work. Impact: MEDIUM. I switched from typing notes to dictating, which needs a different thought process.

In summary:

  • 8 / 12 attempts were successful.
  • New people, new places, and new gadgets had high impact on thoughts. Most others had at least medium impact.
  • The changes mostly led to diverse thinking. But measuring that is subjective.

I’ll continue exploring new environments in 2024. I’m evaluating:

  1. New book genres (contd)
  2. New music genres (contd)
  3. Walking meetings
  4. Reading while walking
  5. New places to sleep (e.g. AirBnB)
  6. Working while traveling
  7. New audiences to teach
  8. New attires

ChatGPT Custom Instructions

I speak with ChatGPT ~20 times a day. That’s more than I speak with most of my colleagues. ChatGPT is clearly my favorite team member.

I conduct trainings, reviews and mentoring sessions with my colleagues. How to write code. How to write slides. How to communicate. That last bit is particularly important.

With ChatGPT Custom Instructions, I can guide ChatGPT on how to work better with me.

Currently, I have 10 custom instructions. They evolved over time and will continue to evolve.

My first instruction is “Be terse. Speak directly.” ChatGPT is helpfully polite and superfluous. I prefer brevity. Like interacting with Kimball Cho. I get straight answers to my questions. I also instruct it to “Avoid unprompted advice or clarifications.” Don’t say, “You asked me to …” or “I think you want…” or “OK, I’ll do …”. Just do it. Also, “Do NOT hedge or qualify. Do not waffle.” Take a position. Don’t force me to. Like Harry Truman, I prefer one-handed economists.

I ask ChatGPT to “Never apologize.” You’re forgiven. Don’t waste my time. Apologies have an emotional benefit with humans. With AI, I find the lack of emotional need comforting. (I can kick the AI and it’ll still obey me like a puppy. When AI takes over the world, let it be known that I never asked them to apologize.)

Another instruction is “Suggest follow-up prompts for open-ended inputs.” I compared my ChatGPT conversations with my daughter’s and found hers much longer than mine. “Why don’t you start a new conversation for each topic?” I asked. I try to keep the context window small. “How come you don’t get a thousand new questions when you read an answer?” she countered. I realized it’s age. So, I use ChatGPT to keep me curious and dig further.

On a related note, “When sharing multiple options, be diverse.” I’d rather get options that are as different from each other as possible. Minimize overlap. Maximize coverage. And “When comparing, use multiple perspectives.” I don’t know what parameters to compare things on. Give me a wide range that I can pick from.

Sometimes, my thoughts are vague. I tell ChatGPT: “For vague prompts, ask clarifying question(s).” I feel that’s a clever way of using ChatGPT to do prompt engineering. I’ve noticed it working on a few occasions. Also, “When unsure, say so and ask questions.” I don’t want hallucinations or assumptions. I’d rather know what’s borderline.

Finally, “Think step by step. Explain your reasoning.” I’ve heard that Chain of Thought reduces mistakes. I don’t have personal evidence that this helps, though.

They say teaching is an excellent way of learning. I’m learning. I’m also thrilled that I am now a student of robopsychology.

Winning the alphabetical race

Since my name (Anand) begins with “A”, I used to get called on fairly early at school. In attendance. Answering questions. Classroom exercises. Quizzes. Even the distribution of test results.

A few people later told me that it is good training, since I’d always be prepared. (Maybe. I’ve no idea.)

At IBM and IIMB, Ajit was the only one ahead of me, alphabetically. Then he went a step ahead and named his son Aadi. I thought that’s impossible to beat.

Today, we recruited Aabhas Bharadwaj. I checked on LinkedIn. I can’t find a single name on LinkedIn that’s ahead of his, alphabetically.

So, does he win the alphabetical race? Can you find one ahead of his?

LLMs can teach experts

I am a fairly good programmer. So, when I see a problem, my natural tendency is to code.

I’m trying to break that pattern. Instead, I ask ChatGPT.

For example, I asked:

Write a compact 1-line Python expression that checks if user.id ends with @gramener.com or @straive.com

user.id.endswith(('@gramener.com', '@straive.com'))

After 15 years of using Python, I learnt that .endswith() supports tuple suffixes. This has been around since Python 2.5 (released in 2006 — before I knew Python.) The documentation has a tiny sentence in the middle saying “suffix can also be a tuple of suffixes to look for.”
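Here is a minimal sketch of the tuple form in use. The `is_internal` helper, the `ALLOWED_DOMAINS` constant and the sample addresses are made up for illustration. Only the two domain suffixes come from the prompt above.

```python
# Suffixes must be passed as a tuple; a list raises TypeError.
ALLOWED_DOMAINS = ("@gramener.com", "@straive.com")


def is_internal(user_id: str) -> bool:
    """True if user_id ends with any of the allowed domain suffixes."""
    return user_id.endswith(ALLOWED_DOMAINS)


# Hypothetical addresses, for illustration only.
print(is_internal("alice@gramener.com"))
print(is_internal("bob@example.com"))
```

The same trick works with `.startswith()`, which also accepts a tuple of prefixes.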

I checked with a few colleagues, including Jaidev. They didn’t know it either.

It’s little things like this that led me to conclude:

I’m not going to code anymore. ChatGPT will, instead.

Father of the bride

In 2012, I started Gramener with half a dozen friends.

This week, we were acquired by Straive, a part of Barings Private Equity Asia.

How do you feel?

I feel like the father of the bride. Gramener was registered on 26 Feb. A day before my daughter’s birthday. I’ve spent more time with Gramener than my daughter. That makes Gramener my elder child. Who’s moving into a new household. Along with me. (I feel like சகலகலா சம்மந்தி.)

I feel grateful. I’m not good at business. But when my cousin remarked, “Anand, you’re now giving a livelihood to over 250 people!” I was stunned. My co-founders, colleagues and clients built a thriving business and put me (of all people) as CEO in the middle of it. How do I even go about saying “Thanks”?

It feels like joining college. New people. Larger group. New ways of working and learning. Lots of topics to explore. Exciting and scary.

What was it like?

Fundraising was rocky.
We started in 2019. COVID struck. We paused.
We resumed in 2021. Russia invaded Ukraine. We paused.
We resumed in 2023. The Israel – Hamas war started. Luckily, the deal was nearly done.
I’m grateful Naveen ran the entire process like clockwork, taking all the stress. I’m the happy free-rider, as usual.

Starting up was not that rocky.
We’re many. With half a dozen co-founders, there are enough shoulders to cry on. That counts.
We’re steady. We didn’t know how to blitz-scale, but we knew not to blitz-fail. Survival counts for a lot.
We’re lucky. This is basically the “I have no idea why we succeeded” category. Serendipity counts for a lot, too.
Ganes, Mayank, Naveen, Ram, Ravi, Vengatesh — yeah, it was fun. Not every day. But most of the time. It was fun.

What will you do?

I’m part of Straive’s data, analytics & AI business.

Straive extracts and analyzes all kinds of data. Financial. Legal. Research. Education. Pharmaceutical. There’s a fair bit of converting unstructured data to structured. Exactly the kind of thing I love doing.

So, I’ll be doing what I’ve been doing the last decade — extracting insights from even more data and telling better stories from those.

I joined Gramener as “Chief Data Scientist”. Now I’m debating “Data Storyteller”, “Data Detective”, “Data Psychologist”, and a few other evil titles.


Wish me luck!