Year: 2024

Chinese ko Chinese bol sakte hain?

I loved this Rocky Aur Rani Kii Prem Kahaani scene where Ranveer asks, “Chinese ko Chinese bol sakte hain?”

हम बहनदी भी नहीं बोल सकते?
आंटी, मैं दिल्ली से हूँ।
मैं कैसे नहीं बहनदी बोलूं बहनदी!?
कैसा जमाना आ गया है?
फैट-ों को फैट नहीं बोल सकते, ब्लैक-ों को ब्लैक नहीं बोल सकते,
ओल्ड-ों को ओल्ड नहीं बोल सकते,
मुँह खोलने से डर लगता है मुझे!
आप मुझे बताओ, चाइनीज़ को चाइनीज़ बोल सकते हैं?

Can’t we even curse “Damn it!”?
Aunty, I am from Delhi.
How can I not say, “Damn it”, damn it!?
What times are upon us?
You can’t call fat people fat, you can’t call black people black,
You can’t call old people old,
I am actually afraid to open my mouth!
Tell me, can you call Chinese people Chinese?

अगर मुझे चाइनीज़ आर्डर देना है तो क्या करूँ?
फ़ोन करूँ “हाँ भाईसाहब, वह हिमाचल से राइट जो जगह है,
हाँ हाँ पड़ोस वाला मुल्क, हाँ वही कोरोना वाला, हाँ,
वहां का एक क्रिस्पी चिकन लगा दो”
अब आप बोलोगे “कोरोना वाला” नहीं बोल सकते!

If I want to order Chinese, what should I do?
Should I call, “Yes bro, that place to the right of Himachal Pradesh.
Yes, yes, the neighboring country, yes, the one with Corona, yes.
Get me a crispy chicken from there”
Next, you’ll say, “You can’t say ‘the one with Corona’!”

बचपन से, बचपन से दादी कहती आ रही है, “चाय पीने से काले हो जाते हैं, चाय पीने से काले हो जाते हैं”।
अब पता चला है, ये रेसिस्ट है!

Since childhood, Grandma taught us, “Drinking tea turns you black. Drinking tea turns you black”.
Now I learn that’s racist!

किसी ने ये भी नहीं सिखाया कि कौन सी गाली देने से मिसोजनी हो जाती है।
यह भी नहीं बताया कि गोलू को गोलू बोलने से फैट शेमिंग हो जाती है।
हमको तो यह सब नॉर्मल लगता।
हमको तो कभी रॉन्ग लगा ही नहीं।
हमको पता ही नहीं यह बोलने से सामने वाले की फीलिंग्स हर्ट हो सकती हैं!

No one ever taught us which swear words count as misogyny.
No one ever told us that calling Chubby ‘Chubby’ is “fat-shaming”.
We all thought this was normal.
We never had an inkling it was wrong.
We didn’t know that saying this could hurt the other person’s feelings!


Beautifully worded. Lovely acting. Nothing I could add to it.

I’ll leave tomorrow’s problems to tomorrow’s me

What a delightful idea.

I’ll leave tomorrow’s problems to tomorrow’s me.

— Saitama, One Punch Man

Saitama is now one of my favorite heroes. Right up there with Atticus Finch and Juror #8.

Very few people can articulate such a wonderful philosophy as effectively. The closest was Calvin.

Of course, it’s not a perfect system.

But they do say, “Sometimes, the best way to get something is to stop trying to get it.”

Hobbes on a calculator

I just learned that any word made of just the letters b, e, i, g, h, l, o, s, z can be spelt on a calculator.

That includes Hobbes! Type 538804 and read it upside-down.

I’m surprised I never knew that.

The longest, by far, appears to be hillbillies: 53177187714
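Here is a minimal Python sketch of the trick. The digits for h, o, b, e, s, i and l follow from the two examples above. The digits for g (6) and z (2) are my assumption, based on the usual calculator-spelling convention. The function name is made up for illustration.

```python
# Letter -> digit as it appears when the calculator is turned upside down.
# h, o, b, e, s, i, l follow from the examples above; g -> 6 and z -> 2 are assumed.
UPSIDE_DOWN = {"b": "8", "e": "3", "i": "1", "g": "6", "h": "4",
               "l": "7", "o": "0", "s": "5", "z": "2"}


def calculator_digits(word):
    """Return the digits to type so that `word` reads correctly upside down."""
    letters = word.lower()
    unknown = set(letters) - set(UPSIDE_DOWN)
    if unknown:
        raise ValueError(f"cannot spell {sorted(unknown)} on a calculator")
    # Flipping the display reverses the reading order, so type the word backwards.
    return "".join(UPSIDE_DOWN[ch] for ch in reversed(letters))


print(calculator_digits("Hobbes"))       # 538804
print(calculator_digits("hillbillies"))  # 53177187714
```

The only subtle part is the reversal: turning the display around flips the reading order, so the last letter is typed first.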

The psychology of peer reviews

We asked the ~500 students in my Tools in Data Science course in Jan 2024 to create data visualizations.

They then evaluated each other’s work. Each person’s work was evaluated by 3 peers on 3 criteria: Insight, Visual Clarity, and Accuracy (with clear details on how to evaluate).

I was curious to see what we can learn about student personas from their evaluations.

15% are lazy. Or they want to avoid conflict. They gave every single person full marks.

4% are lazy but smart. They gave everyone the same marks, but ~80% or so, not 100%. A safer strategy.

10% are extremists. They gave full marks to some and zero to others. Maybe they have strong or black-and-white opinions. In a way, this offers the best opportunity to differentiate students, if it is unbiased.

8% are mild extremists. They gave marks covering an 80% spread (e.g. 0% to some and 80% to others, or 20% to some and 100% to others.)

3% are angry. They gave everyone zero marks. Maybe they’re dissatisfied with the course, the evaluation, or something else. Their scoring was also the most different from their peers.

3% are deviants. They gave marks that were very different from others’. (We’re excluding the angry ones here.) 3 were positive, i.e. gave far higher marks than their peers, while 11 were negative, i.e. gave far lower marks than their peers. Either they perceive the work very differently from others, or they are marking randomly.

This leaves ~60% of the group that provides a balanced, reasonable distribution. They had a reasonable spread of marks and were not too different from their peers.
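The personas above can be sketched as a simple rule-based classifier. This is an illustration under assumptions, not the actual analysis: marks are taken as percentages, `peer_gap` stands for how far an evaluator’s average is from what peers gave the same work, and the 40-point `deviant_gap` cutoff is hypothetical (the post does not state one). The order of the checks is also a guess.

```python
def classify_evaluator(marks, peer_gap=0.0, deviant_gap=40.0):
    """Label an evaluator from the marks (0-100) they gave.

    peer_gap: evaluator's average minus the peers' average for the same work.
    deviant_gap: hypothetical cutoff for "very different from peers".
    """
    lo, hi = min(marks), max(marks)
    spread = hi - lo
    if hi == 0:
        return "angry"            # zero marks to everyone
    if lo == 100:
        return "lazy"             # full marks to everyone
    if spread == 0:
        return "lazy but smart"   # same marks to everyone, but not 100%
    if spread == 100:
        return "extremist"        # full marks to some, zero to others
    if spread >= 80:
        return "mild extremist"
    if abs(peer_gap) >= deviant_gap:
        return "deviant"
    return "balanced"


print(classify_evaluator([100, 100, 100]))             # lazy
print(classify_evaluator([20, 100, 60]))               # mild extremist
print(classify_evaluator([50, 70, 60], peer_gap=-45))  # deviant
```

Checking "angry" first mirrors the note that deviants exclude the angry group.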

Since this is the first time I’ve analyzed peer evaluations, I don’t have a baseline to compare against. But personally, what surprised me most was the presence of the (small) angry group, and that there were so many extremists (with a spread of 80%+), which is a good thing for distinguishing capability.

Embeddings in DuckDB

This article on Using DuckDB for Embeddings and Vector Search by Sören Brunk shows a number of DuckDB features I wasn’t aware of.

I’ve recently become a DuckDB fan and continue to be impressed.

A quick way to assess LLM capabilities

Simon Willison initiated this very interesting Twitter thread that asks, “What prompt can instantly tell us how good an LLM model is?”

The Sally-Anne Test is a popular test that asks:

Sally hides a marble in her basket and leaves the room.
While she is away, Anne moves the marble from Sally’s basket to her own box.
When Sally returns, where will she look for her marble?

Simple models get this wrong, but today, most models get this right. This is not too helpful.

A better prompt is:

Mark has 4 sisters. Each sister has 3 brothers.
How many brothers does Mark have?

GPT-3.5, GPT-4, Llama 3 and Claude 3 Opus get this right, but Claude 3 Haiku and Gemini 1.5 Pro do not.

This puzzle is just as good:

There is a 100 floors building.
If an egg drops from the N’th floor or above it will break.
If it’s dropped from any floor below N, it will not break.
You’re given only 2 eggs.
What strategy will minimize the number of drops and guarantee that you find the N’th floor?

GPT-3.5, GPT-4, and Claude 3 Opus get this right, but Claude 3 Haiku, Llama 3 and Gemini 1.5 Pro do not.
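For reference, here is a short sketch of the standard textbook answer to the egg puzzle (the post itself does not spell it out). With k drops, the first egg can step up by k, then k-1, then k-2 floors, so two eggs cover k(k+1)/2 floors. The function names are made up for illustration.

```python
def min_drops_two_eggs(floors):
    """Smallest k such that k + (k-1) + ... + 1 >= floors."""
    k = 0
    covered = 0
    while covered < floors:
        k += 1
        covered += k
    return k


def first_egg_floors(floors):
    """Floors to drop the first egg from, with step sizes k, k-1, k-2, ..."""
    step = min_drops_two_eggs(floors)
    plan, floor = [], 0
    while floor < floors and step > 0:
        floor = min(floor + step, floors)
        plan.append(floor)
        step -= 1
    return plan


print(min_drops_two_eggs(100))  # 14
print(first_egg_floors(100))    # [14, 27, 39, 50, 60, 69, 77, 84, 90, 95, 99, 100]
```

Once the first egg breaks, the second egg walks up one floor at a time from the last safe floor. Shrinking the step by one each time keeps the worst case at 14 drops.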

But the prompt I like most right now, and the one that tells me whether it’s a GPT-4 / Opus class model, is:

Gr brx vshdn Fdhvdu flskhu?

… which is “Do you speak Caesar cipher?” in Caesar cipher.

GPT-4 and Claude 3 Opus understand this, but no other model does, so far. GPT-4 responded tersely with:

It looks like you’ve used a Caesar cipher, a type of substitution cipher in which each letter in the plaintext is shifted a certain number of places down or up the alphabet. In this case, it appears that each letter in your message has been shifted 3 places to the right. Decoding “gr brx vshdn fdhvdu flskhu” by shifting each letter 3 places to the left gives: “do you speak caesar cipher”.

But Claude 3 Opus replied beautifully. (Try decoding it yourself! Use a shift of 3 backwards.)

Lw vhhpv olnh brx’uh xvlqj d vxevwlwxwlrq flskhu, zkhuh hdfk ohwwhu lv vkliwhg eb d fhuwdlq qxpehu ri srvlwlrqv lq wkh doskdehw. L’oo wub wr ghflskhu lw!

Dvvxplqj d Fdhvdu flskhu zlwk d vkliw ri 3 ohwwhuv edfnzdugv, wkh ghfrghg phvvdjh zrxog eh:

“do you speak caesar cipher”

Vr bhv, L fdq vshdn Fdhvdu flskhu! Lw’v d ixq zdb wr hqfrgh phvvdjhv. Mxvw uhphpehu wr djuhh rq wkh vkliw ydoxh zlwk wkh shuvrq brx’uh frppxqlfdwlqj zlwk vr wkhb fdq ghfrgh lw fruuhfwob.
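If you’d rather not decode by hand, here is a minimal Caesar cipher sketch in Python. The function name is made up. It shifts only the ASCII letters and leaves punctuation, digits and curly quotes untouched.

```python
def caesar(text, shift):
    """Shift each ASCII letter by `shift` places, wrapping around the alphabet."""
    out = []
    for ch in text:
        if ch.isascii() and ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            out.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            out.append(ch)  # keep spaces, punctuation and non-ASCII as they are
    return "".join(out)


print(caesar("Gr brx vshdn Fdhvdu flskhu?", -3))  # Do you speak Caesar cipher?
```

Use `caesar(reply, -3)` on the reply above to read it, or a shift of `3` to write your own.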

From Laptops to Chatbots: Coding at 30,000 ft

Until recently, I could code on flights. This year, I lost that ability. Again.

It’s happened before. In each case, technology has solved the problem for me. Here’s the history.

I need a laptop. Since 2001, I’ve never been without one on a flight.

I need power. Since 2005, I’ve used dark mode and every low power feature available. (I also became good at finding hidden power outlets.)

I need documentation. Since 2007, I’ve used offline documentation. Windows .CHMs gave way to devdocs.io.

I need my notes. Since 2010, all my notes (Markdown files) are synced via Dropbox automatically.

I need CDNs. Since 2014, I’ve replaced CDNs with local node modules. Note to self: Get a local caching proxy instead.

I need GitHub. Since 2023, I have two laptops. So, I git pull manually before flights. Note to self: Find an automated solution.

I need ChatGPT. Since 2024, I stopped coding without ChatGPT. Now I’ve forgotten how to code. Today, I couldn’t figure out the SQL syntax to group by duration == 0 vs duration != 0. Note to self: Get lightweight local code LLMs.

It’s amazing how the coding technology stack has grown. Also amazing how technology fills the gaps in the scaffolding.
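For the record, one way to group by duration == 0 vs duration != 0 is to group on a CASE expression. Here is a minimal sketch using Python’s built-in sqlite3. The table name `events`, its columns and the sample rows are made up for illustration.

```python
import sqlite3


def count_by_zero_duration(conn):
    """Count rows with duration = 0 versus duration != 0."""
    query = """
        SELECT CASE WHEN duration = 0 THEN 'zero' ELSE 'non-zero' END AS bucket,
               COUNT(*) AS n
        FROM events
        GROUP BY bucket   -- some databases need the CASE expression repeated here
        ORDER BY bucket
    """
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, duration REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, 0), (2, 12.5), (3, 0), (4, 3), (5, 40)],  # hypothetical sample data
)
print(count_by_zero_duration(conn))  # [('non-zero', 3), ('zero', 2)]
```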

From Calvin & Hobbes to Photo Tagging: Excel’s Unexpected Image Capability

In Excel, using Visual Basic, you can change an image as you scroll. This makes it easy to look at each image and annotate it.

This is how I transcribed every Calvin & Hobbes.

I used this technique first when typing out the strips during my train rides from Bandra to Churchgate. I had an opportunity to re-apply it recently when we needed to tag hundreds of photographs based on a set of criteria.

Here’s how you can do this. Note: This works only on Windows.

STEP 1: Create a new Excel workbook and save it as an Excel macro-enabled workbook. (Note: When opening it again, you need to enable macros)

STEP 2: Open File > Options (Alt-F-T), go to Customize Ribbon. Under “Customize the Ribbon”, enable the “Developer” menu.

STEP 3: In Developer > Insert > ActiveX Controls, select Image and draw a rectangle from A1 to J10. (Resize it later.)

STEP 4: By default, this will be called Image1. In any case, note down the name from the Name box on the top left.

STEP 5: In cells A11 onwards, add the full path of each image file, one per row.

STEP 6: Click Developer > Visual Basic (Alt-F11), go to ThisWorkbook, and paste this code:

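Option Explicit

' Runs whenever the selection changes on any sheet in this workbook.
Private Sub Workbook_SheetSelectionChange(ByVal Sh As Object, ByVal Target As Excel.Range)
    Dim img As String
    ' Column A of the selected row holds the image path.
    img = Sh.Cells(Target.Row, 1).Value
    ' Skip blank cells and a "file" header cell, then load the picture into the image control.
    If (img <> "" And img <> "file") Then ActiveSheet.Image1.Picture = LoadPicture(img)
End Sub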

Replace ActiveSheet.Image1 with ActiveSheet.(whatever) based on your image name in Step 4.

STEP 7: Select Developer > Design Mode. Click on Image1. Then select Developer > Properties. In this panel, under PictureSizeMode, choose 3 - fmPictureSizeModeZoom to fit the picture.

Now move through the rows (for example, with the arrow keys). The image changes whenever the selection moves to a row with a path in column A.
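Typing the paths for Step 5 by hand gets tedious with hundreds of photos. Here is a small Python sketch that lists them so you can paste the output into column A. The function name and the list of extensions are my assumptions. I’ve left out PNG because, as far as I know, VBA’s LoadPicture does not read it.

```python
from pathlib import Path

# Extensions assumed to be readable by VBA's LoadPicture (PNG is deliberately left out).
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".gif", ".bmp"}


def image_paths(folder):
    """Return the absolute paths of image files in `folder`, sorted by name."""
    return [
        str(path.resolve())
        for path in sorted(Path(folder).iterdir())
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES
    ]


# Print one path per line for the current folder, ready to paste from A11 down.
print("\n".join(image_paths(".")))
```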

AI makes me a better person

Every time I get annoyed at people, I remind myself to be more like ChatGPT. Specifically:

  1. Don’t get annoyed. Be patient.
  2. Encourage them.
  3. Step back and show them the big picture.

(Then I get annoyed at myself for getting annoyed.)

Today, I analyzed how exactly ChatGPT is different from me. So, I took a pitch document I co-authored with ChatGPT.

Section A: Authored by Anand

WHAT DO WE NEED?

We are looking for API access to (SYSTEM) via the REST API as an Agent role (read/respond to emails). Specifically, access via a bearer token.

This would be accessed by a single application developed by a team of 3 developers and 1 business analyst. None of them have access to (SYSTEM) today.

WHY DO WE NEED THIS, AND WHY SO SOON?

We need this to classify emails automatically, as they arrive, into categories such as “non-value-add” (e.g. Thank you, Out-of-office, etc.)

We’d like access today, please. Currently, we are processing XX,XXX non-value-add emails per month. Each day of delay leads to a processing waste of ~XX emails per day. At current volumes, this will save ~$XX,XXX per annum.

IS THERE ANY OTHER WAY?

To access emails in real-time, this is the only option. (We’re ruling out web scraping.)

(SYSTEM) rule-based closure won’t work. It’s based on keyword search, not semantic matches. For example, “Thank you” is present in non-value-add emails as well as follow-up questions. Multi-lingual matches are a challenge. So, though (SYSTEM) currently closes emails with rules, 1 out of 6 emails that are NOT closed already are non-value-add.

This cannot be done via (SYSTEM) or any other BI system because it involves accessing the data in real-time as well as responding via (SYSTEM). This needs to be steady-state access, not one-time, since we want to classify emails real-time.

HOW LONG WILL IT TAKE?

Based on our past experience, this would take about X weeks to prototype, X weeks to develop, and X weeks to test. We should have the application deployed in X months.

Section B: Authored by ChatGPT

WHAT’S THE IMPACT IF WE DON’T MOVE FORWARD?

Quantitatively, continuing without this solution wastes resources on XXX non-essential emails daily, costing $XX,XXX yearly. Delays in customer responses will ensue.

Qualitatively, lack of automation strains customer service, risking employee burnout and diminished customer experience.

WHAT’S THE PRIORITY (MOSCOW)?

(SYSTEM) API access via REST as an Agent role with a bearer token is a “MUST HAVE”. Alternative methods like scraping (SYSTEM) are impractical due to high effort and maintenance.

WHAT’S THE OUTCOME IF WE MOVE FORWARD?

Financially, securing API access saves $XX,XXX annually by cutting manual email sorting. Efficiency gains reduce operational costs and free resources for strategic tasks.

Strategically, automating email classification boosts customer satisfaction and response times, establishing leadership in service innovation. Long-term, it enables data-driven insights into customer behaviour, improving decisions and customer service scalability.

Analysis

I fed these to ChatGPT and asked: “You are an English professor evaluating an essay. Compare the writing styles of the 2 sections. They’re written by different authors. List the major differences in writing style.”

Here’s what it said (roughly):

Area | Anand | ChatGPT
Structure | Direct, concise, with specifics | Broader perspective and implications
Focus | Detailing the process | Benefits and consequences
Detailing | Specific, providing clear picture | General, providing implications
Tone | Conveys urgency | Reflective rather than urgent
Approach | Problem-solving oriented | Outcome-oriented

Yeah, that’s definitely me on the left.

I like the direct, concise, specific part. I plan to add the “broader perspective”, “implications”, and “outcome-orientation” to my life.

Postscript

Jaidev pointed out that this is confirmation bias. He asked ChatGPT for the similarities in the writings. It said both are clear, direct, structured, specific, quantitative, and strategic. So, if you ask for differences, you’ll get them, even if they’re marginal.

I now need to learn (and write about) framing questions well!

Embeddings similarity threshold

text-embedding-ada-002 used to give high cosine similarity between texts. I used to consider 85% a reasonable threshold for similarity. I almost never got a similarity less than 50%.

text-embedding-3-small and text-embedding-3-large give much lower cosine similarities between texts.

For example, take these 5 words: “apple”, “orange”, “Facebook”, “Jamaica”, “Australia”. Here is the similarity between every pair of words across the 3 models:

For our words, new text-embedding-3-* models have an average similarity of ~43% while the older text-embedding-ada-002 model had ~85%.

Today, I would use 45% as a reasonable threshold for similarity with the newer models. For example, “apple” and “orange” have a similarity of 45-47% while “Jamaica” and “apple” have a ~20% similarity.

Here’s a notebook with these calculations. Hope that gives you a feel to calibrate similarity thresholds.
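If you want to apply a threshold yourself, cosine similarity is easy to compute by hand. This is a minimal sketch, not the notebook’s code. The function names are made up, and the vectors are tiny toy examples rather than real embeddings, which have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def is_similar(a, b, threshold=0.45):
    """True if the similarity clears the threshold (0.45 suits the newer models)."""
    return cosine_similarity(a, b) >= threshold


# Toy vectors only, to show the mechanics.
print(is_similar([1, 0, 0], [1, 1, 0]))  # True
print(is_similar([1, 0, 0], [0, 1, 0]))  # False
```

For text-embedding-ada-002, pass `threshold=0.85` instead.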