Data Archives - S Anand

It’s not what you know. It’s how you learn

March 27, 2025 / Data, LLMs / 1 Comment

Simon Willison’s blog post mentioned MDN’s browser compatibility tables that list the earliest release date for each browser feature. I figured: let’s see which browsers release features fastest.

I calculated average delay for each browser’s feature release. For each browser, I looked at how many days after the first release it took to add a feature, averaged it, and published an interactive, scrolly-telling data story.
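Here is a minimal sketch of that calculation in Python. The `releases` structure and its dates are made up for illustration (the real data would come from MDN's compatibility tables), and it assumes that browsers that never shipped a feature are simply skipped.

```python
from datetime import date
from statistics import mean

# Hypothetical sample: feature -> {browser: date of the first release supporting it}
releases = {
    "feature-a": {
        "chrome": date(2020, 1, 1),
        "firefox": date(2020, 1, 31),
        "safari": date(2020, 3, 1),
    },
    "feature-b": {
        "chrome": date(2021, 6, 10),
        "firefox": date(2021, 6, 1),
    },
}


def average_delay(releases):
    """Average days each browser lagged behind the first browser to ship a feature."""
    delays = {}
    for dates in releases.values():
        first = min(dates.values())  # earliest release of this feature in any browser
        for browser, released in dates.items():
            delays.setdefault(browser, []).append((released - first).days)
    return {browser: mean(days) for browser, days in delays.items()}


print(average_delay(releases))
```

A browser that ships first gets a delay of 0 for that feature, so a lower average means a faster browser.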

What’s interesting is that I built almost all of this using LLMs in about 4 hours with

Cursor + Claude 3.7 Sonnet for data visualization, and
Gemini 2.5 Experimental 03-25 for the story

Here’s what I learned in the process.

The real winners are off-beat stories. Earlier, I’d spend 16-24 hours per visual. So, I’d stick to the “important” stories I wanted to tell. Now it takes four hours. That frees me to experiment and share those lesser data stories that get overlooked. This change is incredibly powerful.

LLMs don’t replace all expertise. For example, when I saw the data, it didn’t immediately tell a story. It took me some time to realize the story isn’t how slow browsers are, but how browsers’ speed evolved over time. For example, in Firefox’s early days, it was the only browser actively releasing features. These days, it’s one of the slowest. Figuring that out took expertise.

I spent two decades studying data visualization. So, this comes naturally to me. How does someone new build expertise?

Expertise is a moving frontier.

At BCG in the early 2000s, I built interactive stories with PowerPoint. My PowerPoint skill was the critical expertise.
At Gramener in the early 2010s, I used D3 for interactive stories. My programming skill was the critical expertise.
Now, in the mid-twenties, LLMs write code with ease. My expertise is in choosing the right visual and shaping the right narrative.

As tools change, expertise evolves. I don’t know what the next frontier of expertise will be. I couldn’t predict the last few. I can’t predict the next.

But LLMs can help build expertise. In this project, I missed an opportunity to learn. I should have asked the LLM to show me a dozen options to visualize the data. For example, “Show a version geared toward an executive, a technologist, or a general audience”. “Critique each.” Such practice can help anyone – beginner or expert – build skill and learn. Practicing this is hard, but LLMs do help in this process.

But what gives me confidence is that LLMs help me learn. So, when the next frontier arrives, I’m less worried I’ll be too old. I think we’ll have tools to build expertise too.

Update (28 Mar 2025): Earlier, I wrote that “LLMs don’t replace expertise”. I inferred that because I (an expert) could use an LLM well. This research with 700+ people at P&G shows that when given LLMs, outsiders perform as well as insiders. So, I corrected my statement to say, “LLMs don’t replace all expertise.”


Students who are more engaged score more

February 8, 2025 / Data, Education / Leave a Comment

This is about as insightful as the Ig Nobel winning papers “Boredom begets boredom” and “Whatever will bore, will bore” that methodically documented that bored teachers lead to bored students. But in the spirit of publishing all research without bias for success or novelty, let me share this obvious result.

The Y-axis represents the total score of ~2,000 students on 4 graded assignments, each of ~10 marks. The X-axis represents the percent rank of engagement. The most engaged students are at 100%. The least are at 0%.

How do I measure engagement? By the number of times they visit the page and how early they visit the page (both computed as percent ranks). So, the student who visits the assignment page most often and the student who visits it first rank highest on engagement.

For every 10% increase in the engagement, the score increases by about 3 marks. What that means is, if a student leapfrogs ahead of 10% of their batchmates, that effort typically leads to scoring about 3 / 40 = 7.5% more overall.
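To make the engagement measure concrete, here is a rough sketch in Python. The visit counts and timings are invented, and averaging the two percent ranks is an assumption: the post does not say how they were combined.

```python
def percent_rank(values):
    """Percent rank of each value: the share of the other values strictly below it (0 to 1)."""
    n = len(values)
    if n < 2:
        return [0.0] * n
    return [sum(other < v for other in values) / (n - 1) for v in values]


# Hypothetical students: number of page visits, and hours before the deadline of the first visit
visits = [2, 5, 9, 14]
first_visit_hours_before = [3, 40, 20, 90]

visit_rank = percent_rank(visits)
early_rank = percent_rank(first_visit_hours_before)

# Assumption: combine the two ranks with a simple average
engagement = [(v + e) / 2 for v, e in zip(visit_rank, early_rank)]
print(engagement)

# The arithmetic from the post: 3 marks per 10% of engagement, out of 4 assignments x 10 marks
marks_per_10_percent = 3
total_marks = 4 * 10
print(f"{marks_per_10_percent / total_marks:.1%} of the total per 10% of engagement")
```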


Halving a deadline costs 1.4% of marks each time

February 8, 2025 / Data, Education / Leave a Comment

Does it make a difference if you submit early vs submit late? Here’s some empirical data.

About 1,000 students at IIT Madras took 3 online quizzes (GA1, GA2, GA3) in the last few weeks. The deadlines were all at midnight (India) on different days. Here’s when they submitted their final answers:

There was a spurt of submissions at the last minute.
~1 out of 8 students submit with < 10 minutes remaining.
Most students submitted ~4 hours before the deadline.
In fact, 3 out of 4 students submit on the same day as the deadline.
A fair number of students submitted the previous day/night.
1 out of 6 are diligent and submit a day early.

But does submitting late help, since you get more time? Apparently not.

On average, every time the time remaining before the deadline halves, the score drops by 1.4%.

For example, on average:

Submitting 1 minute before scores 1.4% less than submitting 2 minutes before
Submitting 2 minutes before scores 1.4% less than submitting 4 minutes before
Submitting 4 minutes before scores 1.4% less than submitting 8 minutes before
… etc.

This means that submitting early morning instead of midnight could give you a 15% advantage.
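A quick way to check that figure is to count halvings with a base-2 logarithm. This is a sketch under stated assumptions: the 1.4% is treated as an additive drop per halving, "early morning" is taken as 6 am (18 hours before the deadline), and "midnight" as 1 minute before it.

```python
import math

DROP_PER_HALVING = 1.4  # % of score lost each time the remaining time halves (from the post)


def score_drop(early_minutes, late_minutes):
    """Estimated score drop (in %) from submitting late_minutes before the deadline
    instead of early_minutes before it."""
    halvings = math.log2(early_minutes / late_minutes)
    return DROP_PER_HALVING * halvings


# Assumption: 6 am is 18 hours (1,080 minutes) before a midnight deadline
print(round(score_drop(18 * 60, 1), 1))
```

That is about 10 halvings, which gives a drop of roughly 14%, in line with the 15% above.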

Of course, this might be confusing cause and effect. Maybe students who do well submit early, and those who struggle submit late.

But is there a merit in faking it till you make it? Perhaps by pretending your deadline is a day early, to get the best of both worlds? Something to think about…


What does Gramener ask ChatGPT?

January 14, 2024 / Data, Experiments, LLMs / Leave a Comment

I looked at how Gramener uses ChatGPT Plus by evaluating 600+ chats asked over 3 months from Oct 2023 to Jan 2024.

The team asks 6 questions a day. We don’t track who or how many actively use ChatGPT Plus. This also excludes personal ChatGPT accounts. Still, 6/day is low for an entire team put together.

The questions fall into 9 categories.

Category	%
Excel, data exploration & analysis	25%
Text extraction and summarization	13%
HTML, CSS, or JavaScript code	13%
Python code	13%
LLMs, AI and use cases	9%
OCR and image analysis	9%
Generate images, logos, and designs	7%
General knowledge, policy & environment	5%
Audio and translation	5%

Here are some questions from each category – to give you an idea of emergent ChatGPT Plus usage.

Excel, data exploration & analysis (25%)

Excel clean and merge. There are 2 worksheets in this excel with data, can you clean up the data and merge the data in both the sheets
Excel CO2 Data Analysis. You are an expert Data Analyst who is capable of extracting insights out of data. Analyze this sheet and let me know the findings
Excel Chi-Square Analysis Guide. how to perform chi square analysis in excel
Log Data Insights & KPIs. Looking at the columns from this excel, what kind of insights are possible, what are key KPIs to be looked at

Text extraction and summarization (13%)

Complaint Investigation Summary. The following is the summary of an internal investigation for a customer complaint. Now this internal summary is to be paraphrased (in 3-4 lines) as part of a closure
Extracting Tables from RTF. Can you write a script to extract the tables from this document
Extracting Entities from Text. [{'word1': '(P)', 'nearest_word1': 'P/N:', 'nearest_word2': '0150-25034', 'nearest_word3': 'CARTIRIDGE'}, {'word1': 'P/N:', 'nearest_word1': '(P)', 'nearest_word2': '015...
Extract PDF Font Details. Extract text formatting information from this document. Especially find font styles, families and sizes.

HTML, CSS, or JavaScript code (13%)

HTML/CSS Chart Template. Give me HTML, CSS and chart code for this design.
CSS Font Stack: Explanation. Explain this CSS font convention: Arial, Helvetica, Segoe UI, sans-serif
Checkbox Validation with JavaScript. In HTML form, I have a set of checkboxes. How do I write the form so that at least one of them being checked is mandatory?
Prevent Text Wrapping CSS. <span class="text">Chief Communications Officer</span> I need CSS such the text inside should not wrap create new line
ReactJS App with Routing. Give me developed version using ReactJS use react router for sidebar section navigation to the pages use Tailwind css for styling. Use styled components for conditional …

Python code (13%)

Python Code Documentation Guide. Can you generate documentation for a project code written in python?
Linux Commands for Python. Give me list of linux commands to work on python coding
Code explanation request. What’s this code about? …
FastAPI Async Testing. Write a fastapi code and a python client to test the asynchronous nature of the fastapi package.
Streamlit App for Translation. Given the following python code, give me a simple streamlit app that takes file upload and converts that into a target language: …

An interesting sub-topic was interview question generation.

Python Decorator for Database Queries. Create one medium level question for Decorators in python Industryy usecase specific with solution

LLMs, AI and use cases (9%)

LLMs for Data “What Ifs”. You are an LLM Expert. Can you tell me how can we leverage LLM for implementing What IF scenarios on Data?
LLMs: Current Challenges & Concerns. what are current challenges with LLMs
LLM Applications in Marketing. Show LLM applications for the marketing function of a music company.
Gen AI usage. What industries are using Gen AI the most
Best LLMs in 2023. Search the internet for the most recent LLMs and list the best LLMs in terms of performance
Best Image Classification Models. suggest best models to tell what there in the image

OCR and image analysis (9%)

Browser history OCR. This is a screenshot of my browser history. Convert that to text. Categorize these into common topics.
Extracted C Code. This image contains C code. Extract it.
Image text extraction and annotation. Extract the text from this image and annotate the boundaries of the text
Detecting Document Image Orientation. oreientation detection of documnet image
AI Project with OpenCV & YOLO. Consider yourself as Open CV and Yolo expert and help me with AI project
Image Correction Techniques. what are the approaches we have in computer vision where my image is tilted or rotated in reverse or image is not in readable format

Generate images, logos, and designs (7%)

Google Chacha and ChatGPT Bhatija. Generate an image of Google Chacha and ChatGPT Bhatija
Regenerative Systems Group Image. Generate an Image with below context > “A group of people interested in Regenerative systems. The focus is on reusing food, energy and mental health”
Twitter Reply Icons Design. Give me three icons: icon16.png, icon48.png, icon128.png for an extension that I’m building that suggests replies to tweets
Generate flowcharts. Make a flowchart of the underlying working of a web app. Here’s how it works. 1. The user uploads a document – a PDF or an image. They then select the language that …
Create Animated GIF from Photos. I have 4 photos I want to make an animated gif out of them. How can i do that?
Climate Impact Illustration. An illustration showcasing the impact of climate change on daily life, focusing on a rural setting near the coast. In the foreground, a small farm is visibly struggling, …

General knowledge, policy & environment (5%)

Design Thinking Overview. What is Design thinking
Arthashastra. What can Arthashastra teach us about modern politics?
Community Impact on Habits. Is there research to suggest the impact of community on habit building?
Focus at Age 28. What should a 28 year old focus on?
Superconductors. Explain superconductors like I’m five years old.
Climate Career: Impactful Choices. You a career counsellor at a University campus. You want to create 4 to 5 talking points for students to consider a career in Climate space.
Sustainability Division Vision. I run a software outsourced product development company. I want to start a new division that focuses on sustainability services offerings. Please draft a vision…

Audio and translation (5%)

Audio Timestamp Mapping. timestamp mapping for transcribed audio
Transcribe Lengthy Audio: Segment. Transcribe this audio file.
Traducción del MOU al Español. Translate this document to Spanish, and create a new translated document. Maintain text formatting.
Telugu Transcription into Hindi. Transcribe the following telugu text into hindi. You are supposed to transcribe, not translate. శ్రీనివాస పూజావిధానము …
GPT lacks native audio support. Does gpt support audio in audio out natively?


Learning to speak better

October 17, 2022 / Data, How I do things / Leave a Comment

Microsoft ported its PowerPoint Speaker Coach to Teams. Since September, it’s given me suggestions covering 11 hours in 77 calls (I speak ~10 min/call.)

I say “uhh” a lot. That’s intentional

I use the filler word “uhh” in 70% of my calls. That did not surprise me. I do that intentionally.

On a poor network, they know I’m still connected
They know I’m going to say something
I sound less confident. That invites critique I can learn from

But I also use filler words like “You know” and “I mean” in half the calls, and “like”, “actually”, and “basically” in a fifth. That’s NOT intentional, and I’ll be more conscious of it.

Filler words	% of calls	# / call
uhh	70%	3.6
You know	48%	2.4
I mean	43%	2
like	22%	1.4
actually	19%	1
basically	18%	1.2
anyway	14%	1.1
hmm	16%	1.1
umm	9%	1.4
ah	4%	1.3
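For anyone who wants to build a similar table from their own call transcripts, here is a minimal Python sketch. The transcripts are invented, and it assumes "# / call" means the average count among the calls where the word appears at all. Speaker Coach's own method may differ.

```python
import re


def filler_stats(transcripts, fillers):
    """For each filler: the share of calls that use it, and the average uses per call in those calls."""
    stats = {}
    for filler in fillers:
        # Match the filler as a whole word or phrase, ignoring case
        pattern = re.compile(r"\b" + re.escape(filler) + r"\b", re.IGNORECASE)
        counts = [len(pattern.findall(text)) for text in transcripts]
        used = [c for c in counts if c > 0]
        stats[filler] = {
            "pct_calls": len(used) / len(transcripts),
            "per_call": sum(used) / len(used) if used else 0.0,
        }
    return stats


# Hypothetical transcripts, one string per call
transcripts = [
    "Uhh, maybe we try that. Maybe not, you know.",
    "I mean, that works.",
    "Uhh, uhh, you know, maybe later.",
    "Sounds good.",
]

stats = filler_stats(transcripts, ["uhh", "you know", "I mean", "maybe"])
for filler, row in stats.items():
    print(filler, f"{row['pct_calls']:.0%}", row["per_call"])
```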

I say “maybe” a lot. That’s surprising

What did surprise me was “maybe“. I use it every fourth call, but when I do, I say “maybe” ten times per call. That’s a lot of maybe!

Sometimes, I say maybe because I’m communicating uncertainty.

Maybe we’ll have 20-30% success rate…
So and I had to switch 3 laptops or maybe 4.
… then she said, “OK, maybe it’s some other Sam”

Sometimes I’m proposing tentatively.

… one of the reasons why I’m nudging towards that is maybe a large reuse initiative is high return,
We can even put this in as part of the project by maybe offering it to different teams…
Maybe by having dedicated support…
Maybe I’ll drop off. Bye

But sometimes, it’s testable hypotheses.

Uh, maybe I’m getting the names wrong, but I think it was Socrates…
Maybe it’s me, but yeah, I guess…
You know, maybe it’s because I don’t store any of my stuff in…

One of my year’s goals is to run 50 experiments. I’d been doing well until April, and then fizzled out. Partly motivation. Partly a lack of testable hypotheses.

And now, in October, I discovered that I literally speak out one testable hypothesis every call — roughly every 10 minutes I speak! I’m amazed at how blind I’ve been, and how easy it can be to find experiments to test. I guess I need more of a scientific mindset. (Or just plain curiosity.)

The next time I say, “maybe” (or see it in my transcript), I’ll write it down as a hypothesis to test.

Repetitive words cluster

Another discovery was: I tend to pick a phrase and use it repeatedly in calls. For example, I said “let’s say” twelve times in just one call of 15 minutes. I said “main” 20 times over 2 calls of 8 minutes each. I said “cool” 7 times in an 11-minute call.

Repetitive word	# calls	# / call
let’s say	1	12
main	2	10
also	1	8
only	2	7.5
correct	7	7.4
in terms of	1	7
alright	3	6.3
that is	3	6
cool	2	5

Clearly it’s something to watch out for. But maybe repetition of words isn’t a bad thing if it’s not the same phrase repeated across calls? (There! I said “maybe”. Let me find out!)

Modulate the pace

In a third of my calls, I need to speed up. In a third of my calls, I need to slow down. (On some calls, I need to do both!)

Clearly, I need to vary my pace a lot more, consciously. It’s not that I talk fast or slow. I do both. But I get stuck in one mode of speaking for too long.

Takeaways

I used to think I was a pretty good speaker. That’s not a bad thought, but it can blind me to feedback and improvements. There’s no end to learning how to speak. Speaker Coach is a great “in-your-face” feedback mechanism. I hope Microsoft adds more features to it.

But what I’m going to do now is:

Every time I say “maybe”, write down an experiment
Speed up and slow down more in calls
Watch for words I use repeatedly


Old songs in my music library

June 6, 2022 / Data, How I do things / Leave a Comment

My music library has around 1,000 songs (mostly Tamil and Hindi, with some Telugu and English film songs).

I spent this morning tagging them by year with mp3tag. (Manually. You don’t automate the pleasures of life.)

I thought my 1990s collection would be the largest. I was in college, listening to lots of music then. But surprisingly, my collection has grown after the 1990s.

I have 3 guesses why.

Recency bias. I re-built this collection recently. Maybe I forgot older songs?
Digitization bias. Maybe I listened to more songs as the cost of transmission/storage fell?
Worsening standards. Maybe I used to be choosier about music?

Though I’m not sure of the above, there’s another interesting anomaly.

There is a spike in the 1960s.

I don’t need to guess this one. I know why. Those are the songs my parents liked. I grew up hearing them.

The oldest Tamil song is from Thiruneelakantar (1939). It’s from my father’s collection. I’ve heard it often enough to still enjoy it.

The oldest Hindi song is from Jaal (1952). He has a fondness for Dev Anand’s songs. So do I. This one is a beauty.

The oldest Tamil song my mother introduced me to is from Parasakthi (1952). She used to dance to this song when young.

The earliest Hindi song she introduced me to was from Jhanak Jhanak Payal Baaje (1955). It’s the song I grew up on, and it’s still among my favorites. What a melody!

My wife prefers newer songs. But I have low standards and few preferences. It makes my life rather happy.

So, in celebration of Make Music Day on 21 June, I’m treating myself to 2 weeks of my collection from the 1960s!

PS: My full collection is at https://gist.github.com/sanand0/877637165b17239aa27beac03749c9a6


How to find a Chinese actor to cast in Hollywood

February 20, 2022 / Data / Leave a Comment

Film actors mostly act within their own industry.

For example, Hollywood actors act outside Hollywood just 10% of the time. Chinese actors act with non-Chinese actors just 1% of the time.

So, if you’re a Hollywood producer trying to cast a Chinese actor, how would you find them?

One way is to list Chinese actors with the largest number of Hollywood co-stars. Let’s see who tops that list.
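As a rough sketch, the ranking could be computed like this in Python. The actors, groups, and co-star pairs below are placeholders, and it counts distinct co-stars rather than the number of films shared, which is an assumption about the method.

```python
from collections import defaultdict

# Hypothetical data: each actor's group, and pairs of actors who appeared together
group = {
    "A": "Chinese", "B": "Chinese", "C": "Chinese",
    "X": "Hollywood", "Y": "Hollywood", "Z": "Hollywood",
}
costar_pairs = [
    ("A", "X"), ("A", "Y"), ("A", "B"), ("B", "X"),
    ("C", "B"), ("A", "Z"), ("B", "Z"),
]


def rank_by_outside_costars(pairs, group, home, target):
    """Rank actors in the `home` group by their number of distinct co-stars in the `target` group."""
    costars = defaultdict(set)
    for a, b in pairs:
        costars[a].add(b)
        costars[b].add(a)
    counts = {
        actor: sum(group[c] == target for c in others)
        for actor, others in costars.items()
        if group[actor] == home
    }
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


print(rank_by_outside_costars(costar_pairs, group, "Chinese", "Hollywood"))
# → [('A', 3), ('B', 2), ('C', 0)]
```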

#5. Pei-Pei Cheng

You may know her as Jade Fox, the sly governess in Ang Lee’s Crouching Tiger, Hidden Dragon (2000), or Golden Swallow, the skilled swordsman sister in Come Drink With Me (1966), or even as the voice of the matchmaker who disgraces Mulan in Mulan (2020).

She mainly acts in Chinese films, co-starring nearly 180 times with actors like Hua Yueh, Lieh Lo, and Chung-Hsin Huang. But she’s also co-starred over 20 times with Hollywood actors like Jamie King (of Sin City), Peter Bowles (of The Bank Job), and Sandra Oh (of Grey’s Anatomy).

#4. Jet Li

You may know him as Han Sing, the martial artist and ex-cop in Romeo Must Die (2000), or Gabe Law, the former MultiVerse Authority agent in The One (2001), or Yin Yang, the unarmed member of The Expendables (2010).

He has co-starred over 100 times with Chinese actors like Jackie Chan, Simon Yam, and Sammo Kam-Bo Hung. But he’s also co-starred 30 times with Hollywood actors like Antonio Banderas, Morgan Freeman, and Sylvester Stallone.

#3. Joan Chen

She’s famous as Wanrong, the Chinese empress in The Last Emperor (1987), Josie Packard, the owner of the Twin Peaks mill in Twin Peaks (1989), or Dr Ilsa Hayden, assistant to the villain Rico Dredd in Judge Dredd (1995).

She’s co-starred over 80 times with Chinese actors like Tony Chiu-Wai Leung, Leon Lai, and Tony Ka Fai Leung. But she’s co-starred over 40 times with Hollywood actors like Michael Caine, Peter O’Toole, and Christopher Walken.

#2. Jackie Chan

The most famous Chinese martial arts actor in the world, and one of the highest-paid actors in the world, is famous as Detective Inspector Lee in Rush Hour (1998), Mr Han in The Karate Kid (2010), and the voice of Monkey in Kung Fu Panda (2008).

He has co-starred nearly 200 times with Chinese actors like Sammo Kam-Bo Hung, Maggie Cheung, and Kent Cheng. But he’s co-starred over 50 times with Hollywood actors like Arnold Schwarzenegger, Owen Wilson, and Chris Tucker.

#1. Michelle Yeoh

You may know her as Wai Lin, the Chinese spy and James Bond’s ally in Tomorrow Never Dies (1997), Yu Shu Lien, the warrior swordswoman in Crouching Tiger, Hidden Dragon (2000), or as Eleanor Young, the domineering mother-in-law in Crazy Rich Asians (2018).

She’s an actress at the borderline of the Chinese – Hollywood clusters. She’s acted ~60 times with Chinese actors like Maggie Cheung, Chow Yun-Fat and Jet Li. But she’s acted almost as many times with Hollywood actors like Sigourney Weaver, Zoe Saldana and Sam Worthington.

More actors

Here are half a dozen more Chinese actors that have acted with Hollywood actors often.

It’s interesting to see that 3 of the top 6 (Chow Yun-Fat, Pei-Pei Cheng, and Michelle Yeoh) had all acted in the blockbuster Crouching Tiger, Hidden Dragon (2000).

So, perhaps the simple message to our Hollywood producer is to “look no further than the cast of the first foreign-language film to break the $100mn mark in the USA.”


How isolated is Bollywood from world cinema?

January 5, 2022 / Data, Visualisation / 2 Comments

These are the major groups of actors, based on who they act with most.

Actors mostly act with other actors in the same…

Language. Not country. For example, the Spanish / Mexican group is across countries. But Indian actors divide into North Indian and South Indian. It’s language, not country.
Time period. Old American actors are a separate group from Hollywood. (Naturally. Brad Pitt was born after Humphrey Bogart died. They couldn’t have acted together.)
Genre. Hollywood Porn actors don’t act with mainstream Hollywood. Same with Japanese Porn, Hollywood TV, and Hollywood Horror actors.

How are these groups themselves connected? Do Chinese actors act with Hollywood often? How isolated is Bollywood from world cinema?

Hollywood is the core group

Take groups that act with other groups at least 5% of the time. Mainstream Hollywood acts with British and Hollywood TV/Horror actors. All other clusters are isolated.
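One way to picture this thresholding is to link two groups whenever their cross-group share meets the cutoff, then read off the connected clusters. The Python sketch below does that with a small union-find. The shares are illustrative placeholders, not the actual figures behind the charts.

```python
def clusters(groups, cross_share, threshold):
    """Link two groups when they act together at least `threshold` of the time,
    then return the connected clusters, largest first."""
    parent = {g: g for g in groups}

    def find(g):
        # Follow parent links to the cluster's root, shortening the path along the way
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for (a, b), share in cross_share.items():
        if share >= threshold:
            parent[find(a)] = find(b)

    merged = {}
    for g in groups:
        merged.setdefault(find(g), set()).add(g)
    return sorted(merged.values(), key=lambda c: (-len(c), sorted(c)))


groups = ["Hollywood", "British", "Chinese", "South Korean", "Filipino"]

# Hypothetical shares: fraction of the time the two groups act together
cross_share = {
    ("British", "Hollywood"): 0.06,
    ("Chinese", "South Korean"): 0.012,
    ("Chinese", "Hollywood"): 0.005,
    ("Filipino", "Hollywood"): 0.002,
}

for threshold in (0.05, 0.01, 0.005, 0.002):
    print(threshold, clusters(groups, cross_share, threshold))
```

Lowering the threshold can only merge clusters, never split them, which is why the picture becomes more unified as the cutoff drops.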

Indian & Japanese clusters emerge

Let’s go more liberal. Take groups that act with other groups at least 2% of the time. Hollywood forms a big connected cluster. It includes most of Europe — British, German, French, Czech, Yugoslavian & Italian actors.

North & South Indian actors form the first non-Hollywood cross-language cluster.

The Japanese and Japanese porn actors form a cluster too. (Interestingly, it’s easy for a Japanese porn actor to act with mainstream Japanese actors. Hollywood porn actors find it far harder to act with Hollywood.)

Chinese & Korean cluster emerges

Among groups that act with other groups at least 1% of the time, we have:

Chinese & South Korean actors form the first cross-country cross-language cluster.

Hollywood expands to act with Scandinavian, Spanish, Polish, Brazilian & Nigerian films.

Other film industries (Russian, Greek, Egyptian, even Hollywood Porn) are still isolated.

World Cinema vs the rest

Among groups that act with other groups at least 0.5% of the time, we have:

Turkish & Iranian groups coming together
Indonesian actors acting with the Chinese
Hollywood expanding to cover Russian, Greek, Egyptian, and finally, Hollywood Porn. (It’s easier for Brazilian / Nigerian actors to act with Hollywood than to be a Hollywood Porn actor.)

At this point, there are 6 actor groups that act with each other at least 1 out of 200 times (0.5%).

World Cinema (Hollywood & friends)
Japanese (mainstream & porn)
Indian (North & South)
Chinese, South Korean & Indonesian
Turkish & Iranian
Filipino

One world of cinema

If we look at groups that act with other groups at least 0.25% of the time, we have a far more unified picture. Almost every actor group acts with another group at least 1 out of 400 times.

But even here, there’s an exception. Filipino actors — the most insular major actor group in the world.

So, how isolated is Bollywood from World Cinema? For its size, it’s one of the most isolated actor groups. (But not as much as Iranian/Turkish or Filipino.)


Can foreigners enter Hollywood?

December 21, 2021 / Data / 6 Comments

An aspiring Malaysian actor posted on Reddit:

I am a 18-year old biracial Malaysian kid who wants to be an actor in Hollywood. I’m taking a diploma for performing arts in a college called Sunway University in 8 days and I’m considering pulling out of it because why do something that I like when my dreams might never be fulfilled and the price for taking this diploma is seriously expensive. I am starting to doubt my chances of making it to Hollywood and I suffer from extreme anxiety. Is it possible for someone like me to enter Hollywood? What are my chances?

Breaking into Hollywood is hard. As a foreigner, it would be even harder. So I asked myself:

Do Hollywood actors act with foreigners?

Let’s take Will Smith. He frequently acts with Martin Lawrence, Tommy Lee Jones, Jaden Smith, Jon Voight, and 84 other actors.

Every one of his co-stars is a Hollywood actor, except the Spanish actor Jordi Mollà in Bad Boys II, and the Dutch actor Marwan Kenzari in Aladdin. Just 2% of Will Smith’s co-stars are foreign.

On the other hand, Jackie Chan is more cosmopolitan. He acts with:

Chinese actors like Yuen Siu-Tin in Drunken Master
Hollywood actors like Chris Tucker in Rush Hour
Japanese actors like Kumiko Goto in City Hunter — which is based on a Japanese Manga of the same name.
South Korean actors like Su-cheon Bae in Huo shao shao lin men — a Korean/Mandarin film
Indian actors like Disha Patani in Kung Fu Yoga – an Indo-Chinese film
Spanish actors like Eva Cobo in Operation Condor – shot in Spain
Danish actors like Pilou Asbæk in the upcoming film Snafu

Of his 224 co-stars, 70 are non-Chinese. Over 30% of Jackie Chan’s co-stars are foreign.

Are Chinese films more foreigner-friendly? Should our Malaysian friend try there instead?

Is Hollywood less open to foreigners than other countries?

I took all movie actors across the world and broke them into groups based on community structure. Actors within a group act mostly with each other, and less with other groups.

The largest group is Hollywood, with ~80,000 actors (mostly American). They act with each other 90% of the time and act with other groups only 10% of the time.

In comparison, the Chinese group has ~20,000 actors. They act with each other 98% of the time. When they do act outside the group, it’s mostly with Hollywood (0.5%), Japanese (0.3%), South Korean (0.3%), and Indonesian (0.1%) actors.

Clearly, Jackie Chan is more the exception than the norm.

But among the large groups, there are 2 groups that are even more insular than Chinese actors.

The ~8,200 Turkish actors act only with each other 99.1% of the time, occasionally venturing to act with Iranian actors (0.2%).

Even more insular are the ~7,000 Filipino actors who act with each other 99.3% of the time. They occasionally venture out to act in Hollywood 0.2% of the time.

There are no other sizeable groups of actors that are as insular.

Hollywood is actually among the most cosmopolitan groups, along with the West European films. So, to our budding Malaysian actor, I’d say:

It’s hard to get an acting break. As a foreigner, it’s 10 times harder in Hollywood. But you’re better off in Hollywood or Western Europe than in any other country, where it would be 50 to 100 times as hard!


Releasing modified mosquitoes precisely

February 16, 2021 / Coding, Data / Leave a Comment

At PyCon Indonesia, I spoke about a project we worked on with the World Mosquito Program.

The World Mosquito Program (WMP) modifies mosquitoes with a bacterium, Wolbachia. This reduces their ability to carry deadly viruses. (It makes me perversely happy that we’re infecting mosquitoes now 😉.)

Modifying mosquitoes is an expensive process. With a limited set of “good mosquitoes”, it is critical to find the best release points that will help them replicate rapidly.

But planning the release points took weeks of manual effort. It involved ground personnel going through several iterations.

So our team took high-resolution satellite images, figured out the building density, estimated population density based on that, and generated a release plan. This model is 70% more accurate and reduced the time from 3 weeks to 2 hours.

More details at the Gramener website.

The slides for the talk are below.

Saving Lives with Geospatial AI – Pycon Indonesia 2020 from Gramener

