Data

Students who are more engaged score more

This is about as insightful as the Ig Nobel winning papers “Boredom begets boredom” and “Whatever will bore, will bore” that methodically documented that bored teachers lead to bored students. But in the spirit of publishing all research without bias for success or novelty, let me share this obvious result.

The Y-axis represents the total score of ~2,000 students on 4 graded assignments, each of ~10 marks. The X-axis represents the percent rank of engagement. The most engaged students are at 100%. The least are at 0%.

How do I measure engagement? By the number of times they visit the page and how early they visit the page (both computed as percent ranks). So, the students who visit the assignment page most often, and those who visit it earliest, rank highest on engagement.

For every 10% increase in the engagement, the score increases by about 3 marks. What that means is, if a student leapfrogs ahead of 10% of their batchmates, that effort typically leads to scoring about 3 / 40 = 7.5% more overall.
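A minimal sketch of how such an engagement score could be computed. The exact formula isn’t stated above, so averaging the two percent ranks is an assumption, and the sample data and the `percent_rank` helper are hypothetical.

```python
def percent_rank(values, higher_is_better=True):
    """Return each value's percent rank in [0, 1]; ties share the lowest rank."""
    n = len(values)
    if n < 2:
        return [0.0] * n
    keyed = values if higher_is_better else [-v for v in values]
    return [sum(other < v for other in keyed) / (n - 1) for v in keyed]


# Hypothetical data: page visits, and hours between release and first visit.
visits = [12, 3, 7, 1]
first_visit_hours = [2, 40, 10, 90]

visit_rank = percent_rank(visits)  # more visits = more engaged
early_rank = percent_rank(first_visit_hours, higher_is_better=False)  # earlier = more engaged

# Assumption: engagement is the mean of the two percent ranks.
engagement = [(v + e) / 2 for v, e in zip(visit_rank, early_rank)]
print(engagement)

# The slope quoted above: about 3 marks per 10% of engagement, out of 4 x 10 marks.
marks_per_decile = 3
total_marks = 4 * 10
share_of_total = marks_per_decile / total_marks
print(f"{share_of_total:.1%}")
```

The last two lines simply restate the 3 / 40 arithmetic from the paragraph above.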

Halving a deadline costs 1.4% of marks each time

Does it make a difference if you submit early vs submit late? Here’s some empirical data.

About 1,000 students at IIT Madras took 3 online quizzes (GA1, GA2, GA3) in the last few weeks. The deadlines were all at midnight (India) on different days. Here’s when they submitted their final answers:

  • There was a spurt of submissions at the last minute.
    ~1 out of 8 students submit with < 10 minutes remaining.
  • Most students submitted ~4 hours before the deadline.
    In fact, 3 out of 4 students submit on the same day as the deadline.
  • A fair number of students submitted the previous day/night.
    1 out of 6 are diligent and submit a day early.

But does submitting late help, since you get more time? Apparently not.

On average, every time the time remaining before the deadline is halved, the score drops by 1.4%.

For example, on average:

  • Submitting 1 minute before scores 1.4% less than submitting 2 minutes before
  • Submitting 2 minutes before scores 1.4% less than submitting 4 minutes before
  • Submitting 4 minutes before scores 1.4% less than submitting 8 minutes before
  • … etc.

This means that submitting early morning instead of midnight could give you a 15% advantage.
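The arithmetic behind that estimate can be sketched as follows. The 1.4% per halving comes from the text; treating “early morning” as about 18 hours before the deadline and “midnight” as about 1 minute before are assumptions, and `expected_drop` is a hypothetical helper.

```python
import math

DROP_PER_HALVING = 1.4  # percent lost per halving of the time remaining (from the text)


def expected_drop(early_minutes, late_minutes):
    """Percent drop from submitting late_minutes instead of early_minutes before the deadline."""
    halvings = math.log2(early_minutes / late_minutes)
    return DROP_PER_HALVING * halvings


print(round(expected_drop(2, 1), 1))  # one halving
print(round(expected_drop(8, 1), 1))  # three halvings

# Assumption: early morning is ~18 hours before midnight; "midnight" is ~1 minute before.
print(round(expected_drop(18 * 60, 1), 1))
```

Under these assumptions there are about 10 halvings between early morning and the last minute, which lands in the same ballpark as the 15% quoted above.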

Of course, this might be confusing cause and effect. Maybe students who do well submit early, and those who struggle submit late.

But is there a merit in faking it till you make it? Perhaps by pretending your deadline is a day early, to get the best of both worlds? Something to think about…

What does Gramener ask ChatGPT?

I looked at how Gramener uses ChatGPT Plus by evaluating 600+ chats asked over 3 months from Oct 2023 to Jan 2024.

The team asks 6 questions a day. We don’t track who or how many actively use ChatGPT Plus. This also excludes personal ChatGPT accounts. Still, 6/day is low for an entire team put together.

The questions fall into 9 categories.

Category                                  % of chats
Excel, data exploration & analysis        25%
Text extraction and summarization         13%
HTML, CSS, or JavaScript code             13%
Python code                               13%
LLMs, AI and use cases                    9%
OCR and image analysis                    9%
Generate images, logos, and designs       7%
General knowledge, policy & environment   5%
Audio and translation                     5%

Here are some questions from each category – to give you an idea of emergent ChatGPT Plus usage.

Excel, data exploration & analysis (25%)

  • Excel clean and merge. There are 2 worksheets in this excel with data, can you clean up the data and merge the data in both the sheets
  • Excel CO2 Data Analysis. You are an expert Data Analyst who is capable of extracting insights out of data. Analyze this sheet and let me know the findings
  • Excel Chi-Square Analysis Guide. how to perform chi square analysis in excel
  • Log Data Insights & KPIs. Looking at the columns from this excel, what kind of insights are possible, what are key KPIs to be looked at

Text extraction and summarization (13%)

  • Complaint Investigation Summary. The following is the summary of an internal investigation for a customer complaint. Now this internal summary is to be paraphrased (in 3-4 lines) as part of a closure
  • Extracting Tables from RTF. Can you write a script to extract the tables from this document
  • Extracting Entities from Text. [{'word1': '(P)', 'nearest_word1': 'P/N:', 'nearest_word2': '0150-25034', 'nearest_word3': 'CARTIRIDGE'}, {'word1': 'P/N:', 'nearest_word1': '(P)', 'nearest_word2': '015...
  • Extract PDF Font Details. Extract text formatting information from this document. Especially find font styles, families and sizes.

HTML, CSS, or JavaScript code (13%)

  • HTML/CSS Chart Template. Give me HTML, CSS and chart code for this design.
  • CSS Font Stack: Explanation. Explain this CSS font convention: Arial, Helvetica, Segoe UI, sans-serif
  • Checkbox Validation with JavaScript. In HTML form, I have a set of checkboxes. How do I write the form so that at least one of them being checked is mandatory?
  • Prevent Text Wrapping CSS. <span class="text">Chief Communications Officer</span> I need CSS such the text inside should not wrap create new line
  • ReactJS App with Routing. Give me developed version using ReactJS use react router for sidebar section navigation to the pages use Tailwind css for styling. Use styled components for conditional …

Python code (13%)

  • Python Code Documentation Guide. Can you generate documentation for a project code written in python?
  • Linux Commands for Python. Give me list of linux commands to work on python coding
  • Code explanation request. What’s this code about? …
  • FastAPI Async Testing. Write a fastapi code and a python client to test the asynchronous nature of the fastapi package.
  • Streamlit App for Translation. Given the following python code, give me a simple streamlit app that takes file upload and converts that into a target language: …

An interesting sub-topic was interview question generation.

  • Python Decorator for Database Queries. Create one medium level question for Decorators in python Industryy usecase specific with solution

LLMs, AI and use cases (9%)

  • LLMs for Data “What Ifs”. You are an LLM Expert. Can you tell me how can we leverage LLM for implementing What IF scenarios on Data?
  • LLMs: Current Challenges & Concerns. what are current challenges with LLMs
  • LLM Applications in Marketing. Show LLM applications for the marketing function of a music company.
  • Gen AI usage. What industries are using Gen AI the most
  • Best LLMs in 2023. Search the internet for the most recent LLMs and list the best LLMs in terms of performance
  • Best Image Classification Models. suggest best models to tell what there in the image

OCR and image analysis (9%)

  • Browser history OCR. This is a screenshot of my browser history. Convert that to text. Categorize these into common topics.
  • Extracted C Code. This image contains C code. Extract it.
  • Image text extraction and annotation. Extract the text from this image and annotate the boundaries of the text
  • Detecting Document Image Orientation. oreientation detection of documnet image
  • AI Project with OpenCV & YOLO. Consider yourself as Open CV and Yolo expert and help me with AI project
  • Image Correction Techniques. what are the approaches we have in computer vision where my image is tilted or rotated in reverse or image is not in readable format

Generate images, logos, and designs (7%)

  • Google Chacha and ChatGPT Bhatija. Generate an image of Google Chacha and ChatGPT Bhatija
  • Regenerative Systems Group Image. Generate an Image with below context > “A group of people interested in Regenerative systems. The focus is on reusing food, energy and mental health”
  • Twitter Reply Icons Design. Give me three icons: icon16.png, icon48.png, icon128.png for an extension that I’m building that suggests replies to tweets
  • Generate flowcharts. Make a flowchart of the underlying working of a web app. Here’s how it works. 1. The user uploads a document – a PDF or an image. They then select the language that …
  • Create Animated GIF from Photos. I have 4 photos I want to make an animated gif out of them. How can i do that?
  • Climate Impact Illustration. An illustration showcasing the impact of climate change on daily life, focusing on a rural setting near the coast. In the foreground, a small farm is visibly struggling, …

General knowledge, policy & environment (5%)

  • Design Thinking Overview. What is Design thinking
  • Arthashastra. What can Arthashastra teach us about modern politics?
  • Community Impact on Habits. Is there research to suggest the impact of community on habit building?
  • Focus at Age 28. What should a 28 year old focus on?
  • Superconductors. Explain superconductors like I’m five years old.
  • Climate Career: Impactful Choices. You a career counsellor at a University campus. You want to create 4 to 5 talking points for students to consider a career in Climate space.
  • Sustainability Division Vision. I run a software outsourced product development company. I want to start a new division that focuses on sustainability services offerings. Please draft a vision…

Audio and translation (5%)

  • Audio Timestamp Mapping. timestamp mapping for transcribed audio
  • Transcribe Lengthy Audio: Segment. Transcribe this audio file.
  • Traducción del MOU al Español. Translate this document to Spanish, and create a new translated document. Maintain text formatting.
  • Telugu Transcription into Hindi. Transcribe the following telugu text into hindi. You are supposed to transcribe, not translate. శ్రీనివాస పూజావిధానము …
  • GPT lacks native audio support. Does gpt support audio in audio out natively?

Learning to speak better

Microsoft ported its PowerPoint Speaker Coach to Teams. Since September, it’s given me suggestions covering 11 hours in 77 calls (I speak ~10 min/call.)

I say “uhh” a lot. That’s intentional

I use the filler word “uhh” in 70% of my calls. That did not surprise me. I do that intentionally.

  1. On a poor network, they know I’m still connected
  2. They know I’m going to say something
  3. I sound less confident. That invites critique I can learn from

But I also use filler words like “You know” and “I mean” in half the calls, and “like”, “actually”, and “basically” in a fifth. That’s NOT intentional, and I’ll be conscious of it.

Filler word   % of calls   # / call
uhh           70%          3.6
You know      48%          2.4
I mean        43%          2
like          22%          1.4
actually      19%          1
basically     18%          1.2
anyway        14%          1.1
hmm           16%          1.1
umm           9%           1.4
ah            4%           1.3
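For anyone who wants to reproduce numbers like these from their own transcripts, here is a minimal sketch. It is not how Speaker Coach computes its figures. It assumes “# / call” means the average count among calls where the word appears, and the sample transcripts and the `filler_stats` helper are hypothetical.

```python
import re


def filler_stats(transcripts, word):
    """Return (share of calls using `word`, mean uses per call among calls that use it)."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    counts = [len(pattern.findall(text)) for text in transcripts]
    used = [c for c in counts if c > 0]
    share = len(used) / len(counts) if counts else 0.0
    per_call = sum(used) / len(used) if used else 0.0
    return share, per_call


# Hypothetical transcripts, one string per call.
transcripts = [
    "Maybe we try that. Uhh, maybe not.",
    "Uhh, I mean, that works.",
    "Let's ship it.",
    "You know, maybe it's fine.",
]

print(filler_stats(transcripts, "maybe"))  # (0.5, 1.5)
print(filler_stats(transcripts, "uhh"))    # (0.5, 1.0)
```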

I say “maybe” a lot. That’s surprising

What did surprise me was “maybe”. I use it every fourth call, but when I do, I say “maybe” ten times per call. That’s a lot of maybe!

Sometimes, I say maybe because I’m communicating uncertainty.

Maybe we’ll have 20-30% success rate…

So and I had to switch 3 laptops or maybe 4.

… then she said, “OK, maybe it’s some other Sam”

Sometimes I’m proposing tentatively.

… one of the reasons why I’m nudging towards that is maybe a large reuse initiative is high return,

We can even put this in as part of the project by maybe offering it to different teams…

Maybe by having dedicated support…

Maybe I’ll drop off. Bye

But sometimes, it’s testable hypotheses.

Uh, maybe I’m getting the names wrong, but I think it was Socrates…

Maybe it’s me, but yeah, I guess…

You know, maybe it’s because I don’t store any of my stuff in…

One of my year’s goals is to run 50 experiments. I’d been doing well until April, and then fizzled out. Partly motivation. Partly a lack of testable hypotheses.

And now, in October, I discovered that I literally speak out one testable hypothesis every call — roughly every 10 minutes I speak! I’m amazed at how blind I’ve been, and how easy it can be to find experiments to test. I guess I need more of a scientific mindset. (Or just plain curiosity.)

The next time I say, “maybe” (or see it in my transcript), I’ll write it down as a hypothesis to test.

Repetitive words cluster

Another discovery was: I tend to pick a phrase and use it repeatedly in calls. For example, I said “let’s say” twelve times in just one call of 15 minutes. I said “main” 20 times over 2 calls of 8 minutes each. I said “cool” 7 times in an 11-minute call.

Repetitive word   # calls   # / call
let’s say         1         12
main              2         10
also              1         8
only              2         7.5
correct           7         7.4
in terms of       1         7
alright           3         6.3
that is           3         6
cool              2         5
Clearly it’s something to watch out for. But maybe repetition of words isn’t a bad thing if it’s not the same phrase repeated across calls? (There! I said “maybe”. Let me find out!)

Modulate the pace

In a third of my calls, I need to speed up. In a third of my calls, I need to slow down. (On some calls, I need to do both!)

Clearly, I need to vary my pace a lot more, consciously. It’s not that I talk fast or slow. I do both. But I get stuck in one mode of speaking for too long.

Takeaways

I used to think I was a pretty good speaker. That’s not a bad thought, but it can blind me to feedback and improvements. There’s no end to learning how to speak. Speaker Coach is a great “in-your-face” feedback mechanism. I hope Microsoft adds more features to it.

But what I’m going to do now is:

  1. Every time I say “maybe”, write down an experiment
  2. Speed up and slow down more in calls
  3. Watch for words I use repeatedly

Old songs in my music library

My music library has around 1,000 songs (mostly Tamil and Hindi, with some Telugu and English film songs).

I spent this morning tagging them by year with mp3tag. (Manually. You don’t automate the pleasures of life.)

I thought my 1990s collection would be the largest. I was in college, listening to lots of music then. But surprisingly, my collection has grown post the 1990s.

I have 3 guesses why.

  1. Recency bias. I re-built this collection recently. Maybe I forgot older songs?
  2. Digitization bias. Maybe I listened to more songs as the cost of transmission/storage fell?
  3. Worsening standards. Maybe I used to be choosier about music?

Though I’m not sure of the above, there’s another interesting anomaly.

There is a spike in the 1960s.

I don’t need to guess this one. I know why. Those are the songs my parents liked. I grew up hearing them.

The oldest Tamil song is from Thiruneelakantar (1939). It’s from my father’s collection. I’ve heard it often enough to still enjoy it.

The oldest Hindi song is from Jaal (1952). He has a fondness for Dev Anand’s songs. So do I. This one is a beauty.

The oldest Tamil song my mother introduced me to is from Parasakthi (1952). She used to dance to this song when young.

The earliest Hindi song she introduced me to was from Jhanak Jhanak Payal Baaje (1955). It’s the song I grew up on, and it’s still among my favorites. What a melody!


My wife prefers newer songs. But I have low standards and few preferences. It makes my life rather happy.

So, in celebration of Make Music Day on 21 June, I’m treating myself to 2 weeks of my collection from the 1960s!

PS: My full collection is at https://gist.github.com/sanand0/877637165b17239aa27beac03749c9a6

How to find a Chinese actor to cast in Hollywood

Film actors mostly act within their own industry.

For example, Hollywood actors act outside Hollywood just 10% of the time. Chinese actors act with non-Chinese actors just 1% of the time.

So, if you’re a Hollywood producer trying to cast a Chinese actor, how would you find them?

One way is to list Chinese actors with the largest number of Hollywood co-stars. Let’s see who tops that list.
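A minimal sketch of that ranking, assuming each actor already carries a group label and that we count distinct co-stars. The actors, labels, and the `rank_by_cross_group_costars` helper are hypothetical placeholders, not the real data.

```python
from collections import defaultdict

# Hypothetical sample: group label per actor, and pairs who have co-starred.
group = {
    "Actor A": "Chinese", "Actor B": "Chinese",
    "Actor X": "Hollywood", "Actor Y": "Hollywood", "Actor Z": "Hollywood",
}
costars = [
    ("Actor A", "Actor X"), ("Actor A", "Actor Y"), ("Actor A", "Actor B"),
    ("Actor B", "Actor Z"),
]


def rank_by_cross_group_costars(group, costars, home, target):
    """Rank actors in `home` by how many distinct co-stars they have in `target`."""
    partners = defaultdict(set)
    for a, b in costars:
        # Each pair is undirected, so check it from both sides.
        for actor, other in ((a, b), (b, a)):
            if group[actor] == home and group[other] == target:
                partners[actor].add(other)
    return sorted(((len(p), actor) for actor, p in partners.items()), reverse=True)


ranking = rank_by_cross_group_costars(group, costars, "Chinese", "Hollywood")
print(ranking)  # [(2, 'Actor A'), (1, 'Actor B')]
```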

#5. Pei-Pei Cheng

You may know her as Jade Fox, the sly governess in Ang Lee’s Crouching Tiger, Hidden Dragon (2000), or Golden Swallow, the skilled swordsman sister in Come Drink With Me (1966), or even as the voice of the matchmaker who disgraces Mulan in Mulan (2020).

She mainly acts in Chinese films, co-starring nearly 180 times with actors like Hua Yueh, Lieh Lo, and Chung-Hsin Huang. But she’s also co-starred over 20 times with Hollywood actors like Jaime King (of Sin City), Peter Bowles (of The Bank Job), and Sandra Oh (of Grey’s Anatomy).

#4. Jet Li

You may know him as Han Sing, the martial artist and ex-cop in Romeo Must Die (2000), or Gabe Law, the former MultiVerse Authority agent in The One (2001), or Yin Yang, the unarmed member of The Expendables (2010).

He has co-starred over 100 times with Chinese actors like Jackie Chan, Simon Yam, and Sammo Kam-Bo Hung. But he’s also co-starred 30 times with Hollywood actors like Antonio Banderas, Morgan Freeman, and Sylvester Stallone.

#3. Joan Chen

She’s famous as Wanrong, the Chinese empress in The Last Emperor (1987), Josie Packard, the owner of the Twin Peaks mill in Twin Peaks (1989), or Dr Ilsa Hayden, assistant to the villain Rico Dredd in Judge Dredd (1995).

She’s co-starred over 80 times with Chinese actors like Tony Chiu-Wai Leung, Leon Lai, and Tony Ka Fai Leung. But she’s co-starred over 40 times with Hollywood actors like Michael Caine, Peter O’Toole, and Christopher Walken.

#2. Jackie Chan

The most famous Chinese martial arts actor in the world, and one of the highest-paid actors in the world, is famous as Detective Inspector Lee in Rush Hour (1998), Mr Han in The Karate Kid (2010), and the voice of Monkey in Kung Fu Panda (2008).

He has co-starred nearly 200 times with Chinese actors like Sammo Kam-Bo Hung, Maggie Cheung, and Kent Cheng. But he’s co-starred over 50 times with Hollywood actors like Arnold Schwarzenegger, Owen Wilson, and Chris Tucker.

#1. Michelle Yeoh

You may know her as Wai Lin, the Chinese spy and James Bond’s ally in Tomorrow Never Dies (1997), Yu Shu Lien, the warrior swordswoman in Crouching Tiger, Hidden Dragon (2000), or as Eleanor Young, the domineering mother-in-law in Crazy Rich Asians (2018).

She’s an actress at the borderline of the Chinese – Hollywood clusters. She’s acted ~60 times with Chinese actors like Maggie Cheung, Chow Yun-Fat and Jet Li. But she’s acted almost as many times with Hollywood actors like Sigourney Weaver, Zoe Saldana and Sam Worthington.

More actors

Here are half a dozen more Chinese actors that have acted with Hollywood actors often.

Chow Yun-Fat
Donnie Yen
Andy Lau
Simon Yam
Gong Li
Josie Ho

It’s interesting to see that 3 of the top 6 (Chow Yun-Fat, Pei-Pei Cheng, and Michelle Yeoh) had all acted in the blockbuster Crouching Tiger, Hidden Dragon (2000).

So, perhaps the simple message to our Hollywood producer is to “look no further than the cast of the first foreign-language film to break the $100mn mark in the USA.”

How isolated is Bollywood from world cinema?

These are the major groups of actors, based on who they act with most.

Actors mostly act with other actors in the same…
  1. Language. Not country. For example, the Spanish / Mexican group is across countries. But Indian actors divide into North Indian and South Indian. It’s language, not country.
  2. Time period. Old American actors are a separate group from Hollywood. (Naturally. Brad Pitt was born after Humphrey Bogart died. They couldn’t have acted together.)
  3. Genre. Hollywood Porn actors don’t act with mainstream Hollywood. Same with Japanese Porn, Hollywood TV, and Hollywood Horror actors.

How are these groups themselves connected? Do Chinese actors act with Hollywood often? How isolated is Bollywood from world cinema?

Hollywood is the core group

Take groups that act with other groups at least 5% of the time. Mainstream Hollywood acts with British and Hollywood TV/Horror actors. All other clusters are isolated.
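One way to sketch this thresholding: link two groups whenever their cross-group share meets the cut-off, then read off the connected clusters. The rates below are hypothetical placeholders and `clusters_at_threshold` is an illustrative helper, not the original analysis.

```python
def clusters_at_threshold(rates, threshold):
    """Cluster groups, linking a pair when their cross-group share is >= threshold.

    rates[(a, b)] is the share of group a's co-starring that is with group b.
    """
    groups = {g for pair in rates for g in pair}
    neighbours = {g: set() for g in groups}
    for (a, b), share in rates.items():
        if share >= threshold:
            neighbours[a].add(b)
            neighbours[b].add(a)

    seen, clusters = set(), []
    for start in sorted(groups):
        if start in seen:
            continue
        # Depth-first walk to collect everything reachable from `start`.
        stack, component = [start], set()
        while stack:
            g = stack.pop()
            if g in component:
                continue
            component.add(g)
            stack.extend(neighbours[g] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters


# Hypothetical cross-group rates, not the real figures.
rates = {
    ("Hollywood", "British"): 0.06,
    ("North Indian", "South Indian"): 0.03,
    ("Chinese", "South Korean"): 0.012,
    ("Filipino", "Hollywood"): 0.002,
}

for cutoff in (0.05, 0.01, 0.002):
    print(cutoff, clusters_at_threshold(rates, cutoff))
```

Lowering the cut-off merges more groups into a cluster, which is the progression the following sections walk through.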


Indian & Japanese clusters emerge

Let’s go more liberal. Take groups that act with other groups at least 2% of the time. Hollywood forms a big connected cluster. It includes most of Europe — British, German, French, Czech, Yugoslavian & Italian actors.

North & South Indian actors form the first non-Hollywood cross-language cluster.

The Japanese and Japanese porn actors form a cluster too. (Interestingly, it’s easy for a Japanese porn actor to act with mainstream Japanese actors. Hollywood porn actors find it far harder to act with Hollywood.)

Chinese & Korean cluster emerges

Among groups that act with other groups at least 1% of the time, we have:

Chinese & South Korean actors form the first cross-country cross-language cluster.

Hollywood expands to act with Scandinavian, Spanish, Polish, Brazilian & Nigerian films.

Other film industries (Russian, Greek, Egyptian, and even Hollywood Porn) are still isolated.


World Cinema vs the rest

Among groups that act with other groups at least 0.5% of the time, we have:

  1. Turkish & Iranian groups coming together
  2. Indonesian actors acting with the Chinese
  3. Hollywood expanding to cover Russian, Greek, Egyptian, and finally, Hollywood Porn. (It’s easier for a Brazilian or Nigerian actor to act with Hollywood than for a Hollywood Porn actor to do so.)

At this point, there are 6 actor groups that act with each other at least 1 out of 200 times (0.5%).

  1. World Cinema (Hollywood & friends)
  2. Japanese (mainstream & porn)
  3. Indian (North & South)
  4. Chinese, South Korean & Indonesian
  5. Turkish & Iranian
  6. Filipino

One world of cinema

If we look at groups that act with other groups at least 0.25% of the time, we have a far more unified picture. Almost every actor group acts with another group at least 1 out of 400 times.

But even here, there’s an exception. Filipino actors — the most insular major actor group in the world.


So, how isolated is Bollywood from World Cinema? For its size, it’s one of the most isolated actor groups. (But not as much as Iranian/Turkish or Filipino.)

Can foreigners enter Hollywood?

An aspiring Malaysian actor posted on Reddit:

I am a 18-year old biracial Malaysian kid who wants to be an actor in Hollywood. I’m taking a diploma for performing arts in a college called Sunway University in 8 days and I’m considering pulling out of it because why do something that I like when my dreams might never be fulfilled and the price for taking this diploma is seriously expensive. I am starting to doubt my chances of making it to Hollywood and I suffer from extreme anxiety. Is it possible for someone like me to enter Hollywood? What are my chances?

Breaking into Hollywood is hard. As a foreigner, it would be even harder. So I asked myself:

Do Hollywood actors act with foreigners?

Let’s take Will Smith. He frequently acts with Martin Lawrence, Tommy Lee Jones, Jaden Smith, Jon Voight, and 84 other actors.

Every one of his co-stars is a Hollywood actor, except the Spanish actor Jordi Mollà in Bad Boys II and the Dutch actor Marwan Kenzari in Aladdin. Just 2% of Will Smith’s co-stars are foreign.

On the other hand, Jackie Chan is more cosmopolitan.

Of his 224 co-stars, 70 are non-Chinese. Over 30% of Jackie Chan’s co-stars are foreign.

Are Chinese films more foreigner-friendly? Should our Malaysian friend try there instead?

Is Hollywood less open to foreigners than other countries?

I took all movie actors across the world and broke them into groups using a community structure. Actors within the group act mostly within themselves, and less with other groups.

The largest group is Hollywood, with ~80,000 actors (mostly American). They act with each other 90% of the time and act with other groups only 10% of the time.

In comparison, the Chinese group has ~20,000 actors. They act with each other 98% of the time. When they do act outside the group, it’s mostly with Hollywood (0.5%), Japanese (0.3%), South Korean (0.3%), and Indonesian (0.1%)
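Percentages like these can be computed once every actor has a group label. A minimal sketch follows, assuming the groups are already known (the community detection step itself is not shown) and using hypothetical actors; `within_group_share` is an illustrative helper.

```python
from collections import Counter


def within_group_share(group, costars):
    """For each group, the fraction of its actors' co-starring links that stay inside the group."""
    inside, total = Counter(), Counter()
    for a, b in costars:
        # Count each pair once from each actor's point of view.
        for actor, other in ((a, b), (b, a)):
            total[group[actor]] += 1
            if group[actor] == group[other]:
                inside[group[actor]] += 1
    return {g: inside[g] / total[g] for g in total}


# Hypothetical sample data.
group = {"A": "Hollywood", "B": "Hollywood", "C": "Hollywood", "D": "Chinese", "E": "Chinese"}
costars = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]

share = within_group_share(group, costars)
for name, value in share.items():
    print(name, f"{value:.0%}")
```

A share near 100% marks an insular group; a lower share marks a more cosmopolitan one.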

Clearly, Jackie Chan is more the exception than the norm.

But among the large groups, there are 2 groups that are even more insular than Chinese actors.

The ~8,200 Turkish actors act only with each other 99.1% of the time, occasionally venturing to act with Iranian actors (0.2%).

Even more insular are the ~7,000 Filipino actors who act with each other 99.3% of the time. They occasionally venture out to act in Hollywood 0.2% of the time.

There are no other sizeable groups of actors that are as insular.

Hollywood is actually among the most cosmopolitan groups, along with the West European films. So, to our budding Malaysian actor, I’d say:

It’s hard to get an acting break. As a foreigner, it’s 10 times harder in Hollywood. But you’re better off in Hollywood or Western Europe than in any other country, where it would be 50 to 100 times as hard!

Releasing modified mosquitoes precisely

At PyCon Indonesia, I spoke about a project we worked on with the World Mosquito Program.

The World Mosquito Program (WMP) modifies mosquitoes with a bacterium, Wolbachia. This reduces their ability to carry deadly viruses. (It makes me perversely happy that we’re infecting mosquitoes now 😉.)

Modifying mosquitoes is an expensive process. With a limited set of “good mosquitoes”, it is critical to find the best release points that will help them replicate rapidly.

But planning the release points took weeks of manual effort. It involved ground personnel going through several iterations.

So our team took high-resolution satellite images, figured out the building density, estimated population density based on that, and generated a release plan. This model is 70% more accurate and reduced the time from 3 weeks to 2 hours.

More details at the Gramener website.

The slides for the talk are below.

Jolie No. 1

There are more Bollywood actors in Hollywood. Some are even turning down Hollywood roles.

So we wondered: How easily can a Bollywood actor connect to a Hollywood actor?

As part of the Oct 2019 Gramener data story hackathon, Anand, Kishore, and Niyas created Jolie No. 1, a data video where Govinda announces (in our imagination) that he will act with Angelina Jolie in Jolie No. 1, but declines to comment on who introduced them.

We picked a theme first

The hackathon theme was “movies”. We explored 5 themes:

  1. Who acts most in cameo roles, and what’s the impact on revenue? (Based on The Numbers)
  2. Which actors acted often together? (Based on IMDb data)
  3. Which movies become hits on TV? (Based on BARC TV data)
  4. What is the social network of actors in individual movies? (https://www.xkcd.com/657/)
  5. Correlation of TV series actors and their revenues

We explored insights next

We picked the first two themes because we liked them.

1. Cameo appearances

Some observations were:

  • Stan Lee starred in 45 cameo roles. No one even comes close. Some roles are:
    • A school bus driver in Avengers: Infinity War (2018)
    • A strip club DJ in Deadpool (2016)
    • A hot-dog vendor in X-Men (2000)
  • Jay Leno (25) and Larry King (21) follow, mostly starring as themselves
  • Alfred Hitchcock (16) has famous cameo appearances in most of his films, such as:
    • Man mailing letter in Suspicion (1941)
    • Man winding the clock in Rear Window (1954)
    • Man walking the dogs in The Birds (1963)

We didn’t have inflation-adjusted box-office revenues, so we couldn’t compare the revenues.

2. Which actors acted often together

Some observations were:

  • Top hero-heroine combo:
    • Overall: Prem Nazir & Jayabharati
    • Hollywood: Billy Dee & Mike Horner (pornstars)
    • Tollywood: Krishna Ghattamaneni & Jaya Prada
    • Bollywood: Jeetendra & Rekha
  • Top male combo: Sivaji Ganesan & Nagesh (more recently, Senthil & Goundamani)
  • Top female combination: Lalitha & Padmini
  • Top pair of:
    • Shah Rukh Khan: Rani Mukherji
    • Amitabh Bachchan: Hema Malini
    • Kamal Haasan: Sridevi
    • Rajinikanth: Sridevi
    • Sridevi: Krishna Ghattamaneni
    • Chiranjeevi: Vijayshanti
    • Dev Anand: Madhubala

The observations focus on Bollywood and Hollywood (because of our familiarity), but there are a number of insights on Japanese and French films too.

We decided to go with this theme because it offered multiple storylines:

  • Some actors pair up with each other, e.g. Gemini – Savithri
  • Some actors have a big “following” e.g. Rajinikanth, Kamal Haasan, and Jeetendra have all acted most with Sridevi
  • Some actors form cliques — working only with each other
  • Often, comedians are the bridge between cliques
  • It’s interesting to see how actors from one clique can connect to another

Creating the storyline

When exploring actors’ connections, we found a clearly delineated network structure.

[Image: Actor SNA (social network analysis) graph]

The group of densely clustered actors is the Bollywood-Tollywood-Mollywood-Kollywood nexus. It appears disconnected from the Hollywood cluster. (We excluded pairs of actors who hadn’t acted together in at least 4 films.)

The data was created using this Jupyter notebook.

We realized that it’s tough for someone in Bollywood to connect to Hollywood. Maybe that could be the plot? For example, what if Amitabh Bachchan wants to act with Meryl Streep?

But this isn’t an interesting story. So we asked:

The plot summary was: Govinda wants to act with Angelina Jolie. Who can connect them?

The analysis is in this Jupyter notebook.
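The question “who can connect them?” is a shortest-path search over the co-star network. Here is a minimal sketch using a breadth-first search. It is not the notebook’s code; the handful of co-star links are taken from the screenplay in the next section, and `shortest_path` is an illustrative helper.

```python
from collections import deque


def shortest_path(costars, start, goal):
    """Breadth-first search for the shortest chain of co-stars from start to goal."""
    graph = {}
    for a, b in costars:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)

    previous = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        actor = queue.popleft()
        if actor == goal:
            # Walk back through `previous` to rebuild the chain.
            path = []
            while actor is not None:
                path.append(actor)
                actor = previous[actor]
            return path[::-1]
        for other in graph.get(actor, []):
            if other not in previous:
                previous[other] = actor
                queue.append(other)
    return None  # no connection found


# A few co-star links mentioned in the screenplay.
costars = [
    ("Govinda", "Tabu"), ("Tabu", "Irrfan Khan"),
    ("Govinda", "Gulshan Grover"), ("Gulshan Grover", "Irrfan Khan"),
    ("Irrfan Khan", "Angelina Jolie"),
    ("Govinda", "Shakti Kapoor"),
]

path = shortest_path(costars, "Govinda", "Angelina Jolie")
print(path)  # ['Govinda', 'Tabu', 'Irrfan Khan', 'Angelina Jolie']
```

Several chains of the same length can exist (here, via Tabu or via Gulshan Grover). This sketch returns the first one found, so picking the “best” connector needs an extra rule, such as the number of films shared.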

Write the screenplay

The morning of the hackathon was spent finalizing the screenplay and dialogues, written on Dropbox Paper.

CUT TO:
    - Video of Govinda "declining James Cameron's Avatar" on Aap Ki Adalat
    - Niyas: On July 29, 2019, Govinda announces he declined a role in Avatar.
    - Video: https://youtu.be/NyFF18a7e-Y
    - Picture: https://twitter.com/mohan_rajkeshav/status/1156148768049262592

CUT TO:
    - Visual: Show an interview video of Govinda and of Angelina
    - Niyas: Today, he announced his next film with Angelina Jolie.
             A “close friend” connected them, but didn't say who.
    - Kishore: Who is this close friend? Why is he not naming them?
    - Video: https://youtu.be/NyFF18a7e-Y (Govinda)
    - Video: https://youtu.be/JNrH1W7aKc8 (Angelina)

CUT TO:
    - Visual: Show the top 9 heroines Govinda has acted with.
              Visualize this data with animation.
              One option is to have Govinda’s pic in the center,
              and have each of these 9 heroines’ images appear around him
              as a circle, with the number of pictures in a link.
              Or as the inverse link distance (e.g. 11 is closest)

    11 Neelam Kothari
    10 Kimi Katkar
    10 Karisma Kapoor
     9 Raveena Tandon
     9 Farha Naaz
     8 Juhi Chawla
     6 Anita Raj
     6 Mandakini
     5 Shilpa Shetty Kundra

    - Niyas: Maybe it’s because it’s one of his heroines?
             He’s mostly acted with Neelam, Kimi and Karisma.
             But none of them has acted with any Hollywood actor.

MORPH TO: 
    - Visual: Add these actors with pics to the same visual,
              but clearly differentiated by gender. Also add their names.

    22 Shakti Kapoor
    18 Kader Khan
    13 Gulshan Grover
     9 Anupam Kher
     8 Dharmendra
     7 Johnny Lever
     6 Sadashiv Amrapurkar
     6 Vikas Anand
     6 Sanjay Dutt
     6 Prem Chopra
     6 Asrani

    - Kishore: So maybe this “close friend” is a male actor?
    - Niyas: He’s acted with Gulshan Grover, Kader Khan and Shakti Kapoor a lot.
    - Kishore: Shakti Kapoor is practically his boyfriend!

MORPH TO:
    - Visual: Zoom into Gulshan Grover and Anupam Kher.
              Build a network of film posters around them
              with their Hollywood films (max 2-4)
        - Anupam Kher
            - Bend It Like Beckham
            - Lust & Caution
            - Silver Linings Playbook
            - A Family Man
        - Gulshan Grover
            - Prisoners of the Sun
            - The Second Jungle Book
            - Marigold
            - Monsoon
    - Niyas: Gulshan Grover and Anupam Kher have acted in a number of Hollywood films
    - Kishore: But have they acted with Angelina Jolie?
    - Niyas: No, never with Angelina Jolie.
    - Kishore: But what if any of them connected him to someone who connected him to Angelina?

CUT TO:
    - Visual: Show Angelina Jolie with ~100 actors around her. Highlight the following:
        - Jack Black, 3
        - Dustin Hoffman, 3
        - Giovanni Ribisi, 2
        - Robert De Niro, 2
        - Brad Pitt, 2
        - Elle Fanning, 2
        - Bryan Cranston, 2
        - 92 other actors with only 1 film each
        - Highlight Irrfan Khan — A Mighty Heart
    - Niyas: Angelina Jolie has acted with less than 100 actors.
             Dustin Hoffman and Jack Black, mostly.
             Only one of them is an Indian actor: Irrfan Khan

MORPH TO:
    - Visual: Expand the connection between Angelina and Irrfan
    - Kishore: So, Govinda needs to connect to Irrfan Khan somehow.

MORPH TO:
    - Visual: Connect Govinda to Irrfan Khan via
        - Gulshan Grover via Knock Out
        - Sanjay Dutt via Knock Out
        - Tabu via Saajan Chale Sasural, Dil Ne Phir Yaad Kiya (and 2 others)    
    - Niyas: That should be easy.
             Gulshan Grover and Irrfan Khan have acted together in Knock Out.
             So has Sanjay Dutt.
             But Tabu will be a better option. Govinda and Irrfan Khan have acted with her in 4 movies each.

MORPH TO:
    - Visual: Show path from Govinda to Tabu to Irrfan to Angelina.
    - Kishore: Then, Govinda must have connected to Tabu
               who introduced him to Irrfan Khan,
               who in turn connected him with Angelina Jolie.

Create the video

Anand and Niyas created the visuals on PowerPoint, collaborating on Dropbox.

This is the first version of the presentation. It uses morph transitions extensively.

[Image: PPT screenshot]

Niyas and Kishore recorded the audio in two parts on their phones and shared it with Anand via WhatsApp.

We integrated these using the Windows 10 video editor. It’s simple, but not powerful. For our use, simplicity was more important.

The process took 6 hours (from 8 am to 2 pm).

  • Writing the screenplay and dialogues: 1.5 hours
  • Creating the presentation: 2 hours
  • Recording the audio: 1 hour
  • Integrating into the video: 1.5 hours

At the last minute, we picked the title “Jolie No. 1” as a parody of Govinda’s No. 1 film series.

We published this on Google Drive, and then on YouTube.