S Anand

Maps, Delimitation, and Gerrymandering

I delivered a talk at PyCon India 2019. My slides are on GitHub.

This is a transcript of that talk.

What I’m going to be talking about is how you can get insights by joining two maps but before we go there, just some basic bookkeeping things.

In case you’re tweeting, these are the tags you probably want to be using: #PyconIndia, and my ID, @sanand0. You don’t need to worry about the slides; they are online. I’ve already posted the link to the slide deck on Twitter. But if you desperately do want to take notes, then one small suggestion. Research has shown that taking notes with pen and paper is much better than taking notes on laptops or mobile phones if you want to remember stuff. So this was a discovery for me. In fact, it was my discovery of the year and I’m following it diligently. Do give it a shot if you want to take notes. Let’s dive in.

The story begins at the Karnataka elections in 2018. Say about one-eighth of the voters are Muslim, and both the Congress as well as the JD(S), were trying to get their support while on the other hand, BJP was taking potshots saying both of them are just trying to appease the community. The Hindu newspaper wanted to write a piece about how large a factor this is, and where all the Muslim vote is strong. You see, here we have a problem.

The thing is that the proportion of the population by religion is available only at the district level or the village level, depending on where you get the data from, and this is from the census. Unfortunately, elections are not conducted by district; elections are conducted by constituency, and these are two very different maps. So, I have data in one map, which shows me how many Muslims live in a particular region, and I want to see how many Muslims live in a different region on another map. Even though they overlap, there really is no direct way of getting the data from one layer on to the other. So, we literally don’t know how many Muslims live in a constituency.

So, how do we solve this problem? Well, the logical way is you could take one district and a constituency or a set of constituencies, and let’s say the district has a population of 100, out of which we know that 13% are Muslims and we want to split it evenly across a bunch of constituencies.

We could just overlay them. So one district could cover multiple constituencies, one constituency could cover multiple districts, and there is a many-to-many mapping between these. There is sometimes full coverage, sometimes partial coverage.

So this district, for instance, covers at least one constituency fully and maybe this takes up about 1/3 of the total area. So I can say approximately 1/3 of the district’s population, which is that red area lives in this constituency.

Or let’s take another constituency that overlaps with this district. So, now only a portion of this constituency overlaps with this district. So in this area, which takes up maybe about 1/5, or about 20% of that district population, I can say that population lives here. In other words, we are simply making an assumption that within a district, which is the lowest level of data that you have, or if you have village data that’s far more granular, the population is uniformly distributed. That’s the basic assumption.
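The arithmetic behind that assumption can be sketched in a few lines of Python. The numbers come from the example above (a district of 100 people, 13% Muslim, with one constituency covering about 1/3 of its area and another about 1/5). The function name `apportion` and the constituency labels are made up for illustration.

```python
from fractions import Fraction


def apportion(district_total, overlap_fractions):
    """Split a district-level count across constituencies in proportion to area.

    Assumes the population is uniformly spread within the district.
    """
    return {name: district_total * frac for name, frac in overlap_fractions.items()}


district_population = 100
muslim_share = Fraction(13, 100)
muslims_in_district = district_population * muslim_share  # 13 people

# Fraction of the district's area that falls inside each constituency
# (hypothetical labels, fractions taken from the talk's example)
overlaps = {"constituency_a": Fraction(1, 3), "constituency_b": Fraction(1, 5)}

estimates = apportion(muslims_in_district, overlaps)
for name, value in estimates.items():
    print(name, float(value))
```

A constituency that spans several districts would simply add up the pieces it receives from each district.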

Now, what we can do is fragment each of these districts and constituencies by overlaying them and creating an intersection out of those, and reassembling those and this is a process that I call reshaping the map.

How much of this can we do in Python? There is a library called reshaper that we put together. The reshaper library is very much a work in progress, by the way. You can find it at github.com/gramener/reshaper. It does exactly what I’m about to show you right now.

So let’s give it a shot. I’m going to open up the IPython notebook. The library that we are going to be using for this primarily, the core library is Geopandas. For those of you who have been working with data, Pandas is pretty much the de-facto library to use for any kind of data processing. Geopandas is becoming that kind of a standard for any shapefile. So if you have any shapefile and you want to do any kind of geospatial processing, an easy way of doing it is Geopandas and an easy way of installing it is through Conda using Anaconda. Rather than trying to do a pip install by yourself. It’s a little more efficient on most machines. So let’s import Geopandas. Now I have a shapefile that has the Karnataka census data which will eventually appear. I’m going to load it once it appears on the screen. (Just taking a long time. Okay, there we are back again.)

So GPD, which is the abbreviation for Geopandas, has a from_file function that lets you load any shapefile. Now, the other question you’ll have is where am I going to get these shapefiles from? We’ll come to that in a bit. It’s not as difficult as you might think. Let’s say you’ve downloaded the shapefiles. This particularly is the Karnataka census shapefile and what does this look like? Geopandas has a plotting function, which lets you see what the map looks like. So if you look at these districts, this is a pretty large district, this is Bangalore.

Let’s take the area for these. Geopandas offers an attribute called .geometry, which has an attribute called .area, which gets you the overall area of each of these regions. If you want to look at what that data frame looks like, each of these regions corresponds to one row. So the Bagalkot district is one row, Bangalore rural district is one row and so on. All of the data in the shapefile also comes in here. You have a column called geometry, which has the additional geometry details. This is a pretty large column, and you probably won’t be going into its details. We’ve just now added one column called area, and this has the area of each of these regions. At the very least, you can figure out which are the larger regions and which are the smaller regions.
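To make "area of a region" concrete, here is a small sketch of the standard shoelace formula for a simple polygon. It illustrates what a planar area is, not Geopandas' own code, and the function name `polygon_area` is made up. The result is in squared coordinate units, so for a map stored in latitude and longitude it is in square degrees. That is good enough for comparing proportions within one map, but it is not square kilometres.

```python
def polygon_area(points):
    """Planar area of a simple polygon given as a list of (x, y) vertices."""
    total = 0.0
    # Pair each vertex with the next one, wrapping around to the first
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


square = [(0, 0), (2, 0), (2, 2), (0, 2)]
triangle = [(0, 0), (4, 0), (0, 3)]

print(polygon_area(square), polygon_area(triangle))
```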

Let’s do the same for the constituencies data set. So here we have the constituencies. These are parliamentary constituencies, by the way, not assembly constituencies. The difference being, if you’re electing someone for the parliament, that is, an MP, then it’s a parliamentary constituency. If you’re electing them for the assembly, that is, an MLA, then it’s an assembly constituency. Parliamentary constituencies are bigger. So you’ll notice that out here, there are multiple parliamentary constituencies that sit in the same region that this district sits in, but it’s not a perfect match. Again, let’s take the area and see what this looks like. We have a bunch of these parliamentary constituencies like Gulbarga, Bijapur, etc., and their respective areas.

Now, Geopandas has a function called sjoin, which lets you take two shapefiles and create all the intersections between those shapefiles, the fragments that I just showed you out here. So yeah, creating all these fragments is what the sjoin function does. So, if we do that, then what it’s now done is created a new data frame called merged, and that has all these shapes. Let’s validate that. So there are 30 districts and 28 constituencies, but when you overlay them, it turns out that there are 147 fragments, each of which represents an intersection of a district and a constituency. Now, given this, it should be possible to just take any metric, like the percentage of Muslim voters or the number of Muslims, the size of the Muslim population, from the district data into the data that you have on the constituencies.

But it turns out that it’s a little trickier than that. So, you have to do a little more calculation and that’s what’s available in the reshaper library, you can take a look at the code, what it does is moves the metrics from one layer to another in a way that is seamless.
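To show the idea of moving a metric from one layer to another, here is a self-contained sketch. It is not the reshaper library's code. It assumes axis-aligned rectangles stand in for real district and constituency polygons, so that intersections can be computed without a geometry library. All names (`overlap_area`, `reshape`, `d1`, `c1` and so on) are hypothetical.

```python
from collections import defaultdict


def box_area(box):
    """Area of an axis-aligned box given as (xmin, ymin, xmax, ymax)."""
    return (box[2] - box[0]) * (box[3] - box[1])


def overlap_area(a, b):
    """Area of the intersection of two axis-aligned boxes (0 if they don't overlap)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0) * max(height, 0)


def reshape(source_shapes, source_counts, target_shapes):
    """Move a count from source regions to target regions, weighted by overlap area."""
    result = defaultdict(float)
    for s_name, s_box in source_shapes.items():
        for t_name, t_box in target_shapes.items():
            area = overlap_area(s_box, t_box)
            if area > 0:
                # Each fragment gets the share of the source count that matches its area
                result[t_name] += source_counts[s_name] * area / box_area(s_box)
    return dict(result)


# Two made-up districts side by side, and two made-up constituencies cutting across them
districts = {"d1": (0, 0, 4, 2), "d2": (4, 0, 8, 2)}
population = {"d1": 80, "d2": 40}
constituencies = {"c1": (0, 0, 2, 2), "c2": (2, 0, 8, 2)}

estimate = reshape(districts, population, constituencies)
print(estimate)
```

Counts (such as the number of people) can be apportioned this way. A percentage cannot be summed across fragments, so one option is to move the numerator and the denominator separately and divide afterwards.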

Once we have this, the result is an Excel sheet that kind of looks like this. It has all the attributes from both layers.

So, it says, for instance, that this particular assembly constituency is broken up into three regions, each of which maps to different districts. So some of it overlaps with more and some of it with shift saagar some of it with data and in fact, these are across different states and what is the area of each of these, along with a variety of other metrics that you can calculate and the proportion of area that is overlapping. Once you have this kind of data set, what can we do with it? So let’s revert back to our story.

What actually happened to the Muslim vote?

Well, this is the constituency-wise Muslim voter population in Karnataka.

This was used by the Hindu to publish an article around where exactly the bulk of the voters are concentrated. So, there is a chunk here, there’s a chunk here, there’s a chunk here.

Now, what was happening at this particular point was there was a fight for an alliance. The AIMIM is a Muslim party whose name is very long, and I can’t even say it fully. But they had won a number of seats in Telangana and were looking to also participate in the Karnataka elections. They planned to contest in 60 seats. Now, to make sure that they get the Muslim vote, both JD(S) and the Congress were vying for an alliance with the party, and in April 2018, AIMIM decided that they would not be directly contesting in the elections, but instead would be supporting JD(S). Now, we have the results of the elections by constituency, and we know the voter population by constituency. Let us see what happened to JD(S).

Turns out that where there were more Muslim populations, JD(S) actually got lower votes. So you can see the net result of this election and the alliance.

Congress, on the other hand, had a mildly higher vote share where there was a significantly larger Muslim voter population.

It turns out that BJP was the one that gained the most.

Now, while I’m moderately okay at Python, I’m terrible at electoral analysis. So I have no idea what this means. Okay, I’ll let you figure it out. The elections in Maharashtra and Haryana are also coming up and it turns out that Congress is aligned with AIMIM and, well, let’s just leave it at that.

So, what can we do with this? What kinds of datasets exist and what is the potential of being able to join data sets across two spaces? That’s something that I’m pretty keen on.

It turns out that in India, there are broadly three kinds of geographic hierarchies. There is a political boundary hierarchy, a postal boundary hierarchy and an administrative boundary hierarchy.

By political boundary, I mean the state, parliamentary constituency, assembly constituency, going all the way down to polling booth. This has all of the results of the elections, and one of the important aspects of this is that policies get made to a good extent at this level because the MPs and the MLAs are focused on their respective constituencies.

The second is a postal code boundary. There is a zone within which there is a sorting district within which there is a post office and there is a PIN code, there are about 110,000 of these in total.

The third is the administrative boundary hierarchy. So there is a state there is a district, there’s also something called a division, but we’ll leave that aside, then there could be a sub district block or village, if it’s a rural area, or it could be municipality zone and ward if it’s a township.

Now, this apart, there is one other way we can create our own hierarchies. But before that, in case you’re looking for shapefiles for many of these, the easiest way to get the shapefile for India is to search for “Datameet maps”. Datameet is a group with a discussion forum, and there is a lot of active discussion on various kinds of maps. For pretty much any kind of map, there’s a decent chance that you’ll find it on Datameet. If it’s not there on Datameet, ask the people; they might be able to post something, and if not, it probably just doesn’t exist.

But you can also create your own boundaries. If you have a single location, you can look at the area that is closer to this particular location than any other location. So for example, if this were a network of, let’s say, schools, then what is the region that is closer to a particular school than any other school?

So if I take this particular point as a school, then this red region represents all of those points which are closer to this school than any other school.

This particular process is called Voronoi tessellation and is something that comes out of the box with QGIS. It’s also something that you can create from the command line using the reshaper library. What that means is that now you can take literally any point and convert that into a region, and the potential for that is quite high.
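The defining rule of a Voronoi cell (every point belongs to its nearest site) can be sketched without any geometry library. The snippet below does not build the polygons that QGIS or reshaper would produce. It only assigns made-up home locations to their nearest made-up school, which is the same rule applied point by point. It uses plain Euclidean distance, which is an assumption that suits small areas better than whole countries.

```python
from math import dist


def nearest_site(point, sites):
    """Name of the site closest to the point, i.e. whose Voronoi cell contains it."""
    return min(sites, key=lambda name: dist(point, sites[name]))


# Hypothetical school locations and home locations on a flat grid
schools = {"school_a": (0.0, 0.0), "school_b": (4.0, 0.0), "school_c": (0.0, 4.0)}
homes = [(1.0, 1.0), (3.5, 0.5), (0.5, 3.0)]

assignment = [nearest_site(home, schools) for home in homes]
print(assignment)
```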

So if I look at the kinds of data sets that you can create with location boundaries, right, so there’s… take all the hospitals, take all the schools, take all the bank branches, take all the petrol pumps take all the locations where crimes have been reported, take any address or take all the telephone towers, take all the locations where there are stores of a particular brand.

All of these are datasets for which you can get an address, and an address can be geocoded into a point. If it can be geocoded into a point, you can convert that into a region, and for each of these, you naturally have some kind of data. For schools, you know how many teachers or how many students there are. For telecom towers, you know which organization runs that tower, and potentially the telecom organization will know how many calls are flowing through it. If it’s healthcare data, you know how many facilities that hospital has, how many patients, how many doctors. All of these are data sets that can be added to that particular cell in your respective region.

But what this means, therefore, is that we can take any of these data sets, which you can create from location boundaries, or data sets that often already exist by administrative boundaries, and this is a pretty powerful set as well. Census gives us demographic data; asset ownership (who owns laptops, internet connections, TVs, cars, fridges); social and religious data; economic indicators (well, income); and household indicators: is the house made of a mud roof or a brick roof, do you have a toilet in the house or not. Practically every government scheme is tracked this way. So how many people have benefited from the National Rural Employment Guarantee Act? Banking data is reported this way. Health data is reported this way.

So effectively, anything that the government runs is reported by administrative boundaries. Anything that the corporate sector runs, by and large, is reported by locations. So between these two, there is enormous potential. But there’s also the fact of how decision making happens. Ultimately, political boundaries are owned, in some sense by an MP or an MLA. And, of course, there is also the associated IAS equivalents, who usually run it by administrative boundaries. So if I wanted somebody on the political side to make decisions, then I could take any of this data and put it on to the constituency boundaries. If I wanted an administrative official to make a decision, then I could take any of this data and put it on to a district. If I wanted a manager or a principal of a school, or the CEO of a hospital to make a decision, I could take all of that data and put it onto their geographic boundary.

For example, one of the things that the Hindu again did was found that the Congress is doing much better in the agrarian areas and they did that by taking the census data, which had the percentage of farmers and mapping that on to the voter constituency regions.

If we took, for example, census demographic data and school data, we can answer a question, where should we open new schools so that students don’t have to travel far or where there is a reasonably equal distribution of students across schools?

If we took economic indicators (how well the country is growing) versus bank branch data, then we can answer questions like: are the bank branches distributed based on population, that is, does every person roughly have equal access to the bank? Or based on wealth, that is, does every rupee have roughly equal access to the bank? Or if it’s in between, how close is it to one or the other?

We could find out whether increasing the district’s wealth leads to more theft. So as people get richer, does that lead to an increase in crime? Or does it lead to less theft, because the people are richer and therefore don’t need to steal? And these are data sets that are available and can be joined.

Similarly, with health data, does poor health lead to an increase in the number of pharmacies that are set up in that region, because the pharmacies can sell more? Or vice versa, if you actually set up more pharmacies, does that have a positive impact on the people’s health in that particular region?

Now, the reason these questions are trivial to ask but nearly impossible to solve today is that merging the data across different kinds of layers of maps is non-trivial. But both conceptually and technologically, it is quite an easy exercise.

What can we do to solve problems like this? Well, me personally, I’d love to see more of these hidden insights come out but there are a few things that you can do, literally right now.

First, if you have an idea, take a look at these data sets, any of the data sets that you know and raise an issue on this particular repository and I’d invite all of you to share this with people. It’ll be great to see what kinds of ideas can be solved using these problems and I’d like to crowd source this to a number of people on the administrative side, on the NGO side, and on the corporate side, to create a repository that says here are things that we can do.

If you want to try solving one of these and discovering your own insight, to build your own portfolio or to share some useful knowledge, then start by finding a map. Like I said, Datameet is a good place where you can find them. You can find the reshaper library on https://github.com/gramener/reshaper. The links are, again, on https://github.com/gramener/pycon2019. This is the one link that you need to remember, and if you find something, do share it on Twitter. Please tag me @sanand0. I’d love to share it at least with the media and get some people to understand the power of geospatial joins.

If you want to contribute to the library, right now it’s in a terrible state. Or if you want to learn more, I’m planning to organize a series of workshops on geospatial joins, so do drop me an email. My email ID is s.anand@gramener.com and I’ll mail you about the workshops.

If nothing else, if you just enjoyed the talk and you’ve learned something from it, then tweet about it. The tags are #PyconIndia2019 and my ID, @sanand0. More than anything else, I’d love to see some insights come out by joining data.

Happy mapping!

Question: So my question is, basically, I’m belonging to northeast part of India. So I’m from Assam. So what happened in terms of the documentation for this geographic data and so those are always kept in a sort of, you know, register. We call registers or something, so how we use that image processing and all like, to enable those things into a more of a like a public space?

Answer: Okay. There are broadly three ways in which you can get this kind of data out. The first is beg, borrow, steal. Somebody in the government may have this data. So for example, if you go to the Survey of India, they sell these shape files. Of course, I’ve been trying to buy one of these shape files for the last six years now and have failed and I’ve tried it through the Prime Minister’s office and I still failed. But it’s actually easier to just walk over to the Survey of India office and give them a USB stick, and they’ll give it. So, depending on how you approach it, it may prove relatively straightforward.

On the other hand, sometimes the maps don’t exist. So for example, the most interesting anecdote was that the former head of the postal college of Mysore was trying to create a postal map, the region of all of the PIN codes. It turns out that nobody knows what region a PIN code covers. So he created that, he uploaded that to ISRO’s Bhuvan, and then after about a year and a half realized that people have permission to upload into ISRO’s Bhuvan but not download from Bhuvan. So after one and a half years of putting in all the data, the data is locked; it’s not even there. So today, what is the best source of getting PIN code data? It turns out that what people did was take various locations and geocode them. They said, this location is at this particular PIN code, that location is at that particular PIN code, let’s draw a region around it using the concept of Voronoi polygons, and publish it. So the second possibility is to create such maps.

The third possibility that you talked about is whether we can use image processing to detect it. Some features can be detected that way. So, for example, if you want to detect urban regions or constructed regions, that’s possible using satellite photography. If you want to locate water bodies and see whether they are growing or shrinking (for example, in Chennai, the Chembarambakkam lake is actually drying up), that’s something that you can draw a boundary around using image processing, and that’s a straightforward method. But the thing is, I don’t think a single method will work for a wide variety of data sets, which is why we have many of these.

But the biggest lesson that I’ve learned is that 90% of the things that we want, somebody else has usually wanted, and has managed to get their inputs. So, I find that the most efficient way is to ask, and Datameet is a pretty good place to ask if somebody already has this data.

Question: Anand, thank you very much for your talk. I’ve got a question regarding shape files. I had the requirement of using the map of India a few times and I suddenly realized that our external boundaries in a lot of places are in dispute and the kind of shape files that we get are not matching with what politically we want our file boundaries to be. So is there any official place from where we can get these shape files because the only shape file which are available are those distorted shape files, and I finally had to change the shape files myself to use it. I couldn’t find any official place from where to get the shape file.

Answer: So, the official place is the Survey of India, which claims to sell these maps. Like I said, for the last five or six years now I’ve been trying to buy these maps, and it’s actually not possible. But there are people who have succeeded and... this talk is being recorded, right? Okay. Let’s just say that if you go to Datameet maps, you will get unofficial but correct maps.

Question: So, since you are in the field, shouldn’t we have a system of getting correct official maps? Isn’t there a process being put in place or something?

Answer: I tried talking to a couple of people at the Prime Minister’s office and suggested this. They put me on the phone with the Inspector General of Surveys or some such high ranking official who said yes, absolutely, connected me to some person, who connected me to some person, who connected me to some person, who is exactly the same person I talked to in the first place. So, I don’t know. I’m sure there is a process. I don’t know it well enough.

Cyborg scraping

LinkedIn has a page that shows the people who most recently followed you.

At first, it shows just 20 people. But as you scroll, it keeps fetching the rest. I’d love to get the full list on a spreadsheet. I’m curious about:

  1. What kind of people follow me?
  2. Which of them has the most followers?
  3. Who are my earliest followers?

But first, I need to scrape this list. Normally, I’d spend a day writing a program. But I tried a different approach yesterday.

Aside: it’s easy to get bored in online meetings. I have a surplus of partially distracted time. So rather than writing code to save me time, I’d rather create simple tasks to keep me occupied. Like scrolling.

So here’s my workflow to scrape the list of followers.

Step 1: Keep scrolling all the way to the bottom until you get all followers.

Step 2: Press F12, open the Developer Tools – Console, and paste this code.

// Paste into the Developer Tools console. $$() and copy() are console utilities.
copy($$('.follows-recommendation-card').map(v => {
  // A card may lack some of these elements, so each one is null-checked below
  let name = v.querySelector('.follows-recommendation-card__name')
  let headline = v.querySelector('.follows-recommendation-card__headline')
  let subtext = v.querySelector('.follows-recommendation-card__subtext')
  let link = v.querySelector('.follows-recommendation-card__avatar-link')
  // Pull a count such as "2K" out of text ending in "... follower(s)" or "... other(s)"
  let followers = '', match
  if (subtext) {
    if (match = subtext.innerText.match(/([\d\.K]+) follower/)) {
      followers = match[1]
    } else if (match = subtext.innerText.match(/([\d\.K]+) other/)) {
      followers = match[1]
    }
  }
  // Convert "2K" to 2000. Leave it blank (rather than NaN) if no count was found
  if (followers)
    followers = followers.match(/K$/) ? parseFloat(followers) * 1000 : parseFloat(followers)
  return {
    name: name ? name.innerText : '',
    headline: headline ? headline.innerText : '',
    followers: followers,
    link: link ? link.href : ''
  }
}))

Step 3: The name, headline, followers and link are now in the clipboard as JSON. Visit https://www.convertcsv.com/json-to-csv.htm and paste it in “Select your input” under “Enter Data”.

Step 4: Click on the “Download Result” button. The JSON is converted into a CSV you can load into a spreadsheet.
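If you would rather not paste the data into a website, Steps 3 and 4 can also be done locally. Here is a minimal sketch using only Python's standard library. The function name `json_to_csv` and the sample record are made up, and it assumes the clipboard holds a JSON array of flat objects like the one the console script produces.

```python
import csv
import io
import json


def json_to_csv(json_text):
    """Convert a JSON array of flat objects into CSV text."""
    rows = json.loads(json_text)
    # Collect column names in the order they first appear
    fields = []
    for row in rows:
        for key in row:
            if key not in fields:
                fields.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up record in the same shape as the scraped data
sample = (
    '[{"name": "A Person", "headline": "Engineer, Example Co", '
    '"followers": 1200, "link": "https://example.com/a"}]'
)
csv_text = json_to_csv(sample)
print(csv_text)
```

Saving `csv_text` to a `.csv` file gives you something a spreadsheet can open directly.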

I call this “Cyborg scraping”. I do half the work (scrolling, copy-pasting, etc.). The code does half the work. It’s manual. It’s a bit slow. But it gets the job done quick and dirty.

I’ll share later what I learned about my followers. For now, I’m looking forward to meetings 😉

PS: A similar script to scrape LinkedIn invitations is below. You can only see 100 invitations per page, though.

copy($$('.invitation-card').map(v => ({
  name: (v.querySelector('.invitation-card__title') || {}).innerText || '',
  link: (v.querySelector('.invitation-card__link') || {}).href || '',
  subtitle: (v.querySelector('.invitation-card__subtitle') || {}).innerText || '',
  common: (v.querySelector('.member-insights__count') || {}).innerText || '',
  message: (v.querySelector('.invitation-card__custom-message') || {}).innerText || '',
})))

PS: A similar script to scrape LinkedIn people search results is below.

copy($$('.entity-result').map(v => {
  const name = v.querySelector('.entity-result__title-text [aria-hidden="true"]');
  const link = v.querySelector('a');
  const badge = v.querySelector('.entity-result__badge [aria-hidden="true"]');
  const title = v.querySelector('.entity-result__primary-subtitle');
  const subtitle = v.querySelector('.entity-result__secondary-subtitle');
  const summary = v.querySelector('.entity-result__summary--2-lines');
  const insight = v.querySelector(".entity-result__simple-insight-text");
  return {
    name: name?.innerText || '',
    link: (link?.href || '').split('?')[0],
    badge: badge?.innerText || '',
    title: title?.innerText || '',
    subtitle: subtitle?.innerText || '',
    summary: summary?.innerText || '',
    insight: insight?.innerText || '',
  }
}))

Designing Complex Shapes in PowerPoint

I use PowerPoint instead of Adobe Illustrator or Sketch. I’m familiar with it, and it does everything I need.

One of the features I’m really excited by in PowerPoint is the ability to manipulate shapes.

Let’s say you have a rectangle and a circle. You can select both of these shapes and in the Shape Format > Merge Shapes dropdown, you can:

  • merge them with a union
  • combine them (like an XOR operation in Boolean algebra)
  • fragment them, which breaks them up into pieces
  • intersect them
  • subtract them
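One way to think about these five operations is as set operations on the pixels each shape covers. The sketch below is only an analogy in Python, not anything PowerPoint exposes. It draws a made-up rectangle and circle on an integer grid, and the helper names `rect` and `disc` are hypothetical.

```python
def rect(x0, y0, x1, y1):
    """Grid cells covered by a rectangle (x1 and y1 exclusive)."""
    return {(x, y) for x in range(x0, x1) for y in range(y0, y1)}


def disc(cx, cy, r):
    """Grid cells covered by a circle of radius r centred at (cx, cy)."""
    return {
        (x, y)
        for x in range(cx - r, cx + r + 1)
        for y in range(cy - r, cy + r + 1)
        if (x - cx) ** 2 + (y - cy) ** 2 <= r * r
    }


rectangle = rect(0, 0, 6, 4)
circle = disc(6, 2, 2)  # overlaps the right edge of the rectangle

union = rectangle | circle       # Union: everything either shape covers
combine = rectangle ^ circle     # Combine: covered by exactly one shape (XOR)
intersect = rectangle & circle   # Intersect: covered by both shapes
subtract = rectangle - circle    # Subtract: first shape minus the second
# Fragment: the separate pieces that the outlines cut each other into
fragments = [rectangle - circle, rectangle & circle, circle - rectangle]

print(len(union), len(combine), len(intersect), len(subtract))
```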

This is so powerful that you can create any kind of shape. Let’s take an icon from Font Awesome at random — say an address card — and create it.

Here’s the video of the process. I’ll explain it step-by-step below.

First, let’s take a screenshot of this and copy it into PowerPoint.

Now let’s draw over this. So we’ll start with a rounded rectangular box with the same color as the address card. We can use the eyedropper to pick the right color. Remove the outline. Then match the edges as closely as you can. (Add a bit of transparency so you can see through it — that helps match edges closely.)

Move this card boundary to a new page.

Now, on top of the original image we copy-pasted from Font Awesome, trace 3 rounded rectangles for the address lines. Trace a circle over the head. Fill them white. Remove the outline. It should look like these.

Next, let’s create the body. We’ll create a rounded rectangle that matches the bottom half of the body, another that matches the top half of the body, and intersect them, like this:

Then, draw a large circle around the head and subtract it from the body, like this:

Finally, copy all these shapes over the card boundary on the next page. Select the card boundary first. Then select these copied shapes (3 address lines, head, and bust). Select Shape Format > Merge Shapes > Subtract.

With that, we have a single shape that contains the entire address card. The white areas are transparent.

You can download the Merge-Shapes.pptx file below with each of the steps.

Like I said, I don’t bother with Adobe Illustrator or Sketch. PowerPoint does it all for me 😊.

Vadivelu Comedy Dialogues

Here are some famous funny dialogues by Tamil comedian Vadivelu. Can you guess which movie they’re from?

Don’t worry about the spelling. Just spell it like it sounds, and the box will turn green.

Releasing modified mosquitoes precisely

At PyCon Indonesia, I spoke about a project we worked on with the World Mosquito Program.

The World Mosquito Program (WMP) modifies mosquitoes with a bacterium, Wolbachia. This reduces their ability to carry deadly viruses. (It makes me perversely happy that we’re infecting mosquitoes now 😉.)

Modifying mosquitoes is an expensive process. With a limited set of “good mosquitoes”, it is critical to find the best release points that will help them replicate rapidly.

But planning the release points took weeks of manual effort. It involved ground personnel going through several iterations.

So our team took high-resolution satellite images, figured out the building density, estimated population density based on that, and generated a release plan. This model is 70% more accurate and reduced the time from 3 weeks to 2 hours.

More details at the Gramener website.

The slides for the talk are below.

Jolie No. 1

There are more Bollywood actors in Hollywood. Some are even turning down Hollywood roles.

So we wondered: How easily can a Bollywood actor connect to a Hollywood actor?

As part of the Oct 2019 Gramener data story hackathon, Anand, Kishore, and Niyas created Jolie No 1, a data video where Govinda announces (in our imagination) that he will act with Angelina Jolie in Jolie No 1, but declines to comment on who introduced them.

We picked a theme first

The hackathon theme was “movies”. We explored 5 themes:

  1. Who acts most in cameo roles, and what’s the impact on revenue? (Based on The Numbers)
  2. Which actors acted often together? (Based on IMDb data)
  3. Which movies become hits on TV? (Based on BARC TV data)
  4. What is the social network of actors in individual movies (https://www.xkcd.com/657/)
  5. Correlation of TV series actors and their revenues

We explored insights next

We picked the first two themes because we liked them.

1. Cameo appearances

Some observations were:

  • Stan Lee starred in 45 cameo roles. No one even comes close. Some roles are:
    • A school bus driver in Avengers: Infinity War (2018)
    • A strip club DJ in Deadpool (2016)
    • A hot-dog vendor in X-Men (2000)
  • Jay Leno (25) and Larry King (21) follow, mostly starring as themselves
  • Alfred Hitchcock (16) has famous cameo appearances in most of his films, such as:
    • Man mailing letter in Suspicion (1941)
    • Man winding the clock in Rear Window (1954)
    • Man walking the dogs in The Birds (1963)

We didn’t have inflation-adjusted box-office revenues, so we couldn’t compare the revenues.

2. Which actors acted often together

Some observations were:

  • Top hero-heroine combo:
    • Overall: Prem Nazir & Jayabharati
    • Hollywood: Billy Dee & Mike Horner (pornstars)
    • Tollywood: Krishna Ghattamaneni & Jaya Prada
    • Bollywood: Jeetendra & Rekha
  • Top male combo: Sivaji Ganesan & Nagesh (more recently, Senthil & Goundamani)
  • Top female combination: Lalitha & Padmini
  • Top pair of:
    • Shah Rukh Khan: Rani Mukherji
    • Amitabh Bachchan: Hema Malini
    • Kamal Haasan: Sridevi
    • Rajinikanth: Sridevi
    • Sridevi: Krishna Ghattamaneni
    • Chiranjeevi: Vijayshanti
    • Dev Anand: Madhubala

The observations focus on Bollywood and Hollywood (because of our familiarity), but there are a number of insights on Japanese and French films too.

We decided to go with this theme because it offered multiple storylines:

  • Some actors pair up with each other, e.g. Gemini – Savithri
  • Some actors have a big “following”, e.g. Rajinikanth, Kamal Haasan, and Jeetendra have all acted most with Sridevi
  • Some actors form cliques — working only with each other
  • Often, comedians are the bridge between cliques
  • It’s interesting to see how actors from one clique can connect to another

Creating the storyline

When exploring actors’ connections, we found a clearly delineated network structure.

Actor SNA

The group of densely clustered actors is the Bollywood-Tollywood-Mollywood-Kollywood nexus. It appears disconnected from the Hollywood cluster. (We excluded anyone who hadn’t acted together in at least 4 films.)

The data was created using this Jupyter notebook.

We realized that it’s tough for someone in Bollywood to connect to Hollywood. Maybe that could be the plot? For example, what if Amitabh Bachchan wants to act with Meryl Streep?

But this isn’t an interesting story. So we asked:

The plot summary was: Govinda wants to act with Angelina Jolie. Who can connect them?

The analysis is in this Jupyter notebook.
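The notebook isn’t reproduced here, but the core question (who can connect two actors?) is a shortest-path search on a co-star graph. Here is a minimal sketch in JavaScript. It is not the notebook’s code: the tiny graph is hand-typed from the names in the screenplay below, and shortestPath is a name made up for this illustration.

```javascript
// Toy co-star graph. Edges are hand-typed from the screenplay, not loaded from IMDb.
const costars = {
  'Govinda': ['Gulshan Grover', 'Sanjay Dutt', 'Tabu'],
  'Gulshan Grover': ['Govinda', 'Irrfan Khan'],
  'Sanjay Dutt': ['Govinda', 'Irrfan Khan'],
  'Tabu': ['Govinda', 'Irrfan Khan'],
  'Irrfan Khan': ['Gulshan Grover', 'Sanjay Dutt', 'Tabu', 'Angelina Jolie'],
  'Angelina Jolie': ['Irrfan Khan']
}

// Breadth-first search. Returns one shortest chain of co-stars, or null if none exists.
function shortestPath(graph, start, goal) {
  const previous = new Map([[start, null]])   // Who led us to each actor
  const queue = [start]
  while (queue.length > 0) {
    const actor = queue.shift()
    if (actor === goal) {
      // Walk back from the goal to the start to rebuild the chain
      const path = []
      for (let node = goal; node !== null; node = previous.get(node))
        path.unshift(node)
      return path
    }
    for (const next of graph[actor] || []) {
      if (!previous.has(next)) {
        previous.set(next, actor)
        queue.push(next)
      }
    }
  }
  return null
}

console.log(shortestPath(costars, 'Govinda', 'Angelina Jolie').join(' > '))
```

This search only counts hops. It ignores how many films two actors did together, so it treats the Gulshan Grover, Sanjay Dutt and Tabu routes as equally good. Preferring Tabu, as the screenplay does, would need edge weights.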

Write the screenplay

The morning of the hackathon was spent finalizing the screenplay and dialogues, written on Dropbox Paper.

CUT TO:
    - Video of Govinda "declining James Cameron's Avatar" on Aap Ki Adalat
    - Niyas: On July 29, 2019, Govinda announces he declined a role in Avatar.
    - Video: https://youtu.be/NyFF18a7e-Y
    - Picture: https://twitter.com/mohan_rajkeshav/status/1156148768049262592

CUT TO:
    - Visual: Show an interview video of Govinda and of Angelina
    - Niyas: Today, he announced his next film with Angelina Jolie.
             A “close friend” connected them, but didn't say who.
    - Kishore: Who is this close friend? Why is he not naming them?
    - Video: https://youtu.be/NyFF18a7e-Y (Govinda)
    - Video: https://youtu.be/JNrH1W7aKc8 (Angelina)

CUT TO:
    - Visual: Show the top 9 heroines Govinda has acted with.
              Visualize this data with animation.
              One option is to have Govinda’s pic in the center,
              and have each of these 9 heroines’ images appear around him
              as a circle, with the number of pictures in a link.
              Or as the inverse link distance (e.g. 11 is closest)

    11 Neelam Kothari
    10 Kimi Katkar
    10 Karisma Kapoor
     9 Raveena Tandon
     9 Farha Naaz
     8 Juhi Chawla
     6 Anita Raj
     6 Mandakini
     5 Shilpa Shetty Kundra

    - Niyas: Maybe it’s because it’s one of his heroines?
             He’s mostly acted with Neelam, Kimi and Karisma.
             But none of them has acted with any Hollywood actor.

MORPH TO: 
    - Visual: Add these actors with pics to the same visual,
              but clearly differentiated by gender. Also add their names.

    22 Shakti Kapoor
    18 Kader Khan
    13 Gulshan Grover
     9 Anupam Kher
     8 Dharmendra
     7 Johnny Lever
     6 Sadashiv Amrapurkar
     6 Vikas Anand
     6 Sanjay Dutt
     6 Prem Chopra
     6 Asrani

    - Kishore: So maybe this “close friend” is a male actor?
    - Niyas: He’s acted with Gulshan Grover, Kader Khan and Shakti Kapoor a lot.
    - Kishore: Shakti Kapoor is practically his boyfriend!

MORPH TO:
    - Visual: Zoom into Gulshan Grover and Anupam Kher.
              Build a network of film posters around them
              with their Hollywood films (max 2-4)
        - Anupam Kher
            - Bend It Like Beckham
            - Lust, Caution
            - Silver Linings Playbook
            - A Family Man
        - Gulshan Grover
            - Prisoners of the Sun
            - The Second Jungle Book
            - Marigold
            - Monsoon
    - Niyas: Gulshan Grover and Anupam Kher have acted in a number of Hollywood films
    - Kishore: But have they acted with Angelina Jolie?
    - Niyas: No, never with Angelina Jolie.
    - Kishore: But what if any of them connected him to someone who connected him to Angelina?

CUT TO:
    - Visual: Show Angelina Jolie with ~100 actors around her. Highlight the following:
        - Jack Black, 3
        - Dustin Hoffman, 3
        - Giovanni Ribisi, 2
        - Robert De Niro, 2
        - Brad Pitt, 2
        - Elle Fanning, 2
        - Bryan Cranston, 2
        - 92 other actors with only 1 film each
        - Highlight Irrfan Khan — A Mighty Heart
    - Niyas: Angelina Jolie has acted with less than 100 actors.
             Dustin Hoffman and Jack Black, mostly.
             Only one of them is an Indian actor: Irrfan Khan

MORPH TO:
    - Visual: Expand the connection between Angelina and Irrfan
    - Kishore: So, Govinda needs to connect to Irrfan Khan somehow.

MORPH TO:
    - Visual: Connect Govinda to Irrfan Khan via
        - Gulshan Grover via Knock Out
        - Sanjay Dutt via Knock Out
        - Tabu via Saajan Chale Sasural, Dil Ne Phir Yaad Kiya (and 2 others)    
    - Niyas: That should be easy.
             Gulshan Grover and Irrfan Khan have acted together in Knock Out.
             So has Sanjay Dutt.
             But Tabu will be a better option. Govinda and Irrfan Khan have acted with her in 4 movies each.

MORPH TO:
    - Visual: Show path from Govinda to Tabu to Irrfan to Angelina.
    - Kishore: Then, Govinda must have connected to Tabu
               who introduced him to Irrfan Khan,
               who in turn connected him with Angelina Jolie.

Create the video

Anand and Niyas created the visuals on PowerPoint, collaborating on Dropbox.

This is the first version of the presentation. It uses morph transitions extensively.

PPT screenshot

Niyas and Kishore recorded the audio in two parts on their phones and shared it with Anand via WhatsApp.

We integrated these using the Windows 10 video editor. It’s simple, but not powerful. For our use, simplicity was more important.

The process took 6 hours (from 8 am to 2 pm).

  • Writing the screenplay and dialogues: 1.5 hours
  • Creating the presentation: 2 hours
  • Recording the audio: 1 hour
  • Integrating into the video: 1.5 hours

At the last minute, we picked the title “Jolie No. 1” (a parody of Govinda’s No. 1 film series).

We published this on Google Drive, and then on YouTube.

How to direct a data movie

Ganes and I created a data movie on speed-cubing records as part of a Gramener hackathon.

Here’s a video of us talking about how we created it.

Anand: We picked the Rubik’s cube story for this hackathon. Tell me more about how this excited you.

Ganes: Since my son started solving the Rubik’s cube a few months back, I’ve been fascinated with these competitions. I still don’t know how to solve it, but I like watching it.

Anand: But he does?

Ganes: Yeah, he does. So, in the competitions, I’ve seen kids solving the Rubik’s cube in under 10 seconds. So that was the first source of amazement. I’ve seen kids doing it with one hand, blindfolded. I first couldn’t believe it. Doing it with their legs. So that got me really interested.

When we were talking about this, and I was sharing my amazement, we were talking about the hackathon and the conversations kind of merged. So that, I think, the curiosity around it led to picking this as the story.

Anand: And what was the next step?

Ganes: I have always seen the World Cube Association publishing these records. Their website is great. So I thought maybe we could scrape from that, and that’s when I started looking at the website and the competitions we can pick. And then I stumbled on the export feature, where they have multiple formats neatly curated that you can take and directly start the analysis.

Anand: Which was actually a big factor in deciding to go for this. Big data set. Very rich, interesting possibilities.

Ganes: So we had had some five or six ideas. This immediately shot up to the top. So after we got the idea, you kind of took over. I think after I mentioned that all these formats were available, it got you excited. So what did you do after that?

Anand: Then it became a question of what all interesting things we can find. It’s almost an exploratory data analysis, but my approach to EDA (exploratory data analysis) is: let’s formulate the hypotheses and then validate, and see if there is an interesting story behind it.

So it begins with, for instance, the speed at which records have been broken. Today, it’s at 3½ seconds. We know that. But how fast did it fall? Or: what’s the spread of solving-speed for somebody who solves it fast? Does the same person solve it really fast sometimes and really slow sometimes? Is there a movement in their average? You said, “Let’s see how much longer it takes to solve bigger cubes.” Nikhil was going to take the demographics of solvers and see how they’re spread out. There are definitely a lot of Chinese solvers in the spread. So, the thing was, let’s look at possible ideas that could lead to an interesting answer, and then validate those.

Ganes: It was almost like “What would we be interested in finding out” and not necessarily like looking at the column of data.

Anand: Yes. And that I think is important, because, from the data, there may be some ideas. But after absorbing it, knowing what’s interesting is what should drive the story.

Ganes: Right. Yeah. So that was a good starting point. We listed all of these on the board. Then, what did you do next?

Anand: Then it’s about proving these. So, we know here are some possible interesting stories, and let us explore and validate whether these are, in fact, interesting, or can be turned into something interesting. So, when I looked at the speed at which records were broken, for instance, I thought that would be an interesting story. But it wasn’t. It was just getting broken at a steadily successive pace.

But something that I did not expect emerged, which is that Yusheng Du, who holds the world record, is not the person who was there in the records consistently. In fact, Feliks Zemdegs has been the consistent winner for the last 10 years and is the only cubing champion who’s won the WCA World Championship twice. So, that was something that emerged from doing the analysis. So, that has the ability, therefore, of both proving what we’re looking to prove (or disproving), and also coming up with new stuff that we can choose to incorporate into the story.

Ganes: Almost like starting with a business hypothesis, or what, in the enterprise world, the business wants to know, and then once you get into the data, the data is revealing a few interesting insights, and then you kind of marry both. Looks just like that.

Anand: Exactly. Exactly.

Ganes: So, we identified the insights. And then, the target here was to come up with a 2-minute video. So how did you plan from insights to the video?

Anand: So, one of my cousins is a director, and she tried explaining to me the concept of a screenplay. I never really understood it, even though I’ve read a number of screenplays. So, in the last hackathon, when I was creating a (data) movie, that’s when I realised: as I started writing what I want to shoot (because it requires a whole lot of planning), I was effectively writing a screenplay.

The steps are, basically, you have to decide what are the frames or the sequences you want to shoot. So, one sequence was: we want to introduce this Rubik’s cube win. Another sequence was: we want to show how quickly different types of cubes can be solved, etc.

So, for each of these, what I do is: create a storyline that has the following structure. One: what is the message I want people to take away from that.

Ganes: The headline from there.

Anand: Exactly.

And then, in order to do that, what are the words I would narrate on top of it? That literally forms the dialogue. The third thing is, what are the visuals that prove the dialogue. That I structure in the form of a video. The fourth thing is the transition — from one video to another, or from one sequence to another, how do I flow. These are the 4 things that I captured.

When I write down the full dialogue, I speak it out, put in a timer, and then say “OK, this took 10 seconds, this took 15 seconds, this took 14 seconds” and so on.

Then comes the process of recording (the audio). Assembling the visuals, yes, but timing it and sequencing it based on the recording is pretty critical. So, actually, I wanted your voice – it’s better. And initially, I wanted you to do the recording, but because you were busy in the Dell workshop, I had to do the recording to make sure that I get the timing. Then you re-recorded post that.

That recording makes a huge difference. The audio quality on my iPhone is better than the laptop. I transfer it via Dropbox on to the system.

Ganes: Were there some issues because you have some insights and you have a certain sequence, but it may not add up to 2 minutes. Or, there might be something which will just not flow. How do you correct those issues?

Anand: I found that I consistently underestimate (the time). I thought that we only have material for 1½ minutes, but I knew at that point that invariably, because of this bloat, it will somehow add up to 2 minutes. Which is exactly what happened. It moved to 2 minutes 4 seconds.

Ganes: Yes. Exactly. Yeah.

Anand: So, once you’ve done it once or twice, that amount of correction is there. It’s in fact a whole lot easier to control a video than something as crazy as a (software) program, for instance. The estimation error in programming is much higher than this.

The good part is that post production or editing can take care of a lot of stuff. That 2-minute video can be cut to 1½ if required.

Ganes: Yeah, it can be improved, but my biggest fear is: after recording, the post production is a nightmare. It takes hours and hours of effort. A five-minute video, to post, probably takes 2 hours.

Anand: That is true.

Ganes: How do you go about it? After having these audio clippings, videos and images, how do you stitch all together into a video?

Anand: My workflow is on PowerPoint, mostly, and then on Windows Video Editor. And then you introduced iMovie into the mix.

PowerPoint makes it fairly simple. I can put in an audio in the background. I can handle the animations. It’s not a great tool at all, but it’s a tool I’m very familiar with. So, my workflow is: one slide is one shot or one headline in the storyline. Then I record the video independently or download it from YouTube, put it in the background or wherever. Create all the visuals, create the animations around it, put it there. At this point, the raw material is in. Then I insert the audio and let it play the background for that particular slide. Then I time the animation to the audio.

This is a slow process because PowerPoint doesn’t have the right tools. So I play the audio till that point and then set the animation. Then I start from the beginning again, play the audio to the next point, and then set that animation. Which takes a long duration. But once that’s sorted out, I play that full slide and it works out, I then go back and correct.

The good part is that the audio is the time keeper. I pre-recorded the audio. So I know that the entire duration is only going to be 1.8 minutes (and then towards the end we added a few more videos that took it to 2 minutes). So the audio keeps you in control, and if you synchronize everything to the audio, then it becomes easier.

Then I exported it into a video file from PowerPoint directly, and then did a little bit of post-processing, adding a background music and adding a few captions, mostly, on Windows Video Editor, and then gave it to you. Which was at around 9 o’clock or so. What did you do from 9 o’clock to 3 o’clock?

Ganes: So, the first thing — on the PowerPoint, I couldn’t believe that you’d done all this on PowerPoint. Yes, you’re taking the tool beyond the limit it was designed for.

I’ve been working with iMovie for a year, and I find it very powerful. For someone who doesn’t come from that background, it was very easy for me to pick up. I had the images and raw video footage for the different portions we were trying to introduce. I was able to split the audio that you recorded from the video, and then was able to record mine and add it. iMovie has these multiple streams you can insert and remove. I had one stream for my audio for my voice over. And there was this video which you had.

On top of that, I could overlay the pictures and other videos that I had towards the end (two videos playing side-by-side). So all of that was possible. And then I could also introduce background music at the very end. iMovie makes it very easy to move all of these things around. And even the synchronization issue which you talked about, that’s much easier to resolve in iMovie.

So, all of this finally coming together, I think, at 3 o’clock… when I had all of this, at 3 o’clock I was hunting for the background music (laughs). I was playing all kinds of clips and finally I chose one. So that’s how we got the final YouTube video.

Anand: My lesson from this is: make sure you have a team member who has a Mac!

Ganes: Right, yeah. So let’s go back and look at our video and see what we can learn from it. Thank you!

Contronyms

Contronyms are words that have two meanings that are the opposite of each other.

Sanction, for example, may mean restricting something (e.g. sanction against imports) or approving something (e.g. sanctioning imports).

Scan may mean to look at cursorily (e.g. scan a document) or look at in detail (e.g. scan an X-Ray).

Fine may mean excellent (e.g. fine wine) or average (e.g. the wine’s fine).

I enjoyed this list of 75 contronyms.

Programming Minecraft with Websockets

Minecraft lets you connect to a websocket server when you’re in a game. The server can receive and send any commands. This lets you build a bot that you can … (well, I don’t know what it can do, let’s explore.)

Minecraft has commands you can type on a chat window. For example, type / to start a command, then type setblock ~1 ~0 ~0 grass. This changes the block 1 step east of you (+1 along the X axis) into grass. (~ means relative to you. Coordinates are specified as X, Y and Z.)

Minecraft grass block

Note: These instructions were tested on Minecraft Bedrock 1.16. I haven’t tested them on the Java Edition.

Connect to Minecraft

You can send any command to Minecraft from a websocket server. Let’s use JavaScript for this.

First, run npm install ws uuid. (We need ws for websockets and uuid to generate unique IDs.)

Then create this mineserver1.js:

const WebSocket = require('ws')
const uuid = require('uuid')        // For later use

// Create a new websocket server on port 3000
console.log('Ready. On MineCraft chat, type /connect localhost:3000')
const wss = new WebSocket.Server({ port: 3000 })

// On Minecraft, when you type "/connect localhost:3000" it creates a connection
wss.on('connection', socket => {
  console.log('Connected')
})

On Minecraft > Settings > General > Profile, turn off the “Require Encrypted Websockets” setting.

Run node mineserver1.js. Then type /connect localhost:3000 in a Minecraft chat window. You’ll see 2 things:

  1. Minecraft says “Connection established to server: ws://localhost:3000”
  2. Node prints “Connected”

Now, our program is connected to Minecraft, and can send/receive messages.

Minecraft chat connect

Notes:

  • The Python equivalent is in mineserver1.py. Run python mineserver1.py.
  • If you get an Uncaught Error: Cannot find module 'ws', make sure you ran npm install ws uuid.
  • If you get an “Encrypted Session Required” error, make sure you turned off the “Require Encrypted Websockets” setting mentioned above.
  • To disconnect, run /connect off

Subscribe to chat messages

Now let’s listen to the players’ chat.

A connected websocket server can send a “subscribe” message to Minecraft saying it wants to “listen” to specific actions. For example, you can subscribe to “PlayerMessage”. Whenever a player sends a chat message, Minecraft will notify the websocket server.

Here’s how to do that. Add this code in the wss.on('connection', socket => { ... }) function.

  // Tell Minecraft to send all chat messages. Required once after Minecraft starts
  socket.send(JSON.stringify({
    "header": {
      "version": 1,                     // We're using the version 1 message protocol
      "requestId": uuid.v4(),           // A unique ID for the request
      "messageType": "commandRequest",  // This is a request ...
      "messagePurpose": "subscribe"     // ... to subscribe to ...
    },
    "body": {
      "eventName": "PlayerMessage"      // ... all player messages.
    },
  }))

Now, every time a player types something in the chat window, the socket will receive it. Add this code below the above code:

  // When MineCraft sends a message (e.g. on player chat), print it.
  socket.on('message', packet => {
    const msg = JSON.parse(packet)
    console.log(msg)
  })

This code parses all the messages it receives and prints them.

This code is in mineserver2.js. Run node mineserver2.js. Then type /connect localhost:3000 in a Minecraft chat window. Then type a message (e.g. “alpha”) in the chat window. You’ll see a message like this in the console.

{
  header: {
    messagePurpose: 'event',        // This is an event
    requestId: '00000000-0000-0000-0000-000000000000',
    version: 1                      // using version 1 message protocol
  },
  body: {
    eventName: 'PlayerMessage',
    measurements: null,
    properties: {
      AccountType: 1,
      ActiveSessionID: 'e0afde71-9a15-401b-ba38-82c64a94048d',
      AppSessionID: 'b2f5dddc-2a2d-4ec1-bf7b-578038967f9a',
      Biome: 1,                     // Plains Biome. https://minecraft.gamepedia.com/Biome
      Build: '1.16.201',            // That's my build
      BuildNum: '5131175',
      BuildPlat: 7,
      Cheevos: false,
      ClientId: 'fcaa9859-0921-348e-bc7c-1c91b72ccec1',
      CurrentNumDevices: 1,
      DeviceSessionId: 'b2f5dddc-2a2d-4ec1-bf7b-578038967f9a',
      Difficulty: 'NORMAL',         // I'm playing on normal difficulty
      Dim: 0,
      GlobalMultiplayerCorrelationId: '91967b8c-01c6-4708-8a31-f111ddaa8174',
      Message: 'alpha',             // This is the message I typed
      MessageType: 'chat',          // It's of type chat
      Mode: 1,
      NetworkType: 0,
      Plat: 'Win 10.0.19041.1',
      PlayerGameMode: 1,            // Creative. https://minecraft.gamepedia.com/Commands/gamemode
      Sender: 'Anand',              // That's me.
      Seq: 497,
      WorldFeature: 0,
      WorldSessionId: '8c9b4d3b-7118-4324-ba32-c357c709d682',
      editionType: 'win10',
      isTrial: 0,
      locale: 'en_IN',
      vrMode: false
    }
  }
}
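Most of these properties are noise for a bot. Usually we only care about who spoke and what they said. Here is a small sketch of a helper that pulls those two fields out. The name parseChat is made up for this illustration, and the sample object is a trimmed-down copy of the event above.

```javascript
// Pull the sender and text out of a PlayerMessage event. Returns null for anything else.
function parseChat(msg) {
  const body = msg.body || {}
  if (body.eventName !== 'PlayerMessage') return null
  const props = body.properties || {}
  return { sender: props.Sender, text: props.Message }
}

// A trimmed-down copy of the event printed above
const sample = {
  header: { messagePurpose: 'event', version: 1 },
  body: {
    eventName: 'PlayerMessage',
    properties: { Message: 'alpha', MessageType: 'chat', Sender: 'Anand' }
  }
}

console.log(parseChat(sample))
```

Inside socket.on('message', ...), you could call parseChat(JSON.parse(packet)) and ignore the packet when it returns null.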

Notes:

Build structures using chat

Let’s create a pyramid of size 10 around us when we type pyramid 10 in the chat window.

The first step is to check if the player sent a chat message like pyramid 10 (or another number). Add this code below the above code:

  // When MineCraft sends a message (e.g. on player chat), act on it.
  socket.on('message', packet => {
    const msg = JSON.parse(packet)
    // If this is a chat message
    if (msg.body.eventName === 'PlayerMessage') {
      // ... and it's like "pyramid 10" (or some number), draw a pyramid
      const match = msg.body.properties.Message.match(/^pyramid (\d+)/i)
      if (match)
        draw_pyramid(+match[1])
    }
  })

If the user types “pyramid 3” on the chat window, draw_pyramid(3) is called.

In draw_pyramid(), let’s send commands to build a pyramid. To send a command, we need to create a JSON with the command (e.g. setblock ~1 ~0 ~0 grass). Add this code below the above code:

  function send(cmd) {
    const msg = {
      "header": {
        "version": 1,
        "requestId": uuid.v4(),     // Send unique ID each time
        "messagePurpose": "commandRequest",
        "messageType": "commandRequest"
      },
      "body": {
        "version": 1,               // TODO: Needed?
        "commandLine": cmd,         // Define the command
        "origin": {
          "type": "player"          // Message comes from player
        }
      }
    }
    socket.send(JSON.stringify(msg))  // Send the JSON string
  }

Let’s write draw_pyramid() to create a pyramid using glowstone by adding this code below the above code:

  // Draw a pyramid of size "size" around the player.
  function draw_pyramid(size) {
    // y is the height of the pyramid. Start with y=0, and keep building up
    for (let y = 0; y < size + 1; y++) {
      // At the specified y, place blocks in a rectangle of size "side"
      let side = size - y;
      for (let x = -side; x < side + 1; x++) {
        send(`setblock ~${x} ~${y} ~${-side} glowstone`)
        send(`setblock ~${x} ~${y} ~${+side} glowstone`)
        send(`setblock ~${-side} ~${y} ~${x} glowstone`)
        send(`setblock ~${+side} ~${y} ~${x} glowstone`)
      }
    }
  }

This code is in mineserver3.js.

  • Run node mineserver3.js.
  • Then type /connect localhost:3000 in a Minecraft chat window.
  • Then type pyramid 3 in the chat window.
  • You’ll be surrounded by a glowstone pyramid.

Minecraft glowstone pyramid

Notes:

  • The Python equivalent is in mineserver3.py. Run python mineserver3.py.
  • The “requestId” needs to be a UUID — at least for block commands. I tried unique “requestId” values like 1, 2, 3 etc. That didn’t work.

Understand Minecraft’s responses

For every command you send, Minecraft sends a response. Its “header” looks like this:

{
  "header": {
    "version": 1,
    "messagePurpose": "commandResponse",                  // Response to your command
    "requestId": "97dee9a3-a716-4caa-aef9-ddbd642f2650"   // ... and your requestId
  }
}

If the command is successful, the response has body.statusCode == 0. For example:

{
  "body": {
    "statusCode": 0,                  // No error
    "statusMessage": "Block placed",  // It placed the block you wanted
    "position": { "x": 0, "y": 64, "z": 0 }   // ... at this location
  },
}

If the command failed, the response has a negative body.statusCode. For example:

{
  "body": {
    "statusCode": -2147352576,        // This is an error
    "statusMessage": "The block couldn't be placed"
  },
}

To print these, add this to socket.on('message', ...):

    // If we get a command response, print it
    if (msg.header.messagePurpose == 'commandResponse')
      console.log(msg)

This code is in mineserver4.js.

  • Run node mineserver4.js.
  • Then type /connect localhost:3000 in a Minecraft chat window.
  • Then type pyramid 3 in the chat window.
  • You’ll be surrounded by a glowstone pyramid, and the console will show every command response.

Notes on common error messages:

  • The block couldn't be placed (-2147352576): The same block was already at that location.
  • Syntax error: Unexpected "xxx": at "~0 ~9 ~-1 >>xxx<<" (-2147483648): You gave wrong arguments to the command.
  • Too many commands have been requested, wait for one to be done (-2147418109): Minecraft allows only 100 commands to be pending at a time. Beyond that, you must wait for responses before sending more.
  • More error messages here.
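If you’d rather see a short hint than a raw number in the console, a lookup table is enough. This is only a sketch: it covers just the three codes listed above, and KNOWN_ERRORS and describeResponse are names made up for this illustration.

```javascript
// The three status codes listed above, keyed by their numeric value
const KNOWN_ERRORS = {
  [-2147352576]: 'Block not placed: the same block is already there',
  [-2147483648]: 'Syntax error: wrong arguments to the command',
  [-2147418109]: 'Too many commands pending: wait for responses'
}

// Turn a response from Minecraft into a one-line description
function describeResponse(msg) {
  const code = msg.body.statusCode
  if (code === 0) return 'ok'
  // Fall back to Minecraft's own message for codes we don't know
  return `${KNOWN_ERRORS[code] || msg.body.statusMessage} (${code})`
}

console.log(describeResponse({
  body: { statusCode: -2147352576, statusMessage: "The block couldn't be placed" }
}))
```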

Wait for commands to be done

Typing “pyramid 3” works just fine. But try “pyramid 5” and your pyramid is incomplete.

Minecraft incomplete pyramid

That’s because Minecraft only allows up to 100 messages in its queue. On the 101st message, you get a Too many commands have been requested, wait for one to be done error.

{
  "header": {
    "version": 1,
    "messagePurpose": "error",
    "requestId": "a5051664-e9f4-4f9f-96b8-a56b5783117b"
  },
  "body": {
    "statusCode": -2147418109,
    "statusMessage": "Too many commands have been requested, wait for one to be done"
  }
}
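A quick count shows why “pyramid 3” fits and “pyramid 5” doesn’t. The sketch below runs the same loops as draw_pyramid(), but collects the commands in an array instead of sending them. (pyramidCommands is a name made up for this illustration.)

```javascript
// Same loops as draw_pyramid(), but collecting commands instead of sending them
function pyramidCommands(size) {
  const commands = []
  for (let y = 0; y < size + 1; y++) {
    const side = size - y
    for (let x = -side; x < side + 1; x++) {
      commands.push(`setblock ~${x} ~${y} ~${-side} glowstone`)
      commands.push(`setblock ~${x} ~${y} ~${side} glowstone`)
      commands.push(`setblock ~${-side} ~${y} ~${x} glowstone`)
      commands.push(`setblock ~${side} ~${y} ~${x} glowstone`)
    }
  }
  return commands
}

// Sizes 3, 4 and 5 produce 64, 100 and 144 commands
for (const size of [3, 4, 5])
  console.log(size, pyramidCommands(size).length)
```

Size 3 needs 64 commands and size 4 lands exactly on 100. Size 5 needs 144, so everything after the 100th command is rejected.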

So let’s modify send() to add to a queue and send in batches. We’ll create two queues:

  const sendQueue = []        // Queue of commands to be sent
  const awaitedQueue = {}     // Queue of responses awaited from Minecraft

In wss.on('connection', ...), when Minecraft completes a command, we’ll remove it from the awaitedQueue. If the command has an error, we’ll report it.

    // If we get a command response
    if (msg.header.messagePurpose == 'commandResponse') {
      // ... and it's for an awaited command
      if (msg.header.requestId in awaitedQueue) {
        // Print errors (if any)
        if (msg.body.statusCode < 0)
          console.log(awaitedQueue[msg.header.requestId].body.commandLine, msg.body.statusMessage)
        // ... and delete it from the awaited queue
        delete awaitedQueue[msg.header.requestId]
      }
    }
    // Now, we've cleared all completed commands from the awaitedQueue.

Once we’ve processed Minecraft’s response, we’ll send pending messages from sendQueue (up to 100) and add them to the awaitedQueue.

     // We can send new commands from the sendQueue -- up to a maximum of 100.
     let count = Math.min(100 - Object.keys(awaitedQueue).length, sendQueue.length)
     for (let i = 0; i < count; i++) {
       // Each time, send the first command in sendQueue, and add it to the awaitedQueue
       let command = sendQueue.shift()
       socket.send(JSON.stringify(command))
       awaitedQueue[command.header.requestId] = command
     }
     // Now we've sent as many commands as we can. Wait till the next PlayerMessage/commandResponse

Finally, in function send(), instead of socket.send(JSON.stringify(msg)), we use sendQueue.push(msg) to add the message to the queue.
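To check the queue logic without starting Minecraft, here is a sketch that simulates it. Several things are stand-ins made up for this illustration: fakeSocket only records what was sent, flush and onResponse wrap the two snippets above, and a counter replaces uuid.v4() to keep the sketch dependency-free. (Real Minecraft needs proper UUIDs.)

```javascript
const MAX_PENDING = 100       // Minecraft's limit, per the error above
const sendQueue = []          // Queue of commands to be sent
const awaitedQueue = {}       // Commands sent, response not yet received
let nextId = 0                // Stand-in for uuid.v4()

// Hypothetical stand-in for the websocket: it only records what was sent
const sent = []
const fakeSocket = { send: text => sent.push(JSON.parse(text)) }

// Queue a command instead of sending it immediately
function send(cmd) {
  sendQueue.push({ header: { requestId: `id-${nextId++}` }, body: { commandLine: cmd } })
}

// Send as many queued commands as the limit allows
function flush(socket) {
  const count = Math.min(MAX_PENDING - Object.keys(awaitedQueue).length, sendQueue.length)
  for (let i = 0; i < count; i++) {
    const command = sendQueue.shift()
    socket.send(JSON.stringify(command))
    awaitedQueue[command.header.requestId] = command
  }
}

// Pretend Minecraft answered one command, then top the pipeline up again
function onResponse(socket, requestId) {
  delete awaitedQueue[requestId]
  flush(socket)
}

// Queue 144 commands (what "pyramid 5" generates) and drive the loop by hand
for (let i = 0; i < 144; i++) send(`setblock ~${i} ~0 ~0 glowstone`)
flush(fakeSocket)
let maxPending = Object.keys(awaitedQueue).length
while (Object.keys(awaitedQueue).length > 0) {
  onResponse(fakeSocket, Object.keys(awaitedQueue)[0])
  maxPending = Math.max(maxPending, Object.keys(awaitedQueue).length)
}

// Total commands sent, and the most that were ever pending at once
console.log(sent.length, maxPending)
```

All 144 commands go out eventually, and the number awaiting a response never goes above 100.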

This code is in mineserver5.js.

  • Run node mineserver5.js.
  • Then type /connect localhost:3000 in a Minecraft chat window.
  • Then type pyramid 6 in the chat window.
  • You’ll be surrounded by a large glowstone pyramid.
  • The console will print messages like setblock ~0 ~6 ~0 glowstone The block couldn't be placed because we’re trying to place duplicate blocks.

Minecraft glowstone pyramid
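The duplicates come from the loops themselves. Each corner sits on two edges of its square, and at the top (where the side is 0) the same block is requested 4 times. One way to avoid the errors, sketched below, is to collect the commands in a Set so each is kept only once. This is not part of mineserver5.js, and uniquePyramidCommands is a name made up for this illustration.

```javascript
// Collect each setblock command once, even if the loops produce it repeatedly
function uniquePyramidCommands(size) {
  const seen = new Set()
  for (let y = 0; y <= size; y++) {
    const side = size - y
    for (let x = -side; x <= side; x++) {
      // The 4 edges of the square at this height
      for (const [dx, dz] of [[x, -side], [x, side], [-side, x], [side, x]])
        seen.add(`setblock ~${dx} ~${y} ~${dz} glowstone`)
    }
  }
  return [...seen]
}

console.log(uniquePyramidCommands(6).length)
```

For “pyramid 6”, this brings the count down from 196 commands to 169.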