How I do things

Playing sounds backwards

You can play a video backwards and still recognise the scenes quite well. Can you do that with sound?

I tried it on this Bryan Adams clip of Summer of ’69 (mp3). When played backwards (mp3), it almost sounds like Arabic!

Instruments sound weird backwards too, like the guitar played backwards and drums played backwards.

It seems obvious once you see the wave file. The picture below shows the guitar. The sounds are clearly not symmetric left to right.

Sound wave diagram of a guitar

Whereas this guitar is a lot more symmetric, and doesn’t sound too different backwards.

Sound wave diagram of another guitar

So how come we can’t recognise sounds played backwards, but can recognise video played backwards? (Initially, I thought it was a trivial question. But I couldn’t find a trivial answer. The question may be subtler than it looks.)
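For anyone who wants to repeat the experiment, here is a minimal sketch of reversing a clip. It assumes the clip has first been converted to an uncompressed WAV file (the clips above are MP3s), and the function names are made up for illustration. It only uses Python's standard wave module.

```python
import wave


def reverse_frames(frames: bytes, frame_size: int) -> bytes:
    """Reverse the order of whole audio frames, keeping each frame's bytes intact."""
    chunks = [frames[i:i + frame_size] for i in range(0, len(frames), frame_size)]
    return b"".join(reversed(chunks))


def reverse_wav(src_path: str, dst_path: str) -> None:
    """Write a copy of an uncompressed WAV file that plays backwards."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frames = src.readframes(src.getnframes())
    # One frame holds one sample for every channel.
    frame_size = params.sampwidth * params.nchannels
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)
        dst.writeframes(reverse_frames(frames, frame_size))
```

Reversing whole frames rather than single bytes matters: a 16-bit stereo frame is four bytes, and flipping the bytes inside it would produce noise instead of a backwards clip.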

Google searches that lead to my site

I stopped using Google Analytics when I redesigned my site. I track my own statistics. This gives me access to raw data, and I can do my own analyses.

I wanted to know the keywords on Google that led to my site. (Google Analytics only gives you phrases.) I also wanted independent words. If you search for “Calvin and Hobbes”, I want to count only “Calvin”, knowing that it’s in the context of “Hobbes”.

So I did this analysis. Here are the keywords that lead to my site. (This is based on 3 weeks of data.)

  1. excel in the context of cell, formula, function, leading to my Excel tips. People mostly want to know how to remove errors like #N/A.
  2. calvin in the context of hobbes, fight, club. (There was a great article on how Fight Club is really Calvin and Hobbes.) Most of these queries are searches for specific quotes, and I’ve typed out all the Calvin and Hobbes quotes.
  3. indian in the context of torrents, tv. One of my most popular posts is Indian Torrents. I simply linked to a couple of Google searches, so its popularity is unjustified.
  4. tamil in the context of songs, lyrics, movie. This is mostly thanks to the recent Tamil quizzes I’ve put up.
  5. mumbai in the context of local, schedule, train. A shockingly large number of people search for Mumbai bus and train schedule, landing on my link to the IIT-B Mumbai Navigator.
  6. anand in the context of s anand, bcg, infosys. This is people searching for me.
  7. irr in the context of calculating, excel, formula. Calculating IRR turned out to be another unexpectedly popular post.
  8. interview in the context of lehman brothers, bcg, landing at some of my interview experiences.
  9. mckinsey in the context of ppt, presentation. Most of these people are looking for presentations, while I have a link to the McKinsey pre-placement talk at LBS. Interesting that BCG is not in the top 10.
  10. google in the context of engedu, types, authors@google. Though I have several posts about Google, the ones about Google video like Meet the author and on Google TechTalks are the most popular.

Having read the actual queries, I’ve concluded that only the keywords excel, mumbai, anand, irr and interview definitely lead to relevant hits. The rest are debatable. Maybe I should reduce the importance of the less relevant posts in my sitemaps file.
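One way to get "a keyword in the context of other words" is sketched below. The rule it uses is an assumption, not necessarily the method behind the list above: each query is filed under whichever of its words appears in the most queries overall, and the remaining words are tallied as that keyword's context. The function name and the sample queries are made up.

```python
from collections import Counter, defaultdict


def keyword_contexts(queries):
    """Group search queries under one keyword each and tally the context words."""
    tokenised = [q.lower().split() for q in queries]
    # Number of queries each word appears in.
    word_counts = Counter(word for words in tokenised for word in set(words))

    hits = Counter()
    contexts = defaultdict(Counter)
    for words in tokenised:
        if not words:
            continue
        # Assumed rule: the word seen in the most queries is this query's keyword.
        keyword = max(words, key=lambda w: word_counts[w])
        hits[keyword] += 1
        contexts[keyword].update(w for w in set(words) if w != keyword)
    return hits, contexts


hits, contexts = keyword_contexts([
    "calvin and hobbes",
    "calvin hobbes quotes",
    "calvin fight club",
    "excel formula",
])
```

Here "calvin" collects all three Calvin queries, with "hobbes" as its most frequent context word. A real run would probably also want to drop filler words such as "and" before counting.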

Experiments in sound

Wikipedia says the frequency of the human voice in speech ranges from 85 to 155 Hz for men, and from 165 to 255 Hz for women. That set me thinking.

  1. What are the limits to our hearing?
  2. How do sounds differ?
  3. How can we synthesise speech?

What are the limits to our hearing?

Kids can hear frequencies from 20 Hz to 20 kHz, while adults hear only up to 12-14 kHz (Frequency Range of Human Hearing).

To check the lower frequency limit, I created an MP3 with sounds from 1 Hz to 100 Hz at 1 second intervals. Just play the sound, and see when you start hearing something. (Of course, whether you can hear something also depends on the volume of your speaker, the ambient noise, etc.) I could hear nothing for the first 40 seconds: so I can’t hear frequencies lower than 40 Hz.

PS: Don’t be worried if you don’t hear anything for a while. You’re not supposed to! Keep the volume at full level, though.

To check the upper frequency limit, I created this MP3 with sounds from 1 kHz to 20 kHz in 1 second intervals. Just play the sound, and see when you stop hearing anything. I couldn’t hear anything beyond 14 seconds: so I can’t hear frequencies beyond 14 kHz.
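Test files like these two can be generated with a few lines of code. The sketch below is one possible way, not necessarily how the MP3s above were made: it writes uncompressed WAV rather than MP3, and the function names, sample rate and amplitude are all assumptions.

```python
import math
import struct
import wave


def stepped_tones(frequencies, seconds_each=1.0, rate=44100, amplitude=0.8):
    """Return 16-bit mono PCM bytes: one sine tone per frequency, played in sequence."""
    samples_per_tone = int(rate * seconds_each)
    peak = int(amplitude * 32767)
    pcm = bytearray()
    for freq in frequencies:
        for n in range(samples_per_tone):
            value = int(peak * math.sin(2 * math.pi * freq * n / rate))
            pcm += struct.pack("<h", value)
    return bytes(pcm)


def write_wav(path, pcm, rate=44100):
    """Save 16-bit mono PCM bytes as a WAV file."""
    with wave.open(path, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(rate)
        out.writeframes(pcm)
```

Calling write_wav("low.wav", stepped_tones(range(1, 101))) would give the 1 Hz to 100 Hz file, and stepped_tones(range(1000, 21000, 1000)) the 1 kHz to 20 kHz one. Each tone restarts at zero phase, so there may be a faint click at every one-second boundary. Pure Python takes a few seconds to generate 100 seconds of audio.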

How do sounds differ?

I took this audio file of someone reciting vowels and plotted a spectrogram (below). A spectrogram plots time on the X-axis and frequency on the Y-axis, and shows how strong each frequency is at each moment.

Vowels spectrogram

Some observations:

  • All the vowels have evenly spaced bars. (In this case, they’re all multiples of something around 120 Hz.)
  • ‘u’ has the lowest frequency mix. ‘a’ spans from low to high. ‘i’ has a bit of low and a bit of high, nothing in the middle. ‘ai’ and ‘au’ look like ‘a’ followed by ‘i’ and ‘u’ respectively.
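For the curious, a spectrogram is just a series of Fourier transforms taken over short, consecutive slices of the sound. The sketch below is a deliberately naive pure-Python version (slow, but with no dependencies). The function name, the Hann window and the frame sizes are my choices for illustration, not what the plot above was made with.

```python
import cmath
import math


def spectrogram(samples, frame_size, hop):
    """Return one list of magnitudes per frame.

    Index k of each list corresponds to k * rate / frame_size Hz,
    where rate is the sampling rate of the samples.
    """
    # A Hann window reduces leakage between neighbouring frequency bins.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size)
              for n in range(frame_size)]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_size)]
        magnitudes = []
        for k in range(frame_size // 2):
            # Naive discrete Fourier transform of this frame at bin k.
            total = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n in range(frame_size)
            )
            magnitudes.append(abs(total))
        frames.append(magnitudes)
    return frames
```

Feeding it a tone made of 120 Hz plus its multiples produces evenly spaced peaks, much like the bars in the vowel plot: a voiced sound is a fundamental pitch plus its harmonics.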

How can we synthesise speech?

I don’t know. There are lots of speech synthesizers. They sound robotic. I’m trying to see if knowing what sounds look like improves things. I’ll let you know if I do well.

Link to a Google search rather than a site

When you make a link, there’s no guarantee that the link will work 5 years later. Sites change their URL structure. I’m finding that many of the links in my blog entries from 2000 no longer work.

Sometimes you want to link to a concept rather than a site. In such cases, it’s better to link to a Google query.

For example, rather than link to a site that defines SVG, I could link to the Google search define:SVG.

Rather than link to a tutorial on Excel array formulas, I could link to the Google search excel array formulas. I could even link to the first hit on Google for excel array formulas, mimicking the “I’m feeling lucky” button. This may change over time, but 5 years from now, it’ll still point to the most relevant link.

To link to the Google query for “excel array formulas”, just link to the URL http://www.google.com/search?q=excel+array+formulas. To link directly to the first result, add &btnI=I'm+Feeling+Lucky to the URL. (Linking to A9 is simpler: http://a9.com/excel+array+formulas)
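If you generate such links in bulk, the query needs to be URL-encoded. Here is a small sketch that builds the URLs described above. The function names are made up, and whether these URL patterns keep working is up to Google and A9.

```python
from urllib.parse import quote_plus, urlencode


def google_search_url(query, lucky=False):
    """Build a Google search link, or an "I'm Feeling Lucky" link if lucky is True."""
    params = {"q": query}
    if lucky:
        params["btnI"] = "I'm Feeling Lucky"
    return "http://www.google.com/search?" + urlencode(params)


def a9_search_url(query):
    """Build an A9 search link, which puts the query straight into the path."""
    return "http://a9.com/" + quote_plus(query)


print(google_search_url("excel array formulas"))
print(google_search_url("excel array formulas", lucky=True))
print(a9_search_url("excel array formulas"))
```

The encoder writes the apostrophe in "I'm" as %27, which is equivalent to the hand-typed form above.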

PS: An alternative is to link to a permanent copy of the page from the Wayback Machine (it has copies of my page all the way from May 2001 to Mar 2005). You can’t use Google’s cache for this: when the site changes, the cache will soon change too. But the cache is a good defence against site downtime. Doing all this manually is a lot of effort. Ideally, future browsers will automatically take you to the Wayback Machine or the Google cache. (The Firefox plugins ErrorZilla and CacheIt come close.)

Making a Tamil transliterator

I’ve built a simple Tamil transliterator. You can type in words in English and it will spell them out in Tamil. You can copy-paste the Tamil above into Microsoft Word, etc.

You may need to turn on Tamil scripts to see the Tamil fonts above. If you have Windows 98, it may not work well. If you’ve visited this page recently, you will need to refresh this page as well (press F5).

Browse through my Javascript to see how it works. Feel free to reuse.

I’ve also made a Google Gadget that searches Google in Tamil using this tool.

Here’s what to type:

Tamil English
a
A or aa
i
I or ee
u
U or oo
e
E
ai
o
O
au
k or g
n
ch or s
j
n
t or d
N
th or dh
n
p or b
m
y
r
l
v
zh
L
R
sh
S
h

I also have a gadget that lets you search in Tamil.
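To give a flavour of how such a transliterator can work, here is a rough sketch in Python (the real tool is in Javascript, and this is not that code). Only the English keys come from the table above. The Tamil letters they map to are my assumption, based on standard Tamil alphabetical order, and plain "n" is mapped to a single letter here for simplicity even though the table lists it three times.

```python
# ASSUMPTION: the Tamil letters below are inferred, not copied from the original tool.
# Each vowel maps to (standalone letter, sign used after a consonant).
VOWELS = {
    "a": ("அ", ""), "A": ("ஆ", "\u0bbe"), "aa": ("ஆ", "\u0bbe"),
    "i": ("இ", "\u0bbf"), "I": ("ஈ", "\u0bc0"), "ee": ("ஈ", "\u0bc0"),
    "u": ("உ", "\u0bc1"), "U": ("ஊ", "\u0bc2"), "oo": ("ஊ", "\u0bc2"),
    "e": ("எ", "\u0bc6"), "E": ("ஏ", "\u0bc7"), "ai": ("ஐ", "\u0bc8"),
    "o": ("ஒ", "\u0bca"), "O": ("ஓ", "\u0bcb"), "au": ("ஔ", "\u0bcc"),
}
CONSONANTS = {
    "k": "க", "g": "க", "ch": "ச", "s": "ச", "j": "ஜ",
    "t": "ட", "d": "ட", "N": "ண", "th": "த", "dh": "த",
    "n": "ந", "p": "ப", "b": "ப", "m": "ம", "y": "ய",
    "r": "ர", "l": "ல", "v": "வ", "zh": "ழ", "L": "ள",
    "R": "ற", "sh": "ஷ", "S": "ஸ", "h": "ஹ",
}
PULLI = "\u0bcd"  # dot that removes a consonant's built-in "a" sound


def transliterate(text):
    """Convert Latin-keyed text to Tamil script using greedy longest-match."""
    out = []
    pending = False  # True while the last consonant is still waiting for a vowel
    i = 0
    while i < len(text):
        # Try two-letter keys such as "th" before single letters.
        token = next(
            (text[i:i + size] for size in (2, 1)
             if text[i:i + size] in VOWELS or text[i:i + size] in CONSONANTS),
            None,
        )
        if token is None:
            # Unknown character (space, digit, ...): close any consonant and copy it.
            if pending:
                out.append(PULLI)
            out.append(text[i])
            pending = False
            i += 1
            continue
        if token in VOWELS:
            letter, sign = VOWELS[token]
            out.append(sign if pending else letter)
            pending = False
        else:
            if pending:
                out.append(PULLI)
            out.append(CONSONANTS[token])
            pending = True
        i += len(token)
    if pending:
        out.append(PULLI)
    return "".join(out)
```

With these assumed mappings, transliterate("thamizh") gives தமிழ். The key idea is that a Tamil consonant carries an "a" sound by default, so the code either attaches a vowel sign or adds the dot (pulli) to silence it.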

Statistically improbable phrases

Calvin and Hobbes has some recurrent themes, like Hobbes pouncing, snow art, polls, letters to Santa, …

Over the last 5 years, I’ve transcribed the Calvin and Hobbes comics, and tagged them manually by theme. But can I generate themes automatically?

One way is to use Amazon’s statistically improbable phrases. It’s a list of words that occur a lot in a book, but rarely occur in others. It gives you a good feel of what topics the book is about.

Here’s how I did it:

  1. Transcribe Calvin & Hobbes. This is 99% of the work.
  2. Make a C&H word list. Just join all the words in Calvin and Hobbes. (Be careful about punctuation, and colloquialisms like “dunno”, “leggo”, etc.)
  3. Get an English corpus. That is, get a big list of words in normally occurring text. I have some e-books, and I picked 23 megabytes worth of these as my corpus.
  4. Compare the word frequency in C&H with the corpus. That is, compare the % of occurrences of a word in Calvin and Hobbes versus the corpus.
  5. Display those with significantly higher frequency in C&H.
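Steps 2 to 5 above can be sketched in a few lines. This is an illustration rather than the exact script used here: the function names, the tokenising rule, the minimum count and the add-one smoothing for words missing from the corpus are all assumptions.

```python
import re
from collections import Counter


def word_counts(text):
    """Count lower-cased words, keeping apostrophes and hyphens inside a word."""
    return Counter(re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower()))


def improbable_words(text, corpus, min_ratio=10.0, min_count=2):
    """Return (word, ratio) pairs for words far more frequent in text than in corpus."""
    text_counts, corpus_counts = word_counts(text), word_counts(corpus)
    text_total = sum(text_counts.values())
    corpus_total = sum(corpus_counts.values())
    results = []
    for word, count in text_counts.items():
        if count < min_count:
            continue
        text_share = count / text_total
        # Add-one smoothing (an assumption) so unseen corpus words do not divide by zero.
        corpus_share = (corpus_counts[word] + 1) / (corpus_total + 1)
        ratio = text_share / corpus_share
        if ratio >= min_ratio:
            results.append((word, ratio))
    return sorted(results, key=lambda pair: pair[1], reverse=True)
```

The min_count cut-off is there because a word that appears only once can look wildly improbable by accident.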

The list below has common Calvin & Hobbes words occurring 10 times as often as in normal text. It’s incredible how closely it relates to most of the themes.

(Big words occur more often. Dark words are more improbable.)


allowance assignment babe balloon bat bath beanie bedtime bee beep bet bike blaster boring bug bus butter calvin calvinball cartoon cent cereal cheat chew chocolate click comic cookie crunch dad dame derkins dictator-for-life dinosaur disgusting doll doomed dumb duplicate earthling explorer fang fearless ferocious flip flush frog frosted fun fuzzy genius goggle goodness goon grade gross grown-up gum hack hamburger hamster hate hero hideous hobbes homework huey insect invent jelly jerk jurassic kid leaf loot martian math mild-mannered mom monster moron motto munch mushy nickel oatmeal ouija pant peanut perspective pit playground poll porridge poster quiz recess rosalyn rotten rub sandwich santa scary sculpture scum shovel
sissy sitter sled slimy slug slushball sniff snow snowball snowman soak spaceman spiff splash spoil sport squirt steer sting stuffed stupendous sugar susie tickle tiger toy transmogrifier transmogrify tub tuna twinky tyrannosaur underwear vacation weird wham whiff worm wormwood


Summary: “Statistically improbable phrases” are a powerful tool for text analysis. You can apply them to any content and figure out what topics it talks about.

Update: Technically, these are “Statistically improbable WORDS”, not phrases. So I re-did this analysis using phrases instead of words.

How I use Google Spreadsheets

I work across multiple computers (my office laptop, home laptop, client desktop) on a daily basis.

I used to transfer data across these by e-mailing it to myself before I travelled. (I often forgot to do so.) Mostly, these are notes, like telephone numbers, things to buy, places to visit, etc.

Google Notebook solves the problem. But not entirely. I store a lot of my notes on spreadsheets, as lists. For example:

  • Gadgets to buy (and accompanying research)
  • Movies I’ve seen
  • Books to read (and which library they’re available from)
  • To do lists

That’s what I use Google Spreadsheets for — to share lists with myself, across computers.

Demographics prediction from online behaviour

Microsoft adCenter Labs has a demographics prediction engine. Based on a person’s search queries and web sites visited, it can predict their gender and age.

So I tried that on parts of the body, to see what men were interested in vs women.

topic male female
hair 25% 75%
eyes 33% 67%
cheek 33% 67%
hands 33% 67%
lips 36% 64%
ears 39% 61%
fingers 40% 60%
forehead 42% 58%
nose 43% 57%
neck 46% 54%
beard 55% 45%
moustache 58% 42%
leg 60% 40%
palm 61% 39%
toe 64% 36%

While I can understand men being more interested in beards and moustaches (perhaps even legs), why are they far more interested in toes than women?

Cut-and-paste is not understanding

Cut and paste has become easier. So we make less effort to understand. We don’t need to. It’s like paying less attention to a lecture because we’re recording it.

Solution? I suggest the Tunnel in the Sky strategy. Rod Walker is going for survival training on an alien planet, and asks his sister, Captain Walker…

“Uh, Sis, what sort of gun should I carry?”

“Huh? Why the deuce do you want a gun?”

“Why, for things I might run into of course.”

“Your only purpose is to stay alive. Not to be brave, not to fight. One time in a hundred a gun might save your life; the other ninety-nine it will tempt you into folly.”

“Did you take a gun on your solo test?”

“I did. And I lost it the first day. Which saved my life. I know how good a gun makes you feel. You’re ready for anything and hoping you’ll find it. Which is exactly what is dangerous about it – because you aren’t anything of that sort.”

So, don’t take a gun.

Don’t record lectures. Don’t give yourself the illusion of perfect memory.

Don’t bookmark for future reading. You won’t read it later.

Don’t cut and paste. You don’t understand it now. You won’t understand it later.