How I do things

Most bookmarked pages

These are the most bookmarked pages on my site:

  1. My home page
  2. Excel tips
  3. Calvin & Hobbes quotes (I typed them all)
  4. Indian torrents (I have a search engine for Indian torrents)
  5. Tamil Transliterator (Lets you type Tamil in English)
  6. Tamil songs quiz
  7. Movie quote quiz
  8. My best links
  9. Top 10 lists

But this post is not about these links.

It’s about how I found this out.

Think about it… how could I know what pages have been bookmarked? The browser doesn’t send any information about bookmarks.

Some months ago, I moved away from Google Analytics mainly to have more control over tracking visitors. Among other things, I track referrers. When you click on any page and go to another one, the second page knows the first page you came from. That first page is the referrer.

So I know every page people clicked on to get to my site. Usually it’s Google. Sometimes it’s someone’s blog. Sometimes, it’s blank.

The blank referrers either indicate that the browser has blocked the referrer page, or that the person didn’t visit any page before mine. The former is rare (less than 1%). So, realistically, a blank referrer means the person either typed in the URL or used a bookmark.

To make sure, I did a quick survey over the weekend. Those who came to my page without a referrer saw a survey form, asking where they came from. Almost all of them had either bookmarked or typed my page. The typists went directly to my home page. All the other links you see above are bookmarks.

So all I had to do was count the number of hits for each page with blank referrers. That’s the list above.
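The counting step can be sketched in a few lines of Python. This is a minimal sketch, not the tracking code behind the list above: it assumes each hit is available as a hypothetical (page, referrer) pair, with an empty string or None standing for a blank referrer.

```python
from collections import Counter


def pages_by_blank_referrer(hits):
    """Rank pages by the number of hits that arrived without a referrer.

    `hits` is an iterable of (page, referrer) pairs. This record layout is
    an assumption made for the sketch, not a real log format.
    """
    counts = Counter(page for page, referrer in hits if not referrer)
    # most_common() returns (page, count) pairs, highest count first.
    return counts.most_common()
```

Feeding it the hits from a few weeks of logs would give a ranked list like the one at the top of this post.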

Absence of information can be a powerful indicator too.

Google custom search engine

I didn’t realise the power of Google Coop’s custom search engines (CSE) until I watched Scoble interviewing Google’s Shashi Seth. In a nutshell, CSE lets you create a search engine that focuses on specific sites, like UK blogs or Photoshop sites.

Anyone can create these. You can edit other people’s search engines too. There are a huge number of custom search engines you can volunteer to edit.

I’ve created a bunch of search engines myself:

You’ll find that the Tamil mp3s and lyrics searches are very poor. This is because Google CSE does not show results in Google’s “supplemental index” — which has most of the useful results for MP3 searches. Fortunately Google plans to add supplemental results.

You can improve these searches. Just click on the search link and click on “Volunteer to contribute to this search engine” at the bottom.

PS: I’m working on a books search engine as well, but until the supplemental index is added, there’s not much I can do with it.

Wishlist for movies

I watch a lot of movies. Over the last year, I’ve watched over 250 movies (and read 50 books, but that’s another story). Other than making time to watch movies, my biggest problem is figuring out what to watch next.

The IMDb Top 250 is a good guideline, and I’m working my way down the list. Twofifty.org has been useful to track what I’ve seen as well. But I have interests outside of the IMDb Top 250, and I need a way of tracking these.

I started a “to watch” Excel sheet. But there were three problems:

  1. I would forget what the movie was about
  2. I wouldn’t know what to watch next
  3. I’d have to manually delete movies I’d seen

So I wrote a program to do this automatically and create a Movie Wishlist. I just write the names of movies I want to see, and the program finds these movies on IMDb, gets their ratings and links them. It also goes through my “seen” movies and strikes out stuff I’ve seen.

So I can just click on the movie to see what it’s about. I can sort by rating or votes to decide what to see next. And I don’t have to manually strike out anything.

Take a look.
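The sorting and striking-out logic might look roughly like this. It is only a sketch: the IMDb lookup is replaced by a hypothetical `ratings` dictionary supplied by the caller, and the function name and row layout are made up for illustration.

```python
def build_wishlist(wanted, seen, ratings):
    """Return wishlist rows: unseen movies first, highest rating first.

    `wanted` and `seen` are lists of titles. `ratings` maps title -> rating
    and stands in for a real IMDb lookup (an assumption of this sketch).
    """
    seen_lower = {title.lower() for title in seen}
    rows = [
        {
            "title": title,
            "rating": ratings.get(title),  # None if the lookup found nothing
            "seen": title.lower() in seen_lower,
        }
        for title in wanted
    ]
    # False sorts before True, so seen movies sink to the bottom.
    rows.sort(key=lambda row: (row["seen"], -(row["rating"] or 0.0)))
    return rows
```

Rows with `seen` set to True are the ones a page would render struck out.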

My Fuji Finepix S5600

My digital camera conked off. The cover that holds the battery fell off, and I can’t use it any more.

I went back to my buying principles, and prepared an Excel sheet to choose my next camera. Here’s what I was looking for:

  • Low-light photography. Flashes are lousy. This effectively means I need ISO control.
  • Shutter speed control. I sometimes take really long exposure (3-10s) snaps, and sometimes can’t afford the blur (1/250s).
  • Long battery life. My current camera consumed batteries like crazy.
  • Fast start-up. By the time I got my earlier camera out and it started, it was too late.
  • RAW mode. Gives me more control in Photoshop.

I didn’t care about:

  • megapixels. 2 megapixels (1600×1200) is more than enough, even for my printouts. Takes too much space besides.
  • zoom. I need wide-angle more than zoom, really.
  • removable lens. I’m not going to carry around multiple lenses.

After scouting around on Amazon for many months, I found the Fuji Finepix S5600. Not an SLR, but it had all the features I wanted, at a pretty reasonable price.

Fuji Finepix S5600

Here’s a shot I took from my drawing room. This is a 3-second exposure on ISO 100 at F 3.2. The streaks on the road are car headlights.

2006-11-28 01 Newbury Park

As a bonus, it had a pretty good (10X) zoom too. See the brightly lit buildings towards the top-left? That’s Canary Wharf. Below is a blow-up of those buildings from the same spot I took the above photo from.

2006-11-29 01 Canary Wharf

Google search in Tamil

When I wrote my Tamil song lyrics quizzes, I had two problems:

  1. I can’t write in Tamil (not on paper, nor on a computer)
  2. I can’t spell right in Tamil (ந vs ன, ர vs ற)

I overcame the first using a Tamil transliterator. I write in English, and you see it in Tamil.

The problem of ந vs ன was simple. ந occurs as the first letter of a word, and just before த. Nowhere else. (Is this always true?)
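As a rule of thumb, that heuristic is easy to express in code. The sketch below works directly on Tamil text and assumes that "just before த" means the ந carries a pulli (்) and is immediately followed by த. It encodes the rule exactly as stated, exceptions and all.

```python
NA = "\u0BA8"     # ந
NNNA = "\u0BA9"   # ன
TA = "\u0BA4"     # த
PULLI = "\u0BCD"  # ் (the dot that removes the inherent vowel)


def fix_n(word):
    """Rewrite every ந/ன in `word` according to the heuristic above."""
    chars = list(word)
    for i, ch in enumerate(chars):
        if ch not in (NA, NNNA):
            continue
        at_start = i == 0
        before_ta = chars[i + 1:i + 3] == [PULLI, TA]
        chars[i] = NA if (at_start or before_ta) else NNNA
    return "".join(chars)
```

For example, a word typed with ன everywhere gets ந restored at the start and before த, while ன elsewhere is left alone.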

But ர vs ற can’t be solved except through experience, and I’m short of that. So, rather than bother my family with every quiz, I used the wisdom of crowds. I googled both spellings of the word. The correct spelling has more Google hits than the incorrect one.
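The "try both spellings and keep the popular one" idea can be sketched as follows. The hit counts have to come from somewhere, so `hit_count` is a hypothetical callable supplied by the caller (for instance, a lookup into counts gathered by hand); no real search API is used or implied.

```python
from itertools import product

RA = "\u0BB0"   # ர
RRA = "\u0BB1"  # ற


def spelling_variants(word):
    """Return every spelling of `word` with each ர/ற swapped both ways."""
    options = [(RA, RRA) if ch in (RA, RRA) else (ch,) for ch in word]
    return ["".join(combo) for combo in product(*options)]


def most_common_spelling(word, hit_count):
    """Pick the variant with the most hits, as reported by `hit_count`.

    `hit_count` is an assumed function mapping a spelling to a number.
    """
    return max(spelling_variants(word), key=hit_count)
```

A word with two such letters produces four candidates, so the number of searches grows quickly for long words.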

I did this so often, I made a Google gadget out of it.

Just type the word in English, click ‘Search’, and my gadget will search in Tamil. It’s amazing how much stuff there is in Tamil on the Web, from song lyrics to texts (thirukkuraL, for example).

You can add this gadget to:

  • your desktop (in the Search Gadgets box, type “http://www.s-anand.net/a/tamilsearchgadget.xml”)
  • your website or blog (click here for the code)
  • Google Reader. Add to Google

Here’s the transliteration table:

Tamil   English
அ       a
ஆ       A or aa
இ       i
ஈ       I or ee
உ       u
ஊ       U or oo
எ       e
ஏ       E
ஐ       ai
ஒ       o
ஓ       O
ஔ       au
க       k or g
ங       n
ச       ch or s
ஜ       j
ஞ       n
ட       t or d
ண       N
த       th or dh
ந       n
ப       p or b
ம       m
ய       y
ர       r
ல       l
வ       v
ழ       zh
ள       L
ற       R
ஷ       sh
ஸ       S
ஹ       h

Automated resume filtering

I had to screen resumes from a leading MBA school. I’m lazy, and there were hundreds of CVs. So after procrastinating until this morning, I decided on 2 principles:

  1. I will not spend more than 45 minutes on this. (That’s the duration of my train ride to office.)
  2. I will not read a single CV. (I would write a program.)

The CVs were in a single PDF file. I saved it as text (it shrank from 66MB to 1.6MB without the photos). Then I wrote a Perl program to filter CVs by keywords. We were looking for people with an interest and/or experience in IT consulting, so I picked “technology”, “consulting”, “SAP”, “IBM”, “Accenture”, “Deloitte”, etc.

Anyone without these keywords would fall out of the list. This eliminated 75% of the crowd. But since I didn’t want to read the rest, I used my favourite text-analysis technique: concordance. I extracted 3 words on either side of each keyword, and just read those. It was easy to see who’d “worked with suppliers like IBM” as opposed to who’d worked at IBM.
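A concordance of this kind is short to write. The original program was in Perl. The Python sketch below is an independent illustration that assumes single-word keywords and case-insensitive matching, neither of which the post specifies.

```python
import re

KEYWORDS = ["technology", "consulting", "SAP", "IBM", "Accenture", "Deloitte"]


def concordance(text, keywords=KEYWORDS, window=3):
    """Return each keyword hit with `window` words of context on either side."""
    words = re.findall(r"\S+", text)
    wanted = {keyword.lower() for keyword in keywords}
    snippets = []
    for i, word in enumerate(words):
        # Ignore surrounding punctuation and case when comparing.
        bare = word.strip(".,;:()\"'“”‘’").lower()
        if bare in wanted:
            start = max(0, i - window)
            snippets.append(" ".join(words[start:i + window + 1]))
    return snippets
```

Reading only the snippets is what makes "suppliers like IBM" stand out from "worked at IBM".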

That’s it! I managed to cut the list down to 10%. Better yet, I also had a preference ranking. People with multiple keywords ranked higher than those with fewer keywords. And all this took little more than my train ride to office.
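The filter-and-rank step can be sketched the same way. This assumes the text has already been split into one string per CV, keyed by a hypothetical candidate name; how the single file was actually split is not described in the post.

```python
import re


def rank_cvs(cvs, keywords):
    """Drop CVs with no keywords and rank the rest by distinct keywords matched.

    `cvs` maps a candidate name to the CV text (an assumed layout).
    Returns a list of (name, matched_keywords), best match first.
    """
    ranked = []
    for name, text in cvs.items():
        matched = [
            keyword
            for keyword in keywords
            # Whole-word match, so "SAP" does not fire on "disappear".
            if re.search(r"\b" + re.escape(keyword) + r"\b", text, re.IGNORECASE)
        ]
        if matched:
            ranked.append((name, matched))
    ranked.sort(key=lambda item: len(item[1]), reverse=True)
    return ranked
```

The same `matched` list also tells you which keywords are missing, which is all a customised rejection letter would need.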

I can see this going to the next level. It’s easy to write a customised rejection letter, depending on which keywords are missing for each person.

Now, if it’s this easy to filter resumes, I can see every organisation doing it in a few years. Which means you need to write resumes for machines as well, not just for humans! For example, on my next CV, I’ll make sure I include the words “Boston Consulting Group” as well as “BCG”, just in case the software searches for only one of those keywords. Further, I’ll make sure I avoid spelling mistakes!

Playing sounds backwards

You can play a video backwards and still recognise the scenes quite well. Can you do that with sound?

I tried it on this Bryan Adams clip of Summer of ’69 (mp3). When played backwards (mp3), it almost sounds like Arabic!

Instruments sound weird backwards too, like the guitar played backwards and drums played backwards.
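If you want to try this yourself, reversing a sound file takes only a few lines. The clips here are MP3s. The sketch below assumes an uncompressed WAV file instead, because Python’s standard `wave` module can read and write those without extra libraries.

```python
import wave


def reverse_frames(frames, frame_size):
    """Reverse raw audio by whole frames, keeping each frame's bytes in order."""
    return b"".join(
        frames[i:i + frame_size]
        for i in range(len(frames) - frame_size, -1, -frame_size)
    )


def reverse_wav(src_path, dst_path):
    """Write a copy of the WAV file at `src_path` that plays backwards."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frames = src.readframes(src.getnframes())
    # One frame holds one sample for every channel.
    frame_size = params.sampwidth * params.nchannels
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)
        dst.writeframes(reverse_frames(frames, frame_size))
```

Reversing byte by byte would scramble multi-byte samples and swap stereo channels, which is why the sketch works in whole frames.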

It seems obvious once you see the wave file. The picture below shows the guitar. The sounds are clearly not symmetric left to right.

Sound wave diagram of a guitar

Whereas this guitar is a lot more symmetric, and doesn’t sound too different backwards.

Sound wave diagram of another guitar

So how come we can’t recognise sounds played backwards, but can recognise video played backwards? (Initially, I thought it was a trivial question. But I couldn’t find a trivial answer. The question may be subtler than it looks.)

Google searches that lead to my site

I stopped using Google Analytics when I redesigned my site. I track my own statistics. This gives me access to raw data, and I can do my own analyses.

I wanted to know the keywords on Google that led to my site. (Google Analytics only gives you phrases.) I also wanted independent words. If you search for “Calvin and Hobbes”, I want to count only “Calvin”, knowing that it’s in the context of “Hobbes”.
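One way to read "keywords in context" is to count each word once per query and remember which other words appeared alongside it. The sketch below does that. The stopword list and the function name are assumptions, and it does not try to decide which word of a group (Calvin or Hobbes) should represent the group.

```python
from collections import Counter, defaultdict

STOPWORDS = {"and", "the", "of", "in", "to", "a", "for"}  # assumed, not from the post


def keywords_in_context(queries):
    """Count keywords across search queries and record their companion words."""
    keyword_counts = Counter()
    context = defaultdict(Counter)
    for query in queries:
        words = {w for w in query.lower().split() if w not in STOPWORDS}
        for word in words:
            keyword_counts[word] += 1
            for other in words - {word}:
                context[word][other] += 1
    return keyword_counts, context
```

Sorting `keyword_counts` gives the ranking, and `context[word].most_common(3)` gives the "in the context of" words for each entry.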

So I did this analysis. Here are the keywords that lead to my site. (This is based on 3 weeks of data).

  1. excel in the context of cell, formula, function, leading to my Excel tips. People mostly want to know how to remove errors like #N/A.
  2. calvin in the context of hobbes, fight, club. (There was a great article on how Fight Club is really Calvin and Hobbes.) Most of these queries are searches for specific quotes, and I’ve typed out all the Calvin and Hobbes quotes.
  3. indian in the context of torrents, tv. One of my most popular posts is Indian Torrents. I simply linked to a couple of Google searches, so its popularity is unjustified.
  4. tamil in the context of songs, lyrics, movie. This is mostly thanks to the recent Tamil quizzes I’ve put up.
  5. mumbai in the context of local, schedule, train. A shockingly large number of people search for Mumbai bus and train schedules, landing on my link to the IIT-B Mumbai Navigator.
  6. anand in the context of s anand, bcg, infosys. This is people searching for me.
  7. irr in the context of calculating, excel, formula. Calculating IRR turned out to be another unexpectedly popular post.
  8. interview in the context of lehman brothers, bcg, landing at some of my interview experiences.
  9. mckinsey in the context of ppt, presentation. Most of these people are looking for presentations, while I have a link to the McKinsey pre-placement talk at LBS. Interesting that BCG is not in the top 10.
  10. google in the context of engedu, types, authors@google. Though I have several posts about Google, the ones about Google video like Meet the author and on Google TechTalks are the most popular.

Having read the actual queries, I’ve concluded that only the keywords excel, mumbai, anand, irr and interview definitely lead to relevant hits. The rest are debatable. Maybe I should reduce the importance of the less relevant posts on my sitemaps file.

Experiments in sound

Wikipedia says the human voice frequency for speech is between 85 and 155 Hz for men, and between 165 and 255 Hz for women. That set me thinking.

  1. What is the limit to our hearing?
  2. How do sounds differ?
  3. How can we synthesise speech?

What are the limits to our hearing?

Kids can hear frequencies from 20 Hz to 20 kHz, while adults hear only up to 12-14 kHz (Frequency Range of Human Hearing).

To check the lower frequency limit, I created an MP3 with sounds from 1 Hz to 100 Hz at 1 second intervals. Just play the sound, and see when you start hearing something. (Of course, whether you can hear something also depends on the volume of your speaker, the ambient noise, etc.) I could hear nothing for the first 40 seconds: so I can’t hear frequencies lower than 40 Hz.

PS: Don’t be worried if you don’t hear anything for a while. You’re not supposed to! Keep the volume at full level, though.

To check the upper frequency limit, I created this MP3 with sounds from 1 kHz to 20 kHz in 1 second intervals. Just play the sound, and see when you stop hearing anything. I couldn’t hear anything beyond 14 seconds: so I can’t hear frequencies beyond 14 kHz.
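Test files like these two are easy to generate. The sketch below writes a sequence of pure sine tones, one per step, to a WAV file. The files in this post are MP3s, so the WAV output, the 44.1 kHz sample rate and the amplitude are all assumptions of the sketch.

```python
import math
import struct
import wave


def write_stepped_tones(path, frequencies_hz, seconds_each=1.0,
                        sample_rate=44100, amplitude=0.8):
    """Write one sine tone per frequency, back to back, as 16-bit mono WAV."""
    samples_per_tone = int(seconds_each * sample_rate)
    with wave.open(path, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)  # 2 bytes = 16-bit samples
        out.setframerate(sample_rate)
        for freq in frequencies_hz:
            frames = bytearray()
            for n in range(samples_per_tone):
                value = amplitude * math.sin(2 * math.pi * freq * n / sample_rate)
                frames += struct.pack("<h", int(value * 32767))
            out.writeframes(bytes(frames))
```

With these defaults, `write_stepped_tones("low.wav", range(1, 101))` would cover 1 Hz to 100 Hz and `write_stepped_tones("high.wav", range(1000, 21000, 1000))` would cover 1 kHz to 20 kHz. The sample rate has to stay above twice the highest frequency, which 44.1 kHz does for a 20 kHz tone.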

How do sounds differ?

I took this audio file of someone reciting vowels and plotted a spectrogram (below). A spectrogram plots time on the X-axis and frequency on the Y-axis.

Vowels spectrogram

Some observations:

  • All the vowels have evenly spaced bars. (In this case, they’re all multiples of something around 120 Hz.)
  • ‘u’ has the lowest frequency mix. ‘a’ spans from low to high. ‘i’ has a bit of low and a bit of high, nothing in the middle. ‘ai’ and ‘au’ look like ‘a’ followed by ‘i’ and ‘u’ respectively.
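Evenly spaced bars are what you would expect from a voiced sound: a fundamental frequency plus its harmonics at integer multiples. The sketch below illustrates this with a synthetic signal (not the recording used above) built from a 120 Hz fundamental and two harmonics, and a deliberately naive Fourier transform that finds the strongest frequencies.

```python
import cmath
import math


def harmonic_signal(fundamental_hz, harmonics, sample_rate, n_samples):
    """Sum of sine waves at 1x, 2x, ... `harmonics`x the fundamental."""
    return [
        sum(
            math.sin(2 * math.pi * fundamental_hz * h * t / sample_rate)
            for h in range(1, harmonics + 1)
        )
        for t in range(n_samples)
    ]


def peak_frequencies(samples, sample_rate, threshold=0.5):
    """Return frequencies whose magnitude is at least `threshold` x the maximum.

    Uses a plain O(n^2) discrete Fourier transform, fine for short signals.
    """
    n = len(samples)
    magnitudes = []
    for k in range(n // 2):
        total = sum(
            samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        magnitudes.append(abs(total))
    cutoff = threshold * max(magnitudes)
    return [k * sample_rate / n for k, m in enumerate(magnitudes) if m >= cutoff]
```

Calling `peak_frequencies(harmonic_signal(120, 3, 2400, 200), 2400)` picks out 120, 240 and 360 Hz, the same even spacing as the bars in the spectrogram. A real spectrogram repeats this kind of analysis over short, successive slices of the recording.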

How can we synthesise speech?

I don’t know. There are lots of speech synthesizers. They sound robotic. I’m trying to see if knowing what sounds look like improves things. I’ll let you know if I do well.

Link to a Google search rather than a site

When you make a link, there’s no guarantee that the link will work 5 years later. Sites change their URL structure. I’m finding that many of the links in my blog entries from 2000 no longer work.

Sometimes you want to link to a concept rather than a site. In such cases, it’s better to link to a Google query.

For example, rather than link to a site that defines SVG, I could link to the Google search define:SVG.

Rather than link to a tutorial on Excel array formulas, I could link to the Google search excel array formulas. I could even link to the first hit on Google for excel array formulas, mimicking the “I’m feeling lucky” button. This may change over time, but 5 years from now, it’ll still point to the most relevant link.

To link to the Google query for “excel array formulas”, just link to the URL http://www.google.com/search?q=excel+array+formulas. To link directly to the first result, add &btnI=I'm+Feeling+Lucky to the URL. (Linking to A9 is simpler: http://a9.com/excel+array+formulas)
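If you build such links in a script, it is safer to let a library do the URL encoding. A minimal sketch using Python’s standard library follows. The `q` and `btnI` parameters are the ones given above, the function name is made up, and the sketch makes no attempt to check how Google currently treats `btnI`.

```python
from urllib.parse import urlencode


def google_search_url(query, lucky=False):
    """Build a Google search URL, optionally in "I'm Feeling Lucky" form."""
    params = {"q": query}
    if lucky:
        params["btnI"] = "I'm Feeling Lucky"
    # urlencode turns spaces into "+" and escapes characters like the apostrophe.
    return "http://www.google.com/search?" + urlencode(params)
```

Note that the apostrophe in the button label comes out percent-encoded as %27, which is equivalent to the literal form shown above.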

PS: An alternative is to link to a permanent copy of the page from the Wayback Machine (it has copies of my page all the way from May 2001 to Mar 2005). You can’t rely on Google’s cache for this: when the site changes, the cache soon changes too. But the cache is a good defence against site downtime. Doing this manually is a lot of effort. Ideally, future browsers will automatically take you to the Wayback Machine or the Google cache. (The Firefox plugins ErrorZilla and CacheIt come close.)