Year: 2006

Visualisation of data

I have managed to fill hard disks of all capacities within a few months. My first PC had 10MB of disk space, while I work on 140GB today (remember: that’s 14 thousand times more capacity in 14 years). Both were filled within 2 months. (An aside: the number of files / folders hasn’t grown 14,000-fold. The files themselves have grown in size. I have roughly the same number of files/folders today on my machine as I had 14 years ago.)

To regain space, I used to go through every file and delete the unnecessary ones. My favourite tool was the UNIX utility du (Disk Usage). It lists the disk space used by every subdirectory. I would sort the result and find big, useless stuff. Here are the first few lines of a sorted du output:

1342507 ./Books
1188020 ./Non-Fiction
1047607 ./Comics
842832 ./Non-Fiction.Magazines
594939 ./Audio
298737 ./Books/kokona – Business
172166 ./Books/Terry Pratchett
164246 ./Books/Terry Pratchett/Discworld
162287 ./Calvin
142274 ./Books/S
77407 ./Scripts
74858 ./Science

It would take 5 minutes to create the list, and 15 minutes to read.
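
For readers without du at hand, the same kind of listing can be approximated in a few lines of Python. This is only a rough sketch under stated assumptions: the function names `dir_sizes` and `largest` are made up for illustration, and it adds up file sizes in bytes, whereas du reports the disk blocks actually allocated, so the numbers will not match du exactly.

```python
import os

def dir_sizes(root):
    """Return {directory path: total bytes of all files beneath it}."""
    sizes = {}
    # Walk bottom-up so every subdirectory is totalled before its parent.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        total = 0
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if not os.path.islink(path):  # do not count symlink targets
                    total += os.path.getsize(path)
            except OSError:
                pass  # file vanished or is unreadable: skip it
        for name in dirnames:
            # Symlinked directories are not walked, so they contribute 0.
            total += sizes.get(os.path.join(dirpath, name), 0)
        sizes[dirpath] = total
    return sizes

def largest(root, n=12):
    """The n biggest directories under root, largest first."""
    ranked = sorted(dir_sizes(root).items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:n]
```

Calling `largest(".")` gives (path, bytes) pairs in the same largest-first order as the sorted listing above.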

Nowadays I use WinDirStat, which shows every file and folder in an intuitive, graphical manner.

Treemap from WinDirStat

This view is called a Treemap. Each small block is a file. Bigger blocks are folders. Colours indicate the type of file (MP3s are blue, AVIs are red, WMVs are yellow, JPGs are green, etc.). This view has many advantages:

  • I can see the relative sizes of files and folders.
  • I can get an idea of the % of free space (grey block).
  • I can see what type of files occupy the most space.
  • etc. etc.

But the most important thing is, I see the useful stuff at a single glance.

That’s the key in visualisation: conveying a complex topic so people get it in a second.
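
To make the idea concrete, here is a minimal sketch of the simplest treemap layout, usually called slice-and-dice: each folder's rectangle is cut into strips whose areas are proportional to the sizes of its children, alternating between vertical and horizontal cuts at each level. It is an illustration only and not necessarily the layout WinDirStat uses; the tuple format and the names `size_of` and `slice_and_dice` are assumptions made for this example.

```python
def size_of(node):
    """A file's own size, or the sum of a folder's children."""
    _name, content = node
    if isinstance(content, list):
        return sum(size_of(child) for child in content)
    return content

def slice_and_dice(node, x, y, w, h, depth=0, out=None):
    """Lay out node inside the rectangle (x, y, w, h).

    A node is (name, size) for a file or (name, [children]) for a folder.
    Returns a list of (name, x, y, w, h) rectangles, parents first.
    """
    if out is None:
        out = []
    name, content = node
    out.append((name, x, y, w, h))
    if isinstance(content, list):
        total = size_of(node)
        offset = 0.0  # fraction of the parent already handed out
        for child in content:
            frac = size_of(child) / total if total else 0.0
            if depth % 2 == 0:
                # Even depth: children sit side by side, left to right.
                slice_and_dice(child, x + offset * w, y, frac * w, h, depth + 1, out)
            else:
                # Odd depth: children are stacked top to bottom.
                slice_and_dice(child, x, y + offset * h, w, frac * h, depth + 1, out)
            offset += frac
    return out

# A tiny made-up tree: one large file and a folder with two small files.
tree = ("root", [("a.mp3", 2), ("docs", [("b.txt", 1), ("c.txt", 1)])])
rects = slice_and_dice(tree, 0, 0, 100, 50)
```

Each file ends up with an area proportional to its size, which is why the relative sizes can be read off at a glance. Slice-and-dice tends to produce long thin strips; real tools typically use layouts that keep blocks closer to square.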

(Incidentally, Google has a TechTalk on visualisation, including treemaps.)

Google searches that lead to my site

I stopped using Google Analytics when I redesigned my site. I track my own statistics. This gives me access to raw data, and I can do my own analyses.

I wanted to know the keywords on Google that led to my site. (Google Analytics only gives you phrases.) I also wanted independent words. If you search for “Calvin and Hobbes”, I want to count only “Calvin”, knowing that it’s in the context of “Hobbes”.
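
One possible reading of "a keyword in the context of other words" can be sketched as follows. This is a guess at the idea, not the analysis actually used: each query is credited to its single most common word (common across all queries), and the remaining words are recorded as that keyword's context. The stop-word list, the tie-breaking rule, and the name `keywords_in_context` are all assumptions.

```python
from collections import Counter, defaultdict

# Assumed stop-word list; a real one would be longer.
STOPWORDS = {"a", "and", "for", "in", "of", "on", "the", "to"}

def keywords_in_context(queries, top_context=3):
    """Return [(keyword, query count, [top context words])], most hits first."""
    # One set of meaningful words per query.
    tokenised = [
        {w for w in q.lower().split() if w not in STOPWORDS} for q in queries
    ]
    # How many queries each word appears in.
    freq = Counter(w for words in tokenised for w in words)

    hits = Counter()
    context = defaultdict(Counter)
    for words in tokenised:
        if not words:
            continue  # the query held nothing but stop words
        # Credit the query to its most widespread word; ties go alphabetically.
        keyword = min(words, key=lambda w: (-freq[w], w))
        hits[keyword] += 1
        context[keyword].update(words - {keyword})

    def ranked(counter):
        # Highest count first, then alphabetical, so the output is stable.
        return sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))

    return [
        (k, n, [w for w, _ in ranked(context[k])[:top_context]])
        for k, n in ranked(hits)
    ]
```

With this scheme a search for "calvin and hobbes" counts once under calvin, as long as calvin is the more widespread word, with hobbes listed as its context.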

So I did this analysis. Here are the keywords that lead to my site. (This is based on 3 weeks of data.)

  1. excel in the context of cell, formula, function, leading to my Excel tips. People mostly want to know how to remove errors like #N/A.
  2. calvin in the context of hobbes, fight, club. (There was a great article on how Fight Club is really Calvin and Hobbes.) Most of these queries are searches for specific quotes, and I’ve typed out all the Calvin and Hobbes quotes.
  3. indian in the context of torrents, tv. One of my most popular posts is Indian Torrents. I simply linked to a couple of Google searches, so its popularity is unjustified.
  4. tamil in the context of songs, lyrics, movie. This is mostly thanks to the recent Tamil quizzes I’ve put up.
  5. mumbai in the context of local, schedule, train. A shockingly large number of people search for Mumbai bus and train schedules, landing on my link to the IIT-B Mumbai Navigator.
  6. anand in the context of s anand, bcg, infosys. This is people searching for me.
  7. irr in the context of calculating, excel, formula. Calculating IRR turned out to be another unexpectedly popular post.
  8. interview in the context of lehman brothers, bcg, landing on some of my interview experiences.
  9. mckinsey in the context of ppt, presentation. Most of these people are looking for presentations, while I have a link to the McKinsey pre-placement talk at LBS. Interesting that BCG is not in the top 10.
  10. google in the context of engedu, types, authors@google. Though I have several posts about Google, the ones about Google Video, like Meet the author and Google TechTalks, are the most popular.

Having read the actual queries, I’ve concluded that only the keywords excel, mumbai, anand, irr and interview definitely lead to relevant hits. The rest are debatable. Maybe I should reduce the importance of the less relevant posts in my sitemap file.
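
In the sitemaps.org protocol, the importance of a page relative to the rest of the site is expressed with an optional priority element between 0.0 and 1.0 (the default is 0.5). The sketch below shows roughly how a sitemap with lowered priorities could be generated. The URLs are hypothetical placeholders, the function name `build_sitemap` is made up, and the namespace is the sitemaps.org 0.9 one, which may differ from the schema version this site actually used.

```python
import xml.etree.ElementTree as ET

# Namespace of the sitemaps.org 0.9 protocol (an assumption for this site).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(priorities):
    """Build sitemap XML from a {url: priority} mapping.

    Priorities are hints between 0.0 and 1.0, relative to other pages
    on the same site.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, priority in priorities.items():
        if not 0.0 <= priority <= 1.0:
            raise ValueError(f"priority out of range for {loc}: {priority}")
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "priority").text = f"{priority:.1f}"
    return ET.tostring(urlset, encoding="unicode")

# Hypothetical URLs: a relevant post keeps a high priority, a debatable one drops.
sitemap_xml = build_sitemap({
    "http://example.com/excel-tips": 0.8,
    "http://example.com/indian-torrents": 0.3,
})
```

Priority is only a hint to crawlers about relative importance within one site, so lowering it does not guarantee any change in search results.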

Movie jigsaw quiz 5 – Tamil

These are stills from Kamal’s movies. Each link points to a different movie. I have jumbled the images. You can move the jumbled blocks around, like a jigsaw. Can you guess the movie?

An honest in-flight announcement

What would an honest in-flight announcement sound like? Among other things, it would say…

Please switch off all mobile phones, since they can interfere with the aircraft’s navigation systems. At least, that’s what you’ve always been told. The real reason to switch them off is that they interfere with mobile networks on the ground, but somehow that doesn’t sound quite so good.