How I do things

Visualising the IMDb

The IMDb Top 250 runs dry quickly as a source of movies. In my case, I’ve seen about 175 of the 250, and I’m not sure how much I want to see the rest.

When chatting with Col Needham (who’s working his way through every movie with over 40,000 votes), I came up with this as a useful way of finding what movies to watch next.

[Image: visualising-the-imdb-1]

Each box is one or more movies; darker boxes mean more movies. Boxes further right have more votes, and boxes higher up have a better rating. The ones I’ve seen are green; the rest are red. (I’ve seen more movies than that; I just haven’t marked them green yet 🙂)
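To make the encoding concrete, here is a minimal Python sketch of the binning idea only; it is not the code behind the chart. It assumes each movie is a hypothetical `(title, votes, rating, seen)` tuple, that votes go on a log scale (my assumption, since vote counts span several orders of magnitude), and that a box is green only when every movie in it has been seen.

```python
import math
from collections import defaultdict


def bin_movies(movies, cols=20, rows=20):
    """Group movies into a cols x rows grid of boxes.

    movies: list of (title, votes, rating, seen) tuples (hypothetical shape).
    x axis: log10(votes), so further right means more votes.
    y axis: rating, so higher up means a better rating.
    Returns {(col, row): {"count": movies in box, "seen": how many are seen}}.
    """
    xs = [math.log10(votes) for _, votes, _, _ in movies]
    ys = [rating for _, _, rating, _ in movies]
    x_lo, x_hi = min(xs), max(xs)
    y_lo, y_hi = min(ys), max(ys)

    def index(value, lo, hi, n):
        # Scale value into 0..n-1; the maximum value lands in the last box
        if hi == lo:
            return 0
        return min(int((value - lo) / (hi - lo) * n), n - 1)

    boxes = defaultdict(lambda: {"count": 0, "seen": 0})
    for (_title, _votes, _rating, seen), x, y in zip(movies, xs, ys):
        key = (index(x, x_lo, x_hi, cols), index(y, y_lo, y_hi, rows))
        boxes[key]["count"] += 1
        boxes[key]["seen"] += int(seen)
    return dict(boxes)


def box_style(box, max_count):
    """Hypothetical rule: darker with more movies; green only if all are seen."""
    colour = "green" if box["seen"] == box["count"] else "red"
    opacity = box["count"] / max_count
    return colour, opacity
```

Movies you'd probably want to watch next are then the red boxes towards the top right.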

I think people like to watch the movies towards the top right: popularity compensates (at least partly) for a lower rating, and the number of votes is a reasonable indicator of popularity.

For example, the chart tells me that I ought to see Cidade de Deus, Inglourious Basterds and Heat, which I knew from the IMDb Top 250, but also that I ought to catch up on Kick-Ass, The Hangover and Juno.

[Image: visualising-the-imdb-2]

It’s easy to pick movies in a specific genre as well.

[Image: visualising-the-imdb-3]

Clearly, there are many more Comedy movies in the list than any other genre, though Romance and Action are doing fine too. And I seem to have a strong preference for Fantasy, in stark contrast to Horror.

(Incidentally, I’ve given up trying to watch The Shining after three attempts. Stephen King is scary enough: the novel kept me awake at night for a week, checking under my bed. Then there’s Stanley Kubrick’s style. A Clockwork Orange was disturbing enough, but Haley Joel Osment in the first part of A.I. was downright scary. Finally, there’s Jack Nicholson. Sorry, but I won’t risk that combination even on a bright sunny day with the doors open.)

You can track your list at http://250.s-anand.net/visual.

For those who want to play with the code, it’s at http://code.google.com/p/two-fifty/source/browse/trunk/visual.html.

Google search via e-mail

I’ve updated Mixamail to access Google search results via e-mail.

For those new here, Mixamail is an e-mail client for Twitter. It lets you read and update Twitter using just your e-mail (you’ll have to register once via Twitter, though).

Now, you can send an e-mail to twitter@mixamail.com with a subject of “Google” and a body containing your query. You’ll get a reply within a few seconds (~20 seconds on my BlackBerry) with the top 8 search results along with the snippets.
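Here is a minimal Python sketch of the request and reply side of this. It is not Mixamail’s code: `search_request` and `format_reply` are hypothetical names, the subject match is assumed to be case-insensitive, and the search results are assumed to arrive from elsewhere as `(title, url, snippet)` tuples (fetching them is left out).

```python
from email import message_from_string
from email.policy import default

MAX_RESULTS = 8  # the reply carries the top 8 results


def search_request(raw_mail):
    """Return the query if this mail is a "Google" request, else None."""
    msg = message_from_string(raw_mail, policy=default)
    if (msg["subject"] or "").strip().lower() != "google":
        return None
    body = msg.get_body(preferencelist=("plain",))
    query = body.get_content().strip() if body is not None else ""
    return query or None


def format_reply(results):
    """Render (title, url, snippet) tuples as a plain-text e-mail body."""
    entries = []
    for rank, (title, url, snippet) in enumerate(results[:MAX_RESULTS], start=1):
        entries.append(f"{rank}. {title}\n   {url}\n   {snippet}")
    return "\n\n".join(entries)
```

Keeping the reply as plain text, with the snippet under each link, suits a phone mail client where the snippet is often the whole answer.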

It’s the snippets that contain the useful information, as far as I’m concerned. Just yesterday, I found the show timings for Manmadan Ambu at the Ilford Cine World via a search on Mixamail. (A win for Mixamail, but the movie was a letdown, given my expectations.)

You don’t need to be registered to use this. So if you’re ever stuck with just e-mail access, mail twitter@mixamail.com with the subject “Google”.

PS: The code is on GitHub.

Automated image enhancement

There are some standard enhancements that I apply to my photos consistently: auto-levels, increase saturation, increase sharpness, etc. I’d also read that Flickr sharpens uploads (at least, the resized ones) so that they look better.

So last week, I took 100 of my photos and created 4 versions of each image:

  1. The base image itself (example).
  2. A sharpened version (example). I used a sharpening factor of 200%.
  3. A saturated version (example). I used a saturation factor of 125%.
  4. An auto-levelled version (example).
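The post doesn’t say which tool produced these versions. As a minimal sketch, assuming the Pillow library, the four variants could be generated as below. Mapping “200%” sharpening to `ImageEnhance.Sharpness(...).enhance(2.0)` and “125%” saturation to `ImageEnhance.Color(...).enhance(1.25)` is an assumption about how those percentages translate, and `ImageOps.autocontrast` stands in for auto-levels.

```python
from PIL import Image, ImageEnhance, ImageOps

SHARPNESS = 2.0    # assumed equivalent of "200%"; 1.0 leaves the image unchanged
SATURATION = 1.25  # assumed equivalent of "125%"


def make_versions(img):
    """Return the four variants compared in the test, keyed by name."""
    rgb = img.convert("RGB")
    return {
        "base": rgb,
        "sharp": ImageEnhance.Sharpness(rgb).enhance(SHARPNESS),
        "saturated": ImageEnhance.Color(rgb).enhance(SATURATION),
        # Stretches each channel so its darkest value maps to 0 and brightest to 255
        "levelled": ImageOps.autocontrast(rgb),
    }
```

Each variant is a separate `Image`, so it can be saved next to the original (for example with `im.save(...)`) for the side-by-side test.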

I created a test asking people to compare these. The differences are not always noticeable side by side, so the test flashed the two images alternately in the same place.

After about 800 ratings, here are the results. (Or, see the raw data.)

Sharpening clearly helps. 86% of the sharpened images were marked as better than the base images. Only 2 images (base/sharp, base/sharp) received consistent feedback that the sharpened version was worse, and I have my doubts about those two as well.

Saturation and auto-levels performed about equally, and the results are less clear-cut. 69% of the saturated images and 68% of the auto-levelled images were marked as better than the base images. Head to head, saturation beat auto-levelling in 52% of images, which is close to an even split. For a majority of images (60%), opinion was divided on whether saturation was better than levelling or the other way around.
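Percentages like these come from tallying votes per image. A minimal Python sketch, assuming each rating is a hypothetical `(image_id, winner, loser)` tuple; the rule that an image counts as “divided” when both versions received at least one vote is my assumption, not necessarily how the figures above were computed.

```python
from collections import defaultdict


def compare(votes, a, b):
    """Summarise head-to-head votes between versions `a` and `b`.

    votes: iterable of (image_id, winner, loser) tuples (hypothetical format).
    Returns (share of images where `a` won more votes than `b`,
             share of images where the votes were split between them).
    """
    tally = defaultdict(lambda: [0, 0])  # image_id -> [votes for a, votes for b]
    for image_id, winner, loser in votes:
        if {winner, loser} != {a, b}:
            continue  # a comparison between a different pair of versions
        tally[image_id][0 if winner == a else 1] += 1
    if not tally:
        return 0.0, 0.0
    n = len(tally)
    a_better = sum(1 for wa, wb in tally.values() if wa > wb) / n
    divided = sum(1 for wa, wb in tally.values() if wa and wb) / n
    return a_better, divided
```

Ties (equal votes for each version) count as neither a win for `a` nor a win for `b`, which is one reason the two “better” shares for a pair need not add up to 100%.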

On the whole, sharpening is a clear win. When in doubt, sharpen images.

For saturation and levelling, there certainly appears to be potential. 2 in 3 images are improved by either of these techniques. But it isn’t obvious which of the two (or both) to apply.

Is there someone out there with some image processing experience to shed light on this?

Surviving in prison

As promised, here are some tips from the trenches on surviving in prison. (For those who don’t follow my blog, prison is where your Internet access is restricted.)

There are two things you need to understand better: software and people. I’ll cover software in this post, and the more important topic, people, in the next.

Portable apps

You’re often not in control of your laptop or PC. You don’t have administrator access, so you can’t install software. The solution is Portable Apps. Most popular applications have been converted into portable versions that you can install on a USB stick; just plug it into any machine and use them. I use Firefox and Skype extensively this way, and increasingly I prefer portable versions of just about everything. It also makes my bloated Start Menu a lot more manageable. Some of the other portable apps I have are Audacity, CamStudio, GIMP, Inkscape and Notepad++.

Admin access

The other option is to gain admin access. I did this once at a client site (a large bank) where we didn’t have admin access, which I wasn’t particularly thrilled about. So I borrowed a floppy, put an offline password recovery tool on it, rebooted, and had the admin password within a few minutes. This was with the full knowledge of the (somewhat worried) client. This is where the people part comes in, and I’ll talk about that later.

Proxies

But before you do any of these, you need to be able to download the files, most of which are executables. Those are probably blocked. Heck, the sites from which you can download these files are probably blocked in the first place.

Sometimes, internal proxies help. Proxies for different geographies may have different degrees of freedom. When I was at IBM, the Internet was accessible from most US proxies, just not from the Indian proxy. So it may just be a matter of finding the right internal proxy.
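Finding the right internal proxy can be scripted. Here is a minimal sketch using only Python’s standard library: it tries each candidate proxy in turn and returns the first one through which a test page loads. The test URL is a placeholder, and the proxy addresses you pass in would be your own organisation’s (any names in the usage note are hypothetical).

```python
import urllib.request


def find_working_proxy(proxies, test_url="http://example.com/", timeout=5):
    """Return the first proxy URL through which test_url loads, else None."""
    for proxy in proxies:
        # An explicit ProxyHandler replaces the default environment-based one
        handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
        opener = urllib.request.build_opener(handler)
        try:
            with opener.open(test_url, timeout=timeout) as response:
                if response.status == 200:
                    return proxy
        except OSError:
            # Refused connections, timeouts and HTTP errors (URLError and
            # HTTPError are OSError subclasses): try the next proxy
            continue
    return None
```

For example, `find_working_proxy(["http://proxy-us.example:8080", "http://proxy-in.example:8080"])` with hypothetical addresses. A blocked proxy typically returns an error page or refuses the connection, both of which fall through to the next candidate.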

Or you can search for external public proxies. Sadly, many of these are blocked. Another option is for you to set up your own proxy. You can install mirrorrr on AppEngine for free, for example.

The most effective option, of course, is to use SSH tunnels. I’ve covered this in some detail earlier.

Google

Google has a wide range of tools that can help you access blocked sites. If the site you’re accessing provides public RSS feeds, read them through Google Reader. Twitter’s public timelines, for example, are available as RSS feeds.

Google’s cache is another way of getting the same information. Search for the URL and click on the “Cache” link to read the text even if the site itself is blocked.

To find more such help, Google for it!

Peopleware

… but all of this is, honestly, just a small part of it. The key, really, is to understand the people restricting your access. I’ll talk about this next.

Twitter via e-mail

Since I don’t have Internet access on my BlackBerry (because I’m in prison), I’ve had a pretty low incentive to use Twitter. Twitter’s really handy when you’re on the move, and over the last year, there were dozens of occasions where I really wanted to tweet something, but didn’t have anything except my BlackBerry on hand. Since T-Mobile doesn’t support Twitter via SMS, e-mail is my only option, and I haven’t been able to find a decent service that does what I want it to do.

So, obviously, I wrote one this weekend: Mixamail.com.

I’ve kept it as simple as I could. If I send an email to twitter@mixamail.com, it replies with the latest tweets on my Twitter home page. If I mail it again, it replies with new tweets since the last email.

I can update my status by sending a mail with “Update” as the subject. The first line of the body is the status.

I can reply to tweets. Each tweet has a “reply” link that opens a new mail replying to that tweet.

I can subscribe to tweets. Sending an email with “subscribe” as the subject sends the day’s tweets back to me every day at the same hour that I subscribed. (I’m keeping it to daily for the moment, but if I use it enough, I may increase the frequency.)
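The three behaviours above boil down to routing on the subject line. This is a minimal Python sketch of that routing, not Mixamail’s actual code: `UserState`, `handle_mail` and the `(tweet_id, text)` timeline shape are hypothetical. “New tweets since the last email” is done by remembering the highest tweet id already sent.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from email import message_from_string
from email.policy import default
from typing import List, Optional, Tuple

Tweet = Tuple[int, str]  # (tweet id, text); hypothetical shape


@dataclass
class UserState:
    """Hypothetical per-user state to persist between mails."""
    last_seen_id: int = 0
    subscribed_hour: Optional[int] = None


def next_digest_time(now: datetime, hour: int) -> datetime:
    """The next moment strictly after `now` that falls on `hour`:00."""
    candidate = now.replace(hour=hour, minute=0, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


def handle_mail(raw_mail: str, state: UserState, timeline: List[Tweet], now: datetime):
    """Route one incoming mail by its subject; returns (action, payload)."""
    msg = message_from_string(raw_mail, policy=default)
    subject = (msg["subject"] or "").strip().lower()
    part = msg.get_body(preferencelist=("plain",))
    body = part.get_content() if part is not None else ""

    if subject == "update":
        # The first line of the body becomes the new status
        lines = body.strip().splitlines()
        return "update", (lines[0].strip() if lines else "")
    if subject == "subscribe":
        # Daily digest at the hour the user subscribed
        state.subscribed_hour = now.hour
        return "subscribe", next_digest_time(now, now.hour)
    # Any other mail: reply with tweets newer than those already sent
    fresh = [tweet for tweet in timeline if tweet[0] > state.last_seen_id]
    if fresh:
        state.last_seen_id = max(tweet_id for tweet_id, _ in fresh)
    return "timeline", fresh
```

On the first mail `last_seen_id` is 0, so the whole home timeline comes back; after that, only newer tweets do.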

Soon enough, I’ll add retweeting (update: added retweets on 27 Oct) and a few other things. I intend to keep this free. I’ll release the source as well once I document it. (Update: the source code is on GitHub.)

Give it a spin: Mixamail.com. Let me know how it goes!


For the technically minded, here are a few more details. I spent last night scouting for a short, nice domain name using nxdom. I bought it using Google Apps for $10. The application itself is written in Python and hosted on AppEngine. I use the Twitter API via OAuth and store the user state via Lilcookies. The HTML is based on html5boilerplate, and has no images.

R scatterplots

I was browsing through Beautiful Data, and stumbled upon this gem of a visualisation.

[Image: r-scatterplots]

This is the default plot R produces when given a table of data: a beautiful use of small multiples. Each box is a scatterplot of one pair of variables, and the diagonal holds the variable names that label each row and column. At a glance, it shows the correlation and spread of every pair of variables.

Whenever I get any new piece of data, this is going to be the very first thing I do:

plot(data)
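In R, `plot(data)` does the drawing. If you’re in Python without R and want just the numbers behind the “correlation” part of each panel, here is a minimal standard-library sketch. It swaps in a plain Pearson correlation per pair of columns and assumes the table is a hypothetical `{column name: list of numbers}` dict; it is a numeric companion to the plot, not a replacement for seeing the spread.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (nan if one is constant)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def pairwise_correlations(table):
    """table: {column name: list of numbers}; returns {(a, b): r} for every pair."""
    return {(a, b): pearson(table[a], table[b]) for a, b in combinations(table, 2)}
```

Each key corresponds to one off-diagonal panel of the R plot (the plot shows each pair twice, mirrored across the diagonal).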

The Calvin and Hobbes search takedown

Eight years ago, I started typing out each of the Calvin and Hobbes strips by hand. Four years ago, I set up a site that let people search for strips. Early this month, I was asked to take it down.

This is the story.


I can’t quite remember when I started reading Calvin & Hobbes. The earliest reference I can find in my blogs is in July 1999. I remember it didn’t take me long to become a fan. I’d read every strip in the newspaper, hunt them out at bookshops, and spend a fair bit of time searching for archives online.

At some point, I discovered a few archives of the complete Calvin & Hobbes images. These aren’t hard to find, and they’re still around in plenty. So that gave me a few more months of delight.

The trouble, though, was that I never could quite find a strip when I wanted to. A friend would refuse to accept something, and I’d want to pull out the strip where Calvin declares that he resides in the state of “Denial”. Or if they said something fancy, I’d want to pull out the one where Hobbes says “I notice your oeuvre is monochromatic”. Or those strips where Calvin’s Dad explains how things work (“They build bigger and bigger trucks over the bridge until it breaks.”)

There were a few Calvin and Hobbes search engines around. None quite did what I wanted them to – which was to search the text, and show me the strip, with a nice scrollable interface.
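The heart of a text search like this is an inverted index from words to strips. A minimal Python sketch of the idea, assuming the typed-out transcripts are kept in a hypothetical `{strip_id: text}` dict; it illustrates the technique and is not the code that ran the site.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case words; apostrophes are kept so "don't" stays one token."""
    return re.findall(r"[a-z']+", text.lower())


def build_index(transcripts):
    """Map each word to the set of strip ids whose transcript contains it."""
    index = defaultdict(set)
    for strip_id, text in transcripts.items():
        for word in tokenize(text):
            index[word].add(strip_id)
    return index


def search(index, query):
    """Return the sorted strip ids whose transcript contains every query word."""
    words = tokenize(query)
    if not words:
        return []
    matches = set(index.get(words[0], set()))
    for word in words[1:]:
        matches &= index.get(word, set())
    return sorted(matches)
```

Showing the strip is then a matter of mapping each matching id back to its image.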

So I set out to build one. I can’t remember when, exactly, but it was before Sep 11, 2002.

It took me many years. I’d spend several train rides and evenings typing this stuff out. My friends, employers and family were a bit puzzled, but just added it to my list of eccentricities and carried on. I was halfway there in 2005, pushed further in 2006, and with some help, I managed to finally complete it.

I was able to do a lot of cool stuff with this, like statistically improbable phrases and some amusing posts as well.

It also increased traffic to my site, which was a bit disconcerting; I didn’t want to attract attention. In 2007, I removed the page from Google’s index, which cut the number of hits a fair bit. Since then, the site was visited only by the few people who knew of it, and the occasional stumbler.

A month ago, I got reddit-ed and MetaFiltered.

It didn’t take me long to figure out that a takedown notice would be on its way. It turned out to be quite a friendly mail, actually, scary only in parts (a bit of a carrot-and-stick approach, perhaps). Anyway, it took me all of 2 minutes to remove all of the pages and links.

Of course, the reason I went to all this effort was that the official Calvin & Hobbes site has no search feature. I’ve reached out to United Media, offering my transcripts and code. Let’s see what happens.

Make backgrounds transparent

This is the simplest way that I’ve found to make the background colour of an image transparent.

  1. Download GIMP
  2. Open your image. I’ll use one with a green background.
  3. Optional: select Image > Mode > RGB if the image isn’t already RGB.
  4. Select Colors > Color to Alpha…
  5. Click on the white button next to “From” and select the eye-dropper.
  6. Pick the green background colour on the image, and click OK.

The anti-aliasing is preserved as well.
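Why the anti-aliasing survives: Color to Alpha doesn’t just delete matching pixels. It works out how much of each pixel can be explained by the background colour and makes only that fraction transparent, so a half-green edge pixel becomes half-transparent. Here is a minimal per-pixel Python sketch of that classic approach, with channels as floats in 0..1 and an opaque input pixel assumed; it mirrors the idea behind GIMP’s filter rather than reproducing its code.

```python
def color_to_alpha(pixel, target):
    """Make `target` transparent in an opaque RGB pixel; returns (r, g, b, alpha)."""
    # For each channel, the smallest opacity that still explains the pixel
    # as a blend of some colour over the target background colour
    alphas = []
    for c, t in zip(pixel, target):
        if t < 1e-4:
            alphas.append(c)
        elif c > t:
            alphas.append((c - t) / (1.0 - t))
        elif c < t:
            alphas.append((t - c) / t)
        else:
            alphas.append(0.0)
    alpha = max(alphas)
    if alpha < 1e-4:
        return (*pixel, 0.0)  # exactly the background colour: fully transparent
    # Un-blend: find the colour that, at this alpha over target, gives pixel
    rgb = tuple((c - t) / alpha + t for c, t in zip(pixel, target))
    return (*rgb, alpha)
```

Composited back over the original background, each output pixel reproduces the input exactly, which is why smooth edges stay smooth.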

Dear Tesco, your books are expensive

Dear Tesco,

I do like you. Really. Your products are invariably cheaper than at most other places I can find. I am a methodical, crazy gadget freak, and I find your gadget pricing impressive. I don’t always find what I want, but the item I finally pick as the best value for money is often yours, at a very low price.

But.

Your books are expensive.

Of Amazon’s bestsellers, just 2 out of the 100 books are cheaper on your site. And this is apart from the fact that I’d get free delivery from Amazon on 37 of those books (over £5), while you’d give me free delivery on 5 (over £15).

On average, that book list costs £5.66 on Amazon. With you, it’s £7.20. I don’t fancy paying 27% more. (36% if I include delivery.)

I’m not making this up. You can check: the books in red are cheaper at Amazon.
(as of 6pm on a cold, rainy Tuesday the end of March.)

Book | Amazon (£) | Tesco (£)
The Girl Who Kicked the Hornets’ Nest | 3.86 | 3.86
The Girl with the Dragon Tattoo | 3.48 | 3.86
The Girl Who Played with Fire | 3.79 | 3.86
Wolf Hall | 3.86 | 3.86
61 Hours | 9.49 | 9.49
Solar | 8.90 | 13.00
One Day | 3.79 | 3.86
Mums Know Best: The Hairy Bikers’ Family Cookbook | 8.98 | 13.00
The Lovely Bones | 2.98 | 2.98
Breaking Dawn (Twilight Saga) | 7.49 | 14.99
Eclipse (Twilight Saga) | 3.99 | 6.99
New Atkins for a New You: The Ultimate Diet for Shedding Weight and Feeling Great | 3.99 | 5.99
The Return: Nightfall (The Vampire Diaries) | 3.49 | 6.99
New Moon (Twilight Saga) | 2.98 | 6.99
Twilight (Twilight Saga) | 3.44 | 6.99
The Struggle: Bks. 1 & 2 (The Vampire Diaries) | 3.48 | 6.99
Brooklyn | 3.86 | 3.86
Vampire Diaries: Bks. 3 & 4 (The Vampire Diaries) | 3.49 | 5.99
Hamlyn All Colour 200 Slow Cooker Recipes (Hamlyn All Colour Cookbooks) | 2.48 | 2.48
Shutter Island | 3.98 | 3.59
101 One-pot Dishes: Tried-and-tested Recipes (Good Food 101) | 1.97 | 1.97
The Secret Ingredient: Delicious, Easy Recipes Which Might Just Save Your Life | 5.97 | 5.98
Percy Jackson and the Sea of Monsters | 3.48 | 6.99
Percy Jackson and the Last Olympian | 3.99 | 6.99
The Guernsey Literary and Potato Peel Pie Society | 3.49 | 3.49
The Double Comfort Safari Club (No 1 Ladies Detective Agency) | 7.99 | 12.00
The Hummingbird Bakery Cookbook | 8.49 | 8.49
The Gruffalo | 2.96 | 5.99
Percy Jackson and the Titan’s Curse | 3.99 | –
Percy Jackson and the Battle of the Labyrinth | 3.48 | –
The Big Short: Inside the Doomsday Machine | 12.50 | 17.50
The Secret | 4.55 | 4.55
The Time Traveler’s Wife | 3.82 | 3.86
Miss Dahl’s Voluptuous Delights | 11.50 | 13.00
Ching’s Chinese Food in Minutes | 9.98 | 9.98
The Little Stranger | 3.86 | 3.86
ReWork: Change the Way You Work Forever | 5.50 | 7.69
Percy Jackson and the Lightning Thief | 3.48 | 6.99
The Girl Who Kicked the Hornets’ Nest | 9.49 | 9.49
Dead and Gone: A True Blood Novel (Sookie Stackhouse Vampire 9) | 5.20 | 5.20
The Forgotten Highlander: My Incredible Story of Survival During the War in the Far East | 9.48 | 9.48
Room on the Broom | 2.98 | 5.99
Lord Sunday (The Keys to the Kingdom) | 3.49 | 3.49
The Official Highway Code | 1.65 | 1.65
The Gruffalo’s Child | 2.94 | 5.99
The Return: Shadow Souls (The Vampire Diaries) | 3.50 | 5.24
Three Cups of Tea | 4.98 | 4.98
The End of the Party | 12.50 | 12.50
The Very Hungry Caterpillar [Board Book] | 2.97 | 5.99
Annabel Karmel’s New Complete Baby and Toddler Meal Planner | 8.69 | 8.69
Bad Science | 3.57 | 3.57
The Snail and the Whale | 2.96 | 5.99
True Blood Boxed Set (Sookie Stackhouse Vampire) | 19.95 | 19.95
Gone Tomorrow | 3.86 | 3.86
The Italian Diet | 6.98 | 6.98
The Snowman | 6.48 | 9.09
Wedlock: How Georgian Britain’s Worst Husband Met His Match | 3.86 | 3.86
Little Darlings | 4.89 | 8.00
Eat, Pray, Love: One Woman’s Search for Everything | 3.98 | 3.98
101 Meals for Two: Tried-and-tested Recipes (Good Food 101) | 1.97 | 1.97
The Immortals: Blue Moon | 3.48 | 6.99
The Host | 3.82 | 3.82
The Catcher in the Rye | 4.48 | 4.48
The Final Fantasy XIII Complete Official Guide | 11.24 | 14.24
The Book Thief | 3.95 | 3.95
Mexican Food Made Simple | 9.99 | 9.99
Tea Time for the Traditionally Built: The No.1 Ladies’ Detective Agency: The No.1 Ladies’ Detective Agency, Book 10 (No 1 Ladies Detective Agency10) | 3.86 | 3.86
The Oxford Companion to Food (Oxford Companions) | 20.00 | 28.00
Sacred Hearts | 3.86 | 3.86
A Squash and a Squeeze | 3.00 | 5.99
Pokemon HeartGold/ SoulSilver Official Guide | 9.74 | –
Cutting for Stone | 3.86 | 3.86
Trespass | 8.98 | 8.98
Wheels on the Bus (Pre School Songs) | 2.44 | 4.76
Alone in Berlin (Penguin Modern Classics) | 4.98 | 4.98
The Good Man Jesus and the Scoundrel Christ (Myths) | 7.50 | 10.00
Twenties Girl | 3.85 | 3.86
The Last Straw (Diary of a Wimpy Kid) | 3.48 | 6.99
Rodrick Rules: Diary of a Wimpy Kid (Book 2) | 3.48 | 6.99
The Natural Navigator | 7.48 | 10.49
The White Queen | 3.99 | 6.39
The Shack | 3.95 | 3.95
Good Food, 101 Cakes and Bakes | 1.97 | 3.45
Faces (Baby’s Very First Book) | 2.48 | 4.99
Lustrum | 6.38 | 6.38
Blacklands | 3.86 | 3.86
The Lost Symbol | 8.78 | 9.00
A Touch of Dead (Sookie Stackhouse Vampire Myst) | 5.20 | 9.09
The Spirit Level: Why More Equal Societies Almost Always Do Better | 4.96 | 4.96
Jamie’s Ministry of Food: Anyone Can Learn to Cook in 24 Hours | 15.00 | 15.00
Ottolenghi: The Cookbook | 12.50 | 16.25
The Ice Cream Girls | 5.89 | 9.00
The Best of Times | 3.79 | 3.86
Whoops!: Why Everyone Owes Everyone and No One Can Pay | 11.50 | 11.50
The Children’s Book | 3.86 | 3.86
Twilight: v. 1: The Graphic Novel (Twilight the Graphic Novel 1) | 6.49 | 12.99
Too Big to Fail: Inside the Battle to Save Wall Street | 7.48 | 7.49
Dark Days (Skulduggery Pleasant – book 4) | 5.84 | 12.99
It’s Only a Movie: Reel Life Adventures of a Film Obsessive | 6.00 | 5.98
The Smartest Giant in Town | 3.00 | 5.99
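The summary figures in this letter can be recomputed from a table like this. Here is a minimal Python sketch, assuming rows as hypothetical `(title, amazon_price, tesco_price)` tuples with `None` for a missing price. Two choices are assumptions: averages use only books priced at both shops, and a book counts towards free delivery when its own price is at or above the threshold.

```python
def summarise(rows, amazon_free_from=5.0, tesco_free_from=15.0):
    """Summarise (title, amazon_price, tesco_price) rows; a missing price is None."""
    both = [(a, t) for _, a, t in rows if a is not None and t is not None]
    amazon_avg = sum(a for a, _ in both) / len(both)
    tesco_avg = sum(t for _, t in both) / len(both)
    return {
        "cheaper_at_tesco": sum(1 for a, t in both if t < a),
        "cheaper_at_amazon": sum(1 for a, t in both if a < t),
        "amazon_avg": amazon_avg,
        "tesco_avg": tesco_avg,
        "tesco_premium": tesco_avg / amazon_avg - 1,  # 0.27 means 27% more
        # Whether "over £5" is inclusive is an assumption; >= is used here
        "free_delivery_amazon": sum(1 for _, a, _ in rows if a is not None and a >= amazon_free_from),
        "free_delivery_tesco": sum(1 for _, _, t in rows if t is not None and t >= tesco_free_from),
    }
```

Run over the full table, its output can be checked against the figures quoted above.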

I do like you. Really. (Even though you tried to stop me scraping your site via user-agent detection.) I don’t mind that you don’t have every book. I trust you to pick what I’d most likely want. You’re good at that.

Please make your books less expensive?

Shopping with Cooliris

[Image: John Lewis jackets scrolling on the Cooliris plugin]

[Image: Zoom-in view of a jacket at John Lewis]

I just put together this little demo that scrapes John Lewis’ site and creates a Media RSS file out of it.
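The output side of a demo like this is just an RSS 2.0 feed with Media RSS elements added. Here is a minimal Python sketch using the standard library’s ElementTree; it assumes the scraping step (not shown) yields hypothetical `(name, page_url, thumbnail_url, image_url)` tuples, and it is not the 50-line source linked below.

```python
import xml.etree.ElementTree as ET

MEDIA_NS = "http://search.yahoo.com/mrss/"
ET.register_namespace("media", MEDIA_NS)


def media_rss(title, products):
    """Build a Media RSS feed string.

    products: iterable of (name, page_url, thumbnail_url, image_url) tuples,
    e.g. the result of scraping a product listing page (scraping not shown).
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    for name, page_url, thumbnail_url, image_url in products:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = name
        ET.SubElement(item, "link").text = page_url
        # Assumed usage: thumbnail for the wall, full image for the zoomed-in view
        ET.SubElement(item, f"{{{MEDIA_NS}}}thumbnail", url=thumbnail_url)
        ET.SubElement(item, f"{{{MEDIA_NS}}}content", url=image_url)
    return ET.tostring(rss, encoding="unicode")
```

Registering the `media` prefix makes ElementTree emit readable `media:thumbnail` and `media:content` tags instead of auto-generated `ns0:` prefixes.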

Cooliris has got to be the best way to shop. Apart from being really pretty, it’s quite useful when you know what something looks like but don’t know how to search for it. For example, I was looking for a headphone with a built-in microphone (you know, the kind that plugs into an iPhone or a BlackBerry). I didn’t have a clue what it was called. (TRRS, if you’re interested; I found out later.) The only way I could find it was to browse the wall…

[Image: Amazon search for ear microphones on Cooliris]

For the curious, here’s the 50-line source code.