How I do things

Downloading songs from YouTube

Five years ago, I built a song search engine – mainly because I needed to listen to songs. Three years ago, I stopped updating it – mainly because I stopped listening to songs actively, and have been busy since. For those of you who have been using my site for music: my apologies.

These days, I don’t really find the need to download music. YouTube has most of the songs I need. Bandwidth is pretty good too even when on the move.

But when I do need to download music, this is my new workflow.

  1. Find the song on YouTube. (Misspellings are still an issue, but you’ll usually find what you need.)
  2. Download the video. Keepvid is the simple option; youtube-dl is the geek’s option, and better for multiple downloads. (A sketch of scripting it follows this list.)
  3. Use VLC – the Swiss Army knife of media – to convert the video into an MP3.
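
If you take the youtube-dl route, it can also be driven from Python. A minimal sketch – the URL is a placeholder, and the options are just the documented basics:

    import youtube_dl

    # Download one video, naming the file after its title.
    ydl = youtube_dl.YoutubeDL({"outtmpl": "%(title)s.%(ext)s"})
    ydl.download(["http://www.youtube.com/watch?v=VIDEO_ID"])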

That last step requires a bit of explaining. It’s very simple once you know how, but it took me a few months to get it right. So here goes.

Select the Convert / Save option in the Media menu.

[Screenshot: audio-conversion-1]

Click on Add to open the file you want to convert. You can pick a track from a disc as well, if you want to rip an audio CD or a DVD.

[Screenshot: audio-conversion-2]

Choose the file.

[Screenshot: audio-conversion-3]

Click on Convert / Save.

[Screenshot: audio-conversion-4]

Type the destination filename. Make sure you type the full file name, and not just the name of the folder.

[Screenshot: audio-conversion-5]

Select the output format you want under Settings – Profile. You can tweak the bitrate with the settings button, but I usually don’t bother.

[Screenshot: audio-conversion-6]

When you click on the Start button, the file will be converted or the CD will be ripped. You’ll see the position marker move fairly fast.

[Screenshot: audio-conversion-7]


The only problem I have with this method is that I can’t seem to do batch conversions easily enough with the GUI. Does anyone have any other workflow they like?

Update (31 Jul 2012): Aditya Sengupta suggests the following. (I should’ve guessed VLC would have something up its sleeve.)

vlc -I dummy $FILENAME --no-sout-video --sout "#transcode{acodec=mp3,ab=AUDIO_BITRATE,channels=2}:std{access=file,mux=raw,dst=$NAME.mp3}" vlc://quit
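
That also takes care of my batch-conversion gripe: just loop the command over a folder. A rough sketch in Python – the folder, extension and bitrate here are assumptions:

    import glob
    import os
    import subprocess

    # Convert every .mp4 in the current folder to a 128 kbps MP3.
    for filename in glob.glob("*.mp4"):
        name = os.path.splitext(filename)[0]
        subprocess.call([
            "vlc", "-I", "dummy", filename, "--no-sout-video",
            "--sout", "#transcode{acodec=mp3,ab=128,channels=2}"
                      ":std{access=file,mux=raw,dst=%s.mp3}" % name,
            "vlc://quit",
        ])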

Scraping for a laptop

I’ve returned my laptop, and it’s time to buy a new one. For the first time in my life, I’m buying a laptop for myself.

I have a fairly clear idea of what I want: a 500GB+ 7200 rpm hard disk, 4GB of RAM and an Intel Core i7 – powerful enough for producing music, since I record some stuff as a hobby. I thought that would make finding one easy.

Sheer naïveté. Not a single site in India let me filter by hard disk rpm. (To be fair, I haven’t found any sites outside India that do that either.)

After spending a good two hours hunting for the details and collating them, I did what I normally would: spent 30 minutes writing a scraper. The scraper runs through all laptops on Flipkart and pulls out all of their specs. Thanks to the diligence of the good folks at Flipkart, this information is readily available on each page, and the HTML is structured quite neatly, so it took just a 30-line program to scrape it all. Full credit to ScraperWiki as well – I could use it on a netbook without any developer tools installed.
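
The real script is on ScraperWiki, but its shape is roughly this – the listing URL, page count and selectors below are made up for illustration:

    import csv
    import urllib2

    from bs4 import BeautifulSoup

    writer = csv.writer(open("laptops.csv", "wb"))
    for page in range(1, 40):  # hypothetical number of listing pages
        url = "http://www.flipkart.com/laptops?page=%d" % page
        listing = BeautifulSoup(urllib2.urlopen(url).read(), "html.parser")
        for link in listing.select("a.product-link"):  # hypothetical selector
            product = BeautifulSoup(urllib2.urlopen(link["href"]).read(), "html.parser")
            # Each spec is a name: value pair in the specifications table.
            specs = [td.get_text(strip=True)
                     for td in product.select("table.specs td")]  # hypothetical selector
            writer.writerow(specs)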

The scraper took 2 hours to run. Feel free to filter through the output (CSV) for your favourite laptop, or fork the code and pull any other data you like.

The next chapter of my life

I’m writing this post on a one-way flight from London back to India. I’ve moved on from Infosys Consulting, and am starting up on my own.

I’ve wanted to do this for a long time. There’s always more freedom in your own company than in someone else’s. There’s often more money in it too, if you’re lucky. But my upbringing was a bit too conservative for that bold a step. Still, given that my father runs his own firm, I figured it was just a question of time before I did the same.

Two years ago, in Jan 2010, I picked up Rashmi Bansal’s Stay Hungry Stay Foolish at an airport. That book killed the last bit of resistance I had. If the people in that book could succeed, I felt I could too. And if what they did (building small companies, not huge ones) could be called a success, I could be successful too.

After the flight, it was clear in my mind. I would be an entrepreneur. I would create a small company that would probably fold. Then I’d do it again. And again, 10 times, because 1 in 10 companies survive. And finally, I’d be running a small business that’d be called successful by virtue of having survived. A modest, achievable ambition that I had the courage for.

I usually make big decisions without analysis, by just sleeping over them. I slept over it and announced it to my family the next day. I’m not sure they believed me.

Two months later, along with a friend, I built a dynamic digital image resizing product. We had our wives start a company in the UK, and tried selling it to retailers. There clearly was demand. The problem was, we didn’t know how to sell. After a year, £500 spent and no sales, it was clear to us that venture #1 had failed. We eventually shut it down.

In the middle of this, my ex-boss from IBM told me that he was looking to start a venture focusing on mobile, rural BPO and energy management. This later changed to data analytics and visualisation. They all sounded like fun, so I said I’d help out in my spare time.

A few months later, a classmate told me he’d started a business digitising school report cards. That sounded like fun too, so I said I’d help out in my spare time.

Now, if that sounds like I had a lot of spare time on my hands – you’re right, I did. And it’s time to talk about the jobs in my life. My first 3 years at IBM were fun. I was coding, learning, and leading a bachelor’s life with friends, money, and no responsibilities. My 4 years at BCG were strenuous, with 80-hour weeks, but the work was interesting and challenging. I was newly married, and between work and home responsibilities, I had no time for fun.

I moved to Infosys Consulting in the UK with the specific aim of rectifying that (and for health reasons as well). In the last 7 years, the work has (except on occasion) been a bit boring, but very relaxing. On most days, I would spend 4 hours working, and 4 hours learning new stuff. The things I learnt only helped me be more efficient. So I ended up getting even more work done in less time.

Many things came out of this. Firstly, I recovered my health. We had a daughter, and I spent more time with her. I started coding in earnest again. By 2007, I was writing code as part of my projects – stuff that others, whose job it actually was, couldn’t deliver. By 2009, I had a few websites running: an Indian music search engine, an IMDb Top 250 tracker, a few transliterators, and so on.

So when I said I’d help out with these startups, it wasn’t an empty promise. For the last 18 months, I’ve had a day job and three night jobs. I never did justice to any of them in my opinion, but I had more fun than ever in my life, I learnt more than ever in my life, and I produced more tangible output than ever in my life. Sometimes, quantity beats quality or reliability.

Both these startups are doing well today. Gramener.com offers data visualisation and IT services. I will be joining them as Chief Data Scientist. Reportbee.com offers a hosted report card solution. I will continue helping them out. And I will continue working with a few NGOs.

You’ll see me a lot more active online now. I can publicly write about my work – something I’ve been unable to do for the last 11 years.

I am relocating to Bangalore. On the professional front, it’s an obvious choice: that’s where the geeks are. On my last visit to India, I was in Bangalore, Chennai and Hyderabad. In the latter two, it’s tough to meet geeks, and when you do, it’s no easier to find the next one. Bangalore has many more geeks, and they’re fairly well networked.

On the personal front, too, Bangalore works well. It’s close enough to Chennai without actually being in Chennai.

It’s 10am on Thu 12th Jan. Our flight is descending into Delhi airport. It’s the start of a new chapter in my life. Scary, but exciting. Wish me luck!

Eating more for less

A couple of years ago, I managed to lose a fair bit of weight. At the start of 2010, I started putting it back on, and the trajectory continues. I’m at the stage where I seriously need to lose weight. I subscribe to The Hacker’s Diet principle – that you lose weight by eating less, not exercising:

    An hour of jogging is worth about one Cheese Whopper. Now, are you going to really spend an hour on the road every day just to burn off that extra burger? You don't exercise to lose weight (although it certainly helps). You exercise because you'll live longer and you'll feel better.
I’m afraid I’ll live too long anyway, so I won't bother exercising just yet. It's down to eating less. Sadly, I like food. So to make my “diet” work, I need foods that add fewer calories per gram. Usually, when browsing stores, I check these manually. But being a geek, I figured there’s an easier way.

Below is a graph of some foods (the kind I particularly need to avoid, but still end up eating). The ones at the top add a lot of calories (per 100g) and are better avoided. The ones at the right cost a lot more. I’m no longer at the point where I need to worry about food expenses, but I still can’t quite kick the habit of checking. Hover over the foods to see what they are, and click on them to visit the product. (If you’re using an RSS reader and this doesn’t work, read on my site.)
(The data was picked from Tesco.) It’s interesting that cereals are in the middle of the calorie range – I always thought they’d be low in calories per gram. Turns out that if I want such foods, I’m better off with desserts or ice creams (profiterole, lemon meringue or tiramisu). In fact, even jams have fewer calories than cereals. But there are some desserts to avoid. Nuts are a disaster. So are chocolates. Gums, dates and honey are in the middle – about as good as cereals. Salsa dip seems surprisingly low. Custards seem to hit the sweet spot – cheap, and very low in calories. Same for jellies. So: custards and jelly. My daughter’s going to be happy.
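
The chart itself is simple to reproduce. A minimal sketch, assuming a hypothetical foods.csv with name, calorie and price columns:

    import csv

    import matplotlib.pyplot as plt

    names, kcal, price = [], [], []
    with open("foods.csv") as f:
        for row in csv.DictReader(f):
            names.append(row["name"])
            kcal.append(float(row["kcal_per_100g"]))
            price.append(float(row["price"]))

    # Top = more calories per 100g (avoid); right = more expensive.
    plt.scatter(price, kcal)
    for name, x, y in zip(names, price, kcal):
        plt.annotate(name, (x, y), fontsize=7)
    plt.xlabel("Price")
    plt.ylabel("Calories per 100g")
    plt.show()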

Visualising the IMDb

The IMDb Top 250, as a source of movies, dries up quickly. In my case, I’ve seen about 175 of the 250, and I’m not sure how much I want to see the rest.

When chatting with Col Needham (who’s working his way through every movie with over 40,000 votes), I came up with this as a useful way of finding what movies to watch next.

[Screenshot: visualising-the-imdb-1]

Each box is one or more movies. Darker boxes mean more movies. Those on the right have more votes. Those on top have a better rating. The ones I’ve seen are green; the rest are red. (I’ve seen more movies than that – I just haven’t marked them green yet. 🙂)

I think people like to watch the movies on the top right – popularity compensates (at least partly) for rating, and the number of votes is an indication of popularity.

For example, my movie pattern tells me that I ought to see Cidade de Deus, Inglourious Basterds and Heat – which I knew from the IMDb Top 250, but also that I ought to cover Kick-Ass, The Hangover and Juno.

[Screenshot: visualising-the-imdb-2]

It’s easy to pick movies in a specific genre as well.

[Screenshot: visualising-the-imdb-3]

Clearly, there are many more Comedy movies in the list than any other type – though Romance and Action are doing fine too. And I seem to have a strong preference for the Fantasy genre, in stark contrast to Horror.

(Incidentally, I’ve given up trying to see The Shining after three attempts. Stephen King is scary enough: the novel kept me awake at night for a week, checking under my bed. Then there’s Stanley Kubrick’s style. A Clockwork Orange was disturbing enough, but Haley Joel Osment in the first part of A.I. was downright scary. Finally, there’s Jack Nicholson. Sorry, but I won’t risk that combination, even on a bright sunny day with the doors open.)

You can track your list at http://250.s-anand.net/visual.

For those who want to play with the code, it’s at http://code.google.com/p/two-fifty/source/browse/trunk/visual.html.
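
And if you’d rather roll your own, the core of the visual is just rating against log(votes), coloured by seen/unseen. A rough sketch in Python – the movies.csv columns here are assumptions, and overlapping translucent points stand in for the darker boxes:

    import csv
    import math

    import matplotlib.pyplot as plt

    votes, ratings, colours = [], [], []
    with open("movies.csv") as f:
        for row in csv.DictReader(f):
            # Log-scale the votes: the long tail is unreadable otherwise.
            votes.append(math.log10(max(int(row["votes"]), 1)))
            ratings.append(float(row["rating"]))
            colours.append("green" if row["seen"] == "1" else "red")

    plt.scatter(votes, ratings, c=colours, s=12, alpha=0.5)
    plt.xlabel("log10(votes)")
    plt.ylabel("IMDb rating")
    plt.show()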

Google search via e-mail

I’ve updated Mixamail to access Google search results via e-mail.

For those new here, Mixamail is an e-mail client for Twitter. It lets you read and update Twitter just using your e-mail (you’ll have to register once via Twitter, though).

Now, you can send an e-mail to twitter@mixamail.com with a subject of “Google” and a body containing your query. You’ll get a reply within a few seconds (~20 seconds on my BlackBerry) with the top 8 search results along with the snippets.
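
This isn’t Mixamail’s actual code, but the lookup underneath is roughly the following, using Google’s AJAX Search API (treat the endpoint and field names as approximate):

    import json
    import urllib
    import urllib2

    def google_snippets(query, count=8):
        # rsz=large asks the AJAX Search API for 8 results at a time.
        url = ("http://ajax.googleapis.com/ajax/services/search/web?" +
               urllib.urlencode({"v": "1.0", "rsz": "large", "q": query}))
        results = json.load(urllib2.urlopen(url))["responseData"]["results"]
        return "\n\n".join(
            "%s\n%s\n%s" % (r["titleNoFormatting"], r["unescapedUrl"], r["content"])
            for r in results[:count])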

It’s the snippets that contain the useful information, as far as I’m concerned. Just yesterday, I managed to find the show timings for Manmadan Ambu at the Ilford Cineworld via a search on Mixamail. (A win for Mixamail, though the movie was a letdown, given expectations.)

You don’t need to be registered to use this. So if you’re ever stuck with nothing but e-mail access, mail twitter@mixamail.com with the subject “Google”.

PS: The code is on Github.

Automated image enhancement

There are some standard enhancements that I apply to my photos consistently: auto-levels, increase saturation, increase sharpness, etc. I’d also read that Flickr sharpens uploads (at least, the resized ones) so that they look better.

So last week, I took 100 of my photos and created 4 versions of each image:

  1. The base image itself (example)
  2. A sharpened version (example), using a sharpening factor of 200%
  3. A saturated version (example), using a saturation factor of 125%
  4. An auto-levelled version (example)
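
The versions are easy to generate. A minimal sketch with PIL – not necessarily the exact script I used, and autocontrast stands in for auto-levels:

    from PIL import Image, ImageEnhance, ImageOps

    img = Image.open("photo.jpg")
    ImageEnhance.Sharpness(img).enhance(2.00).save("sharpened.jpg")  # 200% sharpening
    ImageEnhance.Color(img).enhance(1.25).save("saturated.jpg")      # 125% saturation
    ImageOps.autocontrast(img).save("autolevels.jpg")                # auto-levels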

I created a test asking people to compare these. The differences are not always noticeable when the versions are placed side by side, so the test flashed the two images in the same place.

After about 800 ratings, here are the results. (Or, see the raw data.)

Sharpening clearly helps: 86% of the sharpened images were marked as better than the base images. Only 2 images (base/sharp, base/sharp) received consistent feedback that the sharpened version was worse. (I have my doubts about those two as well.)

Saturation and levels were roughly equal, and somewhat unclear. 69% of the saturated images and 68% of the auto-levelled images were marked as better than the base images. The two techniques were picked over each other almost equally often (52% in favour of saturation), and for a majority of images (60%), opinion was divided on whether saturation or levelling was better.

On the whole, sharpening is a clear win. When in doubt, sharpen images.

For saturation and levelling, there certainly appears to be potential. 2 in 3 images are improved by either of these techniques. But it isn’t entirely obvious which (or both) to apply.

Is there someone out there with some image processing experience to shed light on this?

Surviving in prison

As promised, here are some tips from the trenches on surviving in prison. (For those who don’t follow my blog, prison is where your Internet access is restricted.)

There are two things you need to understand: the software and the people. I’ll cover the software in this post, and the more important topic – the people – in the next.

Portable apps

You’re often not in control of your laptops / PCs. You don’t have administrator access. You can’t install software. The solution is to install Portable Apps. Most popular applications have been converted into Portable Apps that you can install on to a USB stick. Just plug them into any machine and use them. I use Firefox and Skype quite extensively this way, but increasingly, I have a preference for Portable Apps for just about everything. It makes my bloated Start Menu a lot more manageable. Some of the other portable apps I have are: Audacity, Camstudio, GIMP, Inkscape and Notepad++.

Admin access

The other possibility is to try and gain admin access. I did this once at a client site (a large bank). We didn’t have admin access, and I wasn’t particularly thrilled. So I borrowed a floppy, installed an offline password recovery tool, rebooted, and got the admin password within a few minutes. This was with the full knowledge of the (somewhat worried) client. That’s where the people part comes in, and I’ll talk about it later.

Proxies

But before you do any of these, you need to be able to download the files, most of which are executables. Those are probably blocked. Heck, the sites from which you can download these files are probably blocked in the first place.

Sometimes, internal proxies help. Proxies for different geographies may have different degrees of freedom. When I was at IBM, the Internet was accessible from most US proxies, just not from the Indian proxy. So it may just be a matter of finding the right internal proxy.

Or you can search for external public proxies – though, sadly, many of these are blocked too. Another option is to set up your own proxy: you can install mirrorrr on AppEngine for free, for example.

The most effective option, of course, is to use SSH tunnels. I’ve covered this in some detail earlier.

Google

Google has a wide range of tools that can help access blocked sites. If the site you’re accessing provides public RSS feeds, use Google Reader to read them. Public feeds for Twitter, for example, are available as RSS.

Google’s cache is another way of getting at the same information. Search for the URL and click on the “Cache” link to read the text, even if the site itself is blocked.

To find more such help, Google for it!

Peopleware

… but all of this is, honestly, just a small part of it. The key, really, is to understand the people restricting your access. I’ll talk about this next.

Twitter via e-mail

Since I don’t have Internet access on my BlackBerry (because I’m in prison), I’ve had a pretty low incentive to use Twitter. Twitter’s really handy when you’re on the move, and over the last year, there were dozens of occasions where I really wanted to tweet something, but didn’t have anything except my BlackBerry on hand. Since T-Mobile doesn’t support Twitter via SMS, e-mail is my only option, and I haven’t been able to find a decent service that does what I want it to do.

So, obviously, I wrote one this weekend: Mixamail.com.

I’ve kept it as simple as I could. If I send an email to twitter@mixamail.com, it replies with the latest tweets on my Twitter home page. If I mail it again, it replies with new tweets since the last email.

I can update my status by sending a mail with “Update” as the subject. The first line of the body is the status.

I can reply to tweets. The tweets contain a “reply” link that opens a new mail and replies to it.

I can subscribe to tweets. Sending an email with “subscribe” as the subject sends the day’s tweets back to me every day at the same hour that I subscribed. (I’m keeping it down to daily for the moment, but if I use it enough, may expand it to a higher frequency.)
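
All of this boils down to dispatching on the subject of the incoming mail. A sketch of how that might look with App Engine’s inbound mail handler – the helper functions are hypothetical, and the real code is linked below:

    from google.appengine.ext.webapp.mail_handlers import InboundMailHandler

    class TwitterMailHandler(InboundMailHandler):
        def receive(self, message):
            subject = getattr(message, "subject", "").strip().lower()
            # bodies() yields (content_type, payload) pairs; take the first text body.
            body = message.bodies("text/plain").next()[1].decode()
            if subject == "update":
                update_status(message.sender, body.splitlines()[0])  # hypothetical helper
            elif subject == "subscribe":
                subscribe_daily(message.sender)                      # hypothetical helper
            else:
                send_new_tweets(message.sender)                      # hypothetical helper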

Soon enough, I’ll add re-tweeting (update: added retweets on 27 Oct) and a few other things. I intend to keep this free, and will release the source once I document it (update: the source code is on Github).

Give it a spin: Mixamail.com. Let me know how it goes!


For the technically minded, here are a few more details. I spent last night scouting for a small, nice domain name using nxdom. I bought it using Google Apps for $10. The application itself is written in Python and hosted on AppEngine. I use the Twitter API via OAuth, and store the user state via Lilcookies. The HTML is based on html5boilerplate, and has no images.

R scatterplots

I was browsing through Beautiful Data, and stumbled upon this gem of a visualisation.

[Image: r-scatterplots]

This is the default plot R provides when supplied with a table of data. A beautiful use of small multiples: each box is a scatterplot of a pair of variables, and the diagonal is used to label the rows. It shows, for every pair of variables, their correlation and spread – at a glance.

Whenever I get any new piece of data, this is going to be the very first thing I do:

plot(data)