How I do things

Hosting options

I’ve been trying out a number of options for hosting recently, and have settled on Amazon spot instances.

Here were my options:

  • Application hosting, like Google AppEngine. I used this a lot until 2 years ago. Then they changed their pricing, and I realised what “lock-in” means. I can’t just take that code and move it to another server. Besides, I’m a bit wary of Google pulling the plug. Heroku? Same problem. I just want to take the code elsewhere and run it.
  • Shared hosting, like Hostgator. This blog is run on Hostgator and I’m extremely happy with them. But the trouble is, with shared hosting, I don’t get to run long-running processes on any ports I like.
  • Run your own servers. The problem here is quite simple: power cuts in India.
  • Dedicated hosting, like Amazon EC2, Azure, GCE, etc. This remains pretty much the main hosting option.

I’m a price optimisation freak. So I ran the numbers for a year’s worth of usage. I was looking at the CPU cost of a large machine with 7-8GB RAM. Bandwidth and storage are negligible. The cost per hour worked out to:

  • Amazon: $0.32 / hr in Singapore, $0.24 in Virginia
  • Google: $0.29 / hr in Europe
  • Microsoft: $0.32 / hr in US

The price is not all that different, but I need low latency, so Singapore is what it’ll have to be.

EC2 location      Latency (ms)
Singapore         139
Oregon, US        334
Japan             517
Ireland           618
Australia         620
California, US    677
Virginia, US      710

Now comes the choice of the right model. At $0.32 per hour, that’s $230 a month.

Amazon offers some ways of getting this down. Instead of on-demand instances, I could go for reserved instances. For a year of usage, that’d get the price down to about $131 a month, nearly halving it. ($739 upfront for a heavy utilisation large reserved instance, with $0.095 * 24 * 365.25 for the year.)

In this case, I know I’ll need the servers for a year. Probably more, but then, I might want to switch later. So this isn’t a bad move. But we can do better. Amazon also offers spot instances. Spot instances might get shut down any time – but in reality, so can on-demand instances. I need to plan for it anyway. I’m not going to host anything that’s so sensitive that if it’s down for a few hours, I’ll have a problem.

But what’s attractive is the pricing. Typically, it’s $0.04 per hour, making it about $29 per month. Even if it shoots up to twice that, at $58, it’s about a fourth of the on-demand price and less than half the reserved instance price.
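
For anyone who wants to re-run the numbers, here is a rough sketch of the arithmetic in Python. It assumes a 30-day month (720 hours) for the hourly prices and a 365.25-day year for the reserved instance, which is how the figures above seem to work out. The function names are mine, purely for illustration.

```python
HOURS_PER_MONTH = 24 * 30        # assumption: a 30-day month
HOURS_PER_YEAR = 24 * 365.25     # as in the reserved-instance figure


def monthly_hourly_cost(rate_per_hour):
    """Monthly cost of an instance billed purely by the hour."""
    return rate_per_hour * HOURS_PER_MONTH


def monthly_reserved_cost(upfront, rate_per_hour):
    """Monthly cost of a 1-year reserved instance: upfront fee plus hourly charges."""
    return (upfront + rate_per_hour * HOURS_PER_YEAR) / 12


on_demand = monthly_hourly_cost(0.32)          # Singapore, on-demand
reserved = monthly_reserved_cost(739, 0.095)   # heavy utilisation, 1 year
spot = monthly_hourly_cost(0.04)               # typical spot price
spot_doubled = monthly_hourly_cost(0.08)       # if the spot price doubles

for name, cost in [("on-demand", on_demand), ("reserved", reserved),
                   ("spot", spot), ("spot x2", spot_doubled)]:
    print(f"{name:10s} ${cost:7.2f} / month")
```

Swap in your own region's hourly rates to compare. Bandwidth and storage are left out, since they were negligible for my usage.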

I’ve managed to script the entire setup sequence as shell scripts, and it takes less than an hour to get a new server up and running the software I need. I need to work out a decent backup mechanism. Plus, I could use more reliable storage like Amazon’s EBS to preserve the data. But on the whole, the pricing is far too attractive and makes the risks worthwhile.

Goodbye Google

Google Reader was where I spent most of my browsing time, but now, it’s shutting down.

Time for alternatives, but not just for Reader: for all Google products. I’m not sure when one of these might go down, become paid, or become unusable.

I just uninstalled Google Drive and Google Talk, but I don’t use Talk much (I use Skype), so no loss. I’ll leave Chrome for the time being, but I’m hearing reports that Firefox is improving faster than Chrome is. Or there’s Chromium.

I’m not worried much about search services (including image, video, scholar and books). When needed, I can switch. Scholar might be a bit sad to lose, but I don’t use it much. Google Translate, too, isn’t essential.

Likewise for content. YouTube’s not a problem. There’re enough other video services. Trends are useful, but not critical. Maps might be, so I’ll try and switch to OpenStreetMap. I don’t use News or Picasa much.

I don’t care much for social media anyway, so Blogger, Orkut and Plus can die any time.

Google’s apps are the worrying ones. Mail and Calendar, in particular. I’ll probably migrate away from them last, but the attempt is on. I’ll be documenting the alternatives I find at https://gist.github.com/sanand0/5176161 (safely cloned locally).

Looks like there’s no safe long-term alternative to being able to host your own apps. Pity.

Streaming audio to iOS via VLC

You can play a song on your PC and listen to it on your iPhone / iPad – converting your PC into a radio station. As with most things VLC related, it’s tough to figure out but obvious in retrospect.

The first thing to do is set up the MIME type for the streaming. This is a bug that has been fixed, but might not have made it into your version of VLC.

Go to Tools – Preferences.

Click on “All” to see all the settings.

Under Stream output – Access output – HTTP, set Mime to audio/x-mpeg.

At this point, you should restart VLC.

As I mentioned earlier, you might not need to do this if you have a new enough version of VLC that auto-detects the content’s MIME type.

Re-open VLC, and go to the Media – Stream menu.

Click Add and choose the file you want to stream. Then click on Stream.

Click Next.

Select HTTP and click Add.

Select Audio – MP3 and click on Stream.

At this point, the audio is being streamed at port 8080 of your machine. You can change the port and path in the menu above. (To find your local IP address, open the Command Prompt and type ipconfig.)

Open Safari on your iPhone or iPad, and visit http://your-ip-address:8080/
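
If you’d rather not read through the ipconfig output, a small Python sketch like the one below can guess the address and print the URL to type into Safari. It is only a best-effort guess: the helper names are mine, 192.0.2.1 is just an arbitrary documentation address used to ask the OS which local interface it would pick, and port 8080 is the default mentioned above.

```python
import socket


def local_ip():
    """Best-effort guess at this machine's LAN address.

    Connecting a UDP socket sends no packets. It only asks the OS which
    local address it would use to reach the given (arbitrary) remote host.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        s.connect(("192.0.2.1", 80))
        return s.getsockname()[0]
    except OSError:
        return "127.0.0.1"   # no network route: fall back to loopback
    finally:
        s.close()


def stream_url(ip, port=8080, path="/"):
    """URL to open in Safari for a VLC HTTP stream."""
    return f"http://{ip}:{port}{path}"


print(stream_url(local_ip()))
```

On a machine with several network adapters this may pick a different one from the Wi-Fi network your phone is on, so treat the result as a hint and fall back to ipconfig if the page doesn’t load.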

I haven’t figured out the right codec and MIME type to do this for videos yet, but hopefully will figure it out soon.

Storytelling: Part 1

In a number of sessions I’ve been to, people ask analysts to make their results more interesting – to tell stories with them. I’m co-teaching a course, part of which involves telling stories with data. So this got me thinking: what is a story? How does one teach storytelling to, let’s say, an alien?

Consider this mini-paper.

ABSTRACT: Meter readings exhibit spikes at slab boundaries. We also
find significant evidence of improbable events at round numbers.

Electricity shortage is a serious problem in most Indian states. Part
of this problem is due to the inaccuracy of reporting procedures used
in monitoring meter readings. Our focus here is not to document or
experimentally determine the degree of inaccuracy. We have adopted a
data driven approach to this problem and attempt to model the extent
of inaccuracy using basic statistical analysis techniques such as
histograms and the comparison of means.

Our dataset comprises of the frequency analysis 12-month dataset
containing monthly meter readings of 1.8 million customers in the
State of Andhra Pradesh.

We find that a histogram of these readings shows unexpectedly high
values at the slab boundaries: 50 (+45.342%, t > 13.431), 100
(+55.134%, t > 16.384), 200 (+33.341%, t > 15.232), and 300
(+42.138%, t > 19.958).

We also detected spikes at round numbers: 10 (+15.341%, t > 5.315),
20 (+18.576%, t > 6.152), 30 (+11.341%, t > 4.319).

The statistical significance of every deviation listed above is over
99.9%. Further, every deviation has a positive mantissa. This leads us
to confidently declare the existence of a systematic bias in the meter
readings analysed.

You’re probably thinking: “I know why he’s put this example here. It must be a bad one. So, what a rotten paper it must be!”

Well, not quite. It’s a good piece of analysis. I did it myself and there’s a fair bit of effort and care behind these short paragraphs.

The trouble is, if I read it out to my daughter, she’d say “What?” and not understand a word. My wife’d say “So what?” and not care a bit. I might as well not have written it.

It’s like that Zen thing: If a tree falls in a forest and no one hears it, does it make a sound?

If you did a piece of analysis, and no one understands or cares about it, why did you do it in the first place?

Why do you do it?

That last question is important: why do we analyse?

Sometimes, we do it for fun. The knowledge is beautiful. Knowing Tetris is NP-Complete is rewarding, even though my colleague sarcastically remarked, “Thank God! I’m sooo relieved now that I know that Tetris is NP whatever.” If that’s the case with you, great. Write the analysis any which way you’ll enjoy.

Sometimes, we do it because we’re forced to. In class. At work. Wherever. But that’s another way of saying “I don’t know why I’m doing it.” In that case, I’d gently recommend watching 3 Idiots.

Most often, we do it to share knowledge and drive actions. In that case, if no one understands it, or does anything with it, why do it?

Keep it simple

We prerajulisation of Farhanitate flagellated with ...

Would your audience understand that? Or are you just scared that simple words indicate a simple mind?

I was once afraid. 15 years ago, when writing a paper on IBM India’s competitive advantage for the CXOs, I was worried about it being too simple. I didn’t know anything about management. So I filled it with jargon. They politely nodded when I presented it, but I wasn’t fooling anyone. If there’s no content, jargon doesn’t help.

Unfortunately, it’s become polite to accept jargon as a substitute for substance. Why were they not ripping me apart? Or at least, kindly asking me what on earth I wanted to say?

My friend Manoj did that. In his nice, humble way, he asked, “But Anand, what does this mean?” When I explained it to him, I found I didn’t have a clue. He was OK with that. He just wanted to make sure he hadn’t missed something.

(That’s the technique I use these days. Ask people to explain things clearly. It’s OK if they’re just lost in jargon. I just want to make sure I haven’t missed something.)

Don’t cloak your ignorance. No one will think less of you. In the long run, you’ll learn more, and won’t need the jargon.

Part 2 of the article will talk about focusing on people and actions; storylining and the pyramid principle; and the structure of messages.

Style of blogging

Until 2007, my blog was mostly just linking to stuff I found interesting on the Web. Since 2007, I’ve tried to write longer articles, mostly based on my own experiences.

At the moment, that’s unsustainable. Right now, being in a startup, I’m doing more stuff than I ever have in the past. (That does not mean working more hours, by the way.)

My posts, going forward, are likely to be smaller, less original, but hopefully more frequent.

Downloading songs from YouTube

Five years ago, I built a song search engine – mainly because I needed to listen to songs. Three years ago, I stopped updating it – mainly because I stopped listening to songs actively, and have been busy since. For those of you who have been using my site for music: my apologies.

These days, I don’t really find the need to download music. YouTube has most of the songs I need. Bandwidth is pretty good too even when on the move.

But when I do need to download music, this is my new workflow.

  1. Find the song on YouTube. (Misspellings are still an issue, but you’ll usually find what you need)
  2. Download the video. Keepvid is the simple option. youtube-dl is the geek’s option (for multiple downloads)
  3. Use VLC – the Swiss Army knife of media – to convert the video into an MP3.

That last step requires a bit of explaining. It’s very simple once you know how, but it took me a few months to get it right. So here goes.

Select the Convert / Save option in the Media menu.

Click on Add to open the file you want to convert. You can pick a track from a disc as well if you want to rip an audio CD or a DVD.

Choose the file.

Click on Convert / Save.

Type the destination filename. Make sure you type the full file name, and not just the name of the folder.

Select the output format you want under Settings – Profile. You can tweak the bitrate with the settings button, but I usually don’t bother.

When you click on the Start button, the file will be converted or the CD will be ripped. You’ll see the position marker move fairly fast.

The only problem I have with this method is that I can’t seem to do batch conversions easily enough with the GUI. Does anyone have any other workflow they like?

Update (31 Jul 2012): Aditya Sengupta suggests the following: (should’ve guessed VLC would have something up its sleeve)

vlc -I dummy $FILENAME --no-sout-video --sout "#transcode{acodec=mp3,ab=AUDIO_BITRATE,channels=2}:std{access=file,mux=raw,dst=$NAME.mp3}" vlc://quit
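
Building on that command, batch conversion could be sketched in Python roughly as follows. This is an untested-on-your-machine sketch, not a definitive recipe: it assumes vlc is on your PATH, reads ab as the audio bitrate option in kbit/s (my reading of the command above), and picks 128 as an arbitrary default. The function names and the *.mp4 pattern are mine.

```python
import subprocess
from pathlib import Path


def vlc_mp3_command(source, bitrate_kbps=128):
    """Build the VLC command line that transcodes one file to MP3."""
    target = Path(source).with_suffix(".mp3")
    sout = (
        f"#transcode{{acodec=mp3,ab={bitrate_kbps},channels=2}}"
        f":std{{access=file,mux=raw,dst={target}}}"
    )
    return ["vlc", "-I", "dummy", str(source), "--no-sout-video",
            "--sout", sout, "vlc://quit"]


def convert_all(folder, pattern="*.mp4"):
    """Convert every matching file in `folder`, one VLC run per file."""
    for source in sorted(Path(folder).glob(pattern)):
        subprocess.run(vlc_mp3_command(source), check=True)


# Show the command for a sample file name rather than running VLC here.
print(" ".join(vlc_mp3_command("song.mp4")))
```

Calling convert_all("downloads") would then convert every .mp4 file in that folder. File names containing commas, braces or quotes may need extra quoting inside the --sout chain, which this sketch does not attempt.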

Scraping for a laptop

I’ve returned my laptop, and it’s time to buy a new one. For the first time in my life, I’m buying a laptop for myself.

I have a fairly clear idea of what I want: a 500GB+ 7200 rpm hard disk with 4GB of RAM and an Intel Core i7. I thought that would make it easy to find one of those powerful laptops for producing music, since I record some stuff too as a hobby.

Sheer naïveté. Not a single site let me filter by hard disk rpm in India. (To be fair, I haven’t found any sites outside India that did that either.)

After spending a good two hours hunting for the details and collating them, I did what I normally would: spend 30 minutes writing a scraper. The scraper runs through all laptops on Flipkart and pulls out all of their specs. Thanks to the diligence of the good folks at Flipkart, this information is readily available on each page. The HTML is structured quite neatly too, so it was just a 30-line program to scrape it all. Full credit to ScraperWiki as well – I could use it on a netbook without any developer tools installed.

The scraper took 2 hours to run. Feel free to filter through the output (CSV) for your favourite laptop, or fork the code and pull any other data you like.
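
As an illustration of the filtering step, here is a minimal Python sketch that picks out laptops matching my wish list. The column names and the four sample rows are made up for the example and do not come from the actual scraper output, so adjust them to whatever headers the CSV really has.

```python
import csv
import io

# Made-up rows standing in for the scraper's CSV output.
SAMPLE_CSV = """\
model,processor,ram_gb,hdd_gb,hdd_rpm
Laptop A,Core i5,4,500,5400
Laptop B,Core i7,4,750,7200
Laptop C,Core i7,8,500,5400
Laptop D,Core i7,8,1000,7200
"""


def matching_laptops(rows, min_ram_gb=4, min_hdd_gb=500, min_rpm=7200, cpu="i7"):
    """Yield rows that meet the RAM, disk size, disk speed and CPU criteria."""
    for row in rows:
        if (cpu in row["processor"]
                and int(row["ram_gb"]) >= min_ram_gb
                and int(row["hdd_gb"]) >= min_hdd_gb
                and int(row["hdd_rpm"]) >= min_rpm):
            yield row


rows = csv.DictReader(io.StringIO(SAMPLE_CSV))
matches = [row["model"] for row in matching_laptops(rows)]
print(matches)
```

To run it against a real file, replace io.StringIO(SAMPLE_CSV) with an open file handle. Real spec sheets tend to have missing or free-text values, so the int() conversions would need some guarding.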

The next chapter of my life

I’m writing this post on a one-way flight from London back to India. I’ve moved on from Infosys Consulting, and am starting up on my own.

I’ve wanted to do this for a long time. There’s always more freedom in your own company than someone else’s. There’s often more money in it too, if you’re lucky enough. But my upbringing is a bit too conservative to make that bold step. However, given that my father runs his own firm, I figured it was just a question of time for me to do the same.

Two years ago, in Jan 2010, I picked up Rashmi Bansal’s Stay Hungry Stay Foolish at an airport. That book killed the last bit of resistance I had. If the people in that book could succeed, I felt I could too. And if what they did (building small companies, not huge ones) could be called a success, I could be successful too.

After the flight, it was clear in my mind. I would be an entrepreneur. I would create a small company that would probably fold. Then I’d do it again. And again, 10 times, because 1 in 10 companies survive. And finally, I’d be running a small business that’d be called successful by virtue of having survived. A modest, achievable ambition that I had the courage for.

I usually make big decisions without analysis, by just sleeping over them. I slept over it and announced it to my family the next day. I’m not sure they believed me.

Two months later, along with a friend, I built a dynamic digital image resizing product. We had our wives start a company in the UK, and tried selling it to retailers. There clearly was a demand. The problem was, we didn’t know how to sell. After a year and having spent £500 with no sales, it was clear to us that venture #1 had failed. We eventually shut it down.

In the middle of this, my ex-boss from IBM told me that he was looking to start a venture, focusing on mobile, rural BPO and energy management. This later on changed to data analytics and visualisation. They all sounded like fun, so I said I’d help out in my spare time.

A few months later, a classmate told me he’d started a business digitising school report cards. That sounded like fun too, so I said I’d help out in my spare time.

Now, if that sounds like I had a lot of spare time on my hands — you’re right, I did. And it’s time to talk about the jobs in my life. My first 3 years at IBM were fun. I was coding, learning, and leading a bachelor’s life with friends, money, and no responsibilities. My 4 years at BCG were strenuous with 80-hour weeks, but it was interesting and challenging. I was newly married, and between work and home responsibilities, I had no time for fun.

I moved to Infosys Consulting in the UK with the specific aim of rectifying that (and for health reasons as well). In the last 7 years, the work has (except on occasion) been a bit boring, but very relaxing. On most days, I would spend 4 hours working, and 4 hours learning new stuff. The things I learnt only helped me be more efficient. So I ended up getting even more work done in less time.

Many things came out of this. Firstly, I recovered my health. We had a daughter, and I spent more time with her. I started coding in earnest again. By 2007, I was writing code as part of my projects – stuff that others whose job it was were unable to do. By 2009, I had a few websites running, like an Indian music search engine, an IMDb Top 250 tracker, a few transliterators, and so on.

So when I said I’d help out with these startups, it wasn’t an empty promise. For the last 18 months, I’ve had a day job and three night jobs. I never did justice to any of them in my opinion, but I had more fun than ever in my life, I learnt more than ever in my life, and I produced more tangible output than ever in my life. Sometimes, quantity beats quality or reliability.

Both these startups are doing well today. Gramener.com offers data visualisation and IT services. I will be joining them as Chief Data Scientist. Reportbee.com offers a hosted report card solution. I will continue helping them out. And I will continue working with a few NGOs.

You’ll see me a lot more active online now. I can publicly write about my work — something I’ve been unable to do the last 11 years.

I am relocating to Bangalore. From a professional front, it’s an obvious choice. That’s where the geeks are. In my last visit to India, I was at Bangalore, Chennai and Hyderabad. In the latter two, it’s tough to meet geeks. And when you do, it’s no easier to find the next. Bangalore has many more geeks, and they’re fairly well networked.

From a personal front, too, Bangalore works well. It’s close enough to Chennai without actually being in Chennai.

It’s 10am on Thu 12th Jan. Our flight is descending into Delhi airport. It’s the start of a new chapter in my life. Scary, but exciting. Wish me luck!

Eating more for less

A couple of years ago, I managed to lose a fair bit of weight. At the start of 2010, I started putting it back on, and the trajectory continues. I’m at the stage where I seriously need to lose weight. I subscribe to The Hacker’s Diet principle – that you lose weight by eating less, not exercising.

An hour of jogging is worth about one Cheese Whopper. Now, are you going to really spend an hour on the road every day just to burn off that extra burger? You don't exercise to lose weight (although it certainly helps). You exercise because you'll live longer and you'll feel better.

I’m afraid I’ll live too long anyway, so I won't bother exercising just yet. It's down to eating less. Sadly, I like food. So to make my “diet” work, I need foods that add fewer calories per gram. Usually, when browsing stores, I check these manually. But being a geek, I figured there’s an easier way.

Below is a graph of some foods (the kind I particularly need to avoid, but still end up eating). The ones on the top add a lot of calories (per 100g), and are better avoided. The ones at the right cost a lot more. Now, I’m no longer at the point where I need to worry about food expenses, but still, I can’t quite kick the habit. Hover over the foods to see what they are, and click on them to visit the product. (If you’re using an RSS reader and this doesn’t work, read on my site.)

(The data was picked from Tesco.) It’s interesting that cereals are in the middle of the calorie range. I always thought they’d be low calories per gram. Turns out that if I want to have such foods, I’m better off with desserts or ice creams (profiterole, lemon meringue or tiramisu). In fact, even jams have fewer calories than cereals.

But there are some desserts to avoid. Nuts are a disaster. So are chocolates. Gums, dates and honey are in the middle – about as good as cereals. Salsa dip seems surprisingly low.

Custards seem to hit the sweet spot – cheap, and very low in calories. Same for jellies. So: custards and jelly. My daughter’s going to be happy.