How I do things

Weight lines, again

A few years ago, I ended up losing weight, mostly by dieting. That worked out rather well up to a point: I lost about 20kgs rapidly. But I ended up putting them back on almost as rapidly.

What I learnt from this was that dieting made me more short-tempered. It also reduced my metabolic rate. My body would adjust to the hunger and enter a “starvation-mode”, using the limited food ridiculously efficiently. So I’d have to eat even less to continue losing weight.

This time, I’m going to try it the slow way.

Firstly, my targets are moderate. I plan to lose about 1kg every month. (It’ll take me a few years to achieve my target. That’s good – it postpones the time when I’ll say “Ah, I’m thin. I can eat now” and become fat again.)

Secondly, I’m not going to keep myself hungry. I’m just going to try to stop eating when I’m not hungry.

This happens for two reasons. One’s because I usually watch a movie when I eat, and keep eating until the movie’s done. The other’s because when I’m between activities, I raid the kitchen. It’s not realistic to pretend that I can curb these tendencies. But it’s possible to be a bit more aware of them.

Stocking the house with healthier foods helps. I also find that some fruits in particular keep my stomach full for longer. They’re low value for money energy-wise, but great for dieting and health.

Thirdly, I’m going to start exercising, but in my own, slow, way. I’m not good at running or going to the gym, but weirdly, I rather like climbing stairs. So the next step is to abandon lifts and only use the stairs.

None of these is a big step. But I’m not in a hurry, and these are more like habits I’d like to get into for the rest of my life rather than short-term measures.

Here’s to a lighter 2014!

Courtesy

We are often subject to body searches, baggage inspections, and identity verifications. At malls. At airports. At offices.

These are to ensure that no one carries ammunition inside, or goods or secrets outside. In other words, to deter terrorists and thieves.

It’s nothing personal, of course. When someone does not know me, I can choose to accept that (or not; the choice is mine).

When I’m invited somewhere, however, I assume that I am not deemed a security threat. Therefore, I expect that:

  • Neither I nor my belongings will be searched or scanned
  • I need not leave behind my personal belongings
  • I need not carry an identity card

Please afford me this courtesy if you are inviting me.


For some months now, I’ve visited many corporate offices. The reception consists of security guards, a metal detector and a register. I’m given a tag and an escort.

I’m not fussy. I’m not worried about being greeted, for example. I’m quite happy to plug into a power socket and work on my laptop until logistics are sorted out. But when that happens at the security outpost with no sitting space, or outside the gate in the rain, it inconveniences me.

A few weeks ago, I was in Singapore, and visited a client’s office in slippers. One of them complimented my choice of footwear, and remarked that he had not yet risen high enough in the corporate ladder to afford this luxury. (There’s a series of stories behind my footwear that I’ll get to later.)

That told me something. After a long time, I now can afford this luxury. Especially if someone knows me well enough to invite me to their office.

I hope to point them to this blog post and request that security be arranged so that I can be afforded this small courtesy; be treated with trust rather than as a terrorist or a thief.

(If their organisation’s practice does not permit this, I’m happy to meet outside. Besides, our office is happy to extend warm hospitality.)

Open source in corporates

[This is a post that I’d published internally in InfyBlogs in Dec 2009. Time to share it.]

Last month, my first application went live.

I’ve been writing code for 20 years. Not one line of my code has been officially deployed in a corporate. (Loser…)

It’s a happy feeling. Someone defined happiness as the intersection of pleasure and meaning. Writing code is pleasurable. Others using it is meaningful.

But this post isn’t quite about that. It’s about the hoops I’ve had to jump through to make this happen.

I’ve been living in a nightmare since March 2009. That was when I decided that I’d try and get corporates to use open source.

March 2009

It began with a pitch to a VC firm. They were looking to build a content management system (CMS). Normally we’d pull together slides that say we’ll deliver the moon. This time, we put together a demo based on WordPress’ CMS plugins.

The meeting went fabulously well. We said, “Here’s a demo we’ve built for you. Do you like it?” The business lead (Stuart) was drooling and declared that that’s exactly what they wanted. The IT lead (another Stuart) was happy too, but warned the business users: “Just remember: this isn’t how we do development, so don’t get your hopes up that we can deliver stuff like this :-)”

Time to make my point. I asked, “What’s your policy on open source software?”

The business lead went quiet. “I don’t know,” he finally said. Fair enough.

I turned to the IT lead. “Well, we don’t use it as a matter of policy… there are security concerns…” he said.

“Which web server do you use?”

“Oh, OK. I see what you mean. We use Apache. So on a case to case basis, we have exceptions. But generally we have security concerns.”

“Why? Do you believe open source software is more insecure than commercial software?”

He thought about it for a while. “Well… maybe. I don’t know.” We debated this a bit. Then we found the real issue: “It’s just that we don’t have control over the process. We don’t know enough about it to decide.”

A couple of weeks later, I tried pitching to a newspaper. This time, it was our sales team that raised the same question. “But… isn’t open source insecure?”

I didn’t even bother pitching any open source stuff to them. But I’d learnt my lessons:

  1. Demo the application. Don’t talk about it.
  2. Show it to the business first, and then tackle IT.

Aside: June 2009

In June, I got another chance at a client where we were building their new website. The very first thing I did was ask to see the Javascript. Total mess, and filled with browser-incompatible DOM requests. So I went over to their web development team.

“Look, why don’t you guys use a Javascript library? It’ll get you cross browser compatibility and compact maintainable code at the same time.”

And, to their credit, they said, “Sure. Which library?”

I showed them this and we agreed on jQuery. So, if nothing else, I’ve managed to get one open source library into a corporate.

July 2009

I was also looking at payments on the website, and our client was looking to replace their chargeback application. Since I had a week off, I built a working PCI compliant prototype on Django. (I must clarify what I mean by PCI compliant. You see, any application that stores credit card information must pass through a stringent security clearance process. I bypassed the problem by not storing the card information. I’ve realised that I’ve been building PCI compliant applications all my life – and it’s a huge benefit to let people know that.)

This time, I applied the lessons I’d learned, and demo-ed it to the business, who were thrilled. Time to tackle IT.

I started with the architecture team. Matt on the architecture team was the most approachable. So I went over, demo-ed it, and said, “Matt, this took a week to put together. It’s based on some new technologies. Are you game to try these out?”

He was. And quite enthused about it too. So we put together a proposal for the architecture review board, proposing a new technology stack: Django / Python and MySQL. As before, I showed the demo before I talked technology. I had prepared answers to all security related questions upfront (and practically memorised section 3 of the PCI guidelines). The clincher, though, was the business case. To build it on Java, it would cost ~1,000 person days. On Django, I’d mostly done it in 5. There was no way of justifying 1,000 person days for an application that could save, at best, £100,000 a year.

So they said “Go ahead, we’re fine if operations and infrastructure are fine.”

It was time to find a Django developer in Infy. I hunted for a couple of weeks but none was available. (Only 2 people that I knew knew Django in the first place.) So that effort got canned, and we were back to the 1,000 person day solution. (Which got canned too, later.)
But in the process, I’d learned my third lesson.

  3. If you’re trying new technologies, plan on delivering it yourself.

October 2009

Another application popped up that looked like a prime candidate for introducing open source. They were using an Excel application to fraud screen orders, and wanted to make a web app out of it.

I followed the same route as before. Demo it. Show it to business first, then IT. Build it myself. I skipped Architecture, since they’d already approved the technology stack, and took it straight to Infrastructure.

“This application uses Apache as the web server, MySQL as the database, and uses PHP and Javascript for the application logic. Could we get a Linux server to host it?”

Our entire conversation lasted 30 seconds. He said, “No. We use Windows servers” (I was fine)

“… and you’ll need to change Apache to IIS” (fine again)

“… and we don’t support PHP, so it’ll have to be Java or .NET” (I don’t know .NET or Java… but fine)

“… and we don’t support MySQL, it’ll have to be SQL Server” (fine, I guess)

“… and we don’t have DBAs available until January, so you’ll have to wait.” (definitely not good.)

So back to the drawing board on the technology stack. I needed something in Java (I know very little Java, but nothing at all in .NET) and to avoid the DBA headache, it would have to bundle in a database. I first explored key-value stores like CouchDB, Redis, etc. None of them worked on Java. The only one I found that did was Persevere, and it was a JSON data store, which fit perfectly with my plans.

By this time, I’d also learnt my fourth and most important lesson.

  4. Don’t try to promote open source. Just deliver the application.

I said, “This is a custom-built application that runs on Java. Could we get a Windows server to host it?”

The answer was “Yes”, and we had it live the next day.

PS: December 2009

The application’s deployed and running. It has about 10,000 orders fraud screened by now.
And the lessons are well learnt. So when someone came over asking if there was any image resizing solution I knew of, I said: “Sure, who’s your business sponsor?” Then I went over and said, “Let me show you this open source application called ImageMagick. It handles aspect ratios correctly, and can crop too. Doesn’t this look professional?” Then I went over to IT and said, “It’s open source, so you can change it. It has Java bindings, so you can integrate it into your environment. It can handle 8 3000×2400 images a second on my puny laptop. It’s used by your competitors. And I can build it for you if you like.”

I might just have my second open source entry into a corporate this year.

The scary Internet

I’m not that difficult to scare, and this log message certainly didn’t help:

ip223.hichina.com [223.4.183.127] failed - POSSIBLE BREAK-IN ATTEMPT!

That’s the message I saw – one thousand five hundred and seventy times yesterday in /var/log/auth.log on one of my Amazon EC2 instances.

Someone, presumably from China, has been patiently trying out a variety of SSH keys to log into this system.

These were grouped as batches. There were exactly 314 attempts at 8am yesterday, then 314 at 12noon, then 314 at 4pm, then 314 at 8pm, then 232 at 3am today. (All times are in UTC – that is, UK time without daylight saving). Every burst took 9 minutes to run through all 314 attempts.
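A tally like this can be reproduced with a few lines of Python. What follows is only a rough sketch: it assumes the classic syslog timestamp at the start of each line ("Jun  2 08:00:01", with no year), and the sample lines are hypothetical stand-ins shaped like the message quoted above, not copies of the real log.

```python
import re
from collections import Counter

# Classic syslog timestamp, e.g. "Jun  2 08:03:11" (assumed format, no year).
STAMP = re.compile(r"^(\w{3})\s+(\d{1,2})\s+(\d{2}):\d{2}:\d{2}")


def attempts_per_hour(lines, marker="POSSIBLE BREAK-IN ATTEMPT"):
    """Count lines containing `marker`, keyed by (month, day, hour)."""
    counts = Counter()
    for line in lines:
        if marker not in line:
            continue
        match = STAMP.match(line)
        if match:
            month, day, hour = match.groups()
            counts[(month, int(day), int(hour))] += 1
    return counts


# Hypothetical sample lines, for illustration only.
sample = [
    "Jun  2 08:00:01 host sshd[101]: reverse mapping checking getaddrinfo for "
    "ip223.hichina.com [223.4.183.127] failed - POSSIBLE BREAK-IN ATTEMPT!",
    "Jun  2 08:00:03 host sshd[102]: reverse mapping checking getaddrinfo for "
    "ip223.hichina.com [223.4.183.127] failed - POSSIBLE BREAK-IN ATTEMPT!",
    "Jun  2 12:00:02 host sshd[203]: reverse mapping checking getaddrinfo for "
    "ip223.hichina.com [223.4.183.127] failed - POSSIBLE BREAK-IN ATTEMPT!",
    "Jun  2 12:05:00 host sshd[204]: Accepted publickey for ubuntu",
]

for key, count in sorted(attempts_per_hour(sample).items()):
    print(key, count)
```

To run it against a real log, pass an open file instead of the sample list, for example `attempts_per_hour(open("/var/log/auth.log"))`. Reading that file usually needs root or membership of the adm group.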

The worst part was, when I tried using SSH this morning, I wasn’t able to log in. (It turned out that I had made a configuration error, but this is the sort of thing that gets me quite worried.)

Perhaps I shouldn’t be complaining. I’ve written enough scrapers to make most webmasters cringe at their logs. I remember a few years ago, when I was working on a project at Tesco, and was scraping bestsellers lists from most sites. (Here’s a blog post about it.) We were putting together a prototype to see how real-time competitive pricing could help.

The scraper was a pretty mild one. It would visit a hundred links, roughly at the pace of one a second. No images were loaded, of course, just the HTML.
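For a sense of how little such a scraper does, here is a minimal sketch of the pacing in Python. It is not the original script: the function names are made up for illustration, and the one-second delay is simply the pace described above.

```python
import time
import urllib.request


def fetch_html(url):
    """Download one page's HTML only; images and scripts are never requested."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def crawl(urls, delay=1.0, fetch=fetch_html, sleep=time.sleep):
    """Fetch each URL in turn, pausing `delay` seconds between requests."""
    pages = {}
    for i, url in enumerate(urls):
        if i:  # no pause before the very first request
            sleep(delay)
        pages[url] = fetch(url)
    return pages
```

The `fetch` and `sleep` parameters exist only so that the loop can be tried out without touching a real site. At one request a second, a hundred links take under two minutes.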

One fine day, a few weeks after this had started, I got a call from Andy.

“Hi Anand, are you running any scrapers on our books website?”

“Yes, why?”

“Oh! The site’s very slow. Could you shut it down immediately?”

Turns out that not a single page on the site loaded, and it had almost crawled to a halt. Now, obviously, my little 100-page script could hardly cause damage, but it’s easy to understand their reactions. No unauthorised scraping! After a few days of trying to figure out what the problem was, they increased the memory and things went back to normal. Not a bad solution, actually – throw hardware at the problem, and if it vanishes, it’s probably the cheapest solution.

But anyway, I’m sure it’s some nice chap who’s just curious to know what I’ve got on my servers. I’d be happy to share some of it. And even if it’s not so nice a chap, there’s little that I can do, is there?

Update (1pm India, 3rd June): Actually, I now realise that this has been happening every four hours since May 29th, as regular as clockwork. Wish I knew enough UNIX programming to pull a prank…

Hosting options

I’ve been trying out a number of options for hosting recently, and have settled on Amazon spot instances.

Here were my options:

  • Application hosting, like Google AppEngine. I used this a lot until 2 years ago. Then they changed their pricing, and I realised what “lock-in” means. I can’t just take that code and move it to another server. Besides, I’m a bit wary of Google pulling the plug. Heroku? Same problem. I just want to take the code elsewhere and run it.
  • Shared hosting, like Hostgator. This blog is run on Hostgator and I’m extremely happy with them. But the trouble is, with shared hosting, I don’t get to run long-running processes on any ports I like.
  • Run your own servers. The problem here is quite simple: power cuts in India.
  • Dedicated hosting, like Amazon EC2, Azure, GCE, etc. This remains pretty much the main hosting option.

I’m a price optimisation freak. So I ran the numbers for a year’s worth of usage. I was looking at the CPU cost of a large machine with 7-8GB RAM. Bandwidth and storage are negligible. The cost per hour worked out to:

  • Amazon: $0.32 / hr in Singapore, $0.24 in Virginia
  • Google: $0.29 / hr in Europe
  • Microsoft: $0.32 / hr in US

The price is not all that different, but I need low latency, so Singapore is what it’ll have to be.

EC2 location     Latency (ms)
Singapore        139
Oregon, US       334
Japan            517
Ireland          618
Australia        620
California, US   677
Virginia, US     710

Now comes the choice of the right model. At $0.32 per hour, that’s $230 a month.

Amazon offers some ways of getting this down. Instead of on-demand instances, I could go for reserved instances. For a year of usage, that’d get the price down to about $131 a month, nearly halving it. ($739 upfront for a heavy utilisation large reserved instance, with $0.095 * 24 * 365.25 for the year.)

In this case, I know I’ll need the servers for a year. Probably more, but then, I might want to switch later. So this isn’t a bad move. But we can do better. Amazon also offers spot instances. Spot instances might get shut down any time – but in reality, so can on-demand instances. I need to plan for it anyway. I’m not going to host anything that’s so sensitive that if it’s down for a few hours, I’ll have a problem.

But what’s attractive is the pricing. Typically, it’s $0.04 per hour, making it about $29 per month. Even if it shoots up to twice that, at $58, it’s about a fourth of the on-demand price and less than half the reserved instance price.
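The monthly figures above can be checked with a short calculation. This sketch assumes a 30-day month for the hourly prices and a 365.25-day year for the reserved instance, which is what the quoted numbers imply. The function and variable names are just for illustration.

```python
HOURS_PER_MONTH = 24 * 30      # rough month implied by the monthly figures
HOURS_PER_YEAR = 24 * 365.25   # year used in the reserved-instance sum


def monthly_hourly_only(hourly_rate):
    """Monthly cost when only an hourly rate is charged (on-demand, spot)."""
    return hourly_rate * HOURS_PER_MONTH


def monthly_reserved(upfront, hourly_rate):
    """Monthly cost of a one-year reserved instance: upfront fee plus usage."""
    return (upfront + hourly_rate * HOURS_PER_YEAR) / 12


on_demand = monthly_hourly_only(0.32)
reserved = monthly_reserved(739, 0.095)
spot = monthly_hourly_only(0.04)
spot_doubled = monthly_hourly_only(0.08)

for label, cost in [
    ("on-demand", on_demand),
    ("reserved", reserved),
    ("spot", spot),
    ("spot at twice the price", spot_doubled),
]:
    print(f"{label}: ${cost:.0f} a month")
```

Spot prices fluctuate, so the last two lines are only as good as the $0.04 typical rate assumed here.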

I’ve managed to script the entire setup sequence as shell scripts, and it takes less than an hour to get a new server up and running the software I need. I need to work out a decent backup mechanism. Plus, I could use more reliable storage like Amazon’s EBS to preserve the data. But on the whole, the pricing is far too attractive and makes the risks worthwhile.

Goodbye Google

Google Reader was where I spent most of my browsing time, but now, it’s shutting down.

Time for alternatives, but not just for Reader: for all Google products. I’m not sure when one of these might go down, become paid, or become unusable.

I just uninstalled Google Drive and Google Talk. I don’t use Talk much (I use Skype), so no loss. I’ll leave Chrome for the while, but I’m hearing reports that Firefox is improving faster than Chrome is. Or there’s Chromium.

I’m not worried much about search services (including image, video, scholar and books). When needed, I can switch. Scholar might be a bit sad to lose, but I don’t use it much. Google Translate, too, isn’t essential.

Likewise for content. YouTube’s not a problem. There’re enough other video services. Trends are useful, but not critical. Maps might be, so I’ll try and switch to OpenStreetMap. I don’t use News or Picasa much.

I don’t care much for social media anyway, so Blogger, Orkut and Plus can die any time.

Google’s apps are the worrying ones. Mail and Calendar, in particular. I’ll probably migrate away from them last, but the attempt is on. I’ll be documenting the alternatives I find at https://gist.github.com/sanand0/5176161 (safely cloned locally).

Looks like there’s no safe long-term alternative to being able to host your own apps. Pity.

Streaming audio to iOS via VLC

You can play a song on your PC and listen to it on your iPhone / iPad – converting your PC into a radio station. As with most things VLC related, it’s tough to figure out but obvious in retrospect.

The first thing to do is set up the MIME type for the streaming. This is a bug that has been fixed, but might not have made it into your version of VLC.

Go to Tools – Preferences.

Click on “All” to see all the settings.

Under Stream output – Access output – HTTP, set Mime to audio/x-mpeg.

At this point, you should restart VLC.

As I mentioned earlier, you might not need to do this if you have a new enough version of VLC that auto-detects the content’s MIME type.

Re-open VLC, and go to the Media – Stream menu.

Click Add and choose the file you want to stream. Then click on Stream.

Click Next.

Select HTTP and click Add.

Select Audio – MP3 and click on Stream.

At this point, the audio is being streamed at port 8080 of your machine. You can change the port and path in the menu above. (To find your local IP address, open the Command Prompt and type ipconfig.)

Open Safari on your iPhone or iPad, and visit http://your-ip-address:8080/
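If Python is installed on the PC, the address to type into Safari can also be worked out with a short script instead of reading it off ipconfig. This is a best-effort sketch, not part of VLC: the function names are made up, the port and path defaults are the ones mentioned above, and it falls back to the loopback address (which is no use to the iPhone) when the machine has no network route.

```python
import socket


def local_ip():
    """Best-effort guess at this machine's address on the local network."""
    probe = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Connecting a UDP socket sends no packets; it only makes the OS pick
        # an outgoing interface. 192.0.2.1 is a reserved documentation address.
        probe.connect(("192.0.2.1", 80))
        return probe.getsockname()[0]
    except OSError:
        return "127.0.0.1"  # no route available; fall back to loopback
    finally:
        probe.close()


def stream_url(port=8080, path="/"):
    """Build the URL to open on the phone for a stream on `port` and `path`."""
    return f"http://{local_ip()}:{port}{path}"


print(stream_url())
```

On a machine with several network adapters, the address picked may not be the one on the same Wi-Fi network as the phone, so ipconfig remains the more reliable check.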

I haven’t figured out the right codec and MIME type to do this for videos yet, but hopefully will figure it out soon.

Storytelling: Part 1

In a number of sessions I’ve been to, people ask analysts to make their results more interesting – to tell stories with them. I’m co-teaching a course, part of which involves telling stories with data. So this got me thinking: what is a story? How does one teach storytelling to, let’s say, an alien?

Consider this mini-paper.

ABSTRACT: Meter readings exhibit spikes at slab boundaries. We also
find significant evidence of improbable events at round numbers.

Electricity shortage is a serious problem in most Indian states. Part
of this problem is due to the inaccuracy of reporting procedures used
in monitoring meter readings. Our focus here is not to document or
experimentally determine the degree of inaccuracy. We have adopted a
data driven approach to this problem and attempt to model the extent
of inaccuracy using basic statistical analysis techniques such as
histograms and the comparison of means.

Our dataset comprises of the frequency analysis 12-month dataset
containing monthly meter readings of 1.8 million customers in the
State of Andhra Pradesh.

We find that a histogram of these readings shows unexpectedly high
values at the slab boundaries: 50 (+45.342%, t > 13.431), 100
(+55.134%, t > 16.384), 200 (+33.341%, t > 15.232), and 300
(+42.138%, t > 19.958).

We also detected spikes at round numbers: 10 (+15.341%, t > 5.315),
20 (+18.576%, t > 6.152), 30 (+11.341%, t > 4.319).

The statistical significance of every deviation listed above is over
99.9%. Further, every deviation has a positive mantissa. This leads us
to confidently declare the existence of a systematic bias in the meter
readings analysed.

You’re probably thinking: “I know why he’s put this example here. It must be a bad one. So, what a rotten paper it must be!”

Well, not quite. It’s a good piece of analysis. I did it myself and there’s a fair bit of effort and care behind these short paragraphs.

The trouble is, if I read it out to my daughter, she’d say “What?” and not understand a word. My wife’d say “So what?” and not care a bit. I might as well not have written it.

It’s like that Zen thing: If a tree falls in a forest and no one hears it, does it make a sound?

If you did a piece of analysis, and no one understands or cares about it, why did you do it in the first place?

Why do you do it?

That last question is important: why do we analyse?

Sometimes, we do it for fun. The knowledge is beautiful. Knowing Tetris is NP-Complete is rewarding, even though my colleague sarcastically remarked, “Thank God! I’m sooo relieved now that I know that Tetris is NP whatever.” If that’s the case with you, great. Write the analysis any which way you’ll enjoy.

Sometimes, we do it because we’re forced to. In class. At work. Wherever. But that’s another way of saying “I don’t know why I’m doing it.” In that case, I’d gently recommend watching 3 Idiots.

Most often, we do it to share knowledge and drive actions. In that case, if no one understands it, or does anything with it, why do it?

Keep it simple

We prerajulisation of Farhanitate flagellated with ...

Would your audience understand that? Or are you just scared that simple words indicate a simple mind?

I was once afraid. 15 years ago, when writing a paper on IBM India’s competitive advantage for the CXOs, I was worried about it being too simple. I didn’t know anything about management. So I filled it with jargon. They politely nodded when I presented it, but I wasn’t fooling anyone. If there’s no content, jargon doesn’t help.

Unfortunately, it’s become polite to accept jargon as a substitute for substance. Why were they not ripping me apart? Or at least, kindly asking me what on earth I wanted to say?

My friend Manoj did that. In his nice, humble way, he asked, “But Anand, what does this mean?” When I explained it to him, I found I didn’t have a clue. He was OK with that. He just wanted to make sure he hadn’t missed something.

(That’s the technique I use these days. Ask people to explain things clearly. It’s OK if they’re just lost in jargon. I just want to make sure I haven’t missed something.)

Don’t cloak your ignorance. No one will think less of you. In the long run, you’ll learn more, and won’t need the jargon.

Part 2 of the article will talk about focusing on people and actions; storylining and the pyramid principle; and the structure of messages.

Style of blogging

Until 2007, my blog was mostly just linking to stuff I found interesting on the Web. Since 2007, I’ve tried to write longer articles, mostly based on my own experiences.

At the moment, that’s unsustainable. Right now, being in a startup, I’m doing more stuff than I ever have in the past. (That does not mean working more hours, by the way.)

My posts, going forward, are likely to be smaller, less original, but hopefully more frequent.