How I do things

How often to write

If you look at the number of entries I’ve written every month since 2005, there has been a clear decline. While I was averaging almost an entry a day in 2005 and 2006, that has dropped to 2-3 entries a month since mid-2007.

Number of entries per month declining

This doesn’t bother me. I’ve been lucky to never have lost sight of the purpose of this website. This website is meant for me. Not for you, the reader. For me, the author.

Writing helps me clarify my thoughts. It forces me to learn. It gives me input from a broad audience. It preserves my thoughts. It kills boredom. But nowhere in that list is the need to entertain or enlighten you.

Not that I care less about you, but rather that I care more about me. If I start writing because I need to keep up the pace of output, the quality declines and I stop enjoying it. (This contradicts what I said earlier about Quantity Always Trumps Quality. Well, let me take back the “quality declines” part. If I stop enjoying it, it’s not worth doing.)

So I’ve been taking micro-sabbaticals. Just 3 posts between July – November 2007. No posts in June – July 2008. Whenever I have something to write, or feel like writing, I just go ahead.

It’s very relaxing. I don’t feel the obligation to keep up the readership. In fact, I don’t keep track of the readership, so that helps.

But while the number of posts has dropped, the volume of writing hasn’t changed all that much. I write about 25KB worth a month and, except for a blip near end-2007, that has held steady. Those blips in the middle were me copying and pasting articles on Classical Ilayaraja, so they don’t really count.

Size of entries per month has not changed much

In other words, I spend about as much time as before writing. I write about the same stuff as before. Except that I’m putting in a bit more work into each piece, and it takes longer.

It’s just a different way of doing things. I’m getting more out of building larger pieces than blogging fragmented threads, so I’ve moved that way. And in doing so, I need to take a break every now and then, because you just can’t get some stuff done at a stretch.

That’s fine by me, and I hope you don’t mind. In fact, as Asimov put it, “I’m not too proud to ask a favour. Please don’t mind.”


I’m writing this for two reasons. One is to tell you why you don’t see stuff regularly from me, and to tell you not to expect any regularity. Just subscribe to the RSS feed and we’re all better off.

The other is because I see bloggers abandoning some great blogs. (You know who you are.) I think it’s sort of like earthquakes and forest fires. The pressure to take a break from blogging keeps building up, and unless indulged in, bloggers quit. Something like Guru’s sabbatical is a great idea. It provides the option for the return, and reduces the cost of taking a break.

A new home page

I have a new home page design. (If you’re reading the RSS feed, check the home page.)

One reason is that the old home page’s design sucked. Almost everyone told me that it was drab in black and white. Personally, I think the new home page sucks in terms of colours as well. There are too many. I suck at picking colours. The only good thing about these colours is that I left them to the judgement of experts. These are the colours in PowerPoint 2007’s “Concourse” theme, and I’ve just lifted them.

So, no, it wasn’t the colours that drove the redesign. My last redesign was over a year ago. I changed the structure from a list of links to two lists: one where I was just linking to interesting sites (bookmarking, really) and the other where I was writing content. The purpose behind that was to allow me to focus on writing stuff rather than just bookmarking.

And that worked pretty well for me.

In the last several months, I find myself writing more code than articles. I don’t quite have a way of sharing that. The new home page has a section dedicated to the sites I’m creating, and hopefully, it’ll let me share what I’m doing in a clearer way.

Another problem I have is that in attempting to write articles, I’ve cut myself off from writing the frivolous. Sometimes, I just need to share something small, like “I bought an Acer Aspire 5715Z” without going into the details of it. That’s not a bookmark. That’s not an article. I need a space in-between.

And that’s exactly the space micro-blogging captures.

I created a Twitter account last month. With the huge number of problems that Twitter has, such as downtime and the lack of IM support, I hadn’t written a single tweet. The day before yesterday, I created an account at identi.ca and it works just fine. Given that I now have 4 mobile devices, I should be able to do some decent microblogging.


This is actually my third or fourth attempt at redesigning my earlier home page. Every time, I’d start with a redesign, struggle with it, try to get things just right, and then eventually abandon the effort after a few weeks. This time, I succeeded — within a matter of three hours on my flight from Washington DC to London.

Two reasons. Yesterday, I found this CSS framework: 960.gs. It’s a grid system. And grids are absolutely the best way to get layouts for the web.

The other is an article from Coding Horror titled Quantity Always Trumps Quality. If you try to do stuff quickly, you end up doing better stuff than if you tried to do better stuff. To hell with perfection. Just get it out of the door.

Launching applications

Opening programs from the Start – All Programs menu is painful. For many years, I relied on the quick launch bar.

QuickLaunch

But it’s space constrained. There are only so many applications you can place there. I want space enough for frequently used documents as well. Recently, I decided that I need all the space on the screen. So my task bar is on auto hide, and that makes the quick launch bar a little tougher to use as well. And finally, I can’t use the quick launch bar with the keyboard. That’s important.

So I switched to the pinned menus on the Start Menu.

StartMenu

This works better with the keyboard. To access Word, I just type Ctrl-Esc, W. Excel: Ctrl-Esc, E. But I run short of letters soon. I have trouble between PowerPoint and Processing, for instance. And I can’t store documents.

I tried Enso Launcher and Launchy, both of which are great products, but I just can’t stand the thought of them hogging up all the memory that they do. Launchy in particular.

Given that I almost always have one or two command prompts open, I’ve written my own little tool to do the job now. It’s a command line launcher written in Perl. I call it “o”. At the first run, it indexes my hard disk. (Well, not all of it. I’ve picked what I need.) Now, if I want to read Harry Potter and the Deathly Hallows, I just type:

> o harry potter hallows

If I wanted to pick a Harry Potter book, I could type:

> o harry potter
    0: D:/Entertainment/Books/Hugo Awards/2001 - J K Rowling - Harry Potter and the Goblet Of Fire.rar
    1: D:/Entertainment/Books/J K Rowling.1.Harry Potter and The Sorcerer's Stone.pdf
    2: D:/Entertainment/Books/J K Rowling.2.Harry Potter and The Chamber of Secrets.pdf
    3: D:/Entertainment/Books/J K Rowling.3.Harry Potter and The Prisoner of Azkaban.pdf
    4: D:/Entertainment/Books/J K Rowling.4.Harry Potter and The Goblet of Fire.doc
    5: D:/Entertainment/Books/J K Rowling.5.Harry Potter and the Order of the Phoenix.pdf
    6: D:/Entertainment/Books/J K Rowling.6.Harry Potter and the Half-Blood Prince.pdf
    7: D:/Entertainment/Books/J K Rowling.7.Harry Potter and the Deathly Hallows.pdf
    8: D:/Entertainment/Books/J K Rowling.The Harry Potter Encyclopedia.doc
    9: D:/My Pictures/2005-06 London/2005-07-16 06 Waterstones Oxford Street Harry Potter release.JPG
    ... more
> (0-9, q, any word): prince
D:/Entertainment/Books/J K Rowling.6.Harry Potter and the Half-Blood Prince.pdf

The program lists the files matching the words I typed, and lets me filter within that.
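The matching step can be sketched roughly as follows. This is a minimal Python illustration, not the Perl tool itself: the function name `match_files` and the tiny `INDEX` list are assumptions made for the example, and the rule used (every typed word must appear somewhere in the path, ignoring case) is a guess at behaviour consistent with the session above.

```python
def match_files(paths, words):
    """Return the paths that contain every word, ignoring case."""
    words = [w.lower() for w in words]
    return [p for p in paths if all(w in p.lower() for w in words)]


# A tiny hypothetical index; a real tool would build one by scanning the disk.
INDEX = [
    "D:/Entertainment/Books/J K Rowling.6.Harry Potter and the Half-Blood Prince.pdf",
    "D:/Entertainment/Books/J K Rowling.7.Harry Potter and the Deathly Hallows.pdf",
    "D:/My Pictures/2005-06 London/2005-07-16 06 Waterstones Oxford Street Harry Potter release.JPG",
]

matches = match_files(INDEX, ["harry", "potter"])   # all three paths match
narrowed = match_files(matches, ["prince"])          # filter within the matches
```

Filtering within the previous matches, rather than searching the whole index again, is what makes the second prompt feel like narrowing down a list.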

I just wrote this yesterday, and already, I’ve used it dozens of times. Here’s the source.

PS: While I was at it, I downloaded a Flickr uploader for Perl. So I can now upload images from the command line. This easily saves me at least 5 minutes per article.

Time management

Some years ago, a friend asked me to write about how I manage my time. It seemed to him I was doing a good job of it, given that I had time to pursue my interests.

It’s something I tried to do consciously. Every few years, I used to go down the route of “time management”. I’d read stuff and try it out.

But over time, I’ve come to believe that “time” is not really “manageable”. Think about it: are most of your actions planned? Me, I just react out of habit, no matter how well planned I try to be. What I do is largely driven by what I’m in the habit of doing.

Not that time management advice is useless, but you’ll end up not following most of it. You act on a fraction of what you read. A fraction of that turns into a habit. That’s still useful. But the point is, rather than pick up 10 tips on time management, it’s more useful to pick one or two pieces of advice that you like, and are likely to act on. (You won’t do things you don’t like anyway.)

So time management is about acquiring habits that save time (and is not about reading tips that are tough to habitualise).

That raises an obvious question and a subtle one. The obvious one is: what habits save time? The subtle one is: why save time?

Why save time?

You’ve probably heard the phrase “time is money”. For a while, I took that statement literally. I tried to act by assigning monetary value to my time, and by doing the most profitable thing.

I was making Rs 10,000 a month at that time. That’s about Rs 50 an hour. So I figured I wouldn’t do anything that earned me less than Rs 50 an hour outside of work. I mean, if I’m making Rs 50 an hour at work, why should I make any less outside?

One small hitch. I wasn’t making any money outside of work. In fact, I was spending money. So unless I took up a night job, or started freelancing, that rule of thumb was useless. (Besides, I didn’t want to spend time outside of work working. I wanted to have fun. Watch movies, for instance.)

So I needed a different way of handling this. If I spend 3 hours at a movie for Rs 60, that could be a benchmark. If something’s more expensive than Rs 20/hour, I’d rather watch a movie. If it’s less expensive, I’d do that. Take books, for instance. A typical novel would cost Rs 180 and I’d finish it in 12 hours. At Rs 15 / hour it’s a more economical way of spending time.

Except that it doesn’t quite work that way. How much fun I had, had nothing to do with how much I paid for it.

Frankly, in daily life, I don’t think you can treat the phrase “time is money” literally. Time has nothing to do with money.

Time is like money in a different way, though. By itself, it isn’t worth much. Think about it. What can you do with money? Buy stuff you like. And if you can’t, it’s useless.

Obelix: How silly! Fancy throwing out good onion soup to make room for sestertii! Asterix: But Obelix, with sestertii, you can buy onion soup! Obelix: That's the point! Why throw out the onion soup when it was in the cauldron already?

If all you need is onion soup, why throw it out for sestertii?

Time’s like that. What can you do with time? Do stuff you like. And if you can’t, it’s useless.

There are usually two reasons people want to manage time. One is where they don’t enjoy something, and would rather spend as little time at it as possible. But look, if you don’t enjoy that stuff, time management isn’t your problem. You need to get out of your job or whatever. Managing time more efficiently is simply going to let you efficiently waste your time. (Though in the short run, that’s probably the best you can do — efficiently get rid of nuisances. I’ll talk about that shortly.)

The other reason is where they have too many (enjoyable) things to do, and can’t do all of them. But hey, if you have too much enjoyable stuff, you don’t have a problem! In a way, this is like wanting to buy many things and not having enough money. With money, you can earn more or wish for less. With time, you just have to wish for less. (Living longer may not be a practical option.) Just pick anything you like to do. Don’t regret the stuff you can’t. You only have 24 hours, and you’re among the lucky few who can fill it with things you enjoy.

So, I’m effectively saying, there’s no point trying to do things more efficiently in the long run. Picking what you do is more important than doing it efficiently. (That roughly correlates to the third habit in Stephen Covey’s Seven Habits: Put First Things First. It’s the key to time management.)


So, how do you pick what to do? You’d probably want to pick something that you like, or something that’s good for you.

But it’s tricky to predict what you like.

  • We don’t know what we want. Sometimes, it’s as simple as that — we just don’t know what we’d like to do.
  • Too much of anything… I love watching movies, but I’ve never managed to watch more than 4 a day. I’ve tried breaking that record many times. Just doesn’t work. At the end of the 4th movie, I’m sick and my bum is sore. Do I prefer movies to cleaning up? Usually. But by the end of the 4th, I’d rather clean up.
  • Preferences are not consistent. I prefer a 7 megapixel camera to a 2 megapixel one. I prefer a cheaper camera to a more expensive one. So between a $100 2MP camera and a $200 7MP camera, I’m just making a wild guess.
  • Preferences are not static. If I’m tired, I’d rather watch a movie I’ve seen before. If not, I’ll experiment with an art film. There’s no telling beforehand what my mood is going to be at any point.

It’s just as tricky to figure out what’s good for us. We have no clue what will happen tomorrow. We have no clue what consequences our actions will have. (Read The Black Swan to get a flavour of that.) So we’re really guessing and groping — though sometimes with a lot of confidence.

On the whole, it’s difficult to figure out what to pick. So what do you do?

This is completely outside the realm of time management. This is about choice. I have a few (bad) habits that guide me.

  1. Follow your moods
  2. Work less
  3. Procrastinate

Those are my principles. (But like Groucho Marx, I do have others.)

Follow your moods

There are times when people do certain things better. I’ve heard some people study best early in the morning. Others study best late at night. I don’t know if there’s any physiological benefit one way or the other, but even if it’s psychological, it makes a huge difference to study when you think you’ll learn better.

Sometimes I’m in a mood to write articles. When I do, the article usually writes itself. If not, I could spend days at it without any progress.

If there’s any reality to this, then the best thing to do is to do what you feel like doing. You’ll naturally accomplish this faster. That’s typically what I do when I’m given any work. I usually wait until I just feel like it. Then it’s usually a matter of a few hours before the job is done. Sometimes the mood doesn’t quite arrive before the deadline, in which case there’s always inspiration.

Calvin & Hobbes: Do you have an idea for your story yet? No, I'm waiting for inspiration. You can't just turn on creativity like a faucet. You have to be in the right mood. What mood is that? Last-minute panic.

Seriously: do what you feel like doing the most at the moment. That’s a great way of becoming more efficient.

In fact, I would go as far as saying, mood management is more important than time management. Moods are more precious than time. If you’re in a mood to call people, pick up the phone and talk to folks you’ve been out of touch with. That mood is rarer than the time to make calls. (At least for me, the reason I am not in touch is because I’m not in a mood — not because I don’t have time.)

Optimise that mood. Do what you’re in a mood for. And when your mood changes, go with the flow. Do a lot more of what you feel like doing. You’ll do more (which is probably good), and of what you like (which is certainly good).

Work less

I’ve talked about this in Less is more. At the end of the day, 90% of the stuff you do is useless. So why do it? Just focus on the 10%.

Procrastinate

I can’t put this better than Paul Graham’s article on procrastination.

Good procrastination is avoiding errands to do real work.

You won’t know what the important 10% is until much later, so you may as well wait to find out if it’s important, and then do things.


So what am I saying?

  • Time management is about habits, not tips
  • Picking what you do is more important than doing it efficiently
  • But it’s difficult to figure out what to pick
  • So avoid doing stuff until you know it’s worth doing
  • Work when you’re in the mood — it’s faster that way

Think about it.

Reading books on a laptop

I have the habit of reading books on the screen. It’s something that started in the early 90s, when I got a copy of The MIT Guide to Lockpicking. Since I didn’t have access to a printer, I spent hours poring over the document on the screen. And then I discovered Project Gutenberg.

I’ve heard many people ask if I have a problem with this. Personally, no. I’ve been staring at screens from the age of 12, and I’m quite used to it. My job requires me to stare at a screen for most of the day anyway. (I’m not saying there’s no strain on the eye. My eyes are red at the end of the day. I don’t know if they would be less red if I’d been staring at paper instead of a screen. But my glasses have remained roughly the same power over ~15 years, so it’s probably not ruining my eyesight much.)

To me, the main advantage of a book is that a book is a lot easier to handle.

  • You can fit a book into your bag, sometimes into your pocket.
  • You can hold it in your hand comfortably — it’s easy to grip, and light.
  • You can open it instantly (no need to boot up).
  • You can bookmark it (or even just remember the last page number) and quickly flip to that.

None of these is possible on a computer.

Or is it?

On a desktop, I agree — it’s impossible to read for long. Your back would kill you. I’ve done it for many years, and it’s not worth the pain. With a laptop, however, you can lie down on the bed or sofa and read. It’s a huge advantage. (For just this one reason alone, I’d suggest that everyone buy a laptop.)

As for carrying books, I carry my laptop to work every day, so there’s no incremental burden. But if you weren’t doing that, it’s probably not a great idea. When I travel on weekends, I’d much rather take a physical book than a laptop. This is probably the single biggest problem with a laptop: it doesn’t travel as easily as a book.

That’s probably offset by the advantage that a laptop isn’t really a book — it’s a library. I don’t need to decide which book to read. I can bring them all along, pick what I like, and when I’m done, move on to the next. And I’m not restricted to books. I have a fairly good collection of movie scripts and comics. Depending on how long I have on the train, and my mood, I can pick between these.

One thing that makes a laptop a lot easier to use is to rotate it.

Laptop in landscape mode

Laptop in portrait mode (rotated)

If you hold the laptop this way, it’s surprisingly easy to handle. I find that I can read this way even when standing on a crowded train — which is as much as I can expect from any book. (Strangely enough, it doesn’t seem to attract too much attention on the train either.)

If you have a decent graphics card, you can rotate your screen using the graphics properties. (I’m sure there are hotkeys to do this. My two-year old daughter somehow knows them, and manages to turn the screen upside down in a fraction of a second, while I spend the next 5 minutes struggling to restore an upside-down screen.)

If not, you can just use a PDF reader (like FoxIt, which is better than Acrobat Reader) to rotate the page by 90°.

A laptop takes care of the problems of bookmarking and load time as well. I usually leave mine on hibernate, and it takes about 10 seconds to open up to where I left off. Sometimes I just leave the laptop on in the bag — for example if I’m changing trains.

The other solution, of course, is to try an ebook reader. Given my laptop, I haven’t tried one. But other than the ease of holding it, there’s no big advantage I see.


The other question is, how do you find ebooks? Other than buying them, I find that the easiest option is to search on Google. A surprisingly large number of them are indexed.

Here’s a custom search engine for ebooks.

Lazy bargain hunting

I’m thinking of buying a digital keyboard with touch sensitive keys and MIDI support. (The one other thing that I thought of, a pitch bend, puts the keyboards out of my budget.)

I’d like a good deal. (Who doesn’t?) But I don’t like to spend time searching for one. (Who does?)

So here’s the plan.

Firstly, I’ll restrict my search to Amazon.co.uk. For electronics items, I haven’t found anyone consistently cheaper. Tesco has some pretty low prices, but not the range. eBuyer is pretty good, but not often enough. Google Products is the only other one that gets me consistent lower prices, but I’ve had my credit card identity stolen once before while shopping online, so I’d rather not pick any random seller listed on Google.

Amazon has a secret discount. You can search for electronics items with 30% off or more. And then you can narrow it down to Sound & Vision > Musical Instruments > MIDI Keyboards. Then restrict the price to 100 – 200 GBP. That leaves us with one product:

MIDI keyboard on Amazon

While that matches my criteria, I’m in no hurry and can wait for more offers to come up. But I don’t want to keep checking this page every day. So, RSS to the rescue. You probably think I can’t get enough of RSS feeds. And you’d be right. The thing is, as an attention mechanism, it is incredibly powerful, and I never cease to be amazed at the things it lets me do.

Using my XPath checker and a bit of trial and error, I figured all product links link to “amazon.co.uk/dp/…” with a <span> inside. So this XPath gets all the links:

//a[contains(@href,'/dp/')][span]

And I made an RSS feed out of that using my XPath server and subscribed to it on Google Reader.
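To make the idea concrete, here is a rough Python sketch of turning such links into an RSS feed. It is not the XPath server mentioned above. The class `ProductLinkParser`, the function `build_rss`, and the sample markup with its `EXAMPLE1` style product IDs are all made up for illustration. The parser only approximates the XPath: it accepts a span anywhere inside the link, not just as a direct child.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class ProductLinkParser(HTMLParser):
    """Collect links whose href contains '/dp/' and that contain a <span>.

    Approximates //a[contains(@href,'/dp/')][span]; any span inside the
    link counts, not only a direct child.
    """

    def __init__(self):
        super().__init__()
        self.links = []         # (href, text) pairs found so far
        self._href = None       # href of the matching <a> we are inside, if any
        self._has_span = False
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            self._href = href if "/dp/" in href else None
            self._has_span = False
            self._text = []
        elif tag == "span" and self._href is not None:
            self._has_span = True

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a":
            if self._href is not None and self._has_span:
                self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def build_rss(links, title, page_url):
    """Build a minimal RSS 2.0 document with one item per link."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = page_url
    ET.SubElement(channel, "description").text = "Links matching the search"
    for href, text in links:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = text or href
        ET.SubElement(item, "link").text = href
    return ET.tostring(rss, encoding="unicode")


# Made-up markup standing in for a search results page.
SAMPLE = """
<a href="http://www.amazon.co.uk/dp/EXAMPLE1"><span>Example MIDI keyboard</span></a>
<a href="http://www.amazon.co.uk/help">Help</a>
<a href="http://www.amazon.co.uk/dp/EXAMPLE2">No span here</a>
"""

parser = ProductLinkParser()
parser.feed(SAMPLE)
feed = build_rss(parser.links, "Discounted MIDI keyboards", "http://www.amazon.co.uk/")
```

A feed reader polling this output would show a new item whenever a new product appears in the search results.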

Combining a bunch of such searches, I have a shopping folder on Google Reader that has all the items I’m searching for. Now that’s lazy bargain hunting.


Which is all very fine. But given that I’m buying a car in a hurry right now, and I’m not doing any bargain hunting, it’s a classic case of being penny-wise and pound-foolish. Sigh…

Handling missing pages

If something goes wrong with my site, I like to know of it. My top three problems are:

  1. The site is down
  2. A page is missing
  3. Javascript isn’t working

This article covers the second topic.

One thing I’m curious about is hits to non-existent pages (404s) on my site. I usually get 404s because:

  • I renamed the page
  • Someone typed a wrong URL
  • Someone followed a wrong link

Find the 404

The first problem is to know when someone gets a 404. I’ve seen sites that tell you to contact the administrator in case of a 404. That’s crazy. The administrator should automatically detect 404s! Almost every web server provides this facility.

The real issue is attention. I receive 700 404s a day. That’s too much to manually inspect. And most of these are not for proper web pages, but for images (for example, almost all my 404s used to be for browsers requesting favicon.ico) or weird MS Office files.

I’m interested in a small subset of 404 errors. Those that hit a web page, not support files. And those requested by a human, not a search engine or a program.

A decent way of filtering these is to use Javascript in your 404 page. Javascript is typically executed only by browsers (i.e. humans, not search engines), and only in a web page (not images, etc.) So if you serve Javascript in your 404 page, and it gets executed, it’s likely to be a human requesting a web page.

I have a piece of Javascript in my custom 404 page that looks something like this:

<script>(new Image()).src = "/log.pl";</script>

Every time this code runs, it loads a new image. The source of the image is a Perl script, log.pl. Every time log.pl is accessed, it logs the URL from which it was called. I’m reasonably guaranteed that these are web pages a human tried to access.
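The logging side is not shown here, so the following is only a sketch of what such a script might do, written in Python rather than Perl. The function name `log_missing_page` and the tab-separated log format are assumptions. The one thing it relies on is that the browser sends the 404 page’s address in the Referer header when it requests the image.

```python
import io
from datetime import datetime, timezone


def log_missing_page(environ, log_file):
    """Append one line naming the page that embedded the tracking image.

    `environ` is a CGI-style dict. The address of the 404 page arrives in
    the Referer header, because the browser requested the image from it.
    """
    missing_url = environ.get("HTTP_REFERER", "-")
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    log_file.write(f"{stamp}\t{missing_url}\n")
    return missing_url


log = io.StringIO()  # stands in for a real log file
log_missing_page({"HTTP_REFERER": "http://s-anand.net/hiundi"}, log)
```

Requests with no Referer are logged as "-", so they are easy to filter out later.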

The reduction in volume is tremendous. In a typical month, I get ~20,000 404 errors. With the Javascript logging, it’s down to around 200 a month, and most of them are quite meaningful.

Point to the right page

Sometimes, the 404 happens because I changed the URLs. I keep fiddling with the site structure. Someone would have links to an old page that I’ve renamed. I may not even know that. Even if I did, they can’t be bothered with requests to change the link. So I’ve got to handle it.

The quickest way, I find, is to use Apache’s mod_rewrite. You can simply redirect the old URL to the new URL. For example, I used to have a link to /calvin.html which I now point to /calvinandhobbes.html. That becomes a simple line on my .htaccess file:

# Serve the renamed page for the old name (anchored, with the dot escaped)
RewriteRule ^calvin\.html$ calvinandhobbes.html

I don’t do this for every site restructuring, though. I just restructure, wait for someone to request a wrong page, and when my 404 error log warns me, I create a line in the .htaccess. It keeps the redirections down to a minimum, and only for those links that are actually visited.

Be flexible with the URL structure

Sometimes people type in a wrong link. Often, these are unintentional. Here are some common misspellings for my Hindi songs search.

s-anand.net/hindi/
s-anand.net/Hindi
s-anand.net/hiundi

Occasionally, people are exploring the structure of my site:

s-anand.net/excel
s-anand.net/music
s-anand.net/hits

I need to decide what to do with both cases. For the former, sometimes my URL structure is too restrictive. I mean, why should someone have to remember to type /hindi instead of /Hindi or /hindi/? Who cares about case? Who cares about a trailing slash?

In such cases, I map all the variants to the right URL using mod_rewrite. For example, typing s-anand.net/HiNDi (with or without caps, with or without a slash at the end) will still take you to the right page.
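The normalisation itself is simple. Here is a small Python sketch of the idea, separate from the mod_rewrite rules actually used on the site; the function name `normalise_path` is made up for the example.

```python
def normalise_path(path):
    """Lower-case a URL path and drop any trailing slash."""
    cleaned = path.lower().rstrip("/")
    return cleaned or "/"   # keep the site root as "/"


variants = ["/hindi/", "/Hindi", "/HiNDi"]
canonical = {normalise_path(v) for v in variants}   # {"/hindi"}
```

Typing errors like /hiundi are a different matter: they need an explicit mapping, which is why those get added case by case.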

As I keep discovering new misspellings, I take a call on whether to add them or not. The decision is usually based on volume. If two people make the same spelling mistake in a day, I almost certainly add the variant. Most of the time, it’s just typing errors like /hiundi, which aren’t repeated more often than once a month.

Provide search

To handle the exploratory URLs, and people following wrong links, I’ve turned my custom 404 page into a search engine.

For example, when someone types s-anand.net/excel, I know they’re searching for Excel. So I just do a Google Custom Search within my site for “excel” — that is, anything following the URL.

It’s a bit more complex than that, actually. I do a bit of tweaking to the URL, like convert punctuations (underscore, hyphen, full-stop, etc.) to spaces, remove common suffixes (.html, .htm) and ignore numbers. Quite often, it matches something on my site that they’re looking for. If not, ideally, I ought to try for various alternatives and subsets of the original search string to figure out a good match. But given that the number of mismatches is down to about one a day, I’m fairly comfortable right now.
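The tweaking described above could look something like this in Python. It is a sketch under assumptions: the function name `path_to_query` and the example paths are invented, and the exact rules on the site may differ.

```python
import re


def path_to_query(path):
    """Turn a requested path into search words."""
    text = path.strip("/")
    text = re.sub(r"\.html?$", "", text, flags=re.IGNORECASE)  # drop .html / .htm
    text = re.sub(r"[_\-./]+", " ", text)                      # punctuation to spaces
    text = re.sub(r"\d+", " ", text)                           # ignore numbers
    return " ".join(text.split())                              # tidy the whitespace


path_to_query("/excel")                 # "excel"
path_to_query("/excel_tips-2008.html")  # "excel tips"
```

The resulting words are what would be handed to the site search.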

What this means, incidentally, is that my site is, by default, a search engine for itself. To search for movie-related stuff on my site, just type s-anand.net/movie and you get a search of the word “movie” on my site. (Sort of like on a9.com, where searching for a9.com/keyword does a search on the keyword.)

Monitoring site downtime

If something goes wrong with my site, I like to know of it. My top three problems are:

  1. The site is down
  2. A page is missing
  3. Javascript isn’t working

I’ll talk about how I manage these over 3 articles.

My site used to go down a lot. Initially that was because I kept playing around with mod_rewrite and other Apache modules without quite understanding them. I’d make a change and upload it without testing. (I’m like that.) And then I’d go to sleep.

Next morning, the site’s down, and has been down all night.

This is a bit annoying. Partly because I couldn’t use my site, but mostly because of the “Oh yeah, sorry, I goofed up last night” replies that I have to send out the next day.

So I started using Site24x7 to track if my website was down. It’s a convenient (and free) service. It pings my site every hour. If it’s down, I get an SMS. If it’s back up, I get an SMS. It also keeps a history of how often the site is down.
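The logic of such a monitor is easy to sketch. The Python below is not how Site24x7 works internally (that isn’t known here); it just illustrates checking a URL and alerting only when the state changes. The names `is_up` and `alert_for` and the message texts are assumptions.

```python
import urllib.error
import urllib.request


def is_up(url, timeout=10):
    """Return True if the URL answers with a non-error HTTP status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status < 400
    except (urllib.error.URLError, OSError):
        # DNS failures, refused connections, timeouts and HTTP errors
        return False


def alert_for(was_up, now_up):
    """Return an alert message only when the state changes, else None."""
    if was_up and not now_up:
        return "Site is DOWN"
    if not was_up and now_up:
        return "Site is back UP"
    return None
```

Run hourly, `alert_for(previous, is_up(url))` stays silent while nothing changes, which is what keeps the number of SMSes bearable.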

Site24x7

Over time, I stopped making mistakes. But my site still kept going down, thanks to my hosting service (100WebSpace). When I goof up, it’s just an annoyance, and I can fix it. But when my hosting service goes down, it’s more than that. My site is where I listen to music, read comics, read RSS feeds, use custom search engines, watch movies, browse for books, etc. Not being able to do these things — nor fix the site — is suffocating.

Worse, I couldn’t sleep. I use my mobile as my alarm. It’s annoying to hear an SMS from under your pillow at 3am every day — especially if it says your site is down.

So I switched to HostGator a few months ago. Nowadays, the site is down a lot less. (In times of trouble, it becomes sluggish, but doesn’t actually go down.)

That came at a cost, though. I was paying 100WebSpace about $25 per annum. I’m paying HostGator about $75 per annum. Being the kind that analyses purchases to death, the big question for me was: is this worth it? This is where my other problem with the site being down kicks in. I get a bit of ad revenue from my site, and I lose that when the site’s down. (Not that it’s much. Still…)

According to Site24x7, my site was up ~98% of the time. So I’m losing about 2% of my potential ad revenue. For the extra $50 to be worth it, my ad revenue needs to be more than $50 / 2% = $2,500 per annum. I’m nowhere near it. So the switch isn’t actually a good idea economically, but it does make life convenient (which is pretty important) and I sleep better (much more so).
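The break-even sum, spelt out as a small Python sketch. It makes the same simplifying assumption as the calculation above: the new host recovers all of the lost 2%. The function name is made up.

```python
def break_even_revenue(extra_cost, downtime_fraction):
    """Annual ad revenue at which avoiding the downtime pays for the extra cost."""
    return extra_cost / downtime_fraction


old_cost, new_cost = 25, 75   # USD per annum
downtime = 0.02               # the site was up ~98% of the time
threshold = break_even_revenue(new_cost - old_cost, downtime)  # about 2,500 USD a year
```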

The important thing, I’ve realised, is not just to track this stuff. That’s useful, sure. But what really made Site24x7 useful to me is that it would alert me when there was a big problem.

There are many kinds of alerting.

There’s a report you can view whenever you remember to view it. (It could be an RSS feed, so at least you won’t have to remember the site. But you still need to read your feeds.)

Then there’s the more pushy alerting: sending you an e-mail. That may catch you instantly for the half of the day that you’re online. Or, if you’re like me, it may completely escape your attention. (I don’t read e-mail.)

And then there’s the equivalent of shaking you by the shoulder — getting an SMS. (At least, that’s how it is for me. Incidentally, I don’t reply to SMS either. Calling me gets a reply. Nothing else.)

The type of alerting is clearly a function of the severity of the problem. Wake me up when my site goes down. Don’t wake me up if a link is broken.

Site24x7 sends me an SMS when my site is down. Fits the bill perfectly.

Managing feed overload

I have only two problems with Google Reader.

The first is that it doesn’t support authenticated feeds. Ideally, I’d have liked to have a single reading list that combines my e-mail with newsfeeds. GMail offers RSS feeds of your e-mail. But the feeds require authentication (obviously) and Google Reader doesn’t support that right now. (So I usually don’t read e-mail 🙂)

The second is that it’s tough to manage large feeds. It’s a personal quirk, really. I like to read all entries. If there are 100, I read all 100. If there are 1000, I struggle but read all 1000. I’m too scared to “Mark all read” because there are some sources that I don’t want to miss.

The 80-20 rule is at work here. There are some prolific writers (like Scoble) who write many entries a day. There are some prolific sources (del.icio.us or digg). I can’t keep up with such writers / sources. I don’t particularly want to. If I miss one day of del.icio.us popular items, I’ll just read the next day’s.

With Google Reader, that makes me uneasy. I don’t like having 200 unread items. I don’t like to mark them all read.

In such cases, popurls’ approach is useful. It shows the top 15-30 entries of the popular sites as a snapshot. Any time you’re short of things to read, visit this. If you’re busy, don’t.

Using Google’s AJAX Feed API, it’s quite trivial to build your own feed reader. So I cloned popurls’ layout into my bookmarks page, and put in feeds that I like.

You can customise my bookmarks page to use your own feeds. Save the page, open it in Notepad, and look for existing feeds. They’ll look like this:

"hacker news" : {
    entries:15,
    url:"http://news.ycombinator.com/rss"
},

The first line (“hacker news”) is the title of the feed. You can call it what you want. Set entries to the number of feed entries you want to show. Set url to the RSS feed URL. Save it, and you have your own feed reader. (If you want to put it up on your site, you may want to change the Google API key.)

Try it! Just save this page and edit the feeds.
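To show what a configuration like that drives, here is a rough Python sketch of the snapshot idea: for each configured feed, show only the first few entries. It is not the code behind the bookmarks page (that uses Google’s AJAX Feed API in the browser). The sample RSS text and the snapshot function are made up, and a real reader would download feed["url"] instead of using a canned sample.

```python
import xml.etree.ElementTree as ET

# Same shape as the feed settings above: a title, an entry count and a URL.
FEEDS = {
    "hacker news": {"entries": 2, "url": "http://news.ycombinator.com/rss"},
}

# Made-up feed content so that the sketch runs offline.
SAMPLE_RSS = """<rss version="2.0"><channel>
<title>Sample</title>
<item><title>First</title><link>http://example.com/1</link></item>
<item><title>Second</title><link>http://example.com/2</link></item>
<item><title>Third</title><link>http://example.com/3</link></item>
</channel></rss>"""

def snapshot(rss_text, entries):
    """Return (title, link) pairs for the first `entries` items of an RSS feed."""
    root = ET.fromstring(rss_text)
    items = root.findall("./channel/item")[:entries]
    return [(item.findtext("title"), item.findtext("link")) for item in items]

for name, feed in FEEDS.items():
    for title, link in snapshot(SAMPLE_RSS, feed["entries"]):
        print(name, "|", title, "|", link)
```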


Here, I must point out three things about Google’s AJAX Feed API that make it extremely powerful.

The obvious one is that it allows JavaScript access to RSS in a very easy way. That makes it very easy to integrate with any web page.

The second is subtler: it includes historical entries. So even if an RSS feed had only 10 entries, I could pick up the last 100 or 1,000, as long as Google has known about the feed for long enough. This is what makes Google Reader more of a platform than a simple feed reader application. Google Reader is a feed archiver, not just a feed reader.

The third (I’m a bit crazy here) is that you can use it to schedule tasks. Google’s FeedFetcher refreshes feeds every 3 hours or so. If you want to do something every three hours (or some multiple thereof, say every 24 hours), you can write a program that does what you want, and subscribe to its output as a feed.

You may notice that I have a “Referrers to s-anand.net” section on my bookmarks page. These are the external pages from which someone clicked through to my site. I have a PHP application that searches my access log for external referrers. Rather than visit that page every time, I just made an RSS feed out of it and subscribed to it. Every three hours or so, Google accesses the URL. My application searches access.log, and Google Reader archives the latest results. So, even after my access.log is trimmed by the server, I have it all on Google Reader to catch up with later.
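The referrer feed can be sketched roughly like this. It is in Python rather than PHP and is not the actual application. It assumes the access log is in the common Apache “combined” format, and the sample log lines and function names are invented for illustration.

```python
import re
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Tail of a "combined" log line: "request" status bytes "referrer" "user agent"
LOG_LINE = re.compile(r'"[^"]*" \d{3} \S+ "(?P<referrer>[^"]*)" "[^"]*"$')

def external_referrers(log_lines, own_host):
    """Return the distinct referrer URLs that are not on own_host, in log order."""
    found = []
    for line in log_lines:
        match = LOG_LINE.search(line.strip())
        if not match:
            continue
        referrer = match.group("referrer")
        host = urlparse(referrer).netloc   # empty for "-" (no referrer)
        if host and not host.endswith(own_host) and referrer not in found:
            found.append(referrer)
    return found

def to_rss(title, links):
    """Build a bare-bones RSS document with one item per link."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    for link in links:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = link
        ET.SubElement(item, "link").text = link
    return ET.tostring(rss, encoding="unicode")

SAMPLE_LOG = [
    '1.2.3.4 - - [10/Oct/2008:13:55:36 +0000] "GET /blog/ HTTP/1.1" 200 2326 "http://example.org/links.html" "Mozilla/5.0"',
    '1.2.3.5 - - [10/Oct/2008:13:56:01 +0000] "GET /xpath HTTP/1.1" 200 512 "http://www.s-anand.net/blog/" "Mozilla/5.0"',
    '1.2.3.6 - - [10/Oct/2008:13:57:12 +0000] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0"',
]

referrers = external_referrers(SAMPLE_LOG, "s-anand.net")
print(referrers)  # ['http://example.org/links.html']
print(to_rss("Referrers to s-anand.net", referrers))
```

Each time the feed reader fetches the URL, the log is searched afresh, which is what makes the fetch double as a scheduler.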

Since Google doesn’t forget to ping, I can schedule some fairly time-critical processes this way. For instance, if I wanted to download each Dilbert strip every day as it arrives, I could create an application that downloads the file and returns a feed entry. Now, I don’t need to remember to run it every day! I just subscribe to the application on Google Reader, and Google will remind the application to run every 3 hours. (I know, I could use a crontab, but somehow, I like this.) Plus I would get the Dilbert strip on my feed reader as a bonus.


Update: Google has just released PartnerBar, which is a more flexible way of viewing a snapshot of feeds.

Scraping RSS feeds using XPath

If a site doesn’t have an RSS feed, your simplest option is to use Page2Rss, which gives a feed of what’s changed on a page.

My needs, sometimes, are a bit more specific. For example, I want to track new movies on the IMDb Top 250. They don’t offer a feed. I don’t want to track all the other junk on that page. Just the top 250.

There’s a standard called XPath. It can be used to search in an HTML document in a pretty straightforward way. Here are some examples:

//a Matches all <a> links
//p/b Matches all <b> bold items in a <p> para. (the <b> must be immediately under the <p>)
//table//a Matches all links inside a table (the links need not be immediately inside the table — anywhere inside the table works)

You get the idea. It’s like a folder structure. / matches a tag that’s immediately below. // matches a tag that’s somewhere below. You can play around with XPath using the Firefox XPath Checker add-on. Try it: it’s much easier to try it than to read the documentation.
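If you’d rather try the three examples above in code, here is a small Python sketch using the standard library’s ElementTree. Two caveats: ElementTree only supports a subset of XPath, and it needs well-formed markup, so the page below is a made-up, tidy snippet. It also wants paths relative to the root element, hence .//a rather than //a.

```python
import xml.etree.ElementTree as ET

# A made-up, well-formed page to run the three example paths against.
PAGE = """<html><body>
<p>Some <b>bold</b> text and a <a href="/one">link</a>.</p>
<p><i><b>nested bold</b></i></p>
<table><tr><td><span><a href="/two">deep link</a></span></td></tr></table>
</body></html>"""

root = ET.fromstring(PAGE)

# //a : all links, anywhere
all_links = [a.text for a in root.findall(".//a")]

# //p/b : bold items immediately under a paragraph
bold_in_para = [b.text for b in root.findall(".//p/b")]

# //table//a : links anywhere inside a table
links_in_table = [a.text for a in root.findall(".//table//a")]

print(all_links)       # ['link', 'deep link']
print(bold_in_para)    # ['bold']
print(links_in_table)  # ['deep link']
```

Note that the second paragraph’s bold text is not matched by //p/b, because the b sits inside an i rather than directly under the p.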

The following XPath matches the IMDb Top 250 exactly.

//tr//tr//tr//td[3]//a

(It’s a link inside the 3rd column in a table row in a table row in a table row.)

Now, all I need is to get something that converts that to an RSS feed. I couldn’t find anything on the Web, so I wrote my own XPath server. The URL:

www.s-anand.net/xpath?
url=http://www.imdb.com/chart/top&
xpath=//tr//tr//tr//td[3]//a

When I subscribe to this URL on Google Reader, I get to know whenever there’s a new movie on the IMDb Top 250.
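The URL above is split across lines for readability. When building one in code, it is safer to percent-encode the parameters, since the XPath contains slashes and brackets. A small Python sketch, assuming the server accepts standard percent-encoded query strings and is reached over plain http (the helper name is made up):

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE = "http://www.s-anand.net/xpath"  # the http:// scheme is an assumption

def xpath_feed_url(page_url, xpath):
    """Build a feed URL for the XPath server from a page URL and an XPath."""
    return BASE + "?" + urlencode({"url": page_url, "xpath": xpath})

feed_url = xpath_feed_url("http://www.imdb.com/chart/top", "//tr//tr//tr//td[3]//a")
print(feed_url)

# Decoding the query string gives back the original parameters.
params = parse_qs(urlsplit(feed_url).query)
print(params["url"][0])
print(params["xpath"][0])
```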

This gives only the names of the movies, though, and I’d like the links as well. The XPath server supports this. It accepts a root XPath, and a bunch of sub-XPaths. So you can say something like:

xpath=//tr//tr//tr title->./td[3]//a link->./td[3]//a/@href

This says three things:

//tr//tr//tr Pick all rows in a row in a row
title->./td[3]//a For each row, set the title to the link text in the 3rd column
link->./td[3]//a/@href … and the link to the link href in the 3rd column

That provides a more satisfactory RSS feed, one that I’ve subscribed to, in fact. Another one that I track is mininova’s top seeded movies category.
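To make the root-plus-sub-XPath idea concrete, here is a rough Python sketch of how such a specification could be evaluated. This is not the code behind the XPath server. It uses ElementTree, which cannot select attributes with a trailing /@href, so the sketch splits that part off and reads the attribute separately. The sample table and its movie names are made up.

```python
import xml.etree.ElementTree as ET

# Made-up page: rows in a table, in a row, in a row (three levels of nesting).
PAGE = """<table><tr><td><table><tr><td><table>
<tr><td>1</td><td>x</td><td><a href="/title/a/">Movie A</a></td></tr>
<tr><td>2</td><td>y</td><td><a href="/title/b/">Movie B</a></td></tr>
</table></td></tr></table></td></tr></table>"""

SPEC = "//tr//tr//tr title->./td[3]//a link->./td[3]//a/@href"

def parse_spec(spec):
    """Split 'root name->path name->path' into the root path and a field dict."""
    root_path, *fields = spec.split()
    return root_path, dict(field.split("->", 1) for field in fields)

def evaluate(node, path):
    """Return the text at path, or the attribute value if path ends in /@attr."""
    attr = None
    if "/@" in path:
        path, attr = path.rsplit("/@", 1)
    target = node.find(path)
    if target is None:
        return None
    return target.get(attr) if attr else "".join(target.itertext()).strip()

def extract(page, spec):
    """Apply the root path, then evaluate every sub-path against each match."""
    root_path, fields = parse_spec(spec)
    root = ET.fromstring(page)
    # ElementTree needs a relative path, so //tr becomes .//tr
    if root_path.startswith("/"):
        root_path = "." + root_path
    return [{name: evaluate(row, path) for name, path in fields.items()}
            for row in root.findall(root_path)]

for entry in extract(PAGE, SPEC):
    print(entry["title"], "->", entry["link"])
```

Each dictionary that comes back has what one feed entry needs: a title and a link.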

You can whip up more complex examples. Give it a shot. Start simple, with something that works, and move up to what you need. Use XPath Checker liberally. Let me know if you have any issues. Enjoy!