Calvin and Hobbes Dad explains science
My second most favourite series from Calvin and Hobbes, where Dad teaches Calvin the wonders of science.
Take a look at the Excel screenshot below.
Yes, that’s right. I have a user-defined function called AMAZONPRICE. And it returns these cameras’ prices directly from Amazon.com. (Given the category and some keywords, it returns the price of the bestselling item on Amazon.com.)
Here’s the code behind the function.
```vb
Function AmazonPrice(index As String, keywords As String) As String
    Dim xDoc As MSXML2.DOMDocument30
    Set xDoc = New MSXML2.DOMDocument30
    xDoc.async = False
    ' Ask Amazon's ItemSearch for matching products, sorted by sales rank
    If xDoc.Load("http://ecs.amazonaws.com/onca/xml?Service=AWSECommerceService" & _
            "&Version=2005-03-23&Operation=ItemSearch&SubscriptionId=03SDGQDFEB455W53SB82" & _
            "&AssociateTag=sanand-20&MinimumPrice=10000&ResponseGroup=OfferSummary,Small" & _
            "&Sort=salesrank&SearchIndex=" & index & "&Keywords=" & keywords) Then
        ' Search with XPath, mapping Amazon's default namespace to the prefix "a"
        xDoc.setProperty "SelectionLanguage", "XPath"
        xDoc.setProperty "SelectionNamespaces", _
            "xmlns:a=""http://webservices.amazon.com/AWSECommerceService/2005-03-23"""
        ' Return the first price Amount in the response (in cents)
        AmazonPrice = xDoc.selectSingleNode("/a:ItemSearchResponse//a:Amount").Text
    End If
End Function
```
This is how it all started…
Flickr has a camera finder that shows the most popular cameras in the Flickr community.
I love comparing gadgets, and I’d been doing some research on these cameras, as well as the Fuji series (because I own a Fuji Finepix S5600). I’d normally make a spreadsheet that compares these cameras on various parameters, including price.
Since I believe in never typing in data, I wondered if there was a way to pull the prices in automatically…
Two things made this possible: Amazon’s web services, which return product details as XML for a plain URL, and Excel user-defined functions, which can fetch and parse that XML.
The first task is to get the XML feed for a product you want. Amazon lets you do that simply by typing in a URL. The best way to construct the URL is through AWSZone. I picked the US ItemSearch method, which searches for a title or keyword within a category and returns the matches. The feed for the Canon EOS Digital Rebel XT, based on this, would be at:
```
http://ecs.amazonaws.com/onca/xml?
    Service=AWSECommerceService&
    Version=2005-03-23&
    Operation=ItemSearch&
    SubscriptionId=0525E2PQ81DD7ZTWTK82&
    SearchIndex=Electronics&
    Keywords=Canon EOS Digital Rebel XT&
    Sort=salesrank&
    ResponseGroup=Offers,Small
```
(You really need to replace the Subscription ID with your own.)
If you retrieve this URL, you get an XML file containing the details of all Canon EOS Digital Rebel XTs, sorted by sales rank.
To load this in Excel, you need to create a UDF in Visual Basic. First, go to Tools – References and enable Microsoft XML, v3.0 or v4.0. (The code below uses the v3.0 object names; with v4.0, use MSXML2.DOMDocument40 instead.) Now, to load an XML document, do this:
```vb
Dim xDoc As MSXML2.DOMDocument30
Set xDoc = New MSXML2.DOMDocument30
xDoc.async = False
xDoc.Load (url)
```
If the load succeeds, then you can extract the information fairly easily, using XPath.
```vb
xDoc.setProperty "SelectionLanguage", "XPath"
xDoc.setProperty "SelectionNamespaces", _
    "xmlns:a=""http://webservices.amazon.com/AWSECommerceService/2005-03-23"""
```
The first line says we’ll be searching using XPath. The second line is a workaround to support default namespaces. (Don’t worry about it. I don’t quite get it either.)
Finally, you get the price from the XML tree. In this case, it’s under ItemSearchResponse/Items/Item/OfferSummary/LowestNewPrice/Amount, and it’s in cents. (The // in the XPath below is a shortcut that matches any descendant, so we don’t have to spell out the intermediate levels.)
AmazonPrice = xDoc.selectSingleNode("/a:ItemSearchResponse//a:Amount").Text
That’s it! Now that this function is defined, just pass it the category and keywords, and you have the price of the first matching product. You can retrieve any other information about products as well — like sales rank, weight, ratings, reviews, etc.
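For example, with this function in a VBA module, a cell formula along the lines of =AmazonPrice("Electronics", "Canon EOS Digital Rebel XT") should return that camera’s price. (The Amount comes back in cents, so you may want to divide by 100 in the sheet; the first argument has to be one of Amazon’s SearchIndex names, like Electronics or Books.)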
Here’s the spreadsheet for you to play around with.
The technique of Web lookups in Excel I described yesterday is very versatile. I will be running through some of the practical uses it can be put to over the next few days.
To generalise things beyond just getting the Amazon price, I created a user-defined function called XPath2. It takes two parameters:
1. The URL of the XML feed to read
2. A search string: a space-separated list of XPath expressions
This function can be used to extract information out of any XML file on the Web and get it out as a table. For example, if you wanted to watch the Top 10 movies on the IMDb Top 250 and were looking for torrents, an RSS feed is available from mininova. The URL http://www.mininova.org/rss/movie_name/4 gives you an RSS file matching all movies with “movie_name”. From this, we need to extract the <item><title> and <item><link> elements. That’s represented by “//item title link” as my search string.
The formula becomes XPath2( "http://www.mininova.org/rss/"&A2&"/4", "//item title link"). The result is a two-dimensional array, with individual items in rows and the title and link as columns. Pulling it all together, you can get the sheet above.
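I haven’t walked through the code of this function here, but a minimal sketch of such a UDF could look like the following. (This is an illustration, not my exact implementation: it assumes the Microsoft XML reference is enabled, and it ignores namespaces and error handling.)

```vb
Function XPath2(url As String, search As String) As Variant
    ' Sketch: the first token of "search" picks the rows (e.g. "//item"),
    ' and the remaining tokens pick one column from each row (e.g. "title", "link")
    Dim xDoc As MSXML2.DOMDocument30
    Dim parts() As String
    Dim items As MSXML2.IXMLDOMNodeList
    Dim result() As String
    Dim i As Long, j As Long

    Set xDoc = New MSXML2.DOMDocument30
    xDoc.async = False
    If Not xDoc.Load(url) Then Exit Function
    xDoc.setProperty "SelectionLanguage", "XPath"

    parts = Split(search, " ")
    Set items = xDoc.selectNodes(parts(0))
    If items.Length = 0 Then Exit Function

    ReDim result(1 To items.Length, 1 To UBound(parts))
    For i = 1 To items.Length
        For j = 1 To UBound(parts)
            result(i, j) = items(i - 1).selectSingleNode(parts(j)).Text
        Next j
    Next i
    XPath2 = result
End Function
```

Entered as an array formula over a block of cells, something like this fills the block with one matching item per row.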
All of this could be done using a command-line program. Excel has one huge advantage, though: it’s one of the most powerful user interfaces around. Increasingly, I’m beginning to rely on just two user interfaces for almost any task. One is the browser, and the other is Excel. With Excel, I could have a sheet that holds my movie wishlist (which changes often), and add a check to see if a torrent exists. Every time I add a bunch of movies to the wishlist, it’s just a matter of copying the formula down. No need to visit a torrent search engine and type each movie in, one by one.
Another example. Someone suggests 10 movies to watch. I’d rather watch the ones with a higher IMDb rating. Again, enter the Web lookup. Type in the movie names. Use a similar function to look up the rating on IMDb, and sort by the rating.
The possibilities are endless.
Now that I’m well on my way to watching the Top 250 movies on IMDb, I’ve slowly turned my attention to fiction. My interest is mainly in the Fantasy & Science Fiction area. Unfortunately, I don’t know of any list like the IMDb Top 250, but there are a few awards that could take the place of the Oscars for books. That’s probably a good place to start.
The most popular awards in Science Fiction are the Hugo and Nebula awards, followed by the Philip K. Dick, John W. Campbell, Arthur C. Clarke and other awards. I collated a list of all the awards (from LocusMag) into the spreadsheet below.
Very few of these books have won multiple awards. None have won more than 3 on this list. Of these, only five have 3 awards:
Over 20 books have won two of these awards.
I haven’t read most of these books. The ones I have read are:
That’s about a 50% ratio, so I guess this list doesn’t work very well for me. Or at least, my taste doesn’t match that of the award judges. But maybe you will find something interesting to read…
The Internet has a lot of Tamil song lyrics in English. Finding them is not easy, though. Two problems. The lyrics are fragmented: there’s no single site to search them all. And Google doesn’t help. It doesn’t know that alaipaayudhe, alaipaayuthe and alaipayuthey are the same word.
This is similar to the problem I faced with Tamil audio. The solution, as before, is to make an index, and provide a search interface that is tolerant of English spellings of Tamil words. But I want to go a step further. Is it possible to display these lyrics in Tamil?
My Tamil Transliterator does a lousy job of this. Though it’s tolerant of mistakes, it’s ignorant of spelling and grammar. So,
kanda nal muthalai kathal peruguthadi
… becomes…
kanda nal muthalai kathal peruguthadi
… when in fact we want…
kanda naaL muthalaay kaathal peruguthadi
(If you’re viewing this on an RSS reader, check my post to see what I mean.)
I need an automated Tamil spelling corrector. Having read Peter Norvig’s “How to Write a Spelling Corrector”, and actually understood it, I gave spelling correction in Tamil a shot.
Norvig’s approach, in simple terms, is this: make a dictionary of valid words, tweak the misspelt word in every small way (deleting, swapping, replacing and inserting letters), keep only the tweaks that are valid words, and choose the most likely of those as the correction.
Making a dictionary is easy. I just need lots of Tamil literature, and then I pick out words from it. For now, I’m just using the texts in Project Madurai.
Tweaking the word to check is easy. Norvig’s article has a working code example.
Picking valid tweaks is easy. Just check against the dictionary.
The tough part is choosing the likely correction. For each valid word, I need the probability of having made this particular error.
Let’s take an example. I’ve spelt kathal. The valid tweaks of this word include kal, kol, kadal, kanal, and kaadhal. For each of these, I need to figure out how often the valid tweak occurs, and the probability that I typed kathal when I really meant that tweak. This is what such a calculation would look like:
| Tweak | Frequency | Probability of typing kathal | Product |
|---|---|---|---|
| kal | 1 | 0.04 | 0.04 |
| kol | 4 | 0.02 | 0.08 |
| kadal | 10 | 0.1 | 1.0 |
| kanal | 1 | 0.01 | 0.01 |
| kaadhal | 6 | 0.25 | 1.50 |
Once we have this, we can see that kaadhal is the right one — it has the maximum value (1.50) in the last column, where we multiply the frequency and the probability.
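Put as a formula, the pick in the table above is: correction(w) = the candidate c that maximises freq(c) × P(w | c). That is just Norvig’s argmax of P(c) × P(w | c), with raw frequency counts standing in for P(c).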
(You probably realise how small my dictionary is, looking at the frequencies. Well, that’s how big Project Madurai is. But increasing the size of a dictionary is a trivial problem.)
Anyway, getting the frequency is easy. How do I get the probabilities, though? That’s what I’m working on right now.
I often need to extract words out of sentences. It’s one of the things I used to build the Statistically Improbable Phrases for Calvin and Hobbes. But splitting a sentence into words isn’t as easy as you think.
Think about it. What is a word?
Something that has spaces around it? OK, let’s start with the simplest way to get words: split by spaces. Consider this piece:
"I'd look at McDonald's," he said. "They sell over 3,000,000 burgers a day -- at $1.50 each." High-fat foods were the rage. For e.g., margins in fries were over 50%... and (except for R&M & Dyana [sic]) everyone was at ~30% net margin; growing at 25% too!
Splitting this by spaces (treating new lines, tabs, etc. as spaces too), we get the following:
"I'd look at McDonald's," ...
Now, some of these like “I’d” are words. But “McDonald’s,” isn’t. I mean, there’s a comma and a double-quote at the end. Clearly we need to remove the punctuation as well. But if we do that, I’d becomes Id. So we need to be careful about which punctuation to remove. Let’s take a closer look.
The following punctuation marks are clear word separators: spaces, the exclamation mark, the question mark, semicolon, brackets of any kind, and double-quotes (not single quotes). No word has these in the middle. If we use these as separators, our list of words is better, but we still have some words with punctuation:
McDonald's, e.g., High-fat R&M ...
The issue is, these punctuation marks are ambiguous word separators: comma, hyphen, single-quote, ampersand, period and slash. These usually separate words, but there are exceptions: commas inside numbers (3,000,000), hyphens in words like High-fat, apostrophes in McDonald’s, ampersands in R&M, periods in abbreviations like e.g., and slashes in URLs.
The colon is ambiguous too. In normal English usage, it would be a separator. But URLs like http://www.s-anand.net/ use it, and it doesn’t make sense to split them.
So here are my current rules for splitting a sentence into words. (It’s a Perl regular expression. Don’t worry. Cooper’s Law: If you do not understand a particular word in a piece of technical writing, ignore it. The piece will make perfect sense without it.)
```perl
    # Split by clear word separators
/   [\s \! \? \;\(\)\[\]\{\}\<\> " ]

    # ... by COMMA, unless it has numbers on both sides: 3,000,000
|   (?<=\D) , | , (?=\D)

    # ... by FULL-STOP, SINGLE-QUOTE, HYPHEN, AMPERSAND, unless it has a letter on both sides
|   (?<=\W) [\.\-\&] | [\.\-\&] (?=\W)

    # ... by QUOTE, unless it follows a letter (e.g. McDonald's, Holmes')
|   (?<=\W) [']

    # ... by SLASH, if it has spaces on at least one side. (URLs shouldn't be split)
|   \s \/ | \/ \s

    # ... by COLON, unless it's a URL or a time (11:30am for e.g.)
|   \:(?!\/\/|\d)
/x;
```
This doesn’t even scratch the surface of the issue, though. Here are some issues:
And you thought it was easy!
In some of the Web projects I’m working on, I have a choice between many small files and a few big files to download.
There are conflicting arguments. I’ve read that many small files are better, because you can choose to use only the required files, and they’ll be cached across the site. (These are typically CSS or Javascript files.) On the other hand, a single large file takes less time to download than the sum of many small files, because there’s less latency. (Latency is more important than bandwidth these days.)
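A rough way to see the latency argument (a back-of-the-envelope model, ignoring parallel connections and pipelining): the time to fetch a file is roughly latency + size / bandwidth, so n separate requests pay the latency n times, while one combined file of the same total size pays it only once.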
I ran some tests, and the answer is rather easy. The graph below shows the average time taken to download a file of size 1KB – 100KB.
The X-axis is the size of the file. The Y-axis is the number of milliseconds taken to download the file, averaged over 10 attempts on my home computer. (I did the same thing at work and at a client site. The results are similar.)
The key thing is, it’s not linear: larger files take disproportionately less time per kilobyte. These days, it looks like a few big files are better.
To give you an example: my home page had the following components:
| | Size (KB) | Load time (ms) |
|---|---|---|
| HTML | 25 | 680 |
| CSS | 4 | 340 |
| Javascript | 6 | 400 |
| Total | 35 | 1420 |
The total time for these 3 files would be about 1.4 seconds. Instead, if I put them all on one file…
| | Size (KB) | Load time (ms) |
|---|---|---|
| All-in-one | 35 | 770 |
The combined file takes only 0.77 seconds — half the download time for split files. It’s a compelling argument to put all your CSS and Javascript (images, too, if possible) into a single page.
But what if people visit multiple pages on my site, and I lose the benefit of caching? Not really a problem: the benefit of caching is small. By having a single file, I add 770 – 680 = 90 ms to each HTML page load. But I don’t have to load the CSS and Javascript individually, which takes 740 ms. The breakeven is about 740 / 90 = 8 page visits.
So, on average, if people visit more than 8 pages on my site, it’s worth breaking up the CSS and Javascript. But the average for my site is only 2 pages per person. (It’s a skewed distribution. Most visitors, from Google, just visit one page. A few, from bookmarks, visit several pages. On average, it’s just over 2.) I’d argue that unless you’re using an external Javascript / CSS library (prototype, etc.) or people view many pages each visit, you’re better off having a single HTML+Javascript+CSS file.
John Resig has written a Sparklines library. Here’s an example. I wrote that HTTP download speeds are not linear and that they flatten out over time. A linear line would look like this: The little red line here is a sparkline that’s based on real data. John’s Javascript converts the data into a graph.
Sparklines were introduced by Edward Tufte.