S Anand

Filtering vs weighting

I am selecting a CRM package for a bank. I asked my colleagues how they’d gone about it, and got 8 responses. Every single one of them had the same weighting approach: Take a huge list of criteria, assign weights, score each package, calculate a weighted-average score, pick the highest one.

As I mentioned earlier, I think weighting is a lousy method. (See Errors in multicriteria decision making.) You can’t say “I picked this package because it has X, Y and Z features, which the others don’t.” You can only say, “Oh, overall, it has the highest score…”

The scores and the weights are subjective. You spend ages arguing between a 3 and a 4. You can manipulate them very easily. And you end up having to revise the scores many times to get to the answer you want.
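Here’s a minimal sketch of how easy that manipulation is. The package names, criteria, scores and weights are all made up for illustration; the point is only that nudging one weight by ten points flips the winner.

```python
# Hypothetical scores (1-5) for two made-up packages on three made-up criteria.
scores = {
    "Package A": {"features": 4, "cost": 2, "support": 3},
    "Package B": {"features": 3, "cost": 4, "support": 3},
}

def weighted_total(package_scores, weights):
    """Sum of score * weight. Weights are integer percentages that add up to 100."""
    return sum(package_scores[c] * w for c, w in weights.items())

def winner(scores, weights):
    """Name of the package with the highest weighted total."""
    return max(scores, key=lambda name: weighted_total(scores[name], weights))

# Two equally "defensible" sets of weights (both sum to 100).
weights_1 = {"features": 50, "cost": 20, "support": 30}
weights_2 = {"features": 40, "cost": 30, "support": 30}

for weights in (weights_1, weights_2):
    totals = {name: weighted_total(s, weights) / 100 for name, s in scores.items()}
    print(weights, totals, "->", winner(scores, weights))
```

With the first set of weights Package A comes out ahead; with the second, Package B does. Nothing about the packages changed.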

Since I now had an opinion, I put my foot down, and said, “Here’s what we’ll do. Let’s make a list of essential criteria. They will all be YES / NO questions. Any package that fails even one criterion is knocked off. That’s it.”

This may appear simplistic, but it isn’t. You see, at the end of the day, only a few criteria really matter. Ideally, you just pick these, and compare packages against these. Since you don’t know which these are, you make a bigger list, evaluate them all, and then realise the truth.

Sometimes, you have too many criteria. Then none of the packages make it, and you have to sacrifice some of your criteria.

Sometimes, all of them make it. Then you can choose to enforce more criteria. Or maybe not. If all of them meet your criteria, just pick the cheapest one.
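The whole filtering approach fits in a few lines. This is a rough sketch, not what we actually ran: the package names, prices and criteria below are invented, and `choose` is a hypothetical helper.

```python
# Hypothetical packages: a price, plus YES / NO answers to each criterion.
packages = {
    "Package A": {"price": 120, "meets": {"runs_on_existing_db": True,
                                          "supports_branch_hierarchy": True,
                                          "vendor_has_local_support": False}},
    "Package B": {"price": 150, "meets": {"runs_on_existing_db": True,
                                          "supports_branch_hierarchy": True,
                                          "vendor_has_local_support": True}},
    "Package C": {"price": 100, "meets": {"runs_on_existing_db": True,
                                          "supports_branch_hierarchy": True,
                                          "vendor_has_local_support": True}},
}

def survivors(packages, criteria):
    """Packages that answer YES to every criterion. A missing answer counts as NO."""
    return [name for name, p in packages.items()
            if all(p["meets"].get(c, False) for c in criteria)]

def choose(packages, criteria):
    """Cheapest surviving package, or None if the criteria reject everything."""
    passed = survivors(packages, criteria)
    if not passed:
        return None  # too many criteria: sacrifice some and try again
    return min(passed, key=lambda name: packages[name]["price"])

essential = ["runs_on_existing_db", "supports_branch_hierarchy",
             "vendor_has_local_support"]
print(survivors(packages, essential))  # which packages made it
print(choose(packages, essential))     # the cheapest of those
```

Unlike a weighted score, the output explains itself: a package is out because it answered NO to a named criterion.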

Internally, we were convinced of this approach, and took it to the client. Things were fine for a week. Then, complaints started trickling in.

  1. Fear: “I’ve already used weighting in earlier package evaluations. Now, if you use filtering, it’ll make my earlier evaluations look bad…”
  2. Uncertainty: “I don’t know… what if there’s no YES / NO answer? What if we need shades of grey?”
  3. Doubt: “What if it lets everything through? What if it rejects everything?”

Uncertainty is the most popular objection.

“What if we need shades of grey?”

I always ask: “Any example?”

“Well, you know… it can come up.”

So I give them an example, and explain how it can be broken into sub-questions.

“Well, yeah… but just to be on the safe side, could we have a score?”

The exercise is still going on. I haven’t seen a valid concern yet. What’s interesting is, everyone is hesitant about filtering, but no one can defend their objection.

Popular lousy movies

If you plot all movies by their number-of-votes on IMDb and their rating on IMDb, you get the chart below. Movies with more votes usually have a higher rating.

Popular lousy movies on IMDb

I was interested in two things:

  1. Which are the unpopular, but good (highly-rated), movies?
  2. Which are the popular, but lousy, movies?

The answer to the first question is: there are no unpopular good movies. The dots clustered on the top-left (in red) are not movies: they’re TV shows (Band of Brothers, Pride and Prejudice, Arrested Development).

The answer to the second is: there are 9 really popular lousy movies.

It’s interesting that every single one of these had a huge budget. (Perhaps this is understandable: more people would see a big-budget film and vote on it.)
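Both questions are just a pair of cut-offs on the two axes of the chart. Here’s a rough sketch of that idea. The titles, vote counts, ratings and thresholds below are invented placeholders, not the IMDb data or the cut-offs I used.

```python
# Placeholder records standing in for (title, number of votes, rating).
movies = [
    {"title": "Hypothetical Blockbuster", "votes": 300_000, "rating": 4.5},
    {"title": "Hypothetical Classic",     "votes": 500_000, "rating": 8.9},
    {"title": "Hypothetical Obscure Gem", "votes": 2_000,   "rating": 8.7},
    {"title": "Hypothetical Flop",        "votes": 1_500,   "rating": 3.2},
]

def popular_lousy(movies, min_votes, max_rating):
    """Titles with many votes but a low rating (bottom-right of the chart)."""
    return [m["title"] for m in movies
            if m["votes"] >= min_votes and m["rating"] <= max_rating]

def unpopular_good(movies, max_votes, min_rating):
    """Titles with few votes but a high rating (top-left of the chart)."""
    return [m["title"] for m in movies
            if m["votes"] <= max_votes and m["rating"] >= min_rating]

# The thresholds are assumptions; move them and the lists change.
print(popular_lousy(movies, min_votes=100_000, max_rating=5.0))
print(unpopular_good(movies, max_votes=5_000, min_rating=8.5))
```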

Notepad easter egg is really a bug

If you create a file in Windows Notepad containing just the string “bush hid the facts”, save it and reopen it, it shows you boxes. Same with “this app can break”. Here’s why. It has nothing to do with George Bush or Microsoft. It’s just that these strings are saved as ASCII, but the same bytes also happen to form a valid UTF-16 string (what Notepad calls “Unicode”), and Notepad guesses, wrongly, that the file is UTF-16. Read that way, each pair of ASCII bytes turns into a single Chinese character, which shows up as a box if you don’t have a font for it.
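You can reproduce the misreading without Notepad. This sketch only shows what happens when the ASCII bytes are decoded as little-endian UTF-16; it does not reproduce Notepad’s actual guessing logic, and `misread_as_utf16` is a name I made up.

```python
def misread_as_utf16(text: str) -> str:
    """Encode text as ASCII, then decode the same bytes as UTF-16 (little-endian)."""
    raw = text.encode("ascii")      # one byte per character
    return raw.decode("utf-16-le")  # two bytes per character

for s in ("bush hid the facts", "this app can break"):
    decoded = misread_as_utf16(s)
    # Print code points rather than glyphs, since your terminal may show boxes too.
    print(s, "->", [f"U+{ord(c):04X}" for c in decoded])
```

Both strings are 18 bytes long, so each decodes to 9 characters, and every one of them lands in the CJK Unified Ideographs block (U+4E00 to U+9FFF).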

Early delays

I haven’t been blogging the last 6-7 weeks. This is partly because I’ve been averaging 1 book or movie per day, but mostly because I ran out of things to say. I will start again soon. In the meantime, this is an announcement I heard when travelling on the Jubilee line. (The train had halted at North Greenwich.)

“Ladies and gentlemen, we’re being held at this station for a while. This is because, you’re not going to believe this, but we’re slightly early! We’re not due at North Greenwich for another 60 seconds. Once again, I apologise for the delay, which is because we’re early.”

Programming theorems

The likelihood of Perl being involved in a system is directly proportional to the length of time the system has been in maintenance.

Every 5 minutes you spend writing code in a new language is more useful than 5 hours reading blog posts about how great the language is.

Think twice before presuming that CSV is a nice little easy file format. (see Leon)
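A small illustration of that last one. The sample data is made up; it has a comma, doubled quotes and a line break inside quoted fields, which is all it takes to break the “just split on commas” approach. Python’s standard `csv` module handles it.

```python
import csv
import io

# One header row and one data row. The data row has a comma inside the first
# field, and escaped quotes plus a line break inside the second.
raw = 'name,comment\r\n"Smith, John","He said ""hi""\nand left"\r\n'

# Naive approach: split into lines, then split each line on commas.
naive = [line.split(",") for line in raw.splitlines()]

# Using a real CSV parser. newline="" leaves line endings for csv to interpret.
parsed = list(csv.reader(io.StringIO(raw, newline="")))

print(len(naive), naive)    # the naive split sees 3 "rows"
print(len(parsed), parsed)  # the csv module sees 2 rows of 2 fields each
```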