S Anand

Lazy bargain hunting

I’m thinking of buying a digital keyboard with touch sensitive keys and MIDI support. (The one other thing that I thought of, a pitch bend, puts the keyboards out of my budget.)

I’d like a good deal. (Who doesn’t?) But I don’t like to spend time searching for one. (Who does?)

So here’s the plan.

Firstly, I’ll restrict my search to Amazon.co.uk. For electronics items, I haven’t found anyone consistently cheaper. Tesco has some pretty low prices, but not the range. eBuyer is pretty good, but not often enough. Google Products is the only other one that gets me consistently lower prices, but I’ve had my credit card identity stolen once before while shopping online, so I’d rather not pick any random seller listed on Google.

Amazon has a secret discount. You can search for electronics items with 30% off or more. And then you can narrow it down to Sound & Vision > Musical Instruments > MIDI Keyboards. Then add a 100 – 200 GBP price restriction. That leaves us with one product:

MIDI keyboard on Amazon

While that matches my criteria, I’m in no hurry and can wait for more offers to come up. But I don’t want to keep checking this page every day. So, RSS to the rescue. You probably think I can’t get enough of RSS feeds. And you’d be right. The thing is, as an attention mechanism, it is incredibly powerful, and I never cease to be amazed at the things it lets me do.

Using my XPath checker and a bit of trial and error, I figured all product links link to “amazon.co.uk/dp/…” with a <span> inside. So this XPath gets all the links:

//a[contains(@href,'/dp/')][span]

And I made an RSS feed out of that using my XPath server and subscribed to it on Google Reader.

Combining a bunch of such searches, I have a shopping folder on Google Reader that has all the items I’m searching for. Now that’s lazy bargain hunting.


Which is all very fine. But given that I’m buying a car in a hurry right now, and I’m not doing any bargain hunting, it’s a classic case of being penny-wise and pound-foolish. Sigh…

Implicit information

From what I’ve seen, puzzles and exam questions share two un-real-worldly characteristics. Firstly, you are guaranteed that a solution exists. Secondly, you are given that all the information provided to you is relevant. (Well, not always. Some case studies I’ve seen have had their share of contrived irrelevance. But that’s often what it is, I think. People fill in the relevant stuff, and then try and distract by adding irrelevant material in the hope of making it more real-world-like. But that’s just a guess).

These are very powerful constraints. I know of nothing that has given me as much confidence in solving puzzles as the assurance that a solution exists (and that someone thinks me capable of getting it).

But it’s more than just a confidence builder. The guarantee that a solution exists (and invariably it’s a unique one) is a very powerful one. An extreme case is an objective type question, which explicitly provides three guarantees:

  1. There is a solution
  2. There is only ONE solution
  3. It is among the choices listed below

(Some papers try and take away the first guarantee by having an (E) None of the above category. But that’s still leaving behind the other two more powerful guarantees.)

Marking answers randomly, or marking (A) for every question would still get you 25% in an exam with 4 choices. (Marking (C) would prove just as good, unless you had a kind professor like this.) That’s better than any real-world scenario I’ve seen. (Real-world strategies aren’t much better, though.)

Using guarantee 2, you can eliminate choices easily. If (A) and (B) do not satisfy some property of the solution, they CANNOT be the answer. There’s only one solution, and these are not it.

Using guarantee 3, you can pick the last remaining choice without having to check it. The solution is definitely among the choices listed. So you don’t need to solve an objective type question. You just need to pick the right answer, which is completely different.

The principle applies even outside of objective type questions, especially in mathematically-oriented problems, or puzzles. And you can solve it by trial and error. For example, try this one from Martin Gardner’s Mathematical Magic Show:

Two brothers own n sheep, each of which is sold for n dollars. Thus they have n² dollars in all. This is in the form of 10 dollar notes and 1 dollar coins, the number of 1 dollar coins being less than 10. The elder brother divides the money as follows: he takes a note for himself, gives one to his younger brother, takes a note for himself and so on. At the end, the younger brother complains that the elder took the first note as well as the last. So the elder gives the younger all the one dollar coins. The younger brother complains that the elder still has more. So the elder brother writes the younger a cheque to equalize their share. What was the cheque for?

Now, this is a weird problem. Think about it. You’re told almost NOTHING. And you have to guess what the amount is. (Note: you don’t have to guess what ‘n’ is. That’s impossible.)

Here’s how I solved the problem. I said, let me find even one case where the elder brother gets the first and last note. Let’s see what the answer is. Whatever the answer is for that case, it has to be the answer for all other cases — because otherwise, the problem does not have a unique solution.

So I tried n=1. n=2. n=3. For n=4, the amount is 16. That’s 1 $10 note and 6 $1 coins. The elder brother would get the first and the last $10 note. The younger would get $6. So the elder would have $4 more than the younger, and would write out a cheque for $2. (It’s amazing how many people get as far as the $4, but forget to divide by two.)

You can try it for any other value that has an odd number of $10 notes. It has to be for n ending with 4 or 6. That means n² ends in 6, and the cheque has to be for $2.

Notice that you didn’t need number theory to get the answer. The assurance that there is a unique answer is enough.
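The same argument can be checked by brute force. Here’s a rough Javascript sketch (the name chequeFor is made up for illustration) that tries every n up to 1000 and collects the distinct cheque amounts:

```javascript
// Returns the cheque amount for n sheep, or null if the elder brother
// would not end up with both the first and the last $10 note.
function chequeFor(n) {
    var total = n * n;
    var notes = Math.floor(total / 10);   // number of $10 notes
    var coins = total % 10;               // number of $1 coins
    if (notes % 2 === 0) { return null; } // even number of notes: no dispute
    // The elder holds one extra $10 note, the younger holds all the coins.
    // Handing over half the difference equalizes the shares.
    return (10 - coins) / 2;
}

var amounts = {};
for (var n = 1; n <= 1000; n++) {
    var cheque = chequeFor(n);
    if (cheque !== null) { amounts[cheque] = true; }
}
console.log(Object.keys(amounts));
```

Only one amount, 2, ever turns up, which is exactly what the uniqueness guarantee promised.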


There’s another kind of implicit information usually available: the amount of information there is. For example, take the following question:

Which city has a higher population: San Antonio or San Diego?

Children in the US apparently had difficulty answering it. Children in Germany had less trouble. The reason? The German kids had heard of San Diego, but not San Antonio. They figured the one they’d heard of was more likely bigger. Knowing less may be better.

It’s the same principle you use to check spellings. Run a Google search on two spellings. The one that returns a higher number of results is the correct spelling. (Of course, Google has a spelling correction mechanism that works well, but I use it for Tamil words. I can never tell if I should use ர or ற.)

Of course, the fundamental assumption here is: MORE INFORMATION = MORE CORRECT, which is not always the case. But the point I’m driving at is this:

You’re always given additional information. Even if you’re not given any information, that’s informative.

Web lookup using Google Spreadsheets

I’d written earlier about Web lookup in Excel. I showed an example of how you could create a movie wishlist that showed the links to the torrents from Mininova.

You can do that even more easily on Google Spreadsheets. It has 4 functions that let you import external data:

  • =importData("URL of CSV or TSV file").
    Imports a comma-separated or tab-separated file.
  • =importFeed(URL).
    Lets you import any Atom or RSS feed.
  • =importHtml(URL, "list" | "table", index).
    Imports a table or list from any web page.
  • =importXML("URL", "query").
    Imports anything from any web page using XPath.

Firstly, you can see straight off why it’s easy to view RSS feeds in Google Spreadsheets. Just use the importFeed function straight away. So, for example, if I wanted to track all 8GB iPods on Google Base, I can import its feed in Google Spreadsheets.

Google Spreadsheets ImportFeed

This automatically creates a list of the latest 8GB iPods.

Incidentally, the “Price” column doesn’t appear automatically. It’s a part of the description. But it’s quite easy to get the price using the standard Excel functions. Let’s say the description is in cell C1. =MID(C1, FIND("Price", C1), 20) gets you the 20 characters starting from “Price”. Then you can sort and play around as usual.
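For anyone more at home in Javascript than in spreadsheet formulas, here’s a rough equivalent of that MID/FIND combination. The function name and the sample description are made up for illustration; real descriptions will look different.

```javascript
// Mirrors =MID(C1, FIND("Price", C1), 20): returns up to `length`
// characters starting at the first occurrence of `label`.
// Unlike FIND, it returns an empty string when the label is missing.
function textFrom(description, label, length) {
    var start = description.indexOf(label);
    if (start === -1) { return ""; }
    return description.substring(start, start + length);
}

// Hypothetical description text, not taken from a real feed
var description = "Apple iPod nano 8GB, black. Price: 129.00 GBP. Condition: new";
console.log(textFrom(description, "Price", 20));
```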

The other powerful thing about Google Spreadsheets is the CONTINUE function. The importFeed function creates a multi-dimensional array. You can extract any cell from the array (for example, row 3, column 2 from cell C1) using CONTINUE(C1, 3, 2). So you can just pick up the title and description, or only alternate rows, or put all rows and columns in a single column — whatever.

Google Spreadsheets CONTINUE

The most versatile of the import functions is the importXML function. It lets you import any URL (including an RSS feed), filtering only the XPath you need. As I mentioned earlier, you can scrape any site using XPath.

For example, =importXML("http://www.imdb.com/chart/top", "//table//table//table//a") imports the top 250 movies from the IMDb Top 250. The second parameter says, get all links (a) inside a table inside a table inside a table. This populates a list with the entire Top 250.

Google Spreadsheets - ImportXML

Now, against each of these, we could get a feed of Mininova’s torrents. Mininova’s RSS URL is http://www.mininova.org/rss/search_string. So, in cell B1, I can get a torrent for the cell A1 (The Godfather) using the importFeed function. (Note: you need to replace spaces with a + symbol. These functions don’t like invalid URLs.)

Google Spreadsheets - Import Mininova Feed

Just copy this formula down to get a torrent against each of the IMDb Top 250 movies!
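The space-to-plus step is the only fiddly bit. As a sketch of the logic in Javascript (the function name is made up, and escaping each word with encodeURIComponent is an extra precaution, not something the formula above is known to do):

```javascript
// Builds the Mininova feed URL for a movie title,
// joining the words with "+" instead of spaces.
function mininovaFeedUrl(title) {
    var words = title.trim().split(/\s+/);
    return "http://www.mininova.org/rss/" + words.map(encodeURIComponent).join("+");
}

console.log(mininovaFeedUrl("The Godfather"));
// http://www.mininova.org/rss/The+Godfather
```

In the sheet itself, a function like SUBSTITUTE(A1, " ", "+") should do the same replacement.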

Check out the sheet I’ve created. (You need a Google account to see the sheet. If you don’t have one, you can view the sheet.)


Now, that’s still not the best of it. You can extract this file as an RSS feed! Google lets you publish your sheets as HTML, PDF, Text, XLS, etc. and RSS and Atom are included as well. Here’s the RSS feed for my sheet above.

Think about it. We now have an application that sucks in data from a web page, does a web-based vlookup on another service, and returns the results as an RSS feed!

There are only two catches to this. The first is that Google has restricted us to 50 import functions per sheet. So you can’t really have the IMDb Top 250 populated here — only the top 49. The second is that the spreadsheet updates only when you open it again. So it’s not really a dynamically updating feed. You need to open the spreadsheet to refresh it.

But if you really wanted these things, there’s always Yahoo! Pipes.

Tamil songs quiz – Enchanting first interludes

Some background scores just stay in your mind. Here is a tribute to 20 wonderful first interludes, dating from the 1980s to the 2000s. Can you guess which movies they are from? (My intention here is not to make this tough, but rather to let you enjoy the music. So I hope to see most of you score 20/20.)

Don’t worry about the spelling. Just spell it like it sounds, and the box will turn green.

Search for the song and listen online, if you want to confirm your guess.


Javascript error logging

If something goes wrong with my site, I like to know of it. My top three problems are:

  1. The site is down
  2. A page is missing
  3. Javascript isn’t working

This is the last of 3 articles on these topics.

I am a bad programmer

I am not a professional developer. In fact, I’m not a developer at all. I’m a management consultant. (Usually, it’s myself I’m trying to convince.)

Since no one pays me for what little code I write, no one shouts at me for getting it wrong. So I have a happy and sloppy coding style. I write what I feel like, and publish it. I don’t test it. Worse, sometimes, I don’t even run it once. I’ve sent little scripts off to people which wouldn’t even compile. I make changes to this site at midnight, upload it, and go off to sleep without checking if the change has crashed the site or not.

But no one tells me so

At work, that’s usually OK. On the few occasions where I’ve written Perl scripts or VB Macros that don’t work, people call me back within a few hours, very worried that THEY’d done something wrong. (Sometimes, I don’t contradict them.)

On my site, I don’t always get that kind of feedback. People just click the back button and go elsewhere.

Recently, I’ve been doing more Javascript work on my site than writing stuff. Usually, the code works for me. (I write it for myself in the first place.) But I end up optimising for Firefox rather than IE, and for the plugins I have, etc. When I try the same app a few months later on my media PC, it doesn’t work, and shockingly enough, no one’s bothered telling me about it all these months. They’d just click, nothing happens, they’d vanish.

But their browsers can tell me

The good part about writing code in Javascript is that I can catch exceptions. Any Javascript error can be trapped. So since the end of last year, I’ve started wrapping almost every Javascript function I write in a try {} catch() {} block. In the catch block, I send a log message reporting the error.

The code looks something like this:

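// Report an error to the server. msg says where the error happened.
function log(e, msg) {
    msg += "\n";
    // Append every enumerable property of the error object
    for (var i in e) { msg += i + "=" + e[i] + "\n"; }
    // Requesting an image is a simple way to send the message to log.pl
    (new Image()).src = "log.pl?m=" + encodeURIComponent(msg);
}

function abc() {
    try {
        // ... function code
    } catch(e) { log(e, "abc"); }
}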

Any time there’s an error in function abc, the log function is called. It sends the function name ("abc") and the error details (the contents of the error object) to log.pl, which stores the error, along with details like the URL, browser, time and IP address. This way, I know exactly which error occurs where.

This is fantastic for three reasons.

  • It tells me when I’ve goofed up. This is instantaneous feedback. I don’t have to wait for a human. If you run my program on your machine, and it fails, I get to know immediately. (Well, as soon as I read the error log, at least.)
  • It tells me where I’ve goofed up. The URL and the function name clearly indicate the point of failure.
  • It tells me why I’ve goofed up. Almost. Using the browser name and the error message, I can invariably pinpoint the reason for the error. Then it’s just a matter of taking the time to fix it.

I’d think this sort of error reporting should be the norm for any software. At least for a web app, given how easy it is to implement.
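Writing the try/catch by hand in every function gets repetitive. One possible shortcut, sketched below, is a wrapper that adds the error trapping to any function. The names guard and report are made up for illustration; in practice report would be something like the log function above.

```javascript
// Wraps fn so that any exception is passed to report(error, name)
// instead of propagating to the browser.
function guard(name, fn, report) {
    return function () {
        try {
            return fn.apply(this, arguments);
        } catch (e) {
            report(e, name);
        }
    };
}

// Usage: collect errors in an array instead of calling log.pl
var seen = [];
var risky = guard("risky", function (x) {
    if (x < 0) { throw new Error("negative input"); }
    return Math.sqrt(x);
}, function (e, name) { seen.push(name + ": " + e.message); });

risky(4);   // returns 2, nothing is reported
risky(-1);  // returns undefined, one error is reported
```

The trade-off is that a wrapped function quietly returns undefined after an error, so callers should not rely on its result.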

Monitoring site downtime

If something goes wrong with my site, I like to know of it. My top three problems are:

  1. The site is down
  2. A page is missing
  3. Javascript isn’t working

I’ll talk about how I manage these over 3 articles.

My site used to go down a lot. Initially that was because I kept playing around with mod_rewrite and other Apache modules without quite understanding them. I’d make a change and upload it without testing. (I’m like that.) And then I’d go to sleep.

Next morning, the site’s down, and has been down all night.

This is a bit annoying. Partly because I couldn’t use my site, but mostly because of the “Oh yeah, sorry, I goofed up last night” replies that I have to send out the next day.

So I started using Site24x7 to track if my website was down. It’s a convenient (and free) service. It pings my site every hour. If it’s down, I get an SMS. If it’s back up, I get an SMS. It also keeps a history of how often the site is down.

Site24x7

Over time, I stopped making mistakes. But my site still kept going down, thanks to my hosting service (100WebSpace). When I goof up, it’s just an annoyance, and I can fix it. But when my hosting service goes down, it’s more than that. My site is where I listen to music, read comics, read RSS feeds, use custom search engines, watch movies, browse for books, etc. Not being able to do these things — nor fix the site — is suffocating.

Worse, I couldn’t sleep. I use my mobile as my alarm. It’s annoying to hear an SMS from under your pillow at 3am every day — especially if it says your site is down.

So I switched to HostGator a few months ago. Nowadays, the site is down a lot less. (In times of trouble, it becomes sluggish, but doesn’t actually go down.)

That came at a cost, though. I was paying 100WebSpace about $25 per annum. I’m paying HostGator about $75 per annum. Being the kind that analyses purchases to death, the big question for me was: is this worth it? This is where my other problem with the site being down kicks in. I get a bit of ad revenue from my site, and I lose that when the site’s down. (Not that it’s much. Still…)

According to Site24x7, my site was up ~98% of the time. So I’m losing about 2% of my potential ad revenue. For the extra $50 to be worth it, my ad revenue needs to be more than $50 / 2% = $2,500 per annum. I’m nowhere near it. So the switch isn’t actually a good idea economically, but it does make life convenient (which is pretty important) and I sleep better (much more so).
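The break-even sum can be written out as a small Javascript sketch. The function name is made up, and it assumes (optimistically) that the new host removes the downtime entirely.

```javascript
// Annual ad revenue above which the pricier host pays for itself.
// extraCostPerYear is in dollars, downtimePercent is the share of
// time the site was down on the old host.
function breakEvenRevenue(extraCostPerYear, downtimePercent) {
    return extraCostPerYear * 100 / downtimePercent;
}

var extraCost = 75 - 25;   // HostGator minus 100WebSpace, dollars per annum
console.log(breakEvenRevenue(extraCost, 2));   // 2500
```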

The important thing, I’ve realised, is not just to track this stuff. That’s useful, sure. But what really made Site24x7 useful to me is that it would alert me when there was a big problem.

There are many kinds of alerting.

There’s a report you can view whenever you remember to view it. (It could be an RSS feed, so at least you won’t have to remember the site. But you still need to read your feeds.)

Then there’s the more pushy alerting: sending you an e-mail. That may catch you instantly for the half of the day that you’re online. Or, if you’re like me, it may completely escape your attention. (I don’t read e-mail.)

And then there’s the equivalent of shaking you by the shoulder — getting an SMS. (At least, that’s how it is for me. Incidentally, I don’t reply to SMS either. Calling me gets a reply. Nothing else.)

The type of alerting is clearly a function of the severity of the problem. Wake me up when my site goes down. Don’t wake me up if a link is broken.

Site24x7 sends me an SMS when my site is down. Fits the bill perfectly.

Handling missing pages

If something goes wrong with my site, I like to know of it. My top three problems are:

  1. The site is down
  2. A page is missing
  3. Javascript isn’t working

This article covers the second topic.

One thing I’m curious about is hits to non-existent pages (404s) on my site. I usually get 404s because:

  • I renamed the page
  • Someone typed a wrong URL
  • Someone followed a wrong link

Find the 404

The first problem is to know when someone gets a 404. I’ve seen sites that tell you to contact the administrator in case of a 404. That’s crazy. The administrator should automatically detect 404s! Almost every web server provides this facility.

The real issue is attention. I receive 700 404s a day. That’s too much to manually inspect. And most of these are not for proper web pages, but for images (for example, almost all my 404s used to be for browsers requesting favicon.ico) or weird MS Office files.

I’m interested in a small subset of 404 errors. Those that hit a web page, not support files. And those requested by a human, not a search engine or a program.

A decent way of filtering these is to use Javascript in your 404 page. Javascript is typically executed only by browsers (i.e. humans, not search engines), and only in a web page (not images, etc.) So if you serve Javascript in your 404 page, and it gets executed, it’s likely to be a human requesting a web page.

I have a piece of Javascript in my custom 404 page that looks something like this:

<script>(new Image()).src = "/log.pl";</script>

Every time this code runs, it loads a new image. The source of the image is a Perl script, log.pl. Every time log.pl is accessed, it logs the URL from which it was called. I’m reasonably guaranteed that these are web pages a human tried to access.

The reduction in volume is tremendous. In a typical month, I get ~20,000 404 errors. With the Javascript logging, it’s down to around 200 a month, and most of them are quite meaningful.
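A variation on the same idea is to pass the details along explicitly, so the log also records which page linked to the missing one. This is only a sketch: the function name and the page and from parameters are made up, and log.pl would need to be written to read them.

```javascript
// Builds the logging URL for a missing page.
// missingPage: address the visitor asked for
// referrer: page that linked to it (may be empty)
function beaconUrl(missingPage, referrer) {
    return "/log.pl?page=" + encodeURIComponent(missingPage) +
        "&from=" + encodeURIComponent(referrer || "");
}

// In the 404 page itself, this would be used as:
//   (new Image()).src = beaconUrl(location.href, document.referrer);
console.log(beaconUrl("http://s-anand.net/hiundi", ""));
```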

Point to the right page

Sometimes, the 404 happens because I changed the URLs. I keep fiddling with the site structure. Someone would have links to an old page that I’ve renamed. I may not even know that. Even if I did, they can’t be bothered with requests to change the link. So I’ve got to handle it.

The quickest way, I find, is to use Apache’s mod_rewrite. You can simply redirect the old URL to the new URL. For example, I used to have a link to /calvin.html which I now point to /calvinandhobbes.html. That becomes a simple line in my .htaccess file:

RewriteRule ^calvin\.html$ calvinandhobbes.html

I don’t do this for every site restructuring, though. I just restructure, wait for someone to request a wrong page, and when my 404 error log warns me, I create a line in the .htaccess. It keeps the redirections down to a minimum, and only for those links that are actually visited.

Be flexible with the URL structure

Sometimes people type in a wrong link. Often, these are unintentional. Here are some common misspellings for my Hindi songs search.

s-anand.net/hindi/
s-anand.net/Hindi
s-anand.net/hiundi

Occasionally, people are exploring the structure of my site:

s-anand.net/excel
s-anand.net/music
s-anand.net/hits

I need to decide what to do with both cases. For the former, sometimes my URL structure is too restrictive. I mean, why should someone have to remember to type /hindi instead of /Hindi or /hindi/? Who cares about case? Who cares about a trailing slash?

In such cases, I map all the variants to the right URL using mod_rewrite. For example, typing s-anand.net/HiNDi (with or without caps, with or without a slash at the end) will still take you to the right page.

As I keep discovering new misspellings, I take a call on whether to add each one or not. The decision is usually based on volume. If two people make the same spelling mistake in a day, I almost certainly add the variant. Most of the time, it’s just a typing error like /hiundi, which isn’t repeated more often than once a month.
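The case and trailing-slash handling boils down to normalising the path before looking it up. Here’s that logic as a Javascript sketch. It is not the mod_rewrite rules themselves, and the function name is made up.

```javascript
// Lower-cases a path and strips trailing slashes, so that
// /Hindi, /hindi/ and /HiNDi/ all map to /hindi.
function canonicalPath(path) {
    var cleaned = path.toLowerCase().replace(/\/+$/, "");
    return cleaned === "" ? "/" : cleaned;   // keep the home page as "/"
}

console.log(canonicalPath("/HiNDi/"));   // /hindi
```

A misspelling like /hiundi is not covered by this. Those still need an explicit entry each.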

Provide search

To handle the exploratory URLs, and people following wrong links, I’ve turned my custom 404 page into a search engine.

For example, when someone types s-anand.net/excel, I know they’re searching for Excel. So I just do a Google Custom Search within my site for “excel” — that is, anything following the URL.

It’s a bit more complex than that, actually. I do a bit of tweaking to the URL, like converting punctuation (underscore, hyphen, full stop, etc.) to spaces, removing common suffixes (.html, .htm) and ignoring numbers. Quite often, it matches something on my site that they’re looking for. If not, ideally, I ought to try for various alternatives and subsets of the original search string to figure out a good match. But given that the number of mismatches is down to about one a day, I’m fairly comfortable right now.
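Those tweaks could look something like the sketch below. The function name and the sample path are made up, and the exact rules on the site may well differ. This just follows the steps as described.

```javascript
// Turns a requested path such as "/excel_tips-2007.html" into search terms.
function searchTermsFromPath(path) {
    return path
        .replace(/\.html?$/i, "")      // remove common suffixes (.html, .htm)
        .replace(/[\/_\-.]+/g, " ")    // slashes and punctuation become spaces
        .replace(/\d+/g, " ")          // ignore numbers
        .replace(/\s+/g, " ")          // collapse repeated spaces
        .trim();
}

console.log(searchTermsFromPath("/excel_tips-2007.html"));   // excel tips
```

The suffix is removed first. Otherwise the full stop would turn into a space and “html” would end up as a search term.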

What this means, incidentally, is that my site is, by default, a search engine for itself. To search for movie-related stuff on my site, just type s-anand.net/movie and you get a search of the word “movie” on my site. (Sort of like on a9.com, where searching for a9.com/keyword does a search on the keyword.)

Managing feed overload

I have only two problems with Google Reader.

The first is that it doesn’t support authenticated feeds. Ideally, I’d have liked to have a single reading list that combines my e-mail with newsfeeds. GMail offers RSS feeds of your e-mail. But the feeds require authentication (obviously) and Google Reader doesn’t support that right now. (So I usually don’t read e-mail 🙂)

The second is that it’s tough to manage large feeds. It’s a personal quirk, really. I like to read all entries. If there are 100, I read all 100. If there are 1000, I struggle but read all 1000. I’m too scared to “Mark all read” because there are some sources that I don’t want to miss.

The 80-20 rule is at work here. There are some prolific writers (like Scoble) who write many entries a day. There are some prolific sources (del.icio.us or digg). I can’t keep up with such writers / sources. I don’t particularly want to. If I missed one day of del.icio.us popular items, I’ll just read the next day’s.

With Google Reader, that makes me uneasy. I don’t like having 200 unread items. I don’t like to mark them all read.

In such cases, popurls’ approach is useful. It shows the top 15-30 entries of the popular sites as a snapshot. Any time you’re short of things to read, visit this. If you’re busy, don’t.

Using Google’s AJAX Feed API, it’s quite trivial to build your own feed reader. So I cloned popurls’ layout into my bookmarks page, and put in feeds that I like.

You can customise my bookmarks page to use your own feeds. Save the page, open it in Notepad, and look for existing feeds. They’ll look like this:

"hacker news" : {
    entries:15,
    url:"http://news.ycombinator.com/rss"
},

The first line (“hacker news”) is the title of the feed. You can call it what you want. Set entries to the number of feed entries you want to show. Set url to the RSS feed URL. Save it, and you have your own feed reader. (If you want to put it up on your site, you may want to change the Google API key.)

Try it! Just save this page and edit the feeds.


Here, I must point out three things about Google’s AJAX Feed API that make it extremely powerful.

The obvious one is that it allows Javascript access to RSS in a very easy way. That makes it very easy to integrate with any web page.

The second is subtler: it includes historical entries. So even if an RSS feed had only 10 entries, I could pick up the last 100 or 1,000, as long as Google has known about the feed for long enough. This is what makes Google Reader more of a platform than a simple feed reader application. Google Reader is a feed archiver, not just a feed reader.

The third (I’m a bit crazy here) is that you can use it to schedule tasks. Google’s FeedFetcher refreshes feeds every 3 hours or so. If you want to do something every three hours (or some multiple thereof, say every 24 hours), you can write a program that does what you want, and subscribe to its output as a feed.

You may notice that I have a “Referrers to s-anand.net” feed on my bookmarks page. These are the sites that someone clicked on to visit my site. I have a PHP application that searches my access log for external referrers. Rather than visit that page every time, I just made an RSS feed out of it and subscribed to it. Every three hours or so, Google accesses the URL. The application searches my access.log, and Google archives the latest results. So, even after my access.log is trimmed by the server, I have it all on Google Reader to catch up with later.

Since Google doesn’t forget to ping, I can schedule some fairly time-critical processes this way. For instance, if I wanted to download each Dilbert strip, every day as it arrives, I can create an application that downloads the file and returns a feed entry. Now, I don’t need to remember to run it every day! I just subscribe to the application on Google Reader, and Google will remind the application to run every 3 hours. (I know — I could use a crontab, but somehow, I like this.) Plus I would get the Dilbert strip on my feed reader as a bonus.


Update: Google has just released PartnerBar, which is a more flexible way of viewing a snapshot of feeds.

Scraping RSS feeds using XPath

If a site doesn’t have an RSS feed, your simplest option is to use Page2Rss, which gives a feed of what’s changed on a page.

My needs, sometimes, are a bit more specific. For example, I want to track new movies on the IMDb Top 250. They don’t offer a feed. I don’t want to track all the other junk on that page. Just the top 250.

There’s a standard called XPath. It can be used to search in an HTML document in a pretty straightforward way. Here are some examples:

//a Matches all <a> links
//p/b Matches all <b> bold items in a <p> para. (the <b> must be immediately under the <p>)
//table//a Matches all links inside a table (the links need not be immediately inside the table — anywhere inside the table works)

You get the idea. It’s like a folder structure. / matches a tag that’s immediately below. // matches a tag that’s somewhere below. You can play around with XPath using the Firefox XPath Checker add-on. Try it: it’s much easier to try it than to read the documentation.

The following XPath matches the IMDb Top 250 exactly.

//tr//tr//tr//td[3]//a

(It’s a link inside the 3rd column in a table row in a table row in a table row.)

Now, all I need is to get something that converts that to an RSS feed. I couldn’t find anything on the Web, so I wrote my own XPath server. The URL:

www.s-anand.net/xpath?
url=http://www.imdb.com/chart/top&
xpath=//tr//tr//tr//td[3]//a

When I subscribe to this URL on Google Reader, I get to know whenever there’s a new movie on the IMDb Top 250.
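The URL above is shown in readable form. Since both the page address and the XPath contain characters like /, & and [ ], it’s probably safer to escape them when building the URL in code. A sketch, assuming the server is reached over http and decodes its url and xpath parameters in the usual way (the function name is made up):

```javascript
// Builds a request URL for the XPath server, escaping both parameters.
function xpathFeedUrl(pageUrl, xpath) {
    return "http://www.s-anand.net/xpath" +
        "?url=" + encodeURIComponent(pageUrl) +
        "&xpath=" + encodeURIComponent(xpath);
}

console.log(xpathFeedUrl("http://www.imdb.com/chart/top", "//tr//tr//tr//td[3]//a"));
```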

This gives only the names of the movies, though, and I’d like the links as well. The XPath server supports this. It accepts a root XPath, and a bunch of sub-XPaths. So you can say something like:

xpath=//tr//tr//tr title->./td[3]//a link->./td[3]//a/@href

This says three things:

//tr//tr//tr Pick all rows in a row in a row
title->./td[3]//a For each row, set the title to the link text in the 3rd column
link->./td[3]//a/@href … and the link to the link href in the 3rd column

That provides a more satisfactory RSS feed, one that I’ve subscribed to, in fact. Another one that I track is Mininova’s list of top seeded movies.

You can whip up more complex examples. Give it a shot. Start simple, with something that works, and move up to what you need. Use XPath Checker liberally. Let me know if you have any issues. Enjoy!