A cool thing about Google Spreadsheets is that you can scrape websites using external data functions like importHtml. It’s really easy to use. The formula:
=importHtml("http://www.imdb.com/chart/top", "table", 1)
imports the Internet Movie Database top 250 table on to Google Spreadsheets.
Since you can publish these as RSS feeds, it ought to, in theory, be a great way of generating RSS feeds out of arbitrary content.
There’s just one problem: it doesn’t auto update.
There are claims that it does every hour. Maybe it does when the sheet is open. I don’t know. But it definitely does not when the sheet is closed. I wrote a simple script that logs the time at which the script was accessed, and prints the log every time it is accessed.
#!/usr/bin/env python import datetime, os.path print 'Content-Type: text/plain; charset=utf-8' print '' logfile = 'timenow.log' try: timelog = open(logfile).readlines() except: timelog = [] timelog.append(str(datetime.datetime.now()) + '\n') open(logfile, 'w').writelines(timelog) print ''.join(timelog) |
Then I importHtml’ed it into Google spreadsheets, and left it on for the night. Result: absolutely no hits when the document is closed.
Pity. Guess YQL is still the best option.
I’ve just started using google spreadsheets. Really impressed with how user friendly they are. I went ine expecting the worse and was very suprised.
I am running into the same thing. Can’t get importHtml() to refresh when google spreadsheet is closed. Was thinking I could hammer force a cell value to equal the IH() function via a script but that seems awfully inelegant.
so what happens when you add a parameter to the url with a value coming from a field which is set to show the current time – minute or hour ?