Coding

Why node.js

I’ve moved from Python to Javascript on the server side – specifically, Tornado to Node.js.

Three years ago, I moved from Perl to Python because I got free hosting at AppEngine. Python’s a cleaner language, but that was not enough to make me move. Free hosting was.

Initially, my apps were on AppEngine, but that wouldn’t work for corporate apps, so I tried Django. IMHO, Django’s too bulky, has too much “magic”, and templates are restrictive. Then I tried Tornado: small; independent modules; easy to learn. I used it for almost 2 years.

The unexpected bonus with Tornado was it’s event-based model: it wouldn’t wait for file or HTTP requests to be complete before serving the next request. I ended up getting a fair bit of performance from a single server.

Trouble is, Python’s a rare skill. I tried selling Python in corporates a couple of times, and barring RBS (which used it before I came in, and made it really easy for me to build an IRR calculator), I’ve failed every time. Apart from general fear, uncertainty and doubt, getting people is tougher.

Javascript’s a good choice. It has many of Python’s benefits. It’s easy to recruit people. Corporates aren’t terrified of it. Rhino was good enough a server. All it lacked was the “cool” factor, which node.js has now brought it. And besides,

  • It’s fast. About 20 times faster than Rhino, by my crude benchmarks.
  • It’s stable. (Well, at least, it feels stable. Rock solid stable. Sort of like nginx.)
  • It’s asynchronous. So I don’t miss Tornado
  • It has a pretty good set of libraries, thanks to everyone jumping on to it
  • I can write code that works on the client and server – e.g. form validation

Bye, Python.

HTML 4 & 5: The complete Reference

HTML-4-and-5-The-Complete-ReferenceHTML 4 & 5: The Complete Reference is an iPhone / iPad app that does exactly what it says: a reference for HTML 4 and 5.

It has a list of all tags, clearly demarcated as HTML4, HTML5 or both. The application is fairly easy to scroll through to find the tag or attribute you want. Clicking on a tag, you get:

  • a brief description of what it’s for
  • what attributes are valid – the good part is you can see clearly which attributes are specific to the element, and which ones are common (like class, id, etc.). You can also see the possible values for the attribute, which helps.
  • and an example of how the tag is used. The examples are quite simplistic, and there’s only one per tag, but it does have a rendered version of the code, which helps.

You can also scroll through the list of attributes and see which tags they’re valid for.

The part that quite interested me was the “characters” or HTML entities. Quite often, I’d want the pound (£) or right angle quotes (»), but wouldn’t know the character or entity reference. So far, I’ve been using this HTML entity reference to search for characters, where I can just type in the word (e.g. pound or quote) and it filters the list to show what I want. I was really hoping to see that on the app, but was disappointed. It lets you search, but it’s not search as you type. And the result points you to a section that contains the character – not directly to the character. (It’s a bit difficult to find the character in the longer sections).

There’s also a section where you can see elements by “task” – e.g. Forms, Link-related, Document Setup, Interaction, etc. This is a pretty useful break-up if you’re looking for the right element for the job, or browsing for interesting new elements to discover in HTML5. (I found the <menu> and <command> tags this way.

You do have the option of just downloading the PDF version of the HTML5 spec and reading it in iBooks, of course. So while I find the book useful, without a search-as-you-type feature, I suspect it won’t do much for my speed of looking up things, so I’ll just stick to the spec for the moment.

Disclosure: I’m writing this post as part of O’Reilly’s blogger review program. While I’m not getting paid to review the app, I did get it for free.

Yahoo Clues API

Yahoo Clues is like Google Insights for Search. It has one interesting thing that the latter doesn’t though: search flows.

It doesn’t have an official API, so I thought I’d document the unofficial one. The API endpoint is

http://clues.yahoo.com/clue

The query parameters are:

  • q1 – the first query string
  • q2 – the second query string
  • ts – the time span. 0 = today, 1 = past 7 days, 2 = past 30 days
  • tz – time zone? Not sure how it works. It’s just set to “0” for me
  • s – the format? No value other than “j” seems to work

So a search for “gmat” for the last 30 days looks like this:

http://clues.yahoo.com/clue?s=j&q1=gmat&q2=&ts=2&tz=0

The response has the all the elements required to render the page, but the search flows are located at:

  • response.data[2].qf.prevMax – an array of queries that often precede the current one
  • response.data[2].qf.nextMax – an array of queries that often follow the current one

The other parameters (such as demographic, geographic and search volume information) is pretty interesting as well, but is something you should be able to extract more reliably from Google Insights for Search.

Automated image enhancement

There are some standard enhancements that I apply to my photos consistently: auto-levels, increase saturation, increase sharpness, etc. I’d also read that Flickr sharpens uploads (at least, the resized ones) so that they look better.

So last week, I took 100 of my photos and created 4 versions of each image:

  1. The base image itself (example)
  2. A sharpened version (example). I used a sharpening factor of 200%
  3. A saturated version (example). I used a saturation factor of 125%
  4. An auto-levelled version (example)

I created a test asking people to compare these. The differences between these are not always noticeable when placed side-by-side, so the test flashed two images at the same place.

After about 800 ratings, here are the results. (Or, see the raw data.)

Sharpening clearly helps. 86% of the sharpened images were marked as better than the base images. Only 2 images (base/sharp, base/sharp) received a consistent feedback that the sharpened images were worse. (I have my doubts about those two as well.) On the whole, it seems fairly clear that sharpening helps.

Saturation and levels were roughly equal, and somewhat unclear. 69% of the saturated images and 68% of auto-levelled images were marked as better than the base images. And almost an equal number of images (52%) showed saturation as being better than the auto-levelled version. For a majority of images (60%), there’s a divided opinion on whether saturation was better than levelling or the other way around.

On the whole, sharpening is a clear win. When in doubt, sharpen images.

For saturation and levelling, there certainly appears to be potential. 2 in 3 images are improved by either of these techniques. But it isn’t entirely obvious which (or both) to apply.

Is there someone out there with some image processing experience to shed light on this?

Shortening sentences

When writing Mixamail, I wanted tweets automatically shortened to 140 characters – but in the most readable manner.

Some steps are obvious. Removing redundant spaces, for example. And URL shortening. I use bit.ly because it has an API. I’ll switch to Goo.gl, once theirs is out.

I tried a few more strategies:

  1. Replace words with short forms. “u” for “you”, “&” for and, etc.
  2. Remove articles – a, an, the
  3. Remove optional punctuation – comma, semicolon, colon and quotes, in particular
  4. Replace “one” with “1”, “to” or “too” with 2, etc. “Before” becomes “Be4”, for example
  5. Remove spaces after punctuations. So “a, b” becomes “a,b” – the space after the comma is removed
  6. Remove vowels in the middle. nglsh s lgbl wtht vwls.

How did they pan out? I tested out these on the English sentences on the Tanaka Corpus, which has about 150,000 sentences. (No, they’re not typical tweets, but hey…). By just doing these, independently, here is the percentage reduction in the size of text:

2.0% Remove optional punctuations – comma, semicolon, colon and quotes
2.2% Remove spaces after punctuations. So “a, b” becomes “a,b”
3.3% Replace words with short forms. “u” for “you”, “&” for and, etc.
3.3% Replace “one” with “1”, “to” or “too” with 2, etc.
6.7% Remove articles – a, an, the
18.2% Remove vowels in the middle

Touching punctuations doesn’t have much impact. There aren’t that many of them anyway. Word substitution helps, but not too much. I could’ve gone in for a wider base, but the key is the last one: removing vowels in the middle kills a whopping 18%! That’s tough to beat with any strategy. So I decided to just stop there.

The overall reduction, applying all of the above, is about 22%. So there’s a decent chance you can type in a 180-character tweet, and Mixamail.com will still tweet it intelligibly.

I had one such tweet a few days ago. I try and stay well within 140, but this one was just too long.

The Lesson: If you’re writing an app (or building anything), find a use for yourself. There’s no better motivation — and it won’t ever be a wasted effort.

That was 156 characters. It got shortened to:

Lesson If u’re writing app (or building anything) find use 4 yourself. There’s no better motivation — & it won’t ever be wasted ef4t.

Perfectly acceptable.

You may notice that Mixamail didn’t have to employ vowel shortening. It makes the most readable shortenings first, checks if it’s within 140, and tries the next only if required.

If anyone has a simple, readable way of shortening Tweets further, please let me know!

HTML5: Up and Running

HTML5: Up and Running is the book version of Mark Pilgrim’s comprehensive introduction to HTML5 at DiveIntoHTML5.org. Whether you buy the book or read it online, it’s the best introduction to the topic you’ll find.

Mark begins with the history of HTML5 (using email archaeology, as he calls it). You’d never guess that many of the problems we have with XHTML, MIME types, etc. have roots in discussions over 20 years ago. From then on, he moves into feature detection (which uses the Modernizr library), new tags, canvas, video, geo-location, storage, offline web apps, new form features and microdata. Each chapter can be read independently – so if you’re planning to use this as a reference, you may be better of reading the links kept up-to-date at DiveIntoHTML5.org. If you’re interesting in learning about the features, it’s a very readable book, terse, simple, and above all, delightfully intelligent.

Incidentally, if you’re starting off on a new HTML5 project, you’re probably best off using HTML5BoilerPlate.com. It’s very actively maintained, and contains some really nifty tricks you can learn like the protocol-relative URL.

Disclosure: I’m writing this post as part of O’Reilly’s blogger review program. While I’m not getting paid to review books, I sure am getting to read them for free.

Modular CSS frameworks

A fair number of the CSS frameworks I’ve seen – Blueprint, Tripoli, YUI, SenCSS – are monolithic. What I’d like is to be able to mix and match specific components of these.

For example, 960.gs has a simple grid system that I’d love to combine with the vertical rhythm that SenCSS offers. (Vertical rhythm ensures that sentences align vertically.) I’d love to have a CSS framework that just sets the fonts, for example, and touches nothing else. Or something that defines the colour schemes, and lets you change the theme like Microsoft Office does.

LessCSS

Less CSS has been invaluable in helping with this. It extends the CSS language without deviating significantly from it. Compared to SASS and CleverCSS, I’d say it has a better chance of getting incorporated as into, say, CSS4.

LessCSS offers variables. I can define a variable:

@foreground: #112233

and use it like this:

h1 { color: @foreground; }
a:hover { background-color: @foreground; }

When I change @foreground, it’s replaced everywhere.

LessCSS offers multiple inheritance.

.highlight { color: red; }
.button { border-radius: 10px; }
.action {
  .highlight;
  .button;
}

This assigns the properties of the highlight and the button classes to the action class. Any changes made to the parents automatically get inherited.

LessCSS has a Javascript pre-processor. So I can include it directly in the HTML, and add the pre-processor, which converts it into CSS.

<link rel="stylesheet/less" href="style.less">
<script src="less.js"></script>

I now use LessCSS as the basis of all new projects.

CSS libraries

My first attempt to consolidate modular CSS libraries is at bitbucket.org/sanand0/csslibs. As far as possible, I’ve tried to avoid creating new libraries, or even tweaking existing ones. Over time, I hope to completely eliminate any new code.

There are two types 2 types of libraries. Some just have variable definitions. Others actually define styles. For example, I’ve got three libraries that just define variables:

color_themes.less

Defines a standard set of color themes (based on the Office 2007 color themes)

font_stacks.less

Defines Web-safe font stacks (based on Sitepoint’s article)

backgrounds.less

Transparent background patterns (randomly useful images)

Including the above libraries will have no effect. You need to explicitly use them. For example:

@import "font_stacks.less";         // Does nothing
h1 { font-family: .font[@serif]; }  // Makes H1 a serif font

The following libraries define styles. Including them will define new classes or change the style of tags / classes.

reset.less

Resets default styles consistently across browsers. I chose YUI3 CSS Reset arbitrarily. I think HTML5boilerplate’s CSS reset may be a better choice, though.

grids.less

Defines classes for fixed and fluid grids. I choose YUI3 CSS Grids over 960.gs (which I’ve used for some years) because of its ability to offer fixed as well as fluid layouts, and the sheer brilliance of its minimality.

lineheight.less

Sets font sizes, ensuring that lines have a vertical rhythm. This is a stripped-down version of SenCSS, but over time, I’ll phase this out and use some standard framework someone comes up with.

Between these, I think the base infrastructure for most applications is in place. What’s required next are widgets. Specifically, I’d like:

  • Buttons. A really good, cross-browser, non-image-based button that offers rounded corners, gradients and borders.
  • Forms. Consistent form styling, without forcing me to use a specific form layout.
  • Icons. A standard icon library with replaceable CSS sprite-sets.

I’ll try keep the code updated as I find these. Do pass me any suggestions you may have.

Make backgrounds transparent

This is the simplest way that I’ve found to make the background colour of an image transparent.

  1. Download GIMP
  2. Open your image. I’ll pick this one:
  3. Optional: Select Image – Mode – RGB if it’s not RGB.
  4. Select Colors – Colors to Alpha…
  5. Click on the white button next to “From” and select the eye-dropper.
  6. Pick the green colour on the image, and click OK

The anti-aliasing is preserved as well.

Shopping with Cooliris

John Lewis jackets scrolling on CoolIris plugin

Zoom-in view of a jacket at John Lewis

I just put together this little demo that scrapes John Lewis’ site and creates a MediaRSS file out of it.

CoolIris has got to be the best way to shop. Apart from being really pretty, it’s quite useful when you know what something looks like, but don’t quite know how to search for it. For example, I was trying to look for a headphone-microphone (you know, the ones that connect into an iPhone or a Blackberry). I didn’t have a clue what it’s called. (TRRS, if you’re interested. I found out later.) The only way I could get it was to browse the wall…

Amazon search for ear microphones on CoolIris

For the curious, here’s the 50-line source code.

ImportHtml doesn’t auto-refresh

A cool thing about Google Spreadsheets is that you can scrape websites using external data functions like importHtml. It’s really easy to use. The formula:

=importHtml("http://www.imdb.com/chart/top", "table", 1)

imports the Internet Movie Database top 250 table on to Google Spreadsheets.

Since you can publish these as RSS feeds, it ought to, in theory, be a great way of generating RSS feeds out of arbitrary content.

There’s just one problem: it doesn’t auto update.

There are claims that it does every hour. Maybe it does when the sheet is open. I don’t know. But it definitely does not when the sheet is closed. I wrote a simple script that logs the time at which the script was accessed, and prints the log every time it is accessed.

#!/usr/bin/env python
 
import datetime, os.path
 
print 'Content-Type: text/plain; charset=utf-8'
print ''
 
logfile = 'timenow.log'
try:    timelog = open(logfile).readlines()
except: timelog = []
timelog.append(str(datetime.datetime.now()) + '\n')
open(logfile, 'w').writelines(timelog)
print ''.join(timelog)

Then I importHtml’ed it into Google spreadsheets, and left it on for the night. Result: absolutely no hits when the document is closed.

Pity. Guess YQL is still the best option.