When writing Mixamail, I wanted tweets automatically shortened to 140 characters – but in the most readable manner.
Some steps are obvious: removing redundant spaces, for example, and shortening URLs. I use bit.ly because it has an API. I’ll switch to Goo.gl once theirs is out.
I tried a few more strategies:
- Replace words with short forms. “u” for “you”, “&” for “and”, etc.
- Remove articles – a, an, the
- Remove optional punctuation – comma, semicolon, colon and quotes, in particular
- Replace “one” with “1”, “to” or “too” with “2”, etc. “Before” becomes “Be4”, for example
- Remove spaces after punctuations. So “a, b” becomes “a,b” – the space after the comma is removed
- Remove vowels in the middle. nglsh s lgbl wtht vwls.
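As a rough illustration, each of the strategies above can be written as a small text-to-text function. This is a minimal sketch, not Mixamail's actual code: the function names and the two substitution tables are made up for the example, and the real word lists are not given in this post.

```python
import re

# Hypothetical substitution tables; the post does not list the real ones.
SHORT_FORMS = {"you": "u", "and": "&"}
NUMBER_FORMS = {"one": "1", "to": "2", "too": "2", "for": "4"}

def substitute_words(text, table):
    """Replace whole words found in `table`, ignoring case."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, table)) + r")\b",
                         re.IGNORECASE)
    return pattern.sub(lambda m: table[m.group(1).lower()], text)

def remove_articles(text):
    """Drop the articles a, an, the, along with the whitespace after them."""
    return re.sub(r"\b(a|an|the)\s+", "", text, flags=re.IGNORECASE)

def remove_optional_punctuation(text):
    """Drop commas, semicolons, colons and double quotes."""
    return re.sub(r'[,;:"“”]', "", text)

def remove_space_after_punctuation(text):
    """Turn "a, b" into "a,b"."""
    return re.sub(r"([,;:.!?])\s+", r"\1", text)

def remove_middle_vowels(text):
    """Drop vowels that have a letter on both sides (first/last letters stay)."""
    return re.sub(r"(?<=[A-Za-z])[aeiouAEIOU](?=[A-Za-z])", "", text)

print(substitute_words("You and me", SHORT_FORMS))
print(remove_articles("The cat sat on a mat"))
print(remove_middle_vowels("English is legible without vowels"))
```

Note that this sketch only replaces whole words, so it would not turn “Before” into “Be4”; that would need substitution inside words as well.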
How did they pan out? I tested these on the English sentences in the Tanaka Corpus, which has about 150,000 sentences. (No, they’re not typical tweets, but hey…). Applying each strategy on its own, here is the percentage reduction in the size of the text:
| Reduction | Strategy |
|---|---|
| 2.0% | Remove optional punctuations – comma, semicolon, colon and quotes |
| 2.2% | Remove spaces after punctuations. So “a, b” becomes “a,b” |
| 3.3% | Replace words with short forms. “u” for “you”, “&” for and, etc. |
| 3.3% | Replace “one” with “1”, “to” or “too” with 2, etc. |
| 6.7% | Remove articles – a, an, the |
| 18.2% | Remove vowels in the middle |
Touching punctuations doesn’t have much impact. There aren’t that many of them anyway. Word substitution helps, but not too much. I could’ve used a longer list of substitutions, but the key is the last one: removing vowels in the middle cuts a whopping 18%! That’s tough to beat with any strategy. So I decided to just stop there.
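For reference, a percentage like the ones above can be computed along these lines. This is a sketch under assumptions: it measures total characters saved across all sentences (the post doesn't say whether the figure is a total or a per-sentence average), and `percent_reduction` and the two-sentence sample are stand-ins, not the real test harness or corpus.

```python
import re

def remove_space_after_punctuation(text):
    """One strategy from the list: "a, b" becomes "a,b"."""
    return re.sub(r"([,;:.!?])\s+", r"\1", text)

def percent_reduction(sentences, strategy):
    """Percentage of characters saved when `strategy` is applied to every sentence."""
    before = sum(len(s) for s in sentences)
    after = sum(len(strategy(s)) for s in sentences)
    return 100.0 * (before - after) / before

# Stand-in sample; the real test used the Tanaka Corpus sentences.
sample = ["Yes, I know.", "Wait, what?"]
print(round(percent_reduction(sample, remove_space_after_punctuation), 1))
```

Running each strategy through the same function keeps the numbers comparable, since every strategy is measured against the same untouched text.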
The overall reduction, applying all of the above, is about 22%. Since 180 × 0.78 ≈ 140, there’s a decent chance you can type in a 180-character tweet, and Mixamail.com will still tweet it intelligibly.
I had one such tweet a few days ago. I try and stay well within 140, but this one was just too long.
The Lesson: If you’re writing an app (or building anything), find a use for yourself. There’s no better motivation — and it won’t ever be a wasted effort.
That was 156 characters. It got shortened to:
Lesson If u’re writing app (or building anything) find use 4 yourself. There’s no better motivation — & it won’t ever be wasted ef4t.
Perfectly acceptable.
You may notice that Mixamail didn’t have to remove any vowels. It applies the most readable shortenings first, checks whether the result is within 140 characters, and tries the next one only if required.
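That “most readable first, stop as soon as it fits” logic can be sketched as a simple loop. This is an assumption about the shape of the code, not the actual Mixamail source; `shorten` and the three strategies below are illustrative names, and the ordering is just one plausible readability ranking.

```python
import re

def collapse_spaces(text):
    """Squeeze runs of whitespace into single spaces."""
    return re.sub(r"\s+", " ", text).strip()

def remove_articles(text):
    """Drop a, an, the, along with the whitespace after them."""
    return re.sub(r"\b(a|an|the)\s+", "", text, flags=re.IGNORECASE)

def remove_middle_vowels(text):
    """Drop vowels that have a letter on both sides."""
    return re.sub(r"(?<=[A-Za-z])[aeiouAEIOU](?=[A-Za-z])", "", text)

# Ordered from least to most damaging to readability (an assumed ordering).
STRATEGIES = [collapse_spaces, remove_articles, remove_middle_vowels]

def shorten(text, strategies=STRATEGIES, limit=140):
    """Apply strategies in order, stopping as soon as the text fits the limit."""
    for strategy in strategies:
        if len(text) <= limit:
            break  # already short enough; skip the harsher strategies
        text = strategy(text)
    return text

# With a 20-character limit, removing articles is enough, so vowels survive.
print(shorten("The lesson is a simple one", limit=20))
```

If every strategy has been applied and the text is still too long, this sketch simply returns it as is; a real client would need to decide whether to truncate or reject the tweet at that point.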
If anyone has a simple, readable way of shortening Tweets further, please let me know!