Year: 2006

Making a Tamil transliterator

I’ve built a simple Tamil transliterator. Type words in English and it spells them out in Tamil. You can copy-paste the Tamil output into Microsoft Word, etc.

You may need to turn on Tamil scripts to see the Tamil text. If you have Windows 98, it may not work well. If you’ve visited this page recently, you will also need to refresh it (press F5).

Browse through my Javascript to see how it works. Feel free to reuse.

I’ve also made a Google Gadget that searches Google in Tamil using this tool.

Here’s what to type:

Tamil English
a
A or aa
i
I or ee
u
U or oo
e
E
ai
o
O
au
k or g
n
ch or s
j
n
t or d
N
th or dh
n
p or b
m
y
r
l
v
zh
L
R
sh
S
h
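
For the curious, here is a rough Python sketch of how a transliterator built on a key table like this could work. It is not the Javascript behind this page. The Tamil letters in the tables are standard Unicode, but pairing them with these keys is my assumption, as are the names `VOWELS`, `CONSONANTS`, `match` and `transliterate`. The sketch covers only part of the table and maps every "n" to a single letter, which a real tool would need to disambiguate.

```python
# Assumed key -> (independent vowel, vowel sign attached to a consonant).
VOWELS = {
    "a": ("அ", ""),            # "a" is inherent in a consonant: no sign
    "aa": ("ஆ", "\u0bbe"), "A": ("ஆ", "\u0bbe"),
    "i": ("இ", "\u0bbf"),
    "ee": ("ஈ", "\u0bc0"), "I": ("ஈ", "\u0bc0"),
    "u": ("உ", "\u0bc1"),
    "oo": ("ஊ", "\u0bc2"), "U": ("ஊ", "\u0bc2"),
    "e": ("எ", "\u0bc6"), "E": ("ஏ", "\u0bc7"),
    "ai": ("ஐ", "\u0bc8"),
    "o": ("ஒ", "\u0bca"), "O": ("ஓ", "\u0bcb"),
    "au": ("ஔ", "\u0bcc"),
}

# Assumed key -> consonant. Simplification: every "n" maps to one letter.
CONSONANTS = {
    "k": "க", "g": "க", "ch": "ச", "s": "ச", "j": "ஜ",
    "t": "ட", "d": "ட", "N": "ண", "th": "த", "dh": "த",
    "n": "ந", "p": "ப", "b": "ப", "m": "ம", "y": "ய",
    "r": "ர", "l": "ல", "v": "வ", "zh": "ழ", "L": "ள",
    "R": "ற", "sh": "ஷ", "S": "ஸ", "h": "ஹ",
}

PULLI = "\u0bcd"  # mark that removes the inherent "a" from a consonant


def match(table, text, pos):
    """Return the longest key of `table` starting at text[pos], or None."""
    for length in (2, 1):
        key = text[pos:pos + length]
        if len(key) == length and key in table:
            return key
    return None


def transliterate(text):
    out = []
    pos = 0
    while pos < len(text):
        cons = match(CONSONANTS, text, pos)
        if cons:
            out.append(CONSONANTS[cons])
            pos += len(cons)
            vowel = match(VOWELS, text, pos)
            if vowel:
                # Consonant followed by a vowel: attach the vowel sign.
                out.append(VOWELS[vowel][1])
                pos += len(vowel)
            else:
                # Bare consonant: add the pulli.
                out.append(PULLI)
            continue
        vowel = match(VOWELS, text, pos)
        if vowel:
            out.append(VOWELS[vowel][0])
            pos += len(vowel)
        else:
            out.append(text[pos])  # pass through anything unknown
            pos += 1
    return "".join(out)


print(transliterate("thamizh"))
```

Trying two-letter keys before one-letter keys is what lets "th", "zh" and "ai" win over "t", "z" and "a".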

I also have a gadget that lets you search in Tamil.

Tamil song lyrics quiz 1

Here are words from the middle of 10 songs. Can you guess which movie they are from? (Films are NOT repeated)

Don’t worry about the spelling. Just spell it like it sounds, and the box will turn green.

If the Tamil lyrics don’t display properly, turn on Tamil scripts. It’s worth it.

The movies also have a common theme that’s very easy to guess.

Search for the song and listen online, if you want to confirm your guess.


en vaazhkkai nadhiyil karai onRu kaNdEn

en nenjil yEdhO kaRai onRu kaNdEn


Erikkaraiyil jOdip paRavai ellaa azhagum aanandham

aadum kadalinil Odum padagugaL adhilE ulagam aarambam


idhayam enbadhu sadhaidhaan enRaal eRithazhal thinRuvidum

anbin kaRuvi idhayam enRaal saavai venRuvidum


kaadhal koNdEn kanavinai vaLarthEn

kaNmani unai naan karuththinil niraiththEn


kannathil muththaththin eeRam adhu kaaya villaiyE

kaNgaLil En indha kaNNeer adhu yaaraalE


kanRu kutti thuLLum pOdhu kaalil enna kattuppaadu

kaalam ennai vaazhththum pOdhu aasaik kenna kattuppaadu


manmadha ambugaL thaiththa idangaLil

sandhanamaay ennai poosugiren


neeyO vaanam vittu maNNil vandha thaarakai

naanO yaarum vandhu thangich chellum maaLigai


thaay madiyil piRandhOm thamizh madiyil vaLarndhOm

nadigarena vaLarndhOm naadagaththil kalandhOm


undhan udhattil niraindhirukkum pazharasam

andha vanaththil maraindhirukkum thuLi visham

Calculating IRR

Recently, I was helping a bank define Basel 2 requirements.

For every dollar a bank lends, at least 8 cents should come from its own pocket, and the rest from its depositors. But a risky $1 loan may be like a $1.5 loan, whereas a $1 Government loan may be like a $0.5 loan. This is the “risk-weighted asset” (RWA) value. Basel 2 says 8% of risk-weighted assets should come from the bank’s pocket.

I was trying to convince the people who were maintaining the leasing software that the RWA of a lease is the NPV of its future cash flows, and they had a whole lot of questions.

“What is this NPV?”

You can put 90 cents in the bank today at 11% and get $1 next year. So $1 next year is worth 90 cents today. When a customer pays $100 a month over the next 10 months, it’s worth less than $1000 today. That’s the NPV. The NPV is what you put in the bank today to get that cash flow: $100 a month for the next 10 months.

“So why should we use NPV for leases?”

That’s because when a lease is cancelled, the closure payment is the NPV. If you take a lease for 10 months at $100 a month, this includes the interest. If you terminate the lease after 5 months, you won’t pay $500 for the remaining 5 months. You’ll pay less — the NPV of those $100 for 5 months. So there is some logic to using NPV as the RWA.

“OK, so how do we calculate the NPV?”

You divide each cash flow by (1 + r)^n, where r is the internal rate of return per period, and n is the number of periods (months, for a monthly lease) until that cash flow arrives. Then you add them up. You’ll get a number less than the sum of cash flows.
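
As a small illustration, the calculation might look like this in Python. The function name `npv` and the 1% monthly rate are assumptions made up for the example, and the first payment is assumed to arrive one period from now.

```python
def npv(cash_flows, rate):
    """Present value of cash flows arriving 1, 2, 3, ... periods from now."""
    return sum(cf / (1 + rate) ** n
               for n, cf in enumerate(cash_flows, start=1))


# $100 a month for 10 months, discounted at an assumed 1% per month.
lease = [100] * 10
print(round(npv(lease, 0.01), 2))  # comes out below the $1000 sum of payments
```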

“And how do we calculate this IRR?”

(sheepishly) The IRR is that interest rate for which the NPV is zero.

And we got stuck here, because their software didn’t have an IRR function, and the definitions for IRR and NPV are circular.

Doing this in Excel is simple. Just enter the cash flow values and use the IRR function. So, if on a cash loan of $500 you paid $100 a month for 6 months, you would enter -500 followed by six 100s. Your monthly IRR is 5.47%.

But we needed their AS/400 system to do it as well, and it didn’t have the IRR function.

After a few weeks of digging around, I found a paper that said you can calculate the IRR iteratively. Let

  • npv be the NPV given an IRR and cash flows
  • sum be the sum of cash flows
  • p be the principal amount

Then irr = irr * log(p/sum) / log(npv/sum) is the iteration you need to successively apply.

We decided to start with 1.85 times the stated interest rate (which was a pretty good guess for most leases), and kept applying this formula until the result stayed more or less the same. Worked like a charm.
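
For anyone who wants to try the iteration outside a spreadsheet, here is a minimal Python sketch. It is not the AS/400 code. The names `npv` and `irr_iterative`, the tolerance and the iteration cap are assumptions. So is reading the "stated interest rate" as the flat rate (total interest divided by principal, per period).

```python
import math


def npv(cash_flows, rate):
    """Present value of cash flows arriving 1, 2, 3, ... periods from now."""
    return sum(cf / (1 + rate) ** n
               for n, cf in enumerate(cash_flows, start=1))


def irr_iterative(principal, cash_flows, guess, tol=1e-9, max_iter=100):
    """Apply irr = irr * log(p/sum) / log(npv/sum) until it settles."""
    total = sum(cash_flows)
    irr = guess  # must be non-zero, since each step multiplies it
    for _ in range(max_iter):
        new_irr = (irr * math.log(principal / total)
                   / math.log(npv(cash_flows, irr) / total))
        if abs(new_irr - irr) < tol:
            return new_irr
        irr = new_irr
    raise ValueError("IRR iteration did not converge")


# The $500 loan repaid at $100 a month for 6 months.
flows = [100] * 6
flat_rate = (sum(flows) - 500) / 500 / len(flows)  # assumed "stated" rate
monthly_irr = irr_iterative(500, flows, guess=1.85 * flat_rate)
print(f"{monthly_irr:.2%}")
```

On this example the result should agree with the 5.47% monthly IRR from Excel.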

Here’s the spreadsheet with the calculation.

Statistically improbable phrases

Calvin and Hobbes has some recurrent themes, like Hobbes pouncing, snow art, polls, letters to Santa, …

Over the last 5 years, I’ve transcribed the Calvin and Hobbes comics, and tagged them manually by theme. But can I generate themes automatically?

One way is to use Amazon’s statistically improbable phrases. It’s a list of words that occur a lot in a book, but rarely occur in others. It gives you a good feel of what topics the book is about.

Here’s how I did it:

  1. Transcribe Calvin & Hobbes. This is 99% of the work.
  2. Make a C&H word list. Just join all the words in Calvin and Hobbes. (Be careful about punctuation, and colloquialisms like “dunno”, “leggo”, etc.)
  3. Get an English corpus. That is, get a big list of words in normally occurring text. I have some e-books, and I picked 23 megabytes worth of these as my corpus.
  4. Compare the word frequency in C&H with the corpus. That is, compare the % of occurrences of a word in Calvin and Hobbes versus the corpus.
  5. Display those with significantly higher frequency in C&H.
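
Steps 2 to 5 could be sketched in Python roughly as follows. This is not the script I used. The tokenizing regex, the `min_count` cutoff and the add-one smoothing for words missing from the corpus are all assumptions, as are the function names.

```python
import re
from collections import Counter


def word_counts(text):
    """Lowercase the text and count words, keeping apostrophes and hyphens."""
    return Counter(re.findall(r"[a-z][a-z'-]*", text.lower()))


def improbable_words(book_text, corpus_text, ratio=10, min_count=2):
    """Words whose share of the book is at least `ratio` times their share
    of the corpus, most improbable first."""
    book = word_counts(book_text)
    corpus = word_counts(corpus_text)
    book_total = sum(book.values())
    corpus_total = sum(corpus.values())
    scores = {}
    for word, count in book.items():
        if count < min_count:
            continue  # ignore one-off words
        book_freq = count / book_total
        # Add-one smoothing (an assumption) so that words absent from the
        # corpus do not cause a division by zero.
        corpus_freq = (corpus[word] + 1) / (corpus_total + 1)
        if book_freq / corpus_freq >= ratio:
            scores[word] = book_freq / corpus_freq
    return sorted(scores, key=scores.get, reverse=True)


# Tiny made-up texts, just to show the call.
book_text = "the snowman and the tiger " * 5
corpus_text = "the cat and the dog sat on the mat " * 50
print(improbable_words(book_text, corpus_text))
```

Comparing shares rather than raw counts matters because the corpus is far larger than the book.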

The list below has common Calvin & Hobbes words occurring at least 10 times as often as in normal text. It’s incredible how closely it relates to most of the themes.

(Big words occur more often. Dark words are more improbable.)


allowance assignment babe balloon bat bath beanie bedtime bee beep bet bike blaster boring bug bus butter calvin calvinball cartoon cent cereal cheat chew chocolate click comic cookie crunch dad dame derkins dictator-for-life dinosaur disgusting doll doomed dumb duplicate earthling explorer fang fearless ferocious flip flush frog frosted fun fuzzy genius goggle goodness goon grade gross grown-up gum hack hamburger hamster hate hero hideous hobbes homework huey insect invent jelly jerk jurassic kid leaf loot martian math mild-mannered mom monster moron motto munch mushy nickel oatmeal ouija pant peanut perspective pit playground poll porridge poster quiz recess rosalyn rotten rub sandwich santa scary sculpture scum shovel
sissy sitter sled slimy slug slushball sniff snow snowball snowman soak spaceman spiff splash spoil sport squirt steer sting stuffed stupendous sugar susie tickle tiger toy transmogrifier transmogrify tub tuna twinky tyrannosaur underwear vacation weird wham whiff worm wormwood


Summary: “Statistically improbable phrases” are a powerful tool for text analysis. You can apply them to any content and figure out what topics it talks about.

Update: Technically, these are “Statistically improbable WORDS”, not phrases. So I re-did this analysis using phrases instead of words.