S Anand

Calculating IRR

Recently, I was helping a bank define Basel 2 requirements.

For every dollar a bank lends, at least 8 cents should come from its own pocket, and the rest from its depositors. But a risky $1 loan may be like a $1.5 loan, whereas a $1 Government loan may be like a $0.5 loan. This is the “risk-weighted asset” (RWA) value. Basel 2 says 8% of risk-weighted assets should come from the bank’s pocket.

I was trying to convince the people who were maintaining the leasing software that the RWA of a lease is the NPV of its future cash flows, and they had a whole lot of questions.

“What is this NPV?”

You can put 90 cents in the bank today at 11% and get $1 next year. So $1 next year is worth 90 cents today. When a customer pays $100 a month over the next 10 months, that stream is worth less than $1000 today. That's the NPV: what you'd have to put in the bank today to reproduce that cash flow of $100 a month for 10 months.

“So why should we use NPV for leases?”

That’s because when a lease is cancelled, the closure payment is the NPV. If you take a lease for 10 months at $100 a month, this includes the interest. If you terminate the lease after 5 months, you won’t pay $500 for the remaining 5 months. You’ll pay less — the NPV of those $100 for 5 months. So there is some logic to using NPV as the RWA.

“OK, so how do we calculate the NPV?”

You divide each cash flow by (1 + r)^n, where r is the internal rate of return, and n is the number of periods until that cash flow arrives. Then you add them up. You'll get a number less than the sum of the cash flows.
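
For instance, here is a minimal sketch of that calculation in Python (the ten $100 payments and the 1% monthly rate are made-up illustrations, not from any actual lease):

    def npv(rate, cashflows):
        """Discount each cash flow by (1 + rate)^n, then add them up."""
        return sum(cf / (1 + rate) ** n for n, cf in enumerate(cashflows, start=1))

    # Ten monthly payments of $100, discounted at an illustrative 1% a month:
    print(npv(0.01, [100] * 10))  # about 947.13, i.e. less than the $1000 total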

“And how do we calculate this IRR?”

(sheepishly) The IRR is that interest rate for which the NPV is zero.

And we got stuck here, because their software didn’t have an IRR function, and the definitions for IRR and NPV are circular.

Doing this in Excel is simple: just enter the cash flow values and use the IRR function. For example, on a cash loan of $500 repaid at $100 a month for 6 months, the monthly IRR works out to 5.47%.

But we needed their AS/400 system to do it as well, and it didn’t have the IRR function.

After a few weeks of digging around, I found a paper that said you can calculate the IRR iteratively. Let

  • npv be the NPV given an IRR and cash flows
  • sum be the sum of cash flows
  • p be the principal amount

Then irr = irr * log(p/sum) / log(npv/sum) is the iteration you need to successively apply.

We decided to start with 1.85 times the stated interest rate (which was a pretty good guess for most leases), and kept applying this formula until the value stayed more or less the same. It worked like a charm.
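
Here is a rough sketch of that iteration in Python. The cash flows match the $500 loan example above; the starting guess, tolerance and iteration cap are my own illustrative choices, not from the original paper.

    import math

    def npv(rate, cashflows):
        """NPV of the future cash flows at a given periodic rate."""
        return sum(cf / (1 + rate) ** n for n, cf in enumerate(cashflows, start=1))

    def irr(principal, cashflows, guess, tolerance=1e-7, max_iter=1000):
        """Refine the IRR with irr = irr * log(p / sum) / log(npv / sum)."""
        total = sum(cashflows)
        rate = guess
        for _ in range(max_iter):
            new_rate = rate * math.log(principal / total) / math.log(npv(rate, cashflows) / total)
            if abs(new_rate - rate) < tolerance:
                return new_rate
            rate = new_rate
        return rate

    # The earlier example: a $500 loan repaid at $100 a month for 6 months.
    print(irr(500, [100] * 6, guess=0.10))  # about 0.0547, i.e. 5.47% a month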

Here’s the spreadsheet with the calculation.

Statistically improbable phrases

Calvin and Hobbes has some recurrent themes, like Hobbes pouncing, snow art, polls, letters to Santa, …

Over the last 5 years, I’ve transcribed the Calvin and Hobbes comics, and tagged them manually by theme. But can I generate themes automatically?

One way is to use Amazon's statistically improbable phrases. It's a list of words that occur a lot in a book, but rarely in others. It gives you a good feel for what topics the book is about.

Here’s how I did it:

  1. Transcribe Calvin & Hobbes. This is 99% of the work.
  2. Make a C&H word list. Just join all the words in Calvin and Hobbes. (Be careful about punctuation, and colloquialisms like “dunno”, “leggo”, etc.)
  3. Get an English corpus. That is, get a big list of words in normally occurring text. I have some e-books, and I picked 23 megabytes worth of these as my corpus.
  4. Compare the word frequency in C&H with the corpus. That is, compare the % of occurrences of a word in Calvin and Hobbes versus the corpus.
  5. Display those with significantly higher frequency in C&H.
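
Steps 4 and 5 boil down to a frequency comparison. Here is a rough sketch in Python; the file names, the 10x threshold and the minimum word count are my own illustrative assumptions.

    import re
    from collections import Counter

    def word_frequencies(text):
        """Lower-case word counts, plus each word's share of the total."""
        counts = Counter(re.findall(r"[a-z']+", text.lower()))
        total = sum(counts.values())
        return counts, {word: n / total for word, n in counts.items()}

    # Placeholder file names for the transcripts and the e-book corpus.
    ch_counts, ch_freq = word_frequencies(open("calvin_and_hobbes.txt").read())
    _, corpus_freq = word_frequencies(open("corpus.txt").read())

    # Keep words that appear often in C&H and at least 10 times as
    # frequently (as a share of all words) as in the corpus.
    improbable = sorted(
        word for word, freq in ch_freq.items()
        if ch_counts[word] >= 10 and freq > 10 * corpus_freq.get(word, 1e-6)
    )
    print(" ".join(improbable))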

The list below has common Calvin & Hobbes words occurring 10 times as often as in normal text. It’s incredible how closely it relates to most of the themes.

(Big words occur more often. Dark words are more improbable.)


allowance assignment babe balloon bat bath beanie bedtime bee beep bet bike blaster boring bug bus butter calvin calvinball cartoon cent cereal cheat chew chocolate click comic cookie crunch dad dame derkins dictator-for-life dinosaur disgusting doll doomed dumb duplicate earthling explorer fang fearless ferocious flip flush frog frosted fun fuzzy genius goggle goodness goon grade gross grown-up gum hack hamburger hamster hate hero hideous hobbes homework huey insect invent jelly jerk jurassic kid leaf loot martian math mild-mannered mom monster moron motto munch mushy nickel oatmeal ouija pant peanut perspective pit playground poll porridge poster quiz recess rosalyn rotten rub sandwich santa scary sculpture scum shovel
sissy sitter sled slimy slug slushball sniff snow snowball snowman soak spaceman spiff splash spoil sport squirt steer sting stuffed stupendous sugar susie tickle tiger toy transmogrifier transmogrify tub tuna twinky tyrannosaur underwear vacation weird wham whiff worm wormwood


Summary: “Statistically improbable phrases” are a powerful tool for text analysis. You can apply the technique to any content and figure out what topics it talks about.

Update: Technically, these are “Statistically improbable WORDS”, not phrases. So I re-did this analysis using phrases instead of words.

We feel fine

We Feel Fine analyses blog posts and determines the current mood by gender, age and location. So you can check if the mood of teenagers in London has improved this week. Amazing visualisations.

Facts and Fallacies in Software Engineering

Facts in Software Engineering

People

  1. The most important factor in software work is the quality of the programmers.
  2. The best programmers are up to 28 times better than the worst programmers.
  3. Adding people to a late project makes it later.
  4. The working environment has a profound impact on productivity and quality.

Tools and Techniques

  1. Hype (about tools and techniques) is the plague on the house of software.
  2. New tools/techniques cause an initial loss of productivity/quality.
  3. Software developers talk a lot about tools, but seldom use them.

Estimation

  1. One of the two most common causes of runaway projects is poor estimation.
  2. Software estimation usually occurs at the wrong time.
  3. Software estimation is usually done by the wrong people.
  4. Software estimates are rarely corrected as the project proceeds.
  5. It is not surprising that software estimates are bad. But we live and die by them anyway!
  6. There is a disconnect between software management and their programmers.
  7. The answer to a feasibility study is almost always “yes”.

Reuse

  1. Reuse-in-the-small is a well-solved problem.
  2. Reuse-in-the-large remains a mostly unsolved problem.
  3. Reuse-in-the-large works best for families of related systems.
  4. Reusable components are three times as hard to build, and should be tried out in three settings.
  5. Modification of reused code is particularly error-prone.
  6. Design pattern reuse is one solution to the problems of code reuse.

Complexity

  1. For every 25 percent increase in problem complexity, there is a 100 percent increase in solution complexity.
  2. Eighty percent of software work is intellectual. A fair amount of it is creative. Little of it is clerical.

Requirements

  1. One of the two most common causes of runaway projects is unstable requirements.
  2. Requirements errors are the most expensive to fix during production.
  3. Missing requirements are the hardest requirements errors to correct.

Design

  1. Explicit requirements “explode” as implicit (design) requirements for a solution evolve.
  2. There is seldom one best design solution to a software problem.
  3. Design is a complex, iterative process. Initial design solutions are usually wrong, and certainly not optimal.

Coding

  1. Designer “primitives” (solutions they can readily code) rarely match programmer “primitives”.
  2. COBOL is a very bad language, but all the others (for business applications) are so much worse.

Error-removal

  1. Error-removal is the most time-consuming phase of the life cycle.

Testing

  1. Software is usually tested at best at the 55-60 percent (branch) coverage level.
  2. 100 percent coverage is still far from enough.
  3. Test tools are essential, but many are rarely used.
  4. Test automation rarely is. Most testing activities cannot be automated.
  5. Programmer-created, built-in, debug code is an important supplement to testing tools.

Reviews/Inspections

  1. Rigorous inspections can remove up to 90 percent of errors before the first test case is run.
  2. But rigorous inspections should not replace testing.
  3. Post-delivery reviews (some call them “retrospectives”) are important, and seldom performed.
  4. Reviews are both technical and sociological, and both factors must be accommodated.

Maintenance

  1. Maintenance typically consumes 40-80 percent of software costs. It is probably the most important life cycle phase of software.
  2. Enhancements represent roughly 60 percent of maintenance costs.
  3. Maintenance is a solution, not a problem.
  4. Understanding the existing product is the most difficult task of maintenance.
  5. Better methods lead to MORE maintenance, not less.

Quality

  1. Quality IS: a collection of attributes.
  2. Quality is NOT: user satisfaction, meeting requirements, achieving cost/schedule, or reliability.

Reliability

  1. There are errors that most programmers tend to make.
  2. Errors tend to cluster.
  3. There is no single best approach to software error removal.
  4. Residual errors will always persist. The goal should be to minimize or eliminate severe errors.

Efficiency

  1. Efficiency stems more from good design than good coding.
  2. High-order-language code can be about 90 percent as efficient as comparable assembler code.
  3. There are tradeoffs between size and time optimization.

About Research

  1. Many researchers advocate rather than investigate.

Fallacies in Software Engineering

About Management

  1. Fallacy: You can’t manage what you can’t measure.
  2. Fallacy: You can manage quality into a software product.

People

  1. Fallacy: Programming can and should be egoless.

Tools and Techniques

  1. Fallacy: Tools and techniques: one size fits all.
  2. Fallacy: Software needs more methodologies.

Estimation

  1. Fallacy: To estimate cost and schedule, first estimate lines of code.

Testing

  1. Fallacy: Random test input is a good way to optimize testing.

Reviews

  1. Fallacy: “Given enough eyeballs, all bugs are shallow”.

Maintenance

  1. Fallacy: The way to predict future maintenance cost and to make product replacement decisions is to look at past cost data.

About Education

  1. Fallacy: You teach people how to program by showing them how to write programs.

These are from Robert Glass’ book Facts and Fallacies in Software Engineering.

Multicriteria decision making

Decisions are usually based on multiple criteria. You have to trade off between criteria. I’ve been involved in many such decisions over the last 5 years.

Example 1: A conglomerate wanted to identify industries for growth. We shortlisted 19 industries, identified 12 criteria for the attractiveness of an industry, researched each one and plotted them on spidergraphs like the ones below.

[Spidergraph for Industry 1 | Spidergraph for Industry 2]

The intention was that, to identify the most favourable industries, you’d just pick the ones with the largest filled area.

Example 2: Another time, we had to decide among BPO vendors. Again, we picked a bunch of criteria and compared vendors against these criteria.

[Spidergraph for BPO Vendor 1 | Spidergraph for BPO Vendor 2]

Example 3: Once, we had to identify stakeholders’ position on a project.

[Change readiness profile for Dave | Change readiness profile for Uli]

Stakeholders whose profile bulged to the right of the graph were for the project, and those whose profile bulged to the left were against it.


In all the above cases, the same process was used for decision making.

  1. List criteria exhaustively
  2. Evaluate options against each criterion
  3. Assign weights to criteria (equal weights implicitly assigned above)
  4. Compare options
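
As a rough sketch of steps 2 to 4, a weighted-score comparison might look like this in Python (the criteria, weights and scores are made-up numbers, not from the actual engagements):

    # Illustrative criteria and weights (the examples above implicitly used equal weights).
    weights = {"market size": 0.4, "growth": 0.3, "margins": 0.3}

    options = {
        "Industry 1": {"market size": 4, "growth": 3, "margins": 2},
        "Industry 2": {"market size": 2, "growth": 5, "margins": 4},
    }

    def weighted_score(scores, weights):
        """Multiply each criterion's score by its weight and add them up."""
        return sum(weights[criterion] * score for criterion, score in scores.items())

    # Rank the options by weighted score, best first.
    for name, scores in sorted(options.items(), key=lambda o: -weighted_score(o[1], weights)):
        print(name, round(weighted_score(scores, weights), 2))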

Having applied this methodology several times, I am convinced this process is fundamentally flawed. See how in this post: Errors in multicriteria decision making.