S Anand

Moderating marks

Sometimes, school marks are moderated. That is, the actual marks are adjusted to better reflect students’ performances. For example, if an exam is very easy compared to another, you may want to scale down the marks on the easy exam to make it comparable.

I was testing out the impact of moderation. In this video, I’ll try and walk through the impact, visually, of using a simple scaling formula.

BTW, this set of videos is intended for a very specific audience. You are not expected to understand this.

Rough transcript

First, let me show you how to generate marks randomly. Let’s say we want marks with a mean of 50 and a standard deviation of 20. That means that about two-thirds of the marks will fall within 50 plus or minus 20, that is, between 30 and 70. I use the NORMINV formula in Excel to generate the numbers. The formula =NORMINV(RAND(), Mean, SD) will generate a random mark that fits this distribution. Let’s say we create 225 students’ marks in this way.
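If you'd rather try this outside Excel, here is a rough Python equivalent of the NORMINV(RAND(), Mean, SD) trick. It's only a sketch: the function name random_marks and the seed are my own choices, and like the Excel formula it doesn't clip marks to the 0 to 100 range.

```python
import random
from statistics import NormalDist, mean, stdev

def random_marks(count, mu, sigma, seed=None):
    """Draw `count` marks by feeding uniform random numbers to the inverse normal CDF."""
    rng = random.Random(seed)      # the seed is only there to make runs repeatable
    dist = NormalDist(mu, sigma)
    marks = []
    while len(marks) < count:
        p = rng.random()           # plays the role of RAND()
        if p > 0.0:                # inv_cdf is undefined at exactly 0
            marks.append(dist.inv_cdf(p))  # plays the role of NORMINV
    return marks

marks = random_marks(225, 50, 20, seed=1)
print(len(marks), round(mean(marks), 1), round(stdev(marks), 1))
```

The printed mean and standard deviation should land near 50 and 20, but not exactly on them, since this is a random sample.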

Now, I’ll plot it as a scatterplot. We want the X-axis to range from 0 to 225. We want the Y-axis to range from 0 to 100. We can remove the title, axes and the gridlines. Now, we can shrink the graph and position it in a single column. It’s a good idea to change the marker style to something smaller as well. Now, that’s a quick visual representation of students’ marks in one exam.

Let’s say our exam has a mean of 70 and a standard deviation of 10. The students have done fairly well here. If I want to compare the scores in this exam with another exam with a mean of 50 and standard deviation of 20, it’s possible to scale that in a very simple way.

We subtract the old mean from the marks. We divide by the old standard deviation. Then we multiply by the new standard deviation. And add the new mean.
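Those four steps fit in a couple of lines of Python. This is a minimal sketch of the same scaling, using the means and standard deviations from this example; the function name moderate is mine.

```python
def moderate(mark, old_mean, old_sd, new_mean, new_sd):
    """Rescale a mark from one exam's distribution to another's."""
    z = (mark - old_mean) / old_sd   # subtract the old mean, divide by the old SD
    return z * new_sd + new_mean     # multiply by the new SD, add the new mean

# Internal exam: mean 70, SD 10. Target: mean 50, SD 20.
for mark in (90, 70, 50):
    print(mark, "->", moderate(mark, 70, 10, 50, 20))
# 90 -> 90.0
# 70 -> 50.0
# 50 -> 10.0
```

With these numbers, a mark two standard deviations above the internal mean stays at 90, an average mark of 70 drops to 50, and a mark of 50 drops to 10.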

Let me plot this. I’ll copy the original plot, position it, and change the data.

Now, you can see that the mean has gone down a bit — it’s down from 70 to 50, and the spread has gone up as well — from 10 to 20.

Let’s try and understand what this means.

If the first column has the marks in a school internal exam, and the second in a public exam, we can scale the internal scores to be in line with the public exam scores for them to be comparable.

The internal exam has a higher average, which means that it was easier, and a lower spread, which means that most of the students answered similarly. When scaling it to the public exam, students who performed well in the internal exam would continue to perform well after scaling. But students with an average performance would have their scores pulled down.

This is because the internal exam is an easy one, and in order to make it comparable, we’re stretching their marks to the same range. As a result, the good performers would continue getting a top score. But poor performers who’ve gotten a better score than they would have in a public exam lose out.

Server speed benchmarks

Yesterday, I wrote about node.js being fast. Here are some numbers. I ran Apache Benchmark on the simplest Hello World program possible, testing 10,000 requests with 100 concurrent connections (ab -n 10000 -c 100). These are on my Dell E5400, with lots of applications running, so take them with a pinch of salt.

  • PHP5 on Apache 2.2.6 (<?php echo "Hello world" ?>): 1,550/sec. Base case. But this isn’t too bad.
  • Tornado/Python (see Tornadoweb example): 1,900/sec. Over 20% faster.
  • Static HTML on Apache 2.2.6 (Hello world): 2,250/sec. Another 20% faster.
  • Static HTML on nginx 0.9.0 (Hello world): 2,400/sec. 6% faster.
  • node.js 0.4.1 (see nodejs.org example): 2,500/sec. Faster than a static file on nginx!
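
The percentages above are rounded by eye. If you want the step-by-step gains worked out, here is a small Python sketch that uses only the requests/sec figures listed; the function name speedups is mine.

```python
# Requests per second, copied from the list above
results = [
    ("PHP5 on Apache 2.2.6", 1550),
    ("Tornado/Python", 1900),
    ("Static HTML on Apache 2.2.6", 2250),
    ("Static HTML on nginx 0.9.0", 2400),
    ("node.js 0.4.1", 2500),
]

def speedups(results):
    """Percentage gain of each setup over the one listed just before it."""
    gains = []
    for (_, prev), (name, rate) in zip(results, results[1:]):
        gains.append((name, round((rate / prev - 1) * 100, 1)))
    return gains

for name, gain in speedups(results):
    print(f"{name}: {gain}% faster than the previous row")
```

That works out to roughly 22.6%, 18.4%, 6.7% and 4.2% for the four steps.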

I was definitely NOT expecting this result… but it looks like serving a static file with node.js could be faster than nginx. This might explain why Markup.io is exposing node.js directly, without an nginx or varnish proxy.

Why node.js

I’ve moved from Python to Javascript on the server side – specifically, Tornado to Node.js.

Three years ago, I moved from Perl to Python because I got free hosting at AppEngine. Python’s a cleaner language, but that was not enough to make me move. Free hosting was.

Initially, my apps were on AppEngine, but that wouldn’t work for corporate apps, so I tried Django. IMHO, Django’s too bulky, has too much “magic”, and templates are restrictive. Then I tried Tornado: small; independent modules; easy to learn. I used it for almost 2 years.

The unexpected bonus with Tornado was its event-based model: it wouldn’t wait for file or HTTP requests to complete before serving the next request. I ended up getting a fair bit of performance from a single server.
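
To make the event-based idea concrete, here is a tiny sketch. It uses Python's standard asyncio rather than Tornado's own API, and asyncio.sleep stands in for a slow file or HTTP call, so treat it as an illustration of the model, not of how Tornado is actually written.

```python
import asyncio

async def handle(name, delay, log):
    """Pretend request handler: the sleep stands in for slow file or HTTP I/O."""
    log.append(f"start {name}")
    await asyncio.sleep(delay)   # yields control, so other requests can run
    log.append(f"done {name}")

async def serve():
    log = []
    # Both "requests" are in flight at once on a single thread
    await asyncio.gather(handle("slow", 0.2, log), handle("fast", 0.01, log))
    return log

events = asyncio.run(serve())
print(events)
# ['start slow', 'start fast', 'done fast', 'done slow']
```

The fast request finishes while the slow one is still waiting, which is the whole point: the server isn't blocked by the slow request.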

Trouble is, Python’s a rare skill. I tried selling Python in corporates a couple of times, and barring RBS (which used it before I came in, and made it really easy for me to build an IRR calculator), I’ve failed every time. Apart from general fear, uncertainty and doubt, getting people is tougher.

Javascript’s a good choice. It has many of Python’s benefits. It’s easy to recruit people. Corporates aren’t terrified of it. Rhino was a good enough server. All it lacked was the “cool” factor, which node.js has now brought it. And besides,

  • It’s fast. About 20 times faster than Rhino, by my crude benchmarks.
  • It’s stable. (Well, at least, it feels stable. Rock solid stable. Sort of like nginx.)
  • It’s asynchronous. So I don’t miss Tornado
  • It has a pretty good set of libraries, thanks to everyone jumping on to it
  • I can write code that works on the client and server – e.g. form validation

Bye, Python.

Mapping PIN codes

I haven’t found an open or reliable database providing the geo-location of Indian PIN codes. That’s a bother if you’re creating geographic mash-ups. The closest were commercial sources:

  • a PIN code directory from the Postal Training Centre for Rs. 2,000, which probably just contains a list of PIN codes, and
  • a PIN code map from MapMyIndia for Rs. 1,00,000, whose quality I’m not sure of. (I spoke to one of their sales representatives who mentioned that the data was gathered via companies such as Coca Cola, using their local distribution knowledge, perhaps GPSs.)

Crowd-sourcing this might help. Here’s a site where you can map the location of any PIN code you know:

pincode.datameet.org

For example, if you knew the exact location of the PIN code 110083 (which happens to be Mongolpuri in New Delhi), just go to http://pincode.datameet.org/IN/110083 and move the marker to where it should be.

I’ve initially populated the data from GeoNames. Arun has offered OpenStreetMap data. If you know of any sources that we could use, please let me know. And if you want to use the data, feel free. It’s CC licensed. You can check out the source on github too.

Software update

Time for the annual update on software I use. This time, I’ve got Wakoopa to help me with the relative usage as well. Here are the top 100 software / web apps I’ve used recently, and how long I spent on them.

  1. Gmail 186361 seconds
  2. Notepad++ 130641 seconds
  3. Google Chrome 79879 seconds
  4. GitHub 43780 seconds
  5. Windows Command Prompt 40967 seconds
  6. Microsoft Excel 32578 seconds
  7. Microsoft Word 27067 seconds
  8. Microsoft PowerPoint 27059 seconds
  9. Windows Explorer 20902 seconds
  10. Google Docs 17989 seconds
  11. Foxit Reader 17001 seconds
  12. Microsoft Outlook 15855 seconds
  13. Internet Explorer 15830 seconds
  14. Google Search 15616 seconds
  15. Skype 14423 seconds
  16. Media Player Classic 14159 seconds
  17. Google Groups 7061 seconds
  18. Google Calendar 5531 seconds
  19. Wesabe 2814 seconds
  20. Google Analytics 2665 seconds
  21. TeamViewer 1985 seconds
  22. RGui 1875 seconds
  23. LinkedIn 1528 seconds
  24. YouTube 1400 seconds
  25. Stack Overflow 1167 seconds
  26. Acrobat Connect 964 seconds
  27. Kongregate 914 seconds
  28. HTML Help 871 seconds
  29. PicPick 790 seconds
  30. Zoundry Raven 684 seconds
  31. Mockingbird 657 seconds
  32. Twitter 655 seconds
  33. iStockphoto 590 seconds
  34. 7-Zip 584 seconds
  35. Buzznet 552 seconds
  36. Inkscape 516 seconds
  37. Bitbucket 499 seconds
  38. Microsoft Visio 496 seconds
  39. Paint.NET 474 seconds
  40. IrfanView 461 seconds
  41. Tableau Public 436 seconds
  42. µTorrent 435 seconds
  43. HandBrake 422 seconds
  44. Check Point Endpoint Security 411 seconds
  45. Windows Task Manager 385 seconds
  46. Microsoft Project 372 seconds
  47. IETester 347 seconds
  48. Google Maps 340 seconds
  49. eBay 310 seconds
  50. Spokn 270 seconds
  51. Firefox 269 seconds
  52. Google Calendar Sync 259 seconds
  53. Windows Calculator 247 seconds
  54. PayPal 246 seconds
  55. JsonView 220 seconds
  56. Windows Live Writer 184 seconds
  57. Junction Link Magic 152 seconds
  58. WinDirStat 142 seconds
  59. Kindle 139 seconds
  60. XAMPP 127 seconds
  61. Wakoopa 105 seconds
  62. Dropbox 100 seconds
  63. Office Help Viewer 99 seconds
  64. PrimoPDF 94 seconds
  65. PuTTY 84 seconds
  66. Python 80 seconds
  67. Flavors.me 75 seconds
  68. Google Sites 71 seconds
  69. Process Explorer 70 seconds
  70. Windows Volume Control 63 seconds
  71. Wikipedia 58 seconds
  72. Nitro PDF Reader 57 seconds
  73. Management Console 47 seconds
  74. PythonWin 45 seconds
  75. Windows Based Script Host 45 seconds
  76. WinDiff 45 seconds
  77. VLC Media Player 39 seconds
  78. ClipX 35 seconds
  79. Windows Installer 35 seconds
  80. The Internet Movie Database 32 seconds
  81. ImageShack 31 seconds
  82. WordPad 25 seconds
  83. TeraCopy 22 seconds
  84. Skype Portable 22 seconds
  85. Picasa Web Albums 20 seconds
  86. Syncplicity 17 seconds
  87. Google Reader 16 seconds
  88. Google Talk 15 seconds
  89. VirtualDub 12 seconds
  90. Adobe Manager 10 seconds
  91. FreeCall 10 seconds
  92. Notepad 8 seconds
  93. Codebase 5 seconds
  94. eTrust ITM 5 seconds
  95. Google Checkout 5 seconds
  96. GDI++ Tray Notifier 5 seconds
  97. ImgBurn 2 seconds
  98. Virtual Desktop Manager 2 seconds
  99. Tesseract201 2 seconds
  100. TortoiseHg 0 seconds
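
Raw seconds are hard to read at a glance. Here's a small Python sketch that converts the top three entries above into hours, minutes and seconds; the helper name as_hms is mine.

```python
def as_hms(seconds):
    """Format a duration in seconds as H:MM:SS."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"

# Top three entries from the list above
usage = {"Gmail": 186361, "Notepad++": 130641, "Google Chrome": 79879}
for app, seconds in usage.items():
    print(app, as_hms(seconds))
# Gmail 51:46:01
# Notepad++ 36:17:21
# Google Chrome 22:11:19
```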

HTML 4 & 5: The Complete Reference

HTML 4 & 5: The Complete Reference is an iPhone / iPad app that does exactly what it says: a reference for HTML 4 and 5.

It has a list of all tags, clearly demarcated as HTML4, HTML5 or both. The application is fairly easy to scroll through to find the tag or attribute you want. Clicking on a tag, you get:

  • a brief description of what it’s for
  • what attributes are valid – the good part is you can see clearly which attributes are specific to the element, and which ones are common (like class, id, etc.). You can also see the possible values for the attribute, which helps.
  • and an example of how the tag is used. The examples are quite simplistic, and there’s only one per tag, but it does have a rendered version of the code, which helps.

You can also scroll through the list of attributes and see which tags they’re valid for.

The part that quite interested me was the “characters” or HTML entities. Quite often, I’d want the pound (£) or right angle quotes (»), but wouldn’t know the character or entity reference. So far, I’ve been using this HTML entity reference to search for characters, where I can just type in the word (e.g. pound or quote) and it filters the list to show what I want. I was really hoping to see that on the app, but was disappointed. It lets you search, but it’s not search as you type. And the result points you to a section that contains the character – not directly to the character. (It’s a bit difficult to find the character in the longer sections).

There’s also a section where you can see elements by “task” – e.g. Forms, Link-related, Document Setup, Interaction, etc. This is a pretty useful break-up if you’re looking for the right element for the job, or browsing for interesting new elements to discover in HTML5. (I found the <menu> and <command> tags this way.)

You do have the option of just downloading the PDF version of the HTML5 spec and reading it in iBooks, of course. So while I find the book useful, without a search-as-you-type feature, I suspect it won’t do much for my speed of looking up things, so I’ll just stick to the spec for the moment.

Disclosure: I’m writing this post as part of O’Reilly’s blogger review program. While I’m not getting paid to review the app, I did get it for free.

Visualising student performance 2

This earlier visualisation was revised based on feedback from teachers. It’s split into two parts: one focused on performance by subject, and another on performance of each student.

Students’ performance by subject

Visualisation by subject

This is fairly simple. Under each subject, we have a list of students, sorted by marks and grouped by grade. The primary use of this is to identify top performers and bottom performers at a glance. It also gives an indication of the grade distribution.

For example, here’s mathematics.

Student scores in a subject

Grades are colour-coded intuitively, like rainbow colours. Violet is high, Red is low.

Colour coding of grades 

The little graphs on the left show the performance in individual exams, and can be used to identify trends. For example, from the graph to the left of Karen’s score:

A single student's score

… you can see that she’d have been an A1 student (the first two bars are coloured A1) but for the dip in the last exam (which is coloured A2).

Finally, there’s a histogram showing the grades within the subject.

Histogram of grades

Incidentally, while the names are fictitious, the data is not. This graph shows a bimodal distribution and may indicate cheating.

Students’ performance

Visualisation by student 

This is useful when you want to take a closer look at a single student. On the left are the total scores across subjects.

Visualisation of total scores

Because of the colour coding, it’s easy to get a visual sense of performance across subjects. For example, in the first row, Kristina is having some trouble with Mathematics. And on the last row, Elsie is doing quite well.

To give a better sense of the performance, the next visualisation plots the relative performance of each student.

Visualisation of relative performance

From this, it’s easy to see that Kristina is in the bottom quarter of the class in English and Science, and isn’t doing too well in Mathematics either. Gretchen and Elsie, on the other hand, are consistently doing well. Patrick may need some help with Mathematics as well. (Incidentally, the colours have no meaning. They just make the overlaps less confusing.)

Next to that is the break-up of each subject’s score.

Visualisation of score break-up

The first number in each subject is the total score. The colour indicates the grade. The graph next to it, as before, is the trend in marks across exams. The same scores are shown alongside as numbers inside circles. The colour of the circle is the grade for that exam.

In some ways, this visualisation is less information-dense than the earlier visualisation. But this is intentional. Redundancy can help with speed of interpretation, and a reduced information density is also less intimidating to first-time readers.

Google search via e-mail

I’ve updated Mixamail to access Google search results via e-mail.

For those new here, Mixamail is an e-mail client for Twitter. It lets you read and update Twitter just using your e-mail (you’ll have to register once via Twitter, though).

Now, you can send an e-mail to twitter@mixamail.com with a subject of “Google” and a body containing your query. You’ll get a reply within a few seconds (~20 seconds on my BlackBerry) with the top 8 search results along with the snippets.
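
If you want to script this, here is a sketch that builds such a message with Python's standard email library. The recipient and the "Google" subject come from the description above; the sender address and the function name google_query_mail are placeholders of mine. It only composes the message. Actually sending it needs an SMTP server, which I've left out.

```python
from email.message import EmailMessage

def google_query_mail(query, sender):
    """Compose (but do not send) a Mixamail Google-search e-mail."""
    msg = EmailMessage()
    msg["From"] = sender                    # placeholder: your own address
    msg["To"] = "twitter@mixamail.com"
    msg["Subject"] = "Google"               # the subject selects the search feature
    msg.set_content(query)                  # the body carries the query
    return msg

mail = google_query_mail("Manmadan Ambu show timings Ilford", "me@example.com")
print(mail)
```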

It’s the snippets that contain the useful information, as far as I’m concerned. Just yesterday, I managed to find the show timings for Manmadan Ambu at the Ilford Cine World via a search on Mixamail. (Mixamail win, but the movie was a let down, given expectations.)

You don’t need to be registered to use this. So if you’re ever stuck with just e-mail access, just mail twitter@mixamail.com with a subject “Google”.

PS: The code is on Github.

Visualising student performance

I’ve been helping with visualising student scores for ReportBee, and here’s what we’ve currently come up with.

class-scores

Each row is a student’s performance across subjects. Let’s walk through each element here.

The first column shows their relative performance across different subjects. Each dot is their rank in a subject. The dots are colour coded based on the subject (and you can see the colours on the image at the top: English is black, Mathematics is dark blue, etc.)

class-scores-2

The grey boxes in the middle mark the 2nd and 3rd quartiles. A dot to the left of them means that the student is in the bottom quartile; student 30, for example, is in the bottom quartile in almost every subject. Dots to the right indicate the top quartile.
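
Here is one way the left / grey box / right placement could be worked out. This is my guess at the logic, not ReportBee's code: the marks are made up, the function name quartile_band is mine, and I'm using the default quartile method in Python's statistics module, which may differ from what the real chart uses.

```python
from statistics import quantiles

def quartile_band(mark, class_marks):
    """Say whether a mark falls in the bottom quartile, the middle two, or the top."""
    q1, _median, q3 = quantiles(class_marks, n=4)  # the three quartile cut points
    if mark < q1:
        return "bottom"    # dot to the left of the grey box
    if mark > q3:
        return "top"       # dot to the right of the grey box
    return "middle"        # dot inside the grey box

# Hypothetical marks for one subject
class_marks = [35, 42, 48, 55, 61, 67, 72, 80, 91]
for mark in (35, 61, 91):
    print(mark, quartile_band(mark, class_marks))
```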

This view lets teachers quickly explain how a student is performing – either to the headmistress, or parents, or the student. There is a big difference between a consistently good performer, a consistently poor performer, and one that is very good in some subjects, very poor in others. This view lets the teachers identify which type the student falls under.

For example, student 29 is doing very well in a few subjects, OK in some, but is very bad at computer science. This is clearly an intelligent student, so perhaps a different teaching method might help with computer science. Student 30 is doing badly in almost every subject. So the problem is not subject-specific – it is more general (perhaps motivation, home atmosphere, ability, etc.) Student 31 is consistently in the middle, but above average.

class-scores-3

The bars in the middle show a more detailed view, using the students’ marks. The zoomed view above shows the English, Mathematics and Social Science marks for the same 3 students (29, 30, 31). The grey boxes have the same meaning. Anyone to the right of those is in the top quarter. Anyone to the left is in the bottom quarter.

Some of the bars have a red or a green circle at the end:

class-scores-5

The green circle indicates that the student has a top score in the subject. The red circle indicates that the student has a bottom score in the subject. This lets teachers quickly narrow down to the best and worst performers in each subject.

The bars on top of the subjects show the histogram of students’ performances. It is a useful view to get a sense of the spread of marks.

class-scores-4

For example, English is significantly more biased towards the top half than Mathematics or Science. Mathematics has many “trailing” students at the bottom, while English has fewer, and Social Science has many more.

Most of these elements are intuitive, really. Once explained (and often, even when not explained), they are easy to remember and apply.

So far, this visualisation answers descriptive questions, like:

  • Where does this student stand with respect to the class?
  • Is this student a consistent performer, or does his performance vary a lot?
  • Does this subject have a consistent performance, or does it vary a lot?

We’re now working on drawing insights from this data. For example:

  • Is there a difference between the performance across sections?
  • Do students who perform well in science also do well in mathematics?
  • Can we group students into “types” or clusters based on their performances?

Will share those shortly.

What does India search for?

Over the last couple of years, I’ve been tracking the top 5 hot searches in India on Google Trends (http://www.google.co.in/trends). Here are the results:

If you’re interested in making visualisations out of it, please feel free. But there’s one particular thing I’m trying out, which is to categorise these searches and see if there’s a trend around that. I’ve added a “Tag” column.

Could you please help me tag the spreadsheet: https://spreadsheets.google.com/ccc?key=0Av599tR_jVYgdE5zTU5QWjcxVWVCaTBuY3d0NkUtc1E&hl=en_GB

It’s publicly editable, no special access required. If you could stick to the tags I already have (Business, Education, Entertainment, News, Politics, Sports, Technology), that would be great. If not, that’s fine as well.

And if you’ve made any visualisations or done any analysis using this data, please do drop a comment.