As a software engineer at a company that generates analytics for advertisements, I work heavily with Node.js, which we use as a collection service for the status events sent while a user is watching a video. We originally wrote that service a few years ago, when the newest stable version of Node.js was 0.10.18 and the “Express” module was on its 3.x branch. I’ve been looking to update our code to newer modules and newer binaries, but updating doesn’t always mean a performance boost. I’ve read a lot of benchmarks comparing 0.10 and 0.12, and some of them found 0.10 beating the newer binaries, which makes me cautious about upgrading blindly to Node.js 4.1.0. I have also been considering web service modules other than Express. Koa and Hapi have been getting attention, and I’ve seen benchmarks comparing all three, but I have yet to see a comparison across different versions of Node.js. I wanted to find out which branch of Node.js works best with each module. Because we run hundreds of Node.js processes behind load balancers, together handling hundreds of millions of events per day, even a minor change in performance could make a huge difference. If I’m going to make a change, I want it to be a well-informed one.
Testing every combination means 240 test runs (4 Node.js versions × 4 modules × 5 payload sizes × 3 runs of each test). Since I’d have to switch Node.js binaries a lot, a script to manage the testing was a must. I used nvm to switch the Node.js environment and ab (ApacheBench) to do the actual network testing.
First, I created a fresh m3.xlarge EC2 instance running Amazon Linux, with more than enough CPU and RAM. Next, I wrote a very simple script for each web module and made sure each one works with both the newest and the oldest version of Node.js I would be testing. Each script listens for HTTP on port 5000 and returns a response body of the requested payload size. Here is the Koa version:
var s = '1234567890';
var s1k = s.repeat(100);          // 1,000 characters
var s10k = s.repeat(1000);        // 10,000 characters
var s100k = s.repeat(10000);      // 100,000 characters
var s1000k = s.repeat(100000);    // 1,000,000 characters
var s10000k = s.repeat(1000000);  // 10,000,000 characters

var koa = require('koa');
var app = koa();
var route = require('koa-route');

// One route per payload size; each handler just sets the response body
app.use(route.get('/0k', function*() { this.body = ''; }));
app.use(route.get('/1k', function*() { this.body = s1k; }));
app.use(route.get('/10k', function*() { this.body = s10k; }));
app.use(route.get('/100k', function*() { this.body = s100k; }));
app.use(route.get('/1m', function*() { this.body = s1000k; }));
app.use(route.get('/10m', function*() { this.body = s10000k; }));

var server = app.listen(5000, function() {
  console.log('Koa is listening to http://localhost:5000');
});
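The route names map to payload lengths in characters. Because every character of '1234567890' is ASCII, that is also the byte count, and "1k" here means 1,000 bytes rather than 1,024. The sketch below spells out that mapping. `payloadSize` and `makePayload` are hypothetical helpers and are not part of the original scripts.

```javascript
// Hypothetical helpers (not from the original scripts): map a route name such
// as '10k' or '1m' to the payload length the Koa server returns for it.
var UNITS = { k: 1000, m: 1000 * 1000 }; // decimal units: repeat(100) of 10 chars = 1k

function payloadSize(route) {
  var match = /^(\d+)([km])$/.exec(route);
  if (!match) throw new Error('Unrecognized payload route: ' + route);
  return Number(match[1]) * UNITS[match[2]];
}

// Build the body the same way the server does: repeat the 10-character string.
function makePayload(route) {
  return '1234567890'.repeat(payloadSize(route) / 10);
}

console.log(payloadSize('1k'));          // 1000
console.log(payloadSize('1m'));          // 1000000
console.log(makePayload('10k').length);  // 10000
```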
Once those were all working, I wrote a bash script that loops through every combination of tests, switching the Node.js version as necessary. It runs each test three times, writing ab’s percentile table (the -e option) to a .csv file and ab’s standard output to a .log file.
#!/bin/bash
export NVM_DIR="/home/ec2-user/.nvm"
[ -s "$NVM_DIR/nvm.sh" ] && . "$NVM_DIR/nvm.sh"  # This loads nvm

ABN=10000   # total requests per ab run
ABC=125     # concurrent requests

# bench <module> <node-version> <payload>
bench () {
  nvm use $2 >/dev/null || exit
  node $1/$1.js --MARKER > /dev/null 2>&1 &
  sleep 2
  PID=$(ps aux | grep MARKER | grep -v grep | awk '{print $2}')
  if [ "$PID" = "" ]; then
    echo "Testing '$3' on $1 with Node $2, failed. Could not start process."
    return 1
  fi
  name="node-${1}_${2}_${3}_${ABN}_${ABC}"
  rm $name* 2>/dev/null
  echo Node $(node --version) >> $name.log
  for r in {1..3}; do
    echo "Testing $3 using $1 with Node $2, run $r"
    name2="${name}_run${r}"
    ab -n $ABN -c $ABC -e $name2.csv http://localhost:5000/$3 >> $name.log 2> /dev/null
    cat $name2.csv >> $name.csv
    rm $name2.csv 2> /dev/null
    sleep 1
  done
  kill $PID
  return 0
}

for t in 1m 100k 10k 1k 0k 10m; do
  for m in express4 hapi koa; do
    for v in 0.10.40 0.12.7 4.0.0 4.1.0; do
      bench $m $v $t
    done
  done
done
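The bench script uses cat to append all three ab -e outputs to a single .csv, so each combined file holds three header-plus-data tables back to back. The sketch below splits such a file into per-run tables. `parseAbRuns` is a hypothetical helper, not part of the original scripts. It assumes ab’s CSV layout: a `Percentage served,Time in ms` header line, then one `percent,milliseconds` row per percentage.

```javascript
// Hypothetical helper: split a combined .csv (several ab -e outputs joined
// with `cat`) into one { percent: milliseconds } object per run.
function parseAbRuns(text) {
  var runs = [];
  var current = null;
  text.split('\n').forEach(function (line) {
    var parts = line.trim().split(',');
    if (parts.length < 2) return;            // skip blank lines
    if (isNaN(parts[0])) {                   // a header line starts a new run
      current = {};
      runs.push(current);
      return;
    }
    if (current === null) {                  // data before any header: start a run anyway
      current = {};
      runs.push(current);
    }
    current[Number(parts[0])] = Number(parts[1]);
  });
  return runs;
}
```

For example, `parseAbRuns(fs.readFileSync('node-koa_4.1.0_10k_10000_125.csv', 'utf8'))` would return three objects, one per ab run, keyed by percentage. Keeping the runs separate makes it easy to average them with equal weight afterwards.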
This script took a few hours to run through all the tests, and when it finished I was left with a pile of CSV and log files. Next, I wrote another script to collect all of the CSV files, average the three runs of each test, and combine all of the results into a single CSV file.
var fs = require('fs');

var table = {};   // table[row][col]; row 0 is the header, rows 61-101 hold percentiles 60-100
var counts = {};  // counts[row][col]: how many runs were summed into table[row][col]
var files = fs.readdirSync('./');
var col = 1;

for (var i in files) {
  // Only process the per-test CSV files written by the bench script
  if (files[i].substr(-4, 4) != '.csv') continue;
  if (files[i].indexOf('node-') == -1) continue;

  // File names look like node-<module>_<nodeVersion>_<payload>_<requests>_<concurrency>.csv
  var name = files[i].replace('.csv', '').replace('node-', '');
  var nameParts = name.split('_');
  var service = nameParts[0];
  var nodeVer = nameParts[1];
  var test = nameParts[2];
  var abr = nameParts[3];
  var abc = nameParts[4];

  if (!table[0]) table[0] = {};
  if (!table[0][0]) table[0][0] = 'Percentage served';
  table[0][col] = name;

  fs.readFileSync(files[i]).toString().split('\n').forEach(function (line) {
    var lineParts = line.split(',');    // Split CSV
    if (!lineParts[1]) return;          // Discard if no second column
    if (isNaN(lineParts[0])) return;    // Discard if not a number (e.g. the header row)
    if (lineParts[0] * 1 < 60) return;  // Discard if percent under 60

    var rowNum = lineParts[0] * 1 + 1;  // Scoot data down a row to allow for header

    if (!table[rowNum]) { table[rowNum] = {}; counts[rowNum] = {}; }
    if (!table[rowNum][0]) table[rowNum][0] = lineParts[0];

    // Sum the runs here and divide by the run count when printing,
    // so all three runs carry equal weight in the average
    if (table[rowNum][col] === undefined) {
      table[rowNum][col] = 0;
      counts[rowNum][col] = 0;
    }
    table[rowNum][col] += lineParts[1] * 1;
    counts[rowNum][col]++;
  });
  col++;
}

for (var r in table) {
  var csvRow = '';
  for (var c in table[r]) {
    var val = table[r][c];
    // Turn per-run sums into means (skip the header row and the label column)
    if (r != 0 && c != 0) val = val / counts[r][c];
    if (!isNaN(val)) {
      val = Math.round(val * 100) / 100;
    } else {
      val = '"' + val + '"';
    }
    csvRow += val + ',';
  }
  console.log(csvRow);
}
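Averaging three runs needs equal weights. A running update of the form `(previous + next) / 2` looks like an average, but over three runs it gives the last run half of the total weight. The hypothetical helpers `pairwiseAverage` and `mean` show the difference on made-up numbers:

```javascript
// Folding runs in one at a time with (previous + next) / 2 weights
// three runs 1/4, 1/4, 1/2 instead of 1/3 each.
function pairwiseAverage(values) {
  return values.reduce(function (acc, v) { return (acc + v) / 2; });
}

// A true mean gives every run the same weight.
function mean(values) {
  var sum = values.reduce(function (acc, v) { return acc + v; }, 0);
  return sum / values.length;
}

console.log(pairwiseAverage([10, 10, 40])); // 25
console.log(mean([10, 10, 40]));            // 20
```

Accumulating a sum and a count for each cell and dividing once at output time avoids the skew. It also works unchanged if the number of runs per test changes.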
With all that complete, I could finally look at the results. Comparing Node.js 4.0.0 vs. 4.1.0, Express 4 vs. Express 5-alpha, and 0k vs. 1k vs. 10k payloads, the results were usually nearly identical, so I am excluding 4.0.0, 0k, 1k, and Express 5-alpha.
Percentage of the requests served within a certain time (ms):
Payload: 10k
Best performing: Koa on 4.1.0, with Express (all Node.js versions) close behind.
Worst performing: Hapi on all Node.js binaries.

Payload: 100k
Best performing: Koa on 4.1.0.
Worst performing: Hapi on 0.12.7.

Payload: 1000k / 1m
Best performing: Hapi on 0.10.40, then Koa on 4.1.0.
Worst performing: Express on all Node.js binaries.

Payload: 10000k / 10m
Best performing: Hapi on all Node.js binaries.
Worst performing: Express on all Node.js binaries.
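Each percentile row in these results reads as "N percent of requests were served within T ms." The sketch below shows a minimal nearest-rank version of that calculation over raw latency samples. `percentile` is a hypothetical helper, and ab may round the rank slightly differently, so this illustrates the metric rather than reimplementing ab.

```javascript
// Nearest-rank sketch: the smallest latency such that at least `pct` percent
// of the requests completed within it.
function percentile(latenciesMs, pct) {
  if (latenciesMs.length === 0) throw new Error('no samples');
  var sorted = latenciesMs.slice().sort(function (a, b) { return a - b; });
  var rank = Math.ceil(pct * sorted.length / 100);  // 1-based rank
  return sorted[Math.max(rank, 1) - 1];
}

var samples = [12, 3, 7, 5, 9, 4, 15, 6, 8, 10];  // illustrative latencies in ms
console.log(percentile(samples, 50));   // 7
console.log(percentile(samples, 90));   // 12
console.log(percentile(samples, 100));  // 15
```

The 100% row is simply the slowest request. This is why the high percentiles are the most sensitive to occasional stalls in a run.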
I have to say that the results are somewhat puzzling. Koa and Express clearly performed very well with 10k and 100k payloads, but at 1m and 10m they did poorly. The reverse was true for Hapi, which I had initially expected to remain a poor performer. All in all, since we only return responses of 1k to 50k, I think we will stick with Express, though Koa remains interesting. Upgrading Node.js from 0.10 to 4.1 doesn’t look like it will cause any performance hit, though moving to 0.12 would in most cases.
Upgrading Node, sticking with Express!
For the curious, and for archival purposes, here is the spreadsheet with my results: node_benchmark.xlsx