Improving real-time collaboration on Prowork with NodeJS
We are working to improve “real-time collaboration” on Prowork by pushing project updates, notifications and chat messages as they happen. Changes made by other project members should appear in your timeline in real time, without the need for a refresh. To accomplish this, we are adding NodeJS to the existing architecture.
Why NodeJS?
We needed a server that can push notifications (as well as messages and other data) to connected clients. Currently, clients poll for notifications and messages every few seconds. Building a push system on PHP/Apache (our core platform) is complicated, however. The best option there is Comet. Unfortunately, Comet doesn’t scale on Apache because of its threaded architecture, where every long-lived connection ties up a thread or process, and running a separate Comet-based server is complicated as well. NodeJS, on the other hand, “uses an event-driven, non-blocking I/O model that makes it lightweight and efficient, perfect for data-intensive real-time applications that run across distributed devices”. Just perfect!
Adding NodeJS to the existing architecture
Adding NodeJS to the existing PHP/Apache architecture wasn’t as difficult as anticipated. All we had to do was put NodeJS on a different port so it could run alongside Apache. Varnish then sits in front of both servers and routes traffic to the appropriate one.
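As a rough illustration of the first step, the sketch below starts a bare NodeJS HTTP server on its own port. The port number 8080 and the handleRequest function are assumptions made for this example, not Prowork's actual setup, and the Varnish routing is not shown.

```javascript
var http = require('http');

// Assumption: Apache keeps its usual port, so Node takes another free one.
var NODE_PORT = 8080;

// Hypothetical handler: replies to every request with a small JSON payload.
function handleRequest(req, res) {
  res.writeHead(200, { 'Content-Type': 'application/json' });
  res.end(JSON.stringify({ status: 'ok' }));
}

var server = http.createServer(handleRequest);

// Log startup problems (for example, the port already being in use)
// instead of letting them crash the process.
server.on('error', function (e) {
  console.log('Problem: ' + e.message);
});

server.listen(NODE_PORT, function () {
  console.log('Node listening on port ' + NODE_PORT);
});
```

With both servers running, a request to port 8080 reaches Node while Apache keeps serving on its own port.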
Setting up the daemon
Running a NodeJS script is normally done with the command
node script.js
But that is only good for development, not production: once you close the terminal, the script terminates. In production you want the script to keep running, and you also want it started again when the server restarts. For this, enter Forever, a node module that keeps your NodeJS server up.
The other things
You will run into bugs here and there in your application. My advice is to test thoroughly. Printing debug info at different points in your code with console.log really helps. One experience I had was with streams from http.get.
var http = require('http');

var getData = function(callback) {
  var options = {
    host: '..',
    path: '..'
  };

  var req = http.get(options, function(res) {
    res.on('data', function (chunk) {
      console.log('Data: ' + chunk);
      // Bug: this fires once per chunk, so it may parse a partial body
      return callback(JSON.parse(chunk));
    });
  });
  req.on('error', function(e) {
    console.log('Problem: ' + e.message);
    return callback(false);
  });
};

// Call after the definition: a var-assigned function is undefined until that line runs
getData(function(data) {
  console.log(data);
});
The console.log inside http.get printed the right data, while the callback printed something like <Buffer [some characters]>. It took me a while to figure out that the 'data' event delivers the response in chunks, so the handler can fire several times, each time with only part of the body. The appropriate way is to collect the chunks until the response ends before passing the result to the callback.
var getData = function(callback) {
  // ...
  var result = "";

  var req = http.get(options, function(res) {
    // Decode as UTF-8 so multi-byte characters split across chunks stay intact
    res.setEncoding('utf8');
    res.on('data', function (chunk) {
      result += chunk;
    });
    res.on('end', function() {
      // The full body has arrived, so it is safe to parse now
      callback(JSON.parse(result));
    });
  });
  req.on('error', function(e) {
    console.log('Problem: ' + e.message);
    return callback(false);
  });
};
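To see the difference without hitting a real server, here is a minimal sketch that fakes a response with an EventEmitter and sends a JSON body in two chunks. The collectJson helper, the fake payload and the try/catch fallback to callback(false) are assumptions for illustration, not part of the code above.

```javascript
var EventEmitter = require('events').EventEmitter;

// Hypothetical helper: gathers every 'data' chunk from a response-like
// emitter and parses the full body only once 'end' fires.
function collectJson(res, callback) {
  var result = '';
  res.on('data', function (chunk) {
    result += chunk;
  });
  res.on('end', function () {
    var parsed;
    try {
      parsed = JSON.parse(result);
    } catch (e) {
      // Assumed convention: report a bad body the same way as a request error
      return callback(false);
    }
    callback(parsed);
  });
}

// Simulate a response whose JSON body arrives in two pieces.
var fakeRes = new EventEmitter();
var received = null;

collectJson(fakeRes, function (data) {
  received = data;
});

fakeRes.emit('data', '{"project":"Pro');
fakeRes.emit('data', 'work","unread":3}');
fakeRes.emit('end');

// received is now { project: 'Prowork', unread: 3 }
console.log(received);
```

Neither chunk is valid JSON on its own, which is why parsing inside the 'data' handler fails as soon as a response spans more than one chunk.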
Back to Prowork
We are still testing the improved real-time system in-house and will push it live once we are satisfied with it. If you are building an event-driven application involving lots of I/O operations, ignore the rants and go for NodeJS.