Wednesday, 19 August 2015

How to communicate between Python and NodeJs

NodeJs is amazing at a lot of things, but one area where it falls short is numerical and scientific computation. Python, on the other hand, is great for stuff like that, and libraries like numpy and scipy make scientific computing a breeze. Fortunately, we can utilize the power of numpy within our node application, by calling a python process to run in the background, do all the dirty work and give back the result.

In this tutorial, we will use the child_process module from Node's standard library to spawn a python process which computes the sum of all elements in an array, using numpy, and returns the result to our node program.

If you want to skip the whole tutorial and just get your hands dirty, copy start.js and compute_input.py into the same directory, and run the command
node start.js
on your terminal.

Let's write the javascript first:

  1. Initialize all our variables
    var spawn = require('child_process').spawn,
        py    = spawn('python', ['compute_input.py']),
        data = [1,2,3,4,5,6,7,8,9],
        dataString = '';
            
    'py' is our spawned python process, which starts the script compute_input.py (which we will write later)
  2. Define what we want to happen once we get data back from the python process:
    /*Here we are saying that every time our node application receives data from the python process output stream(on 'data'), we want to convert that received data into a string and append it to the overall dataString.*/
    py.stdout.on('data', function(data){
      dataString += data.toString();
    });
    
    /*Once the stream is done (on 'end') we want to simply log the received data to the console.*/
    py.stdout.on('end', function(){
      console.log('Sum of numbers=',dataString);
    });
    

  3. Finally, dump our data on to the python process:
    /*We have to stringify the data first, otherwise our python process won't recognize it*/
    py.stdin.write(JSON.stringify(data));
    
    py.stdin.end();
    
In the end our javascript code would look like this:
//start.js
var spawn = require('child_process').spawn,
    py    = spawn('python', ['compute_input.py']),
    data = [1,2,3,4,5,6,7,8,9],
    dataString = '';

py.stdout.on('data', function(data){
  dataString += data.toString();
});
py.stdout.on('end', function(){
  console.log('Sum of numbers=',dataString);
});
py.stdin.write(JSON.stringify(data));
py.stdin.end();

Now we have to write compute_input.py, which is more straightforward than our node application code:

## compute_input.py

import sys, json, numpy as np

#Read data from stdin
def read_in():
    lines = sys.stdin.readlines()
    #Since our input would only be having one line, parse our JSON data from that
    return json.loads(lines[0])

def main():
    #get our data as an array from read_in()
    lines = read_in()

    #create a numpy array
    np_lines = np.array(lines)

    #use numpy's sum method to find sum of all elements in the array
    lines_sum = np.sum(np_lines)

    #return the sum to the output stream
    #(print as a function call works on both Python 2 and Python 3)
    print(lines_sum)

#start process
if __name__ == '__main__':
    main()
    

And that's all there is to it. Just run node start.js on your terminal to verify that the program runs correctly. You should get 45 as the output.

Although for a simple summing operation you're better off sticking to nodeJs itself, for more complex operations (like signal processing or finding the frequency spectrum of a series of numbers) it's highly advisable to use numpy, as the same functionality is just not there in nodeJs (at least not yet). Furthermore, computationally intensive operations block the event loop and will most likely freeze your program, which can be a disaster given the single threaded architecture of node, so they should be moved to their own separate child processes.
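The spawn-and-pipe pattern from start.js can be wrapped in a small reusable function. The sketch below is only one way to do it: the helper name runJsonChild is made up for this example, and it assumes the child script prints a single JSON value (which is the case for compute_input.py, since a plain number is valid JSON). Unlike start.js, it also listens for spawn failures, stderr output and a non-zero exit code, so a missing python binary or a crashing script shows up as an error instead of an empty string.

```javascript
var spawn = require('child_process').spawn;

// Hypothetical helper: starts `command` with `args`, writes `input` to the
// child's stdin as JSON, and calls back with (err, parsedOutput).
function runJsonChild(command, args, input, callback) {
  var child = spawn(command, args);
  var out = '';
  var errOut = '';
  var finished = false;

  // Make sure the callback runs only once, even if several events fire.
  function done(err, result) {
    if (finished) { return; }
    finished = true;
    callback(err, result);
  }

  child.stdout.on('data', function (chunk) { out += chunk.toString(); });
  child.stderr.on('data', function (chunk) { errOut += chunk.toString(); });

  // 'error' fires when the process could not be started at all.
  child.on('error', done);

  // 'close' fires once the process has exited and its streams are closed.
  child.on('close', function (code) {
    if (code !== 0) {
      return done(new Error('child exited with code ' + code + ': ' + errOut));
    }
    try {
      done(null, JSON.parse(out));
    } catch (e) {
      done(e);
    }
  });

  // Swallow write errors on stdin (for example if the child died early);
  // the failure is reported through 'error' or 'close' instead.
  child.stdin.on('error', function () {});
  child.stdin.end(JSON.stringify(input));
}

// Usage with the script from this tutorial.
runJsonChild('python', ['compute_input.py'], [1, 2, 3, 4, 5, 6, 7, 8, 9], function (err, sum) {
  if (err) { return console.error('Python call failed:', err.message); }
  console.log('Sum of numbers=', sum);
});
```

Keeping the process handling in one place means each new python script only needs a one-line call from the node side.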

How to work with NodeJS and not lose your mind

NodeJS is great! It's fast, it's event based, and it uses the same language front-end developers know and love on the server as well. One of the greatest features of nodeJs is its non-blocking nature, which means that operations which seemed expensive before, like reading user input and database operations, are now not a problem. Unfortunately, this is also one of the most dangerous aspects of nodeJs, as it makes it really easy for developers to write horrible code. The non-blocking IO means that you now rely on callbacks to perform tasks after an operation has occurred, which can lead to quite a messy situation.

Let's take a look at a simple example to see exactly what I mean.

FYI, in all the snippets below, I use the callback structure of expressJs, since it is the most popular backend framework for nodeJs. As for the database operations, I use the Waterline ORM, which uses the general format of:

 SomeDataBase.find({/*Javascript object to find*/},/*callback after element is found in the database*/);

 SomeDataBase.create({/*Javascript object to insert/create*/},/*callback after element is inserted in the database*/);

 SomeDataBase.update({/*Javascript object to update*/},{/*what to update it with*/},/*callback after element is updated in the database*/);

Now, back to the example... I want to define a route which receives a name as one of the request parameters. I then want to search a particular database for that name, update the entry if it exists, or create a new entry if it doesn't exist.

Let's look at the naive approach first:

var someRoute = function(req, res){
  var name = req.params.name;
  MyDb1.find({name: name}, function(err, data){
      if(data.length > 0){
        MyDb1.update({name: name},{updated: 'yes'}, function(err, data){
          console.log('updated');
          res.end('User with name: ' + name + ' updated from MyDb1');
        });
      } else{
        MyDb1.create({name: name}, function(err, data){
          console.log('created');
          res.end('User with name: ' + name + ' created from MyDb1');
        });
      }
  });
}

Yikes! Not only does this code look horrible to the eye, but it's also untestable, and repeats a lot of similar functionality. One thing we could try is to take the callback function out of the create and update operations. It would then look something like this:


var someRoute = function(req, res){
  var name = req.params.name;
  var dbCallback = function(err, data){
    console.log('done');
    res.end('User with name: ' + name + ' done from MyDb1');
  };
  MyDb1.find({name: name}, function(err, data){
      if(data.length > 0){
        MyDb1.update({name: name},{updated: 'yes'}, dbCallback);
      } else{
        MyDb1.create({name: name}, dbCallback);
      }
  });
}

Ok, this sort of looks ok, but there are a couple of problems with this approach. First, we don't know whether the entry has been created or updated, and as an admin, it's important to me to know the nature of operations taking place on the database. Secondly, we can't use the same callback function for any other database, as the response says 'MyDb1'. What we essentially want in this case is a function which does mostly the same thing every time, with only a few details changing. Luckily, the first class functions of javascript have got your back!


var giveResponse = function(dbName, type, res){
  return function(err, data){
    console.log(type);
    res.end('User with name: ' + data.name + ' ' + type + ' from ' + dbName);
  };
};

var someRoute = function(req, res){
  var name = req.params.name;
  MyDb1.find({name: name}, function(err, data){
      if(data.length > 0){
        MyDb1.update({name: name},{updated: 'yes'}, giveResponse('MyDb1', 'updated', res));
      } else{
        MyDb1.create({name: name}, giveResponse('MyDb1', 'created', res));
      }
  });
}

So this looks quite a bit better than before. We now have a function 'giveResponse' which generates the callback function we want based on the arguments we give it. Take note, 'giveResponse' is not our actual callback function, it simply returns the callback function, which does something slightly different based on the parameters passed to 'giveResponse'. In this case, we're passing the name of our database, the type of operation, and our response object, which means we can modify any one of these based on our requirements.

One more advantage of this approach is that the callback function is now easily testable, because we can replace the dbName, type, and res parameters with our own mocks, and test the giveResponse function as a separate unit, something we couldn't do before.

Even though this is a major improvement over the previous code snippet, there is still a lot more we can do to improve it with future use cases in mind. Take, for example, the process of updating an entry if it exists and creating it if it doesn't. This seems like a fairly common problem, and thus it would be wise to take that functionality and put it into its own unit. This insert/update process actually has its own name, called (unsurprisingly) 'upsert'. Let's move upsert into its own block of code.


var giveResponse = function(dbName, type, res){
  return function(err, data){
    console.log(type);
    res.end('User with name: ' + data.name + ' ' + type + ' from ' + dbName);
  };
};

var upsert = function(name, db, dbName, res){
  return function(err, data){
      if(data.length > 0){
        db.update({name: name},{updated: 'yes'}, giveResponse(dbName, 'updated', res));
      } else{
        db.create({name: name}, giveResponse(dbName, 'created', res));
      }
  };
};

var someRoute = function(req, res){
  var name = req.params.name;
  MyDb1.find({name: name}, upsert(name, MyDb1, 'MyDb1', res));
}

Similar to 'giveResponse', 'upsert' is not our callback, but returns another function which is. The reason we can't just use 'giveResponse' and 'upsert' as callbacks directly is that the callbacks for most database operations use the standard function(err, result) format, so we cannot directly pass on more arguments as we like, but we can pass them on through their 'overlooking' functions. This technique of having one function return another, with some of the arguments already fixed, is closely related to function currying, and it's especially useful for situations like these.

If we wanted to make another route which did a similar upsert on another database, all we would have to add would be:


var someOtherRoute = function(req, res){
  var name = req.params.name;
  MyDb2.find({name: name}, upsert(name, MyDb2, 'MyDb2', res));
}

Hopefully, dealing with the increasing number of callbacks and async operations won't be as much of a pain now as it was originally. Of course, there is no such thing as the 'best' solution to this kind of callback hell, and there are many, many more solutions (like the async library, promises and ES6 generators) to make your life easier. The one thing to keep in mind, in my opinion, regardless of the method you use, is to follow the DRY (don't repeat yourself) principle, so that the same functionality, or functionality that is likely to be used again, is isolated in its own unit rather than tangled up with the rest of the code, and can be called easily as and when required.
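For comparison, here is a minimal sketch of the same upsert written with promises, one of the alternatives mentioned above. It does not use Waterline: FakeDb is a hypothetical in-memory store made up for this example, whose find, create and update methods return promises, so the snippet can run on its own.

```javascript
// Hypothetical in-memory store; every method returns a promise.
function FakeDb() { this.rows = []; }

FakeDb.prototype.find = function (query) {
  var found = this.rows.filter(function (row) { return row.name === query.name; });
  return Promise.resolve(found);
};

FakeDb.prototype.create = function (row) {
  this.rows.push(row);
  return Promise.resolve(row);
};

FakeDb.prototype.update = function (query, changes) {
  var matched = this.rows.filter(function (row) { return row.name === query.name; });
  matched.forEach(function (row) {
    Object.keys(changes).forEach(function (key) { row[key] = changes[key]; });
  });
  return Promise.resolve(matched);
};

// Resolves with 'updated' or 'created', so the caller knows what happened.
function upsert(db, name) {
  return db.find({ name: name }).then(function (rows) {
    if (rows.length > 0) {
      return db.update({ name: name }, { updated: 'yes' })
        .then(function () { return 'updated'; });
    }
    return db.create({ name: name })
      .then(function () { return 'created'; });
  });
}

var db = new FakeDb();
upsert(db, 'alice')
  .then(function (type) { console.log(type); return upsert(db, 'alice'); })
  .then(function (type) { console.log(type); })
  .catch(function (err) { console.error(err); });
```

The chain reads top to bottom, and a single catch at the end handles an error from any step, which the nested callback versions above never check for.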

Thursday, 6 August 2015

Understanding the modern front end web application project structure


Most people starting their journey in web development don't really pay much attention to their project structure. This is because it's not really necessary at first, and one can easily get away with putting a bunch of html, css, and javascript files in a single folder and linking them together. However, once you start developing more complex web applications requiring multiple frameworks and libraries, you will quickly find that this single folder structure will not cut it, and without proper organization, adding new features to your project becomes a nightmare.

If you explore any popular repo on GitHub, you will most likely see a bunch of folders called 'lib', 'dist', 'app', 'public', 'fonts', and also a bunch of weird files like 'bower.json' and 'package.json' which don't have any apparent relation to the project itself. "Why are all these files and folders there? Why am I seeing anything other than html, css and js files?" is what I thought to myself when I was introduced to my first professional project. The transition from a single folder to an organized structure can definitely be a bit confusing, so here is my attempt to explain, as simply as possible, what each file and folder is doing in your project and what exactly its purpose in life is.

The directory structure shown here is the standard Yeoman web project structure:
YourAppName
|
--app
| |
| --index.html
| |
| --favicon.ico
| |
| --robots.txt
| |
| --scripts
| |
| --styles
| |
| --images
|
--dist
|
--test
|
--.tmp
|
--node_modules
|
--bower_components
|
--bower.json
|
--package.json
|
--Gruntfile.js

app

This is where all your application code goes. Literally all of it. All the html, css, and javascript that you will be writing for your web application will be contained in this single folder.

favicon.ico

A 'favicon' is the little picture that appears on the tab of your website in the browser, just next to the page title. Although a favicon is not necessary, it helps to add a bit of professionalism to your webpage, and also gives the user a visual cue about the identity of your webpage.

images

Self explanatory. All the images used for your web app go here.

index.html

The starting point of your web app. This is the first page that users will see when they navigate to your webpage. In case you are (most likely) using one of the many MVC front-end frameworks (like angular, ember, react, etc.) then your index.html file will mostly be empty, only containing script tags and style tags, to load all your javascript and css for the web app. In this case, most of the markup (html) for your application will be either dynamically generated at runtime, or contained in another folder called views.

robots.txt

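A file that tells web crawlers, such as search engine bots, which parts of your site they may or may not crawl. It does not control what human users can access. You can mostly ignore this file, unless you're *really* curious.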

scripts

As the name suggests, this is where all the scripts for your application go. The number of different ways you can organise your javascript files deserves an entire blog post of its own, but, in general, try to keep your javascript as modular as possible. This means that each separate script file should aim to do only one thing, and should be really good at doing that one thing and nothing else. For example, you should have one script file that deals with fetching data from the server, another for any dynamic rendering of DOM elements, another for any sorting functionality in your web app, another for any complex mathematical calculations you want to do, etc. This is of course an oversimplification, and each piece of logic I just mentioned can be broken down further into separate pieces of logic.
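As a small illustration of the "one file, one job" idea, here is what a sorting script might contain. The file name scripts/sort-by.js and the function sortBy are made up for this example; the point is only that the file knows nothing about the server or the DOM.

```javascript
// scripts/sort-by.js (hypothetical file): its only job is to sort a list of
// objects by one of their keys.
function sortBy(items, key) {
  // Copy first so the caller's array is left untouched.
  return items.slice().sort(function (a, b) {
    if (a[key] < b[key]) { return -1; }
    if (a[key] > b[key]) { return 1; }
    return 0;
  });
}

var users = [{ name: 'carol' }, { name: 'alice' }, { name: 'bob' }];
var sorted = sortBy(users, 'name');
console.log(sorted.map(function (u) { return u.name; }).join(', ')); // alice, bob, carol
```

Because the function takes plain data in and returns plain data out, the rendering and data-fetching scripts can use it without knowing how it works.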

styles

All your stylesheets go here. Each component or widget in your app should ideally have a stylesheet of its own. This helps with naming and version control (git), and is also useful for fellow developers to recognise where exactly the style for each element in your application is contained. You should be especially careful when naming css classes in your applications, because each name is applied globally. This becomes a major cause of concern as your application grows, because of naming conflicts. Fortunately, there are many guidelines to solve this problem, and you should develop the habit of following these guidelines from day 1.

bower_components

This folder contains all the external libraries and frameworks that are used by your app. Bower is a tool which helps you manage the external libraries and dependencies required by your app. For example, if you want to download and install jquery, all you have to do is type bower install jquery on your terminal, and the source files for the jquery library will be downloaded and made available in this folder. This folder should not be committed to your repo.

node_modules

Contains all the NodeJS dependencies required by your project. Whenever you type npm install on your terminal, the node dependencies get installed inside this folder. This folder should not be committed to your repo because it's generally really heavy in terms of space. You can just ignore this folder.

test

This folder contains all the tests for your app, which include unit tests as well as end to end (e2e) test cases. Testing your application's source code is extremely important, not just because it helps you catch bugs early on, but also because it forces you to make your code modular and maintainable, and gives you confidence that your code won't break as long as your test cases pass.

dist

As the name suggests, your distribution, or 'dist' folder is the folder which ultimately gets served to the user in production. Why do we need the dist folder? Because serving our application's code as it is in the .tmp folder is very inefficient and slow for the end user in terms of network performance. Because of this, the code in the app folder goes through a number of processes, such as concatenation and minification, in order to make the network performance as fast as possible. The resulting code is generally unreadable and is meant to be deployed once the code in the app folder is thoroughly tested. For all practical purposes, you should just leave this folder alone.

.tmp

The contents of this folder are what you're actually going to view when you open your app in the browser. If all you are using is html, css and javascript, then the contents of your .tmp folder will be exactly the same as your app folder. But with the ever increasing number of build tools, templating languages, and module loading frameworks, this is rarely the case. You can mostly ignore this folder, but if you've ever made a change to the contents of your app folder and can't see it appear in the browser, you would most likely refer to this folder to check if your changes have actually been built.

Gruntfile.js

The grunt file is the file that describes the Grunt tasks that are going to be run. Front-end task runners are a broad topic and may require a whole tutorial of their own. In a nutshell, grunt does the boring, repetitive tasks that would be a pain to do otherwise, like copying all your files from your app folder to your .tmp folder, compiling jade files into html files, concatenating and minifying your javascript, and even linting your code to make sure you don't make any silly errors. Although grunt is the most popular task runner as of now, it's worth mentioning that there are other task runners, like gulp or broccoli, that many consider better and faster, so you may want to check them out before you begin with grunt.

bower.json

All bower components that get installed are described in bower.json, along with the version information for each dependency. To add a bower component to bower.json, you can either manually edit this file or add the dependency directly during installation by adding --save to the installation command (for example, to install jQuery, you would run bower install jquery --save). The latter way is recommended because there is less chance of error, and the latest version of whatever you're trying to install will be added automatically to bower_components and bower.json.

package.json

Similar to bower.json, except that these are your npm dependencies, which go into the node_modules folder. Every time you want to add a new node module, you should use the --save flag with npm install.