Module Core - Session 1 - Hello World in Node.js

In our learning system, we learn top-down, meaning that we set up an overarching goal to serve as a compass that leads us toward what we need to learn.

My goal is pretty straightforward. I'm building out this site (appfromscratch.com) as an online learning platform for teaching you to build your dream apps. So I'm using that as the compass, i.e. you'll see the site appfromscratch.com being built in front of your eyes.

Obviously, the premise of the program is that you can use it to build out your dream app as well, so be sure to keep your own goal in mind, and we'll get to it for sure.

Prerequisites

In this Node.js course, we'll assume that you have prior programming experience in either JavaScript or another programming language. If your experience is in a language other than JavaScript, you'll find that JavaScript is generally easy to pick up. We also have a separate module for learning JavaScript.

If your prior experience is in server programming in another programming language, that's great, as there will be a lot of domain knowledge in networking, HTTP, and the web that you'll be able to reuse. However, this knowledge isn't a prerequisite. We'll teach it for people who haven't had prior experience, so you might find some of the content to be a refresher.

Installation

Node.js is a runtime environment for JavaScript outside of the browser. For the longest time JavaScript was limited to the browser, which meant that people who knew only JavaScript couldn't use their knowledge to build applications outside of it. While Node.js isn't the only JavaScript runtime that can do this, it's currently the most popular.

If your prior experience is in a language other than JavaScript, you will be pretty used to having to install a development environment. However, if you have primarily dealt with JavaScript inside the browser, then this will be a new step for you: we'll have to download and install the Node.js runtime.

Linux

[ fill in ]

OSX

[ fill in ]

Windows

Go to http://nodejs.org. Go to downloads. Select the Windows installer (chances are you want the 64-bit version). Download it and install it according to the wizard steps.

The Windows Node.js installer installs a Node.js-specific terminal (called Node.js Command Prompt). You can use it for development since it's set up with the necessary environment settings for Node.js, though you can of course also set up your own environment.

Search for the Node.js terminal by typing Node in the search field of the task bar. You should see Node.js Command Prompt come up. Click to open it.

To test whether your installation is successful, open up the terminal (Command Prompt on Windows), and then type in:

> node --version

If you see a version number printed, you have installed node successfully.

Also test for npm to ensure that it has been installed:

> npm --version

Hello world

Now that we have everything we need to get started, let's do what we do best - write a "Hello world" Node.js app.

Open your favorite code editor, create a new file called helloworld.js, and type in the following:

console.log('hello world');

And then save it. Now run the following (the cd step is only needed if your terminal isn't already in that folder):

> cd <the folder of the helloworld.js>
> node helloworld.js

NOTE - the < and > should also be removed when you substitute your folder, i.e. if your folder is c:\Users\john\files, then you should type cd c:\Users\john\files.

Voila - you have just printed out hello world on screen, and outside of a browser. Congrats on your first Node.js program!

When in the browser, console.log prints to the JavaScript console. Since Node.js runs outside of the browser, it prints to the standard output stream, which is attached to the terminal's screen when you run the script inside the terminal. This is one of the "big deals" of Node.js: it unshackles JavaScript from the browser, so that you can now use JavaScript for programming tasks that previously would have required another programming language, like writing quick scripts to automate system tasks, or writing the backend of your web app, which is what Node.js is specifically designed for.

Why don't we try to do that? Let's try to convert our small script that prints to the terminal into a web server that prints out a web page that says "Hello world"!

Let's rewrite the same script into the following with your code editor.

var http = require('http');
var server = http.createServer(function(req, res) {
    res.statusCode = 200;
    res.end('hello world');
});
server.listen(8000, function(e) {
    console.log('server running...');
});

Save it, and let's run the script again:

> node helloworld.js

You should now see server running... printed in the terminal. Open up a browser, and type http://localhost:8000 into the address bar. You should now see hello world showing up in your browser. Yup, you have just written your first Node.js web server!

So, what happened? Let's look at the above line by line.

The first line - var http = require('http'); defines a variable called http with the value of the function call require('http').

So what does require('http') do? It loads the module named 'http' and returns what that module exports.

For the longest time, JavaScript didn't have a way to organize functionality into modules for reuse. Node.js more or less defined and enabled the way to do so for server-side JavaScript, and as a result today you have npm as the largest software package repository in the world. There is a lot of code living in npm waiting for you to discover and use it.

The http package is a built-in package in Node, i.e. it's automatically available to you without you having to download it separately (via npm - we will discuss how later) - you just need to require it. You can find the reference documentation for Node's built-in packages on Node's website. It's written more as a reference than as learning material, so you might find it intimidating, but outside of the actual source code, it's the final arbiter of the information.

You'll find that spartan documentation is the norm in Node.js land, and sometimes it's missing completely. Unfortunately this is just the way it is at this time, as things are moving blazing fast. Some technologies get phased out by the next best thing before they have a chance to mature and gain the necessary documentation. This is one of the drawbacks of the Node.js platform as of 2017. As the ecosystem matures it will improve, but right now you'll need to expect to read the source code of the packages you are using, even the best supported ones.

The http.createServer function takes a callback function as its parameter, and returns an http.Server object, which you can then use to listen on a port (in this case 8000) for incoming HTTP requests. The listen function also takes a callback, which is called once the server has started listening.
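As a small illustration, other built-in modules are loaded in exactly the same way as http. The sketch below assumes two built-in modules that this lesson doesn't otherwise use, os and path, and a made-up helper called describeFile that exists only for this example.

```javascript
// Built-in modules are loaded by name, with no install step.
var os = require('os');
var path = require('path');

// describeFile is a hypothetical helper, used only for this illustration.
function describeFile(filePath) {
  return {
    name: path.basename(filePath),     // file name without the folder
    extension: path.extname(filePath)  // extension, including the dot
  };
}

console.log(describeFile('/tmp/helloworld.js'));
console.log('Running on platform:', os.platform());
```

Each require call hands back whatever the module exports, so os and path here are just ordinary JavaScript objects with functions on them.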

Wait - what is a callback? And why are we using them?

To talk about this, we need to quickly talk about another big reason for Node's success, and that is the use of asynchronous IO.

Asynchronous IO - what does it mean?

In most programming environments, IO (input / output) is synchronous. For example, if we want to open a file in C (it's okay if you don't know C, this is just for illustrative purpose):

FILE *file; // FILE (all caps) is declared in <stdio.h>
file = fopen(<the_file_path>, "r");
// do what we need to do with the file...

What happens is that fopen won't return until the file is opened by the operating system, so if we are reading from the file in the next statement:

int c; // getc returns an int so that it can also signal EOF
if (file) {
    while ((c = getc(file)) != EOF) {
        // process the character read.
    }
    fclose(file);
}

Our reading function getc (reads a character) will also wait until we have gotten the character from the file before it returns.

This is a "synchronous IO" programming model. The IO functions will "block" (meaning they don't return until they finish), so they behave uniformly with non-IO functions:

// add is a regular, non-IO function
function add(a, b) {
    return a + b;
}
var x = add(1, 2);
var y = add(x, 3); // this line won't run until the previous line has returned.

The nice thing about functions is that they have a very simple execution model - the next line won't run until the current line is done. Synchronous IO functions basically preserve this property by not returning until the IO function is done.

This is what reading a file with a synchronous IO function looks like in Node.js:

var fs = require('fs'); // the built-in file system module
var data = fs.readFileSync(<the_file_path>); // this function exists, by the way.
console.log(data); // runs after fs.readFileSync returns.
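To try this out without hunting for a file, here is a minimal runnable sketch. It assumes a scratch file named sync-demo.txt in the operating system's temp folder (both the name and the location are made up for this example), writes it first, and then reads it back synchronously.

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');

// Hypothetical scratch file, created only for this demonstration.
var demoPath = path.join(os.tmpdir(), 'sync-demo.txt');
fs.writeFileSync(demoPath, 'hello from a file');

// Passing 'utf8' makes readFileSync return a string instead of a raw Buffer.
var text = fs.readFileSync(demoPath, 'utf8');
console.log(text); // runs only after the whole file has been read

fs.unlinkSync(demoPath); // clean up the scratch file
```

Every line here waits for the one before it, which is exactly the simple execution model described above.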

The problem with Synchronous IO

The advantage of synchronous IO is largely in that it's very, very simple to understand what's going on, as the next line runs only when the current line is done, so you know for sure that if you want to process the data you read in, you just need to write it down in the following line.

However, this advantage comes at a cost - a large performance cost.

It turns out that IO is very, very expensive in terms of time, because the IO system (files, networks, etc) is just so much slower compared to manipulating things that are already in memory (incidentally, if you still don't have an SSD, get one, it's the best investment you can make for your computer).

And it's idle time too. Your program isn't doing any additional processing while waiting for the IO to return. It just sits there doing nothing.

Of course, the computer isn't doing nothing. A modern OS knows when your program isn't doing anything, and will put the CPU to work on other programs if you have multiple programs running. This is called multitasking.

Many powerful programs have been built with this approach. They leverage the OS to run multiple processes to keep the simple execution model.

Of course, processes don't come cheap. These days some processes can get up to hundreds of megabytes. Even with gigabytes of memory (and virtual memory) there is an upper limit on how many processes we can run.

A more advanced way is, instead of letting the OS create multiple processes, to create multiple threads within the same process. This way we are able to make better use of the memory, and yet continue to keep the same execution model.

While this does alleviate the issue of multiple processes, it turns out that even threads are not cheap. There is still quite a bit of memory needed for multiple system threads, and if you are doing non-system, user-level threads, then you'll need to do more manual bookkeeping yourself via explicitly passing control to another thread, which is error prone.

Asynchronous IO

So, Node's approach is to turn this model upside down. Instead of trying to keep the same simple execution model, we switch over to a different execution model. Instead of waiting until the IO function returns, we do the following instead:

  1. we "register" a callback with the async IO function
  2. the async IO function starts the IO process and then immediately returns.
  3. when the IO process finishes, the callback function will be called with the result (or the error).

With this approach, our code can continue to stay busy if there is more processing to do. However, it complicates the way we need to write the code.

For example, instead of reading a file synchronously as follows:

var data = fs.readFileSync(<the_file_path>);
console.log(data);

We can no longer write the next step on the next line, because in the async IO model, the async IO function returns as soon as it registers the callback and triggers the IO process, without waiting for the IO process to finish. For example, the following does not work:

var data = fs.readFile(<the_file_path>); // notice this is not the sync function.
// data is not available here: readFile does not return the file contents, and we
// reach the next line before the IO process has read anything. (Newer versions of
// Node also throw an error when readFile is called without a callback.)
console.log(data);

Instead, we must write the async version differently.

fs.readFile(<the_file_path>, function callMeWhenDone(err, data) {
  console.log(data);
});

The function callMeWhenDone is the callback function we are registering for this particular IO process. When the IO process is done, callMeWhenDone will be run.
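The three steps can be watched in action with a small sketch. It assumes a scratch file called async-demo.txt in the temp folder and an array named order that records what happens when (both are made up for this example).

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');

// Hypothetical scratch file, created only for this demonstration.
var demoPath = path.join(os.tmpdir(), 'async-demo.txt');
fs.writeFileSync(demoPath, 'hello from a file');

var order = []; // records the order in which things happen

order.push('before readFile');
// Step 1: we register callMeWhenDone with the async IO function.
fs.readFile(demoPath, 'utf8', function callMeWhenDone(err, data) {
  // Step 3: runs later, once the read has finished (or failed).
  if (err) {
    console.error('read failed:', err.message);
    return;
  }
  order.push('callback: ' + data);
  console.log(order.join('\n'));
  fs.unlinkSync(demoPath); // clean up the scratch file
});
// Step 2: readFile has already returned, but the data is not here yet.
order.push('after readFile');
```

When you run it, the "after readFile" entry is recorded before the callback entry, even though it appears later in the source file.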

Advantage and Disadvantage of Async IO

The advantage of async IO is that our program is now able to keep processing without waiting for the IO calls to finish. This is an especially beneficial property for server applications, as server applications are dominated by IO access. This is one of the reasons why Node.js is a powerful platform for server applications compared to the traditional platforms, even when JavaScript itself isn't as performant as some other programming languages.

The obvious disadvantage is that we lose the simple execution model. That is a big deal, especially for beginners who don't yet have a strong grasp of which functions are IO-based and which are not.
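Here is a rough sketch of what "keep processing without waiting" can look like. It assumes three made-up scratch files in the temp folder and a counter named pending; it starts all three reads before any of them has finished, instead of waiting for each one in turn.

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');

// Hypothetical scratch files, created only for this demonstration
// (they are left behind in the temp folder).
var files = ['a', 'b', 'c'].map(function (name) {
  var p = path.join(os.tmpdir(), 'parallel-demo-' + name + '.txt');
  fs.writeFileSync(p, 'contents of ' + name);
  return p;
});

var results = [];
var pending = files.length; // reads that have not finished yet

files.forEach(function (file, index) {
  // Each call returns immediately, so all three reads are in flight together.
  fs.readFile(file, 'utf8', function (err, data) {
    results[index] = err ? null : data;
    pending -= 1;
    if (pending === 0) {
      console.log('all reads finished:', results);
    }
  });
});

// We get here right away, while the reads are still in progress.
console.log('all reads started, still waiting on', pending);
```

A server does the same thing on a bigger scale: while one request is waiting on a file or a database, it can start working on the next one.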

Even when you are no longer a beginner, it's easy to see that when the code gets complex, it can get ugly really quickly.

// sync version to read 5 files.
var data1 = fs.readFileSync(<file_1>);
var data2 = fs.readFileSync(<file_2>);
var data3 = fs.readFileSync(<file_3>);
var data4 = fs.readFileSync(<file_4>);
var data5 = fs.readFileSync(<file_5>);
...

Async version of reading the files.

fs.readFile(<file1>, function(e1, data1) {
  fs.readFile(<file2>, function (e2, data2) {
    fs.readFile(<file3>, function (e3, data3) {
      fs.readFile(<file4>, function (e4, data4) {
        fs.readFile(<file5>, function (e5, data5) {
          ...
        })
      })
    })
  })
})

This is called callback hell, and it is one of the biggest drags of writing Node.js applications.

Some people find this issue unbearable and therefore don't want to work with Node.js at all. Your mileage may vary, of course. Every platform has issues, and this is one of them for Node.js. I'm talking about its warts early so that you don't have to invest deeply if you don't want to, but obviously there are enough developers who decide to continue with Node.js to reap its benefits even given all of its warts, to the point that it now has the largest user-contributed code repository of any platform.

So, why didn't Node.js choose any of the other approaches listed above, and instead settle on the asynchronous IO approach?

It's because JavaScript's programming model is single-threaded, i.e. the multi-threaded approach wouldn't work without changes to the JavaScript runtime, and the multi-process approach doesn't scale (it can still work, but wouldn't have given Node.js the massive advantage that let it leapfrog the competition).

So, instead of having a custom JavaScript engine, Node.js embraces the status quo of JavaScript, and develops a programming model that works with it.

There are solutions to the callback hell mentioned above. We will look at them in future lessons, though you'll still need to understand the callback model, as there is a lot of Node.js code out there that only provides the callback model (including Node's built-in API).

Interestingly, Node.js helped popularize the asynchronous programming model, to the point that other programming languages are now adopting it as well. In the future this concept might be so prevalent that I won't need to explain it anymore.
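To see what the nesting costs once error checks are added, here is a runnable sketch with just three levels. The scratch files and the readThree helper are made up for this example; each level needs its own error check before the next read can start.

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');

// Hypothetical scratch files, created only for this demonstration
// (they are left behind in the temp folder).
var names = ['hell-1.txt', 'hell-2.txt', 'hell-3.txt'];
var paths = names.map(function (name, i) {
  var p = path.join(os.tmpdir(), name);
  fs.writeFileSync(p, 'contents ' + (i + 1));
  return p;
});

// Reads three files one after another; done(err, results) is called once.
function readThree(files, done) {
  fs.readFile(files[0], 'utf8', function (e1, data1) {
    if (e1) return done(e1);
    fs.readFile(files[1], 'utf8', function (e2, data2) {
      if (e2) return done(e2);
      fs.readFile(files[2], 'utf8', function (e3, data3) {
        if (e3) return done(e3);
        done(null, [data1, data2, data3]);
      });
    });
  });
}

readThree(paths, function (err, results) {
  if (err) {
    console.error('read failed:', err.message);
    return;
  }
  console.log(results);
});
```

Every extra file adds another level of indentation and another error check, which is why the five-file version above gets hard to follow so quickly.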

Coming back to our code

Alright - asynchronous IO does take some time to explain. Let's come back and look at our mini web application again.

var http = require('http');
var server = http.createServer(function(req, res) {
  res.statusCode = 200;
  res.end('hello world');
});
server.listen(8000, function(e) {
  console.log('server running...');
});

As we said, Node.js embraces the asynchronous IO model throughout its built-in API, and that means working with callbacks.

The first callback is passed to the http.createServer function. By convention, the callback takes 2 parameters, one called req (short for request), and the other called res (short for response). This function is called whenever there is an HTTP request to the server, e.g. every time you refresh the page in your browser, this function will be called.
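To get a feel for what req and res carry, here is a small sketch that plays both roles: it starts a server and then sends one request to itself. The handleRequest name, the /hello path, and the choice of port 0 on 127.0.0.1 (which asks the operating system for any free port) are assumptions made for this example.

```javascript
var http = require('http');

// Called once per incoming HTTP request.
function handleRequest(req, res) {
  res.statusCode = 200;
  res.setHeader('Content-Type', 'text/plain');
  // req.method and req.url describe what the client asked for.
  res.end('you sent ' + req.method + ' ' + req.url);
}

var server = http.createServer(handleRequest);

// Port 0 lets the OS pick a free port, so this sketch never collides
// with a server you already have running on port 8000.
server.listen(0, '127.0.0.1', function () {
  var port = server.address().port;
  // Play the role of the browser: send one request to our own server.
  http.get('http://127.0.0.1:' + port + '/hello', function (response) {
    var body = '';
    response.setEncoding('utf8');
    response.on('data', function (chunk) { body += chunk; });
    response.on('end', function () {
      console.log(body); // prints "you sent GET /hello"
      server.close(); // stop the server so the script can exit
    });
  });
});
```

If you send more requests before the server closes, handleRequest simply runs again for each one.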

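The second callback is passed to the server.listen function. It is called once, when the server has started listening on the port. Note that it receives no arguments: if listen fails (for example, because the port is already in use), Node reports the failure through the server's 'error' event rather than through this callback, so the e parameter in our code is never set. In that respect it is an unusual example of the one-shot callbacks described below; the fs.readFile callback we saw earlier, with its err parameter, is the more typical one.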
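One detail worth checking in your own experiments is where a failed listen shows up. In current Node versions it arrives as an 'error' event on the server object. The sketch below forces that failure by starting two servers on the same port; the variable names first, second, and reported are made up for this example.

```javascript
var http = require('http');

var first = http.createServer();
var second = http.createServer();
var reported = null; // will hold the error code seen by the 'error' handler

// Failures to listen arrive as an 'error' event on the server object.
second.on('error', function (err) {
  reported = err.code;
  console.log('second server could not listen:', err.code);
  first.close(); // stop the first server so the script can exit
});

first.listen(0, '127.0.0.1', function () {
  var port = first.address().port;
  // Deliberately reuse the port that is already taken.
  second.listen(port, '127.0.0.1', function () {
    console.log('not reached: the port is already in use');
  });
});
```

Without the 'error' handler, the same failure would crash the script with an unhandled error, so it's a good habit to register one on any server you start.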

These are the two main types of callbacks you'll encounter in Node.js programming.

The first type of callback happens due to an external event, e.g. when a user makes an HTTP request or clicks on something. These callbacks might be called multiple times, or not at all if there are no events. They are used to model "event-driven programming".
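As a rough illustration of this first type, the sketch below uses Node's built-in events module. The button object and the 'click' event name are stand-ins made up for this example; in a real program the events would come from something external, like incoming HTTP requests.

```javascript
var EventEmitter = require('events');

var button = new EventEmitter(); // stands in for any source of events
var clicks = 0;

// Event-style callback: may run zero times or many times.
button.on('click', function (position) {
  clicks += 1;
  console.log('click #' + clicks + ' at', position);
});

// Simulate two events; the callback runs once for each.
button.emit('click', { x: 10, y: 20 });
button.emit('click', { x: 30, y: 40 });
```

If emit were never called, the callback would simply never run, and that would not be an error.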

The second type of callback exists because the function you are calling needs to return its result later. The function call is itself the "event", and the callback is only going to be called once. This type of callback is used to model sequential computation.

Since the first type of callback might or might not be called, and might pass arbitrary values when it is, it's up to the event to define the parameters. But one thing it won't include is an error parameter, unless the event being modeled happens to be an error event.

In contrast, the second type of callback needs to include an error parameter. Just like the results, errors from the IO process have to be returned asynchronously, since otherwise we cannot capture them, i.e. try/catch doesn't work for asynchronous errors.

try {
  fs.readFile(<file_to_be_read>, function (err, data) {
    // ...
  })
} catch (e) {
  // only synchronous errors can be caught here.
}
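You can confirm this yourself with a read that is expected to fail. The sketch below assumes a made-up file name in the temp folder that should not exist, and a flag named caughtByTryCatch that records whether the catch block ever ran.

```javascript
var fs = require('fs');
var os = require('os');
var path = require('path');

// Hypothetical path that should not exist, so the read is expected to fail.
var missingPath = path.join(os.tmpdir(), 'no-such-file-for-this-demo.txt');
var caughtByTryCatch = false;

try {
  fs.readFile(missingPath, 'utf8', function (err, data) {
    if (err) {
      // The asynchronous failure arrives here, as the first argument.
      console.log('callback got error:', err.code);
      return;
    }
    console.log(data);
  });
} catch (e) {
  // Not reached for the missing file: readFile has already returned normally.
  caughtByTryCatch = true;
}

console.log('caught by try/catch?', caughtByTryCatch);
```

The try block finishes long before the read fails, so the only place the failure can be seen is the err parameter of the callback.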

So, handling errors is also more complex in asynchronous code, since there are now two possible ways for an error to reach you: thrown synchronously, or passed to the callback.

Synchronous errors can still exist, especially in a language like JavaScript, where you don't know whether a typo will cause an error until you run the code.
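For comparison, here is a minimal sketch of a synchronous error caused by a typo. The variable names are made up for this example; the misspelled method name is deliberate.

```javascript
var caught = null;

try {
  var message = 'hello world';
  // Typo: the real method is toUpperCase, so this throws a TypeError at run time.
  console.log(message.toUppercase());
} catch (e) {
  caught = e;
  console.log('caught synchronously:', e.name);
}
```

Because the error is thrown on the spot, try/catch works here exactly as you would expect.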

The best way to overcome the issue right now is to practice, which is what we will do with the upcoming lessons. For now you have accomplished the first goal - writing the hello world app, as well as understanding the asynchronous IO issues regarding Node.js programming. We'll tackle more in the upcoming lessons.