Asynchronous programming notes

How to use this?

  1. read the last line of the disclaimer: read, but do your own research...;
  2. jump to the Problem section;
  3. move to the Issues section;
  4. finalize with the Techniques section;

Disclaimer

This is a work in progress!

First of all, I am an amateur NodeJs programmer (the largest piece of code I've written being 3k lines long, and not production ready) and, worse, a novice JavaScript programmer. Thus I ask the reader to take every statement with a grain of salt, and urge him to follow the references and make his own judgment on the matter. However, in my defense I must add that I've been programming for 10 years, mostly in Java, and for about 3 years I've been slowly moving towards Erlang for network programming, Go for system programming and backend web services, NodeJs for web frontends, and of course C for low-level "heavy lifting"; plus I've played a lot with Scheme and Python. As such I would say I'm not a stranger to various programming techniques; moreover, I have tried to bring asynchronous programming into the Java ecosystem (which in retrospect was a huge failure due to Java's syntax inflexibility)...

Secondly, my view is completely biased along lines like these: NodeJs should remain single-threaded and callback based; code transformation works only when the language actually supports it (think Lisp or Scheme macros); promises / futures (just like "full" OOP) don't fit into the JavaScript landscape; deep and "hairy" library dependency graphs are bad; reinventing the wheel is not a crime (see Reinvent The Wheel!).

In a few words: read, but do your own research and judge afterwards!

Context

I initially got interested in this subject by reading, and then replying to, a thread [NodeJs-1] on the NodeJs mailing list; then I started digging into the mailing list archive to get a full picture of the subject:

  • what is the general consensus on the topic (there seems to be none so far);
  • what are the potential issues;
  • what are the best practices;

Moreover, my interest in the subject transcends the NodeJs area, as I've tried to apply the "NodeJs-way" of asynchronous callbacks to Java and suffered a major failure, which I blame on Java's inflexible syntax that disallows any form of creativity; that failure made me think twice about the usefulness of deeply asynchronous code. On the other hand, it seems I wasn't the only one to attempt this Java heresy, as a similar framework recently emerged under the name of Vert.x, and asynchronous techniques have already been applied in Red5 (http://www.red5.org/) and in the Apache MINA and Netty frameworks.

As such, focusing strictly on NodeJs, I started looking in the best place for information (and disinformation, for that matter) on the subject: the NodeJs mailing list. After roaming through endless debates and flame-wars, and selecting the most "to the point" and informative emails, I managed to form a (subjective) opinion, at least from a philosophical point of view. Then I moved on to the various works pointed to by some of the emails, mainly small tutorials, articles or blog posts, but quite informative ones; most of them promote one particular library (usually the poster's own), or advocate sticking with the bare minimum. Thus in the end I've also taken a quick look over these libraries, focusing mostly on "what" they have to offer, and less on "how".

The problem

In the current work I focus mainly on how to deal with the peculiarities of an asynchronous run-time environment like NodeJs's. And by "deal" I mean "how to write code" as a developer "stuck" in such an environment, thus I'm not questioning the aspects of the environment, I just take them as a given that can't be touched.

Specificity

Before actually describing the problem, I must point out that such a subject can't be discussed in complete isolation from the targeted run-time environment (as the Circuit approach suggests), because what makes sense in one programming language might be impossible in another. For example:

  • we might argue that writing asynchronous code is possible in any programming language, starting from assembly (not quite a programming language) and ending with Prolog; but disregarding the particularities of the language and environment makes the solution crumble --- like trying it in Java whose syntax slaps your hands at each keystroke --- or worse it just doesn't fit --- like Erlang or Go which already have a good way to deal with concurrency, i.e. actors, and channels respectively;
  • the emerging solution is highly dependent on the available features; for example, in Lisp or Scheme you have the power of macros, which make code transformation a natural part of the code, while in Java's OOP world you're more likely to approach it with futures than with callbacks, and in C you'll most likely end up with event loops;
  • then the solution must take into account the "customs" of the community, or else you'll get bashed as a "non-believer"; the words "that is not Pythonic" or "that is not the Python way" pop up each time someone doesn't have (or doesn't want to give) a concrete argument, serving either as a replacement for one, or as an embodiment of the "ultimate" answer;

Therefore I will focus solely on the subject of writing asynchronous code in the NodeJs run-time and the JavaScript language, although from what I've seen it could easily be translated to other JavaScript replacements, like CoffeeScript, that keep its semantics intact.

Examples

In the spirit of the Chinese proverb, nothing describes something better than an example; in our case, a few simple, contrived examples of how the code looks in "plain" NodeJs will help to better grasp the "how to deal" part of the problem.

File loading example

We'll take the simple case of loading the contents of a file and passing its whole content to a function that will hopefully do something useful. Although the example is almost meaningless in the real world as the functionality is already available one callback away (see the second code snippet), it is both simple to understand and exposes the issues we might encounter if we are forced to implement similar solutions ourselves. In order to better see the intricacies of asynchronous programming, I'll start with the "high-level" variant, and move backwards towards a low-level implementation, and then to alternative solutions.

First, by using the synchronous blocking API (a quick and dirty shortcut, and a technique that is not always advisable):

var fs = require ("fs");

function load (path, onSuccess) {
    var data = fs.readFileSync (path);
    onSuccess (data);
}

load ("/etc/services", function (data) { console.log (data.length); });

Then by using the canned solution:

var fs = require ("fs");

function load (path, onSuccess) {
    fs.readFile (path, function (error, data) {
        if (error) throw (error);
        onSuccess (data);
    });
};

load ("/etc/services", function (data) { console.log (data.length); })

Afterwards by using a "nested" callbacks variant with the low-level POSIX API:

var fs = require ("fs");

function load (path, onSuccess) {

    fs.open (path, "r", function (error, file) {
        if (error) throw (error);
        var buffers = [];
        // read 1 KiB at a time, recursing until fs.read reports 0 bytes (end of file)
        var loop = function () {
            var buffer = Buffer.alloc (1024);
            fs.read (file, buffer, 0, buffer.length, null, function (error, read) {
                if (error) throw (error);
                if (read === 0) {
                    fs.close (file, function (error) {
                        if (error) throw (error);
                        var data = Buffer.concat (buffers);
                        onSuccess (data);
                    });
                } else {
                    buffer = buffer.slice (0, read);
                    buffers.push (buffer);
                    loop ();
                }
            });
        };
        loop ();
    });
}

load ("/etc/services", function (data) { console.log (data.length); });

Now by using a "flattened" callbacks variant with the low-level POSIX API:

var fs = require ("fs");

function load (path, onSuccess) {

    var file;
    var buffers = [];

    function tryOpen () {
        fs.open (path, "r", function (error, file_) {
            if (error) throw (error);
            file = file_;
            tryRead ();
        });
    }

    function tryRead () {
        var buffer = Buffer.alloc (1024);
        fs.read (file, buffer, 0, buffer.length, null, function (error, read) {
            if (error) throw (error);
            if (read === 0) return tryClose ();
            buffer = buffer.slice (0, read);
            buffers.push (buffer);
            tryRead ();
        });
    }

    function tryClose () {
        fs.close (file, function (error) {
            if (error) throw (error);
            done ();
        });
    }

    function done () {
        var data = Buffer.concat (buffers);
        onSuccess (data);
    }

    tryOpen ();
}

load ("/etc/services", function (data) { console.log (data.length); })

And finally by using the event-based streaming API that some libraries (especially the I/O ones) expose:

var fs = require ("fs");

function load (path, onSuccess) {

    var file = fs.createReadStream (path);
    var buffers = [];

    file.on ("data", function (buffer) {
            buffers.push (buffer);
    });

    file.on ("end", function () {
            file.destroy ();
    });

    file.on ("close", function () {
            var data = Buffer.concat (buffers);
            onSuccess (data);
    });

    file.on ("error", function (error) {
            throw (error);
    });
}

load ("/etc/services", function (data) { console.log (data.length); })

Observations:

  • the function onSuccess does not respect the "callback" pattern that the NodeJs environment expects; it should actually have been named callback and been called like callback (null, data) or callback (error) (in which case the value of data is implicitly undefined);
  • in all cases any error encountered is "handled" by just throwing it away and crashing the process; but this isn't necessarily so, because with a simple modification errors can be gracefully handled and propagated to the callback function;
  • of all the solutions, I would guess the second one (the "canned" one) would be preferred, or else the last one (the "stream" one);
  • but in general when not having either a "canned" solution, or the API isn't a stream based one, I would choose the "flattened" variant if the nesting is more than two levels deep;
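The "simple modification" mentioned in the observations could look like the following minimal sketch: the "flattened" variant reworked to follow the callback (error, data) convention, written for a current NodeJs (Buffer.alloc, let / const). Closing the file when a read fails is my own addition; the original snippets don't do it.

```javascript
const fs = require("fs");

// The "flattened" load, reworked to follow the NodeJs callback convention:
// errors are passed to callback(error) instead of being thrown, and the
// file descriptor is closed even when a read fails.
function load(path, callback) {
  let file;
  const buffers = [];

  function tryOpen() {
    fs.open(path, "r", function (error, fd) {
      if (error) return callback(error); // nothing to close yet
      file = fd;
      tryRead();
    });
  }

  function tryRead() {
    const buffer = Buffer.alloc(1024);
    fs.read(file, buffer, 0, buffer.length, null, function (error, read) {
      if (error) return tryClose(error); // close first, then report
      if (read === 0) return tryClose(null); // end of file
      buffers.push(buffer.slice(0, read));
      tryRead();
    });
  }

  function tryClose(readError) {
    fs.close(file, function (closeError) {
      const error = readError || closeError;
      if (error) return callback(error);
      callback(null, Buffer.concat(buffers));
    });
  }

  tryOpen();
}

load("/etc/services", function (error, data) {
  if (error) return console.error("load failed:", error.message);
  console.log(data.length);
});
```

The caller now decides what an error means (retry, report, or crash), instead of the loader deciding for it.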

Another important observation is that although the code is written in an asynchronous manner, its logic is still linear, thus synchronous or serial in effect, an observation also made by Mark in one of his replies (from Mark ?), because we do not take advantage of the full potential available, for example by issuing multiple read requests in batches and always keeping the downstream system busy. In this case, involving access to the file-system, unless the backing drive has multiple spindles (which most likely implies a RAID system), such multiple pending requests would just put pressure on the drive, unless we know low-level details like the file-system block size or the stripe size. But in other cases, like fetching content from a NoSQL database or from the Internet, such batches might make a difference by reducing the completion time of the overall task. Unfortunately, the code then gets more complicated...
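To make the "batches" idea concrete, here is a minimal sketch (the loadParallel function and its chunkSize parameter are hypothetical, and it assumes a regular file whose size doesn't change while being read). Instead of chaining reads, it issues one read per chunk, all at once, each at an explicit position, and joins them with a pending counter:

```javascript
const fs = require("fs");

// Hypothetical loadParallel: one fs.read per chunk, all issued at once at
// explicit positions, joined by a counter; chunks are stored in file order.
function loadParallel(path, chunkSize, callback) {
  fs.open(path, "r", function (error, fd) {
    if (error) return callback(error);
    fs.fstat(fd, function (error, stats) {
      if (error) return fs.close(fd, function () { callback(error); });
      const count = Math.ceil(stats.size / chunkSize);
      const buffers = new Array(count);
      let pending = count;
      let failure = null;

      function finish() {
        fs.close(fd, function (closeError) {
          const error = failure || closeError;
          if (error) return callback(error);
          callback(null, Buffer.concat(buffers));
        });
      }

      if (count === 0) return finish(); // empty file
      for (let i = 0; i < count; i++) {
        const position = i * chunkSize;
        const buffer = Buffer.alloc(Math.min(chunkSize, stats.size - position));
        fs.read(fd, buffer, 0, buffer.length, position, function (error, read) {
          if (error) failure = failure || error; // remember the first failure
          else buffers[i] = buffer.slice(0, read);
          if (--pending === 0) finish(); // the "join"
        });
      }
    });
  });
}

loadParallel("/etc/services", 4096, function (error, data) {
  if (error) return console.error("load failed:", error.message);
  console.log(data.length);
});
```

As noted above, whether this is faster depends on what sits downstream; on a single spinning disk it may only add pressure.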

Non-trivial examples

In one of his replies to the NodeJs mailing list (from Isaac Schlueter, 1), Isaac complains that most of the examples used to lure developers into one library or another are trivial and unrealistic, and that proponents should perhaps try to prove their point with more complex, real-world examples, like his rimraf tool (an equivalent of rm -rf written in NodeJs).
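To give a flavor of such a non-trivial case, here is a minimal callback-only sketch of recursive removal. It is not the rimraf module (the removeTree function is hypothetical, with no retries, no globbing and none of rimraf's platform-specific workarounds), but it shows the fan-out and join that even a "simple" tool needs:

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical removeTree: a bare-bones, callback-based "rm -rf".
function removeTree(target, callback) {
  fs.lstat(target, function (error, stats) {
    // like rm -rf, a missing target is not an error
    if (error) return callback(error.code === "ENOENT" ? null : error);
    // files and symlinks (lstat does not follow links) are simply unlinked
    if (!stats.isDirectory()) return fs.unlink(target, callback);
    fs.readdir(target, function (error, names) {
      if (error) return callback(error);
      let pending = names.length;
      let failed = false;
      if (pending === 0) return fs.rmdir(target, callback);
      // fan out: remove all children concurrently...
      names.forEach(function (name) {
        removeTree(path.join(target, name), function (error) {
          if (failed) return; // report only the first failure
          if (error) {
            failed = true;
            return callback(error);
          }
          // ...and join: remove the directory once it is empty
          if (--pending === 0) fs.rmdir(target, callback);
        });
      });
    });
  });
}
```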

Definitions
Asynchronous
It refers to the fact that in some cases the actions of a particular function invocation happen after the function has returned (and, in NodeJs in particular, after its caller has returned as well); thus the code still looks linear (synchronous), but its semantics are not: the logical operation signals back once "in-line", through the return value, and possibly multiple times explicitly, through various mechanisms ranging from callbacks to futures.
Concurrency (preemptive or cooperative)
Denoting that inside the same process there might be multiple interleaving logical operations that "seem" to happen in parallel, their interleaving being controlled either explicitly by the developer (i.e. cooperative, which is the NodeJs case) or implicitly by the run-time (i.e. preemptive). There are also cases where the underlying run-time is cooperative, but the employed solution (like those based on code transformation) hides these cooperation points. Concurrency shouldn't be confused with parallelism, which does imply multiple operations happening at the same time.
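Both definitions can be observed in a few lines of plain NodeJs. In the following sketch the task function is hypothetical and setImmediate stands in for real I/O: each call returns right away, its remaining steps run after the caller has returned, and the two operations interleave only at the explicit yield points.

```javascript
// Hypothetical task: performs `steps` steps, yielding to the event loop
// (a cooperation point) between consecutive steps via setImmediate.
function task(name, steps, log, callback) {
  let i = 0;
  function step() {
    log.push(name + i);
    i += 1;
    if (i < steps) setImmediate(step); // yield: other operations may run now
    else callback(null, name); // explicit completion signal
  }
  step();
  return "started"; // the "in-line" signal, through the return value
}

const log = [];
const statusA = task("a", 3, log, function (error, name) { log.push(name + " done"); });
const statusB = task("b", 3, log, function (error, name) { log.push(name + " done"); });
log.push("both calls returned: " + statusA + ", " + statusB);

process.on("exit", function () { console.log(log.join(" | ")); });
```

If step is called directly instead of through setImmediate, all of "a" runs before "b" starts: without cooperation points there is no interleaving.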

Issues

"Magic"...

Expressiveness...

Composability...

Error handling...

Language neutrality (i.e. does it work in CoffeeScript?)...

Debugging (stack traces, stepping, etc.)...

Public interface (i.e. don't impose your choice on your users, just go the simple way with callbacks)...

Context (i.e. what is available in the scope of a callback)...

Domains...

Aesthetics (i.e. code looks inside out)...

Techniques

Callbacks...

Promises / futures...

Events...

Dependency graphs...

Generators...

Coroutines... ("fibers" in NodeJs)

Continuations...

Code manipulation...

References

NodeJs libraries:

[Circuit]

https://github.com/tatumizer/circuit

...

NodeJs mailing list

Before digging into it, here are some statistics to put the whole thing in perspective:

  • the sample is the NodeJs mailing list archive, starting from December 2010 up to January 2013, thus a partial but sufficiently large and fresh sample; (notmuch search --output=summary --sort=oldest-first --limit=10 -- 'nodejs@googlegroups.com')
  • the number of emails: approximately 35 thousand; (notmuch count --output=messages -- 'nodejs@googlegroups.com')
  • the number of threads: approximately 5.5 thousand; (notmuch count --output=threads -- 'nodejs@googlegroups.com')
  • the summary of emails per thread: (notmuch search --output=threads -- 'nodejs@googlegroups.com' | xargs -L1 -- notmuch count -- | Rscript -e 'summary(as.numeric(readLines("stdin")))')
      • min: 1; max: 199;
      • average: 6.29;
      • 1st quartile: 2; median: 4; 3rd quartile: 7;
  • the number of emails that contain the words async, asynchronous or callback: 5834 (about 17% of the total); (notmuch count --output=messages -- 'nodejs@googlegroups.com AND (async OR asynchronous OR callback)')
  • the number of threads that contain the words from above: 1104 (about 20% of the total); (notmuch count --output=threads -- 'nodejs@googlegroups.com AND (async OR asynchronous OR callback)')
  • the summary of emails per thread, for the threads that contain the words from above: (notmuch search --output=threads -- 'nodejs@googlegroups.com AND (async OR asynchronous OR callback)' | xargs -L1 -- notmuch count -- | Rscript -e 'summary(as.numeric(readLines("stdin")))')
      • min: 1; max: 199;
      • average: 10.81;
      • 1st quartile: 3; median: 6; 3rd quartile: 12;
  • as above, but taking only those threads with 10 or more emails: (notmuch search --output=threads -- 'nodejs@googlegroups.com AND (async OR asynchronous OR callback)' | xargs -L1 -- notmuch count -- | grep -E -e '^[0-9]{2,}' | Rscript -e 'summary(as.numeric(readLines("stdin")))')
      • threads: 366;
      • min: 10; max: 199;
      • average: 24.19;
      • 1st quartile: 12; median: 17; 3rd quartile: 26;

The conclusion remains to be drawn by each reader individually... My view is that asynchronous programming is either a very misunderstood topic or a very popular one; but most likely the mere mention of the word causes a flame-war... (Mikeal Rogers is even more blunt on this: A Vocal Minority...)

After digging through the whole pile (which took me about 14 hours just to skim through), here is what I found interesting:

[NodeJs-1]

(from Alexander K.) new concept for structured asynchronous programming

The discussion revolves around a proposed library [Circuit], the main concerns and topics being:

  • the usage of code parsing (for dependency detection);
  • the usage of code manipulation (for replacing intermediary result expressions, called "tokens", with accessors);
  • there are many implicit elements that give a sense of "magic" (like "tokens", parallelism);
  • the introduced abstraction leads to increased complexity, thus reducing usability;
  • the solution is not "idiomatic" in the context of NodeJs (i.e. it hides the callbacks);
  • an alternative would be to use fibers with plain synchronous code;

Interesting replies:

In the end the discussion moved to GitHub's issue tracker https://github.com/tatumizer/circuit/issues/1.

[NodeJs-2]

(from Alan Gutierrez) Re: [nodejs] Re: trying to wrap my head around "promises" - async vs Q

It makes a very good point about how to achieve robust callbacks with two little functions, without touching the flow control in any way.

[NodeJs-3]

(from Gregor Elke) Re: [nodejs] Re: trying to wrap my head around "promises" - async vs Q

It provides an interesting technique for threading the context between multiple callback functions by using Function.prototype.bind, without nesting them (and thus without capturing a large closure).
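A minimal sketch of that idea (not the code from the email; describeFile, onStat and onRead are hypothetical names): the state shared between steps travels as the bound this, and the final callback as a bound leading argument, so no step has to be nested inside another.

```javascript
const fs = require("fs");

// Each step is a top-level function; bind() fixes `this` (the shared
// context) and the leading `callback` argument, while fs supplies the rest.
function onStat(callback, error, stats) {
  if (error) return callback(error);
  this.size = stats.size;
  fs.readFile(this.path, onRead.bind(this, callback));
}

function onRead(callback, error, data) {
  if (error) return callback(error);
  callback(null, { path: this.path, size: this.size, length: data.length });
}

function describeFile(path, callback) {
  const context = { path: path, size: 0 };
  fs.stat(path, onStat.bind(context, callback));
}
```

Compared to the nested variant, each function sees only what was explicitly bound to it, rather than everything in the enclosing scopes.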

[NodeJs-4]

(from Marcel Laverdet) fibers 0.5; say "no" callbacks (when it makes sense in your application)

The discussion revolves around the topic of synchronous versus asynchronous code syntax (the code still runs single-threaded).

All in all this thread is a classic flame-war, but there are also some good insights, provided that the reader shovels through all the off-topic content...

Interesting replies:

  • (from Bruno Jouhier, 1), pointing out that in the I/O layers callbacks make more sense, while in business logic synchronous calls (i.e. futures) are better;
  • (from Bruno Jouhier, 2), addressing the clarity of two main aspects of the various solutions, i.e. how easy it is to discern what a certain code does (the semantic) and how a certain code does it (the flow);
  • (from Isaac Schlueter, 1), presenting (at the end of the message) a few interesting problems that should be used as exercises (especially rimraf, equivalent to rm -rf) instead of the classical examples; (later on an example is provided that uses fibers https://gist.github.com/1131093;)
  • (from Isaac Schlueter, 2), (amongst other things) debating the exception handling issues in both synchronous and asynchronous code;
  • (from Will Conant), making the case "for fibers", with a real-world usage;

[NodeJs-5]

(from Adam Crabtree, 1) Re: Introducing trycatch: Async try catch / scoped error handler / handle http.serverRequest 500s

It provides good technical insight into how to collect and produce "full" exception stack-traces in the context of asynchronous callbacks, i.e. so that you can see where the callback was registered from. Although it is an interesting technique, it is quite invasive and requires patching the core libraries (and explicitly patching other user libraries).
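The core idea can be sketched without patching anything, at the cost of wrapping each callback by hand. The traced helper below is hypothetical (it is not the trycatch API): it records a stack trace when the callback is registered, and appends it to any error later delivered to that callback.

```javascript
const fs = require("fs");

// Hypothetical helper (not part of trycatch): captures the stack where the
// callback is registered and appends it to any Error passed to it later.
function traced(callback) {
  const registration = new Error("callback registered");
  return function (error, ...results) {
    if (error instanceof Error && typeof error.stack === "string") {
      error.stack += "\n    === async boundary ===\n" + registration.stack;
    }
    return callback(error, ...results);
  };
}

function startRead(path, callback) {
  fs.readFile(path, traced(callback)); // the registration point
}

startRead("/no/such/file", function (error) {
  if (error) console.log(error.stack); // now also shows the startRead frame
});
```

Only one hop is recorded; getting the full chain means wrapping at every level, which hints at why a complete solution ends up patching the core libraries.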

[NodeJs-6]

(from Adam Crabtree, 2) Re: Introducing trycatch: Async try catch / scoped error handler / handle http.serverRequest 500s

Another good insight into the issues (mainly disadvantages) of exception handling by various flow control frameworks which change the semantics of the code or the library.

[NodeJs-7]

(from Tim Oxley) peeves & current best practices?

Although the initial topic is a list of best programming practices, a good part of the thread discusses the issues caused by inconsistently invoking asynchronous operation callbacks, i.e. for the same operation the callback is sometimes called synchronously, right away (when it arguably should be wrapped into process.nextTick), and sometimes asynchronously.
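A minimal sketch of the problem and of the usual fix (readCached and its in-memory cache are hypothetical): a cache hit could call back synchronously while a miss necessarily calls back later; deferring the hit with process.nextTick makes the function consistently asynchronous.

```javascript
const fs = require("fs");

const cache = new Map(); // hypothetical in-memory cache: path -> Buffer

// Without process.nextTick, a cache hit would call back before readCached
// returns, while a miss calls back afterwards, so callers could observe
// either order. Deferring the hit makes the callback always asynchronous.
function readCached(path, callback) {
  if (cache.has(path)) {
    const data = cache.get(path);
    process.nextTick(function () { callback(null, data); });
    return;
  }
  fs.readFile(path, function (error, data) {
    if (!error) cache.set(path, data);
    callback(error, data);
  });
}
```

Code written after a call to readCached can thus rely on running before the callback, whether or not the data was cached.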

[NodeJs-8]

(from Mariusz Nowak) [ANN]: Deferred - Maintainable asynchronous JavaScript with promises

Although the discussion starts with the announcement of a new release of the Deferred library and comparisons with others, halfway through the thread the topic changes to the issues of exception handling in the context of promise libraries.
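As an illustration, using native Promises as a stand-in for the Deferred library (whose API I don't reproduce here; parseLater is hypothetical): an exception thrown inside a handler never reaches a surrounding try / catch, it turns into a rejection that must be observed with .catch.

```javascript
// Native Promises stand in for the Deferred library here.
function parseLater(text) {
  return Promise.resolve(text).then(function (value) {
    return JSON.parse(value); // a SyntaxError thrown here becomes a rejection
  });
}

const outcomes = [];

try {
  parseLater("{ not json")
    .then(function () { outcomes.push("parsed"); })
    .catch(function (error) { outcomes.push("rejected with " + error.name); });
  outcomes.push("no synchronous exception");
} catch (error) {
  outcomes.push("caught synchronously"); // never reached: nothing is thrown here
}

setImmediate(function () { console.log(outcomes); });
```

If the .catch is omitted, the error surfaces only as an unhandled rejection, which in recent NodeJs versions terminates the process by default.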

Interesting replies:

[NodeJs-9]

(from Tony Huang) Wind.js: An elegant approach to asynchronies JavaScript

The discussion centers around comparisons, advantages / disadvantages, of the various libraries that rely on code transformation. (Insightful on a philosophical level, but without many technical details.)

[NodeJs-10]

(from Bruno Jouhier) Are callbacks good enough?

As in most of the previous threads, the initial topic is about solutions that allow the developer to write synchronous code that runs in the asynchronous world of NodeJs: this one starts as a request for comments on the blog post Asynchronous JavaScript -- the tale of Harry, and in return the discussion gets side-tracked into the pro-asynchronous vs pro-synchronous flame-war, plus the pro-multithreading vs anti-multithreading one. Still, there are some good insights in the overall thread, especially:

  • (from Mark ?), who observes that what most programmers actually write is, in almost all cases, linearly-asynchronous, and not fully-asynchronous (i.e. parallel tasks with joins);
  • (from Tim Caswell, 1) and (from Tim Caswell, 2), suggesting that for newcomers to the language raw callbacks are the best solution, and only later should they switch to flow control libraries;

[NodeJs-11]

(from Tim Caswell) Re: Why throw exceptions in async code?

It has a short, to-the-point explanation of how exception handling (including raising) should be done in NodeJs.
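A minimal sketch of the convention commonly recommended in NodeJs (loadJson is hypothetical, and this paraphrases common practice rather than quoting the email): report errors through the callback's first argument, wrap only the synchronous code that may throw, and call back outside the try, so that an exception thrown by the callback itself is not mistaken for a parse error.

```javascript
const fs = require("fs");

// Hypothetical loadJson following the NodeJs callback convention.
function loadJson(path, callback) {
  fs.readFile(path, "utf8", function (error, text) {
    if (error) return callback(error); // asynchronous failure: pass it on
    let value;
    try {
      value = JSON.parse(text); // synchronous code that may throw
    } catch (parseError) {
      return callback(parseError);
    }
    callback(null, value); // outside the try on purpose
  });
}

loadJson("/no/such/file.json", function (error, value) {
  if (error) return console.error("failed:", error.code || error.name);
  console.log(value);
});
```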