Streaming files to Azure Blob Storage with Node.js

Ancient Knowledge

This article is getting old. It was written in the ancient times and the world of software development has changed a lot since then. I'm keeping it here for historical purposes, but I recommend you check out the newer articles on my site.

Microsoft has done an excellent job of building a cloud solution that is downright fun to work with, regardless of the language or toolset you want to use. They are regularly releasing updated APIs and new cloud service offerings, have remained competitive in their pricing, and have even (wait for it) been called "visionary". The feature I enjoy the most is the ease of configuration across their various offerings; most can be configured in a couple of clicks and be up and running in less than a minute. This really helps ease you into a nice deployment cycle right off the bat, especially if you take advantage of automatic GitHub or TFS deployments. Enough talk! Show us the code!

Azure Blob Storage

Setting up the Azure blob storage account is so easy, I'm not going to walk step by step through that one. Once the account is set up, you will be given the two things you need to work with: the storage account name and an access key.

Node Configuration

To make our REST API nice and easy, let's npm install express. I'm going to use a simple configuration to avoid any unnecessary code.

// start up the express train
var express = require('express'),
    app = express();

// allow PUT and DELETE
app.use(express.methodOverride());

// index page, upload file form
app.get('/', function (req, res) {
    res.send(
        '<form action="/upload" method="post" enctype="multipart/form-data">' +
        '<input type="file" name="snapshot" />' +
        '<input type="submit" value="Upload" />' +
        '</form>'
    );
});

// error-handling middleware is registered after the routes,
// otherwise it never sees their errors
app.use(logErrors);

// thundercats, GO!
var port = process.env.PORT || 4337;
app.listen(port);

function logErrors(err, req, res, next) {
    console.error(err.stack);
    next(err);
}

With all that loaded up into the server.js file, you should be able to run node server.js and navigate to the form in the browser.

Using Middleware

Now that we have the easy stuff taken care of, we need to install some middleware to handle the nuances of form data processing for us. Run npm install multiparty (a fork of formidable) to download the bits. Multiparty is a node module for parsing form data; it's built specifically for multipart/form-data and can expose each piece of the data as a stream, which we are going to need in order to pass our data on to Azure. Now is also a good time to npm install azure to get all the good Azure API modules for node. The documentation is pretty good, and the SDK provides out-of-the-box support for working with table storage, blob storage, and the Azure service bus. Let's update the server.js file to take advantage of these new modules.

// express, now with friends!
var express = require('express'),
    app = express(),
    multiparty = require('multiparty'),
    azure = require('azure');

Multipart Form Data

If you have ever seen under the covers of an HTTP request containing multipart data, then you know it isn't pretty. We don't want to have to deal with the details of parsing that out, so we're going to let Multiparty do it for us.
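For the curious, here is a rough sketch of what the raw request body looks like on the wire (the boundary value and filename are made up; browsers generate their own):

POST /upload HTTP/1.1
Content-Type: multipart/form-data; boundary=----FormBoundaryABC123

------FormBoundaryABC123
Content-Disposition: form-data; name="snapshot"; filename="photo.png"
Content-Type: image/png

...raw binary data...
------FormBoundaryABC123--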

// example of multiparty parsing
app.post('/upload', function(req, res) {
    var form = new multiparty.Form();

    // event listeners get attached here, before parse() kicks things off

    form.parse(req);
});

In the normal usage of Multiparty, the files get saved to a temporary location on disk. We could upload them to Azure after they are saved, but that seems pretty wasteful since it requires us to wait on the filesystem and then delete the file when we are done. Fortunately, Multiparty emits a part event that we can latch onto to begin streaming our file to Azure. The part event is emitted each time a part (a single field or file in the form) is encountered in the request, and the part object it hands us includes some useful data such as the filename, the byteCount, and the byteOffset.
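Before wiring it up to Azure, here is a minimal sketch (assuming the form from earlier) that just logs each part's metadata and drains the stream:

form.on('part', function (part) {
    console.log('name: ' + part.name);             // form field name
    console.log('filename: ' + part.filename);     // undefined for non-file fields
    console.log('byteCount: ' + part.byteCount);   // size reported for this part
    console.log('byteOffset: ' + part.byteOffset); // where the part started in the request

    part.resume(); // drain the stream so parsing can continue
});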

Azure SDK

To access Azure blob storage, you need to provide your storage account name and access key as environment variables. An alternate method is to place the following at the top of your server.js file, although your mileage may vary (and you should keep real keys out of source control).

// azure storage account
process.env['AZURE_STORAGE_ACCOUNT'] = 'youraccountname';
process.env['AZURE_STORAGE_ACCESS_KEY'] = 'youraccountkey';
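If you would rather not touch process.env at all, the SDK versions of this era also let you hand the credentials straight to createBlobService; double-check the signature against the version you have installed.

// assumption: your azure SDK version accepts credentials as arguments
var blobService = azure.createBlobService('youraccountname', 'youraccountkey');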

The Azure SDK includes a method for passing a byte stream to blob storage: createBlockBlobFromStream.

// method signature:
// createBlockBlobFromStream(containerName, fileName, stream, size, callback)

var fs = require('fs');

var blobService = azure.createBlobService();
blobService.createBlockBlobFromStream('container', 'filename', fs.createReadStream('upload.txt'), 11, function(error){
    if(!error){
        // blob uploaded; 11 is the size of upload.txt in bytes
    }
});

In the example above we are actually creating a stream from a file on disk, but luckily for us, Multiparty's part is also a readable stream.

Putting It All Together

app.post('/upload', function (req, res) {
    var blobService = azure.createBlobService();
    var form = new multiparty.Form();

    form.on('part', function(part) {
        if (part.filename) {
            var filename = part.filename;
            var size = part.byteCount;

            // respond only after the upload finishes, success or failure
            var onUpload = function(error) {
                if (error) {
                    res.send({ grrr: error });
                } else {
                    res.send("SWEET");
                }
            };
            // assumes the 'container' container already exists in your storage account
            blobService.createBlockBlobFromStream('container', filename, part, size, onUpload);
        } else {
            // not a file; let multiparty handle the field normally
            form.handlePart(part);
        }
    });

    form.parse(req);
});

If you open up the control panel in your storage account, you should now see the uploaded bits. In the end, this is pretty easy to do; you just need to know where to look for the right parts.

UPDATE

The one caveat to the approach above is that we don't really know the actual size of the stream/file that was uploaded. With Multiparty, if the uploaded file is the last field in the form, then byteCount is the same as the file size. However, if the file is anywhere else in the form, byteCount is the number of bytes remaining in the form, which includes any fields that follow the file. With Azure, the file size is only used as a timeout; Azure waits for the specified number of bytes before continuing. This works for the upload, but unfortunately it is not very accurate and can result in the upload call taking longer than is really necessary. I am still looking for a way to modify the Multiparty code to get the correct file size, and I'll update this post and the comments when I find one. Thanks to Matthew Kim for leading me to find this little hiccup.
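Until then, a workaround that follows from the observation above: arrange the form so the file input is the last field, and byteCount will line up with the actual file size. The title field here is just hypothetical filler to show the ordering.

// file input goes LAST so part.byteCount matches the real file size
app.get('/', function (req, res) {
    res.send(
        '<form action="/upload" method="post" enctype="multipart/form-data">' +
        '<input type="text" name="title" />' +    // other fields come first
        '<input type="file" name="snapshot" />' + // file is the final field
        '<input type="submit" value="Upload" />' +
        '</form>'
    );
});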