Running MongoDb on Microsoft Windows Azure with CloudDrive

I’ve been playing around with the whole CQRS approach and think MongoDb works really well for the query side of things. I also figured it was time I tried Azure, so I had a look round the web to see if there were instructions on how to run MongoDb on Microsoft’s Azure cloud. It turned out there were only a few mentions of it, or of a general approach that should work, but no detailed instructions. So I figured I’d give it a go, and for a total Azure newbie it didn’t turn out to be too difficult.

Obviously you’ll need an Azure account, which you may get with MSDN, or you can sign up for their ‘free’ account, which includes a limited number of hours before you have to start paying. One thing to be REALLY careful of though: just deploying an app to Azure starts the clock running, and leaving it deployed but stopped still counts as hours, so be sure to delete any experimental deployments once you’ve finished trying things out!

First of all though, it’s important to understand where MongoDb would fit with Azure. Each web or worker role runs as a virtual machine with an amount of local storage that depends on the size of the VM. Currently the four pre-defined VM sizes are:

  • Small: 1 core processor, 1.7GB RAM, 250GB hard disk
  • Medium: 2 core processors, 3.5GB RAM, 500GB hard disk
  • Large: 4 core processors, 7GB RAM, 1000GB hard disk
  • Extra Large: 8 core processors, 15GB RAM, 2000GB hard disk

This local storage is only temporary though. The role instance can use it for processing while it is running, but it isn’t available to any other instances, and when the instance is moved, upgraded or recycled it is lost forever (as in, gone for good).

For permanent storage Azure offers SQL-type databases (which we’re not interested in), Table storage (which would be an alternative to MongoDb but harder to query and with more limitations) and Blob storage.

We’re interested in Blob storage or more specifically Page-Blobs which support random read-write access … just like a disk drive. In fact, almost exactly like a disk drive because Azure provides a new CloudDrive which uses a VHD drive image stored as a Page-Blob (so it’s permanent) and can be mounted as a disk-drive within an Azure role instance.

The VHD images can range from 16MB to 1TB and apparently you only pay for the storage that is actually used, not the zeroed bytes (although I haven’t tested this personally).

So, let’s look at the code to create a CloudDrive, mount it in an Azure worker role and run MongoDb as a process that uses the mounted CloudDrive for its permanent storage, so that everything is kept between machine restarts. We’ll also create an MVC role to test direct connectivity to MongoDb between the two VMs using internal endpoints, so that we don’t incur charges for Queue storage or Service Bus messages.

The first step is to create a ‘Windows Azure Cloud Service’ project in Visual Studio 2010 and add both an MVC 2 and Worker role to it.

We will need a copy of mongod.exe to include in the worker role, so just drag and drop it into the project and set it to be Content that is copied to the output directory when updated (Build Action ‘Content’, Copy to Output Directory ‘Copy if newer’). Note that the Azure VMs are 64-bit instances, so you need the 64-bit Windows version of MongoDb.

We’ll also need to add a reference to the .NET MongoDb client library to the web role. I’m using the mongodb-csharp one but you can use one of the others if you prefer.

Our worker role needs a connection to the Azure storage account, which we’re going to call ‘MongoDbData’.

The other setting that we need to define is some local storage allocated as a cache for use with the CloudDrive; we’ll call this ‘MongoDbCache’. For this demo we’re going to create a 4GB cache, which will match the 4GB drive we’ll create for MongoDb data. I haven’t played enough to evaluate performance yet, but from what I understand this cache acts a little like the write-cache that you can turn on for your local hard drive.

The last piece before we can crack on with some coding is to define an endpoint, which is how the Web Role / MVC App will communicate with the MongoDb server on the Worker Role. This basically tells Azure that we’d like an IP address and a port to use, and it makes sure that we can use it and no one else can. It should be possible to make the endpoint public to the world if you wanted, but that isn’t the purpose of this demo. The endpoint is called ‘MongoDbEndpoint’ and is set to Internal / TCP.
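
For reference, the three settings described above (the storage account setting, the local cache and the internal endpoint) end up in ServiceDefinition.csdef looking roughly like the fragment below. Treat it as a sketch written from memory of the SDK 1.x schema rather than a copy of my project file: the service name ‘MongoDbAzure’ is made up, the web role is left out, and the 4096 MB size is simply the 4GB cache expressed in megabytes.

```xml
<ServiceDefinition name="MongoDbAzure"
    xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceDefinition">
  <WorkerRole name="WorkerRole1">
    <ConfigurationSettings>
      <!-- connection strings; the values themselves live in ServiceConfiguration.cscfg -->
      <Setting name="DiagnosticsConnectionString" />
      <Setting name="MongoDbData" />
    </ConfigurationSettings>
    <LocalResources>
      <!-- local disk space used as the CloudDrive cache -->
      <LocalStorage name="MongoDbCache" sizeInMB="4096" cleanOnRoleRecycle="false" />
    </LocalResources>
    <Endpoints>
      <!-- only reachable from other role instances in the same deployment -->
      <InternalEndpoint name="MongoDbEndpoint" protocol="tcp" />
    </Endpoints>
  </WorkerRole>
</ServiceDefinition>
```

The actual value for ‘MongoDbData’ goes in ServiceConfiguration.cscfg; setting it to UseDevelopmentStorage=true points it at the local development storage while you are testing on your own machine.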

Now for the code and first we’ll change the WorkerRole.cs file in the WorkerRole1 project (as you can see, I put a lot of effort into customizing the project names!). We’re going to need to keep a reference to the CloudDrive that we’re mounting and also the MongoDb process that we’re going to start so that we can shut them down cleanly when the instance is stopping:

private CloudDrive _mongoDrive;
private Process _mongoProcess;

In the OnStart() method I’ve added some code copied from the Azure SDK Thumbnail sample – this prepares the CloudStorageAccount configuration so that we can use the method CloudStorageAccount.FromConfigurationSetting() to load the details from configuration (this just makes it easier to switch to using the Dev Fabric on our local machine without changing code). I’ve also added a call to StartMongo() and created an OnStop() method which simply closes the MongoDb process and unmounts the CloudDrive when the instance is stopping:

public override bool OnStart()
{
    // Set the maximum number of concurrent connections
    ServicePointManager.DefaultConnectionLimit = 12;

    DiagnosticMonitor.Start("DiagnosticsConnectionString");

    #region Setup CloudStorageAccount Configuration Setting Publisher

    // This code sets up a handler to update CloudStorageAccount instances when their corresponding
    // configuration settings change in the service configuration file.
    CloudStorageAccount.SetConfigurationSettingPublisher((configName, configSetter) =>
    {
        // Provide the configSetter with the initial value
        configSetter(RoleEnvironment.GetConfigurationSettingValue(configName));

        RoleEnvironment.Changed += (sender, arg) =>
        {
            if (arg.Changes.OfType<RoleEnvironmentConfigurationSettingChange>()
                .Any((change) => (change.ConfigurationSettingName == configName)))
            {
                // The corresponding configuration setting has changed, propagate the value
                if (!configSetter(RoleEnvironment.GetConfigurationSettingValue(configName)))
                {
                    // In this case, the change to the storage account credentials in the
                    // service configuration is significant enough that the role needs to be
                    // recycled in order to use the latest settings. (for example, the
                    // endpoint has changed)
                    RoleEnvironment.RequestRecycle();
                }
            }
        };
    });
    #endregion

    // For information on handling configuration changes
    // see the MSDN topic at http://go.microsoft.com/fwlink/?LinkId=166357.
    RoleEnvironment.Changing += RoleEnvironmentChanging;

    StartMongo();

    return base.OnStart();
}

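public override void OnStop()
{
    // Process.Close() only releases the handle and leaves mongod.exe running,
    // so end the process explicitly (abruptly) before unmounting the drive
    if (_mongoProcess != null && !_mongoProcess.HasExited)
    {
        _mongoProcess.Kill();
        _mongoProcess.WaitForExit();
    }
    if (_mongoDrive != null) _mongoDrive.Unmount();

    base.OnStop();
}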

Next is the code to create the CloudDrive and start the MongoDb process running:

private void StartMongo()
{
    // local cache drive we'll use on the VM
    LocalResource localCache = RoleEnvironment.GetLocalResource("MongoDbCache");

    Trace.TraceInformation("MongoDbCache {0} {1}", localCache.RootPath, localCache.MaximumSizeInMegabytes);
    // we'll use all the cache space we can (note: InitializeCache doesn't work with a trailing backslash)
    CloudDrive.InitializeCache(localCache.RootPath.TrimEnd('\\'), localCache.MaximumSizeInMegabytes);

    // connect to the storage account
    CloudStorageAccount storageAccount = CloudStorageAccount.FromConfigurationSetting("MongoDbData");

    // client for talking to our blob files
    CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();

    // the container that our drive is going to live in
    CloudBlobContainer drives = blobClient.GetContainerReference("drives");

    // create blob container (it has to exist before creating the cloud drive)
    try {drives.CreateIfNotExist();} catch {}

    // get the url to the vhd page blob we'll be using
    var vhdUrl = blobClient.GetContainerReference("drives").GetPageBlobReference("MongoDb.vhd").Uri.ToString();
    Trace.TraceInformation("MongoDb.vhd {0}", vhdUrl);

    // create the cloud drive
    _mongoDrive = storageAccount.CreateCloudDrive(vhdUrl);
    try
    {
        _mongoDrive.Create(localCache.MaximumSizeInMegabytes);
    }
    catch (CloudDriveException ex)
    {
        // exception is thrown if all is well but the drive already exists
    }

    // mount the drive and get the root path of the drive it's mounted as
    var dataPath = _mongoDrive.Mount(localCache.MaximumSizeInMegabytes, DriveMountOptions.Force) + @"\";
    Trace.TraceInformation("Mounted as {0}", dataPath);

    // get the internal endpoint that we're going to use for MongoDb
    var ep = RoleEnvironment.CurrentRoleInstance.InstanceEndpoints["MongoDbEndpoint"];

    // create the process to host mongo
    _mongoProcess = new Process();
    var startInfo = _mongoProcess.StartInfo;
    // so we can redirect streams
    startInfo.UseShellExecute = false;
    // we don't need a window, it's hard to see the monitor from here (jk)
    startInfo.CreateNoWindow = true;
    // the mongo daemon is included in our project in the current directory
    startInfo.FileName = @"mongod.exe";
    startInfo.WorkingDirectory = Environment.CurrentDirectory;
    // specify the ip address and port for MongoDb to use and also the path to the data
    startInfo.Arguments = string.Format(@"--bind_ip {0} --port {1} --dbpath {2} --quiet", ep.IPEndpoint.Address, ep.IPEndpoint.Port, dataPath);
    // capture mongo output to Azure log files
    startInfo.RedirectStandardError = true;
    startInfo.RedirectStandardOutput = true;
    _mongoProcess.ErrorDataReceived += (sender, evt) => Trace.WriteLine(evt.Data);
    _mongoProcess.OutputDataReceived += (sender, evt) => Trace.WriteLine(evt.Data);

    Trace.TraceInformation("Mongo Process {0}", startInfo.Arguments);

    // start mongo going
    _mongoProcess.Start();
    _mongoProcess.BeginErrorReadLine();
    _mongoProcess.BeginOutputReadLine();
}
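
One detail in the argument string is worth a closer look: the data path is passed unquoted, which is fine for a bare drive root but would break on a path containing spaces. The sketch below shows one way to quote it safely. It assumes mongod.exe follows the usual Microsoft C runtime rules for parsing its command line, where a backslash immediately before a closing quote has to be doubled. The MongoCommandLine class and its method names are my own and not part of any SDK.

```csharp
using System;
using System.Net;
using System.Text;

public static class MongoCommandLine
{
    // Wraps a path in double quotes for a Windows command line.
    // Trailing backslashes are doubled so the final \" is not read as an escaped quote.
    public static string QuotePath(string path)
    {
        if (path == null) throw new ArgumentNullException("path");

        // count the backslashes at the very end of the path
        int trailing = 0;
        while (trailing < path.Length && path[path.Length - 1 - trailing] == '\\')
        {
            trailing++;
        }

        var quoted = new StringBuilder();
        quoted.Append('"').Append(path).Append('\\', trailing).Append('"');
        return quoted.ToString();
    }

    // Builds the same mongod.exe arguments as StartMongo(), but with the data path quoted.
    public static string Build(IPEndPoint endpoint, string dataPath)
    {
        if (endpoint == null) throw new ArgumentNullException("endpoint");

        return string.Format(
            "--bind_ip {0} --port {1} --dbpath {2} --quiet",
            endpoint.Address, endpoint.Port, QuotePath(dataPath));
    }
}
```

In StartMongo() this would replace the string.Format call, along the lines of startInfo.Arguments = MongoCommandLine.Build(ep.IPEndpoint, dataPath);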

[TODO: Add more explanation !!]

So, that’s the server-side, oops, I mean Worker Role setup which will now run MongoDb and persist the data permanently. We could get fancier and have multiple roles with slave / sharded instances of MongoDb but they will follow a similar pattern.

The client side in the Web Role MVC app is very simple. The only extra work we need to do is figure out the IP address and port that Azure has set up for us so that we can connect to MongoDb. The RoleEnvironment lets us get to this, and I believe (but could be wrong, so don’t quote me) that the Azure fabric handles passing this information between roles. Once we have it we can create our connection to MongoDb as normal and save NoSQL JSON documents to our heart’s content …

var workerRoles = RoleEnvironment.Roles["WorkerRole1"];
var workerRoleInstance = workerRoles.Instances[0];
RoleInstanceEndpoint ep = workerRoleInstance.InstanceEndpoints["MongoDbEndpoint"];

string connectionString = string.Format("Server={0}:{1}", ep.IPEndpoint.Address, ep.IPEndpoint.Port);

var mongo = new Mongo(connectionString);
mongo.Connect();
var db = mongo.GetDatabase("notes");
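
One thing the snippet above glosses over is timing: the web role may come up before mongod is listening on the worker role. A minimal sketch of one way to cope with that, using only plain .NET sockets, is to probe the endpoint a few times before connecting. The names here (EndpointProbe, WaitForListener) are my own invention and not part of the Azure SDK or the MongoDb driver, and the retry counts are arbitrary.

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Threading;

public static class EndpointProbe
{
    // Tries to open a TCP connection to the endpoint, retrying up to 'attempts' times.
    // Returns true as soon as one attempt connects, false if every attempt fails.
    public static bool WaitForListener(IPEndPoint endpoint, int attempts, TimeSpan delay)
    {
        if (endpoint == null) throw new ArgumentNullException("endpoint");

        for (int i = 0; i < attempts; i++)
        {
            try
            {
                using (var client = new TcpClient())
                {
                    client.Connect(endpoint);
                    return true;
                }
            }
            catch (SocketException)
            {
                // nothing is listening yet; pause before the next attempt
                if (i < attempts - 1) Thread.Sleep(delay);
            }
        }
        return false;
    }
}
```

In the web role you would call something like EndpointProbe.WaitForListener(ep.IPEndpoint, 10, TimeSpan.FromSeconds(2)) before mongo.Connect(), and show a friendly error if it returns false.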

I hope you find this useful. I’ll try and add some extra notes to explain the code and the thinking behind it in more detail and will post some follow ups to cover deploying the app to Azure and what I’ve learned of that process.