Creating labels for GitHub issue system

I liked the ideas in Managing your backlog with GitHub Issues and the type of labels it uses, but creating them was harder than it should have been because of the unicode characters and custom colors. Using them consistently on multiple projects would mean repeating the same work each time (unless there is a ‘copy labels’ button that I haven’t noticed!).

So, I decided to write a little script to automate the process. It creates a slightly different set of labels as shown below but could be easily adapted to your own needs:

The script prompts you for your GitHub Profile, Password and Project name. You need to add curl to your path or put it in the same folder as the script.

The actual script (save as a cmd file):

@echo off
SETLOCAL
echo This script creates issue labels for a GitHub repository
echo.
echo Please specify the GitHub Profile containing the Repository, e.g.:
echo https://github.com/MyProfile/MyCoolProject
echo                    ~~~~~~~~~
set /P username= "  Enter Profile   : "
echo.
echo Please specify the GitHub password for that profile:
set /P password= "  Enter Password  : "
echo.
echo Please specify the GitHub Repository, e.g.:
echo https://github.com/MyProfile/MyCoolProject
echo                              ~~~~~~~~~~~~~
echo.
set /P repository= "  Enter Repository: "
echo.
echo Creating labels ...
curl -k -u "%username%:%password%" -d "{\"name\":\"Feature\",\"color\":\"2d9e11\"}" https://api.github.com/repos/%username%/%repository%/labels
curl -k -u "%username%:%password%" -d "{\"name\":\"Bug\",\"color\":\"e10c02\"}" https://api.github.com/repos/%username%/%repository%/labels
curl -k -u "%username%:%password%" -d "{\"name\":\"Rejected\",\"color\":\"000000\"}" https://api.github.com/repos/%username%/%repository%/labels
curl -k -u "%username%:%password%" -d "{\"name\":\"Idea\",\"color\":\"e102d8\"}" https://api.github.com/repos/%username%/%repository%/labels
curl -k -u "%username%:%password%" -d "{\"name\":\"Task\",\"color\":\"0b02e1\"}" https://api.github.com/repos/%username%/%repository%/labels
curl -k -u "%username%:%password%" -d "{\"name\":\"\u2605\",\"color\":\"fffdd6\"}" https://api.github.com/repos/%username%/%repository%/labels
curl -k -u "%username%:%password%" -d "{\"name\":\"\u2605\u2605\",\"color\":\"fff875\"}" https://api.github.com/repos/%username%/%repository%/labels
curl -k -u "%username%:%password%" -d "{\"name\":\"\u2605\u2605\u2605\",\"color\":\"fff200\"}" https://api.github.com/repos/%username%/%repository%/labels
echo.
echo Current labels ...
curl -k -u "%username%:%password%" https://api.github.com/repos/%username%/%repository%/labels
ENDLOCAL
pause

Feb 2013: Script updated to work with latest Curl for Windows that I tried on Windows 8
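
The label definitions can also be kept as data, separate from the HTTP calls, which makes adapting the set a one-line change. The sketch below is in Python rather than batch and only builds the URL and the JSON bodies that the script posts. The `LABELS` list and the `labels_url` and `label_payloads` helpers are names made up for illustration, and the actual POST is left out.

```python
import json

# Label names and colours copied from the cmd script above.
LABELS = [
    ("Feature", "2d9e11"),
    ("Bug", "e10c02"),
    ("Rejected", "000000"),
    ("Idea", "e102d8"),
    ("Task", "0b02e1"),
    ("\u2605", "fffdd6"),
    ("\u2605\u2605", "fff875"),
    ("\u2605\u2605\u2605", "fff200"),
]


def labels_url(profile, repository):
    """URL the script posts to for a given profile and repository."""
    return "https://api.github.com/repos/{}/{}/labels".format(profile, repository)


def label_payloads(labels=LABELS):
    """Return one JSON request body per label.

    json.dumps escapes non-ASCII characters by default, so the star
    labels are written as \\u2605 escapes, as they are in the cmd file.
    """
    return [json.dumps({"name": name, "color": color}) for name, color in labels]


if __name__ == "__main__":
    print(labels_url("MyProfile", "MyCoolProject"))
    for body in label_payloads():
        print(body)
```

Keeping the labels in one list means a different project only needs a different list, not a different script.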

Event-sourcing benefit: migrating between storage types

Here’s an example of one benefit of Event Sourcing …

How much work would typically result from “hey, we need to change platform and store our data in a different database …“? Even using NHibernate and going from one relational database to another, you’d potentially be looking at a significant piece of effort.

Now imagine instead trying to migrate between fundamentally different storage engines such as from SQL Server to Amazon S3 or from Oracle to MongoDB?!

Well, Event Sourcing not only makes it possible, it makes it trivially simple. Here’s an actual EventStore migrator implementation that took all of two minutes to write:

	var source = GetSourceEventStore();
	var destination = GetDestinationEventStore();

	foreach (var commit in source.GetFrom(StartDate))
		destination.Commit(commit);

Oh look, it’s done !! Ok, before you accuse me of being slow there is the event-store wire-up which is where one of the minutes went (I could add some XML DI configuration to make it really generic but it’s probably not worth the effort):

	private static IPersistStreams GetSourceEventStore()
	{
		return Wireup.Init()
			.UsingMongoPersistence("source", new DocumentObjectSerializer())
			.UsingSynchronousDispatcher()
				.PublishTo(new NullDispatcher())
			.Build()
			.Advanced;
	}

	private static IPersistStreams GetDestinationEventStore()
	{
		return Wireup.Init()
			.UsingSqlPersistence("destination")
				.WithDialect(new MsSqlDialect())
				.InitializeStorageEngine()
				.UsingBinarySerialization()
				.Compress()
			.UsingSynchronousDispatcher()
				.PublishTo(new NullDispatcher())
			.Build()
			.Advanced;
	}

BTW: The “source” and “destination” strings are the names of connection strings in the <connectionStrings> section of the config file.

Version 2 of the “EventStore Migrator Enterprise Edition™” may add the ability to store the DateTime of the last commit migrated and then restart from there to enable incremental replication. A real-life version would also need a little exception handling, which would probably use that last DateTime to continue from the right point after any failure.
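
To make the incremental idea concrete, here is a rough sketch in Python using in-memory stand-ins rather than the real EventStore API. `InMemoryStore`, `get_from`, `commit` and `migrate` are all hypothetical names, and a real version would persist the checkpoint somewhere durable between runs.

```python
from datetime import datetime


class InMemoryStore:
    """Hypothetical stand-in for an event store: an append-only list of commits."""

    def __init__(self):
        self.commits = []  # (stamp, payload) tuples in commit order

    def get_from(self, start):
        # Commits stamped at or after `start`, oldest first.
        return [c for c in self.commits if c[0] >= start]

    def commit(self, commit):
        # A commit that is already stored is ignored, so replaying is harmless.
        if commit not in self.commits:
            self.commits.append(commit)


def migrate(source, destination, checkpoint):
    """Copy commits from the checkpoint onwards and return the new checkpoint."""
    for commit in source.get_from(checkpoint):
        destination.commit(commit)
        checkpoint = commit[0]  # stamp of the last commit copied so far
    return checkpoint


if __name__ == "__main__":
    source, destination = InMemoryStore(), InMemoryStore()
    source.commit((datetime(2011, 5, 1), "opened"))
    source.commit((datetime(2011, 5, 2), "renamed"))
    checkpoint = migrate(source, destination, datetime.min)

    # Later: one more commit arrives and only the tail is copied again.
    source.commit((datetime(2011, 5, 3), "closed"))
    checkpoint = migrate(source, destination, checkpoint)
    print(len(destination.commits), checkpoint)
```

Restarting from the checkpoint re-reads the last commit already copied, which is why the destination in this sketch ignores duplicates.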

So, with a couple of minutes work you can easily migrate between different storage engines and also have a mechanism to incrementally keep them in sync – maybe for disaster recovery or backup where you use a lower cost / higher latency event store in an emergency (and still have a way to migrate back).

Event sourcing and immutable events make all this possible.

Simple Service Bus / Message Queue with MongoDB

A service bus or message queue allows producers and subscribers to communicate asynchronously so that a system can handle disconnects, processes being stopped and started, and peaks of demand beyond what the subscriber can immediately cope with. The queue acts as a buffer that the producer writes to and the subscriber reads from.

There are lots of implementations such as NServiceBus, MassTransit, Rhino Service Bus and the cloud-provided services such as Amazon’s Simple Queue Service and Windows Azure’s AppFabric Service Bus. Some take a little time to get started with and the cloud ones can also rack up charges pretty quickly if you are doing too much polling.

Often, all that is needed is something fairly simple to buffer messages between processes and persist them. I’ve been making good use of MongoDB recently in conjunction with Jonathan Oliver’s EventStore library for a CQRS-based project so it seemed the obvious place to start – why not use MongoDB to store the queue?!

Now, I did have a look round first to see if anyone else had created something already and the closest I got was the post here: Why (and How) I Replaced Amazon SQS with MongoDB. However, from reading the MongoDB website I’d seen that it had Tailable Cursors which, together with the Capped Collections feature, seemed like the ideal tools to build a queue on and possibly more efficient – in fact, MongoDB uses these very features internally for its replication.

Why are these features important?

We don’t want the queue to just grow and grow and grow but would like to put a cap on the size. Once a capped collection in MongoDB is full it wraps round and starts overwriting the oldest records. Capped collections are actually pre-allocated which helps with performance too. All we need is a collection that will be big enough to cope with any downtime from the subscriber so that messages are not lost.

Capped collections also support natural sort order, where records are read in the order they were written. That means we don’t need an index, so both reads and writes are faster because MongoDB has less work to do.
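
As an analogy only (this is not MongoDB code), a Python `deque` with a `maxlen` shows the same wrap-round behaviour on a tiny scale. One difference to keep in mind: the capped collection used later is limited by size in bytes, whereas this sketch is limited by item count.

```python
from collections import deque

# A deque with maxlen behaves like a tiny capped collection:
# once it is full, each append silently drops the oldest entry.
capped = deque(maxlen=3)
for number in range(1, 6):
    capped.append("message {}".format(number))

# Iteration is in insertion ("natural") order, with no index involved.
print(list(capped))  # ['message 3', 'message 4', 'message 5']
```

Messages 1 and 2 are gone, which is exactly what happens to a subscriber that falls further behind than the collection can hold.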

Tailable cursors block at the server so we don’t have to keep polling or give up some latency. If a cursor is opened and there is no data to return it just sits there waiting and hands over the next record as soon as it comes in. (Actually, it doesn’t wait indefinitely but for somewhere around 4 seconds. The result is the same: we only ‘poll’ every 4 seconds but get immediate notification of a new message.)
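
The same "wait a few seconds, then ask again" pattern can be sketched with Python’s standard `queue` module standing in for the server. This is only an analogy: `receive` and `AWAIT_SECONDS` are illustrative names and the 4 second figure is simply the approximate value mentioned above.

```python
import queue
import threading
import time

AWAIT_SECONDS = 4  # assumed wait per request, from the "around 4 seconds" above


def receive(messages, await_seconds=AWAIT_SECONDS):
    """Block until a message arrives, waiting again whenever a wait times out."""
    while True:
        try:
            # Returns as soon as an item is put, or raises queue.Empty on timeout.
            return messages.get(timeout=await_seconds)
        except queue.Empty:
            continue  # nothing arrived in this window, so ask again


if __name__ == "__main__":
    messages = queue.Queue()
    # A producer that sends one message half a second from now.
    threading.Timer(0.5, messages.put, args=("hello",)).start()
    started = time.monotonic()
    print(receive(messages), "after", round(time.monotonic() - started, 1), "seconds")
```

The caller is only woken when there is something to read, yet at most one request is made per wait window while the queue is idle.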

So, with the newly released Official C# MongoDB Driver in hand I set out to build my queue …

Before the details though, you can take a look at the finished result from this Jinq screen-cast:

[Video: screen-cast of the finished queue]

We’ll try and keep things really simple for this example so welcome to the simplest queue interfaces ever conceived! We just have an interface for adding things to the queue and another for reading from it:

public interface IPublish<in T> where T : class
{
    void Send(T message);
}
public interface ISubscribe<out T> where T : class
{
    T Receive();
}

And of course we need something to actually send – again, we’ll keep things simple for the demo and have a very simple message with a couple of properties:

public class ExampleMessage
{
    public int Number { get; private set; }
    public string Name { get; private set; }

    public ExampleMessage(int number, string name)
    {
        Number = number;
        Name = name;
    }

    public override string ToString()
    {
        return string.Format("ExampleMessage Number:{0} Name:{1}", Number, Name);
    }
}

The ExampleMessage will be the generic <T> parameter to the queue interfaces but we’re going to want to store a bit more information in MongoDB than the message itself so we’ll also use a MongoMessage class to add the extra properties and act as a container / wrapper for the message itself. Nothing outside of the queue will ever see this though:

public class MongoMessage<T> where T : class
{
    public ObjectId Id { get; private set; }
    public DateTime Enqueued { get; private set; }
    public T Message { get; private set; }

    public MongoMessage(T message)
    {
        Enqueued = DateTime.UtcNow;
        Message = message;
    }
}

This gives each message that we send an Id and also records the date / time that it was enqueued (which would enable us to work out the latency of the queue). The Id is an ObjectId, the default document ID type that MongoDB uses. All of the messages that we write to our queue will be assigned an Id and these should be sortable, which lets us pick up our position when reading from the queue should we need to re-start.
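
Why should the Ids be sortable? As I understand the layout, an ObjectId is 12 bytes that start with a 4-byte big-endian timestamp in seconds and end with a 3-byte incrementing counter, so comparing the raw bytes orders Ids by creation time first. The Python sketch below uses a simplified stand-in (`make_id`, with the middle bytes zeroed) rather than a real driver, and `latency` is just the subtraction the Enqueued stamp makes possible.

```python
import struct
from datetime import datetime, timedelta


def make_id(seconds, counter, middle=b"\x00" * 5):
    """Simplified 12-byte id: 4-byte big-endian timestamp, 5 filler bytes, 3-byte counter."""
    return struct.pack(">I", seconds) + middle + counter.to_bytes(3, "big")


def latency(enqueued, received):
    """Time a message spent in the queue, based on its Enqueued stamp."""
    return received - enqueued


if __name__ == "__main__":
    ids = [make_id(100, 2), make_id(99, 7), make_id(100, 1)]
    # Byte-wise comparison sorts by timestamp first, then by counter.
    for value in sorted(ids):
        print(value.hex())
    print(latency(datetime(2011, 5, 19, 10, 0, 0), datetime(2011, 5, 19, 10, 0, 2)))
```

Within the same second, Ids generated by different machines or processes are not guaranteed to sort in insertion order, which is one reason the queue also reads in natural order.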

Here is what the messages look like inside of MongoDB (via the excellent MongoVUE GUI tool):

With the interfaces and commands in place we can add a couple of projects to show how each side will be used. First the producer which will just write commands to our queue:

class Program
{
    private static readonly ManualResetEvent Reset = new ManualResetEvent(false);
    private static long lastWrite;
    private static long writeCount;
    private static Timer timer;
    private static readonly object _sync = new object();

    static void Main(string[] args)
    {
        Console.WriteLine("Publisher");
        Console.WriteLine("Press 'R' to Run, 'P' to Pause, 'X' to Exit ...");

        timer = new Timer(TickTock, null, 1000, 1000);

        var t = new Thread(Run);
        t.Start();

        var running = true;
        while (running)
        {
            if (!Console.KeyAvailable) continue;

            var keypress = Console.ReadKey(true);
            switch (keypress.Key)
            {
                case ConsoleKey.X:
                    Reset.Reset();
                    running = false;
                    break;
                case ConsoleKey.P:
                    Reset.Reset();
                    Console.WriteLine("Paused ...");
                    break;
                case ConsoleKey.R:
                    Reset.Set();
                    Console.WriteLine("Running ...");
                    break;
            }
        }

        t.Abort();
    }

    public static void Run()
    {
        IPublish<ExampleMessage> queue = Configuration.GetQueue<ExampleMessage>();

        var i = 0;

        while (true)
        {
            Reset.WaitOne();
            i++;

            var message = new ExampleMessage(i, "I am number " + i);
            queue.Send(message);
            Interlocked.Increment(ref writeCount);

            if (i == int.MaxValue)
                i = 0;
        }
    }

    public static void TickTock(object state)
    {
        lock (_sync)
        {
            Console.WriteLine("Sent {0} {1}", writeCount, writeCount - lastWrite);
            lastWrite = writeCount;
        }
    }
}

… and the consumer which will read from the queue:

class Program
{
    private static readonly ManualResetEvent Reset = new ManualResetEvent(false);
    private static long lastRead;
    private static long readCount;
    private static Timer timer;
    private static readonly object _sync = new object();

    static void Main(string[] args)
    {
        Console.WriteLine("Subscriber");
        Console.WriteLine("Press 'R' to Run, 'P' to Pause, 'X' to Exit ...");

        timer = new Timer(TickTock, null, 1000, 1000);

        var t = new Thread(Run);
        t.Start();

        var running = true;
        while (running)
        {
            if (!Console.KeyAvailable) continue;

            var keypress = Console.ReadKey(true);
            switch (keypress.Key)
            {
                case ConsoleKey.X:
                    Reset.Reset();
                    running = false;
                    break;
                case ConsoleKey.P:
                    Reset.Reset();
                    Console.WriteLine("Paused ...");
                    break;
                case ConsoleKey.R:
                    Reset.Set();
                    Console.WriteLine("Running ...");
                    break;
            }
        }

        t.Abort();
    }

    public static void Run()
    {
        ISubscribe<ExampleMessage> queue = Configuration.GetQueue<ExampleMessage>();

        while (true)
        {
            Reset.WaitOne();
            var message = queue.Receive();
            Interlocked.Increment(ref readCount);
        }
    }

    public static void TickTock(object state)
    {
        lock (_sync)
        {
            Console.WriteLine("Received {0} {1}", readCount, readCount - lastRead);
            lastRead = readCount;
        }
    }
}

Both show the total number of messages sent or received and also the number in the last second.

Finally, the MongoQueue implementation. It could be a little simpler but I wanted to make things as simple as possible for the consumers, and it should be easy enough to follow.

public class MongoQueue<T> : IPublish<T>, ISubscribe<T> where T : class
{
    private readonly MongoDatabase _database;
    private readonly MongoCollection<MongoMessage<T>> _queue;	// the collection for the messages
    private readonly MongoCollection<BsonDocument> _position;	// used to record the current position
    private readonly QueryComplete _positionQuery;

    private ObjectId _lastId = ObjectId.Empty;					// the last _id read from the queue

    private MongoCursorEnumerator<MongoMessage<T>> _enumerator;	// our cursor enumerator
    private bool _startedReading = false;						// initial query on an empty collection is a special case

    public MongoQueue(string connectionString, long queueSize)
    {
        // our queue name will be the same as the message class
        var queueName = typeof(T).Name;
        _database = MongoDatabase.Create(connectionString);

        if (!_database.CollectionExists(queueName))
        {
            try
            {
                Console.WriteLine("Creating queue '{0}' size {1}", queueName, queueSize);

                var options = CollectionOptions
                    // use a capped collection so space is pre-allocated and re-used
                    .SetCapped(true)
                    // we don't need the default _id index that MongoDB normally creates automatically
                    .SetAutoIndexId(false)
                    // limit the size of the collection and pre-allocate the space to this number of bytes
                    .SetMaxSize(queueSize);

                _database.CreateCollection(queueName, options);
            }
            catch
            {
                // assume that any exceptions are because the collection already exists ...
            }
        }

        // get the queue collection for our messages
        _queue = _database.GetCollection<MongoMessage<T>>(queueName);

        // check if we already have a 'last read' position to start from
        _position = _database.GetCollection("_queueIndex");
        var last = _position.FindOneById(queueName);
        if (last != null)
            _lastId = last["last"].AsObjectId;

        _positionQuery = Query.EQ("_id", queueName);
    }

    public void Send(T message)
    {
        // sending a message is easy - we just insert it into the collection
        // it will be given a new sequential Id and also be written to the end (of the capped collection)
        _queue.Insert(new MongoMessage<T>(message));
    }

    public T Receive()
    {
        // for reading, we give the impression to the client that we provide a single message at a time
        // which means we maintain a cursor and enumerator in the background and hide it from the caller

        if (_enumerator == null)
            _enumerator = InitializeCursor();

        // there is no end when you need to sit and wait for messages to arrive
        while (true)
        {
            try
            {
                // do we have a message waiting?
                // this may block on the server for a few seconds but will return as soon as something is available
                if (_enumerator.MoveNext())
                {
                    // yes - record the current position and return it to the client
                    _startedReading = true;
                    _lastId = _enumerator.Current.Id;
                    _position.Update(_positionQuery, Update.Set("last", _lastId), UpdateFlags.Upsert, SafeMode.False);
                    return _enumerator.Current.Message;
                }

                if (!_startedReading)
                {
                    // for an empty collection, we'll need to re-query to be notified of new records
                    Thread.Sleep(500);
                    _enumerator.Dispose();
                    _enumerator = InitializeCursor();
                }
                else
                {
                    // if the cursor is dead then we need to re-query, otherwise we just go back to iterating over it
                    if (_enumerator.IsDead)
                    {
                        _enumerator.Dispose();
                        _enumerator = InitializeCursor();
                    }
                }
            }
            catch (IOException)
            {
                _enumerator.Dispose();
                _enumerator = InitializeCursor();
            }
            catch (SocketException)
            {
                _enumerator.Dispose();
                _enumerator = InitializeCursor();
            }
        }
    }

    private MongoCursorEnumerator<MongoMessage<T>> InitializeCursor()
    {
        var cursor = _queue
            .Find(Query.GT("_id", _lastId))
            .SetFlags(
                QueryFlags.AwaitData |
                QueryFlags.NoCursorTimeout |
                QueryFlags.TailableCursor
            )
            .SetSortOrder(SortBy.Ascending("$natural"));

        return (MongoCursorEnumerator<MongoMessage<T>>)cursor.GetEnumerator();
    }
}

After opening a cursor we get an enumerator and try to read records. The call to MoveNext() will block for a few seconds if we’re already at the end of the cursor and may then time out without returning anything. In this case we just loop round and call MoveNext() again; we don’t need to re-run the query because the cursor is still connected and available and we just need to ‘get more’ on it.

The reason for the _startedReading flag is that the initial query against an empty collection will result in an invalid cursor and we need to re-query in this case. However, we don’t want to re-query after that as it’s more efficient to let the cursor wait for additional results (unless the cursor is dead, in which case we do need to re-query).

Occasionally, the connection will be broken which will cause an exception so we need to catch that and set up the cursor and enumerator again.

Assuming we got a record back, we return it to the client and the next call to Receive() picks up the next item. We also store the position of the last item read in the queue so that when we re-start we can skip any existing entries.
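
The position tracking is easy to lose among the cursor handling, so here it is on its own as a small Python sketch. Everything in it is a stand-in: a list plays the queue, a dict plays the "_queueIndex" collection, and `ResumableReader` is a hypothetical name.

```python
class ResumableReader:
    """Reads (id, message) pairs in order and remembers the last id it returned."""

    def __init__(self, log, positions, name):
        self.log = log              # list of (id, message) pairs, ids ascending
        self.positions = positions  # dict standing in for the "_queueIndex" collection
        self.name = name            # queue name, used as the key for the position

    def receive(self):
        last = self.positions.get(self.name, 0)
        for message_id, message in self.log:
            if message_id > last:  # same idea as Query.GT("_id", _lastId)
                self.positions[self.name] = message_id
                return message
        return None  # nothing new; the real queue blocks here instead


if __name__ == "__main__":
    log = [(1, "a"), (2, "b"), (3, "c")]
    positions = {}
    first = ResumableReader(log, positions, "ExampleMessage")
    print(first.receive(), first.receive())

    # A "re-started" reader sharing the stored position skips what was already read.
    second = ResumableReader(log, positions, "ExampleMessage")
    print(second.receive())
```

Because the position is saved after each message is handed over, a crash between the two steps can mean one message is delivered again after a re-start.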

Here is an explanation of the query flags.

Query Flags:

AwaitData

If we get to the end of the cursor and there is no data we’d like the server to wait for a while until some more arrives. The default appears to be around 2-4 seconds.

TailableCursor

Indicates that we want a tailable cursor where it will wait for new data to arrive.

NoCursorTimeout

We don’t want our cursor to timeout.
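
The flags are combined with a bitwise OR, which works because each one is a separate bit. The Python sketch below mirrors that with an `IntFlag`. The numeric values are what I understand the legacy OP_QUERY wire protocol to use, so treat them as an assumption and check them against the driver before relying on them.

```python
from enum import IntFlag


class QueryFlags(IntFlag):
    """Bit values assumed from the legacy OP_QUERY wire protocol."""
    TAILABLE_CURSOR = 2
    NO_CURSOR_TIMEOUT = 16
    AWAIT_DATA = 32


# The same combination the queue passes to SetFlags().
flags = QueryFlags.AWAIT_DATA | QueryFlags.NO_CURSOR_TIMEOUT | QueryFlags.TAILABLE_CURSOR

print(int(flags))                    # 50
print(QueryFlags.AWAIT_DATA in flags)  # True
```

Since every flag occupies its own bit, the order in which they are combined makes no difference.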

So there it is – a simple and easy to use message queue or service bus that hopefully makes splitting an app into multiple processes with re-startability and fast asynchronous communication a little less challenging. I’ve found the performance of MongoDB to be outstanding and the ease of setting this up beats the ‘proper’ message queue solutions. When it comes to the cloud, the small amount of blocking that the cursor does at the server saves us having to do a lot of polling while still giving us the fast low-latency response we want.

Please let me know what you think of the article and if you run into any issues or have any ideas for improvement for this approach.

UPDATED: Source code now on GitHub (https://github.com/CaptainCodeman/mongo-queue)

Running ElasticSearch as a Service on Windows 2008 x64

I think I first started using Apache Lucene for full-text indexing as part of NHibernate Search. At some point I decided I needed more control and did my own indexing using Lucene directly. Now, it seems the easiest approach is to make use of a packaged up search service and so I’ve been looking at ElasticSearch. So far, I’m very happy with it – it’s doing everything it says on the box and lets me offload all the full-text indexing and search functionality.

The only issue I’ve come across is trying to run it as a service on 64-bit Windows 7 or Windows 2008. While there is a service-wrapper available it just wasn’t working for me and I think the x64 platform may be part of that as there was only an elasticsearch-windows-x86-32.exe included, no elasticsearch-windows-x86-64.exe. This service wrapper seems to be based off a product that doesn’t appear to have a free community edition for 64-bit Windows.

So, I had a hunt around for ‘how to run a Java app as a Windows Service’ and came across the Apache Commons Daemon or ‘procrun‘. This worked so I thought I’d share it here in case anyone else is trying to do the same thing.

First of all, there are the pre-requisites: it’s a Java app so you need to have the Sun Java SDK installed and the JAVA_HOME environment variable set.

Download ElasticSearch and extract it to a folder. I’m using 0.16.0 and put it into D:\elasticsearch (because Program Files and UAC caused too many issues for me).

Before trying to set it to run as a service it’s best to make sure it runs as a regular app first. To start ElasticSearch on Windows there is a “bin\elasticsearch.bat” file to launch it which should show it running. As an extra check, there is a handy little web-based admin tool you can get called elasticsearch-head which will show the running status and provides a neat little browser / search interface. I extracted this to D:\elasticsearch\tools. When you open the index.html file it lets you connect to your elasticsearch instance and shows its status:

[Screenshot: elasticsearch-head showing the status of the running instance]

Downloading the Apache Commons Daemon or procrun is a little harder because it isn’t in the links on the download page. Instead you need to follow the ‘browse native binaries download area…’ link, then look in the windows folder for the zip file. The file I used was: commons-daemon-1.0.5-bin-windows.zip

Extract this to D:\elasticsearch\service and then copy the amd64\prunsrv.exe to the D:\elasticsearch\service folder to replace the x86 version (or skip this step if you are actually running on a 32-bit OS).

Although we can set everything up with the exe files as they are, we’re going to rename them because it makes it clearer what is running on Windows Task Manager if you have other processes using this service runner. The convention is to use the service name and append a ‘w’ to the GUI manager exe so they become:

prunsrv.exe => ElasticSearch.exe
prunmgr.exe => ElasticSearchw.exe

Because we’ll be running things as a service it will be running under a different account than the regular process does when we run it interactively. I used the ‘NETWORK SERVICE’ account which is able to handle network traffic and gave this account full permissions to the D:\elasticsearch folder so it will also be able to create data and log files.

Figuring out the command line to actually run the service is what took the longest. With a bit of trial and error and looking at the output from the batch file to launch elasticsearch I ended up with this which ‘works on my machine’. If it doesn’t work on yours try enabling the echo output from the batch file and checking the parameters are the same.

It’s easiest to put this into a create.cmd file so that it can be edited and re-run:

ElasticSearch.exe //IS//ElasticSearch --DisplayName="ElasticSearch" --Description="Distributed RESTful Full-Text Search Engine based on Lucene (http://www.elasticsearch.org/)" --Install=D:\elasticsearch\service\ElasticSearch.exe --Classpath="D:\elasticsearch\lib\elasticsearch-0.16.0.jar;D:\elasticsearch\lib\*;D:\elasticsearch\lib\sigar\*" --Jvm="C:\Program Files\Java\jre6\bin\server\jvm.dll" --JvmMx=512 --JvmOptions="-Xms256m;-Xmx1g;-XX:+UseCompressedOops;-Xss128k;-XX:+UseParNewGC;-XX:+UseConcMarkSweepGC;-XX:+CMSParallelRemarkEnabled;-XX:SurvivorRatio=8;-XX:MaxTenuringThreshold=1;-XX:CMSInitiatingOccupancyFraction=75;-XX:+UseCMSInitiatingOccupancyOnly;-XX:+HeapDumpOnOutOfMemoryError;-Djline.enabled=false;-Delasticsearch;-Des-foreground=yes;-Des.path.home=D:\elasticsearch" --StartMode=jvm --StartClass=org.elasticsearch.bootstrap.Bootstrap --StartMethod=main --StartParams="" --StopMode=jvm --StopClass=org.elasticsearch.bootstrap.Bootstrap --StopMethod=main --StdOutput=auto --StdError=auto --LogLevel=Debug --LogPath="D:\elasticsearch\logs" --LogPrefix=service --ServiceUser="NT AUTHORITY\NetworkService" --Startup=auto

Phew !
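
A command line that long is easy to break with one missing quote, so it can help to generate it from a list of settings. Here is a rough Python sketch of that idea. `build_install_command` is a made-up helper, its quoting rule is a simplification (it only quotes values that are empty or contain a space or a semicolon), and only a handful of the options from the full command are shown.

```python
def build_install_command(exe, service, options):
    """Join procrun-style --Name=value options into one command line.

    Values that are empty or contain a space or a semicolon are wrapped
    in double quotes; everything else is passed through unchanged.
    """
    parts = [exe, "//IS//" + service]
    for name, value in options:
        if value == "" or " " in value or ";" in value:
            value = '"{}"'.format(value)
        parts.append("--{}={}".format(name, value))
    return " ".join(parts)


# A few of the JVM options from the command above, joined with semicolons.
JVM_OPTIONS = ["-Xms256m", "-Xmx1g", "-Des.path.home=D:\\elasticsearch"]

command = build_install_command(
    "ElasticSearch.exe",
    "ElasticSearch",
    [
        ("DisplayName", "ElasticSearch"),
        ("JvmOptions", ";".join(JVM_OPTIONS)),
        ("StartMode", "jvm"),
        ("Startup", "auto"),
    ],
)
print(command)
```

Keeping the JVM options as a list also makes it simple to compare them with the ones the batch file echoes.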

Running that should create the service and running the ElasticSearchw.exe should now pop up a GUI that lets us view and edit all the settings. The various tabs are shown below and should correspond to the settings defined above:

[Screenshots of the settings tabs: General, Log On, Logging, Java, Startup, Shutdown]

You can also have the GUI run as a task-tray which gives you a handy way to start and stop the service while you’re developing. To do this, create a monitor.cmd file with the following command:

start ElasticSearchw.exe //MS

You should be able to right-click on the new tray icon and start the service:

[Screenshot: starting the service from the tray icon]

This isn’t mandatory though – the service should appear in the normal Windows Service Manager where it can be stopped and started as usual:

[Screenshot: the ElasticSearch service in the Windows Service Manager]

Whether everything starts or not, you should get some useful information written to the log files. Here’s how mine looked after the service was started successfully.

service.2011-05-19.log:

[2011-05-19 10:21:30] [debug] ( prunsrv.c:1494) Commons Daemon procrun log initialized
[2011-05-19 10:21:30] [info]  (          :0   ) Commons Daemon procrun (1.0.5.0 64-bit) started
[2011-05-19 10:21:30] [info]  (          :0   ) Running 'ElasticSearch' Service...
[2011-05-19 10:21:30] [debug] ( prunsrv.c:1246) Inside ServiceMain...
[2011-05-19 10:21:30] [info]  (          :0   ) Starting service...
[2011-05-19 10:21:30] [debug] ( javajni.c:206 ) loading jvm 'C:\Program Files\Java\jre6\bin\server\jvm.dll'
[2011-05-19 10:21:30] [debug] ( javajni.c:660 ) Jvm Option[0] -Xms256m
[2011-05-19 10:21:30] [debug] ( javajni.c:660 ) Jvm Option[1] -Xmx1g
[2011-05-19 10:21:30] [debug] ( javajni.c:660 ) Jvm Option[2] -XX:+UseCompressedOops
[2011-05-19 10:21:30] [debug] ( javajni.c:660 ) Jvm Option[3] -Xss128k
[2011-05-19 10:21:30] [debug] ( javajni.c:660 ) Jvm Option[4] -XX:+UseParNewGC
[2011-05-19 10:21:30] [debug] ( javajni.c:660 ) Jvm Option[5] -XX:+UseConcMarkSweepGC
[2011-05-19 10:21:30] [debug] ( javajni.c:660 ) Jvm Option[6] -XX:+CMSParallelRemarkEnabled
[2011-05-19 10:21:30] [debug] ( javajni.c:660 ) Jvm Option[7] -XX:SurvivorRatio=8
[2011-05-19 10:21:30] [debug] ( javajni.c:660 ) Jvm Option[8] -XX:MaxTenuringThreshold=1
[2011-05-19 10:21:30] [debug] ( javajni.c:660 ) Jvm Option[9] -XX:CMSInitiatingOccupancyFraction=75
[2011-05-19 10:21:30] [debug] ( javajni.c:660 ) Jvm Option[10] -XX:+UseCMSInitiatingOccupancyOnly
[2011-05-19 10:21:30] [debug] ( javajni.c:660 ) Jvm Option[11] -XX:+HeapDumpOnOutOfMemoryError
[2011-05-19 10:21:30] [debug] ( javajni.c:660 ) Jvm Option[12] -Djline.enabled=false
[2011-05-19 10:21:30] [debug] ( javajni.c:660 ) Jvm Option[13] -Delasticsearch
[2011-05-19 10:21:30] [debug] ( javajni.c:660 ) Jvm Option[14] -Des-foreground=yes
[2011-05-19 10:21:30] [debug] ( javajni.c:660 ) Jvm Option[15] -Des.path.home=D:\elasticsearch
[2011-05-19 10:21:30] [debug] ( javajni.c:660 ) Jvm Option[16] -Djava.class.path=C:\Program Files (x86)\Java\jre6\lib\ext\QTJava.zip;D:\elasticsearch\lib\elasticsearch-0.16.1.jar;D:\elasticsearch\lib\elasticsearch-0.16.1.jar;D:\elasticsearch\lib\jline-0.9.94.jar;D:\elasticsearch\lib\jna-3.2.7.jar;D:\elasticsearch\lib\log4j-1.2.15.jar;D:\elasticsearch\lib\lucene-analyzers-3.1.0.jar;D:\elasticsearch\lib\lucene-core-3.1.0.jar;D:\elasticsearch\lib\lucene-highlighter-3.1.0.jar;D:\elasticsearch\lib\lucene-memory-3.1.0.jar;D:\elasticsearch\lib\lucene-queries-3.1.0.jar;D:\elasticsearch\lib\sigar\sigar-1.6.4.jar
[2011-05-19 10:21:30] [debug] ( javajni.c:660 ) Jvm Option[17] -Xmx512m
[2011-05-19 10:21:31] [debug] ( javajni.c:891 ) Java Worker thread started org/elasticsearch/bootstrap/Bootstrap:main
[2011-05-19 10:21:32] [debug] ( prunsrv.c:1058) Java started org/elasticsearch/bootstrap/Bootstrap
[2011-05-19 10:21:32] [info]  (          :0   ) Service started in 2066 ms.
[2011-05-19 10:21:32] [debug] ( prunsrv.c:1369) Waiting for worker to finish...
[2011-05-19 10:21:39] [debug] ( javajni.c:907 ) Java Worker thread finished org/elasticsearch/bootstrap/Bootstrap:main with status=0
[2011-05-19 10:21:39] [debug] ( prunsrv.c:1374) Worker finished.
[2011-05-19 10:21:39] [debug] ( prunsrv.c:1397) Waiting for all threads to exit

elasticsearch.log:

[2011-05-19 10:21:32,709][INFO ][node                     ] [Hack] {elasticsearch/0.16.1}[2344]: initializing ...
[2011-05-19 10:21:32,711][INFO ][plugins                  ] [Hack] loaded []
[2011-05-19 10:21:36,149][INFO ][node                     ] [Hack] {elasticsearch/0.16.1}[2344]: initialized
[2011-05-19 10:21:36,150][INFO ][node                     ] [Hack] {elasticsearch/0.16.1}[2344]: starting ...
[2011-05-19 10:21:36,268][INFO ][transport                ] [Hack] bound_address {inet[/0.0.0.0:9300]}, publish_address {inet[/10.0.1.8:9300]}
[2011-05-19 10:21:39,311][INFO ][cluster.service          ] [Hack] new_master [Hack][Gkn9PLFTR0KdX2X__ybpIQ][inet[/10.0.1.8:9300]], reason: zen-disco-join (elected_as_master)
[2011-05-19 10:21:39,337][INFO ][discovery                ] [Hack] elasticsearch/Gkn9PLFTR0KdX2X__ybpIQ
[2011-05-19 10:21:39,351][INFO ][gateway                  ] [Hack] recovered [0] indices into cluster_state
[2011-05-19 10:21:39,366][INFO ][http                     ] [Hack] bound_address {inet[/0.0.0.0:9200]}, publish_address {inet[/10.0.1.8:9200]}
[2011-05-19 10:21:39,366][INFO ][node                     ] [Hack] {elasticsearch/0.16.1}[2344]: started

Hopefully, this helps you get ElasticSearch up and running as a service on Windows x64. It’s a great app and really worth looking at. I’m hoping to make good use of it on a couple of projects, particularly the faceted search feature.

Limit MongoDB memory use on Windows without Virtualization

I’ve seen the question of how to control MongoDB’s memory usage on Windows come up several times and the stock answer always seemed to be “you can’t – it uses memory-mapped files and if you want to limit resources you need to use some form of virtualization to do it (Hyper-V, VMware, Virtuozzo etc…)”.


If you are using MongoDB on a dedicated server then you generally want it to use all the memory it can but if you want to use it on a server shared with other processes (e.g. an IIS website using MongoDB for storage, maybe with SQL Server as well) then you will want to put a cap on how much it uses to ensure memory is kept available for the other processes.

So is it possible if you are not on a virtualized environment? Yes (otherwise this would be a very short blog post!) and we’ll explore how …

The standard behaviour described above is actually a result of the default resource manager used by Windows but both Windows 2003 and Windows 2008 have a separate installable option called the “Windows System Resource Manager” (WSRM) that allows greater control over the CPU and Memory available to a process.

First of all, let’s look at what we’re trying to solve. Here we have a low-memory server (only 2GB) running MongoDB on Windows 2008 R2 x64. There are a few databases of a few GB each, so the mongod.exe process quickly starts consuming as much memory as it can (rightly so) to keep as much of its indexes in memory as possible for the fast performance we know and love:

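[Screenshot: memory usage on the server before the limit is applied]

[Screenshot: process list showing mongod.exe memory usage]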

What we’d like to do is save some memory for other processes by limiting the mongod.exe process to 1GB in this case (I know this is ridiculously low but the only thing that will change for you are the actual limits you want to use).

To do this we first need to install Windows System Resource Manager which on Windows 2008 is available under the Features section of the Server Manager.

[Screenshot: Windows System Resource Manager in the Server Manager Features list]

Once that’s installed fire it up and you’ll see the default resource management policies. By default the standard Windows “memory is given to whoever shouts loudest” policy is used but other pre-configured alternatives are available. WSRM also provides a calendar / event system where the policy can be changed at certain times (a typical scenario is giving critical business apps priority during the day but then batch processes greater priority overnight). We’re not going to go into the calendar features here but it’s interesting to know about.

Let’s create a new policy to control the resources that MongoDB can consume. To do this, right-click on the “Resource Allocation Policies” container and choose “New Resource Allocation Policy …”. This will present us with the New Resource Allocation Policy dialog below:

[Screenshot: New Resource Allocation Policy dialog]

First of all, we need to add a new resource allocation entry so click the ‘Add…’ button and we get to another “Add or Edit Resource Allocation” dialog:

[Screenshot: Add or Edit Resource Allocation dialog]

We don’t have a Process matching criteria for MongoDB yet so choose <New…> to get … yes, you guessed – another dialog, this time “New Process Matching Criteria”. We’ll call it “mongod_process” and click the Add… button to get another death-by-dialog to define it.

There are a few ways to do this – if MongoDB is installed as a service then you can choose “Registered Service” in the drop-down, click “Select” and choose it from the list or you can select from a list of running processes or you can just enter the full path and filename to mongod.exe. Here is the entry after selecting an installed MongoDB Windows Service:

[Screenshot: process matching rule for the MongoDB Windows service]

After clicking OK we get back to the Process Matching Criteria dialog showing our new rule:

[Screenshot: Process Matching Criteria dialog with the new rule]

After clicking OK we’re now back at the Resource Allocation dialog with the new “mongod_process” Process matching criteria selected and can now decide what resources we want to allocate to the process. Let’s limit the CPU to 50% (not that MongoDB seems to consume much CPU):

[Screenshot: Resource Allocation dialog, General tab, CPU limited to 50%]

The Memory tab allows us to limit the memory and here there are two options. The maximum committed memory limit is more to control apps that may have a memory leak and can be set up to stop or alert someone when the process goes above the configured limit. We don’t want this one … instead we’ll set a maximum working set limit which will control how much memory is allocated to MongoDB. In this case, we’ll set the limit to 1GB but the actual value to use will depend on your circumstances:

[Screenshot: Resource Allocation dialog, Memory tab, working set limited to 1GB]

After clicking OK we should then be at the Resource Allocation Policy dialog with our process matching criteria, CPU and memory limits shown. We could include more limits in the policy but we’ll leave it as it is for now – any remaining resources will be allocated to other processes as normal after the limits have been imposed.

[Screenshot: Resource Allocation Policy dialog with the new limits]

The final piece is to make this policy active, which is done by clicking on the “Selected Policy” link on the main ‘page’ or right-clicking on the new entry under “Resource Allocation Policies” and choosing “Set as Managing Policy”. You can also right-click on the “Windows System Resource Manager (local)” entry and choose “Properties …” to display the dialog below which allows you to select the current resource allocation policy:

[Screenshot: Windows System Resource Manager properties dialog]

So, we’ve created a new policy with a criterion that matches the mongod.exe process and limits its CPU usage to 50% and its memory to 1GB … does it work? Here’s the result after it’s enabled, showing the memory used immediately dropping:

[Screenshot: memory usage dropping after the policy is enabled]

… and the MongoDB / mongod.exe process using the 1GB limit we specified (1GB = 1,024MB = 1,048,576KB).

[Screenshot: process list showing mongod.exe at the 1GB limit]
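
If you’d rather check the number from code than from Task Manager, here is a minimal sketch that prints the working set of any running mongod processes in KB. The class and method names are my own for illustration; the only framework calls used are Process.GetProcessesByName and the WorkingSet64 property.

```csharp
using System;
using System.Diagnostics;

static class WorkingSetCheck
{
    // Convert a byte count to whole kilobytes (1 KB = 1024 bytes)
    public static long BytesToKilobytes(long bytes)
    {
        return bytes / 1024;
    }

    static void Main()
    {
        // "mongod" is the process name without the .exe extension;
        // nothing is printed if no such process is running
        foreach (Process p in Process.GetProcessesByName("mongod"))
        {
            Console.WriteLine("{0} (pid {1}): working set {2:N0} KB",
                p.ProcessName, p.Id, BytesToKilobytes(p.WorkingSet64));
        }
    }
}
```

With the 1GB policy active you would expect the reported figure to sit at or below 1,048,576 KB.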

So, we’ve successfully limited the CPU and memory that MongoDB can consume without having to resort to any form of server virtualization. MongoDB will probably not run as fast as it did when it had free rein to consume as much as it wanted (or rather, when the default Windows resource manager gave it what it asked for), but we will probably have a faster overall system because our other processes are allocated the memory and CPU that they need, giving a better balanced system.

Please let me know what you think of the above technique and if you find it useful.

Error with Azure local development storage and table named ‘event’

While working on an Azure event-sourcing provider for my CQRS framework I came across a really strange problem, so I’m posting the details in case anyone else comes across a similar issue and to save them wasting as much time on it as I did! Basically, the local development storage doesn’t seem to like you having a table called ‘event’ (I haven’t tested it on the live system).

Here is some test code to demonstrate the behavior:

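using System;
using System.Data.Services.Common;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

class Program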
{
    static void Main(string[] args)
    {
        const string tableName = "test";
        var storageAccount = CloudStorageAccount.DevelopmentStorageAccount;

        var tableClient = storageAccount.CreateCloudTableClient();
        tableClient.CreateTableIfNotExist(tableName);

        var dataContext = tableClient.GetDataServiceContext();

        var bobdylan = new Artist
            {
                PartitionKey = "folk",
                RowKey = "bob-dylan",
                Name = "Bob Dylan"
            };

        dataContext.AddObject(tableName, bobdylan);
        dataContext.SaveChanges();
    }
}

[DataServiceKey("PartitionKey", "RowKey")]
public class Artist
{
    public virtual String PartitionKey { get; set; }
    public virtual String RowKey { get; set; }
    public DateTime Timestamp { get; set; }
    public string Name { get; set; }
}

With the table called ‘test’ in the example shown everything works fine and the table and data appear in the storage explorer:

If we change the table name to “event” though we’ll get a DataServiceRequestException raised when we attempt to save changes:

Strangely, the table does get created OK but just won’t let you save anything into it. I originally thought this may be an issue with a SQL reserved word (because the development storage is simulated using a SQL database) but I’m not sure this is the actual cause.

I guess I can’t have an ‘event’ table like I wanted and will have to settle for ‘events’ instead!
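
To avoid tripping over this again, one option is to check table names up front. The sketch below is only an illustration: the helper name and the list of problem names are my own assumptions (the only entry, ‘event’, comes from the behaviour described above), while the regular expression reflects the documented table naming rules (alphanumeric only, starting with a letter, 3 to 63 characters).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class TableNames
{
    // Documented rule: letters and digits only, starts with a letter, 3 to 63 characters
    static readonly Regex ValidName = new Regex("^[A-Za-z][A-Za-z0-9]{2,62}$");

    // Assumption: names seen to fail in development storage; only "event" is known so far
    static readonly HashSet<string> KnownProblemNames =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "event" };

    public static bool IsSafe(string name)
    {
        return name != null
            && ValidName.IsMatch(name)
            && !KnownProblemNames.Contains(name);
    }

    static void Main()
    {
        foreach (string name in new[] { "test", "event", "events", "1events" })
        {
            Console.WriteLine("{0}: {1}", name, IsSafe(name));
        }
    }
}
```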

Running MongoDb on Microsoft Windows Azure with CloudDrive

I’ve been playing around with the whole CQRS approach and think MongoDb works really well for the query side of things. I also figured it was time I tried Azure so I had a look round the web to see if there were instructions on how to run MongoDb on Microsoft’s Azure cloud. It turned out there were only a few mentions of it or a general approach that should work but no detailed instructions on how to do it. So, I figured I’d give it a go and for a total-Azure-newbie it didn’t turn out to be too difficult.

Obviously you’ll need an Azure account which you may get with MSDN or you can sign-up for their ‘free’ account which has a limited number of hours included before you have to start paying. One thing to be REALLY careful of though – just deploying an app to Azure starts the clock running and leaving it deployed but turned off counts as hours so be sure to delete any experimental deployments you make after trying things out!!

First of all though it’s important to understand where MongoDb would fit with Azure. Each web or worker role runs as a virtual machine which has an amount of local storage included depending on the size of the VM. Currently the four pre-defined VMs are:

  • Small: 1 core processor, 1.7GB RAM, 250GB hard disk
  • Medium: 2 core processors, 3.5GB RAM, 500GB hard disk
  • Large: 4 core processors, 7GB RAM, 1000GB hard disk
  • Extra Large: 8 core processors, 15GB RAM, 2000GB hard disk

This local storage is only temporary though and while it can be used for processing by the role instance running it isn’t available to any others and when the instance is moved, upgraded or recycled then it is lost forever (as in, gone for good).

For permanent storage Azure offers SQL-type databases (which we’re not interested in), Table storage (which would be an alternative to MongoDb but harder to query and with more limitations) and Blob storage.

We’re interested in Blob storage or more specifically Page-Blobs which support random read-write access … just like a disk drive. In fact, almost exactly like a disk drive because Azure provides a new CloudDrive which uses a VHD drive image stored as a Page-Blob (so it’s permanent) and can be mounted as a disk-drive within an Azure role instance.

The VHD images can range from 16MB to 1TB and apparently you only pay for the storage that is actually used, not the zeroed bytes (although I haven’t tested this personally).
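
Since the drive size is specified in megabytes, a small guard can catch an out-of-range value before you try to create the drive. This is just a sketch using the 16MB to 1TB range quoted above; the class and method names are made up for illustration and are not part of the Azure SDK.

```csharp
using System;

static class CloudDriveSize
{
    public const int MinMegabytes = 16;
    public const int MaxMegabytes = 1024 * 1024; // 1TB expressed in MB

    // Convert whole gigabytes to megabytes (1GB = 1024MB)
    public static int FromGigabytes(int gigabytes)
    {
        return gigabytes * 1024;
    }

    // True when the size falls inside the range quoted for VHD images
    public static bool IsValid(int sizeInMegabytes)
    {
        return sizeInMegabytes >= MinMegabytes && sizeInMegabytes <= MaxMegabytes;
    }

    static void Main()
    {
        int size = FromGigabytes(4);
        Console.WriteLine("{0}MB valid: {1}", size, IsValid(size));
    }
}
```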

So, let’s look at the code to create a CloudDrive, mount it in an Azure worker role and run MongoDb as a process that can use the mounted CloudDrive for its permanent storage so that everything is kept between machine restarts. We’ll also create an MVC role to test direct connectivity to MongoDb between the two VMs using internal endpoints so that we don’t incur charges for Queue storage or Service Bus messages.

The first step is to create a ‘Windows Azure Cloud Service’ project in Visual Studio 2010 and add both an MVC 2 and Worker role to it.

We will need a copy of mongod.exe to include in the worker role so just drag and drop it into the project, set its Build Action to Content and set it to be copied to the output directory when updated (‘Copy if newer’). Note that the Azure VMs are 64-bit instances so you need the 64-bit Windows version of MongoDb.

We’ll also need to add a reference to the .NET MongoDb client library to the web role. I’m using the mongodb-csharp one but you can use one of the others if you prefer.

Our worker role needs a connection to the Azure storage account which we’re going to call ‘MongoDbData’.

The other configured setting that we need to define is some local storage allocated as a cache for use with the CloudDrive. We’ll call this ‘MongoDbCache’. For this demo we’re going to create a 4GB cache which will match the 4GB drive we’ll create for MongoDb data. I haven’t played enough to evaluate performance yet but from what I understand this cache acts a little like the write-cache that you can turn on for your local hard drive.

The last piece before we can crack on with some coding is to define an endpoint which is how the Web Role / MVC App will communicate with the MongoDb server on the Worker Role. This basically tells Azure that we’d like an IP Address and a port to use and it makes sure that we can use it and no one else can. It should be possible to make the endpoint public to the world if you wanted but that isn’t the purpose of this demo. The endpoint is called ‘MongoDbEndpoint’ and set to Internal / TCP:

Now for the code and first we’ll change the WorkerRole.cs file in the WorkerRole1 project (as you can see, I put a lot of effort into customizing the project names!). We’re going to need to keep a reference to the CloudDrive that we’re mounting and also the MongoDb process that we’re going to start so that we can shut them down cleanly when the instance is stopping:

private CloudDrive _mongoDrive;
private Process _mongoProcess;

In the OnStart() method I’ve added some code copied from the Azure SDK Thumbnail sample – this prepares the CloudStorageAccount configuration so that we can use the method CloudStorageAccount.FromConfigurationSetting() to load the details from configuration (this just makes it easier to switch to using the Dev Fabric on our local machine without changing code). I’ve also added a call to StartMongo() and created an OnStop() method which simply closes the MongoDb process and unmounts the CloudDrive when the instance is stopping:

public override bool OnStart()
{
    // Set the maximum number of concurrent connections
    ServicePointManager.DefaultConnectionLimit = 12;

    DiagnosticMonitor.Start("DiagnosticsConnectionString");

    #region Setup CloudStorageAccount Configuration Setting Publisher

    // This code sets up a handler to update CloudStorageAccount instances when their corresponding
    // configuration settings change in the service configuration file.
    CloudStorageAccount.SetConfigurationSettingPublisher((configName, configSetter) =>
    {
        // Provide the configSetter with the initial value
        configSetter(RoleEnvironment.GetConfigurationSettingValue(configName));

        RoleEnvironment.Changed += (sender, arg) =>
        {
            if (arg.Changes.OfType<RoleEnvironmentConfigurationSettingChange>()
                .Any((change) => (change.ConfigurationSettingName == configName)))
            {
                // The corresponding configuration setting has changed, propagate the value
                if (!configSetter(RoleEnvironment.GetConfigurationSettingValue(configName)))
                {
                    // In this case, the change to the storage account credentials in the
                    // service configuration is significant enough that the role needs to be
                    // recycled in order to use the latest settings. (for example, the
                    // endpoint has changed)
                    RoleEnvironment.RequestRecycle();
                }
            }
        };
    });
    #endregion

    // For information on handling configuration changes
    // see the MSDN topic at http://go.microsoft.com/fwlink/?LinkId=166357.
    RoleEnvironment.Changing += RoleEnvironmentChanging;

    StartMongo();

    return base.OnStart();
}

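public override void OnStop()
{
    // Process.Close() only releases the handle and leaves mongod running,
    // so stop the process before unmounting the drive it writes to
    if (_mongoProcess != null && !_mongoProcess.HasExited)
    {
        _mongoProcess.Kill();
        _mongoProcess.WaitForExit();
    }
    if (_mongoDrive != null) _mongoDrive.Unmount();

    base.OnStop();
}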

Next is the code to create the CloudDrive and start the MongoDb process running:

private void StartMongo()
{
    // local cache drive we'll use on the VM
    LocalResource localCache = RoleEnvironment.GetLocalResource("MongoDbCache");

    Trace.TraceInformation("MongoDbCache {0} {1}", localCache.RootPath, localCache.MaximumSizeInMegabytes);
    // we'll use all the cache space we can (note: InitializeCache doesn't work with trailing slash)
    CloudDrive.InitializeCache(localCache.RootPath.TrimEnd('\\'), localCache.MaximumSizeInMegabytes);

    // connect to the storage account
    CloudStorageAccount storageAccount = CloudStorageAccount.FromConfigurationSetting("MongoDbData");

    // client for talking to our blob files
    CloudBlobClient blobClient = storageAccount.CreateCloudBlobClient();

    // the container that our drive is going to live in
    CloudBlobContainer drives = blobClient.GetContainerReference("drives");

    // create blob container (it has to exist before creating the cloud drive)
    try {drives.CreateIfNotExist();} catch {}

    // get the url to the vhd page blob we'll be using
    var vhdUrl = blobClient.GetContainerReference("drives").GetPageBlobReference("MongoDb.vhd").Uri.ToString();
    Trace.TraceInformation("MongoDb.vhd {0}", vhdUrl);

    // create the cloud drive
    _mongoDrive = storageAccount.CreateCloudDrive(vhdUrl);
    try
    {
        _mongoDrive.Create(localCache.MaximumSizeInMegabytes);
    }
    catch (CloudDriveException ex)
    {
        // exception is thrown if all is well but the drive already exists
    }

    // mount the drive and get the root path of the drive it's mounted as
    var dataPath = _mongoDrive.Mount(localCache.MaximumSizeInMegabytes, DriveMountOptions.Force) + @"\";
    Trace.TraceInformation("Mounted as {0}", dataPath);

    // get the internal endpoint that we're going to use for MongoDb
    var ep = RoleEnvironment.CurrentRoleInstance.InstanceEndpoints["MongoDbEndpoint"];

    // create the process to host mongo
    _mongoProcess = new Process();
    var startInfo = _mongoProcess.StartInfo;
    // so we can redirect streams
    startInfo.UseShellExecute = false;
    // we don't need a window, it's hard to see the monitor from here (jk)
    startInfo.CreateNoWindow = true;
    // the mongo daemon is included in our project in the current directory
    startInfo.FileName = @"mongod.exe";
    startInfo.WorkingDirectory = Environment.CurrentDirectory;
    // specify the ip address and port for MongoDb to use and also the path to the data
    startInfo.Arguments = string.Format(@"--bind_ip {0} --port {1} --dbpath {2} --quiet", ep.IPEndpoint.Address, ep.IPEndpoint.Port, dataPath);
    // capture mongo output to Azure log files
    startInfo.RedirectStandardError = true;
    startInfo.RedirectStandardOutput = true;
    _mongoProcess.ErrorDataReceived += (sender, evt) => WriteLine(evt.Data);
    _mongoProcess.OutputDataReceived += (sender, evt) => WriteLine(evt.Data);

    Trace.TraceInformation("Mongo Process {0}", startInfo.Arguments);

    // start mongo going
    _mongoProcess.Start();
    _mongoProcess.BeginErrorReadLine();
    _mongoProcess.BeginOutputReadLine();
}
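
The WriteLine method used by the two output handlers isn’t shown in the listing. A minimal sketch of what such a helper might look like is below, assuming all it needs to do is forward mongod’s output to the trace log. The class name, the Format method and the "mongod: " prefix are my own choices for illustration. The null check matters because the Data property is null when the redirected stream is closed.

```csharp
using System;
using System.Diagnostics;

static class MongoOutput
{
    // Returns the text to log, or null when there is nothing worth logging
    public static string Format(string data)
    {
        if (string.IsNullOrWhiteSpace(data))
        {
            return null;
        }
        return "mongod: " + data.TrimEnd();
    }

    // Forward a line of mongod output to the trace listeners
    public static void WriteLine(string data)
    {
        string line = Format(data);
        if (line != null)
        {
            Trace.WriteLine(line);
        }
    }

    static void Main()
    {
        // Send trace output to the console so the example shows something
        Trace.Listeners.Add(new ConsoleTraceListener());
        WriteLine("waiting for connections");
        WriteLine(null); // ignored: end of stream
    }
}
```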

[TODO: Add more explanation !!]

So, that’s the server-side, oops, I mean Worker Role setup which will now run MongoDb and persist the data permanently. We could get fancier and have multiple roles with slave / sharded instances of MongoDb but they will follow a similar pattern.

The client side in the Web Role MVC app is very simple. The only extra work we need to do is figure out the IP address and port to use to connect to MongoDb, which are set up for us by Azure. The RoleEnvironment lets us get to this and I believe (but could be wrong so don’t quote me) that the App Fabric part of Azure handles the communication between roles to pass this information. Once we have it we can create our connection to MongoDb as normal and save NoSQL JSON documents to our heart’s content …

var workerRoles = RoleEnvironment.Roles["WorkerRole1"];
var workerRoleInstance = workerRoles.Instances[0];
RoleInstanceEndpoint ep = workerRoleInstance.InstanceEndpoints["MongoDbEndpoint"];

string connectionString = string.Format("Server={0}:{1}", ep.IPEndpoint.Address, ep.IPEndpoint.Port);

var mongo = new Mongo(connectionString);
mongo.Connect();
var db = mongo.GetDatabase("notes");
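
If you want to test the connection string formatting away from Azure, it can be pulled into a small helper that only depends on System.Net. This is a sketch: the helper name is mine, and the address and port in Main are sample values rather than anything Azure will actually hand you.

```csharp
using System;
using System.Net;

static class MongoConnection
{
    // Build the "Server=host:port" string from an endpoint such as the
    // IPEndpoint property of the internal endpoint
    public static string BuildConnectionString(IPEndPoint endpoint)
    {
        if (endpoint == null)
        {
            throw new ArgumentNullException("endpoint");
        }
        return string.Format("Server={0}:{1}", endpoint.Address, endpoint.Port);
    }

    static void Main()
    {
        // Sample values only; in the role these come from RoleEnvironment
        var endpoint = new IPEndPoint(IPAddress.Parse("10.0.1.8"), 27017);
        Console.WriteLine(BuildConnectionString(endpoint));
    }
}
```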

I hope you find this useful. I’ll try and add some extra notes to explain the code and the thinking behind it in more detail and will post some follow ups to cover deploying the app to Azure and what I’ve learned of that process.

Benefits of NoSQL (MongoDb) for the Query-side of CQRS

As you may know I’ve been researching CQRS and the benefits of using this approach to developing systems. My focus at the moment is on the Query side of things and for this I’ve been comparing a SQL Server / NHibernate solution with a NoSQL alternative using MongoDb. For this, I’ve been using a simple forum app that I’ve been working on with a database of around 4m posts and 200k topics.

I’ll post more detailed results when I have time to show things in more detail (with some example code) but basically, the performance difference I’ve been seeing is huge.

The SQL Server / NHibernate solution was serving about 10 requests per second and had SQL Server at about 50% of the CPU. The MongoDB-backed solution was running at around 50 requests per second and MongoDb was sat at around 3-4% of CPU time.

Right now, I’m very excited about the possibilities of improving app performance (especially now that Google are taking this into account when calculating page-rank). The difference in the complexity of the code between the two systems is also refreshing, with the MongoDb solution being very, very simple and quick to develop.

Circles of Interest

I saw this on another blog and thought it was a neat way of showing the technologies that I am interested in or indifferent to.

Positive / Core

Things I care about or am interested in learning:

  • ASP.NET MVC
  • HTML5 / CSS3
  • jQuery / JavaScript
  • DDDD / ESB / EDA / SOA / CQRS
  • Dependency Injection
  • Architecture
  • Lucene
  • Document Databases
  • Cloud Computing
  • User Interface Design
  • NoSQL data stores
  • MongoDB
  • Event Sourcing

Neutral / Non-core

Things I care about but not as much or already know:

  • ASP.NET WebForms
  • SQL Server
  • ETL
  • LINQ
  • AJAX, TDD
  • Source Control
  • WCF

Negative

Things I don’t care about and have no particular interest or excitement in:

  • Team System
  • SharePoint
  • WPF
  • Silverlight
  • Flash
  • Windows
  • Office
  • Virtualization