Captain Codeman

NHibernate.Search using Lucene.NET Full Text Index (1)

Introduction

Ayende added NHibernate.Search last year, but I’ve never seen a great deal of documentation or examples for it, so hopefully this post will help others get started with it.

Basically, this addition to NHibernate brings two of the best open source libraries together: NHibernate, the Object Relational Mapper that persists your objects to a database, and Lucene.NET, which provides full-text indexing and query support.

So how do you use it?

The first problem you will run into is actually finding it. Unfortunately, the binary release of NHibernate does not include it, although it is there in the source. Download the latest version of the NHibernate source (1.2.1 GA as of writing) and compile it to produce the NHibernate.Search.dll assembly.

Before you do this, though, you may also want to download the latest Lucene.NET release (2.0.004) and replace the Lucene.Net.dll assembly in the NHibernate lib\net\2.0 folder (I’m assuming you are using .NET 2.0). The copy bundled with NHibernate has the same version number and did work fine, but the file sizes are different and I ran into some problems when trying to use some of the extra Lucene.NET assemblies for hit-highlighting and similarity matching.

The first step is of course to add a reference to NHibernate.Search.dll to your Visual Studio.NET Project.

Next, you need to add some additional properties to the session-factory element of the NHibernate configuration section (normally stored in your web.config file):

<property name="hibernate.search.default.directory_provider">NHibernate.Search.Storage.FSDirectoryProvider, NHibernate.Search</property>
<property name="hibernate.search.default.indexBase">~/Index</property>

If you’ve used Lucene.NET much you will know that it has the concept of different directory providers for storing the index, such as RAM or FS (File System). The entries above indicate that we want the Lucene index to be stored on the file system, in the /Index folder of the website (it could of course be outside the website’s mapped folder). It’s well worth reading a book such as Lucene in Action to get a good idea of how Lucene works and what it can do (it’s written for the Java version but is still excellent for learning the .NET implementation).
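For quick experiments or unit tests, an in-memory index can be handy. The fragment below is only a sketch: it assumes your build of NHibernate.Search includes a RAMDirectoryProvider in the same NHibernate.Search.Storage namespace as the FSDirectoryProvider used above, so check the source you compiled for the exact type name before relying on it.

```xml
<!-- Assumption: RAMDirectoryProvider sits alongside FSDirectoryProvider in this build. -->
<!-- The index is held in memory only, so it is lost when the application restarts. -->
<property name="hibernate.search.default.directory_provider">NHibernate.Search.Storage.RAMDirectoryProvider, NHibernate.Search</property>
```

With a RAM-based provider the indexBase property should not be needed, since nothing is written to disk.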

The next step requires that you decorate your C# class with some attributes to control the indexing operation. Personally, I don’t like this, as it means I need to start referencing NHibernate and Lucene assemblies from my otherwise nice, clean POCO (Plain Old CLR/C# Object) project. It would have been much nicer IMO if this information could have been put in the NHibernate .hbm.xml mapping files, but it’s a small price to pay and some people already use the attribute approach for NHibernate anyway.

Here is an example of a Book class for a library application with the additional attributes:

[Indexed(Index = "Book")]
public class Book : IBook
{
  private Guid _id;
  private string _title;
  private string _summary;
  private string _summaryHtml; 
  private string _authors; 
  private string _url;
  private string _smallImageUrl;
  private string _mediumImageUrl;
  private string _largeImageUrl;
  private string _isbn;
  private string _published;
  private string _publisher;
  private string _binding;

  [DocumentId]
  [FieldBridge(typeof(GuidBridge))]
  public Guid Id
  {
    get { return _id; }
    set { _id = value; }
  }

  [Field(Index.Tokenized, Store = Store.No)]
  [Analyzer(typeof(StandardAnalyzer))]
  [Boost(2)]
  public string Title
  {
    get { return _title; }
    set { _title = value; }
  }

  [Field(Index.Tokenized, Store = Store.No)]
  [Analyzer(typeof(StandardAnalyzer))]
  public string Summary
  {
    get { return _summary; }
    set { _summary = value; }
  }

  public string SummaryHtml
  {
    get {
      if (_summaryHtml == null || _summaryHtml.Length == 0) {
        return _summary;
      }
      return _summaryHtml;
    }
    set { _summaryHtml = value; }
  }

  [Field(Index.Tokenized, Store = Store.No)]
  [Analyzer(typeof(StandardAnalyzer))]
  public string Authors
  {
    get { return _authors; }
    set { _authors = value; }
  }

  public string Url
  {
    get { return _url; }
    set { _url = value; }
  }

  public string SmallImageUrl
  {
    get { return _smallImageUrl; }
    set { _smallImageUrl = value; }
  }

  public string MediumImageUrl
  {
    get { return _mediumImageUrl; }
    set { _mediumImageUrl = value; }
  }

  public string LargeImageUrl
  {
    get { return _largeImageUrl; }
    set { _largeImageUrl = value; }
  }

  [Field(Index.UnTokenized, Store = Store.Yes)]
  public string Isbn
  {
    get { return _isbn; }
    set { _isbn = value; }
  }

  [Field(Index.UnTokenized, Store = Store.No)]
  public string Published
  {
    get { return _published; }
    set { _published = value; }
  }

  [Field(Index.Tokenized, Store = Store.No)]
  [Analyzer(typeof(StandardAnalyzer))]
  public string Publisher
  {
    get { return _publisher; }
    set { _publisher = value; }
  }

  public string Binding
  {
    get { return _binding; }
    set { _binding = value; }
  }
}

Now we’re ready to start using it from NHibernate. To do this we need to create a FullTextSession and use this instead of the regular NHibernate Session (which it wraps / extends):

ISession session = sessionFactory.OpenSession(new SearchInterceptor());
IFullTextSession fullTextSession = Search.CreateFullTextSession(session);

And that’s it. You can use the IFullTextSession in place of the regular ISession (even casting it for places where you are just doing normal NHibernate operations). All the magic happens inside NHibernate.Search - when you add, update or delete records the ‘documents’ in the Lucene index are automatically updated which provides you with an excellent Full Text index without a Windows Service in sight!
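As an illustration of that automatic indexing, here is a minimal sketch of saving a Book through the full text session. It assumes an already configured ISessionFactory, that the Book class above is mapped in NHibernate, and that SearchInterceptor, Search and IFullTextSession all live in the NHibernate.Search namespace. The BookIndexingExample class and SaveBook method are names made up for this example.

```csharp
using NHibernate;
using NHibernate.Search; // assumption: home of SearchInterceptor, Search and IFullTextSession

// Hypothetical helper, not part of NHibernate.Search.
public static class BookIndexingExample
{
    public static void SaveBook(ISessionFactory sessionFactory, Book book)
    {
        // Open the session with the search interceptor, as shown above.
        using (ISession session = sessionFactory.OpenSession(new SearchInterceptor()))
        {
            IFullTextSession fullTextSession = Search.CreateFullTextSession(session);

            using (ITransaction tx = fullTextSession.BeginTransaction())
            {
                // An ordinary NHibernate save. No Lucene code is written here;
                // NHibernate.Search is expected to add the matching document itself.
                fullTextSession.Save(book);
                tx.Commit();
            }
        }
    }
}
```

Updates and deletes should follow the same pattern with the usual Update and Delete calls on the session.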

You can check that it’s working by looking in the Index folder: there should be a ‘Book’ folder containing the Lucene index files (with .cfs extensions).
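If you would rather make that check from code, for example in a smoke test, a small sketch using only System.IO could look like this. The IndexCheck class and HasIndexFiles method are hypothetical names, and the path you pass in is wherever your index base resolves to on disk, plus the Book subfolder.

```csharp
using System.IO;

// Hypothetical helper: reports whether an index folder holds any files at all.
public static class IndexCheck
{
    public static bool HasIndexFiles(string indexFolder)
    {
        // A missing folder means nothing has been indexed yet.
        if (!Directory.Exists(indexFolder))
        {
            return false;
        }

        // Any file counts; this does not validate the Lucene format itself.
        return Directory.GetFiles(indexFolder).Length > 0;
    }
}
```

This only tells you that something was written to the folder. To see what is actually in the index you still need a tool that understands the Lucene file format.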

In the next post I’ll demonstrate using the index to do some queries, including hit-highlighting for presenting the results, but for now you may want to download and try Luke, a Java program for browsing Lucene index catalogs (the file format is identical between the two implementations).