Arrow of time

The Needle Search Server - pre-alpha


I've talked about my new project, the Needle Search Server, before: it is supposed to be a light-weight full text search server written in C++ and using LevelDB for storage. I've arrived at a point where the code actually does something useful and I want to talk about it some more.

Of course, you will need to fetch and compile the code yourself, and once you get over that hurdle, you need to configure it as a FastCGI application. I've written some instructions which work with Apache 2.2 and Lighttpd, but it shouldn't be hard to adapt them to your favourite web server.

While here, I'd like to rant a little about how web servers' FastCGI implementations needlessly differ: some pass the base app path as part of SCRIPT_NAME and some don't; some treat HTTP methods such as PUT and DELETE as something normal, while others require special incantations. Sigh. I guess I'll have to add special handling here and there as needed.
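To illustrate the kind of special handling I mean, here is a minimal sketch in Python (not Needle's actual C++ code). It assumes the server may either put the whole request path into SCRIPT_NAME or split it between SCRIPT_NAME and PATH_INFO; the function name and the "/needle" mount point are made up for the example.

```python
def app_relative_path(environ, mount="/needle"):
    """Return the request path relative to the app's mount point.

    Assumption: servers differ only in how they split the path between
    SCRIPT_NAME and PATH_INFO, so joining the two and stripping the
    mount prefix (when present) gives the same result for all of them.
    """
    full = environ.get("SCRIPT_NAME", "") + environ.get("PATH_INFO", "")
    mount = mount.rstrip("/")
    # Strip the mount point only when it is a whole leading path segment.
    if mount and (full == mount or full.startswith(mount + "/")):
        full = full[len(mount):]
    return full or "/"
```

A database that happens to share its name with the mount point would confuse this on servers that don't pass the base path at all, so treat it as a starting point only.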

One important thing to do before you start the app (which won't go into the background, "server-style", as it's still very early code) is to copy the provided example config file needle.conf.json into /etc (if on Linux) or /usr/local/etc (just about everywhere else). In there, you must change the database directory path to a location where the application can write and create its subdirectories. You can leave the other settings as-is for now.
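If you'd rather script the copy step, something like the following Python sketch should do. The function names are hypothetical, and it only copies the file: you still need to edit the database directory path by hand, and writing into /etc or /usr/local/etc normally requires root.

```python
import platform
import shutil


def needle_config_path(system=None):
    """Pick the config location: /etc on Linux, /usr/local/etc elsewhere."""
    system = system or platform.system()
    directory = "/etc" if system == "Linux" else "/usr/local/etc"
    return directory + "/needle.conf.json"


def install_example_config(example="needle.conf.json", dest=None):
    """Copy the example config to its destination and return that path."""
    dest = dest or needle_config_path()
    shutil.copyfile(example, dest)
    return dest
```

Run from the source directory, install_example_config() with no arguments copies the example file to the location for the current platform.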

Once you get it running (with the app "mounted" in the web server on a URL such as http://localhost/needle) you can immediately start testing it using curl.

Needle supports an arbitrary number of databases. The databases are collections of indexed documents. A database is created by issuing a GET command such as:

curl http://localhost/needle/+create/test

Note the "+" in front of "create": all administrative commands start with a "+". In this case, the server should respond with a message similar to:

{ "ok": true, "message": "Database created: test" }
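If you prefer a script to curl, the same call can be sketched in Python with just the standard library. The helper names are made up for this example, and the error handling assumes that a failed command comes back with "ok" set to false, which I haven't shown above.

```python
import json
import urllib.request

BASE_URL = "http://localhost/needle"  # the mount point used in this post


def admin_url(command, database, base_url=BASE_URL):
    """Build an administrative URL; admin commands start with a '+'."""
    return f"{base_url}/+{command}/{database}"


def check_ok(body):
    """Parse a JSON reply and raise if the server did not report success."""
    reply = json.loads(body)
    if not reply.get("ok"):
        # Assumed failure shape: {"ok": false, "message": "..."}
        raise RuntimeError(reply.get("message", "request failed"))
    return reply


def create_database(name, base_url=BASE_URL):
    """Issue the GET request that creates a database."""
    with urllib.request.urlopen(admin_url("create", name, base_url)) as resp:
        return check_ok(resp.read().decode("utf-8"))
```

Calling create_database("test") against a running server should return the parsed reply.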

Yay! You have created your first Needle database and called it "test". Normally, you would upload JSON data structures containing document data to be indexed, but for now I've only implemented a shortcut: you can upload a plain text file with the "text/plain" content type:

curl -T mydocument.txt -H "Content-type: text/plain" http://localhost/needle/test/doc001

Note that we're issuing a PUT request (that is what curl's -T option does), and that the URL contains the database name ("test") and the document ID ("doc001"). The document ID is an (almost) arbitrary string which identifies the document in any way meaningful to the client apps. The server will respond with an "ok" message similar to the last one to indicate success. You can add an arbitrary number of documents to the database, up to the maximum of 4G (roughly four billion) documents per database.
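The same upload can be sketched in Python. This is only an illustration: the function name is hypothetical, the text is assumed to be UTF-8, and the document ID is assumed to be safe to put into a URL unescaped.

```python
import urllib.request


def put_text_request(database, doc_id, text,
                     base_url="http://localhost/needle"):
    """Build a PUT request that uploads plain text as one document."""
    return urllib.request.Request(
        f"{base_url}/{database}/{doc_id}",
        data=text.encode("utf-8"),  # assumption: documents are UTF-8
        headers={"Content-Type": "text/plain"},
        method="PUT",
    )
```

To actually send it, pass the request to urllib.request.urlopen() and read the JSON reply from the response.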

Now, to query the database, issue a command such as:

curl "http://localhost/needle/test?q=word"

For now, only the simplest query method and result format are implemented: the query will find all documents containing any of the words mentioned in the "q" argument, and the result will be a list of document IDs:

{
        "ok": true,
        "type": "simple",
        "list":
        [
                "doc001",
                "bar",
                "myblog.txt"
        ]
}
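For client code, building the query URL and pulling out the IDs could look like the Python sketch below. The helper names are made up, and it assumes "list" arrives as a JSON array of strings. Note that urlencode turns spaces into "+", and I'm not promising here how the server splits multiple words, so the example sticks to a single word.

```python
import json
from urllib.parse import urlencode


def query_url(database, words, base_url="http://localhost/needle"):
    """Build the search URL with the words in the "q" argument."""
    return f"{base_url}/{database}?{urlencode({'q': words})}"


def parse_ids(body):
    """Return the document IDs from a "simple" query reply."""
    reply = json.loads(body)
    if not reply.get("ok"):
        # Assumed failure shape, as with the other replies.
        raise RuntimeError(reply.get("message", "query failed"))
    return list(reply["list"])
```

Fetching query_url("test", "word") with urllib.request.urlopen() and passing the decoded body to parse_ids() gives you a plain Python list of IDs.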

The "list" part of the result is the part you should be interested in. To verify everything is linked up correctly, you can use the +debug administrative command:

curl http://localhost/needle/+debug/test

This will generate a list of documents and words indexed in the database. Note that Needle supports word stemming and that, for test purposes, the default stemmer (which can be changed in the config file) is for the Croatian language, which probably isn't what you need.

As work on Needle progresses, more features will appear, such as proper query syntax, uploading JSON documents, more result formats and more word stemmers.

Update: I've just found out about Google FlatBuffers and it looks like it will solve some of my problems so I'll probably spend some time converting to it.

