2023 twenty-four merry days of Perl Feed

Arango::Tango, the new mammal in town

Arango::Tango - 2023-12-06

Introduction

Arango::Tango is not properly new, as the first versions are from 2019. Nevertheless, the number of users is still quite small as other document-oriented databases, like ElasticSearch or MongoDB are still quite popular. But being somehow alergic to Java, I went in the search for an alternative and found ArangoDB, a graph database (also able to deal with plain document collections) written in C++.

ArangoDB is a graph-oriented database, making it particularly well-suited for storing network structures such as social networks, knowledge graphs, and geospatial information. To achieve this, ArangoDB utilizes two collections: one for nodes and another for edges. The nodes collection functions as a standard database of documents, similar to what you would find in MongoDB. Meanwhile, the edges collection has documents containing two special attributes: the origin and target nodes, identified by their document identifiers. Unlike MongoDB and ElasticSearch, ArangoDB employs a specific Domain-Specific Language for queries. However, unlike the former two, ArangoDB's query language, named AQL (ArangoDB Query Language), is not based on JSON. This distinction makes AQL easier to write and understand.

When I found out about ArangoDB, I did not find a good Perl module to interact with it, so I decided to create one. Arango::Tango was born. It is, surely, the Perl module I own with a cuter name, with both a pun on dancing a Tango with Arango, but also with the similarity with orangutangus.

Quick and Dirty Arango::Tango

The ArangoDB insterface is based on REST, and the REST API is properly documented with Swagger. It should be possible to use any generator that, from a Swagger specification, generates an API. But, being REST, the API is stateless, and that does not help being proficient when using it. Thus, I created my own monster, that both tries to be versatile enough to allow quick implementation of new methods using a structure similar to a Swagger file, but also with an object-oriented approach, where each type of object we can manage in the ArangoDB database has a counterpart Perl object.

To give a quick example on how to create a ArangoDB collection and insert a couple of documents:

use Arango::Tango;

my $server = Arango::Tango->new( host => '127.0.0.1',
                                 username => 'root',
                                 password => '123123123');

my $database = $server->create_database("database_name");
my $collection = $database->create_collection("collection_name");

my $document = { name => "John", surname => "Doe" };
$collection->create_document( $document );

my $documents = [
    { _key => "homer", name => "Homer", surname => "Simpson" },
    { _key => "marge", name => "Marge", surname => "Simpson" }
];
$collection->bulk_import( $documents );

All the methods result in an HTTP request. But note that the objects created by Arango::Tango track the current database and collection. Thus, you do not need to pass that information everytime a new request is performed. You just create documents in the collection, and the module knows where the collection is (which database) and constructs properly the request.

The query of documents can be performed directly, if you know the document identifier:

my $document = $collection->document( "lisa" );  # retrieve document with key "lisa"

Usually one does not want to just fetch a document, but perform a complex query. For that we will need to use the ArangoDB's query language: AQL. The method's name to query a collection is not the more intuitive, as I followed the convention from the REST API. I may change that in the future. It is named cursor because it returns... a cursor object, that allows you to go through a set of results.

my $query = <<EOQ;
FOR doc IN collection_name
      FILTER doc.surname == 'Simpson'
      RETURN doc
EOQ
my $cursor = $collection->cursor($query);
while ($cursor->has_more) {
    my @block = $cursor->next();
# do something with elements from @block
}

Note that, by default, ArangoDB returns more than one document when iterating a cursor, to make it more efficient. Thus, the next method returns a list of results and not just one item.

The Arango::Tango Guts

I am proud of the way this module has been designed. The code is not as polished as I would like it to be, but the idea behind it is quite interesting. I decided to give some insight into its implementation. Note this is just a quick glimpse of it. You are welcome to look into the code for more detail.

Each of the modules (database, collection, etc) start by running an initialization method. It is based on a structure that specifies the available REST API requests. This initialization generates methods with the required code to execute the REST request.

Two small examples for the Arango::Tango::Collection module:

load_indexes => {
    rest => [ put => '{{database}}_api/collection/{name}/loadIndexesIntoMemory' ],
    inject_properties => [ 'database', 'name' ],
},

rename => {
    rest => [ put => '{{database}}_api/collection/{collection}/rename' ],
    inject_properties => [ 'database', { prop => 'name', as => 'collection' } ],
    signature => [ 'name' ],
    schema => { name => { type => 'string' }},
},

The top-level key is the name of the method to create. The value of the dictionary defines details on the method implementation:

  • For the load_indexes method, there is the REST template (that specifies the HTTP verb and the route template) and the inject_properties array. This array is a list of the parameters, in the REST template, that should be filled in from data in the collections object fields.

  • The rename method has the same two properties as the previous one. But the inject_properties is a little different. Basically, it has two fields from the object being injected in the template (the database and the collection name) but, for the collection, we specify that it is stored in the name field of the object, but will be referred to in the template as collection. This change is done because there is a parameter named name, and we need to disambiguate.

    The signature field is the list of parameters that the method will receive and the schema property is the structure of the data that will be sent in the PUT request. Note that there is a name field in that data, that will come from the method's signature.

Unfortunately not all requests are as easy to generalize and therefore there is some extra code to deal with hairy calls. Also, there is a reason why the template has two curly braces for the database parameter. But that is not relevant for being discussed here.

If you found this interesting, I would appreciate your help on supporting more of the ArangoDB API!

Gravatar Image This article contributed by: Alberto Simões <ambs@cpan.org>