Skip to content

RedisGraph: A High Performance In-Memory Graph Database

Abstract

Graph-based data is everywhere now days. Facebook, Google, Twitter and Pinterest are just a few who've realize the power behind relationship data and are utilizing it to the fullest. As a direct result, we see a rise both in interest for and variety of graph data solutions.

With the introduction of Redis Modules we've recognized the great potential of introducing a graph data structure to the Redis arsenal, and developed RedisGraph. Bringing new graph capabilities to Redis through a native C implementation with an emphasis on performance, RedisGraph is now available as an open source project.

In this documentation, we'll discuss the internal design and features of RedisGraph and demonstrate its current capabilities.

RedisGraph At-a-Glance

RedisGraph is a graph database developed from scratch on top of Redis, using the new Redis Modules API to extend Redis with new commands and capabilities. Its main features include:

  • Simple, fast indexing and querying
  • RAM data stored, using memory-efficient custom data structures
  • On-disk persistence
  • Tabular result sets
  • The use of popular graph query language openCypher

A Little Taste: RedisGraph in Action

Let’s look at some of the key concepts of RedisGraph using this example over the redis-cli tool:

Constructing a graph

It is common to represent entities as nodes within a graph. In this example, we'll create a small graph with both actors and movies as its entities, and an "act" relation that will connect actors to the movies they acted in. We use the graph.QUERY command to issue a CREATE query, which will introduce new entities and relations to our graph.

graph.QUERY <graph_id> 'CREATE (:<label> {<attribute_name>:<attribute_value>,...})'
graph.QUERY <graph_id> 'CREATE (<source_node_alias>)-[<relation> {<attribute_name>:<attribute_value>,...}]->(<dest_node_alias>)'

Construct our graph in one go:

graph.QUERY IMDB 'CREATE (aldis:actor {name: "Aldis Hodge", birth_year: 1986}),
                         (oshea:actor {name: "OShea Jackson", birth_year: 1991}),
                         (corey:actor {name: "Corey Hawkins", birth_year: 1988}),
                         (neil:actor {name: "Neil Brown", birthyear: 1980}),
                         (compton:movie {title: "Straight Outta Compton", genre: "Biography", votes: 127258, rating: 7.9, year: 2015}),
                         (neveregoback:movie {title: "Never Go Back", genre: "Action", votes: 15821, rating: 6.4, year: 2016}),
                         (aldis)-[act]->(neveregoback),
                         (aldis)-[act]->(compton),
                         (oshea)-[act]->(compton),
                         (corey)-[act]->(compton),
                         (neil)-[act]->(compton)'

Querying the graph

RedisGraph exposes a subset of openCypher graph language. Although only some language capabilities are supported, there's enough functionality to extract valuable insights from your graphs. To execute a query, we use the GRAPH.QUERY command:

GRAPH.QUERY <graph_id> <query>

Let's execute a number of queries against our movies graph:

Find the sum, max, min and avg age of the Straight Outta Compton cast:

GRAPH.QUERY IMDB "MATCH (a:actor)-[act]->(m:movie {title:\"Straight Outta Compton\"})
RETURN m.title, SUM(a.age), MAX(a.age), MIN(a.age), AVG(a.age)"

RedisGraph will reply with:

1) "m.title, SUM(a.age), MAX(a.age), MIN(a.age), AVG(a.age)"
2) "Straight Outta Compton,123.000000,37.000000,26.000000,30.750000"

The first row is our result-set header which name each column according to the return clause.

Second row contains our query result.

The second row contains our query result.

Let's try another query. This time, we'll find in how many movies each actor played.

GRAPH.QUERY IMDB "MATCH (actor)-[act]->(movie) RETURN actor.name, COUNT(movie.title) AS movies_count ORDER BY
movies_count DESC"

1) "actor.name, movies_count"
2) "Aldis Hodge,2.000000"
3) "O'Shea Jackson,1.000000"
4) "Corey Hawkins,1.000000"
5) "Neil Brown,1.000000"

The Theory: Ideas behind RedisGraph

Representation

RedisGraph uses sparse adjacency matrices to represent graphs. As directed edge connecting source node S to destination node T is recorded within an adjacency matrix M, by setting M's S,T entry to 1 (M[S,T]=1). As a rule of thumb, matrix rows represent source nodes while matrix columns represent destination nodes.

Every graph stored within RedisGraph has at least one matrix, referred to as THE adjacency matrix (relation type agnostic). In addition, every typed relation has its own dedicated typed matrix. Consider a graph with two types of edges:

  1. visits
  2. friend

The underline graph data structure would maintain three matrices:

  1. THE adjacency matrix - marking every connection within the graph
  2. visit matrix - marking visit connections
  3. friend matrix - marking friend connections

A visit edge E, connecting node A to node B, would set THE adjacency matrix at position [A,B] to 1, in addition the visit matrix V would also set position V[A,B] to 1.

To accommodate typed nodes, additional matrices are allocated one per label, and a label matrix is symmetric with ones along the main diagonal. Assume node N was labeled as a Person, then the Person matrix P would set position P[N,N] to 1.

This design enables RedisGraph to modify its graph with ease:

  • Adding new nodes simply extend matrices, adding additional rows and columns.
  • Adding new edges is done by setting the relevant entries at the relevant matrices.
  • Removing edges clear relevant entries.
  • Deleting nodes is achieved by matrix row/column deletion.

One of the main reasons we've chosen to represent our graphs as sparse matrices is graph traversal.

Traversal

Graph traversal is accomplished by matrix multiplication. Suppose we wanted to find friends of friends for every node in the graph this traversal can be expressed by FOF = F^2. F stands for the friendship relation matrix, and the result matrix FOF summarizes the traversal. Its rows represent source nodes, while its columns represent friends who are two hops away: if FOF[i,j] = 1 then j is a friend of a friend of i.

Algebraic expression

When a search pattern such as (N0)-[A]->(N1)-[B]->(N2)<-[A]-(N3) is given as part of a query, we translate it into a set of matrix multiplications. For the given example, one possible expression would be: A * B * Transpose(A).

We should note that matrix multiplication is an associative and distributive operation. This gives us the freedom to choose which terms we would like to multiply first (preferring terms that will produce highly sparse intermediate matrices). It also enables concurrency when evaluating an expression, e.g. we could compute AZ and BY in parallel.

GraphBLAS

To perform all of these operations for sparse matrices, RedisGraph uses GraphBLAS - a standard API similar to BLAS. The current implementation uses the CSC sparse matrix format (compressed sparse columns), although the underlying format is subject to change.

Query language: openCypher

There are a number of graph query languages, so we didn't want to reinvent the wheel. We decided to implement a subset of one of the most popular graph query languages out there: openCypher. While the openCypher project provides a parser for the language, we decided to create our own parser. We used Lex as a tokenizer and Lemon to generate a C target parser.

As mentioned earlier, only a subset of the language is currently supported, but it is our intention to continue adding new capabilities and extend RedisGraph's openCypher capabilities.

Runtime: query execution

Let's review the steps RedisGraph takes when executing a query.

Consider the following query, which finds all actors who've played alongside Aldis Hodge and are over 30 years old:

MATCH (aldis::actor {name:"Aldis Hodge"})-[act]->(m:movie)<-[act]-(a:actor) WHERE a.age > 30 RETURN m.title, a.name

RedisGraph will

  • Parse the query, and build an abstract syntax tree (AST);
  • Compose traversal algebraic expressions;
  • Build filter trees;
  • Construct an optimized query execution plan composed of:
    • Filtered traverse
    • Conditional traverse
    • Filter
    • Project
  • Execute the plan; and
  • Populate a result set with matching entity attributes.

Filter tree

A query can filter out entities by creating predicates. In our example, we're filtering actors younger then 30.

It's possible to combine predicates using OR, AND keywords to form granular conditions.

During runtime, the WHERE clause is used to construct a filter tree, and each node within the tree is either a condition (e.g. A > B) or an operation (AND/OR). When finding candidate entities, they are passed through the tree and evaluated.

Benchmarks

Depending on the underlying hardware results may vary.

That said, inserting a new relationship is done in O(1). RedisGraph is able to create over 1 million nodes under half a second and form 500K relations within 0.3 of a second.

License

RedisGraph is published under Apache 2.0 with Commons Clause.

Conclusion

Although RedisGraph is still a young project, it can be an alternative to other graph databases. With its subset of operations, you can use it to analyze and explore graph data. Being a Redis Module, this project is accessible from every Redis client without adjustments. It's our intention to keep on improving and extending RedisGraph with the help of the open source community.