Working with Cypher and the Movie Data

This topic describes how to load the Movies demo data and run example Cypher queries like those in the Neo4j sandbox environment, where the Movies database originated. For users already familiar with Cypher, this data set previews AnzoGraph DB support for the Cypher language: you can run many of the same Cypher commands and queries that you would run in other environments that support Cypher.

It also shows how to run AnzoGraph DB queries using Cypher syntax if you prefer Cypher to SPARQL.

Cypher language support in AnzoGraph DB follows the openCypher language specification, which is available as a PDF document: https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf

Refer to the Cypher Query Language Reference for a complete description of AnzoGraph DB Cypher language compatibility with the open Cypher language specification. Cypher® is a registered trademark of Neo4j, Inc.

Using the Cypher CLI (azgbolt)

AnzoGraph DB uses the Bolt protocol to provide a client application interface and a CLI, azgbolt, for running Cypher commands and queries. To view the syntax and command-line arguments that the Cypher CLI accepts, run azgbolt without any arguments:

$ <install_path>/bin/azgbolt

The azgbolt CLI returns its command-line syntax and available arguments, along with a sampling of commonly used commands and queries:

azgbolt (Bolt CLI) [-c "command"] [-f file] [-h hosturl] [-p port] [-nohead ] 
                   [-o outputfile][--help display this message]

For example, the following syntax runs a Cypher query:

azgbolt -c "any query"

When running Cypher commands or queries from the azgbolt CLI, use standard Linux shell quoting rules for any embedded single or double quotation marks. For example, if a query passed inside double quotes contains the string "John Smith", escape each embedded double quotation mark with a backslash (\):

\"John Smith\"

The following example shows the syntax used to run a Cypher query in a file:

azgbolt -f /home/user/match.cql
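If you call azgbolt from a script, building the argument list programmatically sidesteps manual escaping. The following Python sketch uses only the -c, -f, -h, and -p options from the syntax above; the helper name azgbolt_argv is hypothetical, and whether -h expects a bare host name or a full URL ("hosturl") is an assumption to check against your installation.

```python
import shlex
from typing import List, Optional


def azgbolt_argv(
    query: Optional[str] = None,
    file: Optional[str] = None,
    host: Optional[str] = None,
    port: Optional[int] = None,
    executable: str = "azgbolt",
) -> List[str]:
    """Build an azgbolt argument list (hypothetical helper) from -c, -f, -h, and -p."""
    if (query is None) == (file is None):
        raise ValueError("pass exactly one of query or file")
    argv = [executable]
    # -c runs an inline query; -f runs the statements in a file.
    argv += ["-c", query] if query is not None else ["-f", file]
    if host is not None:
        argv += ["-h", host]  # assumption: value format follows "-h hosturl"
    if port is not None:
        argv += ["-p", str(port)]
    return argv


# A query that embeds double quotes, as in the "John Smith" example.
query = 'MATCH (p:Person) WHERE p.name = "John Smith" RETURN p.name'

# shlex.join quotes each argument for a POSIX shell, so nothing is escaped by hand.
print(shlex.join(azgbolt_argv(query=query)))
# azgbolt -c 'MATCH (p:Person) WHERE p.name = "John Smith" RETURN p.name'
```

Passing the list from azgbolt_argv directly to subprocess.run avoids the shell altogether, so no quoting is needed at all; shlex.join is only useful when you want a command line to paste into a terminal.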

Using the Bolt Protocol

In addition to the azgbolt CLI, you can connect other applications that support the Bolt protocol and run Cypher queries against AnzoGraph DB data. To do so, specify the Cypher Bolt protocol port (default 7088) after the IP address of the AnzoGraph DB host server:

<host_IP>:<Cypher_port>

Similarly, those same applications can run SPARQL commands and queries by specifying the SPARQL port (default 7098) after the host server's IP address:

<host_IP>:<SPARQL_port>
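As an illustration of connecting a Bolt client, the following Python sketch uses the Neo4j Python driver (the neo4j package) to run the Person query from later in this topic over the Cypher Bolt port. It rests on several assumptions not covered in this topic: that AnzoGraph DB accepts this driver's Bolt handshake (an older driver release may be needed, depending on which Bolt versions the server negotiates), that no credentials are required (auth=None), and that a bolt:// URI is the right form for your client. The function names bolt_uri and fetch_person_names are hypothetical.

```python
from typing import List

# Default Cypher Bolt port described above; change it if your deployment differs.
CYPHER_BOLT_PORT = 7088


def bolt_uri(host_ip: str, port: int = CYPHER_BOLT_PORT) -> str:
    """Build a bolt:// URI from the <host_IP>:<Cypher_port> address."""
    return f"bolt://{host_ip}:{port}"


def fetch_person_names(host_ip: str) -> List[str]:
    """Run the Person query over Bolt with the Neo4j Python driver.

    Assumptions: the neo4j package is installed, AnzoGraph DB accepts the
    driver's connection, and no credentials are required (auth=None).
    """
    from neo4j import GraphDatabase  # lazy import: defining this module needs no driver

    with GraphDatabase.driver(bolt_uri(host_ip), auth=None) as driver:
        with driver.session() as session:
            result = session.run("MATCH (people:Person) RETURN people.name")
            # Collect values by column position while the session is still open.
            return [record[0] for record in result]


if __name__ == "__main__":
    print(bolt_uri("192.0.2.10"))  # bolt://192.0.2.10:7088
```

If your deployment requires authentication, pass auth=(user, password) instead of None.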

Loading Data with Cypher CREATE

Cypher CREATE statements to replicate the original Neo4j Movie dataset in AnzoGraph DB are available in a file you can download from the following location:

movies.cql

After saving the movies.cql file to an accessible location on your AnzoGraph DB server, you can run the following command to create the Movies dataset in AnzoGraph DB. The movies.cql file contains a single Cypher statement that includes multiple CREATE IN <dataset> commands.

azgbolt -f /<filepath>/movies.cql

The IN <dataset> clause is an AnzoGraph DB extension that was added to the standard Cypher language syntax to allow ingestion of data into a named dataset.
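To script the load step, you could wrap the same azgbolt -f command in Python. This is a minimal sketch: the function name load_cql_file is hypothetical, it assumes azgbolt is on PATH (or that you pass its full path as executable), and it assumes azgbolt exits with a non-zero status on failure, which check=True turns into a subprocess.CalledProcessError.

```python
import subprocess
from pathlib import Path


def load_cql_file(path: str, executable: str = "azgbolt") -> subprocess.CompletedProcess:
    """Run `azgbolt -f <path>` (hypothetical helper) and return the completed process."""
    cql = Path(path)
    # Fail early with a clear message instead of letting azgbolt report a missing file.
    if not cql.is_file():
        raise FileNotFoundError(f"Cypher file not found: {cql}")
    return subprocess.run(
        [executable, "-f", str(cql)],  # argument list, so no shell quoting is involved
        check=True,           # raise CalledProcessError on a non-zero exit status
        capture_output=True,  # keep azgbolt's stdout/stderr on the result object
        text=True,
    )
```

For example, load_cql_file("/<filepath>/movies.cql") runs the same command as the azgbolt -f example above, and the returned object's stdout attribute holds whatever azgbolt printed.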

Getting to Know the Movies Data

The Movies dataset captures information about movies and the actors and directors involved with each film. The following diagram shows the model, or ontology, of the Movies dataset.

Movies database nodes, relationships, and properties

The Movies database has two primary node labels, Person and Movie, connected by a number of relationship types such as ACTED_IN, WROTE, DIRECTED, and REVIEWED. You can write Cypher queries that traverse the relationships between Person and Movie instances to retrieve node property values such as an actor's name, a specific movie's title, its director or directors, and other information.

Running Cypher Queries

This section provides a brief introduction to the Cypher language and demonstrates some basic sample Cypher queries run against the Movies data stored in AnzoGraph DB. Like SPARQL, Cypher is designed specifically for working with graph data, and it shares some similarities with SQL, including many SQL-like clauses and operations. The primary way to query data in Cypher is the MATCH clause.

This first query matches all nodes with the Person label, binds each one to the variable people, and returns the name of every person in the Movies dataset.

MATCH (people:Person) RETURN people.name ;

Cypher keywords are case-insensitive; however, relationship types and property values are case-sensitive.

A second simple query returns all movie titles in the Movies dataset.

MATCH (films:Movie) RETURN films.title ;

Cypher also supports more complex query operations that take full advantage of the relationships between nodes that graph databases capture. These capabilities include fetching and filtering data, traversing graphs by joining data, using aggregate functions, and writing subqueries.

In addition, you may include any of the standard AnzoGraph DB built-in functions in Cypher queries.

As in SPARQL, MATCH statements let you specify patterns for Cypher to search for in the database. You can use labels, restrict patterns to specific relationship types and directions, and use a WHERE clause to further filter the results that a query returns. For example, the following query returns a list of the movies in which Tom Hanks acted:

MATCH (actor:Person)-[:ACTED_IN]-(film:Movie) WHERE actor.name='Tom Hanks' RETURN actor.name, film.title ;
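To make the pattern semantics concrete, the following Python sketch emulates this kind of query on a hypothetical in-memory toy graph; the names Actor A, Film X, and Film Y are invented and not taken from movies.cql. It does not reflect how AnzoGraph DB evaluates Cypher. It only shows what the pattern matches: the relationship type and node labels restrict the match, the undirected -[:ACTED_IN]- accepts a relationship stored in either direction, and the WHERE clause then filters the matched rows. Storing the relationships in the Person-to-Movie direction is an assumption.

```python
from typing import Dict, List, Tuple

# Hypothetical toy graph shaped like the Movies model (not real movies.cql data).
LABELS: Dict[str, str] = {"Actor A": "Person", "Film X": "Movie", "Film Y": "Movie"}
# Relationships as (start node, type, end node); Person -> Movie direction assumed.
RELS: List[Tuple[str, str, str]] = [
    ("Actor A", "ACTED_IN", "Film X"),
    ("Actor A", "DIRECTED", "Film Y"),
]


def match_undirected(rel_type: str, left_label: str, right_label: str) -> List[Tuple[str, str]]:
    """Emulate MATCH (l:left_label)-[:rel_type]-(r:right_label).

    An undirected pattern matches a relationship stored in either direction,
    so both orientations of each relationship are tried against the labels.
    """
    rows = []
    for start, rtype, end in RELS:
        if rtype != rel_type:
            continue  # the relationship type restricts the match
        for left, right in ((start, end), (end, start)):
            if LABELS[left] == left_label and LABELS[right] == right_label:
                rows.append((left, right))
    return rows


# WHERE actor.name = 'Actor A' filters the rows that the pattern matched.
rows = [(a, f) for a, f in match_undirected("ACTED_IN", "Person", "Movie") if a == "Actor A"]
print(rows)  # [('Actor A', 'Film X')]
```

Film Y is excluded because Actor A is connected to it only through DIRECTED, not ACTED_IN.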

The following diagram shows a graphic representation of nodes and relationship types in the Movies dataset using Tom Hanks, both an actor and director, as an example:

To further traverse the relationship between nodes in the Movies dataset, you could run the following query.

MATCH (actor:Person)-[:ACTED_IN]-(film:Movie), (director:Person)-[:DIRECTED]-(film:Movie) 
WHERE actor.name='Tom Hanks' RETURN actor.name, film.title, director.name ;

In this example, the two comma-separated patterns share the film variable, so the query returns each movie in which Tom Hanks acted together with that movie's director or directors.
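The effect of reusing the film variable across both patterns is similar to a join. The following Python sketch emulates it on hypothetical toy data (Actor A, Director C, Film X, and the other names are invented, not taken from movies.cql); it illustrates the semantics only, not how AnzoGraph DB executes the query.

```python
from typing import List, Tuple

# Hypothetical toy data: (person, movie) pairs for each relationship type.
ACTED_IN: List[Tuple[str, str]] = [("Actor A", "Film X"), ("Actor A", "Film Y"), ("Actor B", "Film X")]
DIRECTED: List[Tuple[str, str]] = [("Director C", "Film X"), ("Director D", "Film X")]


def directors_for_actor(actor_name: str) -> List[Tuple[str, str, str]]:
    """Emulate MATCH (actor)-[:ACTED_IN]-(film), (director)-[:DIRECTED]-(film)
    WHERE actor.name = actor_name RETURN actor.name, film.title, director.name.
    """
    rows = []
    for actor, film in ACTED_IN:
        if actor != actor_name:
            continue  # the WHERE filter on actor.name
        for director, directed_film in DIRECTED:
            if directed_film == film:  # both patterns must bind the same film node
                rows.append((actor, film, director))
    return rows


print(directors_for_actor("Actor A"))
# [('Actor A', 'Film X', 'Director C'), ('Actor A', 'Film X', 'Director D')]
```

A film with two directors produces two rows, and Film Y produces none because it has no DIRECTED relationship: MATCH returns only rows where the entire pattern is found. The row order here follows the toy lists; a Cypher query does not guarantee row order unless it uses ORDER BY.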

For more information on the Cypher language, see the openCypher project website (opencypher.org) and the openCypher specification, available at https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf.