Working with Cypher and the Movies Graph
This topic describes how to load the Movies demo data and run example Cypher queries, such as those described in the Neo4j sandbox environment (where the Movies database originated). It demonstrates how you can query AnzoGraph using Cypher syntax if you prefer Cypher commands and queries to SPARQL.
- Using the Cypher Command Line Interface (AZGBOLT)
- Loading Data with the Cypher CREATE Command
- Getting to Know the Movies Data
- Running Cypher Language Queries
Note: Cypher language support in the current AnzoGraph release is limited. Its main purpose is to demonstrate that Cypher queries available on other platforms can run on AnzoGraph while taking advantage of AnzoGraph's core strengths: high-performance query execution and scalability. In the current release, AnzoGraph supports the main MATCH [IN], CREATE [IN], WHERE, ORDER BY, and LIMIT query clauses. It does not currently support OPTIONAL MATCH, START, SKIP, DELETE, SET, REMOVE, FOREACH, MERGE, CALL, or UNION. In addition, AnzoGraph does not support array or spatial data types, or the CREATE UNIQUE and LOAD CSV statements. In Cypher queries, you can use any of the standard AnzoGraph built-in functions.
Using the Cypher Command Line Interface (AZGBOLT)
AnzoGraph uses the Bolt protocol to provide a client application interface and a CLI, azgbolt, for running Cypher commands and queries against AnzoGraph data. To view the syntax and command line arguments that the Cypher CLI accepts, type azgbolt on a new line in your client environment and press Enter.
The Cypher CLI returns command line syntax and available arguments, along with a sampling of commonly-used commands and queries:
azgbolt (Bolt CLI) [-c "command"] [-f file] [-h hosturl] [-p port] [-nohead ] [-o outputfile][--help display this message] [ Cypher examples follow ... ]
For example, the following command syntax runs a Cypher query:
azgbolt -c "any query"
Note: When running Cypher commands or queries from the azgbolt CLI, you can use the standard Linux shell method of escaping embedded single or double quote characters. For example, if a Cypher query contains a character string such as "John Smith", escape each quotation mark with a backslash (\).
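The escaping described in the note can be sketched as follows. The query text is hypothetical (it reuses the "John Smith" string from the note, which is not part of the Movies data), and the sketch assumes the query is wrapped in double quotes for the -c argument. It only prints the string that the shell would hand to azgbolt, so it can be tried without an AnzoGraph server.

```shell
# Inside a double-quoted argument, each embedded double quote is escaped with a backslash.
# Single quotes inside a double-quoted argument can be used as-is.
query="MATCH (p:Person) WHERE p.name=\"John Smith\" RETURN p.name ;"

# Print the text exactly as azgbolt would receive it after the shell removes the escapes.
printf '%s\n' "$query"

# On a host where azgbolt is installed, the same string could be passed directly:
#   azgbolt -c "MATCH (p:Person) WHERE p.name=\"John Smith\" RETURN p.name ;"
```

The printed query contains plain double quotes around John Smith, which is the form the Cypher parser needs to see.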
The following example shows the syntax used to run one or more Cypher queries contained within a file:
azgbolt -f /home/user/match.cql
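As a rough illustration of the file-based form, the sketch below writes two of the simple queries used later in this topic to a temporary file and passes that file to azgbolt. The temporary path is illustrative, and terminating each query with a semicolon is an assumption based on the query examples in this topic. The CLI is invoked only if it is found on the PATH.

```shell
# Create a temporary file to hold the queries (the path is illustrative).
cql_file="$(mktemp)"

# Write two queries, each terminated by a semicolon (assumed separator).
cat > "$cql_file" <<'EOF'
MATCH (people:Person) RETURN people.name ;
MATCH (films:Movie) RETURN films.title ;
EOF

# Run the file if azgbolt is available; otherwise only show the command.
if command -v azgbolt >/dev/null 2>&1; then
  azgbolt -f "$cql_file"
else
  echo "azgbolt not found; would run: azgbolt -f $cql_file"
fi
```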
In addition to the azgbolt CLI, other applications that support the Bolt protocol can connect to AnzoGraph and run Cypher commands and queries by specifying the Cypher Bolt protocol port (default 7088) after the AnzoGraph host server's IP address.
Similarly, those same applications can run SPARQL commands and queries by specifying the SPARQL port (default 7098) after the AnzoGraph host server's IP address.
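A minimal sketch of how those two addresses might be assembled is shown below. The IP address is a placeholder from the range reserved for documentation, and the plain host:port form is an assumption; a particular client application may expect a different format, such as a URL with a scheme prefix.

```shell
# Placeholder server address (192.0.2.10 is reserved for documentation use).
host="192.0.2.10"
cypher_port=7088   # default Cypher Bolt protocol port named in the text
sparql_port=7098   # default SPARQL port named in the text

# Assumed form: the port follows the IP address, separated by a colon.
cypher_endpoint="${host}:${cypher_port}"
sparql_endpoint="${host}:${sparql_port}"

printf 'Cypher: %s\nSPARQL: %s\n' "$cypher_endpoint" "$sparql_endpoint"

# The azgbolt usage text also lists -h and -p arguments; their exact value
# formats are not described here, so this invocation is only a guess:
#   azgbolt -h "$host" -p "$cypher_port" -c "MATCH (films:Movie) RETURN films.title ;"
```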
Loading Data with the Cypher CREATE Command
Cypher CREATE statements that replicate the original Neo4j Movies dataset in AnzoGraph are available in a file you can download from the following location:
The movies.cql file contains a single Cypher statement that includes multiple CREATE IN <dataset> commands. After saving the movies.cql file to an accessible location on your AnzoGraph server, run the following command to create the Movies dataset in AnzoGraph:
azgbolt -f /<filepath>/movies.cql
The IN <dataset> clause is an AnzoGraph extension to the standard Cypher language syntax that allows data to be ingested into a named AnzoGraph dataset.
Getting to Know the Movies Data
The Movies dataset captures information about movies and the actors and directors involved in each film. To familiarize you with the dataset, the following diagram shows its model, or ontology.
The Movies database has two primary node types, Person and Movie, connected by a number of relationship types such as ACTED_IN, WROTE, DIRECTED, and REVIEWED. You can write Cypher queries that traverse the relationships between Person and Movie instances to retrieve node property values such as an actor's name or a specific movie's title, its director(s), and other information.
Running Cypher Language Queries
This section provides a brief introduction to the Cypher language and demonstrates some basic sample Cypher queries run against the Movies data stored in AnzoGraph. Like SPARQL, Cypher is designed specifically for working with graph data. It also shares some similarities with SQL, with many SQL-like clauses and operations. The primary method of querying data with Cypher uses the MATCH keyword.
This first query simply returns all nodes with a specified label (Person), bound to the variable people. In this case, it returns the names of all people in the Movies dataset.
MATCH (people:Person) RETURN people.name ;
Note: Cypher keywords are case-insensitive; however, relationship types and property values are case-sensitive.
A second simple query returns all movie titles in the Movies dataset.
MATCH (films:Movie) RETURN films.title ;
Of course, Cypher supports more complex query operations that take full advantage of the relationships between entities or nodes that graph databases are able to capture. These capabilities involve fetching and filtering data, traversing graphs by joining data, using aggregate functions, and writing subqueries.
Note: In the current release, AnzoGraph supports the main MATCH [IN], CREATE [IN], WHERE, ORDER BY, and LIMIT query clauses. It does not currently support OPTIONAL MATCH, START, SKIP, DELETE, SET, REMOVE, FOREACH, MERGE, CALL, or UNION. In addition, AnzoGraph does not support array or spatial data types, or the CREATE UNIQUE and LOAD CSV statements. In Cypher queries, you can use any of the standard AnzoGraph built-in functions.
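To illustrate two of the supported clauses together, the sketch below sorts movie titles and limits the result to five rows. It uses only the Movie label and title property that appear in this topic. The exact ordering and output format that AnzoGraph produces are not shown here because they have not been verified.

```shell
# Sort movie titles alphabetically and keep the first five (illustrative query).
query="MATCH (films:Movie) RETURN films.title ORDER BY films.title LIMIT 5 ;"

# Run the query if azgbolt is available; otherwise only print it.
if command -v azgbolt >/dev/null 2>&1; then
  azgbolt -c "$query"
else
  printf '%s\n' "$query"
fi
```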
As in SPARQL, MATCH statements let you specify patterns that Cypher searches for in the database. You can use labels and restrict patterns by relationship type and direction. In addition, you can use a WHERE clause to further filter the results that a query returns. For example, you could run the following query to return a list of movies in which Tom Hanks acted.
MATCH (actor:Person)-[:ACTED_IN]-(film:Movie) WHERE actor.name='Tom Hanks' RETURN actor.name, film.title ;
The following diagram shows a graphic representation of nodes and relationship types in the Movies dataset using Tom Hanks, both an actor and director, as an example:
To further traverse the relationship between nodes in the Movies dataset, you could run the following query.
MATCH (actor:Person)-[:ACTED_IN]-(film:Movie), (director:Person)-[:DIRECTED]-(film:Movie) WHERE actor.name='Tom Hanks' RETURN actor.name, film.title, director.name ;
In this example, the MATCH pattern identifies all movies in which Tom Hanks acted and returns each movie's title along with its director(s).
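The two-part pattern above can also be written as a single path with explicit relationship directions. The sketch below assumes that AnzoGraph accepts directed arrows and chained patterns in the same way as standard Cypher, and that ACTED_IN and DIRECTED both point from Person to Movie as in the original Neo4j dataset. Neither assumption is confirmed by this topic.

```shell
# One chained, directed path: actor -> film <- director (assumed to be supported).
query="MATCH (actor:Person)-[:ACTED_IN]->(film:Movie)<-[:DIRECTED]-(director:Person) WHERE actor.name='Tom Hanks' RETURN film.title, director.name ;"

# Run the query if azgbolt is available; otherwise only print it.
if command -v azgbolt >/dev/null 2>&1; then
  azgbolt -c "$query"
else
  printf '%s\n' "$query"
fi
```

Because the query is wrapped in double quotes, the single quotes around Tom Hanks need no escaping.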
Note: For more information on the Cypher language, see the opencypher.org project web site and the Cypher Query Language Reference available at https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf.