Cypher Language Overview
The implementation of Cypher in AnzoGraph closely follows version 9 of the openCypher community group's Cypher language specification. See the openCypher Resources page at https://www.opencypher.org/resources/. A PDF copy of the Cypher Query Language Reference is available at https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf.
The following features of the openCypher version 9 specification are not yet supported in AnzoGraph:
- Uniqueness.
- List and Map data types, and the operators and functions acting on these types.
- Path variables.
- Interleaving of read and update clauses.
To use Cypher in AnzoGraph, queries and other statements can be sent over the Bolt client-server network protocol to AnzoGraph servers. AnzoGraph supports Bolt protocol, version 1.0. Port 7088 is the default port used for the AnzoGraph Bolt end-point. The azgbolt CLI provides a simple way to send Cypher queries to AnzoGraph over the Bolt protocol. For example:
azgbolt -f query.cql
azgbolt -c "match (m:Movie) return m"
Graph data stored in AnzoGraph can be queried with SPARQL as well as Cypher language statements; both query modes can coexist. The AnzoGraph Bolt protocol can also be used to execute SPARQL queries. In that case, however, support for the data types returned in query results is limited.
For a brief introduction to using Cypher in AnzoGraph, see Working with Cypher and the Movies Graph.
There are several ways to load data into AnzoGraph. In Cypher, CREATE statements can be used to load graph data, which is convenient for smaller data sets. For bulk loading of larger RDF/RDF* triple data sets, or for loading data from CSV files, you can use the AnzoGraph LOAD command. (See Loading Data from Files for more information on the available data loading options.)
Cypher node and relationship data is represented in RDF* triples format as illustrated in the following example.
Cypher CREATE statements:
CREATE (TheMatrix:Movie {title:'The Matrix', released:1999, tagline:'Welcome to the Real World'})
CREATE (Keanu:Person {name:'Keanu Reeves', born:1964})
CREATE (Keanu)-[:ACTED_IN {role:'Neo'}]->(TheMatrix)
RDF* triple format of data stored in AnzoGraph:
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
<TheMatrix> rdf:type <Movie> .
<TheMatrix> <title> 'The Matrix' .
<TheMatrix> <released> 1999 .
<TheMatrix> <tagline> 'Welcome to the Real World' .
<Keanu> rdf:type <Person> .
<Keanu> <name> 'Keanu Reeves' .
<Keanu> <born> 1964 .
<< <Keanu> <ACTED_IN> <TheMatrix> >> <role> 'Neo' .
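The mapping in this example can be sketched in a few lines of Python. This is only an illustration of the pattern shown above, not AnzoGraph's loader: the function names are hypothetical, only string and integer property values are handled, and the plain triple emitted for a relationship without properties is an assumption that the example above does not cover.

```python
def literal(value):
    """Render a string or integer as it appears in the triples above."""
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace("'", "\\'")
        return "'" + escaped + "'"
    return str(value)


def node_triples(node_id, label, properties):
    """One rdf:type triple for the label, then one triple per property."""
    subject = f"<{node_id}>"
    lines = [f"{subject} rdf:type <{label}> ."]
    for key, value in properties.items():
        lines.append(f"{subject} <{key}> {literal(value)} .")
    return lines


def relationship_triples(start_id, rel_type, end_id, properties):
    """Relationship properties hang off the quoted << s p o >> triple."""
    edge = f"<{start_id}> <{rel_type}> <{end_id}>"
    if not properties:
        # Assumption: without properties the relationship is a plain triple.
        return [f"{edge} ."]
    return [
        f"<< {edge} >> <{key}> {literal(value)} ."
        for key, value in properties.items()
    ]


if __name__ == "__main__":
    for line in node_triples("Keanu", "Person", {"name": "Keanu Reeves", "born": 1964}):
        print(line)
    for line in relationship_triples("Keanu", "ACTED_IN", "TheMatrix", {"role": "Neo"}):
        print(line)
```

Running the script prints the three `<Keanu>` node triples and the `<< ... >> <role> 'Neo'` triple from the example.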
To query data using Cypher, the auto_predicate configuration setting should be enabled before loading data into AnzoGraph. Saving information about Cypher nodes requires that the node labels be registered as predicates, so enabling the auto_predicate setting ensures that node labels are registered as predicates during any subsequent data load operations.
Cypher Syntax Conventions
Cypher language syntax differs in some respects from SPARQL syntax. The following list describes the conventions and styling used in specifying Cypher command syntax and other elements:
- Node labels are case-sensitive and are typically specified in CamelCase format, for example, (:NetworkAddress).
- Relationship types are styled in all upper-case, using the underscore character between words, for example: [:ACTED_IN].
- Property keys, variables, parameters, aliases, and functions are case-sensitive and are typically styled in camelCase, beginning with a lower-case letter. Capitalization must match what is in the database (properties), what is already defined in the query (variables, parameters, aliases), or the Cypher definition (functions).
- Clauses are not case-sensitive, but are typically styled in all capital letters.
- Keywords, such as AND, DISTINCT, IN, CONTAINS, NOT, and others are not case-sensitive but are typically styled in all capital letters.
- Single quotes are typically used to specify literal string values, except when single quotes are part of the string.
- Special characters and spaces in property and label names are escaped by enclosing the identifier between single back quote characters ( ` ), for example `special character`.
- Italics are used in this document to identify placeholder values that you replace in a Cypher statement.
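The naming styles in this list can be checked mechanically. The following Python sketch is one possible reading of those conventions as regular expressions; the pattern names and the `follows_convention` helper are hypothetical. Because the styles are conventions rather than rules enforced by Cypher, a failed check indicates unusual styling, not invalid syntax.

```python
import re

# Assumed readings of the styling conventions listed above.
LABEL_STYLE = re.compile(r"[A-Z][A-Za-z0-9]*")                # e.g. NetworkAddress
REL_TYPE_STYLE = re.compile(r"[A-Z][A-Z0-9]*(_[A-Z0-9]+)*")   # e.g. ACTED_IN
PROPERTY_KEY_STYLE = re.compile(r"[a-z][A-Za-z0-9]*")         # e.g. title

PATTERNS = {
    "label": LABEL_STYLE,
    "relationship_type": REL_TYPE_STYLE,
    "property_key": PROPERTY_KEY_STYLE,
}


def follows_convention(kind, name):
    """Return True if name matches the typical style for its kind."""
    return PATTERNS[kind].fullmatch(name) is not None


if __name__ == "__main__":
    print(follows_convention("label", "NetworkAddress"))
    print(follows_convention("relationship_type", "ActedIn"))
```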
The Cypher Property Graph Model
The Cypher graph query language operates on property graphs. A property graph is defined as a directed, vertex-labeled, edge-labeled multigraph with self edges, where edges have their own identity. In a property graph, the term node is used to denote a vertex, and relationship is used to denote an edge.
The following elements may exist in a property graph:
- Entity
  - Node
  - Relationship
- Path
- Token
  - Label
  - Relationship type
  - Property key
- Property
Entities
An entity has a unique, comparable identity and is assigned a set of properties, each of which is uniquely identified by its property key.
Nodes
A node is the basic entity of the graph. In addition:
- A node may be assigned a set of unique labels.
- A node may have zero or more outgoing relationships.
- A node may have zero or more incoming relationships.
Relationships
A relationship is an entity that specifies a directed connection between exactly two nodes, the source node and the target node. In addition:
- An outgoing relationship is a directed relationship from the point of view of its source node.
- An incoming relationship is a directed relationship from the point of view of its target node.
- A relationship is assigned exactly one relationship type.
In AnzoGraph, no two relationships can have the same start node, end node, and relationship type. That is, a relationship is uniquely identified by its start node, end node, and relationship type. Also, unique integer identifiers can be associated with nodes; however, no integer identifier can be designated for relationships.
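The uniqueness rule for relationships can be pictured as a store keyed by the (start node, relationship type, end node) triple. The following Python sketch is a conceptual model only; the `RelationshipStore` class is hypothetical, and merging properties when the same triple is added again is an assumption, since this section does not describe what AnzoGraph does in that case.

```python
class RelationshipStore:
    """Toy model: a relationship's identity is (start, type, end)."""

    def __init__(self):
        self._properties = {}  # (start, rel_type, end) -> property dict

    def add(self, start, rel_type, end, **properties):
        """Add a relationship; return True only if the triple is new."""
        key = (start, rel_type, end)
        is_new = key not in self._properties
        # Assumption: re-adding an existing triple merges its properties.
        self._properties.setdefault(key, {}).update(properties)
        return is_new

    def __len__(self):
        return len(self._properties)


if __name__ == "__main__":
    store = RelationshipStore()
    store.add("Keanu", "ACTED_IN", "TheMatrix", role="Neo")
    store.add("Keanu", "ACTED_IN", "TheMatrix", role="Neo")  # same identity
    print(len(store))
```

Because direction is part of the key, a relationship from TheMatrix to Keanu with the same type counts as a different relationship in this model.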
Paths
A path represents a walk-through of a property graph consisting of a sequence of alternating nodes and relationships. In addition:
- A path always starts and ends at a node.
- The shortest possible path contains a single node and is also called an empty path.
- A path has a length, which is an integer greater than or equal to zero; the length is equal to the number of relationships in the path.
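The path rules above translate into a small data structure. In this Python sketch, which is a hypothetical model rather than any AnzoGraph type, nodes and relationships are represented by plain strings, and the alternating sequence is enforced by requiring exactly one more node than relationships.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Path:
    """Alternating walk: node, relationship, node, ..., node."""

    nodes: Tuple[str, ...]
    relationships: Tuple[str, ...] = ()

    def __post_init__(self):
        # Starting and ending at a node means one more node than relationships.
        if len(self.nodes) != len(self.relationships) + 1:
            raise ValueError("a path needs exactly one more node than relationships")

    @property
    def length(self):
        """The number of relationships in the path."""
        return len(self.relationships)

    @property
    def is_empty(self):
        """True for a single-node path, which has length zero."""
        return self.length == 0


if __name__ == "__main__":
    print(Path(("Keanu",)).length)
    print(Path(("Keanu", "TheMatrix"), ("ACTED_IN",)).length)
```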
Tokens
A token is a nonempty string of Unicode characters.
Labels
A label is a token that is assigned only to nodes.
Relationship types
A relationship type is a token that is assigned only to relationships.
Property keys
A property key is a token which uniquely identifies an entity’s property.
Properties
A property is a pair consisting of a property key and a property value. A property value is an instance of one of Cypher’s concrete, scalar types, or a list of a concrete, scalar type.
Reserved Keywords
You can escape any Cypher reserved word by enclosing it between single back quote characters ( ` ), for example `reserved word`.
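When Cypher statements are generated by a program, the back quote rule can be applied with a small helper. The following Python sketch is illustrative only: `quote_identifier` is a hypothetical name, `RESERVED_SAMPLE` is a small sample of reserved words rather than the full list, and names that already contain a back quote are rejected because this section does not describe how to escape them.

```python
import re

# A small sample only; see the openCypher specification for the full list.
RESERVED_SAMPLE = {"MATCH", "RETURN", "CREATE", "WHERE"}

# Names made of letters, digits, and underscores need no quoting here.
PLAIN_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def quote_identifier(name):
    """Wrap name in back quotes if it is reserved or not a plain identifier."""
    if "`" in name:
        raise ValueError("embedded back quotes are not handled by this sketch")
    # Keywords are not case-sensitive, so compare in upper case.
    is_reserved = name.upper() in RESERVED_SAMPLE
    is_plain = PLAIN_IDENTIFIER.fullmatch(name) is not None
    return f"`{name}`" if is_reserved or not is_plain else name


if __name__ == "__main__":
    print(quote_identifier("title"))
    print(quote_identifier("match"))
    print(quote_identifier("special character"))
```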