Graph Query Language

{{AFC submission|t||ts=20191006084816|u=A James Green|ns=118|demo=}}<!-- Important, do not remove this line before article has been created. -->
 
== ISO/IEC JTC 1 project for a new International Standard: GQL ==
In September 2019 a proposal for a GQL standard project (39075 GQL)<ref name="39075 GQL">{{cite web|url=https://www.iso.org/standard/76120.html|title=ISO/IEC WD 39075 Information Technology — Database Languages — GQL|last=|first=|date=|website=|publisher=ISO|accessdate=September 29, 2019}}</ref> was approved by a vote of the national standards bodies that are members of ISO/IEC Joint Technical Committee 1 ([https://jtc1info.org/page-3/ ISO/IEC JTC 1]), which is responsible for international Information Technology standards. GQL is intended to be a declarative query language, like SQL.
 
The GQL project is led by Stefan Plantikow (who was the first lead engineer of Neo4j's Cypher for Apache Spark project) and Stephen Cannan (Technical Corrigenda editor of SQL). They are also the editors of the initial early working drafts<ref name="GQL EWD v2.2">{{cite web|url=https://isotc.iso.org/livelink/livelink?func=ll&objId=20836584&objAction=Open|title=''GQL Early Working Draft v2.2''.|last=Eds. Plantikow|first=Stefan|last2=Cannan|first2=Stephen|date=November 2019|website=|publisher=ISO|accessdate=November 9, 2019}}</ref> of the GQL specification.
 
====The GQL property graph data model====
 
GQL is a query language specifically for property graphs. A property graph closely resembles a conceptual data model, as expressed in an Entity Relationship model or in a UML class diagram. Entities or concepts are modelled as nodes, and relationships as edges, in a graph. Property graphs are ''multigraphs'': there can be many edges between the same pair of nodes. GQL graphs can be ''mixed'': they can contain directed edges, where one of the endpoint nodes of an edge is the tail (or source) and the other node is the head (or target or destination), but they can also contain undirected (bidirectional) edges.
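The multigraph and mixed-edge properties described above can be illustrated with a small in-memory structure. This is a minimal Python sketch, not part of the GQL specification: the classes {{code|Node}}, {{code|Edge}} and {{code|PropertyGraph}} and the method {{code|edges_from}} are hypothetical names chosen for illustration. Each edge has its own identity, so parallel edges between the same pair of nodes can coexist, and a {{code|directed}} flag distinguishes directed from undirected edges.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: str
    labels: set[str] = field(default_factory=set)
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    id: str
    source: str            # tail node id (for an undirected edge: one endpoint)
    target: str            # head node id (for an undirected edge: the other endpoint)
    directed: bool = True
    labels: set[str] = field(default_factory=set)
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Hypothetical mixed multigraph: edges are keyed by their own id, so
    several edges may join the same pair of nodes, and each edge is either
    directed or undirected."""

    def __init__(self):
        self.nodes = {}  # node id -> Node
        self.edges = {}  # edge id -> Edge

    def add_node(self, node: Node) -> None:
        self.nodes[node.id] = node

    def add_edge(self, edge: Edge) -> None:
        if edge.source not in self.nodes or edge.target not in self.nodes:
            raise KeyError("both endpoints must be existing nodes")
        self.edges[edge.id] = edge

    def edges_from(self, a: str, b: str) -> list:
        """Edges that can be traversed from node a to node b: directed edges
        with tail a and head b, plus undirected edges joining a and b."""
        return [
            e for e in self.edges.values()
            if (e.source == a and e.target == b)
            or (not e.directed and e.source == b and e.target == a)
        ]


g = PropertyGraph()
g.add_node(Node("alice", {"Person"}, {"name": "Alice"}))
g.add_node(Node("bob", {"Person"}, {"name": "Bob"}))
g.add_edge(Edge("e1", "alice", "bob", labels={"KNOWS"}))
g.add_edge(Edge("e2", "alice", "bob", labels={"WORKS_WITH"}))                  # parallel edge
g.add_edge(Edge("e3", "bob", "alice", directed=False, labels={"SIBLING_OF"}))  # undirected edge
print(sorted(e.id for e in g.edges_from("alice", "bob")))  # ['e1', 'e2', 'e3']
print([e.id for e in g.edges_from("bob", "alice")])        # ['e3']
```

Keying edges by identity rather than by endpoint pair is what makes the structure a multigraph; a plain adjacency matrix could not hold {{code|e1}} and {{code|e2}} at the same time.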
Additional aspects of the ERM or UML models (like generalization or subtyping, or entity or relationship cardinalities) may be captured by GQL schemas or types that describe possible instances of the general data model.
 
==Managed alongside SQL by JTC 1/SC32 Working Group 3 (WG3): extending SQL and creating GQL ==
The GQL project has a four-year timespan. Seven national standards bodies (those of the United States, China, Korea, the Netherlands, the United Kingdom, Denmark and Sweden) have nominated national subject-matter experts to work on the project, which is conducted by Working Group 3 (Database Languages) of ISO/IEC JTC 1's Subcommittee 32 (Data Management and Interchange), usually abbreviated as ISO/IEC JTC 1/SC 32 WG3, or just "WG3" for short. WG3 (and its direct predecessor committees within JTC 1) has been responsible for the SQL standard since 1987.<ref name="SC32 and WG3 history">{{cite web|url=https://jtc1info.org/sd_2-history_of_jtc1/jtc1-subcommittees/sc-32/|title=JTC 1/SC 32 Data Management and Interchange|last=|first=|date=|website=|publisher=ISO/IEC JTC1|accessdate=October 6, 2019}}</ref><ref name="1987 scope of SQL">{{cite web|url=https://isotc.iso.org/livelink/livelink?func=ll&objId=19733701&objAction=Open/|title=''Scope from the original standard, ISO 9075-1987, Database Language SQL''|last=|first=|date=|website=|publisher=ISO/IEC JTC1|accessdate=November 9, 2019}}</ref>
 
Prior work by WG3 and SC32 mirror bodies, particularly in INCITS DM32, has helped to define a new planned Part 16 of the SQL Standard, SQL Property Graph Queries (SQL/PGQ). SQL/PGQ allows a read-only graph query to be called inside a SQL SELECT statement, matching a graph pattern using syntax which is very close to Cypher, PGQL and G-CORE, and returning a table of data values as the result. SQL/PGQ also contains DDL to allow SQL tables to be mapped to a graph view schema object with nodes and edges associated with sets of labels and sets of data properties.<ref name="SQL Part 16 PGQ">{{cite web|url=https://www.iso.org/standard/79473.html?browse=tc|title=ISO/IEC WD 9075-16 Information technology — Database languages SQL — Part 16: SQL Property Graph Queries (SQL/PGQ)|last=|first=|date=|website=|publisher=ISO|accessdate=October 6, 2019}}</ref><ref name="W3C Berlin SQL and GQL">{{cite web|url=https://www.w3.org/Data/events/data-ws-2019/assets/slides/KeithWHare-2.pdf|title=''SQL and GQL'', W3C Workshop on Web Standardization for Graph Data. Creating Bridges: RDF, Property Graph and SQL.|last=Hare|first=Keith|display-authors=etal|date=March 2019|website=|publisher=W3C|accessdate=October 6, 2019}}</ref><ref name="LDBC SQL/PGQ">{{cite web|url=http://wiki.ldbcouncil.org/download/attachments/106233859/ldbc_tuc_2019_sql-pgq.pdf?version=1&modificationDate=1562342465000&api=v2|title=''Property graph extensions for the SQL standard''. LDBC 12th TUC.|last=Trigonakis|first=Vasileios|date=July 2019|website=|publisher=LDBC|accessdate=October 6, 2019}}</ref> The GQL project coordinates closely with the SQL/PGQ "project split" of (extension to) ISO 9075 SQL, and the technical working groups in the U.S. (INCITS DM32) and at the international level (SC32/WG3) have several expert contributors who work on both projects.<ref name="W3C Berlin SQL and GQL"/> The GQL project proposal mandates close alignment of SQL/PGQ and GQL, indicating that GQL will in general be a superset of SQL/PGQ.
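The idea of mapping SQL tables to a graph view can be sketched with ordinary rows. This Python sketch only illustrates the concept and does not reproduce the SQL/PGQ DDL. The function {{code|graph_view}}, its argument layout, and the rule that each table name becomes the label of the elements it produces are all assumptions made for this example.

```python
def graph_view(vertex_tables, edge_tables):
    """Build a graph view over relational tables (hypothetical sketch).

    vertex_tables: {table_name: (key_column, rows)}
    edge_tables:   {table_name: (source_table, source_column,
                                 target_table, target_column, rows)}
    Rows are dicts. Each table name is used as the label of the nodes or
    edges derived from it (an assumption of this sketch).
    """
    nodes = {}
    for table, (key, rows) in vertex_tables.items():
        for row in rows:
            # A node is identified by (table, key value); other columns become properties.
            node_id = (table, row[key])
            props = {c: v for c, v in row.items() if c != key}
            nodes[node_id] = {"labels": {table}, "properties": props}

    edges = []
    for table, (src_table, src_col, dst_table, dst_col, rows) in edge_tables.items():
        for row in rows:
            # The two reference columns identify the edge's source and target nodes.
            src = (src_table, row[src_col])
            dst = (dst_table, row[dst_col])
            if src not in nodes or dst not in nodes:
                raise ValueError(f"dangling reference in {table}: {row}")
            props = {c: v for c, v in row.items() if c not in (src_col, dst_col)}
            edges.append({"source": src, "target": dst,
                          "labels": {table}, "properties": props})
    return nodes, edges


person = ("id", [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Alan"}])
city = ("id", [{"id": 10, "name": "London"}])
lives_in = ("Person", "person_id", "City", "city_id",
            [{"person_id": 1, "city_id": 10, "since": 1840}])
nodes, edges = graph_view({"Person": person, "City": city}, {"LIVES_IN": lives_in})
print(len(nodes), len(edges))  # 3 1
```

The data stays in the tables; the view only reinterprets rows as nodes and edges, which is why a graph query over it can be embedded in a SELECT and still return a table.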
=== Cypher 9 ===
Cypher<ref name="Cypher">{{cite web|url=https://dl.acm.org/citation.cfm?id=3190657|title=''Cypher: An Evolving Query Language for Property Graphs.'' In Proceedings of the 2018 International Conference on Management of Data (SIGMOD '18). ACM, New York, NY, USA, 1433-1445. DOI: 10.1145/3183713.3190657|last=Francis|first=Nadime|display-authors=etal|date=|website=|publisher=ACM|accessdate=October 25, 2019}}</ref> is a language originally designed and first implemented by Neo4j Inc. in 2011. Since 2015 it has been made available as an open source language description<ref name="Cypher 9">{{cite web|url=https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf|title=''Cypher Query Language Reference (Version 9)'' |last=|first=|display-authors=etal|date=|website=|publisher=opencypher.org|accessdate=November 10, 2019}}</ref> with grammar tooling, a JVM front-end that parses Cypher queries, and a Technology Compatibility Kit (TCK) of over 2000 test scenarios, using Cucumber for implementation language portability.<ref name="Cypher Resources">{{cite web|url=http://www.opencypher.org/resources|title=''openCypher Resources''|last=|first=|display-authors=etal|date=|website=|publisher=openCypher.org|accessdate=November 10, 2019}}</ref> The TCK reflects the language description and an enhancement for temporal datatypes and functions documented in a Cypher Improvement Proposal.<ref name="Date-Time CIP">{{cite web|url=https://github.com/thobe/openCypher/blob/date-time/cip/1.accepted/CIP2015-08-06-date-time.adoc|title=''CIP2015-08-06 - Date and Time''|last=|first=|date=|website=|publisher=opencypher.org|accessdate=October 25, 2019}}</ref>
 
Cypher allows creation, reading, updating and deleting of graph elements, and can therefore be used both in analytics engines and in transactional databases.
 
====Querying with visual path patterns====
Cypher uses compact fixed- and variable-length patterns which combine visual representations of node and relationship (edge) topologies, with label existence and property value predicates. (These patterns are usually referred to as "ASCII Art" patterns, and arose originally as a way of commenting programs which used a lower-level graph API.<ref name="GQLs history"/>) By matching such a pattern against graph data elements, a query can extract references to nodes, relationships and paths of interest. Those references are emitted as a "binding table" where column names are bound to a multiset of graph elements. The name of a column becomes the name of a "binding variable", whose value is a specific graph element reference for each row of the table.
 
For example, the pattern in &nbsp;{{code|MATCH (p:Person)-[:LIVES_IN]->(c:City)}}&nbsp; will generate a two-column output table. The first column, named &nbsp;{{code|p}}, will contain references to nodes with the label &nbsp;{{code|Person}}. The second column, named &nbsp;{{code|c}}, will contain references to nodes with the label &nbsp;{{code|City}}, denoting the city where the person lives.
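How a single-edge pattern yields a binding table can be sketched in a few lines of Python. This is a hypothetical, minimal evaluator, not how Cypher or GQL engines are implemented: the function name {{code|match_edge_pattern}} and the graph encoding (nodes as a mapping from identifier to label set, edges as directed triples) are assumptions made for illustration.

```python
def match_edge_pattern(nodes, edges, src_var, src_label, edge_label, dst_var, dst_label):
    """Evaluate (src_var:src_label)-[:edge_label]->(dst_var:dst_label).

    nodes: dict mapping node id -> set of labels
    edges: list of directed (source_id, edge_label, target_id) triples
    Returns the binding table as a list of rows (a multiset); each row maps
    the two variable names to node references, here simply node ids.
    """
    table = []
    for src, label, dst in edges:
        if label == edge_label and src_label in nodes[src] and dst_label in nodes[dst]:
            table.append({src_var: src, dst_var: dst})
    return table


nodes = {"ada": {"Person"}, "alan": {"Person"}, "london": {"City"}, "acme": {"Company"}}
edges = [("ada", "LIVES_IN", "london"),
         ("alan", "LIVES_IN", "london"),
         ("alan", "WORKS_AT", "acme")]

for row in match_edge_pattern(nodes, edges, "p", "Person", "LIVES_IN", "c", "City"):
    print(row)
# {'p': 'ada', 'c': 'london'}
# {'p': 'alan', 'c': 'london'}
```

The {{code|WORKS_AT}} edge produces no row because its type does not match; each surviving row binds {{code|p}} and {{code|c}}, exactly the two columns described above.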
 
Queries are therefore able to first project a subgraph of the graph input into the query, and then extract the data values associated with that subgraph. Data values can also be processed by functions, including aggregation functions, leading to the projection of computed values which render the information held in the projected graph in various ways. Patterns of this kind have become pervasive in property graph query languages, and are the basis for the advanced pattern sub-language being defined in SQL/PGQ, which is likely to become a subset of the GQL language.
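The second step, extracting values from the matched subgraph and aggregating them, can be sketched over a binding table. This Python sketch is hypothetical: {{code|count_per_city}} is an illustrative name, and the result is only meant to resemble what a clause such as {{code|RETURN c.name, count(p)}} would produce, with {{code|c.name}} acting as the grouping key.

```python
from collections import Counter


def count_per_city(bindings, city_names):
    """Aggregate binding-table rows {"p": person, "c": city} into
    (city name, number of people) pairs, grouping by the city's name."""
    counts = Counter(city_names[row["c"]] for row in bindings)
    return sorted(counts.items())


bindings = [{"p": "ada", "c": "london"},
            {"p": "alan", "c": "london"},
            {"p": "grace", "c": "nyc"}]
city_names = {"london": "London", "nyc": "New York"}  # property values of the City nodes
print(count_per_city(bindings, city_names))  # [('London', 2), ('New York', 1)]
```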
Cypher uses patterns for insertion and modification clauses (CREATE and MERGE), and proposals have been made in the GQL project for collecting node and edge patterns to describe graph types.
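CREATE always inserts new elements, while MERGE first tries to match its pattern and only creates elements when no match exists. The following Python sketch shows that "match or create" idea for a single node; the function {{code|merge_node}} and the graph encoding are hypothetical, and the sketch ignores many details of the real clause, such as merging whole relationship patterns and concurrency control.

```python
def merge_node(nodes, label, key_props):
    """Match-or-create: return the id of an existing node carrying `label`
    whose properties include all of key_props; otherwise create one.

    nodes: dict mapping integer node id -> (set of labels, dict of properties)
    """
    for node_id, (labels, props) in nodes.items():
        if label in labels and all(props.get(k) == v for k, v in key_props.items()):
            return node_id                      # matched: nothing is created
    new_id = max(nodes, default=-1) + 1         # assumes integer node ids
    nodes[new_id] = ({label}, dict(key_props))  # no match: create the node
    return new_id


graph = {}
first = merge_node(graph, "City", {"name": "London"})
second = merge_node(graph, "City", {"name": "London"})
print(first == second, len(graph))  # True 1
```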
 
====Cypher implementations====
Cypher is implemented in Neo4j's database, in SAP's HANA Graph, in RedisGraph,<ref name="Redis Graph Cypher">{{cite web|url=https://oss.redislabs.com/redisgraph/|title=''RedisGraph - a graph database module for Redis''|last=|first=|date=|website=|publisher=Redis Labs|accessdate=November 9, 2019}}</ref> in Cambridge Semantics' AnzoGraph,<ref name="Anzograph openCypher">{{cite web|url=https://www.prweb.com/releases/cambridge_semantics_adds_opencypher_to_anzograph_cambridge_semantics_is_first_vendor_to_offer_both_rdf_sparql_and_opencypher_graph_data_access/prweb16192576.htm|title=''Cambridge Semantics Adds OpenCypher to AnzoGraph''|last=|first=|date=March 2019|website=|publisher=|accessdate=November 9, 2019}}</ref> in Bitnine's AgensGraph and in Memgraph. It is also implemented in the open source projects Cypher for Gremlin,<ref name="CfoG">{{cite web|url=https://github.com/opencypher/cypher-for-gremlin|title=''Cypher for Gremlin adds Cypher support to any Gremlin graph database.''|last=Novikov|first=Dmitry|display-authors=etal|date=January 2018|website=|publisher=openCypher|accessdate=November 3, 2019}}</ref> maintained by Neueda Labs in Riga, and Cypher for Apache Spark (now renamed Morpheus),<ref name="CAPS Morpheus"/><ref name="Morpheus SQL and Cypher">{{cite web|url=https://databricks.com/session/neo4j-morpheus-interweaving-table-and-graph-data-with-sql-and-cypher-in-apache-spark|title=''Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apache Spark''|last=Green|first=Alastair|last2=Junghanns|first2=Martin|date=April 2019|website=|publisher=Databricks Inc.|accessdate=November 3, 2019}}</ref><ref name="Morpheus SQL and Cypher cont">{{cite web|url=https://databricks.com/session/neo4j-morpheus-interweaving-table-and-graph-data-with-sql-and-cypher-in-apache-spark-continues|title=''Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apache Spark (continues)''}}</ref> as well as in research projects such as Cypher.PL and Ingraph.<ref name="Cypher usage">{{cite web|url=http://www.opencypher.org/projects|title=''Usage of Cypher''|last=|first=|display-authors=etal|date=|website=|publisher=openCypher.org|accessdate=November 10, 2019}}</ref> Cypher as a language is governed as the openCypher project<ref name="openCypher project">{{cite web|url=https://github.com/opencypher/openCypher|title=''Specification of the Cypher property graph query language''|last=|first=|display-authors=etal|date=|website=|publisher=openCypher.org|accessdate=November 10, 2019}}</ref> by an informal community which has held five face-to-face openCypher Implementers' Meetings since February 2017.<ref name="Cypher events">{{cite web|url=http://www.opencypher.org/events|title=''Events''|last=|first=|display-authors=etal|date=|website=|publisher=openCypher.org|accessdate=November 10, 2019}}</ref>
 
====Cypher 9 and Cypher 10====
 
The current version of Cypher (including the temporal extension) is referred to as Cypher 9. Prior to the GQL project it was planned to create a new version, Cypher 10, that would incorporate features such as schemas, composable graph queries and views. The first designs for Cypher 10, including graph construction and projection, were implemented in the Cypher for Apache Spark project starting in 2016.<ref name="CAPS Morpheus">{{cite web|url=https://github.com/opencypher/morpheus|title=''Morpheus brings the leading graph query language, Cypher, onto the leading distributed processing platform, Spark.''|last=Rydberg|first=Mats|display-authors=etal|date=July 2016|website=|publisher=openCypher|accessdate=November 3, 2019}}</ref>
 
=== PGQL ===
GSQL<ref name="GSQL white paper">{{cite web|url=https://info.tigergraph.com/gsql|title=''GSQL: An SQL-Inspired Graph Query Language''|last=Wu|first=Mingxi|last2=Deutsch|first2=Alin|date=|website=|publisher=|accessdate=November 9, 2019}}</ref> is a language designed for TigerGraph Inc.'s property graph database. Since October 2018 TigerGraph language designers have been actively promoting and working on the GQL project. GSQL is a Turing-complete language that incorporates procedural flow control and iteration, together with accumulators: a facility for gathering and modifying computed values associated with a program execution, either for the whole graph or for individual graph elements. These features are designed to enable iterative graph computations to be combined with data exploration and retrieval. GSQL graphs must be described by a schema of vertices and edges. Vertices and edges are named schema objects which contain data but also define an implicit type, much as SQL tables are data containers with an associated implicit row type. GSQL graphs are then composed from these vertex and edge sets, and multiple named graphs can include the same vertex or edge set.
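The accumulator idea, per-element and whole-graph values updated while a computation iterates over the graph, can be imitated in Python. This is a hypothetical sketch, not GSQL syntax or semantics: the classes {{code|SumAccum}} and {{code|MinAccum}} merely echo the names of GSQL accumulator types, and {{code|hop_distances}} is an illustrative function. A per-vertex minimum accumulator holds hop counts from a start vertex, a global sum accumulator counts edge traversals, and the loop repeats until no vertex improves.

```python
class SumAccum:
    """Accumulator whose += adds the incoming value."""
    def __init__(self):
        self.value = 0

    def __iadd__(self, x):
        self.value += x
        return self


class MinAccum:
    """Accumulator whose += keeps the smallest value seen so far."""
    def __init__(self):
        self.value = float("inf")

    def __iadd__(self, x):
        self.value = min(self.value, x)
        return self


def hop_distances(adjacency, source):
    """Frontier-based iteration: per-vertex MinAccum hop counts from
    `source`, plus a global SumAccum counting edge traversals."""
    dist = {v: MinAccum() for v in adjacency}  # one accumulator per vertex
    traversals = SumAccum()                    # one accumulator for the whole run
    dist[source] += 0
    frontier = {source}
    while frontier:
        next_frontier = set()
        for v in frontier:
            for w in adjacency[v]:
                traversals += 1
                before = dist[w].value
                dist[w] += dist[v].value + 1
                if dist[w].value < before:  # improved: revisit w next round
                    next_frontier.add(w)
        frontier = next_frontier
    return {v: acc.value for v, acc in dist.items()}, traversals.value


adjacency = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": [], "e": []}
distances, traversals = hop_distances(adjacency, "a")
print(distances)   # {'a': 0, 'b': 1, 'c': 1, 'd': 2, 'e': inf}
print(traversals)  # 4
```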
GSQL has developed new features since its release in September 2017,<ref name="GSQL 1.0">{{cite web|url=https://doc-archive.tigergraph.com/1.0/GSQL-Language-Reference-Part-1---Defining-Graphs-and-Loading-Data.html|title=''GSQL documentation Tigergraph 1.0''.|date=2017|website=|publisher=|accessdate=November 9, 2019}}</ref> most notably the introduction of variable-length edge pattern matching,<ref name="GSQL patterns">{{cite web|url=https://docs.tigergraph.com/v/2.4/release-notes-change-log/release-notes-tigergraph-2.4|title=''Pattern Matching'', TigerGraph 2.4 Release Notes.|date=June 2019|website=|publisher=|accessdate=November 9, 2019}}</ref> using a syntax related to that seen in Cypher, PGQL and SQL/PGQ, but also close in style to the fixed-length patterns offered by Microsoft SQL Server Graph.<ref name="SQLServer Graph">{{cite web|url=https://docs.microsoft.com/en-us/sql/relational-databases/graphs/sql-graph-overview?view=sql-server-ver15#query-language-extensions|title=''Query language extensions'', Graph processing with SQL Server and Azure SQL Database|last=|first=|display-authors=etal|date=2017|website=|publisher=Microsoft Inc.|accessdate=November 10, 2019}}</ref>
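Variable-length edge patterns differ from fixed-length ones in that a single pattern edge can match a path of several graph edges. In Cypher, for example, {{code|(a)-[:KNOWS*1..3]->(b)}} matches paths of one to three {{code|KNOWS}} edges. The Python sketch below enumerates such paths from a start node; the function name and edge encoding are hypothetical. It never reuses an edge within one path, similar to Cypher's rule that a single match does not traverse the same relationship twice; other languages, GSQL among them, may apply different path semantics.

```python
def variable_length_matches(edges, label, start, min_hops, max_hops):
    """Enumerate paths from `start` that follow between min_hops and max_hops
    directed edges carrying `label`, never reusing an edge within one path.

    edges: list of (edge_id, source, label, target)
    Returns a list of paths, each given as a list of edge ids.
    """
    results = []

    def extend(node, path):
        if min_hops <= len(path) <= max_hops:
            results.append(list(path))
        if len(path) == max_hops:
            return
        for edge_id, src, lab, dst in edges:
            if src == node and lab == label and edge_id not in path:
                path.append(edge_id)   # depth-first: extend the path by one edge
                extend(dst, path)
                path.pop()             # backtrack

    extend(start, [])
    return results


edges = [("e1", "a", "KNOWS", "b"), ("e2", "b", "KNOWS", "c"),
         ("e3", "c", "KNOWS", "a"), ("e4", "b", "LIKES", "d")]
print(variable_length_matches(edges, "KNOWS", "a", 1, 3))
# [['e1'], ['e1', 'e2'], ['e1', 'e2', 'e3']]
```

The cycle {{code|a → b → c → a}} does not cause an infinite loop: both the hop bound and the no-repeated-edge rule stop the search.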
 
=== Multiple graphs and composable graph queries in Cypher for Apache Spark ===
The openCypher Morpheus project<ref name="CAPS Morpheus"/> implements Cypher for Apache Spark users. Commencing in 2016, this project originally ran alongside two related efforts in which Morpheus designers also took part: SQL/PGQ and G-CORE. The Morpheus project acted as a testbed for extensions to Cypher (known as "Cypher 10") in two areas: graph DDL and query language extensions.
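Composability means that a query can return a graph, which then becomes the input of the next query. The idea can be sketched by treating queries as functions from graphs to graphs. This Python sketch is illustrative only and does not reproduce Cypher 10 or Morpheus syntax; {{code|project_edges}}, {{code|connect_shared_targets}} and {{code|run_pipeline}} are hypothetical names. The first query projects a subgraph, and the second constructs a new graph from it.

```python
from functools import reduce
from itertools import combinations


def project_edges(label):
    """Graph -> graph query: keep only edges carrying `label` and their endpoints."""
    def query(graph):
        edges = {e for e in graph["edges"] if e[1] == label}
        nodes = {n for src, _, dst in edges for n in (src, dst)}
        return {"nodes": nodes, "edges": edges}
    return query


def connect_shared_targets(new_label):
    """Graph -> graph query: construct a new graph with a `new_label` edge
    between every pair of nodes that point at the same target node."""
    def query(graph):
        by_target = {}
        for src, _, dst in graph["edges"]:
            by_target.setdefault(dst, set()).add(src)
        edges = {(a, new_label, b)
                 for sources in by_target.values()
                 for a, b in combinations(sorted(sources), 2)}
        nodes = {n for a, _, b in edges for n in (a, b)}
        return {"nodes": nodes, "edges": edges}
    return query


def run_pipeline(graph, *queries):
    """Feed the graph returned by each query into the next one."""
    return reduce(lambda g, q: q(g), queries, graph)


g = {"nodes": {"ada", "alan", "grace", "london", "nyc", "acme"},
     "edges": {("ada", "LIVES_IN", "london"), ("alan", "LIVES_IN", "london"),
               ("grace", "LIVES_IN", "nyc"), ("ada", "WORKS_AT", "acme"),
               ("grace", "WORKS_AT", "acme")}}
result = run_pipeline(g, project_edges("LIVES_IN"), connect_shared_targets("NEIGHBOUR"))
print(result["edges"])  # {('ada', 'NEIGHBOUR', 'alan')}
```

Because every stage consumes and produces the same kind of value, stages can be reordered or reused; dropping the projection step here would also connect {{code|ada}} and {{code|grace}} through their shared employer.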