Line 2:
== Project for a new International Standard Graph Query Language ==
In September 2019 a proposal for a project to create a new standard graph [[query language]] (ISO/IEC 39075 Information Technology — Database Languages — GQL)<ref name="39075 GQL">{{cite web|url=https://www.iso.org/standard/76120.html|title=ISO/IEC WD 39075 Information Technology — Database Languages — GQL|last=|first=|date=|website=|publisher=ISO|accessdate=September 29, 2019}}</ref> was approved by a vote of the national standards bodies that are members of ISO/IEC Joint Technical Committee 1 ([https://jtc1info.org/page-3/ ISO/IEC JTC 1]). JTC 1 is responsible for international Information Technology standards. GQL is intended to be a declarative database query language, like [[SQL]].
The GQL project proposal states:{{quote|text="Using graph as a fundamental representation for data modeling is an emerging approach in data management. In this approach, the data set is modeled as a graph, representing each data entity as a vertex (also called a node) of the graph and each relationship between two entities as an edge between corresponding vertices. The graph data model has been drawing attention for its unique advantages. Firstly, the graph model can be a natural fit for data sets that have hierarchical, complex, or even arbitrary structures. Such structures can be easily encoded into the graph model as edges. This can be more convenient than the relational model, which requires the normalization of the data set into a set of tables with fixed row types. Secondly, the graph model enables efficient execution of expensive queries or data analytic functions that need to observe multi-hop relationships among data entities, such as reachability queries, shortest or cheapest path queries, or centrality analysis. There are two graph models in current use: the Resource Description Framework (RDF) model and the Property Graph model. The RDF model has been standardized by W3C in a number of specifications. The Property Graph model, on the other hand, has a multitude of implementations in graph databases, graph algorithms, and graph processing facilities. However, a common, standardized query language for property graphs (like SQL for relational database systems) is missing. GQL is proposed to fill this void."<ref name="39075 GQL NWIP">{{cite web|url=https://isotc.iso.org/livelink/livelink?func=ll&objId=20911483&objAction=Open&viewType=1|title=SC32 WG3 N282 “SC32 N3002 Draft NWIP Form4 Information Technology – Database Languages - GQL”|last=|first=|date=|website=|publisher=ISO|accessdate=December 9, 2019}}</ref>}}
Line 8:
The GQL project is the culmination of converging initiatives dating back to 2016, particularly a private proposal from Neo4j to other database vendors in July 2016,<ref name="Creating standard">{{cite web|url=https://s3.amazonaws.com/artifacts.opencypher.org/website/materials/DM32.2/DM32.2-2018-00144.Creating+an+Open+Industry+Standard+for+a+Declarative+Property+Graph+Query+Language.pdf|title=''Creating an Open Industry Standard for a Declarative Property Graph Query Language''|last1=Green|first1=Alastair|date=July 2016|website=|publisher=opencypher.org|accessdate=November 12, 2019}}</ref> and a proposal from Oracle technical staff within the ISO/IEC JTC 1 standards process later that year.<ref name="Towards NWIP">{{cite web|url=https://s3.amazonaws.com/artifacts.opencypher.org/website/materials/DM32.2/DM32.2-2018-00128r1.Working+towards+a+GQL+NWIP.pdf|title=''Working towards a New Work Item for GQL, to complement SQL PGQ'', ANSI INCITS DM32.2 submission ''DM32.2-2018-00128r1''|last1=Green|first1=Alastair|date=July 2018|website=|publisher=opencypher.org|accessdate=November 12, 2019}}</ref>
The GQL project is led by Stefan Plantikow (who was the first lead engineer of [[Neo4j]]'s [[Cypher (query language)|Cypher]] for [[Apache Spark]] project) and Stephen Cannan (Technical Corrigenda editor of SQL). They are also the editors of the initial early working drafts<ref name="GQL EWD v2.2">{{cite web|url=https://isotc.iso.org/livelink/livelink?func=ll&objId=20836584&objAction=Open|title=''GQL Early Working Draft v2.2''.|last=Eds. Plantikow|first=Stefan|last2=Cannan|first2=Stephen|date=October 2019|website=|publisher=ISO|accessdate=November 9, 2019}}</ref> of the GQL specification.
As originally motivated,<ref name="Towards NWIP"/> the GQL project aims to complement the work of creating an implementable normative natural-language specification with supportive community efforts. These efforts enable contributions from those who are unable to take part, or uninterested in taking part, in the formal process of defining a JTC 1 International Standard.<ref name="community">{{cite web|url=https://www.gqlstandards.org/|title=''GQL Standard''|accessdate=November 12, 2019}}</ref><ref name="GCU 3">{{cite web|url=https://www.gqlstandards.org/community-updates|title=''GQL Community Updates''|accessdate=November 12, 2019}}</ref> In July 2019 the Linked Data Benchmark Council (LDBC) agreed to become the umbrella organization for the efforts of community technical working groups. The Existing Languages and the Property Graph Schema working groups formed in late 2018 and early 2019 respectively. A working group to define formal denotational semantics for GQL was proposed at the third GQL Community Update in October 2019.<ref name="FSWG">{{cite web|url=https://drive.google.com/open?id=15DAUAORu477FF-DooTH2ol0SZhx2ARtr|title=''Formal Semantics Working Group''|last=Libkin|first=Leonid|accessdate=November 12, 2019}}</ref>
Line 14:
====The GQL property graph data model====
GQL is a query language specifically for property graphs. A property graph closely resembles a conceptual data model, as expressed in an
Nodes and edges, collectively known as elements, have attributes. Those attributes may be data values, or labels (tags). Values of properties cannot be elements of graphs, nor can they be whole graphs: these restrictions intentionally force a clean separation between the topology of a graph, and the attributes carrying data values in the context of a graph topology. The property graph data model therefore deliberately prevents nesting of graphs, or treating nodes in one graph as edges in another. Each property graph may have a set of labels and a set of properties that are associated with the graph as a whole.
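The restrictions described above can be illustrated with a minimal Python sketch. It is not taken from the GQL drafts: the class names (<code>Node</code>, <code>Edge</code>, <code>PropertyGraph</code>), the helper <code>check_properties</code>, and the choice of allowed value types are all assumptions made for illustration. The sketch shows elements carrying labels and properties, a graph carrying its own labels and properties, and property values being limited to plain data values so that neither elements nor whole graphs can be nested inside a property.

```python
from dataclasses import dataclass, field

# Assumption for this sketch: "data values" are limited to these Python types.
ALLOWED_VALUE_TYPES = (str, int, float, bool)


def check_properties(properties):
    """Reject property values that are not plain data values."""
    for key, value in properties.items():
        if not isinstance(value, ALLOWED_VALUE_TYPES):
            raise TypeError(
                f"property {key!r} must be a data value, got {type(value).__name__}"
            )


@dataclass
class Node:
    id: str
    labels: set = field(default_factory=set)
    properties: dict = field(default_factory=dict)

    def __post_init__(self):
        check_properties(self.properties)


@dataclass
class Edge:
    id: str
    source: str  # id of the start node
    target: str  # id of the end node
    labels: set = field(default_factory=set)
    properties: dict = field(default_factory=dict)

    def __post_init__(self):
        check_properties(self.properties)


@dataclass
class PropertyGraph:
    # The graph as a whole may carry its own labels and properties.
    labels: set = field(default_factory=set)
    properties: dict = field(default_factory=dict)
    nodes: dict = field(default_factory=dict)
    edges: dict = field(default_factory=dict)

    def __post_init__(self):
        check_properties(self.properties)

    def add_node(self, node):
        self.nodes[node.id] = node

    def add_edge(self, edge):
        # Edges connect nodes of the same graph; they never point at edges or graphs.
        if edge.source not in self.nodes or edge.target not in self.nodes:
            raise ValueError("both endpoints must be nodes of this graph")
        self.edges[edge.id] = edge


graph = PropertyGraph(labels={"Social"}, properties={"name": "example"})
graph.add_node(Node("n1", {"Person"}, {"name": "Ann"}))
graph.add_node(Node("n2", {"City"}, {"name": "Oslo"}))
graph.add_edge(Edge("e1", "n1", "n2", {"LIVES_IN"}, {"since": 2015}))
```

In this sketch, an attempt to store a <code>Node</code> or a <code>PropertyGraph</code> as a property value raises a <code>TypeError</code>, which is one simple way to keep topology and data values separate.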
Line 34:
=== SQL/PGQ Property Graph Query ===
Prior work by WG3 and SC32 mirror bodies, particularly in [[International Committee for Information Technology Standards|INCITS]] DM32, has helped to define a new planned Part 16 of the SQL Standard, which allows a read-only graph query to be called inside a SQL SELECT statement, matching a graph pattern using syntax that is very close to Cypher, PGQL and G-CORE, and returning a table of data values as the result. SQL/PGQ also contains DDL to allow SQL tables to be mapped to a graph view schema object with nodes and edges associated with sets of labels and sets of data properties.<ref name="SQL Part 16 PGQ">{{cite web|url=https://www.iso.org/standard/79473.html?browse=tc|title=ISO/IEC WD 9075-16 Information technology — Database languages SQL — Part 16: SQL Property Graph Queries (SQL/PGQ)|last=|first=|date=|website=|publisher=ISO|accessdate=October 6, 2019}}</ref><ref name="W3C Berlin SQL and GQL">{{cite web|url=https://www.w3.org/Data/events/data-ws-2019/assets/slides/KeithWHare-2.pdf|title=''SQL and GQL'', W3C Workshop on Web Standardization for Graph Data. Creating Bridges: RDF, Property Graph and SQL.|last=Hare|first=Keith|display-authors=etal|date=March 2019|website=|publisher=W3C|accessdate=October 6, 2019}}</ref><ref name="LDBC SQL/PGQ">{{cite web|url=http://wiki.ldbcouncil.org/download/attachments/106233859/ldbc_tuc_2019_sql-pgq.pdf?version=1&modificationDate=1562342465000&api=v2|title=''Property graph extensions for the SQL standard''. LDBC 12th TUC.|last=Trigonakis|first=Vasileios|date=July 2019|website=|publisher=LDBC|accessdate=October 6, 2019}}</ref> The GQL project coordinates closely with the SQL/PGQ "project split" of (extension to) ISO 9075 SQL, and the technical working groups in the U.S. (INCITS DM32) and at the international level (SC32/WG3) have several expert contributors who work on both projects.<ref name="W3C Berlin SQL and GQL"/>
The GQL project proposal mandates close alignment of SQL/PGQ and GQL, indicating that GQL will in general be a superset of SQL/PGQ.
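The idea of mapping SQL tables to a graph view and returning a table of data values can be sketched in Python with the standard-library <code>sqlite3</code> module. This does not use SQL/PGQ syntax, which SQLite does not implement; the table names (<code>person</code>, <code>city</code>, <code>lives_in</code>), the sample rows, and the in-memory representation of the graph view are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical relational data; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE lives_in (person_id INTEGER, city_id INTEGER);
    INSERT INTO person VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO city VALUES (10, 'Oslo');
    INSERT INTO lives_in VALUES (1, 10);
    """
)

# Graph view: every row of person/city becomes a node labelled after its table,
# and every row of lives_in becomes an edge labelled LIVES_IN.
nodes = {}
for label, table in (("Person", "person"), ("City", "city")):
    for row_id, name in conn.execute(f"SELECT id, name FROM {table}"):
        nodes[(label, row_id)] = {"labels": {label}, "name": name}

edges = [
    (("Person", person_id), "LIVES_IN", ("City", city_id))
    for person_id, city_id in conn.execute(
        "SELECT person_id, city_id FROM lives_in"
    )
]

# Read-only pattern match over the view. The result is a table of data values
# (names), not of graph elements.
result = [
    (nodes[source]["name"], nodes[target]["name"])
    for source, label, target in edges
    if label == "LIVES_IN"
]
```

The underlying tables are never modified: the graph view is derived from them, and the match only reads from it.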
=== Cypher ===
Cypher<ref name="Cypher">{{cite web|url=https://dl.acm.org/citation.cfm?id=3190657|title=''Cypher: An Evolving Query Language for Property Graphs.'' In Proceedings of the 2018 International Conference on Management of Data (SIGMOD '18). ACM, New York, NY, USA, 1433-1445. DOI: 10.1145/3183713.3190657|last=Francis|first=Nadime|display-authors=etal|date=|website=|publisher=ACM|accessdate=October 25, 2019}}</ref> is a language originally designed by Andrés Taylor and colleagues at Neo4j Inc., and first implemented by that company in 2011. Since 2015 it has been made available as an open source language description<ref name="Cypher 9">{{cite web|url=https://s3.amazonaws.com/artifacts.opencypher.org/openCypher9.pdf|title=''Cypher Query Language Reference (Version 9)'' |last=|first=|display-authors=etal|date=|website=|publisher=opencypher.org|accessdate=November 10, 2019}}</ref> with grammar tooling, a [[Java virtual machine|JVM]] front-end that parses Cypher queries, and a Technology Compatibility Kit (TCK) of over 2000 test scenarios, using [[Cucumber (software)|Cucumber]] for implementation language portability<ref name="Cypher Resources">{{cite web|url=http://www.opencypher.org/resources|title=''openCypher Resources''|last=|first=|display-authors=etal|date=|website=|publisher=ACM|accessdate=November 10, 2019}}</ref>. The TCK reflects the language description and an enhancement for temporal datatypes and functions documented in a Cypher Improvement Proposal<ref name="Date-Time CIP">{{cite web|url=https://github.com/thobe/openCypher/blob/date-time/cip/1.accepted/CIP2015-08-06-date-time.adoc|title=''CIP2015-08-06 - Date and Time''|last=|first=|date=|website=|publisher=opencypher.org|accessdate=October 25, 2019}}</ref>.
Cypher supports creating, reading, updating and deleting graph elements, and can therefore be used in both analytics engines and transactional databases.
====Querying with visual path patterns====
Cypher uses compact fixed- and variable-length patterns which combine visual representations of node and relationship (edge) topologies, with label existence and property value predicates. (These patterns are usually referred to as "[[ASCII
For example, the pattern {{code|MATCH (p:Person)-[:LIVES_IN]->(c:City)}} generates a two-column output table. The first column, named {{code|p}}, contains references to nodes with the label {{code|Person}}. The second column, named {{code|c}}, contains references to nodes with the label {{code|City}}, denoting the city where the person lives.
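The way such a pattern yields a table of node references can be sketched in Python over a toy in-memory graph. This is an illustration of the matching idea only, not how any Cypher engine is implemented; the graph contents and the function name <code>match_lives_in</code> are assumptions.

```python
# Toy graph: nodes carry a set of labels; edges are (source, type, target) triples.
nodes = {
    "n1": {"labels": {"Person"}, "name": "Ann"},
    "n2": {"labels": {"Person"}, "name": "Bob"},
    "n3": {"labels": {"City"}, "name": "Oslo"},
    "n4": {"labels": {"Company"}, "name": "Acme"},
}
edges = [
    ("n1", "LIVES_IN", "n3"),
    ("n2", "WORKS_AT", "n4"),
    ("n2", "LIVES_IN", "n3"),
]


def match_lives_in(nodes, edges):
    """Return one row per match of (p:Person)-[:LIVES_IN]->(c:City)."""
    rows = []
    for source, edge_type, target in edges:
        if (
            edge_type == "LIVES_IN"
            and "Person" in nodes[source]["labels"]
            and "City" in nodes[target]["labels"]
        ):
            # Columns p and c hold references (here: node ids), not copies of data.
            rows.append({"p": source, "c": target})
    return rows


table = match_lives_in(nodes, edges)
# table == [{"p": "n1", "c": "n3"}, {"p": "n2", "c": "n3"}]
```

The <code>WORKS_AT</code> edge is skipped because its type does not satisfy the relationship predicate, so it contributes no row.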
Line 69:
====Cypher implementations====
Cypher is implemented in Neo4j's database, in SAP's [[SAP HANA|HANA]] Graph, by [[Redis]] Graph,<ref name="Redis Graph Cypher">{{cite web|url=https://oss.redislabs.com/redisgraph/|title=''RedisGraph - a graph database module for Redis''|last=|first=|date=|website=|publisher=Redis Labs|accessdate=November 9, 2019}}</ref> by Cambridge Semantics' AnzoGraph,<ref name="Anzograph openCypher">{{cite web|url=https://www.prweb.com/releases/cambridge_semantics_adds_opencypher_to_anzograph_cambridge_semantics_is_first_vendor_to_offer_both_rdf_sparql_and_opencypher_graph_data_access/prweb16192576.htm|title=''Cambridge Semantics Adds OpenCypher to AnzoGraph''|last=|first=|date=March 2019|website=|publisher=|accessdate=November 9, 2019}}</ref> by Bitnine's Agens Graph, by Memgraph, and in the open source projects Cypher for [[Gremlin (programming language)|Gremlin]],<ref name="CfoG">{{cite web|url=https://github.com/opencypher/cypher-for-gremlin|title=''Cypher for Gremlin adds Cypher support to any Gremlin graph database.''|last=Novikov|first=Dmitry|display-authors=etal|date=January 2018|website=|publisher=openCypher|accessdate=November 3, 2019}}</ref> maintained by Neueda Labs in Riga, and Cypher for Apache Spark (now renamed to Morpheus),<ref name="CAPS Morpheus"/><ref name="Morpheus SQL and Cypher">{{cite web|url=https://databricks.com/session/neo4j-morpheus-interweaving-table-and-graph-data-with-sql-and-cypher-in-apache-spark|title=''Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apache Spark''|last=Green|first=Alastair|last2=Junghanns|first2=Martin|date=April 2019|website=|publisher=Databricks Inc.|accessdate=November 3, 2019}}</ref><ref name="Morpheus SQL and Cypher cont">{{cite web|url=https://databricks.com/session/neo4j-morpheus-interweaving-table-and-graph-data-with-sql-and-cypher-in-apache-spark-continues|title=''Neo4j Morpheus: Interweaving Table and Graph Data with SQL and Cypher in Apache Spark (continues)''}}</ref> as well as in research projects such as Cypher.PL and Ingraph.<ref name="Cypher usage">{{cite web|url=http://www.opencypher.org/projects|title=''Usage of Cypher''|last=|first=|display-authors=etal|date=|website=|publisher=openCypher.org|accessdate=November 10, 2019}}</ref> Cypher as a language is governed as the openCypher project<ref name="openCypher project">{{cite web|url=https://github.com/opencypher/openCypher|title=''Specification of the Cypher property graph query language''|last=|first=|display-authors=etal|date=|website=|publisher=openCypher.org|accessdate=November 10, 2019}}</ref> by an informal community, which has held five face-to-face openCypher Implementers' Meetings since February 2017.<ref name="Cypher events">{{cite web|url=http://www.opencypher.org/events|title=''Events''|last=|first=|display-authors=etal|date=|website=|publisher=openCypher.org|accessdate=November 10, 2019}}</ref>
====Cypher 9 and Cypher 10====
Line 80:
=== G-CORE ===
G-CORE is a research language designed by a group of academic and industrial researchers and language designers which draws on features of Cypher, PGQL and [[SPARQL]].<ref name="G-CORE">{{cite web|url=https://dl.acm.org/citation.cfm?id=3190654/|title=''G-CORE: A Core for Future Graph Query Languages.'' In Proceedings of the 2018 International Conference on Management of Data (SIGMOD '18). ACM, New York, NY, USA, 1421-1432. DOI: 10.1145/3183713.3190654|last=Angles|first=Renzo|display-authors=etal|date=2018|website=|publisher=ACM|accessdate=November 9, 2019}}</ref><ref name="G-CORE summary">{{cite web|url=https://dl.acm.org/citation.cfm?id=3190654/|title=''G-CORE: The LDBC Graph Query Language Proposal''. In archives of FOSDEM 2018.|last=Voigt|first=Hannes|date=February 2018|website=|publisher=|accessdate=November 12, 2019}}</ref> The project was conducted under the auspices of the Linked Data Benchmark Council (LDBC), starting with the formation of a Graph Query Language task force in late 2015, with the bulk of the work of paper writing occurring in 2017. G-CORE is a composable language which is closed over graphs: graph inputs are processed to create a graph output, using graph projections and graph set operations to construct the new graph. G-CORE queries are pure functions over graphs, having no side effects, which means that the language does not define operations which mutate (update or delete) stored data. G-CORE introduces views (named queries). It also incorporates paths as elements in a graph ("paths as first class citizens"), which can be queried independently of projected paths (which are computed at query time over node and edge elements). G-CORE has been partially implemented in open-source research projects in the LDBC
=== GSQL ===
Line 91:
Graph DDL features include<ref name="CAPS oCIM V">{{cite web|url=https://s3.amazonaws.com/artifacts.opencypher.org/website/ocim5/slides/ocim5+-+CAPS+%2B+GraphDDL.pdf|title=''Multiple graphs and composable queries in Cypher for Apache Spark''. openCypher Implementers Meeting V, Berlin.|last=Kiessling|first=Max|date=2019|website=|publisher=opencypher.org|accessdate=November 9, 2019}}</ref>
#definition of property graph views over [[Java Database Connectivity|JDBC]]-connected SQL tables and Spark DataFrames<ref name="EXTENDS element type">{{cite web|url=https://github.com/tobias-johansson/graphddl-example-ldbc/|title=''graphddl-example-ldbc: A cypher-for-apache-spark example showing the use of SqlPropertyGraphSource and GraphDDL to provide a property graph view of a SQL dataset''.|last=Johanssen|first=Tobias|display-authors=etal|date=2019|website=|publisher=|accessdate=November 9, 2019}}</ref>
#definition of graph schemas or types by assembling node type and edge type patterns, with subtyping<ref name="EXTENDS element type"/>
#constraining the content of a graph by a closed or fixed schema
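The last two features in this list can be illustrated with a Python sketch of a closed graph type. This is not Graph DDL syntax and does not reflect how Cypher for Apache Spark implements it; the type names, the <code>extends</code> key used to model subtyping, and the function <code>conforms</code> are hypothetical. The sketch assembles a graph type from node types and edge type patterns, then rejects any node, property or edge that the type does not declare.

```python
# Hypothetical graph type: node types list their allowed property keys and may
# extend another node type; edge type patterns are (source, label, target).
node_types = {
    "Person": {"extends": None, "properties": {"name"}},
    "Employee": {"extends": "Person", "properties": {"salary"}},
    "City": {"extends": None, "properties": {"name"}},
}
edge_types = {("Person", "LIVES_IN", "City")}


def all_properties(type_name):
    """Collect property keys declared on a node type and on its supertypes."""
    props = set()
    while type_name is not None:
        props |= node_types[type_name]["properties"]
        type_name = node_types[type_name]["extends"]
    return props


def is_subtype(type_name, ancestor):
    """True if type_name equals ancestor or extends it, directly or indirectly."""
    while type_name is not None:
        if type_name == ancestor:
            return True
        type_name = node_types[type_name]["extends"]
    return False


def conforms(nodes, edges):
    """Closed-schema check: anything not declared by the graph type is rejected."""
    for node in nodes.values():
        if node["type"] not in node_types:
            return False
        if not set(node["properties"]) <= all_properties(node["type"]):
            return False
    for source, label, target in edges:
        allowed = any(
            label == edge_label
            and is_subtype(nodes[source]["type"], source_type)
            and is_subtype(nodes[target]["type"], target_type)
            for source_type, edge_label, target_type in edge_types
        )
        if not allowed:
            return False
    return True


nodes = {
    "a": {"type": "Employee", "properties": {"name": "Ann", "salary": 10}},
    "c": {"type": "City", "properties": {"name": "Oslo"}},
}
edges = [("a", "LIVES_IN", "c")]
ok = conforms(nodes, edges)
```

Here an <code>Employee</code> node is accepted as the source of a <code>LIVES_IN</code> edge because <code>Employee</code> extends <code>Person</code>, while an undeclared property or a reversed edge would make <code>conforms</code> return <code>False</code>.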