Introduction to Labeled Property Graphs

Introduction

Jakob Voß

2024-11-22

Data Modelling with Graphs (and Star Wars)

“Little boxes with arrows and stuff”

Example

[Figure: graph with the nodes C3PO, R2D2 and Luke; an undirected edge "friends" between R2D2 and C3PO; directed edges "owns" from Luke to R2D2 and from Luke to C3PO]

Example

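[Figure: graph with the nodes R2D2, C3PO and Luke in a different layout; an undirected edge "friends" between R2D2 and C3PO; directed edges "owns" from Luke to R2D2 and from Luke to C3PO]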

Example

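[Figure: graph with node types shown instead of names: C3PO and R2D2 of type Robot, Luke of type Person; an undirected edge "friends" between R2D2 and C3PO; directed edges "owns" from Luke to R2D2 and from Luke to C3PO]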

Basic graph elements for data modeling

  • nodes (aka vertices) representing entities
  • edges (aka connections, relations…)
  • node labels as
    • node identifiers and/or
    • node types (aka node labels, classes…)
  • edge labels as edge types (aka edge labels…)
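As a rough sketch, these elements could be held in plain Python structures. The names `Node`, `Edge`, `directed` and `neighbours` are illustrative assumptions, not part of any format discussed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    id: str    # node label used as identifier
    type: str  # node label used as type


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    type: str              # edge label used as edge type
    directed: bool = True


nodes = [Node("R2D2", "Robot"), Node("C3PO", "Robot"), Node("Luke", "Person")]
edges = [
    Edge("Luke", "R2D2", "owns"),
    Edge("Luke", "C3PO", "owns"),
    Edge("R2D2", "C3PO", "friends", directed=False),
]


def neighbours(node_id):
    """Ids reachable from node_id; undirected edges count in both directions."""
    found = set()
    for edge in edges:
        if edge.source == node_id:
            found.add(edge.target)
        elif not edge.directed and edge.target == node_id:
            found.add(edge.source)
    return found
```

Here `neighbours("C3PO")` finds R2D2 only because the friendship edge is marked as undirected.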

Data modeling

  • Arbitrary graphs used for models
  • Models expressed in data formats

Some Graph Data Formats

RDF/Turtle

# directed edges
<Luke> <owns> <R2D2> , 
              <C3PO> .

# node types (additional edges)
<R2D2> a <robot> .
<C3PO> a <robot> .
<Luke> a <person> .

# no undirected edges!
<R2D2> <friend> <C3PO> .
  • Requires IRIs
  • More limitations later
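To illustrate the points above without an RDF library, triples can be sketched as plain Python tuples. The base IRI `http://example.org/` and the helper names are assumptions for illustration; `a` in Turtle is shorthand for `rdf:type`. An undirected friendship is emulated here by storing the edge in both directions, which is one possible convention, not something RDF prescribes.

```python
BASE = "http://example.org/"  # hypothetical base IRI
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(name):
    """Turn a short name into an absolute IRI (RDF requires IRIs)."""
    return BASE + name


triples = {
    # directed edges
    (iri("Luke"), iri("owns"), iri("R2D2")),
    (iri("Luke"), iri("owns"), iri("C3PO")),
    # node types are just additional edges
    (iri("R2D2"), RDF_TYPE, iri("robot")),
    (iri("C3PO"), RDF_TYPE, iri("robot")),
    (iri("Luke"), RDF_TYPE, iri("person")),
    # no undirected edges: store the friendship twice
    (iri("R2D2"), iri("friend"), iri("C3PO")),
    (iri("C3PO"), iri("friend"), iri("R2D2")),
}


def objects(subject, predicate):
    """All objects of triples with the given subject and predicate."""
    return {o for s, p, o in triples if s == subject and p == predicate}
```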

CSV

robot ownership

owner,robot
Luke,C3PO
Luke,R2D2

robot friendship

friend1,friend2
R2D2,C3PO
  • Requires contextual information
  • Least common denominator: resistance is futile!
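A minimal sketch of what "contextual information" means in practice: the CSV files alone do not say which column is source or target, what the edge type is, or whether edges are directed. In the following Python example this knowledge is supplied by hand in `TABLES` (a name made up for this sketch).

```python
import csv
import io

OWNERSHIP_CSV = "owner,robot\nLuke,C3PO\nLuke,R2D2\n"
FRIENDSHIP_CSV = "friend1,friend2\nR2D2,C3PO\n"

# Context that is not part of the CSV data itself:
# (csv text, source column, target column, edge type, directed?)
TABLES = [
    (OWNERSHIP_CSV, "owner", "robot", "owns", True),
    (FRIENDSHIP_CSV, "friend1", "friend2", "friends", False),
]


def read_edges(tables):
    """Turn CSV rows into (source, target, type, directed) tuples."""
    edges = []
    for text, source_col, target_col, edge_type, directed in tables:
        for row in csv.DictReader(io.StringIO(text)):
            edges.append((row[source_col], row[target_col], edge_type, directed))
    return edges


edges = read_edges(TABLES)
```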

SQL

-- nodes
INSERT INTO robots VALUES ('R2D2');
INSERT INTO robots VALUES ('C3PO');
INSERT INTO people VALUES ('Luke');

-- edges
INSERT INTO robot_ownership VALUES ('Luke', 'C3PO');
INSERT INTO robot_ownership VALUES ('Luke', 'R2D2');
INSERT INTO robot_friends VALUES ('R2D2', 'C3PO');   -- directed!
  • Requires a database schema. Pros and cons?
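One possible schema for these statements, sketched with Python's built-in `sqlite3` module. The column names and foreign keys are assumptions; the slides only show the `INSERT` statements. Because the friendship is stored in one direction only, the query has to look at both columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE robots (name TEXT PRIMARY KEY);
CREATE TABLE people (name TEXT PRIMARY KEY);
CREATE TABLE robot_ownership (
    owner TEXT REFERENCES people(name),
    robot TEXT REFERENCES robots(name)
);
CREATE TABLE robot_friends (
    robot1 TEXT REFERENCES robots(name),
    robot2 TEXT REFERENCES robots(name)
);
INSERT INTO robots VALUES ('R2D2');
INSERT INTO robots VALUES ('C3PO');
INSERT INTO people VALUES ('Luke');
INSERT INTO robot_ownership VALUES ('Luke', 'C3PO');
INSERT INTO robot_ownership VALUES ('Luke', 'R2D2');
INSERT INTO robot_friends VALUES ('R2D2', 'C3PO');
""")


def friends_of(robot):
    """Treat the stored (directed) friendship rows as undirected."""
    rows = conn.execute(
        "SELECT robot2 FROM robot_friends WHERE robot1 = ? "
        "UNION SELECT robot1 FROM robot_friends WHERE robot2 = ?",
        (robot, robot),
    )
    return sorted(row[0] for row in rows)
```

The schema fixes node and edge types in advance: adding a new kind of relation means adding a new table.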

GraphML

<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph edgedefault="undirected">
    <node id="C3PO"/>
    <node id="Luke"/>
    <node id="R2D2"/>
    <edge source="Luke" target="C3PO" directed="true"/>
    <edge source="Luke" target="R2D2" directed="true"/>
    <edge source="C3PO" target="R2D2"/>
  </graph>
</graphml>
  • Many more graph data formats exist
  • Mostly for applications other than metadata
    (e.g. network analysis)
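A GraphML document like the one above can be read with Python's standard `xml.etree.ElementTree`. The following sketch only handles the elements used in this example (nodes, edges, `edgedefault` and the per-edge `directed` attribute), not GraphML in general.

```python
import xml.etree.ElementTree as ET

GRAPHML = """
<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph edgedefault="undirected">
    <node id="C3PO"/>
    <node id="Luke"/>
    <node id="R2D2"/>
    <edge source="Luke" target="C3PO" directed="true"/>
    <edge source="Luke" target="R2D2" directed="true"/>
    <edge source="C3PO" target="R2D2"/>
  </graph>
</graphml>
"""

NS = {"g": "http://graphml.graphdrawing.org/xmlns"}

root = ET.fromstring(GRAPHML.strip())
graph = root.find("g:graph", NS)
default_directed = graph.get("edgedefault") == "directed"

nodes = [node.get("id") for node in graph.findall("g:node", NS)]

edges = []
for edge in graph.findall("g:edge", NS):
    attr = edge.get("directed")
    # the per-edge attribute overrides the graph-wide default
    directed = default_directed if attr is None else attr == "true"
    edges.append((edge.get("source"), edge.get("target"), directed))
```

Note that this example file carries no node or edge types at all; only direction is recorded.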

Cypher

CREATE (C3PO:robot)
CREATE (Luke:person)
CREATE (R2D2:robot)
CREATE (Luke)-[:owns]->(C3PO)
CREATE (Luke)-[:owns]->(R2D2)
CREATE (R2D2)-[:friend]->(C3PO)  // directed!
  • Used in property graph databases
  • Established standard (more or less)

Property Graph Exchange Format (PG)

# nodes
R2D2 :robot
C3PO :robot
Luke :person

# edges
Luke -> C3PO :owns      # directed
Luke -> R2D2 :owns      # directed
C3PO -- R2D2 :friends   # undirected

Try out PG in your browser!
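To show how little syntax the example needs, here is a toy Python reader for just the subset of PG used on this slide: `#` comments, node lines with labels, and edges written with `->` or `--`. It is a sketch under these assumptions, not an implementation of the full format (no quoted identifiers, no properties).

```python
PG_TEXT = """
# nodes
R2D2 :robot
C3PO :robot
Luke :person

# edges
Luke -> C3PO :owns
Luke -> R2D2 :owns
C3PO -- R2D2 :friends
"""


def parse_pg(text):
    """Return (nodes, edges) for the simple PG subset described above."""
    nodes, edges = {}, []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        tokens = line.split()
        labels = [t[1:] for t in tokens if t.startswith(":")]
        if len(tokens) >= 3 and tokens[1] in ("->", "--"):
            directed = tokens[1] == "->"
            edges.append((tokens[0], tokens[2], labels, directed))
        else:
            nodes[tokens[0]] = labels
    return nodes, edges


nodes, edges = parse_pg(PG_TEXT)
```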

Additional features

Additional graph features

[Figure: graph illustrating additional features: a loop (X->X), multi-edges (two edges X->Y), a cluster subgraph, further edges X->a and a->b, and an orphan node without any edges]

  • Support depends on the actual format or software!
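Whether a given dataset uses such features can be checked before choosing a format or tool. A small Python sketch, with node names and an edge list chosen for illustration:

```python
from collections import Counter

nodes = ["X", "Y", "a", "b", "orphan"]
edges = [("X", "X"), ("X", "a"), ("X", "Y"), ("X", "Y"), ("a", "b")]

# loops: edges that start and end at the same node
loops = [edge for edge in edges if edge[0] == edge[1]]

# multi-edges: the same (source, target) pair occurring more than once
multi_edges = [pair for pair, count in Counter(edges).items() if count > 1]

# orphans: nodes that do not take part in any edge
connected = {node for edge in edges for node in edge}
orphans = [node for node in nodes if node not in connected]
```

A format that stores only an edge list, for example, has no place to record the orphan node.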

Properties / Attributes

# node properties
Padmé  :person  gender: female
Anakin :person  gender: male                 
Luke   :person  gender: male                 
C3PO   :robot   color:  golden, silver   # multi-value!
R2D2   :robot              
   
# edge properties
Padmé  -> R2D2   :owns     episode:1
Anakin -> R2D2   :owns     episode:2    
Anakin -> Luke   :child    episode:3
Padmé  -> Luke   :child    episode:3 
Luke   -> R2D2   :owns     episode:4
Luke   -> C3PO   :owns     episode:4

Details depend on format & software

  • Special properties (name, id, visual, reserved…)

  • Which datatypes are supported (string, number, date…)?

  • Can properties have values of mixed type? Empty set? Null?

  • What are node/edge ids (internal, numeric, name…)?

  • Can nodes/edges have multiple labels/types?
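Any concrete implementation has to answer these questions one way or another. The following Python sketch makes one possible set of choices (all assumptions of this example): names serve as node ids, labels are lists, and every property value is a list, so multi-valued and missing properties are handled uniformly.

```python
nodes = {
    "Padmé":  {"labels": ["person"], "properties": {"gender": ["female"]}},
    "Anakin": {"labels": ["person"], "properties": {"gender": ["male"]}},
    "Luke":   {"labels": ["person"], "properties": {"gender": ["male"]}},
    "C3PO":   {"labels": ["robot"], "properties": {"color": ["golden", "silver"]}},
    "R2D2":   {"labels": ["robot"], "properties": {}},
}

edges = [
    {"from": "Padmé", "to": "R2D2", "labels": ["owns"], "properties": {"episode": [1]}},
    {"from": "Anakin", "to": "R2D2", "labels": ["owns"], "properties": {"episode": [2]}},
    {"from": "Anakin", "to": "Luke", "labels": ["child"], "properties": {"episode": [3]}},
    {"from": "Padmé", "to": "Luke", "labels": ["child"], "properties": {"episode": [3]}},
    {"from": "Luke", "to": "R2D2", "labels": ["owns"], "properties": {"episode": [4]}},
    {"from": "Luke", "to": "C3PO", "labels": ["owns"], "properties": {"episode": [4]}},
]


def owners_by_episode(robot):
    """(episode, owner) pairs for a robot, taken from edge properties."""
    return sorted(
        (edge["properties"]["episode"][0], edge["from"])
        for edge in edges
        if edge["to"] == robot and "owns" in edge["labels"]
    )
```

With these choices, `nodes["R2D2"]["properties"].get("color", [])` returns an empty list rather than a null value.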

Wikidata as a property graph

flowchart LR
  Q28193["<u>Academy Award for Best Film Editing (Q28193)</u><br>alias: Oscar for Best Film Editing"]
  Q463119["<u>Marcia Lucas (Q463119)</u><br><tt>alias:</tt> Marcia Griffin"]
  Q463119 -- "<u>award received (P166)</u><br>for work: Star Wars<br>date: 1978" --> Q28193

  • Node identifiers and edge labels (property identifiers)
  • Data model and terminology differ from
    both RDF and common property graph models
    • aliases and descriptions with language
    • properties can link to entities

First Summary

(Labeled) Property Graphs

  • A class of graph structures where

    • nodes and edges have labels (aka types)
    • nodes and edges have properties (aka attributes)
  • Specific features differ depending on data format and software

  • Useful for data modeling and schema-less data management

Two hard things

There are only two hard things in Computer Science: cache invalidation and naming things. — Phil Karlton

  • “Property”
    • attribute-value pair in a property graph
    • IRI used as middle part in an RDF triple
    • attribute, field, …
  • “Label”
    • type, class, …
    • name, …

Some property graph data formats

  • CSV
  • Cypher
  • GraphML
  • Property Graph Exchange Format (PG)

Converter NPM package pgraphs