Graph Model

NeoWiki stores a query-optimized projection of its data in a Neo4j graph database (ADR 3). The source of truth for all data remains in MediaWiki revision slots (ADR 4); the graph is a secondary store that enables efficient querying and relationship traversal.

For definitions of domain terms like Subject, Statement, and Schema, see the Glossary.

Overview

The graph consists of two node types and two relationship categories:

(:Page)-[:HasSubject {isMain}]->(:Subject:SchemaName)
(:Subject)-[:RelationType {id, ...}]->(:Subject)

Page nodes represent MediaWiki pages and carry page-level metadata. Subject nodes represent structured data entities. HasSubject relationships connect pages to their Subjects. Typed relationships connect Subjects to other Subjects via Relations.

Page Nodes

Every page that contains structured data has a corresponding :Page node. These nodes make page metadata available for graph queries.

| Property | Neo4j Type | Description |
| --- | --- | --- |
| id | integer | MediaWiki page ID (unique) |
| name | string | Page title |
| creationTime | datetime | When the page was created |
| lastUpdated | datetime | When the page was last modified |
| lastEditor | string | Username of the last editor |
| categories | string[] | MediaWiki categories the page belongs to |

creationTime and lastUpdated are stored as Neo4j datetime values (ISO 8601), converted from MediaWiki's YmdHis timestamp format.
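
The timestamp conversion can be illustrated with a minimal sketch. This is not NeoWiki's own code; the function name `mw_timestamp_to_iso` is hypothetical, and the sketch assumes the 14-digit YmdHis value is in UTC, which is how MediaWiki stores timestamps.

```python
from datetime import datetime, timezone


def mw_timestamp_to_iso(ts: str) -> str:
    """Convert a MediaWiki YmdHis timestamp (14 digits) to an ISO 8601 string."""
    if len(ts) != 14 or not (ts.isascii() and ts.isdigit()):
        raise ValueError(f"expected 14 digits in YmdHis format, got {ts!r}")
    # Assumption: the timestamp is in UTC, so attach an explicit UTC offset.
    parsed = datetime.strptime(ts, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    return parsed.isoformat()


print(mw_timestamp_to_iso("20240131123045"))  # 2024-01-31T12:30:45+00:00
```

A string in this form is the kind of ISO 8601 value that Neo4j's `datetime()` function accepts.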

Subject Nodes

Each Subject stored on a page gets a :Subject node. Subject nodes carry two labels: Subject and the name of their Schema (e.g., :Subject:Person, :Subject:Company). The Schema label changes if a Subject's type changes.

Fixed properties

| Property | Neo4j Type | Description |
| --- | --- | --- |
| id | string | Subject ID, 15 characters starting with s (unique) |
| name | string | Subject label |
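
As a rough illustration of the ID shape in the table, the sketch below checks only the two stated rules: a length of 15 and a leading `s`. The function name `has_id_shape` and the example values are made up; the characters allowed after the prefix are not specified here, so the sketch does not restrict them.

```python
def has_id_shape(value: str, prefix: str = "s") -> bool:
    """Check only the documented shape: 15 characters, starting with the prefix."""
    return len(value) == 15 and value.startswith(prefix)


# Placeholder values, not real Subject IDs.
print(has_id_shape("s" + "x" * 14))  # True
print(has_id_shape("s" + "x" * 10))  # False
print(has_id_shape("r" + "x" * 14))  # False
```

Passing `prefix="r"` applies the same check to Relation IDs, which follow the same pattern with a different first character.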

Dynamic properties

Each Statement on a Subject becomes a node property, keyed by Property Name. The value is converted to a Neo4j-compatible format by the corresponding PropertyType implementation. For example, a Statement with Property Name "Founded at" and a number value of 2019 results in a node property Founded at: 2019.

Relation-type Statements are not stored as node properties. They are stored as relationships between Subject nodes (see Typed Relations).
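
The split between node properties and relationships might be sketched as follows. The dictionary shape used for a Statement (`property`, `type`, `value`) and the function name `split_statements` are assumptions made for illustration, not NeoWiki's actual data structures, and values are assumed to have already been converted by the relevant PropertyType.

```python
from typing import Any


def split_statements(
    statements: list[dict[str, Any]],
) -> tuple[dict[str, Any], list[dict[str, Any]]]:
    """Separate Statements into node properties and Relation-type Statements."""
    node_properties: dict[str, Any] = {}
    relation_statements: list[dict[str, Any]] = []
    for statement in statements:
        if statement["type"] == "relation":
            # Relation-type Statements become relationships, not node properties.
            relation_statements.append(statement)
        else:
            # Other Statements become node properties keyed by Property Name.
            node_properties[statement["property"]] = statement["value"]
    return node_properties, relation_statements


# Hypothetical input: one number Statement and one Relation-type Statement.
statements = [
    {"property": "Founded at", "type": "number", "value": 2019},
    {"property": "Has product", "type": "relation", "value": []},
]
props, relations = split_statements(statements)
print(props)           # {'Founded at': 2019}
print(len(relations))  # 1
```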

Relationships

HasSubject

Connects a Page node to each of its Subject nodes.

| Property | Neo4j Type | Description |
| --- | --- | --- |
| isMain | boolean | true for the Main Subject, false for Child Subjects |

A page can have at most one Main Subject and any number of Child Subjects (ADR 7).

Typed Relations

Subject-to-Subject relationships represent Relations. The relationship type in Neo4j is the Relation Type defined in the Property Definition (e.g., Has author, Has product). Names that are not valid Cypher identifiers are backtick-escaped.

| Property | Neo4j Type | Description |
| --- | --- | --- |
| id | string | Relation ID, 15 characters starting with r |
| (additional) | scalar | Any properties from the Relation's property map |
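
The backtick-escaping of relationship types described above can be sketched as below. The function name `escape_relationship_type` is hypothetical, and the "plain identifier" rule is deliberately conservative (ASCII letters, digits, and underscores only). Cypher accepts more characters unescaped, but escaping a name that did not strictly need it is harmless.

```python
import re

# Conservative assumption: only ASCII identifiers are treated as safe unescaped.
_PLAIN_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def escape_relationship_type(name: str) -> str:
    """Return a relationship type that is safe to embed in a Cypher query."""
    if not name:
        raise ValueError("relationship type must not be empty")
    if _PLAIN_IDENTIFIER.fullmatch(name):
        return name
    # Inside a backtick-quoted identifier, a literal backtick is doubled.
    return "`" + name.replace("`", "``") + "`"


print(escape_relationship_type("HasSubject"))  # HasSubject
print(escape_relationship_type("Has author"))  # `Has author`
```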

When a Subject is deleted but still has incoming relations from other Subjects, its outgoing relationships and HasSubject relationship are removed, but the node itself is kept so that the incoming references remain valid.
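
The deletion rule can be modeled with a small in-memory sketch. The `Graph` class and its fields are invented for illustration and do not mirror NeoWiki's implementation. The sketch also assumes that a Subject node with no incoming relations is removed entirely, which the text above implies but does not state. The IDs are shortened placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    subjects: set[str] = field(default_factory=set)
    # (page id, subject id) pairs standing in for HasSubject relationships.
    has_subject: set[tuple[int, str]] = field(default_factory=set)
    # (source subject, relation type, target subject) triples.
    relations: set[tuple[str, str, str]] = field(default_factory=set)

    def delete_subject(self, subject_id: str) -> None:
        # Outgoing relationships and the HasSubject link are removed in all cases.
        self.relations = {r for r in self.relations if r[0] != subject_id}
        self.has_subject = {h for h in self.has_subject if h[1] != subject_id}
        # The node is kept while other Subjects still reference it.
        has_incoming = any(target == subject_id for _, _, target in self.relations)
        if not has_incoming:
            # Assumption: without incoming relations the node itself is removed.
            self.subjects.discard(subject_id)


graph = Graph(
    subjects={"s-book", "s-author"},
    has_subject={(1, "s-book"), (2, "s-author")},
    relations={("s-book", "Has author", "s-author")},
)
graph.delete_subject("s-author")
print("s-author" in graph.subjects)  # True
print(graph.has_subject)             # {(1, 's-book')}
```

In this example the `s-author` node survives because `s-book` still points at it through `Has author`, while its page link is gone.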

Constraints

Two uniqueness constraints are created on initialization:

  • Page.id is unique
  • Subject.id is unique
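
For orientation, constraints of this kind could be expressed with Cypher statements like the ones generated below. This is a sketch only: the constraint names and the helper `uniqueness_constraint` are hypothetical, and the `FOR ... REQUIRE` syntax assumes a recent Neo4j version (4.4 or later). The statements NeoWiki actually runs may differ.

```python
def uniqueness_constraint(label: str, prop: str) -> str:
    """Build a Cypher statement that makes `prop` unique for nodes with `label`."""
    name = f"{label.lower()}_{prop}_unique"  # hypothetical naming scheme
    return (
        f"CREATE CONSTRAINT {name} IF NOT EXISTS "
        f"FOR (n:{label}) REQUIRE n.{prop} IS UNIQUE"
    )


for label in ("Page", "Subject"):
    print(uniqueness_constraint(label, "id"))
```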