About the project

Method, data, and how to cite

WORD-DRIFT is an open research resource. This page covers the ontology design, the source datasets, license terms, and citation information.

Ontology

Five-module RDF/OWL design

The schema uses the namespace drift: = https://w3id.org/word-drift/ontology# and is split across five Turtle modules that are loaded together by validate.py.
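As a small illustration of how the drift: prefix resolves, the sketch below expands a prefixed name into a full IRI using the namespace given above. The helper name expand_curie and the PREFIXES table are assumptions made for this example and are not taken from the repository.

```python
# Namespace IRI as stated in the text; everything else here is illustrative.
DRIFT_NS = "https://w3id.org/word-drift/ontology#"

# Hypothetical prefix table; a real loader would read prefixes from the Turtle files.
PREFIXES = {"drift": DRIFT_NS}


def expand_curie(curie: str, prefixes: dict[str, str] = PREFIXES) -> str:
    """Expand a prefixed name such as 'drift:Word' into a full IRI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {curie!r}")
    return prefixes[prefix] + local


print(expand_curie("drift:DriftEvent"))
```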

01 Lexical

drift:Word and drift:Sense, aligned to OntoLex-Lemon. Words carry drift:writtenForm and drift:language; senses carry glosses and connotation.

02 Sense over time

Attestation intervals (OWL-Time), connotation labels (Positive / Neutral / Negative), and relative frequency observations that drive the sparkline in the timeline view.

03 Drift event

Reified change event (drift:DriftEvent) with typed edges drift:senseFrom / drift:senseTo, a SKOS drift-type taxonomy, year / interval, and confidence (0.0–1.0).
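To make the shape of a reified drift event concrete, here is a minimal in-memory sketch that mirrors the fields named above and rejects a confidence outside 0.0–1.0. The class layout, the field names, the example IRIs, and the "pejoration" label are assumptions for illustration; the authoritative definition is the Turtle module itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DriftEvent:
    """Illustrative stand-in for a drift:DriftEvent node."""

    sense_from: str   # IRI of the earlier sense (drift:senseFrom)
    sense_to: str     # IRI of the later sense (drift:senseTo)
    drift_type: str   # label from the drift-type taxonomy (hypothetical value below)
    year: int
    confidence: float

    def __post_init__(self) -> None:
        # The text states that confidence ranges from 0.0 to 1.0.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence must be in [0.0, 1.0], got {self.confidence}")


# Hypothetical instance; the IRIs and values are placeholders, not curated data.
event = DriftEvent(
    sense_from="https://example.org/sense/a",
    sense_to="https://example.org/sense/b",
    drift_type="pejoration",
    year=2020,
    confidence=0.8,
)
print(event)
```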

04 Causation

A drift:TriggerEvent (dateable, Wikidata-linkable via owl:sameAs) is connected to a drift event through a reified drift:CausalHypothesis (drift:aboutDrift + drift:proposedTrigger, under prov:wasInfluencedBy). Each hypothesis carries a confidence and typed evidence on a five-rung ladder: Speculative < FrequencyCorrelation < ChangeSignalAlignment < LexicographicNote < ScholarlyAttestation.
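The evidence ladder is an ordered scale, so one way to model it in code is an integer-backed enum whose comparison operators follow the rung order. The sketch below does that and ranks competing hypotheses by strongest evidence first, then by confidence. The ranking rule, the class names, and the example identifiers are assumptions for illustration, not behaviour documented by the project.

```python
from dataclasses import dataclass
from enum import IntEnum


class EvidenceType(IntEnum):
    """Five rungs in the order given in the text, weakest to strongest."""

    SPECULATIVE = 1
    FREQUENCY_CORRELATION = 2
    CHANGE_SIGNAL_ALIGNMENT = 3
    LEXICOGRAPHIC_NOTE = 4
    SCHOLARLY_ATTESTATION = 5


@dataclass(frozen=True)
class CausalHypothesis:
    about_drift: str        # the drift event (drift:aboutDrift)
    proposed_trigger: str   # the trigger event (drift:proposedTrigger)
    confidence: float
    evidence: tuple[EvidenceType, ...]  # assumed to hold at least one item

    def strongest_evidence(self) -> EvidenceType:
        return max(self.evidence)


def rank(hypotheses: list[CausalHypothesis]) -> list[CausalHypothesis]:
    """Assumed ordering: highest evidence rung first, confidence as tie-breaker."""
    return sorted(
        hypotheses,
        key=lambda h: (h.strongest_evidence(), h.confidence),
        reverse=True,
    )


# Placeholder identifiers, not curated data.
h1 = CausalHypothesis("drift-1", "trigger-a", 0.9, (EvidenceType.SPECULATIVE,))
h2 = CausalHypothesis(
    "drift-1",
    "trigger-b",
    0.6,
    (EvidenceType.FREQUENCY_CORRELATION, EvidenceType.LEXICOGRAPHIC_NOTE),
)
ranked = rank([h1, h2])
print([h.proposed_trigger for h in ranked])
```

Under this assumed rule a well-evidenced hypothesis outranks a confident but speculative one, which matches the intent of a ladder that separates evidence quality from confidence.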

05 Provenance

PROV-O-based. Every drift claim requires at least one drift:Source with a URL. This is enforced structurally by the SHACL shapes, not just by convention.

Data sources

What the KG is built on

WORD-DRIFT layers a curated causal annotation over existing, publicly available lexical change datasets. The backbone datasets are loaded via Trails RML into the ETL pipeline (etl/rml/).

DWUG DE + EN

Diachronic Word Usage Graphs (Schlechtweg et al. 2020 & 2021). Usage-graph clusters become drift:Sense nodes; cluster transitions become candidate drift:DriftEvents. License: CC-BY 4.0.
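The step from cluster transitions to candidate drift events can be sketched as a comparison of per-cluster usage counts across two time periods. The threshold rule below (a cluster counts as attested once it has at least min_uses usages) is an assumption chosen for illustration; the project's actual selection criteria are not described here.

```python
def candidate_transitions(
    counts_t1: dict[int, int],
    counts_t2: dict[int, int],
    min_uses: int = 1,
) -> tuple[list[int], list[int]]:
    """Return (gained, lost) cluster ids between two periods.

    A cluster is treated as attested in a period when it has at least
    `min_uses` usages there (an assumed rule, not the project's).
    """
    clusters = set(counts_t1) | set(counts_t2)

    def attested(counts: dict[int, int], cluster: int) -> bool:
        return counts.get(cluster, 0) >= min_uses

    gained = sorted(
        c for c in clusters if not attested(counts_t1, c) and attested(counts_t2, c)
    )
    lost = sorted(
        c for c in clusters if attested(counts_t1, c) and not attested(counts_t2, c)
    )
    return gained, lost


# Made-up counts: cluster id -> number of usages in each period.
period_1 = {0: 40, 1: 0}
period_2 = {0: 35, 1: 12, 2: 3}
gained, lost = candidate_transitions(period_1, period_2, min_uses=5)
print(gained, lost)
```

Each gained or lost cluster would then be a candidate for a drift:DriftEvent, pending curation.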

SemEval-2020 Task 1

Binary and graded change labels for target word sets in DE and EN. Provides the gold-standard binary change label that anchors the curated sense pairs. License: CC-BY 4.0.

DWDS (BBAW)

Digitales Wörterbuch der deutschen Sprache. Word-frequency curves (Wortverlaufskurven) drive the drift:FrequencyObservation nodes for German words. License: CC-BY-NC 3.0 DE.

Wikidata

Real-world trigger events are linked via owl:sameAs to Wikidata entities (e.g. Q115500066 for the Querdenken-711 movement). Dates, geolocation, and related entities federate in by SPARQL. License: CC0 1.0.
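A federated lookup of this kind can be sketched as a query string built around a SPARQL SERVICE clause. The example below only constructs the query text and does not send it. Which Wikidata properties the project actually reads is not stated, so the use of P571 (inception) and P625 (coordinate location) is an assumption, as is the function name.

```python
import re

# Public Wikidata SPARQL endpoint.
WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"


def trigger_query(qid: str) -> str:
    """Build a federated query for one Wikidata entity (illustrative only)."""
    if not re.fullmatch(r"Q[1-9]\d*", qid):
        raise ValueError(f"not a Wikidata item id: {qid!r}")
    return f"""
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?date ?coord WHERE {{
  SERVICE <{WIKIDATA_ENDPOINT}> {{
    OPTIONAL {{ wd:{qid} wdt:P571 ?date . }}
    OPTIONAL {{ wd:{qid} wdt:P625 ?coord . }}
  }}
}}
""".strip()


# QID taken from the example in the text.
query = trigger_query("Q115500066")
print(query)
```

Validating the identifier before interpolation keeps malformed input out of the query text.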

Oxford English Dictionary (OED)

Used as a citable source for English etymology and first-attestation dates in the curated showcase set. OED is a licensed resource; only bibliographic references are included in the KG, not content.

Wikipedia

Used as a citable source for trigger event descriptions (e.g. the Querdenken movement article). License: CC-BY-SA 4.0.

Technical stack

RDF all the way down

The project uses a pure Semantic Web stack. No proprietary graph database is required for validation or the showcase queries; a qlever instance adds SPARQL federation with Wikidata for production use.

RDF / OWL / Turtle

All data in Turtle, loaded via rdflib for validation. Namespaces registered at w3id.org for permanence.

SHACL

Two shapes files enforce: every drift event has a source; every sense has a connotation; every trigger event has a date. Validated with pyshacl.
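The three constraints can be restated as plain Python checks over dictionary records. This is not SHACL and not how pyshacl works internally; it is a simplified mirror of the rules, with a hypothetical record layout (the keys type, sources, connotation, and date are assumptions).

```python
def check_node(node: dict) -> list[str]:
    """Return the violated rules for one record (empty list means it conforms)."""
    kind = node.get("type")
    problems = []
    if kind == "DriftEvent" and not node.get("sources"):
        problems.append("drift event has no source")
    if kind == "Sense" and not node.get("connotation"):
        problems.append("sense has no connotation")
    if kind == "TriggerEvent" and not node.get("date"):
        problems.append("trigger event has no date")
    return problems


# Hypothetical records, not curated data.
nodes = [
    {"type": "DriftEvent", "sources": ["https://example.org/source/1"]},
    {"type": "Sense", "connotation": "Negative"},
    {"type": "TriggerEvent"},  # missing date, so it should be reported
]
report = {i: check_node(n) for i, n in enumerate(nodes)}
print(report)
```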

SPARQL / qlever

Query layer via qlever for production federation. The four showcase queries in queries/ all validate with rdflib's built-in SPARQL engine for CI.

Trails RML

ETL pipeline (etl/rml/) maps DWUG TSV and SemEval JSON to the drift: ontology via RML rules, keeping ingest reproducible and auditable.
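For readers unfamiliar with declarative mappings, the effect of such a rule can be approximated in a few lines of plain Python: read tabular rows and emit one typed node per distinct word and cluster pair. This is a stand-in for illustration, not RML and not the project's pipeline. The column names lemma and cluster and the example.org base IRI are assumptions.

```python
import csv
import io
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DRIFT_NS = "https://w3id.org/word-drift/ontology#"
# Hypothetical base IRI for generated instance nodes.
BASE = "https://example.org/word-drift/sense/"


def rows_to_ntriples(tsv_text: str) -> list[str]:
    """Emit one N-Triples type statement per distinct (lemma, cluster) pair."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    seen = set()
    lines = []
    for row in reader:
        key = (row["lemma"], row["cluster"])
        if key in seen:
            continue  # several usages share one cluster; emit the node once
        seen.add(key)
        subject = f"{BASE}{quote(row['lemma'])}-{quote(row['cluster'])}"
        lines.append(f"<{subject}> <{RDF_TYPE}> <{DRIFT_NS}Sense> .")
    return lines


# Made-up input with the assumed column names.
sample = "lemma\tcluster\nfunk\t0\nfunk\t0\nfunk\t1\n"
triples = rows_to_ntriples(sample)
print("\n".join(triples))
```

A declarative mapping expresses the same row-to-node correspondence as data rather than code, which is what makes the ingest auditable.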

D3.js v7

The interactive visualiser is plain D3, served as static files. No build step, no framework; it runs from a local python -m http.server.

Open by design

Schema and code

The ontology modules (ontology/), SHACL shapes, SPARQL queries, Python scripts, and the D3 visualiser are released under the MIT License.

MIT — ontology / shapes / code / viz

Curated data

The hand-curated RDF instances in examples/ and future data/ releases are published under CC-BY 4.0. Source datasets retain their own licenses (see Data sources above).

CC-BY 4.0 — curated RDF instances

Citation

How to cite WORD-DRIFT

If you use the schema, data, or visualiser in your research, please cite the repository. A companion paper describing the causal ontology layer is in preparation; this entry will be updated when it is published.

BibTeX

@misc{worddrift2026,
  title  = {{WORD-DRIFT}: A Knowledge Graph for Evidenced Causal
            Hypotheses in Lexical Semantic Change},
  author = {Nennemann, Christian},
  year   = {2026},
  url    = {https://github.com/XORwell/word-drift},
  note   = {Version 0.4. Data: CC-BY 4.0; Code: MIT.}
}

Plain text

Nennemann, C. (2026). WORD-DRIFT: A Knowledge Graph for Evidenced Causal Hypotheses in Lexical Semantic Change (Version 0.4). https://github.com/XORwell/word-drift

Repository: github.com/XORwell/word-drift

Contribute

Adding words and triggers

New word entries follow the pattern in examples/querdenker.ttl and examples/funk.ttl. Every entry requires at minimum:

Run python validate.py before submitting a pull request. All examples must pass SHACL validation and the four SPARQL showcase queries.