About the project
WORD-DRIFT is an open research resource. This page covers the ontology design, the source datasets, license terms, and citation information.
Ontology
The schema uses the namespace
drift: = https://w3id.org/word-drift/ontology#
and is split across five Turtle modules that are loaded together by
validate.py.
drift:Word and drift:Sense, aligned to
OntoLex-Lemon. Words carry drift:writtenForm and
drift:language; senses carry glosses and connotation.
Attestation intervals (OWL-Time), connotation labels
(Positive / Neutral / Negative), and relative frequency observations
that drive the sparkline in the timeline view.
Reified change event (drift:DriftEvent) with typed edges
drift:senseFrom / drift:senseTo, a
SKOS drift-type taxonomy, year / interval, and confidence (0.0–1.0).
A drift:TriggerEvent (dateable, Wikidata-linkable via
owl:sameAs) is connected to a drift event through a reified
drift:CausalHypothesis (drift:aboutDrift +
drift:proposedTrigger, under prov:wasInfluencedBy).
Each hypothesis carries a confidence and typed evidence on a five-rung ladder:
Speculative < FrequencyCorrelation < ChangeSignalAlignment <
LexicographicNote < ScholarlyAttestation.
PROV-O-based. Every drift claim requires at least one
drift:Source with a URL. This is enforced structurally
by the SHACL shapes, not just by convention.
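As a minimal sketch, the evidence ladder and the hypothesis confidence described above could be handled in Python as follows. The class and field names are illustrative and not taken from the repository, and the 0.0 to 1.0 range for hypothesis confidence is assumed to match the one stated for drift events.

```python
from dataclasses import dataclass
from enum import IntEnum


class Evidence(IntEnum):
    """Evidence rungs, weakest to strongest, in the order given by the ladder."""
    SPECULATIVE = 1
    FREQUENCY_CORRELATION = 2
    CHANGE_SIGNAL_ALIGNMENT = 3
    LEXICOGRAPHIC_NOTE = 4
    SCHOLARLY_ATTESTATION = 5


@dataclass(frozen=True)
class CausalHypothesis:
    """Illustrative stand-in for a reified drift:CausalHypothesis."""
    about_drift: str       # IRI of the drift event (drift:aboutDrift)
    proposed_trigger: str  # IRI of the trigger event (drift:proposedTrigger)
    confidence: float      # assumed range: 0.0 to 1.0
    evidence: Evidence

    def __post_init__(self):
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence out of range: {self.confidence}")


def strongest(hypotheses):
    """Pick the hypothesis on the highest rung; ties are broken by confidence."""
    return max(hypotheses, key=lambda h: (h.evidence, h.confidence))
```

Because IntEnum members compare as integers, a check such as Evidence.LEXICOGRAPHIC_NOTE > Evidence.SPECULATIVE follows the ladder directly. Ranking by rung first and confidence second is a design choice of this sketch, not something the ontology prescribes.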
Data sources
WORD-DRIFT layers a curated causal annotation over existing, publicly
available lexical change datasets. The backbone datasets are loaded
via Trails RML into the ETL pipeline (etl/rml/).
Diachronic Word Usage Graphs (Schlechtweg et al. 2020 & 2021).
Usage-graph clusters become drift:Sense nodes; cluster
transitions become candidate drift:DriftEvents.
License: CC-BY 4.0.
Binary and graded change labels for target word sets in DE and EN. The binary labels serve as the gold standard that anchors the curated sense pairs. License: CC-BY 4.0.
Digitales Wörterbuch der deutschen Sprache. Word-frequency curves
(Wortverlaufskurven) drive the drift:FrequencyObservation
nodes for German words. License: CC-BY-NC 3.0 DE.
Real-world trigger events are linked via owl:sameAs
to Wikidata entities (e.g. Q115500066 for the Querdenken-711 movement).
Dates, geolocation, and related entities are pulled in through federated SPARQL queries.
License: CC0 1.0.
Used as a citable source for English etymology and first-attestation dates in the curated showcase set. OED is a licensed resource; only bibliographic references are included in the KG, not content.
Used as a citable source for trigger event descriptions (e.g. the Querdenken movement article). License: CC-BY-SA 4.0.
Technical stack
The project uses a pure Semantic Web stack. Validation and the showcase queries need no proprietary graph database; a QLever instance adds SPARQL federation with Wikidata for production use.
All data in Turtle, loaded via rdflib for validation. Namespaces registered at w3id.org for permanence.
Two shapes files enforce: every drift event has a source; every sense has a connotation; every trigger event has a date. Validated with pyshacl.
Query layer via QLever for production federation.
The four showcase queries in queries/ all run on
rdflib's built-in SPARQL engine in CI.
ETL pipeline (etl/rml/) maps DWUG TSV and SemEval
JSON to the drift: ontology via RML rules, keeping ingest
reproducible and auditable.
The interactive visualiser is plain D3, served as static
files. No build step, no framework. Works from a local
python -m http.server.
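The three structural rules listed for the shapes files (a source on every drift event, a connotation on every sense, a date on every trigger event) can be illustrated in plain Python. This is only a sketch of the logic, not the project's SHACL shapes or the pyshacl API. The predicate names drift:hasSource, drift:connotation, and drift:date are hypothetical, since this page names the classes but not these properties.

```python
RDF_TYPE = "rdf:type"

# Hypothetical predicates: the real names live in the ontology modules.
REQUIRED = {
    "drift:DriftEvent": "drift:hasSource",
    "drift:Sense": "drift:connotation",
    "drift:TriggerEvent": "drift:date",
}


def missing_properties(triples):
    """Return sorted (subject, class, missing predicate) tuples for violated rules."""
    triples = set(triples)

    # Index which predicates each subject uses at least once.
    predicates_by_subject = {}
    for subject, predicate, _ in triples:
        predicates_by_subject.setdefault(subject, set()).add(predicate)

    violations = []
    for subject, predicate, obj in triples:
        if predicate == RDF_TYPE and obj in REQUIRED:
            needed = REQUIRED[obj]
            if needed not in predicates_by_subject[subject]:
                violations.append((subject, obj, needed))
    return sorted(violations)
```

An empty result means every typed node carries its required property. In SHACL the same idea is usually expressed as a minimum-count constraint on a property shape.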
License
The ontology modules (ontology/), SHACL shapes,
SPARQL queries, Python scripts, and the D3 visualiser are released
under the MIT License.
The hand-curated RDF instances in examples/ and
future data/ releases are published under
CC-BY 4.0. Source datasets retain their
own licenses (see Data sources above).
Citation
If you use the schema, data, or visualiser in your research, please cite the repository. A companion paper describing the causal ontology layer is in preparation; this entry will be updated when it is published.
@misc{worddrift2026,
title = {{WORD-DRIFT}: A Knowledge Graph for Evidenced Causal
Hypotheses in Lexical Semantic Change},
author = {Nennemann, Christian},
year = {2026},
url = {https://github.com/XORwell/word-drift},
note = {Version 0.4. Data: CC-BY 4.0; Code: MIT.}
}
Repository: github.com/XORwell/word-drift
Contribute
New word entries follow the pattern in examples/querdenker.ttl
and examples/funk.ttl. Every entry requires at minimum:
drift:Word with drift:writtenForm and drift:language
drift:Sense nodes with gloss (EN), connotation, and drift:firstAttested
drift:DriftEvent connecting the senses with a typed drift and at least one drift:Source
drift:TriggerEvent with owl:sameAs to Wikidata and a drift:confidence
Run python validate.py before submitting a pull request.
All examples must pass SHACL validation and the four SPARQL showcase queries.
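As a starting point for a new entry, the checklist above could be turned into a Turtle skeleton with a small helper like the one below. This is a sketch under several assumptions: the ex: namespace, the node naming scheme, and the function names are placeholders, and writing drift:language as a plain literal is a guess. Only properties named on this page are emitted; the rest are left as TODO comments, so the output will not pass validation until it is completed from the patterns in examples/querdenker.ttl.

```python
import re

WIKIDATA_ENTITY = "http://www.wikidata.org/entity/"


def turtle_string(text):
    """Quote a short literal for Turtle, escaping backslashes and double quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def entry_skeleton(slug, written_form, language, trigger_qid, trigger_confidence):
    """Return an incomplete Turtle skeleton for a new word entry."""
    if not re.fullmatch(r"Q[1-9][0-9]*", trigger_qid):
        raise ValueError(f"not a Wikidata item id: {trigger_qid!r}")
    if not 0.0 <= trigger_confidence <= 1.0:
        raise ValueError(f"confidence out of range: {trigger_confidence}")
    return "\n".join([
        "@prefix drift: <https://w3id.org/word-drift/ontology#> .",
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        "@prefix ex: <https://example.org/word-drift/> .  # placeholder namespace",
        "",
        f"ex:{slug} a drift:Word ;",
        f"    drift:writtenForm {turtle_string(written_form)} ;",
        f"    drift:language {turtle_string(language)} .  # literal form is assumed",
        "",
        "# TODO per sense: gloss (EN), connotation, drift:firstAttested",
        f"ex:{slug}-sense-old a drift:Sense .",
        f"ex:{slug}-sense-new a drift:Sense .",
        "",
        "# TODO: drift type and at least one drift:Source",
        f"ex:{slug}-drift a drift:DriftEvent ;",
        f"    drift:senseFrom ex:{slug}-sense-old ;",
        f"    drift:senseTo ex:{slug}-sense-new .",
        "",
        f"ex:{slug}-trigger a drift:TriggerEvent ;",
        f"    owl:sameAs <{WIKIDATA_ENTITY}{trigger_qid}> ;",
        f"    drift:confidence {trigger_confidence} .",
    ]) + "\n"
```

For example, entry_skeleton("querdenker", "Querdenker", "de", "Q115500066", 0.8) yields a file that links the trigger to the Wikidata entity mentioned under Data sources.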