EventKG Data Model
- The EventKG data model is a multilingual, event-centric temporal knowledge graph that offers structured representations of events and their temporal relations.
- It extends the Simple Event Model (SEM) with a robust ontology, integrating data from sources like DBpedia, Wikidata, YAGO, and curated Wikipedia event lists.
- The model supports detailed temporal provenance, entity alignment, and fusion strategies to enhance semantic analysis of both historical and contemporary events.
EventKG is a multilingual, event-centric temporal knowledge graph designed to facilitate semantic analysis of both contemporary and historical events on the Web. It provides canonical, structured representations of events, their participating entities, and temporal relations (including fine-grained validity intervals), drawing on DBpedia, Wikidata, YAGO, and curated Wikipedia event lists. The data model is grounded in a formal temporal knowledge graph abstraction, provides an expressive OWL/RDF schema, and supports entity/event alignment, provenance, and popularity metrics. EventKG is published as RDF (Turtle and RDF/XML), integrating over 690,000 events and 2.3 million temporal relations with extensive provenance and mapping strategies (Gottschalk et al., 2019; Gottschalk et al., 2018).
1. Formal Definition of the Temporal Knowledge Graph
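EventKG is formally defined as a temporal knowledge graph $TKG = (E, R)$, where:
- $E = E_{\mathrm{ent}} \cup E_{\mathrm{ev}}$ is the node set, partitioned into entities $E_{\mathrm{ent}}$ (e.g., people, organizations, places) and events $E_{\mathrm{ev}}$.
- Each entity or event $e \in E$ has a unique identifier $e_{\mathrm{uri}}$ and an associated time interval $e_{\mathrm{time}} = [e_{\mathrm{start}}, e_{\mathrm{end}}]$, which encodes the existence interval for entities or the happening time for events.
- $R$ is the set of temporal relations, each of the form $r = (e_i, e_j, [r_{\mathrm{start}}, r_{\mathrm{end}}])$, where $r$ links $e_i$ and $e_j$ and is valid during $[r_{\mathrm{start}}, r_{\mathrm{end}}]$.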
This model supports direct encoding of temporal provenance and event-centric semantics that are lacking in traditional, purely entity-centric knowledge graphs.
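The definition above can be made concrete with a small in-memory sketch. The class and field names below (`Node`, `TemporalRelation`, `valid_on`) and the `ex:` identifiers are illustrative assumptions, not part of EventKG, and treating a missing bound as open-ended is a choice of this sketch rather than a rule from the source.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Node:
    """An entity or event with an optional existence/happening interval."""
    uri: str
    kind: str                      # "entity" or "event"
    start: Optional[date] = None   # None means the bound is unknown
    end: Optional[date] = None


@dataclass(frozen=True)
class TemporalRelation:
    """A relation between two nodes that is valid during [start, end]."""
    source: Node
    target: Node
    role: str
    start: Optional[date] = None
    end: Optional[date] = None

    def valid_on(self, day: date) -> bool:
        # Assumption of this sketch: an unknown bound is open-ended.
        after_start = self.start is None or self.start <= day
        before_end = self.end is None or day <= self.end
        return after_start and before_end


# Hypothetical identifiers; the date is taken from the article's example event.
person = Node("ex:Barack_Obama", "entity")
inauguration = Node("ex:Second_inauguration", "event",
                    start=date(2013, 1, 20), end=date(2013, 1, 20))
relations = [
    TemporalRelation(person, inauguration, "participant",
                     start=date(2013, 1, 20), end=date(2013, 1, 20)),
]

# Select the relations that hold on a given day.
valid = [r for r in relations if r.valid_on(date(2013, 1, 20))]
```

Keeping the interval on the relation itself, rather than only on its endpoints, is what lets a query ask which relations hold at a given time.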
2. Ontology Schema: Core Classes, Hierarchy, and Extensions
EventKG extends the Simple Event Model (SEM) to address schema limitations and supports arbitrary event/entity–entity relations. The ontological hierarchy (where ⊑ denotes rdfs:subClassOf) is:
- sem:Core: Superclass for all temporal entities (both events and non-events).
- sem:Event ⊑ sem:Core: Real-world happenings (e.g., elections, conflicts, tournaments).
- sem:Actor ⊑ sem:Core: Event participants (persons, organizations).
- sem:Place ⊑ sem:Core: Spatial locations.
- eventKG-s:Relation ⊑ sem:Core: Encodes binary relations (event–entity, entity–entity, event–event) with role type and temporal validity.
- sem:RoleType: Specifies the predicate type implemented by a Relation instance.
Other notable properties include:
- so:hasSubEvent: Links an event to a subevent.
- dbo:previousEvent / dbo:nextEvent: Orders events in a series.
- so:containedInPlace: Constructs place hierarchies (e.g., city containedInPlace country).
3. Key Properties, Predicates, and Modeling Patterns
The EventKG data model employs an expressive predicate set with defined domains/ranges and cardinality rules. Principal properties include:
- rdfs:label: Multilingual labels for sem:Core entities (domain: sem:Core; range: xsd:string, per language).
- dcterms:description: Human-readable descriptions (domain: sem:Event or sem:Core; range: rdfs:Literal).
- dcterms:alternative: Alternative names/aliases.
- sem:hasBeginTimeStamp, sem:hasEndTimeStamp: Time interval boundaries for entities/events/relations (domain: sem:Core; range: xsd:date or xsd:dateTime).
- sem:hasPlace: Event location linking (domain: sem:Event; range: sem:Place).
- sem:roleType: Encodes the semantic type of an eventKG-s:Relation.
- eventKG-s:links: Wikipedia interlink count (domain: eventKG-s:Relation; range: xsd:nonNegativeInteger).
- eventKG-s:mentions: Wikipedia sentence-level co-occurrence count.
- eventKG-s:extractedFrom: Named graph or dataset provenance of the resource.
- owl:sameAs: Entity alignment across sources.
Each statement resides in a named graph, ensuring granular provenance (e.g., facts from Wikidata vs. facts fused by EventKG integration).
Cardinality constraints:
- At most one begin/end timestamp per sem:Core entity.
- At most one place per event after fusion.
- eventKG-s:extractedFrom is functional (one source per resource).
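The cardinality constraints listed above could be checked over a set of triples roughly as follows. This is a minimal sketch: the function name `cardinality_violations`, the tuple representation of triples, and the `ex:` identifiers are assumptions made for illustration, and the check on sem:hasPlace only makes sense for the fused graph.

```python
from collections import defaultdict

# Properties treated as "at most one value per subject" in this sketch.
FUNCTIONAL = {
    "sem:hasBeginTimeStamp",
    "sem:hasEndTimeStamp",
    "sem:hasPlace",              # only after fusion
    "eventKG-s:extractedFrom",
}


def cardinality_violations(triples):
    """Return {(subject, predicate): values} for every functional
    predicate that has more than one distinct value on a subject."""
    seen = defaultdict(set)
    for s, p, o in triples:
        if p in FUNCTIONAL:
            seen[(s, p)].add(o)
    return {key: sorted(vals) for key, vals in seen.items() if len(vals) > 1}


# Hypothetical sample data with one deliberate violation.
triples = [
    ("ex:event1", "sem:hasBeginTimeStamp", "2013-01-20"),
    ("ex:event1", "sem:hasBeginTimeStamp", "2013-01-21"),
    ("ex:event1", "sem:hasPlace", "ex:Washington_DC"),
]
violations = cardinality_violations(triples)
```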
4. Data Model Representation: RDF, Namespaces, URIs, and Triple Patterns
EventKG is disseminated in RDF, with a preference for Turtle syntax and explicit namespace management:
| Prefix | URI Namespace | Usage |
|---|---|---|
| sem: | http://semanticweb.cs.vu.nl/2009/11/sem/ | SEM classes |
| so: | http://schema.org/ | Sub-event, place |
| dbo: | http://dbpedia.org/ontology/ | Event series |
| rdfs:, dcterms: | http://www.w3.org/2000/01/rdf-schema#, http://purl.org/dc/terms/ | Labels, descriptions |
| eventKG-s: | http://eventkg.l3s.uni-hannover.de/schema/ | Relation schema |
| eventKG-r: | http://eventkg.l3s.uni-hannover.de/resource/ | Fused resources |
| eventKG-g: | http://eventkg.l3s.uni-hannover.de/graph/ | Named graphs |
Triple patterns rely on relation reification: every binary relation that carries temporal or role qualifiers is modeled as an instance of eventKG-s:Relation, which holds the subject, object, validity interval, role type, and link/mention counts. Provenance is attached via named graphs rather than triple-level reification.
Example (Turtle):
```turtle
@prefix sem:       <http://semanticweb.cs.vu.nl/2009/11/sem/> .
@prefix rdfs:      <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:       <http://www.w3.org/2002/07/owl#> .
@prefix xsd:       <http://www.w3.org/2001/XMLSchema#> .
@prefix eventKG-s: <http://eventkg.l3s.uni-hannover.de/schema/> .
@prefix eventKG-r: <http://eventkg.l3s.uni-hannover.de/resource/> .

eventKG-r:Event_ObamaInaug2013 a sem:Event ;
    rdfs:label "Second inauguration of Barack Obama"@en ;
    sem:hasBeginTimeStamp "2013-01-20T12:00:00"^^xsd:dateTime ;
    sem:hasPlace eventKG-r:Place_Washington_DC ;
    eventKG-s:extractedFrom <http://eventkg.l3s.uni-hannover.de/graph/Wikidata> ;
    owl:sameAs <http://www.wikidata.org/entity/Q181781> .

eventKG-r:Rel_ObamaInaug2013 a eventKG-s:Relation ;
    eventKG-s:subject <http://dbpedia.org/resource/Barack_Obama> ;
    eventKG-s:object eventKG-r:Event_ObamaInaug2013 ;
    sem:hasBeginTimeStamp "2013-01-20T12:00:00"^^xsd:dateTime ;
    sem:roleType sem:participant ;
    eventKG-s:links 42 ;
    eventKG-s:mentions 17 ;
    eventKG-s:extractedFrom <http://eventkg.l3s.uni-hannover.de/graph/DBpedia_en> .
```
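To show how a reified relation and its named graph fit together, the following sketch emits one N-Quads line per property of a relation. The helper names (`iri`, `literal`, `relation_quads`) are hypothetical. The property names follow the example above and the sem:roleType property listed in Section 3, and the graph name `DBpedia_en` is taken from the example rather than from a verified list of EventKG graphs.

```python
SCHEMA = "http://eventkg.l3s.uni-hannover.de/schema/"
RESOURCE = "http://eventkg.l3s.uni-hannover.de/resource/"
GRAPH = "http://eventkg.l3s.uni-hannover.de/graph/"
SEM = "http://semanticweb.cs.vu.nl/2009/11/sem/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD = "http://www.w3.org/2001/XMLSchema#"


def iri(value):
    """Wrap an IRI string in angle brackets, as N-Quads requires."""
    return f"<{value}>"


def literal(value, datatype):
    """Format a typed literal using an XSD datatype name."""
    return f'"{value}"^^<{XSD}{datatype}>'


def relation_quads(rel_id, subject, obj, role, links, mentions, graph):
    """Yield N-Quads lines for one reified relation in one named graph."""
    rel = iri(RESOURCE + rel_id)
    g = iri(GRAPH + graph)
    statements = [
        (iri(RDF_TYPE), iri(SCHEMA + "Relation")),
        (iri(SCHEMA + "subject"), iri(subject)),
        (iri(SCHEMA + "object"), iri(obj)),
        (iri(SEM + "roleType"), iri(role)),
        (iri(SCHEMA + "links"), literal(links, "nonNegativeInteger")),
        (iri(SCHEMA + "mentions"), literal(mentions, "nonNegativeInteger")),
    ]
    for predicate, value in statements:
        # The fourth term places the statement in its provenance graph.
        yield f"{rel} {predicate} {value} {g} ."


quads = list(relation_quads(
    "Rel_ObamaInaug2013",
    "http://dbpedia.org/resource/Barack_Obama",
    RESOURCE + "Event_ObamaInaug2013",
    SEM + "participant",
    links=42,
    mentions=17,
    graph="DBpedia_en",
))
```

Because the graph is the fourth term of every line, provenance travels with each statement and no extra reification of individual triples is needed.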
5. Integration, Mapping, and Fusion Strategies
EventKG integrates event and entity data from five primary sources: Wikidata (all languages), DBpedia (per-language dumps), YAGO, Wikipedia event lists, and the Wikipedia Current Events Portal (WCEP).
Pipeline:
- Preprocessing: Regular expressions for date extraction, blacklists, and a predicate mapping table (e.g., Wikidata P361 "part of" → so:hasSubEvent).
- Event identification: Events are extracted by type and subclass membership in the respective graphs (e.g., wd:Q1656682 for events in Wikidata, dbo:Event in DBpedia).
- Relation extraction:
  - Relations with explicit time qualifiers are mapped to eventKG-s:Relation with their stated temporal intervals.
  - Indirect relations that lack a dedicated interval inherit one from the timestamps of their object.
  - Manually defined mappings handle sub-event, series, and location relations.
  - Interlinking counts (links/mentions) are computed over Wikipedia.
- Integration: Identical entities/events across sources are merged using owl:sameAs (mostly via Wikidata IDs). Wikipedia-list events without URIs are deduplicated via date/entity overlap; those that remain unmatched are minted as new resources, with eventKG-s:extractedFrom recording their source.
- Fusion:
  - Time fusion: Default dates are dropped when more precise alternatives exist; remaining conflicts are resolved using majority vote and a source trust ranking (Wikidata > DBpedia > Wikipedia lists > WCEP > YAGO).
  - Location fusion: The union of locations across sources is taken, and transitively contained places are pruned (via so:containedInPlace).
  - Type fusion: Types are normalized to the DBpedia ontology via mappings and owl:sameAs.
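The time and location fusion rules above can be sketched as follows. Several details are assumptions of this sketch rather than statements from the source: a January 1 date is treated as a possible year-only default, ties in the majority vote are broken by source trust, and location pruning keeps the more specific place and drops the places that contain it. The function names are hypothetical.

```python
from collections import Counter
from datetime import date

# Trust order as stated in the text; a lower index means more trusted.
TRUST = ["Wikidata", "DBpedia", "Wikipedia lists", "WCEP", "YAGO"]


def fuse_dates(candidates):
    """candidates: list of (source, date) pairs. Returns one date or None."""
    if not candidates:
        return None

    def is_default(d):
        # Assumption: January 1 is a placeholder if another source gives a
        # different day in the same year.
        return (d.month, d.day) == (1, 1) and any(
            other.year == d.year and other != d for _, other in candidates)

    kept = [(s, d) for s, d in candidates if not is_default(d)]
    votes = Counter(d for _, d in kept)
    top = max(votes.values())
    tied = {d for d, n in votes.items() if n == top}
    # Break ties with the most trusted source proposing a tied date.
    return min((TRUST.index(s), d) for s, d in kept if d in tied)[1]


def ancestors(place, contained_in):
    """All places that transitively contain `place`."""
    found, stack = set(), [place]
    while stack:
        for parent in contained_in.get(stack.pop(), ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def fuse_locations(per_source, contained_in):
    """Union the per-source place sets, then drop containing places."""
    union = set().union(*per_source)
    redundant = set()
    for place in union:
        redundant |= ancestors(place, contained_in) & union
    return union - redundant


fused_date = fuse_dates([
    ("YAGO", date(2013, 1, 1)),
    ("Wikidata", date(2013, 1, 20)),
    ("DBpedia", date(2013, 1, 21)),
])
fused_places = fuse_locations(
    [{"Paris", "France"}, {"Lyon"}],
    {"Paris": ["France"], "Lyon": ["France"], "France": ["Europe"]},
)
```

In the usage example the YAGO date is discarded as a default, the remaining two dates tie with one vote each, and the Wikidata date wins on trust; the location set loses "France" because it contains both cities.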
6. Extended Formalisms and Algorithms
Beyond the schema-centric formalism, EventKG adopts:
- Rule-based fusion procedures for date, location, and type consolidation—ensuring consistency and precision across integrated sources.
- Distant supervision model for biographical timeline generation: Uses SVMs and feature engineering (relation property identifier, Wikipedia link/mention metrics, and temporal distance) to rank and select relations most biographically relevant for an entity. The output is an ordered relation list $TL(e) = (r_1, \ldots, r_k)$, where $r_1, \ldots, r_k$ are chronologically ordered, biography-relevant relations for the entity $e$.
These algorithms are not provided as pseudocode in the source, but their implementation protocols and feature sets are described explicitly.
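As a rough illustration of the select-then-order step in timeline generation, the sketch below scores candidate relations with a hand-set linear function and then sorts the selected ones chronologically. The linear function is a stand-in for a trained SVM, not the model from the source, and all weights, property names, and sample values are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Candidate:
    property_id: str      # relation property identifier
    links: int            # Wikipedia link count
    mentions: int         # Wikipedia mention count
    distance_days: int    # temporal distance feature
    when: date            # time of the relation


# Hypothetical weights standing in for a trained model.
PROPERTY_WEIGHT = {"birth": 1.0, "inauguration": 0.5, "award": 0.1}
W_LINKS, W_MENTIONS, W_DISTANCE = 0.01, 0.02, -0.001


def score(c: Candidate) -> float:
    """Linear relevance score; higher means more biography-relevant."""
    return (PROPERTY_WEIGHT.get(c.property_id, 0.0)
            + W_LINKS * c.links
            + W_MENTIONS * c.mentions
            + W_DISTANCE * c.distance_days)


def timeline(candidates, k):
    """Keep the k highest-scoring relations, then order them by time."""
    selected = sorted(candidates, key=score, reverse=True)[:k]
    return sorted(selected, key=lambda c: c.when)


# Hypothetical sample candidates for a single entity.
candidates = [
    Candidate("inauguration", links=42, mentions=17, distance_days=0,
              when=date(2013, 1, 20)),
    Candidate("award", links=2, mentions=1, distance_days=100,
              when=date(2009, 1, 1)),
    Candidate("birth", links=10, mentions=5, distance_days=0,
              when=date(1961, 1, 1)),
]
result = [c.property_id for c in timeline(candidates, k=2)]
```

Separating scoring from ordering mirrors the description above: relevance decides which relations enter the timeline, and time decides where they appear in it.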
7. Provenance, Multilinguality, and URIs
EventKG uses named graphs for statement-level provenance, distinguishing facts extracted from individual sources (e.g., GRAPH eventKG-g:wikidata) and fused/integrated knowledge (GRAPH eventKG-g:event_kg). Multilingual support is achieved via rdfs:label, dcterms:alternative, and dcterms:description—each with @lang tags. URIs for resources are minted via eventKG-r: for integrated facts, with owl:sameAs used to maintain cross-source identity.
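A minimal sketch of how named-graph provenance and language-tagged labels might be consumed is shown below. Quads are modeled as plain tuples, a language-tagged literal as a `(text, lang)` pair, and the sample data, the `eventKG-g:dbpedia_en` graph name, and the helper names are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical quads: (subject, predicate, object, graph).
quads = [
    ("ex:event1", "rdfs:label", ("Example event", "en"), "eventKG-g:wikidata"),
    ("ex:event1", "rdfs:label", ("Beispielereignis", "de"), "eventKG-g:wikidata"),
    ("ex:event1", "sem:hasBeginTimeStamp", "2013-01-21", "eventKG-g:dbpedia_en"),
    ("ex:event1", "sem:hasBeginTimeStamp", "2013-01-20", "eventKG-g:event_kg"),
]


def by_graph(quads):
    """Group triples by the named graph they were asserted in."""
    index = defaultdict(list)
    for s, p, o, g in quads:
        index[g].append((s, p, o))
    return index


def label(quads, subject, preferred=("en",)):
    """Return the first available label following a language preference."""
    labels = {o[1]: o[0] for s, p, o, _ in quads
              if s == subject and p == "rdfs:label"}
    for lang in preferred:
        if lang in labels:
            return labels[lang]
    return None


# Fused facts live in their own graph, separate from source-specific ones.
fused = by_graph(quads)["eventKG-g:event_kg"]
german_first = label(quads, "ex:event1", preferred=("de", "en"))
```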
EventKG establishes a comprehensive, extensible event ontology and a robust data representation formalism integrating temporal and semantic dimensions across large event and entity datasets. Its pipeline, schema, and mapping strategies exemplify the state of practice for event-centric temporal knowledge graph construction in the semantic web community (Gottschalk et al., 2019, Gottschalk et al., 2018, Guan et al., 2021).