
User:Reversedragon/Embedding RDF in wiki pages (SeaTurtle proposal)

From Philosophical Research

During the early development of this project, we attempted to use Wikibase, but quickly realized that it had some serious problems when getting started from a fresh MediaWiki install. One of the biggest was that it was not trivial to transclude Wikibase Items into regular wiki pages through templates: this requires the ParserFunctions extension, which we were not able to install for various reasons. Another problem was that it was not easy to list Wikibase Items using normal MediaWiki Categories, or to customize category listings to display basic Item metadata. This would seem an obvious requirement for new users exploring a given knowledge base who do not know much about MediaWiki or how Wikibase works. Nested categories are a great way to get a feel for the overall topics a knowledge base covers, and MediaWiki's built-in Category mechanism is simple for new users to learn to edit.

This slowly led us toward the development of a new MediaWiki extension, tentatively named SeaTurtle. The extension is currently only in the planning phase; this page collects notes intended to help with coding it.

The SeaTurtle concept has since been more or less abandoned in favor of simply laying out tentative Entities on Ontology pages using templates.

Wikibase representation

Wikibase already has an established RDF representation of its data model, used for exporting triples and creating large data dumps, as well as an official OWL ontology for some of its prefixes and data types. (Warning: the OWL file may start a download.) This mapping could be a useful starting point for representing and manipulating Items inside text pages.

Although Wikibase does have a JSON serialization format, it quickly becomes unwieldy for things such as Entity labels, where every label is a separate object nested under its language code. Ideally, if we are to store Entities in pages as text, the representation should follow a design philosophy similar to wikitext's, so that edits to an Item make sense in a page diff view, and so forth.
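To show what a line-oriented alternative might look like, here is a minimal Python sketch that converts the term maps of a Wikibase-style JSON entity into one complete Turtle statement per label, description, and alias. It uses the predicates from the P15 example below (skos:prefLabel, schema:description, skos:altLabel). The sample entity and the function names are hypothetical. Only the nesting of "labels", "descriptions", and "aliases" follows Wikibase's JSON format.

```python
import json

# Hypothetical sample entity. Only the nesting of "labels", "descriptions"
# and "aliases" follows Wikibase's JSON format; the values are illustrative.
ENTITY_JSON = """
{
  "id": "P15",
  "type": "property",
  "labels": {"en": {"language": "en", "value": "work depicts or contains"}},
  "descriptions": {"en": {"language": "en", "value": "a Property"}},
  "aliases": {"en": [{"language": "en", "value": "depicts"},
                     {"language": "en", "value": "illustrates"}]}
}
"""


def turtle_literal(value: str, lang: str) -> str:
    """Quote a string as a language-tagged Turtle literal."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"@{lang}'


def terms_to_turtle(entity: dict) -> list[str]:
    """Emit one complete Turtle statement per label, description and alias."""
    subject = f"<{entity['id']}>"
    lines = []
    for lang, term in sorted(entity.get("labels", {}).items()):
        lines.append(f"{subject} skos:prefLabel {turtle_literal(term['value'], lang)} .")
    for lang, term in sorted(entity.get("descriptions", {}).items()):
        lines.append(f"{subject} schema:description {turtle_literal(term['value'], lang)} .")
    for lang, aliases in sorted(entity.get("aliases", {}).items()):
        for term in aliases:
            lines.append(f"{subject} skos:altLabel {turtle_literal(term['value'], lang)} .")
    return lines


if __name__ == "__main__":
    for line in terms_to_turtle(json.loads(ENTITY_JSON)):
        print(line)
```

Running it prints four lines, starting with <P15> skos:prefLabel "work depicts or contains"@en . Because each term sits on its own line, changing one alias changes exactly one line in a page diff.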

Storing RDF inside text pages

RDF Turtle can be embedded into an HTML page using the HTML script tag with type="text/turtle". The Turtle format is relatively easy to work with because it mostly consists of simple statements made of three consecutive terms, <Subject> <Predicate> <Object>, each ending in a period. For example, <Q1> a wikibase:Item . states that Q1 is an Item. Long URI prefixes can also be abbreviated with the @prefix directive.

<script type="text/turtle" id="P15">
# CDATA markers are kept inside Turtle comments so the block stays valid Turtle in both HTML and XHTML
# <![CDATA[
@base         <https://research.moraleconomy.au/entity/> .   # called wd: in Wikidata's dumps
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix schema: <http://schema.org/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix wikibase: <http://wikiba.se/ontology#> .
@prefix wdt:  <https://research.moraleconomy.au/prop/direct/> .

<P15>
  # "a" is pre-defined to stand for rdf:type, but its purpose is arguably hard to remember
  rdf:type wikibase:Property ;   # the entity is a Property
  wikibase:propertyType wikibase:WikibaseItem ;   # the Property takes an Item as its value

  # claims with simple values - "truthy" statements
  wdt:P30 <P14> ;   # inverse property of - appears in work  (en)

  schema:description "a Property"@en ;   # Item or Property description
  skos:altLabel      "depicts"@en , "illustrates"@en , "tropes"@en , "motifs"@en ;   # each alternate label
  skos:prefLabel     "work depicts or contains"@en .   # primary label
  # if the least-edited thing is last, it's harder to forget the final period.
# ]]>
</script>
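Since @prefix and @base are simple directives, expanding abbreviated names back into full IRIs is mostly string substitution. The following Python sketch reads the directives from a block like the one above and expands individual terms. The helper names are hypothetical. The regular expressions only cover the common forms used on this page, not the full Turtle grammar.

```python
import re

BASE_RE = re.compile(r"@base\s+<([^>]*)>\s*\.")
PREFIX_RE = re.compile(r"@prefix\s+([A-Za-z][\w-]*)?:\s*<([^>]*)>\s*\.")
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*:")
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def read_directives(turtle: str) -> tuple[str, dict[str, str]]:
    """Collect the @base IRI and the @prefix mappings of a Turtle text."""
    base_match = BASE_RE.search(turtle)
    base = base_match.group(1) if base_match else ""
    prefixes = {(m.group(1) or ""): m.group(2) for m in PREFIX_RE.finditer(turtle)}
    return base, prefixes


def expand(term: str, base: str, prefixes: dict[str, str]) -> str:
    """Expand one term: the keyword "a", a <relative IRI>, or prefix:localName."""
    if term == "a":
        return RDF_TYPE
    if term.startswith("<") and term.endswith(">"):
        iri = term[1:-1]
        # Naive: plain concatenation instead of full RFC 3986 resolution,
        # which gives the same result for a base ending in "/" and IDs like P15.
        return iri if SCHEME_RE.match(iri) else base + iri
    prefix, colon, local = term.partition(":")
    if colon and prefix in prefixes:
        return prefixes[prefix] + local
    return term  # literals and undeclared prefixes are returned unchanged
```

For example, with the directives above, expand("wdt:P30", base, prefixes) yields https://research.moraleconomy.au/prop/direct/P30.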

For a prettier display of RDF statements, and the potential to use the multilingual editor to add or change lines just as with Wikibase, we want MediaWiki to find this Turtle block and interpret its lines internally as a series of claims. This should not be especially difficult, since Turtle's grammar is small and formally specified. Once we know MediaWiki can parse the contents, a simpler syntax for marking Turtle blocks may be in order:

```ttl
@base  <https://research.moraleconomy.au/entity/> .
# ... prefixes ...

<P15>  a  wikibase:Property .
# ... characteristics or claims ...
<P15>  skos:altLabel  "depicts"@en .
<P15>  skos:altLabel  "illustrates"@en .
<P15>  skos:prefLabel  "work depicts or contains"@en .
# we could make every statement a "complete sentence" like this; the semicolon and comma are just abbreviations.
```
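If every statement is written as a complete sentence, as in the simplified syntax above, a first-pass parser can work one line at a time. The Python sketch below splits each line into subject, predicate, and object, then groups the results by subject. The function names are hypothetical. The sketch deliberately rejects the ; and , abbreviations, multi-line literals, and other full-Turtle features. An Entity block using those would need a complete Turtle parser instead.

```python
import re
from collections import defaultdict

# One token: an <IRI>, a "quoted literal" with an optional @lang tag or
# ^^datatype, or any other run of non-space characters (prefixed names, "a", ".").
TOKEN_RE = re.compile(r'<[^>]*>|"(?:[^"\\]|\\.)*"(?:@[A-Za-z-]+|\^\^\S+)?|[^\s<"]+')


def parse_sentence_line(line: str):
    """Return (subject, predicate, object) for one complete-sentence line,
    or None for blank lines, comments and @base/@prefix directives."""
    tokens = TOKEN_RE.findall(line)
    for i, token in enumerate(tokens):
        if token.startswith("#"):  # a comment runs to the end of the line
            tokens = tokens[:i]
            break
    if not tokens or tokens[0].startswith("@"):
        return None
    if len(tokens) != 4 or tokens[3] != ".":
        raise ValueError(f"expected 'subject predicate object .', got {line!r}")
    return tokens[0], tokens[1], tokens[2]


def parse_sentence_block(block: str) -> list[tuple[str, str, str]]:
    """Parse every line of a block written one statement per line."""
    triples = []
    for line in block.splitlines():
        triple = parse_sentence_line(line)
        if triple is not None:
            triples.append(triple)
    return triples


def claims_by_subject(triples):
    """Group (predicate, object) pairs under each subject, like an Item's claim list."""
    grouped = defaultdict(list)
    for subject, predicate, obj in triples:
        grouped[subject].append((predicate, obj))
    return dict(grouped)
```

Rejecting anything that is not a single statement keeps the editor-facing rule simple: one line, one claim.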

The main issue with this simplified syntax, or even with the wordier HTML syntax, is that neither signals to MediaWiki that the block is the Turtle representation of this particular wiki page rather than a random Turtle example included for illustration. For this purpose we can use MediaWiki's magic words feature, specifically a double-underscore behavior switch like __NOTOC__, and add a string that marks a Turtle block as an Entity block. In principle, MediaWiki would scan the page for __ENTITY__ and mark the page as a potential Entity if the string is found. If the string appears on a line inside a particular Turtle block, MediaWiki would parse that block as an Entity block rather than merely syntax-highlighting it.

```ttl __ENTITY__
@base  <https://research.moraleconomy.au/entity/> .  # __ENTITY__ could also go in a comment, etc
# ... prefixes ...

<P15>  a  wikibase:Property .
# ... characteristics or claims ...
<P15>  skos:prefLabel  "work depicts or contains"@en .
```
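The scanning rule described above can be sketched in a few lines. This Python version only illustrates the logic. A real extension would presumably do this in PHP from within MediaWiki's parser. The names used here (MAGIC_WORD, FENCE_RE, find_entity_block) are hypothetical, and the sketch assumes the ttl-fenced block syntax proposed above.

```python
import re

MAGIC_WORD = "__ENTITY__"
FENCE = "`" * 3  # three backticks, built indirectly so the pattern cannot close a code fence
# A fenced Turtle block: the opening fence line (which may carry extra text,
# such as the magic word), the body, and a closing fence at the start of a line.
FENCE_RE = re.compile(
    "^" + FENCE + r"ttl([^\n]*)\n(.*?)^" + FENCE, re.MULTILINE | re.DOTALL
)


def find_entity_block(wikitext: str):
    """Return the body of the Turtle block that carries __ENTITY__, or None."""
    if MAGIC_WORD not in wikitext:
        return None  # not a potential Entity page at all
    for match in FENCE_RE.finditer(wikitext):
        fence_line, body = match.group(1), match.group(2)
        if MAGIC_WORD in fence_line or MAGIC_WORD in body:
            return body  # the first marked block wins
    return None  # the magic word only appears outside Turtle blocks
```

Returning only the first marked block is one possible design choice. The extension could instead refuse to save a page that marks two blocks as Entity blocks.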

Lexemes

This is an example Lexeme entry based on the official Wikibase RDF mapping:

```ttl
@base         <https://research.moraleconomy.au/entity/> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix ontolex:  <http://www.w3.org/ns/lemon/ontolex#> .
@prefix wikibase: <http://wikiba.se/ontology#> .
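@prefix wdt:  <https://research.moraleconomy.au/prop/direct/> .
@prefix wd:   <https://research.moraleconomy.au/entity/> .   # same namespace as @base
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .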

<L404>
 rdf:type  wikibase:Lexeme ;  # the Entity is a Lexeme
 wikibase:lexicalCategory  wd:foo ;  # this project has a particular term classification system - see below
   # [S0] philosophical tradition / field
   # [S0] plurally-replicated uncountable phenomenon - PRIN-primary concept
 wikibase:lemma  "Trotskyism"@en ;  # also emitted as rdfs:label
 dct:language  wd:Q1860 ;  # English / en

 ontolex:lexicalForm <L404-F1>, <L404-F2>, <L404-F3>, <L404-F4>, <L404-F5>, <L404-F6>, <L404-F7>, <L404-F8>, <L404-F9>, <L404-F10>, <L404-F11>, <L404-F12>, <L404-F13>, <L404-F14>, <L404-F15> ;
 ontolex:sense       <L404-S1>, <L404-S2>, <L404-S3>, <L404-S4> .

<L404-S1>
 rdf:type wikibase:Sense , ontolex:LexicalSense ;
 # ... any Sense may contain wdt: statements ...
 skos:definition "sect of Leninism created between 1906 and 1925 with many splinter sects or divisions"@en .  # also emitted as rdfs:label

<L404-S2>
 rdf:type wikibase:Sense , ontolex:LexicalSense ;
 skos:definition "model of Leninism which requires a world civilization - usually international-conference Trotskyism"@en .  # also emitted as rdfs:label

<L404-S3>
 rdf:type wikibase:Sense , ontolex:LexicalSense ;
 skos:definition "model of Leninism which unites several parties across multiple countries or nationalities into a civilization - international-party Trotskyism, plural international-conference Trotskyisms, international-identity Trotskyism, etc."@en .  # also emitted as rdfs:label

<L404-S4>
 rdf:type wikibase:Sense , ontolex:LexicalSense ;
 skos:definition "any model of Leninism which meets particular criteria for being an enemy of Stalin Thought - Trotskyism in one country, Molecular Trotskyism, etc."@en .  # also emitted as rdfs:label

<L404-F1>
 rdf:type  wikibase:Form , ontolex:Form ;
 ontolex:representation "Trotskyism"@en ;  # also emitted as rdfs:label
 # ... any Form may contain wdt: statements ...
 wikibase:grammaticalFeature wd:PRIN .

<L404-F2>
 rdf:type  wikibase:Form , ontolex:Form ;
 ontolex:representation "Trotskyist"@en ;  # also emitted as rdfs:label
 wikibase:grammaticalFeature wd:PRAN .

<L404-F3>
 rdf:type  wikibase:Form , ontolex:Form ;
 ontolex:representation "realize Trotskyism"@en ;  # also emitted as rdfs:label
 wikibase:grammaticalFeature wd:PRVIP .

<L404-F4>
 rdf:type  wikibase:Form , ontolex:Form ;
 ontolex:representation "Trotskyist theorist"@en ;  # also emitted as rdfs:label
 wikibase:grammaticalFeature wd:DPRAN .

<L404-F5>
 rdf:type  wikibase:Form , ontolex:Form ;
 ontolex:representation "ours Trotskyist theories"@en ;  # also emitted as rdfs:label
 wikibase:grammaticalFeature wd:HLNP .

<L404-F6>
 rdf:type  wikibase:Form , ontolex:Form ;
 ontolex:representation "Trotskyist movement"@en ;  # also emitted as rdfs:label
 wikibase:grammaticalFeature wd:DPRNS .

<L404-F7>
 rdf:type  wikibase:Form , ontolex:Form ;
 ontolex:representation "Trotskyist movements"@en ;  # also emitted as rdfs:label
 wikibase:grammaticalFeature wd:DPRNP .

<L404-F8>
 rdf:type  wikibase:Form , ontolex:Form ;
 ontolex:representation "Leninism"@en ;  # also emitted as rdfs:label
 wikibase:grammaticalFeature wd:HPRIN .

<L404-F9>
 rdf:type  wikibase:Form , ontolex:Form ;
 ontolex:representation "Leninist"@en ;  # also emitted as rdfs:label
 wikibase:grammaticalFeature wd:HPRAN .

<L404-F10>
 rdf:type  wikibase:Form , ontolex:Form ;
 ontolex:representation "organize Trotskyists"@en ;  # also emitted as rdfs:label
 wikibase:grammaticalFeature wd:DVIP .

<L404-F11>
 rdf:type  wikibase:Form , ontolex:Form ;
 ontolex:representation "Trotskyisms"@en ;  # also emitted as rdfs:label
 wikibase:grammaticalFeature wd:BNP .

<L404-F12>
 rdf:type  wikibase:Form , ontolex:Form ;
 ontolex:representation "Trotskyist workers' states"@en ;  # also emitted as rdfs:label
 wikibase:grammaticalFeature wd:RNP .

<L404-F13>
 rdf:type  wikibase:Form , ontolex:Form ;
 ontolex:representation "regenerate Trotskyism"@en ;  # also emitted as rdfs:label
 wikibase:grammaticalFeature wd:RVIP .

<L404-F14>
 rdf:type  wikibase:Form , ontolex:Form ;
 ontolex:representation "realize Trotskyism onto"@en ;  # also emitted as rdfs:label
 wikibase:grammaticalFeature wd:RVTP .

<L404-F14>
 rdf:type  wikibase:Form , ontolex:Form ;
 ontolex:representation "transition to Trotskyism"@en ;  # also emitted as rdfs:label
 wikibase:grammaticalFeature wd:RVTP .

<L404-F15>
 rdf:type  wikibase:Form , ontolex:Form ;
 ontolex:representation "Trotsky"@en ;  # also emitted as rdfs:label
 wikibase:grammaticalFeature wd:PNI .
```

Although the Lexeme mapping is relatively to the point, it is still wordy enough to potentially scare off new editors. Some of this could be omitted by giving the SeaTurtle extension the ability to read RDF prefix declarations from Category pages. The version below relies on that and also leaves out the ontolex types and the lists of Forms and Senses:

```ttl
@base  <https://research.moraleconomy.au/entity/> .  # all the other prefixes will be added via Categories
# [[Category:RDF prefix rdf]] [[Category:RDF prefix rdfs]] [[Category:RDF prefix dct]] [[Category:RDF prefix ontolex]] [[Category:RDF prefix wikibase]] [[Category:RDF prefix wdt]] [[Category:RDF prefix wd]] [[Category:RDF prefix skos]]

<L404>
 rdf:type  wikibase:Lexeme ;  # the Entity is a Lexeme
 wikibase:lexicalCategory  wd:foo ;
 wikibase:lemma  "Trotskyism"@en ;
 dct:language  <Q1860> .  # English / en

<L404-S1>
 rdf:type wikibase:Sense ;
 # ... any Sense may contain wdt: statements ...
 skos:definition "sect of Leninism created between 1906 and 1925 with many splinter sects or divisions"@en .

<L404-S2>
 rdf:type wikibase:Sense ;
 skos:definition "model of Leninism which requires a world civilization - usually international-conference Trotskyism"@en .

<L404-S3>
 rdf:type wikibase:Sense ;
 skos:definition "model of Leninism which unites several parties across multiple countries or nationalities into a civilization - international-party Trotskyism, plural international-conference Trotskyisms, international-identity Trotskyism, etc."@en .

<L404-S4>
 rdf:type wikibase:Sense ;
 skos:definition "any model of Leninism which meets particular criteria for being an enemy of Stalin Thought - Trotskyism in one country, Molecular Trotskyism, etc."@en .

<L404-F1>
 rdf:type  wikibase:Form ;
 ontolex:representation "Trotskyism"@en ;
 # ... any Form may contain wdt: statements ...
 wikibase:grammaticalFeature wd:PRIN .

<L404-F2>
 rdf:type  wikibase:Form ;
 ontolex:representation "Trotskyist"@en ;
 wikibase:grammaticalFeature wd:PRAN .

<L404-F3>
 rdf:type  wikibase:Form ;
 ontolex:representation "realize Trotskyism"@en ;
 wikibase:grammaticalFeature wd:PRVIP .

<L404-F4>
 rdf:type  wikibase:Form ;
 ontolex:representation "Trotskyist theorist"@en ;
 wikibase:grammaticalFeature wd:DPRAN .

<L404-F5>
 rdf:type  wikibase:Form ;
 ontolex:representation "ours Trotskyist theories"@en ;
 wikibase:grammaticalFeature wd:HLNP .

<L404-F6>
 rdf:type  wikibase:Form ;
 ontolex:representation "Trotskyist movement"@en ;
 wikibase:grammaticalFeature wd:DPRNS .

<L404-F7>
 rdf:type  wikibase:Form ;
 ontolex:representation "Trotskyist movements"@en ;
 wikibase:grammaticalFeature wd:DPRNP .

<L404-F8>
 rdf:type  wikibase:Form ;
 ontolex:representation "Leninism"@en ;
 wikibase:grammaticalFeature wd:HPRIN .

<L404-F9>
 rdf:type  wikibase:Form ;
 ontolex:representation "Leninist"@en ;
 wikibase:grammaticalFeature wd:HPRAN .

<L404-F10>
 rdf:type  wikibase:Form ;
 ontolex:representation "organize Trotskyists"@en ;
 wikibase:grammaticalFeature wd:DVIP .

<L404-F11>
 rdf:type  wikibase:Form ;
 ontolex:representation "Trotskyisms"@en ;
 wikibase:grammaticalFeature wd:BNP .

<L404-F12>
 rdf:type  wikibase:Form ;
 ontolex:representation "Trotskyist workers' states"@en ;
 wikibase:grammaticalFeature wd:RNP .

<L404-F13>
 rdf:type  wikibase:Form ;
 ontolex:representation "regenerate Trotskyism"@en ;
 wikibase:grammaticalFeature wd:RVIP .

<L404-F14>
 rdf:type  wikibase:Form ;
 ontolex:representation "realize Trotskyism onto"@en ;
 wikibase:grammaticalFeature wd:RVTP .

<L404-F14>
 rdf:type  wikibase:Form ;
 ontolex:representation "transition to Trotskyism"@en ;
 wikibase:grammaticalFeature wd:RVTP .

<L404-F15>
 rdf:type  wikibase:Form ;
 ontolex:representation "Trotsky"@en ;
 wikibase:grammaticalFeature wd:PNI .
```
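One way the extension could resolve these category markers is sketched below in Python. The lookup table stands in for the Category pages themselves, because this proposal does not specify how a Category page would record its namespace IRI. The table, CATEGORY_RE, and the function names are all hypothetical. Prepending the generated @prefix lines is valid because Turtle allows directives anywhere in a document.

```python
import re

CATEGORY_RE = re.compile(r"\[\[Category:RDF prefix ([A-Za-z][\w-]*)\]\]")

# Hypothetical stand-in for the Category pages. In the proposal, each
# "Category:RDF prefix X" page would somehow record the namespace IRI for X.
PREFIX_PAGES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dct": "http://purl.org/dc/terms/",
    "ontolex": "http://www.w3.org/ns/lemon/ontolex#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "wikibase": "http://wikiba.se/ontology#",
    "wd": "https://research.moraleconomy.au/entity/",
    "wdt": "https://research.moraleconomy.au/prop/direct/",
}


def prefix_header(turtle: str, pages: dict[str, str] = PREFIX_PAGES) -> str:
    """Build @prefix lines for every RDF prefix category mentioned in a block."""
    lines = []
    for name in dict.fromkeys(CATEGORY_RE.findall(turtle)):  # keeps order, drops repeats
        if name not in pages:
            raise KeyError(f"no page Category:RDF prefix {name}")
        lines.append(f"@prefix {name}: <{pages[name]}> .")
    return "\n".join(lines)


def with_prefixes(turtle: str) -> str:
    """Prepend the generated @prefix lines so an ordinary Turtle parser can read the block."""
    return prefix_header(turtle) + "\n" + turtle
```

Raising an error for an unknown prefix category, rather than silently skipping it, would let the extension warn the editor at save time.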

Querying Turtle blocks

Turtle blocks may seem almost too simple: how could the search function possibly query them? However, every Entity in Wikibase is itself stored as a JSON blob in the text of a wiki page, and Wikibase manages to search through these just fine.

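It seems that at least part of each Wikibase Entity is cached in SQL tables. Labels, descriptions, and aliases, for instance, are kept in the wbt_ term store tables. If SeaTurtle cached the claims from its Turtle blocks in a similar way, searching for any label or Property ID should be no slower than if Entities were stored as JSON. For that matter, MediaWiki's regular text search should already be able to find un-localized Item and Property IDs, or an Item's own localized labels, on any Item page, since these appear literally in the page text.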
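If SeaTurtle did cache claims, the cache could be as simple as one row per triple. The Python sketch below uses an in-memory SQLite database as a stand-in for MediaWiki's own database. The table layout, page titles, and function names are hypothetical. Predicates are stored as prefixed names for readability, whereas a real cache would more likely store expanded IRIs.

```python
import sqlite3

# Hypothetical secondary table: one row per claim parsed from an Entity block.
SCHEMA = """
CREATE TABLE claims (
    page      TEXT NOT NULL,  -- wiki page holding the Entity block
    subject   TEXT NOT NULL,
    predicate TEXT NOT NULL,
    object    TEXT NOT NULL
);
CREATE INDEX claims_by_predicate_object ON claims (predicate, object);
"""

# Illustrative rows; the page titles are hypothetical.
ROWS = [
    ("Property:P15", "<P15>", "a", "wikibase:Property"),
    ("Property:P15", "<P15>", "wdt:P30", "<P14>"),
    ("Property:P15", "<P15>", "skos:prefLabel", '"work depicts or contains"@en'),
    ("Property:P15", "<P15>", "skos:altLabel", '"depicts"@en'),
    ("Lexeme:L404", "<L404>", "wikibase:lemma", '"Trotskyism"@en'),
]


def build_index(rows) -> sqlite3.Connection:
    """Create the claims table in memory and load the given rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO claims VALUES (?, ?, ?, ?)", rows)
    return conn


def pages_with_label(conn, label: str, lang: str = "en") -> list[str]:
    """Pages whose Entity has this exact primary or alternate label."""
    cur = conn.execute(
        "SELECT DISTINCT page FROM claims"
        " WHERE predicate IN ('skos:prefLabel', 'skos:altLabel') AND object = ?"
        " ORDER BY page",
        (f'"{label}"@{lang}',),
    )
    return [row[0] for row in cur]


def pages_using_property(conn, prop: str) -> list[str]:
    """Pages whose Entity block makes at least one claim with this Property."""
    cur = conn.execute(
        "SELECT DISTINCT page FROM claims WHERE predicate = ? ORDER BY page", (prop,)
    )
    return [row[0] for row in cur]
```

The single index on (predicate, object) serves both lookups: exact label matches use both columns, and Property lookups use the predicate column alone.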