Philosophical Research:Data model: Difference between revisions

From Philosophical Research
Revision as of 02:56, 12 April 2025

What are Items, Lexemes, and Ontology pages? Perhaps you have already found technical descriptions of how each of these things behaves, but would still like to know what their intended purpose is.

First of all, Lithographica is an ontology. It is not a "page", "book", or "text" composed of linguistic utterances; instead, it is primarily a mathematical graph made up of points connected by arrows. Studying natural processes, or meaning within particular texts, means assigning each separable object or image a point and drawing particular kinds of arrows between the points, much as an astronomer might keep track of particular groups of stars by drawing a constellation. By drawing many arrows, we can use simple building blocks to replicate outwardly complex structures and processes, such as all the bone names in a typical bird skeleton, all the stars and planets in a particular star system, or all the separate earldoms in 1000s England.

In terms of the digital tools used to encode this "arrow method", RDF-style data structures encode each relationship between things. Given some concept A and some concept B, we can term the arrow between them the predicate and begin sorting relationships into different kinds of RDF properties. Some arrows describe literal information such as names or measurements, while other arrows describe specific kinds of structural relationships, such as a book belonging to a series. RDF-style data frameworks can then mark particular kinds of nodes as generally containing certain kinds of arrows, like a book series generally containing books. This leads to the concept of Resources or Items, which are said to conform to particular schemas or data structure classes. This is the basis of the Wikibase Item structure, the Wikibase Property structure, which is used to model arbitrary new RDF-style properties, and the Wikibase Lexeme structure, which applies the concepts of the Wikibase Item class to model elements of particular human languages, typically in their written form.
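As a minimal sketch of the "arrow method" described above, each arrow can be written as a subject, predicate, object triple. The identifiers (`book:volume_1`, `part_of_series`, and so on) are invented for this illustration and are not real Lithographica or Wikibase IDs; plain Python tuples stand in for an actual RDF store.

```python
from collections import namedtuple

# One triple is one "arrow": subject --predicate--> object.
Triple = namedtuple("Triple", ["subject", "predicate", "obj"])

# Hypothetical identifiers, made up for this example only.
graph = [
    Triple("book:volume_1", "part_of_series", "series:example_series"),
    Triple("book:volume_2", "part_of_series", "series:example_series"),
    # An arrow pointing at literal information (a name) rather than a node.
    Triple("book:volume_1", "title", "Volume One"),
    # An arrow marking which class of node the series belongs to.
    Triple("series:example_series", "instance_of", "class:book_series"),
]

def objects(graph, subject, predicate):
    """Follow arrows of one kind outward from a node."""
    return [t.obj for t in graph
            if t.subject == subject and t.predicate == predicate]

def subjects(graph, predicate, obj):
    """Follow arrows of one kind backward into a node."""
    return [t.subject for t in graph
            if t.predicate == predicate and t.obj == obj]

books_in_series = subjects(graph, "part_of_series", "series:example_series")
```

Here `books_in_series` collects every node with a `part_of_series` arrow into the series node, which is the sense in which a book series "generally contains" books.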

Lithographica is a bit like building Wikipedia backwards. Instead of starting from broad concepts and working down into fine-grained sections, the goal is to work up from the most elementary and easily observable concepts and build progressively larger concepts or statements, which in some cases receive their own wiki pages acting as human-readable summaries of the mathematical Item relationships. Wiki pages appear eventually, as the core ontological models (designed mostly independently of language) snap together, solidify, and thus become commonly understood and easy to describe in natural language.

In theory, Ontology pages may become localized at some point as the project grows, such that each node deemed interesting enough for a summary has a summary in any number of natural languages. Early on, many areas of the ontology, including Lexemes, have focused on studying texts in English, Japanese, or German, but there is no particular reason for this other than the desire to centralize parallel models of the same concept in the same place, which in the case of Lexemes is the language-separated term. Here we encounter a minor conflict between the Lithographica use of Lexemes as concept disambiguation and the ontolex definition of Lexemes as being separated by language.


  • [1] - SQL tables versus RDF. I just like this page, I think it's neat

Elementary Items

These Entities often have the purpose of linking to descriptions of elementary observable concepts in other databases such as Wikidata, Wikipedia, and Fandom wikis.

At times, elementary Items can form their own definitions through set-theory Properties modeling an object's structure, for example: nucleon - consists of components - quark - at order of magnitude - on average 3 (quantum physics).
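The nucleon example can be sketched as a statement carrying a qualifier. This is only an illustration: the `Statement` class and its field names are assumptions made here, loosely following the Wikibase idea of a main value plus qualifiers, and are not the project's actual data structure.

```python
from dataclasses import dataclass, field

@dataclass
class Statement:
    """Hypothetical container: one main arrow plus qualifier arrows."""
    subject: str
    prop: str
    value: str
    qualifiers: dict = field(default_factory=dict)

# The example from the text: a nucleon consists of quarks, on average 3.
nucleon_structure = Statement(
    subject="nucleon",
    prop="consists of components",
    value="quark",
    qualifiers={"at order of magnitude": "on average 3"},
)

def describe(statement):
    """Render a statement in the dash-separated notation used in the text."""
    parts = [statement.subject, statement.prop, statement.value]
    for qualifier_prop, qualifier_value in statement.qualifiers.items():
        parts += [qualifier_prop, qualifier_value]
    return " - ".join(parts)
```

Calling `describe(nucleon_structure)` reproduces the dash-separated form above, which shows that the qualifier is simply a second arrow attached to the first one.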

Sign Entities: these have been under consideration as an improvement on Wikibase Items. Currently it appears that they will not be implemented as a new data structure, but they may return later in the form of an extension to name particular Wikibase predicate-statements and tag them with their own RDF Resource classes.

Z Items

Z stands for Zettel (card) or Zahl (number), both in reference to Z Items being the most generic kind of concept a "number", "card", or "entry" can be assigned to.

S Items

S stands for signifier, statement (in the case of S2 Statements), or structure (in the case of S0 data structures).

Statement Items

  • Statement Items: z2, s2, f2
    • These Items express concrete, hypothetical, or purely counterfactual relationships between elementary Items.
    • This is a somewhat different way of doing things than Wikidata's. It means that at an internal level, the whole notion of named Claims could possibly be replaced with regular Items with shorter ID strings.
    • Relying on Statement Items has the advantage of making the SeaTurtle approach more viable.
    • It also has the advantage of making it easier to achieve SPoV from the beginning. Statement Items inherently promote the emergence of plural ontologies suitable for modeling a real world of competing plural philosophies and models.
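One way to picture Statement Items is as ordinary records with their own IDs, so that a relationship can itself be the subject of another relationship. The sketch below is an assumption-laden illustration: the ID strings, the `StatementItem` class, and the predicates "is part of" and "disputed by" are all invented here, and the placeholder nodes `item:A`, `item:B`, and `item:C` carry no real content.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StatementItem:
    """Hypothetical Statement Item: a relationship that has its own ID."""
    item_id: str    # made-up ID format, for illustration only
    kind: str       # "z2", "s2", or "f2", as listed in the text
    subject: str
    predicate: str
    obj: str

# Two competing statements about the same subject can coexist.
statements = [
    StatementItem("S2-1", "s2", "item:A", "is part of", "item:B"),
    StatementItem("F2-1", "f2", "item:A", "is part of", "item:C"),
]

def claims_about(statements, subject, kinds=("z2", "s2", "f2")):
    """Return all Statement Items about a subject, optionally filtered by kind."""
    return [s for s in statements if s.subject == subject and s.kind in kinds]

# Because a Statement Item has an ID, another statement can point at it.
meta = StatementItem("S2-2", "s2", "S2-1", "disputed by", "F2-1")
```

Keeping both statements in the same list, rather than choosing one, is what lets plural and competing models sit side by side.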

S2 Statements

F2 Statements

F stands for false or fringe-science.

Z2 Statements

Ontological-category Items

S0 Concepts

Logical or metaphysical categories which group Z or S Items.

Other kinds of "zero" categories

There are no F0 Concepts, because the inherently arbitrary nature of some S0 classifications makes it impossible to identify "objectively wrong" categories. However, Z0 Concepts might or might not be introduced in the future for the purposes of utterly tried-and-true mathematical data structures in fields such as quantum-mechanical mathematics. Rather than having any chance to be an arbitrary historically-contingent grouping, a Z0 Concept would unambiguously refer to a data structure or RDF-style Resource class as defined by its mathematical fields, relations, or capabilities. Most zoological or botanical taxa could not be marked Z0 because of the inherent uncertainty in the exact minutiae of where they should be placed and how we can be sure, but at the same time the basic concept of a clade might end up classified as Z0 because it is a pattern which has been consistently observed again and again with little variation.

It is conceivable that some kind of "H0" categorization could be introduced for structures which are strictly historically-contingent. "[H0] United States" would contain "[Z] California" and "[Z] Texas", while "[H0] Romance languages" would contain "[Z] French" and "[Z] Spanish". This might improve the distinction between categories which are due to patterns and categories which are traditional: a category such as "[H0] analytic philosophy" could be defined almost purely based on which people wrote letters to each other rather than the specific philosophical content of their works, while a category such as "[Z0] molecular Trotskyism" could be defined strictly based on the presence of particular mathematical relations within the structure of a Marxist society (a movement's internal Particle Theory or Bauplan).

Lexemes

Lexemes are Item-like Entities provided by the Wikibase Lexeme extension. Similar to a dictionary entry, their basic purpose is to divide specific recorded languages into words or phrases of particular grammatical categories (for example, English noun or German verb), and map the connections between a set of related written forms and a set of distinct but related meanings. As far as Lithographica is concerned, Lexemes are to be used like disambiguation pages between ambiguous written words and word-independent concepts (Items). Terms are usually sorted by language, but for the purposes of this project their precise grammatical categories are broader to allow for notions like abstract nouns that express themselves into verbs and adjectives, etc. (This will be described in more detail elsewhere — later.)

The Lexeme structure is also (mis)used for a few more specialized roles where Lexemes are more strictly interpreted as written signifiers, as explained below.

Citation Lexemes

A citation Lexeme is meant to represent the concept of a particular referenced work rendered into speech. Citation Lexemes do not hold the contrasting connotations of works ("Dragon Ball means wild power escalation"), but instead simply associate the titles or aliases of works to particular sub-series or editions ("Dragon Ball and DBZ are two names for the same continuous series"). The Senses of a citation Lexeme should be suitable for linking to particular Z Items or S0 Concepts which constitute actual works or collections of works somebody would reference, while the Forms of the citation Lexeme can be any names the Senses are ever referred to by no matter how ambiguous. Each Sense should be connected to the collection of Forms which represent it so that it becomes clear which names refer to which parts of the series or work. Specific numbered parts of a larger work can also be added as Senses in the case they come to be referenced so often they overshadow other parts. Note that it is generally not necessary to add every numbered part of a work as a Sense — if individual numbered parts are being referenced this often, it may be best to simply reference them as Items or give them their own separate citation Lexemes.

Citation Lexemes may be especially useful for thesis portals referencing works by recurring bop citation codes. A recurring citation code should be recorded on a particular citation Lexeme, and the Lexeme Sense linked by Property on its corresponding Item, such that searching the citation code can bring up both the Lexeme and the Item. At that point the citation code can safely link itself directly to the Item for the most specific numbered part of the work referenced.
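The relationship between Forms, Senses, and citation codes can be sketched as follows. Everything named here is an assumption for illustration: the classes, the IDs (`L-EXAMPLE-1`, `Z-EXAMPLE-1`, `Z-EXAMPLE-2`), the citation code "DB", and the second Sense are invented, and the real Wikibase Lexeme structure is richer than this.

```python
from dataclasses import dataclass, field

@dataclass
class Sense:
    item_id: str                 # hypothetical Item the Sense links to
    gloss: str
    form_names: list = field(default_factory=list)  # names used for this Sense

@dataclass
class CitationLexeme:
    lexeme_id: str
    citation_code: str           # hypothetical recurring citation code
    senses: list = field(default_factory=list)

    def forms(self):
        """All names any Sense is referred to by, however ambiguous."""
        names = []
        for sense in self.senses:
            for name in sense.form_names:
                if name not in names:
                    names.append(name)
        return names

    def senses_for(self, name):
        """Which Senses a given (possibly ambiguous) name can refer to."""
        return [s for s in self.senses if name in s.form_names]

dragon_ball = CitationLexeme(
    lexeme_id="L-EXAMPLE-1",
    citation_code="DB",
    senses=[
        Sense("Z-EXAMPLE-1", "the continuous series", ["Dragon Ball", "DBZ"]),
        Sense("Z-EXAMPLE-2", "one numbered part of the series", ["DBZ"]),
    ],
)

def find_by_code(lexemes, code):
    """Searching a citation code brings up the Lexeme and its linked Items."""
    return [(lx.lexeme_id, [s.item_id for s in lx.senses])
            for lx in lexemes if lx.citation_code == code]
```

Looking up an ambiguous name with `senses_for` returns every Sense it could mean, which is the disambiguation role described above; `find_by_code` shows how one code can surface both the Lexeme and its Items.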

Works and editions

In general, this project follows a simplified and incomplete version of the FRBR standards. Works and editions should not be separated, and instead should be regarded as if the characteristics of particular editions are all varying characteristics of the work they originate from. This simplification is for the purposes of making data entry slightly easier, or for those who are willing to take the effort to separate out editions anyway, to allow separate edition identifiers to all be managed by Wikidata. The logic goes that if Wikidata is already a repository detailing almost all "official" published works, there is no real need to duplicate the effort again especially if it would result in a single book having two Items on Wikidata and two Items on Lithographica which the same user might have to create all at once.

  • A graphic novel which neatly follows the story of a particular prose volume with no deviation should be considered an edition of the same work. ex.: Silver Eyes trilogy (FNaF), Wings of Fire graphix adaptation
  • A film adaptation which neatly maps to a particular prose volume and does not intentionally deviate from it should be considered an edition of the same work. ex.: Harry Potter and the Philosopher's Stone
  • A film adaptation which "adapts" a larger series but does not map to a particular prose or comic volume should be considered a different work. ex.: Dragon Ball Evolution
  • A dramatized adaptation which neatly maps to a particular comic volume but has its own set of numbered parts should be formally considered an edition of the same work, but is allowed to have a separate Item primarily for the purpose of grouping differing sets of numbered parts. ex.: Dragon Ball (books), Dragon Ball / Dragon Ball Z (shows)
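The four rules above can be read as a small decision procedure. The function below is a rough sketch under that reading; its name, parameters, and return strings are invented here, and the case of an adaptation that maps to a volume but intentionally deviates is not settled by the rules, so the sketch reports it as uncovered rather than guessing.

```python
def classify_adaptation(maps_to_volume, deviates, own_numbered_parts=False):
    """Classify an adaptation following the four listed rules.

    maps_to_volume: the adaptation maps to one particular prose or comic volume
    deviates: it intentionally deviates from that volume
    own_numbered_parts: it has its own set of numbered parts
    """
    if not maps_to_volume:
        # Adapts a larger series without mapping to any one volume.
        return "different work"
    if deviates:
        # The rules in the text do not say what to do here.
        return "not covered by the listed rules"
    if own_numbered_parts:
        # Formally an edition, but may get its own Item to group its parts.
        return "edition of the same work (separate Item allowed)"
    return "edition of the same work"
```

For instance, a film that adapts a whole series without mapping to a volume would be passed as `maps_to_volume=False` and come back as a different work.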

Part of the reason this system was devised was that it was too confusing and unintuitive for new Wikidata editors to immediately identify an edition of a comic. Are volumes of a serialized comic considered works? (No. An entire collection of volumes is considered an edition, despite the misleading Wikidata data constraint that Items should not have more than one ISBN.) If several non-serialized comics are collected together, what is this? (An edition of the individual comics.) If a graphix adaptation has hardcover and softcover bindings, does it count as a work, or does the line of adapted books count as a separate series? (It shouldn't.) If fans create an abridged series, does this count as an edition? (It should. Journey to the West was also abridged.)