Syntax of the JSON mapping descriptions
The JSON object should contain the following members:
entry — describes the selector for entry elements
entry_lang — describes the transformer for the language attribute of the entries
sense — describes the selector for sense elements
hw — describes the transformer for headwords
sec_hw — describes the transformer for secondary headwords
variant — describes the transformer for variant headwords
inflected — describes the transformer for inflected forms
pos — describes the transformer for part-of-speech tags
hw_tr — describes the transformer for translations of headwords
hw_tr_lang — describes the transformer for the language of the translations of headwords
ex — describes the transformer for examples
ex_tr — describes the transformer for translations of examples
ex_tr_lang — describes the transformer for the language of the translations of examples
def — describes the transformer for definitions
gloss — describes the transformer for sense indicators
usg — describes the transformer for labels
note — describes the transformer for notes
xr — describes the transformer for cross-references
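As a rough illustration of how these members fit together, the sketch below builds a small mapping as a Python dictionary and serialises it to JSON, using the selector and transformer formats described in the sections that follow. All element names and XPath expressions (Entry, Sense, Headword, Pos, Definition) are hypothetical, the selector type string "xpath" is an assumption, and the helper functions are only conveniences for this example. Which members are mandatory is not stated here, so this subset is illustrative only.

```python
import json

# Pseudo-attribute namespace quoted in the simple transformer section.
META = "{http://elex.is/wp1/teiLex0Mapper/meta}"


def xpath(expr):
    """Build a selector description (the type string "xpath" is assumed)."""
    return {"type": "xpath", "expr": expr}


def inner_text(expr):
    """Build a simple transformer that extracts the inner text."""
    return {"type": "simple", "selector": xpath(expr), "attr": META + "innerText"}


# Hypothetical input vocabulary: <Entry>, <Sense>, <Headword>, <Pos>, <Definition>.
mapping = {
    "entry": xpath(".//Entry"),          # selector member
    "sense": xpath(".//Sense"),          # selector member
    "hw": inner_text(".//Headword"),     # transformer members
    "pos": inner_text(".//Pos"),
    "def": inner_text(".//Definition"),
}

print(json.dumps(mapping, indent=2))
```

Building the mapping in code and dumping it keeps the long pseudo-attribute names in one place; the printed JSON is what would be supplied as the mapping description.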
Selector descriptions
A selector is a rule that selects 0 or more elements in the input XML tree.
The description of a selector must be a JSON object. This object must contain
an attribute named type, whose value specifies the type of the selector,
plus one or more other attributes whose names and meanings depend on the selector type.
The following types of selectors are currently supported:
XPath selector: selects the nodes that match a given XPath expression
(given in an attribute named expr). Optionally, an attribute named namespaces
may also be present; its value should be a dictionary defining any namespace
prefixes that occur in expr.
Examples:
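A minimal sketch of two selector descriptions, one without and one with a namespaces dictionary, written as Python dicts. The type string "xpath", the element names, and the select helper are assumptions for illustration. The helper uses the XPath subset of Python's xml.etree.ElementTree, which may differ from the XPath engine the mapper actually uses.

```python
import xml.etree.ElementTree as ET

# Selector without namespaces (hypothetical element name "Entry").
entry_selector = {"type": "xpath", "expr": ".//Entry"}

# Selector whose expression uses a prefix defined in "namespaces".
tei_entry_selector = {
    "type": "xpath",
    "expr": ".//tei:entry",
    "namespaces": {"tei": "http://www.tei-c.org/ns/1.0"},
}


def select(root, selector):
    """Illustrative stand-in: return the elements matched by the selector."""
    return root.findall(selector["expr"], selector.get("namespaces"))


plain = ET.fromstring("<Dictionary><Entry/><Entry/></Dictionary>")
tei = ET.fromstring(
    '<TEI xmlns="http://www.tei-c.org/ns/1.0"><text><entry/></text></TEI>'
)

print(len(select(plain, entry_selector)))      # 2
print(len(select(tei, tei_entry_selector)))    # 1
print(len(select(plain, tei_entry_selector)))  # 0 (a selector may match nothing)
```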
Transformer descriptions
A transformer is a rule that describes which data from the
input document must be transformed into a certain type of element in the output document.
The description of a transformer must be a JSON object. This object must contain
an attribute named type, whose value specifies the type of the transformer,
plus one or more other attributes whose names and meanings depend on the transformer type.
The following types of transformers are currently supported:
(1) Simple transformers
A simple transformer selects a set of elements and extracts an attribute value
or the inner text from each of them. It can optionally apply a regular expression
to the resulting text and return only the substring matched by a specific named group
within the regular expression.
The JSON object that describes a simple transformer must contain
the following attributes:
type: this must be the string "simple".
selector: a JSON object describing a selector.
attr: the name of an attribute (from the elements selected
by the selector) whose value is to be extracted.
To extract the inner text of the element, instead of an attribute, use the pseudo-attribute name
"{http://elex.is/wp1/teiLex0Mapper/meta}innerText".
To extract the inner text of the element and all of its descendants,
use "{http://elex.is/wp1/teiLex0Mapper/meta}innerTextRec".
To return a constant value instead of extracting the value of an attribute, use the
pseudo-attribute name
"{http://elex.is/wp1/teiLex0Mapper/meta}constant".
rex: a regular expression that is applied to the value
extracted via attr. If that value does not contain any match for the
regular expression, the current element is not transformed
(i.e. it is as if it hadn't been selected by the selector at all).
If there are several matches, the first one is used.
This attribute is optional. If present, it must use
Python regular expression syntax.
rexGroup: this attribute is optional. If present,
it must be the name of one of the named groups (?P<name>...)
from the regular expression given by the attribute rex.
In this case, only the string that matched this named group will
be used, rather than the entire value of the attribute attr.
const: this attribute should be present if attr was
set to "{http://elex.is/wp1/teiLex0Mapper/meta}constant", and
should provide the constant value that you want to return as the result of
the transformation.
xlat: this attribute is optional. If present,
it should be a hash table that will be used to transform the string
obtained from the previous steps (attribute lookup, regex matching).
In other words, the string s will be replaced by xlat[s]
if s appears as a key in xlat (otherwise, s will remain
unchanged, just as if xlat had not been provided at all).
adoptSelector: this attribute is optional. If present,
it should be a JSON object describing a selector; elements selected by it
will be moved inside the element that the transformer is working on.
For example, suppose the transformer's selector matches <Example> elements
and its adoptSelector is an xpath selector with the expression "../Translation".
When we find an <Example>, we then also find all
<Translation> elements that are children of its parent
(the "../Translation" xpath is taken to be relative to the
<Example> that the transformer is working on at the moment).
Those <Translation>s are moved inside the <Example>,
resulting in a transformed structure of the form:
<cit type="example">
  <quote>J'ai mangé une pomme.</quote>
  <cit type="translation"><quote xml:lang="dk">Jeg har spist et æble.</quote></cit>
  ...
</cit>
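The attributes above can be pictured as a small pipeline. The sketch below is not the mapper's implementation. It shows two descriptions as Python dicts plus a hypothetical helper, apply_text_steps, that mimics the documented order of steps (value lookup, rex, rexGroup, xlat) on a single extracted string. The selector type string "xpath", the element name Pos, the regular expression, and the xlat table are assumptions. Whether a constant value also passes through rex and xlat is not stated above, so the sketch simply assumes that it does.

```python
import json
import re

META = "{http://elex.is/wp1/teiLex0Mapper/meta}"

# Hypothetical part-of-speech transformer: keep the abbreviation before the
# first full stop and expand it through xlat.
pos_transformer = {
    "type": "simple",
    "selector": {"type": "xpath", "expr": ".//Pos"},  # assumed element name
    "attr": META + "innerText",
    "rex": r"(?P<tag>[a-z]+)\.",
    "rexGroup": "tag",
    "xlat": {"n": "noun", "v": "verb"},
}

# Description matching the adoptSelector discussion above: <Translation>
# elements next to each <Example> are moved inside it.
example_transformer = {
    "type": "simple",
    "selector": {"type": "xpath", "expr": ".//Example"},
    "attr": META + "innerText",
    "adoptSelector": {"type": "xpath", "expr": "../Translation"},
}


def apply_text_steps(desc, extracted):
    """Illustrative only: apply const, rex, rexGroup and xlat to one string.

    Returns None when rex is present but does not match, mirroring the rule
    that such an element is treated as if it had not been selected.
    """
    value = desc["const"] if desc["attr"] == META + "constant" else extracted
    rex = desc.get("rex")
    if rex is not None:
        match = re.search(rex, value)  # the first match is the one used
        if match is None:
            return None
        group = desc.get("rexGroup")
        if group is not None:
            value = match.group(group)  # otherwise the whole value is kept
    # xlat replaces the string only if it appears as a key.
    return desc.get("xlat", {}).get(value, value)


print(apply_text_steps(pos_transformer, "n. m."))  # noun
print(apply_text_steps(pos_transformer, "adj."))   # adj
print(apply_text_steps(pos_transformer, "123"))    # None
print(json.dumps(example_transformer, indent=2))
```

Note that without rexGroup the regular expression acts only as a filter: the entire extracted value is kept whenever a match exists.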
(2) Union transformers
A union transformer takes a set of simple transformers and performs all of their
transformations. This might be useful if you need to combine several different
transformation rules, e.g. extract attribute @a from instances of the element
<b> and also extract attribute @c from instances of the element
<d>.
The JSON object that describes a union transformer must contain
the following attributes:
type: this must be the string "union".
transformers: an array of JSON objects describing the transformers
that are to be combined.
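The case mentioned above (attribute @a from <b> elements together with attribute @c from <d> elements) could be described roughly as follows. The selector type string "xpath" and the extract helper are assumptions made for this sketch. The helper only imitates the idea of combining results, using Python's xml.etree.ElementTree, and the order in which the real mapper emits the combined results is not specified here.

```python
import json
import xml.etree.ElementTree as ET

union_transformer = {
    "type": "union",
    "transformers": [
        {"type": "simple",
         "selector": {"type": "xpath", "expr": ".//b"},
         "attr": "a"},
        {"type": "simple",
         "selector": {"type": "xpath", "expr": ".//d"},
         "attr": "c"},
    ],
}


def extract(root, desc):
    """Illustrative only: collect attribute values for a simple or union description."""
    if desc["type"] == "union":
        values = []
        for sub in desc["transformers"]:
            values.extend(extract(root, sub))
        return values
    elements = root.findall(desc["selector"]["expr"])
    return [e.get(desc["attr"]) for e in elements if e.get(desc["attr"]) is not None]


root = ET.fromstring('<r><b a="1"/><d c="2"/><b a="3"/></r>')
print(extract(root, union_transformer))  # ['1', '3', '2']
print(json.dumps(union_transformer, indent=2))
```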