OMW Documentation on LMF

This page provides some guidelines on how to prepare LMF for the Open Multilingual Wordnet.

Lexical Markup Framework (LMF; ISO 24613:2008) is the International Organization for Standardization (ISO/TC 37) standard for natural language processing (NLP) and machine-readable dictionary (MRD) lexicons. The LMF variant used here (GWA-LMF) is inspired by Wordnet-LMF. The schema is hosted on GitHub, with documentation.

Guidelines for preparing the LMF

Here are more details on how to prepare the file.

Wordnet Metadata

Each lexicon must have correct metadata (see here for more detail). Extra properties from Dublin Core may also be included.

Notes on the entries

There is extensive documentation with the schemas. Here we include a few tips that are not covered there.

Definitions

If you want to include a definition from somewhere else (such as the Princeton wordnet), or in a language other than that of the wordnet, please note it explicitly:

  <Definition language="ja">辞書の編集者または筆者</Definition>
  <Definition dc:source="pwn-3.0" language="en">a compiler or writer of a dictionary</Definition>
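
If you generate your LMF programmatically, the same marking can be applied when the definitions are written out. The following is a minimal Python sketch, not part of the OMW tooling. It assumes the standard library's xml.etree.ElementTree; the helper make_definition, the synset id, and the namespace URI bound to the dc: prefix are illustrative assumptions, so use whatever URI your own LMF header declares.

```python
import xml.etree.ElementTree as ET

# Assumption: the URI bound to the dc: prefix. Replace it with the one
# declared on the LexicalResource element of your own file.
DC_NS = "https://globalwordnet.github.io/schemas/dc/"
ET.register_namespace("dc", DC_NS)


def make_definition(text, language, source=None):
    """Build a <Definition> element (hypothetical helper).

    The language is always stated explicitly; dc:source is added only
    when the definition was taken from another resource.
    """
    elem = ET.Element("Definition", {"language": language})
    if source is not None:
        elem.set(f"{{{DC_NS}}}source", source)
    elem.text = text
    return elem


# Hypothetical synset id, used only to give the definitions a parent.
synset = ET.Element("Synset", {"id": "example-ja-1234-n"})
synset.append(make_definition("辞書の編集者または筆者", "ja"))
synset.append(
    make_definition("a compiler or writer of a dictionary", "en", source="pwn-3.0")
)

print(ET.tostring(synset, encoding="unicode"))
```

Keeping the source optional means a native definition carries only its language, while a borrowed one is always traceable to where it came from.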

Semantic Relations

If your relation type is not in the list of supported types, use other as the relType and give the more explicit type as dc:type. If your type is a more specific subclass of an existing type, use the supertype as the relType and record the specific type in dc:type.

<SynsetRelation relType="other" 
                dc:type="emotion" target="example-en-1234-n"/>
<SynsetRelation relType="antonym" 
                dc:type="gradable antonym" target="example-en-1234-n"/>
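
The fallback rule above can be sketched in Python as follows. This is only an illustration under stated assumptions: KNOWN_REL_TYPES is a small illustrative subset (the schema documentation has the authoritative list), make_synset_relation is a hypothetical helper, and the dc: namespace URI should match the one declared in your file.

```python
import xml.etree.ElementTree as ET

# Assumption: the URI bound to the dc: prefix in your LMF header.
DC_NS = "https://globalwordnet.github.io/schemas/dc/"
ET.register_namespace("dc", DC_NS)

# Illustrative subset only; consult the schema for the full list.
KNOWN_REL_TYPES = {"hypernym", "hyponym", "antonym", "similar"}


def make_synset_relation(target, rel_type, supertype=None):
    """Build a <SynsetRelation> element (hypothetical helper).

    A known type is used directly. An unknown type is recorded in
    dc:type, with relType set to the given supertype or to "other".
    """
    if rel_type in KNOWN_REL_TYPES:
        attrs = {"relType": rel_type}
    else:
        if supertype is not None and supertype not in KNOWN_REL_TYPES:
            raise ValueError(f"unknown supertype: {supertype!r}")
        attrs = {
            "relType": supertype if supertype is not None else "other",
            f"{{{DC_NS}}}type": rel_type,
        }
    attrs["target"] = target
    return ET.Element("SynsetRelation", attrs)


relations = [
    make_synset_relation("example-en-1234-n", "emotion"),
    make_synset_relation("example-en-1234-n", "gradable antonym", supertype="antonym"),
    make_synset_relation("example-en-1234-n", "hypernym"),
]
for rel in relations:
    print(ET.tostring(rel, encoding="unicode"))
```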

Variants

You can add variations of lemmas, including orthographic variations and transliterations, as shown below. You can have various classes of transliteration, and if they are automatically generated, you can give them a confidence score.

<LexicalEntry id="w613347">
  <Lemma writtenForm="动物沟通" partOfSpeech="n" script="Hans"/>
  <Form writtenForm="dòngwùgōutōng" script="Latn-pinyin">
    <Tag category="transliteration">pīnyīn</Tag>
    <Tag category="confidence">0.77</Tag>
  </Form>
  <Form writtenForm="dong4wu4gou1tong1" script="Latn-pinyin">
    <Tag category="transliteration">pin1yin1</Tag>
    <Tag category="confidence">0.77</Tag>
  </Form>
  <Form writtenForm="dongwugoutong" script="Latn-pinyin">
    <Tag category="transliteration">pinyin</Tag>
    <Tag category="confidence">0.77</Tag>
  </Form>
</LexicalEntry>
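
When transliterations are generated automatically, the Form elements can be produced by a small helper. The following Python sketch rebuilds the entry above with xml.etree.ElementTree. The function add_transliteration is hypothetical, and formatting the confidence to two decimal places is an assumption rather than a schema requirement.

```python
import xml.etree.ElementTree as ET


def add_transliteration(entry, written_form, script, scheme, confidence=None):
    """Append a <Form> with transliteration tags to a LexicalEntry.

    Hypothetical helper: `scheme` names the transliteration class and
    `confidence` is added only for automatically generated forms.
    """
    form = ET.SubElement(entry, "Form", {"writtenForm": written_form, "script": script})
    tag = ET.SubElement(form, "Tag", {"category": "transliteration"})
    tag.text = scheme
    if confidence is not None:
        conf = ET.SubElement(form, "Tag", {"category": "confidence"})
        # Assumption: two decimal places, matching the example above.
        conf.text = f"{confidence:.2f}"
    return form


entry = ET.Element("LexicalEntry", {"id": "w613347"})
ET.SubElement(
    entry, "Lemma", {"writtenForm": "动物沟通", "partOfSpeech": "n", "script": "Hans"}
)

# The three transliteration classes from the example above.
variants = [
    ("dòngwùgōutōng", "pīnyīn"),
    ("dong4wu4gou1tong1", "pin1yin1"),
    ("dongwugoutong", "pinyin"),
]
for written_form, scheme in variants:
    add_transliteration(entry, written_form, "Latn-pinyin", scheme, confidence=0.77)

print(ET.tostring(entry, encoding="unicode"))
```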

Synset Identifiers and adding Synsets to CILI

Tools for constructing GWA-LMF



References

The basic structure of the OMW and CILI is described in the following papers (this web page is more up to date):

Piek Vossen, Francis Bond and John P. McCrae (2016)
Toward a truly multilingual Global Wordnet Grid. In Eighth meeting of the Global WordNet Conference (GWC 2016), Bucharest
Piek Vossen, Francis Bond, John P. McCrae and Christiane Fellbaum (2016)
CILI: the Collaborative Interlingual Index. In Eighth meeting of the Global WordNet Conference (GWC 2016), Bucharest