...

  1. to filter out proper names:
    1. ignore all words that start with an UPPERCASE letter
  2. to filter out abbreviations:
    1. ignore all words that are fully in UPPERCASE
    2. ignore all words that are shorter than 3 chars
    3. ignore all words that start with a digit
    4. ignore all qnames, e.g., rdfs:subClassOf
  3. to filter out all IDMP-specific words - see the list below
  4. all remaining words are checked against the list from https://pypi.org/project/pyspellchecker/
    1. the list contains mostly nouns in the singular form, so for plural forms the last letter is simply stripped before the lookup.
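
The filtering steps above can be sketched roughly as follows. This is only an illustration, not the actual checker: `IDMP_WORDS` and `DICTIONARY` are small hypothetical stand-ins for the IDMP-specific word list and the pyspellchecker word list, the qname pattern is an assumed approximation, and the function names are invented for this example.

```python
import re

# Hypothetical stand-ins: the real IDMP word list and the pyspellchecker
# dictionary are much larger and are not reproduced here.
IDMP_WORDS = {"idmp", "pharmacovigilance"}
DICTIONARY = {"substance", "product", "contains"}

# Assumed approximation of a qname such as rdfs:subClassOf (prefix:localName).
QNAME = re.compile(r"^[A-Za-z_][\w.-]*:[\w.-]+$")


def is_ignored(word, idmp_words=IDMP_WORDS):
    """Return True if the word is skipped by filter steps 1 to 3."""
    if word[:1].isupper():            # 1. proper names
        return True
    if word.isupper():                # 2.1 fully UPPERCASE abbreviations
        return True
    if len(word) < 3:                 # 2.2 shorter than 3 chars
        return True
    if word[:1].isdigit():            # 2.3 starts with a digit
        return True
    if QNAME.match(word):             # 2.4 qnames
        return True
    if word.lower() in idmp_words:    # 3. IDMP-specific words
        return True
    return False


def is_known(word, dictionary=DICTIONARY):
    """Step 4: look up the word, then retry with the last letter stripped."""
    w = word.lower()
    return w in dictionary or w[:-1] in dictionary


def misspelled(words):
    """Return the words that pass all filters but are not in the dictionary."""
    return [w for w in words if not is_ignored(w) and not is_known(w)]


sample = ["Aspirin", "IDMP", "of", "3rd", "rdfs:subClassOf",
          "idmp", "substance", "products", "substnce"]
print(misspelled(sample))
```

Stripping the last letter is a deliberately crude plural rule: it accepts "products" via "product", but it would not recognise plurals such as "batches" or "categories", and it can also accept a misspelling that happens to be one letter longer than a dictionary word.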

...