...
- to filter out proper names:
- ignore all words that start with an UPPERCASE letter
- to filter out abbreviations:
- ignore all words that are fully in UPPERCASE
- ignore all words that are shorter than 3 chars
- ignore all words that start with a digit
- ignore all qnames, e.g., rdfs:subClassOf
- to filter out all IDMP-specific words - see the list below
- all remaining words are checked against the list from https://pypi.org/project/pyspellchecker/
- the list contains mostly nouns in the singular form; to handle plural forms, the last letter of the word is simply removed before the lookup.
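
The rules above can be sketched in Python roughly as follows. This is only an illustration, not the actual implementation: the names `should_skip`, `is_known`, `misspelled`, `IDMP_WORDS` and the `QNAME` pattern are assumptions, words are assumed to be whitespace-separated tokens, and the dictionary is passed in as a plain set standing in for the pyspellchecker word list.

```python
import re

# Hypothetical placeholder: the real IDMP-specific word list is given elsewhere.
IDMP_WORDS = frozenset()

# Assumed shape of a qname: prefix, colon, local name (e.g. rdfs:subClassOf).
QNAME = re.compile(r"^[A-Za-z_][\w.-]*:[\w.-]+$")


def should_skip(word, domain_words=IDMP_WORDS):
    """Return True if the word is excluded from spell checking."""
    if len(word) < 3:                 # too short
        return True
    if word[0].isupper():             # proper names
        return True
    if word.isupper():                # abbreviations fully in uppercase
        return True
    if word[0].isdigit():             # starts with a digit
        return True
    if QNAME.match(word):             # qnames
        return True
    if word.lower() in domain_words:  # IDMP-specific vocabulary
        return True
    return False


def is_known(word, dictionary):
    """Look the word up as-is, then with its last letter removed (naive plural)."""
    w = word.lower()
    return w in dictionary or w[:-1] in dictionary


def misspelled(words, dictionary, domain_words=IDMP_WORDS):
    """Return the words that pass all filters and are not in the dictionary."""
    return [
        w for w in words
        if not should_skip(w, domain_words) and not is_known(w, dictionary)
    ]
```

Removing the last letter unconditionally follows the description literally; it also accepts some non-plural typos (any known word plus one extra trailing letter), which may or may not be acceptable. With pyspellchecker installed, the `dictionary` argument could be built from its word list, for instance from the set returned by `SpellChecker().known(...)`; that wiring is not shown here.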
...