...
To find possibly misspelled words we use the following algorithm (it is implemented in https://github.com/edmcouncil/tools/blob/develop/spellcheck/onto_spellchecker.py):
- to filter out proper names:
  - ignore all words that start with an uppercase letter
- to filter out abbreviations:
  - ignore all words that are fully in uppercase
  - ignore all words that are shorter than 2 characters
  - ignore all words that start with a digit
- to filter out all IDMP-specific words - see the list below
- all remaining words are checked against the list from https://pypi.org/project/pyspellchecker/
  - the list contains mostly nouns in the singular form, so a plural form is handled by simply removing its last letter.
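
The steps above can be sketched roughly as follows. This is only an illustration, not the code from `onto_spellchecker.py`: the function names `is_candidate` and `find_misspelled` are hypothetical, and a small hand-written set stands in for the pyspellchecker word list and for the IDMP-specific word list so that the example runs without extra dependencies.

```python
def is_candidate(word, domain_words):
    """Return True if the word should be spell-checked at all."""
    if len(word) < 2:                  # too short to check
        return False
    if word[0].isupper():              # likely a proper name
        return False
    if word.isupper():                 # likely an abbreviation (kept to mirror the list above)
        return False
    if word[0].isdigit():              # starts with a digit
        return False
    if word.lower() in domain_words:   # domain-specific (e.g. IDMP) vocabulary
        return False
    return True


def find_misspelled(words, dictionary, domain_words=frozenset()):
    """Return the words that pass the filters but are not in the dictionary."""
    misspelled = []
    for word in words:
        if not is_candidate(word, domain_words):
            continue
        lowered = word.lower()
        if lowered in dictionary:
            continue
        # Naive plural handling: drop the last letter and look again.
        if lowered[:-1] in dictionary:
            continue
        misspelled.append(word)
    return misspelled


# Assumed stand-ins for the real word lists.
dictionary = {"substance", "dose", "form"}
domain_words = {"idmp"}
words = ["Substance", "EMA", "a", "2nd", "idmp", "doses", "forms", "substanse"]

print(find_misspelled(words, dictionary, domain_words))  # → ['substanse']
```

Dropping the last letter is a deliberately simple heuristic: it accepts regular plurals such as "doses", but it does not reduce plurals like "categories" to their singular form, so such words may still be reported.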
...