As you can see from this Ngram, the total share of words in the indexed English corpus tagged as nouns, verbs, adjectives, adverbs, determiners, pronouns, adpositions, numerals, conjunctions, or particles was around 83%.
This could be an infrastructure or programming issue, but assuming it’s not, what possible explanation is there for this number not being 100%? Things like interjections are left out, but are 17% of the words in the entirety of Google’s indexed literature really interjections, or is there a better explanation?
Algorithmically speaking, identifying a word or phrase as a particular part of speech is not possible in all contexts. A word can function as one part of speech in one context and as another elsewhere, even without any morphological change.
To make matters more difficult, fiction thrives on using words creatively, leaving the reader to decide how a word is to be interpreted, or in how many different ways at once.
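The context problem can be sketched with a toy tagger (the rules and example sentences here are hypothetical illustrations, not Google's actual pipeline): the word "record" can be a noun ("the record") or a verb ("to record") with no change in form, and in some contexts no simple rule fires at all, leaving the word unaccounted for.

```python
# Toy context-based tagger for the ambiguous word "record".
# Hypothetical rules for illustration only; a real tagger uses
# statistical models but faces the same ambiguity.

def tag_record(prev_word):
    """Guess the part of speech of 'record' from the word before it."""
    if prev_word in ("the", "a", "this"):
        return "NOUN"          # determiner before it: "broke the record"
    if prev_word in ("to", "will", "can"):
        return "VERB"          # infinitive marker or modal: "wants to record"
    return "UNDETERMINED"      # context gives no reliable signal

contexts = [("the", "record"), ("to", "record"), ("we", "record")]
for prev, word in contexts:
    print(prev, word, "->", tag_record(prev))
```

A tagger that refuses to guess in the third case would count that token in neither the noun nor the verb bucket, which is exactly the kind of residue that could keep the category totals below 100%.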
The statistics probably reflect that up to about 17 per cent of the words could not be categorically assigned to one part of speech or another, not that they belong to no known part of speech.
Between circa 1523 and 1650 AD, the tagged share climbs steeply from 14.2% to 83.1%, where it settles and remains to date (2007: 83.9%), probably because the English language, especially in fiction writing, was not “normalized” until very recently.
Hm… as you can see, this is a mere hypothesis, and I could well be wrong.