I am a data scientist with a question about collocations, based on a book I am reading: “Feature Engineering for Machine Learning: Principles and Techniques for Data Scientists” by Alice Zheng and Amanda Casari.
In chapter 3, regarding working with text data and natural language processing, the authors state:
“Collocations are more meaningful than the sum of their parts. For instance, “strong tea” has a different meaning beyond “great physical strength” and “tea”; therefore, it is considered a collocation. The phrase “cute puppy,” on the other hand, means exactly the sum of its parts: “cute” and “puppy.” Thus, it is not considered a collocation.”
I’m struggling to understand the difference between the two examples they’ve given. Why exactly is “cute puppy” not a collocation whereas “strong tea” is? I’ve asked a friend who is an English language teacher, and he seems to think both are collocations. Are the authors incorrect in this instance?
The book can be found here (links to relevant section).
Things either collocate or they don’t, but collocation is not “scientific” at all. It is a matter of semantics and discourse (the probable arrangement or presence of terms in a particular field or context). It is the expectation of finding one word with another in a particular context.
So (to give a very easy example) in a mathematical treatise, “shit” would not collocate with the discourse or the context of formal mathematics. However, change that context to mathematicians in a chat room, and one might say to another: “That’s just a shit argument, mate.”
The example above is collocation at the level of discourse. But anyone analyzing a text or speech has to define their context: at the lexical level, at the sentence level, in speech, in writing, in a particular field, etc. It’s all up to the person analyzing some linguistic form or phenomenon. The definition(s), however, should cohere with a basic understanding of what a “word” (as found in a dictionary) actually is. Please read on…
Collocate just means “to be located in the same place as” or “to share a location”.
That said, the book you are reading seems to have a somewhat different take on the matter and is describing what I would call a “set phrase” or a “cliché”.
And the example of “strong tea” is not the best, because “great physical strength” is not the meaning of “strong” in “strong tea”. The pertinent meaning here is: extreme, intense, rich in some active agent (Merriam Webster). This is borne out by considering “weak tea”, its “natural opposite” or adjectival antonym, if you will. Those two authors are not linguists per se. There is no “rule” that only first meanings of words can be used to compare “side-by-side collocations” of meanings of words in cliché phrases such as “strong tea” and “cute puppy”.
A collocation, linguistically speaking, is most definitely not limited to an adjective + a noun as in “cute puppy” or “strong tea”. One has to define one’s playground. Those authors have limited their definition to a very circumscribed usage of the term collocation: an adjective modifying (or describing) a noun, or one “set” next to one. They also seem to have based their analysis on “first meanings”. That’s the stumbling block for me here.
They are using “collocate” in its “side by side” sense: “to set or arrange in a place or position, especially: to set side by side” (Merriam Webster). The broader meaning in that same dictionary is the intransitive verb: “to occur in conjunction with something”. Ergo, not necessarily side by side. But in any case, I believe they have made a semantic error by assuming that only first meanings of adjectives count. They are actually comparing non-comparable items.
I think one might say that, in general English usage, “cute puppy” and “strong tea” are indeed both collocations (even if you aren’t Australian [joke]). One would very much expect to find those adjectives side by side with those nouns. What would not collocate in that sense, however, might be “strong puppy” and “cute tea”.
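Since the question comes from a data-science angle: that “expectation” of finding one word next to another can be given a rough number. One common choice in NLP is pointwise mutual information (PMI), which compares how often a pair actually occurs side by side with how often it would occur if the two words were independent. The sketch below is only an illustration under stated assumptions: the five-sentence corpus is invented, `pmi_scores` is a hypothetical helper name, and PMI is just one of several possible measures, not necessarily the one the book uses.

```python
import math
from collections import Counter

# Toy corpus, invented purely for illustration (not taken from the book).
corpus = [
    "she drank strong tea with a cute puppy on her lap",
    "strong tea and weak tea were both served",
    "the cute puppy ignored the strong tea",
    "a strong dog barked at the cute puppy",
    "he likes weak tea but she likes strong coffee",
]


def pmi_scores(sentences):
    """Return {(w1, w2): PMI} for every adjacent word pair seen in the sentences."""
    unigrams = Counter()
    bigrams = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        unigrams.update(tokens)
        # Adjacent pairs only, i.e. the "side by side" sense of collocation.
        bigrams.update(zip(tokens, tokens[1:]))

    n_unigrams = sum(unigrams.values())
    n_bigrams = sum(bigrams.values())

    scores = {}
    for (w1, w2), count in bigrams.items():
        p_pair = count / n_bigrams
        p_w1 = unigrams[w1] / n_unigrams
        p_w2 = unigrams[w2] / n_unigrams
        # PMI > 0: the pair occurs together more often than chance would predict.
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return scores


scores = pmi_scores(corpus)
for pair in [("strong", "tea"), ("cute", "puppy"), ("strong", "puppy"), ("cute", "tea")]:
    if pair in scores:
        print(pair, round(scores[pair], 2))
    else:
        print(pair, "never observed side by side")
```

On this made-up corpus both “strong tea” and “cute puppy” come out with positive PMI, while “strong puppy” and “cute tea” never occur at all. Note that a frequency measure like this captures only how expected a pairing is; it says nothing about whether the phrase means more than the sum of its parts, which is the distinction the authors are drawing.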
[Please note: my bias here, if I have one, is that I am a translator and deal with this collocation issue in my daily grind all the time.]