My native language is Danish, with its gigantic number of vowel sounds, and this undoubtedly affects how I hear English vowels. However, one phenomenon in English has bothered me for many years, especially in Received Pronunciation, and I have so far been unable to locate any descriptions of it.

No matter how many times I listen to the English /uː/ sound, as in “do” /duː/, what I actually hear on the phonetic level is not a pure monophthong [duː], but something like [dyuː] or even [dyu̯ː].
By this [y], I really do mean the vowel that IPA writes as [y], that is, the sound of French “u” and German “ü”. So what I hear (and, I admit, also say) is a diphthong, starting with a short [y] and ending in a semivowel version of [u]. If I say simply [duː], it sounds completely different to my ears from how most native (RP) speakers say it, and more like something you might hear from someone with a strong Italian accent.

Is it just me hearing things, or is this an actual phenomenon?

EDIT: This question is similar to Pronunciation of ‘few’ as [ˈfjyu̯], but not identical. Importantly, the answers to that question do not directly address the phenomenon I am asking about, whereas the answers to the present question do.


You aren’t just hearing things. For many English speakers, the phoneme /uː/ is realized as a fairly front vowel in most contexts. Since the four English “tense” vowels (the vowels in FLEECE, GOOSE, FACE and GOAT) also tend to be realized with a slight high offglide at the end, this could reasonably be transcribed as [yu̯]. The frontness of a vowel can be measured acoustically by the value of its second formant (F2): higher F2 values are associated with fronter vowels.

The Atlas of North American English gives the following maps showing the areas of the United States where a front or central realization of /u/ is common. Apparently, a front realization is particularly likely after a coronal consonant (this definitely includes /n, t, d, tʃ, dʒ, s, z, ʃ, ʒ/; I’m not sure whether it includes /l/). The captions use “Tuw” for a coronal consonant followed by /u/ and “Kuw” for a non-coronal consonant followed by /u/.

To summarize, /u/ has a somewhat fronted or centralized realization (mean F2 greater than 1200 Hz) for most North American speakers, but there is a band stretching across the northern United States where backer realizations are still common. A fairly front realization (mean F2 greater than 1550 Hz) after coronal consonants is even more widespread; backer realizations in that context occur mostly in certain spots in Minnesota/Wisconsin, New England and New Jersey.
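To make the two cutoffs concrete, here is a minimal Python sketch that checks a speaker’s mean F2 for /u/ against them. It is only an illustration under simplifying assumptions: the function name `classify_goose` and the token values are hypothetical, and the Atlas maps come from its own measurement and mapping procedure, not from a one-line threshold test like this.

```python
from statistics import mean

# Cutoffs (Hz) as summarized above from the Atlas of North American English.
# Assumption: one fixed cutoff per phonetic context, used purely for illustration.
KUW_FRONTED_HZ = 1200  # after a non-coronal: "fronted or centralized" above this
TUW_FRONT_HZ = 1550    # after a coronal: "fairly front" above this


def classify_goose(f2_tokens_hz, after_coronal):
    """Hypothetical helper: label a speaker's /u/ tokens by their mean F2.

    Returns (mean F2 in Hz, "fronted" or "not fronted").
    """
    if not f2_tokens_hz:
        raise ValueError("need at least one F2 measurement")
    mean_f2 = mean(f2_tokens_hz)
    # Pick the cutoff that applies to the preceding consonant's place.
    threshold = TUW_FRONT_HZ if after_coronal else KUW_FRONTED_HZ
    label = "fronted" if mean_f2 > threshold else "not fronted"
    return mean_f2, label
```

For example, `classify_goose([1700, 1850, 1900], after_coronal=True)` gives a mean of about 1817 Hz labelled "fronted", while `classify_goose([850, 900], after_coronal=False)` gives 875 Hz labelled "not fronted".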

[Map: /u/ after coronal consonants (“Tuw”)]

[Map: /u/ after non-coronal consonants (“Kuw”)]

The linguist Geoff Lindsey has written blog posts mentioning central or front realizations of /uː/ in “Standard Southern British”, and the tendency to use a backer realization before “dark l”.

I (an American English speaker) recently measured my vowels in Praat and found that I pronounce “mood” with F1 around 300–350 Hz and F2 around 1800 Hz, “pool” with F1 around 300–350 Hz and F2 around 800–900 Hz, and “heed” with F1 around 250–350 Hz and F2 around 2200 Hz.
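One quick way to see what these numbers imply is to compare distances in the F1–F2 plane. The sketch below takes the midpoint of each range I reported (an assumption, since each range covers several tokens) and uses plain Euclidean distance in Hz. That is a simplification: phoneticians often convert formants to a perceptual scale such as Bark before comparing them.

```python
import math

# (F1, F2) in Hz, using the midpoint of each range reported above.
# Assumption: a single midpoint stands in for all tokens of each word.
VOWELS = {
    "mood": ((300 + 350) / 2, 1800),
    "pool": ((300 + 350) / 2, (800 + 900) / 2),
    "heed": ((250 + 350) / 2, 2200),
}


def formant_distance(a, b):
    """Euclidean distance between two words' vowels in the F1-F2 plane (Hz).

    Assumption: raw Hz, with F1 and F2 weighted equally.
    """
    (f1a, f2a), (f1b, f2b) = VOWELS[a], VOWELS[b]
    return math.hypot(f1a - f1b, f2a - f2b)


if __name__ == "__main__":
    for other in ("pool", "heed"):
        print(f"mood vs {other}: {formant_distance('mood', other):.0f} Hz")
```

Running it prints `mood vs pool: 950 Hz` and `mood vs heed: 401 Hz`; on this crude measure, my “mood” vowel sits much closer to the vowel of “heed” than to the vowel of “pool”.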

Wikipedia suggests that front-of-center realizations of /u/ (except before /l/) may be particularly common in “California English”. It cites a page, “Northern California Vowels”, on Penny Eckert’s website, which says:

Below is a vowel plot showing the shifting of /uw/ (new, food). This vowel is represented as black circles with arrows. When /uw/ is followed by /l/ as in school, it does not shift, but remains where we expect it to be. This plot shows that other occurrences of /uw/, however, overlap with the vowel in mister (empty circles) and approach the vowel in me (empty circles with arrows).

[Scatter plot of F1 vs. F2 for vowels in four words: “new” has F2 between 2000 and 2600 Hz, “school” between 1200 and 1400 Hz, “mister” between 2000 and 2400 Hz, and “me” between 2620 and 3200 Hz.]

