March 7, 2009
Years ago I worked on speech software and learned that speech recognition is hard because all the energy is in the vowels but all the intelligence is in the consonants. That’s why whispering works. Extracting voice out of noise is difficult because the sounds that matter, the consonants, sound a lot like the noise. You get all the vowels, but they don’t help much.
Our son Milo is starting to write, and I’m learning that lesson again. When I ask him to spell a word, such as “hat”, he’ll just come up with “H” and “T”, as if he’s spelling the whispered version of it. He’s focusing on the parts that matter.
Making this harder is that, in English, every vowel can sound like just about any other vowel. For example, the vowel before the R sounds the same in each of these words: terse, first, worse, and purse. For A and O you’ve got ball and ostrich as well as pal and vowels. When Milo chooses the wrong vowel (which is about half the time), the vowel he guesses does make that sound in another word.
It won’t be long before we switch to memorization (sometimes called “sight words”). Logic can’t take you far with English.