That's like saying that because "O" and "0" have the same shape, we can't solve OCR without general AI.
A modern statistical speech recognition system has no trouble determining that "they're unhappy, ness" is a dramatically less likely word sequence than "their unhappiness".
Edit: I read your example backwards. Still, a statistical system can easily incorporate contextual words without actually understanding what they mean. Names from the speaker's contacts, in particular, are widely used in ASR systems for this reason.
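To make that concrete, here is a rough sketch of the idea, not how any particular ASR product works. It assumes a toy bigram language model with add-k smoothing trained on a made-up five-sentence corpus, plus a fixed score bonus for words found in a context list such as contact names. The corpus, the function names (train_bigrams, log_prob, best_hypothesis), and the bonus value are all hypothetical choices for illustration.

```python
import math
from collections import Counter

# Toy training text (an assumption for illustration; real systems use far more data).
CORPUS = [
    "their unhappiness was obvious",
    "their unhappiness grew",
    "they're unhappy about it",
    "they're unhappy with the result",
    "the result was obvious",
]

def train_bigrams(sentences):
    """Count bigram contexts and bigrams, with sentence boundary markers."""
    contexts, bigrams = Counter(), Counter()
    vocab = {"<s>", "</s>"}
    for s in sentences:
        tokens = ["<s>"] + s.split() + ["</s>"]
        vocab.update(tokens)
        contexts.update(tokens[:-1])             # every token that starts a bigram
        bigrams.update(zip(tokens, tokens[1:]))
    return contexts, bigrams, vocab

def log_prob(sentence, contexts, bigrams, vocab, k=0.1):
    """Add-k smoothed bigram log-probability of a word sequence."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    v = len(vocab) + 1                           # one extra slot for unseen words
    return sum(
        math.log((bigrams[(prev, cur)] + k) / (contexts[prev] + k * v))
        for prev, cur in zip(tokens, tokens[1:])
    )

def best_hypothesis(hypotheses, model, context_words=(), bonus=2.0):
    """Pick the highest-scoring hypothesis; context words get a flat log-score bonus."""
    def score(h):
        hits = sum(1 for w in h.split() if w in context_words)
        return log_prob(h, *model) + bonus * hits
    return max(hypotheses, key=score)

model = train_bigrams(CORPUS)
print(best_hypothesis(["they're unhappy ness", "their unhappiness"], model))
print(best_hypothesis(["call john", "call jon"], model, context_words={"jon"}))
```

Nothing here knows what "unhappiness" or "jon" means. The first choice falls out of word-pair counts, and the second only changes because "jon" appears in the supplied context list.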
That's because it doesn't care; it just goes for the most statistically probable phrasing in a general conversation, not the one you're actually having.
As for the OCR problem, try writing one for Chinese calligraphy and get back to me on whether context is important.