Ah, the Sennheiser VSM201. Just a $30K vocoder. It seems to have cost $25K when it was released in 1977, and it didn't even sell 50 units, so it's quite rare.
I guess you can get similar results with cheaper hardware, but if you have money and you have it around... ¯\_(ツ)_/¯
I didn't know the device. I also didn't know that Kai Krause, who later got famous through Kai's Power Tools, was an electronic music expert who sort of did sales for Sennheiser in 1977, according to this page (https://de.wikipedia.org/wiki/Sennheiser_VSM_201 - only on the German WP, it seems). He also wrote the manual for it.
Judging from the audio comparison in the first YouTube video, I was surprised by how much better the Sennheiser sounds than the others. I expected minor variations in the harmonics, but the differences between the models are quite significant.
The other vocoder that sounds almost as good is quite new, and it seems to still be a prototype with a "contact us" price.
Yep! The Sennheiser and Ultimate VoIS are in their own league. There are some other rare high-end analogue vocoders that I would have loved to include in the comparison, but I don’t know anyone who owns them. The EMS vocoders are supposed to be amazing, too.
I can’t speak on Dromedary Modular’s behalf and I think rising parts costs have been an issue, but buying an Ultimate VoIS should be a fair bit cheaper than the Moog vocoder.
Never had the pleasure of a Sennheiser, but when working in radio I got my hands on a lot of rack vocoders for doing branding, stings and idents. Funny how the number 9000 comes up a lot, like Roland VP9000 and Eventide H9000. '80s and '90s vintage ones like Korg VC-10 or Elektronik EM-26 had unique sounds, but tbh the modern digital recreations are amazing models. There's not a world of difference between vocoding, autotune, shifting, harmonising etc. once you realise how all the fx are now based on FFT, convolution etc. - just different variations on processing and control graphs - and so it's fun to create your own vocal effects in things like Max/MSP/PureData. Technically there's a distinction between "effects" and "processing" in terms of how much of the direct (parallel) signal is put through. Cher's "Believe" is a yardstick for "effect", whereas a lot of what I hear with Daft Punk (and Air, Kraftwerk) is processed so heavily as to disguise the original voice entirely - just letting a bit of top/sibilant through to define the stops and fricatives.
Analog vocoders are only nominally like analog FFTs.
The shape of the filters, the smoothing between the filters and the synthesis section, and (on some models) the patchability all create a very different result.
The reason the best analog vocoders are so expensive is that the filter for each band is much more complex than a plain old bandpass filter, with a much higher component count. Typically there's a flatter passband and a steeper slope than you'd expect.
You can do digital convolution with thousands of bins and it sounds nothing like analog vocoding. It's much cleaner, doesn't have those lovely harmonically spaced filter resonances, and creates sounds that can feel more acoustic than electronic.
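For anyone who wants to see the basic filter-bank structure being discussed, here is a minimal sketch of a channel vocoder in plain Python. It is only an illustration under several assumptions: each band is a single second-order band-pass (the textbook RBJ "cookbook" biquad), which is exactly the "plain old bandpass" that the high-end analog units go well beyond, and the band count, frequency range, Q and envelope cutoff are made-up defaults, not values from any real hardware. The function names are hypothetical.

```python
import math

def bandpass_coeffs(center_hz, q, sample_rate):
    """RBJ cookbook band-pass biquad with 0 dB peak gain, normalised by a0."""
    w0 = 2.0 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)
    a = (-2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)
    return b, a

def biquad(signal, b, a):
    """Run a direct form I biquad over a list of samples."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in signal:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out

def envelope(signal, sample_rate, cutoff_hz=30.0):
    """Envelope follower: full-wave rectify, then one-pole low-pass smoothing."""
    coeff = math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    level = 0.0
    out = []
    for x in signal:
        level = coeff * level + (1.0 - coeff) * abs(x)
        out.append(level)
    return out

def channel_vocoder(modulator, carrier, sample_rate,
                    bands=8, low_hz=200.0, high_hz=4000.0, q=6.0):
    """Impose the per-band envelope of `modulator` (voice) on `carrier` (synth).

    Assumes bands >= 2 and high_hz below sample_rate / 2.
    """
    n = min(len(modulator), len(carrier))
    out = [0.0] * n
    for i in range(bands):
        # Logarithmically spaced centre frequencies (an assumed layout).
        center = low_hz * (high_hz / low_hz) ** (i / (bands - 1))
        b, a = bandpass_coeffs(center, q, sample_rate)
        env = envelope(biquad(modulator[:n], b, a), sample_rate)
        car = biquad(carrier[:n], b, a)
        for j in range(n):
            out[j] += env[j] * car[j]
    return out
```

Feeding it a voice recording as `modulator` and a sawtooth or chord as `carrier` gives a recognisable but fairly generic vocoder sound. Swapping the single biquad for higher-order filters with flatter passbands and steeper slopes is where the component count (or CPU cost) climbs.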
Did you listen to the example audio in the video? Soft synths and digital emulation can be absolutely amazing these days, but the VSM201 and Ultimate VoIS are in their own league. It’d be pretty easy to pick them out from a blind test with other vocoders.
Oh, it also might be of interest that the IVL algorithm isn’t FFT-based. I think their harmonizers sound better than the rest, so maybe FFT isn’t the best way to go.
Yes, exactly. I was really excited when I found out that you do not need an FFT to do speech processing.
If you look at the code of (phone/voice) codecs such as GSM/Speex/Opus, you can see that you can estimate the spectral envelope (or the configuration of a physical tube model for the vocal tract) in the time domain with linear prediction coefficients (LPC).
And it is simple, e.g. the often used Levinson-Durbin algorithm is just 22 lines of C code.
It is an interesting exercise to build your own vocoder from scratch that fits in a single screen page.
Many of the code snippets I have seen (which likely have already processed your voice) are just translations of the Fortran code of the book "Linear Prediction of Speech" by Markel and Gray (1976).
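To give an idea of how small it is, here is a minimal sketch of the Levinson-Durbin recursion in plain Python. It is not taken from any of the codecs or from Markel and Gray; the function names and the sign convention (prediction error e[n] = x[n] + a[1]*x[n-1] + ... + a[p]*x[n-p]) are my own assumptions for illustration.

```python
def autocorrelation(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of a frame of samples."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the LPC normal equations from autocorrelation values r.

    Returns (a, ks, err):
      a   - prediction-error filter coefficients, a[0] == 1.0
      ks  - reflection coefficients (one per completed stage)
      err - remaining prediction error energy
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    ks = []
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # silent or perfectly predictable frame: stop early
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        ks.append(k)
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, ks, err

# Autocorrelation of an ideal first-order process with pole at 0.5:
a, ks, err = levinson_durbin([1.0, 0.5, 0.25], 2)
# a is approximately [1.0, -0.5, 0.0] and err is 0.75
```

In a vocoder, you would window a short frame of the voice, call `autocorrelation` on it, and use the resulting `a` as an all-pole filter 1/A(z) applied to the carrier. The reflection coefficients in `ks` are what a lattice filter would use directly.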
Ah yes, ladder or lattice filters. If you don't mind old-fashioned mailing lists, there are still a few of us hanging around in MUSIC-DSP@LISTS.COLUMBIA.EDU where code gets shared.
They used the VSM-201 for "Random Access Memories", their last studio album. By that time they didn't need such help anymore: they famously rented a huge amount of equipment and large studio floors (e.g. they recorded simultaneously with microphones from different decades because Thomas heard a difference and wanted this reflected on the record).
Above all, the biggest help from his father was probably to insist that they keep the ownership of their music when signing with any label, regardless of any money, because as a producer he knew that this is how artists get screwed by record labels.
Yeah, owning their masters was a very smart move. I was surprised they rented the Sennheiser instead of buying one. Having said that, they don’t come up for sale very often.