
> If it doesn’t, the buffer runs dry, and the user hears a nasty glitch or crackle: that’s the hard transition from audio, to silence.

It would be awesome if we could prevent this crackle somehow on a lower level of abstraction. What I mean by that is that if the buffer runs dry, the hardware (or the OS/audio driver) could do some prediction in order to bridge any gaps in the audio more nicely.



It's certainly true that different hardware handles buffer underruns differently. My high-quality RME interface emits fairly muted pops when the buffer runs dry, whereas I've had cheaper units make absolutely horrendous noises as the speakers flail around trying to cope with the signal discontinuities. The RME is interpolating the signal towards a zero crossing.


Would it be even better to smoothly join with the zero level instead of coming to a sudden halt?


That's what I mean by interpolating towards a zero crossing.


Sorry, my wording was unclear. I mean asymptotically approaching zero, as opposed to, say, going to zero in a straight line and then switching to a constant zero once you get there (thereby creating a discontinuous derivative).
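To make the two shapes concrete, here is a minimal Python sketch of both tails, starting from the last sample that was played before the underrun. The function names and the `time_constant_samples` parameter are made up for illustration; nothing here is claimed to be what the RME (or any other interface) actually does.

```python
import math


def linear_tail(last_sample, n):
    """Straight line from last_sample down to exactly 0.0 on the n-th sample.

    The slope changes abruptly at both ends of the ramp, which is the
    discontinuous derivative mentioned above.
    """
    return [last_sample * (1.0 - (i + 1) / n) for i in range(n)]


def decay_tail(last_sample, n, time_constant_samples=64.0):
    """Exponential decay from last_sample toward zero.

    The value approaches zero asymptotically and never quite reaches it,
    so a real implementation would still have to switch to true silence
    once the value is small enough to be inaudible.
    """
    return [
        last_sample * math.exp(-(i + 1) / time_constant_samples)
        for i in range(n)
    ]
```

Both tails still start with a kink at the point where the real signal stops, so neither removes the artefact entirely; they only change how sharp it is.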


Indeed, I imagine that is what is going on. The straight line case is what the loudspeaker driver will do in reality if you send it a discontinuity, and that's what produces the pop.

Smooth interpolation will avoid a really nasty pop, but in the real world, musical waveforms are highly complex, so any interpolation algorithm, however smooth, will produce some kind of artefact if you chop the wave in the middle of a cycle and smooth it to zero.

This can be observed when setting loop points in a sampler - you are usually provided with tools to help you match the loop points to the zero crossings. This is not enough however to remove all artefacts. Only some zero crossings will do: one has to match the higher-order cycles in the waveform as well. I don't really have the mathematical vocabulary to really describe what I mean here, but hopefully it's clear.

(BTW when I say driver in these posts I mean the magnet-and-cardboard-cone assembly in the speaker, not any kind of software.)


That's not how sound works. Think of a vibrating object. If it stops vibrating suddenly, slowly moving it to its center point isn't going to make the absence of vibration any less jarring.


Indeed. I think that's what I'm trying to describe in my sibling comment to yours.

However, what it will successfully avoid is the loudspeaker driver attempting to instantaneously snap from some nonzero x-position back to its origin, which is what causes the really nasty clicks.


Ultimately it's not solving anything, because even with some faking it's still not the audio the performer wants to hear. And, besides, gap-free audio software is a solved problem. The article explains how to do it.


A lot of audio hardware does this - it switches the audio off only when the signal is at a zero crossing. So it can be implemented by the OS by telling the codec that the zero value is an underrun rather than a desired signal.


I solved that problem years ago by keeping a canned piece of white noise in a const buffer that I would switch to if there was an underrun, with an immediate ramped gain reduction to zero. The results were quite good!
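A rough Python sketch of that idea, under assumptions the comment does not state: the noise buffer length, the starting gain, the ramp length, and the linear ramp shape are all guesses, and `underrun_fill` is a hypothetical name.

```python
import random


def make_noise(n, seed=0):
    """Precompute n samples of uniform white noise in [-1.0, 1.0]."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


# Canned noise, generated once and never modified afterwards.
NOISE = make_noise(4096)


def underrun_fill(n, start_gain=0.1, ramp_len=256):
    """Return n samples to play when the real buffer has run dry.

    The canned noise is scaled by a gain that ramps linearly from
    start_gain to zero over ramp_len samples; everything after the
    ramp is silence.
    """
    out = []
    for i in range(n):
        gain = start_gain * max(0.0, 1.0 - i / ramp_len)
        out.append(NOISE[i % len(NOISE)] * gain)
    return out
```

In an audio callback this would be called only when no real data is available, for example `block = underrun_fill(512)`.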


Interesting. So what would happen if a sine wave of, say, 1 kHz is suddenly shut off by a buffer underrun?


It would be cool if the hardware did some Fourier analysis to resample the buffers it gets (so it could make them run for longer if the buffers run dry), but it probably would cause some kind of latency issue. I reckon that just avoiding buffer underruns is less overhead.


I had to cut off an audio file abruptly once, and I found that I could smooth out the abruptness by quickly fading in a reverberated version of the audio, just as the original audio was about to end, and then letting it ring for a fraction of a second after the original had ended.

(I say "fading in", but it might have been that I had the reverb applied but dry, and transitioned to wet just before the signal ended.)


A colleague of mine actually patented this technique: http://www.google.com/patents/US8538038


:-/


Yep, this should not be a patent. It's a trivial solution to the problem that could be devised by any sound engineer!


If you have any evidence of prior work to that patent (I really hope you do), then you can topple that patent. If your code to do what you said isn't free software I'd recommend you release it as free software now.


I can't check right now, but it's possible that the patent predates my use of the technique. Even if not, I don't know how I could prove it.

By the way, it wasn't done in code; I did it manually in Ardour [0].

[0] <https://ardour.org/>


This seems like the most sensible thing to do, and was actually what I was getting at :)

What do you mean by latency issues? Why would there be any?


Quoting from the Waldorf Microwave XT synthesizer FAQ:

The brightness of the click depends on the speed of the level change. The faster the level changes, the brighter is the click. So, the level change speed can be compared with the cutoff of a lowpass filter. There is an easy formula for it:

Let's consider a level change from full to zero (or from zero to full) output from one sample to another on a machine that uses 44.1kHz sample rate. So, we first transfer the sample to milliseconds:

1 sample equals 1/44100 second, which is = 0.02267573696ms.

To calculate the cutoff frequency of the click, just use this formula:

Cutoff (Hz) = 1000 / Level Change Time (ms)

which in the example results in:

44100Hz = 1000 / 0.02267573696ms

Whoops? This is the sampling frequency and, err, very bright.

http://faq.waldorfian.info/faq-browse.php?product=xt#116
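The FAQ's rule of thumb is easy to play with in code. This small Python sketch just evaluates the quoted formula for different ramp lengths; the function names are mine, and the formula is the FAQ's approximation rather than an exact filter calculation.

```python
def samples_to_ms(n_samples, sample_rate=44100):
    """Duration of n_samples at the given sample rate, in milliseconds."""
    return 1000.0 * n_samples / sample_rate


def click_cutoff_hz(level_change_ms):
    """Cutoff (Hz) = 1000 / Level Change Time (ms), as quoted from the FAQ."""
    return 1000.0 / level_change_ms


# A level change over a single sample gives the FAQ's 44100 Hz result.
one_sample = click_cutoff_hz(samples_to_ms(1))

# Spreading the same change over 64 samples (about 1.45 ms) brings the
# estimate down to roughly 689 Hz, i.e. a much duller click.
sixty_four_samples = click_cutoff_hz(samples_to_ms(64))
```

By this estimate, every doubling of the ramp length halves the brightness of the click.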


Waldorf seems like a cool company. It makes me smile to see that they went into such detail in the manual. Plus the Microwave XT sounds amazing.


With zero crossing detection, it would need to know the buffer is running out before the next zero crossing occurs. So if the shutdown signal is given within the last millisecond before the zero crossing, it would ignore all data that comes after the zero, and output zero for the remaining partial wave. On starting up again, it would wait until the signal is at zero before beginning to output. At least that's how the common audio codecs I've seen that implement that feature work. The most common application is volume control, where you don't want a sudden change in the amplitude of the wave to result in a glitch, so you adjust amplitude at zero crossings.
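Here is a minimal Python sketch of the mute side of that behaviour, operating on a list of samples instead of a hardware stream. `first_zero_crossing` and `mute_after_request` are hypothetical names, and treating a sign change between adjacent samples as "the zero crossing" is my simplification, not a description of any particular codec.

```python
def first_zero_crossing(block, start):
    """Index of the first sample at or after start whose sign differs
    from the sample before it, or None if there is no such sample."""
    for i in range(max(start, 1), len(block)):
        if (block[i - 1] < 0.0) != (block[i] < 0.0):
            return i
    return None


def mute_after_request(block, request_index):
    """Keep playing until the next zero crossing after the mute request,
    then output zeros for the rest of the block.

    If no crossing is found the block is returned unchanged; real
    hardware would presumably need a timeout for that case.
    """
    idx = first_zero_crossing(block, request_index)
    if idx is None:
        return list(block)
    return list(block[:idx]) + [0.0] * (len(block) - idx)
```

For example, `mute_after_request([0.5, 0.9, 0.4, -0.3, -0.8, -0.2, 0.3], 1)` keeps the first three samples and zeroes everything from the sign change onwards. The resume side would be the mirror image: keep outputting zeros until the incoming signal crosses zero, then pass it through.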


Some network protocols for live audio performances do this: they fill the audio buffers with predicted blocks of audio to avoid an ugly-sounding gap.

See, for instance, this thesis: "Low-Latency Audio over IP on embedded IP systems" http://www.ti5.tu-harburg.de/staff/meier/master/meier_audio_...

Sect 4.1.2 Packet Loss Handling
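As a toy illustration of the general idea (and explicitly not the method from the thesis, which I am not reproducing here), the crudest possible "prediction" is to replay the previous block at reduced volume whenever a packet is missing. The names `conceal` and `play_stream`, and the attenuation factor, are assumptions for this Python sketch.

```python
def conceal(last_block, attenuation=0.5):
    """Stand-in for a lost block: the previous block, made quieter."""
    return [s * attenuation for s in last_block]


def play_stream(packets, block_len):
    """Flatten a list of received blocks into one sample stream.

    A lost packet is represented by None and is replaced by a concealed
    block. Consecutive losses fade out progressively, because each
    concealed block is derived from the previous output block.
    """
    out = []
    prev = [0.0] * block_len
    for packet in packets:
        block = conceal(prev) if packet is None else packet
        out.extend(block)
        prev = block
    return out
```

Repeating a block verbatim can still click at the block boundaries, which is why real concealment schemes do something smarter; this only shows where such a predictor would plug in.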



