Oh, I have done this many times. When I said I've been studying this clip, I really meant it, because I truly want to hear it. LambdaComplex also mentioned tapping your foot and counting, and that the count should suddenly change.
The result of my many experiments is that it is super hard to count 1 2 3 4 when they only clap twice per bar. Most of the time, within seconds, I'm 'rhythmically' counting claps instead. Be it 'one three one three' or whatever numbers I choose, the number only changes with the next clap, so that won't work: the audience doesn't change how they clap, and I'm only using two different numbers.
If I force myself hard to count to 4, basically saying another number in between claps, and it works for an extended period of time, I'm so focused on keeping my own momentum and keeping the 4-count going that I don't even hear the music, just the claps, while I try to remember to say the other numbers in between 1 and 3. That also absolutely fails, because I've already checked out of whatever I'm trying to hear.
I also did experiments with my friends, and some of them just hear when it happens without any counting or whatever. So I tried that as well: I would listen to this with my eyes closed and mark the point when I thought the clapping had changed. I think the results were statistically about as good as my internal estimate of how long 40 seconds is, until I learned to recognize the exact part of the music I have to hear before I say 'yeah, it's that'. But that has nothing to do with my understanding of it.
It might help to know that the "instrumental magic" is that he literally adds a beat. Each bar has 4 beats and he switches where they're clapping by giving one bar 5 beats. You kind of have to stop/restart counting to intellectualize it.
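If it helps to see the arithmetic, here is a rough sketch of the extra-beat idea in Python. It assumes the audience starts on beats 1 and 3 and keeps clapping every second beat no matter what; the function name `clap_positions` and the bar layout are made up for illustration, not taken from the clip.

```python
def clap_positions(bar_lengths, clap_every=2):
    """For each bar, list the 1-based beat numbers on which a clap lands.

    Assumption: the audience claps on a fixed grid (every `clap_every`
    beats, starting on the very first beat) and never adjusts.
    """
    result = []
    elapsed = 0  # total beats played before the current bar
    for length in bar_lengths:
        claps = [
            beat
            for beat in range(1, length + 1)
            if (elapsed + beat - 1) % clap_every == 0
        ]
        result.append(claps)
        elapsed += length
    return result


# Two ordinary bars, one bar stretched to 5 beats, then ordinary bars again.
bars = [4, 4, 5, 4, 4]
for length, claps in zip(bars, clap_positions(bars)):
    print(f"{length}-beat bar: claps on {claps}")
```

The audience's grid never moves; the single odd-length bar shifts where the bar lines fall underneath it, so the same claps land on 2 and 4 from then on.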
Of course, but he adds the beat so smoothly and unobtrusively that the ordinary listener hears no glitch in the matrix. That is where the magic comes in.
So it looks like a perceptual thing, the result of how your physical brain has been wired, where you cannot perceive the phenomenon directly despite having a solid analytic understanding of the matter.
This reminds me of Frank Jackson's famous Knowledge Argument for Qualia.