Perception of Tempo Research

The Debate Over Visual Versus Audible Stimuli to Achieve Synchronization During the Performance of Polytemporal Music

The debate over using visual or auditory stimuli to best convey tempo still rages. It is a scholarly debate, centered on the accuracy of response over a specified length of time, speed of cognition, response time from onset of first stimulus, and other measurements. Testing has been passed around the physiology and psychology departments of several institutions dating back to the 1900s. The most popular conclusion, which I find difficult to accept, favors an auditory stimulus, e.g. the use of a click track. Here's a summary of recent findings:

"Rhythmic synchronization (the ability to entrain (fall gradually into synchrony with a rhythm) one’s movements to a perceived periodic stimulus, such as a metronome) is a widespread human ability that has been studied for over a century in the cognitive sciences (Repp & Su, 2012). Across many studies, a basic finding which has been extensively replicated is that entrainment is more accurate to auditory than to visual rhythmic stimuli with identical timing characteristics (e.g., to a metronomic series of tones vs. flashes)."
..."Synchronization to Auditory and Visual Rhythms in Hearing and Deaf Individuals" by Iversen, Patel, Nicodemus, and Emmorey

My challenge to this particular study lies in the selection of the test subjects, i.e. the "hearing and deaf individuals." The study goes on to state that "Hearing participants were recruited from the San Diego area and tested at San Diego State University (SDSU) and were not selected on the basis of musical experience." Right away the test is flawed for our purposes. Our concern is the perception of tempo in trained musicians. My experience tells me that valuable information is transmitted in the gestures of a conductor (and a concertmaster), and this information can't possibly be conveyed with auditory stimuli. Just try to imagine the execution of a fermata, a ritardando, an accelerando, even the first page of Stravinsky's "The Soldier's Tale" without visual cues. Unless sub-beats suddenly appear in the click track, these kinds of nuances become difficult to convey with an auditory system.

Another study, which used only musicians and visual experts (video gamers and ball players) as test subjects, refined the conclusions of the first study even further:

"Synchronization of finger taps with periodically flashing visual stimuli is known to be much more variable than synchronization with an auditory metronome. When one of these rhythms is the synchronization target and the other serves as a distracter at various temporal offsets, strong auditory dominance is observed. However, it has recently been shown that visuomotor synchronization improves substantially with moving stimuli such as a continuously bouncing ball. The present study pitted a bouncing ball against an auditory metronome in a target–distracter synchronization paradigm, with the participants being auditory experts (musicians) and visual experts (video gamers and ball players). Synchronization was still less variable with auditory than with visual target stimuli in both groups. For musicians, auditory stimuli tended to be more distracting than visual stimuli, whereas the opposite was the case for the visual experts. Overall, there was no main effect of distracter modality. Thus, a distracting spatiotemporal visual rhythm can be as effective as a distracting auditory rhythm in its capacity to perturb synchronous movement, but its effectiveness also depends on modality-specific expertise."

My primary objection to using an audible click track is that it invokes a reactive process. The performer must first listen to several beats before subtle -- or abrupt -- changes in tempo can be comprehended and adjustments made. This is not the way musicians are trained. Idiosyncratic gestures from the conductor and concertmaster have been worked out over the centuries to relay fluctuations in tempo and precise synchronization. Unlike audible cues, visual cues incorporate that one important element: anticipation. The performer can follow the downstroke of the conductor's hand and anticipate where -- and hence when -- the precise occurrence of the "beat" (called the ictus) takes place. I have seen examples of visual cueing systems that do not include an ictus, and in my opinion, they are difficult to follow.

Musicians (with the possible exception of pianists) are trained to follow visual cues for abrupt and subtle shifts in tempo. And we've all seen the phenomenon where an inexperienced pianist has difficulty playing with a group. So how do I explain the astute capability of blind musicians like Nobuyuki Tsujii? I can't. First of all, I would point out that at Tsujii's level, he is performing with professional conductors and orchestras. The tempos and all variations have been worked out in advance with the conductor. In several interviews, Tsujii mentions that he listens to breathing cues. "While sensing the breath and the signals of the conductor, my own breath would gradually match his."

I would also like to point out that musicians are trained to "count to themselves." We listen to ourselves counting in our mind's ear, and we can hear those beats in our heads...just like we can hear musical notes and rhythms in our heads. What kind of stimulus is being measured when we think "one-e-and-a" to ourselves? We have practiced dividing a single interval of time into two equal parts, three equal parts, four equal parts, five equal parts, etc., and this internal subdivision of a tempo's beat allows us to accurately stay "in tempo." I'm not sure this mental phenomenon is taken into account by researchers. As best as I could determine, this "counting to oneself by internally subdividing a beat" was not a technique verified or considered in the researchers' observations. Are musicians processing an audible stimulus or a visual stimulus when they count to themselves?

I decided to investigate the weaknesses inherent in a visual cueing system and ended up developing a program that, unfortunately, omitted the ictus and confirmed what I already suspected: without an ictus, it doesn't quite work. You can see two examples of wasted time below.

For those who want to browse the actual research work cited above, I have included .docx copies for download here:

  1. Perceived Duration of Events Research
  2. Synchronization to auditory and visual rhythms
  3. BouncingBallMeetsMetronome

My First Attempt at a Visual Synchronizing System

At this point I'd like to introduce some of my own research, which involves the construction of a visual synchronizing system. One positive feature of this attempt is that the graphic program accepts MIDI input, which makes the rendering of each performer's part, with all of its fluctuations in tempo, much easier to produce. In other words, the graphic program readily adapts and adheres to fluctuating tempos because it follows MIDI, which automatically retains all of the tempo-related information.
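
As a rough illustration of why MIDI is convenient here: a Standard MIDI File stores tempo as microseconds per quarter note in its set-tempo events, so the clock time of every beat can be recovered from the tempo map. The sketch below is not the graphic program's actual code. The function names and the example tempo map are hypothetical, and it assumes tempo changes fall exactly on quarter-note beats.

```python
def bpm_from_midi_tempo(us_per_quarter):
    """Convert a MIDI set-tempo value (microseconds per quarter note) to BPM."""
    return 60_000_000 / us_per_quarter


def beat_times(tempo_map, total_beats):
    """Return the clock time in seconds of each quarter-note beat.

    tempo_map: list of (start_beat, us_per_quarter) pairs sorted by start_beat,
               with the first entry starting at beat 0 (hypothetical format).
    """
    times = []
    now = 0.0
    index = 0
    for beat in range(total_beats):
        # Advance to the tempo entry in force at this beat.
        while index + 1 < len(tempo_map) and tempo_map[index + 1][0] <= beat:
            index += 1
        times.append(now)
        now += tempo_map[index][1] / 1_000_000  # length of this beat in seconds
    return times


# Hypothetical map: 4 beats at 120 BPM (500000 us), then 60 BPM (1000000 us).
example_map = [(0, 500_000), (4, 1_000_000)]
print(bpm_from_midi_tempo(500_000))  # 120.0
print(beat_times(example_map, 6))    # [0.0, 0.5, 1.0, 1.5, 2.0, 3.0]
```

A display driven by a list like this can schedule each beat of the conducting pattern in advance, which is what lets a visual system follow written tempo fluctuations.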

The first version of the visual polytemporal synchronizing system, shown below, was rendered in black and white and mimics a standard 6-beat conductor's pattern.

You have to wait about 9 seconds at the start because the system is designed to add more information at the beginning, like confirmation of sync, titles, measure numbers, etc., before the graphics start.

Visually, it almost works.


6 Beat Pattern - Black & White

In the second version, shown below, I used the same 6 beat pattern but added color. I thought it might be easier to follow the moving "ball" if it were a brighter color, and that by continually showing the intended path, it might be easier to predict the ictus. However, the improvement was negligible.

Again, you have to be a little bit patient at the start because it's set up to add more information at the beginning.

6 Beat Pattern - Color

Visually, my system failed for two reasons:

  1. The ictus, or recoil at the end of each beat that clearly defines when the beat occurs, is missing. Cartoon animators long ago learned the necessity of using speed ramps when programming objects in motion. Actions such as walking, throwing a ball or cars stopping and starting always incorporate speed ramps in their construction. When the speed ramp is eliminated, determining the precise moment the action starts and stops is much more difficult.
  2. The natural acceleration that occurs between subsequent beats is missing. For example, the path from the top of the pattern to the first beat is longer than the path from the fifth to the sixth beat. The natural acceleration that helps equalize the difference between each ictus is missing. There's just something too smooth in the movement of the ball.
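
The two missing ingredients above can be illustrated with a small comparison between constant-speed motion and a speed ramp. This is only a sketch: the quadratic ease-in is one common easing curve chosen for simplicity, not a model of a real conductor's stroke, and the function names are hypothetical. Here `p` is the fraction of the stroke's time that has elapsed (0 at departure, 1 at the ictus), and each function returns the fraction of the path covered.

```python
def linear(p):
    """Constant speed: distance covered is proportional to elapsed time."""
    return p


def ease_in(p):
    """Quadratic speed ramp: slow departure, fastest at arrival on the beat."""
    return p * p


def step_distances(curve, steps):
    """Distance travelled in each of `steps` equal time slices of one stroke."""
    points = [curve(i / steps) for i in range(steps + 1)]
    return [b - a for a, b in zip(points, points[1:])]


print(step_distances(linear, 4))   # [0.25, 0.25, 0.25, 0.25]
print(step_distances(ease_in, 4))  # [0.0625, 0.1875, 0.3125, 0.4375]
```

With the linear curve every time slice looks the same, so nothing in the motion announces the arrival. With the ramp, the ball visibly accelerates into the beat, which gives the eye something to anticipate.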

Encountering Professor von Vierordt

    I was rehearsing with a string quartet when Karl von Vierordt dropped by from the mid-19th century. Von Vierordt had investigated our perception of time, or rather our perception of timing. How do we estimate intervals of time? How do we synchronize these intervals to form a tempo? How accurate is our ability to repeat different intervals of time? All of this is key to our ability to accurately sustain a given tempo. In this field of study, the reference to metronome clicks is quite common.

    One of Vierordt's Laws that continues to be investigated even today is that "short temporal intervals tend to be overestimated and long temporal intervals tend to be underestimated." You can expand this postulation to the extent that it seems to explain why we tend to rush fast tempi and drag slow tempi. Or, in a smaller window of concentration, if one listens to two beats (or clicks of a metronome) at a given interval and is then asked to perform a third beat that matches that time interval (tempo), the interval leading to that third beat will be shorter, i.e. slightly faster, than the first -- if the tempo is relatively fast -- and longer if the tempo is relatively slow. Somewhere in the middle of fast and slow is a sweet spot where our performance of a consistent tempo is measurably better.

    Here's the way Vierordt's Laws are presented by the researchers:

    When observers are presented with various intervals of different lengths and subsequently asked to reproduce each interval – they tend to overestimate the duration of short intervals, and underestimate long ones (Jazayeri & Shadlen, 2010, 2015). This is a type of ‘central-tendency’ effect — participants migrate their estimates of duration towards the mean of exposed intervals. A prevalent model of such an effect is that the perception of interval duration is derived from not only the perception of current sensory information, but also from the prior knowledge of the duration of previously exposed intervals (Jazayeri & Shadlen, 2010; Lejeune & Wearden, 2009; Murai & Yotsumoto, 2016; Petzschner & Glasauer, 2011; Petzschner et al., 2015; Roach et al., 2016; Shi & Burr, 2016; Taatgen & van Rijn, 2011). Prior knowledge of the temporal statistics of the environment, in this sense, biases temporal perception.

    To excerpt one more passage from the referenced research documents:

    ...Vierordt's law of time estimation is the principle that short temporal intervals tend to be overestimated and long temporal intervals tend to be underestimated. Also, in this context of time perception/estimation, the concept of the indifference interval is defined as the intermediate length of time that is neither underestimated nor overestimated. Based on the general law of time estimation by Vierordt in the late 1800s, subsequent research in the area of the psychology of time has determined that the overestimation of short durations and the underestimation of long ones is as valid for "filled" durations/intervals as for "empty" durations/intervals. Thus, in turn, and grounded in Vierordt's law of time estimation, psychologists today study the effect of the different forms of "filling" a temporal interval (ranging from the use of short, discrete auditory tones to long, more continuous and meaningful narratives/events/materials) on one's perceived duration and estimation of time.
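
The central-tendency effect described in the excerpts above can be sketched numerically. This is a deliberately simplified linear blend, not the Bayesian models used in the cited papers. The weight of 0.5 and the three example intervals are hypothetical values chosen only to make the pattern visible.

```python
def reproduced_interval(observed_ms, prior_mean_ms, weight=0.5):
    """Blend the current observation with the mean of previously heard intervals.

    weight: how much the current observation counts (hypothetical value);
            the remainder is pulled toward the prior mean.
    """
    return weight * observed_ms + (1 - weight) * prior_mean_ms


intervals = [400, 600, 800]                   # example intervals in milliseconds
prior_mean = sum(intervals) / len(intervals)  # 600.0

for interval in intervals:
    print(interval, reproduced_interval(interval, prior_mean))
# 400 500.0  (short interval reproduced too long)
# 600 600.0  (the indifference interval: reproduced as heard)
# 800 700.0  (long interval reproduced too short)
```

In this toy version the indifference interval is simply the mean of the intervals the listener has been exposed to, which matches the description of estimates migrating toward that mean.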

    von Vierordt Pays A Visit

    A strange event occurred during the rehearsal of a string quartet that seems to be related to Vierordt's Law. The quartet was performing one of my pieces that is, for the most part, written in 12/8 with a duplet feel but conducted in 6/4 for obvious reasons: so my arm doesn't fall off. The piece is based on Reich's rhythm for "Clapping Music" and the permutations thereof.

    The tempo at measure 127 drops from 160 BPM to 88 BPM following a fermata. The tempo remains at 88 BPM for several more measures repeating a melodic permutation of the "clapping rhythm." At the start of measure 136, the tempo begins to slowly climb back up to 160 BPM using an absolutely smooth, linear ramp that lasts for 37 measures. (My video networked synchronizing system has a performance tolerance of 1/30 of a second, and the calculations of the tempo ramp are accurate down to 1/1000 of a second.) However, after the 160 BPM tempo had been re-established, that tempo felt faster than the previous 160 BPM. In other words, the return to the "normal" 160 BPM seemed noticeably faster than the previous performance at 160 BPM. The discrepancy was noticeable to me, so I asked the quartet if they had experienced anything strange during that passage. Here are my notes written during the session:

    During the ramp from 88 to 160, players reported a bump in the tempo and pointed to the same general area within the score. It was not a smooth ramp. Players indicated the tempo after measure 173 was not the same as before the ramp. 160 BPM no longer equaled the previous 160 and clapping rhythm was now faster. Strange stuff.
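
For anyone who wants to reproduce the ramp, it helps to notice that "linear" can mean two different things: the tempo can rise by equal amounts per beat, or by equal amounts per second. The sketch below compares the two readings. It does not attempt to explain what the players reported, and it rests on assumptions not stated above: that the 37 measures contain 6 conducted beats each (222 beats in total) and that the BPM figures refer to those beats. The function names are hypothetical.

```python
import math


def ramp_duration_linear_in_beats(bpm_start, bpm_end, beats):
    """Seconds taken when tempo rises by an equal amount per beat.

    From dt/db = 60 / T(b) with T(b) rising linearly from bpm_start to bpm_end.
    """
    return 60 * beats / (bpm_end - bpm_start) * math.log(bpm_end / bpm_start)


def ramp_duration_linear_in_time(bpm_start, bpm_end, beats):
    """Seconds taken when tempo rises by an equal amount per second.

    The average tempo over the ramp is then the midpoint of the two tempi.
    """
    return 60 * beats / ((bpm_start + bpm_end) / 2)


beats = 37 * 6  # assumption: 6 conducted beats in each of the 37 measures

print(f"{ramp_duration_linear_in_beats(88, 160, beats):.1f}")  # 110.6
print(f"{ramp_duration_linear_in_time(88, 160, beats):.1f}")   # 107.4
```

Under these assumptions the two readings differ by about three seconds over the whole ramp, so it is worth stating which one a score or a synchronizing video uses before comparing results.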

    As soon as I have time, I'll publish the score to that section along with the synchronizing video on this page so you can perform this ramp and verify the phenomenon. Please get back to me with your findings.