Video Network System

Synchronizing Multiple Computers for Playback

Using a Video Network is one of the more precise, complicated and expensive approaches to producing polytemporal music. It is also the most flexible. The potential to expand the system to include any number of players using any number of variable tempo maps is almost unlimited ... assuming you have the budget. Since the specifications and performance of computer equipment keep changing, the learning curve is almost endless. Just as you settle on the hardware specs for your system, those tolerances become obsolete with the next upgrade of the operating system. The equipment you selected can be abandoned by the manufacturers in favor of a different protocol. Or Apple may decide to discontinue a particular connector in its newer models, making all of your cabling obsolete. It's bad enough to have to choose between Apple and Microsoft operating systems, but to then have to scrap various components and pay more money to support the latest protocols makes this one of the more frustrating approaches to polytemporal synchronization.

When I began working on this project during the XP/P4 era, just getting computers with fast enough CPUs was a major hurdle. However, with the appearance of multi-core CPUs and, to a degree, the conversion to a 64-bit digital standard, these problems have basically been solved. And, as always, adding more memory is a problem solver. In addition, extra memory has been added to the graphics cards, so the dependency on the motherboard's RAM has been reduced. And if you have the money, the industry's move to 60 FPS cameras/monitors/editing gear has tightened the precision of working within this Networked Video system. For much of this we have the gamers to thank. Who would have thought?

The demand for speed, secure connections and additional RAM tends to eliminate the use of laptops and tablets. Keep in mind that you're not just playing a video. You are playing a video program that is constantly comparing its position (basically SMPTE Time Code, indicated in hours, minutes, seconds and frames) to that of another video program while making adjustments on the fly to keep in sync. Standard video runs at 30 frames per second, and there is a good argument for moving towards 60 FPS for this application.
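To make that comparison concrete, here is a minimal Python sketch of how a follower's position might be measured against a master's time code. It assumes non-drop-frame time code in the full hours:minutes:seconds:frames form at a whole-number frame rate. The function names are illustrative only and are not taken from any particular playback program.

```python
def timecode_to_frames(timecode: str, fps: int = 30) -> int:
    """Convert non-drop-frame 'HH:MM:SS:FF' time code to an absolute frame count."""
    hours, minutes, seconds, frames = (int(part) for part in timecode.split(":"))
    if frames >= fps:
        raise ValueError(f"frame field {frames} is out of range at {fps} FPS")
    return ((hours * 60 + minutes) * 60 + seconds) * fps + frames


def offset_in_frames(master: str, follower: str, fps: int = 30) -> int:
    """Positive result: the follower is ahead of the master; negative: behind."""
    return timecode_to_frames(follower, fps) - timecode_to_frames(master, fps)


if __name__ == "__main__":
    # Hypothetical readings: the follower trails the master by two frames.
    print(offset_in_frames("00:03:00:10", "00:03:00:08"))
```

At 30 FPS one frame lasts roughly 33 milliseconds, while at 60 FPS it lasts roughly 17 milliseconds, so the same comparison can detect and correct a smaller offset. That is one reason the higher frame rate is attractive here.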

Just when you think you've solved the equipment problems, you have to worry about the compression scheme in video codecs, which is another moving target. Many codecs abandon the need to keep the audio in lock-step with the video and resort to a kind of "catch-up" protocol: "We'll wait until X number of video frames have been transmitted and then decompress the corresponding audio." Or, "let's decompress the audio now and let the video catch up later since it requires more bandwidth." You see these staggered displays of video and audio on Skype and Zoom all the time, and if you carefully watch YouTube videos, you'll start to notice the same "un-sync-ness" between the video and the audio due to the compression/decompression schemes they use. The best format for use with Networked Video is still .mov, which produces large files when compared to other formats, and larger files require faster CPUs and more RAM, which translates to more complication and cost. It's a never-ending spiral.

Investing in a system that just starts and stops video programs emanating from different computers or DVD players at precisely the same time is not enough. "Synchronized video" is a term used in the "video signage" field, which claims to run multi-source video programs in sync over an array of monitors. You'll see these "signage" displays used in advertising and at trade shows. Or you may see them at museums as some kind of eye-catching educational program. The signage programs use short video loops, somewhere in the vicinity of 15 seconds to 2 minutes in length. The videos will drift apart over time, but they are then re-synced at the beginning of every loop. In our field, where the program is typically 3-25 minutes in length, there must be continuous resolution of the differences in speed between each computer. In addition, the various computers must be able to start in sync from any point in the program.
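The arithmetic behind this drift is easy to sketch. The Python example below assumes two computers whose playback clocks differ by a small, fixed rate error expressed in parts per million. The 50 ppm figure is purely hypothetical and is not a measurement of any equipment described here.

```python
def drift_frames(rate_error_ppm: float, elapsed_seconds: float, fps: float = 30.0) -> float:
    """Frames of drift accumulated between two free-running players."""
    return rate_error_ppm * 1e-6 * elapsed_seconds * fps


if __name__ == "__main__":
    ppm = 50.0  # hypothetical clock mismatch, for illustration only
    for label, seconds in [("2-minute loop", 120), ("25-minute piece", 1500)]:
        print(f"{label}: {drift_frames(ppm, seconds):.2f} frames")
```

Under that assumption, a 2-minute loop that re-syncs at every restart never drifts by more than a fraction of a frame, while a 25-minute program left uncorrected would end more than two frames apart. This is why start-together signage systems are not sufficient for longer pieces.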

In the field of IT, when computer technicians talk of transmitting synchronized information across the internet or a LAN, they are not using the same definition of "synchronization" that we are. The old analog video systems of the 1990s were much more savvy about synchronizing video and audio emanating from different sources than today's computer industry.

Documentation and Scores

My approach to Networked Video is to produce two separate notation scores (Rehearsal and Performance) for each part, plus a tempo chart for reference by the composer. Yes, producing two notation scores for each part is more work, but I found it worth the effort.

    Rehearsal Score

    The Rehearsal Score uses no video playback and disregards phasing and polytemporal effects, making it easier to rehearse various passages without worrying about tempo offsets. It's just like a regular musical score and makes it easier to work on one section of the music at a time, like "two bars before Letter C". You don't have to incorporate the tempo variations that might be going on at that point. When the tempo maps are then added (in the Performance Scores), it gives the players insight into what is happening. After all, we were all trained to play together, and now, all of a sudden, we're being asked to play with a form of precise imprecision. I can assure you that if you start the rehearsal by using the Performance Score, the players will not be able to understand what is supposed to be happening, and at that point, all enthusiasm for the piece will leave the room.

    Performance Score

    The Performance Scores contain the adjustments caused by different tempo ramps and time signatures. This is a little difficult to describe without the examples contained below.

    Tempo Chart

    I found the need to add a detailed tempo chart describing the adjustments in the notation of the Performance Scores. The tempo chart lists different notation techniques which compensate for the variation caused by the use of different tempi in different instruments at different times. The tempo chart contains more detail than the individual players need to know, but it should answer any questions about the need for adjustments in the time signature of individual scores and conducting cues in the different videos. An example of a tempo chart prepared for the string quartet titled String Thru can be downloaded here. It is a boring read, but shows the amount of detail I needed to keep track of during the composition phase.

Phasing Techniques and Dissociated Tempi in Networked Video

Heretofore, the burden of accelerating and decelerating at a tempo different than the other performers, or even continuously playing at a specified yet dissociated tempo, was placed on the performer. Now, the burden of calculating and guiding the execution of a tempo ramp or a dissociated tempo has been placed on the synchronizing system and the composer.

With reference to phasing techniques, the accelerating instrument is intended to reach a specific tempo in a specified number of measures, and then decelerate (or drop) to the tempo held in common by the other player(s). At that point, the accelerating instrument should be precisely one 1/8th note -- or whatever value is desired -- ahead of the non-accelerating players. For example, assuming the tempo of the piece is set to 160 BPM with a time signature of 12/8, the composer might specify a 24-measure ramp during which the accelerating instrument smoothly reaches 161.111 BPM at the end of 8 measures and then decelerates over the next 16 measures back to 160 BPM. At the end of the ramp, the accelerating instrument will be precisely one 1/8th note ahead of the non-accelerating instruments. At the point of unification, the accelerating instrument would rejoin the group at the original tempo (160 BPM), forming a new permutation of the original rhythm, or simply continue playing anything the composer desires.
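The figures in this example can be checked with a little arithmetic. The Python sketch below is not my actual calculation method. It is a simplified model which assumes the tempo rises and falls linearly in time, and that the whole ramp lasts as long as the steady players take to play their 24 measures. Under that assumption, the extra distance gained is the area of the triangle between the ramp and the base tempo. The function name and parameters are illustrative.

```python
def peak_tempo(base_bpm: float, reference_beats: float, beats_gained: float) -> float:
    """Peak tempo of a rise-then-fall ramp that gains `beats_gained` beats.

    Assumes tempo changes linearly in time, up to the peak and back down to
    base_bpm, over the time the steady players need for `reference_beats`.
    Beats gained = 0.5 * (peak - base) * duration_in_minutes.
    """
    duration_minutes = reference_beats / base_bpm
    return base_bpm + 2.0 * beats_gained / duration_minutes


if __name__ == "__main__":
    # 24 measures of 12/8 contain 288 eighth notes; the goal is to gain one.
    print(round(peak_tempo(160.0, 24 * 12, 1.0), 3))
```

With a base tempo of 160 BPM, this gives a peak of about 161.111 BPM, matching the figure above. The result is the same whether the beat is counted in eighth notes (288 beats, gaining 1) or in dotted quarters (96 beats, gaining one third of a beat). In this simplified model, the point at which the peak falls within the ramp does not change the total gained.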

In my approach to incorporating phasing techniques, the number of measures over which the accelerating instrument is moving, the rate of acceleration, the maximum tempo of acceleration, the point where the maximum acceleration is reached, and the length and tempo of the deceleration are all calculated in advance. As the accelerating instrument advances to achieve its goal, the bar lines no longer line up with the other instruments, which makes rehearsal very complicated. I wanted to find a way of adjusting the printed score of that accelerating instrument so that the bar lines would line up with the other instruments following the phasing section ... just for the ease of rehearsal. Yes, it would have been possible to simply leave the score of the accelerated instrument unadjusted. But in a scenario where multiple instruments may be accelerating at different rates, over different numbers of bars, with the possibility of adding another part at a completely unrelated tempo and time signature, printing two separate scores, one adjusted for the variations in tempo and the other not adjusted, seemed to be the best approach.

In those instances where one part uses a completely different time signature or tempo -- which I call a "dissociated tempo" -- those calculations are also worked out during the composition phase. Adjustments in tempo, even the use of rubato, are determined by ear and later accomplished by adjusting the tempo map in MIDI. Sticking with MIDI during the composition phase makes it much easier (in my opinion) than adjusting tempo after the sound has been digitized, e.g. converted to .wav files. I have to admit that the latest advancements in DAW programs, along with the improvement in CPU speed, have made the manipulation of tempo using digital audio possible. I just don't see the advantage of working with digital audio in the composition phase. By staying in MIDI, you can avoid stressing the CPU and fighting the clicks and pops that occur when real-time adjustments to digital audio are performed. Once the tempo variations have been worked out in MIDI, the conversion to .wav format can be easily accomplished if you want to render the piece for distribution or experiment with synthesized effects. However, there is no need to convert to digital .wav formats for the sake of the Video Network program because that program is using visual cues, not audio cues. And there is no need to convert to a .wav format when producing a score with notation programs like Sibelius, Dorico and Finale, since these programs can import and export MIDI quite easily.
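For readers who have not worked with a MIDI tempo map, the sketch below shows the basic idea in Python. A Standard MIDI File stores tempo as microseconds per quarter note rather than as BPM, and a smooth ramp is commonly approximated by a series of small tempo steps. The function names are illustrative, the `bpm` values are assumed to count quarter notes, and this is not a description of how any particular DAW or notation program builds its tempo map.

```python
def bpm_to_midi_tempo(bpm: float) -> int:
    """Microseconds per quarter note, as stored in a MIDI Set Tempo meta event."""
    if bpm <= 0:
        raise ValueError("bpm must be positive")
    return round(60_000_000 / bpm)


def stepped_ramp(start_bpm: float, end_bpm: float, steps: int) -> list[int]:
    """Approximate a smooth ramp as `steps` evenly spaced tempo values."""
    if steps < 2:
        raise ValueError("need at least two steps")
    increment = (end_bpm - start_bpm) / (steps - 1)
    return [bpm_to_midi_tempo(start_bpm + i * increment) for i in range(steps)]


if __name__ == "__main__":
    # Hypothetical three-step ramp from 160 BPM up to 161.111 BPM.
    print(stepped_ramp(160.0, 161.111, 3))
```

If the notated beat is something other than a quarter note, such as the dotted quarter in 12/8, the BPM figure has to be converted to quarter notes per minute before calling `bpm_to_midi_tempo`. More steps give a smoother ramp at the cost of a longer tempo map.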

Hardware Requirements

Photos of the video network equipment used for a string quartet rehearsal are shown here. My tests included the use of both desktop and laptop computers. The laptop computer had a built-in network port for a Cat 5 cable rather than relying on a USB port. Unfortunately, most laptop and tablet manufacturers have switched to USB connectors for attachment to a network, and that connector is too susceptible to coming loose. USB connectors, in this application, add one more thing that could go wrong. And don't even think of going wireless. There's too great a chance of signal interference in a concert environment.

Monitor Stands

The design and construction of monitor stands became quite a project. Each performer had to have their own monitor stand that held their own computer and monitor. The height of the monitors had to be adjustable so as to approximate the peripheral vision to which that performer was accustomed. Commercial-grade monitor stands (such as the one shown on the left) are certainly available, but I didn't feel they were configured to my needs or stable enough to be trusted for stage use. The photo of a monitor stand on the left shows a model that might work, if you discard the glass keyboard platform. It had no wheels, and it had a wide bottom platform for attachment of the computer and network gear. And it was less than $500. The center column was also adjustable. But I was only attaching one monitor, so it was a bit of an overkill, and buying five of these commercial-grade stands while still in my experimental phase was just too expensive to justify.

The steel-column-wooden-base monitor stand I constructed (shown above) can be disassembled into two parts for transportation. The stand will hold different sizes of monitors at different heights on the center column, and has a base big enough for strapping a computer and the network gear to the platform. Most importantly, it is heavy enough that it won't be knocked over as players move about the stage. A standard-size music stand sits underneath the monitor. In one variation, I attached an adjustable music stand to the center column, but that proved to be unnecessarily complicated. Every performer and every venue will have music stands, which simplifies setup.

The router can be seen sitting on the stone coffee table on the extreme left. During this particular rehearsal, I was operating a 5th computer to control the other 4 computers used by the string quartet. The Cat 5 network cables connecting my 5th computer to the router and subsequently to all the other computers are color coded. In an actual performance, one of the 4 computers would serve as the Master computer and synchronize the other computers. All of the rest of the cables shown are power cables for the monitors and computers. The power cables for the computers could be eliminated if you commit to using all laptops, but I didn't want to depend on their battery life for rehearsals. A music stand sits in front of each monitor platform just below the monitor. Any standard music stand will suffice.

The computer monitors are attached to a square, steel, center column using monitor brackets available at every computer store. The backs of most office-grade monitors conform to an industry standard template that allows the monitor to be detached from its initial base -- which is usually some kind of adjustable desktop structure -- and then attached to another stand designed to hold more than one monitor. These multiple-monitor stands are very common. In my application, I'm only supporting one monitor. What is important is to make sure the monitors you select have this industry-standard configuration on the back for transfer to another bracket. The monitor brackets shown on the left screw directly into the column and allow the height of the monitor to be adjusted, thereby allowing each performer to approximate their own peripheral vision of a conductor. These images do not show the use of a slightly more expensive, adjustable bracket that offers even more flexibility. In my first attempt, I used a fixed bracket and adjusted the monitor's height by drilling optional holes in the steel column.

I have subsequently determined that the metal columns can be shorter than the length shown here (5 feet), unless the performers are standing or are percussionists. The smallest monitor that seems to work is a 15-inch diagonal screen with a 16:9 aspect ratio. There's no need to get anything larger. The monitors shown in this initial configuration were unnecessarily large and had 27-inch screens with a 3:4 aspect ratio.

I settled on a black, anodized steel column that has a flared base and can be easily bolted to a wooden platform with four bolts. The column detaches from the base for transportation. The steel column is actually a fence post available from Home Depot for just $15.00 each. (US Door & Fence 2 in. x 2 in. x 5 ft. Black Metal Fence Post with Flange and Post Cap)

Of course I admire the more elegant solution Philippe Kocher and the ICST constructed for their Polytempo system (shown above). ICST is using a microphone stand because its wider legs give the added stability needed to hold a monitor. I cannot see how the backs of the monitors attach to the microphone stand's center column. I don't know if that bracket is commercially available or was fabricated by ICST. I also don't know what their solution is for percussionists and performers who are standing, because those microphone bases were not designed to hold something heavy at that height. They may be a little tippy. I was hoping to find a keyboard-less laptop or a large tablet with Cat 6 network connectors for my system, but those were not available during the time of my experimentation. Touchscreen, keyboard-less laptops such as Microsoft's Surface just hadn't evolved to the point they are at today. In some photos on the Polytempo website, I can see they are using Apple's Mac mini computers in place of my large desktops and laptops to form the video network. In other photos I believe the monitors are a Dell product and have motherboards behind their screens, similar to the Sony and Microsoft Surface design introduced a few years ago. Although I have downloaded the Polytempo software, it has yet to be ported to Windows, meaning I haven't been able to explore how their system works. The Polytempo design is covered in the VPL section of this website.