Understanding dynamic sound in Flash
I made a little test app to help me understand how dynamic sound in Flash works. I had done a few of the basic tutorials/experiments and I kind of got the idea, but quite a lot of mystery still remained in the topic... After doing this experiment though, I feel like I have a much more thorough understanding of the way dynamic sound works in Flash, so I decided to share it, 'cause I would have loved to find something like this on someone else's blog. Besides, I feel like the more people are familiar with this the better, because this feature of FP10 is simply awesome!
So, how does dynamic sound work? The new method in Sound called "extract" can read audio data from any position of a Sound object. It extracts the data into a ByteArray, that is, into a bunch of 0s and 1s. But the hardest thing for me to understand was: What is the nature of this data? How does it describe a sound?
Well, byte arrays are compact, highly efficient arrays of raw binary data. Their 0s and 1s (bits) are packed in groups of 8, each group being a byte, and the array is addressed byte by byte. So byteArray.position = 256 would not place the array pointer at bit 256, it would place it at byte 256, that is, at bit 2048. Now, the extract method populates the byte array with a bunch of bytes, but in order to make sense of them, we must think of them in groups of 8 bytes (64 bits). Each such group is what we call a sample in audio...
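To make the byte arithmetic concrete, here is a tiny sketch. It is written in Python rather than ActionScript, purely as a stand-in, and the helper names are made up for illustration. The 8 bytes per sample come from two 4-byte values, one per stereo channel, as described in the next paragraph.

```python
# Byte arithmetic for stereo float audio as described above.
# All names here are hypothetical helpers, not part of any Flash API.
BITS_PER_BYTE = 8
BYTES_PER_FLOAT = 4                               # one 32-bit amplitude value
CHANNELS = 2                                      # left + right
BYTES_PER_SAMPLE = BYTES_PER_FLOAT * CHANNELS     # 8 bytes = 64 bits


def bit_offset(byte_position):
    """Bit index that corresponds to a byte position in the array."""
    return byte_position * BITS_PER_BYTE


def sample_byte_offset(sample_index):
    """Byte position where the given stereo sample starts."""
    return sample_index * BYTES_PER_SAMPLE


def sample_count(byte_length):
    """Number of whole stereo samples that fit in byte_length bytes."""
    return byte_length // BYTES_PER_SAMPLE
```

So position 256 lands on bit_offset(256) = 2048, and that same byte position is the start of stereo sample number 32.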
Real life sound is nothing more than oscillations or vibrations that travel through air. The amplitude of these oscillations corresponds to volume, and their cycle length or wavelength corresponds to what we interpret as pitch. The way digital sound works is to "sample" the instantaneous amplitude of these vibrations at a very fast rate (say 44100Hz, that is, 44100 samples each second). Such amplitude readings can be stored as 32 bit "float" values, which in Flash range from -1 to 1. Flash works at 44100Hz, stereo, so the first 4 bytes of each sample hold the instantaneous amplitude of the left channel and the next 4 hold the amplitude of the right channel. 32 bits per channel. So if we use the extract method and reset the byte array's position to 0, we can call the readFloat method of ByteArray in pairs, and hence read the amplitude values of the entire sound, one by one. Doing this, it's crucial to keep in mind that each time you read a part of the byte array, the position marker is shifted to the end of what you've just read. It's just the way byte arrays work, unlike regular arrays.
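The pairwise reading can be sketched like this. Again, this is Python standing in for ActionScript: io.BytesIO plays the role of the ByteArray with its moving position marker, and read_float is a hypothetical helper that mimics readFloat. I am assuming big-endian floats, which is ByteArray's default byte order.

```python
import io
import struct

# Big-endian 32-bit float (assumed: ByteArray's default endianness).
FLOAT = struct.Struct(">f")


def read_float(stream):
    """Read one float; the stream cursor moves forward 4 bytes, like readFloat."""
    return FLOAT.unpack(stream.read(FLOAT.size))[0]


def read_stereo_samples(data):
    """Return a list of (left, right) amplitude pairs from raw sample bytes."""
    stream = io.BytesIO(data)
    stream.seek(0)  # same idea as setting byteArray.position = 0
    samples = []
    # Keep reading while a whole 8-byte stereo sample is still available.
    while stream.tell() + 2 * FLOAT.size <= len(data):
        left = read_float(stream)    # cursor moves from n to n + 4
        right = read_float(stream)   # cursor moves from n + 4 to n + 8
        samples.append((left, right))
    return samples


# Two made-up stereo samples, packed the way the text describes.
demo_bytes = struct.pack(">4f", 0.5, -0.25, 1.0, -1.0)
demo_samples = read_stereo_samples(demo_bytes)
```

Note that nothing in the loop increments an index by hand: each read moves the cursor, which is exactly the behaviour that differs from regular arrays.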
In this manner, we can read large chunks of samples and write them to the data object of the SampleDataEvent. Doing so, we are delivering a chunk of audio to the computer's audio card so it can make some noise. The beauty in this is that we can read and understand the sound data, and before giving it to the sound card, we can process the audio with complex filters and DSP analysis, use the information for visual display, etc... This is why the audio is handled in chunks: so that we are able to process the sound in real time.
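As a rough illustration of "process before delivering", here is about the simplest filter I can think of: a volume change. This is a Python sketch, not the demo's code (the demo leaves the audio untouched). apply_gain is a hypothetical name, and I am assuming big-endian floats that should stay in the -1 to 1 range.

```python
import struct


def apply_gain(chunk, gain):
    """Scale every float in a chunk by gain, clamping to [-1, 1].

    chunk is raw bytes of big-endian 32-bit floats (left, right, left, ...).
    Returns a new bytes object of the same length, ready to be handed on.
    """
    count = len(chunk) // 4
    fmt = ">%df" % count
    values = struct.unpack(fmt, chunk[:count * 4])
    processed = [max(-1.0, min(1.0, v * gain)) for v in values]
    return struct.pack(fmt, *processed)


# One extracted chunk of two made-up stereo samples, halved in volume.
source_chunk = struct.pack(">4f", 0.5, -1.0, 0.25, 0.75)
output_chunk = apply_gain(source_chunk, 0.5)
```

In Flash the output would be written into the event's data object instead of returned, but the shape of the work is the same: read the chunk, touch every value, pass it on.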
It's important to note that when we extract a chunk from the Sound object, we don't know whether we will get the number of samples we asked for (imagine that we are reaching the end of a sound), so the extract method returns a value indicating how many samples it was actually able to extract.
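Here is a small simulation of that return value. FakeSound is a hypothetical Python stand-in for a loaded Sound object. Its extract method only imitates the behaviour described above (fill the target, report how many samples were really available), and it is not the real Flash implementation.

```python
BYTES_PER_SAMPLE = 8  # stereo, two 32-bit floats


class FakeSound:
    """Hypothetical stand-in for a Sound; holds total_samples of silence."""

    def __init__(self, total_samples):
        self.total_samples = total_samples
        self.data = bytes(total_samples * BYTES_PER_SAMPLE)

    def extract(self, target, length, start_position):
        """Append up to `length` samples to target; return how many were read."""
        available = max(0, self.total_samples - start_position)
        count = min(length, available)
        begin = start_position * BYTES_PER_SAMPLE
        target.extend(self.data[begin:begin + count * BYTES_PER_SAMPLE])
        return count


def chunk_counts(sound, chunk_size):
    """Extract chunk after chunk and record how many samples each call gave."""
    counts = []
    position = 0
    while True:
        target = bytearray()
        got = sound.extract(target, chunk_size, position)
        if got == 0:
            break                 # nothing left at all
        counts.append(got)
        position += got
        if got < chunk_size:
            break                 # short chunk: we hit the end of the sound
    return counts
```

For a 10000-sample sound read 4096 samples at a time, the last call comes back short: chunk_counts(FakeSound(10000), 4096) gives 4096, 4096 and then 1808.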
So enough talking... see it for yourself in the demo by clicking on the image above (view source enabled). In this experiment, I am not doing anything to the original sounds, just using the info extracted each cycle to visualize what's going on. The boxes plot the amplitude of the samples against time, and from the pixelated look of them, you can easily grasp how many samples these mp3s have... A lot of them. The upper box shows the waveform of the entire sound clip (note that it is not a frequency spectrum), and the lower box shows the instantaneous waveform of the chunk that has just been extracted from the Sound object and delivered to the sound card. You can change the playback speed or drag the playback head to see the process in slow motion. You can also change the buffer size, which is nothing more than the size of the chunks (in samples) that are processed on each cycle. And you can change the sounds too, just because waveforms are beautiful... be patient with the loading though, my server is sloooooow...
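Since a clip has far more samples than a box has pixels, a whole-clip waveform needs some kind of reduction. One common approach, and I am only offering it as an assumption rather than claiming it is what the demo does, is to keep the loudest absolute amplitude for each pixel column. A Python sketch with a hypothetical column_peaks helper:

```python
def column_peaks(amplitudes, columns):
    """Reduce a list of amplitudes to one peak (max absolute value) per column.

    amplitudes: floats for a single channel, e.g. every left-channel value.
    columns: how many pixel columns the waveform box is wide.
    """
    if columns <= 0:
        raise ValueError("columns must be positive")
    n = len(amplitudes)
    peaks = []
    for c in range(columns):
        start = c * n // columns
        end = max(start + 1, (c + 1) * n // columns)  # at least one sample
        bucket = amplitudes[start:end]
        peaks.append(max((abs(a) for a in bucket), default=0.0))
    return peaks
```

Each returned peak can then be drawn as a vertical line of that height. Six samples squeezed into three columns, column_peaks([0.1, -0.9, 0.3, 0.2, -0.5, 0.4], 3), come out as 0.9, 0.3 and 0.5.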
Processing sounds can be quite CPU intensive, but luckily we have Pixel Bender and Alchemy to give us a little more juice in this area. I'd like to post more about this topic in the future, it's just too interesting. Perhaps a spectrum analyzer, pitch shifting, time scale modification, etc... The possibilities are endless. Just look at what guys like Andre Michelle are doing!

July 26th, 2009 - 07:19
Hello Li !
It seems that we focused at the same time on the same target.
I have been working on dynamic sound for the past few days, and I am really interested in the topic too.
My main goal is to build some interactive music features for Flash.
I don’t share your optimism concerning Alchemy, because realtime would need intensive marshalling, and that’s the most expensive thing with this tech.
About Pixel Bender, the experiment by Tinic Uro shows us that we can mix 15 tracks twice as fast as the standard AS3 version. That’s a good start, but it’s not a warhammer for realtime audio processing, don’t you think?
Keep up the good work on your side, I’ll showcase some on mine soon.
July 26th, 2009 - 17:01
Man, I suggest looking for more Alchemy examples. Right now, that C monster rocks in performance, you can even simulate water like it were a plane with it…
Of course, whether it works for sound, well, I haven’t tried it out, but I don’t think it will have problems…
July 26th, 2009 - 17:05
Yep, thanks for the suggestion. I am already trying some nice things with Alchemy. Unfortunately I am an extreme newbie in C/C++, so I might be a bit slow.
August 30th, 2009 - 17:20
Sound is a very interesting field (which means I have to add 4 hours to the 46 I wish a day could have…). I jumped into it 2 weeks ago,
and I still need some work to understand what is behind some fx and transformations, but I did this (based on Andre Michelle's tone matrix but enhanced with fx and an original sharing interface):
http://www.agence-anonyme.com/lab/nouse/barrell.html
don’t forget to join the user group and share your creations !!
I’ll share the sources soon.
November 17th, 2009 - 11:09
Hi,
Thanks for this post, it is very helpful. I’m struggling with byte arrays and this helped a bit by pointing things out. What really blew my mind was the audio application on this website:
http://www.hobnox.com/index.1056.en.html