Episode 13: Sound (II) — Doing it

So the last day of April has arrived, the end is nigh, — can we do it? Can we finish in time? Yes, we can …

Playing Sound

As we may recall from the last episode, there are two channels of sound on the VCS, controlled by 3 registers each (AUDC0, AUDF0, AUDV0, and AUDC1, AUDF1, AUDV1). All we need to do is a) start a sound at the right time (with the right tone, frequency and volume) and b) keep it playing for the right amount of time. Time, for us, means frames, which comes as a refreshing change after all those tight constraints and cycle counts. What we have to do now is implement a tiny sequencer.

Sounds will come as sequences for our game, either as a single tone finishing after a couple of frames, or there will be another tone to be played as soon as the first one finishes. What about an index pointing into a table of sounds and a counter, one per channel each?

While TIA registers are apparently commonly handled by name on VCS and nobody really seems to care at what addresses these are (at least, they are not that obviously pointed out in the manuals), it really pays off to peek into the header file for the symbol definitions for these address locations:


AUDC0       ds 1    ; $15   0000 xxxx   Audio Control 0
AUDC1       ds 1    ; $16   0000 xxxx   Audio Control 1
AUDF0       ds 1    ; $17   000x xxxx   Audio Frequency 0
AUDF1       ds 1    ; $18   000x xxxx   Audio Frequency 1
AUDV0       ds 1    ; $19   0000 xxxx   Audio Volume 0
AUDV1       ds 1    ; $1A   0000 xxxx   Audio Volume 1

They are neatly lined up, each register for channel #0 followed by the respective one for channel #1, meaning, we may service the two channels by a simple index (either 0 or 1) in the same routine:



; RAM addresses for sound control

SoundIdx0   = $B9
SoundIdx1   = $BA

SoundTmr0   = $BB
SoundTmr1   = $BC


; sound subroutines

PlaySound                  ; sound in Y, channel/player in X (0,1)
    sty SoundIdx0,X
    cpy #0
    beq playSoundReset     ; index is zero, mute and return
    lda SoundTable + 3,Y   ; get duration in frames
    sta SoundTmr0,X
    lda SoundTable,Y
    sta AUDC0,X            ; tone
    lda SoundTable + 1,Y
    sta AUDF0,X            ; frequency/pitch
    lda SoundTable + 2,Y
    sta AUDV0,X            ; volume
    rts
playSoundReset
    lda #0
    sta AUDV0,X            ; reset volume
    sta SoundIdx0,X
    sta SoundTmr0,X
playSoundDone
    rts

HandleSounds               ; channel in X (0,1)
    ldy SoundIdx0,X
    beq handleSoundsDone   ; index = 0, no sound
    dec SoundTmr0,X        ; decrement frame counter
    beq handleSoundsNext   ; run to zero? next sound from table
handleSoundsDone
    rts
handleSoundsNext
    lda SoundTable + 4,Y   ; next sound
    tay
    jmp PlaySound


; sound table

SoundTable
    .byte 0                ; no sound / stop

;                              tone pitch vol time next
;                              -------------------------
Snd_MissileBounce = * - SoundTable
    .byte                       $04, $04, $06, $04, $00

Snd_Barrier = * - SoundTable
    .byte                       $0F, $19, $07, $04, Snd_Barrier1
Snd_Barrier1 = * - SoundTable
    .byte                       $0F, $1a, $07, $10, Snd_Barrier2
Snd_Barrier2 = * - SoundTable
    .byte                       $0F, $1b, $04, $04, $00

;(...)

All we have to do to play the Missile-Bounce sound on channel #0 is to call "PlaySound":

    ldx #0                 ; channel 0
    ldy #Snd_MissileBounce ; sound index
    jsr PlaySound          ; do it

Neat, isn't it?
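For the sequencer to advance, "HandleSounds" has to run once per frame for each channel. Where exactly this happens isn't shown here; a minimal sketch, assuming the calls sit somewhere in overscan along with the rest of the game logic:

```asm
    ; once per frame (assumed: in overscan)
    ldx #0                 ; channel 0
    jsr HandleSounds       ; count down, chain to next sound if due
    ldx #1                 ; channel 1
    jsr HandleSounds
```

Neither routine modifies X, so the channel index survives the call; A and Y are clobbered.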

Now we just have to sort out where in our code we'll have to insert these calls. Some locations are not that obvious, like the occurrence of a bounce for an object. This is hidden deep in a conditional continuation of our "MoveObject" subroutine. Here, we decide to have the state returned in the carry flag. If there's no bounce, it will be zero; if there is a bounce, we'll set the carry before we return. This way, we may branch easily on the event from where we called the "MoveObject" routine. In brief, we'll use the carry flag as, well, a flag.
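A rough sketch of the carry-as-flag idea. The exit labels inside "MoveObject" and the fixed channel at the call site are assumptions made for illustration, not the actual game code:

```asm
; inside MoveObject (exit paths only, label names are hypothetical)
moveObjectDone
    clc                    ; no bounce: carry clear
    rts
moveObjectBounced
    sec                    ; bounce: carry set
    rts

; at the call site
    jsr MoveObject
    bcc noBounce           ; carry clear, nothing to play
    ldx #0                 ; channel fixed to 0 for this sketch
    ldy #Snd_MissileBounce
    jsr PlaySound
noBounce
```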

Generally, we want to have a channel dedicated to the events of either player, and, most of the time, we already have an index in the X-register to signify the ship or missile in question. But this is mostly either 0 or 2, so we have to apply a shift to the right before we jump to the "PlaySound" routine. Big deal!
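The shift might look like this, assuming the object index (0 or 2) is in X and is not needed afterwards. If it is, it would have to be saved first:

```asm
    txa                    ; A = object index (0 or 2)
    lsr                    ; A = channel (0 or 1)
    tax                    ; PlaySound expects the channel in X
    ldy #Snd_MissileBounce ; sound index
    jsr PlaySound
```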

A few sound events do not match this scheme: There's the bouncing sound of the ball, which isn't related to any of the players, and there's the explosion of a ship, which requires both of the two channels for artistic reasons. For the first one, we decide to play it only, if any of the channels is currently idle. Normally, a new sound will implicitly terminate a previous one, which is also fine for our explosions: There will be only one at a time and it will be also terminating any other game events.
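The "only if a channel is idle" rule for the ball could be sketched as a small helper. "PlaySoundIfIdle" and its labels are hypothetical names; the sketch relies on a sound index of zero meaning "nothing playing", as in the routines above:

```asm
PlaySoundIfIdle            ; sound in Y (hypothetical helper)
    ldx #0
    lda SoundIdx0          ; channel 0 busy?
    beq playIdleGo         ; no, use channel 0
    inx
    lda SoundIdx1          ; channel 1 busy?
    bne playIdleDone       ; both busy, drop the sound
playIdleGo
    jmp PlaySound          ; tail call, returns to our caller
playIdleDone
    rts
```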

Having sorted out the coding side of things, it's time to come up with some proper sounds.

Sound-Scapes

As indicated by the headline, it's not just about single sounds in isolation, it's about a composition, which may establish some kind of rhythm. This is really the hard part. For our game, it'll be mostly about patterns of high and low tones, of phasing ones and clear, pure ones.

However, picking sounds isn't that easy, either. Our little sound test app (Studio2600) comes in handy, but it's not as easy as just clicking buttons and finding a nice sound. In fact, the duration of a tone matters a lot to how the tone sounds at all!

It turns out that the rather cryptic information provided by the Stella Programmer's Guide, like "5 bit poly and 4 bit poly", is actually useful and vital! All those phasing sounds, like those marked by "poly", have to be played for at least a single iteration of their phase in order to produce the modulated effect provided by the phase shifting circuits. So the bit rate of these effects is important. E.g., what sounds like a rich, pulsing sound when played over 32 or more frames is just a simple hissing noise when played over 6 frames. Sadly, we're using only rather short sounds, so we mostly have to do without those (which are among the best there are on the VCS) and substitute sequences of our own.

As may be imagined, this can't be done without frequent testing. Change a parameter, compile it, test it, test it in ensemble, change a parameter, compile it, again, …

In the end, I came up with something, I'm not entirely happy with, but I'm OK with. So there's a public Beta, at the end of this endeavor, and I might come back later and have another look at some of the sounds.

Refraction, Beta 1 — online demo and downloads.

Please mind that sounds are slightly off in the online-emulation. Especially some of the pure sounds are not that pure as they are in "Stella", also there's a slight change in pitch.

When? — Now!

Some of the finer details of implementing sounds are not that obvious. One of these is the question of when exactly to play a sound. In real life, a sound normally follows an event (if it isn't London in 1945 and a V2, however, this is not about gravity's rainbows). In our game, the real event occurs on the screen, while we're busy rendering. Clearly, we have not the time to start a sound, while engaged in this. So, should we start the sound before or after this?

These may appear to be minor trifles, since we're talking about a fraction of a 60th of a second. In NTSC, there are 192 scan-lines of visible image (kernel), followed by 30 lines of overscan, where we do our business logic (which is also where our sound events occur on the logical level), followed by another 40 lines of vertical blank (VBLANK). So we're talking about a latency of about a fifth up to a third of a frame:

Event timing and perception on the Atari 2600

With regard to real life examples, where sound usually reaches us only after the visual information, due to the different speeds of sound and light, the answer seems rather obvious: Have the sound effect follow the visual impulse. However, the sound event occurs on the programming level in the business logic in overscan and we'd have to delay it. Thanks to our nifty sequencer, we may easily put it to a test (by inserting just a single frame with the volume set to zero, right before the first frame of real sound).
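Such a test entry might look like the following, placed inside the sound table. The name "Snd_MissileBounceLate" is made up for this sketch, and it assumes that "HandleSounds" for the current frame has already run by the time the event triggers the sound (otherwise the silent frame would be counted down at once):

```asm
;                              tone pitch vol time next
Snd_MissileBounceLate = * - SoundTable
    .byte                       $04, $04, $00, $01, Snd_MissileBounce
```

Triggering "Snd_MissileBounceLate" instead of "Snd_MissileBounce" then plays one frame of silence and chains into the real sound.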

Surprisingly, this feels a bit sluggish or blurry in comparison. Having the sound triggered before the visual impulse speeds up the experienced gameplay just by a tiny amount, a bit like offbeat in music. So simple approaches are sometimes actually best.

(This may also be due to us being intensely involved in eye-hand coordination and thus allocating only minor capacities to the perception and decoding of audio events. Moreover, visual perception is complex and happens in many stages, delayed and interleaved loops. Even the limbic system, and hence feelings, are involved in the timing of visual perception. It's hard to say when exactly we are cognitively perceiving a visual impulse, and it depends heavily on the kind and structure of the visual information. This is not your simple Perceptron or neural network! And, as for the structure of the visuals conveyed by our games, this is simple, highly abstracted visual information, while the overall structure of the image, as defined by the borders, barriers and scores, is essentially stable. Thus, we may estimate, decoding will be rather quick, as quick as we can manage, while any experience of audio information may be delayed at the same time.)

What’s Left?

This is also the time when we should do some playtesting. However, here we face problems related to what we just discussed above: In the days of old, things were simple. There was just the VCS console, the Atari joystick, which was the same controller for all, and a TV. Nowadays, things are a bit more complex, because there isn't a single platform. Actually, there is quite a variety of them, and only in a few edge cases will they match the original setup. So, for whom and for which setup should we test the game?

This is actually a rather important question when it comes to the actual experience of the game. For example, most modern setups will introduce lag: An emulator may lag, there will be input lag for internal USB controllers (famously, modern keyboards have, at times considerably, more lag than the 8-bit computers of the late 1970s and 1980s, which scanned at 60Hz), input lag introduced by the external USB controllers attached to it, image upscaling may introduce another lag (reportedly, the famous Framemeister introduces a lag of 3 or 4 frames), and even HDMI may contribute to this. Given that we're playing some of our sounds for just 4 frames, a lag of 4 frames or more is something to be considered. This isn't your analog realtime circuitry anymore.

Then there's the question of the kind of game controls used: Obviously, using a keyboard is not the best option. It feels always a bit quirky, but some of our target audience will use it anyway. Then, there are the modern controllers. That is, "modern" as in as modern as the Nintendo Entertainment System (NES). When it comes to Atari games, this is another pest. — So it may be also time for a little rant.

What's Wrong With D-Pads?

Interesting that you should ask, because I was just going to address this. Let me tell you a little story: When I began to explore Atari VCS / 2600 games, I soon found some games I was rather impressed with, especially when considering the limitations of the platform. For example, there was Alien by 20th Century Fox (1982). Its art is impressively well tuned to the visuals the VCS can produce, and the rhythm of the game is just impressive. It's also the first viable Pac-Man-style game on the VCS, handling interlacing much better than the infamous Atari Pac-Man, using it even for the purpose of visual effects (as, for example, for the distinct fields of pellets, er, "eggs"). The sound integration is superb: While the walking sounds are rather harsh and reminiscent of Pac-Man, it's interestingly the phasing siren sound, when we may reverse the hunt, which syncs the game to the horror theme of the movie by a rush of adrenaline that goes with the physical quality of the rich, vibrating sounds. Even the choice of the scene makes sense, because the mazelike grid of the air ducts, where Dallas meets his fate, is also the only part of the movie where the crew is actually in some kind of command and up to confronting the xenomorphic creature (with known outcome). Even the rather ineffective flame thrower, which just causes an alien to reverse its last directional choice, blends in with the theme of the first movie, where the effectiveness of this weapon is an unknown quantity (Dallas would tell you it was rather ineffective). So it's a nice game with a twist, and you even get Space Race (or Chicken Run, if you're in for the Activision version of the game) as a bonus (round).

So, I was rather astonished to find that modern reviewers would mostly dismiss the game. What was wrong with it, or wrong with them? It was a few weeks before I received the answer in the form of a nice, modern controller, the latest craze in retro gaming, 8Bitdo's SF30 Pro gamepad. Fresh out of the box, I hooked it up to a computer running Stella and fired up Alien. The experience was underwhelming at best. Nothing to write home about. All the rhythm, the impression and experience of the game was gone.

How come? It may be noted that I had been playing the game with a recreated Atari joystick before, because I happened to have one around. Why was the experience so different with a D-pad, even with the thumb-sticks?

In essence, the physical side of the game, the bodily sync that enables the very experience (which is basically a feedback loop), was gone. How could the controller contribute to this?

A classic joystick, and the rather quirky Atari one maybe even more so, isn't just operated by two fingers of a hand. It requires action of the whole arm, it involves the entire body. This, the whole body being in the interaction, is also why joysticks break. In fact, a joystick is operated at jump-scare level, which is also, BTW, the fastest we can do. Using a joystick, we're physically involved, forming a feedback loop at the level of our whole body, basically reacting and interacting at jump-scare level and speeds. Obviously, this isn't comparable to the distanced, rather slow kind of interaction which we can manage by our fine motor functions.

Consider how important rhythm is to a game. A game has been tested over and over again during its often years-long development cycle, and it has been finely adjusted and crafted to establish a suitable experience, as intended by the author(s). Allegedly, establishing the specific rhythm and flow of a pinball game, which is still a physical thing, takes the better part of its development phase, and it's also where the mastery of the designer shows. The rhythm and flow, the very kind of bodily sync with the game, is really its purpose and essence.

Another example would be Cosmic Ark (Imagic, 1982). It's questionable whether this is a traditional video game at all. If it is, it's a rather pointless one. There are two screens, one with a giant flying saucer (you) having to fend off a few meteors by moving the stick in the respective direction. You don't even have to press the button to fire! Then, there's another screen, where the ship lowers and you may release a tiny saucer at its bottom, go down to what is allegedly the surface of an alien world to abduct (no, "rescue") two little pixel creatures. This is tricky, well done, and beyond cute. But soon, it's time to go up again, back to the first screen, and do this over and over again. There isn't a specific reward for managing the rather tricky endeavor of picking up any of the wiggling pixel creatures with your abduction beam, beyond the scene being well done and cute. So it must be about the first screen? But this is just basic interaction, on toddler level at best, with big, blocky graphics? What is this game all about?

Interestingly, the rewarding factor of the game is, indeed, in the first screen. It's the kind of sync that is required, where you just intuitively tap into the game, operating the joystick in the very rhythm of the game, where the feedback and reward is more on the level of sounds, as in a sequence of tones that produces a closure. The first screen is about the gaming zone, and it's probably one of the trippiest Atari games there are, interleaved with the more challenging abduction screens. The real difficulty in mastering the game (which is just going on, forth and back, in and out of the zone) is about switching moods. Why am I talking about this? Because it's hard to imagine that the game could have come into existence without joysticks and the kind of feedback loop they facilitate. And it's also hard to imagine that such a game could be properly experienced and enjoyed while remotely tapping a D-Pad.

This kind of interaction, I've been talking about, came eventually to an end, when Nintendo decided, they would better go with a controller that was not apt to break, by this also revising the entire video gaming experience as it had been known before. But here, we're essentially talking about before this seminal break and the switch to modern gaming experiences. It's about games, which had been invented for and tuned to a completely different setup, different reaction times, a different kind of involvement. So, if anyone should tell you that Atari games are best played with a Sega Genesis controller, run. (If it's a YouTube channel, unsubscribe. There's no way this reviewer may know what s/he is talking about.)

Why It Matters

Returning to the topic of playtesting, what kind of experience are we to target? The original setup, including a real console in hardware, original joysticks and a TV which can still decode analog RF signals, best with a CRT screen? This would hardly match 5% of the audience. What's more, I do not have a second matching joystick and still no original console myself. The second joystick is important, since the game is all about provoking and moderating the interaction between two human players. With each of the possible setups, the parameters vary. We could just guess which would be the best fit and match for any of these. For all we know, it will be a different game with different setups and different human temperaments at the respective game controllers. Also, I'm out of time. So, I guess, it's right as is.

— Hey, I just finished my first Atari game!


Norbert Landsteiner
April 2018 (write-up May 1st), Vienna, Austria
www.masswerk.at – contact me.

— This series is part of Retrochallenge 2018/04. —
