Tao Te KaChing
Workin' the cash register of the Great Tao

Programming the Atari 2600, and Me - Part 4

Ok, our last post showed us our first piece of Atari 2600 code: the "kernel" – what the Stella guide calls the main display loop.  In this post we'll examine our initial kernel code and the modifications we made at the end of the last post, then try adding a sprite or two.  Baby steps.  We need to hammer into our heads this synchronization of our code with the TV.  With the 2600, it's all apparently about timing...

So, running our initial kernel in the Stella emulator should have produced the following:

kernel_v_0 

Nothing to scream about, except WE HAVE A RUNNING 2600 "GAME"!!!  WHOO-HOO!  Let's look at the code:


processor 6502
include "vcs.h" ; DASM pseudo-op to include mnemonics for TIA / RIOT registers

seg 
org $F000 ; our ROM starts at the beginning of the second 4K block

The processor pseudo-op selects the instruction mnemonic set for the target machine.  We specify 6502 as our target; the 2600's 6507 is the same core with the same instruction set, just in a smaller package with fewer address lines (13, so 8K of address space) and no interrupt pins.

include "vcs.h" works much like the #include preprocessor directive in C or C++.  The contents of vcs.h are added in here before the code is assembled.  I won't show the code in vcs.h here, but I highly recommend just taking a look at it.  You'll see that a lot of the work of providing mnemonics to our registers is taken care of for us already so we can work alongside the information we get from the Stella guide.

The seg pseudo-op tells the assembler you are starting a section of contiguous material (code and/or data).  If a .U extension is specified on segment creation, the segment is an uninitialized segment.  This in conjunction with the following org pseudo-op sets the block of memory we'll be assembling to, until another org pseudo-op changes the address.  Ergo, we've just said our assembled machine code should start from location $F000 onward.


label_Reset
LDA #150
STA COLUBK ; set our background color to a nice blue

Here we're simply setting the background color to decimal value 150, or hex 96 (I like using decimals rather than hex for value literals in the code – easier to distinguish from addresses, etc).  A very good color chart for NTSC, PAL, and SECAM systems is here.  Here's the NTSC for your viewing pleasure:

ntsc-color-map

The HUE is our first hex digit (the high nibble), the LUM our second (the low nibble), so peach would be hex 3F (63 decimal) and dark brown F2 (242 decimal).  The full chart is at the end of the post for reference.
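To double-check the hex/decimal conversions, here is a quick Python sketch (Python used purely as a calculator; split_color is a made-up helper name, not something from vcs.h or the Stella guide) that pulls the hue and luminance digits out of a color value:

```python
def split_color(value):
    """Split a color byte into (hue, lum): its high and low hex digits."""
    if not 0 <= value <= 255:
        raise ValueError("color must fit in one byte")
    hue = value >> 4      # upper four bits: the HUE digit
    lum = value & 0x0F    # lower four bits: the LUM digit
    return hue, lum

# The three colors mentioned in the text.
for name, value in [("background blue", 150), ("peach", 0x3F), ("dark brown", 0xF2)]:
    hue, lum = split_color(value)
    print(f"{name}: decimal {value} = hex {value:02X} -> hue {hue:X}, lum {lum:X}")
```

So our background value of 150 is hue 9, luminance 6 on the chart.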

So, by setting the COLUBK register, we've told the 2600 that when "writing" our scan lines, that is the color we'd like to have.  The TIA does the rest of turning that into the actual video signal.  So, now we get to our "kernel":


; --------------------------- our "kernel" starts here
label_Frame

; --------------------------- do the v-sync (Stella, §3.3)
label_Vsync
LDA #2
STA VSYNC ; enable D1 bit at VSYNC to start vertical sync

STA WSYNC
STA WSYNC
STA WSYNC ; our 3 scanlines

LDA #0
STA VSYNC ; disable D1 bit to finish vertical sync


Ok, so first, label_Frame is my indicator that the current frame "rendering" is beginning here.  The following comment tells us where to look in the Stella guide for information on the first thing we need to do: a vertical sync.  The guide tells us that this is accomplished by enabling the D1 bit for the VSYNC register, waiting three scanlines, then disabling same said bit.  The lovely vcs.h include already has our mnemonics mapped, so we don't need to worry about the specific addresses we're writing to.  label_Vsync marks where we begin this process.  We LoaD our Accumulator with a value of 2 (our D1 bit enabled) and then STore the Accumulator at the VSYNC address.  Done.  The TIA doesn't care what value we store in the WSYNC register, just that we did so; by storing to it, we basically say "wait now until the scanline is done and has returned to the start of the next scanline before continuing."  So we do three of those, just as the guide says, then we clear our VSYNC D1 bit.  Done.  Easy.  Gravy.

Technically, this is kind of sloppy.  For setting the D1 bit, we might want to use:


label_Vsync
LDA VSYNC
ORA #2 ; make sure D1 bit is set
STA VSYNC ; enable D1 bit at VSYNC to start vertical sync

STA WSYNC
STA WSYNC
STA WSYNC ; our 3 scanlines

LDA VSYNC
AND #253 ; AND-out the D1 bit
STA VSYNC ; disable D1 bit to finish vertical sync

Or better, do a TAX or TAY to "save" our accumulator, and TXA or TYA respectively when we're finished.  This way we don't lose its state during this vertical sync operation.  But this anal-ness on my part is both unnecessary and costly.  Worse, the read-modify-write version doesn't even do what it looks like: VSYNC is a write-only TIA register, and reading from its address actually reads CXM0P, a collision register (same address, just a mnemonic difference for "readability"...we'll get to collisions later).  I'm not worried about collision-detection at this time, so reading that address isn't necessary.  Also, a TAX/TXA or TAY/TYA combo will cost us 4 additional machine cycles, on top of the 6 extra cycles "wasted" in the above changes.  Either way, that on the Atari VCS platform is expensive.  We'll see just how expensive when we start trying to simply position our sprites.  At any rate, for our simple purposes, the initial code works just fine and is minimal.  Let's look at the vertical blank:
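For anyone who wants to check those cycle figures, here is a small Python tally. It is only a sketch: the table holds standard 6502 timings for just the instruction and addressing-mode pairs used in the two vertical sync listings, and the names CYCLES, cost, SIMPLE and so on are invented for this example. It counts cycles spent executing instructions, not the time the CPU sits halted after a WSYNC write.

```python
# Standard 6502 cycle counts for the instruction/addressing-mode pairs used
# in the v-sync listings ("imm" = immediate, "zp" = zero page, "imp" = implied).
CYCLES = {
    ("LDA", "imm"): 2, ("LDA", "zp"): 3, ("STA", "zp"): 3,
    ("ORA", "imm"): 2, ("AND", "imm"): 2,
    ("TAX", "imp"): 2, ("TXA", "imp"): 2,
}

def cost(program):
    """Total machine cycles spent executing the listed instructions."""
    return sum(CYCLES[op] for op in program)

WSYNC_X3 = [("STA", "zp")] * 3   # the three STA WSYNC writes

# LDA #2 / STA VSYNC / 3x STA WSYNC / LDA #0 / STA VSYNC
SIMPLE = [("LDA", "imm"), ("STA", "zp")] + WSYNC_X3 + [("LDA", "imm"), ("STA", "zp")]

# LDA VSYNC / ORA #2 / STA VSYNC / 3x STA WSYNC / LDA VSYNC / AND #253 / STA VSYNC
READ_MODIFY_WRITE = (
    [("LDA", "zp"), ("ORA", "imm"), ("STA", "zp")]
    + WSYNC_X3
    + [("LDA", "zp"), ("AND", "imm"), ("STA", "zp")]
)

SAVE_RESTORE = [("TAX", "imp"), ("TXA", "imp")]

print("simple version:", cost(SIMPLE), "cycles")
print("extra for read-modify-write:", cost(READ_MODIFY_WRITE) - cost(SIMPLE))
print("extra for TAX/TXA:", cost(SAVE_RESTORE))
```

The differences come out to the 6 and 4 cycles quoted above.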


; --------------------------- do the v-blank (Stella, §3.3)
label_Vblank
LDA #2
STA VBLANK ; enable D1 bit at VBLANK to start vertical blank

repeat 37
STA WSYNC ; do 37 scanlines using DASM "REPEAT" pseudo-op
repend 

LDA #0
STA VBLANK ; disable D1 bit at VBLANK to finish vertical blank

Really, this is identical to the vertical sync block above, except we've used the DASM pseudo-ops repeat/repend to automatically duplicate the STA WSYNC instruction 37 times for our required 37 scanlines.  Our scanlines for the actual display and the overscan are even easier.  Per the Stella guide, the TIA is just looking for 192 scanlines for the display and 30 for the overscan, which we do thusly:


; --------------------------- do the picture lines (Stella, §TELEVISION PROTOCOL)
label_Picture
repeat 192
STA WSYNC ; do 192 scanlines using DASM "REPEAT" pseudo-op
repend 

; --------------------------- do the overscan (Stella, §TELEVISION PROTOCOL)
label_Overscan
repeat 30
STA WSYNC ; do 30 scanlines using DASM "REPEAT" pseudo-op
repend 

; --------------------------- do it all over again, 60 times per second!
JMP label_Frame

And as we can see, per the Stella guide, we're done after we finish the overscan scanlines, so we go right back to rastering the next frame using our JuMP to the label_Frame we defined earlier.  Remember, per our specs, the 2600 will raster 60 frames per second.  Smooth.  Oh yeah.
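As a sanity check on the frame structure, the four blocks should add up to a full NTSC frame.  A rough Python sketch of that arithmetic follows; the scanline counts and the 76 cycles per scanline are the Stella guide figures used in this series, while the CPU clock value is an outside assumption (the commonly quoted NTSC figure of roughly 1.19 MHz), not something stated in this post:

```python
# Scanlines per block, as written in the kernel.
SCANLINES = {"vsync": 3, "vblank": 37, "picture": 192, "overscan": 30}

CYCLES_PER_SCANLINE = 76      # machine cycles per scanline, per the Stella guide
CPU_HZ = 3_579_545 / 3        # ASSUMPTION: NTSC 2600 CPU clock, about 1.19 MHz

total_lines = sum(SCANLINES.values())
cycles_per_frame = total_lines * CYCLES_PER_SCANLINE
frames_per_second = CPU_HZ / cycles_per_frame

print(f"{total_lines} scanlines per frame")
print(f"{cycles_per_frame} machine cycles per frame")
print(f"{frames_per_second:.2f} frames per second")
```

Under that clock assumption the result lands just a hair under the nominal 60 frames per second.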

Ok, so what's this stuff at the end for if we're "all done"?


org $FFFA

.word label_Reset ; NMI
.word label_Reset ; RESET
.word label_Reset ; IRQ

end 

There are actually several things going on here.  First, notice we've fast-forwarded to the last 6 bytes of our 4K block ($F000 to $FFFF being 4096 bytes, or 4K).  Then we set those last 6 bytes, and then end our segment.  Remember, the seg pseudo-op tells DASM we have a contiguous block of data, ergo the final .bin will be 4K exactly.  We could change it to:


...
; --------------------------- do it all over again, 60 times per second!
JMP label_Frame
end 
seg 
org $FFFA

.word label_Reset ; NMI
.word label_Reset ; RESET
.word label_Reset ; IRQ

end 

and we'd end up with (or I did, anyways) a 587 byte file that the Stella emulator doesn't recognize as a valid ROM.  So that's how the seg and end pseudo-ops work in DASM.  Other assemblers have the same ops, just named differently, so check your assembler's documentation.

Anyway, so those last 6 bytes...what the...?  Notice it's really three copies of the memory location referred to by our label_Reset label.  That's all the way back before the kernel when we set the background color.  These 6 bytes are the NMI, Reset, and IRQ vectors[1] respectively for our 6507 processor.  They tell the processor where to jump in the event of, respectively, a non-maskable interrupt, reset, and interrupt request.  I say just leave them like that – referencing the start of your ROM ($F000).  Do they need to be set?  If we look at (IMHO) two of the great 2600 games of all time – Pitfall! and River Raid – Pitfall! has the NMI vector set to zero while River Raid has it set to some weird address or is actually using those bytes for data.  Both have the IRQ set to zero.  Both have the Reset going to the start of the ROM.  So IMHO, if they agree the Reset points to the start of our code, but don't care about the other two, done.  'Nuff said, methinks...
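To picture where those vectors sit in the finished file, here is a rough Python sketch that lays out a 4K image by hand.  It is an illustration only, not what DASM does internally: build_rom is a made-up helper, and the fill value of 0 for the unused bytes is an assumption.  The one 6502 fact it leans on is that a 16-bit address is stored low byte first.

```python
ROM_BASE = 0xF000   # address of the first ROM byte, matching "org $F000"
ROM_SIZE = 0x1000   # 4K
VECTORS = {"NMI": 0xFFFA, "RESET": 0xFFFC, "IRQ": 0xFFFE}

def build_rom(code, reset_addr=ROM_BASE, fill=0x00):
    """Return a 4K image: code at the start, three vectors in the last 6 bytes.

    fill is an ASSUMED padding value for the unused bytes in between.
    """
    if len(code) > VECTORS["NMI"] - ROM_BASE:
        raise ValueError("code would overlap the vectors")
    rom = bytearray([fill]) * ROM_SIZE
    rom[:len(code)] = code
    for addr in VECTORS.values():
        offset = addr - ROM_BASE
        # 6502 addresses are stored low byte first.
        rom[offset:offset + 2] = reset_addr.to_bytes(2, "little")
    return bytes(rom)

# LDA #150 / STA COLUBK, hand-assembled (COLUBK is TIA address $09).
rom = build_rom(bytes([0xA9, 0x96, 0x85, 0x09]))
print(len(rom), "bytes; last six:", rom[-6:].hex())
```

Each of the three vectors ends up as the byte pair 00 F0, which the processor reads back as $F000.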

Ok, so with this all in mind, what significant changes were made to accomplish our second kernel-of-joy from the last post:

kernel_v_1

Man, just watching it was nostalgia galore – a flashback to my Atari 800 days.  RESCUE ON FRACTALUS IS THE GREATEST GAME EVER!!!

...sorry.  I digress.

To me, it was impressive how little code had to change to achieve this.  Instead of repeating our WSYNC 192 times for the display, we used a counter to track this, and set our COLUBK for each scanline.  The starting COLUBK value was decreased each frame to give the color movement.  First, we create our variable atColor to store the starting color for drawing a frame:


...
org $F000 ; our ROM starts at the beginning of the second 4K block

atColor = $80

label_Reset
LDA #255
STA atColor

; --------------------------- our "kernel" starts here
label_Frame
...

Next, we need to prepare our variables before starting the display loop.  We do this in the preceding vertical blank routine:


; --------------------------- do the v-blank (Stella, §3.3)
label_Vblank
LDA #2
STA VBLANK ; enable D1 bit at VBLANK to start vertical blank

repeat 36
STA WSYNC
repend 

DEC atColor
LDY atColor
LDX #192
STA WSYNC

LDA #0
STA VBLANK ; disable D1 bit at VBLANK to finish vertical blank

So we decrease our atColor by 1 to provide the color shift.  Note, if atColor is 0 when we call DEC, it will roll back to 255, clear the zero flag in the 6507's status register, and enable the status register's negative flag.  We ignore the negative flag and just let the value keep cycling through.  We then prep our counter to track how many scanlines we've "drawn" using the X index register.
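If the wraparound and flag behavior is hard to picture, this little Python model of an 8-bit decrement may help.  It is only a sketch of the rule described above (dec8 is an invented name), not an emulator:

```python
def dec8(value):
    """Model a 6502 DEC on one byte: return (result, zero_flag, negative_flag)."""
    result = (value - 1) & 0xFF        # wrap within 8 bits, so 0 becomes 255
    zero = result == 0                 # Z is set only when the result is 0
    negative = bool(result & 0x80)     # N mirrors bit 7 of the result
    return result, zero, negative

for start in (2, 1, 0, 255):
    result, zero, negative = dec8(start)
    print(f"DEC {start:3d} -> {result:3d}  Z={int(zero)} N={int(negative)}")
```

The negative flag is simply bit 7 of the result, which is why it comes on for anything from 128 to 255 and why we can safely ignore it here.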

Notice we only repeat our vertical blank WSYNCs 36 times now; however, we call another one after setting our states above to give our full 37 for a vertical blank.  This is our first introduction to cycle counting.  The DEC atColor instruction costs us 5 machine cycles, the LDY atColor another 3 cycles, and the LDX #192 an additional 2 cycles.  So a total of 10 machine cycles just for those three instructions.  Per the Stella guide, a scanline is the equivalent of 76 machine cycles, so we have 66 left on this scanline (3 of which the closing STA WSYNC itself will eat) to do more work.  That also means that the following works just as well:


; --------------------------- do the v-blank (Stella, §3.3)
label_Vblank
LDA #2
STA VBLANK ; enable D1 bit at VBLANK to start vertical blank

repeat 36
STA WSYNC
repend 

DEC atColor
LDY atColor
LDX #192

repeat 33
NOP
repend 

LDA #0
STA VBLANK ; disable D1 bit at VBLANK to finish vertical blank

A NOP (NO oPeration) instruction costs 2 machine cycles, ergo 33 NOPs equals our remaining 66 machine cycles.
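The padding arithmetic is easy to get wrong by one instruction, so here it is as a quick Python check.  This is a sketch with invented names (nops_to_fill, setup); the cycle counts are the ones quoted above:

```python
CYCLES_PER_SCANLINE = 76   # per the Stella guide
NOP_CYCLES = 2             # one NOP costs 2 machine cycles

def nops_to_fill(cycles_used, line_cycles=CYCLES_PER_SCANLINE):
    """How many NOPs pad out the rest of a scanline after cycles_used cycles."""
    remaining = line_cycles - cycles_used
    if remaining < 0:
        raise ValueError("already past the end of the scanline")
    if remaining % NOP_CYCLES:
        raise ValueError("an odd number of cycles can't be filled with NOPs alone")
    return remaining // NOP_CYCLES

# The three setup instructions from the vertical blank block.
setup = {"DEC atColor": 5, "LDY atColor": 3, "LDX #192": 2}
used = sum(setup.values())

print(f"{used} cycles used, {CYCLES_PER_SCANLINE - used} left, "
      f"{nops_to_fill(used)} NOPs needed")
```

Note the odd-remainder case: since a NOP is 2 cycles, a leftover of, say, 65 cycles could not be filled with NOPs alone.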


; --------------------------- do the picture lines (Stella, §TELEVISION PROTOCOL)
label_Picture
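INY ; next color (wraps from 255 back to 0)
STY COLUBK ; use it as this scanline's background color
STA WSYNC ; wait for the end of this scanline
DEX ; one fewer scanline to go
CPX #0 ; not strictly needed: DEX already sets the zero flag
BNE label_Picture ; loop until all 192 scanlines are done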

We'd set our Y index register at the vertical blank phase to the color value we want to start with.  So for each of the 192 scanlines, we increase it, letting it roll over from 255 back to 0, then set it as our scanline's color, "finish" our scanline with a WSYNC call, decrease our counter (the X index register), and if our counter isn't yet zero, do this all again until it is.
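To see what the loop actually writes to COLUBK, here is a small Python model of it.  This is a sketch only (scanline_colors is an invented name); it mirrors the INY / STY COLUBK pair once per scanline and ignores timing entirely:

```python
def scanline_colors(at_color, lines=192):
    """Values stored in COLUBK, one per scanline, for a frame that starts
    with Y = at_color (the value left in atColor after the DEC)."""
    colors = []
    y = at_color
    for _ in range(lines):
        y = (y + 1) & 0xFF    # INY, wrapping from 255 back to 0
        colors.append(y)      # STY COLUBK
    return colors

first_frame = scanline_colors(254)    # atColor starts at 255, DEC makes it 254
second_frame = scanline_colors(253)   # one frame later

print("frame 1, top lines:", first_frame[:4])
print("frame 2, top lines:", second_frame[:4])
```

Comparing the two frames shows the same color sequence starting one scanline later in the second frame, which is the movement we see on screen.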

Done.

If you're wondering where these "costs" per assembly instruction are coming from, I assure you I did not make them up.  I used a random number generator...

Just kidding.  A good list of the 6502 instruction set can be found here.  The "TIM" column next to the different instruction call types indicates the number of cycles the call costs.  To make our 2600 programming lives that much more fun, several of these calls have conditional costs depending on several factors.  Sweet.  Yes, I'm finding that unlike "writing", say, C#, Java, or C++ code, programming the 2600 is more like sculpture.  Anyways, there are a gazillion 6502 instruction set tables out there on the internet just waiting for you to find them.  The Second Book of Machine Language has great descriptions of how and what the 6502 instructions do.  You can find this particular resource here and instruction set part here.

Ok, that's it for this post.  Next, we'll look a little more at cycle counting and troubleshooting our 6502 assembly.

~ZagNut

[1] A good description of the purpose of the IRQ vector is here.  Basically, to shamelessly purloin:

Three things happen when the 6502 executes a BRK instruction.
  • The program counter (PC) is incremented by 2 and is pushed onto the stack (thus the processor treats BRK as a two-byte instruction).
  • The BREAK bit (B) in the processor status word (PSW) is set to 1, and the PSW is pushed onto the stack.
  • The 6502 transfers control to the address stored in the highest locations in memory (FFFE and FFFF), the IRQ interrupt vector. This address must indicate the starting location of the interrupt-service routine.
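The third item is the one we're concerning ourselves with here.  The 6507 doesn't bring the 6502's IRQ and NMI lines out to pins, so on the 2600 no hardware interrupt can ever fire; only a BRK instruction will send the processor through the vector at FFFE and FFFF.  That is likely why Pitfall! and River Raid don't care.  However, it can't hurt to have it set to the start of our ROM in any case.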