1. INTRODUCTION
Cycle counting's uses don't end at silly screen hacks. It is also useful
for optimizing code to fit within a vertical blank, or a scanline, or a
horizontal blank.
2. CONCEPTS OF COUNTING
Programming the Atari requires one to modify one's perceptions of space and
time, because the Atari observes some sort of Abian physics where space
is time. One frame is 1/60 of a second. One scanline is roughly 1/15,700
of a second. You get the idea. It is important to know how much code can
be executed in the amount of time it takes to draw the screen. The unit
of time we use is cycles.
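To get a feel for the budget, here is a rough sketch in Python. It assumes the commonly cited NTSC figures, which this text does not state: 76 CPU cycles per scanline and 262 scanlines per frame. The constant and function names are made up for illustration.

```python
# Assumed NTSC figures (not given in the text above).
CYCLES_PER_SCANLINE = 76
SCANLINES_PER_FRAME = 262


def cycles_per_frame():
    """CPU cycles available in one full frame, under the assumptions above."""
    return CYCLES_PER_SCANLINE * SCANLINES_PER_FRAME


def max_instructions_per_scanline(cycles_each=2):
    """Upper bound on instructions per scanline if each one costs cycles_each."""
    return CYCLES_PER_SCANLINE // cycles_each


print(cycles_per_frame())                # 19912
print(max_instructions_per_scanline())   # 38
print(max_instructions_per_scanline(4))  # 19
```

Even in the best case of nothing but two-cycle instructions, that is only 38 instructions per scanline, and real code averages well below that.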
The CPU clock runs at a somewhat slow pace compared to the TIA. The TIA
draws three pixels in the time it takes to execute one CPU cycle. A write
to WSYNC will halt the processor until the horizontal blank, which lasts
about 20 CPU cycles, after which the electron beam turns on and begins to
draw the picture once again. Therefore the X position of the electron
beam is determined like this:
X = (CYCLES - 20) * 3
where CYCLES is the number of cycles that have elapsed since the horizontal
blank. But the text I have states that registers are only read every five
cycles, so the equation must be adjusted to account for that. For now,
let's just assume that we round up to the next multiple of 15. The examples
we use will involve RESP0, because I know the rule applies to that register.
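The formula and the rounding rule can be sketched in Python. This is only a model of the approximation above, not of the real hardware: the 20-cycle blank is the "about 20" figure from the text, and I am assuming that a position already on a multiple of 15 stays where it is. The function names are hypothetical.

```python
HBLANK_CYCLES = 20    # "about 20" per the text; an approximation
PIXELS_PER_CYCLE = 3  # the TIA draws three pixels per CPU cycle


def beam_x(cycles):
    """Raw X position for a cycle count measured from the start of the blank."""
    return (cycles - HBLANK_CYCLES) * PIXELS_PER_CYCLE


def resp0_x(cycles):
    """beam_x rounded up to a multiple of 15.

    Assumption: a value already on a multiple of 15 is left alone.
    """
    x = beam_x(cycles)
    return -(-x // 15) * 15  # ceiling division, then scale back up


print(beam_x(33))   # 39
print(resp0_x(33))  # 45
print(resp0_x(25))  # 15
```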
The cycle count is especially important to RESP0, but almost all writes to
TIA registers are affected in some way by the cycle count. A player or
missile modification too late in the scanline will cause the player to shift
up one scanline as it moves away from the left side of the screen. Writes
to the playfield must be timed to occur in the center of the screen if one
wishes to produce an asymmetrical playfield. The number between these
asterisks is very important to the program, and you may find yourself
spending hours getting that number to be just what you need it to be for
your particular application.
I usually put relevant comments outside the counting column, but this isn't
relevant code so I decided to illustrate the mnemonic device I used to
determine the cycles for each instruction.
Branching instructions like BNE and BCC are easier than they seem.
All branch instructions take two cycles, plus one extra cycle if
the branch is taken, plus another extra cycle if said branch crosses
the page boundary.
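The branch rule is small enough to write down as a Python function. This is just a restatement of the rule above; the function name is made up, and the page-crossing cycle is only counted when the branch is actually taken.

```python
def branch_cycles(taken, crosses_page=False):
    """Cycles used by a branch instruction such as BNE or BCC."""
    cycles = 2              # every branch costs at least two cycles
    if taken:
        cycles += 1         # one extra cycle when the branch is taken
        if crosses_page:
            cycles += 1     # one more when the target is on another page
    return cycles


print(branch_cycles(False))                     # 2
print(branch_cycles(True))                      # 3
print(branch_cycles(True, crosses_page=True))   # 4
```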
Let's just say for now that the decision of whether to branch should be
generally constant throughout at least the time-sensitive portions of your
routine.
The 6502 has a family of "fast math" opcodes that have similar
characteristics, and consequently they have the same cycle counts. These
fast math opcodes do little more than alter registers or flags using bits
from memory. This family consists of ADC, AND, BIT, CMP, CPX, CPY, EOR,
LDA, LDX, LDY, ORA, and SBC. Not all of these instructions have all of
the following address modes, but these rules apply to whichever modes
are available. I will use ADC as an example.
Also note that Zero Page,Y addressing is only available for LDX and STX.
The instructions STA, STX, and STY have the same timing as fast math
instructions, but in the case of Absolute,XY and (Indirect),Y addressing,
the extra cycle is always added.
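As a reference, here is a Python lookup for this family. The cycle figures are the standard documented 6502 timings as I understand them, not numbers taken from this text, and the mode names and function are invented for the sketch. The second field marks the modes where an extra cycle can appear: on a page crossing for the fast math opcodes, and always for stores.

```python
# mode name -> (base cycles, extra cycle possible)
FAST_MATH = {
    "immediate":   (2, False),
    "zero_page":   (3, False),
    "zero_page_x": (4, False),
    "zero_page_y": (4, False),  # LDX and STX only
    "absolute":    (4, False),
    "absolute_x":  (4, True),
    "absolute_y":  (4, True),
    "indirect_x":  (6, False),
    "indirect_y":  (5, True),
}


def fast_math_cycles(mode, page_cross=False, store=False):
    """Cycles for a fast math opcode, or for STA/STX/STY when store=True."""
    if store and mode == "immediate":
        raise ValueError("store instructions have no immediate mode")
    base, extra_possible = FAST_MATH[mode]
    if extra_possible and (store or page_cross):
        return base + 1  # stores always pay the extra cycle
    return base


print(fast_math_cycles("absolute_x"))                   # 4
print(fast_math_cycles("absolute_x", page_cross=True))  # 5
print(fast_math_cycles("indirect_y", store=True))       # 6
```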
These weenie instructions don't even alter memory, only registers and flags.
They are CLC, CLD, CLI, CLV, DEX, DEY, INX, INY, NOP, SEC, SED, SEI, TAX,
TAY, TSX, TXA, TXS, and TYA. They take two cycles.
There are certain instructions that take more clock cycles than simple math
instructions. Some of these instructions can work with the accumulator, but
when given an address to work with, they modify memory directly. The slow
math instructions are ASL, DEC, INC, LSR, ROL, and ROR.
ROR A ; +2 Accumulator
ROR $99 ; +5 Zero Page
ROR $99,X ; +6 Zero Page,X
ROR $1234 ; +6 Absolute
ROR $1234,X ; +7 Absolute,X
Note that when these instructions work with the accumulator, they shrink down
to two cycles and become Weenie Instructions.
The two push instructions, PHA and PHP, each take three cycles.
The two pull instructions, PLA and PLP, each take four cycles.
JSR takes 6 cycles. JMP takes 3 cycles in absolute mode, and 5 cycles
in absolute indirect mode, but absolute indirect mode is for machines
that have a kernel. RTI and RTS take 6 cycles each. But with only a
few dozen instructions available per scanline, you don't have time to
bounce all over the cartridge executing subroutines.
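A quick Python calculation shows how fast subroutine calls eat a scanline. The 76 cycles per scanline is an assumed NTSC figure that this text does not state, and the function names are hypothetical.

```python
CYCLES_PER_SCANLINE = 76  # assumed NTSC figure, not stated in the text
JSR_CYCLES = 6
RTS_CYCLES = 6


def call_overhead():
    """Cycles spent just getting into and out of one subroutine."""
    return JSR_CYCLES + RTS_CYCLES


def cycles_left_after_calls(calls):
    """Cycles left in one scanline after paying for the given number of calls."""
    return CYCLES_PER_SCANLINE - calls * call_overhead()


print(call_overhead())             # 12
print(cycles_left_after_calls(2))  # 52
```

Two calls cost 24 cycles before either subroutine has done any work.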
If you can guess what this code does, congratulations. If you can't,
I'll tell you. This code checks bit 7 (the high bit) of location $CC.
If it is set, it goes immediately to set player 0's registers, setting
its position at cycle 20. If it is clear, then the branch isn't taken so
that saves us one cycle, but fourteen more cycles are taken by NOPs,
making a net gain of 13 cycles. Now it takes 33 cycles to reset player 0.
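The arithmetic for the two paths can be checked with a few lines of Python. This models only the cycle counts described above, not the code itself; the seven NOPs are my reading of "fourteen more cycles" at two cycles per NOP, and the function name is made up.

```python
NOP_CYCLES = 2


def reset_cycle(bit_set, base_cycle=20, nop_count=7):
    """Cycle at which player 0 is reset on each path of the branch."""
    if bit_set:
        return base_cycle                  # branch taken, straight to the reset
    saved = 1                              # an untaken branch is one cycle cheaper
    padding = nop_count * NOP_CYCLES       # 7 NOPs = 14 cycles
    return base_cycle - saved + padding


print(reset_cycle(True))   # 20
print(reset_cycle(False))  # 33
```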
Make sure when putting graphics into your program, to arrange the data so
that they either NEVER cross page boundaries, or ALWAYS cross page
boundaries. As long as you can predict when that extra cycle is going to
pop up, you'll be OK. You might need to play around with the assembler
and the source code to make sure all the bytes in each graphics table
are in the same page of memory.
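One way to check a table is to compare the high byte of its first and last addresses. Here is a small Python sketch of that test; the function name and the example addresses are hypothetical, so plug in whatever your assembler's listing reports.

```python
def crosses_page(start, length):
    """True if a table of the given length spills into another 256-byte page."""
    end = start + length - 1          # address of the last byte in the table
    return (start >> 8) != (end >> 8)  # compare the page (high byte) of each


print(crosses_page(0xF0F8, 8))  # False: $F0F8-$F0FF stays in page $F0
print(crosses_page(0xF0F9, 8))  # True:  $F0F9-$F100 spills into page $F1
```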
You can see how I came to the above conclusion if we unroll the loop.
Each time through the loop where Y>0, DEY takes two cycles and BNE takes
three cycles (due to the branch). The last time through the loop, when
Y=0, the branch is not taken so the BNE only takes two cycles.
        LDY #NUM        ; +2
        ; extra code possible here
DEYBNE  DEY             ; }
        BNE DEYBNE      ; } + NUM*5-1
Note that each iteration takes 5 CPU cycles, or 15 pixels. This is as close
as it gets to perfect for our needs, since the TIA will only let you set up
a player with RESP0 on a multiple of 15.
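The NUM*5-1 figure can be double-checked by unrolling the loop in Python. This sketch assumes NUM is between 1 and 255 and that the BNE never crosses a page boundary; the function names are made up.

```python
def delay_loop_cycles(num):
    """Closed form: LDY #NUM (+2), then the loop (+NUM*5-1)."""
    return 2 + num * 5 - 1


def simulate_delay_loop(num):
    """Count cycles by stepping through LDY / DEY / BNE one pass at a time."""
    cycles = 2                   # LDY #NUM
    y = num
    while True:
        y = (y - 1) & 0xFF       # DEY
        cycles += 2
        if y != 0:
            cycles += 3          # BNE taken
        else:
            cycles += 2          # BNE not taken on the last pass
            break
    return cycles


print(delay_loop_cycles(3))    # 16
print(simulate_delay_loop(3))  # 16
```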
The X register can also be used to this end, but hey, it needs a name,
doesn't it?
CONCLUSION
Keep your code clean and tight. Make sure your display kernel routines use
the same number of scanlines no matter what happens.