Mastering Machine Code on Your ZX81
By Toni Baker

HOW TO DISASSEMBLE THE ROM

[Or How to Write an Optimised Disassembler for the ZX81]

There are three "levels" at which we may disassemble, each slightly more sophisticated than the previous. The first two levels are not all that satisfactory, but they are very easy to program.

The first "level" we have already achieved - the USR routine HLIST which we saw earlier in the book will do this for us. That is, given an address such as 0808 it will produce an output like this:

0808 57
0809 ED
080A 4B
080B 39 T
080C 40
...

and so on. This is not really disassembly, although you can of course look these bytes up in the tables at the back of the book, but it's quite a time consuming task, and you're also very likely to get lost halfway through. The second "level" is not much better, but again is quite easy to program. What I'm talking about is an output something like this:

0808 57
0809 ED4B3940
080D 79
080E FE21
...

and so on. As you can see, each instruction has its component bytes listed out to exactly the right length. This produces a very pleasing display, and there is little or no chance of you "getting lost" when actually looking these bytes up in tables. The third "level" is the one we are actually aiming at - the one everybody wants. What we'd really like is an output like this:

0808 LD D,A
0809 LD BC,(4039)
080D LD A,C
080E CP 21
...

and so on. This can be quite easy to program - simply make the computer look up the appropriate words from a table instead of doing it ourselves - however this would take up rather a large amount of space just to store the table. Around 4K in fact. The method I will describe to you will allow such a program to fit in just 1K, but be warned: it's rather difficult. There is actually a "fourth level" of disassembly, which I won't even attempt to touch, but you may like to think about. Imagine an output like this:

PRINT     LD D,A
          LD BC,(S_POSN)
          LD A,C
          CP 21
          JR Z,EXIT
          ...

As I've said, I'm not even going to touch this one. The only extra it involves is storing yet another table, this time containing all of the labels used. Let's go back a bit now to something relatively simple. Let's consider a slightly improved version of HLIST which reaches the "second level" of disassembly, and works out the length of each instruction before printing it.

All we need is a table containing just two pieces of information for each byte. These are a) the number of bytes in an instruction beginning with this byte, and b) the number of bytes in an instruction beginning with DD or FD followed by this byte. As you know, some confusion may arise over those instructions beginning with CB or ED, but we don't actually need any tables or anything to cope with these provided we remember the following rules:

All instructions beginning CB are two bytes in length.
All instructions beginning DDCB or FDCB are four bytes in length.
All instructions beginning ED are two bytes in length, except for LD BC,(pq), LD DE,(pq), LD SP,(pq), LD (pq),BC, LD (pq),DE, and LD (pq),SP, which are four bytes in length. The byte immediately after ED for these six instructions is 4B, 5B, 7B, 43, 53, or 73. In binary, all of these numbers have the form 01-- -011. No other instructions have this form.
There are no instructions beginning DDED or FDED.
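
The ED rule above comes down to one mask and one comparison. Here is a minimal sketch of it in Python, which I am using purely as a convenient notation (it is not the language of this book, and the function name is my own invention):

```python
def ed_instruction_length(second_byte):
    """Total length of an instruction whose first byte is ED."""
    # Keep bits 7, 6, 2, 1 and 0 (mask C7) and compare with 01-- -011 (43).
    if (second_byte & 0xC7) == 0x43:
        return 4          # ED, the opcode, and a two-byte address pq
    return 2              # ED and the opcode only


# The six opcodes named in the rule, then one ordinary ED instruction (NEG).
for op in (0x4B, 0x5B, 0x7B, 0x43, 0x53, 0x73):
    print(f"ED{op:02X} -> {ed_instruction_length(op)} bytes")
print(f"ED44 -> {ed_instruction_length(0x44)} bytes")
```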

Thus we need a table containing a very small amount of information relating to each byte. Firstly, those instructions which do not begin DD, ED, or FD can only be one, two, or three bytes in length. This means that to store the required information we only need two bits. Secondly those instructions which begin DD or FD can only be two, three, or four bytes in length, so ignoring the DD or FD itself this leaves one, two, or three bytes. Again we need only two bits. This makes four bits altogether, and we can thus represent the appropriate lengths for each byte by a single hexadecimal digit. Our program then will make use of the following table, called LENS. It should be stored such that each element of the table has the same high part of its address:

LENS      DEFB      5F 55 55 A5 55 55 55 A5
                    AF 55 55 A5 A5 55 55 A5
                    AF F5 55 A5 A5 F5 55 A5
                    AF F5 99 E5 A5 F5 55 A5
                    55 55 55 95 55 55 55 95
                    55 55 55 95 55 55 55 95
                    55 55 55 95 55 55 55 95
                    99 99 99 59 55 55 55 95
                    55 55 55 95 55 55 55 95
                    55 55 55 95 55 55 55 95
                    55 55 55 95 55 55 55 95
                    55 55 55 95 55 55 55 95
                    55 FF F5 A5 55 FE FF A5
                    55 FA F5 A5 55 FA F5 A5
                    55 F5 F5 A5 55 F5 FA A5
                    55 F5 F5 A5 55 F5 F5 A5

As you can see, there are sixteen rows, and sixteen hex digits in each row. Those instructions beginning with DD or FD which do not exist, such as DD00, have been simply assigned the appropriate number of bytes as if the DD/FD were not there.
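
To see how one hex digit of LENS carries two lengths, here is a small Python sketch (Python being only a modelling notation here, and split_digit a name of my own). It assumes, as the program below reads the table, that the low two bits apply when there is no DD/FD prefix and the high two bits apply after one.

```python
def split_digit(digit):
    """Return (plain_length, prefixed_length) for one hex digit of LENS.

    The low two bits give the length with no DD/FD prefix. The high two
    bits give the length after a DD/FD prefix, not counting the prefix.
    """
    return digit & 0x03, (digit >> 2) & 0x03


for d in (0x5, 0x9, 0xA, 0xE, 0xF):
    plain, prefixed = split_digit(d)
    print(f"{d:X}: plain={plain} prefixed={prefixed}")
```

So the digit E stored for byte 36 says that LD (HL),V is two bytes long, while after DD or FD a further three bytes follow the prefix.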

The following program will "disassemble" to a string of bytes of the right length. It assumes that the table LENS exists, and it assumes that a subroutine HPRINT exists which prints the contents of the A register in hexadecimal without corrupting the other registers. This subroutine was in fact given earlier on in the book.

[Thunor: I recommend using chapter11-hexld3d.p so that you can make use of the available HPRINT and address input mechanism. Therefore I've taken the liberty of adding a couple of instructions near the top and updated a couple of relative jumps lower down so that you can achieve this.]

[Thunor: I should point out that this program works if you are using it to view valid code, but it doesn't deal with invalid prefixed instructions. Because of this I chose to build upon the author's work and create something more suitable; my enhanced version is listed here.]

2A954A    LLIST     LD HL,(ADDRESS)   [Thunor: I've added this to enable
                                      address selection.]

2B        START     DEC HL            HL is just the address from which
23        NEXT      INC HL            we are disassembling.

22954A              LD (ADDRESS),HL   [Thunor: I've added this so that
                                      typing CONT works.]
3E76                LD A,76
D7                  RST 10            Print a newline.
7C                  LD A,H
CDhprint            CALL HPRINT       Print H in hex.
7D                  LD A,L
CDhprint            CALL HPRINT       Print L in hex.
AF                  XOR A
D7                  RST 10            Print a space.
0E00                LD C,00           C is just a flag to let us know
                                      whether or not an instruction
                                      begins with DD or FD.
7E        BYTE      LD A,(HL)         Obtain the byte to be disassembled.
FEDD                CP DD             Does it begin with either DD or FD?
2804                JR Z,DDFD
FEFD                CP FD
2007                JR NZ,NORM
CDhprint  DDFD      CALL HPRINT       If so, print "DD" or "FD" and look
23                  INC HL            at the next byte.
0C                  INC C             Change the flag C accordingly.
18F0                JR BYTE           Continue with next byte.
FEED      NORM      CP ED             Does the instruction begin ED?
201A                JR NZ,NOTED
CDhprint            CALL HPRINT       If so, print "ED" and look at the
23                  INC HL            next byte.
7E                  LD A,(HL)
E6C7                AND C7            Is it of the binary form 01-- -011?
FE43                CP 43
2004                JR NZ,ONE
0603                LD B,03           B counts the number of bytes to be
1802                JR THREE          printed after the byte ED.
0601      ONE       LD B,01

7E        THREE     LD A,(HL)
CDhprint            CALL HPRINT       Print the next B bytes.
23                  INC HL            [Thunor: I've rearranged this part
10F9                DJNZ THREE        as the AND test above destroyed
                                      regA e.g. ED5B9540 became ED43...]

18BE                JR START          [Thunor: I've modified this because
                                      the INC HL above was skipping over
                                      an address - HL was being
                                      incremented again at NEXT.]

E5        NOTED     PUSH HL           Temporarily store HL.
CB2F                SRA A             Divide A by two.
F5                  PUSH AF           Store the carry flag.

E67F                AND 7F            [Thunor: I've added this as SRA
                                      duplicates bit 7 into bit 6, and so
                                      bytes >= 80h wouldn't halve
                                      properly!]

C6lens-low          ADD A,LENS-low    Find the required position in the
6F                  LD L,A            table.
26lens-high         LD H,LENS-high
F1                  POP AF            Retrieve the carry flag.
7E                  LD A,(HL)
3804                JR C,DIG2         Use the carry flag to decide on
1F                  RRA               which digit from the table will be
1F                  RRA               used.
1F                  RRA
1F                  RRA
0D        DIG2      DEC C             Use C to decide which two bits
2002                JR NZ,OK          to use.
1F                  RRA
1F                  RRA
E603      OK        AND 03            Put this number in B to use as
47                  LD B,A            a count.
E1                  POP HL            Retrieve the address of the byte to
2B                  DEC HL            be disassembled.
23        NXBYT     INC HL
7E                  LD A,(HL)
CDhprint            CALL HPRINT       Print B bytes in hex.
10F9                DJNZ NXBYT
1899                JR NEXT           Continue with next byte.
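
If you would like to check the logic before typing in any hex, the routine above can be modelled in a few lines of Python. This is only a sketch under my own naming (instruction_length is not a routine from this book); the table is the LENS table given earlier, and the variable prefixes plays the part of the C register.

```python
LENS = "".join("""
5F 55 55 A5 55 55 55 A5
AF 55 55 A5 A5 55 55 A5
AF F5 55 A5 A5 F5 55 A5
AF F5 99 E5 A5 F5 55 A5
55 55 55 95 55 55 55 95
55 55 55 95 55 55 55 95
55 55 55 95 55 55 55 95
99 99 99 59 55 55 55 95
55 55 55 95 55 55 55 95
55 55 55 95 55 55 55 95
55 55 55 95 55 55 55 95
55 55 55 95 55 55 55 95
55 FF F5 A5 55 FE FF A5
55 FA F5 A5 55 FA F5 A5
55 F5 F5 A5 55 F5 FA A5
55 F5 F5 A5 55 F5 F5 A5
""".split())                           # 256 hex digits, one per opcode


def instruction_length(mem, addr):
    """Number of bytes in the instruction starting at mem[addr]."""
    start = addr
    prefixes = 0                       # plays the part of the C register
    while mem[addr] in (0xDD, 0xFD):
        prefixes += 1
        addr += 1
    if mem[addr] == 0xED:
        tail = 3 if (mem[addr + 1] & 0xC7) == 0x43 else 1
        return addr + 1 + tail - start
    digit = int(LENS[mem[addr]], 16)
    if prefixes == 1:                  # DEC C gave zero: use the high bits
        digit >>= 2
    return addr + (digit & 0x03) - start


code = bytes.fromhex("57ED4B394079FE21")
addr = 0
while addr < len(code):
    n = instruction_length(code, addr)
    print(f"{0x0808 + addr:04X} {code[addr:addr + n].hex().upper()}")
    addr += n
```

The eight bytes used at the end are the ones from the "second level" example at the start of this chapter, so the four lines printed should match that display.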

Download available for 16K ZX81 -> chapter16-lenslist.p.

[Set RAMTOP to 4A00 (18944) with POKE 16389,74/NEW. Type RUN to use LENS LIST. Addresses used: 4A82 to 4B77 is occupied by HEXLD3D, LENS is 4A00, HPRINT is 4A82 and LLIST is 4B78 (19320). To test it, list 4B78 and match the output to the listing above. For a more thorough test, list 4A82 and match it with HEXLD3 in appendix one, remembering that it's been relocated from 4082 to 4A82.]

Now we ascend to the "third level" - REAL disassembly in other words. However, I am not going to write the program for you this time round - you'll have to do it by yourself. I will explain precisely what it is you have to do in order to make a 1K disassembler, but the actual program itself must be your creation.

DISASSEMBLING THE ROM

The following is an algorithm which will enable you to disassemble the hex codes into assembly, that is to change, for example, 69 to LD L,C, or CB7E to BIT 7,(HL). One way would be to list a vast table - such as I have included in the appendices - but while alright for human beings it lacks the elegance of a well thought out computer program. The data alone would occupy around 4K. This algorithm will enable you to write your own machine language program occupying significantly less - 2K or even 1K all told depending on how efficient your program is.

In this algorithm, the following conventions will be used:

c(0) means NZ    n(0) means 0     q(0) means BC
c(1) means Z     n(1) means 1     q(1) means DE
c(2) means NC    n(2) means 2     q(2) means Y
c(3) means C     n(3) means 3     q(3) means AF
c(4) means PO    n(4) means 4
c(5) means PE    n(5) means 5
c(6) means P     n(6) means 6
c(7) means M     n(7) means 7

r(0) means B     s(0) means BC    x(0) means ADD A,
r(1) means C     s(1) means DE    x(1) means ADC A,
r(2) means D     s(2) means Y     x(2) means SUB
r(3) means E     s(3) means SP    x(3) means SBC A,
r(4) means H                      x(4) means AND
r(5) means L                      x(5) means XOR
r(6) means X                      x(6) means OR
r(7) means A                      x(7) means CP

Define two variables, CLASS and INDEX, and initially let both of them equal zero.

Write the byte being disassembled in binary, and split it into three parts; F, G, and H. F consists of bits 7 and 6, G of bits 5, 4, and 3, and H of bits 2, 1, and 0. Thus to disassemble the byte 69 (binary 0110 1001) just split it into three parts thus: 01/101/001. In this particular case F is one, G is five, and H is one.

Next, split G into two parts; J and K; with J consisting of bits 2 and 1, and K just bit 0. If G then were binary 101 as above then split it like this: 10/1. In this case we would define J to be two, and K to be one.
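
The splitting just described is nothing more than a few shifts and masks. As a sketch in Python (a modelling notation only; split_fields is my own name for it):

```python
def split_fields(byte):
    """Split an opcode byte into the fields F, G, H, J and K."""
    f = (byte >> 6) & 0x03     # bits 7 and 6
    g = (byte >> 3) & 0x07     # bits 5, 4 and 3
    h = byte & 0x07            # bits 2, 1 and 0
    j = g >> 1                 # the top two bits of G
    k = g & 0x01               # the bottom bit of G
    return f, g, h, j, k


print(split_fields(0x69))
```

For the byte 69 this gives F one, G five, H one, J two and K one, as worked out by hand above.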

Set aside an area of memory called DIS. This is to contain a STRING of unknown length. How you store this string is up to you. There are two different methods you could use - either terminate the data with an end-of-data character (any character will do, FF is as good as any), or begin the area DIS with one byte representing the number of characters of data there are in the string (you only need one byte since DIS will never be more than 255 characters in length). DIS should initially be an empty string, (i.e. containing no characters at all).

The algorithm begins here.....

[Thunor: I have modified the prefixed instruction
classification code at the start of the algorithm and
moved the check for 76 (HALT) down to the F = 1 condition.

Although this algorithm recognises and classifies the many
different instruction types, it doesn't filter out invalid
prefixed instructions. The easiest way I know to validate
prefixed instructions is to use one bit to represent DDxx
and FDxx, another for EDxx with CBxx, DDCBddxx and FDCBddxx
being validated in code; you will only need a 64 byte table
for all the bits (2 * 256 / 8). The validation should be
done before breaking down the byte to be disassembled into
its FGHJK components.

Following is a useful table showing the instruction types
handled by all possible values of CLASS and INDEX:

+-------+------------------------------+
|       |           INDEX              |
+-------+--------+----------+----------+
| CLASS |  0 HL  |   1 IX   |   2 IY   |
+-------+--------+----------+----------+
|   0   | xx     | DDxx     | FDxx     |
|   1   | CBxx   | DDCBddxx | FDCBddxx |
|   2   | EDxx   |    --    |    --    |
+-------+--------+----------+----------+

If you require the byte count for the instruction before
the data byte(s) are added:
Let count = 1
If INDEX > 0 then let count = count + 1
If CLASS > 0 then let count = count + 1
Then when computing the final output, if you convert Vs or
add displacements then you should increment count for each
byte of data.

Y, X and V are place holders and are updated at the end of
the algorithm. The problem with this is that "X" is used
within some instructions and so I recommend that you use
something else such as inverse equivalents. Additionally
you might also want to consider representing VV with an
inverse W or maybe WV to identify the high and low bytes.

At the end of the algorithm before computing the final
output there's a space saving method of constructing the
strings for 16 different instructions after ED. This does
though present a problem with a missing "U" in OUTI and
OUTD and so you'll want to check for H=3 and (G=4 or G=5)
to identify and provide a solution to this.

At the start, when it says that you should start again, it
means read the next byte, split it into FGHJK and then
return to the top of the algorithm.

[Comment end.] 
If CLASS equals zero then the following applies:
[If the byte is CB then...]
    [If INDEX <> zero then point past displacement.]
    [Let CLASS equal one and start again.]
[If INDEX equals zero then...]
    [If the byte is ED then let CLASS equal two and start again.]
    [If the byte is DD then let INDEX equal one and start again.]
    [If the byte is FD then let INDEX equal two and start again.]
[If F equals zero then....]
    If H equals zero then....
        If G greater than three then let DIS equal JR c(G-4),V.
        If G less than four choose the Gth item in this list:
        NOP/EX AF,AF'/DJNZ V/JR V
    If H equals one then...
        If K is zero then let DIS equal LD s(J),VV
        If K is one then let DIS equal ADD Y,s(J)
    If H equals two then...
        Let DIS equal LD plus the Gth item in this list:
        (BC),A/A,(BC)/(DE),A/A,(DE)/(VV),Y/Y,(VV)/(VV),A/A,(VV).
    If H equals three then...
        If K is zero then let DIS equal INC s(J)
        If K is one then let DIS equal DEC s(J)
    If H equals four then let DIS equal INC r(G)
    If H equals five then let DIS equal DEC r(G)
    If H equals six then let DIS equal LD r(G),V
    If H equals seven then choose the Gth item from this list:
    RLCA/RRCA/RLA/RRA/DAA/CPL/SCF/CCF.
[If F equals one then...]
    [If the byte = 76 then let DIS = HALT.]
    [If the byte <> 76 then let DIS equal LD r(G),r(H).]
If F equals two then let DIS equal x(G) r(H).
If F equals three then....         
    If H equals 0 then let DIS equal RET c(G)
    If H equals one then...
        If K is zero then let DIS equal POP q(J)
        If K is one then choose the Jth item from this list:
        RET/EXX/JP (Y)/LD SP,Y.
    If H equals two then let DIS equal JP c(G),VV
    If H equals three then choose the Gth item from this list:
    JP VV/-/OUT (V),A/IN A,(V)/EX (SP),Y/EX DE,HL/DI/EI.
    If H equals four then let DIS equal CALL c(G),VV
    If H equals five then...
        If K is zero then let DIS equal PUSH q(J).
        If K is one then let DIS equal CALL VV.
    If H equals six then let DIS equal x(G) V.
    If H equals seven then let DIS equal RST plus the Gth item
    in this list: 00/08/10/18/20/28/30/38.
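
As a taste of how little is needed for the two middle quarters of the table, here is a Python sketch of the cases F equals one and F equals two with CLASS zero. It is an illustration under assumptions of my own: INDEX is taken to be zero, so the placeholder X is turned into (HL) straight away rather than at the end, and decode_middle is a made-up name.

```python
R = ["B", "C", "D", "E", "H", "L", "X", "A"]
X_OPS = ["ADD A,", "ADC A,", "SUB ", "SBC A,", "AND ", "XOR ", "OR ", "CP "]


def reg(i):
    """r(i), with the placeholder X already expanded for INDEX zero."""
    return "(HL)" if R[i] == "X" else R[i]


def decode_middle(byte):
    """Disassemble an unprefixed byte for which F is one or two."""
    f, g, h = byte >> 6, (byte >> 3) & 7, byte & 7
    if f == 1:
        return "HALT" if byte == 0x76 else f"LD {reg(g)},{reg(h)}"
    if f == 2:
        return X_OPS[g] + reg(h)
    raise ValueError("F must be one or two")


for b in (0x69, 0x76, 0x7E, 0x80, 0xBE):
    print(f"{b:02X} {decode_middle(b)}")
```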

If CLASS equals one then the following applies:
If F equals zero then choose the Gth item from this list:
RLC/RRC/RL/RR/SLA/SRA/-/SRL and then add r(H).
If F equals one then let DIS equal BIT n(G),r(H).
If F equals two then let DIS equal RES n(G),r(H).
If F equals three then let DIS equal SET n(G),r(H).
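
The CLASS one rules are regular enough to be sketched in a handful of lines. Again this is Python used as a notation, with INDEX assumed to be zero and decode_cb a name of my own; the "-" entry in the list is returned as None.

```python
R = ["B", "C", "D", "E", "H", "L", "(HL)", "A"]
ROTATES = ["RLC", "RRC", "RL", "RR", "SLA", "SRA", None, "SRL"]


def decode_cb(byte):
    """Disassemble the byte that follows an unprefixed CB."""
    f, g, h = byte >> 6, (byte >> 3) & 7, byte & 7
    if f == 0:
        name = ROTATES[g]
        if name is None:
            return None            # the "-" entry: nothing is listed here
        return f"{name} {R[h]}"
    return f"{('BIT', 'RES', 'SET')[f - 1]} {g},{R[h]}"


print(decode_cb(0x7E))
```

The byte 7E after CB comes out as BIT 7,(HL), the example quoted at the start of this section.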

If CLASS equals two then the following applies:
F cannot possibly equal zero.
If F equals one then....
    If H equals zero then let DIS equal IN r(G),(C).
    If H equals one then let DIS equal OUT (C),r(G).
    If H equals two then...
        If K equals zero then let DIS equal SBC HL,s(J).
        If K equals one then let DIS equal ADC HL,s(J).
    If H equals three then...
        If K equals zero then let DIS equal LD (VV),s(J).
        If K equals one then let DIS equal LD s(J),(VV).
    If H equals four then let DIS equal NEG.
    If H equals five then...
        If K equals zero then let DIS equal RETN.
        If K equals one then let DIS equal RETI.
    If H equals six then choose the Gth item from this list:
    IM 0/-/IM 1/IM 2/-/-/-/-.
    If H equals seven then choose the Gth item from this list:
    LD I,A/LD R,A/LD A,I/LD A,R/RRD/RLD/-/-.
If F equals two then choose the Hth item from this list: LD/CP/IN/OT/
-/-/-/- and then add the Gth item from this list: -/-/-/-/I/D/IR/DR.
F cannot possibly be three.
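
The two-list trick for the block instructions after ED can be tried out like this. It is a Python sketch only, with decode_ed_block a name of my own, and it includes the check for H=3 with G=4 or G=5 suggested in the comment at the start of the algorithm, so that OUTI and OUTD get their "U".

```python
def decode_ed_block(byte):
    """Disassemble the byte after ED when F equals two."""
    g, h = (byte >> 3) & 7, byte & 7
    stems = ["LD", "CP", "IN", "OT"]
    endings = [None, None, None, None, "I", "D", "IR", "DR"]
    if (byte >> 6) != 2 or h > 3 or endings[g] is None:
        return None                      # one of the "-" entries
    # Put back the missing "U": OTI and OTD are printed as OUTI and OUTD.
    stem = "OUT" if h == 3 and g in (4, 5) else stems[h]
    return stem + endings[g]


for b in (0xA0, 0xB0, 0xA3, 0xB3):
    print(f"ED{b:02X} {decode_ed_block(b)}")
```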

To compute the final output:
If INDEX equals zero replace every Y by HL.
If INDEX equals one replace every Y by IX.
If INDEX equals two replace every Y by IY.

If INDEX equals zero replace every X by (HL).
If INDEX equals one replace every X by (IX+d) where d is defined by
    the next byte but one after the byte DD.
If INDEX equals two replace every X by (IY+d) where d is defined by
    the next byte but one after the byte FD.
(This does not apply if the X is preceded by I.)

Replace every V by the next byte in sequence (of those being
disassembled).

DIS now contains the correctly disassembled instruction. This should
now be printed to the screen.
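
The final replacement step might be sketched as follows. Several things here are assumptions of mine rather than part of the algorithm: the placeholders are written as lower case y, x and v (standing in for the inverse characters suggested earlier, so that the X in EX or XOR is left alone), a pair vv is treated as one two-byte number stored low byte first, the displacement d is printed as a plain hex byte, and finish is a made-up name. Python is used only as a notation.

```python
def finish(template, index, operands):
    """Expand the placeholders y, x and v in a disassembled template.

    operands holds the bytes that follow the opcode, in memory order.
    """
    pair = ("HL", "IX", "IY")[index]
    data = list(operands)
    out = template.replace("y", pair)
    if "x" in out:
        if index == 0:
            out = out.replace("x", "(HL)")
        else:
            # The displacement d comes before any other operand byte.
            out = out.replace("x", f"({pair}+{data.pop(0):02X})")
    if "vv" in out:
        low, high = data.pop(0), data.pop(0)     # stored low byte first
        out = out.replace("vv", f"{high:02X}{low:02X}")
    elif "v" in out:
        out = out.replace("v", f"{data.pop(0):02X}")
    return out


print(finish("LD BC,(vv)", 0, [0x39, 0x40]))
print(finish("LD x,v", 1, [0x05, 0x07]))
```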

Download available for 16K ZX81 -> sif-disasm.p.

[There's also a complete listing available which you can easily compare to the algorithm above.]

It is possible to write a machine language program which disassembles things by using this algorithm. In fact it is possible to write such a program in just 1K. Surprising as this may sound I should add that although it is possible, the program itself is rather complicated, and involves a completely new programming technique.

What I will do is to not actually write the program for you, but to give you hints and suggestions as to how it may be done. The program revolves around eight different subroutines, which are linked together by one MASTER subroutine which calls them all up in any required order. This is achieved as follows.

Somewhere in the program there should be a table called SUBTAB (the listing of MASTER below refers to it as MASTRADS) which contains eight different addresses - these are the addresses of the eight subroutines which control the program. The register-pair HL' (note the dash) will be pointing to a sequence of data which tells the MASTER subroutine which order it must call the others in. The data in this sequence is terminated by an item in which bit 7 is one. The data consists simply of numbers zero to seven. Zero calls subroutine zero, one calls subroutine one, and so on. Thus this number zero to seven determines exactly which subroutine the MASTER routine is to call.

So any item of data in this sequence looks, in binary, like this: 0--- -nnn for most items, or 1--- -nnn for the last item (the part written nnn means the appropriate number zero to seven as described). Now some of these eight subroutines will need to be supplied with DATA, which by coincidence will also need to be a number between zero and seven - if this number in binary is ddd then it makes sense to save space by storing this number amongst some of the unused bits of the subroutine-call, thus making it look, in binary, like this: 0-dd dnnn or 1-dd dnnn. We have now made use of every bit except bit 6. This isn't needed, so for [the] sake of argument let's always make it zero. Any item of data in the sequence can then be 00dd dnnn, but the last byte must be 10dd dnnn.

I hope that didn't confuse you. To make things clear, suppose HL' points to an address at which is stored the sequence of data 00 01 22 83. This means that first of all subroutine zero is to be called, then subroutine one, then subroutine two (which will use the data binary 100 somewhere), then finally subroutine three. I say "finally" because bit 7 is set which means we are finished.
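
If the bit-fields are still hazy, it may help to pull an item apart mechanically. A short Python sketch (a notation only; decode_item is my own name) applied to the sequence 00 01 22 83:

```python
def decode_item(item):
    """Return (subroutine number, data, is_last) for one sequence byte."""
    return item & 0x07, (item >> 3) & 0x07, bool(item & 0x80)


for item in bytes.fromhex("00012283"):
    n, d, last = decode_item(item)
    print(f"{item:02X}: subroutine {n}, data {d:03b}, last={last}")
```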

The master subroutine which will achieve this is as follows:

D9        MASTER    EXX
7E                  LD A,(HL)         Find byte of data, and increment
23                  INC HL            pointer.
D9                  EXX
5F                  LD E,A            Store this byte, in case bits 5, 4,
                                      and 3 contain data to be used in
                                      the appropriate subroutine.
E607                AND 07            Isolate bits 2, 1, and 0.
17                  RLA               Multiply by two.
4F                  LD C,A            Store this number in the BC
0600                LD B,00           register pair.
21 return           LD HL,RETURN      Specify the return address from
E5                  PUSH HL           each of the eight subroutines.
21 mastrads         LD HL,MASTRADS    Point HL to the start of the table
                                      which stores the eight subroutine
                                      call addresses.
09                  ADD HL,BC         Point HL to the required address.
4E                  LD C,(HL)         Store this address in the BC
23                  INC HL            register pair.
46                  LD B,(HL)
C5                  PUSH BC           Call this subroutine.
C9                  RET
7B        RETURN    LD A,E            If bit 7 was zero then continue
17                  RLA               with the next byte of data.
30E4                JR NC,MASTER

You can learn a lot from studying this MASTER-SUBROUTINE. Can you see how the appropriate subroutine (one of eight) is called? First of all the label RETURN is pushed onto the stack. This means that if each of the eight routines ends with a RET instruction then control will jump to the label RETURN - just as if the subroutine had been accessed normally. To call the subroutine itself, the address of which was in the register-pair BC, we used PUSH BC followed by RET. Think carefully about how this works. The required address is pushed onto the stack, above the address RETURN. Then a RET instruction is executed. RET has the effect of popping the first number from the stack (the subroutine address) and jumping to that address. The first address left on the stack is now the address RETURN, which enables control to return correctly. All of this is necessary because there is no such instruction as CALL (BC) - in BASIC the statement GOSUB VARIABLE is allowed, but not in machine code. Another way we could have achieved the same as PUSH BC/RET is by using the sequence LD H,B/LD L,C/JP (HL). Can you see why this does the same thing?
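
The PUSH BC/RET trick can be followed step by step if the machine stack is modelled as an ordinary list. This Python fragment is only a picture of the stack traffic, with names of my own choosing; it does not run any Z80 code.

```python
def call_via_stack(target, stack):
    """Model of LD HL,RETURN / PUSH HL / ... / PUSH BC / RET."""
    stack.append("RETURN")       # PUSH HL: where the subroutine's RET will land
    stack.append(target)         # PUSH BC: the address of the chosen subroutine
    jumped_to = stack.pop()      # RET: pops the subroutine address and jumps
    # ... the subroutine now runs, and finishes with a RET of its own:
    came_back_to = stack.pop()
    return jumped_to, came_back_to


print(call_via_stack("SUBROUTINE 3", []))
```

The first address popped is the subroutine itself, and the second is RETURN, which is exactly the order a genuine CALL would have produced.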

You may be wondering how the appropriate address came to be in HL' in the first place. There are two means by which this will be determined. Note that all of the alternative registers have specific jobs. These are:

BC'       The address of the byte to be disassembled.
D'        The variable INDEX.
E'        The variable CLASS.
HL'       Points to subroutine data.

The byte to be disassembled is located and stored in the D register by the means EXX/LD A,(BC)/INC BC/EXX/LD D,A. From this the quantities I called F, G, and H may later be discovered. Somewhere in the program there should be a table called TABLE containing twelve different addresses. HL' is simply read from this table. The twelve addresses correspond to the cases CLASS equals zero and F equals 0, 1, 2, or 3; CLASS equals one and F equals 0, 1, 2, or 3; and CLASS equals two and F equals 0, 1, 2, or 3.

The other way in which HL' may be determined is if subroutine zero is called. Subroutine zero is called by the data-byte 00. This will be immediately followed by eight different addresses corresponding to the cases H equals zero, up to H equals seven. Subroutine zero has the task of locating the appropriate address from the list and storing it in the register-pair HL'.

One subroutine you will need (but not one of the eight central ones), is a subroutine to add a single character to the end of the string DIS. Using the convention that the string begins at address DIS and is terminated by the byte FF, the string may be emptied by the sequence LD HL,DIS/LD (HL),FF. To add a character (held in the A register) the subroutine is:

C5        ADDDIS    PUSH BC      Store the registers BC and HL so
E5                  PUSH HL      that they won't be altered by the
                                 subroutine.
0601                LD B,01      This is so that CPIR won't stop
                                 because of BC.
21dis               LD HL,DIS    Find the start of the string.
F5                  PUSH AF      Temporarily stack A.
3EFF                LD A,FF
EDB1                CPIR         Find the end of the string.
77                  LD (HL),A    Insert a new end-of-string marker.
2B                  DEC HL
F1                  POP AF       Retrieve A.
77                  LD (HL),A    Add this character.
E1                  POP HL       Retrieve the remaining registers.
C1                  POP BC
C9                  RET          End of subroutine.
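
What ADDDIS does to the string can be seen in this Python model. It is a sketch with assumptions of my own: DIS is a sixteen-byte bytearray, and ordinary ASCII letters stand in for ZX81 character codes.

```python
END = 0xFF


def add_dis(dis, char):
    """Append char to the FF-terminated string held in the bytearray dis."""
    end = dis.index(END)        # CPIR: search forward for the terminator
    dis[end] = char             # the new character overwrites the old FF
    dis[end + 1] = END          # and a fresh FF is written just after it


dis = bytearray(16)
dis[0] = END                    # LD HL,DIS / LD (HL),FF: the empty string
for ch in b"NOP":
    add_dis(dis, ch)
print(dis[:dis.index(END)])
```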

The eight subroutines you will need for this disassembly program are as follows:

SUBROUTINE 0 - SPLIT

This is the subroutine called by the byte 00. It is always the first subroutine called, if it is used at all. The byte 00 should be followed [by] eight new addresses within the disassembler program. Located at these addresses are eight different sequences of data, which correspond to the cases H equals zero, H equals one, and so on up to H equals seven. One of these sequences is selected (according to H) and the data used to decide which of the eight subroutines should then be used.

SUBROUTINE 1 - LITERAL

The byte 01 (or 81 if it is the last subroutine-call in sequence) is followed by a series of characters, such as N O and P, which represent part or all of the disassembled instruction. The last character should have one of the unused bits (6 or 7) set, to indicate the fact that it is the last character. The subroutine should use one bit of data, with the meaning that if it is called by the byte 09 (or 89) then the literal data following should have a space inserted after the last character. This literal data is to be added to the end of the data storage area called DIS.

SUBROUTINE 2 - LIST-G

Means select the Gth item from the following list. The subroutine needs data to specify how many items there are in the following list. If there are four items the data 011 (3) is required, if there are eight items, the data 111 (7) is required, and so on, the data always being one less than the number of items in the list. For example the byte 3A (in binary 0/0/111/010 - meaning call subroutine 2 and provide it with the data 111) means select the Gth item from the following list of eight. The list could, for instance, be R, L, C, inverse A, R, R, C, inverse A, R, L, inverse A, R, R, inverse A, D, A, inverse A, C, P, inverse L, S, C, inverse F, C, C, inverse F. I've used 'inverse' to indicate the last character in an individual item. You don't have to do this - you can use any means you choose as long as it works. Thus if G (that is bits 5, 4, and 3 of the instruction being disassembled) were 4, the literal DAA would be added to the end of DIS. The next byte to be interpreted as data will be the byte after the inverse F.
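
Here is one way the list-walking might go, sketched in Python as a notation. The marking scheme is an assumption of mine: bit 7 set on the last character of each item stands in for 'inverse', ASCII stands in for ZX81 character codes, and both pick_item and pack are made-up names (pack exists only to build the example data).

```python
def pick_item(data, pos, count, wanted):
    """Read count items starting at data[pos]; return (item, next_pos).

    The last character of each item has bit 7 set. Every item is read,
    so next_pos ends up pointing just past the whole list.
    """
    chosen = ""
    for number in range(count):
        text = ""
        while True:
            ch = data[pos]
            pos += 1
            text += chr(ch & 0x7F)
            if ch & 0x80:
                break
        if number == wanted:
            chosen = text
    return chosen, pos


def pack(items):
    """Helper for the example: set bit 7 on the last character of each item."""
    return b"".join(s[:-1].encode() + bytes([ord(s[-1]) | 0x80]) for s in items)


table = pack("RLCA RRCA RLA RRA DAA CPL SCF CCF".split())
print(pick_item(table, 0, 8, 4))
```

Note that the whole list has to be walked even after the wanted item has been found, since the byte after the final item is the next piece of data for MASTER.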

SUBROUTINE 3 - LIST-H

Means select the Hth item in the following list. Its explanation is exactly the same as that of subroutine 2.

SUBROUTINE 4 - SELECT-G

Again, three bits of data are required. Interpret as follows. If the data is 000 select r(G), if the data is 001 select s(G), if the data is 010 select q(G), if the data is 011 select n(G), if the data is 100 select c(G), and if the data is 110 select x(G). The item selected is to be added to the end of DIS.

SUBROUTINE 5 - SELECT-H

As subroutine 4, except that H is used instead of G.

SUBROUTINE 6 - SKIP

Resets bit 5 of E (the data-byte), and if the previous value of bit 5 was one, skips over n bytes of data. The number n is determined by the immediately following byte. If bit 5 was zero this immediately following byte (which is only there to specify n) is ignored, and the next byte after is then interpreted as the next item of data.

SUBROUTINE 7 - K-SKIP

Replace bit 3 of E by bit 4, replace bit 4 by bit 5, and reset bit 5. Effectively this is the same as LET G equal J. Then if the previous value of bit 3 was one, n bytes are skipped over, as in subroutine six. This subroutine can be interpreted as IF K equals zero THEN.... otherwise IF K equals one then....

                        +-----------+        +^^^^^^^^^^^+
Human operator -------->|           |BC'     |           |
                        +-----------+        +-----------+
        address of byte to be | disassembled |           |
                              |              +-----------+
                              +------------->|           |---+ points to
                                             +-----------+   | an address
                                             |           |   | in memory
                the byte to be disassembled  +vvvvvvvvvvv+   |
                        +-----------+                        |
                      D |           |<-----------------------+
   +-----------+        +-----------+
   |CLASS      |--------------+   |          +^^^^^^^^^^^+
   +-----------+              |   |          |           |
   +-----------+              |   |          +-----------+
   |TABLESTART |----------+   |   |          |           |
   +-----------+          V   V   V          +-----------+
                        +-----------+        |           |
       address in table |           |        +-----------+
                        +-----------+        |           |
                              |              +-----------+
                              +------------->|           |---+ points to
                                             +-----------+   | an address
                                             |           |-+ | in memory
                                             +-----------+ | |
                                             |           | | |
                                             +vvvvvvvvvvv+ | |
                                                           | |
                        +-----------+<---------------------+ |
                    HL' |           |                        |
                        +-----------+<-----------------------+
 the address of the start of  |
        the disassembly data  |              +^^^^^^^^^^^+
                              |              |           |
                              |              +-----------+
                              +------------->|           |---+ points to
                                             +-----------+   | an address
                                             |           |   | in memory
                                             +vvvvvvvvvvv+   |
                        +-----------+                        |
                      E |           |<-----------------------+
                        +-----------+
   +-----------+ the data itself  |          +^^^^^^^^^^^+
   |SUBTABSTART|----------+       |          |           |
   +-----------+          |       |          +-----------+
                          V       V          |           |
  address in subroutine +-----------+        +-----------+
                  table |           |        |           |
                        +-----------+        +-----------+
                              |              |           |
                              |              +-----------+
                              +------------->|           |---+ points to
                                             +-----------+   | an address
                                             |           |-+ | in memory
                                             +vvvvvvvvvvv+ | |
                                                           | |
                        +-----------+<---------------------+ |
subroutine call address |           |                        |
                        +-----------+<-----------------------+
                       / | | | | | | \
                      /  | | | | | |  \
                     V   V V V V V V   V

With these eight subroutines, which you will have to write yourself, you can disassemble every instruction. I will give you an example. Suppose CLASS is zero, and F is three. The first byte of data the master-routine has to interpret should be 00. This alters the value of HL' according to the quantity H, that is, bits 2, 1, and 0 of the byte being disassembled. Suppose now that H is one. HL' should now be pointing to the following sequence of data, listed here along with its meaning.

data                    binary       meaning
07 05                   0000 0111    K-SKIP 5
09 35 34 B5             0000 1001    LITERAL POP (space)
94                      1001 0100    SELECT-G.q (EXIT)
9A                      1001 1010    LIST-G.4 (EXIT)
37 2A B9                             RET
2A 3D BD                             EXX
2F 35 00 10 3E 91                    JP (Y)
31 29 00 38 35 1A BE                 LD SP,Y

To represent strings of data here you can see I've used just the character codes, with the final character inversed to show that it is the last character. In other words EXX is written as 2A 3D BD rather than just 2A 3D 3D. It is of course very important to know where one string ends and the next begins.
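A few lines of Python are enough to check strings like these by hand. This sketch assumes the ZX81 codes 00 for space, 1A for comma and 26 to 3F for A to Z, and handles only those characters. The name encode is hypothetical.

```python
def encode(text):
    """Turn text into ZX81 character codes, inverting the final character.
    Only space, comma and capital letters are covered by this sketch."""
    codes = []
    for ch in text:
        if ch == " ":
            codes.append(0x00)
        elif ch == ",":
            codes.append(0x1A)
        elif "A" <= ch <= "Z":
            codes.append(0x26 + ord(ch) - ord("A"))
        else:
            raise ValueError("character not covered by this sketch: " + ch)
    codes[-1] |= 0x80      # set bit 7: the inverse form marks the end
    return " ".join(format(code, "02X") for code in codes)

for text in ("RET", "EXX", "LD SP,Y"):
    print(encode(text))
```

Because the end marker lives inside the last character itself, no separate length byte or terminator is needed, which is part of how the data stays so small.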

If you follow through which subroutines have been called by the data and what they are supposed to do you'll see that in a total of only twenty-seven bytes we have said IF K equals zero then LET DIS equal POP q(J), IF K equals one then LET DIS equal the Jth item from this list: RET/EXX/JP (Y)/LD SP,Y. If this procedure is continued for every instruction, following the algorithm I gave earlier in the chapter, you'll find that the data required for disassembly is now significantly LESS than 1K.
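To follow the data through by hand it helps to split each control byte into its fields first. The sketch below is Python, not Z80, and rests on two assumptions drawn from the examples above: that bit 7 is the EXIT marker (both bytes marked EXIT, 94 and 9A, have it set) and that bit 6 is spare. The names Control and split_control are mine.

```python
from typing import NamedTuple

class Control(NamedTuple):
    exit_after: bool   # bit 7: assumed to mean EXIT once the subroutine is done
    spare: int         # bit 6: unused
    data: int          # bits 5, 4, 3: the three bits of data
    subroutine: int    # bits 2, 1, 0: which of the eight subroutines to call

def split_control(byte):
    """Split one control byte into the four fields described in the text."""
    return Control(
        exit_after=bool(byte & 0x80),
        spare=(byte >> 6) & 1,
        data=(byte >> 3) & 0b111,
        subroutine=byte & 0b111,
    )

# The four control bytes from the example, plus 3A from the LIST-G description.
for byte in (0x07, 0x09, 0x94, 0x9A, 0x3A):
    print(format(byte, "02X"), split_control(byte))
```

Only control bytes should be fed to this. The bytes that follow K-SKIP (the count 05) and LITERAL (the characters) are operands, and splitting them this way would be meaningless.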

The entire disassembly program consists of initialising the variables CLASS and INDEX, assigning BC' (usually input by the human operator), finding the address HL' from tables, and then going into the master-routine. On exiting this it must then replace all V's, X's and Y's as defined earlier in this chapter, and then PRINT the result computed and go on to the next byte to be disassembled and treat it in the same way. The rest of the program consists of the eight subroutines, the table of addresses, and the data required for disassembly. The whole of this will occupy rather less than 1K.

However simple, or difficult, I may have made this program sound, you will undoubtedly find writing it a challenge. The vast majority of the program is data, and each address in every table must point to exactly the right byte. If you get any of it wrong it will be very difficult to trace.

You can improve the program too. I haven't used bit 6 of the data - you may be able to think of a use for it. For example, it could indicate that a comma needs to be inserted. The choice is yours.

Like draughts, this program is so vast that even though the machine code listing itself will fit into 1K, you will need more than 1K in order for the machine code to be put there. Any editing program, BASIC or machine code, will take you above the 1K.

Good luck.
