BASIC (Re)Numbering — with Commodore

How to thoroughly renumber Commodore BASIC programs (PET 2001, VIC-20, C64)

Renumbering Commodore BASIC
The curious wonders of Commodore BASIC renumbering.

Renumbering a is popular beginner’s sport for those who dare to venture into the mythical realms of intricacy, which the BASIC interpreter and its storage formats provide. Since it is also a useful utility, there are numerous programs for this, especially it’s a favorite example in the intructory sections of machine language guides. However, as always, there’s an easy way and a thorough one. And we’ll see soon, why this may be.

Let’s start at the beginning, a stored BASIC program in memory.

(Yes, an advert.)

BASIC Code Representation in Memory

We already had a closer look at how Commodore BASIC stores its programs in a previous episode, however, we may want to recap this here with a somewhat different focus. Importantly, Commodore BASIC stores keywords as single-byte tokens, by this saving quite an amount of memory, but there are also literals reppresented in plain ASCII (or, beyond charter code 127, in PETSCII). And, as we shall see, BASIC lines are stored as a linked list and may be all over the place, regardless of their order in sequence.

Trivia: The tokens are the same for the PET 2001, VIC-20, and the C64. However, the first ROM edition for the PET 2001 is lacking the bare GO keyword, which was only added as the very last token in revision 2 of the PET’s ROMs (see below).

Let’s have a look at (very) a simple program:

10 PRINT "NUMBER";:INPUT A
20 PRINT "YOUR NUMBER IS ";A
30 IF A>0 GOTO 10
40 END

As the attentive reader may have observed already, this tiny program is asking for a number (line 10), which is then printed onto the screen (line 20). If the number is larger than 0, we jump back to the beginning, otherwise, we finish on a quite luxurious END statement in line 40.

And this is, what this program looks in memory, in all the glory of its idle bytes. Here, on a PET, where BASIC memory starts at $0401 (on a C64, it’s $0801, and on a VIC-20 it starts at $1000, or, if there’s a memory extension of at least 8K, at $1200):

0400: 00 15 04 0A 00 99 20 22  ...... "
0408: 4E 55 4D 42 45 52 22 3B  NUMBER";
0410: 3A 85 20 41 00 2F 04 14  :. A....
0418: 00 99 20 22 59 4F 55 52  .. "YOUR
0420: 20 4E 55 4D 42 45 52 20   NUMBER
0428: 49 53 20 22 3B 41 00 3E  IS ";A..
0430: 04 1E 00 8B 20 41 B1 30  .... A>0
0438: 20 89 20 31 30 00 44 04   . 10...
0440: 28 00 80 00 00 00        ......

At first glance, we may already spot some dispersed sequences of ASCII codes, however, the line numbers, we’re particularly interested in, are hiding somewhere in those bytes. Except, for a single “10” standing out near the end of the program, which might be well the jump target of our GOTO command.

Let’s apply a bit more structure to this and find out, how this works:

10 PRINT "NUMBER";:INPUT A’

0401: 15 04   binary address of next line (LO-byte, HI-byte), $0415
0403: 0A 00   BASIC line number as binary (LO-byte, HI-byte), $000A = 10
0405:    99   BASIC token for PRINT (a token’s value is always ≥ $80)
0406:    20   ASCII character <blank>
0407: 22 4E 55 4D 42 45 52 22   ASCII sequence (string literal), "NUMBER"
040F:    3B   ASCII character ;
0410:    3A   ASCII character :
0411:    85   BASIC token for INPUT
0412:    20   ASCII character <blank>
0413:    41   ASCII character A (identifier)
0414:    00   END OF LINE marker (binary zero)

As we may see, almost everything is stored as ASCII text with the notable exception of the tokens, which substitute the BASIC keywords. A token has always the highest bit set and a value ≥ $80. From this follows that any normal ASCII text, outside of string literals, must not have the MSB set, hence, PETSCII special characters are not allowed in normal code.

The most interesting parts, however, are the first 4 bytes and the very last one, which provide the structural aspect to the code: The first pair (in LO-HI order) provides a link to the memory address of the next line in sequence, the second pair the line number in binary. (While this would facilitate line numbers up to 65535 in theory, the highest line number supported by Commodore BASIC is actually 63999.) A trailing zero-byte eventually ends the line of BASIC code.

20 PRINT "YOUR NUMBER IS ";A’

4015: 2F 04   link to next line, $042F
4017: 14 00   line number (binary), $0014 = 20
4019:    99   BASIC token PRINT
041A:    20   ASCII character <blank>
041B: 22 59 4F 55 52 20 4E 55 4D 42 45 52 20 49 53 20 22
              string literal (ASCII), "YOUR NUMBER IS "
042C:    3B   ASCII character ;
042D:    41   ASCII character A (identifier)
042E:    00   END OF LINE marker (binary zero)

This is quite the same, but we may verify that the link address of the first line does actually point to the start of this line. For the next line, as we're already familiar with the pattern, we skip a few bytes and are concerned only with the interesting part:

30 IF A>0 GOTO 10’

042F: 3E 04   link to next line, $043E
0431: 1E 00   line number (binary), $001E = 30
...
0439:    89   BASIC token GOTO
043A:    20   ASCII character <blank>
043B: 31 30   ASCII sequence ‘10’ (line number)
043D:    00   END OF LINE marker (binary zero)

As we can see, while line numbers are generally stored as binaries, when they occur as a jump target in BASIC text, they are stored as plain ASCII strings!

40 END’

043E: 44 04   link to next line, $0444
0440: 28 00   line number (binary), $0028 = 40
0442:    80   BASIC token END
0443:    00   END OF LINE marker (binary zero)

0444: 00 00   line link 0000 => end of code

And here’s our very last line. What’s interesting here, is that it is not the END keyword, which ends the code in memory, this one rather instructs the interpreter to end its execution. What really marks the end of the code, is an empty (zero) line link, which marks the end of the linked list of BASIC lines. This is essential to the housekeeping of the BASIC interpreter. Without, we may encounter some difficulties, when we try to add, insert or edit a line.

Commodore BASIC lines as linked list
Commodore BASIC lines are stored as a linked list.

Renumbering

As you probably have worked out already, a simple renumbering algorithm is as simple as following the line links, starting from the BASIC start address (here, on a PET 2001: $0401) and fixing the binary line numbers, LO-byte first, followed by the HI-byte. This is also what most of the tutorial aplications do. However, this is a bit of a naïve approach, because we would probably end up with a nonfunctional program, since the jump targets were left untouched. By leaving this to the user, our utility wouldn’t do a particularly great job.

Identifying Jump Targets

As we’ve seen above, line numbers as jump targets are stored as ASCII literals in the program text. These are quite easy to identify, since they may be preceded by just a few keywords. Namely, these are:

All of them come with a few of special cases:

Line Numbers as Jump Targets

As previously observed, line numbers are stored as plain ASCCI strings. However, there’s more to them. They may not only be proceded by any number of spaces, they may also contain any number of spaces, which will be simply ignored by the interpreter.

1 0 0  =  100

Moreover, while they may not carry a sign, they may be even decimal numbers. That is, they will not be evaluated as such, as the interpreter will just read the integer part, stop on the first non-number and non-space character, and will happily take the jump, regardless of what may follow in the BASIC text. This is illustrated by the following line:

100 ON A GOTO 200, 300.5, 400

When executed, the interpreter will jump on "A=1" and “A=2”, but will exit with an error message “SYNTAX ERROR IN 100” as it evaluates the decimal point in order to read the next target for “A=3”.

Anyway, we won’t have to care for fractual numbers, but for the integer part only. However, this may contain any number of spaces.

Here are a few examples of the type of code, we'll have to consider:

GOTO100
GOTO 100
GOTO 1 200
GO TO100
GO TO 100
GO TO 1 200
GOSUB200
GOSUB 200
ON … GOTO  100,200, 300
ON … GO TO 100,200, 300
ON … GOSUB 100,200, 300
THEN100
THEN 100
THEN  1 0 0

And, above all, we must ignore anything following a REM statement and any string literals (in quotes, ASCII $22). Any numbers, we may find here, regardless of the PETSCII character by which they might be preceded, are to be left as-is.

Varying Extent — Code Length

As may have become obvious by these explanations, there’s no guarantee that a new line number may be fitted in the same amount of bytes as there has been previously. E.g., as we have to replace “10” by “1000”, there’s no hope to summon the extra bytes from a miraculous source. We'll have to move all consecutive bytes in memory and we'll have to relink the lines. In essence, we'll have to rebuild the very structure of the program in memory. (However, if there are a few dead bytes in memory from previous edits, we may close these gaps and save a few bytes, as we rebuild the entire linked list in sequence.)

Doing it

Time to come up with an algorithm. Here, for illustration purpose and because it’s also real code for a real emulator, we’ll do it in JavaScript. This shouldn’t only be easier to follow than machine code, at the same time, it also provides us with ample external resources. If we did the same in 6502 code, we’d probably have to move stuff around using buffers, namely the the BASIC input buffer and the cassette buffer(s).

Trivia: Amazingly, the PET 2001 had two cassette buffers, one for the internal tape drive and one for a second, external cassette port. This way, you could compute like a pro, loading both from a dedicated program tape and a data tape without DJ-ing multiple cassettes on a single drive. Which may have been actually of use, when running complex business logic on a machine with just 8K, which may require the sequential loading of partial programs or even code overlays.

Anyway, which way we ever decide to do it, we won’t succeed on a single pass. We’ll have to read all occuring line numbers first and build a reference to the new, renumbered ones in a table, since there’s no guarantee that jumps will happen only to lines, which we have encountered and renumbered already. So we’ll have to take a second pass to fix the jump targets according to our reference table, once we’ve succeded on the first pass.

There are aspects, which are open to the implementation. E.g., we could insert the binary line numbers on the first pass, or we could do so on the second pass just the same. If we did it in machine language, we would want to keep at least one of the passes as simple as possible and just skim through the linked list of lines to build the reference table as quickly as possible.

Here, in the case of our external JS utility, we decide to go for a different approach, first parsing and splitting the lines in chunks, which will be either opaque blobs of bytes, for which we do not care, or numbers representing jump targets. We’ll collect them in a list, and then, in a second pass, we’re going to reassemble the code from this list, which is as trivial as rejoining the junks, now using the new numbers for any jump targets, preceded by the new line numbers in binary. And we’ll have to fix up the line link as soon as we know the extent of our newly assembled line. Finally, we’ll add two zero bytes for the end-of-code marker and are done.

(For a machine language implementation, we’d probably move chunks of code into a buffer, parse and reassemble the new code from there into memory, at the same time taking care for any code that has to be moved in order to provide space.)

Let’s dive into this!

First Pass – Parse & Split

For the purpose of this blog, we’ll assume that we have santized our input values already, which will be passed as arguments to our utility function. These are:

Further, there will be variables for the array of lines, containing anything, we’ll parse in this first pass, and a table holding a reference of line numbers (old → new). And we’ll encapsulate each of the passes in its own function. This gives us the following basic structure:

// renumber BASIC in provided memory slice (0..max)
// processes GOTO, GOSUB, GO TO, ON GOTO|GOSUB|GO TO, THEN

function renumberBasic(mem, basicStart, startNo, step) {

   var lineNoTable = {},
       lines = [],
       maxLineNo = 36999;

   // parse and split lines to chunks of blobs and line targets
   // stores parsed chunks in array lines,
   // generates a reference of old and new line numbers in lineNoTable.
   function parseSplit() {
      ...
   }

   // reassamble BASIC code from line chunks using new line numbers
   // from lineNoTable
   function reassembleLines() {
      ...
   }

   parseSplit();
   reassembleLines();

}

As for our function parseSplit(), we’ll have a main loop, which will be also safeguarding the range, in case we’d be dealing with corrupted code. In this, we’ll read lines in sequence from memory following the line links. An empty line link will indicated that we have arrived at the end of the code and are done with the pass. Otherwise, we generate a new entry for the line, both in our array of lines and in the reference table “lineNoTable”. Each entry to the lines array will have the original line number and an array to hold the parsed chunks. After advancing the line number in preparation of a new line to parse, we read the rest of the line until we encounter a zero-byte as the EOL marker.

function parseLineTarget(maybeList) {

    var addr = basicStart,
        lineNo = startNo,
        maxMem = mem.length,
        chunks,
        blob;

    while (addr < maxMem) {
       var lineLink = mem[addr++] + (mem[addr++]<<8);
       if (!lineLink) break;
       chunks = [];
       blob = [];
       var ln = String(mem[addr++] + (mem[addr++]<<8)),
           byte = mem[addr++],
           line = {
              'ln': ln,
              'chunks': chunks
           };
       lineNoTable[ln] = lineNo;
       lineNo += step;
       while (byte) {
          ...
          b = mem[addr++];
       }
       if (blob.length) chunks.push({'type': 'blob', 'val': blob});
       lines.push(line);
       addr = lineLink;
   }
}

The array blob holds a reference to our current working blob and is initially empty, however, it will be filled in our inner “while” loop, where we’re searching for any jump targets.

       while (byte) {
          blob.push(byte);
          switch(byte) {
             case 0x8F: // REM (read rest of line)
                while (mem[addr]) blob.push(mem[addr++]);
                break;
             case 0x22: // quote (read up to next quote)
                var c = mem[addr++];
                while (c) {
                   blob.push(c);
                   if (c === 0x22) break;
                   c = mem[addr++];
                }
                break;
             case 0x89: // GOTO
             case 0x8D: // GOSUB
                parseLineTarget(true);
                break;
             case 0xA7: // THEN
                parseLineTarget(false);
                break;
             case 0xCB: // GO (read ahead and test for TO)
                var t = addr;
                while (mem[t] == 0x20) t++; // skip any blanks
                if (mem[t] !== 0xA4) break;
                while (addr <= t) blob.push(mem[addr++]);
                parseLineTarget(true);
                break;
          }
          byte = mem[addr++];
       }

Here, we first add the byte just read to our current blob. Then, we check, if it is a token of interest in a switch statement:

So, the real business of identifying line numbers and splitting the source is left to a function “parseLineTarget()”, which is a function internal (and private) to “parseSplit()”, looking like this:

    // scans for a jump target,
    // generates a target entry and a new blob in current chunks,
    // any leading blanks are added to the current blob
    function parseLineTarget(maybeList) {
       while (mem[addr] === 0x20) {
          blob.push(0x20);
          addr++;
       }
       // parse ASCII to string, ignoring any blanks
       var n = '',
           c = mem[addr];
       while ((c >= 0x30 && c <= 0x39) || c === 0x20) {
          if (c !== 0x20) n += String.fromCharCode(c);
          c = mem[++addr];
       }
       // if we found a number, push chunks and open a new blob
       if (n !== '') {
          chunks.push({'type': 'blob', 'val': blob});
          chunks.push({'type': 'target', 'val': n});
          blob = [];
          // scan for any comma (ON GOTO|GOSUB|GO TO)
          if (maybeList && c === 0x2C) {
             blob.push(0x2C);
             addr++;
             parseLineTarget(true);
          }
       }
    }

While this function does much of the heavy lifting, it is also quite straight forward:

I guess, the algorithm lends itself rather well to 6502 machine language code, as well, if we replaced the store-in-chunks mechanism by some actual copying to current pointer positions in memory. However, on the 6502, we’d probably prefer to do this on the second pass. Using a higher level language with ample resources at hand, we prefer to divide the tasks along the lines of gathering and preparing information on the one hand and final (re)assembly on the other hand.

By this, we have our entire parse & split routine in place:

var lineNoTable = {},
    lines = [],
    maxLineNo = 36999;

// parse and split lines to chunks of blobs and line targets
// stores parsed chunks in array lines,
// generates a reference of old and new line numbers in lineNoTable.
function parseLineTarget(maybeList) {

    var addr = basicStart,
        lineNo = startNo,
        maxMem = mem.length,
        chunks,
        blob;

    // scans for a jump target,
    // generates a target entry and a new blob in current chunks,
    // any leading blanks are added to the current blob
    function parseLineTarget(maybeList) {
       while (mem[addr] === 0x20) {
          blob.push(0x20);
          addr++;
       }
       // parse ASCII to string, ignoring any blanks
       var n = '',
           c = mem[addr];
       while ((c >= 0x30 && c <= 0x39) || c === 0x20) {
          if (c !== 0x20) n += String.fromCharCode(c);
          c = mem[++addr];
       }
       // if we found a number, push chunks and open a new blob
       if (n !== '') {
          chunks.push({'type': 'blob', 'val': blob});
          chunks.push({'type': 'target', 'val': n});
          blob = [];
          // scan for any comma (ON GOTO|GOSUB|GO TO)
          if (maybeList && c === 0x2C) {
             blob.push(0x2C);
             addr++;
             parseLineTarget(true);
          }
       }
    }

    while (addr < maxMem) {
       var lineLink = mem[addr++] + (mem[addr++]<<8);
       if (!lineLink) break;
       chunks = [];
       blob = [];
       var ln = String(mem[addr++] + (mem[addr++]<<8)),
           byte = mem[addr++],
           line = {
              'ln': ln,
              'chunks': chunks
           };
       lineNoTable[ln] = lineNo;
       lineNo += step;
       while (byte) {
          blob.push(byte);
          switch(byte) {
             case 0x8F: // REM (read rest of line)
                while (mem[addr]) blob.push(mem[addr++]);
                break;
             case 0x22: // quote (read up to next quote)
                var c = mem[addr++];
                while (c) {
                   blob.push(c);
                   if (c === 0x22) break;
                   c = mem[addr++];
                }
                break;
             case 0x89: // GOTO
             case 0x8D: // GOSUB
                parseLineTarget(true);
                break;
             case 0xA7: // THEN
                parseLineTarget(false);
                break;
             case 0xCB: // GO (read ahead and test for TO)
                var t = addr;
                while (mem[t] == 0x20) t++; // skip any blanks
                if (mem[t] !== 0xA4) break;
                while (addr <= t) blob.push(mem[addr++]);
                parseLineTarget(true);
                break;
          }
          byte = mem[addr++];
       }
       if (blob.length) chunks.push({'type': 'blob', 'val': blob});
       lines.push(line);
       addr = lineLink;
    }
}

As this function finishes, we should get a neatly arranged list of lines and chunks, like this:

‘10 …ON A GOTO 1000, 1100’ →

lines = [
   {
      "ln": 10,
      "chunks": [
         {
            "type": "blob",
            "val": […, 0x91, 0x20, 0x41, 0x20, 0x89, 0x20] // …"ON A GOTO "
         },
         {
            "type": "target",
            "val": "1000"
         },
         {
            "type": "blob",
            "val": [0x2C, 0x20]
         },
         {
            "type": "target",
            "val": "1100"
         }
      ]
   },
   ...
]

lineNoTable = {
   "10": 100,
   ...
}

Second Pass — Reassembling the BASIC Code

Reassembling the new code from this should be fairly easy. All we have to do, is joining the chunks, by this replacing old line numbers by new ones. Moreover, we have to generate the linkt to the next line and insert the current line number in binary.

The only tricky part here is the edge case of a dead target line number, for which we do not have an entry in our reference table. What should we do? — Shrug and reuse the old one.

// reassamble BASIC code from line chunks using new line numbers
// from lineNoTable
function reassembleLines() {
   var addr = basicStart;
   for (var i = 0, max = lines.length; i < max; i++) {
      var currLine = lines[i],
          currNo = lineNoTable[currLine.ln],
          linkAddr = addr;
      mem[addr++] = 0;
      mem[addr++] = 0;
      mem[addr++] = currNo & 0xFF;
      mem[addr++] = (currNo >> 8) & 0xFF;
      // transfer chunks to memory
      for (var j = 0; j < currLine.chunks.length; j++) {
         var chunk = currLine.chunks[j];
         if (chunk.type === 'blob') {
            var blob = chunk.val;
            for (var k = 0; k < blob.length; k++)
               mem[addr++] = blob[k];
         }
         else if (chunk.type === 'target') {
            var s = '';
            if (chunk.val) {
               var n = lineNoTable[chunk.val];
               s = typeof n !== 'undefined'? n.toString(10):chunk.val;
            }
            for (var k = 0; k < s.length; k++)
               mem[addr++] = s.charCodeAt(k);
         }
      }
      mem[addr++] = 0; //EOL
      // fix up the line link
      mem[linkAddr++] = addr & 0xFF;
      mem[linkAddr] = (addr >> 8) & 0xFF;
   }
   // end of code (empty line link)
   mem[addr++] = 0;
   mem[addr++] = 0;
   return addr;
}

The new BASIC code is directly injected in the memory slice (mem), which was passed to our function renumberBasic() by reference. Mind that the cursor addr is finally pointing to the first memory location immediately after the reassembled code. Since this may be handy, we pass this as a return value.

Calling the Subroutines

The only thing left is actually calling these functions. Mind that there is an edge case, where the renumbered lines exced the range of valid line numbers. (Compare the value 63999 configured in variable maxLineNo.) Therfore, after calling the function parseSplit(), we should check the highest of the newly assigned line numbers, which happens to be the last entry in lineNoTable. If it is out of range, we abort with an error massage. (On the 6502, we’d probaly call the kernals error routine and output an “ILLEGAL QUANTITY” error.) Otherwise, we call the reassembly routine and return the end address of the newly generated code. Which is also the end of our renumbering routine.

function renumberBasic(mem, basicStart, startNo, step) {
   ...

   parseSplit();
   // check, if we are still in the range of legal line numbers
   if (lines.length) {
      var topLineNo = lineNoTable[lines[lines.length - 1].ln];
      // we could throw an exception as well…
      if (topLineNo > maxLineNo) return {
         'addr': -1,
         'message': 'Out of range. Top line number is ' + topLineNo +
                    ' (' + maxLineNo + ' allowed).'
      };
   }
   var endAddr = reassembleLines();
   return {
      'addr': endAddr
   };
}

One Thing Left — Or, What Could Possibly Go Wrong?

Now, this looks great. If we call our utility routine from the emulator, we should get a nice, renumbered code. Indeed, consider this as simple as pointless test sample (purposefully including some questionable formatting):

10 REM RENUMBER TEST
20 PRINT "NUMBER",:INPUT A
30 IF A<0 THEN 6 0
40 ON A GOTO 100,200, 300
50 GO TO 400
60 GOSUB 500:PRINT "THE END."
70 END
100 PRINT "1 IS A BEGINNING.":GOTO 20
200 PRINT "2 ARE A PAIR.":GOTO 20
300 PRINT "3 ARE A GROUP.":GOTO 20
400 PRINT "MORE THAN 3 ARE A LOT.":GOTO 20
500 PRINT "LESS THAN ZERO IS NOTHING.":RETURN

Calling

renumberBasic(Emulator.getRam(), 0x0401, 100, 10);

and typing “LIST”, we get:

100 REM RENUMBER TEST
110 PRINT "NUMBER",:INPUT A
120 IF A<0 THEN 150
130 ON A GOTO 170,180, 190
140 GO TO 200
150 GOSUB 210:PRINT "THE END."
160 END
170 PRINT "1 IS A BEGINNING.":GOTO 110
180 PRINT "2 ARE A PAIR.":GOTO 110
190 PRINT "3 ARE A GROUP.":GOTO 110
200 PRINT "MORE THAN 3 ARE A LOT.":GOTO 110
210 PRINT "LESS THAN ZERO IS NOTHING.":RETURN

While not particularly intellectual pleasing or challenging, it certainly looks like what we expected. We may even RUN it.

The program even withstands a closer inspection of memory:

0400: 00 15 04 64 00 8F 20 52  ...... R
0408: 45 4E 55 4D 42 45 52 20  ENUMBER
0410: 54 45 53 54 00 29 04 6E  TEST.)..
0418: 00 99 20 22 4E 55 4D 42  .. "NUMB
0420: 45 52 22 2C 3A 85 20 41  ER",:. A
0428: 00 39 04 78 00 8B 20 41  .9.... A
0430: B3 30 20 A7 20 31 35 30  .0 . 150
0438: 00 50 04 82 00 91 20 41  .P.... A
0440: 20 89 20 31 37 30 2C 31   . 170,1
0448: 38 30 2C 20 31 39 30 00  80, 190.
0450: 5C 04 8C 00 CB 20 A4 20  \.... .
0458: 32 30 30 00 73 04 96 00  200.....
0460: 8D 20 32 31 30 3A 99 20  . 210:.
0468: 22 54 48 45 20 45 4E 44  "THE END
0470: 2E 22 00 79 04 A0 00 80  ."......
0478: 00 99 04 AA 00 99 20 22  ...... "
0480: 31 20 49 53 20 41 20 42  1 IS A B
0488: 45 47 49 4E 4E 49 4E 47  EGINNING
0490: 2E 22 3A 89 20 31 31 30  .":. 110
0498: 00 B5 04 B4 00 99 20 22  ...... "
04A0: 32 20 41 52 45 20 41 20  2 ARE A
04A8: 50 41 49 52 2E 22 3A 89  PAIR.":.
04B0: 20 31 31 30 00 D2 04 BE   110....
04B8: 00 99 20 22 33 20 41 52  .. "3 AR
04C0: 45 20 41 20 47 52 4F 55  E A GROU
04C8: 50 2E 22 3A 89 20 31 31  P.":. 11
04D0: 30 00 F7 04 C8 00 99 20  0......
04D8: 22 4D 4F 52 45 20 54 48  "MORE TH
04E0: 41 4E 20 33 20 41 52 45  AN 3 ARE
04E8: 20 41 20 4C 4F 54 2E 22   A LOT."
04F0: 3A 89 20 31 31 30 00 1C  :. 110..
04F8: 05 D2 00 99 20 22 4C 45  .... "LE
0500: 53 53 20 54 48 41 4E 20  SS THAN
0508: 5A 45 52 4F 20 49 53 20  ZERO IS
0510: 4E 4F 54 48 49 4E 47 2E  NOTHING.
0518: 22 3A 8E 00 00 00        ":....

However, have a look at what the next attempt to list the program looks like, after running the program:

Something here is certainly wrong. (PET 2001 emulation)
Certainly, something is looking wrong here.

Just look at that mess all over the screen, what is happening here? Are we missing something? Even worse, if we try to edit the program, the editor doesn’t return, as it became lost in some endless loop, apparently! Are we missing something?

Housekeeping — Memory Allocation & BASIC Pointers

Indeed, we are missing something, namely, correct values in the BASIC pointers, which control how BASIC stores and accesses its variables. As our programs grows thanks to longer line numbers in the BASIC text, it also grows into the range of memory, which is used by BASIC for its variables. As a result, the end of our program was overwritten by variables as we ran the program.

Now, what are these pointers and how do they control variable allocation?

These pointers are found in a consecutive area of bytes in the zero page and look like this:

• C64

label   loc.hex       loc.dec    comment

TXTTAB  002B-002C      43-44     Pointer: Start of BASIC Text
VARTAB  002D-002E      45-46     Pointer: Start of BASIC Variables
ARYTAB  002F-0030      47-48     Pointer: Start of BASIC Arrays
STREND  0031-0032      49-50     Pointer: End of BASIC Arrays (+1)
FRETOP  0033-0034      51-52     Pointer: Bottom of String Storage
FRESPC  0035-0036      53-54     Utility String Pointer
MEMSIZ  0037-0038      55-56     Pointer: Highest Address Used by BASIC


• PET 2001 ROM 2.0 ("new ROM")

TXTTAB  0028-0029      40-41     Pointer: Start of BASIC Text
VARTAB  002A-002B      42-42     Pointer: Start of BASIC Variables
ARYTAB  002C-002D      44-45     Pointer: Start of BASIC Arrays
STREND  002E-002F      46-47     Pointer: End of BASIC Arrays (+1)
FRETOP  0030-0031      48-49     Pointer: Bottom of String Storage
FRESPC  0032-0033      50-51     Utility String Pointer
MEMSIZ  0034-0035      52-52     Pointer: Highest Address Used by BASIC


• PET 2001 ROM 1.0 ("old ROM")

TXTTAB  007C-007D     124-125    Pointer: Start of BASIC Text
VARTAB  007E-007F     126-127    Pointer: Start of BASIC Variables
ARYTAB  0080-0081     128-129    Pointer: Start of BASIC Arrays
STREND  0082-0083     130-131    Pointer: End of BASIC Arrays (+1)
FRETOP  0084-0085     132-133    Pointer: Bottom of String Storage
FRESPC  0086-0087     134-135    Utility String Pointer
MEMSIZ  0088-008A     136-137    Pointer: Highest Address Used by BASIC


• VIC 20 (same as C64)

TXTTAB  002B-002C      43-44     Pointer: Start of BASIC Text
VARTAB  002D-002E      45-46     Pointer: Start of BASIC Variables
ARYTAB  002F-0030      47-48     Pointer: Start of BASIC Arrays
STREND  0031-0032      49-50     Pointer: End of BASIC Arrays (+1)
FRETOP  0033-0034      51-52     Pointer: Bottom of String Storage
FRESPC  0035-0036      53-54     Utility String Pointer
MEMSIZ  0037-0038      55-56     Pointer: Highest Address Used by BASIC

And this is the meaning of these pointers:

TXTTAB
is none other than our old friend, the start of BASIC.
VARTAB
is the start of the variable space. It follows immediately after the end of the program in memory (the final empty line-link.) Simple variables are stored here. Each variable occupies 7 bytes of memory, regardless of the type. ARYTAB points to the next available space.
ARYTAB
is (also) the beginning of the space allocated for array storage. It points to the location following immediately after the last byte allocated for simple variables.
Note: Since arrays are stored directly above simple variables, adding a new simple variable requires all arrays to be moved in memory in order to provide the required space!
STREND
is the lower end of the space used for storing string literals. (Strings grow from top to bottom.) This is also the beginning of any free memory, which hasn’t been allocated yet, between the array space and any stored string literals.
FRETOP
marks the top end of the unallocated memory, below the last allocated string.
FRESPC
is a utility pointer used internally by BASIC. It is not directly involved in variable management, but used for string handling (scratch space, etc).
MEMSIZ
is the top address of accessible memory. Strings start growing down from here.

Pointers in Action

Here’s an example for what this looks like in action after running a simple BASIC program (PET 2001, ROM 2.0, 8K RAM: $0000$1FFF):

• PET 2001 8K, ROM 2.0 ("new ROM")

10 DIM A(10)
20 FOR I=0 TO 10:A(I)=I:NEXT
30 B$=CHR$(66)

→ dump of BASIC prg

0400: 00 0D 04 0A 00 86 20 41  ...... A
0408: 28 31 30 29 00 25 04 14  (10).%..
0410: 00 81 20 49 B2 30 20 A4  .. I.0 .
0418: 20 31 30 3A 41 28 49 29   10:A(I)
0420: B2 49 3A 82 00 32 04 1E  .I:..2..
0428: 00 42 24 B2 C7 28 36 36  .B$..(66
0430: 29 00 00 00              )...

→ RUN →

0028: 01 04  ...  TXTTAB: $0401
002A: 34 04  ...  VARTAB: $0434
002C: 42 04  ...  ARYTAB: $0442
002E: 80 04  ...  STREND: $0480
0030: FF 1F  ...  FRETOP: $1FFF
0032: FF 1F  ...  FRESPC: $1FFF
0034: 00 20  ...  MEMSIZ: $2000

→ dump at $0434 (addr in VARTAB):

0434:             49 00 84 30      I..0  real I
0438: 00 00 00 42 80 01 FF 1F  ...B....  string B, len. 1 → $1FFF
0440: 00 00 41 00 3E 00 01 00  ..A.>...  array A(10)
0448: 0B 00 00 00 00 00 81 00  ........
0450: 00 00 00 82 00 00 00 00  ........
0458: 82 40 00 00 00 83 00 00  .@......
0460: 00 00 83 20 00 00 00 83  ... ....
0468: 40 00 00 00 83 60 00 00  @....`..
0470: 00 84 00 00 00 00 84 10  ........
0478: 00 00 00 84 20 00 00 00  .... ...
0480: AA                       ........  end of array storage
....
1FF8: AA AA AA AA AA AA AA 42  .......B  literal "B"
                                         top of string storage
Commodore BASIC memory pointers

While this may be still a bit complicated, now have a look at what this pointers look like, just after we load a very simple program without running it:

10 PRINT "HELLO WORLD"’

→ loaded into memory, $0401 - $0416

0400: 00 15 04 0A 00 99 20 22  ...... "
0408: 48 45 4C 4C 4F 20 57 4F  HELLO WO
0410: 52 4C 44 22 00 00 00     RLD"...

0028: 01 04  ...  TXTTAB: $0401
002A: 17 04  ...  VARTAB: $0417
002C: 17 04  ...  ARYTAB: $0417
002E: 17 04  ...  STREND: $0417
0030: 00 20  ...  FRETOP: $2000
0032: 00 20  ...  FRESPC: $2000
0034: 00 20  ...  MEMSIZ: $2000

All, there is, is

This, we easily can provide. (Mind that the address, pointers VARTAB, ARYTAB, STREND are pointing to, is the same address, which our renumbering routine returns.)
Moreover, if our program grew into the variable space in the process of renumbering, any previous data is corrupted anyway. There’s no harm in resetting the pointers to similar (and similarly sane) values as after loading a program. What’s good enough for the Kernal, can’t be totally wrong for us.

Fixing the Pointers

Now that we know this, we may easily adjust the pointers to sane values, like here in this example, where we call our renumbering utility from a function inside the emulator UI (slightly adjusted for the purpose of this write-up):

function renumber() {

   // write a 2-byte value to a pointer (lo-byte, li-byte)
   function setPointer(ptrAddr, word) {
      Emulator.write(ptrAddr,     word & 0xFF);
      Emulator.write(ptrAddr + 1, (word >> 8) & 0xFF);
   }

   var txtTab     = Emulator.getRomVers() == 1? 0x7C:0x28,
       mem        = Emulator.getRam(),
       basicStart = mem[txtTab] | (mem[txtTab + 1] << 8),
       result     = Utils.renumber(mem, basicStart, startLineNo, step);
       endAddr    = result.addr;

   if (endAddr > 0) {
      // success, reset pointers
      setPointer(txtTab, basicStart);
      setPointer(txtTab +  2, endAddr);
      setPointer(txtTab +  4, endAddr);
      setPointer(txtTab +  6, endAddr);
      setPointer(txtTab +  8, maxRAM);
      setPointer(txtTab + 10, maxRAM);
      setPointer(txtTab + 12, maxRAM);
   }
   return result.message;

}

And this is actually all there is. We are finished. Also, we may have learned some on the way.
(E.g., every time BASIC encounters a new variable, it has to move any arrays in their entirety, as defined as the block of memory between the addresse in pointers ARYTAB and STREND. You may want to optimize your BASIC programs in light of this.)

And we may call our utility in the emulator (see the Utils/Export menu), where we get something like this:

BASIC renumbering. (PET 2001 emulation)
Renumbering a tiny BASIC program, before and after.

The following is an experiment: ;-)
While I do care about appropriate UX, I do not want to track my readers, nor do I want to forward your data to any Geo-IP vendors. So I’m picking a country or region by random — please feel addressed.
If this doesn’t match, please feel offended or relieved, just to your liking. However, should this match, be assured that it does so by mere coincidence. Also, what do I know what readers may find interesting? So I’m rather picking some articles at random, as well.

Other readers from Pitcairn found also interesting:

A closer look at the logic behind Commodore ASCII, AKA PETSCII, and the PET 2001.

Because there might be no Web tomorrow…

Affordable computing by rotating bits at 3700 rpm.

Because there might be no Web tomorrow…

mass:werk proudly presents: Running BASIC on a virtual PET 2001 from URL-data.

Because there might be no Web tomorrow…