PETSCII Revealed

A closer look at the logic behind Commodore ASCII, AKA “PETSCII”, and the PET 2001.

PETSCII and the Commodore PET 2001
Investigations into a somewhat mysterious character code.

The flavor of ASCII used by the Commodore 8-bit computers, commonly known as PETSCII, is asking for a bit of an explanation. PETSCII is a peculiar beast: close to ASCII, but not quite; somewhat compatible, but not really. There are duplicate ranges of characters all over the place, and the special characters lack any recognizable order… But look at all these funny graphics characters!

In order to make sense of this and of how the character set is organized, it may be helpful to have a closer look at it with a particular focus on the PET 2001. After all, this is the very machine this character set originated on and for which it was designed, with no idea yet that it would become the ancestor of a successful line of home computers. Here, we may discover logic in what must remain a puzzling enigma on the more popular and better-known machines that followed, like the C64.

Implementation details aside, since PETSCII is still Commodore ASCII, we may best start with ASCII.


ASCII

Let’s have a look at the organization of the ASCII code (7-bit) as it ought to be looked at, in groups of 32:

Hex      Bits     Decimal   00  01  02  03  04  05  06  07  08  09  0A  0B  0C  0D  0E  0F  10  11  12  13  14  15  16  17  18  19  1A  1B  1C  1D  1E  1F
00 – 1F  00xxxxx  0 – 31    NUL SOH STX ETX EOT ENQ ACK BEL BS  TAB LF  VT  FF  CR  SO  SI  DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM  SUB ESC FS  GS  RS  US
20 – 3F  01xxxxx  32 – 63   SP  !   "   #   $   %   &   '   (   )   *   +   ,   -   .   /   0   1   2   3   4   5   6   7   8   9   :   ;   <   =   >   ?
40 – 5F  10xxxxx  64 – 95   @   A   B   C   D   E   F   G   H   I   J   K   L   M   N   O   P   Q   R   S   T   U   V   W   X   Y   Z   [   \   ]   ^   _
60 – 7F  11xxxxx  96 – 127  `   a   b   c   d   e   f   g   h   i   j   k   l   m   n   o   p   q   r   s   t   u   v   w   x   y   z   {   |   }   ~   DEL

Lo and behold this beauty and this logic!

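Easy to discern, there are 4 groups: control characters (00 – 1F), punctuation and numerals (20 – 3F), upper-case letters (40 – 5F), and lower-case letters (60 – 7F).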

As used from Baudot code and paper tape operations, the very last character with all bits set is the DELETE character, nullifying the previous code. (This is, BTW, a strong argument for even parity: on 8-bit paper tape you’d still want to have all rows punched for this.)
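The parity remark can be checked with a few lines of Python. This is only a sketch: it assumes the parity bit travels in bit 7 (the eighth channel of the tape), and the function name is made up for illustration.

```python
def even_parity_byte(code7: int) -> int:
    """Return the 8-bit value for a 7-bit code with an even-parity bit in bit 7.

    Assumption: the parity bit occupies bit 7; real equipment may differ.
    """
    code7 &= 0x7F
    ones = bin(code7).count("1")
    parity = ones & 1  # 1 if the number of set bits is odd, making the total even
    return (parity << 7) | code7


# DEL (0x7F) has seven bits set, so even parity adds an eighth one:
print(hex(even_parity_byte(0x7F)))  # 0xff, all eight rows punched
# NUL (0x00) has no bits set and stays blank:
print(hex(even_parity_byte(0x00)))  # 0x0
```

With odd parity, the same DEL code would leave one row unpunched, which defeats the idea of punching out a wrong character completely.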

All these groups transform into one another, from a higher order one to a lower order one, in a logical manner:

Hex      Bits     Decimal   00  01  02  03  04  05  06  07  08  09  0A  0B  0C  0D  0E  0F  10  11  12  13  14  15  16  17  18  19  1A  1B  1C  1D  1E  1F
CTRL +                      @   A   B   C   D   E   F   G   H   I   J   K   L   M   N   O   P   Q   R   S   T   U   V   W   X   Y   Z   [   \   ]   ^   _
00 – 1F  00xxxxx  0 – 31    NUL SOH STX ETX EOT ENQ ACK BEL BS  TAB LF  VT  FF  CR  SO  SI  DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM  SUB ESC FS  GS  RS  US

Transforming from lower case to upper case and vice versa is as easy as masking out bit 5 (“c & 0xDF”) or OR-ing it in (“c | 0x20”), respectively. In terms of electronics and keyboards, implementing the SHIFT key or the CONTROL key is as easy as breaking one or two wires while the modifier key is pressed.
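For those who prefer to see the bits move, here is a small Python sketch of these transformations. The function names are made up for illustration, and the functions are only meaningful for codes in the respective letter groups.

```python
CASE_BIT = 0x20   # bit 5 separates the upper-case from the lower-case group
CTRL_MASK = 0x1F  # keeps the low five bits, i.e. drops bits 6 and 5


def to_upper(c: int) -> int:
    """Clear bit 5 of a 7-bit code (lower-case group to upper-case group)."""
    return c & ~CASE_BIT & 0x7F


def to_lower(c: int) -> int:
    """Set bit 5 (upper-case group to lower-case group)."""
    return c | CASE_BIT


def ctrl(c: int) -> int:
    """Drop bits 6 and 5, as a CONTROL key would (letter group to control group)."""
    return c & CTRL_MASK


print(chr(to_upper(ord("a"))))  # A
print(chr(to_lower(ord("A"))))  # a
print(ctrl(ord("M")))           # 13, carriage return
print(ctrl(ord("[")))           # 27, escape
```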

But it’s even better than that. Have a look at the punctuation and numerals: drop bit 4 on SHIFT and your traditional keyboard layout is complete!

Hex      Bits     Decimal   00  01  02  03  04  05  06  07  08  09  0A  0B  0C  0D  0E  0F
20 – 2F  010xxxx  32 – 47   SP  !   "   #   $   %   &   '   (   )   *   +   ,   -   .   /
30 – 3F  011xxxx  48 – 63   0   1   2   3   4   5   6   7   8   9   :   ;   <   =   >   ?
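The same trick in Python, as a quick sketch for the numeral keys only. The helper name is hypothetical, and real keyboards differ in how they pair the remaining punctuation keys.

```python
SHIFT_BIT = 0x10  # bit 4 separates the numerals row from the punctuation row


def shifted(c: int) -> int:
    """Clear bit 4, mapping a numeral to the punctuation character above it."""
    return c & ~SHIFT_BIT


for key in "123456789":
    print(key, "->", chr(shifted(ord(key))))
```

This yields the shifted legends ! " # $ % & ' ( ) for the keys 1 to 9, which is the arrangement found on bit-paired keyboards.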

PETSCII   (Commodore ASCII)

Now that we have an idea about how ASCII code is laid out, let’s see what Commodore was doing when it came up with PETSCII for the PET 2001. PETSCII, that is, is not about screen codes, but about the organization of the actual codes as used in strings, about what we get using “CHR$()” and what we read by “ASC()”. While screen codes are often confused with this, those are really just about matching these logical codes with visual representations as stored in the character ROMs. The latter may vary, as with special national character sets or the ROMs that went with business keyboards, but the internal organization stays the same, regardless of how the characters are represented on the screen.

Since it’s PETSCII as in PET-ASCII, let’s have a look at what we get on the original PET 2001:

Commodore PETSCII, upper-case / graphics set, PET 2001.
Controls: D, U, R, L: cursor movements; 1, 0: reverse video on/off; H, C: home, clear; “←”, I: delete, insert.
(“Blind” or unused control characters are non-printing and do not advance the cursor.)

Now, this looks familiar and also a bit strange. Let's compare this with the actual ASCII set:

Hex      Bits     Decimal   00  01  02  03  04  05  06  07  08  09  0A  0B  0C  0D  0E  0F  10  11  12  13  14  15  16  17  18  19  1A  1B  1C  1D  1E  1F
00 – 1F  00xxxxx  0 – 31    NUL SOH STX ETX EOT ENQ ACK BEL BS  TAB LF  VT  FF  CR  SO  SI  DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM  SUB ESC FS  GS  RS  US
20 – 3F  01xxxxx  32 – 63   SP  !   "   #   $   %   &   '   (   )   *   +   ,   -   .   /   0   1   2   3   4   5   6   7   8   9   :   ;   <   =   >   ?
40 – 5F  10xxxxx  64 – 95   @   A   B   C   D   E   F   G   H   I   J   K   L   M   N   O   P   Q   R   S   T   U   V   W   X   Y   Z   [   \   ]   ^   _
60 – 7F  11xxxxx  96 – 127  `   a   b   c   d   e   f   g   h   i   j   k   l   m   n   o   p   q   r   s   t   u   v   w   x   y   z   {   |   }   ~   DEL

As we may see, the numeric/punctuation group and the upper-case group are mostly the same. As the only exception to this, where ASCII has the caret and the underscore at 0x5E and 0x5F, PETSCII has (still recognizable) an up-arrow and a left-arrow, respectively. (Apparently, since BACKSPACE became delete-to-the-left with common on-screen editors, the original ASCII glyphs weren’t found to be of much use without overprint capabilities. Moreover, PETSCII doesn’t implement BACKSPACE in any way, but rather assigns its own code in place of DC4.)

Update: Curt J. Sampson (@cjs) said in the Retro Computing Forum,
«Actually, what Commodore has are the original ASCII characters, from ASCII-1963. The 1965 revision replaced ‘↑’ and ‘←’ with ‘^’ and ‘_’. It’s not clear to me why Commodore, in 1976, went with ASCII-1963 rather than ASCII-1967. Maybe they just had only old reference books lying around?»
I stand corrected! (Or, at least, complemented, as I still think that the choice makes sense for the PET.)

However, where there are the lower-case characters in ASCII, from 0x60 to 0x7F, PETSCII just repeats the upper-case group. Apparently, Commodore 8-bits are just 6-bit machines, as far as character encoding is concerned. But that isn’t all, since there are all those block graphics symbols as well, from 0xA0 to 0xFF in the upper 8-bit bank, which show the same, peculiar mirroring of the group from 0xA0 to 0xBF.

And, just to add another special flavor (and because the exception proves the rule), the very last character code, 0xFF, is used for π, normally found at 0xDE. Apparently jammed in, much like a fix to an oversight.

As for control characters, there are, in the original form, just a few that show any effect: the carriage return (CR) keeps its ASCII meaning, the DC group (device control) is repurposed, and the group separator (GS) is used for a cursor key. However, these control codes are also in the upper bank, where they are inverted into their respective functional opposites. (There’s also SHIFT+RETURN, for which there’s no code in ASCII.)

Note: The placement of some of the control characters is lacking a bit of an explanation. While it makes sense to use the DC group for cursor and screen controls, it would have been only logical to have CRSR RIGHT in the position of DC2, just after CRSR DOWN, with the code for reverse video swapped out instead. Maybe using GS for CRSR RIGHT was found to be the least harmful and/or destructive, in case an actual group separator was encountered in a file. As the PET was also intended for (small) business use, which may have involved files that originated on other systems, this may have been worth a consideration.

Later machines, like the VIC-20 and the C64, added further control codes for colors, function keys, and for switching character case. Also, the backslash character had to give way to the British Pound (GBP) currency symbol (“£”).

Having a look at the lower case set of the PET 2001, we may finally recognize, what this is all about:

Commodore PETSCII, lower-case set, PET 2001.
Mind that, while the row of common graphics characters from 0xA0 to 0xBF is essentially the same as in the upper-case set, the character at 0xBA is replaced by a check mark / root symbol.

As we have already seen in a previous discussion of abbreviations in Commodore BASIC, the most significant bit (MSB), bit 7, is used to indicate a shifted character. Therefore, just a single bit has to be checked in PETSCII, using the sign flag of the 6502 processor, where two bits would have had to be checked using ASCII encoding. With the lower-case group of the ASCII code now being of no particular use, the upper-case group is just repeated instead.
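To illustrate the difference, here is a rough Python sketch of both tests. The function names are made up, and the ASCII test is just one plausible reading of the two-bit check.

```python
def petscii_is_shifted(c: int) -> bool:
    """One bit decides: bit 7 set means a shifted character.

    On a 6502, loading the byte copies bit 7 into the negative (sign) flag,
    so no explicit masking is needed there.
    """
    return bool(c & 0x80)


def ascii_in_lower_case_group(c: int) -> bool:
    """Two bits decide: bits 6 and 5 must both be set (group 11xxxxx)."""
    return (c & 0x60) == 0x60


print(petscii_is_shifted(0x41))              # False, unshifted A
print(petscii_is_shifted(0xC1))              # True, shifted A
print(ascii_in_lower_case_group(ord("A")))   # False
print(ascii_in_lower_case_group(ord("a")))   # True
```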

Again, considering business use and the chance of encountering a file from another system, this isn’t a particularly bad choice for what is essentially a single case system. No need for explicit character conversions. While actual processing may require transformations, readability of foreign files is guaranteed out of the box.

Hence, the PET (and any of its 8-bit successors) is much more like a single-case machine with switchable representations of the upper bank than a real upper-case/lower-case machine. This is underlined by the fact that, in this original form, the unshifted characters stay the same in lower-case representations as they are in upper-case mode, with lower-case characters being produced in combination with the shift key! Just the opposite of what we would expect.

In order to work around these arguably unusual operations, Commodore swapped the lower-case and upper-case glyphs in the character ROMs of later machines. The internal implementation of the character sets, however, stayed the same.

BTW, this is how you select character sets on a PET 2001 (by interfacing directly with the hardware by POKEs to 0xE84C, the Peripheral Control Register PCR of the VIA):

POKE 59468,12 :REM USE UPPER-CASE/GRAPHICS (DEFAULT)
POKE 59468,14 :REM USE LOWER-CASE/UPPER-CASE

Still 8 Bits?

This peculiar mirroring at 0x60–0x7F and 0xE0–0xFF may raise concerns. Are Commodore 8-bit machines still real 8-bit machines as far as character encoding is concerned? This really looks more like 7-bit with a bit of magical switching around added behind the scenes. Meaning, are those codes actually unique and just transformed by the output mechanism, or are the mirrored characters substituted as soon as they are processed? Mind that we previously proudly presented an algorithm to implement fast FIFO queues for byte-sized values by the abuse of string operations! Are approaches like this compromised and apt to fail for those mirrored ranges? Let’s check to be sure:

Asserting string operations with Commodore PETSCII (PET 2001).

Phew! — “CHR$()” just jams a byte into a memory location in string storage and “ASC()” retrieves the very same value again, even for a character in one of the mirrored code ranges. The same is true for any control characters. Moreover, all strings generated by “CHR$()” have a string-length of 1, even the non-printing control characters that do not show any effect when printed onto the screen. — As far as BASIC is concerned, these are indeed unique characters!

Familiar Unfamiliarities

If we dare to inspect the graphics characters and their individual locations in the character set a bit more closely, we may reveal a distinct lack of order and organization. Some related characters seem to form a group, with unrelated characters interspersed, while other groups are dispersed all over the place. While there seems to be some sort of order in a few places, this soon falls apart and doesn’t withstand any scrutiny. How could Commodore come up with such a scheme?

If without any clue, compare the principal upper-case/graphics set with the original chiclet keyboard layout of the PET 2001:

Commodore PETSCII, upper-case / graphics set, PET 2001.
Commodore PET 2001 chiclet keyboard.

The order isn’t in the character set, but on the keyboard! As we’ve seen with ASCII code (especially in the relationship between punctuation and numeric characters), the characters are arranged in a way that they match the unshifted keys on the PET 2001 chiclet keyboard. Also, mind that the graphics characters are particularly arranged so that the most important ones, like those for drawing frames, are available regardless of the character ROM in use. Admire the logical arrangement!

Something you wouldn’t have figured out, if you only knew the C64!

As an example, consider the order of the frame characters in the code and the corresponding arrangement on the numeric keypad of the PET 2001:

PETSCII order in code and arrangement on the keyboard (PET 2001).

Pi (π)

You may recall that shifted characters, those with the highest bit set, are used as tokens by BASIC. This may be the reason for the peculiar copy of “π” at 0xFF. In a BASIC program, graphics characters are not allowed outside of strings, as their code values conflict with the encoding of the BASIC keywords as tokens. However, there’s need for π as a constant and it’s not in the basic character set. What to do about it?

It will be a special case to be checked by the operating system, much like a token of its own. There’s no way around this, but at least you want that character to be out of the way to avoid any conflicts, say, at the opposite end of the range from where the list of BASIC tokens starts. Also, a distinct code value may help, a value that stands out easily. Like 0xFF.

Therefore, only the copy at 0xFF represents the constant, while the original at 0xDE is just another graphics character, as far as BASIC is concerned. Most likely, this was not in the original design, but a fix by Microsoft. Or it may have been an adaptation to MS BASIC made by Commodore. Anyway, the copy of π at 0xFF isn’t really to be considered a PETSCII code, but is much more a feature of BASIC.

Screen Codes

As should have become apparent by now, the “magic” (or, maybe, irritating nature) of PETSCII is related rather to the arrangement of the character codes than to their representations in the character ROM. However, one doesn’t go without the other. Therefore we may ask, what is the particular relationship between PETSCII and Commodore screen codes?

Commodore screen codes (PET 2001).

Obviously, there’s no need for control characters in the character ROMs. While they are required in order to provide organized output on the screen, once we’re actually displaying something, we’re done with them. (How they are represented in string context in the editor is regulated by the operating system and not directly related to the character set.) Just the same, there’s no need to store the mirrored regions as unique glyphs. By this, the entire set of glyphs can be jammed into a block of 7-bit codes, using the 8th bit to indicate reverse video. The reverse video characters, however, aren’t stored in the ROM, but are generated on the fly by inverting the respective bit patterns. A single character is drawn onto an 8 × 8 pixel matrix (stretched to double height on the later 80-column PETs), thus occupying 8 bytes per character in ROM. As 128 × 8 gives 1024 bytes, a single 1K chip is all that’s needed to store an entire set of screen characters.
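The arithmetic and the inversion can be sketched in a few lines of Python. Mind that the glyph data below is a made-up example and not taken from an actual character ROM, and that the function names are hypothetical.

```python
GLYPHS = 128           # 7-bit screen codes
BYTES_PER_GLYPH = 8    # one byte per row of an 8 x 8 matrix
rom_size = GLYPHS * BYTES_PER_GLYPH
print(rom_size)        # 1024

# Hypothetical glyph, one byte per pixel row (not actual ROM contents).
glyph = [0x18, 0x24, 0x42, 0x7E, 0x42, 0x42, 0x42, 0x00]


def reverse_video(rows):
    """Invert every pixel of a glyph, row by row."""
    return [row ^ 0xFF for row in rows]


def split_screen_code(code):
    """Split a screen code into its glyph index and the reverse video flag."""
    return code & 0x7F, bool(code & 0x80)


print(split_screen_code(0x81))  # (1, True)
```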

So, is there an obvious relation between the PETSCII code set and the screen codes? Chances are, since this would certainly make things much simpler. At least, this is about a piece of engineering…

PETSCII codes and screen codes (PET 2001).

As may be observed, PETSCII codes (mirrored regions and control characters removed) map directly to screen codes, but are swapped and relocated in groups of 32. As it happens, PETSCII shows this structural resemblance to ASCII, regarding groups of 32, down to the very lowest levels of implementation. As may be expected, this is handled in exactly the same way for the upper-case/graphics set and the lower-case set.

The screen representations for the control characters follow a similar logic:

Commodore PETSCII control characters and respective screen codes, PET 2001.
(D, U, R, L: cursor movements; 1, 0: reverse video on/off; H, C: home, clear; “←”, I: delete, insert.)

0x00–0x1F mapped to 0x80–0x9F, 0x80–0x9F mapped to 0xC0–0xDF.

By this, we can come up with the following maps:

PETSCII to Screen Code

Duplicate PETSCII code ranges and substitutions:

Add 0x80 to screen codes for reverse video.

Screen Code to PETSCII

For inverse screen codes in string context (control characters):

Otherwise, subtract 0x80 and switch to reverse video.
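As a rough Python sketch of such a conversion: the control ranges are those given above, while the offsets for the printable groups are the commonly documented relocation in groups of 32 and should be read as an assumption here. The mirrored PETSCII ranges (0x60–0x7F and 0xE0–0xFF) are deliberately left out, and all names are made up for illustration.

```python
# (first PETSCII code, last PETSCII code, offset added to get the screen code)
PRINTABLE = [
    (0x20, 0x3F, 0x00),   # punctuation and numerals stay in place
    (0x40, 0x5F, -0x40),  # upper-case group moves to 0x00-0x1F
    (0xA0, 0xBF, -0x40),  # common graphics move to 0x60-0x7F
    (0xC0, 0xDF, -0x80),  # shifted group moves to 0x40-0x5F
]
CONTROL = [
    (0x00, 0x1F, +0x80),  # control codes as shown in string context
    (0x80, 0x9F, +0x40),  # shifted control codes
]


def petscii_to_screen(code):
    """Return the screen code, or None for the mirrored ranges."""
    for first, last, offset in PRINTABLE + CONTROL:
        if first <= code <= last:
            return code + offset
    return None


def screen_to_petscii(screen):
    """Return (PETSCII code, reverse video flag), ignoring string context."""
    glyph = screen & 0x7F
    reverse = bool(screen & 0x80)
    for first, last, offset in PRINTABLE:
        if first + offset <= glyph <= last + offset:
            return glyph - offset, reverse


print(hex(petscii_to_screen(0x41)))  # 0x1
print(screen_to_petscii(0x81))       # (65, True)
```

Whether a reversed screen code stands for a control character or for an ordinary character in reverse video depends on the context, so the second function only handles the latter case.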

And here are some useful hex-to-decimal conversions, ready for the use in Commodore BASIC:

0x00 ....   0       0x1F ....  31
0x20 ....  32       0x3F ....  63
0x40 ....  64       0x5F ....  95
0x60 ....  96       0x7F .... 127
0x80 .... 128       0x9F .... 159
0xA0 .... 160       0xBF .... 191
0xC0 .... 192       0xDF .... 223
0xE0 .... 224       0xFF .... 255
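Should you want to double-check such conversions, a few lines of Python will list the first and the last code of each group of 32 in hex and decimal. This is a mere convenience sketch, and the function name is made up.

```python
def group_bounds(size=0x20, limit=0x100):
    """Return (first, last) code pairs for consecutive groups of `size` codes."""
    return [(start, start + size - 1) for start in range(0, limit, size)]


for first, last in group_bounds():
    print(f"0x{first:02X} .... {first:3d}       0x{last:02X} .... {last:3d}")
```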

PETSCII Tables

And here are the tables for PETSCII on the PET 2001 in both character sets, using the closest sensible Unicode representation available for the graphics characters. (Block graphics are used to indicate the large, border-sided frame characters. Some of the patterned block characters have no matching equivalent in Unicode and are substituted by a matching shape.)

The Upper-Case/Graphics Set (PET 2001)

Row   Characters (offsets 00 to 1F)
0x00  0D: CR, 11: DWN, 12: RVS, 13: HOM, 14: DEL, 1D: RGT
0x20  (space)!"#$%&'()*+,-./0123456789:;<=>?
0x40  @ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]↑←
0x60  (space)!"#$%&'()*+,-./0123456789:;<=>?
0x80  8D: SCR, 91: UP, 92: ROF, 93: CLR, 94: INS, 9D: LFT
0xA0
0xC0  DE: π
0xE0  FF: π

The Upper-Case/Lower-Case Set (PET 2001)

Row   Characters (offsets 00 to 1F)
0x00  0D: CR, 11: DWN, 12: RVS, 13: HOM, 14: DEL, 1D: RGT
0x20  (space)!"#$%&'()*+,-./0123456789:;<=>?
0x40  @ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]↑←
0x60  (space)!"#$%&'()*+,-./0123456789:;<=>?
0x80  8D: SCR, 91: UP, 92: ROF, 93: CLR, 94: INS, 9D: LFT
0xA0
0xC0  C1–DA: abcdefghijklmnopqrstuvwxyz
0xE0

(HOM: HOME, SCR: SHIFT + CR, ROF: RVS OFF, CLR: CLEAR.)

Beyond the PET 2001

Finally, since no write-up on PETSCII may be considered even mildly complete without it, here’s how those characters appear on the C64 in its characteristic bold font:

PETSCII characters on the C64. (Shlomi Tal / Wikipedia, Vice emulation.)
The backslash character (“\”) at 0x5C was now substituted by the British Pound symbol (“£”).

Mind, how the lower-case group and the upper-case group were swapped in the “shifted” set by a simple modification of the character ROM, as compared to the original character set of the PET 2001.

And another chart, including the control characters and listing codes by decimal values:

Codes 192-223 as codes  96-127
Codes 224-254 as codes 160-190
Code  255     as code  126
Commodore PETSCII codes on the C64 (www.c64-wiki.de).
“GROSS/KLEIN”: upper case/lower case, “GROSS/GRAFIK”: upper case/graphics.

— That’s all, folks! —