Episode 10: Not the End, Yet, Still Some of a Wrap-Up

Some thoughts on the project.

RetroChallenge is over — and our project isn't really finished yet. But I love the idea of the project — and I'm going to finish this in the next days, just the same. So this is not the end yet. However, doing the project was/is far from in vain, as I gained some insights that are rather interesting, at least to me. As some may remember, I did the same program, a simulation of Computer Space, the very first arcade game for last RetroChallenge for the DEC PDP-1, an early 1960s computer (introduced in 1959), which could be considered the 6502 of its days — as in comparatively cheap and often used for visual applications, as, for example, the very first digital video game. Doing the same for two architectures which are separated by two decades (the PDP-1 with its roots in the mid-1950s TX-0, and the mid/late 1970s 6502) does for an interesting comparison.

DEC PDP-1

I'm not going much into detail regarding this past project, for those truely interested, there's an extensive write-up, the program running in in-browser emulation, a YouTube video of the gameplay and even a short video of the program running on the real machine (CHM). What we're here interested in, is a comparison of system architectures regarding the coding experience.

Ironic Computer Space Simulator for the PDP-1, edited image.

A glimpse of the program on the real machine (CHM):

Ironic Computer Space Simulator on the DEC PDP-1 at the CHM, Dec. 2016 (youtube.com)

▶ See it in in-browser emulation: Ironic Computer Space Simulator (ICSS).

The architecture and the instruction set of the PDP-1 ows much to Wes Clark's mid-1950s experimental TX-0 machine (1955/56, MIT Lincoln Labs, with oversight by later DEC founder Ken Olsen), the very first experimental transistors-only computer. Likewise, the DEC PDP-1 was one of the earliest fully transtorized machines on the market and may be regarded as some kind of commercial version of TX-0. What separated it from other machines of its time, like the IBM 1401, was its extraordinary low price (just 11 family homes and a brand new car) and an orientation towards realtime and visual computing, much in the tradition of experimental MIT machines.

The most striking differences between the PDP-1 and the PET's 6502 are the size — the PDP-1 comes in a 3-compartments refrigerator-sized cabinet, while the 6502 comes in a 40-pins DIP package — and the kind of visual display hardware used (a then sensational radar screen-like point plotting x/y-display at a resolution of 1024 × 1024 for the PDP-1 and a 25 lines × 40 characters raster display for the PET). On the other hand, both architectures are also somewhat comparable: For example, a simple program adding and storing two 16-bit numbers requires about the same runtime in microseconds and about the same amount of bits in memory. Also, the 8K and 16K of 8-bit RAM of the original PET 2001 are in the same league as the PDP-1's stock 4K of 18-bit memory. Moreover, both architectures took a rather RISC-like approach to a reduced, but still versatile instruction set. So we may expect the two machines being comparatively equally fit for a given task and purpose, while the look and feel of the program and the experience of actually running it will be totally different.

Notably, the PDP-1 predates the common notion of hardware optimization for higher level languages by a few years. Therefor it lacks allmost everything which has become an essential part of any processor arcitecture since, like index registers (once also known as B-registers) or a processor stack. Like on the 6502, there is just a single all-purpose register, the accumulator, and yet a single additional register, the IO-register, which in comparison to the 6502's index registers even lacks a direct transfer instruction to and from the accumulator. Due to the lack of a processor stack, any subroutines are notoriously self-modifying, since the return address is simply put into the accumulator and a jump instruction has to be fixed up on-the-fly by the program in order to perform the return. While this may sound tedious and archaic, the PDP-1 is well equipped for this, providing built-in assembly support, by storing and reading instructions that are selectively accessing just the address part or the opcode of an instruction. (BTW, the PDP-1 uses an opcode-and-address/operand single word 18-bit instruction format, which is abstracted into the now usual opcode + operand format by its assembler language.)

On Programable Numbers

It's true that a Turing Machine describes computability and the concept of a fully implemented computer in theory and concept. On the other hand, it doesn't describe a usable machine. No one will ever want to abstract simple partial solutions to a problem class by stacking state machines onto state machines and then juggle, say, 152 state machines by yet another Turing Machine in order to finally approach the problem.

What makes computers actually program- and usable is yet another concept (theoretically included in the very concept of the Turing Machine, but yet not described as a vital part), namely pointers. There are two core concepts of forming and using pointers and references, indirect addressing and indexed access, and any combinations and mixed modes of these two. Index registers, adding an offset to a given address by the value momentary stored in them, are the key feature of higher level language support in the processor hardware. With the rise of this feature, indirect addressing without the involvement of any index registers has become some of a second class citizen. E.g., on the 6502, there's just a single instruction with plain indirect addressing, namely the jump instruction, all other indirect addressing modes are restricted to memory addresses in the zero page and involve index registers as well.

And this is, where the PDP-1 really shines: Predating the notion of higher level language support, but deeply rooting in computing experiences with the MIT's experimental machines, it provides the wonder of all indirect addressing modes, there are, universal indirect addressing. That is, indirect addressing is controlled by a flag-bit in the instruction word. If this bit is set for any instruction with memory access, the operand will be interpreted as an address for indirect look-up. And, if this resolved address now has the indirect bit set as well, it will be used for another look-up in turn. Probably, you could set up a state machine by cleverly arranging the i-bits and then performing a single instruction. But no one ever attempted this (to best knowledge), and for good reason. While this is a very powerful feature, it's also a very dangerous one, as the side-effects of multi-level indirect addressing, peeking into the address part of other instructions, may be a guaranteed way to programer's hell without hopes of timely salvation. Therefor, US forces programing style guides of the time strictly advise to use just a single level of indirect addressing, and I haven't seen an assembler program yet that would make use of multilevel indirectness in earnest. But, and this is the important part, every memory operation profits from indirect addressing out of the box and every address may be used as a pointer. On the PDP-1, indirect addressing and pointers are first class citizen at machine code level, like functions in some of the modern programing languages.

Computers found their early application in numerical problems, because this is, where the funding was, thermodynamics, artillery tables, targeting problems, signal versus noise, etc. — Hence the name. But, inspite of the name, computing isn't really, what computers are good at. Arithmetics don't yield to binary logic easily and it takes a lot of circuitry to implement a ripple carry to perform even the simplest arithmetic additions, not to speak of the extra hardware involved in implementing a basic, partial multiplication or division step, to be performed for a single bit of an operand. What computers are really about, is moving and jamming bits, regardless of the meaning we may apply to them. (This is also, what early mainframes were all about, delegating arithmetics to separated units, which operated often in slow, but fail-safe BCD mode — maybe even in bi-quinary, like on the 1401 —, and being really good in moving stuff around.)

Again, the PDP-1 shines in this domain, e.g., by providing rotation and shifts in up to 9 bit positions, even across register boundaries, in a single instruction. Where you have to perform step by step single-bit operations on other processors (like the 6502), often involving multiple cycles for any of these operational steps, it's a-thought-an-instruction on the PDP-1. While we may describe the core operations of a computer by mangling bits encoding purpose in what may be called a cryptographic process operated on tables (of which the program itself is just another one), doing this on the PDP-1 is rather a joy. Single-instruction multi-bit shifts, modular operations and binary logic work hand-in-hand fluently by an up-to-purpose basic instruction set, which is intended for manual coding from the very beginning and which may be also enhanced on the fly by combining micro-coded operations. Just chose your own level of complexity. Even the 1's complement arithmetcis of the PDP-1 add to this comparatively imediate experience by their simply symmetry, which makes up for the minor hassles of dealing with edge-cases involving negative zero.

Another core feature of computability is decision branching, whithout which no machine may be Turing-complete. In philosophical terms, this is, where negativity is introduced into the apparatus, or, as Norbert Wiener once wrote [in a letter to John von Neumann], purpose. The PDP-1 does it by skips, as opposed to conditional jumps. That is, any condition is again a single cycle operation internal to the CPU, skipping the next instruction in the program. While this is rather inverse to what we generally see in modern processors, it is actually more intuitive. The reason for this is found in the broader concept of branching and programing. At a higher level, branching is often about loops. Like some of the aspects of computing, we've discussed here, the implementation of loops may be considered a core feature of programability. In fact, there is hardly a program that isn't performing a loop, as operating a program may be described likewise as looping over the code, with a branch representing somewhat of an epicycle to this major operation loop. For this, the PDP-1 provides basic building blocks out of the box by combined index and skip instructions, which may be applied to any memory address, thus performing as a counter, or a pointer, or even both. Moreover, what may be a detour into yet another tiny epicycle of operations with other instruction sets, is often just a single instruction on the PDP-1. Thus, a skip is more often than not exactly what we may need and want.

As a result, while there are just 29 basic instructions and 6 microcoded instruction groups on the PDP-1, it makes for a versatile and straight-forward instruction set. As straight forward is the programing experience. While some of the concepts may be foreign to a modern programer (like me), a thought translates easily into machine code. Building a loop is like a sentence, and implementing complex behavior yields to compact and intuitive code (not to the least for indirect addressing and loop instructions). Writing assembly code for the PDP-1 isn't that far from writing code in C (which, BTW, originated as some kind of higher level assembler language on later PDP-machines). In fact, while this was my first program for the PDP-1, I was able to do so even without laying out the program, or any of its flow, on paper.

Instructing the PDP-1 is much like speaking to it, without any intermediary. Which is also literally, what you're about to do, by 'riding' the bare machine, whithout any operating system or any other third party instance or interference whatsoever.

And I haven't yet mentioned one of the most versatile features of the PDP-1: All I/O operations are asynchronous with hardware support out-of-the-box, with any communications abstracted into a single signal flag (the "completion pulse"), for which there are several ways to wait for or to access it by just setting a flag bit in the respective I/O instructions. (For high-speed I/O devices, there's also direct memory access transparent to the program that may be running concurrently.) For comparison, on the 6502, you have to write your own handlers and subroutines to communicate with them, for any of this.

MOS 6502

The 6502 is very much the epitome of a low-cost, versatile, RISC-like microprocessor architecture. Unlike the 8080 and its descendants, it hasn't any roots in the solid-state TTL era and may be considered a pure microprocessor design. And — along and in friendly rivalry with the Z80 — it was the driving force behind the microcomputer revolution of the late 1970s and early 1980s, of which the Commodore PET 2001 is a prime example. (However, unimpressed by its historical role and as opposed to its former rivals, the 6502 remains in production to date.)

▶ See the last iteration for RC2017/04 in in-browser emulation.

▶ And this link for the game in its current state of development.

Despite of its reduced instruction set, the 6502's 64 official opcodes outnumber the PDP-1's instruction set (ignoring micro-coded sub-sets and their combinations) by count. On the other hand, the instructions, there are, often lack the symmetry found on the PDP-1. There isn't always a reverse operation, or, if there is an operation for one of the auxilliary index registers, it may be missing for the accumulator. (E.g., you may increment or decrement the X and Y index registers or even any memory location, but not the accumulator. Thus, adding an offset or decrementing a value always involves a memory or register transfer, to be applied in a separate instruction, adding to the cycle count.) The effect of this is a bit like the 6502 telling you, how things are to be done and how things are to be addressed, in this specific way and not by any other approach.

As revealed by the existence of the index registers and the presence of a processor stack, the 6502 comes with hardware support for higher level languages. Or, in other words, it is intended for compiled or interpreted code and embedded applications, in the first place — while we're still confined to writing machine code for any runtime-sensitive applications for its humble speed of generally 1MHz (up to 2 Mhz, in popular computers only found on the BBC Micro). Interestingly, there are some instructions in the so-called inofficial opcodes that may come handy for manual coding, like 2-byte skips, etc. The reason for them being just inofficial, that is, neither officially listed or supported, and not guaranteed to work, is probably for them not being tested throughout the whole range of operations (which is, BTW, a rather complex procedure). For the sake of cost-effectiveness, this is rather a machine's machine only. (In the age of emulation, we better stick with just the official instruction set. Moreover, some later versions of the 6502, were, indeed, breaking some of the inofficial instructions.)

This really shows, when coding for the 6502: Your concepts may not always translate directly into code. There are conceptional restrictions. Thus, we have either to introduce abstractions, or wrap our mind around the 6502's ways of doing things. A conceptual atom may translate to a tiny sub-program, and the way this may approached at all, is defined much by the addressing modes of the 6502. Looping over data has to be done by index registers, thus a loop may not exceed 256 bytes or 128 address words. If you require more, there are two index registers and you may build nested loops, but the choice of the register used, also defines, how indirect addressing may be involved. As mentioned before, indirect addressing is restricted to zero page addresses only, of which there are just so many/few. And with the restricted set of operations to be performed in the CPU alone, you really need zero page addresses as auxilliary registers (what is exactly, what they are intended for). As may be already seen by this, programing the 6502 is much about building abstractions and thus deligating concepts to dedicated pieces of code. You are supposed to have your piece of code for moving data, your piece of code for copying, to build up from basic routines to higher level concepts. Else, you're running out of resources. Which, BTW, become scarce anyway, because there's also always an operating system involved, like MS BASIC, which reserves most of the precious zero page addresses for its own purposes. (In fact, these are the complete the range of the zero-page, and there's just the BASIC input buffer, which may be reused by any machine language program running aside of MS BASIC.) And, as another side effect of the choices made in the 6502's design, there are only so many parts of code, which may actually build and service higher level data structures. Moreover, accessing these routines and switching between them, involves yet another extra amount of memory transfer and transfer of control.

This also translates to the use of computational resources: While we were able to run the game on the PDP-1 in high graphics resolution at 60 frames per second and still had some cycles to burn in a delay loop, we were hardly able to keep up with the 60Hz cycle of the raster display on the PET, even with a reduced version of the program.

When we were speaking of programability before, the experience on the 6502 is much more related to building Turing machines than it is on the PPD-1. Because of the restrictions in the instruction set and its addressing modes, you've to build tiny sub-machines of your own, in order to implement an algrorithm. On the way from thought to code, this effects in a certain level of conceived indirectness. While you're riding the bare-metal machine with the PDP-1, you have to become a machine yourself with the 6502, in certain ways. Personally, while once having been much more acquainted to the 6502 than to the PDP-1, I found this rather tedious at times. Or, in other words, the experience is a much nerdier one on the 6502, as you really have to keep creeping right into the machine, as opposed to sovereignly governing it.

I was somewhat surprised by the outcome of this comparative coding experience, with the 6502 having been some of my home base, back in the day. (It also reminded me of why I then stayed away from computers at all for several years.) I was rather confident to implement the program, which I had done in an assembler language before, in just a few sessions, but, as may be seen, it was in actuallity a different story. At times, bugs crept into the program, which were much harder to pinpoint on the 6502 as compared to the PDP-1, and I lost considerable time in scratching my head and hunting down the stupit thing, I'd done. Residing to paper and flow charts may have helped, but we did without them on PDP-1 rather well.

But, inspite of this rather questionable praise of the 6502, I'll continue with the program. — Promised.
Because, the PET deserves it.

Mockup of Computer Space 2001 running on an original PET 2001 computer

— Stay tuned! —

▶ Next: Addendum I: The Quest Continues

◀ Previous: Episode 9: Progress Update (The End is Neigh)

▲ Back to the index.

Norbert Landsteiner
April 2017, Vienna, Austria
www.masswerk.at – contact me.

— This series is part of Retrochallenge 2017/04. —

Retrochallenge 2017/04:Personal Computer Space Transactor 2001

Episode 10: Not the End, Yet, Still Some of a Wrap-Up

DEC PDP-1

On Programable Numbers

MOS 6502

Retrochallenge 2017/04:
Personal Computer Space Transactor 2001