Go to the Index

# Inside Spacewar!Part 6: Fatal Attraction — Gravity

In this part of our journey through the internals of Spacewar!, the first digital video game, we're paying another visit to the heavy star. While we've already seen in Part 2 how its flashing presence would have been brought onto the scope, we're now going to investigate how it would unfold its influence on the gameplay by its ever present, all-encompassing effect on the spaceships, the single constant of the Spacewar!-universe and the driving force of space tactics: gravity.

In other words, it's math time in "Inside Spacewar!". But be assured, there's nothing ahead to be afraid of.
It even provides a pretext for some nice graphics like this one:

BTW: You can play here the original code of Spacewar! running in an emulation.

This piece of code, responsible for calculating the effects of the heavy star was done by Dan Edwards, and we also know, how it came into being, thanks to Martin Graetz's seminal article "The Origin of Spacewar" (Creative Computing, 1981) quoting Steve Russell as follows:

Russell: "Dan Edwards was offended by the plain spaceships, and felt that gravity should be introduced. I pleaded innocence of numerical analysis and other things" — in other words, here's the whitewash brush and there's a section of fence — "so Dan put in the gravity calculations."

We've already seen in the previous episode (Part 5) where this piece of code would have been placed — and there's already something interesting about it:

```	jda sin		// get sine and store it
dac \sn
dzm \bx		// clear \bx and \by (gravity factors)
dzm \by
szs 60		// sense switch 6 set?

// heavy star related code here

bsg,	cla		// clean up AC from any leftovers
clf 6
lac i mth
jda cos		// get cosine```

(As usual, comments starting with double slashes are mine and not in the original code.)

The interesting thing being, where this "fence" was put up. We could argue that the code related to the star would be starting with the clearing of the variables `\bx` and `\by` which will be holding the resulting factors to be added to the movement of the ship, quite like the final check for sense switch 6 (and the conditional jump to label `bsg` immediately following to this). This, and the clearing of the accumulator (`cla`) at label `bsg` is the "fence". But what's on either side of this "section"? The jump to the subroutine for the sine of the turning angle (`\mth` as in theta) at the one side, and the same for the cosine at the other one. Provided it would be quite "natural" to have them in a common block at either side of the gap, this seems to be a bit like a statement, quite like, "Now, there you are — now, this was really an interruption and here we're back to business, get off my lawn!" Or we could view it rather as some kind of emphasis, like a Sperrung or caesura (the seperation of adjactive and noun in poetry), a framing order, which is emphasizing the separated elements.

However, this rather peculiar placement was kept in each of the various versions of Spacewar!, from the earliest known version, 2b, to the last one, 4.8, despite off any refactoring, even when the code itself was changed for the update of the hardware multiply/divide option in version 4.

Now, let's have a look at this code, as usual, it's Spacewar 3.1, and as usual, we won't do so, without including the standard credits first:

Spacewar! was conceived in 1961 by Martin Graetz, Stephen Russell, and Wayne Wiitanen. It was first realized on the PDP-1 in 1962 by Stephen Russell, Peter Samson, Dan Edwards, and Martin Graetz, together with Alan Kotok, Steve Piner, and Robert A Saunders. Spacewar! is in the public domain, but this credit paragraph must accompany all distributed versions of the program.

## The Code

```	lac i mx1
sar 9s
sar 2s
dac \t1
jda imp
lac \t1
dac \t2
lac i my1
sar 9s
sar 2s
dac \t1
jda imp
lac \t1
sub str
sma i sza-skp
jmp pof

dac \t1
jda sqt
sar 9s
jda mpy
lac \t1
scr 2s
szs i 20		/ switch 2 for light star
scr 2s
sza
jmp bsg
dio \t1
lac i mx1
cma
jda idv
lac \t1
opr
dac \bx
lac i my1
cma
jda idv
lac \t1
opr
dac \by```

(Code as in version 3.1. For the basics of PDP-1 instructions and Macro assembler code, please refer to Part 1.)

As we can see, this is a single strain of code, without a single label, and there are only two jumps to anything outside, one to fast-forward to the label `bsg`, and one branching to another piece of code at label `pof` (for point of force or for a more graphical "Poff!"? — however, we learn from the heading comments, what this would be all about):

```/ here to handle spaceships dragged into star

/ spaceship in star

pof,	dzm i \mdx
dzm i \mdy
szs 50
jmp po1
lac (377777
dac i mx1
dac i my1
lac i mb1
dac \ssn
count \ssn, .
jmp srt

po1,	lac (mex 400000	/ now go bang
dac i ml1
law i 10
dac i ma1
jmp srt```

The main part is actually dealing with two different aspects of the star, handling any collisions with it and calculating the influence of gravity, dragging the ships there in the first place. By this Spacewar! is putting its astronauts in the position of a Homeric hero, facing a double monster, while riding their ships through the ether (© E. E. "Doc" Smith) of outer space on the waves of gravity.

## Gravity's End (Scylla)

```	lac i mx1	// load x position
sar 9s		// shift right 11 bits
sar 2s
dac \t1		// store it in \t1
jda imp		// integer multiply AC (x)
lac \t1		// by \t1 (x)
dac \t2		// store result in \t2 (x square)
lac i my1	// load y
sar 9s		// shift right 11 bits and store it in \t1
sar 2s
dac \t1
jda imp		// integer multiply y (in AC)
lac \t1		// by y (in \t1)
sub str		// subtract collision radius
sma i sza-skp	// is it a collision?

This piece of code is quite straight forward. It starts with loading the x-coordinate of the current objects position (`lac i mx1`) and shifting it to the right by 11 bits. (We may recall that the maximum amount of positions for a single shift or rotate operation is 9 bits, as in the Macro assembler's constant "`9s`". Thus, we'll have to split this in two separate instructions, as in "`sar 9s`" and "`sar 2s`".) The resulting value in the accumulator is then stored in variable `\t1` by the instruction "`dac \t1`".

So, why would we want to shift this left by 11 bits? We may recall that the positions are stored as screen coordinates in the highest 10 bits (including the sign-bit in bit 0), the lower 8 bits being just fractional values. We're going to perform an integer multiplication on this, therefor we actually have to do some scaling, at least we would want to shift this 8 bits to the right. This is leaving an extra shift by another 3 bits (or division by decimal 8) as the effective scaling. We'll see a bit later, why this would have been applied.

The next two instructions are performing the multiplication. Happens, the PDP-1 doesn't perform any multiplications or divisions out of the box (but there are special instructions for multiply and divide steps performing a dedicated multiple register shift operation). Fortunately, there's BBN (Bolt, Beranek and Newman), and they were friendly enough to supply a multiply subroutine (DEC Memo M-1096, DEC/BBN, 19 Jan 1961), both for single word results and for double precision results:

```/BBN multiply subroutine
/Call: lac one factor, jda mpy or imp, lac other factor.

dap im1
im1,	xct
// ... snip ...
idx im1
// ... snip ...
jmp i im1

mpy,	0		/returns 34 bits and 2 signs
// ... snip ...```

According to the heading comment, we're expected to supply one of the two factors in the accumulator and to execute a "`jda imp`" (for the single precision routine), followed by a "`lac`" on the other factor, as in "`lac \t1`". (Actually, a `law` instruction would do as well, as long as it would be loading the required value into the accumulator.)

How would this work? The instruction `jda` puts the contents of the accumulator in the memory location specified in the address part and performs a jump to the subroutine in the next address immediately following to this. As with a normal `jsr` instruction, the current value of the program counter `+ 1 ` is put in the accumulator as the return address. By executing "`jda imp`", we've stored the contents of `AC` into location `imp` and are jumping to the next location with the return address in `AC`. The instruction "`dap im1`" deposits this in the address part of the location labeled `im1`, where the opcode `xct` (execute instruction at given address) is already set up and waiting. This is then executed, and as this instruction happens to be a `lac`, it will load the second argument into the accumulator. By now, we've the first argument in `imp` and the second one in `AC`. In order to jump to a sensible return address, we'll have to finally increment the address in `imp` and jump there, as in "`idx im1`" and "`jmp i im1`".

(We've seen a similar construct in Part 3 and Part 4, where we investigated how the outline compiler would be called.)

For this, the instruction "`lac \t1`" after the "`jda imp`" is not overwriting the result of the multiplication (returned in the accumulator), but provides the second factor. The multiply subroutine now returns at "`dac \t2`", where the result is stored in variable `\t2`. Thus, the sequence

```	dac \t1
jda imp
lac \t1
dac \t2```

calculates the square of the value in the accumulator (the modified x-coordinate) and stores the result in `\t2`.

The next few instructions are repeating this for the y-coordinate in `my1`. This square of y is then summed with the previously calculated square of x by the instruction "`add \t2`":

```	lac i my1	// load y
sar 9s		// shift right 11 bits and store it in \t1
sar 2s
dac \t1
jda imp		// integer multiply y (in AC)
lac \t1		// by y (in \t1)

By this, we just computed the square of the distance to the origin of the screen co-ordinates. As luck will have it, this happens to be at the center of the screen, right where the heavy star is placed, liberating us from the task of computing some delta x and delta y values. Thus we are ready to compare this to some epsilon in order to detect a collision. The capture radius (or rather the square of it) is defined in the table of constants at the very top of the source:

`str, 15,	1	/ star capture radius`

This quite lapidar value of `1` is now subtracted from the square and the result (in the accumulator) is then checked for a negative value:

```	sub str		// subtract collision radius (1)
sma i sza-skp	// is it a collision? (not sma or sza)

The instruction "`sma i sza-skp`" is a compound instruction that will be performed in a single cycle at once, like we've seen it already in other parts of the source: It sums the instructions `sma` (skip on minus AC) and `sza` (skip on zero AC) and subtracts the opcode value of the skip group from this (since this would be represented twice, in each of the two instructions) and the `i` bit causes the instruction to invert the condition given by this union. (Please mind that the resulting octal instruction `650500` is not the same as "`spa`" or `640200`, the latter simply testing for the sign-bit not being set.) Thus, the next instruction will be skipped, if the accumulator would not be zero or negative (as indicated by a set sign-bit).

If not skipped, the square distance is less then `1` and we have a collision, handled at location `pof`. — I really love this graphic label! (Someone seems to have been reading Batman or the like.) — Otherwise, we add the value in `str` again to restore the original value of square distance for later use.

As we may have observed, the capture radius is not that much defined by the constant `str`, but by the shift to the right by 11 bits. Since a shift to the right by 8 bits would have been perfect for transforming the positional coordinates to integers, we've an extra of 3 bits resulting in a division by 8. Thus, a ship is captured, if the absolute values of both of the coordinates would happen to be less than 8. This would have been quite an expensive calculation for something that could be checked by some simple shifts, if we wouldn't have any other use for the square distance. But if so, we're getting the absolute distance for free to be checked by a single condition. — In fact, we were already deep in the computations of gravity as well:

## The Merciless Ship Sucker (Charybdis)

Let there be gravity:

```	dac \t1		// store square distance in \t1
jda sqt		// get the square root of it
sar 9s		// shift result 9 bits to the right
jda mpy		// calculate distance x square distance (\t1)
lac \t1
scr 2s		// shift result in AC and IO right 2 bits```

The instruction "`jda sqt`" is calling another BBN subroutine, this time for the square root:

```/integer square root
/input in ac, binary point to right of bit 17, jda sqt
/answer in ac with binary point between bits 8 and 9
/largest input number = 177777

sqt,	0
dap sqx
// ... snip ...
sqx,	jmp .
// ... snip ...```

The heading comment is already providing all we need to know: The first line of the instructions is another (a bit complicated way) of saying that the input is expected to be an integer, the second one is specifying the output format, and we are learning that the highest input processed by this routine is (octal) `177777`. (Something we do not learn from this, is where this routine came from. It could be by Adams Associates like the sine-cosine routines, but it's not the same as distributed by DEC in the preliminary memo "M-1092" from 22 Nov. 1960. Or it could be by BBN as well, as suggested by its placement right in between the multiply and the divide subroutines, both by BBN.)
Anyway, we're learning two things here: First, why the positional coordinates were shifted to the right by 11 bits in the first place, and second, why the result is then shifted right by 9 bits. The latter one is easy, since the floating point is right in the middle with 9 bits to the left and 9 bits to the right. This is simply converting the result to an integer.

The first one is mostly related to the maximum input of octal `177777` or decimal `65535`: Let's recall, how our input was derived: It's a sum of two squares, which would give us a maximum value for any coordinate of `sqrt(65535/2)` or (decimal) `181`. With our screen postions ranging from `-512` to `+512`, we would have been clearly out of range here and some scaling was needed. Actually, we would have been on the safe side with a shift by two extra bits (making this a maximum of decimal `128`), so there's still a single extra shift involved in our initial shift by 11 bits to the right. Let's see ... (While speaking of ranges, the check for a capture radius of less than `1` is preventing us here on performing the square root routine on a zero value, which would probably return an undefined result.)

The resulting square root (now converted to an integer) is then multiplied by the square of the distance (this time calling the double precision 34-bit version of the multiply routine) resulting in the absolute cube of the distance.

This serves also the final solution of the riddle of the initial shift by 11 bits. If we would want to have the value of the resulting cube fitting into a single 18-bit word, we're sailing already hard at the limits with the maximum value of (decimal) `512` scaled to `64`, since the cube of `64` is `26,2144`, or octal `1000000`. Only the absolute values (and some losses by using integer operations only) are preventing an overflow, making this the maximum value for our purpose.

And yet, there's another shift, this time by 2 bits to the rights on the result in the combined AC and IO registers:

`	scr 2s		// shift result in AC and IO right 2 bits`

This is obviously scaling the result of the high precision multiply subroutine, but by what amount?
The comment at the call location is providing us with the output format, "returns 34 bits and 2 signs". Seems to be clear enough — but, where would these two signs actually be located? Since the product of two 18-bit words less the sign (making 17 bits) may not exceed 34 bits, this leaves two spare bits. If the signs of the two operands happen to be different, the result is negative and both of the registers are complemented at the end of the procedure (forming a combined 1's complement), making the two spare bits the sign-bits. With the value stretching from AC to IO in consecutive bits, the additional sign-bit is obviously not the highest bit of IO. Where is it located then? Right at the end of IO, by this padding the value by sign-bits at either side. Thus, we've to shift the registers 1 bit to the right in order to make the value in IO an ordinary integer number.
Therefor, this shift by 2 bits is merely resulting in a division by 2.

But we haven't arrived at the end of shifts yet:

```	szs i 20	/ switch 2 for light star
scr 2s```

If sense switch 2 would not be zero (mind the `i`-bit for the inverted condition), we're skipping the next instruction. If zero (not set), we're applying another "`scr 2s`", thus low gravity will be a quarter of the normal amount. (This might be a bit confusing, but we'll see soon, why a smaller value would effect in a stronger influence of gravity.)

Thus, we just computed a parabolic function as in

```(Idealized)  f(x,y) = sqrt((x/8)2 + (y/8)2)3 / 8

Actually:
x1 = int(x / 8)
y1 = int(x / 8)
d2 = x1 * x1 + y1 * y1
d1 = int(sqrt(d2))
g(x,y) = int((d1 * d2) / 8)

where g(x,y) <= f(x,y)```

The higher part (as in `AC`) is now checked to be zero. If not, something went utterly wrong and we're abborting the sequence and jump to label `bsg`, ignoring any gravity for this frame. (This will actually never happen with a regular value, since we shifted the entire result into a single 18 bit value in `IO`.)

```	sza
jmp bsg```

Otherwise, we're calculating the vector of influence, as in `\bx` and `\by` (to be added to any other other movement vector in the code following at `bsg`):

```	dio \t1
lac i mx1
cma
jda idv
lac \t1
opr
dac \bx
lac i my1
cma
jda idv
lac \t1
opr
dac \by```

First, we're storing the output of our parabolic function in `\t1`. Since we branched on any non-zero value in the accumulator, this may only be the part left in the `IO` register, as in "`dio \t1`". The next instructions are calling the last of the BBN subroutines, the integer divide (DEC Memo M-1097, DEC/BBN, 19 Jan 1961):

```/BBN Divide subroutine

/calling squence: lac hi-dividend, lio lo-dividend, jda dvd, lac divisor.
/returns quot in ac, rem in io.

idv,	0		/integer divide, dividend in ac.
// ... snip ...
dvd,	0
dap dv1
// ... snip ...```

Once again, we're supposed to provide two factors, the dividend and the divisor. The dividend may be a 36 bit number with the higher part in `AC` and the lower part in `IO` (just like the result of the square root routine, maybe a clue for its origin at BBN) for use with the high precision routine `dvd` or a simple 18 bit value (to be returned in `AC`) for the low precision routine `idv` (used here). The divisor is handed over quite like we've already seen it with the multiply routine, by a `lac` following immediately after the "`jda idv`". The routine will return at the next instruction in case of an illegal operation as indicated by an overflow, or skip this instruction and return at the next one. For this, the instruction following the "`lac \t1`" is a simple `opr` (synonymous to the void instruction `nop`), providing no handling of this case at all. (We are pretty confident that our code won't end up here.)

Thus, this sequence is loading the x-position (referred to by pointer `mx1`) into the accumulator and complements this by the following `cma` instruction. This is then divided by the output of the parablic function in `\t1` and the result is stored in variable `\bx`. (Mind that the value referred to by `mx1` is the unshifted x-coordinate, the equivalent of the x-position multiplied by decimal `256`.)

The same procedure is then repeated for the y-cordinate in `my1` and the result is stored in variable `\by`.
This is effectively transforming our parabolic function into a vectorized hyperbolic one:

```bx(x,y) = -x / f(x/256, y/256)
by(x,y) = -y / f(x/256, y/256)```

And since a greater divisor makes for a smaller quotient, we now also understand, why low gravity was achieved by skipping an extra shift to the right.

Let's investigate this a bit more thoroughly, in order to understand it properly. Here is a table of actual inputs and outputs of the gravity calculations for some prominent values of x and y, as calculated for a freely floating spaceship:

```Spaceship #1
free falling from 0777/0777 (top right) towards 0/0 (center origin)
all values octal, complete sequence provided for x/y positions < 0100
× ... integer multiply, ÷ ... integer divide, √ ... integer sqrt

f(x,y) = ( √((x >> 3)2 + (y >> 3)2) × ((x >> 3)2 + (y >> 3)2) ) >> 3
\bx = -mx1 ÷ f(mx1 >> 8, my1 >> 8)
\by = -my1 ÷ f(mx1 >> 8, my1 >> 8)

x      y        mx1      my1      f(x,y)       \bx     \by
777    777     377777   377777     254366         -1      -1
700    700     340177   340177     170760         -1      -1
600    600     300011   300011     113300         -2      -2
500    500     240223   240223      53600         -3      -3
400    400     200000   200000      26400         -5      -5
300    300     140071   140071      11220        -12     -12
201    201     100507   100507       2600        -27     -27
177    177      77726    77726       2235        -33     -33
141    141      60653    60653       1100        -53     -53
120    120      50315    50315        536        -73     -73
101    101      40732    40732        260       -137    -137
77     77      37612    37612        156       -223    -223
75     75      36447    36447        156       -216    -216
72     72      35263    35263        156       -210    -210
70     70      34056    34056        156       -202    -202
65     65      32630    32630        110       -276    -276
62     62      31353    31353        110       -265    -265
60     60      30047    30047        110       -253    -253
55     55      26516    26516         53       -415    -415
52     52      25123    25123         53       -373    -373
47     47      23471    23471         24       -766    -766
43     43      21740    21740         24       -713    -713
40     40      20115    20115         24       -635    -635
34     34      16207    16207         11      -1453   -1453
30     30      14133    14133         11      -1264   -1264
23     23      11731    11731          2      -4754   -4754
16     16       7031     7031          0          0       0
10     10       4131     4131          0          0       0
*capture*```

As we may observe, there are some minor discontinuities, due to the limited 18-bit precision, and the result of the divison (`\bx = -mx1/f(x,y)`) is not decreasing steadily all over the range for `mx1`, because the intermediate result of the parabolic function `f(x,y)` not being unique for each value of `mx1` and `my1`. (So, yes, there are actually waves of gravity to ride.) Moreover, the two divisions by zero are yielding a zero result and not some irregular value, while there isn't any handling of this edge-case. (We'll see later why this would be.)

We may also note that while the curvature of this hyperbolic function isn't exactly steep, gravity will be applied at every frame, resulting into a near to immedate plunge into the Star, if we would approach it as near as `34/34` in any direction, unless our ship has an opposing velocity that is compensating or exceeding the pull exerted by gravity. On the other hand, Scylla and Charybdis are not exactly on par and the function will apply the typical sling-shot effect to a ship, if we manage to maneuver closely around the gravitational center in any direction.

Finally, we should note here that, while the gravity factors `\bx` and `\by` are directly added to the movement vector in pointers `mdx` and `mdy`, only a 2048th part of this vector is applied to a spaceship's postion in each frame. Thus, the most extreme value of `4754` recorded in the series above attributes to an advance of a mere 2 display locations in the same frame, whereas normaly the effect will be building up gradually over a series of frames.

## Blazing in the Sun (Pof!)

So, what happens, if our ship falls prey to the Homeric monsters and we happen to collide with the heavy star?

```/ here to handle spaceships dragged into star

/ spaceship in star

pof,	dzm i \mdx	// reset x and y
dzm i \mdy
szs 50		// sense switch 5 set
jmp po1		// yes, explode
lac (377777	// no, reset to antipode
dac i mx1
dac i my1
lac i mb1	// load count of instructions
dac \ssn
count \ssn, .	// spend it on counting up in \ssn
jmp srt		// jump to return of spaceship routine (continues at mb1)

po1,	lac (mex 400000	/ now go bang
dac i ml1	// store as new control word
law i 10	// set up explosion timing
dac i ma1

As we've seen previously, a square distance smaller than octal `10` will lead to a jump to label `pof`. Here, the instructions "`dzm i \mdx`" and "`dzm i \mdy`" are resetting the x and y velocities of the ship to zero.
What's happening next, is decided on the setting of sense switch 5: If set, we jump to `po1` and set up a nice explosion. If not set, the jump is skipped and the spaceship is warped to the antipode by loading `377777` into the two locations in pointers `mx1` and `my1`. "`377777`" corresponds to the screen location `777/777` (decimal +511/+511) in the upper right corner, where the spaceship will be drawn, split to a quadrant in each of the corners of the screen. But we're not going to display the warping ship (preserving some of the magic), so we'll spend the time involved in a count-up. For this we load the instruction count as stored in pointer `mb1` (`-4000`) into variable `\ssn` (else used to store the scaled sine) and spend the instruction in an empty loop. When finished, we jump to the exit of the spaceship routine at `srt` (the jump to the return address, effectively to `mb1`, just before the end of the comparison loop).

If we end up at label `po1`, sense switch 5 is set to high for a killing star and we are up to see some pixel dust. As nicely illustrated by the comment, "now go bang", the instruction "`lac (mex 400000`" is setting up the explosion routine: "`mex`" is the address of the routine and `400000` is the numerical equivalent of a set sign-bit to indicate a non-colliding state of the ship. This is loaded as a constant expression into the accumulator and deposited as the new control word into the location in pointer `ml1` (the handler property of the current object). "`law i 10`" loads the value `-10` to be deposited into the location in pointer `ma1`, setting up the duration of the explosion. By this, we're done for this frame and jump to the return at `srt`, happily awaiting the explosion to be displayed in the next frame.

## Excursus — De Computatione Mechanica

Cum deus caculat fit mundus. (Leibniz)

When God calculates, the world arises. If possible, this is even more true for the Spacewar!-universe. But there's a little problem here, since our deus ex machina, arising from the contraption of the PDP-1, isn't exactly good at calculating. Ironically, inspite of its name, a digital computer is good at things like transferring and jamming bits, flipping them, or applying basic Boolean operations to them, but it is exceptionally bad when it comes to computing. A single addition of two binary digits wouldn't be that bad, since it's just an exclusive OR (XOR), but when it comes to numbers, there's a carry involved and we would have to add another XOR, two ANDs, and an inclusive OR, just to perform this on a single digit. Even something as simple as an addition is a serious business involving a serious amount of specialized hardware. Even a binary shift is expensive, but this is an essential operation to any Turing machine, as well as incrementing. It may be or it may be not due to the fact that most humans, if not all, aren't capable of any more complex mathematical operations themselves and are rather resigning to some kind of lookup table of partial results, they've learned to memorize, just like IBM's 1620 (CADET — Can't Add, Doesn't Even Try) did it. But this is not an option for realtime computing.

### Multiplication

So, how could we implement something like a multiplication on a binary computer? A naive approach would be a count down on one factor, while adding the other factor to the product in each of the steps. This might be according to the very definition of multiplication, but we would be up for a lengthy procedure involving up to (decimal) 131071 steps for an unsigned 17-bit number. Can't we do any better?

While looking at a binary number, it's obvious that this is representing the sum of `1`s multiplied by `2` raised to a power corresponding to a digit's positions. Since multiplication is distributive, it doesn't matter to which of the two factors we would apply this multiplications (or, in terms of electronics, the shifts). Thus, we may just scan one of the factors for `1`s and add the other factor to the product with the appropriate number of shifts applied. (Compare Leibniz' medal draft above.)

By this we've arrived at an algorithm that completes in a limited number of steps, one for each digit involved and one operation to be applied, if the scanned number (reduced by shifts itself) would happen to be an odd number. — I believe it was Leibniz, who was quoting Vergil in this context by annotating "numero deus impare gaudet" ("god delights in an odd number"). — Anway, a multiplication step involves a shift on the multiplier to the right to scan the least significant digit, and if we would find this to be what Shakespear called a lucky number, we would add the multiplicant to the product. Since we also need a double length register for a product of two factors of n length, we would rather apply the shift to the product instead of the multiplicant, and add the multiplicant to the higher part.

This is also, what the instruction `MUS` (multiply step) does. Since the procedure involves both a right shift on the multiplier and a right shift on the double length output register, we may also combine these in a further step of optimization in a single shift: We will start with a cleared accumulator (AC) and one of the factors in IO. Then, for each step, we'll add the other factor (from memory) to the higher part of the double register in AC, if the least significant bit of the combined register would be a `1`. Subsequently we'll shift the combined register right by a single bit. In the next step, the next bit of the original multiplier would be in the least significant position and the accumulator would represent a position in the final product relatively shifted left by one bit. If we apply this 17 times (for each unsigned bit), we'll end up with the appropriate result in the combined AC-IO registers. All we have to do is to take care of the signs, since our algorithm is working on unsigned numbers.

Let's have a look at the full scale multiplication of the BBN subroutine starting at `mpy`:

```/BBN multiply subroutine
/Call. lac one factor, jda mpy or imp, lac other factor.

dap im1
im1,	xct
jda mpy
lac imp
idx im1
rir 1s
rcr 9s
rcr 9s
jmp i im1

mp2,	0

mpy,	0		/returns 34 bits and 2 signs
dap mp1
lac mpy
spa
cma
rcr 9s
rcr 9s
mp1,	xct
spa
cma
dac mp2
cla
repeat 21, mus mp2
dac mp2
xct mp1
xor mpy
sma
jmp mp3
lac mp2
cma
rcr 9s
rcr 9s
cma
rcr 9s
rcr 9s
dac mp2
mp3,	idx mp1
lac mp2
jmp i mp1```

The `jda` put's one factor into location `mpy` and starts at the instruction "`dap mp1`", complementing its opcode `xtc` by inserting the address of the instruction that will fetch the other factor. The next instructions are loading the first factor into AC, convert it to an absolute value and swaps them to the IO register. Then the `xtc` at label `mp1` fetchs the other factor in AC and, after converting this one to an absolute value, too, it is stored in `mp2`. Now, we've one factor in IO and the other in `mp2`. After the preliminary clearing of AC, we perform the actual multiplication by applying 17 steps of "`mus mp2`" (as in "`repeat 21, mus mp2`") — thus computing a raw 34-bit product with the sign-bit in AC and the least significant bit in IO both empty.

Now we're left with the task of appling the appropriate sign. Obviously the product will be negative, when the two factors were of different signs, else the product will be positive. In order to detect the signs, we save AC (the higher value part of the product) in `mp2` and once again load the original second factor in AC by "`xct mp1`" (this will perform another `xtc` instruction). By performing an exclusive or with the first factor on this and testing for the sign-bit by the instruction `sma`, we'll either jump to `mp3`, if the factors were of the same sign, or complement the two parts of the double length product. At `mp3` the value in `mp1` is incremented to point to the final return address. With the double length registor restored by "`lac mp2`", we're finsihed and take the jump to the return address.
Notably, the result will be a 34 bit value padded by two signs in AC0 and IO17.

The single length multiplication at `imp` is a derivative form of this. It will actually execute the routine at `mpy` that we've just investigated. Then it applies an extra shift in order to normalize the lower part result in IO (for the same reason as we've discussed while inspecting the gravity calculations) and performs a final swap to provide the result in AC.

### Division

Division is a bit more complicated. The general algorithm for division is the long division algorithm as in:

```Q: quotient, R: remainder, N: dividend, D: divisor

Q := 0
R := 0
for i = n-1..0 do
R := R << 1
R(0) := N(i)
if R >= D then
R = R - D
Q(i) := 1
end
end```

Please note that this is using the standard notation with the index starting on the least significant digit, as opposed to the notation used by DEC, numbering the bits from left to right, we're using generally here.

Generally, any division algorithm has to adjust for the following conditions:

`dividend = remainder + quotient × divisor,  0 <= remainder < divisor`

As we can see, the product of the quotient and the divisor has to be arranged in a way, that it will equal the dividend when summed with a remainder that is less than the divisor. We may see that there is a condition on the remainder for achieving a correct result, and we may see that the only way to arrive at a satisifying value for the remainder is by producing a corresponding quotient (since the divisor is a constant in this procedure). As for the product, we've already seen that we could regard this as a sum of products of powers of two, corresponding to the digits set to one in its factors. We should be able to reverse this procedure, reducing the tasks essentially to exactly knowing where to slice these factors out of the dividend.

We'll start with a partial remainder set to equal the dividend, and like in multiplication, we will scan the number by shifting it one bit at each step. Since this is essentially the reverse operation to multiplication, we'll start at the highest significant digit and shift it to the left (thus dividing it by 2). If the divisor is fitting into the partial remainder (`P`), as expressed by the condition "`P - D >= 0`", we'll subtract the divisor (effectively multiplied by `2n-i`) and set the corresponding bit in the quotient to high (since the divisor would fit into the dividend for the according times of this power of two).

Thus, we have arrived at the slow division method, generally expressed in:

`Pi+1 = R × Pi - qn-(i+1) × D`

Or, expressed as an algorithm:

```P := N
D := D << n            -- divisor will be implicitly shifted right by n bits
for i = n-1..0 do
P := 2P - D          -- trial subtraction from shifted value
if P >= 0 then
q(i) := 1          -- result-bit 1
else
q(i) := 0          -- result-bit 0
P := P + D         -- new partial remainder is (restored) shifted value
end
end```

Since the subtraction has to be applied to the partial remainder only, if this subtraction is yielding a positive result, we're dealing effectively with two states of the partial remainder, before and after the subtraction, with the condition on the second one. One way to handle this is a non-destructive algorithm, where we would apply the subtraction anyway and restore the partial remainder by adding the divisor again, if the result would be negative (see above). On the other hand, the sign of the partial remainder would indicate the condition for the previous step, so we could put our emphasis on this. If the sign-bit would be unset, the subtraction would have satisfied our condition and we would set the corresponding bit in the quotient. Else, we would add the quotient multiplied by two (to compensate the shift). In both cases, we would then subtract the divisor again, to provide the sign-bit for the next step. Simplifying this, we will either subtract the divisor, if the partial remainder is positive, or we will add the divisor, if it happens to be negative.

For our implementation, we're combining AC and IO into a double-length register once again (since the dividend is a double-length product) and use this for the partial remainder, being initially the dividend. Since we can provide for the corresponding bit position in the quotient by just setting the least significant bit and shifting the quotient to the left as well, we may reuse the AC-IO double-length register and gradually shift in the quotient at the lower side, while the partial remainder is shifted out at the other side. This way, the least significant bit of the double-length register is set to one, if the sign-bit before the shift was unset (indicating a positive partial remainder), and cleared, if the sign-bit is set. In other words, the complemented previous sign-bit becomes the new least significant bit in the gradually produced quotient.

A thing to consider is the sign of the least significant bit shifted in when we would apply the addition for the previous partial remainder being negative. Since the less significant bits in IO will always be positive and the partial remainder is in this case negative (represented in 1's complement), we'll have to add `1` to the addition in order to correct this new bit to the sign. Thus, the addition step becomes "add divisor plus 1", if the sign-bit was negative.

Since each of the steps is presupposing a subtraction previously applied, we'll have to provide for an initial one before applying the first step. Similarly, the last step will apply a surplus one and we'll have to componsate this by a final addition. Moreover, we're dealing with operands of different word length, so we'll have to decide what they would represent. Generally, for this algorithm, they are regarded to represent fractional values, so we will apply the divison to the higher part of the double-length AC-IO register, leaving the part where the quotient is gradually produced intact. (Also, any subtractions or additions will be applied to AC exclusively and we will do fine with any adder-circuitry provided for AC only.) If we would want to use this for an integer divide, we'll have to provide a single-word dividend adjusted to the right of the double-word register in IO only (shifted left one bit to for the sign).

Thus, we've finally arrived at the following algorithm:

```0 <= N < 1,  0 < D < 1,  nP = 2nD

D := D << nD           -- D is applied do the higher part of double word P only
P := N

P := P - D             -- initial subtraction
for i = 0..nD-1 do
s := sign(P)
P := P << 1
if is_set(s) then    -- was negative (Pi-1 - D < 0)
P(lsb) := 0
P := P + D + 1*    -- *) PDP-1: + 1 corrects 1's complement of AC(lsb) shifted in
else                 -- was positive (Pi-1 - D >= 0)
P(lsb) := 1
P := P - D
end
end
P := P + D             -- compensate the surplus subtraction applied in last step```

The divide step as in instruction `DIS` expects the dividend in the combined AC and IO registers, and the divisor in the memory location referred to in the address part. At each step the combined registers are shifted left and the complement of the former sign-bit (in AC) is jammed into the least significant position (in IO). If this bit (now in IO bit 17) happens to be `1`, the divisor (from memory) is subtracted from AC, else AC is incremented by `1` and the divisor is added. (Actually, the subtraction is performed by an addition, too, complementing AC previously and afterwards.) Applied for each binary digit, it will produce the remainder in AC and will shift the intermediary quotient into IO.

There are actually some constraints on this algorithm: First, it works on absolute values, so we'll have to manage the signs once again on our own. Second, it works only, if the divisor (notably shifted left by a word-length) is greater than the dividend. So, we have an error condition here (an overflow) that is also covering the edge-case of a divsion by zero.

The BBN subroutine is testing the overflow condition after the initial subtraction. If there is an overflow (as indicated by a positive result), the routine returns at the location after the `lac` for the divisor. If the routine produces a valid result, this instruction is skipped and the routine will return at the next location. Further, there are two entry points: One for a simple integer divide and one for two fractional values with a double-length dividend. The call at `dvd` is performing a fractional divide (with the fractional point immediately after the sign-bit), the call at `idv` (as used in Spacewar!) an integer divide on a simple value in AC:

```/BBN Divide subroutine

/calling squence. lac hi-dividend, lio lo-dividend, jda dvd, lac divisor.
/returns quot in ac, rem in io.

idv,	0		/integer divide, dividend in ac.
dap dv1		// store address to fetch the divisor
lac idv		// load the dividend
scr 9s		// prepare the dividend: swap it in IO
scr 8s		//                      (less sign-bit)
dac dvd		// deposit sign (-0/+0) in dvd
jmp dv1		// now continue with fractional divide

dvd,	0		// fractional divide
dap dv1		// store address to fetch the divisor
dv1,	xct		// fetch divisor (in AC)
spa		// make it an absolute value
cma
dac idv		// store divisor in idv
lac dvd		// load hi-dividend into AC
sma		// make dividend an absolute value
jmp dv2
cma
rcr 9s
rcr 9s
cma
rcr 9s
rcr 9s
dv2,	sub idv		// initial subtraction (dividend - divisor)
sma		// negative? (divisor > dividend)
jmp dve		// no, abbort on overflow
repeat 22, dis idv	// apply division steps
dio idv		// store quotient in idv
cli		// now arrange sign, first clear IO
rcr 1s		// rotate AC and IO right 1 bit (sign AC = 0)
lio dvd		// remainder is same sign as divdend, load hi-dividend
spi		// dividend was positive?
cma		// no, complement remainder
dac dvd		// deposit remainder in dvd
xct dv1		// compute sign of quotient, fetch divisor
xor dvd		// xor remainder (get sign diff)
rcr 9s		// swap AC and IO (store computed sign in IO)
rcr 9s
idx dv1		// good divide, increment return address
dve,	idx dv1		// increment return address
lac idv		// load quotient into AC, arrange sign
cma		// negative, complement AC (quotient)
lio dvd		// remainder in IO
jmp i dv1	// return```

The instructions after the final addition are preparing the sign-bits: The remainder is of the same sign as the dividend and the quotient will be negative, if the signs of the divisor and the dividend (or the remainder) are different.

Notably, the input format for the long dividend to be provided for routine `dvd` is the same as the output format of the high precision multiply routine, a continuous number stretching from AC to IO with no sign in the highest bit of IO. Therefore, if we would want to assemble this from two ordinary values, we would have to apply a shift to the left by one bit position to the low dividend in IO in advance, just like we see it in the preliminary steps of the integer divide routine at `idv`.

(It is only here that we can see, why the series of Spacewar! gravity factors didn't produce any irregular value on a division by zero and returned `0` instead: Since `idv` was called for the single-word integer division, AC would have been zero (with the absolute dividend in IO), when the initial subtraction was applied. With zero also subtracted as the divsior, the routine abborted on a positive value by a jump to `dve`. At this time there's still the divisor in `idv`, which is now loaded as the result and adjusted to the sign of the absolute dividend, still in IO. Here, this is a small value with the higher bits cleared, resulting in an empty sign-bit. Thus the result returned in AC is a plain zero.)

The automatic hardware multiply/divide option, required by Spacewar! 4.x, is essentially an automatization of these algrorithms performed by the BBN-subroutines with all values generally regarded to represent fractional values. When installed, the instructions multiply step `MUS` and divide step `DIS` are replaced by the instructions `MUL` and `DIV` respectively (reusing the opcodes of the former single-step instructions). The hardware solution will not only provide the results in fewer cycles, but it will also restore the values in AC and IO, in case a division would return with an overflow.

*****

There is a rather small chance that you, dear reader, would now be asking breathlessly, "And how could we do square roots?" O.k., this may be unlikely, but there are two other, higher order arithmetic subroutines in Spacewar!, namely the square root routine and the sine-cosine routine. And we really should have a look at them, too. Maybe, you, like me, weren't taught a square root algorithm in school, since there had already been this small shiny box, called an electronic calculator (they don't make them shiny anymore), and maybe you are interested in how it came up with its answers. Here is, how it would have done a square root — at least, if it would have been restricted to integer square roots:

*****

### Square Roots

When it comes to integer square roots, there is again an approach both naive and literal to the definition: We would go over all the squares of the integers until we would encounter a square that would be greater than our argument. The answer would be the last square before this one, the one that just fitted into the given number. (Since we are dealing here with integers only, we would forget about any remainder.) Again, this is not a viable option for realtime computing and we may want to think about something different. Maybe something that would be working on factors. — But, are there any factors in squares?

We may want to consider the well known formula for solving the square of a binomial term here:

`(a + b)2  =  a2 + 2ab + b2`

From this follows that the square of any number in a number system of base n may be cosidered to be the square of the sum of the various powers of n in this number, as for any 2-digit number x:

```x = np + q
x2  =  (np + q)2  =  n2p2 + 2npq + q2```

p.e.

```95 = 10×9 + 5
952  =  (10×9 + 5)2  =  100×81 + 20×9×5 + 25  =  8100 + 900 + 25  =  9025```

If we would want to reverse this, we are learning from the expressions "`n2p2`" and "`q2`" (as in "`100×81 + 25`") that the square parts of this term won't overlap for any two adjacent digits of the root, since the product of any two single-digit numbers must be less than `n2` (e.g, 9×9 < 100). Thus we may want to scan the square in pairs of two adjacent digits in order to extract the squares of the individual digits of the underlying root. Also, as in division, we will try to extract the biggest factors first, so we will scan the square number from left to right, "bringing off by two". But there might be some remainder involved and there will be some overlap as in the expression "`2npq`". How would we handle this?

Let's see: Considering our base 10 example from above, we inspect the first pair `90`, and obviously the biggest square fitting into this is `92`. So, we note `9` as the highest digit of our result and subtract its square `81` from `90`, makes `9`. As we advance to the next digit, bringing down `25` to get q, what does this remainder represent with respect to the square of a binomial?

```x2  =  100p2 + 20pq + q2
q(20p + q)  =  x2 - 100p2

x2 - 100p2
q = –––––––––––
20p + q```

Generally, the steps we have to apply here are not that much different from solving the remainder and bringing down the next digit in long division, while a bit more complex: Bringing down two involves a shift by two digits to the left (`×100`), so "`x2 - 100p2`" represents the remainder of our subtraction, now to be shifted left two digits and the next two digits of the argument added to it. (The other way round, "`x2`" is the square number as far as we inspected it, and we've already factored out the square of the first root digit(s). As we arrive at q, we have applied a double shift to it and brought down the next two digits, providing the two new lower digits in this for `x2` and `100p2`, or `n2p2`.) In order to get q, we will have to arrange for the divisor in the expression: We will double the partial root, we've just determined, shift it left one digit and add q to it. — Here, we're facing a little problem, since we can't know q at this time. We'll have to come up with a guess.

In our example the dividend is "`9×100+25`" (or `925`) and we have to come up with a guess fo the divisor as in "`20×9 + q`" (or "`180 + q`"). In this case we're lucky, since this will work out perfectly with "`q = 5`", providing the final result `95`.

But odds are, we will not be that lucky each time and there will be a remainder. Actually we have to go back and prove our guess:

`r = x2 - 100p2 - q(20p + q)`

In case we would get a negative remainder, we would have to track back and redo with a smaller guess for q before we would be able to continue with the next pair of digits by bringin down the next two highest digits of the argument at the right of r and q now becoming the next p. — Notably, it is this need for a trial-and-error operation and some potential backtracking which is contributing to the image of this long form method as rather being some kind of dark art or alchemy than an enlightened, clean algorithm.

But see what's happening, when we do the same with binary numbers, with just a single case to be considered, namely q equalling `1`:

```     x2 - 4p2
q = ––––––––––
4p + 1

r = x2 - 4p2 - 1(4p + 1)```

Amazingly, there is no need for guesses and backtracking with binary numbers: Since q is either `1` or `0`, the divisor will either fit into the remainder or not. The term "`+ q`" becomes a constant expression and the divisor is to be directly solved. With the various multiplications by 4 being the equivalent of double shifts, we arrive at a straight forward algorithm using only basic operations like shifts, increments, and maybe a subtraction:

1. Shift any remainder 2 bits to the left, shift in next two highest digits from the argument;
Shift the partial result (p) 1 bit to the left (advance to next digit).
2. Optional shortcut: Is there any remainder? If zero, continue with step 1 (q = 0).
3. Get the divisor by "`(p << 1) + 1`" (we already applied one shift to the partial result above)
4. Does this fit into the remainder?
5. If it fits, subtract the divisor from the remainder and increment the partial result by `1` (add q).
6. Repeat for any remaining digits.

Let's have a look at the implementation used by Spacewar!:

```/integer square root
/input in ac, binary point to right of bit 17, jda sqt
/answer in ac with binary point between bits 8 and 9
/largest input number = 177777

sqt,	0
dap sqx		// deposit return address
law i 23	// load 23 (number of digits to produce + 1)
dac sq1		// store it in sq1 (loop counter)
dzm sq2		// clear sq2 (result)
lio sqt		// load argument into IO
dzm sqt		// clear sqt, now used for remainder

sq3,	isp sq1		// index sq1, skip on positive (performs 18 iterations)
lac sq2		// load sq2 into AC (result)
sqx,	jmp .		// return

lac sq2		// load partial result
sal 1s		// shift left 1 bit
dac sq2		// and store it (sq2 = sq2 << 1)
rcl 2s		// combined rotation, multiply & get next 2 bits from IO
sza i		// is it zero?
jmp sq3		// yes, continue with next 2 digits at sq3 (root digit is 0)
dac sqt		// store remainder
lac sq2		// prepare divisor: load sq2 (partial result)
sal 1s		// shift left 1 bit
add (1		// add 1 to it  (ac = sq2 << 1 + 1)
sub sqt		// subtract remainder (ac ≤ sqt ?)
sma+sza-skip	// is it zero or minus?
jmp sq3		// no, continue with next 2 digits (root digit is 0)
spa		// is it positive?
cma		// no, complement it
dac sqt		// store as remainder
idx sq2		// increment partial result (root digit is 1)
jmp sq3		// continue

sq1,	0
sq2,	0```

Transforming this into some kind of C-like pseudocode (which happens to be valid JavaScript) we get:

```function sqrt(num) {
var result = 0;
var remainder = 0;
var divisor, i, t;

for (i = 0; i < 18; i++) {               // for each resulting digit
result <<= 1;
t = (remainder << 2) + (num >> 16);   // this and next line: lac sqt, rcl 2s
num <<= 2;
if (t) {
remainder = t;
divisor = (result << 1) + 1;
if (divisor <= remainder) {
remainder -= divisor;
result++;
}
}
}
return result;
}```

This is the perfect image of the algorithm we've drafted above: Inside the loop, we first advance to the next digit of our result by applying a shift. Then we apply the double shift to the remainder and add the next two digits to be inspected to it by rotating the combined AC & IO registers, containing the remainder and the argument respectively. If the remainer would now be empty, there's nothing to fit and this digit of the root is a zero. Thus, we may continue by a jump to the next iteration of the loop. (In the assembler code, we won't even store the remainder in this case, since it must have been zero before.) If we've got some remainder, we will check the divisor to fit into the remainder. (This is done by a subtraction and testing the result for being either zero or a negative number. If the divisor fits, it is restored by complementing it.) In case the divisor is less than or equal to the remainder, our result digit is a 1 and we subtract the divisor from the remainder. After repeating this procedure for each bit of our 18-bit result, we're eventually done and return the result.

(Those more familar with the conventional high-school method may want to have a look at the excellent description at http://www.embedded.com/electronics-blogs/programmer-s-toolbox/4219659/Integer-Square-Roots, which is arriving at the very same algorithm.)

What's interesting here on a more textual level, is the clarity of the design of this square routine and its organization in three distinctive blocks, each representing a functional context: The first one at label `sqt` is storing the return address and initializing the various registers and memory locations used. (We may note the reuse of `sqt` for the remainder, after the argument, put there by the calling `jda` instruction, has been transferred to the IO register.) The second one constructs the loop by a single `isp` instruction (index and skip on positive) that is incrementing the counter and either performs a jump to the loop body, if the contents of the counter would be still negative after the increment, or skips this jump, when the counter becomes zero, and puts the result into the accumulator before jumping to the return address. The main part being the loop body is doing the heavy work, with any branches for another iteration jumping to the loop instruction at label `sq3`. Finally, there are the two memory registers used for the loop counter and the partial result.

### Sine & Cosine

It may be surprising, but there is no standard method to calculate sine and cosine. Moreover, we may not hope to come up with some easy to operate algorithm based on shifts and additions only, since sine and cosine are essentially about projecting a 2D unit vector onto a 1D length. Hence, we may expect to see some higher order functions involved.

While various algorithms are used in computer implementations, like the CORDIC algorithm, then just recently presented in 1959, or the approximation series published in "Approximations for Digital Computers" by Cecil Hastings Jr (Princeton, New Jersey, Princton University Press, 1955, p.138-140) which was also popular at DEC, the sine-cosine routine by Adams Associates, Inc (DEC Preliminary Memo M-1094, 22 Nov 1960) is using the well known Taylor series for approximation:

```              x3    x5    x7
sin(x) = x - ––– + ––– - ––– ...
3!    5!    7!```

While this series is actually extending infinitely, we may observe that the higher order terms, like "`x9/9!`", are becoming smaller and smaller, especially with a small value for x. E.g., for x = 0.1 the 9th order term becomes `0.000000001/362880`, a value hardly representable in standard precision. Hence, we may keep the error small and restrict the approximation to just a few terms, as long as we're keeping our arguments small.

Since the sine is both periodical (sin(θ) = sin(θ + 2π)) and symmetrical (sin(θ) = sin(π/2 – θ)), we may reduce the argument to a quadrant (or π/2 radians) in order to keep the error small and to optimize the approximation. And since there is also a symmetry between sine and cosine (a phase shift by π/2), we may use the same core algorithm to approximate both of these functions.

By extracting the constants and rearranging the formula to determine the higher powers of x by repetitive multiplications (Horner scheme), we finally arrive at something that may be implemented on a digital computer:

```x = θ/2π   // -1 ≤ x ≤ 1

(2πx)3   (2πx)5   (2πx)7
sin(x) = 2πx - –––––– + –––––– - ––––––
3!       5!      7!

= C1x + C3x3 + C5x5 + C7x7

= ((((C7)x2 + C5)x2 + C3)x2 + C1)x```

Let's have a closer look at the sine-cosine routine used by Spacewar!. It is actually not the original form of the Adams Associates routine as provided in the Preliminary Memo M-1094 from November 1960, but a revised version, reformatted for use with the Macro assembler and the BBN multiply routine:

```/sine-cosine subroutine. Adams associates
/calling sequence= number in AC, jda sin or jda cos.
/argument is between ±2 pi, with binary point to right of bit 3.
/answer has binary point to right of bit 0.  Time = 2.35 ms.

define	mult Z		// macro to call to the BBN multiply routine
jda mpy		// 34 bit routine
lac Z
term

cos,	0		// entrance to cosine, argument deposited here by jda
dap csx		// routine starts here, deposit return address
dac sin		// store in sine

sin,	0		// entrance to sine, argument deposited here by jda
dap csx		// routine starts here, deposit return address
spa		// is it positive?
sub (62210	// subtract π/2
sma		// is it minus? (operand was < π/2)
// operand is now -π/2 ≤ AC ≤ π/2, 3 integer bits
si3,	ral 2s		// rotate left 2 bits (now 1 integer bit)
mult (242763	// × 2/π (with binary point right of sign)
dac sin		// store x
mult sin	// get square
dac cos		// store x^2
mult (756103	// C7x^2
add (121312	// C7x^2 + C5
mult cos	// C7x^4 + C5x^2
add (532511	// C7x^4 + C7x^2 + C3
mult cos	// C7x^6 + C5x^4 + C3x^2
add (144417	// C7x^6 + C5x^4 + C3x^2 + C1
mult sin	// C7x^7 + C5x^5 + C3x^3 + C1x
scl 3s		// shift AC & IO left 3 bits (×8)
dac cos		// store result in cos
xor sin		// now adjust signs, xor x
sma		// is it negative? (are result and x of different signs?)
jmp csx-1	// no, return result (cos)
lac (377777	// max positive value (2^17-1, nearly 1.0)
lio sin		// load x into IO
spi		// is x positive?
cma		// no, complement AC (400000, 1.0 with bit-shift)

lac cos		// load result from cos (as before xor-ing with sin)
csx,	jmp .		// return

si2,	cma		// (operand - π/2 ≥ 0) complement AC
sma		// is it negative?  (π/2 - opsi2 < 0?)
spa		// is it positive? (π - opsi2 > 0?)
jmp .+3		// no, skip next two instructions
sub (62210	// undo last addition (subtract π/2 again)

sub (62210	// subtract π/2 (effectively: ac = π/2 - opsi2)
jmp si1		// and start over at si1```

Our interpretation of the constant values here is backed by the labels in the original code as in Memo M-1094, where "`xOy`" means x/y and "`P`" is for π:

```	C7	756103
C5	121312
C3	532511
C1	144417
2OP	242763
2P	311040
PO2	062210
ONE	377777```

(To identify the constants otherwise, we have to inspect the binary representation of these fractional octal numbers for equivalents. E.g., "`242763`" is `010100010111110011` with the binary point just after the sign-bit as in "`0.10100010111110011`", the equivalent of 2/π or "`0.1010001011111001100000110…`". Notably, the outer parts at labels `si1` and `si2` are using a representation with a sign and 3 integer digits, hence "`311040`" is "`0110.01001000100000`" or .)

So, what's happening here? Let's start at the sine routine (since the cosine ist quite trivial by adding π/2 to the argument and then continuing with the code for the sine):

The routine is called by "`jda sin`", hence the argument (or operand) is stored at label `sin` and the execution starts at the next instruction with the return address in AC. After storing the return address in the address part of location `csx` the operand is first reduced to the range `0..2π` due to the periodicy of the sine. Then, if the operand isn't in the range `0..π/2` already, it is further reformatted and reduced to the range of `-π/2..π/2` at label `si2`.

As we enter the core part at label `si3`, the value of the operand is reduced to plus/minus a quadrant, but still in the input format of a sign and 3 binary integer digits. The instruction "`ral 2s`" rotates the first two bits (which must be empty by now) out to the left, by this gaining some precision. Since we're going to use the format of the high-precision multiply routine with the binary point next to the sign-bit, this is effectively a divide by two in relation to the values of the following operations. The multiplication by `2/π` transforms this to a parametric function. The next instructions are cumulatively building up the series approximating the sine, by first setting up a square and then multiplying this square and alternately adding a constant to the product. After a last multiplication by the simple operand, we've arrived at a raw value for the sine in the combined AC and IO registers, which are now shifted to the left by 3 bits. (We may observe that the constants C7 and C3 have their sign-bits set and are thus representing negative values. — This is also, why we can single out the algorithm presented by Cecil Hastings Jr, even without inspecting the constants more closely, because this one would be using just a single negative component in its four-terms version. E.g., compare this 1962 DEC library implementation by Margaret Wu, who may be better known to some readers for her textbook An Introduction to Computer Data Processing from 1975.)

With the higher part of the raw result stored in `cos`, we're now ready to adjust the sign: By applying an exlusive or of the simple operand to the result, we're comparing the sign-bits. If they are the same, we return the contents of `cos` as the result, by jumping to "`csx-1`". Notably, the return format is a fractual value with the binary point next to sign-bit. If the signs would differ, we skip the jump and the maximum positive value (`377777`) is loaded into the accumulator. Since we're returning an all-fractional value, this represents the value as close to `1.0` as possible. With the sign of the simple operand loaded and inspected in the IO register, the value will be complemented to (exactly) `1.0` (with the bit-shift applied), if the operand was positive (as of the entrance to `si3`). In both cases we then jump to the return at `csx`, returning `+/-1`.

So, why would the Taylor series work at all? It's a well known fact that for a small value of an angle θ the sine sin(θ) will be almost θ (and thus cos(θ) will be nearly 1, with cos(0) being just a phase shift of sin(π/2)). If we think of the sine and cosine in terms of a projection of a 2D unit vector on a simple length (along the axes of the coordinates system), we can also see why: The projection onto the y-axis, the sine, will be measuring the deviation of this vector from the x-axis, and as this deviation, described by the angle θ, approaches zero, the length of this projection onto the y-axis, the sine, will be approaching zero, too. But as the angle is becoming greater, the difference will be becoming more substantial and we will be in need for some kind of compensation to adjust for the characteristic slopes of these transcendental functions. The terms xn/n! are adding the compensation needed with an increasing fineness and level of fidelity as the order of n increases, and each one will be also compensating the additions already applied by the preceding ones as the terms are of alternating signs, thus minimizing the error. (We may observe that these expressions are of some parabolic form, as these are exponential terms, and the effect will decrease by increasing order.) Theses terms are becoming smaller and smaller with increasing order of n, until they would be approximating zero, especially for small input values. (Which also conforms with the observation that the sine of a small angle will be approximating the angle itself.) Thanks to the periodical and symmetrical nature of the sine and cosine, we may limit the range of the input value, as long as we're taking care of the sign for the given quadrant. Obviously, the sign of the sine will be positive for the first two quadrants and negative for the third and fourth ones. Further, we may transform the input values for the second and fourth quadrants to conform to the transformation sin(θ) = sin(π/2 – θ), thus reducing our effective input to a range of -π/2..π/2 (which will be transformed to a prametric range by multiplying it by the factor 2/π), as we see it in the Adams Associates routine.

*****

Now, that we've also peeked inside the subroutines that were essentially black boxes to Spacewar!, we may return to our principal subject:

## Spacewar 4.x and the Hardware Multiply/Divide

Spacewar! 4.x is all about the automatic multiply/divide option added to MIT's PDP-1 at the turn of 1962/63. Basically it is Spacewar! 3.1 modified to run with the hardware multiply/divide and some small fixes (like for the small bug caused by the property `mco` never being set, so the continuous/single fire option will work as intended). Spacewar! 4.0 was done by Monty Preonas (aka "Diamantis Drakos Preonas", thus signing "`ddp`") in Jannuary/February 1963 and the final version 4.0 is dated 2 Feb. 1963. (There's also a version 4.1 and a version 4.8 both signed "`dfw`", but the author of those hasn't been identified yet. Steve Russell had eventually left MIT, just returning to arrange the consolidated version 3, and wasn't involved in version 4 at all.)

While Martin Graetz had already announced a somewhat refined method for calculating gravity in his DECUS paper on Spacewar! 2b,

"In the final version of SPACEWAR! he [Dan Edwards] is going to provide an improved integration to eliminate some of the more unexpected, albeit interesting, properties of the 'heavy star.'" ("Spacewar! Real-Time Capability of the PDP-1", DECUS Proceedings 1962),

the algorithm remained the same until version 4.0. (Apparently, the minor discontinuities produced by the approximation used in versions 2b – 3.1 was somewhat of a thorn in the side: It would have been good enough for a game, but it wasn't right for a simulation.)

When Monty Preonas addressed version 4, he wasn't only upgrading the program to make use of the faster hardware multiply/divide, he also introduced a new algorithm, which finally provided the long announced "improved integration".
(Aside from this, he also applied a small change to the sine-cosine routine, originally provided by Adams Associates, Inc., DEC Memo M-1094, 22 Nov 1960, in order to make it work with the hardware arithmetics. — On Monty Preonas (ddp) and Spacewar! 4.0 see Joe Morris, alt.folklore.computers, 5 April 2001.)

Note: While Joe Morris recalls ddp's full name as "Demitrios D. Peronas", Peter Samson is confident of it being "Diamantis Drakos Preonas" — we're going here with Peter Samson.
As an alternate/complementary source the TMRC members list shows an entry for "Monty Preonas" as a member since 9/1960.

Due to the new, improved integration method any ships entering the star at the same time would now be warped to the antipode before colliding. With Spacewar 4.x else being quite undistinguishable from version 3.1, the easiest way to tell them apart is to start a new game and let the ships float freely, until they would plunge into the heavy star. If the ships are exploding inside the star, you're watching version 3.1. If the explosion of the colliding ships is spread over the corners of the screen (the antipode), you're watching some flavor of version 4.

Since we're dedicated to the "classic" version 3.1, we're not going to explore version 4.x in every detail. (Actually, we've already spent some pages on this topic, and the gravity algorithm of version 4.x isn't an easy one. However, we'll scratch at least the surface in a bit more detail below.) Anyway, here's the source for gravity in Spacewar 4.0, the grayed instruction marking the parts unchanged from the previous versions:

```spacewar 4.0  2/2/63 ddp
...

str, 15,	100	/ star capture radius
grv, 32,	sar 6s	/ gravitational constant

...

dzm \bx
dzm \by
szs 60
jmp bsg
lac i mx1
dac \t1
mul \t1
scr 1s
dac \acx
cla
scr 2s
dio \iox
lac i my1
dac \t1
mul \t1
scr 1s
dac \acy
cla
scr 2s
swap
swap
scl 2s
sub str
sma i sza-skp
jmp pof
varsft
dac \t1
jda sqt
mul \t1
undosft
scr 9s
scr 6s
szs i 20		/ switch 2 for light star
scr 2s
sza
jmp bsg
scr 1s
dio \t1
integrate mx1, \bx
integrate my1, \by
bsg,	...```

And there are some new macro definitions for this as well:

```define
varsft
dzm \xys
dac \t1
idx \xys
v2,	idx \xys
lac \t1
scr 2s
dac \t1
sza
jmp v2+R
scr 2s
swap
terminate

define
undosft
dac t1
dio \t2
lac \xys
dap .+1
lac .
dac .+6
dac .+6
xor (10000	/ change scr to scl or scl to scr.
dac \xyt
lac \t1
dio \t2
scr .
scr .
terminate

define
integrate A,B
cli
lac i A
scr 9s
scr 1s
div \t1
hlt
cma+cli-opr
xct \xyt
xct grv		// sar 6s
dac B
terminate

sft,	lac .-1
scr 7s
scr 6s
scr 5s
scr 4s
scr 3s
scr 2s
scr 1s
scr
scl 1s```

And here are the changes to the sin-cosine routine, replacing the insertions of the macro "`mult`" by the instruction "`mul`":

```/sine-cosine subroutine; Adams associates
/calling sequence= number in AC, jda sin or jda cos.
/argument is between ±2 pi, with binary point to right of bit 3.
/answer has binary point to right of bit 0.  Time = 2.35-? ms.
/changed for auto-multiply , ddp 1/19/63

define	mult Z
jda mpy
lac Z
term

...
mult (242763
...
mult sin
...
mult (756103
...
mult cos
...
mult cos
...
mult sin
...```

And fianally, here is the raw output of the gravity calculations:

```Spacewar! 4.0, Spaceship #1
free falling from 0777/0777 (top right) towards 0/0 (center origin)
all values octal, complete sequence provided for x/y positions < 0100

x       y         mx1        my1       \bx      \by
777     777      377777     377777        -1       -1
700     700      340077     340077        -1       -1
600     600      300103     300103        -2       -2
500     500      240201     240201        -3       -3
400     400      200112     200112        -5       -5
300     300      140355     140355       -11      -11
200     200      100277     100277       -26      -26
140     140       60345      60345       -47      -47
120     120       50352      50352       -70      -70
100     100       40325      40325      -130     -130
76      76       37247      37247      -136     -136
74      74       36156      36156      -145     -145
72      72       35050      35050      -155     -155
67      67       33724      33724      -166     -166
65      65       32562      32562      -201     -201
62      62       31377      31377      -216     -216
60      60       30173      30173      -235     -235
55      55       26743      26743      -260     -260
53      53       25465      25465      -306     -306
50      50       24156      24156      -342     -342
45      45       22613      22613      -407     -407
42      42       21207      21207      -467     -467
37      37       17534      17534      -571     -571
34      34       16002      16002      -732     -732
30      30       14155      14155     -1161    -1161
24      24       12212      12212     -1560    -1560
20      20       10071      10071     -2605    -2605
*capture*```

As we may observe, the (effectively unnoticeable) discontinuities of the previous implementation are gone and the integration produces a steady hyperbolic function with a smooth curvature that is a bit less steep than the one of version 3.1. Also, a ship is now captured at `20`/`20`, providing the reason, why two ships entering the heavy star at once would be rather warped to the antipode before colliding.

We might guess by the smothness of the graph drawn by Spacewar! 4.x that this was the result of some nifty integration, like the KR4 algorithm (Kutta-Runge order 4 integration) we see employed in games nowadays, as also suggested by the name of the macro `integrate`. But, at closer inspection, the algorithm is quite the same as in Spacewar! 3.1. As we've observed, the jagged graph of Spacewar! 3.1's gravity function is the result of a lack of precision, caused by all the multiplications, divisions, cubes, and square root computations, which left us with just a handfull of significant bits of the initial wealth of 18-bits words. What we're seeing in Spacewar! 4.1 accordingly, is a quest for computational precision.

As stated above, we won't strive for verbal detail, but rather do with inline-comments. However, let's see how it's done:

```	dzm \bx		// initially clear \bx and \by
dzm \by
szs 60		// check sense switch 6
jmp bsg		// deployed, no star, no gravity
lac i mx1	// pos x
dac \t1
mul \t1
scr 1s
dac \acx	// \acx = mx12 / 2  (high part)
cla
scr 2s
dio \iox	// \iox = mx12 / 2  (low part, 2 bits zero padding)
lac i my1	// pos y
dac \t1
mul \t1
scr 1s
dac \acy	// \acy = mxy2 / 2  (high part)
cla
scr 2s		// IO = mxy2 / 2    (low part, 2 bits zero padding)
swap
swap		// IO = (low (y2 / 2) + low (x2 / 2)) >> 2
scl 2s		// shift padding/carry back into AC
add \acy	// square distance (mx1 × mx1 + my1 × my1)```

This is actually not that different from what we've seen in Spacewar! 3.1. We're building a square distance in the accumulator, but now we're using the built-in fixed point multiplications rather than the integer multiply by BBN. Since the `mul` instruction returns a 34-bit result with a high order word in `AC` and a low order word in `IO`, there's something to gain. Actually, the whole handling of the low order results in IO just contributes to a carry for the square distance currently in `AC` at the end of this snippet.

Follows the check for the collision with the heavy star, like in Spacewar! 3.1. In version 3.1, there was an initial shift by 11 bits to the right applied to the x- and y-coordinates, transforming them from screen-coordinates to integers (8 bits) and another 3 bits for some computational reserve. Therefor, when `str` was `1` in version 3.1, it's now octal `100`, as `1` shifted by 6 bits to the left.

```	sub str		// str: 100
sma i sza-skp	// is it smaller than str?
jmp pof		// yes, we collide with the star, jump to pof
add str		// restore previous square distance```

Next, we compute the cube distance by multiplying the square distance by its square root, like in Spacewar! 3.1, but this time it's framed by the macors `varsft` and `undosft`:

```	varsft		// shift to AC justified left, 2 bits zero padding
dac \t1		// \t1 = square distance justified left
jda sqt		// get square root
mul \t1		// \t1 = cube distance
undosft		// undo previous justification```

As the names of the two macros suggest, they are about a variable shift and the respective reverse/undo operation:

```define
varsft
dzm \xys
dac \t1		// store AC in \t1
idx \xys	// \xys = 1
v2,	idx \xys	// increment \xys
lac \t1
scr 2s		// right shift \t1 into IO 2 bits
dac \t1		// store it
sza		// is it zero?
jmp v2+R	// no, redo
scr 2s		// do another shift
swap		// AC justified left, 2 bits zero padding
terminate	// \xys: number of steps```

To get the most out of the 18-bit precision available in `AC`, macro `varsft` inspects the value in `AC` in steps of a double shift to the right. As the value in `AC` eventually becomes zero, the value in `IO` is the significant 18 bits part of the initial value in the combined `AC` and `IO` registers. After an extra shift by two bits to the right (2 bits zero padding to allow for a sign and any carry) we swap this back, with now the significant bits in `AC`. Variable `xys` stores the number of double-shifts applied. (All the storing and restoring of the contents of the accumulator in `\t1` is just for the instruction `idx` destroying its contents by leaving the value resulting from the increment in `AC`.) What's now in `AC`, is the significant part of the initial value, justified left with a padding of two bits for some computational reserve.

Macro `undosft` isn't really applying the reverse operation, it adds another transformation:

```define
undosft
dac t1		// AC in t1
dio \t2		// IO in \t2
lac \xys	// get number of shift steps of varsft
dap .+1		// set up address of according shift
lac .		// load it and deposit it twice
dac .+6
dac .+6
xor (10000	/ change scr to scl or scl to scr.
dac \xyt	// store reverse shift in \xyt
lac \t1		// restore AC and IO
dio \t2
scr .		// apply the shifts
scr .
terminate

sft,	lac .-1
scr 7s
scr 6s
scr 5s
scr 4s
scr 3s
scr 2s
scr 1s
scr		// "scr 0" - no shift at all
scl 1s```

We may want to go a bit more into detail here: When `AC` was holding just up to two significant bits, we only applied a single double-shift in `varsft` and variable `xys` would be `2`. In this case, we would use the instruction at "`sft -1 + 2`", namely a combined shift to the right by 7 bits ("`scr 7s`"), just immediately below label `sft`, and apply this twice. In case we had to apply the full range of 9 double-shifts, we would apply the last "`scl 1s`" twice, and in case we applied 8 double-shifts, we won't apply any transformation at all.

As a notable side effect, we store the reverse operation of the selected shift instruction (by flipping the significant bit of the micro-code, instruction `scl` is `667xxx`, `scr` is `677xxx`) in variable `\xyt` for later use.

Back to main, we're scaling our gravitational factor according to sense switch settings:

```	scr 9s		// shift 17 bits left (low gravity 15)
scr 6s
szs i 20	/ switch 2 for light star
scr 2s
sza
jmp bsg		// overflow, apply no gravity at all
scr 1s		// shift by another bit (complete swap with full gravity)
dio \t1		// deposit IO in \t1```

As in version 3.1, low gravity will be a quarter of normal gravity (by the divisor being 4 times as big). At the end of the operation, we've applied a full swap (by 18 bits for full gravity) and store the factor in variable `\t1`.

Finally, we apply our gravitational factor, so delicately derived, to the position of the ship:

```	integrate mx1, \bx	// apply gravity for x, resulting delta in \bx
integrate my1, \by	// apply gravity for y, resulting delta in \by
bsg,	...```

While the first two macros were actually included just once and their being a macro was merely decoupeling the computations from the main flow, macro `integrate` is applied twice, once for the x-coordinate and once for the y-coordinate, storing the resulting delta in variables `\bx` and `\by` respectively. Here, we finally apply the reverse shift operation prepared in macro `undosft` to account for the shifts applied for left justification of the gravity factor, done to squeeze the most of precision out of the 18-bit words of the PDP-1. (We may recall that no shift will be applied, if the shift in `varsft` was exactly 8 bits to the right twice, and shifts will increase in number of bits to the left with a decreasing number of right shifts applied in `varsft`.)

```define
integrate A,B
cli
lac i A
scr 9s		// AC/IO >> 10 (screen coordinates to integer >> 2)
scr 1s
div \t1		// divide by gravity factor
hlt		// here on div overflow, else skipped
cma+cli-opr	// complement AC, clear IO
xct \xyt	// apply reverse shift ( "rcl 1s " ..  "scl 7s ")
xct grv		// apply "gravitational constant" (sar 6s)
dac B		// deposit result in B
terminate```

Since the positions in `mx1` and `my1` respectively represent an offset to the center origin, the `cma` instruction in "`cma+cli-opr`" reverts the delta in `\bx` and `\by` to the opposite direction, pulling the ship closer to the center when added to the current `dx/dy` movement vector. The code where this is applied, following below this at label `bsg`, is just the same as in version 3.1.

Generally, we may describe the procedures as converting the fixed point values (as derived from the format used by the display instruction: sign, 9 integer bits, 8 fractional bits) to floating point, proceding with the critical computations in floating point, and finally restoring the usual fixed point format.

As the comparative graph of the gravity computations in version 3.1 and 4.x shows, both functions produce nearly similar results in the mid and higher ranges. As might be expected, it's in the lower regions of the positional values that the gain in precision shows. Towards the origin, where the input values become smaller, significant bits are soon moved out of the range of precision without further preparations. But, with the data thus prepared as we've seen it here, computable information is still maintained as the function approaches the origin, as illustrated by the steady smoothness of the graph over the whole range of input data.

With a deeper analysis of Spacewar! 4.x gravity left as a home work — like arriving at an actual formula notation of the procedures involved —, we'll close here. (Note: This episode was originally published in back-to-school week.)
Next time we're going to explore the warfare in Spacewar, as in pyrotechnics and explosions.

*****

### Update: A Final Note on Gravity

In case anyone would doubt a formula based on `1/r3` to be suitable for a perfect simulation of gravity, here is a professional statement on the subject, provided by Magnus Olsson, Department of Theoretical Physics of the University of Lund, Sweden at an alt.folklore.computers-thread in 1993:

```>Did it ever occur to anyone that the gravity simulated in Space War (one of
>my all-time fav. computer games) is not correct
>                                    ~~~

Well, it has occured to me, but then I'm a physicist...

If classical Newtonian gravity were to wok in the same way in a 2-D
universe as in ours, the force wouldn't be proportional to 1/r^2 (r
being the distance from the "sun" to the spaceship) but as 1/r.

From this point of view, one could say that Space War doesn't
"correctly simulate gravity in a 2-D universe".

However, the matter is more complicated than this.

Firstly, Einsteinian gravity, i.e. general relativity, won't work at
all in 2 space dimensions. Either there is no gravitational force at
all, or the theory is radically different. That means that we can't
make any arguments by analogy with our familiar 3-D (or, rather,
(3+1)-D) world.

Secondly, even if one says taht we should stick to the analogy to
Newtonian gravity (i.e., to be more mathematically precise, that the
gravitational potential should be a solution to Poisson's equation in
2 dimensions) and use a force prop. to 1/r we would get problems. The
reason is that in a 1/r force field, there are no stable orbits
(except for the perfectly circular ones). This means that the space
ships would behave quite differently from the way they do in 3-space.

One might ask whether what one really wants is to simulate a truly
2-D world, or to simulate a 2-D cross section through our 3-D
universe. (I.e. we have the ordinary universe with some strange law
added that forces all space ships to keep in the plane of the ecliptic
all the time). All the space games I've seen that even attempt to have
realistic physics (most, alas, don't – space ships stop dead as soon
as you release the thrustre button, and similar atrocities) do the
latter thing.

IMHO this is a good solution, since it's the best simulation you can
get of "real" spaceflight on a 2-D screen. I find it far easier to
swallow that I'm forbidden by divine law to leave the ecliptic, but
that I can still have my familiar elliptical orbits, than to have the
strange, non-closed orbits that I would get in a true 2-D world (with
Newtonian gravity). But this is of course a matter of taste.

Magnus Olsson
Department of Theoretical Physics
University of Lund, Sweden```

(Magnus Olsson, alt.folklore.computers, October 11, 1993)

*****

Vienna, September 2014
www.masswerk.at

In case you would have found any errors or inconsistencies, or would have additional information,