- Select your preferred color scheme by clicking the options "Dark" and "Light" at the top right.
- Paste, drag-and-drop, or upload (using the file dialog button) the source into the source field.
- Click the button "Assemble".
- Observe the listing and edit the source as necessary.
- Copy the resulting object code for further use.
(Uncheck the option "show address" for the pure byte sequence.)
A successful assembly will also reveal a button to transfer the code to the emulator.
This is a simple 2-pass MOS 6502 assembler optimized for compatibility and to accept a broad variety of syntax styles. The general idea is that it should work with any code of somewhat conventional and/or sane format.
A first pass determines instruction lengths and addresses, while the second pass resolves final values and generates the object code (the machine language program).
In the listing, the first pass shows how the assembler "sees" the source, while the second pass represents what the assembler actually resolves and lists this in a normalized format, which is close to the original MOS notation.
A special "BBC Micro mode" activates additional features to provide compatibility with the syntax used for embedded assembler code in BBC BASIC, while it also applies some restrictions to the general format.
The assembler supports common 6502 assembler syntax styles. Mind that there must be separating white space between labels, opcodes, and any operands. Operands, on the other hand, must not contain any white space. Operands may be simple numeric values, defined symbols, instruction labels, or complex expressions. Compare the 6502 Instruction Set for instruction details and addressing modes.
Here, we use "HHLL" to represent a word-sized 16-bit operand, "LL" for a single-byte address, and "BB" for any other byte-sized operand. (In actuality, these may be any simple or complex expressions.)
- implied, no operand.
- ROR A
- instruction with accumulator as the operand.
- same as above. "A" is optional and may be omitted.
- LDA #BB
- immediate mode, loading the literal value.
- LDA HHLL
- absolute, loads the value from the provided memory address.
- LDA HHLL,X
- absolute, X-indexed.
- LDA HHLL,Y
- absolute, Y-indexed.
- LDA LL
- zero-page address mode (with automatic address mode detection).
- LDA *LL
- forced zero-page address mode in the style of the original MOS assembler.
- LDA.b LL
- forced zero-page address mode, modern byte-size notation.
- LDA.w LL
- forced absolute address mode, modern word-size notation.
- LDA LL,X
- zero-page, X-indexed.
- LDA LL,Y
- zero-page, Y-indexed.
- LDA (LL,X)
- X-indexed, indirect.
- LDA (LL),Y
- indirect, Y-indexed.
- LDA (LL)Y
- indirect, Y-indexed, old MOS format (no comma).
- JMP (HHLL)
- indirect address.
- BEQ HHLL
- relative addresses (-128 ≤ offset ≤ +127) are computed from absolute target addresses.
- comments start with a semicolon and extend to the end of the line.
- in BBC mode, comments inside assembler blocks start with a backslash and extend to the end of the line or to the next colon (":").
The assembler is generally case-insensitive, with the exception of strings and character literals.
Generally, there may be just a single instruction on each line. (In BBC mode, however, there may be multiple instructions on a line, separated by the BASIC colon operator.)
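As an illustration of how a relative branch operand follows from an absolute target address, here is a minimal Python sketch. The function name `branch_offset` is hypothetical and not part of the assembler; the sketch only assumes the standard 6502 rule that the offset is a signed byte counted from the address following the 2-byte branch instruction.

```python
def branch_offset(instr_addr: int, target: int) -> int:
    """Return the operand byte for a relative branch located at instr_addr."""
    # The offset is counted from the address after the 2-byte branch instruction.
    offset = target - (instr_addr + 2)
    if not -128 <= offset <= 127:
        raise ValueError(f"branch target out of range (offset {offset})")
    # Encode as an unsigned byte in two's complement.
    return offset & 0xFF


# A branch at $0807 back to $0802 is encoded with the operand $F9.
print(hex(branch_offset(0x0807, 0x0802)))  # 0xf9
```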
Values and Numeric Representations
The assembler supports a variety of number formats:
- character value of "A" (
(In BBC mode, "$" is reserved for the BASIC indirection operator and "&" denotes any hex-numbers.)
Anywhere a value may occur, a complex expression may be used as well. Expressions may include addition, subtraction, multiplication, division, and unary minus (
There are also the two special unary byte operators "
- low-byte value
- high-byte value
Expressions are evaluated strictly from left to right, without precedence, but may be grouped using square brackets (
- 9 (1+2 => 3, 3*3 => 9)
- 7 ([2*3] => 6, 1+6 => 7)
Expressions may include defined symbols and instruction labels.
Mind that there must not be any white space inside an expression.
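The evaluation rule described above (strictly left to right, no precedence, square brackets for grouping) can be sketched in Python as follows. This is not the assembler's own code: the function `evaluate` is hypothetical, it handles decimal integers only, and it assumes integer division.

```python
import re

TOKEN = re.compile(r"\d+|[-+*/\[\]]")


def evaluate(expr: str) -> int:
    """Evaluate + - * / strictly left to right; [...] groups a sub-expression."""
    tokens = TOKEN.findall(expr)
    if "".join(tokens) != expr:
        # Also rejects white space, which is not allowed inside an expression.
        raise ValueError(f"unexpected character in {expr!r}")
    pos = 0

    def operand() -> int:
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "-":  # unary minus
            return -operand()
        if tok == "[":  # bracketed group
            value = sequence()
            pos += 1  # skip the closing "]"
            return value
        return int(tok)

    def sequence() -> int:
        nonlocal pos
        value = operand()
        while pos < len(tokens) and tokens[pos] != "]":
            op = tokens[pos]
            pos += 1
            rhs = operand()
            if op == "+":
                value += rhs
            elif op == "-":
                value -= rhs
            elif op == "*":
                value *= rhs
            else:
                value //= rhs  # integer division (an assumption of this sketch)
        return value

    return sequence()


print(evaluate("1+2*3"))    # 9
print(evaluate("1+[2*3]"))  # 7
```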
BBC mode adds an emulation of the string related BASIC functions "ASC(<string-expr.>)" and "LENGTH(<string-expr.>)" outside of assembler blocks, and the construct 'ASC"<character>"' inside assembler blocks.
The Program Counter
The program counter (also PC or location counter) represents the memory address of the current instruction. Outside of an instruction, it represents the address where the next instruction will be inserted. There are several ways to address the program counter:
- * = $1234
- the asterisk represents the "native" (MOS) format. Assigning to it sets the program counter.
- BEQ *+2
- the asterisk may be used in expressions as well.
- * = *+4 $EA
- when assigning to the program counter, an optional second argument specifies a fill-byte to be applied to any gaps. Here, we advance the program counter by 4 locations and fill the gap with $EA.
- P% = P%+2
- the BBC BASIC-style symbol "P%" may be used synonymously to the asterisk anywhere the former may occur.
- .ORG $1234
- the more modern-style directive ".ORG" may be used for setting the PC, as well. (However, you can't use it in an expression.)
- .ORG = $1234
- you may use ".ORG" in assignment style, as well.
- .ORG EQU $1234
- generally, "EQU" may be used as the assignment operator, as well.
(Mind that there must be white space around "EQU" in order for it to be recognized as a token, which is not a requirement with "=".)
- .RORG $1234
- synonym for ".ORG" (in many assemblers you are not allowed to alter the origin set by ".ORG"; this is meant to provide compatibility).
- BEQ .+2
- in expressions, a dot (".") may be used synonymously for the asterisk. However, you cannot assign to it. (Strictly speaking, this is local context, but, as the assembler doesn't implement macros, it's the same anyway.)
In order to ensure compatibility, in BBC mode only "P%" is available.
Labels and Symbols
Instruction labels and defined symbols start with a letter or underscore and may contain letters, digits, or the underscore. Only the first 8 characters are significant. (In BBC mode, a symbol may optionally end with a percent sign, "%", like a BASIC integer variable.)
Instruction labels may precede an instruction or may be the only entity on a line. They may optionally end in a trailing colon. Labels may be used anywhere in an expression:
- LOOP LDA A,X
- declares the instruction label "LOOP".
- LOOP: LDA A,X
- labels may end in a colon (optional).
- BEQ LOOP
- using a label as an address value.
- .LOOP LDA,X
- In BBC mode, labels are declared with a preceding dot. However, there is no dot, when used as a value (compare the above example).
Symbols are declared by an assignment and may be used as values anywhere. (In BBC mode, you may not declare a symbol/variable inside an assembler block.)
- TEST = $2000
- declares the symbol “TEST”.
- TEST EQU $2000
- “EQU” may be used synonymously.
- C = *+[TEST*2]
- assignments may be complex expressions.
Note on hexadecimal values and automatic zero-page mode
Any numeric value written with at least 4 hexadecimal digits, where the two leading digits are zeros, will be considered to be of word-size and will result in absolute address modes when used in an ambiguous context. This "word-size tainting" also propagates to expressions and assignments. (E.g., defining the symbol "C" by "C = 0x0002" and using this in "LDA C+2" will result in a word-sized, absolute instruction, although the effective value is well inside single-byte range. Defining "C" as "0x02", on the other hand, would have resulted in a zero-page address mode instruction.)
If a label or symbol yet undefined is encountered in a value expression in pass #1, a word-size format will be automatically assumed and addresses will be reserved accordingly. If it is still undefined in pass #2, an error will be thrown. (In assignments to the program counter, however, an expression must resolve in pass #1 already, otherwise the assembly fails.)
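The "word-size tainting" rule can be illustrated with a small Python sketch. Both function names are hypothetical, and the sketch only models the rule as stated above for hexadecimal literals written with "$" or "0x"; it is not taken from the assembler's source.

```python
import re


def hex_literal_is_word(literal: str) -> bool:
    """True if a hex literal such as $0002 or 0x0002 is treated as word-sized."""
    match = re.fullmatch(r"(?:\$|0x)([0-9a-f]+)", literal.lower())
    if match is None:
        raise ValueError(f"not a hexadecimal literal: {literal!r}")
    digits = match.group(1)
    # At least 4 digits with two leading zeros taint the value as word-sized;
    # values above $FF need a word anyway.
    tainted = len(digits) >= 4 and digits.startswith("00")
    return tainted or int(digits, 16) > 0xFF


def address_mode(value: int, tainted: bool) -> str:
    """Pick the address mode for an ambiguous operand such as 'LDA C+2'."""
    return "absolute" if tainted or value > 0xFF else "zero-page"


# C = 0x0002 taints the symbol, so LDA C+2 assembles as absolute.
print(address_mode(0x0002 + 2, hex_literal_is_word("0x0002")))  # absolute
# C = 0x02 is a plain byte, so LDA C+2 assembles as zero-page.
print(address_mode(0x02 + 2, hex_literal_is_word("0x02")))      # zero-page
```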
Pragmas and Directives
Pragmas and directives generally start with a dot and must be the only entity on a line.
Directives for embedding data:
- .BYTE 1, $02
- embeds a single byte or a list of bytes at the current location. Lists are separated by white space and/or commas. (An optional "#", preceding any values, is ignored.) Values may be complex expressions, as well.
- .DBYTE $12EF
- embeds a double byte (in memory order, big endian). This inserts the bytes $12 and $EF at the current location. ".DBYTE" takes a list of values, as well.
- .WORD $12EF
- embeds a word (HHLL order, little endian). This inserts the bytes $EF and $12 at the current location. Again, this may also be provided with a list of values.
- .BYT $01
- synonym for ".BYTE" as used by some assemblers.
- .DBYT $12EF
- synonym for ".DBYTE" as used by some assemblers.
- .TEXT "Abc"
- embeds a text literal (case-sensitive) using the current encoding (see below).
- .ASCII "Abc"
- embeds a text literal (case-sensitive) using ASCII encoding.
- .PETSCI "Abc"
- embeds a text literal (case-sensitive) using Commodore 8-bit encoding.
- .PETSCR "Abc"
- embeds a text literal (case-sensitive) as Commodore 8-bit screen codes.
- .C64SCR "Abc"
- as above (synonym).
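To make the difference in byte order between ".DBYTE" and ".WORD" concrete, here is a short Python sketch. The helper names `dbyte` and `word` are hypothetical; they only mirror the big-endian and little-endian layouts described in the list above.

```python
def dbyte(value: int) -> bytes:
    """Bytes emitted for a double byte: high byte first (big endian)."""
    return value.to_bytes(2, "big")


def word(value: int) -> bytes:
    """Bytes emitted for a word: low byte first (little endian)."""
    return value.to_bytes(2, "little")


print(dbyte(0x12EF).hex(" "))  # 12 ef
print(word(0x12EF).hex(" "))   # ef 12
```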
Directives for aligning code or filling space:
- .ALIGN $100
- advances the program counter to the next multiple of the value provided (here, we align to the next memory page). Any gaps will be filled with zeros. If no argument is provided, ".ALIGN" aligns to the next even memory location.
- .ALIGN $100 $EA
- an optional second argument may specify a byte value to be used to fill any gaps (here $EA, "NOP", as used by most Commodore 8-bit machines).
- .FILL $20 $EA
- fills the next n bytes using the value provided by the second argument. If no second argument is provided, zero will be used as the fill-byte.
- .REPEAT n
- repeats the instruction or directive following this directive on the same line n times. An optional "STEP" parameter defines an increment to be applied to the repeat-counter on each iteration (default 1). The repeat-counter is accessible as "R%". E.g.,
.REPEAT 26 .BYTE 'A+R%
will fill the next 26 memory locations with the letters of the alphabet.
ODD_NUMS ;generate list of odd numbers
.REPEAT 5 STEP 2 .BYTE 1+R%
will fill the next 5 memory locations with the odd number series 1,3,5,7,9.
And this will fill the next 6 bytes, advancing the program counter by 2 on each iteration and using R% as the fill-byte:
.REPEAT 3 STEP 2 *=*+2 R% ;PC += 2, fill-byte R%
- ends the source code, any remaining text is ignored. (optional)
- inserts a blank line in the listing (pass #2). This is mostly for compatibility.
- inserts a blank line and a page number in the listing (pass #2). Any comment found at the head of the source code will be used as a title. Again, this is mostly for compatibility.
- any such directive is ignored in order to ensure compatibility with symbol tables used by the companion disassembler.
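The arithmetic behind ".ALIGN" and the ".REPEAT" counter can be sketched in Python as follows. The function names are hypothetical. Two assumptions are made that the text does not state outright: an address that is already aligned is left unchanged, and the repeat-counter starts at zero (inferred from the alphabet and odd-number examples).

```python
def align(pc: int, boundary: int = 2, fill: int = 0x00) -> tuple:
    """Return (new program counter, fill bytes) for aligning pc to boundary."""
    gap = (-pc) % boundary  # distance to the next multiple (0 if already aligned)
    return pc + gap, bytes([fill]) * gap


def repeat_counter(n: int, step: int = 1) -> list:
    """Values taken by the repeat-counter over n iterations."""
    return [i * step for i in range(n)]


# .ALIGN $100 $EA at $0809 pads up to $0900 with 247 fill bytes.
new_pc, padding = align(0x0809, 0x100, 0xEA)
print(hex(new_pc), len(padding))  # 0x900 247

# .REPEAT 26 .BYTE 'A+R%
print(bytes(ord("A") + r for r in repeat_counter(26)).decode("ascii"))

# .REPEAT 5 STEP 2 .BYTE 1+R%
print([1 + r for r in repeat_counter(5, 2)])  # [1, 3, 5, 7, 9]
```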
In BBC mode, directives do not have a preceding dot and are available only inside assembler blocks. The following directives are available in BBC mode (these are actually additions to the original BBC BASIC and were introduced with Level II, 1982 issue):
- EQUB &01, &02
- inserts a byte value (may be a comma-separated list).
- EQUW &12EF
- inserts a word or a comma-separated list of words.
- EQUD &11223344
- inserts a series of 4 bytes, starting at the highest significant one. Here, we insert the byte series 0x11, 0x22, 0x33, 0x44, starting at the current memory location.
- EQUS "Abc"
- inserts a text literal (case-sensitive). This must be a single item (not a list), but may be a string concatenation, including the BASIC function "CHR$()". E.g., EQUS CHR$(34)+"foo"+"bar"+CHR$(34) will insert the character sequence '"foobar"'.
- in BBC mode, "ALIGN" does not take any arguments and aligns the location counter "P%" to the next multiple of 4. Any gaps are filled with 0xFF. (This is an even later addition to the standard.)
Options are a special set of directives switching the behavior of the assembler. Like other pragmas, they start with a dot (
- .OPT WORDA
- switches automatic zero-page detection for address modes off. All addresses default to word-size and zero-page address modes must be specified manually by a leading asterisk ("*") or the byte extension (".b"). Use this for fine-grained control and/or compatibility with old sources.
- .OPT ZPGA
- switches automatic zero-page detection to on (default).
- .OPT ZPA
- synonym for option "ZPGA".
- .OPT ILLEGALS
- enables support for “illegal” op-codes (see below).
- .OPT LEGALS
- disables support for “illegal” op-codes (default).
- .OPT NOILLEGALS
- synonym for option "LEGALS".
- .OPT ASCII
- set character encoding for “
.TEXT”-directives and character literals to ASCII (default).
- .OPT PETSCII
- set the default character encoding to PETSCII.
- .OPT PETSCR
- set the default character encoding to Commodore 8-bit screen characters.
- .OPT C64SCR
- synonym for option "PETSCR".
Further, the following options (mostly used by MOS assemblers) are recognized for compatibility, but are otherwise ignored:
In BBC mode, options evaluating to numeric values (BBC BASIC reporting levels) are ignored. Otherwise, you may use any of the above options, but without the leading dot (e.g., "
This assembler is all about a quick assembly session without worrying too much about the specific syntax (starting with the format of the very first MOS cross-assembler and extending to more modern styles). As long as you do not require macros or conditional assembly, you should be able to throw about any style of source code at it.
E.g., the following examples are semantically identical and produce the same object code:
;MOS/traditional
* = $4000
TARGET = $C0
        LDY *$20
LOOP    LDA $0080,Y
        ROL A
        STA (TARGET)Y
        DEY
        BNE LOOP
        RTS
.END

;modern style
.ORG 0x4000
TARGET EQU 0xC0
        LDY.b 0x20
LOOP:   LDA.w 0x80,Y
        ROL
        STA (TARGET),Y
        DEY
        BNE LOOP
        RTS
.END

\BBC Micro mode
P% = &4000
TARGET = &C0
[
        LDY &20
.LOOP   LDA &80,Y
        ROL
        STA (TARGET),Y
        DEY:BNE LOOP
        RTS
]
END
Here is an example for a complete assembly of a short source:

;fill a page with bytes,
;preserve program

*=$800

start
        ldx #offset
loop    txa
        sta start,x
        inx
        bne loop
        brk

;insert bytes here
offset=*-start
.end

0800: A2 0A 8A 9D 00 08 E8 D0
0808: F9 00
Press "Show Memory" and activate "live update" in the emulator.
pass 1

LINE  LOC   LABEL   CARD
   1                ;fill a page with bytes,
   2                ;preserve program
   4  0800          * = $800
   6  0800  START
   7  0800          LDX #OFFSET
   8  0802  LOOP    TXA
   9  0803          STA START,X
  10  0806          INX
  11  0807          BNE LOOP
  12  0809          BRK
  14                ;insert bytes here
  15                OFFSET = *-START
  16                .END

symbols
 LOOP    $0802
 OFFSET  $0A
 START   $0800

pass 2

LOC   CODE      LABEL   INSTRUCTION
                        ;fill a page with bytes,
                        ;preserve program
0800                    * = $0800
0800            START
0800  A2 0A             LDX #$0A
0802  8A        LOOP    TXA
0803  9D 00 08          STA $0800,X
0806  E8                INX
0807  D0 F9             BNE $0802
0809  00                BRK
                        ;insert bytes here
                        OFFSET = $0A
                        .END

done (code: 0800..0809).
About BBC Micro Compatibility Mode
This special mode (enabled by the checkbox "BBC Micro mode") adds basic compatibility for the syntax of the BBC BASIC embedded assembler. This is not an attempt at a complete or strict emulation. You may even use some of the features of standard mode, where these do not conflict with the BBC syntax, e.g., "0x..." style number formats or character literals like "'A". Moreover, you may use the assembler even without explicit assembler blocks ("[…]") for a quick session. (However, some features, like assembler directives, are restricted to assembler blocks.)
BASIC line numbers are recognized, but simply ignored and are not evaluated for sequence. Similarly, FOR-NEXT loops are ignored (the loop variable will be registered, but will always evaluate to zero), as are
REM statements are allowed outside assembler blocks and the colon (":") may be used as a statement/instruction separator anywhere in the code. (As opposed to standard mode, there may be multiple statements or instructions on a line.)
There is also support for BBC BASIC indirection operators outside of assembler blocks. These are "?M" (query), "$M" (dollar), "!M" (pling) with their variations "M?N" and "M$N" for the left-hand side of assignments. These will be mostly used with "P%" for the purpose of embedding data. "$P%" is also supported for the right-hand side, including an emulation of the BASIC function "
There is basic support for string concatenations and the string related BASIC functions "
LENGTH()", and "
STRING$()". E.g., you may do things like:
L = (LEN($P%+"foo"+"bar"+CHR$(33))+4)*2+ASC("A")
(These string and expression related features are implemented as preprocessors. While errors will be reported in detail, the listing of a successful assembly will only include values as returned by these preprocessors.)
Besides the support of BASIC indirection operators, there's also support for the directives
EQUS inside assembler blocks, as well as for
ASC"<char>". (So you may embed any data using your preferred method.)
However, there are some unsupported features, as well:
- The assembler is still case-insensitive. (Variable "
TEST" is the same as variable "
- The assembler doesn't "know" about BASIC keywords. (Variables may have the names of BASIC keywords. This kind of works around the case issue.)
- There is no support for any BASIC functions, besides the string functions mentioned above.
- Expressions are still evaluated strictly from left to right, without precedence, using square brackets for grouping. (However, you may use normal parentheses in assignments.)
- There is no support for string variables, besides an emulation of the indirection "
- You may not reassign values to a variable that has been defined before, with the notable exception of the special variables "
P%" and "
Still, there's sufficient support to successfully assemble a source like this:
100 REM THIS IS BASIC 110 P% = &4000 : REM SET LOCATION COUNTER 120 TARGET = &4100 : REM A VARIABLE 130 FOR C=0 TO 2 STEP 2 : REM PASS LOOP (ignored) 140 [OPT C \now in asm mode 150 LDX #&20 \copy 32 bytes 160 .LOOP LDA SOURCE,X 170 STA TARGET,X 180 DEX:BNE LOOP \two in a row! 190 .SOURCE 200 ]NEXT : REM BACK TO BASIC 210 $P%=":-) "+STRING$(28,"*") 220 P%=P%+LEN($P%) : REM UPDATE LOCATION 230 END
Support for "illegal" opcodes (undefined instructions) is enabled by the pragma ".OPT ILLEGALS".
The following mnemonics are implemented (supported synonyms in parentheses):
|opc (synonyms)|imp|imm|abs|abX|abY|zpg|zpX|zpY|inX|inY|
|---|---|---|---|---|---|---|---|---|---|---|
|ALR (ASR)| |4B| | | | | | | | |
|ANC| |0B| | | | | | | | |
|ANC2| |2B| | | | | | | | |
|ANE (XAA)| |8B| | | | | | | | |
|ARR| |6B| | | | | | | | |
|DCP (DCM)| | |CF|DF|DB|C7|D7| |C3|D3|
|ISC (ISB, INS)| | |EF|FF|FB|E7|F7| |E3|F3|
|LAS (LAR, LAE)| | | | |BB| | | | | |
|LAX (ATX)| |AB|AF| |BF|A7| |B7|A3|B3|
|LXA (LAX imm)| |AB| | | | | | | | |
|RLA| | |2F|3F|3B|27|37| |23|33|
|RRA| | |6F|7F|7B|67|77| |63|73|
|SAX (AXS, AAX)| | |8F| | |87| |97|83| |
|SBX| |CB| | | | | | | | |
|SHA (AXA, AHX)| | | | |9F| | | | |93|
|SHX| | | | |9E| | | | | |
|SHY (SAY, SYA)| | | |9C| | | | | | |
|SLO (ASO)| | |0F|1F|1B|07|17| |03|13|
|SRE (LSE)| | |4F|5F|5B|47|57| |43|53|
|TAS (SHS, XAS)| | | | |9B| | | | | |
|USBC| |EB| | | | | | | | |
|NOP|EA|80|0C|1C| |04|14| | | |
|DOP (SKB)| |80| | | |04|14| | | |
|TOP (SKW)| | |0C|1C| | | | | | |
|JAM (HLT, KIL)|02| | | | | | | | | |
NOP: There are several NOP instructions, but these are the ones commonly used.
NOP" (single-byte operand/address, 2 bytes total).
NOP" (word-address, 3 bytes total).
JAM: freezes the CPU (again, there are several equivalent instructions).
This application is provided for free and AS IS, therefore without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose. Use at own risk.
This application uses either Web Storage technology or, if this is not available, a cookie to store your choice for the preferred color mode for the virtual 6502 suite of applications. (Either the word "light" or the word "dark" is stored.)
You may be also interested in…
Data Transfer Using BASIC
An 8-bit machine usually comes with BASIC on board. You may use this to transfer the object code to memory using DATA statements. See this little tool to generate such code from the output of the assembler (copy & paste the object code): bytes2basic.html.
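As a rough illustration of the idea (not the output format of bytes2basic.html), the following Python sketch turns object code in the "address: bytes" format shown by the assembler into a BASIC loader. The function names, the line numbering, and the Commodore-style READ/POKE loop are assumptions of this sketch, and the object code is assumed to be contiguous.

```python
def parse_object_code(text: str):
    """Parse lines like '0800: A2 0A 8A' into (start address, byte values)."""
    start, data = None, []
    for line in text.strip().splitlines():
        addr, _, rest = line.partition(":")
        if start is None:
            start = int(addr, 16)  # the first line gives the load address
        data.extend(int(b, 16) for b in rest.split())
    return start, data


def to_basic(start: int, data: list, first_line: int = 10, per_line: int = 8) -> list:
    """Build a loader loop followed by DATA statements (decimal values)."""
    lines = [f"{first_line} FOR I=0 TO {len(data) - 1}:READ B:POKE {start}+I,B:NEXT"]
    for n, i in enumerate(range(0, len(data), per_line), start=1):
        chunk = ",".join(str(b) for b in data[i:i + per_line])
        lines.append(f"{first_line + 10 * n} DATA {chunk}")
    return lines


object_code = """0800: A2 0A 8A 9D 00 08 E8 D0
0808: F9 00"""
for line in to_basic(*parse_object_code(object_code)):
    print(line)
```

The exact POKE syntax and memory layout depend on the target machine, so the generated program should be checked against that machine's BASIC dialect.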
- 6502.org — the 6502 microprocessor resource
- visual6502.org — visual transistor-level simulation of the 6502 CPU
- www.65xx.com — official 65xx Website (The Western Design Center Inc.)
© Norbert Landsteiner 2005–2021, mass:werk