|i||Show in Emulator|
- Select your preferred color scheme by clicking the options "Dark" and "Light" at the top right.
- Paste, drag-and-drop, or upload (using the file dialog button) the source into the source field.
- Click the button "Assemble".
- Observe the listing and edit the source as necessary.
- Copy the resulting object code for further use.
(Uncheck the option "show address" for the pure byte sequence.)
You can convert the hex-byte sequence to a BASIC loader program here.
- A successful assembly will also reveal a button to transfer the code to the emulator and another button to download the object file. Choosing the file download will reveal a dialog with options for a standard binary file or a Commodore PRG-file. Select your preferred format and click "Generate", then either click the link to download the file to your download folder or right-click the link and select "Save as…" (the exact wording depends on the browser) from the context menu.
This is a simple 2-pass MOS 6502 assembler optimized for compatibility and to accept a broad variety of syntax styles. The general idea is that it should work with any code of somewhat conventional and/or sane format.
A first pass determines instruction lengths and addresses, while the second pass resolves final values and generates the object code (the machine language program).
In the listing, the first pass shows how the assembler "sees" the source, while the second pass represents what the assembler actually resolves and lists this in a normalized format, which is close to the original MOS notation.
A special "BBC Micro mode" activates additional features to provide compatibility with the syntax used for embedded assembler code in BBC BASIC, while it also applies some restriction to the general format.
The assembler supports common 6502 assembler syntax styles. Mind that there must be a seperating white space between labels, opcodes, and any operands. Operands, on the other hand, must not contain any white space. Operands may be simple numeric values, defined symbols, instruction labels, or complex expressions. Compare the 6502 Instruction Set for instruction details and addressing modes.
Here, we use "HHLL" to represent a word-sized 16-bit operand, "LL" for a single-byte addresses, and "BB" for any other byte-sized operands. (In actuality, these may be any simple or complex expressions.)
- immediate, no operand.
- ROR A
- instruction with accumulator as the operand.
- same as above. "A" is optional and may be omitted.
- LDA #BB
- immediate mode, loading the literal value.
- LDA HHLL
- absolute, loads the value from the provided memory address.
- LDA HHLL,X
- absolute, X-indexed.
- LDA HHLL,Y
- absolute, Y-indexed.
- LDA LL
- zero-page address mode (with automatic address mode detection).
- LDA *LL
- forced zero-page address mode in the style of the original MOS assembler.
- LDA.b LL
- forced zero-page address mode, modern byte-size notation.
- LDA.w LL
- forced absolute address mode, modern word-size notation.
- LDA LL,X
- zero-page, X-indexed.
- LDA LL,Y
- zero-page, Y-indexed.
- LDA (LL,X)
- X-indexed, indirect.
- LDA (LL),Y
- indirect, Y-indexed.
- LDA (LL)Y
- indirect, Y-indexed, old MOS format (no comma).
- JMP (HHLL)
- indirect address.
- BEQ HHLL
- relative addresses (-127 ≤ offset ≤ +127) are computed from absolute target addresses.
- comments start with a semicolon and extend to the end of the line.
- in BBC mode, comments inside assembler blocks start with a backslash and extend to the end of the line or to the next colon ("
Further, in BBC mode, immediate addressing may be denoted by a "
@" prefix as in Acorn Atom BASIC:
- LDA @BB
- alternative immediate adressing mode in Acorn Atom syntax (BBC mode only).
The assembler is generally case-insensitive, with the exception of strings and character literals.
Generally, there may be just a single instruction on each line. (In BBC mode, however, there may be multiple instructions on a line, separated by the BASIC colon oporator.)
Values and Numeric Representations
The assembler supports a variety of number formats:
- character value of "A" (
(In BBC mode, "
$" is reserved for the BASIC indirection operator and "
&" denotes any hex-numbers.)
Anywhere a value mayn occur this may be a complex expression as well. Expressions may include addition, subtraction, multiplication, divisions, and unary minus (
There are also the two special unary byte operators "
- low-byte value
- high-byte value
Expressions are evaluated strictly from left to right, without precedence, but may be grouped using square brackets (
- 9 (
1+2 => 3, 3*3 => 9)
- 7 (
[2*3] => 6, 1+6 => 7)
Expressions may include defined symbols and instruction labels.
Mind that there must not be any white space inside an expression.
BBC mode adds an emulation of the string related BASIC functions "
ASC(<string-expr.>)" and "
LENGTH(<string-expr.>)" outside of assembler blocks, and the construct '
ASC"<character>"' inside assembler blocks.
The Program Counter
The program counter (also PC or location counter) represents the memory address of the current instruction. Outside of an instruction, it represents the address, where the next instruction will be inserted. There are several ways to address the prorgam counter:
- * = $1234
- the asterisk represents the "native" (MOS) format. Assigning to it sets the program counter.
- BEQ *+2
- the asterisk may be used in expressions as well.
- * = *+4 $EA
- when assigning to the program counter, an optional second argument specifies a fill-byte to be applied to any gaps. Here, we advance the program counter by 4 locations and fill the gap with
- P% = P%+2
- the BBC BASIC-style symbol "
P%" may be used synonymously to the asterisk anywhere the former may occur.
- .ORG $1234
- the more modern-style directive "
.ORG" may be used for setting PC, as well. (However, you can't use ist in an expression.)
- .ORG = $1234
- you may use
.ORGin assignment style, as well.
- .ORG EQU $1234
- generally, "
EQU" may be used as the assignment operator, as well.
(Mind that there must be white-space around "
EQU" in order for it to be recognized as a token, which is not a requirement with "
- .RORG $1234
- synonym to "
.ORG" (in many assemblers you are not allowed alter the origin set by "
.ORG" and this is meant to provide compatibility.)
- BEQ .+2
- in expressions, a dot (
.) may be used synonymously for the asterisk. However, you can not assign to it. (Strictly speaking, this is local context, but, while the assembler doesn't implement macros, it's the same anyway.)
In order to ensure comaptibility, in BBC mode only "
P%" is available.
Labels and Symbols
Instruction labels and defined symbols start with a letter character or underscore and may contain, letters, digits, or the undescore. Only the first 8 characters are significant. (In BBC mode, a symbol may optionally end with a percent sign, "
%", like a BASIC integer variable.)
Instruction labels may precede an instruction or may be the only entity on a line. They may be optionally end in a trailing colon. Labels may be used anywhere in an expression:
- LOOP LDA A,X
- declares the instruction label
- LOOP: LDA A,X
- labels may end in a colon (optional).
- BEQ LOOP
- using a label as an address value.
- .LOOP LDA,X
- In BBC mode, labels are declared with a preceding dot. However, there is no dot, when used as a value (compare the above example).
@" prefix for further compatibility (not applicable in BBC mode):
- @LOOP LDA,X
- Unless in BBC mode, labels may be declared an optional "
- @LOOP: LDA,X
- Same as above, but using a trailing colon.
- BNE @LOOP
- Unless in BBC mode, labels may be referred to using an optional "
Symbols are declared by an assignment and may be used as values anywhere. (In BBC mode, you may not declare a symbol/variable inside an assembler block.)
- TEST = $2000
- declares the symbol “
- TEST EQU $2000
EQU” may be used synonymously.
- C = *+[TEST*2]
- assignments may be complex expressions.
Mind that — like with most assemblers — you may not redefine or reuse any symbols or labels per default. However, you may change this behavior by setting option "
REDEF" (see below). In BBC mode, you may reassign symbols (BASIC variables) outside of an assembler block out of the box.
Note on hexadecimal values and automatic zero-page mode
Any numeric values provided by at least 4 hexadecimal digits, where the two leading digits are zeros, will be considered to be of word-size and will effect absolute address modes, when used in ambiguuos context. This "word-size tainting" also propagates to expressions and assignments. (E.g., defining the symbol "
C" by "
C = 0x0002" and using this in "
LDA C+2" will result in a word-sized, absolute instruction, while the effective value is well inside single-byte range. Defining
C as "
0x02", on the other hand, would have resulted in a zero-page address mode instruction.)
If a label or symbol yet undefined is encountered in a value expression in pass #1, a word-size format will be automatically assumed and addresses will be reserved accordingly. If it is still undefined in pass #2, an error will be thrown. (In assignments to the program counter, however, an expression must resolve in pass #1 already, otherwise the assembly fails.)
Anonymous (Temporary) Labels
The assembler also supports anonymous labels for temporary branch and jump targets:
Just mark an instruction by "
!" or "
:" (empty label) and refer to this mark by either "
!+" (or "
:+") for the next anonymous label as a target or by "
!-" (or "
:-") for the previous one. You may refer to a target further away by repeating "
+" or "
-". E.g., "
BNE !--" branches to the second anonymous label before the insertion point. Mind that this counts anonymous labels and not addresses.
! START LDA #0 ;first anonymous label ;anonymous labels may precede a normal label LDX #0 ! ;just mark this address : STA $1000,X ;third label (same address), we may use ":" as well INX BNE !- ;select the closest previous anonymous label JMP :--- ;jump back 3 anonymous labels (same as START) ;again, ":" and "!" are synonymous
This will assemble to (with anonymous labels listed in a column of their own):
LOC CODE LABEL INSTRUCTION 0800 A9 00 ! START LDA #$00 0802 A2 00 LDX #$00 0804 ! 0804 9D 00 10 ! STA $1000,X 0807 E8 INX 0808 D0 FA BNE $0804 080A 4C 00 08 JMP $0800
This feature is not available in BBC mode and is only supported for branch instructions and absolute jump targets. An anonymous target must be the sole operand and cannot be used in an arithmetic expression.
Note: Anonymous labels are not listed in symbol tables.
Pragmas and Directives
Pragmas and directives start generally with a dot and must be the only entity on a line.
Directives for embedding data:
- .BYTE 1, $02
- embeds a single byte or a list of bytes at the current location. Lists are sperated by white-space and/or colons. (An optional "
#", preceding any values, is ignored.) Values may be complex expressions, as well.
- .DBYTE $12EF
- embeds a double byte (in LLHH memory order, little endian). This inserts the bytes
$EFat the current location. "
.DBYTE" takes a list of values, as well.
- .WORD $12EF
- embeds a word (HHLL order, big endian). This inserts the bytes
$12at the current location. Again, this may be also provided a list of values.
- .BYT $01
- synonym for "
.BYTE" as used by some assemblers.
- .DBYT $12EF
- synonym for "
.DBYTE" as used by some assemblers.
- .TEXT "Abc"
- embeds a text literal (case-sensitive) using the current encoding (see below).
- .ASCII "Abc"
- embeds a text literal (case-sensitive) using ASCII encoding.
- .PETSCII "Abc"
- embeds a text literal (case-sensitive) using Commodore 8-bit encoding.
- .PETSCR "Abc"
- embeds a text literal (case-sensitive) as Commodore 8-bit screen codes.
- .C64SCR "Abc"
- as above (synonym).
Directives for aligning code or filling space:
- .ALIGN $100
- advances the program counter to the next multiple of the value provided (here, we align to the next memory page). Any gaps will be filled by zero. If no argument is provided "
.ALIGN" aligns to the next even memory location.
- .ALIGN $100 $EA
- an optional second byte may spefiy a byte value to be used to fill any gaps (here
NOP", as used by most Commodre 8-bit machines).
- .FILL $20 $EA
- fill the next n bytes using the value provided by the second argument. If no second argument is providing, zero will be used as the fill-byte.
- .REPEAT n
- repeats the instruction or directive following this directive on the same line n times. An optional "
STEP" parameter defines an increment to be applied to the repeat-counter on each iteration (default
1). The repeat-counter is accessibly as "
.REPEAT 26 .BYTE 'A+R%will fill the next 26 memory locations with the letters of the alphabet.
ODD_NUMS ;generate list of odd numberswill fill the next 5 memory locations with the odd number series 1,3,5,7,9.
.REPEAT 5 STEP 2 .BYTE 1+R%
And this will fill the next 6 bytes by the sequence
.REPEAT 3 STEP 2 *=*+2 R% ;PC += 2, fill-byte R%
- ends the source code, any remaining text is ignored. (optional)
- inserts a blank line in the listing (pass #2). This is mostly for compatibility.
- inserts a blank line and a page number in the listing (pass #2). Any comment found at the head of the source code will be used as a title. Again, this is mostly for compatibility.
- any such directive is ignored in order to ensure compatibility with symbol tables used by the companion disassembler.
Special directives for Commodore BASIC:
- Generates a short BASIC program, consisting of optional REM-lines and a line with a "
SYS" command, jumping to the next available address immediately following this BASIC text (which starts at
0x0401, the BASIC start address off the Commodore PET). The program counter will be advanced to this start address automatically.
Without any arguments, just a line with the
SYScommand will be generated, using the current year as the line number:
.PETSTART > 2021 SYS 1038If a first, numeric argument is provided, this will be used as a line number for the line holding the
.PETSTART 10 > 10 SYS 1038If a string argument is provided, the assembler will generate a heading line with line number "
0" and a
REMstatement using this string. If a list of strings (separated by white-space and optionally commas) is provided, or a string contains a line-break ("
REMlines will be generated:
.PETSTART 2001 "*** a program ***", "(c) example.com" > 0 REM *** A PROGRAM *** > 1 REM (c) EXAMPLE.COM > 2001 SYS 1084(Mind that lower case letters will appear as upper-case and upper-case letters as graphics characters in standard PETSCII upper-case/graphics mode.)
- Same as "
.PETSTART" (see above), but arranging for a BASIC start address of
0x0801, suitable for the C64.
In BBC mode, directives do not have a preceding dot and are available only inside assembler blocks. The following directives are available in BBC mode (these are actually additions to the original BBC BASIC and were introduced with Level II, 1982 issue):
- EQUB &01, &02
- inserts a byte value (may be a comma-separated list).
- EQUW &12EF
- inserts a word or a comma-sperated list of words.
- EQUD &11223344
- inserts a series of 4 bytes, starting at the highest significant one. Here, we insert the byte series
0x44, starting at the current memory location.
- EQUS "Abc"
- inserts a text literal (case-sensitive). This must be a single item (not a list), but may be a string concatenation, including the BASIC function "
EQUS CHR$(34)+"foo"+"bar"+CHR$(34)will insert the character sequence '
- in BBC mode "
ALIGN" does not take any arguments and aligns the location counter "
P%" to the next multiple of 4. Any gaps are filled with
0xFF. (This is an even later addition to the standard.)
Options are a special set of directives switching the behavior of the assembler. Like other pragmas, they start with a dot (
- .OPT WORDA
- switches automatic zero-page detection for address modes off. All addresses default to word-size and zero-page address modes must be specified manually by a leading asterisk ("
*") or the byte extension ("
.b"). Use this for fine grain control and/or compatibility with old sources.
- .OPT ZPGA
- switches automatic zero-page detection to on (default).
- .OPT ZPA
- synonym to option "
- .OPT ILLEGALS
- enables support for “illegal” op-codes (see below).
- .OPT LEGALS
- disables support for “illegal” op-codes (default).
- .OPT NOILLEGALS
- synonym to option "
- .OPT REDEF
- allows symbols and labels to be redefined / reused.
- .OPT NOREDEF
- reuse of symbols is not allowed and will throw an error (default).
- .OPT ASCII
- set character encoding for “
.TEXT”-directives and character literals to ASCII (default).
- .OPT PETSCII
- set the default character encoding to PETSCII.
- .OPT PETSCR
- set the default character encoding to Coomodore 8-bit screen characters.
- .OPT C64SCR
- synonym to option "
Further, the following options (mostly used by MOS assemblers) are recognized for compatibility, but are otherwise ignored:
In BBC mode, options evaluating to numeric values (BBC BASIC reprorting levels) are ignored. Other, you may use any of the above options, but whitout the leading dot (e.g., "
This assembler is all about a quick assembly session without worrying too much about the specific syntax (starting with the format of the very first MOS cross-assembler and extending to more modern styles). As long as you do not require macros or conditional assembly, you should be able to throw about any style of source code at it.
E.g., the following examples are semantically identical and produce the same object code:
;MOS/traditional * = $4000 TARGET = $20 LDY *$20 LOOP LDA $0080,Y ROL A STA (TARGET)Y DEY BNE LOOP RTS .END
;modern style .ORG 0x4000 TARGET EQU 0xC0 LDY.b 0x20 LOOP: LDA.w 0x80,Y ROL STA (TARGET),Y DEY BNE LOOP RTS .END
\BBC Micro mode P% = &4000 TARGET = &C0 [ LDY &20 .LOOP LDA &80,Y ROL STA (TARGET),Y DEY:BNE LOOP RTS ] END
Here is an example for a complete assembly of a short source: [ try it ]
;fill a page with bytes, ;preserve program *=$800 start ldx #offset loop txa sta start,x inx bne loop brk ;insert bytes here offset=*-start .end
0800: A2 0A 8A 9D 00 08 E8 D0 0808: F9 00
Press "Show Memory" and activate "live update" in the emulator.
pass 1 LINE LOC LABEL PICT 1 ;fill a page with bytes, 2 ;preserve program 4 0800 * = $800 6 0800 START 7 0800 LDX #OFFSET 8 0802 LOOP TXA 9 0803 STA START,X 10 0806 INX 11 0807 BNE LOOP 12 0809 BRK 14 ;insert bytes here 15 OFFSET = *-START 16 .END symbols LOOP $0802 OFFSET $0A START $0800 pass 2 LOC CODE LABEL INSTRUCTION ;fill a page with bytes, ;preserve program 0800 * = $0800 0800 START 0800 A2 0A LDX #$0A 0802 8A LOOP TXA 0803 9D 00 08 STA $0800,X 0806 E8 INX 0807 D0 F9 BNE $0802 0809 00 BRK ;insert bytes here OFFSET = $0A .END done (code: 0800..0809).
About BBC Micro Compatibility Mode
The BBC Micro computers had a quite powerful and versatile implementation of BASIC (BBC BASIC), which in turn built on the BASIC implementation of the Acorn Atom computer. One of its features was a built-in assembler, allowing users to embed assembler code directly in their BASIC sources. "BBC Micro mode" (enabled by the equally named checkbox) adds basic compatibility for the syntax of this embedded assembler of BBC BASIC.
This is not an attempt at a complete or strict emulation. You may even use some of the features of standard mode, where these do not conflict with the BBC syntax, e.g., "
0x..." style number formats or character literals like "
'A". Moreover, you may use the assembler even without assembly explicit blocks ("
[…]") for a quick session. (However, some features, like assembler directives, are restricted to assembler blocks.)
BASIC line numbers are recognized, but simply ignored and are not evaluated for sequence. Similarly,
FOR-NEXT loops are ignored (the loop variable will be registered, but will always evaluate to zero), as are
REM statements are allowed outside assembler blocks and the colon ("
:") may be used as a statement/instruction separator anywhere in the code. (As opposed to standard mode, there may be multiple statements or instructions on a line.)
There is also support for BBC BASIC indirection operators, BBC BASIC's way of accessing memory, outside of assembler blocks. These are "
?M" (query), "
$M" (dollar), "
!M" (pling) with their variations "
M?N" and "
M$N" for the left-hand side of assignements. These will be mostly used with "
P%" for the purpose of embedding data. "
$P%" is also supported for the right-hand side, including an emulation of the BASIC function "
There is basic support for string concatenations and the string related BASIC functions "
LENGTH()", and "
STRING$()". E.g, you may things like,
L = (LEN($P%+"foo"+"bar"+CHR$(33))+4)*2+ASC("A")
(These string and expressions related features are implemented as preprocessors. While errors will be reported in detail, the listing of a successful assembly will onyl include values as returned by these preprocessors.)
Besides the support of BASIC indirection operators, there's also support for the assembler directives used in BBC BASIC Level II for embeding literal values —
EQUS — inside assembler block, as well as for
ASC"<char>". (So you may embed any data using your preferred method.)
Moreover, you may reassign any symbols (BASIC variables) outside of an assembler block, even without explicitly setting "
Since assembler code was often wrapped in
FOR loops for various passes with logging levels passed in
OPT statements, these are recognized, as well, but ignored. (A variable defined in a
FOR loop will be lexically known, but its value will be zero.) This is also true for a few other BASIC statements often found in the context of an assembler program, like
DIM (as used to allocate some space for data),
ENDPROC keywords, which may wrap the entire assembler source, as well as any
CALL statements for executing the resulting machine code.
However, there are some unsupported features, as well,
- The assembler is still case-insensitive. (Variable "
TEST" is the same as variable "
- The assembler doesn't "know" about BASIC keywords. (Variables may have the names of BASIC keywords. This kind of works around the case issue.)
- There is no support for any BASIC functions, besides the string functions mentioned above.
- Expressions are still evaluated strictly from left to right, without precedence, using square brackets for grouping. (However, you may use normal parentheses in assignments.)
- There is no support for string variables, besides an emulation of the indirection "
Still, there's sufficient support to successfully assemble a source like this:
100 REM THIS IS BASIC 110 P% = &4000 : REM SET LOCATION COUNTER 120 TARGET = &4100 : REM A VARIABLE 130 FOR C=0 TO 2 STEP 2 : REM PASS LOOP (ignored) 140 [OPT C \now in asm mode 150 LDX #&20 \copy 32 bytes 160 .LOOP LDA SOURCE,X 170 STA TARGET,X 180 DEX:BNE LOOP \two in a row! 190 .SOURCE 200 ]NEXT : REM BACK TO BASIC 210 $P%=":-) "+STRING$(28,"*") 220 P%=P%+LEN($P%) : REM UPDATE LOCATION 230 END
Please refer to a manual for the BBC Micro for further details.
Support for "illegal" opcodes (undefined instructions) is enabled by the pragma "
The following mnemonics are implemented (supported synonyms in parentheses):
|opc (synonyms) imp imm abs abX abY zpg zpX zpY inX inY|
|ALR (ASR) | 4B ||
|ANC | 0B ||
|ANC2 | 2B ||
|ANE (XAA) | 8B ||
|ARR | 6B ||
|DCP (DCM) | CF DF DB C7 D7 C3 D3 ||
|ISC (ISB, INS) | EF FF FB E7 F7 E3 F3 ||
|LAS (LAR, LAE) | BB ||
|LAX (ATX) | AB AF BF A7 B7 A3 B3 ||
|LXA (LAX imm) | AB ||
|RLA | 2F 3F 3B 27 37 23 33 ||
|RRA | 6F 7F 7B 67 77 63 73 ||
|SAX (AXS, AAX) | 8F 87 97 83 ||
|SBX | CB ||
|SHA (AXA, AHX) | 9F 93 ||
|SHX | 9E ||
|SHY (SAY, SYA) | 9C ||
|SLO (ASO) | 0F 1F 1B 07 17 03 13 ||
|SRE (LSE) | 4F 5F 5B 47 57 43 53 ||
|TAS (SHS, XAS) | 9B ||
|USBC | EB ||
|NOP | EA 80 0C 1C 04 14 ||
|DOP (SKB) | 80 04 14 ||
|TOP (SKW) | 0C 1C ||
|JAM (HLT, KIL) | 02 ||
NOP: There are several
NOPinstructions, but these are the ones commonly used.
NOP" (single-byte operand/address, 2 bytes total).
NOP" (word-address, 3 bytes total).
JAM: freezes the CPU (again, there are several equivalent instructions).
This application is provided for free and AS IS, therefore without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose. Use at own risk.
This application uses either Web Storage technology or, if this is not available, a cookie to store your choice for the preferred color mode for the virtual 6502 suite of applications. (Either the word "light" or the word "dark" is stored.)
You may be also interested in…
Data Transfer Using BASIC
An 8-bit machine usually comes with BASIC on board. You may use this to transfer the object code to memory using DATA statements. See this little tool to generate such code from the output of the assembler (copy & paste the object code): bytes2basic.html.
- 6502.org — the 6502 microprocessor resource
- visual6502.org — visual transistor-level simulation of the 6502 CPU
- www.65xx.com — official 65xx Website (The Western Design Center Inc.)
© Norbert Landsteiner 2005–2021, mass:werk