Char8.js – Virtual Unicode 8-bit Character Generator

Char8.js is a virtual character ROM and symbol generator in JavaScript, providing 8 × 8 character matrices for a wide range of Unicode symbols.
Char8.js aims to generate sensible text representations for as many Unicode symbols as possible while keeping to a reasonable file size (69K uncompressed).
Glyphs are generally based on CBM PET characters with some adjustments for standard Unicode representations.
Some of the basic glyphs (5, 7, p, q) were modified to suit personal taste, but special methods exist to access the PET-like forms of these symbols.

Char8.js currently provides:

Full support for
- Basic Latin
- Latin-1 Supplement
- Latin Extended-A
- Latin Extended-B
- Letterlike Symbols
- Currency Symbols
- Greek
- Hiragana
- Katakana
- CJK Symbols and Punctuation
- Small Form Variants (mapped)
- Halfwidth and Fullwidth Forms (mapped)
- Block Elements
- OCR (see "Enclosed Alphanumerics" for MICR/E-13B numerals not implemented in Unicode)

And essential support for
- Cyrillic Characters (common characters only)
- Numerals
- Mathematical Operators and Miscellaneous Technical
- Arrows
- Box Drawing
- Common APL Representations

as well as various other, supplementary symbols.
(Sorry — no emojis or smilies, only classic symbols!)

Version 1.1 adds support for the proposed Graphics for Legacy Computing according to the “Proposal to add characters from legacy computers and teletext to the UCS” (as of 2019-01-04):
- Arrows for Legacy Computing (U+1F8B0, U+1F8B1)
- Graphics for Legacy Computing (U+1FB00...1FBF9)

The proposed legacy characters include elements used by various systems, including
- Sextant characters (Mattel Aquarius, Minitel, Teletext, Prestel)
- Smooth Mosaic Characters (Minitel, Teletext, Prestel)
- Commodore PETSCII (block and fill elements)
- Atari 8-bit and Atari ST
- MSX Systems
- Amstrad CPC
- Apple II MouseText
- Acorn RISC OS
- TRS-80 (Models I-4)

Glyphs are generally based on the PET character ROM, national variants of the VIC 20 character ROM (Russian, Swedish, Norwegian, Greek), APL symbols of the SuperPET (SP9000), and Katakana symbols from the VIC-1001.
This original set of characters was substantially extended and complemented to provide basic support for common Unicode ranges. Also added were some picture-like glyphs from the Sharp MZ80 series.
While the character matrix has the classic format of 8 × 8 bits, the glyphs are generally intended for use with double-height renditions (8 × 16), as seen on the 80-column screens of the Commodore PET 8000 and 9000 series.
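
A double-height rendition can be approximated by drawing every row of the 8 × 8 matrix twice. The following is only a minimal sketch under that assumption; "doubleHeight" is a hypothetical helper and not part of Char8.js, and a renderer could just as well scale vertically while drawing.

```javascript
// Stretch an 8 x 8 glyph (array of 8 row bytes) to 8 x 16 by repeating each row.
// doubleHeight is a hypothetical helper, not part of Char8.js.
function doubleHeight(glyph) {
  var tall = [];
  for (var y = 0; y < glyph.length; y++) {
    tall.push(glyph[y], glyph[y]); // each source row fills two scanlines
  }
  return tall; // a new array; the input is left untouched
}

var sampleGlyph = [28, 34, 85, 65, 93, 34, 28, 0];
var tallGlyph = doubleHeight(sampleGlyph); // 16 rows
```

The result is a new array, so the original glyph data is not altered.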

Click here for a rendering sample of Char8.js-glyph-data.

Char8.getCode(code) resolves mappings for a given code number and returns the effective code, or 0 for the default letter box, -1 for a zero-width space character, or -2 for a non-breaking zero-width space character. See also: Char8.transform(). (You may want to use the replacement character U+FFFD instead of the default letter box returned by "Char8.getGlyph(0)", as in the idiom "Char8.getCode(myCode) || 0xfffd".)
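
The return values of Char8.getCode() can be handled in one small wrapper. This is a sketch only: "resolveCode" is a hypothetical name, and returning null for zero-width characters is an assumption made for this illustration. The "char8" parameter is any object exposing getCode() as described above, normally the global Char8.

```javascript
// resolveCode is a hypothetical wrapper, not part of Char8.js.
// char8: an object providing getCode() as documented (normally the global Char8).
function resolveCode(char8, code) {
  var resolved = char8.getCode(code);
  if (resolved < 0) return null; // -1 or -2: zero-width character, nothing to draw
  return resolved || 0xfffd;     // 0: no glyph, fall back to U+FFFD
}
```

A call such as resolveCode(Char8, 0x41) would then yield either a drawable code or null.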

Char8.getGlyph(code) returns array of bit-map glyph data or undefined. format: 8 × 8 matrix as array of 8 8-bit numbers (rows, top-down), rows are bit-vectors representing 8 pixels each (0x80 = left-most pixel).
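
To illustrate the glyph format, the following sketch decodes a matrix of 8 row values into text, testing each bit from 0x80 (left-most pixel) down to 0x01. "glyphToText" is a hypothetical helper for illustration and not part of Char8.js.

```javascript
// Decode an 8 x 8 glyph (array of 8 row bytes, top-down) into 8 lines of text.
// glyphToText is a hypothetical helper, not part of Char8.js.
function glyphToText(glyph) {
  var lines = [];
  for (var y = 0; y < 8; y++) {
    var line = "";
    for (var x = 0; x < 8; x++) {
      // 0x80 >> x selects the pixel in column x, counted from the left
      line += (glyph[y] & (0x80 >> x)) ? "X" : ".";
    }
    lines.push(line);
  }
  return lines.join("\n");
}
```

With Char8.js loaded, something like glyphToText(Char8.getGlyph(0x41)) could be logged to the console to inspect a glyph.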

Char8.getSymbol(code) like getGlyph, but resolves substitutions (returns undefined for zero-width characters).

Char8.getMulti(code) resolves multi-character substitutions, returns array of character codes or undefined. (Multi-letter substitutes occur for Roman numerals, Hiragana, and Katakana only.)

Char8.isMulti(code) returns a Boolean: true if there is a multi-character substitution for this code, else false.

Char8.getPETCode(code) like getCode, returns mappings for PET glyphs.

Char8.getPETSymbol(code) like getSymbol using PET glyphs.

Char8.getExtASCIICode(code) returns a resolved Unicode code for an extended 8-bit ASCII code (converting codes 0x80..0xff of "code page 437", IBM 1981).

Char8.getExtASCIISymbol(code) returns glyph data for an extended 8-bit ASCII code (converting codes 0x80..0xff of "code page 437", IBM 1981).

Char8.unescapeHTML(string) returns a string with HTML-entities transformed to characters. Supports both numeric (hex or decimal) and named entities. Any entities either not recognized or malformed are returned as-is. Also normalizes any linebreaks (CR, LF, CR/LF, LF/CR) to Unix new-lines (LF). The following named entities are recognized: Tab, NewLine, excl, quot, QUOT, num, dollar, percnt, amp, AMP, apos, lpar, rpar, ast, midast, plus, comma, period, sol, colon, semi, lt, LT, equals, gt, GT, quest, commat, lsqb, lbrack, bsol, rsqb, rbrack, Hat, lowbar, grave, lclub, lbrace, vert, vertbar, rclub, rbrace, nbsp, iexcl, cent, pound, curren, yen, brvbar, sect, uml, Dot, die, copy, COPY, ordf, laquo, not, shy, reg, REG, circledR, macr, deg, plusmn, pm, PlusMinus, sup2, sup3, acute, micro, para, middot, centerdot, CenterDot, cedil, Cedilla, sup1, ordm, raquo, frac14, frac12, half, frac34, iquest, Agrave, Aacute, Acirc, Atilde, Auml, Aring, AElig, Ccedil, Egrave, Eacute, Ecirc, Euml, Igrave, Iacute, Icirc, Iuml, ETH, Ntilde, Ograve, Oacute, Ocirc, Otilde, Ouml, times, Oslash, Ugrave, Uacute, Ucirc, Uuml, Yacute, THORN, szlig, agrave, aacute, acirc, atilde, auml, aring, aelig, ccedil, egrave, eacute, ecirc, euml, igrave, iacute, icirc, iuml, eth, ntilde, ograve, oacute, ocirc, otilde, ouml, div, divide, oslash, ugrave, uacute, ucirc, uuml, yacute, thorn, yuml, OElig, oelig, Scaron, scaron, Yuml, fnof, circ, caron, Hacek, breve, Breve, dot, ring, ogon, tilde, dblac, Alpha, Beta, Gamma, Delta, Epsilon, Zeta, Eta, Theta, Iota, Kappa, Lambda, Mu, Nu, Xi, Omicron, Pi, Rho, Sigma, Tau, Upsilon, Phi, Chi, Psi, Omega, alpha, beta, gamma, delta, epsilon, zeta, eta, theta, iota, kappa, lambda, mu, nu, xi, omicron, pi, rho, sigmaf, sigma, tau, upsilon, phi, chi, psi, omega, thetasym, upsih, piv, ensp, emsp, thinsp, zwnj, zwj, ndash, mdash, lsquo, rsquo, sbquo, ldquo, rdquo, bdquo, dagger, Dagger, bull, hellip, permil, prime, lsaquo, rsaquo, oline, frasl, euro, image, weierp, real, 
numero, copysr, trade, TRADE, alefsym, larr, uarr, uparrow, rarr, darr, downarrow, harr, varr, crarr, lArr, uArr, rArr, dArr, hArr, vArr, forall, part, exist, empty, nabla, isin, isinv, in, notin, ni, prod, sum, minus, lowast, compfn, radic, prop, infin, ang, mid, smid, shortmid, and, wedge, or, vee, cap, cup, int, there4, sim, thksim, thicksim, cong, asymp, ne, equiv, le, leq, ge, geq, sub, sup, nsub, sube, supe, oplus, ominus, otimes, top, bottom, bot, perp, sdot, sstarf, Star, lceil, rceil, lfloor, rfloor, lang, rang, ovbar, solbar, loz, cir, spades, clubs, hearts, diams (Compare http://dev.w3.org/html5/html-author/charref.) V.1.2 now returns U+FFFD (replacement character) for numeric entities, if there is no corresponding glyph definition or if there is no support for the code range by the environment (OS, browser, JS-runtime).
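
The linebreak normalization described for Char8.unescapeHTML() can be approximated with a single regular expression. This is a sketch of the idea only and not the library's own code; "normalizeLinebreaks" is a hypothetical name.

```javascript
// Normalize CR, CR/LF and LF/CR to a single LF (a lone LF is already normalized).
// normalizeLinebreaks is a hypothetical helper, not part of Char8.js.
function normalizeLinebreaks(text) {
  // two-character sequences are listed first, so each pair becomes one LF
  return text.replace(/\r\n|\n\r|\r/g, "\n");
}
```

Listing the two-character alternatives before the lone CR keeps a pair such as CR/LF from turning into two line breaks.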

Char8.transform(string [, usePetGlyphs [, extendedASCII]]) transforms a given string of Unicode characters to an array of resolved char-codes, also resolving any multi-letter substitutions. Returns an array of numbers (empty, if undefined was supplied as the argument). An optional second argument serves as a boolean flag, resolving to codes for special PET-like glyphs. An optional third argument serves as a boolean flag to indicate that the input is encoded in extended 8-bit ASCII ("code page 437"). Any linebreaks (CR, LF, CR/LF, LF/CR) are preserved as normalized Unix new-lines (LF). Tab stops (0x09) are preserved and resolve to a single blank by "Char8.getGlyph()", but may be handled in special ways by an application. V.1.2: Unrecognized codes now resolve to the replacement character code 0xFFFD.

"Char8.transform()" is probably the method you want to use most for processing text. Just transform a given string and fetch the character matrices for individual characters by calling "Char8.getGlyph()" from a loop over the returned list of resolved char-codes.

Usage Examples

1) Basic Usage Example

    // assuming some function "renderChar()" to decode and draw the character matrices
    for (var chars = Char8.transform( "Hello world!" ), i = 0; i < chars.length; i++)
        renderChar( Char8.getGlyph( chars[i] ) );

2) A Complete "Hello World" Program

    // stuff for basic display and rendering
    // (laid out for comprehensibility, obviously there's much left to optimize)

    var charX = 0,
        charY = 0,
        charsPerLine = 80,
        rows = 25;

    // set up a canvas element and an image buffer
    var canvas = document.createElement("canvas");
    canvas.width = charsPerLine * 8;
    canvas.height = rows * 8;
    document.getElementsByTagName("body")[0].appendChild(canvas);
    var ctx = canvas.getContext("2d");
    var imgBuffer = ctx.createImageData(canvas.width, canvas.height);
    var pixelData = imgBuffer.data;

    function setPixel( pixelX, pixelY ) {
        // set a pixel (in the image buffer)
        var p = (pixelY * canvas.width + pixelX) * 4;
        pixelData[p++] = 64;  // r
        pixelData[p++] = 224; // g
        pixelData[p++] = 96;  // b
        pixelData[p  ] = 255; // a
    }

    function updateCanvas() {
        // transfer the image buffer to the HTML5 canvas element
        ctx.putImageData( imgBuffer, 0, 0 );
    }

    function renderChar( charMatrix ) {
        // decode the character matrix
        for (var y = 0; y < 8; y++) {
            for (var x = 0; x < 8; x++) {
                var on = charMatrix[y] & (0x80 >> x);
                if (on) setPixel( charX * 8 + x, charY * 8 + y );
            }
        }
    }

    // now actually transform and render some text
    var charCodes = Char8.transform( "Hello world!" );

    for (var i = 0; i < charCodes.length; i++) {
        var code = charCodes[i];
        if (code === 10) {
            // new line
            charX = 0;
            charY++;
        }
        else {
            renderChar( Char8.getGlyph( code ) );
            // advance the character position
            if (++charX === charsPerLine) {
                charX = 0;
                charY++;
            }
        }
        // if charY equals rows, scroll up (not implemented)
    }

    updateCanvas();

See it in action.
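
Since Char8.transform() preserves line breaks as LF (code 10), an application that lays out text line by line may want to split the returned array first. The following is a minimal sketch operating on a plain array of char-codes; "splitIntoLines" is a hypothetical helper and not part of Char8.js.

```javascript
// Split an array of resolved char-codes into lines at LF (code 10).
// splitIntoLines is a hypothetical helper, not part of Char8.js.
function splitIntoLines(codes) {
  var lines = [[]];
  for (var i = 0; i < codes.length; i++) {
    if (codes[i] === 10) {
      lines.push([]); // LF starts a new, initially empty line
    } else {
      lines[lines.length - 1].push(codes[i]);
    }
  }
  return lines; // always at least one (possibly empty) line
}
```

Each inner array can then be rendered as one row, for instance by calling splitIntoLines(Char8.transform(text)).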

Please mind that the character matrix returned by "Char8.getGlyph()" is a reference to the actual data and not a copy, in order to avoid extensive garbage collection. You should not modify this data in your decoding function.
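
If an effect such as reverse video requires altered glyph data, the safe route is to work on a copy. A minimal sketch, where "invertedCopy" is a hypothetical helper and not part of Char8.js:

```javascript
// Return an inverted (reverse video) copy of a glyph without touching the original.
// invertedCopy is a hypothetical helper, not part of Char8.js.
function invertedCopy(glyph) {
  var copy = glyph.slice(); // shallow copy is sufficient: rows are plain numbers
  for (var y = 0; y < copy.length; y++) {
    copy[y] = ~copy[y] & 0xff; // flip all 8 pixels of the row
  }
  return copy;
}
```

The copy costs one small array per call, so it is best reserved for cases where the data really has to change.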

Methods for expanding Char8.js

Char8.map(list) defines a custom substitution map, returns undefined (void). list: object (keys: char-code to substitute, values: char-code to use). Mappings must be finite, i.e., resolve to generic glyphs.

Sample Usage:

    Char8.map( {0x5e: 0x2191} ); // use up arrow (U+2191) for circumflex/caret (U+005E)

Char8.unmap() clears any custom map, returns undefined (void).

Char8.define(definitionList) defines additional symbols and substitutes. Input must be an object of objects, with major keys "symbols" (for glyphs) and/or "substitutes" (to define substitutions). Returns undefined (void). Symbols are to be provided in an object with keys representing character codes and values of arrays of numbers representing the character matrix. Substitutions are provided in an object with keys representing character codes and values of either type number (representing a simple substitution) or an array of character codes, representing multi-letter substitutes. Substitutes must be finite and never override symbols or synonyms. For overriding any definitions by special mappings, see "Char8.map()".

Example

Define the Unicode character U+263A (WHITE SMILING FACE), define it as a substitute for U+263B (BLACK SMILING FACE), and substitute U+2639 (WHITE FROWNING FACE) by the multi-letter combination ":-(" [U+003A, U+002D, U+0028]:

         1
         2 6 3 1
         8 4 2 6 8 4 2 1

    #0   . . . X X X . .    16 + 8 + 4 = 28
    #1   . . X . . . X .    32 + 2 = 34
    #2   . X . X . X . X    64 + 16 + 4 + 1 = 85
    #3   . X . . . . . X    64 + 1 = 65
    #4   . X . X X X . X    64 + 16 + 8 + 4 + 1 = 93
    #5   . . X . . . X .    32 + 2 = 34
    #6   . . . X X X . .    16 + 8 + 4 = 28
    #7   . . . . . . . .    0

    Char8.define( {
        "symbols": {
            0x263a: [28, 34, 85, 65, 93, 34, 28, 0]
        },
        "substitutes": {
            0x263b: 0x263a,
            0x2639: [0x3a, 0x2d, 0x28]
        }
    } );

Note: Symbols are processed before any substitutes, therefore we may refer to a symbol defined in the same definition list.
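
Adding up the bit values by hand is error-prone, so the row values for a definition can also be computed from a drawn pattern. The following is a small sketch; "rowsFromPattern" is a hypothetical helper and not part of Char8.js, and the "X"/"." notation simply follows the example above.

```javascript
// Convert 8 pattern lines ("X" = pixel set, anything else = clear) to row bytes.
// rowsFromPattern is a hypothetical helper, not part of Char8.js.
function rowsFromPattern(pattern) {
  return pattern.map(function (line) {
    var cells = line.replace(/\s+/g, ""); // drop the spacing between cells
    var value = 0;
    for (var x = 0; x < 8; x++) {
      if (cells.charAt(x) === "X") value |= 0x80 >> x; // left-most cell is 0x80
    }
    return value;
  });
}

var smiley = rowsFromPattern([
  ". . . X X X . .",
  ". . X . . . X .",
  ". X . X . X . X",
  ". X . . . . . X",
  ". X . X X X . X",
  ". . X . . . X .",
  ". . . X X X . .",
  ". . . . . . . ."
]);
// smiley is [28, 34, 85, 65, 93, 34, 28, 0], matching the values used above
```

The resulting array could be passed as a value under the "symbols" key of a definition list.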

mass:werk Char8.js — 8-bit Unicode Character Generator
General purpose 8-bit characters for the 21st century.

Contents

A. About

B. The Font: Generic Glyphs and Supplementary Mappings

C. Usage

Methods for expanding Char8.js

D. Source Link

E. Author
