Announcing meSpeak.js 2.0

Text to Speech in JS, now even better.

I am happy to announce a long-planned update to meSpeak.js, an open source TTS for the Web in JavaScript. This version brings some major updates (and some minor discontinuities in the API as well). In a nutshell, meSpeak.js is the open source eSpeak program cross-compiled to JS using Emscripten (a minimal POSIX runtime to run LLVM compiler output in JS), running in the browser with some additional API glued on top. meSpeak.js is based on speak.js, which was an early demo application for Emscripten, but differs somewhat in architecture and features (like access to the entirety of eSpeak options, facilities for export and/or buffering of audio data, a built-in audio playback API, modular voice and language descriptions, etc.) and also in compatibility.

And here are the major features of the update:

A Stereo Panning Example

This is an example of reading a text with distributed voices. The dialog, a sample transcript of a session with Joseph Weizenbaum’s famous ELIZA natural language conversation program [1], is read by two distinctive characters: a female, US-English speaking voice representing the user, positioned 80% to the left, and a male character using English Received Pronunciation (RP), who impersonates the virtual doctor, speaking from 80% to the right.

(Mind that there are severe limitations for audio playback on mobile with workarounds resulting in limited performance, and most likely memory constraints, as well. So this example may not work properly on a mobile device.)

Men are all alike.

IN WHAT WAY?

They're always bugging us about something or other.

CAN YOU THINK OF A SPECIFIC EXAMPLE?

Well, my boyfriend made me come here.

YOUR BOYFRIEND MADE YOU COME HERE?

He says I'm depressed much of the time.

I AM SORRY TO HEAR YOU ARE DEPRESSED.

It's true. I am unhappy.

DO YOU THINK COMING HERE WILL HELP YOU NOT TO BE UNHAPPY?

I need some help, that much seems certain.

WHAT WOULD IT MEAN TO YOU IF YOU GOT SOME HELP?

Perhaps I could learn to get along with my mother.

TELL ME MORE ABOUT YOUR FAMILY.

My mother takes care of me.

WHO ELSE IN YOUR FAMILY TAKES CARE OF YOU?

My father.

YOUR FATHER?

You are like my father in some ways.

WHAT RESEMBLANCE DO YOU SEE?

You are not very aggressive but I think you don't want me to notice that.

WHAT MAKES YOU THINK I AM NOT VERY AGGRESSIVE?

You don't argue with me.

WHY DO YOU THINK I DON'T ARGUE WITH YOU?

You are afraid of me.

DOES IT PLEASE YOU TO BELIEVE I AM AFRAID OF YOU?

My father is afraid of everybody.

WHAT ELSE COMES TO YOUR MIND WHEN YOU THINK OF YOUR FATHER?

Bullies.

DOES THAT HAVE ANYTHING TO DO WITH THE FACT THAT YOUR BOYFRIEND MADE YOU COME HERE?

(The capitalized lines are the machine responses.)

[1] Weizenbaum, Joseph: "ELIZA – A Computer Program For the Study of Natural Language Communication Between Man and Machine", in: Communications of the ACM, Volume 9, Issue 1 (January 1966), pp. 36-45.

Note: Curiously, this runs faster on a dated version of Safari (9.1.3) than on the current build of Firefox (69.0.1). Specifically, Firefox may introduce pauses when resuming the worker, which is hidden here by buffering audio.
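
For readers who want to try something similar, here is a minimal sketch of how the two voices could be set up. It is not the code behind the demo. It assumes that meSpeak.speak() takes (text, options, callback) and accepts the options "voice", "variant" and "pan" (with -1 as far left and 1 as far right); the voice ids and variants used here are assumptions as well, and the helper names buildUtterances and speakInTurn are made up for this illustration.

```javascript
// Hypothetical speaker settings: the option names and values are assumptions,
// check them against the meSpeak.js documentation before use.
var SPEAKERS = {
  user:   { voice: 'en/en-us', variant: 'f5', pan: -0.8 }, // 80% to the left
  doctor: { voice: 'en/en-rp', variant: 'm7', pan: 0.8 }   // 80% to the right
};

// Turn transcript lines into utterances. Lines without any lowercase letter
// are taken to be ELIZA's (capitalized) responses.
function buildUtterances(lines, speakers) {
  speakers = speakers || SPEAKERS;
  return lines.map(function (text) {
    var role = /[a-z]/.test(text) ? 'user' : 'doctor';
    return { text: text, role: role, options: Object.assign({}, speakers[role]) };
  });
}

// Speak the utterances one after another. The speak function is passed in
// (e.g. meSpeak.speak), so the sequencing can be tried without any audio.
function speakInTurn(utterances, speak, done) {
  var i = 0;
  (function next() {
    if (i >= utterances.length) {
      if (done) done();
      return;
    }
    var u = utterances[i++];
    speak(u.text, u.options, next); // next() runs when the utterance has ended
  })();
}

// In the browser, once the voices are loaded (assumed usage):
// speakInTurn(buildUtterances(['Men are all alike.', 'IN WHAT WAY?']), meSpeak.speak);
```

Passing the speak function as an argument keeps the turn-taking logic independent of the audio backend, which also makes it easy to swap in buffered playback later.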

Audio AnalyserNode Example

This is a simple demonstration of meSpeak.getAudioAnalyser(), which returns a Web Audio AnalyserNode. Here, we draw an oscilloscope display of the waveform generated by meSpeak.speak().

This may be useful for animating an avatar or icon…
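
As a rough sketch of how such a display could be drawn (again, not the code of the demo itself): the AnalyserNode delivers the waveform as bytes, where 128 represents silence, and these are mapped to canvas coordinates. getByteTimeDomainData() and fftSize are standard Web Audio API; the helper names waveformToPoints and drawFrame, and the canvas setup in the closing comment, are assumptions made for this illustration.

```javascript
// Map waveform bytes (0..255, 128 = silence) to points on a canvas of the
// given size. Louder samples move further away from the vertical center.
function waveformToPoints(samples, width, height) {
  var points = [];
  var step = samples.length > 1 ? width / (samples.length - 1) : 0;
  for (var i = 0; i < samples.length; i++) {
    var deflection = (samples[i] - 128) / 128; // roughly -1 .. 1
    points.push({ x: i * step, y: height / 2 - deflection * (height / 2) });
  }
  return points;
}

// Draw a single oscilloscope frame from an AnalyserNode onto a 2D context.
function drawFrame(analyser, ctx, width, height) {
  var samples = new Uint8Array(analyser.fftSize);
  analyser.getByteTimeDomainData(samples);
  var points = waveformToPoints(samples, width, height);
  ctx.clearRect(0, 0, width, height);
  ctx.beginPath();
  points.forEach(function (p, i) {
    if (i === 0) ctx.moveTo(p.x, p.y);
    else ctx.lineTo(p.x, p.y);
  });
  ctx.stroke();
  return points.length;
}

// In the browser (assumed setup, with a <canvas> element on the page):
// var analyser = meSpeak.getAudioAnalyser();
// var canvas = document.querySelector('canvas');
// var ctx = canvas.getContext('2d');
// (function loop() {
//   drawFrame(analyser, ctx, canvas.width, canvas.height);
//   requestAnimationFrame(loop);
// })();
// meSpeak.speak('Hello world.');
```

For an avatar, the same sample data could drive something simpler than a curve, for instance the opening of a mouth based on the largest deflection in the current frame.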

Backstory (or Why it Did Take so Long)

So, why was this update planned for so long, or in other words, why did it take so long? As mentioned before, meSpeak.js is based on an early incarnation of speak.js (as of 2011), with some changes applied for enhanced compatibility even then. In the meantime, Emscripten evolved quite rapidly and (sadly) eventually stopped compiling a working instance of the speak.js project. So I was essentially stuck with this dated instance, hand-tuned for compatibility as broad as possible. Then, eventually, there was yet another release of speak.js, using a worker, but now Emscripten wouldn’t run on consecutive calls while preserving any loaded files. (This was/is probably due to an orientation towards running video games and other emulations, as in JS-MESS.) So we were stuck with this old instance again.

However, while modern Emscripten compiles to WebAssembly, providing much improved runtime speeds, it also moves the goalposts for compatibility quite aggressively. Generally, it requires the latest browsers to run (and there may even be exceptions to this). On the other hand, running an old *NIX application, which started on Acorn/RISC OS in 1995, doesn’t require the latest in performance. Running it in the background may be good enough, if we can maintain the benefits of full access to eSpeak’s options, playback via the Web Audio API, and modular voice definitions. As a bonus, this will run on anything as “recent” as 2011. If meSpeak.js did run before, it will do so in version 2.0 as well. All it took was the effort of revisiting the script, separating the core from the front-end, and adding a few features I had long wished for. (And even a bit of additional hand-tuning of the dated Emscripten core, namely for overwriting existing files.)

So, here you go: meSpeak.js v.2.0.

Discussion/comments on Hacker News: news.ycombinator.com/item?id=20661193. (Oops, front page.)
