Fear and Loathing in the Face of the Missing 4th Commodore BASIC Variable
Some early results of our hair-raising experiment.
The first results of our little experiment are coming in… As a reminder, we had deviced and posted two articles, one featuring human research on the topic of a lesser known fact about Commodore BASIC, but also featuring as a story device some lively language, and a second article, featuring AI generated content on the same topic, which resulted in some hallucinated pseudo-facts and even ethical questionable content (as in suggesting — and by this normalizing — a Ken Thompson Hack-like business practice). The general idea was to see how these would perform comparatively in the age of automatically curated content. So, how did these perform?
A Generated Message from the Realms of SEO
Here‘s our first result coming in. Now, I do get mail in order to place a generated SEO article or requests to place a link every now and then, but these seldom address a specific article, or a specific passage of text of an article. As it happens, I just received the following mail, which I may share here, as I feel ok about disclosing AI-generated content (there‘s noting personal in these messages and nothing to protect):
I was looking for online materials about programming languages when I stumbled upon your great piece: https://www.masswerk.at/nowgobang/2023/the-case-of-the-4th-bonus.
Thank you for putting up such fantastic work and I agreed that the Commodore BASIC Boolean variable type is an effective tool that allows programmers to construct intricate logical processes.
I wanted to share with you our latest blog post, [title and link clipped]. This post is a comprehensive guide to the most popular and widely used programming languages in the industry today.
I think it would fit well under the anchor text "modern programming languages" so your readers can get an additional reference.
It could add more value to your article and provide a better user experience for readers by directing them to relevant content.
Thank you for your time!
Quite obviously this LLM fell for a production by its silbling, and we have to admire this for how totally it took the bait. First, it isolated the section that apparently hit a sweet spot in temperature parameters (because this was generated according to these parameters in the first place) and discarded the surrounding criticism and framing as a superficial makeover. Then, in what can be only described as an ingenious act of reverse engineering, it drilled down to the very core fragment of false information, around which the entire story was generated, namely the hallucination about the Boolean variable type in MS BASIC. Somehow, I’m supposed to agree with this kind of “relevant content”. Well….
By the way, I‘m amazed to learn that 8-bit MS BASIC is still a core asset and widely used building block in the industry today. This is some valuable information, thanks for sharing. (I guess, there‘s nothing you can do about token context and awareness headers.)
However, as a minor nitpick, there‘s still a certain level of inconsistency to be observed. Because, last time I asked the Clip Text™ Machine if it knew the website “masswerk.at”, it came up with the following answer:
To you, I’m Thomas, Thomas Steiner, that is! (Also, I’m not so sure about WebGL and 3D Tetris, and so on, but I may interpret this as an suggestion…)
And, as a reminder, the AI had two articles on the same subject to choose from, both linked back and forth: one thorougly researched, with detailed descriptions, drill-down examples and verified information, but also including a bit of lively language context, and, on the other hand, this one, including a piece of hallucinated, even dangerous misinformation at just the right temperature parameters, for which it totally fell, ignoring all the warnings that went with it.
Before we address these, we have to mention that search results are tainted regarding their order, as the first article was featured externally. So this is by no means a fair comparison. Still, there are some observations to be made. Having said that, let‘s risk a closer look at what shows up, if we search for that part of the title, which is common to both articles, as in “The Case of the Missing 4th Commodore BASIC Variable”, shall we?
Google managed to identify a connection between the two posts and presents them in a hierarchical structure, which is some of a surprise:
However, at second glance, not all is well. Not at all.
The second article is missing its discriminating part in the title, thus, it‘s totally unclear, what this is. The presentation title is neither from the
<title> tag, nor from any of the meta tags, nor from any link text. Instead, it‘s a poorly scraped version of the heading, clipped at the first punctuation, which happens to be the significant part in the context, in which it is provided. Also, there‘s some peculiar white space in “4 th”, not to be found anywhere in the original document. It‘s actually a remainder of an inline
<sup> element — how does the removal of an inline element result in additional white space? (Even more than a quarter century ago, HTML-aware text editors managed tag removal with comparable bravura.)
Even worse is what‘s given in place of the description (a suitable description is provided both in simple meta tags and Open Graph meta information): This is the beginning of the second AI-generated story! Notably, this is provided in a
blockquote, indicating that this is not a genuine part of that article, nor does it necessarily represent the opinion of the author. Not only are the simplest of semantics ignored, Google actually promotes non-factual, AI-generated content! As we‘ve seen with our SEO-friends above already, Google apparently fell for the passage that hit the tonal temperature paramaters in the most desireable way, as these were the very same constraints this passage had been generated to.
(Here, dear reader, you may argue that Google is just selecting the part of the text closest to the search term, but there are other opportunities for this, earlier in the text and closer to the search term. This is a preference.)
The first article is actually presented by its title, which is some of a positive experience. However, what follows, is just a bare scraping of the beginning of the page content. Again, any meta information is ignored. Instead, the title (as provided in the
title element) is repeated as the title as provided in the headline, followed by the subtitle, and, believe it or not, the ALT-text from the title illustration. This is subpar to the standards of a 1996 Harvest database! (Is this due to Google failing to identify any significant passage to quote or to summarize?)
Honorable mention: Once again, Google (or is it “gOoGle”?) ignores the spelling of the name of the website, which is also the name of my business. It‘s spelled all-lower-case, as provided on about every page of this website, which is also the authoritative source of information for this. But Google knows better, resistance is futile. (Additional mention: Google manages to process the site preview image in a way so that the perfectly balanced logo now appears off-center. I really should have known better, 24 years ago…)
Summary: Google is out of control, as it ignores any meta information, basic semantics, and even ceased to process HTML correctly. Instead, it serves what it deems best, as it knows best, even if this means promoting missinformation. There is nothing, an autor, editor or publisher may do about this.
Well, Bing… Bing has the two posts seperate, in the expected ranking order. But it also utterly fails to discriminate them by title, as the significant, discriminating part is eclipsed from both. So we’re directed to the content description/summary for further information: For the first article, it identifies a rather insignificant table as the core content and present this proudly, happily ignoring anything, the article is about.
For the second article, it‘s — once again — a quote from the second blockquote. What has come of semantics in markup documents? Once again, additional white space is inserted for discarded inline elements. How? What has come of HTML? But, to make a difference, we have a few words emphasized in bold, which manage to precisely highlight and advertise the first of the AI-generated missinformations provided in this quote. (Mind that “disappeared” is not part of the search term.) Good job!
But this is not all: low and behold, there’s also an associated image. In alignment to what we‘ve observed already, it’s neither the specified preview, nor the first or the most prominent image, but an image provided in another blockquote, marked in its ALT-text as AI-generated, and it‘s a total missrepresentation of its subject. Once again, Bing prefers to promote fake, hallucinated content.
Summary: It‘s sad.
Duck Duck Go
Duck Duck Go has the two posts actually separated by 5 intermediary positions, so these are actually ranks #1 and #7. This shows some intependence from Bing’s search data, which seems to be now the only source of information after the break-up with Yandex. At least, this is indicated by the same presentation of page titles as we‘ve seen this already with Bing. In contrast to the latter, the approach to the summary/description is more conservative, merely presenting the beginnings of the article (in a more meaningful manner than we’ve seen it with Google.) Even the highlighting is somewhat meaningful. However, any meta information provided is ignored, yet again.
Once again, we see this peculiar, additional white space, where there had been inline elements. What library is this? Nothing in the HTML standard suggests that any white space should or may be inserted adjacent to an inline element, to the contrary!
Summary: The conservatism of just quoting the source is refreshing, but still, what about markup semantics and meta information? Also, shouldn‘t there be some machanism to prevent two documents with different titeles, which are quite clearly not the same, showing up with the same presentation title? However, in the face of the utter failings and missrepresentations of the Big Two, this is merely nitpicking and aesthetical critique, as our hassles with them are far beyond mere questions of usabilty. (However, I‘m afraid, this conservatism, exposed here, might be merely due to a lack of resources…)
Well, this was quite devastating: Meaningless communications and what seems to be an irreparable demise of web search, just a few months into the AI hype, which already stripped vital semantics from the Web. By which lack of discrimination these search engines are also opening themselves to promoting all kinds of questionable content, even favoring hallucinated, foreign content over genuine content provided in the same document. Notably, these findings are anecdotal and by no means representative. However, the general mode of presentation is representative. And it‘s devastating.
And with this, despair or not, that‘s it, hopefully until something completely different in the next installation.