How Is AI Changing Our Media--and Our Trust?
You are receiving this post/email because you are a patron/subscriber to Of Two Minds / Charles Hugh Smith. This is Musings Report 2024-38.
CHS NOTE: I understand some readers object to paywalled posts, so please note that my three weekly posts are free and I reserve my weekend Musings Report for subscribers. Hopefully this mix makes sense in light of the fact that writing is my only paid work/job. I am grateful for your readership and blessed by your financial support.
A great many claims are being made about how AI will revolutionize our lives, and the effects are already visible in a number of realms. I've written many essays over the past 18 months addressing a wide range of AI-related topics. Here are a few from the long list:
There's Just One Problem: AI Isn't Intelligent, and That's a Systemic Risk 8/8/24
Will Hollywood and the Music Industry Survive the Super-Abundance of Original AI Content? 7/6/24
Who Error-Corrects AI? 2/28/24
Let's consider the changes visible in media: how media is created, curated, distributed and controlled.
Natural language machine-learning tools such as ChatGPT (large language models, or LLMs) are trained on vast amounts of text to learn statistical relationships between words, which lets them mimic human speech patterns. These tools can summarize topics and concepts, and respond to queries about specific subjects, from programming to history.
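To make "statistical relationships" concrete, here is a minimal toy sketch: a bigram model that predicts each next word purely from word-pair counts in a tiny training text. Real LLMs are neural networks trained over vastly larger contexts and corpora, but the underlying principle, next-word prediction from learned statistics, is the same. The corpus and output here are illustrative only.

```python
# Toy bigram language model: counts which word follows which,
# then generates text by sampling from those counts. It "knows"
# nothing beyond co-occurrence statistics.
import random
from collections import defaultdict, Counter

corpus = "the cat sat on the mat the dog sat on the rug".split()

# Count how often each word follows each other word.
following = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    following[prev][nxt] += 1

def next_word(prev):
    """Sample the next word in proportion to observed frequency."""
    counts = following[prev]
    if not counts:        # dead end: no observed continuation
        return None
    words, weights = zip(*counts.items())
    return random.choices(words, weights=weights)[0]

# Fluent-looking output, no understanding required.
sentence = ["the"]
for _ in range(8):
    nxt = next_word(sentence[-1])
    if nxt is None:
        break
    sentence.append(nxt)
print(" ".join(sentence))
```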
These tools appear authoritative, but this authority is illusory: they "hallucinate" fictitious "facts" and place quotation marks around text as if it were quoted from a source, when it is only text generated by the program. Correspondent Bob W. shared the response he received from OpenAI about ChatGPT's use of quotation marks:
"You're correct in noting that ChatGPT, while capable of generating text that appears as direct quotations from specific works, does not access external databases or the internet to pull direct quotes from texts. Instead, it generates responses based on patterns and information it learned during its training process. This means that while ChatGPT can produce text that resembles quotations and attribute them to specific works or authors, these responses are generated based on its understanding and are not pulled directly from the source materials.
This characteristic of ChatGPT is part of its design as a language model that generates responses based on a vast corpus of pre-existing text data up to its last training cut-off in September 2021. As such, it's important to verify any 'quotations' provided by ChatGPT against the original source material, especially for critical or scholarly work."
Bob added this conclusion: "I would think that ChatGPT programming would include knowing the meaning of quotation marks."
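OpenAI's own advice, to verify any "quotations" against the original source material, can be made mechanical. Here is a minimal sketch of such a check; the file name and the claimed quote are hypothetical placeholders.

```python
# Check whether a "quotation" an LLM attributes to a source
# actually appears verbatim in that source.
import re

def quote_in_source(quote: str, source_text: str) -> bool:
    """True only if the quote appears verbatim, ignoring
    differences in whitespace and capitalization."""
    def normalize(s):
        return re.sub(r"\s+", " ", s).strip().lower()
    return normalize(quote) in normalize(source_text)

source_text = open("original_essay.txt", encoding="utf-8").read()  # hypothetical file
claimed_quote = "a sentence the model attributed to this essay"    # hypothetical quote

if quote_in_source(claimed_quote, source_text):
    print("verified: quote appears verbatim in the source")
else:
    print("NOT in source: treat as generated text, not a quotation")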
That this inherently misleading trait is not readily visible to users is disturbing.
These tools can generate natural-language texts, from articles to entire books, cobbled together from the patterns learned from the program's enormous training data.
I recently experimented with another AI tool from Google, NotebookLM, which generates a podcast-style conversation between two AI-generated hosts discussing whatever text you upload.
I uploaded my essay 2024, A Year of No Significance.
Here is the AI-generated podcast.
The voices are remarkably natural, though they're a little too perfect: no pauses, stumbles, etc.
Their discussion stays on topic, but it includes references and contexts that aren't in the essay. In other words, the topics are interpreted and recontextualized in accordance with the programming.
Just as these programs "learn" by scanning texts composed by humans, we humans "learn" about the limits and implicit design of these programs by experimenting with them.
My second AI-generated podcast was based on my essay The Impossible Dream: 70 Million Boomers Retire in Style.
Here is the AI-generated podcast.
We can discern a few things about the program's design. One is that it generates podcasts of pre-set durations: a short essay yields a podcast of one fixed length, while a longer essay yields a podcast of about 11 minutes 30 seconds. The program fills this time with "fluff" as needed.
The program is also designed to generate a "positive ending," because this is America, and there must always be a solution / positive outcome.
I did mention investing in our own health as the most cost-effective option, but there really isn't any way to sugarcoat the impossibility of funding 70 million retirees if a substantial percentage need caregivers.
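A back-of-envelope calculation shows why no positive spin fits. Only the 70 million figure comes from the essay; the share needing care and the annual cost per person are hypothetical round numbers for illustration.

```python
# Back-of-envelope sketch of the caregiving burden. Only the
# 70 million figure comes from the essay; the other inputs are
# assumed round numbers for illustration.
retirees = 70_000_000
share_needing_care = 0.20        # assumed: 1 in 5 retirees needs a caregiver
annual_cost_per_person = 60_000  # assumed annual caregiving cost in dollars

total = retirees * share_needing_care * annual_cost_per_person
print(f"~${total / 1e12:.2f} trillion per year")  # ~$0.84 trillion per year
```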
The forced positive ending reveals the way in which AI tools can subtly contextualize content to suit a pre-programmed agenda that isn't visible to users.
That this agenda could have political dimensions is obvious.
Now that these AI tools are generating texts, audio and video on a mass scale, we can discern structural problems.
The first is the potential for hallucinated "facts" and false attributions (text placed in quotes that is not an actual quote), and the subtle recontextualization of topics and data to fit pre-programmed norms.
Closely related is what I call "the dragon eats its own tail." (Or in this case, "the dragon eats its own tale.") AI programs scooping up source text, audio and video are now also scooping up AI-generated content that is neither authoritative nor clearly identified as being of dubious origin, i.e. AI-generated.
So dubious, degraded and outright false content is recycled as authoritative, weakening the entire foundation of these tools. There is no easy fix, as I discuss in Who Error-Corrects AI? 2/28/24
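A toy simulation illustrates the feedback loop. All the rates here are hypothetical; the point is only that mixing lower-accuracy synthetic content back into the training data compounds across generations.

```python
# Toy model of "the dragon eats its own tail": each generation of
# training data mixes human-written content with prior AI output.
# All rates below are hypothetical assumptions for illustration.
human_accuracy = 0.95     # assumed reliability of human-written content
degradation = 0.90        # assumed: AI output retains 90% of its corpus's accuracy
synthetic_share = 0.30    # assumed fraction of each new corpus that is AI-generated

corpus_accuracy = human_accuracy
for generation in range(1, 6):
    ai_accuracy = corpus_accuracy * degradation   # models train on the current corpus
    corpus_accuracy = ((1 - synthetic_share) * human_accuracy
                       + synthetic_share * ai_accuracy)  # next corpus mixes both
    print(f"generation {generation}: corpus accuracy ~ {corpus_accuracy:.3f}")
```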
The second issue is the centralization of these tools and of content distribution. To understand how centralization (the concentration of ownership and control) has changed media, we need to return to the days of the early Internet, circa 2000, before social media and the Big Tech monopolies.
In its initial phase, the World Wide Web (a.k.a. the Web or, loosely, the Internet) was a self-organizing public utility. The cost of Internet access was standardized like a utility's: everyone, rich or low-income, paid the same monthly access fee. Though the government (and private governance bodies such as ICANN) provided a basic scaffolding of standards, ownership and control of the content posted on the Web were private: individuals, enterprises, agencies and organizations all paid to host a website (URLs, DNS service, servers or hosting services) and posted their own content.
This level playing field was open to all, and hence self-organizing: sites were linked by their owners / managers to email accounts, bulletin boards and other websites of their own choosing.
In this phase, search was what I call organic, meaning search engines prioritized results solely by relevance. Google's innovation was PageRank, which ranked pages by their incoming links, weighting each link by the rank of the page it came from. Organic search was not profitable, and so Google and other search engines were written off as intrinsically low-margin enterprises.
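For the curious, here is a minimal sketch of the PageRank idea: each page's rank flows to the pages it links to, and the ranks are iterated until they settle. The four-page link graph is a hypothetical example; 0.85 is the classic damping factor from the original paper.

```python
# Minimal PageRank: a page's rank depends on the ranks of the
# pages linking to it, split across each linker's outgoing links.
links = {          # page -> pages it links out to (hypothetical graph)
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
pages = list(links)
rank = {p: 1.0 / len(pages) for p in pages}  # start with equal ranks
damping = 0.85

for _ in range(50):  # power iteration until ranks settle
    new_rank = {}
    for p in pages:
        incoming = sum(rank[q] / len(links[q]) for q in pages if p in links[q])
        new_rank[p] = (1 - damping) / len(pages) + damping * incoming
    rank = new_rank

print({p: round(r, 3) for p in sorted(rank)})  # "c" ends up ranked highest
```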
In the early 2000s, this self-organizing utility model was replaced by a far more centralized and profitable structure. The Internet we now have is dominated by a handful of immensely powerful corporations devoted not to public utility but to the maximization of profit.
Through search and social media, these mega-monopolies control what content is displayed, prioritized, deleted or buried, effectively shaping the entire media landscape by means that are hidden from us (algorithms) and for purposes / agendas that are equally invisible.
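To see how easily hidden weights can bury content, consider a toy feed ranker; every field and weight here is invented for illustration, with a "hidden penalty" term the user never sees.

```python
# Toy sketch of hidden algorithmic curation: a hypothetical feed
# ranker in which an unpublished penalty term quietly demotes
# items. The user sees only the resulting order, never the formula.
posts = [
    {"title": "Post A", "engagement": 0.90, "hidden_penalty": 0.0},
    {"title": "Post B", "engagement": 0.85, "hidden_penalty": 0.6},
    {"title": "Post C", "engagement": 0.40, "hidden_penalty": 0.0},
]

def score(post):
    # Hypothetical scoring: visible engagement minus an invisible penalty.
    return post["engagement"] - post["hidden_penalty"]

for post in sorted(posts, key=score, reverse=True):
    print(post["title"], round(score(post), 2))
```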
It may seem innocuous that automated podcasts, videos and texts are preprogrammed to generate a "positive ending," but we are naive if we reckon that's the limit of the recontextualizing that's occurring beneath the surface.
The third critical issue is the decay of social trust, a topic I explored in Our AI-Powered Post-Truth, Post-Trust Unraveling 10/21/23.