Entropy Zero: AI is Ruining the Internet
September 25, 2023
The “Dead Internet Hypothesis” is the idea that the majority of the content on the internet is AI-generated, and that actual human voices make up only a small minority of what is out there.
This is not just a wild conspiracy theory. Automated website development is actually fairly commonplace, and fairly easy. Every page of every website (including this one) is written in HTML, which defines the form of a web page. HTML can be automatically generated by a computer program, and most often is nowadays. Automatically generating simple blog content is just a matter of defining a “template” of HTML content and filling in the template with particular instances of data. For example, let’s say you wanted to make a blog about installing heat pumps, but you wanted to automate the process so you could write a similar article about every city in the USA, with only the facts relevant to each city changing. Rather than write hundreds of articles, you could write a single template that looked like the article you wanted to write for every city, but had spaces for the relevant info you wanted to fill in later on. It might look something like this:
Many families in {{ city_name }} have been able to reduce their heating costs by {{ avg_heating_cost_reduction_percent }} just by installing a heat pump. The average cost to install a heat pump is {{ avg_installation_cost }}...
Here, each of the sets of braces {{ }} allows you to insert some external data directly into your HTML using a computer program. This external data could be scraped from different sources on the internet, or purchased in large datasets, or otherwise obtained. Either way, this method allows you to easily generate thousands of pages of content very quickly.
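To make the mechanics concrete, here is a minimal sketch of that fill-in-the-blanks step in Python. It is only an illustration: the city rows are made-up placeholder values, the `render` helper is a hypothetical name, and a simple regular expression stands in for whatever real templating engine such a site would actually use.

```python
import re

# The template from the example above, with {{ name }} placeholders.
TEMPLATE = (
    "Many families in {{ city_name }} have been able to reduce their heating "
    "costs by {{ avg_heating_cost_reduction_percent }} just by installing a "
    "heat pump. The average cost to install a heat pump is "
    "{{ avg_installation_cost }}..."
)

# Made-up placeholder rows (not real data). A real pipeline would load
# scraped or purchased data here instead.
CITY_ROWS = [
    {
        "city_name": "Springfield",
        "avg_heating_cost_reduction_percent": "30%",
        "avg_installation_cost": "$5,000",
    },
    {
        "city_name": "Riverton",
        "avg_heating_cost_reduction_percent": "25%",
        "avg_installation_cost": "$6,200",
    },
]

# Matches {{ some_name }} and captures the name between the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template, row):
    """Replace each {{ name }} placeholder with the value of row[name]."""
    return PLACEHOLDER.sub(lambda match: str(row[match.group(1)]), template)


# One generated "article" per row of data.
pages = [render(TEMPLATE, row) for row in CITY_ROWS]

if __name__ == "__main__":
    for page in pages:
        print(page)
        print()
```

Nothing in the loop cares whether the numbers are true. Swap two rows for twenty thousand and the same few lines produce twenty thousand pages.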
The point of auto-generating content in this manner would be to make a sufficient volume of content to draw traffic for conversions, impressions, clicks, or ad revenue. The data filled in on these sites doesn’t need to be accurate; it just needs to grab enough attention from a search engine user to generate an impression. Volume is the game, not factual accuracy; the goal is to design content that acts like a big enough net to catch a lot of queries from a search engine. For that purpose, articles that list a bunch of general facts are better than ones that expand upon a single point in detail and depth. And since search engines are location-based too, a single website that focuses on only one region is only going to get a small sliver of the possible engagement that an array of hundreds of websites could get. So part of the auto-generation strategy is not only to generate tons of content, but also tons of websites to host it all on, each tailored for a different region or audience.
And yeah, I was actually hired to do this once. No, I didn’t take the gig, though I’m pretty sure the guy had tens to hundreds of websites that were all built this way, and he turned enough of a profit to hire people on Upwork to do the web scraping for him.
He’s not unique, either. Using auto-generated or AI-generated content has become standard practice in search engine optimization (SEO), internet marketing, and content design. It’s everywhere, and that’s the point: the whole goal is for it to be everywhere. The concept is simple. Use a computer program to generate a bunch of content, make sure it gets indexed by search engines, and use it to siphon clicks for ad revenue from hapless search engine users. This is the reason most content that is returned from a search engine feels repetitive, bland, robotic, and uninformative. It’s not just you. Most of the results from a search query are just robotic regurgitations of the same basic facts related to the query, or syndications of the exact same content from other blogs and websites. If the Dead Internet Hypothesis is right, this is because the search engines are simply returning all the auto-generated crap they have indexed. The signal is being lost in all the noise.
Another technology which is being used increasingly in this way is AI, specifically large language models like ChatGPT. These models make it even faster to generate content. So-called “prompt engineers” can write AI prompts like “write me a blog post about hiking in Colorado in the style of Ayn Rand”, and the model will do it fairly convincingly. As this kind of generative AI proliferates, AI-generated content will become more sophisticated, easier to deploy, and cheaper to make. Chances are high that you have already watched AI-generated videos on YouTube, read more than a handful of AI-generated articles on websites, and interacted with AI content and AI bots on social media. As large language models become more mainstream through sites like ChatGPT, more and more AI-generated content will fill the internet at a much faster rate than human beings could generate it organically. Practically every programmer I know uses ChatGPT to write code snippets or debug and explain programs (though I refuse to). I’ve tutored students who instinctively whip out ChatGPT to do their homework for them. I’ve visited with school teachers who brag about how they use it to write curriculum and lesson plans. I’ve even heard that some pastors are using it to write sermons. AI-generated content will soon be everywhere on the internet, and everywhere in your life, if you aren’t careful.
Paradoxically, AI proliferation will have an anti-informative effect on these realms. As more AI-generated content fills search engine results and websites and blogs and social media posts and marketing emails, the internet will cease to be a realm of information. That’s a mathematical statement, not an opinion.
This is because, contrary to the name, “generative” AI cannot actually generate any new information; it can only generate content similar to what it has been trained on. Everything that ChatGPT tells you is just an algorithmic regurgitation of the data it already has. The more these algorithmic regurgitations fill the internet, the more everything on the internet begins to sound the same, look the same, and say the same things. By contrast, what makes information “informative” is the amount of unexpectedness, surprise, and insight that it carries. When everything on the internet is just a re-consumption and re-vomiting of everything else on the internet, nothing will be unexpected, surprising, or insightful. Information theorists call this a zero-entropy process: a process from which no information can be gained, because the outcome is always the same.
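For readers who want to see what “zero entropy” means numerically, here is a small sketch using the standard Shannon entropy formula, H = sum of p * log2(1/p) over the possible outcomes. The function name and the toy sequences are illustrative assumptions; this is a demonstration of the definition, not a measurement of any real internet content.

```python
import math
from collections import Counter


def shannon_entropy(outcomes):
    """Shannon entropy, in bits, of the empirical distribution of outcomes."""
    counts = Counter(outcomes)
    total = len(outcomes)
    # H = sum over distinct outcomes of p * log2(1 / p)
    return sum(
        (count / total) * math.log2(total / count)
        for count in counts.values()
    )


# Toy sequences (illustrative only).
always_the_same = ["same post"] * 8        # every outcome identical
fair_coin = ["heads", "tails"] * 4         # two equally likely outcomes
four_voices = ["a", "b", "c", "d"] * 2     # four equally likely outcomes

if __name__ == "__main__":
    print(shannon_entropy(always_the_same))  # 0.0
    print(shannon_entropy(fair_coin))        # 1.0
    print(shannon_entropy(four_voices))      # 2.0
```

A source that always says the same thing has probability 1 on a single outcome, so its entropy is exactly zero bits: seeing one more sample tells you nothing you did not already know. The more evenly spread the possible outcomes are, the more bits each new sample carries.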
AI is not an illuminating force. At best it is a tool of convenience, giving you a crappier version of the information that is already freely available on the internet. At worst, it is literally a destroyer of knowledge, an entropy decay engine whose economic incentives necessarily end in the dumbing down of the internet, and possibly everything else.