In case you’re in a hurry:
- Bots now account for nearly 50% of internet traffic, distorting data quality and user experiences.
- Data scientists spend up to 80% of their time cleaning data due to the bot problem.
- Bots impact online advertising by generating fake clicks, wasting marketing budgets.
- The internet has shifted from authentic content to automated, low-quality content due to black hat marketing tactics.
- AI platforms like ChatGPT provide cleaner, more direct access to information.
- Social media and search engines are working on algorithms to prioritize authentic content.
- Quantum computing holds promise in detecting and neutralizing bot activity.
- Early bots were simple programs designed for repetitive tasks or providing basic information, like Eliza and WebCrawler.
- Bots evolved to become sophisticated, leading to misinformation, ad fraud, and cyberattacks.
- Social media and programmatic advertising accelerated the proliferation of bots.
- Past efforts to combat bots include CAPTCHAs and other detection mechanisms.
- AI and machine learning have made bots more intelligent, adaptable, and harder to detect.
- Algorithms are being refined to detect authenticity signals through linguistic patterns and user behavior.
- Ethical implications of content removal and transparency in content moderation are crucial in addressing unauthentic content.
Making the Internet More Authentic: A Battle Against Bots and Fluff
Have you ever clicked on a search result only to be bombarded by nonsensical content and annoying pop-ups? Ugh, that’s bots, malicious black hat and grey hat marketing, and “Internet Hustlers” at work, and they’re making the internet an oh-so-frustrating place. Bots and internet hustlers are responsible for nearly half of all internet traffic, distorting our data, stealing original content, and making it hard to find good ol’ authentic content. This article explores the problems bots create, their evolution, historical attempts to combat them, and how we can reclaim the internet for genuine content.
The Psychology of Authenticity
Before we dive into how we can fix the internet, let’s take a brief detour into the realm of psychology. Specifically, let’s talk about authenticity. As a practicing Buddhist, one of the key concepts we focus on is “finding our most authentic selves.” Don’t worry, this isn’t going to morph into a spiritual sermon. I just want to highlight how vital this idea has been throughout human history.
Authenticity is the secret sauce behind the success of the best salespeople, the foundation of strong alliances between nations, and even a significant factor in political leadership. People often choose their leaders based on who they perceive to be the most authentic.
When I had my studio at 368/373 Broadway in New York City, I mentioned my neighbor, Casey Neistat, in my article about running. Inspired by him, I also dabbled in vlogging for a while. Casey’s formula was wildly successful for him, but when others, including myself, tried to imitate his style, it didn’t quite hit the mark. Why? Because we were copying his authentic style instead of being our most authentic selves.
Whether or not we realize we’re being inauthentic, our audience does. Our brains are incredibly adept at sensing authenticity, consciously or unconsciously. Sadly, many people go their entire lives never really discovering their most authentic selves or understanding what authenticity really is. They spend their time imitating others in hopes of gaining the recognition or success their role models have achieved. This is why CIA spies, con artists, and actors invest so much time and effort into mastering the art of appearing authentic. Their trick? It’s almost a method-acting approach: they actually feel and become the person they’re trying to portray. Without genuine feeling, authenticity simply doesn’t exist.
Authenticity is the cornerstone of genuine connections, whether online or offline. It’s a lesson that applies universally, from our personal lives to the digital world we navigate every day. Discovering who we most authentically are takes time, often years of wisdom built on relationships with other authentic people, and once the authenticity is there, the only thing left between us and success is drive. The problem with the internet is that plenty of people have the drive but lack the ability to create authentic, genuine content. Too often that drive takes a hard left into darker methods for the sake of some form of success, and that is how we ended up with today’s problematic internet.
Early Bots and Their Purpose
In the early days of the internet, bots were introduced with excitement and optimism. They were simple programs designed to perform repetitive tasks or provide basic information. An early example is WebCrawler, launched in 1994 to index web pages and make it easier for users to search for information online. These early bots had noble purposes and contributed significantly to the internet’s growth and usability, but oh boy did they open a can of worms later on for black hat SEO marketing.
Evolution of Bots and Their Impact
Over time, bots have evolved from simple programs into sophisticated tools capable of performing pretty complex tasks (though sadly not our taxes yet). Advancements in technology have led to the emergence of social bots, spam bots, and malicious bots designed for nefarious purposes. Social bots, for instance, are used to manipulate public opinion on social media platforms, while spam bots flood email inboxes with unsolicited junk. Anyone who has run a WordPress blog for a while knows the comment-box pain: endless spam backlinks to Viagra websites in China, posted in the hopes of building SEO clout.
Sophisticated bots are now used for malicious purposes, including election interference and financial fraud. For example, during major elections, bots can spread misinformation and create fake accounts to influence public opinion. And before you chastise me, this is happening from multiple sides of the political spectrum. Financial fraud bots can manipulate stock prices or execute unauthorized transactions, causing significant economic damage, not just on a personal level but on a massive scale, costing banks millions or perhaps even billions every year.
Key Turning Points in Bot History
Several key events have accelerated the growth and impact of bots. The rise of social media platforms provided fertile ground for the proliferation of social bots. These bots can create fake accounts, generate fake likes and shares, and manipulate trending topics, distorting public discourse. This also gave rise to the much despised fake social media influencers scamming advertisers out of millions.
The advent of programmatic advertising fueled the development of ad fraud bots. These bots generate fake clicks on ads, wasting marketing budgets and skewing performance metrics. According to recent studies, businesses can lose up to 25% of their advertising budgets to bot activity, highlighting the significant financial cost.
Current Attempts to Combat Bots
Efforts to address the bot problem have been ongoing for years. One of the earliest solutions was the CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart), those annoying, hard-to-read letter puzzles. CAPTCHAs require users to complete simple tasks that are easy for humans but difficult for bots, such as identifying distorted text or selecting images. Boy-oh-boy, those things are getting really tiresome now.
While CAPTCHAs and other bot detection mechanisms have been somewhat effective, they have limitations. Bots have continued to evolve and adapt, finding ways to bypass these security measures. This cat-and-mouse game has made it challenging to stay ahead of the tactics employed by malicious bots.
The Bot Problem: More Than Just a Nuisance
In 2023, bots accounted for nearly half (49.6%) of all internet traffic, with malicious “bad bots” alone making up roughly 32% of all traffic. This surge in automated activity presents a multitude of challenges:
- Data Pollution: Bots generate vast amounts of fake traffic and interactions, making it difficult for data analysts and scientists to extract meaningful insights. This forces them to spend a disproportionate amount of time (up to 80% in some cases) cleaning and filtering data.
- Misinformation & Manipulation: Bad bots actively spread misinformation on social media platforms, manipulating algorithms to amplify misleading or low-quality content. This not only erodes trust in online information but also makes it harder for users to find reliable sources.
- Advertising Fraud: Bots click on ads fraudulently, wasting advertisers’ budgets and distorting campaign performance metrics. This makes it difficult to assess the true effectiveness of advertising efforts.
- Cybersecurity Threats: Malicious bots are used for data mining, exploiting website vulnerabilities, and launching attacks like ransomware. This poses significant risks to businesses and individuals alike, requiring increased investment in robust security measures.
The growing sophistication of bots, combined with their increasing prevalence, means we need to get creative and innovative about how we protect data integrity, advertising efficacy, and overall cybersecurity, because the problem is only going to get worse.
AI Platforms and the Battle for Authenticity
AI platforms like ChatGPT and Gemini offer a refreshing way to “get to the point” without dealing with the unauthentic hustlers who have ravaged our beloved internet with fluff and nonsense. These AI-driven solutions provide concise, relevant information quickly, bypassing the noise generated by bots and low-quality content. This is personally why I ditched Bing and Google for 80% of my content needs these days.
However, social media and the web are still integral parts of our online lives and aren’t going anywhere. The rest of this article addresses two things: how social media and search engine companies are finally getting a clue and trying to fix their algorithms to prioritize more authentic, meaningful content, and whether it’s possible to create an algorithm that can detect authentic content in the first place.
Refining Algorithms for Authenticity
Back in the ol’ days, search algorithms were uniform, providing the same results regardless of the user’s location or personal preferences. This approach was straightforward but limited, as it didn’t account for the diverse needs and interests of users; results were driven largely by keyword tags in a site’s metadata matching a user’s search. Over time, algorithms evolved to be more personalized, leveraging data such as search history, geographical location, and user behavior to deliver more relevant results.
While personalization has improved the user experience, it also raises issues of bias and categorization. Personalization can lead to filter bubbles, and it can inadvertently perpetuate stereotypes based on demographic data. To create truly effective algorithms, we need to move beyond simple categorization and develop systems that serve unique, individualized content.
Social media and search engine companies are beginning to refine their algorithms to prioritize authentic, high-quality content. This involves telling the difference between human and bot interactions and developing new metrics to evaluate content authenticity. These metrics can include linguistic patterns, user behavior, and other signals that indicate genuine human activity.
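To make those authenticity signals a bit more concrete, here is a toy sketch of how a platform might score a single session on a handful of behavioral features. The feature names and thresholds are purely illustrative assumptions on my part, not any platform’s actual detection logic:

```python
# Illustrative only: a toy authenticity score built from behavioral signals.
# Feature names and thresholds are assumptions for the sake of example,
# not any platform's real detection logic.
from dataclasses import dataclass

@dataclass
class Session:
    watch_seconds: float       # how long the user actually watched or read
    scroll_events: int         # organic scrolling is hard for cheap bots to fake
    click_interval_std: float  # humans act at irregular intervals; bots are often uniform
    text_reuse_ratio: float    # 0..1, how much of the user's comment text is copy-pasted

def authenticity_score(s: Session) -> float:
    """Return a rough 0..1 score; higher means the session looks more human."""
    score = 0.0
    if s.watch_seconds > 5:         # didn't bounce within a few seconds
        score += 0.3
    if s.scroll_events > 2:         # some organic scrolling happened
        score += 0.2
    if s.click_interval_std > 0.5:  # irregular timing between actions
        score += 0.3
    if s.text_reuse_ratio < 0.5:    # comments aren't mostly duplicated text
        score += 0.2
    return score

human = Session(watch_seconds=42, scroll_events=7, click_interval_std=1.8, text_reuse_ratio=0.1)
bot = Session(watch_seconds=1.2, scroll_events=0, click_interval_std=0.01, text_reuse_ratio=0.95)
print(authenticity_score(human), authenticity_score(bot))  # prints something like: 1.0 0.0
```

A real system would learn these weights from labeled data rather than hard-coding them, but the intuition is the same: stack up many weak signals of genuinely human behavior.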
Prompt Pairing: A New Approach to Personalization
So here is my approach: Imagine setting up your TikTok account by simply telling the app what you’re interested in. Instead of choosing from preset categories, you could type or speak your preferences:
“I am a data scientist interested in STEM, DIY, and computer history topics from authentic content creators who get to the point quickly, unravel complicated topics, and speak in an engaging style. I do not like videos that come across as clickbait where a simple answer is at the end or multi-part videos. Additionally, I don’t want to see videos with dancing, pranks, or politics.”
This method, similar to the prompt engineering we already use with ChatGPT, allows for a more personalized and meaningful internet experience. It bypasses the need for age, gender, political affiliation, or location data, focusing purely on user interests. Not everyone is a prompt engineer, so perhaps Grandma needs a little assistance. Imagine a voice interface on TikTok guiding her: “Hey there, what type of content would you like to see?” Grandma: “Is this thing talking to me? Hello! I want to see a video on how to fix my damn sink!” TikTok: “Okay, sounds like you want to see home repair videos and perhaps videos on how things work, such as plumbing, electricity, and carpentry. Does that sound right?” Grandma: “Yes, it does! And don’t get smart with me, young lady!”
Over time, there could be check-in periods to tweak the algorithm to better suit individual preferences. This approach can be applied to other platforms like Instagram, YouTube, and search engines as well. For LinkedIn, you might say: “I am a marketing professional interested in innovative digital strategies, case studies from industry leaders, and networking opportunities with peers. I want to avoid generic posts and sales pitches.”
For Spotify, you might specify: “I’m a metalhead. I prefer discovering new artists similar to Guns N Roses and Static X, and I don’t want to hear top 40 pop or electronic dance music because metal is life!”
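To show what prompt pairing might look like under the hood, here is a minimal sketch that maps a free-text preference prompt to interest and exclusion tags with simple keyword matching. A production system would presumably use an LLM or embeddings instead; the keyword table below is purely an assumption for the example:

```python
# Illustrative sketch: turn a free-text preference prompt into interest and
# exclusion tags. A real system would likely use an LLM or embeddings; the
# keyword table below is an assumption made purely for this example.
INTEREST_KEYWORDS = {
    "stem": ["stem", "science", "physics", "data scientist", "computer history"],
    "diy": ["diy", "home repair", "fix my", "plumbing", "carpentry"],
    "metal": ["metalhead", "metal", "guns n roses", "static x"],
}
EXCLUSION_KEYWORDS = {
    "clickbait": ["clickbait", "multi-part"],
    "politics": ["politics"],
    "dancing_pranks": ["dancing", "pranks"],
}

def parse_prompt(prompt: str) -> dict:
    """Map a plain-language preference prompt onto interest/exclusion tags."""
    text = prompt.lower()
    return {
        "interests": [tag for tag, words in INTEREST_KEYWORDS.items()
                      if any(w in text for w in words)],
        "exclusions": [tag for tag, words in EXCLUSION_KEYWORDS.items()
                       if any(w in text for w in words)],
    }

prompt = ("I am a data scientist interested in STEM, DIY, and computer history topics. "
          "I do not like clickbait, dancing, pranks, or politics.")
print(parse_prompt(prompt))
# {'interests': ['stem', 'diy'], 'exclusions': ['clickbait', 'politics', 'dancing_pranks']}
```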
Addressing Existing Unauthentic Content
So now that we have talked about making algorithms for the individual audience members who are being served the content, how do we deal with the unauthentic content that already exists and is being created right now?
Well, we can use engagement rates, as we have for many years now: how long the average user spends watching your video or reading your blog. The problem, again, is that bots can really skew these results, since they are sometimes on a page for less than a few seconds. That means we need more advanced techniques, such as anomaly detection, behavioral analysis, and pattern recognition, to identify common bot behavior. A good start, which I imagine is already common practice, is nullifying the data and tossing out anything that lasts only a few seconds, but this leaves us with another problem:
Imagine you are on TikTok, and an annoying video from a creator with a nasally voice pops up: “Hey, guys! You’ll never believe…” NEXT video, PLEASE!
So, if we ignore all of the data from audience members who spend only a few seconds on a video because we cluster that behavior in with bots, we end up shielding poor content creators who make clickbait videos, since their quick skips no longer count against them. Before we write engagement out of our algorithm conundrum just yet, let’s circle back to the prompts from the audience member.
This is where pairing engagement comes in. Let’s say the TikTok algorithm pairs me with STEM creators. In the machine learning (ML) models that constantly tweak the feed in the audience’s favor, we want to give certain variables more weight than others; in this case, a pairing weight. So, for instance, if TikTok notices that I spend 80% of my time watching STEM videos, it can say, “Hey, Erika is like a STEM movie critic, and her opinion matters more than the regular joe schmo about this stuff,” except instead of movies, it’s videos about STEM topics. TikTok would then weight my engagement with STEM videos more heavily than that of someone who stumbles upon a STEM video but mostly watches cooking or gardening content. My engagement counts more than theirs there, and vice versa if I stumble upon a gardening video.
So if I, along with other heavily weighted STEM audience members who have a good history of engaging with STEM videos, come across a clickbaity STEM video and simply don’t engage with it, or give it a poor engagement score, the video gets buried and more authentic content gets rewarded.
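Here is a rough sketch of that idea in code: rather than discarding every short view as bot noise, we keep quick skips from viewers with an established history in the topic and count them as negative votes. The thresholds and field names are my own illustrative assumptions:

```python
# Illustrative sketch: keep quick skips from trusted, topic-weighted viewers
# as negative votes instead of discarding every short view as bot noise.
# Thresholds and field names are assumptions made for this example.
from typing import Optional

BOT_LIKE_CUTOFF = 3.0   # seconds; very short views are suspect
TRUSTED_VIEWER = 0.5    # viewers above this pairing weight get a say even on skips

def score_view(watch_seconds: float, viewer_topic_weight: float) -> Optional[float]:
    """Return one view's engagement contribution, or None to discard it.

    viewer_topic_weight: 0..1, how established the viewer is in this topic
    (e.g., the share of their watch time spent on it).
    """
    if watch_seconds < BOT_LIKE_CUTOFF:
        if viewer_topic_weight >= TRUSTED_VIEWER:
            return -1.0 * viewer_topic_weight  # trusted skip counts as a down-vote
        return None                            # likely bot or drive-by scroll: ignore
    return min(watch_seconds / 60.0, 1.0) * viewer_topic_weight  # normal engagement

views = [
    (1.5, 0.8),   # STEM regular skipped a clickbaity STEM video
    (1.2, 0.05),  # probable bot or casual scroller -> ignored
    (45.0, 0.9),  # STEM regular watched most of it
]
scores = [s for s in (score_view(w, t) for w, t in views) if s is not None]
print(sum(scores) / len(scores))  # slightly negative: the trusted skip drags the video down
```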
Let’s take a closer look at what this pairing algorithm might look like:
Pairing Refinement Process:
1. Initial Input: Users provide detailed preferences through text or voice input. Example: “I am a data scientist interested in STEM, DIY, and computer history topics…”
2. Pairing Formation:
   - Broad Pairings: Start with general categories based on user input (e.g., STEM, DIY).
   - Sub-Pairings: Refine categories as more user data is collected (e.g., STEM -> Physics, Chemistry).
   - Micro-Pairings: Create specific pairings for niche interests (e.g., Physics -> Astrophysics).
3. Pairing Weight Calculation: Assign weights to pairings based on user engagement and consistency. Higher weights are given to content types that users engage with more frequently.
4. Dynamic Adjustment:
   - Time-Based Decay: Gradually reduce weights over time to reflect changing interests.
   - Engagement-Based Decay: Adjust weights more rapidly if users start engaging with different content.
5. User Feedback Integration: Continuously collect feedback on content to refine pairing accuracy. Negative engagement (e.g., skipping videos quickly) is used to adjust pairings.
6. Clickbait Mitigation: Use pairing weights to evaluate the authenticity of content. High-weight users’ engagement is given more influence in determining content quality.
The Formula for Pairing Weights:
Pairing Weight (PW):
PW = ( Σᵢ₌₁ᴺ Eᵢ × Wᵢ × Cᵢ × Dᵢ ) / N
Where:
- Eᵢ = engagement score for content i (e.g., watch time, likes, comments)
- Wᵢ = weight of the user’s engagement based on their history in that pairing
- Cᵢ = confidence score in the pairing based on consistency and engagement
- Dᵢ = decay factor (reflecting changes in user preferences over time)
- N = number of engagements considered
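Here is a minimal Python sketch of that formula with made-up numbers, just to show how the pieces combine. The engagement records and their values are hypothetical:

```python
# Minimal sketch of the pairing-weight formula above, using made-up numbers.
# Each engagement contributes E_i * W_i * C_i * D_i, averaged over N engagements.
def pairing_weight(engagements: list) -> float:
    """engagements: list of {'E': ..., 'W': ..., 'C': ..., 'D': ...} records."""
    if not engagements:
        return 0.0
    total = sum(e["E"] * e["W"] * e["C"] * e["D"] for e in engagements)
    return total / len(engagements)

# Three hypothetical STEM-video engagements for one user:
stem_engagements = [
    {"E": 0.9, "W": 0.8, "C": 0.85, "D": 1.0},  # recent, fully watched, liked
    {"E": 0.7, "W": 0.8, "C": 0.85, "D": 0.9},  # slightly older engagement
    {"E": 0.4, "W": 0.8, "C": 0.85, "D": 0.6},  # older still, partial watch
]
print(round(pairing_weight(stem_engagements), 3))  # 0.401
```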
Example Scenario: Erika, the STEM Enthusiast
- Input: “I am a data scientist interested in STEM, DIY, and computer history topics.”
- Pairing Formation: The algorithm identifies strong initial pairings for STEM and DIY.
- Engagement Tracking: Erika’s engagement with specific STEM content (e.g., physics, computer history) is tracked.
- Pairing Weight Adjustment: If Erika engages heavily with physics videos, the weight for that pairing increases.
- Clickbait Mitigation: Erika’s high-weight engagement helps filter out clickbaity STEM videos.
- Evolving Interests: If Erika starts engaging more with gardening content, the algorithm gradually adjusts pairing weights to reflect this new interest.
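Here is a small sketch of how the time-based and engagement-based decay described above might work in practice. The half-life, learning rate, and starting weights are illustrative assumptions:

```python
# Illustrative sketch of time-based and engagement-based decay on pairing weights.
# The half-life, learning rate, and update rule are assumptions, not real platform logic.
def decayed_weight(weight: float, days_inactive: float, half_life_days: float = 30.0) -> float:
    """Time-based decay: halve a pairing's weight for every half-life of inactivity."""
    return weight * 0.5 ** (days_inactive / half_life_days)

def update_weight(weight: float, new_engagement: float, learning_rate: float = 0.2) -> float:
    """Engagement-based adjustment: nudge the weight toward the latest engagement score."""
    return (1 - learning_rate) * weight + learning_rate * new_engagement

# Erika ignores STEM for two months while starting to watch gardening videos:
stem = decayed_weight(0.8, days_inactive=60)        # 0.8 -> 0.2
gardening = update_weight(0.1, new_engagement=0.9)  # 0.1 -> 0.26, and climbing
print(round(stem, 2), round(gardening, 2))
```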
Challenges and Solutions:
- Cold Start Problem: Start with broad pairings and quickly adapt based on user engagement.
- Pairing Ambiguity: Allow for multiple pairings and dynamically adjust based on overlapping interests.
- Manipulation: Monitor for suspicious behavior, use human moderation, and penalize manipulative tactics.
Of course, this is not a perfect system; we, as a species, are not perfect. And one problem we have not talked about in this article is the years of conditioning many of us have undergone on an internet filled mostly with poor content made simply for the sake of generating ads.
Quantum Computing: A Glimpse of Hope
While bots are here to stay, quantum computing offers hope. With companies like IBM and Google pioneering quantum computing, we may eventually have the tools to detect and neutralize bot activity more effectively. Quantum computers could, in principle, work through certain classes of problems far faster than classical machines, spotting patterns and anomalies that classical approaches might miss. However, given the current state of the technology and the demand for quantum computing in other fields, it might be a while before it can meaningfully tackle the bot issue.
The Rise of AI and Machine Learning in Bots
The integration of AI and machine learning has revolutionized bots, making them more intelligent, adaptable, and difficult to detect. AI-powered bots can automate tasks and provide personalized experiences, but they also pose significant risks. These bots can be misused for manipulation and deceit, further complicating the fight for authenticity online.
The Road Ahead
Making the internet more authentic requires a multi-faceted approach. From refining algorithms to leveraging AI and quantum computing, there’s a long road ahead. But with continued effort and innovation, we can reclaim the internet as a space for genuine, high-quality information.
While the internet has become a stranger to authenticity, there are promising developments on the horizon. By focusing on improving algorithms, leveraging AI, and preparing for the potential of quantum computing, we can hope to restore the internet’s original promise of genuine, valuable content.