Are AI detectors accurate? — Debunking The Marketing Hype!

by Zia Sherrell | Dec 9, 2023

Are AI detectors accurate, or are they a sign we’ve lost touch with common sense? Research confirms that AI content detection tools are neither accurate nor reliable. These tools aren’t foolproof, and because they fall short of their intended purpose, clients may incorrectly suspect writers of using AI-generated copy.

Anyone considering the services of a freelance writer knows that content is king. Blogs, white papers, case studies, and social media posts help establish your authority, educate readers, attract new leads, and help your brand thrive. Well-crafted content is the linchpin that connects you with your audience.

However, when ChatGPT exploded into the limelight at the end of 2022, the world of digital marketing and content production entered a new era. AI systems, supported by powerful machine learning algorithms, can churn out content at unprecedented speed and scale, offering a supposedly cost-efficient and convenient way to meet the ever-growing demand for online material.

But here’s the conundrum: can an AI model truly replicate the essence of quality, human-written content? Can it capture the nuance, the empathy, and the artistry that professional writers infuse into their work? Can it create the flowing rhythm of words that entrances audiences? Judging by some of the outputs I’ve seen, AI falls woefully short.

Becoming a maestro writer is a huge undertaking. Years and years of practice and education go into honing the craft. And so, writers aren’t just wordsmiths — they’re storytellers, researchers, and communicators with the unique ability to distill complex topics into clear, comprehensible copy with a distinct voice.

Many discerning clients value the skills that an accomplished writer brings to the table, and they prefer to invest in quality rather than quantity. In an effort to uphold authenticity, some clients consider adding AI content detection tools to their content publishing process. These tools, touted as guardians of quality, are often marketed as the solution to combating the proliferation of inferior, AI-generated content.

On the surface, using AI to detect AI-generated content seems like a fitting response to a growing concern. However, it’s hugely problematic, and these AI content detection tools are neither infallible nor trustworthy — they’re just digital snake oil. As a result, there’s a worrying and growing trend of writers being accused of plagiarism or content duplication based on the flawed judgments of these tools. It’s a trend that raises significant questions about the effectiveness and fairness of these tools in the context of professional content creation.

Continue reading to learn more about AI detection and why it just doesn’t make sense!

What are AI detection tools?

AI-content detection tools are software applications powered by artificial intelligence algorithms designed to analyze and assess content. They aim to determine the originality and authenticity of a given piece of text, helping users identify potential issues such as plagiarism and AI-generated content.

AI detectors draw on extensive content databases of academic papers, articles, and websites. Using a combination of machine learning and natural language processing techniques, they compare the text they’re analyzing against these databases to identify similarities and patterns in word sequences, phrases, or structural elements. They then generate a report or score suggesting the probability that the content is AI- or human-written.
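To make that a little more concrete, here’s a toy sketch of one signal detectors are widely believed to use: “perplexity”, or how predictable a passage looks to a language model. This is illustrative only; it uses the freely available GPT-2 model via the Hugging Face transformers library, and the threshold and labels are invented for demonstration rather than taken from any vendor’s actual scoring.

```python
# Toy illustration of perplexity-based AI-text scoring (not any vendor's real method).
# Assumption: detectors treat highly predictable text (low perplexity) as "AI-like".
import math

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Average per-token perplexity of `text` under GPT-2."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # With labels equal to input_ids, the model returns the mean cross-entropy loss.
        loss = model(enc.input_ids, labels=enc.input_ids).loss
    return math.exp(loss.item())

def crude_ai_verdict(text: str, threshold: float = 40.0) -> str:
    # The threshold of 40 is a made-up value; real tools calibrate on large corpora.
    ppl = perplexity(text)
    label = "AI-like" if ppl < threshold else "human-like"
    return f"perplexity={ppl:.1f} -> {label}"

print(crude_ai_verdict("The results of the study were consistent with previous findings."))
```

Notice the built-in bias: clean, predictable prose earns a lower perplexity, which is precisely why polished human writing so often gets flagged.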


Are AI and plagiarism detection the same?

AI detection tools seem to be built on the premise that AI-generated writing should exhibit the same detectability as plagiarism. But! There’s a critical distinction.

Plagiarism and AI-generated content are different beasts. Plagiarism involves copying or closely imitating another person’s work, ideas, or expressions without proper attribution or permission and passing it off as your own work. It’s typically derived from existing human-authored sources and so is NOT original.

Conversely, AI-generated content is made by computer programs or algorithms based on patterns and data, often without direct human involvement. AI pulls from hundreds or thousands of sources and creates new text based on this information. So even though it’s machine-created, it’s original, or at least as original as any writer’s. After all, there are only so many sources to learn from and words to use appropriately in any given situation.

And it’s these distinctions that make AI detection challenging, to say the least. Plagiarism detection involves spotting precise matches with previously published content — a quantifiable and replicable criterion. I can easily guarantee that my work is plagiarism-free, as I can confirm it with plagiarism detectors. 

In contrast, AI-generated writing is original in its own right, so how can you trace it? It’s a flawed concept.
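You can see the difference in concrete terms. Plagiarism detection can be boiled down to something as simple as counting exact n-gram matches against known sources; the snippet below is a deliberately minimal sketch with made-up text, but it shows why plagiarism is a quantifiable, replicable test in a way that “AI-ness” is not.

```python
# Minimal sketch of why plagiarism detection is quantifiable: it looks for
# exact n-gram overlaps with known sources. The texts here are made-up examples.
def ngrams(text: str, n: int = 5) -> set[tuple[str, ...]]:
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap_ratio(candidate: str, source: str, n: int = 5) -> float:
    """Fraction of the candidate's n-grams that appear verbatim in the source."""
    cand = ngrams(candidate, n)
    return len(cand & ngrams(source, n)) / len(cand) if cand else 0.0

source = "content is king because well crafted content connects you with your audience"
copied = "well crafted content connects you with your audience every single day"
print(f"{overlap_ratio(copied, source):.0%} of 5-grams match the source")
```

There’s no equivalent verbatim artifact to search for in AI-generated text, so no comparably objective test exists.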


Why use AI detection tools?

Thanks to persuasive copywriting and marketing, you may be led to believe that AI content detection tools can support robust content publishing and quality control processes. 

These tools trade on the belief that all AI-generated content is inherently bad, and some even tout the outdated notion that Google will penalize its use, even though Google has confirmed this isn’t the case. Inaccurate as these views are, some people still use AI detectors in an attempt to verify that the content they receive from writers is genuinely human-crafted, even though the most crucial factor is the quality of the content itself.

That said, I’d have no problems with the tools if they worked! I’ve created high-quality work for many years before AI came onto the scene, so it makes no difference if a client prefers human-written content. But this is where the problem lies. 

Tools such as Originality, Copyleaks, GPTZero, and others cannot accurately discern between copy written by a human and copy produced by an AI generator. These tools have significant limitations that hinder their effectiveness, and as a result, genuine human-written content can be falsely flagged as AI-generated.

The irony is that while AI-generated content is often criticized for its inability to replicate the essence of human-written work, AI content detection tools face a similar challenge as they attempt to distinguish between the two. 

What are the problems with AI content detection tools?

The primary concern is unjustly implicating writers and students in using AI-generated content. Even when this isn’t the case, there’s an increasing number of writers being forced to jump through hoops, doubling or trebling their time and effort to create content that complies with the expectations imposed by these tools. For this reason, many freelancers now avoid working with clients who insist on using AI-detection tools instead of valuing the writer’s expertise and skills and assessing their work on its merits.

No matter your views on AI content generation, the crux of the matter lies in the effectiveness of these detectors themselves. There’s a growing realization that they often fall short of their intended purpose.

Don’t take my word for it. There are many experts who agree.

Soheil Feizi is an assistant professor of computer science at the University of Maryland. In a recent article, he stated, “Current detectors of AI aren’t reliable in practical scenarios.” He continued, “There are a lot of shortcomings that limit how effective they are at detecting.”

He explained that the mistakes made by AI detectors can be highly damaging and called for caution when relying solely on AI detectors to authenticate human-created content.


The article quotes Feizi further: “Let’s say you’re given a random sentence. Theoretically, you can never reliably say that this sentence was written by a human or some kind of AI because the distribution between the two types of content is so close to each other.”

Research earlier this year shows these tools typically have an accuracy of less than 30%, with the best tool achieving just 50% accuracy. Interestingly, Turnitin, a widely recognized AI writing detector, said that the results the tool calculates should be taken “with a grain of salt,” and lecturers should make the final interpretation regarding what is and what isn’t produced by AI. But isn’t that the whole idea and purpose of these tools in the first place? Why use them if you, the human, still have to make the final determination?

And if that wasn’t enough evidence of their failings, Originality.ai, another leading content detection tool, stated that these tools “are not 100% accurate, meaning they can return false negatives and false positives.” They continued, “This is why such apps should not be relied upon solely for detecting AI-written content.”

The issue is that language is built on patterns. A particular sentence can be put together in any number of ways, but a computer trained on human writing and an actual human write in very similar ways, making it all but impossible to be sure whether a person or a machine produced a given text.
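Feizi’s point can be written down in one line of probability. If a sentence is roughly as likely under the “human” distribution as under the “AI” distribution, Bayes’ rule forces the detector’s verdict back toward a coin flip. The numbers below are invented purely to illustrate the math.

```python
# Bayes' rule with invented numbers: when P(text|human) is nearly equal to
# P(text|AI), the posterior barely moves from the prior, so the detector
# learns almost nothing from the text itself.
def posterior_human(p_text_given_human: float, p_text_given_ai: float,
                    prior_human: float = 0.5) -> float:
    numerator = p_text_given_human * prior_human
    denominator = numerator + p_text_given_ai * (1 - prior_human)
    return numerator / denominator

# The sentence is only marginally more likely under the "human" distribution.
print(posterior_human(1.05e-9, 1.00e-9))  # ~0.512, barely better than a coin toss
```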

“There is an upper bound on our detectors that fundamentally limits them, so it’s very unlikely that we’ll be able to develop detectors that will reliably identify AI-generated content,” concluded Feizi.

Feizi is far from alone in this stance. In February 2023, Armin Alimardani, Lecturer at the University of Wollongong, and Emma Jane, Associate Professor at the University of New South Wales in Sydney, tested various AI content detectors and concluded that none were reliable and that it will never be possible to make AI text identifiers perfect.

In fact, the biggest player in the AI field agrees!


OpenAI has declared that AI writing detectors don’t work. OpenAI is the artificial intelligence research and development organization behind ChatGPT, so if anyone should understand the accuracy of AI writing detectors, it’s them!

In answer to the question, “Do AI detectors work?” OpenAI stated, “In short, no. While some (including OpenAI) have released tools that purport to detect AI-crafted content, none of these have proven to distinguish between AI-generated and human-written content reliably.”

Another huge neon sign signaling that these tools don’t work is the fact that OpenAI has discontinued its own AI writing detector due to its “low rate of accuracy.” Somewhat amusingly, when the OpenAI team tried to train an AI content detector, it marked Shakespeare and the Declaration of Independence as the work of machines!

Furthermore, formulaic or concise writing is more likely to be incorrectly flagged as AI-fabricated. In other words, organized, clear, and coherent articles are more likely to be flagged as AI-generated than rambling, unclear pieces chockablock with spelling and grammar errors.

These tools don’t definitively say if AI wrote a text. Instead, they assess whether AI could have written the text. This is an important distinction.

Detectors compare the text in question to human-written and AI-generated examples. However, because AI writing models like GPT-3 and GPT-4 are trained on vast amounts of human-written text, the lines between what’s ‘AI-written’ and what’s ‘human-written’ can blur. As a result, well-structured, clear, and coherent writing, just like the writing an experienced freelance writer would produce, can sometimes be mislabeled as AI-generated; the tools are essentially comparing apples to apples.

So are AI detectors accurate? No. The conclusions of AI content detection tools are unreliable, making it a problem without a clear solution.


Are there any AI-content detection experiments?

Yes, there are countless examples! One of the most widely cited comes from Surfer SEO. Surfer is a well-known SEO (Search Engine Optimization) tool that helps people optimize web content to perform better in search results. They ran an experiment with the Originality.ai AI detector using 100 AI-generated and 100 human-written articles, testing them in three different scenarios.

Scenario 1 used a human score of 50%, meaning articles must score above 50 to be marked as human-written. The results showed that 78% of the AI-generated articles had a human score of at least 50%. Conversely, 10% of the human-written samples were incorrectly deemed AI-written content.

Scenario 2 had a human originality score of 80. At this level, more than 20% of the human-written samples were flagged as AI-generated text. That’s over 1 in 5!

The inaccuracy was even higher when the minimum score was set to 90 in scenario 3. Almost 1 in 3 of the human-written articles were marked as AI-generated.

In other words, the stricter the tool was at flagging AI content, the more inaccurate the results. Scary, right? Would you be happy to use a medical or other test with that inaccuracy rate?
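That pattern is exactly what you’d predict from sliding a decision threshold across two heavily overlapping score distributions. The sketch below uses simulated scores, not Surfer’s actual data, to show how a stricter “human score” cutoff mechanically inflates the false-positive rate on genuine human writing.

```python
# Simulated illustration (not Surfer's actual data) of the threshold tradeoff:
# raising the "human score" cutoff flags ever more genuine human writing as AI.
import random

random.seed(0)
# Pretend detector scores: human articles tend to score higher than AI articles,
# but the two distributions overlap heavily, which is the core problem.
human_scores = [random.gauss(85, 15) for _ in range(1000)]
ai_scores = [random.gauss(60, 20) for _ in range(1000)]

for threshold in (50, 80, 90):  # the three Surfer scenarios
    false_positives = sum(s < threshold for s in human_scores) / len(human_scores)
    missed_ai = sum(s >= threshold for s in ai_scores) / len(ai_scores)
    print(f"threshold {threshold}: {false_positives:.0%} of human articles flagged as AI, "
          f"{missed_ai:.0%} of AI articles pass as human")
```

The exact percentages depend on the invented distributions, but the direction is unavoidable: with overlapping scores, any threshold strict enough to catch most AI text must also condemn a sizeable share of human text.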

My AI-detector experiments

As a former research scientist, I love nothing more than a good old-fashioned experiment. So, I decided to investigate some AI-content detectors and see what I could find. 

Let’s start with a brief example using Copyleaks, a leading tool that claims to be the most accurate on the market. Here’s a screenshot of the results for a blog post from October 2022, before ChatGPT hit the mainstream. I certainly hadn’t heard of it at that point.

[Screenshot: Copyleaks results for the October 2022 blog post]

As you can see, the text has a 57.4% probability of being human-written. I then compared it to a blog from August 2023:

[Screenshot: Copyleaks results for the August 2023 blog post]

Here, the probability of being human-written is 51.9%, virtually the same result as for the post that couldn’t have been AI-written.

And what does it mean? How do you interpret a probability of around 50% that a text is AI- or human-written? It’s as random as tossing a coin: a 50% chance of landing on the AI side and a 50% chance of landing on the human side. A result hovering around 50% tells you nothing either way.

I then did some more in-depth investigation. You can watch screen recordings of it here to confirm the experiment’s authenticity, but it’s almost an hour long, so here’s the TLDR:

Stage 1

I wrote an article from scratch with zero AI input. At this stage, I hadn’t used a grammar or spell checker. You’ll see me copy and paste the text into various AI-content detection tools, including the OpenAI detector (which no longer exists!), Contentdetector.ai, Copyleaks, and Content At Scale.

All the detectors agreed that the text was human-written. Interestingly, even though Content At Scale gave the text a 100% human-written probability, it marked a couple of paragraphs as AI. That doesn’t add up: if any part of the text is AI-generated, how can it be 100% human-written? Confusing!

Stage 2

Here, I copied the original document onto a fresh sheet and ran the copy through Grammarly to correct spelling and grammar issues. Most writers use a tool like Grammarly to edit their work before submission and ensure it’s error-free; Grammarly isn’t generative AI. It simply highlights errors such as missing commas, spelling mistakes, and word omissions, helping users enhance the overall quality and effectiveness of their writing.

I then copied and pasted the text into the same AI-detection tools. Copyleaks incorrectly flagged a large proportion of the article as AI-generated content. Looking closely, you can see that it even marked copy that hadn’t been flagged in stage 1, despite that copy being unchanged between the two analyses.

Content At Scale flagged the text with a 50% human probability, meaning it couldn’t tell either way.

This step saw the most significant change. Simply correcting spelling and grammar errors seems enough to convince the tools that the work is AI-written!

Stage 3

This stage covers my initial edit after the Grammarly pass, still with no AI input. Once again, I pasted the copy into the same AI-detection tools, with varying results.

Copyleaks marked a substantial proportion of the middle of the article as AI-generated, and the flagged areas weren’t identical to those in stage 2. Content At Scale also incorrectly marked the copy as likely to be AI-generated, with the probability of it being human-written dropping from 50% to just 33%.

Stage 4

When I write, I leave the article for a day or so, then return to make second edits. 

Here, you watch me make these edits, and, of course, there’s ZERO use of AI. However, Copyleaks continued to mark the copy as potentially AI-generated, although the proportion of attention-grabbing red was smaller than in stage 3. Despite the edits, Content At Scale still marked the copy as 33% likely to be human-written.

Stage 5

Final edits and polish. As before, the text passed as human-written with the OpenAI detector and Contentdetector.ai. However, overenthusiastic Copyleaks marked some text as having a 71% probability of being AI-generated, and Content At Scale maintained a human probability of 33%.

Test 2

In the first experiment, the most significant decrease in the likelihood that the text was human-written happened between stages 1 and 2. I conducted another test to investigate further and establish whether Grammarly was responsible for the change.

In the second test, Copyleaks initially classified the raw text as human-written. Good! 10/10 Copyleaks. Then, I ran the text through Grammarly’s analysis, and once again, Copyleaks did well, and it remained classified as human-written. 

Interestingly, it was only marked as potentially AI-generated after I made my final edits, which primarily involved replacing certain words I had repeatedly used throughout the piece. I made these changes to enhance readability, as text with more varied vocabulary tends to read better. Once again, I didn’t employ AI at any point in my writing process, yet Copyleaks flagged a few sentences as potentially AI-generated.

Test 3

I tested an article I wrote from scratch with zero AI input. It featured two experts and various direct quotes. The analysis used Originality.ai, which claims to be the best on the market!

Many areas were flagged with a 90% probability of being AI-generated. However, several of the highlighted areas were direct quotes from the experts I interviewed and definitely not AI-generated!

The results of my tests were utterly inconsistent, showing that the detectors struggled to distinguish between human and AI contributions. The fact that AI detection tools couldn’t accurately identify authentic human-written content, including 100% genuine quotes, is disconcerting, to say the least.


If AI detectors don’t work, what are the alternatives?

What options are there in light of the limitations and inconsistencies of AI content detectors?

Human assessment

First and foremost, understand that the most critical factor in the AI vs human-written copy debate is *drum-roll* quality. You should assess if an article is well-written, coherent, relevant, engaging, and factually correct. Have you enjoyed reading it? Does it make sense? Would you be happy to feature the article on your website?


Or does it sound robotic, repetitive, formulaic, and lacking in coherence and flow? Is it full of fluff and lacking in ‘meat’? Is the nuanced understanding of human emotions and cultural contexts missing? Is there an absence of the genuine relatability and voice that make quality writing stand out?

I highly recommend giving ChatGPT a try and experiencing its capabilities firsthand. It’s a user-friendly tool that you can explore for free. By experimenting and observing its output, you’ll gain valuable insights and become more adept at recognizing its content. You’ll see that the raw output is far from perfect and lacks the color and value that distinguish professionally crafted content.

While it can generate text on a wide range of topics and mimic human language to some extent, you’ll likely notice that it falls short of capturing the depth of expertise, genuine emotion, and cultural understanding that human writers bring to the table.

You may think that, like many freelance writers, I’d be against such tools, fearing they’ll take my job. But here’s the thing. In skilled and capable hands, ChatGPT is an incredible assistant for research and a complement to human creativity and knowledge. It only poses a threat to low-end, generic content creation, because it’s not the tool itself that determines the quality of content but the person wielding it. Much like a paintbrush in the hands of an artist, ChatGPT relies on the user’s expertise and intention.

So, I recommend relying on your human expertise to judge the quality of an article written for human readers. Provided that the article is free of plagiarism, the key factors remain rooted in human perception. It’s about recognizing the hallmarks of exceptional content that resonate with your audience.

And Google agrees. Google rewards original, top-notch content that demonstrates experience, expertise, authoritativeness, and trustworthiness (E-E-A-T), no matter how it was created. So, if Google isn’t concerned with AI-written content and only cares about quality, who are we to question its value?

Bear in mind, though, that you’ll be penalized for low-quality, mass-produced spammy content that offers no value, whether human or AI-written. So, always prioritize excellence.

Screen capture

But what if you’re still concerned about AI-generated content? In that case, one viable option is to leverage tools that provide transparency in the content generation process. For example, tools like Originality.ai offer a “watch a writer write” feature that records the writer’s screen while they work in Google Docs, allowing you to closely monitor their keystrokes, edits, and deletions in real time.


With this level of visibility, you can easily spot instances where the writer might copy and paste content from an AI source. This added layer of oversight ensures that the content produced remains in line with your quality and authenticity standards. And if that’s something you’d like to add to a blog writing package, it’s certainly something we can discuss.

Unfortunately, because of the fallible nature of AI detection tools, I cannot incorporate them into my writing and editing process. I cannot justify the complex and time-consuming processes required to “beat” detectors that incorrectly mark my meticulously written work as AI-generated.

I value my professional reputation highly and genuinely care about the work I do for my clients. Over the years, I’ve come to realize that I align most with clients who value my expertise, insight, unique voice, and dedication to producing outstanding content. These are the partners who recognize the intrinsic value of my work without the need for the constant validation of unreliable metrics. Therefore, these are the clients I choose to collaborate with, and it’s a decision that allows me to continue delivering impactful, thought-provoking, and insightful health copy, unfettered by the uncertainties of AI content detection.

Are AI detectors accurate? No! So put quality first

If you value scroll-stopping copy that deeply connects with readers and understands the significance of the human touch to relatable and impactful content, then we’re on the same wavelength. So, if you’re seeking a seasoned professional health copywriter who prioritizes quality, authenticity, and the artistry of the written word, you’ve come to the right place. 

Let’s breathe life into copy that stands out, leaves a lasting impression in the hearts and minds of your audience, and ultimately delivers tangible results. Sounds good? Send me a message, and let’s chat.

