That industry thrives on search engine optimization and is hypersensitive to intellectual property issues such as plagiarism, particularly since Google LLC has made it clear that it frowns on publishers of stolen and auto-generated content. Google isn’t saying whether it has cracked the code of how to detect machine-generated text reliably, but a Canadian content marketer thinks he has come pretty close, even if he isn’t exactly sure how the solution he’s selling actually works.
Originality.ai launched early this month claiming that it can detect content generated by popular natural language processing engines like Generative Pre-trained Transformer-3, GPT-J and GPT-Neo with accuracy rates of well above 90%.
Rescuing the term paper
That could be a big deal for people in the content marketing and academic fields. The prospect that artificial intelligence could soon produce long-form content that rivals the quality of human writing even prompted The Atlantic to question last week whether college essays are dead.
“I think there’s a monster wave of AI-generated content coming to universities and they are going to struggle to handle it because it’s not like plagiarism,” said Originality.ai founder Jonathan Gilham.
Content marketers use an assortment of plagiarism checkers to protect themselves from publishing stolen intellectual property. They don’t fear prosecution so much as a knuckle rap from Google that could send their SEO scores plummeting.
Gilham knows this firsthand. He spent seven years starting and running content marketing and advertising businesses where he became frustrated with the quality of plagiarism checkers on the market.
“They weren’t built for teams that were putting out thousands of pieces of content, so we set out to build a plagiarism checker that had team and enterprise-level functionality,” he said. “And we thought it could also detect AI.”
Gilham and co-founder Conor Watt farmed the project out to developers who created a machine learning algorithm trained on the most popular natural language processing models. Unlike Hugging Face Inc.’s AI detector, which uses probabilistic techniques to guess the words an AI content generator is most likely to use, Gilham said, “our AI is much heavier on the compute side and looks at the article holistically, not using a linear function.” He noted that a recent test on a small corpus of articles generated by ChatGPT yielded accuracy rates of over 98%.
So how does it work? Gilham admits he doesn’t exactly know. What he does know is that in a test across 10,000 samples of GPT-generated content, the algorithm detected the stuff written by a machine better than 94% of the time.
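For readers who want to see what a figure like that measures, here is a minimal sketch of the arithmetic. It is not Originality.ai's test harness, which has not been published; the function name and the sample data are hypothetical. The only inputs taken from the article are the sample count of 10,000 and the 94% threshold, and the samples in question are all machine-written, so the number describes how often AI text was caught rather than how often human text was wrongly flagged.

```python
def detection_rate(flags: list[bool]) -> float:
    """Share of machine-written samples that a detector flagged as AI.

    `flags` holds one boolean per sample: True if the detector
    flagged that sample as machine-generated. (Hypothetical helper,
    not part of any Originality.ai API.)
    """
    if not flags:
        raise ValueError("need at least one sample")
    # True counts as 1 and False as 0, so sum() counts the detections.
    return sum(flags) / len(flags)


# Illustrative data only: 9,400 detections out of 10,000 samples
# would sit exactly at the 94% mark cited in the article.
example_flags = [True] * 9_400 + [False] * 600
example_rate = detection_rate(example_flags)
print(f"{example_rate:.1%} of samples detected")
```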
Originality.ai is making its service available for a penny per 100 words scanned and has signed up more than 1,000 paying customers in just two weeks, including users from 15 universities. The founders aren’t actively fundraising. “It looks like we’ll be cash flow neutral with the development team we have,” Gilham said.
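To put that price in context, the rough sketch below estimates what a scan would cost at a penny per 100 words. The function name is hypothetical, and the option to round up to whole 100-word blocks is an assumption, since the article does not say how partial blocks are billed.

```python
import math

# From the article: one cent per 100 words scanned.
CENTS_PER_100_WORDS = 1


def scan_cost_usd(word_count: int, round_up_blocks: bool = False) -> float:
    """Estimate the cost in U.S. dollars of scanning `word_count` words.

    `round_up_blocks` is an assumption, not a documented billing rule:
    when True, a partial 100-word block is charged as a full block.
    """
    if word_count < 0:
        raise ValueError("word_count must be non-negative")
    blocks = word_count / 100
    if round_up_blocks:
        blocks = math.ceil(blocks)
    # Convert cents to dollars and trim floating-point noise.
    return round(blocks * CENTS_PER_100_WORDS / 100, 4)


# A single 1,500-word article, and a batch of 1,000 such articles.
print(scan_cost_usd(1_500))
print(scan_cost_usd(1_500 * 1_000))
```

At that rate a 1,500-word article costs about 15 cents to scan, and a publisher checking 1,000 articles of that length would pay about $150.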
Google hasn’t said whether it has solved the AI content-detection problem, but given its deep NLP experience, it’s probably well along. Nevertheless, “they’re not going to come out and share their solution, so we think publishers will need their own tool,” Gilham said. “That’s our niche.”
Does it bother him a little that he can’t explain how Originality.ai works? Yes, he admitted, but he noted that half the trading decisions on stock markets are made by machines without any direct human oversight. “I know how the tests work and I’m very confident in its accuracy,” he said.