- The AI Product Report
Unmasking the Impostor: Can AI Outsmart AI in the Ultimate Whodunit?
In Other News: Interesting AI Regulation Coming Out Of Wisconsin
For those who are new around here, this is a weekly newsletter where I highlight new and innovative AI products that are worth exploring.
Hey hey!
Happy Friday! We’re back for another issue.
In this week’s issue:
Product (Experiment) of the Week
Other AI Things Happened
What I’m Reading
PRODUCT (EXPERIMENT) OF THE WEEK
After testing dozens of AI products this week, here’s my report!
Pitting AI against itself is something…
In this week's edition, we delve into the intriguing realm of AI detection systems, a sector where artificial intelligence is employed to identify its own kind in a crowd teeming with both human and machine-generated content. This week’s experiment mirrors a narrative that could belong in a science fiction novel, but it’s here, for real, and has become an increasingly sophisticated game of digital hide and seek.
The Shape Of The Experiment
Here’s what the game plan looks like.
Step 1: We need our human-generated data - free of copyright, with plenty of English text (most AI models are preferentially trained on English), and preferably preformatted in plain text. Destination? Project Gutenberg (gutenberg.org), to grab ourselves a whole literary classic. I decided on a piece that was already familiar to me: the first book of The Count of Monte Cristo, written by the one, the only, Alexandre Dumas.
Step 2: We need our AI-generated data, and we need it dangerously close to the real deal. So off to ChatGPT 4 we go! I used the following 2 prompts in a fresh chat window to generate the test text.
Prompt 1: Here's the body of text please answer the next prompt replicating the writing style and tone of the text above. The text: [...]
Prompt 2: The text previously provided to you was actually chapter one of a larger story please provide me with a unique, creative, never-before-seen continuation of the story respectful of the writing style and tone.
Step 3: We are now armed with data, so let’s move on to testing time!
Step 4: Profit! (just kidding, frankly, let’s talk Results)
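For anyone who wants to reproduce Step 1, here is a minimal sketch of how the human-written sample could be prepared. It is not the exact process I followed. It assumes the plain-text file uses Project Gutenberg's usual "*** START OF ..." and "*** END OF ..." marker lines, and the helper names (strip_gutenberg_boilerplate, first_words) and the tiny sample text are made up for illustration.

```python
import re

# Assumption: Project Gutenberg plain-text files wrap the book in marker
# lines such as "*** START OF THE PROJECT GUTENBERG EBOOK <TITLE> ***".
START_RE = re.compile(
    r"^\*\*\* ?START OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$", re.MULTILINE
)
END_RE = re.compile(
    r"^\*\*\* ?END OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$", re.MULTILINE
)


def strip_gutenberg_boilerplate(raw: str) -> str:
    """Return only the text between the START and END marker lines.

    If a marker is missing, fall back to the start or end of the file.
    """
    start = START_RE.search(raw)
    end = END_RE.search(raw)
    begin = start.end() if start else 0
    finish = end.start() if end else len(raw)
    return raw[begin:finish].strip()


def first_words(text: str, limit: int) -> str:
    """Keep the first `limit` whitespace-separated words of the text."""
    return " ".join(text.split()[:limit])


# Hypothetical stand-in for a downloaded file, so the sketch runs offline.
sample = """Header licence text
*** START OF THE PROJECT GUTENBERG EBOOK EXAMPLE ***
Chapter 1. A ship arrives in port.
*** END OF THE PROJECT GUTENBERG EBOOK EXAMPLE ***
Footer licence text"""

body = strip_gutenberg_boilerplate(sample)
excerpt = first_words(body, 4)
print(excerpt)
```

Trimming the licence header and footer matters here, because otherwise a detector would be judging legal boilerplate rather than the novel itself.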
Some Notes Going Into This
Why do this check myself if others are already reporting on it? To make a few points about AI developments, since there are many parallels to draw in this case:
I want to show you, the reader, that the AI diaspora has a number of nuances that are task-contextual. Sometimes the right tech is used for solving the wrong problem, and sometimes it’s the wrong tech for an addressable problem. AI’s REAL challenge for human applications involves quite a complex series of topics to consider:
Until we reach a collective, teachable baseline understanding that not every problem can be solved with our current iteration of AI, there will still be quite a bit of guesswork in how AI models are built.
Sometimes, frankly, not all problems are worth solving with AI, as we saw with last week’s photo enhancement experiments.
Obviously these barriers are also looping questions worth reassessing based on the latest developments of model architecture & breakthroughs in fundamental computing science.
Why not test images? Why not test videos? With the exception of extremely new developments, AI-generated images from quite a few of the models currently on the market are frankly still too discernible from human-produced and/or CGI content. This is a point I'll have to revisit once a few competitors sprout up that catch interest beyond Sora (https://openai.com/sora). Also, videos and photos carry a bunch of trailing metadata, which opens an additional vector into the generative conversation that isn't my focus here, since there's a lot of minutiae involved.
I'm sidestepping deepfakes because I won't take part in perpetuating their use, nor in facilitating their creation or distribution.
The Competitors’ Roster:
Let’s check out some of the main players in this game of AI detection systems! Our players are, drumroll please:
Results Of This Test:
To shorten this for the busy lot of you, I’ve made you a highlight reel with a reference table of the results! I threw in some confidence scores where I could.
There are some BOLD claims in here from some of these detection solutions 😅😅
Verdict & Trailing Thoughts
Looks to me like the clearest winners at telling classic French literature apart from AI inventions were Sapling AI and GPTZero. Where I hesitate to make a call is this: both of those high scorers displayed a large amount of confidence in all the results they presented. That brings up the exact point I was concerned about: training bias. With no “explainability” of the judgement offered to the user beyond highlighted sentences, did the detection trigger primarily because the solution could recall data that was used in its training, or did the computation actually work?
Developer Thought: This testing makes me wonder if there is a way to develop a model state meta-analysis method to flag which “principles” of a task are understood by the model, and indicate which principles raised suspicions during computing.
From an ethics standpoint, I have a few qualms about AI detection tools since they start to inch us as a society closer to a few difficult positions. Consider my points here:
Bias, Fairness and Transparency in Computing: AI detection systems risk embedding biases from their training data, potentially leading to unjust outcomes that could disproportionately affect certain demographics (like non-native English speakers - like me! This is confirmed by a 2023 study). The lack of clarity on how these systems make decisions is still a non-negligible barrier to fully trusting these technologies in cases where accountability is at play.
Dependence and Error Rates: An overdependence on AI detection tools sends human evaluators’ critical thinking skills downhill by encouraging blind trust in automated decisions. As we saw even during this test, inaccuracies in AI detection, including wrongful identifications and missed detections, raise serious concerns when the impacts of these computations are very real and can make or break lives & livelihoods. Consider the case of Texas A&M University-Commerce and students being openly accused of a lack of academic integrity.
In this issue’s tests, in fact, some of these solutions had the gusto to be 100% confident in their answers. My issue with this is that the developers leave little room for the fallibility of their own solution.
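To make “wrongful identifications” and “missed detections” concrete, here is a small sketch of how one could score a detector against texts whose origin is already known. The verdicts in it are placeholder values invented purely for illustration, not the results of this week's test, and the names (Verdict, error_rates, confident_mistakes) and the 0.9 threshold are my own assumptions.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    truth: str         # what the text really is: "human" or "ai"
    predicted: str     # what the detector said: "human" or "ai"
    confidence: float  # detector's self-reported confidence, 0.0 to 1.0


def error_rates(verdicts):
    """Compute wrongful-identification and missed-detection rates.

    false_positive_rate: share of human texts flagged as AI.
    false_negative_rate: share of AI texts passed off as human.
    """
    humans = [v for v in verdicts if v.truth == "human"]
    ais = [v for v in verdicts if v.truth == "ai"]
    false_pos = sum(v.predicted == "ai" for v in humans)
    false_neg = sum(v.predicted == "human" for v in ais)
    return {
        "false_positive_rate": false_pos / len(humans) if humans else 0.0,
        "false_negative_rate": false_neg / len(ais) if ais else 0.0,
    }


def confident_mistakes(verdicts, threshold=0.9):
    """List wrong verdicts the detector was nonetheless very sure about."""
    return [
        v for v in verdicts
        if v.predicted != v.truth and v.confidence >= threshold
    ]


# Placeholder verdicts, made up for illustration only.
placeholder = [
    Verdict("human", "human", 0.80),
    Verdict("human", "ai", 1.00),   # a wrongful identification at 100%
    Verdict("ai", "ai", 0.95),
    Verdict("ai", "human", 0.60),   # a missed detection
]

rates = error_rates(placeholder)
overconfident = confident_mistakes(placeholder)
print(rates, len(overconfident))
```

The second helper is the one I care about most: a wrong answer delivered at 100% confidence is exactly the kind of result that can land on a student's desk as an accusation.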
OTHER AI THINGS HAPPENED
Some other notable news and product launches from this week
“Months after it was first announced, Nightshade, a new, free software tool allowing artists to “poison” AI models seeking to train on their works, is now available for artists to download and use on any artworks they see fit.” - Carl Franzen, VentureBeat
We’re getting persistent memory for ChatGPT! Check out the blog post from the OpenAI crew themselves!
Law and regulation seem to be taking shape in more local circles than international ones! The state of Wisconsin has passed a law about AI-generated materials in political advertising campaigns.
Interesting, somewhat controversial take published by the Times of India stating that a huge percentage of the AI startups cropping up are likely to crumble in a short time span due to high operating costs and the inability to in-house the technology. Where I think this is a bit off is that quite a bit of AI tech at its baseline is open-sourced and can, with some effort, be in-housed with some degree of success. Even LLM-dependent companies have access to quite a few options, and once combined with their ever-growing operational data from being in business, they would have recourse to sustain themselves. Mind you, a mild degradation of services might be a more reasonable stance to take than calling for a 90% crunch “apocalypse” scenario.
Raises & Mergers Recap:
SoftBank is opening a round seeking to invest in India-based AI chip manufacturing. This comes at an interesting time given what I reported a few issues ago on how Korea is joining the mêlée that’s already ongoing between major US and Chinese firms in that space.
Here’s one for medical science! The French startup AZmed has closed a Series A funding round to grow its radiology products. They’re already FDA-cleared for a medical imaging solution, and now they’re gearing up for even more innovation.
WHAT I'M READING
"Life can only be understood backwards; but it must be lived forwards."
- Søren Kierkegaard
Just last week there were quite a few large gatherings of AI-minded experts in Montreal, Canada. Since it’s impossible to be everywhere at once, I’m playing a little bit of catch-up on the topics that were discussed during the previous year’s conference, and tracking down lots of really good information on the greater academia’s impressions of sustainable, ethical AI development and the guardrails that need to be in place to successfully influence its development.
Here is the link to MILA that you’ll also find interesting:
Stay well, and until next week.
-✌🏽 Sam
P.S. Interested in having me give you private feedback about a product that you are building? Send me an email: [email protected]