Google has just open-sourced its AI-generated text detection tool for everyone.
“PCWorld is on a journey to delve into the information that resonates with readers, weaving a multifaceted tapestry of deep enrichment.” That may sound like AI-generated nonsense, but it was actually written by a human – yours truly.
The truth is, it’s difficult to know whether a given piece of text was generated by AI or written by a human. Google aims to make that easier to detect by open-sourcing a new software tool.
Google calls it SynthID, a method to “watermark and identify AI-generated content.” Having previously applied it to its own language and image generation systems, the company announced that SynthID will be released as open-source code that can be applied to other AI text generation setups as well. (If you’re more versed in the sciences than I am, you can find the full details in the journal Nature.)
From an ordinary person’s perspective – at least as far as this ordinary person can actually understand it – SynthID hides specific patterns in generated images and text that are generally too subtle for humans to spot, with the intention of revealing them when tested.
SynthID can “embed a watermark in AI-generated text in a way that helps you determine whether text was generated by your own LLM, without affecting how the underlying LLM works or negatively impacting generation quality,” according to a post on the open-source machine learning repository Hugging Face.
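If you’re curious what that integration looks like in practice, here’s a minimal sketch using the SynthID watermarking config that Hugging Face’s transformers library exposes for exactly this purpose. The model name and watermark keys below are placeholder values for illustration, not anything Google ships:

```python
# Minimal sketch of SynthID-watermarked generation via Hugging Face
# transformers. The model repo and key values are placeholders.
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    SynthIDTextWatermarkingConfig,
)

model_id = "google/gemma-2b-it"  # any causal LM; this one is just an example
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# The watermark is keyed: only a detector holding the same secret keys
# can later test text for this watermark.
watermarking_config = SynthIDTextWatermarkingConfig(
    keys=[654, 400, 836, 123, 340, 443, 597, 160, 57, 29],  # arbitrary example keys
    ngram_len=5,  # how many tokens of context seed each watermarking step
)

inputs = tokenizer("Explain text watermarking in one paragraph.", return_tensors="pt")
outputs = model.generate(
    **inputs,
    watermarking_config=watermarking_config,  # nudges sampling; weights untouched
    do_sample=True,
    max_new_tokens=120,
)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
```

Because the watermark only biases which tokens get sampled, the underlying model is left alone – which is what lets it bolt onto nearly any generator.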
The good news is that Google says these watermarks can be integrated into almost any AI text generation tool. The bad news is that actually detecting the watermarks is still a rather unproven affair.
While SynthID watermarks can survive some basic tricks used to evade automated detection – like swapping out words to, shall we say, not technically plagiarize – the detector can only report varying degrees of confidence that a watermark is present. And that confidence drops for factual responses (some of the most critical and problematic uses of text generation) and when large bodies of text are machine-translated or otherwise rewritten.
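To get a feel for why detection is probabilistic rather than a yes/no stamp, here’s a deliberately toy sketch of my own (not Google’s actual detector): each token gets a keyed pseudorandom score, watermarked sampling favors high-scoring tokens, and the detector simply measures how far the average drifts above chance.

```python
# Toy illustration (not Google's detector): why watermark detection
# yields a confidence score rather than a certainty.
def g_value(context: tuple[int, ...], token: int, key: int) -> int:
    """Hypothetical keyed pseudorandom score in {0, 1} per (context, token)."""
    return hash((context, token, key)) & 1

def detection_score(tokens: list[int], key: int, ngram_len: int = 5) -> float:
    """Mean g-value over the text. Unwatermarked text averages ~0.5;
    watermarked sampling favors g=1 tokens, pushing the mean higher."""
    scores = [
        g_value(tuple(tokens[max(0, i - ngram_len + 1):i]), tok, key)
        for i, tok in enumerate(tokens)
    ]
    return sum(scores) / len(scores)

# The margin above 0.5 is the evidence. Paraphrasing or machine-translating
# the text swaps out tokens and contexts, shrinking that margin -- which is
# exactly why confidence degrades on rewritten text.
```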
Google says: “SynthID text is not designed to directly stop motivated adversaries from causing harm.” (And honestly, even if Google did have a magic cure for LLM-generated misinformation, it would hesitate to frame it that way for liability reasons.) The watermarking system also has to be built into a text generation tool before it’s used, so there’s nothing stopping anyone from simply choosing not to do so – as nefarious state-level actors, or even “free” tools like xAI’s Grok, may demonstrate all too plainly.
It must be noted that Google isn’t being entirely altruistic here. While the company pushes its AI tools to both consumers and businesses, its core search product is threatened by a web that seems to be rapidly filling up with automatically generated text and images. And Google’s competitors, like OpenAI, may choose not to adopt these kinds of tools simply as a matter of business, hoping to establish their own standard and push the market toward their own products.