Google has announced that SynthID Text, its watermarking technology for identifying AI-generated text, is now available as open source through the Google Responsible Generative AI Toolkit. Other generative AI developers can now use the technology to detect output from their own large language models (LLMs), supporting responsible AI development.
As the use of large language models grows, watermarking has become essential for addressing issues like political misinformation and nonconsensual content. In response, California is exploring mandatory AI watermarking, while China has already implemented such regulations. However, the technology is still evolving.
SynthID, introduced in August 2023, embeds an imperceptible watermark in images, audio, video, and text at generation time. The text version adjusts the probability scores of generated tokens (elements that can be a character, a word, or part of a phrase) so that the output remains coherent while becoming detectable by software. For instance, to complete the phrase "My favorite tropical fruits are __.", candidate tokens like "mango" or "durian" are each assigned a probability score, and SynthID selectively modifies these scores without compromising the quality or creativity of the text.
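Google has not spelled out SynthID's production sampling algorithm in this announcement, but the general idea of keyed probability adjustment can be sketched. The snippet below is a minimal illustration loosely modeled on published "green list" watermarking schemes, not SynthID itself; the function names (`green_tokens`, `watermark_probs`), the secret key, and the bias parameter `delta` are all hypothetical.

```python
import hashlib
import math
import random

def green_tokens(context: tuple, vocab: list, key: str, fraction: float = 0.5) -> set:
    """Derive a keyed, pseudorandom 'green' subset of the vocabulary from the
    preceding tokens. Anyone holding the key can recompute the same subset."""
    seed = hashlib.sha256((key + "|".join(context)).encode()).hexdigest()
    rng = random.Random(seed)
    return set(rng.sample(vocab, int(len(vocab) * fraction)))

def watermark_probs(probs: dict, context: tuple, key: str, delta: float = 2.0) -> dict:
    """Nudge the model's token probabilities toward the green subset.
    `delta` is a logit bias: large enough to show up in aggregate,
    small enough to leave each individual choice plausible."""
    greens = green_tokens(context, list(probs), key)
    logits = {tok: math.log(p) + (delta if tok in greens else 0.0)
              for tok, p in probs.items()}
    total = sum(math.exp(v) for v in logits.values())
    return {tok: math.exp(v) / total for tok, v in logits.items()}

# Toy candidates for completing "My favorite tropical fruits are __."
probs = {"mango": 0.4, "durian": 0.3, "lychee": 0.2, "papaya": 0.1}
print(watermark_probs(probs, ("tropical", "fruits", "are"), key="secret"))
```

Because the bias is small and keyed to context, any single biased choice looks like a normal sampling decision; only the accumulated statistics over many tokens reveal the watermark.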
This adjustment occurs throughout the generated content, so a single sentence may contain many adjusted scores, and together they form a pattern that acts as a watermark. Integrated into Google's Gemini chatbot, SynthID preserves the quality, accuracy, and speed of generated text, addressing a common concern with watermarking technologies. It works on passages as short as three sentences and can still detect content that has been paraphrased or cropped. However, it struggles with very short text, thoroughly rewritten content, and responses to factual questions, which leave little room to vary word choice without hurting accuracy.
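Under the same simplified scheme sketched above, detection reduces to a statistical test: recompute the keyed green subset at each position and check how often the text's actual tokens land in it. The window size and the z-score formulation below are illustrative assumptions, and SynthID's real detector is more sophisticated.

```python
import math

def detect(tokens: list, key: str, vocab: list, window: int = 3) -> float:
    """Score a text for the watermark: unwatermarked text should hit the
    green subset ~50% of the time; watermarked text significantly more."""
    hits = 0
    for i in range(window, len(tokens)):
        context = tuple(tokens[i - window:i])
        if tokens[i] in green_tokens(context, vocab, key):
            hits += 1
    n = len(tokens) - window
    # One-sided z-test against the 50% baseline; higher z = stronger evidence.
    return (hits - 0.5 * n) / math.sqrt(0.25 * n)
```

This also suggests why paraphrased or cropped text can remain detectable (enough biased positions survive to move the statistic) while very short or heavily rewritten text does not.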
While Google acknowledges that SynthID is not a comprehensive solution for identifying AI-generated content, it emphasizes its role as a crucial component in developing more reliable identification tools, empowering users to make informed choices about AI interactions.