AI watermarking techniques vary significantly depending on the type of content being watermarked. At its core, the process involves two main stages: embedding (or encoding) the watermark, and detection.
== Text ==
Text watermarking is considered one of the most challenging modalities because natural language offers relatively limited redundancy compared to images or audio. In the green-red list scheme of Kirchenbauer et al. (2023), a keyed hash of the preceding token pseudorandomly partitions the vocabulary at each step into a "green list" G and a complementary "red list". A logits processor then increments every green-list logit by a fixed bias \delta > 0 before softmax:

\ell'_v = \ell_v + \delta \cdot \mathbf{1}[v \in G]

so that, after sampling, green tokens are over-represented but generation is not constrained to green tokens alone; high-entropy positions tolerate the bias gracefully, while low-entropy positions (where one token dominates the logits) override the watermark and preserve correctness on factual content. The bias parameter \delta directly mediates the tradeoff between detectability and quality: a small \delta yields near-natural text but a weak signal, while a large \delta produces a strong statistical fingerprint at the cost of increased perplexity. Wouters (2023) translated this tradeoff into a multi-objective optimization problem and characterized the Pareto frontier of green-red watermarks.
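The mechanism can be sketched in a few lines of Python. The vocabulary size, the SHA-256 seeding of the green list, and the parameter values below are illustrative stand-ins, not the published implementation:

```python
import hashlib
import math
import random

def green_list(prev_token: int, vocab_size: int, gamma: float = 0.5) -> set:
    # Pseudorandomly partition the vocabulary, seeded on the previous token
    # (a simplified stand-in for the keyed hash used in the real scheme).
    seed = int.from_bytes(hashlib.sha256(str(prev_token).encode()).digest()[:8], "big")
    perm = list(range(vocab_size))
    random.Random(seed).shuffle(perm)
    return set(perm[: int(gamma * vocab_size)])

def watermarked_probs(logits, green, delta=2.0):
    # l'_v = l_v + delta * 1[v in G], followed by softmax.
    boosted = [l + (delta if v in green else 0.0) for v, l in enumerate(logits)]
    m = max(boosted)
    exps = [math.exp(b - m) for b in boosted]
    total = sum(exps)
    return [e / total for e in exps]

green = green_list(prev_token=7, vocab_size=8)

# High-entropy position: flat logits, so green tokens soak up most of the mass.
flat = watermarked_probs([0.0] * 8, green)

# Low-entropy position: one dominant (here red) logit overrides the bias,
# so the watermark yields and the "correct" token is still sampled.
peaked_logits = [0.0] * 8
red_token = next(v for v in range(8) if v not in green)
peaked_logits[red_token] = 10.0
peaked = watermarked_probs(peaked_logits, green)
```

With these toy numbers the flat distribution places the bulk of its probability on green tokens, while the peaked distribution still concentrates on the dominant red token, which is exactly the entropy dependence described above.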
=== Distortion-free schemes ===
A second family of schemes, beginning with an unpublished proposal by
Scott Aaronson (2022) at OpenAI, sidesteps the quality-detectability tradeoff by preserving the model's marginal distribution exactly. Aaronson's Gumbel-max watermark samples the next token as w_t = \arg\max_i \frac{\log \xi_t[i]}{p_t[i]}, where \xi_t \in (0,1)^N is a pseudorandom vector keyed on previous tokens. By the
Gumbel-max identity, w_t is exactly distributed according to p_t, so a single watermarked output is indistinguishable from an unwatermarked one; yet the correlation between \xi_t and w_t can be detected with the secret key. Christ, Gunn, and Zamir (COLT 2024) gave the first cryptographically rigorous construction, proving undetectability against any computationally bounded adversary who lacks the key, under standard assumptions on
pseudorandom functions.
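Aaronson's sampling rule is compact enough to state directly. In the sketch below the uniform draws \xi are plain pseudorandomness standing in for the keyed pseudorandom function of the actual scheme:

```python
import math
import random

def gumbel_max_token(p, xi):
    # w_t = argmax_i log(xi[i]) / p[i], equivalently argmax_i xi[i]^(1/p[i]).
    # log(xi) is negative, so dividing by a larger p[i] pulls the score
    # toward zero, favouring high-probability tokens.
    return max(range(len(p)), key=lambda i: math.log(xi[i]) / p[i])

# Distortion-freeness: averaged over keys, token i is chosen with probability p[i].
rng = random.Random(0)
p = [0.5, 0.3, 0.2]
counts = [0, 0, 0]
trials = 20000
for _ in range(trials):
    xi = [1.0 - rng.random() for _ in p]  # uniforms in (0, 1]
    counts[gumbel_max_token(p, xi)] += 1
freqs = [c / trials for c in counts]  # empirically close to p
```

A detector holding the key can regenerate each \xi_t and test whether the chosen tokens w_t line up with unusually large values of \xi_t[w_t]; one commonly described statistic is a sum of terms of the form -log(1 - \xi_t[w_t]), which is anomalously large on watermarked text.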
=== Effectiveness regime ===
All known generation-time text watermarks share the same fundamental dependence: their signal strength is proportional to the
entropy of the model's next-token distribution. On low-entropy outputs (such as code completing a function signature, or factual recall of a single correct answer), there is little room to bias the sampler without breaking correctness, and the watermark is correspondingly weak. Watermarks therefore work best on essays, creative writing, and other long-form, high-entropy generations.

== Images ==
In the image domain, post-hoc watermarking embeds a signal into the pixels of an already-generated image using a trained encoder-decoder pair. Subsequent systems refined this paradigm, including StegaStamp (Tancik et al. 2020), which adds robustness to physical-world perturbations such as printing and photographing, and TrustMark (Bui et al. 2023), which targets resolution-agnostic watermarking for C2PA-style provenance.
=== In-generation watermarking ===
In-generation methods modify the AI model itself so that all of its outputs carry a watermark by construction. Stable Signature (Fernandez et al.,
ICCV 2023) fine-tunes the
VAE decoder of a
latent diffusion model such as
Stable Diffusion so that every decoded image hides a fixed binary signature; a pre-trained extractor recovers the bits, and a likelihood-ratio test on the recovered bits serves as the detector. The authors report >90% detection accuracy after a 90% crop, at a false-positive rate below 10^{-6}. A complementary approach is Tree-Ring Watermarks (Wen, Kirchenbauer, Geiping & Goldstein,
NeurIPS 2023), which embeds a circular pattern in the
Fourier transform of the initial Gaussian noise vector used to seed the diffusion sampler. The ring is invariant under spatial transformations (rotation, flipping, dilation) and survives the entire denoising trajectory; because the pattern resides in the initial noise rather than in the finished image, detection requires inverting the diffusion process to recover an estimate of that noise. The scheme is robust, but it requires access to the diffusion model and its inversion.
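The Fourier-domain embedding can be illustrated on the noise tensor alone (the diffusion model and its inversion are out of scope here). The radius, ring width, and all-zeros key pattern below are arbitrary choices for the sketch, not the paper's settings:

```python
import numpy as np

def embed_ring(noise, radius=10, width=2):
    # Move to the Fourier domain with the zero frequency at the centre.
    freq = np.fft.fftshift(np.fft.fft2(noise))
    h, w = noise.shape
    yy, xx = np.mgrid[:h, :w]
    ring = np.abs(np.hypot(yy - h // 2, xx - w // 2) - radius) < width
    freq[ring] = 0.0  # imprint the key pattern (all zeros here) on the ring
    # Back to the spatial domain; the result still looks like Gaussian noise.
    return np.real(np.fft.ifft2(np.fft.ifftshift(freq))), ring

def ring_statistic(noise_estimate, ring):
    # Small values mean the ring matches the all-zeros key; fresh Gaussian
    # noise instead gives magnitudes on the order of sqrt(h * w).
    freq = np.fft.fftshift(np.fft.fft2(noise_estimate))
    return float(np.mean(np.abs(freq[ring])))

rng = np.random.default_rng(0)
marked, ring = embed_ring(rng.standard_normal((64, 64)))
marked_stat = ring_statistic(marked, ring)                          # near zero
unmarked_stat = ring_statistic(rng.standard_normal((64, 64)), ring) # large
```

In the real system the statistic is computed not on the noise itself but on the noise estimate recovered by inverting the diffusion process from a candidate image, which is why detection requires model access.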
=== SynthID-Image ===
SynthID-Image, developed by
Google DeepMind, uses a post-hoc, model-independent design: a neural encoder embeds a watermark into the pixel data after generation, and a corresponding decoder detects it. The watermark is distributed holographically across the image, so even cropped fragments retain detectable information. By 2025, SynthID had been used to watermark over ten billion images and video frames across Google's services, making it the largest deployed AI image watermark to date.
== Audio ==
Audio watermarking is constrained by the
psychoacoustic threshold of human hearing: signals must be embedded in regions of the spectrum masked by louder content (a phenomenon known as
auditory masking). Modern neural audio watermarks operate either on the raw waveform or on time-frequency representations such as
mel spectrograms. The state of the art is exemplified by AudioSeal (San Roman et al., ICML 2024), which jointly trains a generator network that adds a watermark signal to an input waveform and a localized detector network that returns, for every audio sample, the probability that the watermark is present. AudioSeal introduced a novel perceptual loss based on auditory masking and is the first audio watermark to provide sample-level localization; that is, it can identify which segments of a longer audio file (e.g. a podcast partially modified with AI voice cloning) are watermarked. More recent systems include SilentCipher and XAttnMark, the latter of which uses cross-attention to jointly optimize detection and bit-level attribution. Independent evaluations have, however, shown that all current post-hoc audio watermarks can be effectively removed by neural codecs (such as
EnCodec) and learned denoisers, raising concerns about deployment robustness.

== Industry implementations ==