Paper Summary: The Science of Detecting LLM-Generated Texts

Please note This post is mainly intended for my personal use. It is not peer-reviewed work and should not be taken as such.


WHAT

The authors survey the nascent field of machine-generated text detection, categorize the main detection methods, explain the principal approaches (chiefly watermarking), and argue for fair evaluation protocols.

WHY

At the time, no comprehensive survey of the field existed, and the growing body of work needed organizing.

HOW

The authors categorized existing detection methods into two types:

  • Black-box methods: Collect samples of human-written text and of text generated by a specific LLM, then train a binary classifier to tell them apart. Only query access to the model is required.

  • White-box methods: In addition to samples of both human- and machine-generated text, you also have full access to the weights of the LLM you are trying to detect.
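A minimal, stdlib-only sketch of the black-box idea, using a nearest-centroid bag-of-words classifier; the sample texts and features are invented, and a real detector would train on large corpora with far richer features:

```python
# Toy black-box detector: nearest-centroid over bag-of-words features.
# All sample texts are invented; a real detector would use large corpora.
import math
from collections import Counter

human_samples = [
    "honestly?? that movie was a mess lol",
    "the ending felt rushed but i still loved it!!",
]
llm_samples = [
    "The film presents a compelling narrative arc.",
    "In conclusion, the movie offers valuable insights.",
]

def centroid(texts):
    """Normalized bag-of-words counts pooled over a list of texts."""
    counts = Counter()
    for t in texts:
        counts.update(t.lower().split())
    total = sum(counts.values())
    return {w: n / total for w, n in counts.items()}

def cosine(a, b):
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def classify(text, human_c, llm_c):
    """Label text by whichever class centroid it is closer to."""
    v = centroid([text])
    return "llm" if cosine(v, llm_c) > cosine(v, human_c) else "human"

human_c, llm_c = centroid(human_samples), centroid(llm_samples)
print(classify("In conclusion, the narrative is compelling.", human_c, llm_c))
```

The same train-a-classifier recipe works with any feature extractor and any binary classifier; the point is only that no model internals are needed.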

Different approaches to watermarking were discussed too, along with some points about how to compare approaches fairly.

Watermarking

Watermarking refers to methods that encode hidden information in the generated text so that it can be verified a posteriori.

The two basic approaches described are inference-time and post-hoc watermarking:

  • Inference-time watermarking: Change the sampling process inside the decoder by adding a seed, so that at each step the next token can only be drawn from a designated subset of the vocabulary.

  • Post-hoc watermarking: After the text is generated, modify blocks of it in ways that preserve the semantics, e.g., by paraphrasing, changing syntactic structures, or substituting synonyms.

Watermarking usually degrades the quality of the generated text and may slow down inference.
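A toy sketch of the inference-time idea: hash the previous token to derive a "green" subset of the vocabulary, sample only from it, and later detect the watermark by counting how many tokens fall in their predecessor's green subset. The vocabulary, seeds, and 50% green fraction are all invented; real schemes operate on an LLM's logits rather than a uniform vocabulary:

```python
# Toy inference-time watermark: each token is sampled only from a "green"
# subset of the vocabulary seeded by the previous token. Vocabulary and
# parameters are invented for illustration.
import random
import zlib

VOCAB = [f"tok{i}" for i in range(500)]

def green_list(prev_token, frac=0.5):
    """Deterministically derive the green subset from the previous token."""
    rng = random.Random(zlib.crc32(prev_token.encode()))
    shuffled = VOCAB[:]
    rng.shuffle(shuffled)
    return set(shuffled[: int(len(VOCAB) * frac)])

def generate(n_tokens, seed_token="tok0", seed=42):
    """Watermarked 'generation': every token comes from a green subset."""
    rng, out, prev = random.Random(seed), [], seed_token
    for _ in range(n_tokens):
        tok = rng.choice(sorted(green_list(prev)))
        out.append(tok)
        prev = tok
    return out

def green_fraction(tokens, seed_token="tok0"):
    """Detector: share of tokens inside their predecessor's green subset."""
    prev, hits = seed_token, 0
    for tok in tokens:
        hits += tok in green_list(prev)
        prev = tok
    return hits / len(tokens)

marked = generate(200)
rng = random.Random(7)
unmarked = [rng.choice(VOCAB) for _ in range(200)]
print(green_fraction(marked), green_fraction(unmarked))  # ~1.0 vs ~0.5
```

Unwatermarked text lands in the green subset about half the time by chance, while watermarked text does so always, which is why a simple statistical test on the green fraction suffices for detection.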

CLAIMS

  • Adversarial attacks on detectors "...a paraphrasing attack could break a wide array of detectors, including both white-box and black-box approaches."
    • This is a cheap way to evade model-specific detectors, since another LLM can be used to generate the paraphrases.

QUOTES

  • On Emotion: "Initial observations indicate that LLM-generated texts are less emotional and objective compared to human-authored text, which often uses punctuation and grammar to convey subjective feelings"

NOTES

  • Detection methods listed here are usually limited to a specific LLM. No mention is made of model-agnostic detection.

  • Fact-checking is also used as a way to detect LLM-generated text, especially hallucinations.

MY 2¢

  • Model-agnostic detection: The methods surveyed are tied to specific models, but model-agnostic detection is what we actually want in practice.

  • Temperature: The paper says nothing about the role of sampling temperature during generation, yet it surely makes a big difference: at temperature zero, decoding is greedy and deterministic, which makes it much easier to detect whether a given text is LLM-generated.

  • Good insights from this paper: This paper (How Close is ChatGPT to Human Experts? Comparison Corpus, Evaluation, and Detection) seems to have lots of interesting insights and statistics about human-written vs LLM-generated text.

    • LLM-generated text is longer on average;
    • "ChatGPT texts use more determiners, conjunctions, and auxiliary relations"
    • "Unlike humans, large language models tend to be neutral by default and lack emotional expression."
  • Detecting vs. verifying: Detecting is a different problem from verifying. For a single known LLM the two collapse into one problem, but in the model-agnostic setting they are very different. Watermarking, for instance, may help with model-specific detection but not with model-agnostic detection.

  • Regulation: Governments may eventually require all public LLMs to embed some form of watermark, but this is unenforceable for open-source LLMs, whose sampling code can simply be modified.

  • Tradeoffs: There is a clear tradeoff between ease of detection and text length; I don't think the article explores it enough.
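The temperature point above can be made concrete: as temperature approaches zero, softmax sampling collapses to greedy argmax, so repeated generations are identical. A small sketch with invented next-token logits:

```python
# Temperature-scaled sampling over invented logits: T == 0 means greedy
# (deterministic) decoding, higher T means more varied samples.
import math
import random

def sample_token(logits, temperature, rng):
    if temperature == 0:
        # Greedy decoding: always pick the highest-logit token.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [l / temperature for l in logits]
    m = max(scaled)                       # subtract max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights)[0]

logits = [2.0, 1.0, 0.5]                  # invented next-token scores
rng = random.Random(0)
greedy = {sample_token(logits, 0.0, rng) for _ in range(100)}
warm = {sample_token(logits, 1.0, rng) for _ in range(100)}
print(greedy, warm)  # greedy is always index 0; warm covers several indices
```

Deterministic output is a strong fingerprint: regenerate from the same prompt and compare. At higher temperatures that check no longer works, which is presumably why detectability depends on the decoding setup.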

