The Influence of Large Language Models on Scientific Writing

Artificial intelligence (AI) has advanced rapidly in recent years, most visibly through large language models (LLMs) capable of generating human-like text. Detecting whether a piece of writing was produced with an LLM, however, has proven difficult for many companies. A group of researchers has developed a new method that estimates LLM usage in scientific writing by analyzing changes in vocabulary and word frequency.

The researchers analyzed 14 million paper abstracts published on PubMed between 2010 and 2024, tracking the relative frequency of words used in scientific writing. They then compared the frequency expected from pre-2023 trends with the actual frequency in abstracts from 2023 and 2024, when LLMs were in wide use. The study found a significant increase in certain “style words” (verbs, adjectives, and adverbs) in post-LLM abstracts, indicating the influence of LLMs on vocabulary choice.
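The comparison of expected and observed frequency can be illustrated with a short sketch. It assumes a simple least-squares linear trend over the pre-LLM years, which may differ from the exact extrapolation the researchers used; the function names and all numbers below are hypothetical and chosen only for illustration.

```python
def expected_frequency(history, target_year):
    """Extrapolate a word's frequency to target_year.

    history maps year -> relative frequency in pre-LLM years.
    Assumption: a least-squares straight line is a reasonable trend model.
    """
    years = list(history)
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(history.values()) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (history[x] - mean_y) for x in years)
    slope = sxy / sxx
    # A frequency cannot be negative, so clamp the extrapolation at zero.
    return max(0.0, mean_y + slope * (target_year - mean_x))


def excess_usage(history, observed, target_year):
    """Return (gap, ratio) between observed and expected frequency."""
    expected = expected_frequency(history, target_year)
    gap = observed - expected
    ratio = observed / expected if expected > 0 else float("inf")
    return gap, ratio


# Hypothetical frequencies for one word (NOT taken from the study).
history = {2019: 0.0010, 2020: 0.0011, 2021: 0.0012, 2022: 0.0013}
gap, ratio = excess_usage(history, observed=0.0150, target_year=2024)
print(f"gap={gap:.4f}, ratio={ratio:.1f}")
```

A word whose observed frequency sits far above its extrapolated trend, by either the absolute gap or the ratio, is a candidate for the kind of excess usage the study describes.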

Words like “delves,” “showcasing,” and “underscores” appeared far more often in papers from 2024 than in previous years. Common words like “potential,” “findings,” and “crucial” also saw a notable increase in the post-LLM era. These shifts were unprecedented and, unlike earlier vocabulary changes that coincided with events such as the Ebola and Zika outbreaks, were not tied to any major world event.

The researchers identified hundreds of “marker words” that became significantly more common in scientific writing after the introduction of LLMs. These marker words serve as indicators of LLM usage and can help flag papers written with LLM assistance. The findings suggest that at least 10 percent of post-2022 papers in the PubMed corpus were likely written with LLM assistance, indicating how widespread the impact of LLMs on scientific writing has become.
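One way to see why marker words yield an "at least" figure is the sketch below. It assumes the estimate comes from the gap between the share of abstracts that contain a marker word and the share expected from pre-LLM trends; this is an illustrative simplification rather than the researchers' exact procedure, and the helper names, sample abstracts, and baseline value are all hypothetical.

```python
import re


def share_with_markers(abstracts, markers):
    """Fraction of abstracts containing at least one marker word."""
    marker_set = {m.lower() for m in markers}
    if not abstracts:
        return 0.0
    hits = 0
    for text in abstracts:
        tokens = set(re.findall(r"[a-z]+", text.lower()))
        if tokens & marker_set:
            hits += 1
    return hits / len(abstracts)


def lower_bound_llm_share(observed_share, expected_share):
    """Excess share of marker-containing abstracts, floored at zero.

    Assumption: LLM-assisted abstracts that contain no marker word are
    invisible to this count, so the gap can only understate true usage.
    """
    return max(0.0, observed_share - expected_share)


# Hypothetical toy corpus and baseline (NOT data from the study).
markers = ["delves", "showcasing", "underscores"]
abstracts = [
    "This study delves into protein folding dynamics.",
    "We measured blood pressure in 200 patients.",
    "Our analysis underscores the role of sleep, showcasing new data.",
    "A randomized trial of two vaccines is reported.",
]
observed = share_with_markers(abstracts, markers)
bound = lower_bound_llm_share(observed, expected_share=0.25)
print(observed, bound)
```

Because only the excess over the baseline is counted, abstracts that would have used these words anyway are not attributed to LLMs.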

While the study provides valuable insights into the influence of LLMs on scientific writing, there are limitations to consider. The researchers acknowledge that their analysis may not capture all LLM-assisted abstracts, as some papers may not contain the identified marker words. Additionally, the study focused on abstracts rather than full-text articles, which may limit the generalizability of the findings to other forms of scientific writing.

LLMs have had a significant impact on scientific writing, changing vocabulary choice and word frequency. The study highlights the need for researchers and publishers to recognize this influence and to develop strategies for identifying LLM-assisted writing. Further research is needed to explore the broader implications of LLMs for scientific communication and to safeguard the integrity and authenticity of academic publications.
