As exploration of ChatGPT's applications for academic work continues, there is a growing sense that AI tools will disrupt our traditional models of research and writing. In particular, as we explore how tools like ChatGPT can respond to complex writing prompts, it's important to consider how these tools work – and, in some cases, how they do not work as one might expect.
There are many articles on how ChatGPT collects and trains on existing data sets. ChatGPT is built on a large language model (LLM) developed by OpenAI, which is trained on vast amounts of text scraped from the internet and generates responses based on patterns in that collected data. For more information about how these data sets are used to create models, there's a helpful page provided by Microsoft here. A strength of LLMs is that they can interact with users in natural-sounding language, which can feel more 'real' or conversational. However, because the model draws on such large and diverse sets of data, this can also lead to problems.
As noted on the Center for Digital Learning and Innovation page on ChatGPT at Seattle University, AI can have issues with 'hallucinations,' or the generation of false data.
In an article in IEEE Spectrum, Craig S. Smith writes:
…Large language models (LLMs) hallucinate, a concept popularized by Google AI researchers in 2018. Hallucination in this context refers to mistakes in the generated text that are semantically or syntactically plausible but are in fact incorrect or nonsensical. In short, you can’t trust what the machine is telling you.
In the Library, we've observed a few cases where these hallucinations can have a direct impact on the work of those using ChatGPT to generate ideas, respond to prompts, or gather background research.
In most cases, while ChatGPT can generate content that appears to contain complete citations substantiating its lines of argument, in practice those citations are not real, or are at best largely incorrect. If you try to trace these citations back to the source, the journal or newspaper may exist, but the actual articles do not. This means that the quote, and the information in it, cannot be verified – and is therefore not usable in an academic paper.
A reasonable question is: if AI is trained on real content, why would it hallucinate citations? Because ChatGPT is built on an LLM, its responses are generated from statistical patterns in the data it has scraped. It predicts plausible-sounding text rather than looking up and verifying sources, so while it can reproduce the form of a citation, it is not trained to read or analyze scholarly work. It can also make mistakes, just as we can, in locating and reporting information.
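To make that concrete, here is a deliberately tiny toy in Python – not how ChatGPT is actually implemented, and using invented citation strings – that shows how a model which only predicts a likely next word can recombine fragments of citation-shaped text into a "citation" that refers to nothing:

```python
# Toy illustration only: a first-order word predictor trained on a
# tiny corpus of MADE-UP citation strings. It has no notion of
# whether the text it generates refers to a real article.

import random

corpus = [
    "Smith J. (2019). Childhood adversity and adult attachment. Journal of Family Psychology, 33(2), 101-110.",
    "Lee K. (2020). Early trauma and romantic outcomes. Journal of Family Psychology, 34(1), 55-67.",
    "Garcia M. (2018). Adverse experiences and relationship quality. Attachment and Human Development, 20(4), 380-395.",
]

# "Training": for each word, record every word that follows it.
transitions = {}
for citation in corpus:
    words = citation.split()
    for current_word, next_word in zip(words, words[1:]):
        transitions.setdefault(current_word, []).append(next_word)

# "Generation": repeatedly sample a plausible next word.
random.seed(7)
word = random.choice([c.split()[0] for c in corpus])
output = [word]
while word in transitions and len(output) < 25:
    word = random.choice(transitions[word])
    output.append(word)

# The output recombines fragments from different citations; nothing
# guarantees the combination describes a real article.
print(" ".join(output))
```

At ChatGPT's scale the recombination is vastly more fluent, but the underlying issue is the same: the model assembles text that is statistically plausible, not text that has been verified against a source.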
For example, to demonstrate this phenomenon, I asked ChatGPT to respond to the prompt, "Write an annotated bibliography of seven sources on adverse childhood experiences and the impact on later life romantic relationships." In response, ChatGPT provided a list of citations.
At first glance, the first citation appears legitimate. If you go into the American Journal of Psychiatry, the volume and issue number even correspond correctly with the date. However, the article is nowhere to be found in that issue, and the page numbers fall outside its range. The article does exist, however – it was published in a different year:
Johnson, J. G., Cohen, P., Kasen, S., & Brook, J. S. (2002). Childhood adversities associated with risk for eating disorders or weight problems during adolescence or early adulthood. The American Journal of Psychiatry, 159(3), 394–400. https://doi.org/10.1176/appi.ajp.159.3.394
In some cases, though, citations may be completely incorrect.
In another of the generated citations, the article and its authors do not exist at all. There is a Journal of Sex Research, and the year and volume/issue match, but nothing else is real. Searching databases, Google Scholar, and other resources turns up articles with similar titles, but no exact matches.
If you decide to work with ChatGPT to generate ideas for a paper, or to search for information on a topic, you should always be cautious about the citations and references (scholarly or otherwise) that it presents. In the Library, we recommend searching for each article individually: break down the citation, then use the Library catalog and academic databases to verify whether the article is real – and whether the cited content actually matches what is found in the article itself.
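For citations that include a DOI, one quick (and only partial) check is to look the DOI up in Crossref's public REST API. The sketch below uses Python and the requests library; the DOI shown is the real Johnson et al. (2002) article above, and you can substitute any DOI you want to verify. Keep in mind this is only a sanity check: a missing record doesn't prove a paper is fake, and a DOI that resolves still needs to match the claimed title and authors.

```python
# Minimal sketch: look up a DOI in Crossref's public REST API.
import requests

def check_doi(doi: str) -> None:
    response = requests.get(f"https://api.crossref.org/works/{doi}", timeout=10)
    if response.status_code == 404:
        # Crossref has no record of this DOI.
        print(f"No record found for DOI {doi} -- treat the citation with suspicion.")
        return
    response.raise_for_status()
    work = response.json()["message"]
    title = work.get("title", ["(no title)"])[0]
    # Compare this title (and the author list in `work`) against the
    # citation ChatGPT gave you; mismatches are a red flag.
    print(f"DOI resolves to: {title}")

check_doi("10.1176/appi.ajp.159.3.394")
```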
The Library has a short video that can walk you through tips and tricks for finding the actual source. Remember that, in the end, you are responsible for the information you collect and synthesize into your own academic work, and you should always be vigilant in tracking down any citations or references to other work you find, to be sure that you're including verified information in your own.