[Shocking] Fake Papers Made by AI... Are Being Cited in Real Academic Journals

Do the citations in a paper lead to sources that actually exist, or are they fake references invented by an AI chatbot?

[최보식의언론 = 송영복, contributing reporter]

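AI (artificial intelligence) is producing academic papers that do not exist, and some students and professors are citing them without realizing it.

Ask an AI for examples and it will dazzle, producing a list of academic papers in no time. Mixed in among the real papers on that list are fake ones. Can everyone tell which are fake?

The problem is that these fake academic papers are being cited in real academic journals.

It may seem trivial, but the way a fake paper comes to be treated as scholarship and cited in other papers is highly insidious. There are warnings that the citation and spread of fake papers could undermine the legitimacy of institutional research across the board.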

Rolling Stone, the American magazine that was founded as a music and pop-culture publication but has since established itself as a "culture + politics + investigative journalism" outlet, took up this problem on the 17th in an article titled "AI IS INVENTING ACADEMIC PAPERS THAT DON'T EXIST - AND THEY'RE BEING CITED IN REAL JOURNALS."

The article takes a critical look at how fake papers could bring down systems of knowledge and academic institutions.

Below is the full text of the Rolling Stone article. (Editor)

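As the fall semester drew to a close, Andrew Heiss, an assistant professor in the Department of Public Management and Policy at Georgia State University's Andrew Young School of Policy Studies, was grading his students' coursework when he made an alarming discovery.

As is common among educators these days, Heiss was checking whether the citations in the papers led to sources that actually exist or were fake references invented by an AI chatbot.

As expected, he caught some students using generative AI to cheat. Chatbots can not only write the text for them but, when asked to back up a claim, can also supply "plausible" evidence by attributing findings to previously published articles.

But, just as lawyers have landed in trouble after AI offered legal precedents that do not exist, students end up with plausible-looking footnotes pointing to academic articles and journals that do not exist.

That in itself was not especially unusual. What Heiss realized in the course of vetting the papers, however, was that AI-fabricated citations have now worked their way into the world of professional scholarship.

Each time he tried to track down a fictitious source in Google Scholar, he found dozens of other published articles relying on "findings" from the same, or slightly varied, fake studies and fake journals.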

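"There have been lots of AI-generated articles, and those typically get noticed and retracted quickly," Heiss told Rolling Stone.

He pointed to a paper retracted earlier this month. It discussed the potential to improve autism diagnoses with an AI model, yet it included a nonsensical infographic that was itself created with a text-to-image model.

"But this hallucinated journal issue is slightly different," he says.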

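That is because papers that cite nonexistent research material, that is, the ones that are not flagged and retracted for their use of AI, are cited again in other papers, which effectively "launders" the errors.

As a result, students and researchers (and the large language models they turn to for help) come to regard those sources as reliable without ever verifying them. The more these false citations are repeated unquestioningly from paper to paper, the more firmly the illusion that they are real takes hold.

Fake citations have become a nightmare for research librarians, who by some estimates are wasting up to 15 percent of their working hours on requests to find records that ChatGPT or Google Gemini alluded to but that do not actually exist.

Heiss also found why AI-generated footnotes can fool readers so convincingly: they use the names of living, working academics and carry titles that closely resemble existing literature.

In some cases the citation led him to a real author, but the article title and the journal name were both fabricated. They merely sounded like work the author had published in the past, or like a real journal that covers the topic.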

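"The AI-generated things get propagated into other real things, so students see them cited in real things and assume they're real, and get confused as to why they lose points for using fake sources when other real sources use them," Heiss says.

"Everything looks real and above-board."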

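Since large language models (LLMs) became everyday tools, academics have warned that a flood of fraudulent content could erode our very grasp of data.

The psychologist and cognitive scientist Iris van Rooij has argued that the AI "slop" spreading across scholarly resources portends nothing less than "the destruction of knowledge."

In July, she and researchers in related fields signed an open letter calling on universities to resist the hype and marketing in order to "safeguard higher education, critical thinking, expertise, academic freedom, and scientific integrity."

The authors said universities have "coerced" faculty into using AI or pressured them to allow it in their classes, and they called for a more rigorous, comprehensive analysis of whether AI can play any useful role in education.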

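Anthony Moser, a software engineer and technologist, was among those who foresaw that chatbots could eventually hollow out educational institutions from within. In 2023 he wrote on Bluesky:

"I'm imagining an instructor somewhere making a syllabus with ChatGPT, assigning reading from books that don't exist. But the students don't notice, because they are asking ChatGPT to summarize the book or write the essay."

He later reshared the post, adding:

"I wish it had taken longer for this to become literally true."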

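Moser told Rolling Stone that even describing LLMs as "hallucinating" fictional publications misunderstands the threat.

The word hallucination, he says, "implies that it's different from the normal, correct perception of reality."

But the chatbots are "always hallucinating," he says.

"It's not a malfunction. A predictive model predicts some text, and maybe it's accurate, maybe it isn't, but the process is the same either way. To put it another way: LLMs are structurally indifferent to truth."

"LLMs are pernicious because they're essentially polluting the information ecosystem upstream," Moser adds. "Nonexistent citations show up in research that's sloppy or dishonest, and from there get into other papers and articles that cite them, and papers that cite those, and then it's in the water."

He likens this content to harmful, long-lasting chemicals: "hard to trace and hard to filter out."

Moser calls the problem "the entirely foreseeable outcome of deliberate choices," adding that those who raised warnings were "ignored or overruled."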

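Of course, AI cannot take all the blame. "Bad research isn't new," Moser says.

"LLMs have amplified the problem dramatically, but there was already tremendous pressure to publish and produce, and there were many bad papers using questionable or fake data, because higher education has been organized around the production of knowledge-shaped objects, measured in citations, conferences, and grants."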

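Craig Callender, a philosophy professor at the University of California San Diego and president of the Philosophy of Science Association, agrees with that assessment.

He says that "the appearance of legitimacy to non-existent journals is like the logical end product of existing trends."

There are already journals, he explains, that accept junk papers for profit or publish ghost-written research slanted to favor a particular industry. "The 'swamp' in scientific publishing is growing," he says.

"Many practices make existing journals [or] articles that aren't legitimate look legitimate. So the next step to non-existent journals is horrifying but not too surprising."

With AI added to the mix, that "swamp" is growing much faster, Callender notes. "For instance, all of this gets compounded in a nearly irreversible way with AI-assisted Google searches. These searches will only reinforce the appearance that these journals exist, just as they currently reinforce a lot of disinformation."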

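All of this feeds a sense among researchers that they are being buried in more "slop" than they can handle. (Slop, chosen as Merriam-Webster's word of the year, refers to low-quality digital content mass-produced by artificial intelligence. Editor)

"It's been incredibly disheartening for faculty, I think fairly universally, especially as fake content gets accidentally enshrined in public research databases," says Heiss. "It's hard to work back up the citation chain to see where claims originated."

Of course, many people never even try, which is why the fake material has spread so widely. Having embraced AI uncritically and naively, we seem to have given up critical thinking and become easier to fool at precisely the moment we should be most on guard.

Perhaps someone is, right now, conducting a (real) study of that very phenomenon.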

<Original text>

AI IS INVENTING ACADEMIC PAPERS THAT DON'T EXIST - AND THEY'RE BEING CITED IN REAL JOURNALS

The proliferation of references to fake articles threatens to undermine the legitimacy of institutional research across the board

As the fall semester came to a close, Andrew Heiss, an assistant professor in the Department of Public Management and Policy at the Andrew Young School of Policy Studies at Georgia State University, was grading coursework from his students when he noticed something alarming.

As is typical for educators these days, Heiss was following up on citations in papers to make sure that they led to real sources - and weren't fake references supplied by an AI chatbot. Naturally, he caught some of his pupils using generative artificial intelligence to cheat: not only can the bots help write the text, they can supply alleged supporting evidence if asked to back up claims, attributing findings to previously published articles. But, as with attorneys who have been caught generating briefs with AI because a model offered false legal precedents, students can end up with plausible-sounding footnotes pointing to academic articles and journals that don't exist.

That in itself wasn't unusual, however. What Heiss came to realize in the course of vetting these papers was that AI-generated citations have now infested the world of professional scholarship, too. Each time he attempted to track down a bogus source in Google Scholar, he saw that dozens of other published articles had relied on findings from slight variations of the same made-up studies and journals.

"There have been lots of AI-generated articles, and those typically get noticed and retracted quickly," Heiss tells Rolling Stone. He mentions a paper retracted earlier this month, which discussed the potential to improve autism diagnoses with an AI model and included a nonsensical infographic that was itself created with a text-to-image model. "But this hallucinated journal issue is slightly different," he says.

That's because articles which include references to nonexistent research material - the papers that don't get flagged and retracted for this use of AI, that is - are themselves being cited in other papers, which effectively launders their erroneous citations. This leads to students and academics (and any large language models they may ask for help) identifying those "sources" as reliable without ever confirming their veracity. The more these false citations are unquestioningly repeated from one article to the next, the more the illusion of their authenticity is reinforced. Fake citations have turned into a nightmare for research librarians, who by some estimates are wasting up to 15 percent of their work hours responding to requests for nonexistent records that ChatGPT or Google Gemini alluded to.

Heiss also noticed that the AI-generated notes could be convincing to a reader because they included the names of living academics and titles that closely resemble existing literature. In some cases, he found, the citation led him to an actual author, yet the heading of the article and the journal were both fabricated - they just sounded similar to work the author has published in the past and a real periodical that covers such topics. "The AI-generated things get propagated into other real things, so students see them cited in real things and assume they're real, and get confused as to why they lose points for using fake sources when other real sources use them," he says. "Everything looks real and above-board."

Since LLMs have become commonplace tools, academics have warned that they threaten to undermine our grasp on data by flooding the zone with fraudulent content. The psychologist and cognitive scientist Iris van Rooij has argued that the emergence of AI "slop" across scholarly resources portends nothing less than "the destruction of knowledge." In July, she and others in related fields signed an open letter calling on universities to resist the hype and marketing in order to "safeguard higher education, critical thinking, expertise, academic freedom, and scientific integrity." The authors claimed that schools have "coerced" faculty into using AI or allowing it in their classes, and they asked for a more rigorous, comprehensive analysis of whether it can have any useful role in education at all.

Anthony Moser, a software engineer and technologist, was among those who foresaw how chatbots could eventually hollow out educational institutions. "I'm imagining an instructor somewhere making a syllabus with ChatGPT, assigning reading from books that don't exist," he wrote in a post on Bluesky in 2023, less than a year after the AI model first came out. "But the students don't notice, because they are asking ChatGPT to summarize the book or write the essay." This month, Moser reshared that post, commenting: "I wish it had taken longer for this to become literally true."

Moser tells Rolling Stone that to even claim LLMs "hallucinate" fictional publications misunderstands the threat they pose to our comprehension of the world, because the term "implies that it's different from the normal, correct perception of reality." But the chatbots are "always hallucinating," he says. "It's not a malfunction. A predictive model predicts some text, and maybe it's accurate, maybe it isn't, but the process is the same either way. To put it another way: LLMs are structurally indifferent to truth."

"LLMs are pernicious because they're essentially polluting the information ecosystem upstream," Moser adds. "Nonexistent citations show up in research that's sloppy or dishonest, and from there get into other papers and articles that cite them, and papers that cite those, and then it's in the water," he says, likening this content to harmful, long-lasting chemicals: "hard to trace and hard to filter out, even when you're trying to avoid it." Moser calls the problem "the entirely foreseeable outcome of deliberate choices," with those who raised objections "ignored or overruled."

But AI can't take all the blame. "Bad research isn't new," Moser points out. "LLMs have amplified the problem dramatically, but there was already tremendous pressure to publish and produce, and there were many bad papers using questionable or fake data, because higher education has been organized around the production of knowledge-shaped objects, measured in citations, conferences, and grants."

Craig Callender, a philosophy professor at the University of California San Diego and president of the Philosophy of Science Association, agrees with that assessment, observing that "the appearance of legitimacy to non-existent journals is like the logical end product of existing trends." There are already journals, he explains, that accept spurious articles for profit, or biased ghost-written research meant to benefit the industry that produced it. "The 'swamp' in scientific publishing is growing," he says. "Many practices make existing journals [or] articles that aren't legitimate look legitimate. So the next step to non-existent journals is horrifying but not too surprising."

Adding AI to the mix means that "swamp" is growing fast, Callender says. "For instance, all of this gets compounded in a nearly irreversible way with AI-assisted Google searches. These searches will only reinforce the appearance that these journals exist, just as they currently reinforce a lot of disinformation."

All of which contributes to a feeling among researchers that they're being buried in an avalanche of slop, with limited capacity to sift through it. "It's been incredibly disheartening for faculty, I think fairly universally, especially as fake content gets accidentally enshrined in public research databases," says Heiss. "It's hard to work back up the citation chain to see where claims originated."

Of course, many aren't even trying to do that - which is why the phony stuff has been so widely disseminated. It's almost as if the uncritical and naive adoption of AI has made us more credulous and sapped our critical thinking at the precise moment we should be on guard against its evolving harms. In fact, someone may be toiling away on a (real) study of that phenomenon right now.

#AIHallucination #FakePapers #AcademicTrustCrisis
