The increase in power and availability of Large Language Models (LLMs) since late 2022 has raised concerns about their use to automate academic paper mills. This, in turn, threatens bibliometrics-based technology monitoring and forecasting in rapidly moving fields. We propose to address this issue by leveraging semantic entity triplets. Specifically, we extract factual statements from scientific papers and represent them as (subject, predicate, object) triplets before validating the factual consistency of statements within and between papers. This approach heavily penalizes blind use of stochastic text generators such as LLMs, while not penalizing authors who use LLMs solely to improve the readability of their paper. Here, we present a pipeline to extract such triplets and compare them. While the pipeline is promising and sensitive enough to detect inconsistencies between papers from different domains, its intra-paper entity reference resolution needs improvement to make the extracted triplets more specific.
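As a minimal sketch of the idea, the triplet representation and a naive cross-paper consistency check could look as follows. All names here are hypothetical illustrations, not the paper's actual pipeline, which would involve NLP-based extraction and entity resolution rather than hand-built triplets:

```python
from dataclasses import dataclass

# Hypothetical minimal representation of a (subject, predicate, object) triplet.
@dataclass(frozen=True)
class Triplet:
    subject: str
    predicate: str
    object: str

def find_inconsistencies(paper_a, paper_b):
    """Naive check: flag triplet pairs that share a subject and predicate
    but assert different objects (a potential factual contradiction)."""
    conflicts = []
    for t1 in paper_a:
        for t2 in paper_b:
            if (t1.subject == t2.subject
                    and t1.predicate == t2.predicate
                    and t1.object != t2.object):
                conflicts.append((t1, t2))
    return conflicts

# Illustrative (invented) triplets from two papers making conflicting claims.
paper_a = [Triplet("model-X", "has_parameter_count", "340M")]
paper_b = [Triplet("model-X", "has_parameter_count", "110M")]
print(len(find_inconsistencies(paper_a, paper_b)))  # one conflicting pair
```

A real system would need the entity reference resolution the abstract mentions, since the same subject can appear under many surface forms across papers.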

Research Paper:

Source: EEKE2024

Sternfeld, Alexander; Kucharavy, Andrei; Percia David, Dimitri; Jang-Jaccard, Julian; Mermoud, Alain. "LLM-Resilient Bibliometrics: Factual Consistency Through Entity Triplet Extraction." EEKE2024.