August 18, 2025

New benchmarking tool evaluates the factuality of LLMs

A team of AI researchers and computer scientists from Cornell University, the University of Washington and the Allen Institute for Artificial Intelligence has developed a benchmarking tool called WILDHALLUCINATIONS to evaluate the factuality of multiple large language models (LLMs). The group has published a paper describing the factors that went into creating their tool on the arXiv preprint server.

This article was originally published on this website.