Bibliographic Details
Title: |
A Framework for Robust Cognitive Evaluation of LLMs |
Authors: |
de Langis, Karin, Park, Jong Inn, Hu, Bin, Le, Khanh Chi, Schramm, Andreas, Mensink, Michael C., Elfenbein, Andrew, Kang, Dongyeop |
Publication Year: |
2025 |
Collection: |
Computer Science |
Subject Terms: |
Computer Science - Computation and Language |
More Details: |
Emergent cognitive abilities in large language models (LLMs) have been widely observed, but their nature and underlying mechanisms remain poorly understood. A growing body of research draws on cognitive science to investigate LLM cognition, but standard methodologies and experimental pipelines have not yet been established. To address this gap, we develop CognitivEval, a framework for systematically evaluating the artificial cognitive capabilities of LLMs, with a particular emphasis on robustness in response collection. The key features of CognitivEval include: (i) automatic prompt permutations, and (ii) testing that gathers both generations and model probability estimates. Our experiments demonstrate that these features lead to more robust experimental outcomes. Using CognitivEval, we replicate five classic experiments in cognitive science, illustrating the framework's generalizability across various experimental tasks and obtaining a cognitive profile of several state-of-the-art LLMs. CognitivEval will be released publicly to foster broader collaboration within the cognitive science community. |
Document Type: |
Working Paper |
Access URL: |
http://arxiv.org/abs/2504.02789 |
Accession Number: |
edsarx.2504.02789 |
Database: |
arXiv |