Both Text and Images Leaked! A Systematic Analysis of Multimodal LLM Data Contamination

Bibliographic Details
Title: Both Text and Images Leaked! A Systematic Analysis of Multimodal LLM Data Contamination
Authors: Song, Dingjie; Lai, Sicheng; Chen, Shunian; Sun, Lichao; Wang, Benyou
Publication Year: 2024
Collection: Computer Science
Subject Terms: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence, Computer Science - Computation and Language, Computer Science - Multimedia
More Details: The rapid progress of multimodal large language models (MLLMs) has produced superior performance on various multimodal benchmarks. However, data contamination during training complicates performance evaluation and comparison. While numerous methods exist for detecting contamination in large language models (LLMs), they are less effective for MLLMs because of MLLMs' multiple modalities and multiple training phases. In this study, we introduce MM-Detect, a multimodal data contamination detection framework designed for MLLMs. Our experimental results indicate that MM-Detect is effective and sensitive in identifying varying degrees of contamination, and can highlight significant performance improvements attributable to leakage of multimodal benchmark training sets. Furthermore, we explore whether the contamination originates from the base LLMs used by MLLMs or from the multimodal training phase, providing new insights into the stages at which contamination may be introduced.
Comment: Code Available: https://github.com/MLLM-Data-Contamination/MM-Detect
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2411.03823
Accession Number: edsarx.2411.03823
Database: arXiv
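The record's abstract does not describe MM-Detect's actual procedure. As a generic illustration of how benchmark contamination is often probed on the text side (a minimal sketch of word-level n-gram overlap, not the paper's method; function names and the choice of n = 8 are illustrative assumptions):

```python
from typing import Iterable, Set, Tuple

def ngrams(text: str, n: int = 8) -> Set[Tuple[str, ...]]:
    """Return the set of word-level n-grams occurring in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def contamination_rate(benchmark_items: Iterable[str],
                       training_corpus: Iterable[str],
                       n: int = 8) -> float:
    """Fraction of benchmark items that share at least one n-gram
    with any document in the training corpus."""
    corpus_grams: Set[Tuple[str, ...]] = set()
    for doc in training_corpus:
        corpus_grams |= ngrams(doc, n)
    items = list(benchmark_items)
    if not items:
        return 0.0
    hits = sum(1 for item in items if ngrams(item, n) & corpus_grams)
    return hits / len(items)
```

A nonzero rate flags possible leakage of benchmark text into training data; detecting image-side or cross-modal leakage, as MM-Detect targets, requires techniques beyond this textual overlap check.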