Title: |
Towards Better Understanding Table Instruction Tuning: Decoupling the Effects from Data versus Models |
Authors: |
Deng, Naihao, Zhang, Sheng, Zhu, Henghui, Chang, Shuaichen, Zhang, Jiani, Li, Alexander Hanbo, Hang, Chung-Wei, Kobayashi, Hideo, Hu, Yiqun, Ng, Patrick |
Publication Year: |
2025 |
Collection: |
Computer Science |
Subject Terms: |
Computer Science - Computation and Language |
More Details: |
Recent advances in natural language processing have leveraged instruction tuning to enhance Large Language Models (LLMs) for table-related tasks. However, previous works train different base models with different training data, lacking an apples-to-apples comparison across the resulting table LLMs. To address this, we fine-tune base models from the Mistral, OLMo, and Phi families on existing public training datasets. Our replication achieves performance on par with or surpassing existing table LLMs, establishing new state-of-the-art performance on HiTab, a table question-answering dataset. More importantly, through systematic out-of-domain evaluation, we decouple the contributions of training data and the base model, providing insight into their individual impacts. In addition, we assess the effects of table-specific instruction tuning on general-purpose benchmarks, revealing trade-offs between specialization and generalization.
Document Type: |
Working Paper |
Access URL: |
http://arxiv.org/abs/2501.14717 |
Accession Number: |
edsarx.2501.14717 |
Database: |
arXiv |