Bibliographic Details
Title:
NL2Formula: Generating Spreadsheet Formulas from Natural Language Queries
Authors:
Zhao, Wei, Hou, Zhitao, Wu, Siyuan, Gao, Yan, Dong, Haoyu, Wan, Yao, Zhang, Hongyu, Sui, Yulei, Zhang, Haidong
Publication Year:
2024
Collection:
Computer Science
Subject Terms:
Computer Science - Computation and Language, Computer Science - Artificial Intelligence
More Details:
Writing formulas on spreadsheets, such as Microsoft Excel and Google Sheets, is a widespread practice among users performing data analysis. However, crafting formulas on spreadsheets remains a tedious and error-prone task for many end-users, particularly when dealing with complex operations. To alleviate the burden associated with writing spreadsheet formulas, this paper introduces a novel benchmark task called NL2Formula, with the aim to generate executable formulas that are grounded on a spreadsheet table, given a Natural Language (NL) query as input. To accomplish this, we construct a comprehensive dataset consisting of 70,799 paired NL queries and corresponding spreadsheet formulas, covering 21,670 tables and 37 types of formula functions. We realize the NL2Formula task by providing a sequence-to-sequence baseline implementation called fCoder. Experimental results validate the effectiveness of fCoder, demonstrating its superior performance compared to the baseline models. Furthermore, we also compare fCoder with an initial GPT-3.5 model (i.e., text-davinci-003). Lastly, through in-depth error analysis, we identify potential challenges in the NL2Formula task and advocate for further investigation.
Comment: To appear at EACL 2024
Document Type:
Working Paper
Access URL:
http://arxiv.org/abs/2402.14853
Accession Number:
edsarx.2402.14853
Database:
arXiv