NL2Formula: Generating Spreadsheet Formulas from Natural Language Queries

Bibliographic Details
Title: NL2Formula: Generating Spreadsheet Formulas from Natural Language Queries
Authors: Zhao, Wei; Hou, Zhitao; Wu, Siyuan; Gao, Yan; Dong, Haoyu; Wan, Yao; Zhang, Hongyu; Sui, Yulei; Zhang, Haidong
Publication Year: 2024
Collection: Computer Science
Subject Terms: Computer Science - Computation and Language, Computer Science - Artificial Intelligence
More Details: Writing formulas on spreadsheets, such as Microsoft Excel and Google Sheets, is a widespread practice among users performing data analysis. However, crafting formulas on spreadsheets remains a tedious and error-prone task for many end-users, particularly when dealing with complex operations. To alleviate the burden associated with writing spreadsheet formulas, this paper introduces a novel benchmark task called NL2Formula, which aims to generate executable formulas that are grounded in a spreadsheet table, given a Natural Language (NL) query as input. To accomplish this, we construct a comprehensive dataset consisting of 70,799 paired NL queries and corresponding spreadsheet formulas, covering 21,670 tables and 37 types of formula functions. We realize the NL2Formula task by providing a sequence-to-sequence baseline implementation called fCoder. Experimental results validate the effectiveness of fCoder, demonstrating its superior performance compared to the baseline models. Furthermore, we also compare fCoder with an initial GPT-3.5 model (i.e., text-davinci-003). Lastly, through in-depth error analysis, we identify potential challenges in the NL2Formula task and advocate for further investigation.
Comment: To appear at EACL 2024
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2402.14853
Accession Number: edsarx.2402.14853
Database: arXiv