GFS: Graph-based Feature Synthesis for Prediction over Relational Databases
Title: | GFS: Graph-based Feature Synthesis for Prediction over Relational Databases |
---|---|
Authors: | Zhang, Han; Gan, Quan; Wipf, David; Zhang, Weinan |
Publication Year: | 2023 |
Collection: | Computer Science |
Subject Terms: | Computer Science - Machine Learning, Computer Science - Databases |
More Details: | Relational databases are extensively utilized in a variety of modern information system applications, and they always carry valuable data patterns. There are a huge number of data mining or machine learning tasks conducted on relational databases. However, it is worth noting that there are limited machine learning models specifically designed for relational databases, as most models are primarily tailored for single table settings. Consequently, the prevalent approach for training machine learning models on data stored in relational databases involves performing feature engineering to merge the data from multiple tables into a single table and subsequently applying single table models. This approach not only requires significant effort in feature engineering but also destroys the inherent relational structure present in the data. To address these challenges, we propose a novel framework called Graph-based Feature Synthesis (GFS). GFS formulates the relational database as a heterogeneous graph, thereby preserving the relational structure within the data. By leveraging the inductive bias from single table models, GFS effectively captures the intricate relationships inherent in each table. Additionally, the whole framework eliminates the need for manual feature engineering. In the extensive experiment over four real-world multi-table relational databases, GFS outperforms previous methods designed for relational databases, demonstrating its superior performance. Comment: 13 pages, 5 figures, VLDB 2024 under review |
Document Type: | Working Paper |
Access URL: | http://arxiv.org/abs/2312.02037 |
Accession Number: | edsarx.2312.02037 |
Database: | arXiv |