GFS: Graph-based Feature Synthesis for Prediction over Relational Databases

Bibliographic Details
Title: GFS: Graph-based Feature Synthesis for Prediction over Relational Databases
Authors: Zhang, Han; Gan, Quan; Wipf, David; Zhang, Weinan
Publication Year: 2023
Collection: Computer Science
Subject Terms: Computer Science - Machine Learning, Computer Science - Databases
More Details: Relational databases are used extensively in modern information systems, and they carry valuable data patterns. Many data mining and machine learning tasks are conducted on relational databases. However, few machine learning models are designed specifically for relational databases; most are tailored to single-table settings. Consequently, the prevalent approach to training machine learning models on data stored in relational databases is to perform feature engineering that merges the data from multiple tables into a single table, and then to apply single-table models. This approach not only requires significant feature engineering effort but also destroys the inherent relational structure of the data. To address these challenges, we propose a novel framework called Graph-based Feature Synthesis (GFS). GFS formulates the relational database as a heterogeneous graph, thereby preserving the relational structure within the data. By leveraging the inductive bias of single-table models, GFS effectively captures the intricate relationships inherent in each table. Additionally, the framework eliminates the need for manual feature engineering. In extensive experiments on four real-world multi-table relational databases, GFS outperforms previous methods designed for relational databases.
Comment: 13 pages, 5 figures, VLDB 2024 under review
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2312.02037
Accession Number: edsarx.2312.02037
Database: arXiv
Publication Date: 2023-12-04