flexBART: Flexible Bayesian regression trees with categorical predictors

Bibliographic Details
Title: flexBART: Flexible Bayesian regression trees with categorical predictors
Authors: Deshpande, Sameer K.
Publication Year: 2022
Collection: Statistics
Subject Terms: Statistics - Methodology, Statistics - Machine Learning
More Details: Most implementations of Bayesian additive regression trees (BART) one-hot encode categorical predictors, replacing each one with several binary indicators, one for every level or category. Regression trees built with these indicators partition the discrete set of categorical levels by repeatedly removing one level at a time. Unfortunately, the vast majority of partitions cannot be built with this strategy, severely limiting BART's ability to partially pool data across groups of levels. Motivated by analyses of baseball data and neighborhood-level crime dynamics, we overcame this limitation by re-implementing BART with regression trees that can assign multiple levels to both branches of a decision tree node. To model spatial data aggregated into small regions, we further proposed a new decision rule prior that creates spatially contiguous regions by deleting a random edge from a random spanning tree of a suitably defined network. Our re-implementation, which is available in the flexBART package, often yields improved out-of-sample predictive performance and scales better to larger datasets than existing implementations of BART.
Comment: Software available at https://github.com/skdeshpande91/flexBART
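Illustrative note on the abstract: the claim that one-hot encoding reaches only a small share of partitions can be checked by brute force at the level of a single decision node. The sketch below is not flexBART code; the function names are hypothetical, and it only counts two-way splits of a set of levels. A one-hot indicator can send exactly one level down one branch and the rest down the other, while an unrestricted rule may send any non-empty proper subset.

```python
from itertools import combinations


def one_hot_splits(levels):
    """Two-way splits available from one binary indicator: one level vs. the rest."""
    levels = frozenset(levels)
    splits = set()
    for lev in levels:
        left = frozenset([lev])
        right = levels - left
        if right:  # skip the degenerate case of a single level
            splits.add(frozenset([left, right]))
    return splits


def all_two_way_splits(levels):
    """Every way to send a non-empty proper subset of levels to one branch."""
    levels = frozenset(levels)
    ordered = sorted(levels)
    splits = set()
    for size in range(1, len(ordered)):
        for left in combinations(ordered, size):
            left = frozenset(left)
            # Storing the unordered pair removes left/right mirror duplicates.
            splits.add(frozenset([left, levels - left]))
    return splits


if __name__ == "__main__":
    for k in (4, 10):
        levels = range(k)
        print(k, len(one_hot_splits(levels)), len(all_two_way_splits(levels)))
```

With K levels, a single indicator offers K splits (for K of at least 3), against 2^(K-1) - 1 unrestricted two-way splits: 4 versus 7 for K = 4, and 10 versus 511 for K = 10.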
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2211.04459
Accession Number: edsarx.2211.04459
Database: arXiv
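Illustrative note on the spatial decision rule: the abstract describes creating spatially contiguous regions by deleting a random edge from a random spanning tree of a network. A minimal sketch of that idea follows, under stated assumptions: the network is a connected, undirected adjacency dictionary; the spanning tree is drawn with the Aldous-Broder random walk, chosen here only because it is short (the sampler used in the flexBART package may differ); and all function names are hypothetical.

```python
import random


def random_spanning_tree(adj, rng):
    """Aldous-Broder walk: keep the edge used to enter each vertex the first time.

    Assumes `adj` describes a connected undirected graph; otherwise the walk
    never reaches every vertex and the loop does not terminate.
    """
    vertices = sorted(adj)
    current = rng.choice(vertices)
    visited = {current}
    tree_edges = []
    while len(visited) < len(vertices):
        nxt = rng.choice(sorted(adj[current]))
        if nxt not in visited:
            visited.add(nxt)
            tree_edges.append((current, nxt))
        current = nxt
    return tree_edges


def split_by_edge_deletion(adj, rng):
    """Delete one random tree edge and return the two resulting vertex sets."""
    tree = random_spanning_tree(adj, rng)
    if not tree:
        raise ValueError("need at least two connected vertices to split")
    removed = rng.choice(tree)
    tree_adj = {v: set() for v in adj}
    for u, v in tree:
        if (u, v) != removed:
            tree_adj[u].add(v)
            tree_adj[v].add(u)
    # Flood fill from one endpoint of the removed edge to collect its component.
    left, stack = set(), [removed[0]]
    while stack:
        v = stack.pop()
        if v not in left:
            left.add(v)
            stack.extend(tree_adj[v] - left)
    return left, set(adj) - left


if __name__ == "__main__":
    # Hypothetical 2 x 3 grid of regions: A B C on top, D E F below.
    adj = {
        "A": {"B", "D"}, "B": {"A", "C", "E"}, "C": {"B", "F"},
        "D": {"A", "E"}, "E": {"B", "D", "F"}, "F": {"C", "E"},
    }
    print(split_by_edge_deletion(adj, random.Random(0)))
```

Each returned set is connected within the spanning tree, so it is also connected in the original network, which is what makes the two sides of the split contiguous.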
Publication Date: 2022-11-08