Comparing two deep learning sequence-based models for protein-protein interaction prediction

Bibliographic Details
Title: Comparing two deep learning sequence-based models for protein-protein interaction prediction
Authors: Richoux, Florian; Servantie, Charlène; Borès, Cynthia; Téletchéa, Stéphane
Publication Year: 2019
Collection: Computer Science, Quantitative Biology, Statistics
Subject Terms: Computer Science - Machine Learning, Quantitative Biology - Quantitative Methods, Statistics - Machine Learning
More Details: Biological data are extremely diverse and complex, but also quite sparse. Recent developments in deep learning methods offer new possibilities for the analysis of complex data. However, it is easy to obtain a deep learning model that seems to give good results but is in fact overfitting either the training data or the validation data. In particular, overfitting the validation data, called "information leak", is almost never addressed in papers proposing deep learning models to predict protein-protein interactions (PPI). In this work, we compare two carefully designed deep learning models and show pitfalls to avoid when predicting PPIs through machine learning methods. Our best model accurately predicts more than 78% of human PPIs under very strict conditions for both training and testing. The methodology we propose here gives us strong confidence in the ability of a model to scale up to larger datasets. This would allow sharper models once larger datasets become available, in contrast to current models, which are prone to information leaks. Our solid methodological foundations should be applicable to more organisms and to whole-proteome network predictions.
Document Type: Working Paper
Access URL: http://arxiv.org/abs/1901.06268
Accession Number: edsarx.1901.06268
Database: arXiv
Publication Date: 14 January 2019
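
The abstract describes "information leak" as overfitting the validation data. One common way to guard against it is to keep a third, held-out test set that is never consulted during model selection. The following is a minimal sketch of that idea only, not the authors' protocol: the function name `three_way_split`, the split fractions, and the toy pair identifiers are all assumptions made for illustration.

```python
import random


def three_way_split(pairs, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle labeled pairs and cut them into disjoint train/validation/test lists.

    Hypothetical helper for illustration; the fractions are arbitrary defaults.
    """
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = round(len(shuffled) * test_frac)
    n_val = round(len(shuffled) * val_frac)
    test = shuffled[:n_test]                # touched once, for the final report
    val = shuffled[n_test:n_test + n_val]   # used for tuning and early stopping
    train = shuffled[n_test + n_val:]       # used to fit model weights
    return train, val, test


# Toy, made-up data: (protein_a, protein_b, interacts) triples.
pairs = [(f"P{i}", f"Q{i}", i % 2) for i in range(100)]
train, val, test = three_way_split(pairs)
```

In such a setup, hyperparameters would be tuned against `val` only, and the score on `test` would be computed a single time at the end. A stricter variant, also an assumption here, would additionally ensure that no individual protein appears on both sides of a split.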