Frequency-Adaptive Low-Latency Object Detection Using Events and Frames

Bibliographic Details
Title: Frequency-Adaptive Low-Latency Object Detection Using Events and Frames
Authors: Zhang, Haitian; Wang, Xiangyuan; Xu, Chang; Wang, Xinya; Xu, Fang; Yu, Huai; Yu, Lei; Yang, Wen
Publication Year: 2024
Collection: Computer Science
Subject Terms: Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
More Details: Fusing Events and RGB images for object detection leverages the robustness of Event cameras in adverse environments and the rich semantic information provided by RGB cameras. However, two critical mismatches significantly hinder high-frequency fusion-based object detection: low-latency Events vs. high-latency RGB frames, and temporally sparse labels in training vs. a continuous stream in inference. To address these challenges, we propose the Frequency-Adaptive Low-Latency Object Detector (FAOD). FAOD aligns low-frequency RGB frames with high-frequency Events through an Align Module, which reinforces cross-modal style and spatial proximity to address the Event-RGB mismatch. We further propose a training strategy, Time Shift, which enforces the network to align predictions from temporally shifted Event-RGB pairs with those from the original pairs, keeping both consistent with the Event-aligned annotations. This strategy enables the network to use high-frequency Event data as the primary reference while treating low-frequency RGB images as supplementary information, retaining the low-latency nature of the Event stream for high-frequency detection. Furthermore, we observe that these corrected Event-RGB pairs generalize better from a low training frequency to higher inference frequencies than Event data alone. Extensive experiments on the PKU-DAVIS-SOD and DSEC-Detection datasets demonstrate that FAOD achieves state-of-the-art performance. Specifically, on the PKU-DAVIS-SOD dataset, FAOD achieves a 9.8-point mAP improvement on fully paired Event-RGB data with only a quarter of the parameters of SODFormer, and maintains robust performance (only a 3-point drop in mAP) under an 80× Event-RGB frequency mismatch.
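The abstract describes the Time Shift training strategy only at a high level. The following is a minimal, hypothetical Python sketch of how pairing a high-frequency Event window with a deliberately older RGB frame, while keeping supervision tied to the Event timestamp, might look; all names (time_shift_pair, max_shift_s, etc.) are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code) of the Time Shift pairing idea:
# each high-frequency Event window is paired with an *older* low-frequency
# RGB frame, while the label stays aligned to the Event timestamp.
import random
from typing import Any, Iterator, Sequence, Tuple

def time_shift_pair(
    event_windows: Sequence[Tuple[float, Any]],  # (timestamp, event tensor)
    rgb_frames: Sequence[Tuple[float, Any]],     # (timestamp, RGB frame), lower rate
    labels: Sequence[Tuple[float, Any]],         # (timestamp, boxes), Event-aligned
    max_shift_s: float = 0.5,                    # assumed maximum temporal shift
) -> Iterator[Tuple[Any, Any, Any]]:
    """Yield (events, shifted_rgb, label) training triples."""
    for (t_ev, ev), (_t_lb, lb) in zip(event_windows, labels):
        # Push the RGB reference time into the past to simulate a stale
        # low-frequency frame, as would occur at high inference frequency.
        t_rgb_target = t_ev - random.uniform(0.0, max_shift_s)
        # Use the most recent RGB frame at or before the shifted target time.
        older = [frame for (t, frame) in rgb_frames if t <= t_rgb_target]
        rgb = older[-1] if older else rgb_frames[0][1]
        # Supervision remains tied to the Event timestamp, so the network
        # learns to treat Events as the primary, low-latency reference and
        # the RGB frame as supplementary context.
        yield ev, rgb, lb
```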
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2412.04149
Accession Number: edsarx.2412.04149
Database: arXiv
FullText Text:
  Availability: 0
CustomLinks:
  – Url: http://arxiv.org/abs/2412.04149
    Name: EDS - Arxiv
    Category: fullText
    Text: View this record from Arxiv
    MouseOverText: View this record from Arxiv
  – Url: https://resolver.ebsco.com/c/xy5jbn/result?sid=EBSCO:edsarx&genre=article&issn=&ISBN=&volume=&issue=&date=20241205&spage=&pages=&title=Frequency-Adaptive Low-Latency Object Detection Using Events and Frames&atitle=Frequency-Adaptive%20Low-Latency%20Object%20Detection%20Using%20Events%20and%20Frames&aulast=Zhang%2C%20Haitian&id=DOI:
    Name: Full Text Finder (for New FTF UI) (s8985755)
    Category: fullText
    Text: Find It @ SCU Libraries
    MouseOverText: Find It @ SCU Libraries
Header DbId: edsarx
DbLabel: arXiv
An: edsarx.2412.04149
RelevancyScore: 1128
AccessLevel: 3
PubType: Report
PubTypeId: report
PreciseRelevancyScore: 1128.04370117188
IllustrationInfo
Items – Name: Title
  Label: Title
  Group: Ti
  Data: Frequency-Adaptive Low-Latency Object Detection Using Events and Frames
– Name: Author
  Label: Authors
  Group: Au
  Data: Zhang, Haitian; Wang, Xiangyuan; Xu, Chang; Wang, Xinya; Xu, Fang; Yu, Huai; Yu, Lei; Yang, Wen
– Name: DatePubCY
  Label: Publication Year
  Group: Date
  Data: 2024
– Name: Subset
  Label: Collection
  Group: HoldingsInfo
  Data: Computer Science
– Name: Subject
  Label: Subject Terms
  Group: Su
  Data: Computer Science - Computer Vision and Pattern Recognition; Computer Science - Artificial Intelligence
– Name: Abstract
  Label: Description
  Group: Ab
  Data: Fusing Events and RGB images for object detection leverages the robustness of Event cameras in adverse environments and the rich semantic information provided by RGB cameras. However, two critical mismatches significantly hinder high-frequency fusion-based object detection: low-latency Events vs. high-latency RGB frames, and temporally sparse labels in training vs. a continuous stream in inference. To address these challenges, we propose the Frequency-Adaptive Low-Latency Object Detector (FAOD). FAOD aligns low-frequency RGB frames with high-frequency Events through an Align Module, which reinforces cross-modal style and spatial proximity to address the Event-RGB mismatch. We further propose a training strategy, Time Shift, which enforces the network to align predictions from temporally shifted Event-RGB pairs with those from the original pairs, keeping both consistent with the Event-aligned annotations. This strategy enables the network to use high-frequency Event data as the primary reference while treating low-frequency RGB images as supplementary information, retaining the low-latency nature of the Event stream for high-frequency detection. Furthermore, we observe that these corrected Event-RGB pairs generalize better from a low training frequency to higher inference frequencies than Event data alone. Extensive experiments on the PKU-DAVIS-SOD and DSEC-Detection datasets demonstrate that FAOD achieves state-of-the-art performance. Specifically, on the PKU-DAVIS-SOD dataset, FAOD achieves a 9.8-point mAP improvement on fully paired Event-RGB data with only a quarter of the parameters of SODFormer, and maintains robust performance (only a 3-point drop in mAP) under an 80× Event-RGB frequency mismatch.
– Name: TypeDocument
  Label: Document Type
  Group: TypDoc
  Data: Working Paper
– Name: URL
  Label: Access URL
  Group: URL
  Data: http://arxiv.org/abs/2412.04149
– Name: AN
  Label: Accession Number
  Group: ID
  Data: edsarx.2412.04149
PLink https://login.libproxy.scu.edu/login?url=https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&scope=site&db=edsarx&AN=edsarx.2412.04149
RecordInfo BibRecord:
  BibEntity:
    Subjects:
      – SubjectFull: Computer Science - Computer Vision and Pattern Recognition
        Type: general
      – SubjectFull: Computer Science - Artificial Intelligence
        Type: general
    Titles:
      – TitleFull: Frequency-Adaptive Low-Latency Object Detection Using Events and Frames
        Type: main
  BibRelationships:
    HasContributorRelationships:
      – PersonEntity:
          Name:
            NameFull: Zhang, Haitian
      – PersonEntity:
          Name:
            NameFull: Wang, Xiangyuan
      – PersonEntity:
          Name:
            NameFull: Xu, Chang
      – PersonEntity:
          Name:
            NameFull: Wang, Xinya
      – PersonEntity:
          Name:
            NameFull: Xu, Fang
      – PersonEntity:
          Name:
            NameFull: Yu, Huai
      – PersonEntity:
          Name:
            NameFull: Yu, Lei
      – PersonEntity:
          Name:
            NameFull: Yang, Wen
    IsPartOfRelationships:
      – BibEntity:
          Dates:
            – D: 05
              M: 12
              Type: published
              Y: 2024
ResultId 1