Frequency-Adaptive Low-Latency Object Detection Using Events and Frames
Title: | Frequency-Adaptive Low-Latency Object Detection Using Events and Frames |
---|---|
Authors: | Zhang, Haitian; Wang, Xiangyuan; Xu, Chang; Wang, Xinya; Xu, Fang; Yu, Huai; Yu, Lei; Yang, Wen |
Publication Year: | 2024 |
Collection: | Computer Science |
Subject Terms: | Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence |
Abstract: | Fusing Events and RGB images for object detection leverages the robustness of Event cameras in adverse environments and the rich semantic information provided by RGB cameras. However, two critical mismatches significantly hinder high-frequency fusion-based object detection: low-latency Events vs. high-latency RGB frames, and temporally sparse labels in training vs. a continuous data stream in inference. To address these challenges, we propose the Frequency-Adaptive Low-Latency Object Detector (FAOD). FAOD aligns low-frequency RGB frames with high-frequency Events through an Align Module, which reinforces cross-modal style and spatial proximity to address the Event-RGB mismatch. We further propose a training strategy, Time Shift, which forces the network to align predictions from temporally shifted Event-RGB pairs with those from the original pairs, i.e., to stay consistent with the Event-aligned annotations. This strategy enables the network to use high-frequency Event data as the primary reference while treating low-frequency RGB images as supplementary information, retaining the low-latency nature of the Event stream for high-frequency detection. Furthermore, we observe that these corrected Event-RGB pairs generalize better from a low training frequency to higher inference frequencies than Event data alone. Extensive experiments on the PKU-DAVIS-SOD and DSEC-Detection datasets demonstrate that FAOD achieves state-of-the-art performance. Specifically, on the PKU-DAVIS-SOD dataset, FAOD achieves a 9.8-point improvement in mAP on fully paired Event-RGB data with only a quarter of the parameters of SODFormer, and maintains robust performance (only a 3-point drop in mAP) under an 80× Event-RGB frequency mismatch. (A hypothetical sketch of the Time Shift pairing idea appears after this table.) |
Document Type: | Working Paper |
Access URL: | http://arxiv.org/abs/2412.04149 |
Accession Number: | edsarx.2412.04149 |
Database: | arXiv |
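The Time Shift strategy is described only at a high level in this record. Below is a minimal, hypothetical Python sketch of one way such Event-RGB pairing could be set up, assuming placeholder accessors (`get_events`, `get_rgb`, `get_labels`) and a random simulated latency; it illustrates the idea of pairing high-frequency Event windows with older RGB frames while keeping labels Event-aligned, and is not the authors' implementation.

```python
# Hypothetical sketch of the "Time Shift" pairing idea from the abstract:
# each high-frequency Event window is paired with the most recent (possibly
# stale) RGB frame, while labels stay aligned to the Event timestamp.
# All names and structures here are illustrative assumptions.
import bisect
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Sample:
    event_window: object   # e.g. a voxel grid of events ending at t_event
    rgb_frame: object      # last RGB frame captured at or before t_event - shift
    labels: object         # annotations aligned to t_event (Event-aligned)


def time_shift_pairs(event_times: List[float],
                     rgb_times: List[float],
                     max_shift: float,
                     get_events: Callable[[float], object],
                     get_rgb: Callable[[float], object],
                     get_labels: Callable[[float], object]) -> List[Sample]:
    """Build training pairs where the RGB frame lags the Event window by a
    random shift, so the detector learns to treat Events as the primary,
    low-latency reference and RGB as supplementary context."""
    samples = []
    for t_event in event_times:
        shift = random.uniform(0.0, max_shift)       # simulated RGB latency
        t_rgb = t_event - shift
        # index of the latest RGB frame captured no later than t_rgb
        idx = bisect.bisect_right(rgb_times, t_rgb) - 1
        if idx < 0:
            continue                                 # no RGB frame available yet
        samples.append(Sample(
            event_window=get_events(t_event),
            rgb_frame=get_rgb(rgb_times[idx]),
            labels=get_labels(t_event),              # labels stay Event-aligned
        ))
    return samples
```

Pairing each Event window with the most recent earlier RGB frame mirrors the abstract's premise: Events arrive at high frequency and carry the annotation timestamps, while RGB frames may lag by a variable amount, which the network must learn to tolerate.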