Frequency-Adaptive Low-Latency Object Detection Using Events and Frames

Bibliographic Details
Title:	Frequency-Adaptive Low-Latency Object Detection Using Events and Frames
Authors:	Zhang, Haitian; Wang, Xiangyuan; Xu, Chang; Wang, Xinya; Xu, Fang; Yu, Huai; Yu, Lei; Yang, Wen
Publication Year:	2024
Collection:	Computer Science
Subject Terms:	Computer Science - Computer Vision and Pattern Recognition, Computer Science - Artificial Intelligence
More Details:	Fusing Events and RGB images for object detection leverages the robustness of Event cameras in adverse environments and the rich semantic information provided by RGB cameras. However, two critical mismatches significantly hinder high-frequency fusion-based object detection: low-latency Events vs. high-latency RGB frames, and temporally sparse labels in training vs. continuous flow in inference. To address these challenges, we propose the Frequency-Adaptive Low-Latency Object Detector (FAOD). FAOD aligns low-frequency RGB frames with high-frequency Events through an Align Module, which reinforces cross-modal style and spatial proximity to address the Event-RGB mismatch. We further propose a training strategy, Time Shift, which forces the module to align the prediction from temporally shifted Event-RGB pairs with their original representation, that is, to stay consistent with Event-aligned annotations. This strategy enables the network to use high-frequency Event data as the primary reference while treating low-frequency RGB images as supplementary information, retaining the low-latency nature of the Event stream for high-frequency detection. Furthermore, we observe that these corrected Event-RGB pairs generalize better from a low training frequency to higher inference frequencies than Event data alone. Extensive experiments on the PKU-DAVIS-SOD and DSEC-Detection datasets demonstrate that FAOD achieves state-of-the-art (SOTA) performance. Specifically, on the PKU-DAVIS-SOD dataset, FAOD improves mAP by 9.8 points on fully paired Event-RGB data with only a quarter of the parameters of SODFormer, and maintains robust performance (only a 3-point drop in mAP) under an 80× Event-RGB frequency mismatch.
Document Type:	Working Paper
Access URL:	http://arxiv.org/abs/2412.04149
Accession Number:	edsarx.2412.04149
Database:	arXiv
