SpikeCLIP: A Contrastive Language-Image Pretrained Spiking Neural Network

Bibliographic Details
Title: SpikeCLIP: A Contrastive Language-Image Pretrained Spiking Neural Network
Authors: Li, Tianlong; Liu, Wenhao; Lv, Changze; Gu, Yufei; Xu, Jianhan; Zhang, Cenyuan; Wu, Muling; Zheng, Xiaoqing; Huang, Xuanjing
Publication Year: 2023
Collection: Computer Science
Subject Terms: Computer Science - Neural and Evolutionary Computing, Computer Science - Computation and Language, Computer Science - Computer Vision and Pattern Recognition, Computer Science - Machine Learning
Abstract: Spiking Neural Networks (SNNs) have emerged as a promising alternative to conventional Artificial Neural Networks (ANNs), demonstrating comparable performance in both visual and linguistic tasks while offering the advantage of improved energy efficiency. Despite these advancements, the integration of linguistic and visual features into a unified representation through spike trains poses a significant challenge, and the application of SNNs to multimodal scenarios remains largely unexplored. This paper presents SpikeCLIP, a novel framework designed to bridge the modality gap in spike-based computation. Our approach employs a two-step recipe: an "alignment pre-training" to align features across modalities, followed by a "dual-loss fine-tuning" to refine the model's performance. Extensive experiments reveal that SNNs achieve results on par with ANNs while substantially reducing energy consumption across various datasets commonly used for multimodal model evaluation. Furthermore, SpikeCLIP maintains robust image classification capabilities, even when dealing with classes that fall outside predefined categories. This study marks a significant advancement in the development of energy-efficient and biologically plausible multimodal learning systems.
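The "alignment pre-training" step described in the abstract builds on CLIP-style contrastive alignment of paired image and text embeddings. As a rough illustration only (not the paper's actual SNN architecture or training code, and with the temperature value chosen arbitrarily), the symmetric contrastive objective underlying such alignment can be sketched as:

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    """Project embeddings onto the unit hypersphere."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def contrastive_alignment_loss(image_feats, text_feats, temperature=0.07):
    """Symmetric InfoNCE loss for CLIP-style alignment pre-training.

    image_feats, text_feats: (N, D) arrays of paired embeddings,
    where row i of each array comes from the same image-text pair.
    """
    img = l2_normalize(image_feats)
    txt = l2_normalize(text_feats)
    logits = img @ txt.T / temperature       # (N, N) cosine-similarity matrix
    labels = np.arange(len(logits))          # matched pairs sit on the diagonal

    def cross_entropy(lg):
        lg = lg - lg.max(axis=1, keepdims=True)            # numerical stability
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # average of image-to-text and text-to-image directions
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

Minimizing this loss pulls matched image-text pairs together and pushes mismatched pairs apart in the shared embedding space; the paper's contribution lies in producing those embeddings with spiking networks rather than conventional ANN encoders.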
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2310.06488
Accession Number: edsarx.2310.06488
Database: arXiv