PSVMA+: Exploring Multi-granularity Semantic-visual Adaption for Generalized Zero-shot Learning

Bibliographic Details
Title: PSVMA+: Exploring Multi-granularity Semantic-visual Adaption for Generalized Zero-shot Learning
Authors: Liu, Man; Bai, Huihui; Li, Feng; Zhang, Chunjie; Wei, Yunchao; Wang, Meng; Chua, Tat-Seng; Zhao, Yao
Publication Year: 2024
Collection: Computer Science
Subject Terms: Computer Science - Computer Vision and Pattern Recognition
More Details: Generalized zero-shot learning (GZSL) endeavors to identify unseen categories using knowledge from the seen domain, necessitating intrinsic interactions between visual features and attribute semantic features. However, GZSL suffers from insufficient visual-semantic correspondences due to attribute diversity and instance diversity. Attribute diversity refers to varying semantic granularity in attribute descriptions, ranging from low-level (specific, directly observable) to high-level (abstract, highly generic) characteristics. This diversity makes it difficult to collect adequate visual cues for attributes at a single granularity. Additionally, diverse visual instances corresponding to the same shared attributes introduce semantic ambiguity, leading to vague visual patterns. To tackle these problems, we propose a multi-granularity progressive semantic-visual mutual adaption (PSVMA+) network, where sufficient visual elements across granularity levels can be gathered to remedy the granularity inconsistency. PSVMA+ explores semantic-visual interactions at different granularity levels, enabling awareness of multi-granularity in both visual and semantic elements. At each granularity level, the dual semantic-visual transformer module (DSVTM) recasts the shared attributes into instance-centric attributes and aggregates the semantic-related visual regions, thereby learning unambiguous visual features to accommodate various instances. Given the diverse contributions of different granularities, PSVMA+ employs selective cross-granularity learning to leverage knowledge from reliable granularities and adaptively fuses multi-granularity features for comprehensive representations. Experimental results demonstrate that PSVMA+ consistently outperforms state-of-the-art methods.
Comment: Accepted to TPAMI 2024. arXiv admin note: text overlap with arXiv:2303.15322
Document Type: Working Paper
Access URL: http://arxiv.org/abs/2410.11560
Accession Number: edsarx.2410.11560
Database: arXiv