Bibliographic Details
Title:
Multi-Modal Hand-Object Pose Estimation With Adaptive Fusion and Interaction Learning
Authors:
Dinh-Cuong Hoang, Phan Xuan Tan, Anh-Nhat Nguyen, Duy-Quang Vu, Van-Duc Vu, Thu-Uyen Nguyen, Ngoc-Anh Hoang, Khanh-Toan Phan, Duc-Thanh Tran, Van-Thiep Nguyen, Quang-Tri Duong, Ngoc-Trung Ho, Cong-Trinh Tran, Van-Hiep Duong, Phuc-Quan Ngo
Source:
IEEE Access, Vol. 12, pp. 54339-54351 (2024)
Publisher Information:
IEEE, 2024.
Publication Year:
2024
Collection:
LCC: Electrical engineering. Electronics. Nuclear engineering
Subject Terms:
Pose estimation, robot vision systems, intelligent systems, deep learning, supervised learning, machine vision, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
More Details:
Hand-object configuration recovery is an important task in computer vision. Estimating the pose and shape of both hands and objects during interaction has many applications, particularly in augmented reality, virtual reality, and imitation-based robot learning. The problem is especially challenging when the hand is interacting with objects in the environment, as this setting features both extreme occlusions and non-trivial shape deformations. While existing works treat the estimation of hand configurations (that is, pose and shape parameters) in isolation from the recovery of parameters of the object acted upon, we posit that the two problems are related and can be solved more accurately together. We introduce an approach that jointly learns hand and object features from color and depth (RGB-D) images. Our approach fuses appearance and geometric features adaptively, allowing it to emphasize or suppress the features that are most meaningful for hand-object configuration recovery. We combine a deep Hough voting strategy built on these adaptive features with a graph convolutional network (GCN) that learns the interaction relationships between the hand and the held object. Experimental results demonstrate that our approach consistently outperforms state-of-the-art methods on popular datasets.
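The abstract's adaptive fusion idea can be pictured as a learned gating over per-point appearance (RGB) and geometric (depth) features. The following PyTorch snippet is a minimal sketch under that assumption only; the module name AdaptiveFusion, the gating design, and the feature dimensions are illustrative and do not reflect the authors' actual architecture, which also involves deep Hough voting and a GCN for hand-object interaction.

```python
# Illustrative sketch only: a simple channel-wise gating block for fusing
# appearance (RGB) and geometric (depth/point) features. The class and
# tensor names are hypothetical and are not taken from the paper.
import torch
import torch.nn as nn

class AdaptiveFusion(nn.Module):
    """Learns per-channel weights that emphasize or suppress each modality."""
    def __init__(self, channels: int):
        super().__init__()
        # Small MLP mapping the concatenated features to two gates in [0, 1].
        self.gate = nn.Sequential(
            nn.Linear(2 * channels, channels),
            nn.ReLU(inplace=True),
            nn.Linear(channels, 2 * channels),
            nn.Sigmoid(),
        )

    def forward(self, rgb_feats: torch.Tensor, geo_feats: torch.Tensor) -> torch.Tensor:
        # rgb_feats, geo_feats: (N, C) per-point feature vectors.
        joint = torch.cat([rgb_feats, geo_feats], dim=-1)   # (N, 2C)
        w_rgb, w_geo = self.gate(joint).chunk(2, dim=-1)    # two (N, C) gates
        # Gated sum keeps the fused feature at dimension C.
        return w_rgb * rgb_feats + w_geo * geo_feats

# Example: fuse 128-dimensional features for 2048 sampled points.
fusion = AdaptiveFusion(channels=128)
fused = fusion(torch.randn(2048, 128), torch.randn(2048, 128))
print(fused.shape)  # torch.Size([2048, 128])
```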
Document Type:
article
File Description:
electronic resource
Language:
English
ISSN:
2169-3536
Relation:
https://ieeexplore.ieee.org/document/10499806/; https://doaj.org/toc/2169-3536
DOI:
10.1109/ACCESS.2024.3388870
Access URL:
https://doaj.org/article/28cab9981f8549d09855c1e48f7f023d
Accession Number:
edsdoj.28cab9981f8549d09855c1e48f7f023d
Database:
Directory of Open Access Journals