Bibliographic Details
Title:
Multi-Modal Hand-Object Pose Estimation With Adaptive Fusion and Interaction Learning
Authors:
Dinh-Cuong Hoang, Phan Xuan Tan, Anh-Nhat Nguyen, Duy-Quang Vu, Van-Duc Vu, Thu-Uyen Nguyen, Ngoc-Anh Hoang, Khanh-Toan Phan, Duc-Thanh Tran, Van-Thiep Nguyen, Quang-Tri Duong, Ngoc-Trung Ho, Cong-Trinh Tran, Van-Hiep Duong, Phuc-Quan Ngo
Source:
IEEE Access, Vol. 12, pp. 54339-54351 (2024)
Publisher Information:
IEEE, 2024.
Publication Year:
2024
Collection:
LCC: Electrical engineering. Electronics. Nuclear engineering
Subject Terms:
Pose estimation, robot vision systems, intelligent systems, deep learning, supervised learning, machine vision, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
More Details:
Hand-object configuration recovery is an important task in computer vision. Estimating the pose and shape of both hands and objects during interaction has many applications, particularly in augmented reality, virtual reality, and imitation-based robot learning. The problem is especially challenging when the hand is interacting with objects in the environment, as this setting features both extreme occlusions and non-trivial shape deformations. While existing works treat the estimation of hand configurations (that is, pose and shape parameters) in isolation from the recovery of parameters of the object acted upon, we posit that the two problems are related and can be solved more accurately together. We introduce an approach that jointly learns hand and object features from color and depth (RGB-D) images. Our approach fuses appearance and geometric features adaptively, allowing it to emphasize or suppress the features that are most meaningful for hand-object configuration recovery. We combine a deep Hough voting strategy built on these adaptive features with a graph convolutional network (GCN) that learns the interaction relationships between the hand and the held object. Experimental results demonstrate that our approach consistently outperforms state-of-the-art methods on popular datasets.
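The abstract's adaptive fusion idea can be pictured as a learned gating over per-point appearance (RGB) and geometric (depth) features. The following PyTorch snippet is a minimal sketch under that assumption only; the module name AdaptiveFusion, the gating design, and the feature dimensions are illustrative and do not reflect the authors' actual architecture, which also involves deep Hough voting and a GCN for hand-object interaction.

```python
# Illustrative sketch only: a simple channel-wise gating block for fusing
# appearance (RGB) and geometric (depth/point) features. The class and
# tensor names are hypothetical and are not taken from the paper.
import torch
import torch.nn as nn

class AdaptiveFusion(nn.Module):
    """Learns per-channel weights that emphasize or suppress each modality."""
    def __init__(self, channels: int):
        super().__init__()
        # Small MLP mapping the concatenated features to two gates in [0, 1].
        self.gate = nn.Sequential(
            nn.Linear(2 * channels, channels),
            nn.ReLU(inplace=True),
            nn.Linear(channels, 2 * channels),
            nn.Sigmoid(),
        )

    def forward(self, rgb_feats: torch.Tensor, geo_feats: torch.Tensor) -> torch.Tensor:
        # rgb_feats, geo_feats: (N, C) per-point feature vectors.
        joint = torch.cat([rgb_feats, geo_feats], dim=-1)   # (N, 2C)
        w_rgb, w_geo = self.gate(joint).chunk(2, dim=-1)    # two (N, C) gates
        # Gated sum keeps the fused feature at dimension C.
        return w_rgb * rgb_feats + w_geo * geo_feats

# Example: fuse 128-dimensional features for 2048 sampled points.
fusion = AdaptiveFusion(channels=128)
fused = fusion(torch.randn(2048, 128), torch.randn(2048, 128))
print(fused.shape)  # torch.Size([2048, 128])
```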
Document Type:
article
File Description:
electronic resource
Language:
English
ISSN:
2169-3536
Relation:
https://ieeexplore.ieee.org/document/10499806/; https://doaj.org/toc/2169-3536
DOI:
10.1109/ACCESS.2024.3388870
Access URL:
https://doaj.org/article/28cab9981f8549d09855c1e48f7f023d
Accession Number:
edsdoj.28cab9981f8549d09855c1e48f7f023d
Database:
Directory of Open Access Journals