Abstract: Autonomous artificial intelligence (AI) agents have emerged as promising paradigms for automatically understanding language-based environments, particularly with the rapid development of large language models (LLMs). However, a fine-grained, comprehensive understanding of multimodal environments remains under-explored. This work designs an autonomous workflow tailored for integrating AI agents seamlessly into extended reality (XR) applications for fine-grained training. We present a demonstration of a multimodal fine-grained training assistant for LEGO brick assembly in a pilot XR environment. Specifically, we design a cerebral language agent that integrates an LLM with memory, planning, and interaction with XR tools, as well as a vision-language agent, enabling agents to decide their actions based on past experiences. Furthermore, we introduce LEGO-MRTA, a multimodal fine-grained assembly dialogue dataset synthesized automatically within the workflow using a commercial LLM. This dataset comprises multimodal instruction manuals, conversations, XR responses, and vision question answering. Finally, we present several prevailing open-source LLMs as benchmarks, assessing their performance with and without fine-tuning on the proposed dataset. We anticipate that the broader impact of this workflow will advance the development of smarter assistants for seamless user interaction in XR environments, fostering research in both the AI and HCI communities.
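The abstract above describes a language agent that combines an LLM with memory, planning, and XR tool interaction. The following is a minimal sketch of such a decision loop, under assumptions: the function query_llm, the XR tool names, and the plan format are hypothetical placeholders for illustration, not the authors' implementation.

```python
# Minimal sketch of an LLM-driven assembly agent with memory, planning,
# and XR tool dispatch. All names (query_llm, XR tool functions, plan
# format) are hypothetical placeholders, not the paper's implementation.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Memory:
    """Stores past observations and actions so the agent can condition on them."""
    events: list = field(default_factory=list)

    def add(self, event: str) -> None:
        self.events.append(event)

    def recent(self, k: int = 5) -> str:
        return "\n".join(self.events[-k:])

def query_llm(prompt: str) -> str:
    """Placeholder for a call to a commercial or open-source LLM."""
    return "highlight_brick(id=3); show_step(step=2)"  # canned plan for illustration

# Hypothetical XR tools the agent may invoke in the headset runtime.
XR_TOOLS: dict[str, Callable[..., str]] = {
    "highlight_brick": lambda id: f"XR: brick {id} highlighted",
    "show_step": lambda step: f"XR: manual step {step} displayed",
}

def agent_step(user_utterance: str, memory: Memory) -> list[str]:
    """One decision cycle: plan with the LLM, then execute the planned XR tool calls."""
    prompt = f"Past events:\n{memory.recent()}\nUser: {user_utterance}\nPlan tool calls:"
    plan = query_llm(prompt)
    responses = []
    for call in plan.split(";"):
        name, _, args = call.strip().partition("(")
        kwargs = dict(kv.split("=") for kv in args.rstrip(")").split(",") if kv)
        responses.append(XR_TOOLS[name](**{k.strip(): int(v) for k, v in kwargs.items()}))
    memory.add(f"user: {user_utterance} -> plan: {plan}")
    return responses

if __name__ == "__main__":
    mem = Memory()
    print(agent_step("Which brick goes next?", mem))
```

The sketch keeps memory as a plain event log appended to the prompt; a real system would also carry the multimodal XR responses and vision-language agent outputs described above.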
Abstract: A computationally simpler Point Cloud Quality Assessment (PCQA) metric with richer descriptors, namely PointPCA+, is proposed in this paper as an extension of PointPCA. PointPCA introduced a set of perceptually relevant descriptors based on PCA decomposition, applied to both the geometry and texture data of point clouds for full-reference PCQA. PointPCA+ employs PCA only on the geometry data while enriching the existing geometry and texture descriptors, which are computed more efficiently. As in PointPCA, a total quality score is obtained through a learning-based fusion of individual predictions from geometry and texture descriptors that capture local shape and appearance properties, respectively. Before feature fusion, a feature selection module is introduced to choose the most effective features from a proposed super-set. Experimental results show that PointPCA+ achieves high predictive performance against subjective ground-truth scores obtained from publicly available datasets. The code is available at \url{https://github.com/cwi-dis/pointpca_suite/}.
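As a rough illustration of the pipeline sketched in this abstract, the snippet below computes PCA-based geometry descriptors over local neighborhoods and fuses descriptor-wise errors into a quality score with a learned regressor. It is a simplified sketch under assumed inputs (toy point clouds, toy MOS values), not PointPCA+ itself; the actual implementation is at the repository linked above.

```python
# Simplified illustration of a PCA-based full-reference PCQA pipeline:
# eigenvalue-based shape descriptors per point, plus learning-based fusion
# of descriptor errors into one quality score. Sketch only, not PointPCA+.
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.linear_model import Ridge

def pca_geometry_descriptors(points: np.ndarray, k: int = 16) -> np.ndarray:
    """Per-point shape descriptors (linearity, planarity, sphericity) from local PCA."""
    nbrs = NearestNeighbors(n_neighbors=k).fit(points)
    _, idx = nbrs.kneighbors(points)
    feats = []
    for neighborhood in points[idx]:                      # (k, 3) patch around each point
        cov = np.cov(neighborhood.T)                      # 3x3 covariance of the patch
        w = np.sort(np.linalg.eigvalsh(cov))[::-1] + 1e-12
        feats.append([(w[0] - w[1]) / w[0],               # linearity
                      (w[1] - w[2]) / w[0],               # planarity
                      w[2] / w[0]])                       # sphericity
    return np.asarray(feats)

def fuse_quality(descriptor_errors: np.ndarray, mos: np.ndarray) -> Ridge:
    """Learning-based fusion: regress subjective scores (MOS) from per-descriptor errors."""
    return Ridge(alpha=1.0).fit(descriptor_errors, mos)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = rng.random((200, 3))                            # toy reference point cloud
    deg = ref + 0.01 * rng.standard_normal(ref.shape)     # toy degraded version (same ordering)
    err = np.abs(pca_geometry_descriptors(ref) - pca_geometry_descriptors(deg)).mean(0)
    X = np.vstack([err, err * 1.5, err * 2.0])            # toy per-stimulus feature matrix
    y = np.array([4.5, 3.0, 2.0])                         # toy MOS values
    model = fuse_quality(X, y)
    print(model.predict(X[:1]))
```

A real metric would additionally compute texture descriptors, match corresponding points between reference and degraded clouds, and apply the feature selection step mentioned in the abstract before fusion.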
Abstract: The rise of capture systems for objects and scenes in 3D with increased fidelity and immersion has led to the popularity of volumetric video content that can be viewed from any position and angle with six degrees of freedom (6DoF) navigation. Such content requires large volumes of data to accurately represent the real world. Thus, novel optimization solutions and delivery systems are needed to enable volumetric video streaming over bandwidth-limited networks. In this chapter, we discuss theoretical approaches to volumetric video streaming optimization, through compression solutions as well as network and user adaptation, for both high-end and low-powered devices. Moreover, we present an overview of existing end-to-end systems, and we point to the future of volumetric video streaming.
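To make the "network and user adaptation" idea concrete, the toy sketch below greedily picks a quality level per tile of a volumetric stream under a bandwidth budget, weighting tiles in the user's viewport higher. The tile structure, weights, and bitrates are assumptions for illustration, not a specific system discussed in the chapter.

```python
# Toy sketch of network- and user-adaptive quality selection for tiled
# volumetric video: greedy utility-per-bit upgrades under a bandwidth budget.
from dataclasses import dataclass

@dataclass
class Tile:
    name: str
    in_viewport: bool
    bitrates: list[float]   # available representations, in Mbps (low -> high)
    utilities: list[float]  # assumed perceptual utility of each representation

def select_qualities(tiles: list[Tile], budget_mbps: float) -> dict[str, int]:
    """Greedily upgrade the tile whose next quality step gives the best utility per bit."""
    choice = {t.name: 0 for t in tiles}                   # start every tile at lowest quality
    spent = sum(t.bitrates[0] for t in tiles)
    while True:
        best, best_gain, best_cost = None, 0.0, 0.0
        for t in tiles:
            lvl = choice[t.name]
            if lvl + 1 >= len(t.bitrates):
                continue
            cost = t.bitrates[lvl + 1] - t.bitrates[lvl]
            weight = 1.0 if t.in_viewport else 0.2        # user adaptation: favor visible tiles
            gain = weight * (t.utilities[lvl + 1] - t.utilities[lvl]) / cost
            if spent + cost <= budget_mbps and gain > best_gain:
                best, best_gain, best_cost = t, gain, cost
        if best is None:
            return choice
        choice[best.name] += 1
        spent += best_cost

if __name__ == "__main__":
    tiles = [
        Tile("front", True, [2, 6, 12], [1.0, 2.5, 3.0]),
        Tile("back", False, [2, 6, 12], [1.0, 2.5, 3.0]),
    ]
    print(select_qualities(tiles, budget_mbps=18))
```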