Xian Wang

Valley2: Exploring Multimodal Models with Scalable Vision-Language Design

Jan 13, 2025

Ziheng Wu, Zhenghao Chen, Ruipu Luo, Can Zhang, Yuan Gao, Zhentao He, Xian Wang, Haoran Lin, Minghui Qiu

Abstract:Recently, vision-language models have made remarkable progress, demonstrating outstanding capabilities in various tasks such as image captioning and video understanding. We introduce Valley2, a novel multimodal large language model designed to enhance performance across all domains and extend the boundaries of practical applications in e-commerce and short video scenarios. Notably, Valley2 achieves state-of-the-art (SOTA) performance on e-commerce benchmarks, surpassing open-source models of similar size by a large margin (79.66 vs. 72.76). Additionally, Valley2 ranks second on the OpenCompass leaderboard among models with fewer than 10B parameters, with an impressive average score of 67.4. The code and model weights are open-sourced at https://github.com/bytedance/Valley.

LMLPA: Language Model Linguistic Personality Assessment

Oct 23, 2024

Jingyao Zheng, Xian Wang, Simo Hosio, Xiaoxian Xu, Lik-Hang Lee

Abstract:Large Language Models (LLMs) are increasingly used in everyday life and research. One of the most common use cases is conversational interactions, enabled by the language generation capabilities of LLMs. Just as between two humans, a conversation between an LLM-powered entity and a human depends on the personality of the conversants. However, measuring the personality of a given LLM is currently a challenge. This paper introduces the Language Model Linguistic Personality Assessment (LMLPA), a system designed to evaluate the linguistic personalities of LLMs. Our system helps to understand LLMs' language generation capabilities by quantitatively assessing the distinct personality traits reflected in their linguistic outputs. Unlike traditional human-centric psychometrics, the LMLPA adapts a personality assessment questionnaire, specifically the Big Five Inventory, to align with the operational capabilities of LLMs, and also incorporates findings from previous language-based personality measurement literature. To mitigate sensitivity to the order of options, our questionnaire is designed to be open-ended, resulting in textual answers. An AI rater is therefore needed to transform ambiguous personality information in the text responses into clear numerical indicators of personality traits. Using Principal Component Analysis and reliability validations, our findings demonstrate that LLMs possess distinct personality traits that can be effectively quantified by the LMLPA. This research contributes to Human-Computer Interaction and Human-Centered AI, providing a robust framework for future studies to refine AI personality assessments and expand their applications in multiple areas, including education and manufacturing.
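
The abstract mentions reliability validation of the numerical trait scores produced by the AI rater. One widely used reliability statistic for questionnaire items is Cronbach's alpha; the sketch below assumes that statistic (the abstract does not name the measure LMLPA uses) and assumes the scores are arranged as one row per respondent (for example, one LLM run) with one numeric score per item. The function name `cronbach_alpha` is hypothetical.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents x items score matrix.

    scores: list of rows, one row per respondent (assumed here to be one
    LLM run), each row holding one numeric score per questionnaire item.
    """
    k = len(scores[0])  # number of items
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    # Sample variance of each item across respondents.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

Values close to 1 indicate that the items measuring one trait move together; perfectly consistent items, such as rows `[1, 1, 1]`, `[2, 2, 2]`, `[3, 3, 3]`, give an alpha of 1.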

Dashing for the Golden Snitch: Multi-Drone Time-Optimal Motion Planning with Multi-Agent Reinforcement Learning

Sep 25, 2024

Xian Wang, Jin Zhou, Yuanli Feng, Jiahao Mei, Jiming Chen, Shuo Li

Abstract:Recent innovations in autonomous drones have facilitated time-optimal flight in single-drone configurations and enhanced maneuverability in multi-drone systems through the application of optimal control and learning-based methods. However, few studies have achieved time-optimal motion planning for multi-drone systems, particularly during highly agile maneuvers or in dynamic scenarios. This paper presents a decentralized policy network for time-optimal multi-drone flight using multi-agent reinforcement learning. To strike a balance between flight efficiency and collision avoidance, we introduce a soft collision penalty inspired by optimization-based methods. By customizing PPO in a centralized training, decentralized execution (CTDE) fashion, we unlock higher efficiency and stability in training, while ensuring lightweight implementation. Extensive simulations show that, despite slight performance trade-offs compared to single-drone systems, our multi-drone approach maintains near-time-optimal performance with low collision rates. Real-world experiments validate our method, with two quadrotors using the same network as simulation achieving a maximum speed of 13.65 m/s and a maximum body rate of 13.4 rad/s in a 5.5 m × 5.5 m × 2.0 m space across various tracks, relying entirely on onboard computation.

* 7 pages, 6 figures
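
The abstract describes a soft collision penalty but not its exact form. A common soft penalty in optimization-based planners is a squared hinge on the pairwise distance, which is zero outside a safety radius and grows smoothly inside it. The sketch below assumes that form; `soft_collision_penalty`, `safe_dist`, and `weight` are hypothetical names and values, not the paper's.

```python
import math
from itertools import combinations


def soft_collision_penalty(positions, safe_dist=0.5, weight=1.0):
    """Smooth penalty that grows as drones come closer than safe_dist.

    positions: list of (x, y, z) tuples, one per drone.
    Pairs farther apart than safe_dist contribute nothing; closer pairs
    contribute weight * (safe_dist - d)**2, so the penalty and its gradient
    are continuous at d == safe_dist (no hard cutoff).
    """
    penalty = 0.0
    for p, q in combinations(positions, 2):
        d = math.dist(p, q)  # Euclidean distance between the two drones
        violation = max(0.0, safe_dist - d)
        penalty += weight * violation ** 2
    return penalty
```

In a reinforcement learning setup this term would typically be subtracted from a progress reward, for example `reward = progress - soft_collision_penalty(positions)`, so agents trade a little speed for clearance instead of being terminated on contact.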

Towards Massive Interaction with Generalist Robotics: A Systematic Review of XR-enabled Remote Human-Robot Interaction Systems

Mar 26, 2024

Xian Wang, Luyao Shen, Lik-Hang Lee

Abstract:Growing interest in generalist robots is driving efforts to create robots versatile enough to handle multiple tasks in a variety of environments, and humans will interact with such robots through immersive interfaces. In the context of human-robot interaction (HRI), this survey provides an exhaustive review of the applications of extended reality (XR) technologies in the field of remote HRI. We developed a systematic search strategy based on the PRISMA methodology. Of the 2,561 articles initially identified, 100 research papers met our inclusion criteria and were included. We categorized and summarized the domain in detail, delving into XR technologies, including augmented reality (AR), virtual reality (VR), and mixed reality (MR), and their applications in facilitating intuitive and effective remote control and interaction with robotic systems. The survey highlights existing articles on the application of XR technologies, user experience enhancement, and various interaction designs for XR in remote HRI, providing insights into current trends and future directions. We also identify gaps and opportunities for future research to improve remote HRI systems through XR technology, to guide and inform future XR and robotics research.

VR PreM+ : An Immersive Pre-learning Branching Visualization System for Museum Tours

Nov 01, 2023

Ze Gao, Xiang Li, Changkun Liu, Xian Wang, Anqi Wang, Liang Yang, Yuyang Wang, Pan Hui, Tristan Braud

Abstract:We present VR PreM+, an innovative VR system designed to enhance web exploration beyond traditional computer screens. Unlike static 2D displays, VR PreM+ leverages 3D environments to create an immersive pre-learning experience. Keyword-based information retrieval allows users to manage and connect various content sources in a dynamic 3D space, improving communication and data comparison. Our preliminary study and user study demonstrated efficient information retrieval, increased user engagement, and a greater sense of presence. These findings yielded three design guidelines for future VR information systems: display, interaction, and user-centric design. VR PreM+ bridges the gap between traditional web browsing and immersive VR, offering an interactive and comprehensive approach to information acquisition. It holds promise for research, education, and beyond.

* Accepted for publication at The Eleventh International Symposium of Chinese CHI (Chinese CHI 2023), Bali

Real-time Image Enhancer via Learnable Spatial-aware 3D Lookup Tables

Aug 19, 2021

Tao Wang, Yong Li, Jingyang Peng, Yipeng Ma, Xian Wang, Fenglong Song, Youliang Yan

Abstract:Recently, deep learning-based image enhancement algorithms have achieved state-of-the-art (SOTA) performance on several publicly available datasets. However, most existing methods fail to meet practical requirements for either visual perception or computational efficiency, especially for high-resolution images. In this paper, we propose a novel real-time image enhancer via learnable spatial-aware 3-dimensional lookup tables (3D LUTs), which accounts for both global scenario and local spatial information. Specifically, we introduce a lightweight two-head weight predictor with two outputs. One is a 1D weight vector used for image-level scenario adaptation; the other is a 3D weight map for pixel-wise category fusion. We learn the spatial-aware 3D LUTs and fuse them according to these weights in an end-to-end manner. The fused LUT is then used to transform the source image into the target tone efficiently. Extensive results show that our model outperforms SOTA image enhancement methods on public datasets both subjectively and objectively, and that it takes only about 4 ms to process a 4K-resolution image on one NVIDIA V100 GPU.

* Accepted to ICCV2021
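
The core LUT operations in the abstract are a weighted fusion of several learned 3D LUTs followed by a per-pixel lookup. The sketch below shows only the image-level part under stated assumptions: the weights are fixed constants rather than predictor outputs, the pixel-wise 3D weight map is omitted, and the lookup uses standard trilinear interpolation. `fuse_luts`, `apply_lut`, and `identity_lut` are hypothetical helpers.

```python
def identity_lut(size):
    """LUT that maps every lattice colour to itself (handy for testing)."""
    step = 1.0 / (size - 1)
    return [[[(r * step, g * step, b * step) for b in range(size)]
             for g in range(size)]
            for r in range(size)]


def fuse_luts(luts, weights):
    """Weighted sum of N 3D LUTs of identical size S x S x S.

    Each LUT is a nested list lut[r][g][b] -> (R, G, B) output colour.
    """
    size = len(luts[0])
    return [[[tuple(sum(w * lut[r][g][b][c] for lut, w in zip(luts, weights))
                    for c in range(3))
              for b in range(size)]
             for g in range(size)]
            for r in range(size)]


def apply_lut(lut, rgb):
    """Map one colour (components in [0, 1]) through a 3D LUT with trilinear interpolation."""
    size = len(lut)  # must be at least 2
    # Continuous lattice coordinates and the lower corner of the enclosing cell.
    coords = [min(max(c, 0.0), 1.0) * (size - 1) for c in rgb]
    lo = [min(int(x), size - 2) for x in coords]
    frac = [x - i for x, i in zip(coords, lo)]
    out = [0.0, 0.0, 0.0]
    # Blend the 8 corners of the cell, each weighted by its trilinear coefficient.
    for dr in (0, 1):
        for dg in (0, 1):
            for db in (0, 1):
                w = ((frac[0] if dr else 1 - frac[0])
                     * (frac[1] if dg else 1 - frac[1])
                     * (frac[2] if db else 1 - frac[2]))
                corner = lut[lo[0] + dr][lo[1] + dg][lo[2] + db]
                for c in range(3):
                    out[c] += w * corner[c]
    return tuple(out)
```

Because the fused table is computed once per image, the per-pixel cost is a single interpolated lookup regardless of how many basis LUTs are fused, which is what makes LUT-based enhancers cheap at high resolutions.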

Robust Image Captioning

Dec 06, 2020

Daniel Yarnell, Xian Wang

Abstract:Automated captioning of photos is a task that combines the difficulties of image analysis and text generation. One essential feature of captioning is the concept of attention: how to determine what to describe and in which order. In this study, we leverage Object Relation with an adversarial robust cut algorithm, which builds on this attention mechanism by explicitly embedding knowledge about the spatial relationships between input data through a graph representation. Our experimental study demonstrates the promising performance of the proposed method for image captioning.

Image Super-Resolution Using VDSR-ResNeXt and SRCGAN

Oct 10, 2018

Saifuddin Hitawala, Yao Li, Xian Wang, Dongyang Yang

Abstract:Over the past decade, many super-resolution (SR) techniques have been developed using deep learning. Among them, generative adversarial networks (GANs) and very deep convolutional networks (VDSR) have shown promising results in terms of high-resolution (HR) image quality and computational speed. In this paper, we propose two approaches based on these two algorithms: VDSR-ResNeXt, a deep multi-branch convolutional network inspired by VDSR and ResNeXt; and SRCGAN, a conditional GAN that explicitly passes class labels as input to the GAN. Both methods were evaluated on common SR benchmark datasets, quantitatively and qualitatively.
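
A conditional GAN needs some way to feed the class label in alongside the image. One common approach, assumed here because the abstract does not say how SRCGAN injects the label, is to broadcast a one-hot encoding of the label into extra constant input channels. The helper `concat_label_channels` is hypothetical and uses plain nested lists in place of tensors.

```python
def concat_label_channels(image, label, num_classes):
    """Append num_classes constant planes encoding a one-hot class label.

    image: list of C channel planes, each a list of H rows of W floats.
    Returns C + num_classes planes: plane C + label is all ones and the
    other label planes are all zeros. The input image is not modified.
    """
    height, width = len(image[0]), len(image[0][0])
    label_planes = [[[1.0 if k == label else 0.0] * width for _ in range(height)]
                    for k in range(num_classes)]
    return image + label_planes
```

With this layout the first convolution of the generator (or discriminator) sees the label at every spatial position, so no architectural change beyond a wider input layer is required.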

A Short Image Series Based Scheme for Time Series Digital Image Correlation

Oct 28, 2014

Xian Wang, Shaopeng Ma

Abstract:A new scheme for digital image correlation (DIC), short time series DIC (STS-DIC), is proposed. Instead of processing the original deformed speckle images individually, STS-DIC combines several adjacent deformed speckle images from a short time series and then processes the averaged image, thereby introducing deformation continuity over time. The deformation across several adjacent images is assumed to be linear in time, and a new spatial-temporal displacement representation with eight unknowns is presented based on the subset-based representation method. The STS-DIC model is then formulated, and a solution scheme based on Newton-Raphson iteration is developed. The proposed method is verified on numerical and experimental cases. The results show that STS-DIC greatly improves the accuracy of traditional DIC under both simple and complicated deformation conditions, while keeping the computational cost acceptable.

* 15 pages, 10 figures
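
The abstract gives two building blocks: averaging adjacent speckle images and a displacement model with eight unknowns that is linear in time. The sketch below assumes one parameterization consistent with that count: the two displacement components at the subset centre, their four first-order spatial gradients, and two time rates. The paper's exact form may differ, and the Newton-Raphson solver is not shown. `average_images` and `sts_displacement` are hypothetical helpers.

```python
def average_images(images):
    """Pixel-wise mean of equally sized grayscale images (lists of rows)."""
    n = len(images)
    height, width = len(images[0]), len(images[0][0])
    return [[sum(img[r][c] for img in images) / n for c in range(width)]
            for r in range(height)]


def sts_displacement(params, dx, dy, dt):
    """Displacement of a subset point under an assumed 8-parameter model.

    params = (u, u_x, u_y, u_t, v, v_x, v_y, v_t): displacement at the subset
    centre, its first-order spatial gradients, and its rate of change in time
    (assumed linear in time, as stated in the abstract).
    dx, dy: offset of the point from the subset centre (pixels).
    dt: frame offset from the reference time of the short series.
    """
    u, u_x, u_y, u_t, v, v_x, v_y, v_t = params
    disp_x = u + u_x * dx + u_y * dy + u_t * dt
    disp_y = v + v_x * dx + v_y * dy + v_t * dt
    return disp_x, disp_y
```

Compared with the usual six-parameter first-order subset shape function, the two extra time-rate terms are what tie the frames of the short series together in this assumed form.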
