Ni Wang

Embodied Intelligence: The Key to Unblocking Generalized Artificial Intelligence

May 11, 2025

Jinhao Jiang, Changlin Chen, Shile Feng, Wanru Geng, Zesheng Zhou, Ni Wang, Shuai Li, Feng-Qi Cui, Erbao Dong

Abstract:The ultimate goal of artificial intelligence (AI) is to achieve Artificial General Intelligence (AGI). Embodied Artificial Intelligence (EAI), which involves intelligent systems with physical presence and real-time interaction with the environment, has emerged as a key research direction in pursuit of AGI. While advancements in deep learning, reinforcement learning, large-scale language models, and multimodal technologies have significantly contributed to the progress of EAI, most existing reviews focus on specific technologies or applications. A systematic overview, particularly one that explores the direct connection between EAI and AGI, remains scarce. This paper examines EAI as a foundational approach to AGI, systematically analyzing its four core modules: perception, intelligent decision-making, action, and feedback. We provide a detailed discussion of how each module contributes to the six core principles of AGI. Additionally, we discuss future trends, challenges, and research directions in EAI, emphasizing its potential as a cornerstone for AGI development. Our findings suggest that EAI's integration of dynamic learning and real-world interaction is essential for bridging the gap between narrow AI and AGI.

* 19 pages, 7 figures, 3 tables

Multi-Scale Temporal Difference Transformer for Video-Text Retrieval

Jun 23, 2024

Ni Wang, Dongliang Liao, Xing Xu

Abstract:Many transformer-based methods now exist for video-text retrieval. Most of them stack frame features, regard frames as tokens, and then use transformers for temporal modeling of the video. However, they commonly overlook the transformer's weak ability to model local temporal information. To tackle this problem, we propose a transformer variant named Multi-Scale Temporal Difference Transformer (MSTDT). MSTDT mainly addresses the traditional transformer's limited ability to capture local temporal information. In addition, to better model detailed dynamic information, we use the difference feature between frames, which reflects the dynamic movement in a video. We extract the inter-frame difference feature and integrate the difference and frame features with the multi-scale temporal transformer. Overall, the proposed MSTDT consists of a short-term multi-scale temporal difference transformer and a long-term temporal transformer. The former focuses on modeling local temporal information, while the latter models global temporal information. Finally, we propose a new loss to narrow the distance between similar samples. Extensive experiments show that a backbone such as CLIP combined with MSTDT attains a new state-of-the-art result.
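
The inter-frame difference feature mentioned in the abstract can be illustrated with a minimal sketch. It assumes each frame is already encoded as a feature vector, and it interprets "multi-scale" as taking differences at several frame strides. That interpretation, the function names, and the toy data are assumptions made for illustration and are not taken from the paper.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def frame_differences(frames: Sequence[Vector], stride: int = 1) -> List[Vector]:
    """Return frames[t + stride] - frames[t] for every valid index t."""
    return [
        [b - a for a, b in zip(frames[t], frames[t + stride])]
        for t in range(len(frames) - stride)
    ]


def multi_scale_differences(
    frames: Sequence[Vector], strides: Sequence[int] = (1, 2)
) -> Dict[int, List[Vector]]:
    """Collect difference features at several strides (hypothetical 'scales')."""
    return {s: frame_differences(frames, s) for s in strides}


# Toy per-frame features: 4 frames, 2 dimensions each.
frames = [[0.0, 1.0], [1.0, 1.0], [3.0, 0.0], [6.0, 0.0]]
diffs = multi_scale_differences(frames)
print(diffs[1])  # differences between adjacent frames
print(diffs[2])  # differences between frames two steps apart
```

A small stride captures short-term motion between neighbouring frames, while a larger stride captures slower changes. How the paper actually defines its scales and fuses them with frame features is not specified in the abstract.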

Learning cooperative behaviours in adversarial multi-agent systems

Feb 10, 2023

Ni Wang, Gautham P. Das, Alan G. Millard

Abstract:This work extends an existing virtual multi-agent platform called RoboSumo to create TripleSumo, a platform for investigating multi-agent cooperative behaviors in continuous action spaces, with physical contact in an adversarial environment. In this paper we investigate a scenario in which two agents, namely 'Bug' and 'Ant', must team up and push another agent 'Spider' out of the arena. To tackle this goal, the newly added agent 'Bug' is trained during an ongoing match between 'Ant' and 'Spider'. 'Bug' must develop awareness of the other agents' actions, infer the strategy of both sides, and eventually learn an action policy to cooperate. The reinforcement learning algorithm Deep Deterministic Policy Gradient (DDPG) is implemented with a hybrid reward structure combining dense and sparse rewards. The cooperative behavior is quantitatively evaluated by the mean probability of winning the match and mean number of steps needed to win.

* Lecture Notes in Computer Science, vol 13546. Springer, Cham. 2022
* 23rd Annual Conference, Towards Autonomous Robotic Systems 2022
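
A hybrid reward of the kind the abstract describes can be sketched as a dense shaping term added to a sparse end-of-match term. The abstract does not give the actual reward terms, so everything below is hypothetical: the dense term (how far the opponent has been pushed toward the arena edge), the weights, and the function name are illustrative assumptions only.

```python
import math
from typing import Tuple


def hybrid_reward(
    opponent_xy: Tuple[float, float],
    arena_radius: float,
    won: bool,
    lost: bool,
    dense_weight: float = 0.1,     # assumed value, not from the paper
    win_bonus: float = 100.0,      # assumed value, not from the paper
    loss_penalty: float = -100.0,  # assumed value, not from the paper
) -> float:
    """Combine a per-step dense term with a sparse terminal term."""
    # Dense term: fraction of the way the opponent is from the arena
    # centre to its edge, clipped to [0, 1]. Available at every step.
    distance = math.hypot(opponent_xy[0], opponent_xy[1])
    dense = dense_weight * min(distance / arena_radius, 1.0)

    # Sparse term: non-zero only when the match ends.
    if won:
        sparse = win_bonus
    elif lost:
        sparse = loss_penalty
    else:
        sparse = 0.0
    return dense + sparse


mid_match = hybrid_reward((3.0, 4.0), arena_radius=10.0, won=False, lost=False)
final_step = hybrid_reward((10.0, 0.0), arena_radius=10.0, won=True, lost=False)
print(mid_match, final_step)
```

The dense term gives the learner a signal at every step, while the sparse term ties the policy to the outcome that is actually evaluated (winning the match).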

Towards bio-inspired unsupervised representation learning for indoor aerial navigation

Jun 17, 2021

Ni Wang, Ozan Catal, Tim Verbelen, Matthias Hartmann, Bart Dhoedt

Abstract:Aerial navigation in GPS-denied indoor environments is still an open challenge. Drones can perceive the environment from a richer set of viewpoints, but have more stringent compute and energy constraints than other autonomous platforms. To tackle this problem, this research presents a biologically inspired deep-learning algorithm for simultaneous localization and mapping (SLAM) and its application in a drone navigation system. We propose an unsupervised representation learning method that yields low-dimensional latent state descriptors, mitigates the sensitivity to perceptual aliasing, and works on power-efficient embedded hardware. The designed algorithm is evaluated on a dataset collected in an indoor warehouse environment, and initial results show the feasibility of robust indoor aerial navigation.

A Natural Language Processing Pipeline of Chinese Free-text Radiology Reports for Liver Cancer Diagnosis

Apr 10, 2020

Honglei Liu, Yan Xu, Zhiqiang Zhang, Ni Wang, Yanqun Huang, Zhenghan Yang, Rui Jiang, Hui Chen

Abstract:Background: Despite the rapid development of natural language processing (NLP) for electronic medical records (EMRs), processing Chinese EMRs remains challenging due to the limited corpus and specific grammatical characteristics, especially for radiology reports. This study sought to design an NLP pipeline for the direct extraction of clinically relevant features from Chinese radiology reports, which is the first key step in computer-aided radiologic diagnosis. Methods: We implemented the NLP pipeline on abdominal computed tomography (CT) radiology reports written in Chinese. The pipeline comprised word segmentation, entity annotation, coreference resolution, and relationship extraction to finally derive the symptom features composed of one or more terms. The whole pipeline was based on a lexicon that was constructed manually according to Chinese grammatical characteristics. Least absolute shrinkage and selection operator (LASSO) and machine learning methods were used to build the classifiers for liver cancer prediction. A random forest model was also used to calculate the Gini impurity for identifying the most important features in liver cancer diagnosis. Results: The lexicon finally contained 831 words. The features extracted by the NLP pipeline conformed to the original meaning of the radiology reports. SVM had a higher predictive performance in liver cancer diagnosis (F1 score 90.23%, precision 92.51%, and recall 88.05%). Conclusions: Our study was a comprehensive NLP study focusing on Chinese radiology reports and the application of NLP in cancer risk prediction. The proposed method for radiological feature extraction could be easily implemented in other kinds of Chinese clinical texts and other disease prediction tasks.

* 12 pages, 4 figures, 3 tables
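
The Gini impurity used above to rank features can be illustrated with a short sketch. It computes the impurity of a set of labels and the impurity decrease obtained by splitting on one binary feature, which is the quantity a random forest accumulates per feature. The toy labels and the feature are invented for illustration and are not data from the study.

```python
from collections import Counter
from typing import Hashable, Sequence


def gini_impurity(labels: Sequence[Hashable]) -> float:
    """Gini impurity: 1 - sum over classes of p_k squared."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(
    labels: Sequence[Hashable], feature_present: Sequence[bool]
) -> float:
    """Impurity of the parent minus the size-weighted impurity of the two children."""
    n = len(labels)
    if n == 0:
        return 0.0
    with_feature = [y for y, f in zip(labels, feature_present) if f]
    without_feature = [y for y, f in zip(labels, feature_present) if not f]
    weighted_children = (
        len(with_feature) / n * gini_impurity(with_feature)
        + len(without_feature) / n * gini_impurity(without_feature)
    )
    return gini_impurity(labels) - weighted_children


# Toy example: 1 = positive diagnosis, 0 = negative (invented data).
labels = [1, 1, 1, 0, 0, 0]
perfect_feature = [True, True, True, False, False, False]
useless_feature = [True, False, True, False, True, False]

print(gini_impurity(labels))
print(impurity_decrease(labels, perfect_feature))
print(impurity_decrease(labels, useless_feature))
```

A feature that separates the classes cleanly yields a large decrease, while a feature unrelated to the label yields a decrease near zero. A forest averages such decreases over all splits and trees to produce an importance score.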

Deep Density-based Image Clustering

Dec 11, 2018

Yazhou Ren, Ni Wang, Mingxia Li, Zenglin Xu

Abstract:Recently, deep clustering, which is able to perform feature learning that favors clustering tasks via deep neural networks, has achieved remarkable performance in image clustering applications. However, the existing deep clustering algorithms generally need the number of clusters in advance, which is usually unknown in real-world tasks. In addition, the initial cluster centers in the learned feature space are generated by $k$-means. This only works well on spherical clusters and probably leads to unstable clustering results. In this paper, we propose a two-stage deep density-based image clustering (DDC) framework to address these issues. The first stage is to train a deep convolutional autoencoder (CAE) to extract low-dimensional feature representations from high-dimensional image data, and then apply t-SNE to further reduce the data to a 2-dimensional space favoring density-based clustering algorithms. The second stage is to apply the developed density-based clustering technique on the 2-dimensional embedded data to automatically recognize an appropriate number of clusters with arbitrary shapes. Concretely, a number of local clusters are generated to capture the local structures of clusters, and then are merged via their density relationship to form the final clustering result. Experiments demonstrate that the proposed DDC achieves comparable or even better clustering performance than state-of-the-art deep clustering methods, even though the number of clusters is not given.
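
To illustrate why density-based clustering on a 2-dimensional embedding does not need the number of clusters in advance, here is a minimal sketch using a simplified DBSCAN-style procedure. This is a swapped-in technique for illustration, not the local-cluster merging method proposed in the paper. The parameters `eps` and `min_pts` and the toy points are assumptions.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def density_cluster(points: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Label each point with a cluster id; -1 marks noise."""
    n = len(points)
    # Neighbourhood of each point (includes the point itself).
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [-1] * n
    cluster = 0
    for i in range(n):
        # Start a new cluster only from an unlabelled core point.
        if labels[i] != -1 or len(neighbors[i]) < min_pts:
            continue
        labels[i] = cluster
        stack = [i]
        while stack:
            p = stack.pop()
            if len(neighbors[p]) < min_pts:
                continue  # border point: joins the cluster but does not expand it
            for q in neighbors[p]:
                if labels[q] == -1:
                    labels[q] = cluster
                    stack.append(q)
        cluster += 1
    return labels


# Toy 2-D embedding: two dense groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = density_cluster(points, eps=1.5, min_pts=3)
num_clusters = len({label for label in labels if label != -1})
print(labels, num_clusters)
```

The number of clusters falls out of the density structure of the data rather than being passed in, which is the property the abstract contrasts with $k$-means initialization.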
