Jingwen Wu

Can Vision-Language Models Think from a First-Person Perspective?

Nov 27, 2023

Sijie Cheng, Zhicheng Guo, Jingwen Wu, Kechen Fang, Peng Li, Huaping Liu, Yang Liu

Abstract: Vision-language models (VLMs) have recently shown promising results in traditional downstream tasks. Evaluation studies have emerged to assess their abilities, with the majority focusing on the third-person perspective, and only a few addressing specific tasks from the first-person perspective. However, the capability of VLMs to "think" from a first-person perspective, a crucial attribute for advancing autonomous agents and robotics, remains largely unexplored. To bridge this research gap, we introduce EgoThink, a novel visual question-answering benchmark that encompasses six core capabilities with twelve detailed dimensions. The benchmark is constructed using selected clips from egocentric videos, with manually annotated question-answer pairs containing first-person information. To comprehensively assess VLMs, we evaluate eighteen popular VLMs on EgoThink. Moreover, given the open-ended format of the answers, we use GPT-4 as the automatic judge to compute single-answer grading. Experimental results indicate that although GPT-4V leads in numerous dimensions, all evaluated VLMs still possess considerable potential for improvement in first-person perspective tasks. Meanwhile, enlarging the number of trainable parameters has the most significant impact on model performance on EgoThink. In conclusion, EgoThink serves as a valuable addition to existing evaluation benchmarks for VLMs, providing an indispensable resource for future research in the realm of embodied artificial intelligence and robotics.

Dual-Space Attacks against Random-Walk-based Anomaly Detection

Jul 26, 2023

Yuni Lai, Marcin Waniek, Yulin Zhu, Liying Li, Jingwen Wu, Tomasz P. Michalak, Talal Rahwan, Kai Zhou

Abstract: Random Walks-based Anomaly Detection (RWAD) is commonly used to identify anomalous patterns in various applications. An intriguing characteristic of RWAD is that the input graph can either be pre-existing or constructed from raw features. Consequently, there are two potential attack surfaces against RWAD: graph-space attacks and feature-space attacks. In this paper, we explore this vulnerability by designing practical dual-space attacks, investigating the interplay between graph-space and feature-space attacks. To this end, we conduct a thorough complexity analysis, proving that attacking RWAD is NP-hard. Then, we proceed to formulate the graph-space attack as a bi-level optimization problem and propose two strategies to solve it: alternative iteration (alterI-attack) or utilizing the closed-form solution of the random walk model (cf-attack). Finally, we utilize the results from the graph-space attacks as guidance to design more powerful feature-space attacks (i.e., graph-guided attacks). Comprehensive experiments demonstrate that our proposed attacks are effective in enabling the target nodes to evade detection by RWAD with a limited attack budget. In addition, we conduct transfer attack experiments in a black-box setting, which show that our feature attack significantly decreases the anomaly scores of target nodes. Our study opens the door to studying the dual-space attack against graph anomaly detection in which the graph space relies on the feature space.

* 13 pages

FedGEMS: Federated Learning of Larger Server Models via Selective Knowledge Fusion

Oct 21, 2021

Sijie Cheng, Jingwen Wu, Yanghua Xiao, Yang Liu

Abstract: Today, data is often scattered among billions of resource-constrained edge devices with security and privacy constraints. Federated Learning (FL) has emerged as a viable solution to learn a global model while keeping data private, but the model complexity of FL is impeded by the computation resources of edge nodes. In this work, we investigate a novel paradigm to take advantage of a powerful server model to break through model capacity in FL. By selectively learning from multiple teacher clients and itself, a server model develops in-depth knowledge and transfers its knowledge back to clients in return to boost their respective performance. Our proposed framework achieves superior performance on both server and client models and provides several advantages in a unified framework, including flexibility for heterogeneous client architectures, robustness to poisoning attacks, and communication efficiency between clients and server. By bridging FL effectively with larger server model training, our proposed paradigm paves the way for robust and continual knowledge accumulation from distributed and private data.

* Under review as a conference paper at ICLR 2022
