Zeyu Ma

Evaluating Robustness of Monocular Depth Estimation with Procedural Scene Perturbations

Jul 01, 2025

Jack Nugent, Siyang Wu, Zeyu Ma, Beining Han, Meenal Parakh, Abhishek Joshi, Lingjie Mei, Alexander Raistrick, Xinyuan Li, Jia Deng

Abstract: Recent years have witnessed substantial progress on monocular depth estimation, particularly as measured by the success of large models on standard benchmarks. However, performance on standard benchmarks does not offer a complete assessment, because most evaluate accuracy but not robustness. In this work, we introduce PDE (Procedural Depth Evaluation), a new benchmark that enables systematic robustness evaluation. PDE uses procedural generation to create 3D scenes that test robustness to various controlled perturbations, including object, camera, material, and lighting changes. Our analysis yields interesting findings on what perturbations are challenging for state-of-the-art depth models, which we hope will inform further research. Code and data are available at https://github.com/princeton-vl/proc-depth-eval.


OMNI-DC: Highly Robust Depth Completion with Multiresolution Depth Integration

Nov 28, 2024

Yiming Zuo, Willow Yang, Zeyu Ma, Jia Deng


Abstract: Depth completion (DC) aims to predict a dense depth map from an RGB image and sparse depth observations. Existing methods for DC generalize poorly on new datasets or unseen sparse depth patterns, limiting their practical applications. We propose OMNI-DC, a highly robust DC model that generalizes well across various scenarios. Our method incorporates a novel multi-resolution depth integration layer and a probability-based loss, enabling it to deal with sparse depth maps of varying densities. Moreover, we train OMNI-DC on a mixture of synthetic datasets with a scale normalization technique. To evaluate our model, we establish a new evaluation protocol named Robust-DC for zero-shot testing under various sparse depth patterns. Experimental results on Robust-DC and conventional benchmarks show that OMNI-DC significantly outperforms the previous state of the art. The checkpoints, training code, and evaluations are available at https://github.com/princeton-vl/OMNI-DC.


Revisiting Benchmark and Assessment: An Agent-based Exploratory Dynamic Evaluation Framework for LLMs

Oct 15, 2024

Wanying Wang, Zeyu Ma, Pengfei Liu, Mingang Chen


Abstract: While various vertical domain large language models (LLMs) have been developed, the challenge of automatically evaluating their performance across different domains remains significant in addressing real-world user needs. Current benchmark-based evaluation methods exhibit rigid, purposeless interactions and rely on pre-collected static datasets that are costly to build, inflexible across domains, and misaligned with practical user needs. To address this, we revisit the evaluation components and introduce two definitions: **Benchmark+**, which extends traditional QA benchmarks into a more flexible "strategy-criterion" format; and **Assessment+**, which enhances the interaction process for greater exploration and enables both quantitative metrics and qualitative insights that capture nuanced target LLM behaviors from richer multi-turn interactions. We propose an agent-based evaluation framework called *TestAgent*, which implements these two concepts through retrieval augmented generation and reinforcement learning. Experiments on tasks ranging from building vertical domain evaluation from scratch to activating existing benchmarks demonstrate the effectiveness of *TestAgent* across various scenarios. We believe this work offers an interesting perspective on automatic evaluation for LLMs.


SVFit: Parameter-Efficient Fine-Tuning of Large Pre-Trained Models Using Singular Values

Sep 09, 2024

Chengwei Sun, Jiwei Wei, Yujia Wu, Yiming Shi, Shiyuan He, Zeyu Ma, Ning Xie, Yang Yang


Abstract: Large pre-trained models (LPMs) have demonstrated exceptional performance in diverse natural language processing and computer vision tasks. However, fully fine-tuning these models poses substantial memory challenges, particularly in resource-constrained environments. Parameter-efficient fine-tuning (PEFT) methods, such as LoRA, mitigate this issue by adjusting only a small subset of parameters. Nevertheless, these methods typically employ random initialization for low-rank matrices, which can lead to inefficiencies in gradient descent and diminished generalizability due to suboptimal starting points. To address these limitations, we propose SVFit, a novel PEFT approach that leverages singular value decomposition (SVD) to initialize low-rank matrices using critical singular values as trainable parameters. Specifically, SVFit performs SVD on the pre-trained weight matrix to obtain the best rank-r approximation matrix, emphasizing the most critical singular values that capture over 99% of the matrix's information. These top-r singular values are then used as trainable parameters to scale the fundamental subspaces of the matrix, facilitating rapid domain adaptation. Extensive experiments across various pre-trained models in natural language understanding, text-to-image generation, and image classification tasks reveal that SVFit outperforms LoRA while requiring 16 times fewer trainable parameters.
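
The decomposition step described in the SVFit abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names `svfit_factorize` and `svfit_weight`, the matrix size, and the rank are all hypothetical, and the abstract does not say how training proceeds or how the part of the weight outside the rank-r subspace is treated. The sketch only shows that freezing the top-r singular vectors and exposing the top-r singular values as the sole adjustable vector reproduces the best rank-r approximation at initialization.

```python
import numpy as np


def svfit_factorize(weight, rank):
    """Split a weight matrix into frozen bases and a vector of scales.

    Returns the top-`rank` left singular vectors, the top-`rank` singular
    values (the only values that would be trained in this sketch), and the
    top-`rank` right singular vectors.
    """
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    return u[:, :rank], s[:rank].copy(), vt[:rank, :]


def svfit_weight(u_r, scales, vt_r):
    """Rebuild the low-rank weight: u_r @ diag(scales) @ vt_r."""
    return (u_r * scales) @ vt_r


# Hypothetical stand-in for a pre-trained weight matrix.
rng = np.random.default_rng(0)
weight = rng.standard_normal((64, 32))
rank = 8

u_r, scales, vt_r = svfit_factorize(weight, rank)
approx = svfit_weight(u_r, scales, vt_r)

# By the Eckart-Young theorem, the Frobenius error of the best rank-r
# approximation equals the norm of the discarded singular values.
all_singular_values = np.linalg.svd(weight, compute_uv=False)
residual = np.linalg.norm(weight - approx)
expected_residual = np.sqrt(np.sum(all_singular_values[rank:] ** 2))

print("adjustable values:", scales.size, "of", weight.size, "entries")
print("residual matches Eckart-Young bound:", np.isclose(residual, expected_residual))
```

In this sketch only the `rank` entries of `scales` would receive gradient updates, while `u_r` and `vt_r` stay fixed, which is one way to read "scale the fundamental subspaces of the matrix".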


Through-the-Wall Radar Human Activity Micro-Doppler Signature Representation Method Based on Joint Boulic-Sinusoidal Pendulum Model

Aug 22, 2024

Xiaopeng Yang, Weicheng Gao, Xiaodong Qu, Zeyu Ma, Hao Zhang


Abstract: With the help of micro-Doppler signatures, ultra-wideband (UWB) through-the-wall radar (TWR) enables the reconstruction of range and velocity information of limb nodes to accurately identify indoor human activities. However, existing methods are usually trained and validated directly on range-time maps (RTM) and Doppler-time maps (DTM), which have high feature redundancy and poor generalization ability. To solve this problem, this paper proposes a human activity micro-Doppler signature representation method based on a joint Boulic-sinusoidal pendulum motion model. In detail, this paper presents a simplified joint Boulic-sinusoidal pendulum human motion model, derived from the Boulic-Thalmann kinematic model, that accounts for the head, torso, both hands, and both feet. The paper also calculates the minimum number of key points needed to describe the Doppler and micro-Doppler information sufficiently. Both numerical simulations and experiments are conducted to verify the effectiveness of the method. The results demonstrate that the proposed number of micro-Doppler signature key points can precisely represent the motion characteristics of indoor human limb nodes and substantially improves the generalization capability of existing methods across different testers.

* 17 pages, 14 figures, 7 tables, in IEEE Transactions on Microwave Theory and Techniques, 2024


Infinigen Indoors: Photorealistic Indoor Scenes using Procedural Generation

Jun 17, 2024

Alexander Raistrick, Lingjie Mei, Karhan Kayan, David Yan, Yiming Zuo, Beining Han, Hongyu Wen, Meenal Parakh, Stamatis Alexandropoulos, Lahav Lipson (+2 more)


Abstract: We introduce Infinigen Indoors, a Blender-based procedural generator of photorealistic indoor scenes. It builds upon the existing Infinigen system, which focuses on natural scenes, but expands its coverage to indoor scenes by introducing a diverse library of procedural indoor assets, including furniture, architecture elements, appliances, and other day-to-day objects. It also introduces a constraint-based arrangement system, which consists of a domain-specific language for expressing diverse constraints on scene composition, and a solver that generates scene compositions that maximally satisfy the constraints. We provide an export tool that allows the generated 3D objects and scenes to be directly used for training embodied agents in real-time simulators such as Omniverse and Unreal. Infinigen Indoors is open-sourced under the BSD license. Please visit https://infinigen.org for code and videos.

* Accepted to CVPR 2024


View-Dependent Octree-based Mesh Extraction in Unbounded Scenes for Procedural Synthetic Data

Dec 13, 2023

Zeyu Ma, Alexander Raistrick, Lahav Lipson, Jia Deng

Abstract: Procedural synthetic data generation has received increasing attention in computer vision. Procedural signed distance functions (SDFs) are a powerful tool for modeling large-scale detailed scenes, but existing mesh extraction methods have artifacts or performance profiles that limit their use for synthetic data. We propose OcMesher, a mesh extraction algorithm that efficiently handles high-detail unbounded scenes with perfect view-consistency, with easy export to downstream real-time engines. The main novelty of our solution is an algorithm to construct an octree based on a given SDF and multiple camera views. We performed extensive experiments, and show our solution produces better synthetic data for training and evaluation of computer vision models.


Infinite Photorealistic Worlds using Procedural Generation

Jun 26, 2023

Alexander Raistrick, Lahav Lipson, Zeyu Ma, Lingjie Mei, Mingzhe Wang, Yiming Zuo, Karhan Kayan, Hongyu Wen, Beining Han, Yihan Wang (+5 more)


Abstract: We introduce Infinigen, a procedural generator of photorealistic 3D scenes of the natural world. Infinigen is entirely procedural: every asset, from shape to texture, is generated from scratch via randomized mathematical rules, using no external source and allowing infinite variation and composition. Infinigen offers broad coverage of objects and scenes in the natural world including plants, animals, terrains, and natural phenomena such as fire, cloud, rain, and snow. Infinigen can be used to generate unlimited, diverse training data for a wide range of computer vision tasks including object detection, semantic segmentation, optical flow, and 3D reconstruction. We expect Infinigen to be a useful resource for computer vision research and beyond. Please visit https://infinigen.org for videos, code and pre-generated data.

* Accepted to CVPR 2023, Camera Ready Version. Update 06/26/23: Change the open-source license to BSD


CoCo: A Coupled Contrastive Framework for Unsupervised Domain Adaptive Graph Classification

Jun 10, 2023

Nan Yin, Li Shen, Mengzhu Wang, Long Lan, Zeyu Ma, Chong Chen, Xian-Sheng Hua, Xiao Luo

Abstract: Although graph neural networks (GNNs) have achieved impressive results in graph classification, they often need abundant task-specific labels, which can be extremely costly to acquire. A credible solution is to explore additional labeled graphs to enhance unsupervised learning on the target domain. However, how to apply GNNs to domain adaptation remains unsolved owing to the insufficient exploration of graph topology and the significant domain discrepancy. In this paper, we propose Coupled Contrastive Graph Representation Learning (CoCo), which extracts the topological information from coupled learning branches and reduces the domain discrepancy with coupled contrastive learning. CoCo contains a graph convolutional network branch and a hierarchical graph kernel network branch, which explore graph topology in implicit and explicit manners. Besides, we incorporate coupled branches into a holistic multi-view contrastive learning framework, which not only incorporates graph representations learned from complementary views for enhanced understanding, but also encourages the similarity between cross-domain example pairs with the same semantics for domain alignment. Extensive experiments on popular datasets show that CoCo generally outperforms competing baselines across different settings.


Semantic Enhanced Knowledge Graph for Large-Scale Zero-Shot Learning

Dec 26, 2022

Jiwei Wei, Yang Yang, Zeyu Ma, Jingjing Li, Xing Xu, Heng Tao Shen

Abstract: Zero-shot learning has been a prominent research topic in both vision and language areas. Recently, most existing methods adopt structured knowledge information to model explicit correlations among categories and use deep graph convolutional networks to propagate information between different categories. However, it is difficult to add new categories to an existing structured knowledge graph, and deep graph convolutional networks suffer from the over-smoothing problem. In this paper, we provide a new semantic enhanced knowledge graph that contains both expert knowledge and semantic correlations among categories. Our semantic enhanced knowledge graph can further enhance the correlations among categories and make it easy to absorb new categories. To propagate information on the knowledge graph, we propose a novel Residual Graph Convolutional Network (ResGCN), which can effectively alleviate the problem of over-smoothing. Experiments conducted on the widely used large-scale ImageNet-21K dataset and AWA2 dataset show the effectiveness of our method and establish a new state of the art on zero-shot learning. Moreover, our results on the large-scale ImageNet-21K with various feature extraction networks show that our method has better generalization and robustness.
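
To make the idea of a residual graph convolution concrete, here is a minimal NumPy sketch. The abstract does not give the ResGCN layer equations, so everything below is an assumption: the layer uses the common symmetric normalization with self-loops and simply adds the input features back to the propagated output. The function names, the toy graph, and the feature sizes are hypothetical.

```python
import numpy as np


def normalized_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for an undirected adjacency matrix."""
    a_hat = adj + np.eye(adj.shape[0])      # add self-loops
    deg = a_hat.sum(axis=1)                 # every degree is >= 1
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def residual_gcn_layer(a_norm, h, w):
    """One graph convolution with a skip connection (assumed form).

    `w` must be square so that the propagated features and the input
    features `h` have the same width and can be summed.
    """
    propagated = np.maximum(a_norm @ h @ w, 0.0)  # ReLU(A_norm H W)
    return propagated + h                         # skip connection keeps h


# Toy category graph: three nodes connected in a path 0 - 1 - 2.
adj = np.array([[0.0, 1.0, 0.0],
                [1.0, 0.0, 1.0],
                [0.0, 1.0, 0.0]])
a_norm = normalized_adjacency(adj)

rng = np.random.default_rng(0)
features = rng.standard_normal((3, 4))
weights = 0.1 * rng.standard_normal((4, 4))

out = residual_gcn_layer(a_norm, features, weights)
print(out.shape)
```

Because each layer adds its input back, node features are not replaced by neighborhood averages alone, which is the usual intuition for why skip connections can reduce over-smoothing in deep stacks.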
