Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Wei Tian

DiffPose-Animal: A Language-Conditioned Diffusion Framework for Animal Pose Estimation

Aug 12, 2025

Tianyu Xiong, Dayi Tan, Wei Tian

Abstract:Animal pose estimation is a fundamental task in computer vision, with growing importance in ecological monitoring, behavioral analysis, and intelligent livestock management. Compared to human pose estimation, animal pose estimation is more challenging due to high interspecies morphological diversity, complex body structures, and limited annotated data. In this work, we introduce DiffPose-Animal, a novel diffusion-based framework for top-down animal pose estimation. Unlike traditional heatmap regression methods, DiffPose-Animal reformulates pose estimation as a denoising process under the generative framework of diffusion models. To enhance semantic guidance during keypoint generation, we leverage large language models (LLMs) to extract both global anatomical priors and local keypoint-wise semantics based on species-specific prompts. These textual priors are encoded and fused with image features via cross-attention modules to provide biologically meaningful constraints throughout the denoising process. Additionally, a diffusion-based keypoint decoder is designed to progressively refine pose predictions, improving robustness to occlusion and annotation sparsity. Extensive experiments on public animal pose datasets demonstrate the effectiveness and generalization capability of our method, especially under challenging scenarios with diverse species, cluttered backgrounds, and incomplete keypoints.

* 13pages,2figures

Via

Access Paper or Ask Questions

GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models

Aug 08, 2025

GLM-4. 5 Team, :, Aohan Zeng, Xin Lv, Qinkai Zheng, Zhenyu Hou, Bin Chen, Chengxing Xie, Cunxiang Wang, Da Yin(+161 more)

Abstract:We present GLM-4.5, an open-source Mixture-of-Experts (MoE) large language model with 355B total parameters and 32B activated parameters, featuring a hybrid reasoning method that supports both thinking and direct response modes. Through multi-stage training on 23T tokens and comprehensive post-training with expert model iteration and reinforcement learning, GLM-4.5 achieves strong performance across agentic, reasoning, and coding (ARC) tasks, scoring 70.1% on TAU-Bench, 91.0% on AIME 24, and 64.2% on SWE-bench Verified. With much fewer parameters than several competitors, GLM-4.5 ranks 3rd overall among all evaluated models and 2nd on agentic benchmarks. We release both GLM-4.5 (355B parameters) and a compact version, GLM-4.5-Air (106B parameters), to advance research in reasoning and agentic AI systems. Code, models, and more information are available at https://github.com/zai-org/GLM-4.5.

Via

Access Paper or Ask Questions

R2LDM: An Efficient 4D Radar Super-Resolution Framework Leveraging Diffusion Model

Mar 21, 2025

Boyuan Zheng, Shouyi Lu, Renbo Huang, Minqing Huang, Fan Lu, Wei Tian, Guirong Zhuo, Lu Xiong

Abstract:We introduce R2LDM, an innovative approach for generating dense and accurate 4D radar point clouds, guided by corresponding LiDAR point clouds. Instead of utilizing range images or bird's eye view (BEV) images, we represent both LiDAR and 4D radar point clouds using voxel features, which more effectively capture 3D shape information. Subsequently, we propose the Latent Voxel Diffusion Model (LVDM), which performs the diffusion process in the latent space. Additionally, a novel Latent Point Cloud Reconstruction (LPCR) module is utilized to reconstruct point clouds from high-dimensional latent voxel features. As a result, R2LDM effectively generates LiDAR-like point clouds from paired raw radar data. We evaluate our approach on two different datasets, and the experimental results demonstrate that our model achieves 6- to 10-fold densification of radar point clouds, outperforming state-of-the-art baselines in 4D radar point cloud super-resolution. Furthermore, the enhanced radar point clouds generated by our method significantly improve downstream tasks, achieving up to 31.7% improvement in point cloud registration recall rate and 24.9% improvement in object detection accuracy.

Via

Access Paper or Ask Questions

LAiW: A Chinese Legal Large Language Models Benchmark

Oct 09, 2023

Yongfu Dai, Duanyu Feng, Jimin Huang, Haochen Jia, Qianqian Xie, Yifang Zhang, Weiguang Han, Wei Tian, Hao Wang

Abstract:With the emergence of numerous legal LLMs, there is currently a lack of a comprehensive benchmark for evaluating their legal abilities. In this paper, we propose the first Chinese Legal LLMs benchmark based on legal capabilities. Through the collaborative efforts of legal and artificial intelligence experts, we divide the legal capabilities of LLMs into three levels: basic legal NLP capability, basic legal application capability, and complex legal application capability. We have completed the first phase of evaluation, which mainly focuses on the capability of basic legal NLP. The evaluation results show that although some legal LLMs have better performance than their backbones, there is still a gap compared to ChatGPT. Our benchmark can be found at URL.

Via

Access Paper or Ask Questions

Single-Stage Diffusion NeRF: A Unified Approach to 3D Generation and Reconstruction

Apr 17, 2023

Hansheng Chen, Jiatao Gu, Anpei Chen, Wei Tian, Zhuowen Tu, Lingjie Liu, Hao Su

Figure 1 for Single-Stage Diffusion NeRF: A Unified Approach to 3D Generation and Reconstruction

Figure 2 for Single-Stage Diffusion NeRF: A Unified Approach to 3D Generation and Reconstruction

Figure 3 for Single-Stage Diffusion NeRF: A Unified Approach to 3D Generation and Reconstruction

Figure 4 for Single-Stage Diffusion NeRF: A Unified Approach to 3D Generation and Reconstruction

Abstract:3D-aware image synthesis encompasses a variety of tasks, such as scene generation and novel view synthesis from images. Despite numerous task-specific methods, developing a comprehensive model remains challenging. In this paper, we present SSDNeRF, a unified approach that employs an expressive diffusion model to learn a generalizable prior of neural radiance fields (NeRF) from multi-view images of diverse objects. Previous studies have used two-stage approaches that rely on pretrained NeRFs as real data to train diffusion models. In contrast, we propose a new single-stage training paradigm with an end-to-end objective that jointly optimizes a NeRF auto-decoder and a latent diffusion model, enabling simultaneous 3D reconstruction and prior learning, even from sparsely available views. At test time, we can directly sample the diffusion prior for unconditional generation, or combine it with arbitrary observations of unseen objects for NeRF reconstruction. SSDNeRF demonstrates robust results comparable to or better than leading task-specific methods in unconditional generation and single/sparse-view 3D reconstruction.

* Project page: https://lakonik.github.io/ssdnerf. V2 note: fixed typos

Via

Access Paper or Ask Questions

Modeling and Joint Optimization of Security, Latency, and Computational Cost in Blockchain-based Healthcare Systems

Mar 28, 2023

Zukai Li, Wei Tian, Jingjin Wu

Figure 1 for Modeling and Joint Optimization of Security, Latency, and Computational Cost in Blockchain-based Healthcare Systems

Figure 2 for Modeling and Joint Optimization of Security, Latency, and Computational Cost in Blockchain-based Healthcare Systems

Figure 3 for Modeling and Joint Optimization of Security, Latency, and Computational Cost in Blockchain-based Healthcare Systems

Figure 4 for Modeling and Joint Optimization of Security, Latency, and Computational Cost in Blockchain-based Healthcare Systems

Abstract:In the era of the Internet of Things (IoT), blockchain is a promising technology for improving the efficiency of healthcare systems, as it enables secure storage, management, and sharing of real-time health data collected by the IoT devices. As the implementations of blockchain-based healthcare systems usually involve multiple conflicting metrics, it is essential to balance them according to the requirements of specific scenarios. In this paper, we formulate a joint optimization model with three metrics, namely latency, security, and computational cost, that are particularly important for IoT-enabled healthcare. However, it is computationally intractable to identify the exact optimal solution of this problem for practical sized systems. Thus, we propose an algorithm called the Adaptive Discrete Particle Swarm Algorithm (ADPSA) to obtain near-optimal solutions in a low-complexity manner. With its roots in the classical Particle Swarm Optimization (PSO) algorithm, our proposed ADPSA can effectively manage the numerous binary and integer variables in the formulation. We demonstrate by extensive numerical experiments that the ADPSA consistently outperforms existing benchmark approaches, including the original PSO, exhaustive search and Simulated Annealing, in a wide range of scenarios.

Via

Access Paper or Ask Questions

EPro-PnP: Generalized End-to-End Probabilistic Perspective-n-Points for Monocular Object Pose Estimation

Mar 22, 2023

Hansheng Chen, Wei Tian, Pichao Wang, Fan Wang, Lu Xiong, Hao Li

Figure 1 for EPro-PnP: Generalized End-to-End Probabilistic Perspective-n-Points for Monocular Object Pose Estimation

Figure 2 for EPro-PnP: Generalized End-to-End Probabilistic Perspective-n-Points for Monocular Object Pose Estimation

Figure 3 for EPro-PnP: Generalized End-to-End Probabilistic Perspective-n-Points for Monocular Object Pose Estimation

Figure 4 for EPro-PnP: Generalized End-to-End Probabilistic Perspective-n-Points for Monocular Object Pose Estimation

Abstract:Locating 3D objects from a single RGB image via Perspective-n-Point (PnP) is a long-standing problem in computer vision. Driven by end-to-end deep learning, recent studies suggest interpreting PnP as a differentiable layer, allowing for partial learning of 2D-3D point correspondences by backpropagating the gradients of pose loss. Yet, learning the entire correspondences from scratch is highly challenging, particularly for ambiguous pose solutions, where the globally optimal pose is theoretically non-differentiable w.r.t. the points. In this paper, we propose the EPro-PnP, a probabilistic PnP layer for general end-to-end pose estimation, which outputs a distribution of pose with differentiable probability density on the SE(3) manifold. The 2D-3D coordinates and corresponding weights are treated as intermediate variables learned by minimizing the KL divergence between the predicted and target pose distribution. The underlying principle generalizes previous approaches, and resembles the attention mechanism. EPro-PnP can enhance existing correspondence networks, closing the gap between PnP-based method and the task-specific leaders on the LineMOD 6DoF pose estimation benchmark. Furthermore, EPro-PnP helps to explore new possibilities of network design, as we demonstrate a novel deformable correspondence network with the state-of-the-art pose accuracy on the nuScenes 3D object detection benchmark. Our code is available at https://github.com/tjiiv-cprg/EPro-PnP-v2.

* Code available at https://github.com/tjiiv-cprg/EPro-PnP-v2. arXiv admin note: substantial text overlap with arXiv:2203.13254

Via

Access Paper or Ask Questions

An Attention-based Long Short-Term Memory Framework for Detection of Bitcoin Scams

Oct 26, 2022

Puyang Zhao, Wei Tian, Lefu Xiao, Xinhui Liu, Jingjin Wu

Abstract:Bitcoin is the most common cryptocurrency involved in cyber scams. Cybercriminals often utilize pseudonymity and privacy protection mechanism associated with Bitcoin transactions to make their scams virtually untraceable. The Ponzi scheme has attracted particularly significant attention among Bitcoin fraudulent activities. This paper considers a multi-class classification problem to determine whether a transaction is involved in Ponzi schemes or other cyber scams, or is a non-scam transaction. We design a specifically designed crawler to collect data and propose a novel Attention-based Long Short-Term Memory (A-LSTM) method for the classification problem. The experimental results show that the proposed model has better efficiency and accuracy than existing approaches, including Random Forest, Extra Trees, Gradient Boosting, and classical LSTM. With correctly identified scam features, our proposed A-LSTM achieves an F1-score over 82% for the original data and outperforms the existing approaches.

Via

Access Paper or Ask Questions

Event-driven Two-stage Solution to Non-intrusive Load Monitoring

Jul 27, 2021

Lei Yan, Wei Tian, Jiayu Han, Zuyi Li

Figure 1 for Event-driven Two-stage Solution to Non-intrusive Load Monitoring

Figure 2 for Event-driven Two-stage Solution to Non-intrusive Load Monitoring

Figure 3 for Event-driven Two-stage Solution to Non-intrusive Load Monitoring

Figure 4 for Event-driven Two-stage Solution to Non-intrusive Load Monitoring

Abstract:Existing methods of non-intrusive load monitoring (NILM) in literatures generally suffer from high computational complexity and/or low accuracy in identifying working household appliances. This paper proposes an event-driven Factorial Hidden Markov model (eFHMM) for multiple appliances with multiple states in a household, aiming for low computational complexity and high load disaggregation accuracy. The proposed eFHMM decreases the computational complexity to be linear to the event number, which ensures online load disaggregation. Furthermore, the eFHMM is solved in two stages, where the first stage identifies state-changing appliance using transient signatures and the second stage confirms the inferred states using steady-state signatures. The combination of transient and steady-state signatures, which are extracted from transient and steady periods segmented by detected events, enhances the uniqueness of each state transition and associated appliances, which ensures accurate load disaggregation. The event-driven two-stage NILM solution, termed as eFHMM-TS, is naturally fit into an edge-cloud framework, which makes possible the real-world application of NILM. The proposed eFHMM-TS method is validated on the LIFTED and synD datasets. Results demonstrate that the eFHMM-TS method outperforms other methods and can be applied in practice.

* 14 pages, 17 figures

Via

Access Paper or Ask Questions

Adaptive Event Detection for Representative Load Signature Extraction

Jul 23, 2021

Lei Yan, Wei Tian, Jiayu Han, Zuyi Li

Figure 1 for Adaptive Event Detection for Representative Load Signature Extraction

Figure 2 for Adaptive Event Detection for Representative Load Signature Extraction

Figure 3 for Adaptive Event Detection for Representative Load Signature Extraction

Figure 4 for Adaptive Event Detection for Representative Load Signature Extraction

Abstract:Event detection is the first step in event-based non-intrusive load monitoring (NILM) and it can provide useful transient information to identify appliances. However, existing event detection methods with fixed parameters may fail in case of unpredictable and complicated residential load changes such as high fluctuation, long transition, and near simultaneity. This paper proposes a dynamic time-window approach to deal with these highly complex load variations. Specifically, a window with adaptive margins, multi-timescale window screening, and adaptive threshold (WAMMA) method is proposed to detect events in aggregated home appliance load data with high sampling rate (>1Hz). The proposed method accurately captures the transient process by adaptively tuning parameters including window width, margin width, and change threshold. Furthermore, representative transient and steady-state load signatures are extracted and, for the first time, quantified from transient and steady periods segmented by detected events. Case studies on a 20Hz dataset, the 50Hz LIFTED dataset, and the 60Hz BLUED dataset show that the proposed method can robustly outperform other state-of-art event detection methods. This paper also shows that the extracted load signatures can improve NILM accuracy and help develop other applications such as load reconstruction to generate realistic load data for NILM research.

* 10 pages, 18 figures

Via

Access Paper or Ask Questions