Abstract:Despite the emergence of numerous legal LLMs, a comprehensive benchmark for evaluating their legal abilities is still lacking. In this paper, we propose the first benchmark for Chinese legal LLMs based on legal capabilities. Through the collaborative efforts of legal and artificial intelligence experts, we divide the legal capabilities of LLMs into three levels: basic legal NLP capability, basic legal application capability, and complex legal application capability. We have completed the first phase of evaluation, which mainly focuses on basic legal NLP capability. The evaluation results show that although some legal LLMs perform better than their backbones, there is still a gap compared to ChatGPT. Our benchmark can be found at URL.
Abstract:3D-aware image synthesis encompasses a variety of tasks, such as scene generation and novel view synthesis from images. Despite numerous task-specific methods, developing a comprehensive model remains challenging. In this paper, we present SSDNeRF, a unified approach that employs an expressive diffusion model to learn a generalizable prior of neural radiance fields (NeRF) from multi-view images of diverse objects. Previous studies have used two-stage approaches that rely on pretrained NeRFs as real data to train diffusion models. In contrast, we propose a new single-stage training paradigm with an end-to-end objective that jointly optimizes a NeRF auto-decoder and a latent diffusion model, enabling simultaneous 3D reconstruction and prior learning, even from sparsely available views. At test time, we can directly sample the diffusion prior for unconditional generation, or combine it with arbitrary observations of unseen objects for NeRF reconstruction. SSDNeRF demonstrates robust results comparable to or better than leading task-specific methods in unconditional generation and single/sparse-view 3D reconstruction.
Abstract:In the era of the Internet of Things (IoT), blockchain is a promising technology for improving the efficiency of healthcare systems, as it enables secure storage, management, and sharing of real-time health data collected by IoT devices. As implementations of blockchain-based healthcare systems usually involve multiple conflicting metrics, it is essential to balance them according to the requirements of specific scenarios. In this paper, we formulate a joint optimization model with three metrics, namely latency, security, and computational cost, that are particularly important for IoT-enabled healthcare. However, it is computationally intractable to identify the exact optimal solution of this problem for practical-sized systems. Thus, we propose an algorithm called the Adaptive Discrete Particle Swarm Algorithm (ADPSA) to obtain near-optimal solutions with low complexity. With its roots in the classical Particle Swarm Optimization (PSO) algorithm, our proposed ADPSA can effectively manage the numerous binary and integer variables in the formulation. We demonstrate through extensive numerical experiments that the ADPSA consistently outperforms existing benchmark approaches, including the original PSO, exhaustive search, and Simulated Annealing, in a wide range of scenarios.
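To illustrate how a particle swarm can handle binary decision variables at all, here is a minimal sketch of the classical binary PSO (velocities mapped through a sigmoid to bit probabilities). It is not the ADPSA described in the abstract above, whose adaptive mechanism is not specified there; the function name `binary_pso`, its parameters, and the toy cost function are all assumptions made for illustration.

```python
import math
import random


def binary_pso(cost, n_bits, n_particles=20, n_iters=100, seed=0,
               inertia=0.7, c1=1.5, c2=1.5, v_max=4.0):
    """Minimize `cost` over bit vectors with a classical binary PSO.

    Illustrative sketch only: parameter names and defaults are assumptions,
    not values taken from the abstract.
    """
    rng = random.Random(seed)  # seeded so that runs are reproducible
    pos = [[rng.randint(0, 1) for _ in range(n_bits)]
           for _ in range(n_particles)]
    vel = [[0.0] * n_bits for _ in range(n_particles)]

    # Personal bests start at the initial positions.
    pbest = [p[:] for p in pos]
    pbest_cost = [cost(p) for p in pos]

    # Global best is the best of the personal bests.
    g = min(range(n_particles), key=pbest_cost.__getitem__)
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_bits):
                # Standard velocity update pulled toward both bests.
                v = (inertia * vel[i][d]
                     + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                     + c2 * rng.random() * (gbest[d] - pos[i][d]))
                v = max(-v_max, min(v_max, v))  # clamp to avoid saturation
                vel[i][d] = v
                # Sigmoid turns the velocity into P(bit == 1).
                prob_one = 1.0 / (1.0 + math.exp(-v))
                pos[i][d] = 1 if rng.random() < prob_one else 0

            c = cost(pos[i])
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = pos[i][:], c
                if c < gbest_cost:
                    gbest, gbest_cost = pos[i][:], c

    return gbest, gbest_cost


if __name__ == "__main__":
    # Toy cost: Hamming distance to a hypothetical target configuration.
    target = [1, 0, 1, 1, 0, 0, 1, 0]
    hamming = lambda bits: sum(b != t for b, t in zip(bits, target))
    print(binary_pso(hamming, len(target)))
```

In a setting like the one in the abstract, the cost function would combine latency, security, and computational cost; how those terms are weighted is a modeling choice that this sketch does not attempt to reproduce.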
Abstract:Locating 3D objects from a single RGB image via Perspective-n-Point (PnP) is a long-standing problem in computer vision. Driven by end-to-end deep learning, recent studies suggest interpreting PnP as a differentiable layer, allowing for partial learning of 2D-3D point correspondences by backpropagating the gradients of pose loss. Yet, learning the entire set of correspondences from scratch is highly challenging, particularly for ambiguous pose solutions, where the globally optimal pose is theoretically non-differentiable w.r.t. the points. In this paper, we propose EPro-PnP, a probabilistic PnP layer for general end-to-end pose estimation, which outputs a distribution of pose with differentiable probability density on the SE(3) manifold. The 2D-3D coordinates and corresponding weights are treated as intermediate variables learned by minimizing the KL divergence between the predicted and target pose distributions. The underlying principle generalizes previous approaches and resembles the attention mechanism. EPro-PnP can enhance existing correspondence networks, closing the gap between PnP-based methods and the task-specific leaders on the LineMOD 6DoF pose estimation benchmark. Furthermore, EPro-PnP helps to explore new possibilities of network design, as we demonstrate a novel deformable correspondence network with state-of-the-art pose accuracy on the nuScenes 3D object detection benchmark. Our code is available at https://github.com/tjiiv-cprg/EPro-PnP-v2.
Abstract:Bitcoin is the most common cryptocurrency involved in cyber scams. Cybercriminals often exploit the pseudonymity and privacy protection mechanisms associated with Bitcoin transactions to make their scams virtually untraceable. Among fraudulent Bitcoin activities, the Ponzi scheme has attracted particular attention. This paper considers a multi-class classification problem to determine whether a transaction is involved in Ponzi schemes or other cyber scams, or is a non-scam transaction. We build a dedicated crawler to collect data and propose a novel Attention-based Long Short-Term Memory (A-LSTM) method for the classification problem. The experimental results show that the proposed model has better efficiency and accuracy than existing approaches, including Random Forest, Extra Trees, Gradient Boosting, and classical LSTM. With correctly identified scam features, our proposed A-LSTM achieves an F1-score over 82% for the original data and outperforms the existing approaches.
Abstract:Existing methods of non-intrusive load monitoring (NILM) in the literature generally suffer from high computational complexity and/or low accuracy in identifying working household appliances. This paper proposes an event-driven Factorial Hidden Markov model (eFHMM) for multiple appliances with multiple states in a household, aiming for low computational complexity and high load disaggregation accuracy. The proposed eFHMM reduces the computational complexity to linear in the number of events, which enables online load disaggregation. Furthermore, the eFHMM is solved in two stages: the first stage identifies the state-changing appliance using transient signatures, and the second stage confirms the inferred states using steady-state signatures. The combination of transient and steady-state signatures, which are extracted from the transient and steady periods segmented by detected events, enhances the uniqueness of each state transition and its associated appliance, which ensures accurate load disaggregation. The event-driven two-stage NILM solution, termed eFHMM-TS, fits naturally into an edge-cloud framework, which makes real-world application of NILM possible. The proposed eFHMM-TS method is validated on the LIFTED and synD datasets. Results demonstrate that the eFHMM-TS method outperforms other methods and can be applied in practice.
Abstract:Event detection is the first step in event-based non-intrusive load monitoring (NILM), and it can provide useful transient information to identify appliances. However, existing event detection methods with fixed parameters may fail under unpredictable and complicated residential load changes such as high fluctuation, long transition, and near simultaneity. This paper proposes a dynamic time-window approach to deal with these highly complex load variations. Specifically, a window with adaptive margins, multi-timescale window screening, and adaptive threshold (WAMMA) method is proposed to detect events in aggregated home appliance load data with a high sampling rate (>1Hz). The proposed method accurately captures the transient process by adaptively tuning parameters including window width, margin width, and change threshold. Furthermore, representative transient and steady-state load signatures are extracted and, for the first time, quantified from transient and steady periods segmented by detected events. Case studies on a 20Hz dataset, the 50Hz LIFTED dataset, and the 60Hz BLUED dataset show that the proposed method robustly outperforms other state-of-the-art event detection methods. This paper also shows that the extracted load signatures can improve NILM accuracy and help develop other applications such as load reconstruction to generate realistic load data for NILM research.
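For contrast with the adaptive approach described above, the following is a minimal sketch of the kind of fixed-parameter event detector that the abstract says can fail on complicated load changes. It is not the WAMMA method; the function name `detect_events`, the parameters `window` and `threshold`, and the synthetic power traces are assumptions made for illustration.

```python
from statistics import mean


def detect_events(power, window=3, threshold=50.0):
    """Detect step changes in an aggregate power signal.

    For every index i, compare the mean of the `window` samples before i
    with the mean of the `window` samples starting at i. Within each run of
    consecutive indices whose difference exceeds `threshold`, report the
    index with the largest difference as one event.
    """
    events = []
    best_idx, best_delta = None, 0.0
    for i in range(window, len(power) - window + 1):
        before = mean(power[i - window:i])
        after = mean(power[i:i + window])
        delta = abs(after - before)
        if delta > threshold:
            # Still inside a candidate event: keep the sharpest point.
            if delta > best_delta:
                best_idx, best_delta = i, delta
        elif best_idx is not None:
            # The run has ended: commit the event and reset.
            events.append(best_idx)
            best_idx, best_delta = None, 0.0
    if best_idx is not None:
        events.append(best_idx)
    return events


if __name__ == "__main__":
    # Synthetic trace: an appliance switches on at sample 10 and off at 20.
    trace = [0] * 10 + [200] * 10 + [0] * 10
    print(detect_events(trace))
```

Because `window` and `threshold` are fixed here, a slow ramp or two near-simultaneous switches would be merged or missed, which is the limitation that motivates adapting these parameters to the signal.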
Abstract:Object localization in 3D space is a challenging aspect of monocular 3D object detection. Recent advances in 6DoF pose estimation have shown that predicting dense 2D-3D correspondence maps between the image and the object 3D model and then estimating object pose via the Perspective-n-Point (PnP) algorithm can achieve remarkable localization accuracy. Yet these methods rely on training with ground truth of object geometry, which is difficult to acquire in real outdoor scenes. To address this issue, we propose MonoRUn, a novel detection framework that learns dense correspondences and geometry in a self-supervised manner, with simple 3D bounding box annotations. To regress the pixel-related 3D object coordinates, we employ a regional reconstruction network with uncertainty awareness. For self-supervised training, the predicted 3D coordinates are projected back to the image plane. A Robust KL loss is proposed to minimize the uncertainty-weighted reprojection error. During the testing phase, we exploit the network uncertainty by propagating it through all downstream modules. More specifically, the uncertainty-driven PnP algorithm is leveraged to estimate object pose and its covariance. Extensive experiments demonstrate that our proposed approach outperforms current state-of-the-art methods on the KITTI benchmark.
Abstract:Accurate detection of lane and road markings is a task of great importance for intelligent vehicles. In existing approaches, the detection accuracy often degrades with increasing distance. This is because distant lane and road markings occupy a small number of pixels in the image, and the scales of lane and road markings are inconsistent at various distances and perspectives. Inverse Perspective Mapping (IPM) can be used to eliminate the perspective distortion, but the inherent interpolation can lead to artifacts, especially around distant lane and road markings, and thus has a negative impact on the accuracy of lane marking detection and segmentation. To solve this problem, we adopt the Encoder-Decoder architecture in Fully Convolutional Networks and leverage the idea of Spatial Transformer Networks to introduce a novel semantic segmentation neural network. This approach decomposes the IPM process into multiple consecutive differentiable homographic transform layers, which are called "Perspective Transformer Layers". Furthermore, the interpolated feature map is refined by subsequent convolutional layers, thus reducing the artifacts and improving the accuracy. The effectiveness of the proposed method in lane marking detection is validated on two public datasets: TuSimple and ApolloScape.
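The idea of splitting one perspective mapping into several consecutive homographic transforms rests on a basic property: applying homographies one after another is equivalent to applying their matrix product once. The sketch below checks this on 2D points in plain Python. It does not reproduce the network layers from the abstract above; the helper names `apply_homography` and `compose` and the example matrices are assumptions made for illustration.

```python
def apply_homography(H, points):
    """Map 2D points through a 3x3 homography H (row-major nested lists)."""
    mapped = []
    for x, y in points:
        xh = H[0][0] * x + H[0][1] * y + H[0][2]
        yh = H[1][0] * x + H[1][1] * y + H[1][2]
        w = H[2][0] * x + H[2][1] * y + H[2][2]
        # Divide by the homogeneous coordinate to return to the image plane.
        mapped.append((xh / w, yh / w))
    return mapped


def compose(A, B):
    """Return the 3x3 product A @ B, i.e. 'apply B first, then A'."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


if __name__ == "__main__":
    translate = [[1, 0, 2], [0, 1, 3], [0, 0, 1]]   # shift by (2, 3)
    scale = [[2, 0, 0], [0, 2, 0], [0, 0, 1]]       # scale by 2
    pts = [(1, 1)]

    # Two consecutive steps...
    step_by_step = apply_homography(scale, apply_homography(translate, pts))
    # ...match a single combined homography.
    combined = apply_homography(compose(scale, translate), pts)
    print(step_by_step, combined)
```

In a segmentation network the same mapping would be applied to feature maps with interpolation rather than to isolated points, which is where the artifacts mentioned in the abstract arise.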
Abstract:Imitation learning for end-to-end autonomous driving has drawn attention from the academic community. Current methods either use only images as input, which is ambiguous when a car approaches an intersection, or use additional command information to navigate the vehicle but are not automated enough. Focusing on making the vehicle drive along a given path, we propose a new navigation command that does not require human participation and a novel model architecture called the angle branched network. Both the new navigation command and the angle branched network are easy to understand and effective. In addition, we find that not only segmentation information but also depth information can boost the performance of the driving model. We conduct experiments in a 3D urban simulator, and both qualitative and quantitative evaluation results show the effectiveness of our model.