Abstract:Large language models (LLMs) typically employ greedy decoding or low-temperature sampling for reasoning tasks, reflecting a perceived trade-off between diversity and accuracy. We challenge this convention by introducing top-$n\sigma$, a novel sampling method that operates directly on pre-softmax logits by leveraging a statistical threshold. Our key insight is that logits naturally separate into a Gaussian-distributed noisy region and a distinct informative region, enabling efficient token filtering without complex probability manipulations. Unlike existing methods (e.g., top-$p$, min-$p$) that inadvertently include more noise tokens at higher temperatures, top-$n\sigma$ maintains a stable sampling space regardless of temperature scaling. We also provide a theoretical analysis of top-$n\sigma$ to better understand its behavior. The extensive experimental results across four reasoning-focused datasets demonstrate that our method not only outperforms existing sampling approaches but also surpasses greedy decoding, while maintaining consistent performance even at high temperatures.
Abstract:Transparent object depth perception poses a challenge in everyday life and logistics, primarily due to the inability of standard 3D sensors to accurately capture depth on transparent or reflective surfaces. This limitation significantly affects depth map and point cloud-reliant applications, especially in robotic manipulation. We developed a vision transformer-based algorithm for stereo depth recovery of transparent objects. This approach is complemented by an innovative feature post-fusion module, which enhances the accuracy of depth recovery by structural features in images. To address the high costs associated with dataset collection for stereo camera-based perception of transparent objects, our method incorporates a parameter-aligned, domain-adaptive, and physically realistic Sim2Real simulation for efficient data generation, accelerated by AI algorithm. Our experimental results demonstrate the model's exceptional Sim2Real generalizability in real-world scenarios, enabling precise depth mapping of transparent objects to assist in robotic manipulation. Project details are available at https://sites.google.com/view/cleardepth/ .
Abstract:Recently, federated learning (FL) has emerged as a popular technique for edge AI to mine valuable knowledge in edge computing (EC) systems. To mitigate the computing/communication burden on resource-constrained workers and protect model privacy, split federated learning (SFL) has been released by integrating both data and model parallelism. Despite resource limitations, SFL still faces two other critical challenges in EC, i.e., statistical heterogeneity and system heterogeneity. To address these challenges, we propose a novel SFL framework, termed MergeSFL, by incorporating feature merging and batch size regulation in SFL. Concretely, feature merging aims to merge the features from workers into a mixed feature sequence, which is approximately equivalent to the features derived from IID data and is employed to promote model accuracy. While batch size regulation aims to assign diverse and suitable batch sizes for heterogeneous workers to improve training efficiency. Moreover, MergeSFL explores to jointly optimize these two strategies upon their coupled relationship to better enhance the performance of SFL. Extensive experiments are conducted on a physical platform with 80 NVIDIA Jetson edge devices, and the experimental results show that MergeSFL can improve the final model accuracy by 5.82% to 26.22%, with a speedup by about 1.74x to 4.14x, compared to the baselines.
Abstract:Federated Learning (FL) has emerged to allow multiple clients to collaboratively train machine learning models on their private data. However, training and deploying large models for broader applications is challenging in resource-constrained environments. Fortunately, Split Federated Learning (SFL) offers an excellent solution by alleviating the computation and communication burden on the clients SFL often assumes labeled data for local training on clients, however, it is not the case in practice.Prior works have adopted semi-supervised techniques for leveraging unlabeled data in FL, but data non-IIDness poses another challenge to ensure training efficiency. Herein, we propose Pseudo-Clustering Semi-SFL, a novel system for training models in scenarios where labeled data reside on the server. By introducing Clustering Regularization, model performance under data non-IIDness can be improved. Besides, our theoretical and experimental investigations into model convergence reveal that the inconsistent training processes on labeled and unlabeled data impact the effectiveness of clustering regularization. Upon this, we develop a control algorithm for global updating frequency adaptation, which dynamically adjusts the number of supervised training iterations to mitigate the training inconsistency. Extensive experiments on benchmark models and datasets show that our system provides a 3.3x speed-up in training time and reduces the communication cost by about 80.1% while reaching the target accuracy, and achieves up to 6.9% improvement in accuracy under non-IID scenarios compared to the state-of-the-art.
Abstract:Federated learning (FL) allows multiple clients cooperatively train models without disclosing local data. However, the existing works fail to address all these practical concerns in FL: limited communication resources, dynamic network conditions and heterogeneous client properties, which slow down the convergence of FL. To tackle the above challenges, we propose a heterogeneity-aware FL framework, called FedCG, with adaptive client selection and gradient compression. Specifically, the parameter server (PS) selects a representative client subset considering statistical heterogeneity and sends the global model to them. After local training, these selected clients upload compressed model updates matching their capabilities to the PS for aggregation, which significantly alleviates the communication load and mitigates the straggler effect. We theoretically analyze the impact of both client selection and gradient compression on convergence performance. Guided by the derived convergence rate, we develop an iteration-based algorithm to jointly optimize client selection and compression ratio decision using submodular maximization and linear programming. Extensive experiments on both real-world prototypes and simulations show that FedCG can provide up to 5.3$\times$ speedup compared to other methods.
Abstract:Voting plays a central role in bringing crowd wisdom to collective decision making, meanwhile data privacy has been a common ethical/legal issue in eliciting preferences from individuals. This work studies the problem of aggregating individual's voting data under the local differential privacy setting, where usefulness and soundness of the aggregated scores are of major concern. One naive approach to the problem is adding Laplace random noises, however, it makes aggregated scores extremely fragile to new types of strategic behaviors tailored to the local privacy setting: data amplification attack and view disguise attack. The data amplification attack means an attacker's manipulation power is amplified by the privacy-preserving procedure when contributing a fraud vote. The view disguise attack happens when an attacker could disguise malicious data as valid private views to manipulate the voting result. In this work, after theoretically quantifying the estimation error bound and the manipulating risk bound of the Laplace mechanism, we propose two mechanisms improving the usefulness and soundness simultaneously: the weighted sampling mechanism and the additive mechanism. The former one interprets the score vector as probabilistic data. Compared to the Laplace mechanism for Borda voting rule with $d$ candidates, it reduces the mean squared error bound by half and lowers the maximum magnitude risk bound from $+\infty$ to $O(\frac{d^3}{n\epsilon})$. The latter one randomly outputs a subset of candidates according to their total scores. Its mean squared error bound is optimized from $O(\frac{d^5}{n\epsilon^2})$ to $O(\frac{d^4}{n\epsilon^2})$, and its maximum magnitude risk bound is reduced to $O(\frac{d^2}{n\epsilon})$. Experimental results validate that our proposed approaches averagely reduce estimation error by $50\%$ and are more robust to adversarial attacks.
Abstract:Autonomous underwater vehicles (AUVs) have been deployed for underwater exploration. However, its potential is confined by its limited on-board battery energy and data storage capacity. This problem has been addressed using docking systems by underwater recharging and data transfer for AUVs. In this work, we propose a vision based framework for underwater docking following these systems. The proposed framework comprises two modules; (i) a detection module which provides location information on underwater docking stations in 2D images captured by an on-board camera, and (ii) a pose estimation module which recovers the relative 3D position and orientation between docking stations and AUVs from the 2D images. For robust and credible detection of docking stations, we propose a convolutional neural network called Docking Neural Network (DoNN). For accurate pose estimation, a perspective-n-point algorithm is integrated into our framework. In order to examine our framework in underwater docking tasks, we collected a dataset of 2D images, named Underwater Docking Images Dataset (UDID), in an experimental water pool. To the best of our knowledge, UDID is the first publicly available underwater docking dataset. In the experiments, we first evaluate performance of the proposed detection module on UDID and its deformed variations. Next, we assess the accuracy of the pose estimation module by ground experiments, since it is not feasible to obtain true relative position and orientation between docking stations and AUVs under water. Then, we examine the pose estimation module by underwater experiments in our experimental water pool. Experimental results show that the proposed framework can be used to detect docking stations and estimate their relative pose efficiently and successfully, compared to the state-of-the-art baseline systems.
Abstract:Ensemble learning has been widely employed by mobile applications, ranging from environmental sensing to activity recognitions. One of the fundamental issue in ensemble learning is the trade-off between classification accuracy and computational costs, which is the goal of ensemble pruning. During crowdsourcing, the centralized aggregator releases ensemble learning models to a large number of mobile participants for task evaluation or as the crowdsourcing learning results, while different participants may seek for different levels of the accuracy-cost trade-off. However, most of existing ensemble pruning approaches consider only one identical level of such trade-off. In this study, we present an efficient ensemble pruning framework for personalized accuracy-cost trade-offs via multi-objective optimization. Specifically, for the commonly used linear-combination style of the trade-off, we provide an objective-mixture optimization to further reduce the number of ensemble candidates. Experimental results show that our framework is highly efficient for personalized ensemble pruning, and achieves much better pruning performance with objective-mixture optimization when compared to state-of-art approaches.