Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Xinming Wang

Four Eyes Are Better Than Two: Harnessing the Collaborative Potential of Large Models via Differentiated Thinking and Complementary Ensembles

May 22, 2025

Jun Xie, Xiongjun Guan, Yingjian Zhu, Zhaoran Zhao, Xinming Wang, Feng Chen, Zhepeng Wang

Abstract:In this paper, we present the runner-up solution for the Ego4D EgoSchema Challenge at CVPR 2025 (Confirmed on May 20, 2025). Inspired by the success of large models, we evaluate and leverage leading accessible multimodal large models and adapt them to video understanding tasks via few-shot learning and model ensemble strategies. Specifically, diversified prompt styles and process paradigms are systematically explored and evaluated to effectively guide the attention of large models, fully unleashing their powerful generalization and adaptability abilities. Experimental results demonstrate that, with our carefully designed approach, directly utilizing an individual multimodal model already outperforms the previous state-of-the-art (SOTA) method which includes several additional processes. Besides, an additional stage is further introduced that facilitates the cooperation and ensemble of periodic results, which achieves impressive performance improvements. We hope this work serves as a valuable reference for the practical application of large models and inspires future research in the field.

Via

Access Paper or Ask Questions

Local transfer learning Gaussian process modeling, with applications to surrogate modeling of expensive computer simulators

Oct 17, 2024

Xinming Wang, Simon Mak, John Miller, Jianguo Wu

Figure 1 for Local transfer learning Gaussian process modeling, with applications to surrogate modeling of expensive computer simulators

Figure 2 for Local transfer learning Gaussian process modeling, with applications to surrogate modeling of expensive computer simulators

Figure 3 for Local transfer learning Gaussian process modeling, with applications to surrogate modeling of expensive computer simulators

Figure 4 for Local transfer learning Gaussian process modeling, with applications to surrogate modeling of expensive computer simulators

Abstract:A critical bottleneck for scientific progress is the costly nature of computer simulations for complex systems. Surrogate models provide an appealing solution: such models are trained on simulator evaluations, then used to emulate and quantify uncertainty on the expensive simulator at unexplored inputs. In many applications, one often has available data on related systems. For example, in designing a new jet turbine, there may be existing studies on turbines with similar configurations. A key question is how information from such "source" systems can be transferred for effective surrogate training on the "target" system of interest. We thus propose a new LOcal transfer Learning Gaussian Process (LOL-GP) model, which leverages a carefully-designed Gaussian process to transfer such information for surrogate modeling. The key novelty of the LOL-GP is a latent regularization model, which identifies regions where transfer should be performed and regions where it should be avoided. This "local transfer" property is desirable in scientific systems: at certain parameters, such systems may behave similarly and thus transfer is beneficial; at other parameters, they may behave differently and thus transfer is detrimental. By accounting for local transfer, the LOL-GP can rectify a critical limitation of "negative transfer" in existing transfer learning models, where the transfer of information worsens predictive performance. We derive a Gibbs sampling algorithm for efficient posterior predictive sampling on the LOL-GP, for both the multi-source and multi-fidelity transfer settings. We then show, via a suite of numerical experiments and an application for jet turbine design, the improved surrogate performance of the LOL-GP over existing methods.

Via

Access Paper or Ask Questions

Guarding Force: Safety-Critical Compliant Control for Robot-Environment Interaction

May 08, 2024

Xinming Wang, Jun Yang, Jianliang Mao, Jinzhuo Liang, Shihua Li, Yunda Yan

Abstract:In this study, we propose a safety-critical compliant control strategy designed to strictly enforce interaction force constraints during the physical interaction of robots with unknown environments. The interaction force constraint is interpreted as a new force-constrained control barrier function (FC-CBF) by exploiting the generalized contact model and the prior information of the environment, i.e., the prior stiffness and rest position, for robot kinematics. The difference between the real environment and the generalized contact model is approximated by constructing a tracking differentiator, and its estimation error is quantified based on Lyapunov theory. By interpreting strict interaction safety specification as a dynamic constraint, restricting the desired joint angular rates in kinematics, the proposed approach modifies nominal compliant controllers using quadratic programming, ensuring adherence to interaction force constraints in unknown environments. The strict force constraint and the stability of the closed-loop system are rigorously analyzed. Experimental tests using a UR3e industrial robot with different environments verify the effectiveness of the proposed method in achieving the force constraints in unknown environments.

Via

Access Paper or Ask Questions

Multi-Attribute Enhancement Network for Person Search

Feb 16, 2021

Lequan Chen, Wei Xie, Zhigang Tu, Yaping Tao, Xinming Wang

Figure 1 for Multi-Attribute Enhancement Network for Person Search

Figure 2 for Multi-Attribute Enhancement Network for Person Search

Figure 3 for Multi-Attribute Enhancement Network for Person Search

Figure 4 for Multi-Attribute Enhancement Network for Person Search

Abstract:Person Search is designed to jointly solve the problems of Person Detection and Person Re-identification (Re-ID), in which the target person will be located in a large number of uncut images. Over the past few years, Person Search based on deep learning has made great progress. Visual character attributes play a key role in retrieving the query person, which has been explored in Re-ID but has been ignored in Person Search. So, we introduce attribute learning into the model, allowing the use of attribute features for retrieval task. Specifically, we propose a simple and effective model called Multi-Attribute Enhancement (MAE) which introduces attribute tags to learn local features. In addition to learning the global representation of pedestrians, it also learns the local representation, and combines the two aspects to learn robust features to promote the search performance. Additionally, we verify the effectiveness of our module on the existing benchmark dataset, CUHK-SYSU and PRW. Ultimately, our model achieves state-of-the-art among end-to-end methods, especially reaching 91.8% of mAP and 93.0% of rank-1 on CUHK-SYSU. Codes and models are available at https://github.com/chenlq123/MAE.

Via

Access Paper or Ask Questions