Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Ting Li

DeepSuM: Deep Sufficient Modality Learning Framework

Mar 03, 2025

Zhe Gao, Jian Huang, Ting Li, Xueqin Wang

Abstract:Multimodal learning has become a pivotal approach in developing robust learning models with applications spanning multimedia, robotics, large language models, and healthcare. The efficiency of multimodal systems is a critical concern, given the varying costs and resource demands of different modalities. This underscores the necessity for effective modality selection to balance performance gains against resource expenditures. In this study, we propose a novel framework for modality selection that independently learns the representation of each modality. This approach allows for the assessment of each modality's significance within its unique representation space, enabling the development of tailored encoders and facilitating the joint analysis of modalities with distinct characteristics. Our framework aims to enhance the efficiency and effectiveness of multimodal learning by optimizing modality integration and selection.

Via

Access Paper or Ask Questions

Baichuan-Omni-1.5 Technical Report

Jan 26, 2025

Yadong Li, Jun Liu, Tao Zhang, Song Chen, Tianpeng Li, Zehuan Li, Lijun Liu, Lingfeng Ming, Guosheng Dong, Da Pan(+82 more)

Abstract:We introduce Baichuan-Omni-1.5, an omni-modal model that not only has omni-modal understanding capabilities but also provides end-to-end audio generation capabilities. To achieve fluent and high-quality interaction across modalities without compromising the capabilities of any modality, we prioritized optimizing three key aspects. First, we establish a comprehensive data cleaning and synthesis pipeline for multimodal data, obtaining about 500B high-quality data (text, audio, and vision). Second, an audio-tokenizer (Baichuan-Audio-Tokenizer) has been designed to capture both semantic and acoustic information from audio, enabling seamless integration and enhanced compatibility with MLLM. Lastly, we designed a multi-stage training strategy that progressively integrates multimodal alignment and multitask fine-tuning, ensuring effective synergy across all modalities. Baichuan-Omni-1.5 leads contemporary models (including GPT4o-mini and MiniCPM-o 2.6) in terms of comprehensive omni-modal capabilities. Notably, it achieves results comparable to leading models such as Qwen2-VL-72B across various multimodal medical benchmarks.

Via

Access Paper or Ask Questions

Two-way Node Popularity Model for Directed and Bipartite Networks

Dec 11, 2024

Bing-Yi Jing, Ting Li, Jiangzhou Wang, Ya Wang

Abstract:There has been extensive research on community detection in directed and bipartite networks. However, these studies often fail to consider the popularity of nodes in different communities, which is a common phenomenon in real-world networks. To address this issue, we propose a new probabilistic framework called the Two-Way Node Popularity Model (TNPM). The TNPM also accommodates edges from different distributions within a general sub-Gaussian family. We introduce the Delete-One-Method (DOM) for model fitting and community structure identification, and provide a comprehensive theoretical analysis with novel technical skills dealing with sub-Gaussian generalization. Additionally, we propose the Two-Stage Divided Cosine Algorithm (TSDC) to handle large-scale networks more efficiently. Our proposed methods offer multi-folded advantages in terms of estimation accuracy and computational efficiency, as demonstrated through extensive numerical studies. We apply our methods to two real-world applications, uncovering interesting findings.

Via

Access Paper or Ask Questions

ActiveSplat: High-Fidelity Scene Reconstruction through Active Gaussian Splatting

Oct 29, 2024

Yuetao Li, Zijia Kuang, Ting Li, Guyue Zhou, Shaohui Zhang, Zike Yan

Abstract:We propose ActiveSplat, an autonomous high-fidelity reconstruction system leveraging Gaussian splatting. Taking advantage of efficient and realistic rendering, the system establishes a unified framework for online mapping, viewpoint selection, and path planning. The key to ActiveSplat is a hybrid map representation that integrates both dense information about the environment and a sparse abstraction of the workspace. Therefore, the system leverages sparse topology for efficient viewpoint sampling and path planning, while exploiting view-dependent dense prediction for viewpoint selection, facilitating efficient decision-making with promising accuracy and completeness. A hierarchical planning strategy based on the topological map is adopted to mitigate repetitive trajectories and improve local granularity given limited budgets, ensuring high-fidelity reconstruction with photorealistic view synthesis. Extensive experiments and ablation studies validate the efficacy of the proposed method in terms of reconstruction accuracy, data coverage, and exploration efficiency. Project page: https://li-yuetao.github.io/ActiveSplat/.

Via

Access Paper or Ask Questions

Bayesian Power Steering: An Effective Approach for Domain Adaptation of Diffusion Models

Jun 06, 2024

Ding Huang, Ting Li, Jian Huang

Figure 1 for Bayesian Power Steering: An Effective Approach for Domain Adaptation of Diffusion Models

Figure 2 for Bayesian Power Steering: An Effective Approach for Domain Adaptation of Diffusion Models

Figure 3 for Bayesian Power Steering: An Effective Approach for Domain Adaptation of Diffusion Models

Figure 4 for Bayesian Power Steering: An Effective Approach for Domain Adaptation of Diffusion Models

Abstract:We propose a Bayesian framework for fine-tuning large diffusion models with a novel network structure called Bayesian Power Steering (BPS). We clarify the meaning behind adaptation from a \textit{large probability space} to a \textit{small probability space} and explore the task of fine-tuning pre-trained models using learnable modules from a Bayesian perspective. BPS extracts task-specific knowledge from a pre-trained model's learned prior distribution. It efficiently leverages large diffusion models, differentially intervening different hidden features with a head-heavy and foot-light configuration. Experiments highlight the superiority of BPS over contemporary methods across a range of tasks even with limited amount of data. Notably, BPS attains an FID score of 10.49 under the sketch condition on the COCO17 dataset.

* 25 pages, 26 figures, and 4 tables

Via

Access Paper or Ask Questions

Combining Experimental and Historical Data for Policy Evaluation

Jun 01, 2024

Ting Li, Chengchun Shi, Qianglin Wen, Yang Sui, Yongli Qin, Chunbo Lai, Hongtu Zhu

Abstract:This paper studies policy evaluation with multiple data sources, especially in scenarios that involve one experimental dataset with two arms, complemented by a historical dataset generated under a single control arm. We propose novel data integration methods that linearly integrate base policy value estimators constructed based on the experimental and historical data, with weights optimized to minimize the mean square error (MSE) of the resulting combined estimator. We further apply the pessimistic principle to obtain more robust estimators, and extend these developments to sequential decision making. Theoretically, we establish non-asymptotic error bounds for the MSEs of our proposed estimators, and derive their oracle, efficiency and robustness properties across a broad spectrum of reward shift scenarios. Numerical experiments and real-data-based analyses from a ridesharing company demonstrate the superior performance of the proposed estimators.

Via

Access Paper or Ask Questions

Style Adaptation for Domain-adaptive Semantic Segmentation

Apr 25, 2024

Ting Li, Jianshu Chao, Deyu An

Abstract:Unsupervised Domain Adaptation (UDA) refers to the method that utilizes annotated source domain data and unlabeled target domain data to train a model capable of generalizing to the target domain data. Domain discrepancy leads to a significant decrease in the performance of general network models trained on the source domain data when applied to the target domain. We introduce a straightforward approach to mitigate the domain discrepancy, which necessitates no additional parameter calculations and seamlessly integrates with self-training-based UDA methods. Through the transfer of the target domain style to the source domain in the latent feature space, the model is trained to prioritize the target domain style during the decision-making process. We tackle the problem at both the image-level and shallow feature map level by transferring the style information from the target domain to the source domain data. As a result, we obtain a model that exhibits superior performance on the target domain. Our method yields remarkable enhancements in the state-of-the-art performance for synthetic-to-real UDA tasks. For example, our proposed method attains a noteworthy UDA performance of 76.93 mIoU on the GTA->Cityscapes dataset, representing a notable improvement of +1.03 percentage points over the previous state-of-the-art results.

Via

Access Paper or Ask Questions

MonoPCC: Photometric-invariant Cycle Constraint for Monocular Depth Estimation of Endoscopic Images

Apr 25, 2024

Zhiwei Wang, Ying Zhou, Shiquan He, Ting Li, Yitong Zhang, Xinxia Feng, Mei Liu, Qiang Li

Abstract:Photometric constraint is indispensable for self-supervised monocular depth estimation. It involves warping a source image onto a target view using estimated depth&pose, and then minimizing the difference between the warped and target images. However, the endoscopic built-in light causes significant brightness fluctuations, and thus makes the photometric constraint unreliable. Previous efforts only mitigate this relying on extra models to calibrate image brightness. In this paper, we propose MonoPCC to address the brightness inconsistency radically by reshaping the photometric constraint into a cycle form. Instead of only warping the source image, MonoPCC constructs a closed loop consisting of two opposite forward-backward warping paths: from target to source and then back to target. Thus, the target image finally receives an image cycle-warped from itself, which naturally makes the constraint invariant to brightness changes. Moreover, MonoPCC transplants the source image's phase-frequency into the intermediate warped image to avoid structure lost, and also stabilizes the training via an exponential moving average (EMA) strategy to avoid frequent changes in the forward warping. The comprehensive and extensive experimental results on three datasets demonstrate that our proposed MonoPCC shows a great robustness to the brightness inconsistency, and exceeds other state-of-the-arts by reducing the absolute relative error by at least 7.27%.

* 9 pages, 7 figures

Via

Access Paper or Ask Questions

MonoBox: Tightness-free Box-supervised Polyp Segmentation using Monotonicity Constraint

Apr 02, 2024

Qiang Hu, Zhenyu Yi, Ying Zhou, Ting Li, Fan Huang, Mei Liu, Qiang Li, Zhiwei Wang

Figure 1 for MonoBox: Tightness-free Box-supervised Polyp Segmentation using Monotonicity Constraint

Figure 2 for MonoBox: Tightness-free Box-supervised Polyp Segmentation using Monotonicity Constraint

Figure 3 for MonoBox: Tightness-free Box-supervised Polyp Segmentation using Monotonicity Constraint

Figure 4 for MonoBox: Tightness-free Box-supervised Polyp Segmentation using Monotonicity Constraint

Abstract:We propose MonoBox, an innovative box-supervised segmentation method constrained by monotonicity to liberate its training from the user-unfriendly box-tightness assumption. In contrast to conventional box-supervised segmentation, where the box edges must precisely touch the target boundaries, MonoBox leverages imprecisely-annotated boxes to achieve robust pixel-wise segmentation. The 'linchpin' is that, within the noisy zones around box edges, MonoBox discards the traditional misguiding multiple-instance learning loss, and instead optimizes a carefully-designed objective, termed monotonicity constraint. Along directions transitioning from the foreground to background, this new constraint steers responses to adhere to a trend of monotonically decreasing values. Consequently, the originally unreliable learning within the noisy zones is transformed into a correct and effective monotonicity optimization. Moreover, an adaptive label correction is introduced, enabling MonoBox to enhance the tightness of box annotations using predicted masks from the previous epoch and dynamically shrink the noisy zones as training progresses. We verify MonoBox in the box-supervised segmentation task of polyps, where satisfying box-tightness is challenging due to the vague boundaries between the polyp and normal tissues. Experiments on both public synthetic and in-house real noisy datasets demonstrate that MonoBox exceeds other anti-noise state-of-the-arts by improving Dice by at least 5.5% and 3.3%, respectively. Codes are at https://github.com/Huster-Hq/MonoBox.

Via

Access Paper or Ask Questions

Node Centrality Approximation For Large Networks Based On Inductive Graph Neural Networks

Mar 08, 2024

Yiwei Zou, Ting Li, Zong-fu Luo

Abstract:Closeness Centrality (CC) and Betweenness Centrality (BC) are crucial metrics in network analysis, providing essential reference for discerning the significance of nodes within complex networks. These measures find wide applications in critical tasks, such as community detection and network dismantling. However, their practical implementation on extensive networks remains computationally demanding due to their high time complexity. To mitigate these computational challenges, numerous approximation algorithms have been developed to expedite the computation of CC and BC. Nevertheless, even these approximations still necessitate substantial processing time when applied to large-scale networks. Furthermore, their output proves sensitive to even minor perturbations within the network structure. In this work, We redefine the CC and BC node ranking problem as a machine learning problem and propose the CNCA-IGE model, which is an encoder-decoder model based on inductive graph neural networks designed to rank nodes based on specified CC or BC metrics. We incorporate the MLP-Mixer model as the decoder in the BC ranking prediction task to enhance the model's robustness and capacity. Our approach is evaluated on diverse synthetic and real-world networks of varying scales, and the experimental results demonstrate that the CNCA-IGE model outperforms state-of-the-art baseline models, significantly reducing execution time while improving performance.

Via

Access Paper or Ask Questions