Paul C. Lauterbur Research Center for Biomedical Imaging, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China, Peng Cheng Laboratory, Shenzhen, China
Abstract: Previous detection studies have shown that LLMs cannot be used effectively as detectors, but none of them has addressed modern Chinese poetry, and the performance of LLMs in detecting it remains unexplored. This paper evaluates and enhances the performance of LLMs as detectors for modern Chinese poetry, and proposes an image-semantic guided poetry detection method. Unlike traditional detection approaches, our method incorporates images that reflect the content of the poetry. Through example-driven approaches, our method integrates information such as meaning, imagery, and feeling from the image, then forms a complementary judgment with the poem text. Experimental results demonstrate that the LLM detectors based on our method outperform baseline detectors based on plain text, and even surpass the best-performing traditional detector, RoBERTa. The Gemini detector using our method achieves a Macro-F1 score of 85.65%, reaching the state-of-the-art level. The performance improvements of different LLM detectors on data generated by multiple LLMs demonstrate the effectiveness of our method.
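For readers unfamiliar with the Macro-F1 metric reported above, the following is a minimal sketch of how it is conventionally computed: the F1 score is calculated separately for each class and the results are averaged without weighting. The function name and the toy labels are hypothetical and are not taken from the paper's data or code.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (standard Macro-F1 definition)."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical toy labels, for illustration only
y_true = ["human", "human", "human", "ai", "ai", "ai"]
y_pred = ["human", "human", "ai", "ai", "ai", "ai"]
score = macro_f1(y_true, y_pred)
```

Because every class contributes equally to the average, Macro-F1 does not let a large class mask poor performance on a small one, which matters when human-written and generated poems are unevenly represented.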
Abstract: Surgical scene understanding is a cornerstone of computer-assisted intervention. While recent advances, particularly in surgical image segmentation, have driven progress, real-world clinical applications require a more holistic understanding that jointly captures procedural context, semantic reasoning, and precise visual grounding. However, existing approaches typically address these components in isolation, leading to fragmented representations and limited semantic consistency. To address this limitation, we propose SurgMLLM, a unified surgical scene understanding framework that bridges high-level reasoning and low-level visual grounding within a single model. Given surgical videos, SurgMLLM fine-tunes a multimodal large language model (MLLM) to support structured, interpretable reasoning, which is used to jointly model phases, instrument-verb-target (IVT) triplets, and triplet-entity segmentation tokens. These tokens are then temporally aggregated and serve as prompts for a segmentation network, enabling accurate pixel-wise grounding of triplet instruments and targets. The entire framework is trained end-to-end with a unified objective that couples language-based reasoning supervision with visual grounding losses, promoting coherent cross-task learning and clinically consistent scene representations. To facilitate unified evaluation, we introduce CholecT45-Scene, which extends the CholecT45 dataset with 64,299 frames of pixel-level mask annotations for instruments and targets, aligned with existing triplet labels. Extensive experiments show that SurgMLLM significantly advances surgical scene understanding, improving the primary triplet recognition metric AP_IVT from 40.7% to 46.0% and consistently outperforming prior methods in phase recognition and segmentation. These results highlight the effectiveness of unified reasoning-and-grounding for reliable, context-aware surgical assistance.
Abstract: Accurate electromagnetic field (EMF) exposure mapping is critical for wireless network planning, environmental monitoring, and the deployment of next-generation communication systems. The mapping results can be converted into a radio map, a key technology in digital twin communication systems that describes the wireless signal propagation characteristics at every location in a specific area. Existing deep learning approaches treat propagation estimation as a pure regression problem and do not enforce physical consistency in the predicted fields. In this paper, we propose Phy2-ExposNet, a novel neural network framework that decouples exposure mapping into a physics-informed estimation stage and a transformer-based residual refinement stage. It first estimates the fields under two physical constraints and then refines the resulting exposure map by capturing long-range interactions and complex spatial propagation patterns. Experiments demonstrate that the proposed method achieves lower estimation error while significantly reducing model complexity compared to existing approaches. It achieves around 15% relative error reduction over strong baselines, while using over 80% fewer parameters than conventional physics-informed models. Ablation results further reveal that the physics-informed design is crucial for capturing complex propagation effects, particularly in boundary and shadow regions.
Abstract: The rapid development of large language models (LLMs) has extended text generation tasks into the literary domain. However, AI-generated literary creation has raised increasingly prominent issues of creative authenticity and ethics in the literary world, making the detection of LLM-generated literary texts essential and urgent. While previous works have made significant progress in detecting AI-generated text, they have yet to address classical Chinese poetry. Due to the unique linguistic features of classical Chinese poetry, such as strict metrical regularity, a shared system of poetic imagery, and flexible syntax, determining whether a poem is authored by AI presents a substantial challenge. To address these issues, we introduce ChangAn, a benchmark for detecting LLM-generated classical Chinese poetry that contains 30,664 poems in total, of which 10,276 are human-written and 20,388 are generated by four popular LLMs. Based on ChangAn, we conducted a systematic evaluation of 12 AI detectors, investigating their performance variations across different text granularities and generation strategies. Our findings highlight the limitations of current Chinese text detectors, which fail to serve as reliable tools for detecting LLM-generated classical Chinese poetry. These results validate the effectiveness and necessity of our proposed ChangAn benchmark. Our dataset and code are available at https://github.com/VelikayaScarlet/ChangAn.
Abstract: Foundation models have demonstrated remarkable success across diverse domains and tasks, primarily owing to the growth of large-scale, diverse, and high-quality datasets. However, in the field of medical imaging, curating and assembling such datasets is highly challenging due to the reliance on clinical expertise and strict ethical and privacy constraints, resulting in a scarcity of large-scale unified medical datasets and hindering the development of powerful medical foundation models. In this work, we present the largest survey to date of medical image datasets, covering over 1,000 open-access datasets with a systematic catalog of their modalities, tasks, anatomies, annotations, limitations, and potential for integration. Our analysis exposes a landscape that is modest in scale, fragmented across narrowly scoped tasks, and unevenly distributed across organs and modalities, which in turn limits the utility of existing medical image datasets for developing versatile and robust medical foundation models. To turn fragmentation into scale, we propose a metadata-driven fusion paradigm (MDFP) that integrates public datasets with shared modalities or tasks, thereby transforming multiple small data silos into larger, more coherent resources. Building on MDFP, we release an interactive discovery portal that enables end-to-end, automated medical image dataset integration, and compile all surveyed datasets into a unified, structured table that clearly summarizes their key characteristics and provides reference links, offering the community an accessible and comprehensive repository. By charting the current terrain and offering a principled path to dataset consolidation, our survey provides a practical roadmap for scaling medical imaging corpora, supporting faster data discovery, more principled dataset creation, and more capable medical foundation models.
Abstract: ChatGPT has demonstrated remarkable capabilities in both poetry generation and translation, yet its ability to truly understand poetry remains unexplored. Previous poetry-related work merely analyzed experimental outcomes without addressing fundamental issues of comprehension. This paper introduces a comprehensive framework for evaluating ChatGPT's understanding of modern poetry. We collaborated with professional poets to evaluate ChatGPT's interpretation of modern Chinese poems by different poets along multiple dimensions. Evaluation results show that ChatGPT's interpretations align with the original poets' intents in over 73% of the cases. However, its understanding in certain dimensions, particularly in capturing poeticity, proved to be less satisfactory. These findings highlight the effectiveness and necessity of our proposed framework. This study not only evaluates ChatGPT's ability to understand modern poetry but also establishes a solid foundation for future research on LLMs and their application to poetry-related tasks.
Abstract: Exemplified by the chemical vapor deposition growth of two-dimensional dendrites, which has potential applications in catalysis and presents a parameter-intensive, data-scarce and reaction process-complex model problem, we devise a machine intelligence-empowered framework for the full chain support of material synthesis, encompassing rapid process optimization, accurate customized synthesis, and comprehensive mechanism deciphering. First, active learning is integrated into the experimental workflow, identifying an optimal recipe for the growth of highly-branched, electrocatalytically-active ReSe2 dendrites through 60 experiments (4 iterations), which account for less than 1.3% of the numerous possible parameter combinations. Then, a prediction accuracy-guided data augmentation strategy is developed and combined with a tree-based machine learning (ML) algorithm, unveiling a non-linear correlation between 5 process variables and the fractal dimension (DF) of ReSe2 dendrites with only 9 additional experiments, which guides the synthesis of dendrites with various user-defined DF values. Finally, we construct a data-knowledge dual-driven mechanism model by integrating cross-scale characterizations, interpretable ML models, and domain knowledge in thermodynamics and kinetics, unraveling the synergistic contributions of multiple process parameters to the product morphology. This work demonstrates the potential of ML to transform the research paradigm and is adaptable to broader material synthesis.
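The abstract above does not state how the fractal dimension (DF) of a dendrite image is measured. As an illustration only, the sketch below uses box counting, one common estimator, which is an assumption here and not necessarily the authors' procedure. The function name, the pixel-set representation, and the test shapes are all hypothetical.

```python
import math


def box_counting_dimension(pixels, box_sizes):
    """Estimate a fractal dimension by box counting (illustrative assumption).

    pixels: set of (row, col) coordinates of occupied cells in a binary image.
    box_sizes: iterable of positive integer box edge lengths.
    Returns the least-squares slope of log N(s) against log(1/s).
    """
    xs, ys = [], []
    for s in box_sizes:
        # Count the boxes of edge length s that contain at least one pixel
        boxes = {(r // s, c // s) for r, c in pixels}
        xs.append(math.log(1.0 / s))
        ys.append(math.log(len(boxes)))
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# Sanity checks on shapes with known dimension (hypothetical inputs)
sizes = [1, 2, 4, 8, 16]
filled_square = {(r, c) for r in range(64) for c in range(64)}
diagonal_line = {(i, i) for i in range(64)}
dim_square = box_counting_dimension(filled_square, sizes)
dim_line = box_counting_dimension(diagonal_line, sizes)
```

A filled square and a straight line should come out close to 2 and 1 respectively; a branched dendrite would be expected to fall between those values.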
Abstract: Background: Pleuroparenchymal fibroelastosis (PPFE) is an upper lobe predominant fibrotic lung abnormality associated with increased mortality in established interstitial lung disease. However, the clinical significance of radiologic PPFE progression in lung cancer screening (LCS) populations remains unclear. Methods: We analysed longitudinal low-dose CT scans and clinical data from two LCS studies: the National Lung Screening Trial (NLST; n=7,980) and the SUMMIT study (n=8,561). An automated algorithm quantified PPFE volume on baseline and follow-up scans. Annualised change in PPFE was derived and dichotomised using a distribution-based threshold to define progressive PPFE. Associations between progressive PPFE and mortality were evaluated using Cox proportional hazards models adjusted for demographic and clinical variables. In the SUMMIT cohort, associations between progressive PPFE and clinical outcomes were assessed using incidence rate ratios (IRR) and odds ratios (OR). Findings: Progressive PPFE independently associated with mortality in both LCS cohorts (NLST: Hazard Ratio (HR)=1.25, 95% Confidence Interval (CI): 1.01-1.56, p=0.042; SUMMIT: HR=3.14, 95% CI: 1.66-5.97, p<0.001). Within SUMMIT, progressive PPFE was strongly associated with higher respiratory admissions (IRR=2.79, p<0.001) and increased antibiotic and steroid use (IRR=1.55, p=0.010), and showed a trend towards higher modified Medical Research Council (mMRC) scores (OR=1.40, p=0.055). Interpretation: Radiologic PPFE progression independently associates with mortality across two large LCS cohorts, and associates with adverse clinical outcomes. Quantitative assessment of PPFE progression may provide a clinically relevant imaging biomarker to identify individuals at increased risk of respiratory morbidity within LCS programmes.
Abstract: Although debiased LLMs perform well on known bias patterns, they often fail to generalize to unfamiliar bias prompts, producing toxic outputs. We first validate that such high-bias prompts constitute a distribution shift via OOD detection, and show that static models degrade under this shift. To adapt on-the-fly, we propose CAP-TTA, a test-time adaptation framework that performs context-aware LoRA updates only when the bias-risk trigger exceeds a threshold, using a precomputed diagonal preconditioner for fast and stable updates. Across toxic-prompt settings and benchmarks, CAP-TTA reduces bias (confirmed by human evaluation) while achieving much lower update latency than AdamW/SGD; it also mitigates catastrophic forgetting by significantly improving narrative fluency over the SOTA debiasing baseline while maintaining comparable debiasing effectiveness.
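The trigger-gated, preconditioned update described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the CAP-TTA implementation: the abstract does not specify how the bias-risk score, the adaptation loss, or the preconditioner is computed, so the function name, the flat parameter list standing in for LoRA weights, and all default values below are hypothetical.

```python
def gated_preconditioned_step(params, grads, precond, risk,
                              threshold=0.5, lr=0.1, eps=1e-8):
    """Apply one update only when the risk score exceeds the threshold.

    params:  flat list of adapter weights (hypothetical stand-in for LoRA).
    grads:   gradient of some adaptation loss w.r.t. params (assumed given).
    precond: precomputed positive diagonal entries, one per parameter.
    risk:    scalar bias-risk score for the current prompt (assumed given).
    Returns (new_params, updated_flag).
    """
    if risk <= threshold:
        # Trigger not fired: leave the model untouched
        return list(params), False
    # Diagonal preconditioning: scale each gradient entry independently
    new_params = [w - lr * g / (d + eps)
                  for w, g, d in zip(params, grads, precond)]
    return new_params, True


# Hypothetical values, for illustration only
params = [1.0, 2.0]
grads = [0.5, -1.0]
precond = [1.0, 4.0]
low_risk_params, low_updated = gated_preconditioned_step(params, grads, precond, risk=0.2)
high_risk_params, high_updated = gated_preconditioned_step(params, grads, precond, risk=0.9)
```

Since the preconditioner is diagonal and computed in advance, each step costs one elementwise division and keeps no per-step optimizer state, which is consistent with the abstract's claim of lower update latency than AdamW.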
Abstract: Background: Pleuroparenchymal fibroelastosis (PPFE) is an upper lobe predominant fibrotic lung abnormality associated with increased mortality in established interstitial lung disease. However, the clinical significance of radiologic PPFE progression in lung cancer screening populations remains unclear. We investigated whether longitudinal change in PPFE quantified on low-dose CT independently associates with mortality and respiratory morbidity. Methods: We analysed longitudinal low-dose CT scans and clinical data from two lung cancer screening studies: the National Lung Screening Trial (NLST; n=7980) and the SUMMIT study (n=8561). An automated algorithm quantified PPFE volume on baseline and follow-up scans. Annualised change in PPFE (dPPFE) was derived and dichotomised using a distribution-based threshold to define progressive PPFE. Associations between dPPFE and mortality were evaluated using Cox proportional hazards models adjusted for demographic and clinical variables. In the SUMMIT cohort, dPPFE was also examined in relation to clinical outcomes. Findings: dPPFE independently associated with mortality in both cohorts (NLST: HR 1.25, 95% CI 1.01-1.56, p=0.042; SUMMIT: HR 3.14, 95% CI 1.66-5.97, p<0.001). Kaplan-Meier curves showed reduced survival among participants with progressive PPFE in both cohorts. In SUMMIT, dPPFE was associated with higher respiratory admissions (IRR 2.79, p<0.001), increased antibiotic and steroid use (IRR 1.55, p=0.010), and a trend towards higher mMRC scores (OR 1.40, p=0.055). Interpretation: Radiologic PPFE progression independently associates with mortality across two large lung cancer screening cohorts and with adverse clinical outcomes. Quantitative assessment of PPFE progression may provide a clinically relevant imaging biomarker for identifying individuals at increased respiratory risk within screening programmes.
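The derivation of dPPFE and its dichotomisation can be illustrated with a small sketch. The abstract does not give the exact threshold rule, so the "mean plus k standard deviations" cut-off below is an assumption chosen only to show what a distribution-based threshold might look like. The function names, volumes, and scan intervals are hypothetical.

```python
import statistics


def annualised_change(baseline_volume, followup_volume, interval_years):
    """Change in quantified PPFE volume per year between two scans."""
    if interval_years <= 0:
        raise ValueError("interval_years must be positive")
    return (followup_volume - baseline_volume) / interval_years


def distribution_threshold(changes, k=1.0):
    """Hypothetical cut-off: cohort mean plus k population standard deviations."""
    return statistics.mean(changes) + k * statistics.pstdev(changes)


# Hypothetical (baseline, follow-up, years between scans) triples
scans = [(10.0, 10.0, 1.0), (10.0, 12.0, 2.0), (5.0, 5.5, 1.0), (8.0, 14.0, 1.5)]
changes = [annualised_change(b, f, t) for b, f, t in scans]
cutoff = distribution_threshold(changes)
# Participants whose annualised change exceeds the cut-off are flagged
progressive = [change > cutoff for change in changes]
```

The resulting binary flag is the kind of covariate that would then enter a Cox proportional hazards model alongside the demographic and clinical adjustment variables.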