Abstract: Given a query from one modality, few-shot cross-modal retrieval (CMR) retrieves semantically similar instances in another modality, where the target domain includes classes that are disjoint from the source domain. Compared with classical few-shot CMR methods, vision-language pretraining methods such as CLIP have shown strong few-shot and zero-shot learning performance. However, they still face challenges due to (1) the feature degradation encountered in the target domain and (2) the extreme data imbalance. To tackle these issues, we propose FLEX-CLIP, a novel Feature-level Generation Network Enhanced CLIP. FLEX-CLIP includes two training stages. In multimodal feature generation, we propose a composite multimodal VAE-GAN network to capture real feature distribution patterns and generate pseudo samples based on CLIP features, addressing data imbalance. For common space projection, we develop a gate residual network to fuse CLIP features with projected features, reducing feature degradation in X-shot scenarios. Experimental results on four benchmark datasets show a 7%-15% improvement over state-of-the-art methods, with ablation studies demonstrating enhancement of CLIP features.
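The fusion of CLIP features with projected features mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not specify the form of the gate, so everything here is an assumption for illustration only: the function name `gated_residual_fusion`, the per-dimension sigmoid gate, and the parameters `gate_w` and `gate_b` are hypothetical and not taken from FLEX-CLIP.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_residual_fusion(clip_feat, proj_feat, gate_w, gate_b):
    """Blend original and projected features with a learned gate.

    Hypothetical form (not from the paper): for each dimension d,
        g_d   = sigmoid(w_d[0] * clip_d + w_d[1] * proj_d + b_d)
        out_d = g_d * proj_d + (1 - g_d) * clip_d
    A gate near 0 keeps the original CLIP feature (the residual path);
    a gate near 1 trusts the projected feature.
    """
    fused = []
    for c, p, w, b in zip(clip_feat, proj_feat, gate_w, gate_b):
        g = sigmoid(w[0] * c + w[1] * p + b)
        fused.append(g * p + (1.0 - g) * c)
    return fused


# Usage: a strongly negative bias closes the gate, so the output
# falls back to the unmodified CLIP features.
clip_feat = [1.0, -2.0]
proj_feat = [3.0, 4.0]
closed = gated_residual_fusion(clip_feat, proj_feat,
                               gate_w=[(0.0, 0.0)] * 2, gate_b=[-50.0] * 2)
```

Under this assumed form, the residual path gives the model a way to fall back on the pretrained features when the projection would otherwise degrade them, which is one plausible reading of how a gate could limit feature degradation.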
Abstract: Most current blockchain systems rely heavily on the Proof-of-Work (PoW) or Proof-of-Stake (PoS) mechanisms for decentralized consensus and security assurance. However, the substantial energy expenditure stemming from computationally intensive yet meaningless tasks has raised considerable concerns about traditional PoW approaches. The PoS mechanism, while free of this energy consumption, is subject to security and economic issues. To address these issues, the paradigm of Proof-of-Useful-Work (PoUW) seeks to employ challenges of practical significance as PoW, thereby imbuing energy consumption with tangible value. While previous efforts in Proof of Learning (PoL) explored using SGD-based deep learning model training tasks as PoUW challenges, recent research has revealed its vulnerability to adversarial attacks and the theoretical hardness of crafting a Byzantine-secure PoL mechanism. In this paper, we introduce the concept of incentive-security, which incentivizes rational provers to behave honestly in their own best interest, bypassing the existing hardness results to design a PoL mechanism with computational efficiency, a provable incentive-security guarantee, and controllable difficulty. In particular, our work is secure against two attacks on the recent work of Jia et al. [2021], and also reduces the computational overhead from $\Theta(1)$ to $O(\frac{\log E}{E})$. Furthermore, while most recent research assumes trusted problem providers and verifiers, our design also guarantees frontend incentive-security even when problem providers are untrusted, and verifier incentive-security that bypasses the Verifier's Dilemma. By incorporating ML training into blockchain consensus mechanisms with provable guarantees, our research not only proposes an eco-friendly solution for blockchain systems, but also provides a proposal for a completely decentralized computing power market in the new AI age.
Abstract: Music is a mysterious language that conveys feelings and thoughts via different tones and timbres. To better understand timbre in music, we chose music data from 6 representative instruments, analysed their timbre features, and classified them. Instead of following the current trend of using neural networks for black-box classification, our project is based on a combination of MFCC and LPC, augmented with a 6-dimensional feature vector that we designed ourselves through observation and experimentation. In our white-box model, we observed significant sound patterns that distinguish different timbres, and discovered some connections between objective data and subjective perception. With a feature vector of 32 dimensions in total and a naive all-pairs SVM, we achieved improved classification accuracy compared to using a single tool. We also attempted to analyze music pieces downloaded from the Internet, found that performance differed across instruments, explored the reasons, and suggested possible ways to improve the performance.
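As background for the LPC features mentioned in the abstract above, here is a minimal sketch of linear predictive coding using the autocorrelation method and the Levinson-Durbin recursion. It is a textbook formulation, not the authors' feature pipeline: the function names `autocorrelation` and `lpc` are illustrative, and the abstract does not state which LPC order or windowing the project used.

```python
def autocorrelation(signal, max_lag):
    """Return r[0..max_lag], where r[k] = sum_n signal[n] * signal[n + k]."""
    n = len(signal)
    return [sum(signal[i] * signal[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def lpc(signal, order):
    """Estimate predictor coefficients a[1..order] such that
    x[n] is approximated by sum_k a[k] * x[n - k].

    Solves the autocorrelation normal equations with the
    Levinson-Durbin recursion. Returns (coefficients, residual_energy).
    """
    r = autocorrelation(signal, order)
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]               # prediction error energy at order 0
    for i in range(1, order + 1):
        if err == 0.0:
            break            # signal is perfectly predicted (or all zeros)
        # Reflection coefficient for stepping from order i-1 to order i.
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= (1.0 - k * k)
    return a[1:], err


# Usage: a decaying exponential x[n] = 0.9**n obeys x[n] = 0.9 * x[n-1],
# so the first coefficient should be close to 0.9 and the second close to 0.
coeffs, residual = lpc([0.9 ** n for n in range(200)], order=2)
```

The resulting coefficients summarize the spectral envelope of a frame, which is why they are commonly used alongside MFCCs as timbre descriptors.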
Abstract: We proposed a new criterion, \textit{noise-stability}, which revises classical rigidity theory, for evaluating MDS algorithms; the criterion truthfully represents the fidelity of global structure reconstruction. We then proved the noise-stability of the cMDS algorithm under generic conditions, which provides a rigorous theoretical guarantee for the precision of, and theoretical bounds for, Euclidean embedding and its applications in fields including wireless sensor network localization and satellite positioning. Furthermore, we looked into previous work on the minimum-cost globally rigid spanning subgraph, and proposed an algorithm to construct a minimum-cost noise-stable spanning graph in Euclidean space, which enables reliable localization on sparse graphs of noisy distance constraints with a linear number of edges and sublinear cost in total edge length. Additionally, this algorithm suggests a scheme to reconstruct point clouds from pairwise distances in as little as $O(n)$ time, down from $O(n^3)$ for cMDS.
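For readers unfamiliar with cMDS, the baseline algorithm referred to in the abstract above can be sketched as follows: square the distances, double-center them into a Gram matrix, and embed using the leading eigenpairs. This is a minimal pure-Python sketch of standard classical MDS, not the noise-stable algorithm the abstract proposes. Power iteration with deflation is swapped in for a full eigendecomposition to keep the example dependency-free, and the sketch assumes the input distances are (nearly) Euclidean so that the dominant eigenvalues are positive. All function names are illustrative.

```python
import math


def double_center(sq_dist):
    """Return B = -1/2 * J * D2 * J, the Gram matrix of the centered points."""
    n = len(sq_dist)
    row_mean = [sum(row) / n for row in sq_dist]
    total_mean = sum(row_mean) / n
    # D2 is symmetric, so column means equal row means.
    return [[-0.5 * (sq_dist[i][j] - row_mean[i] - row_mean[j] + total_mean)
             for j in range(n)] for i in range(n)]


def top_eigenpair(mat, iters=500):
    """Dominant eigenpair of a symmetric matrix by power iteration."""
    n = len(mat)
    vec = [1.0 + 0.1 * i for i in range(n)]  # deterministic start vector
    for _ in range(iters):
        nxt = [sum(mat[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm == 0.0:
            return 0.0, vec
        vec = [v / norm for v in nxt]
    # Rayleigh quotient of the unit-norm vector gives the eigenvalue.
    val = sum(vec[i] * sum(mat[i][j] * vec[j] for j in range(n))
              for i in range(n))
    return val, vec


def classical_mds(dist, dim):
    """Embed n points in `dim` dimensions from an n x n distance matrix."""
    n = len(dist)
    gram = double_center([[d * d for d in row] for row in dist])
    coords = [[0.0] * dim for _ in range(n)]
    for k in range(dim):
        val, vec = top_eigenpair(gram)
        scale = math.sqrt(max(val, 0.0))
        for i in range(n):
            coords[i][k] = scale * vec[i]
        # Deflate: remove the component just extracted.
        gram = [[gram[i][j] - val * vec[i] * vec[j] for j in range(n)]
                for i in range(n)]
    return coords


# Usage: recover a 3 x 1 rectangle (up to rotation and reflection)
# from its pairwise distances alone.
points = [(0.0, 0.0), (3.0, 0.0), (0.0, 1.0), (3.0, 1.0)]
dist = [[math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) for q in points]
        for p in points]
embedding = classical_mds(dist, dim=2)
```

The embedding is only determined up to a rigid motion, so the natural check is that pairwise distances between the recovered points match the input distances rather than that the coordinates match the originals.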