Srikrishna Iyer

When Babies Teach Babies: Can student knowledge sharing outperform Teacher-Guided Distillation on small datasets?

Nov 25, 2024

Srikrishna Iyer

Abstract: We present our submission to the BabyLM challenge, which aims to push the boundaries of data-efficient language model pretraining. Our method builds on deep mutual learning and introduces a student model search for diverse initialization. Standard mutual learning treats all students equally; we address this limitation by formulating weighted mutual learning as a bi-level optimization problem. The inner loop learns compact students through online distillation, while the outer loop optimizes the weights so that knowledge is distilled more effectively from diverse students. This dynamic weighting strategy eliminates the need for a teacher model, reducing computational requirements. Our evaluations show that teacher-less methods can match or surpass teacher-supervised approaches.

* Accepted to BabyLM challenge, CoNLL Workshop, EMNLP 2024
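
The inner-loop objective described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the loss form (cross-entropy plus a weighted sum of KL terms from peers), the mixing factor `alpha`, and all function names are assumptions chosen for illustration, and the outer loop that learns the peer weights is omitted, so the weights are simply passed in.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) for two discrete distributions of equal length.
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def cross_entropy(p, label, eps=1e-12):
    # Negative log-likelihood of the true class under distribution p.
    return -math.log(p[label] + eps)

def weighted_mutual_loss(student_logits, label, weights, alpha=0.5):
    """Per-student loss for one example (hypothetical form).

    Student i is trained on the hard label and pulled toward each peer j,
    with the pull from peer j scaled by weights[j]. In a bi-level setup
    the weights would be updated by an outer loop; here they are fixed.
    """
    probs = [softmax(z) for z in student_logits]
    losses = []
    for i, p_i in enumerate(probs):
        ce = cross_entropy(p_i, label)
        peer = sum(
            weights[j] * kl_divergence(p_j, p_i)
            for j, p_j in enumerate(probs)
            if j != i
        )
        losses.append((1 - alpha) * ce + alpha * peer)
    return losses

# Two students, one example with true class 0, equal peer weights.
losses = weighted_mutual_loss([[2.0, 0.5], [0.1, 1.2]], label=0, weights=[0.5, 0.5])
```

With equal weights this reduces to ordinary deep mutual learning; unequal weights let a more reliable student influence its peers more strongly, which is the role a teacher would otherwise play.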

GAT-GAN : A Graph-Attention-based Time-Series Generative Adversarial Network

Jun 03, 2023

Srikrishna Iyer, Teng Teck Hou

Abstract: Generative Adversarial Networks (GANs) have proven to be a powerful tool for generating realistic synthetic data. However, traditional GANs often struggle to capture complex relationships between features, which results in unrealistic multivariate time-series data. In this paper, we propose a Graph-Attention-based Generative Adversarial Network (GAT-GAN) that explicitly includes two graph-attention layers: one learns temporal dependencies while the other captures spatial relationships. Unlike RNN-based GANs, which struggle to model long sequences of data points, GAT-GAN generates long, high-fidelity time-series data using an adversarially trained autoencoder architecture. Our empirical evaluations on a variety of real time-series datasets show that our framework consistently outperforms state-of-the-art benchmarks on Frechet Transformer distance and Predictive score, which characterize fidelity and diversity, and predictive performance, respectively. Moreover, we introduce a Frechet Inception distance-like (FID) metric for time-series data, called the Frechet Transformer distance (FTD) score (lower is better), to evaluate the quality and variety of generated data. We also found that low FTD scores correspond to the best-performing downstream predictive experiments. Hence, FTD scores can be used as a standardized metric to evaluate synthetic time-series data.

* 9 pages, 1 figure, 3 tables, preprint under review
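
To make the idea of an FID-like score concrete, here is a minimal sketch of a Fréchet distance between Gaussians fitted to two sets of feature vectors. It is not the paper's FTD implementation: the feature extractor is out of scope (the inputs are assumed to be embeddings from some hypothetical encoder), the function names are invented for illustration, and the covariances are assumed diagonal so that no matrix square root is needed. The full FID formula uses complete covariance matrices.

```python
import math
from statistics import fmean, pvariance

def column_stats(features):
    # Per-dimension mean and population variance of a list of vectors.
    columns = list(zip(*features))
    means = [fmean(c) for c in columns]
    variances = [pvariance(c) for c in columns]
    return means, variances

def frechet_distance_diag(real_features, synthetic_features):
    """Squared Frechet distance between two diagonal Gaussians.

    Simplification (an assumption of this sketch): with diagonal
    covariances the trace term reduces to a per-dimension sum of
    var_r + var_s - 2 * sqrt(var_r * var_s).
    """
    mu_r, var_r = column_stats(real_features)
    mu_s, var_s = column_stats(synthetic_features)
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_r, mu_s))
    cov_term = sum(
        vr + vs - 2.0 * math.sqrt(vr * vs) for vr, vs in zip(var_r, var_s)
    )
    return mean_term + cov_term

# Toy embeddings: the synthetic set is the real set shifted by 1 in each dimension.
real = [[0.0, 0.0], [2.0, 2.0]]
synthetic = [[1.0, 1.0], [3.0, 3.0]]
score = frechet_distance_diag(real, synthetic)
```

As with FID, a lower score means the two feature distributions are closer; identical sets give a score of zero.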
