Valay Dave

DAG Card is the new Model Card

Nov 20, 2021

Jacopo Tagliabue, Ville Tuulos, Ciro Greco, Valay Dave

Abstract: With the progressive commoditization of modeling capabilities, data-centric AI recognizes that what happens before and after training becomes crucial for real-world deployments. Following the intuition behind Model Cards, we propose DAG Cards as a form of documentation encompassing the tenets of a data-centric point of view. We argue that Machine Learning pipelines (rather than models) are the most appropriate level of documentation for many practical use cases, and we share with the community an open implementation to generate cards from code.

* Accepted at DCAI @ NeurIPS 2021, camera-ready version

Survive the Schema Changes: Integration of Unmanaged Data Using Deep Learning

Oct 15, 2020

Zijie Wang, Lixi Zhou, Amitabh Das, Valay Dave, Zhanpeng Jin, Jia Zou

Abstract: Data is king in the age of AI. However, data integration is often a laborious task that is hard to automate. Schema change is one significant obstacle to automating the end-to-end data integration process. Although mechanisms such as query discovery and schema modification languages exist to handle the problem, these approaches work only under the assumption that the schema is maintained by a database. However, we observe diversified schema changes in heterogeneous data and open data, most of which have no schema defined. In this work, we propose to use deep learning to deal with schema changes automatically, through a super cell representation and automatic injection of perturbations into the training data to make the model robust to schema changes. Our experimental results demonstrate that the proposed approach is effective for two real-world data integration scenarios: coronavirus data integration and machine log integration.

* In submission
