Abstract: Traditional data lakes provide critical data infrastructure for analytical workloads by enabling time travel, running SQL queries, ingesting data with ACID transactions, and visualizing petabyte-scale datasets on cloud storage. They allow organizations to break down data silos, unlock data-driven decision-making, improve operational efficiency, and reduce costs. However, as deep learning takes over common analytical workflows, traditional data lakes become less useful for applications such as natural language processing (NLP), audio processing, computer vision, and other applications involving non-tabular datasets. This paper presents Deep Lake, an open-source lakehouse for deep learning applications developed at Activeloop. Deep Lake retains the benefits of a vanilla data lake with one key difference: it stores complex data, such as images, videos, and annotations, as well as tabular data, in the form of tensors and rapidly streams the data over the network to (a) the Tensor Query Language, (b) an in-browser visualization engine, or (c) deep learning frameworks without sacrificing GPU utilization. Datasets stored in Deep Lake can be accessed from PyTorch, TensorFlow, and JAX, and integrate with numerous MLOps tools.
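For illustration, below is a minimal sketch of streaming a Deep Lake dataset into PyTorch using the open-source `deeplake` Python package. The dataset path, tensor names, and dataloader parameters are illustrative, and exact API names may differ between package versions.

```python
# Minimal sketch: stream a Deep Lake dataset into PyTorch without
# downloading or materializing it locally. Assumes the open-source
# `deeplake` package; method names may vary across releases.
import deeplake

# Load a dataset hosted on Activeloop storage (illustrative path).
ds = deeplake.load("hub://activeloop/mnist-train")

# Wrap the dataset as a PyTorch dataloader that streams tensors over the network.
dataloader = ds.pytorch(batch_size=32, shuffle=True, num_workers=2)

for batch in dataloader:
    images, labels = batch["images"], batch["labels"]
    # ... forward/backward pass with a PyTorch model ...
    break
```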
Abstract: This paper presents TableQuery, a novel tool for querying tabular data using deep learning models pre-trained to answer questions on free text. Existing deep learning methods for question answering on tabular data have various limitations, such as having to feed the entire table as input into a neural network model, making them unsuitable for most real-world applications. Since real-world data might contain millions of rows, it may not fit entirely into memory. Moreover, data could be stored in live databases that are updated in real time, and it is impractical to serialize an entire database to a neural-network-friendly format each time it is updated. In TableQuery, we use deep learning models pre-trained for question answering on free text to convert natural language queries into structured queries, which can be run against a database or a spreadsheet. This method eliminates the need to fit the entire data into memory or to serialize databases. Furthermore, deep learning models pre-trained for question answering on free text are readily available on platforms such as the HuggingFace Model Hub (7). TableQuery does not require retraining; when a newly trained question answering model with better performance becomes available, it can replace the existing model in TableQuery.
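As a rough illustration of this idea (not TableQuery's actual pipeline), the sketch below uses a free-text question-answering model from the HuggingFace Model Hub to fill a slot of a structured query, which is then executed against a database instead of feeding the table into the network. The toy table, column names, and slot-extraction prompt are hypothetical.

```python
# Hypothetical sketch: convert a natural language question into a structured
# query using a free-text QA model, then run the query against a database so
# the table never has to fit into the model's input or into memory.
import sqlite3
from transformers import pipeline

# Toy in-memory table standing in for a live database (illustrative schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ada", "Sales", 70000), ("Bob", "Sales", 65000), ("Eve", "Engineering", 90000)],
)

# Extractive QA model pre-trained on free text (no table-specific training).
qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

question = "What is the total salary of employees in the Sales department?"

# Use the free-text model to extract a filter value from the question itself
# (a simplified stand-in for full natural-language-to-SQL translation).
department = qa(question="Which department is asked about?", context=question)["answer"]

# The extracted span may include extra words (e.g. "the Sales department"),
# so match it loosely against the column values in a parameterized query.
total = conn.execute(
    "SELECT SUM(salary) FROM employees WHERE ? LIKE '%' || department || '%'",
    (department,),
).fetchone()[0]
print(total)  # 135000.0 for the toy data above
```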