Guillem Cucurull

Nougat: Neural Optical Understanding for Academic Documents

Aug 25, 2023

Lukas Blecher, Guillem Cucurull, Thomas Scialom, Robert Stojnic

Abstract:Scientific knowledge is predominantly stored in books and scientific journals, often in the form of PDFs. However, the PDF format leads to a loss of semantic information, particularly for mathematical expressions. We propose Nougat (Neural Optical Understanding for Academic Documents), a Visual Transformer model that performs an Optical Character Recognition (OCR) task for processing scientific documents into a markup language, and demonstrate the effectiveness of our model on a new dataset of scientific documents. The proposed approach offers a promising solution to enhance the accessibility of scientific knowledge in the digital age, by bridging the gap between human-readable documents and machine-readable text. We release the models and code to accelerate future work on scientific text recognition.

* 17 pages, 10 figures

Llama 2: Open Foundation and Fine-Tuned Chat Models

Jul 19, 2023

Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale (+58 more)

Abstract:In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama 2-Chat, are optimized for dialogue use cases. Our models outperform open-source chat models on most benchmarks we tested, and based on our human evaluations for helpfulness and safety, may be a suitable substitute for closed-source models. We provide a detailed description of our approach to fine-tuning and safety improvements of Llama 2-Chat in order to enable the community to build on our work and contribute to the responsible development of LLMs.

Galactica: A Large Language Model for Science

Nov 16, 2022

Ross Taylor, Marcin Kardas, Guillem Cucurull, Thomas Scialom, Anthony Hartshorn, Elvis Saravia, Andrew Poulton, Viktor Kerkez, Robert Stojnic

Abstract:Information overload is a major obstacle to scientific progress. The explosive growth in scientific literature and data has made it ever harder to discover useful insights in a large mass of information. Today scientific knowledge is accessed through search engines, but they are unable to organize scientific knowledge alone. In this paper we introduce Galactica: a large language model that can store, combine and reason about scientific knowledge. We train on a large scientific corpus of papers, reference material, knowledge bases and many other sources. We outperform existing models on a range of scientific tasks. On technical knowledge probes such as LaTeX equations, Galactica outperforms the latest GPT-3 by 68.2% versus 49.0%. Galactica also performs well on reasoning, outperforming Chinchilla on mathematical MMLU by 41.3% to 35.7%, and PaLM 540B on MATH with a score of 20.4% versus 8.8%. It also sets a new state-of-the-art on downstream tasks such as PubMedQA and MedMCQA dev of 77.6% and 52.9%. And despite not being trained on a general corpus, Galactica outperforms BLOOM and OPT-175B on BIG-bench. We believe these results demonstrate the potential for language models as a new interface for science. We open source the model for the benefit of the scientific community.

Context-Aware Visual Compatibility Prediction

Feb 12, 2019

Guillem Cucurull, Perouz Taslakian, David Vazquez

Abstract:How do we determine whether two or more clothing items are compatible or visually appealing? Part of the answer lies in an understanding of visual aesthetics, and is biased by personal preferences shaped by social attitudes, time, and place. In this work we propose a method that predicts compatibility between two items based on their visual features, as well as their context. We define context as the products that are known to be compatible with each of these items. Our model is in contrast to other metric learning approaches that rely on pairwise comparisons between item features alone. We address the compatibility prediction problem using a graph neural network that learns to generate product embeddings conditioned on their context. We present results for two prediction tasks (fill in the blank and outfit compatibility) tested on two fashion datasets, Polyvore and Fashion-Gen, and on a subset of the Amazon dataset; we achieve state-of-the-art results when using context information and show how test performance improves as more context is used.

Attend and Rectify: a Gated Attention Mechanism for Fine-Grained Recovery

Jul 24, 2018

Pau Rodríguez, Josep M. Gonfaus, Guillem Cucurull, F. Xavier Roca, Jordi Gonzàlez

Abstract:We propose a novel attention mechanism to enhance Convolutional Neural Networks for fine-grained recognition. It learns to attend to lower-level feature activations without requiring part annotations and uses these activations to update and rectify the output likelihood distribution. In contrast to other approaches, the proposed mechanism is modular, architecture-independent and efficient both in terms of parameters and computation required. Experiments show that networks augmented with our approach systematically improve their classification accuracy and become more robust to clutter. As a result, Wide Residual Networks augmented with our proposal surpass the state-of-the-art classification accuracies on CIFAR-10, the Adience gender recognition task, Stanford Dogs, and UEC Food-100.

* Published at ECCV2018

On the iterative refinement of densely connected representation levels for semantic segmentation

Apr 30, 2018

Arantxa Casanova, Guillem Cucurull, Michal Drozdzal, Adriana Romero, Yoshua Bengio

Abstract:State-of-the-art semantic segmentation approaches increase the receptive field of their models by using either a downsampling path composed of poolings/strided convolutions or successive dilated convolutions. However, it is not clear which operation leads to the best results. In this paper, we systematically study the differences introduced by distinct receptive field enlargement methods and their impact on the performance of a novel architecture, called Fully Convolutional DenseResNet (FC-DRN). FC-DRN has a densely connected backbone composed of residual networks. Following standard image segmentation architectures, receptive field enlargement operations that change the representation level are interleaved among residual networks. This allows the model to exploit the benefits of both residual and dense connectivity patterns, namely: gradient flow, iterative refinement of representations, multi-scale feature combination and deep supervision. In order to highlight the potential of our model, we test it on the challenging CamVid urban scene understanding benchmark and make the following observations: 1) downsampling operations outperform dilations when the model is trained from scratch, 2) dilations are useful during the finetuning step of the model, 3) coarser representations require fewer refinement steps, and 4) ResNets (by model construction) are good regularizers, since they can reduce the model capacity when needed. Finally, we compare our architecture to alternative methods and report state-of-the-art results on the CamVid dataset, with at most half as many parameters.

Deep Inference of Personality Traits by Integrating Image and Word Use in Social Networks

Feb 06, 2018

Guillem Cucurull, Pau Rodríguez, V. Oguz Yazici, Josep M. Gonfaus, F. Xavier Roca, Jordi Gonzàlez

Abstract:Social media, as a major platform for communication and information exchange, is a rich repository of the opinions and sentiments of 2.3 billion users about a vast spectrum of topics. To sense the whys of certain social users' demands and culturally driven interests, however, the knowledge embedded in the 1.8 billion pictures which are uploaded daily in public profiles has just started to be exploited, since this process has typically been text-based. Following this trend on visual-based social analysis, we present a novel methodology based on Deep Learning to build a combined image-and-text based personality trait model, trained with images posted together with words found highly correlated to specific personality traits. So the key contribution here is to explore whether OCEAN personality trait modeling can be addressed based on images, here called MindPics, appearing with certain tags with psychological insights. We found that there is a correlation between those posted images and their accompanying texts, which can be successfully modeled using deep neural networks for personality estimation. The experimental results are consistent with previous cyber-psychology results based on texts or images. In addition, classification results on some traits show that some patterns emerge in the set of images corresponding to a specific text, in essence to those representing an abstract concept. These results open new avenues of research for further refining the proposed personality model under the supervision of psychology experts.

Graph Attention Networks

Feb 04, 2018

Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, Yoshua Bengio

Abstract:We present graph attention networks (GATs), novel neural network architectures that operate on graph-structured data, leveraging masked self-attentional layers to address the shortcomings of prior methods based on graph convolutions or their approximations. By stacking layers in which nodes are able to attend over their neighborhoods' features, we enable (implicitly) specifying different weights to different nodes in a neighborhood, without requiring any kind of costly matrix operation (such as inversion) or depending on knowing the graph structure upfront. In this way, we address several key challenges of spectral-based graph neural networks simultaneously, and make our model readily applicable to inductive as well as transductive problems. Our GAT models have achieved or matched state-of-the-art results across four established transductive and inductive graph benchmarks: the Cora, Citeseer and Pubmed citation network datasets, as well as a protein-protein interaction dataset (wherein test graphs remain unseen during training).

* To appear at ICLR 2018. 12 pages, 2 figures
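
As a rough illustration of the masked self-attention idea in the abstract above, the following minimal Python sketch lets each node attend only over its own neighborhood and aggregates neighbor features with softmax weights. The scoring function (a LeakyReLU over a dot product with a vector `a`), the function names, and the toy graph are assumptions made for illustration; the abstract does not specify them, and the sketch omits the learned linear transform, multiple attention heads, and output nonlinearity that a full model would use.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def make_additive_score(a):
    """Hypothetical scoring function: LeakyReLU(a . [h_i || h_j])."""
    def score(h_i, h_j):
        concat = list(h_i) + list(h_j)
        return leaky_relu(sum(w * x for w, x in zip(a, concat)))
    return score


def masked_attention(features, neighbors, score):
    """One attention step in which node i only sees the nodes in neighbors[i].

    features:  list of per-node feature vectors (assumed already transformed)
    neighbors: dict mapping node index -> list of neighbor indices
    score:     function (h_i, h_j) -> unnormalized attention score
    Returns (outputs, attention), where attention[i] maps neighbor -> weight.
    """
    outputs, attention = [], []
    for i, h_i in enumerate(features):
        nbrs = neighbors[i]
        raw = [score(h_i, features[j]) for j in nbrs]
        # Softmax restricted to the neighborhood (this is the "mask").
        peak = max(raw)
        exps = [math.exp(r - peak) for r in raw]
        total = sum(exps)
        weights = [e / total for e in exps]
        attention.append(dict(zip(nbrs, weights)))
        # Weighted sum of neighbor features, one dimension at a time.
        outputs.append([
            sum(w * features[j][d] for w, j in zip(weights, nbrs))
            for d in range(len(h_i))
        ])
    return outputs, attention


# Toy graph: a 3-node path with self-loops (node 0 is not adjacent to node 2).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neighbors = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2]}
outputs, attention = masked_attention(
    features, neighbors, make_additive_score([0.5, -0.5, 1.0, 0.25])
)
```

Because node 0 lists only nodes 0 and 1 as neighbors, changing the features of node 2 leaves the output for node 0 untouched. No matrix inversion or global view of the graph is needed; each node only requires its own neighbor list.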

Regularizing CNNs with Locally Constrained Decorrelations

Mar 15, 2017

Pau Rodríguez, Jordi Gonzàlez, Guillem Cucurull, Josep M. Gonfaus, Xavier Roca

Abstract:Regularization is key for deep learning since it allows training more complex models while keeping lower levels of overfitting. However, the most prevalent regularizations do not leverage all the capacity of the models since they rely on reducing the effective number of parameters. Feature decorrelation is an alternative for using the full capacity of the models but the overfitting reduction margins are too narrow given the overhead it introduces. In this paper, we show that regularizing negatively correlated features is an obstacle for effective decorrelation and present OrthoReg, a novel regularization technique that locally enforces feature orthogonality. As a result, imposing locality constraints in feature decorrelation removes interferences between negatively correlated feature weights, allowing the regularizer to reach higher decorrelation bounds, and reducing the overfitting more effectively. In particular, we show that the models regularized with OrthoReg have higher accuracy bounds even when batch normalization and dropout are present. Moreover, since our regularization is directly performed on the weights, it is especially suitable for fully convolutional neural networks, where the weight space is constant compared to the feature map space. As a result, we are able to reduce the overfitting of state-of-the-art CNNs on CIFAR-10, CIFAR-100, and SVHN.

* Accepted at ICLR2017
