Francesco Pugliese

Toward Automated Website Classification by Deep Learning

Oct 22, 2019

Fabrizio De Fausti, Francesco Pugliese, Diego Zardetto

Abstract: In recent years, interest in Big Data sources has been steadily growing within the Official Statistics community. The Italian National Institute of Statistics (Istat) is currently carrying out several Big Data pilot studies. One of these studies, the ICT Big Data pilot, aims to exploit massive amounts of textual data automatically scraped from the websites of Italian enterprises in order to predict a set of target variables (e.g., e-commerce) that are routinely observed by the traditional ICT Survey. In this paper, we show that Deep Learning techniques can successfully address this problem. Essentially, we tackle a text classification task: an algorithm must learn to infer whether an Italian enterprise performs e-commerce from the textual content of its website. To reach this goal, we developed a sophisticated processing pipeline and evaluated its performance through extensive experiments. Our pipeline uses Convolutional Neural Networks and relies on Word Embeddings to encode raw texts into grayscale images (i.e., normalized numeric matrices). Web-scraped texts are huge and have a very low signal-to-noise ratio: to overcome these issues, we adopted a framework known as False Positive Reduction, which has seldom (if ever) been applied to text classification tasks. Several original contributions enable our processing pipeline to reach good classification results. Empirical evidence shows that our proposal outperforms all the alternative Machine Learning solutions already tested at Istat for the same task.
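To make the "texts as grayscale images" idea in the abstract concrete, here is a minimal sketch of one way a token sequence could be turned into a normalized numeric matrix. It is not the authors' pipeline: the toy embedding table, the function names `build_toy_embeddings` and `text_to_grayscale`, the zero-padding, and the min-max scaling to [0, 1] are all assumptions made for illustration.

```python
import random


def build_toy_embeddings(vocab, dim, seed=0):
    """Hypothetical stand-in for trained word embeddings: random vectors."""
    rng = random.Random(seed)
    return {word: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for word in vocab}


def text_to_grayscale(tokens, embeddings, max_len, dim):
    """Stack one embedding per token into a max_len x dim matrix in [0, 1].

    Assumptions: unknown tokens map to a zero vector, short texts are
    zero-padded, long texts are truncated, and scaling is min-max.
    """
    rows = [list(embeddings.get(tok, [0.0] * dim)) for tok in tokens[:max_len]]
    rows += [[0.0] * dim for _ in range(max_len - len(rows))]

    flat = [value for row in rows for value in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        # Degenerate case: every entry is identical, so return all zeros.
        return [[0.0] * dim for _ in rows]
    return [[(value - lo) / span for value in row] for row in rows]


# Toy vocabulary of Italian website words (illustrative only).
vocab = ["carrello", "acquista", "spedizione", "contatti"]
emb = build_toy_embeddings(vocab, dim=4)
image = text_to_grayscale(["acquista", "carrello", "sconosciuto"], emb, max_len=5, dim=4)
```

Each row of `image` corresponds to one token and each column to one embedding dimension, so a convolutional network can treat the matrix like a single-channel image.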

Satellite-Net: Automatic Extraction of Land Cover Indicators from Satellite Imagery by Deep Learning

Jul 22, 2019

Eleonora Bernasconi, Francesco Pugliese, Diego Zardetto, Monica Scannapieco

Abstract: In this paper we address the challenge of land cover classification for satellite images via Deep Learning (DL). Land cover classification aims to detect the physical characteristics of a territory and estimate the percentage of land occupied by each category of entities: vegetation, residential buildings, industrial areas, forest areas, rivers, lakes, etc. DL is a new paradigm for Big Data analytics and in particular for Computer Vision. The application of DL to image classification for land cover purposes has great potential owing to its high degree of automation and computing performance. In particular, the invention of Convolutional Neural Networks (CNNs) was fundamental to the advancements in this field. In [1], the Satellite Task Team of the UN Global Working Group describes the results achieved so far with respect to the use of earth observation for Official Statistics. However, in that study, CNNs were not yet explored for automatic classification of imagery. This work investigates the use of CNNs for the estimation of land cover indicators, providing evidence of the first promising results. In particular, the paper proposes a customized model, called Satellite-Net, that reaches an accuracy of up to 98% on test sets.

* New Techniques and Technologies for Statistics 2019, Brussels
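The abstract above describes land cover indicators as the percentage of land occupied by each category. As a rough sketch of that last step only, the snippet below turns per-tile class predictions into percentage shares. The function name `land_cover_shares`, the label strings, and the assumption that every tile covers the same ground area are illustrative and not taken from the paper.

```python
from collections import Counter


def land_cover_shares(tile_labels):
    """Return the percentage of tiles assigned to each land cover class.

    Assumption: all tiles cover equal ground area, so tile counts are
    proportional to land area.
    """
    if not tile_labels:
        return {}
    counts = Counter(tile_labels)
    total = len(tile_labels)
    return {label: 100.0 * n / total for label, n in counts.items()}


# Hypothetical classifier output for four tiles of one region.
predicted = ["vegetation", "vegetation", "residential", "water"]
shares = land_cover_shares(predicted)
```

If tiles differ in area, the counts would need to be replaced by area-weighted sums.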
