Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Axinia Radeva

Using Kernel Methods and Model Selection for Prediction of Preterm Birth

Sep 05, 2016

Ilia Vovsha, Ansaf Salleb-Aouissi, Anita Raja, Thomas Koch, Alex Rybchuk, Axinia Radeva, Ashwath Rajan, Yiwen Huang, Hatim Diab, Ashish Tomar(+1 more)

Figure 1 for Using Kernel Methods and Model Selection for Prediction of Preterm Birth

Figure 2 for Using Kernel Methods and Model Selection for Prediction of Preterm Birth

Figure 3 for Using Kernel Methods and Model Selection for Prediction of Preterm Birth

Figure 4 for Using Kernel Methods and Model Selection for Prediction of Preterm Birth

Abstract:We describe an application of machine learning to the problem of predicting preterm birth. We conduct a secondary analysis on a clinical trial dataset collected by the National In- stitute of Child Health and Human Development (NICHD) while focusing our attention on predicting different classes of preterm birth. We compare three approaches for deriving predictive models: a support vector machine (SVM) approach with linear and non-linear kernels, logistic regression with different model selection along with a model based on decision rules prescribed by physician experts for prediction of preterm birth. Our approach highlights the pre-processing methods applied to handle the inherent dynamics, noise and gaps in the data and describe techniques used to handle skewed class distributions. Empirical experiments demonstrate significant improvement in predicting preterm birth compared to past work.

* Presented at 2016 Machine Learning and Healthcare Conference (MLHC 2016), Los Angeles, CA. In this revision, we updated page 4 by adding the reference Vovsha et al. (2013) (incorrectly referenced as XXX in the previous version due to double blind reviewing). The bibtex entry is now added to the references

Via

Access Paper or Ask Questions

Leveraging Subjective Human Annotation for Clustering Historic Newspaper Articles

Aug 17, 2012

Haimonti Dutta, William Chan, Deepak Shankargouda, Manoj Pooleery, Axinia Radeva, Kyle Rego, Boyi Xie, Rebecca Passonneau, Austin Lee, Barbara Taranto

Figure 1 for Leveraging Subjective Human Annotation for Clustering Historic Newspaper Articles

Figure 2 for Leveraging Subjective Human Annotation for Clustering Historic Newspaper Articles

Figure 3 for Leveraging Subjective Human Annotation for Clustering Historic Newspaper Articles

Figure 4 for Leveraging Subjective Human Annotation for Clustering Historic Newspaper Articles

Abstract:The New York Public Library is participating in the Chronicling America initiative to develop an online searchable database of historically significant newspaper articles. Microfilm copies of the newspapers are scanned and high resolution Optical Character Recognition (OCR) software is run on them. The text from the OCR provides a wealth of data and opinion for researchers and historians. However, categorization of articles provided by the OCR engine is rudimentary and a large number of the articles are labeled editorial without further grouping. Manually sorting articles into fine-grained categories is time consuming if not impossible given the size of the corpus. This paper studies techniques for automatic categorization of newspaper articles so as to enhance search and retrieval on the archive. We explore unsupervised (e.g. KMeans) and semi-supervised (e.g. constrained clustering) learning algorithms to develop article categorization schemes geared towards the needs of end-users. A pilot study was designed to understand whether there was unanimous agreement amongst patrons regarding how articles can be categorized. It was found that the task was very subjective and consequently automated algorithms that could deal with subjective labels were used. While the small scale pilot study was extremely helpful in designing machine learning algorithms, a much larger system needs to be developed to collect annotations from users of the archive. The "BODHI" system currently being developed is a step in that direction, allowing users to correct wrongly scanned OCR and providing keywords and tags for newspaper articles used frequently. On successful implementation of the beta version of this system, we hope that it can be integrated with existing software being developed for the Chronicling America project.

Via

Access Paper or Ask Questions