Abstract: In recent years, there have been unprecedented advances in sensor technology, and sensors have become more affordable than ever. Thus, sensor-driven data collection is increasingly becoming an attractive and practical option for researchers around the globe. Such data is typically extracted in the form of time series, which can be investigated with data mining techniques to summarize the behaviors of a range of subjects, including humans and animals. While it enables cheap, large-scale data collection, continuous sensor recording produces large datasets that are challenging to process and analyze in a timely manner with traditional techniques. There are two main approaches to time series classification in the literature: shape-based classification and feature-based classification. Shape-based classification determines the best class according to a distance measure between time series. Feature-based classification, on the other hand, measures properties of the time series and finds the best class according to the set of features defined for them. In this dissertation, we demonstrate that for some problems neither of the two techniques dominates, and that some combination of both might be the best. In other words, on a single problem, one technique may be better for one subset of the behaviors while the other is better for another subset. We introduce a hybrid algorithm that classifies behaviors using both shape and feature measures in weakly labeled time series data collected from sensors, in order to quantify specific behaviors performed by the subject. We test the proposed algorithm on real-world datasets and demonstrate that it can robustly classify real, noisy, and complex data based on a combination of shape and features.
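To make the two approaches concrete, here is a minimal Python sketch (not the dissertation's implementation) that contrasts them under a one-nearest-neighbor rule. The Euclidean shape distance, the three summary features (mean, standard deviation, mean absolute step size), and the toy series are all illustrative assumptions. On the toy data, the ramps are separable only by shape, while the phase-shifted oscillation is recognized only by features, mirroring the claim that each approach can win on a different subset of behaviors.

```python
import math
import statistics
from typing import Callable, List, Sequence

Series = Sequence[float]


def euclidean(a: Series, b: Series) -> float:
    """Shape-based measure: point-by-point Euclidean distance (equal lengths assumed)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def features(s: Series) -> List[float]:
    """Hypothetical feature set: mean, standard deviation, mean absolute step size."""
    steps = [abs(s[i + 1] - s[i]) for i in range(len(s) - 1)]
    return [statistics.fmean(s), statistics.pstdev(s), statistics.fmean(steps)]


def one_nn(query, train_X, train_y, dist: Callable) -> str:
    """Return the label of the training item closest to the query under `dist`."""
    best = min(range(len(train_X)), key=lambda i: dist(query, train_X[i]))
    return train_y[best]


def shape_classify(query: Series, train_X, train_y) -> str:
    # Compare raw series directly: sensitive to shape and temporal alignment.
    return one_nn(query, train_X, train_y, euclidean)


def feature_classify(query: Series, train_X, train_y) -> str:
    # Compare summary-feature vectors: ignores where in time things happen.
    return one_nn(features(query), [features(s) for s in train_X], train_y, euclidean)


# Toy problem A: the ramps differ only in direction, so their features coincide.
ramps_X = [[0, 1, 2, 3], [3, 2, 1, 0]]
ramps_y = ["up", "down"]
print(shape_classify([3, 1.9, 1.1, 0], ramps_X, ramps_y))  # prints down

# Toy problem B: a half-period shift defeats the Euclidean shape match.
osc_X = [[0, 1, 0, 1, 0, 1], [0.5] * 6]
osc_y = ["oscillating", "still"]
query = [1, 0, 1, 0, 1, 0]
print(shape_classify(query, osc_X, osc_y))    # prints still
print(feature_classify(query, osc_X, osc_y))  # prints oscillating
```

The two ramps have identical feature vectors, so `feature_classify` cannot tell them apart at all; conversely, shifting the oscillation by half a period makes the Euclidean distance favor the flat series.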
Abstract: Poultry farms are a major contributor to the human food chain. However, around the world, there have been growing concerns about the quality of life of livestock in poultry farms, and increasingly vocal demands for improved standards of animal welfare. Recent advances in sensing technologies and machine learning make it possible to monitor birds and to employ the lessons learned to improve the welfare of all birds. This task superficially appears to be easy; yet studying behavioral patterns involves collecting enormous amounts of data, justifying the term Big Data. Before this big data can be used for analytical purposes to tease out meaningful, well-conserved behavioral patterns, it needs to be pre-processed. Pre-processing refers to cleansing and preparing data so that it is in a format ready to be analyzed by downstream algorithms, such as classification and clustering algorithms. However, as we shall demonstrate, efficient pre-processing of chicken big data is both non-trivial and crucial to the success of further analytics.
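As an illustration of what such pre-processing can involve, the sketch below assumes a single accelerometer axis in which dropped readings are marked as `None`. It fills the gaps by linear interpolation, cuts the stream into overlapping fixed-width windows, and z-normalizes each window. These particular steps and all function names are hypothetical choices made for illustration, not the pipeline described in the paper.

```python
import bisect
import statistics
from typing import List, Optional, Sequence


def interpolate_gaps(samples: Sequence[Optional[float]]) -> List[float]:
    """Fill dropped readings (None) by linear interpolation between valid neighbors.

    Gaps at either end copy the nearest valid reading.
    """
    valid = [i for i, v in enumerate(samples) if v is not None]
    if not valid:
        raise ValueError("recording contains no valid samples")
    filled = []
    for i, v in enumerate(samples):
        if v is not None:
            filled.append(float(v))
            continue
        pos = bisect.bisect_left(valid, i)  # index of the first valid sample after i
        left = valid[pos - 1] if pos > 0 else None
        right = valid[pos] if pos < len(valid) else None
        if left is None:
            filled.append(float(samples[right]))
        elif right is None:
            filled.append(float(samples[left]))
        else:
            t = (i - left) / (right - left)
            filled.append(samples[left] + t * (samples[right] - samples[left]))
    return filled


def sliding_windows(x: Sequence[float], width: int, step: int) -> List[List[float]]:
    """Cut a stream into fixed-width windows; step < width gives overlapping windows."""
    return [list(x[s:s + width]) for s in range(0, len(x) - width + 1, step)]


def z_normalize(x: Sequence[float]) -> List[float]:
    """Rescale to zero mean and unit standard deviation (flat windows map to zeros)."""
    mu, sd = statistics.fmean(x), statistics.pstdev(x)
    if sd == 0:
        return [0.0] * len(x)
    return [(v - mu) / sd for v in x]


def preprocess(raw: Sequence[Optional[float]], width: int, step: int) -> List[List[float]]:
    """Hypothetical pipeline: gap filling, then windowing, then per-window z-normalization."""
    return [z_normalize(w) for w in sliding_windows(interpolate_gaps(raw), width, step)]


raw = [1.0, None, 3.0, 4.0, None, None, 7.0, 8.0]
windows = preprocess(raw, width=4, step=2)
print(len(windows))  # prints 3
```

Z-normalizing each window removes offset and scale differences, for example between sensor units or attachment positions, which matters for distance-based classifiers downstream; whether it is appropriate depends on whether absolute magnitude carries behavioral information.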
Abstract: Time series classification is an important task in its own right, and it is often a precursor to further downstream analytics. To date, virtually all works in the literature have used either shape-based classification with a distance measure or feature-based classification after finding suitable features for the domain. It seems to be underappreciated that in many datasets some classes are best discriminated with features, while others are best discriminated with shape. Thus, committing to either shape or features will condemn us to poor results, at least for some classes. In this work, we propose a new model for classifying time series that allows the use of both shape and feature-based measures when warranted. Our algorithm automatically decides which approach is best for which class, and at query time chooses which classifier to trust the most. We evaluate our approach on real-world datasets and demonstrate that it produces a statistically significant improvement in classification accuracy.
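One simple way to realize per-class trust, shown here only as an assumed scheme rather than the authors' model, is to estimate each classifier's precision for every class on held-out validation data and, when the two classifiers disagree on a query, keep the label from the classifier with the better track record for the class it predicted. The class labels `A`, `B`, `C` and the validation predictions below are made up.

```python
from collections import Counter
from typing import Dict, Sequence


def per_class_precision(predicted: Sequence[str], actual: Sequence[str]) -> Dict[str, float]:
    """Of the validation items a classifier assigned to class c, the fraction truly in c."""
    assigned = Counter(predicted)
    correct = Counter(p for p, a in zip(predicted, actual) if p == a)
    return {c: correct[c] / n for c, n in assigned.items()}


def hybrid_label(shape_pred: str, feature_pred: str,
                 shape_prec: Dict[str, float], feature_prec: Dict[str, float]) -> str:
    """Trust whichever classifier has the better validation record for the label it predicts."""
    if shape_pred == feature_pred:
        return shape_pred
    s = shape_prec.get(shape_pred, 0.0)
    f = feature_prec.get(feature_pred, 0.0)
    return shape_pred if s >= f else feature_pred  # ties go to the shape classifier


# Hypothetical validation run: shape is reliable for A, features are reliable for C.
actual      = ["A", "A", "B", "B", "C", "C"]
shape_val   = ["A", "A", "C", "B", "C", "B"]
feature_val = ["B", "A", "B", "B", "C", "C"]
shape_prec = per_class_precision(shape_val, actual)
feature_prec = per_class_precision(feature_val, actual)

print(hybrid_label("A", "B", shape_prec, feature_prec))  # prints A
print(hybrid_label("B", "C", shape_prec, feature_prec))  # prints C
```

Breaking ties in favor of the shape-based classifier is an arbitrary choice here; in practice the precision estimates would come from cross-validation on the training data rather than a single small validation set.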
Abstract: Poultry farms are an important contributor to the human food chain. Worldwide, humankind keeps an enormous number of domesticated birds (e.g., chickens) for their eggs and meat, which provide rich sources of low-fat protein. However, around the world, there have been growing concerns about the quality of life of livestock in poultry farms, and increasingly vocal demands for improved standards of animal welfare. Recent advances in sensing technologies and machine learning make it possible to automatically assess the health of some individual birds and to employ the lessons learned to improve the welfare of all birds. This task superficially appears to be easy, given the dramatic progress in recent years in classifying human behaviors, and given that human behaviors are presumably more complex. However, as we shall demonstrate, classifying chicken behaviors poses several unique challenges, chief among which is creating a generalizable dictionary of behaviors from sparse and noisy data. In this work, we introduce a novel time series dictionary learning algorithm that can robustly learn from weakly labeled data sources.
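Weak labels here mean that a recording is known to contain a behavior somewhere, but not where. A brute-force, shapelet-style heuristic (a stand-in for illustration, not the novel algorithm described above) can learn one dictionary entry by trying every window of the positive recordings and keeping the one that best matches some window in every positive recording while staying far from every window in the negative recordings. The function names, the Euclidean distance, and the toy data are assumptions.

```python
import math
from typing import List, Sequence, Tuple

Series = Sequence[float]


def dist(a: Series, b: Series) -> float:
    """Euclidean distance between two equal-length windows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def min_subsequence_distance(pattern: Series, series: Series) -> float:
    """Distance from the pattern to its best-matching window anywhere in the series."""
    w = len(pattern)
    return min(dist(pattern, series[s:s + w]) for s in range(len(series) - w + 1))


def learn_dictionary_entry(positives: List[Series], negatives: List[Series],
                           width: int) -> Tuple[List[float], float]:
    """Return the candidate window maximizing (mean negative distance - mean positive distance)."""
    best, best_score = None, -math.inf
    for rec in positives:
        for s in range(len(rec) - width + 1):
            cand = list(rec[s:s + width])
            # Weak labels: each positive only needs to contain the pattern somewhere.
            pos = sum(min_subsequence_distance(cand, r) for r in positives) / len(positives)
            neg = sum(min_subsequence_distance(cand, r) for r in negatives) / len(negatives)
            score = neg - pos
            if score > best_score:
                best, best_score = cand, score
    return best, best_score


# Toy data: the behavior (a spike) occurs at a different, unlabeled position in each positive.
positives = [[0, 5, 0, 0, 0, 0], [0, 0, 0, 0, 5, 0]]
negatives = [[0, 0, 0, 0, 0, 0], [0, 1, 0, 0, 1, 0]]
entry, score = learn_dictionary_entry(positives, negatives, width=3)
print(entry, score)  # prints [0, 5, 0] 4.5
```

The search costs roughly O(N²·w) arithmetic operations for N total samples and window width w, so it only suits small examples; a practical learner would need pruning or approximate matching, and a dictionary would repeat the search per behavior.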