16:40 - 17:00
Augmenting Semantic Representation of Depressive Language: from Forums to Microblogs (259)
Nawshad Farruque (University of Alberta), Osmar Zaiane (University of Alberta), Randy Goebel (University of Alberta)
We discuss and analyze the process of creating word embedding feature representations specifically designed for a learning task in which annotated data is scarce, such as depressive language detection from Tweets. We start from a rich word embedding pre-trained on a general dataset, then enhance it with an embedding learned from a domain-specific but much smaller dataset. Our strengthened representation better portrays the domain of depression we are interested in, as it combines the semantics learned from the specific domain with the word coverage of the general language. We present a comparative analysis of our word embedding representations against a simple bag-of-words model, a well-known sentiment lexicon, a psycholinguistic lexicon, and a general pre-trained word embedding, based on their efficacy in accurately identifying depressive Tweets. We show that our representations achieve a significantly better F1 score than the others when applied to a high-quality dataset.
Reproducible Research
|
16:20 - 16:40
Augmenting Physiological Time Series Data: A Case Study for Sleep Apnea Detection (293)
Konstantinos Nikolaidis (University of Oslo), Stein Kristiansen (University of Oslo), Vera Goebel (University of Oslo), Thomas Plagemann (University of Oslo), Knut Liestøl (University of Oslo), Mohan Kankanhalli (National University of Singapore)
Supervised machine learning applications in the health domain often face the problem of insufficient training datasets. The quantity of labelled data is small due to privacy concerns and the cost of data acquisition and labelling by a medical expert. Furthermore, it is quite common that collected data are unbalanced, and getting enough data to personalize models for individuals is very expensive or even infeasible. This paper addresses these problems by (1) designing a recurrent Generative Adversarial Network to generate realistic synthetic data and to augment the original dataset, (2) enabling the generation of balanced datasets from a heavily unbalanced dataset, and (3) controlling the data generation so that the generated data resemble data from specific individuals. We apply these solutions to sleep apnea detection and evaluate the performance of four well-known techniques: K-Nearest Neighbour, Random Forest, Multi-Layer Perceptron, and Support Vector Machine. In the experiments, all classifiers exhibit a consistent increase in sensitivity and a kappa statistic increase of between 0.72 · 10^-2 and 18.2 · 10^-2.
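For readers unfamiliar with the kappa statistic reported above, the following is a minimal sketch of Cohen's kappa, which measures agreement between predicted and true labels after correcting for agreement expected by chance. The function name `cohen_kappa` and the example labels are illustrative assumptions, not taken from the paper.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement between two labelings, corrected for chance."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    # Observed agreement: fraction of samples where both labelings match.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Expected chance agreement, from the marginal label frequencies.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_e == 1.0:
        # Degenerate case (a single class in both labelings); returning 1.0
        # is a convention chosen for this sketch.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical labels: 1 = apnea event, 0 = normal breathing.
y_true = [1, 1, 0, 0]
y_pred = [1, 0, 0, 0]
kappa = cohen_kappa(y_true, y_pred)  # 0.5
```

Unlike plain accuracy, kappa stays near zero for a classifier that only predicts the majority class, which is why it is a common choice for unbalanced data.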
|
17:40 - 18:00
Wearable-based Parkinson's Disease Severity Monitoring using Deep Learning (575)
Jann Goschenhofer (Dept. of Statistics, Ludwig-Maximilians University, Munich; ConnectedLife GmbH, Munich), Franz M. J. Pfister (Dept. of Statistics, Ludwig-Maximilians University, Munich; ConnectedLife GmbH, Munich), Kamer Ali Yuksel (ConnectedLife GmbH, Munich), Bernd Bischl (Dept. of Statistics, Ludwig-Maximilians University, Munich), Urban Fietzek (Dept. of Neurology, Ludwig-Maximilians University, Munich; Dept. of Neurology), Janek Thomas (Clinical Neurophysiology, Schoen Clinic Schwabing)
One major challenge in the medication of Parkinson's disease is that the severity of the disease, reflected in the patients' motor state, cannot be measured using accessible biomarkers. Therefore, we develop and examine a variety of statistical models to detect the motor state of such patients based on sensor data from a wearable device. We find that deep learning models consistently outperform a classical machine learning model applied to hand-crafted features in this time series classification task. Furthermore, our results suggest that treating this problem as a regression task, rather than as an ordinal regression or a classification task, is most appropriate. For consistent model evaluation and training, we adapt the leave-one-subject-out validation scheme to the training of deep learning models. We also employ a class-weighting scheme to successfully mitigate the problem of high multi-class imbalances in this domain. In addition, we propose a customized performance measure that reflects the requirements that the involved medical staff place on the model. To solve the problem of limited availability of high-quality training data, we propose a transfer learning technique that helps to improve model performance substantially. Our results suggest that deep learning techniques offer high potential to autonomously detect the motor states of patients with Parkinson's disease.
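The leave-one-subject-out scheme mentioned above can be sketched as follows: each fold holds out every sample from one subject for testing and trains on all remaining subjects, so no patient appears on both sides of a split. This is a generic illustration, not the authors' implementation; the function name `leave_one_subject_out` and the subject identifiers are hypothetical.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) per subject."""
    # Keep subjects in first-seen order so the folds are deterministic.
    subjects = list(dict.fromkeys(subject_ids))
    for held_out in subjects:
        train_idx = [i for i, s in enumerate(subject_ids) if s != held_out]
        test_idx = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train_idx, test_idx


# Hypothetical example: six sensor windows recorded from three patients.
subject_ids = ["p1", "p1", "p2", "p3", "p3", "p3"]
folds = list(leave_one_subject_out(subject_ids))
for held_out, train_idx, test_idx in folds:
    print(held_out, "train:", train_idx, "test:", test_idx)
```

Splitting by subject rather than by sample avoids leaking a patient's individual movement patterns from the training set into the test set, which would otherwise inflate the reported performance.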
|
17:00 - 17:20
CASTNet: Community-Attentive Spatio-Temporal Networks for Opioid Overdose Forecasting (900)
Ali Mert Ertugrul (University of Pittsburgh; Middle East Technical University, Ankara), Yu-Ru Lin (University of Pittsburgh), Tugba Taskaya-Temizel (Middle East Technical University, Ankara)
Opioid overdose is a growing public health crisis in the United States. This crisis, recognized as the "opioid epidemic," has widespread societal consequences, including the degradation of health and increases in crime rates and family problems. To improve overdose surveillance and to identify the areas in need of prevention effort, in this work we focus on forecasting opioid overdose using real-time crime dynamics. Previous work identified various types of links between opioid use and criminal activities, such as financial motives and common causes. Motivated by these observations, we propose a novel spatio-temporal predictive model for opioid overdose forecasting that leverages the spatio-temporal patterns of crime incidents. Our proposed model incorporates multi-head attentional networks to learn different representation subspaces of features. This deep learning architecture, called "community-attentive" networks, allows the prediction for a given location to be optimized by a mixture of groups (i.e., communities) of regions. In addition, our proposed model allows for interpreting which features, from which communities, contribute more to predicting local incidents, as well as how these communities are captured through forecasting. Our results on two real-world overdose datasets indicate that our model achieves superior forecasting performance and provides meaningful interpretations in terms of spatio-temporal relationships between the dynamics of crime and that of opioid overdose.
Reproducible Research
|
17:20 - 17:40
Investigating Time Series Classification Techniques for Rapid Pathogen Identification with Single-Cell MALDI-TOF Mass Spectrum Data (620)
Christina Papagiannopoulou (Ghent University), René Parchen (BiosparQ B.V., Leiden), Willem Waegeman (Ghent University)
Matrix-assisted laser desorption/ionization-time-of-flight mass spectrometry (MALDI-TOF-MS) is a well-known technology, widely used in species identification. Specifically, MALDI-TOF-MS is applied to samples that usually include bacterial cells, generating representative signals for the various bacterial species. However, a reliable identification result requires a significant amount of biomass. For most samples used for diagnostics of infectious diseases, the sample volume is too low to obtain the required amount of biomass. Therefore, the bacterial load is amplified in a culturing phase. If the MALDI process could be applied to individual bacteria, it would be possible to circumvent the need for culturing and isolation, accelerating the whole process. In this paper, we briefly describe an implementation of a MALDI-TOF-MS procedure in a setting of individual cells, and we demonstrate the use of the produced data for pathogen identification. The identification of pathogens (bacterial species) is performed by applying machine learning algorithms to the generated single-cell signals. The high predictive performance of the machine learning models indicates that the produced bacterial signatures constitute an informative representation, helpful in distinguishing the different bacterial species. In addition, we reformulate the bacterial species identification problem as a time series classification task by treating the intensity sequence of a given spectrum as time series values. Experimental results show that algorithms originally introduced for time series analysis are beneficial in modelling observations of single-cell MALDI-TOF-MS.
|