
Clustering of Data Streams With Dynamic Gaussian Mixture Models: An IoT Application in Industrial Processes

Diaz-Rozo, Javier; Bielza, Concha; Larranaga, Pedro

IEEE INTERNET OF THINGS JOURNAL
2018
Volume 5, pages 3533-3547
Abstract
In industrial Internet of Things applications with sensors sending dynamic process data at high speed, producing actionable insights at the right time is challenging. A key problem is processing a large amount of data while the underlying dynamic phenomena related to the machine may be evolving over time because of factors such as degradation. This makes any actionable model obsolete, so it must be updated. To cope with this problem, in this paper we propose a new unsupervised learning algorithm based on Gaussian mixture models, called Gaussian-based dynamic probabilistic clustering (GDPC), which integrates and adapts three well-known algorithms for use in dynamic scenarios: the expectation-maximization (EM) algorithm to estimate the model parameters, and the Page-Hinkley test and the Chernoff bound to detect concept drifts. Unlike other unsupervised methods, the model induced by the GDPC provides the membership probabilities of each instance to each cluster. Through a Brier score analysis, this makes it possible to determine the robustness of the instance assignment and its evolution each time a concept drift is detected. The algorithm also works with very little data and significantly less computing power, and it is able to decide whether (and when) to change the model. The algorithm is tested using synthetic data and data streams from an industrial testbed, where different operational states are automatically identified, giving good results in terms of classification accuracy, sensitivity, and specificity.
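
As an illustration of one ingredient named in the abstract, the following is a minimal sketch of a generic Page-Hinkley test for an upward shift in the mean of a stream. It is not the GDPC implementation: the class name `PageHinkley`, the helper `first_alarm`, and the default values of `delta` and `threshold` are assumptions chosen for this example, and the statistic being monitored (for instance, a per-instance model fit score) is left to the caller.

```python
from dataclasses import dataclass


@dataclass
class PageHinkley:
    """Generic Page-Hinkley test for an upward shift in a stream's mean."""
    delta: float = 0.005      # tolerated magnitude of change (assumed value)
    threshold: float = 5.0    # alarm threshold, often written lambda (assumed value)
    n: int = 0                # number of observations seen so far
    mean: float = 0.0         # running mean of the observations
    cum: float = 0.0          # cumulative deviation m_t
    cum_min: float = 0.0      # running minimum M_t of the cumulative deviation

    def update(self, x: float) -> bool:
        """Feed one observation; return True if a drift is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n          # incremental mean update
        self.cum += x - self.mean - self.delta         # m_t = m_{t-1} + (x_t - mean_t - delta)
        self.cum_min = min(self.cum_min, self.cum)     # M_t = min(M_{t-1}, m_t)
        return self.cum - self.cum_min > self.threshold


def first_alarm(stream, **kwargs):
    """Return the index of the first alarm in `stream`, or None if there is none."""
    detector = PageHinkley(**kwargs)
    for i, x in enumerate(stream):
        if detector.update(x):
            return i
    return None


if __name__ == "__main__":
    # A stream whose mean jumps from 0 to 1 halfway through.
    shifted = [0.0] * 50 + [1.0] * 50
    print("first alarm at index:", first_alarm(shifted))
    print("alarm on constant stream:", first_alarm([0.0] * 100))
```

In a streaming setting, an alarm from a detector of this kind would be the trigger for re-estimating the mixture parameters; a larger `threshold` lowers the false-alarm rate at the cost of a longer detection delay.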

Access level

Green accepted
