Discrete Time-Series Clustering and Linear Temporal Logic Delineation

Abstract

The collection of information in this data-driven world has become paramount to the way businesses and individuals interact with society. From personal wearable technology to weather prediction, sales forecasting, and everything in between, a significant proportion of this data shares a common characteristic: its dependence on time. Analytical techniques and logical representations are the key to unlocking the power contained within time-series data, providing a basis for translating data into learning outcomes. While time-series data represents a significant opportunity for institutions and individuals to learn from the past to improve the future, the prevalence of unstructured data in real-world settings remains an active challenge for existing analytical techniques. This impediment is especially relevant within research areas such as goal recognition, policy summarization, and system dynamics modelling, where the shared objective is to derive meaning from observed behavior. To establish meaningful insights from unstructured time-series data, partitions and patterns must be identified that effectively differentiate observations based on temporal attributes.

To address this, we propose two novel approaches, both of which leverage linear temporal logic to provide structure to unstructured discrete time-series data by identifying and contrastively explaining the differences between an unspecified number of discrete time-series observations.

Our first proposed approach discovers a feature set of relevant temporal specifications to represent observations in vector space, clusters the resulting data points via traditional clustering algorithms, and delineates clusters via conjunctions of linear temporal logic features. Within reasonable search limits, our algorithm finds accurate and complete cluster definitions at a near-perfect success rate across six simulated evaluation domains spanning three vocabulary sizes.
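The three steps of this first approach (featurize, cluster, delineate) can be illustrated with a minimal sketch. It is not the thesis implementation: the two formula templates (`F p` and `(!q U p)`), the hand-rolled k-means, the greedy literal selection, and the toy traces are all assumptions chosen only to keep the example self-contained. Here "complete" is taken to mean that every trace in the cluster satisfies the conjunction, and "accurate" that no trace outside it does.

```python
from itertools import permutations

# Hypothetical finite-trace templates; each returns (name, evaluator).
def eventually(p):
    """F p: symbol p occurs somewhere in the trace."""
    return (f"F {p}", lambda trace: p in trace)

def before(p, q):
    """(!q U p): p occurs, and q does not occur before it."""
    def holds(trace):
        for symbol in trace:
            if symbol == p:
                return True
            if symbol == q:
                return False
        return False
    return (f"(!{q} U {p})", holds)

def build_features(vocab):
    feats = [eventually(p) for p in vocab]
    feats += [before(p, q) for p, q in permutations(vocab, 2)]
    return feats

def featurize(trace, feats):
    # One binary coordinate per temporal specification.
    return tuple(int(holds(trace)) for _, holds in feats)

def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def kmeans(vectors, k, iters=20):
    # Farthest-point initialisation keeps this toy example deterministic.
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(iters):
        new = [min(range(k), key=lambda c: sq_dist(v, centroids[c]))
               for v in vectors]
        if new == labels:
            break
        labels = new
        for c in range(k):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def delineate(vectors, labels, feats, cluster):
    inside = [v for v, l in zip(vectors, labels) if l == cluster]
    outside = [v for v, l in zip(vectors, labels) if l != cluster]
    # Literals (feature index, value) shared by every member: any conjunction
    # of them is complete for the cluster by construction.
    shared = [(j, inside[0][j]) for j in range(len(feats))
              if all(v[j] == inside[0][j] for v in inside)]
    chosen = []
    while outside and shared:
        # Greedily add the literal that excludes the most outside traces.
        j, val = max(shared,
                     key=lambda lit: sum(v[lit[0]] != lit[1] for v in outside))
        remaining = [v for v in outside if v[j] == val]
        if len(remaining) == len(outside):
            break  # no shared literal excludes anything further
        chosen.append(feats[j][0] if val else f"!({feats[j][0]})")
        outside = remaining
    # The second value reports accuracy: True if no outside trace satisfies it.
    return " & ".join(chosen), not outside

# Toy data (assumed): traces over a three-symbol vocabulary.
vocab = ["a", "b", "c"]
traces = [["a", "b", "c"], ["a", "c", "b"], ["a", "b"],
          ["c", "b"], ["b", "c"], ["c"]]
feats = build_features(vocab)
vectors = [featurize(t, feats) for t in traces]
labels = kmeans(vectors, k=2)
definitions = {c: delineate(vectors, labels, feats, c)
               for c in sorted(set(labels))}
for c, (formula, accurate) in definitions.items():
    print(c, formula, accurate)
```

In this toy data the two groups differ in whether `a` ever occurs, so a single literal is enough to separate them; richer domains would generally need longer conjunctions and a larger template search.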

Our second proposed approach embraces a tree-based perspective to organize observations into clusters. By employing a Monte Carlo node-splitting approach, our algorithm seeks a balanced, contrastive division of any given set of discrete time-series observations into two sets, accompanied by a temporal logic specification that is satisfied by one of the two sets. Recursively applying this procedure, we demonstrate the effectiveness of our approach in clustering and delineating discrete time-series observations, allowing temporal logic specifications to provide insight at each level of the resulting tree.
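The recursive splitting idea can be sketched as follows. This is a hedged illustration rather than the thesis algorithm: the random sampler over two hypothetical templates, the imbalance score, the sample budget, the depth limit, and the toy traces are all assumptions.

```python
import random

# Hypothetical finite-trace templates; each returns (name, evaluator).
def eventually(p):
    return (f"F {p}", lambda trace: p in trace)

def before(p, q):
    # (!q U p): p occurs, and q does not occur before it.
    def holds(trace):
        for symbol in trace:
            if symbol == p:
                return True
            if symbol == q:
                return False
        return False
    return (f"(!{q} U {p})", holds)

def sample_formula(vocab, rng):
    # Monte Carlo step: draw one random candidate specification.
    if rng.random() < 0.5:
        return eventually(rng.choice(vocab))
    p, q = rng.sample(vocab, 2)
    return before(p, q)

def split(traces, vocab, rng, samples=50):
    # Keep the sampled formula whose satisfying and non-satisfying
    # sets are closest in size; skip formulas that do not separate.
    best = None
    for _ in range(samples):
        name, holds = sample_formula(vocab, rng)
        sat = [t for t in traces if holds(t)]
        unsat = [t for t in traces if not holds(t)]
        if not sat or not unsat:
            continue
        imbalance = abs(len(sat) - len(unsat))
        if best is None or imbalance < best[0]:
            best = (imbalance, name, sat, unsat)
    return best

def build_tree(traces, vocab, rng, max_depth=3):
    found = None
    if max_depth > 0 and len(traces) > 1:
        found = split(traces, vocab, rng)
    if found is None:
        return {"traces": traces}  # leaf: one cluster of observations
    _, name, sat, unsat = found
    return {
        "formula": name,  # satisfied by every trace under "sat" only
        "sat": build_tree(sat, vocab, rng, max_depth - 1),
        "unsat": build_tree(unsat, vocab, rng, max_depth - 1),
    }

def leaves(tree):
    if "traces" in tree:
        return [tree["traces"]]
    return leaves(tree["sat"]) + leaves(tree["unsat"])

# Toy data (assumed): traces over a three-symbol vocabulary.
vocab = ["a", "b", "c"]
traces = [["a", "b", "c"], ["a", "c", "b"], ["a", "b"],
          ["c", "b"], ["b", "c"], ["c"]]
rng = random.Random(0)  # fixed seed so repeated runs give the same tree
tree = build_tree(traces, vocab, rng)
print(tree.get("formula"), [len(leaf) for leaf in leaves(tree)])
```

Each internal node stores the formula that separates its two children, so reading the formulas along a root-to-leaf path gives a contrastive description of that leaf's cluster.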

Publication
MSc Thesis