SMAQ AI Fabric: Correlating Across Data Streams


Artificial intelligence (AI) and machine learning (ML) get a great deal of attention, and indeed the importance of these technologies to every industry can’t be overstated. However, our experience suggests that many leaders misunderstand not only how ML works but also how they will adopt it within their organisation.


This misunderstanding is compounded by the fact that the change management process around ML adoption is significantly different from that of traditional software development. Laying the foundation for effective incorporation of AI/ML is central to SMAQstack’s value proposition.


Experience has shown us that instantiating a common fabric uniting data at rest and data in motion is crucial to unlocking the value of distinct, potentially out-of-phase, event-oriented data for operational personnel, without requiring a team of data wranglers, scientists, engineers, and managers.


This post will demonstrate how Bolster solves this challenge through the SMAQ dynamic foundation concept and how it can generate lift for your organisation as the prevalence of AI in business grows.


Suppose I am a desk officer for the Royal Thai Police tasked with allocating police assets across the Bangkok metropolitan area. Traditionally I might rely on voice reports over radio and open-source news reports to localise trouble areas. In recent years, the proliferation of mobile data entry capabilities has given rise to map-based common operating pictures or “COPs”, based on a mix of historical spatial data and user-entered data from mobile forms, which have improved situational awareness of the area of interest. However, as the ubiquity of streaming data and historical overlays of various formats has increased, it has become increasingly challenging to maintain context across these data sets, let alone infer interactions between them.


Despite being a significant improvement over traditional methods, at some point the 2-dimensional mix of data of various ages, sources, etc., evolves into an overly complex ball of wool, with diminishing utility. Since these systems often portray only the most recent data, the definition of which is highly variable across data sources, data quickly gets “out of phase,” thereby reducing its integrated value. This is where a common data model allows us to significantly reduce the cognitive load on the user: ML-based forecasting and anomaly detection techniques can present only derived, high-signal data rather than every raw data point.
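To make the “high-signal only” idea concrete, here is a minimal sketch of one common anomaly detection approach, a rolling z-score filter over a data stream. This is illustrative only and does not represent SMAQ’s internal implementation; the function name, window size, and threshold are all hypothetical choices.

```python
from collections import deque
from math import sqrt

def anomaly_stream(points, window=20, threshold=3.0):
    """Yield only the points that deviate strongly from the recent
    rolling baseline, suppressing the raw feed to high-signal events.

    points: iterable of (timestamp, value) tuples.
    """
    history = deque(maxlen=window)
    for t, value in points:
        if len(history) >= 5:  # need a minimal baseline before judging
            mean = sum(history) / len(history)
            var = sum((v - mean) ** 2 for v in history) / len(history)
            std = sqrt(var) or 1e-9  # avoid division by zero
            if abs(value - mean) / std > threshold:
                yield t, value  # surface the anomaly, drop the rest
        history.append(value)

# Toy feed: steady congestion readings with one sharp spike at t=10.
feed = [(i, 50.0) for i in range(10)] + [(10, 95.0)] \
     + [(i, 50.0) for i in range(11, 15)]
alerts = list(anomaly_stream(feed))
```

On this toy feed, only the spike survives the filter; the fourteen steady readings never reach the operator.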


To illustrate how SMAQ allows us to reduce and refine this data, let’s now assume that I have a SMAQstack containing data from the Armed Conflict Location Event Data Project (ACLED) updated monthly, traffic congestion data updated every 5 minutes, and weather data updated every 10 minutes. Typical ML applications in this use case might include AI working within individual data streams, for instance predicting traffic backups based on historical patterns and/or forecasting indicators of impending traffic jams. A more sophisticated application of ML might account for the effects of inclement weather on traffic flow. Still, even this relatively simple 2-dimensional ML model (forecasting traffic backups based on current traffic conditions and current/forecasted weather conditions) is a significant undertaking when architected from scratch.
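The shape of that 2-dimensional model can be sketched in a few lines. The following is a deliberately naive illustration, not SMAQ’s model: an exponentially weighted average of recent congestion readings, scaled by a hypothetical rain adjustment. The function name and every coefficient are assumptions chosen for readability.

```python
def forecast_congestion(recent, rain_mm_per_hr, rain_factor=0.02):
    """Forecast the next congestion reading (0-100 scale) from recent
    traffic history, adjusted for current/forecast rain intensity.

    recent: congestion readings, oldest first. All weights illustrative.
    """
    alpha = 0.5  # smoothing weight for the exponential moving average
    forecast = recent[0]
    for reading in recent[1:]:
        forecast = alpha * reading + (1 - alpha) * forecast
    # Crude weather adjustment: heavier rain implies slower traffic.
    return min(100.0, forecast * (1 + rain_factor * rain_mm_per_hr))

# Rising congestion plus a 10 mm/hr downpour pushes the forecast up.
forecast = forecast_congestion([40, 45, 50, 60], rain_mm_per_hr=10)
```

Even at this toy scale, the hard part is not the arithmetic but feeding the model two live streams on different cadences, which is precisely the infrastructure burden discussed next.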


Integrating these high-frequency streaming datasets would probably require a modern cloud infrastructure to capture data, train models, and carry out high-volume inference in near real-time. Indeed, this topic continues to be the subject of many academic papers but has largely failed to transition into mainstream business use, due mainly to the bespoke nature of this type of integration and the complexity of instrumenting it within a productised environment.


Now consider the impact of adding batch-oriented (as opposed to streaming) data like ACLED or crime data, which is generally > 1 month old, as well as other batch/streaming datasets like AIS scanner-based logistics data, and you have a real challenge on your hands. This is where SMAQstack’s opinionated data model and big data infrastructure become an essential enabler for n-dimensional ML applications for forecasting and anomaly detection.
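The core difficulty with mixing monthly batches and 5-minute streams is phase alignment: every feed must be snapshotted to one consistent point in time before any model can correlate them. A minimal sketch of that idea, an “as-of” join across feeds of different cadences, might look like the following (the stream names, timestamps, and helper functions are hypothetical, not SMAQ APIs):

```python
import bisect

def as_of(series, t):
    """Return the most recent (timestamp, value) at or before time t.
    series must be sorted by timestamp; returns None if nothing yet."""
    timestamps = [ts for ts, _ in series]
    i = bisect.bisect_right(timestamps, t)
    return series[i - 1] if i else None

def align(streams, t):
    """Snapshot every stream, fast or slow cadence, as of time t,
    so out-of-phase feeds share one consistent frame."""
    return {name: as_of(series, t) for name, series in streams.items()}

# Illustrative feeds on wildly different cadences (minutes on one axis).
streams = {
    "traffic": [(0, 40), (5, 55), (10, 70)],   # every 5 minutes
    "weather": [(0, "clear"), (10, "rain")],   # every 10 minutes
    "acled":   [(-43200, 3)],                  # monthly batch, a month old
}
snapshot = align(streams, t=7)
```

At t=7 the snapshot carries the 5-minute traffic reading, the 10-minute-old weather reading, and the month-old ACLED count, each the freshest value legitimately available, rather than whichever datum happened to arrive last.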


Since SMAQ data is already normalised, we can nominate which datasets we are interested in correlating and create new composite inference streams. For example, we might create a stream that infers the probability of an ongoing protest in Bangkok CBD from current weather conditions and traffic patterns, based on how these factors have historically correlated with protest activity. In ML terms, things like logistics flow, AIS, crime data, weather, and traffic would serve as the model’s ‘features’, and the presence of a protest event in that area would serve as the model’s ‘label’ for training purposes.
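The features-and-label framing above maps directly onto a standard classifier. As a minimal sketch, a plain logistic regression trained by gradient descent on toy history, where rows of normalised readings are the features and protest occurrence is the label. The data, scaling, and training settings here are all invented for illustration and say nothing about the models SMAQ actually ships.

```python
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def train(features, labels, lr=0.5, epochs=2000):
    """Logistic regression via gradient descent.

    features: rows of [traffic, rain, crime, ais_disruption], 0-1 scaled.
    labels:   1 if a protest occurred in that window, else 0.
    """
    w = [0.0] * len(features[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def predict(w, b, x):
    """Probability of a protest given current feature readings."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Toy history: protests coincided with heavy traffic and disruptions.
X = [[0.9, 0.1, 0.8, 0.9], [0.8, 0.0, 0.7, 0.8],
     [0.2, 0.3, 0.2, 0.1], [0.1, 0.2, 0.1, 0.0]]
y = [1, 1, 0, 0]
w, b = train(X, y)
risk = predict(w, b, [0.85, 0.05, 0.75, 0.85])  # high-risk conditions
```

The point of the sketch is the division of labour: once the fabric delivers normalised, phase-aligned features, the modelling step itself reduces to a few dozen lines.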


Through this relatively straightforward process of integrating datasets into a predictive model, we would stand a good chance of detecting likely protest events in Bangkok before they are even reported, and could act on that signal by allocating additional law enforcement assets to the area, or rerouting delivery drivers in a logistics use case (e.g. clear weather, traffic backups, slow delivery times, high-crime area, shipping disruptions ⇒ increased probability of protest activity).


This example demonstrates the importance of a common data fabric in allowing business users (rather than data experts) to leverage the power of ML at scale and in real-time. This is where SMAQstack’s opinionated data model shines, and the compounding value of data for forecasting and risk management becomes truly apparent.

