A Primer for BI and IoT/IIoT and Edge Analytics - Part 2: Process Monitoring for Steady States: QC Charting - Spotfire

This is part 2 of a 2-part article that gives a concise overview of practical considerations and methods commonly applied to extract actionable information from continuous or streaming process data. This area of analytics is rapidly becoming more important as streaming IoT (Internet of Things) technologies transform, disrupt, or create new businesses across virtually all industries.

Thomas Hill, PhD - Senior Director Analytics, TIBCO Software

Overview

It is easy to forget that the problem of how to derive actionable insight from streaming continuous data is actually an "old problem": Walter A. Shewhart is generally considered the "inventor" of modern statistical process control charting (e.g., see Shewhart, 1945; Montgomery, 2012; see also Lewicki, Hill, Qazaz, 2007, for multivariate and model-based SPC). Interestingly, the business requirements behind the invention of statistical process control originated with early IoT technology of sorts: Shewhart worked at Bell Labs in the 1920s and was concerned about the reliability of the emerging telephone transmission system.

This article is the second in a series. For Part 1, see A Primer for BI and IoT/IIoT and Edge Analytics - Part 1: Connecting to Data

Monitoring Steady States

A very common and useful statistical model for monitoring individual data streams is to assume a constant average (mean or median) value over time, with constant variability due to uncontrollable or noise factors. If a process is stable and working as desired (is of acceptable quality), then a control chart can be constructed based on the estimate of the acceptable noise variability. The control chart consists of a Center Line that indicates the expected constant average (mean, median, etc.) value for a particular parameter, and control limits around it. When the observed values fall outside the control limits, the original steady-state model no longer fits the observed data: the process is "out-of-control".
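As a minimal sketch of this idea (not the Spotfire template itself), the following Python estimates a Center Line and 3-sigma control limits from baseline data that is assumed to be in control, and then flags new values that fall outside those limits. The function names and sample numbers are illustrative assumptions. Sigma is estimated from the average moving range divided by the standard constant 1.128, as is usual for an Individuals chart.

```python
from statistics import mean

D2_SPAN_2 = 1.128  # standard control-chart constant d2 for moving ranges of span 2


def individuals_limits(baseline, n_sigma=3.0):
    """Return (lcl, center, ucl) estimated from in-control baseline values."""
    if len(baseline) < 2:
        raise ValueError("need at least two baseline observations")
    center = mean(baseline)
    # Moving range: absolute difference between consecutive observations.
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    sigma = mean(moving_ranges) / D2_SPAN_2
    return center - n_sigma * sigma, center, center + n_sigma * sigma


def out_of_control(values, lcl, ucl):
    """Return the indices of values that fall outside the control limits."""
    return [i for i, x in enumerate(values) if x < lcl or x > ucl]


# Made-up baseline and new readings, for illustration only.
baseline = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 10.1, 9.9]
lcl, center, ucl = individuals_limits(baseline)

new_values = [10.0, 10.1, 11.5, 9.9]
flagged = out_of_control(new_values, lcl, ucl)
print(flagged)
```

In this example only the third new reading (11.5) lies outside the limits. In practice the baseline would contain many more observations than the eight used here.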

Figure 1: Simple Control Chart in TIBCO Spotfire®

Model-Based Monitoring. In general terms, control charting is 'model-based' process monitoring, where the 'model' usually describes a stationary steady-state process with constant variability over time. Variations of this model might include expected step-changes, either when the mean changes or the variability changes. In Figure 1 this is shown for the second chart with stepped control limits. Such step changes might be due to differences in recipes in a continuous process, or they might be due to naturally occurring changes over a day, or week (e.g., demand for electricity, number of visits to a web store, and so on).

Deviations from the model can occur for two reasons: either the mean of a process shifts, drifts, or simply steps out of the control limits, or the variability of the process shifts, drifts, or steps out of control limits. Either way, the conclusion is that the model no longer fits, and the process is no longer in a steady state. As will be shown later, this reasoning applies equally to simple univariate control charts and to multivariate control charts, whether for steady states or based on other (e.g., batch-process) models.

Note that general model-based process monitoring will be discussed in greater detail in subsequent tutorials. In that context, univariate and multivariate control charting as discussed here is equally important: if a prediction model exists of what the in-control process looks like, then monitoring the prediction residuals of that model provides a powerful method for monitoring model adequacy, and hence the consistency of the process with the in-control process.
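Residual monitoring can be sketched in a few lines of Python. The example assumes a hypothetical in-control model (a daily cycle between 40 and 60 units) and made-up noise values; neither comes from the example data set. Control limits are placed on the residuals (observed minus predicted) rather than on the raw values, so an expected daily swing does not trigger alarms but a departure from the model does.

```python
import math
from statistics import mean, stdev


def residual_limits(observed, predicted, n_sigma=3.0):
    """Limits for prediction residuals, estimated from an in-control period."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    center = mean(residuals)
    spread = stdev(residuals)
    return center - n_sigma * spread, center + n_sigma * spread


def flag_residuals(observed, predicted, lcl, ucl):
    """Indices where the residual (observed - predicted) leaves the limits."""
    return [i for i, (o, p) in enumerate(zip(observed, predicted))
            if not lcl <= o - p <= ucl]


# Hypothetical in-control model: one value per hour, cycling between 40 and 60.
predicted = [50.0 + 10.0 * math.sin(2.0 * math.pi * hour / 24.0)
             for hour in range(24)]
noise = [0.1, -0.1, 0.05, -0.05]  # made-up measurement noise pattern

baseline = [p + noise[hour % 4] for hour, p in enumerate(predicted)]
lcl, ucl = residual_limits(baseline, predicted)

today = list(baseline)
today[15] += 0.5  # small relative to the 40-60 daily swing, large for a residual
flagged = flag_residuals(today, predicted, lcl, ucl)
print(flagged)
```

The raw value at hour 15 stays well inside the normal daily range, so fixed limits on the raw data would not flag it. The residual chart does, because the value no longer agrees with what the model predicts for that hour.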

Modern SPC Charting

Today, Statistical Process Control methods are the foundation of virtually all process control in manufacturing and elsewhere. They are mandated for all pharmaceutical and medical device manufacturing by the FDA and equivalent international regulatory agencies (e.g., the International Conference on Harmonization, ICH), required by automotive manufacturers (see AIAG, 2005), and an integral part of all automated high-tech manufacturing. A summary of methods can be found, for example, in Montgomery (2012; see also Lewicki, Hill, Qazaz, 2007, for multivariate control methods). In addition to the simple X-bar (mean) chart described earlier, charts can be, and usually are, constructed for monitoring the process noise variability as well. Specialized charts following the same statistical control strategy can also be derived for discrete events, proportions, or rare event counts (e.g., failures). Other charts have been proposed that are particularly sensitive to slow process shift and drift (Exponentially Weighted Moving Average Chart - EWMA, Cumulative Sum Chart - CUSUM), or have been adapted and tuned for data streams with multiple short-run states (e.g., production runs) characterized by different average values or variability. Multiple data streams that are expected to be identical can be monitored via so-called Multiple Stream Charts. In addition, specific statistical test procedures are commonly used to detect small process shifts, simple repeated point patterns, and similar anomalies (the Western Electric Runs Rules/Tests; see Montgomery, 2012).
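To illustrate why EWMA-type charts are more sensitive to small sustained shifts, here is a minimal Python sketch of the textbook EWMA recursion with time-varying control limits. The smoothing weight of 0.2, the limit width of 3, the target, sigma, and the data are all illustrative assumptions, not settings taken from any Spotfire or Statistica default.

```python
import math


def ewma_signals(values, target, sigma, lam=0.2, width=3.0):
    """Return indices where the EWMA statistic leaves its control limits.

    z_t = lam * x_t + (1 - lam) * z_(t-1), starting at z_0 = target.
    The limits widen with t toward target +/- width*sigma*sqrt(lam/(2-lam)).
    """
    z = target
    signals = []
    for t, x in enumerate(values, start=1):
        z = lam * x + (1.0 - lam) * z
        half_width = width * sigma * math.sqrt(
            lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * t))
        )
        if abs(z - target) > half_width:
            signals.append(t - 1)
    return signals


# Made-up data: four in-control readings, then a sustained shift of 1.5 sigma.
target, sigma = 10.0, 0.2
values = [10.0, 10.1, 9.9, 10.0] + [10.3] * 10

signals = ewma_signals(values, target, sigma)
individually_out = [i for i, x in enumerate(values)
                    if abs(x - target) > 3.0 * sigma]
print(signals, individually_out)
```

No single reading in this series is outside the ordinary 3-sigma limits, so a simple Individuals chart stays silent. The EWMA statistic accumulates the small shift and signals a few observations after it begins.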

More information on these charts can also be found at http://documentation.statsoft.com/STATISTICAHelp.aspx?path=Statistics/Indices/QualityControlChartsSTAT_HIndex (univariate control charts) and at http://documentation.statsoft.com/STATISTICAHelp.aspx?path=MQC/MQC/Overviews/CommonTypesofMultivariateControlCharts (multivariate control charts).

Apply Statistical Process Control First to Predict Reliability and Expected Maintenance!

Even though statistical control charting may sometimes appear "too simple", these techniques provide a very powerful approach to predictive maintenance and quality control applications. Modern complex automated manufacturing systems may generate thousands and in some cases millions of data streams. Most of the individual data streams may never show any variability or interesting patterns and are therefore mostly ignored - until they do change or show increasing instability and variability. When that happens and is detected early through uni/multivariate SPC charting and analyses, process owners know that "something" in the process has changed, or is drifting. Engineering analyses can then be initiated to determine if that change is expected or unexpected, and how it may eventually affect important process KPIs.

Multivariate and Simple Model-Based SPC

Univariate SPC methods effectively monitor a single-stream, steady-state process that can be characterized by a constant value (average, median) and some acceptable variability. Multivariate methods extend this approach to multiple-parameter steady states or processes that follow specific patterns, with some acceptable variability and covariances. Figure 2 shows a simple example.

Figure 2: Outliers in Multivariate Space

These data are similar to real use cases where multiple test stations were used for final product testing in an automated continuous production process. In this Spotfire® dashboard, the two parameters of interest - Measure_Y1 and Measure_Y2 - are plotted against the X and Y axes, respectively. The lower and upper control limits for each dimension (Y1, Y2) are indicated by straight lines in the plot.

Multivariate SPC

Reviewing the graph in Figure 2, it is quite apparent that one of the test stations has "drifted away" from the other test stations: while most points in the scatterplot show a homogeneous correlation between measurements, the points highlighted as Outliers in multivariate space are entirely separated from the other points. When looking at each dimension separately (i.e., with respect to the Y1/Y2 control limits), those multivariate outliers are in-control and would not be classified as outliers; however, they are clearly "unusual" when compared to the normal points in the bivariate space. The example Workspace discussed later demonstrates how to create univariate and multivariate charts for these data.

The general power of multivariate and model-based SPC is that it allows for the identification of shift, drift, and emerging quality problems before they can be identified by reviewing individual data streams one by one.

There are many multivariate process control charting methods, all of which can be thought of as multivariate generalizations of the univariate single-stream chart. The Hotelling T2 chart - a generalization of the standard X-bar chart for sample averages - is one of the most popular charts. Specialized charts for detection of multivariate drift or trends have also been developed (Multivariate Cumulative Sum [CUSUM] Charts; MCUSUM, see Healy, 1987; Multivariate Exponentially Weighted Moving Average Chart; MEWMA, see Lowry, Woodall, Champ, Rigdon, 1992). Many of these also provide various auxiliary statistics to aid root-cause analyses if outliers or trends are detected.
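The following Python sketch shows how a Hotelling T2 statistic for individual bivariate observations can flag a point that is unremarkable in each dimension separately. The data are made up (they are not the Measure_Y1/Measure_Y2 example data), the helper names are assumptions, and the control limit uses the large-sample chi-square approximation, which for two variables has the closed form -2 ln(alpha). With a small baseline such as this one, an F-based limit is the usual choice.

```python
import math
from statistics import mean


def mean_and_covariance(points):
    """Sample mean vector and covariance terms (s11, s12, s22) of (y1, y2) pairs."""
    n = len(points)
    m1 = mean(p[0] for p in points)
    m2 = mean(p[1] for p in points)
    s11 = sum((p[0] - m1) ** 2 for p in points) / (n - 1)
    s22 = sum((p[1] - m2) ** 2 for p in points) / (n - 1)
    s12 = sum((p[0] - m1) * (p[1] - m2) for p in points) / (n - 1)
    return (m1, m2), (s11, s12, s22)


def t_squared(x, mu, cov):
    """Hotelling T2 for one observation: (x - mu)' S^-1 (x - mu), 2-D case."""
    s11, s12, s22 = cov
    det = s11 * s22 - s12 * s12
    d1, d2 = x[0] - mu[0], x[1] - mu[1]
    return (s22 * d1 * d1 - 2.0 * s12 * d1 * d2 + s11 * d2 * d2) / det


def chi2_limit_2d(alpha=0.0027):
    """Upper chi-square quantile with 2 degrees of freedom (approximate limit)."""
    return -2.0 * math.log(alpha)


# Made-up in-control baseline with strongly correlated measurements.
baseline = [(-2.0, -1.8), (-1.5, -1.6), (-1.0, -0.9), (-0.5, -0.6), (0.0, 0.1),
            (0.5, 0.4), (1.0, 1.1), (1.5, 1.4), (2.0, 1.9)]
mu, cov = mean_and_covariance(baseline)
limit = chi2_limit_2d()

typical = (1.0, 0.9)    # follows the correlation pattern
unusual = (1.0, -1.0)   # each coordinate is ordinary, the combination is not
print(t_squared(typical, mu, cov) > limit, t_squared(unusual, mu, cov) > limit)
```

Both test points lie within three standard deviations of the mean on each axis, yet only the second one breaks the correlation pattern and produces a T2 value far above the limit.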

Multivariate CUSUM Chart. As mentioned earlier, the group of multivariate outliers shown in Figure 2 would not be detected when using default univariate SPC charting. The Hotelling T2 chart will identify the group of outliers, but the Multivariate CUSUM chart for these data will generate a very strong signal, illustrating the power and sensitivity of such methods for identifying shift and drift in multivariate continuous process streams.
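A multivariate CUSUM can be sketched as follows. This uses a recursion commonly attributed to Crosier; the article does not state which MCUSUM variant the Workspace computes, so this should be read as an illustration of the principle only. The reference value k = 0.5, the decision limit h = 5.5, the in-control mean and covariance, and the data are all illustrative assumptions.

```python
import math


def inverse_2x2(cov):
    """Inverse of a 2x2 matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))


def quad_form(v, m):
    """Quadratic form v' M v for a 2-vector v and a 2x2 matrix M."""
    return (v[0] * (m[0][0] * v[0] + m[0][1] * v[1])
            + v[1] * (m[1][0] * v[0] + m[1][1] * v[1]))


def mcusum(points, mu, cov, k=0.5):
    """Return the MCUSUM statistic for each observation."""
    cov_inv = inverse_2x2(cov)
    s = (0.0, 0.0)  # accumulated deviation vector
    stats = []
    for x in points:
        d = (s[0] + x[0] - mu[0], s[1] + x[1] - mu[1])
        c = math.sqrt(quad_form(d, cov_inv))
        if c <= k:
            s = (0.0, 0.0)            # reset when the accumulated length is small
        else:
            shrink = 1.0 - k / c      # shrink the accumulated vector toward zero
            s = (d[0] * shrink, d[1] * shrink)
        stats.append(math.sqrt(quad_form(s, cov_inv)))
    return stats


# Assumed in-control state: zero means, unit variances, correlation 0.9.
mu = (0.0, 0.0)
cov = ((1.0, 0.9), (0.9, 1.0))
h = 5.5  # illustrative decision limit

# Three in-control points, then a small sustained shift against the correlation.
points = [(0.2, 0.3), (-0.4, -0.3), (0.1, 0.0)] + [(0.5, -0.5)] * 4
stats = mcusum(points, mu, cov)
first_signal = next(i for i, y in enumerate(stats) if y > h)
print(first_signal)
```

Each shifted point here has a single-observation T2 of about 5, which is below the approximate chi-square limit of 11.8, so none of them would be flagged one at a time. The cumulative statistic grows with every shifted observation and crosses the decision limit at the fourth one.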

Figure 3: Multivariate CUSUM Chart, Detecting a Strong Signal

The Workspace attached to the example described below will compute univariate and multivariate control charts for these data.

Spotfire Resources

Spotfire Statistica Enterprise (https://www.spotfire.com/products/data-science). A mature platform for batch-real-time (pull/poll) applications and use cases; supports enterprise-wide deployment with model management of data-prep and analytic workflows; natively supports a large number of statistical and machine learning algorithms, as well as open-source scripting languages and environments including Python, R, Scala (Spark), and C#.

More information about:

univariate control charting can also be found at http://documentation.statsoft.com/STATISTICAHelp.aspx?path=Statistics/Indices/QualityControlChartsSTAT_HIndex;
multivariate control charts are further discussed here http://documentation.statsoft.com/STATISTICAHelp.aspx?path=MQC/MQC/Overviews/CommonTypesofMultivariateControlCharts.

Spotfire® (https://www.spotfire.com/products/visual-analytics). Spotfire® is a general visual analytics platform for enterprise-wide analytic BI; Spotfire includes a proprietary version of high-performance R, can interface with various open-source libraries and languages including R and Python, and can connect to virtually any data source, including real-time data sources.

Spotfire Streaming (https://www.spotfire.com/products/streaming-analytics). A mature platform for real-time processing of streaming data and events; implements push-architecture; supports a large number of connectors to virtually all common streaming data sources and data historians; capable of very high-volume data loads; can implement complex rules logic, prediction models, R-based prediction models and computations.

Use Case: Identifying Multivariate Outliers

To illustrate the points discussed here, a Workspace SimpleExampleHotellingT2Chart.sdm is available for download below. The dataset describes two measurements taken from multiple test stations; the specific parameter names are Measure_Y1 and Measure_Y2.

The workspace contains analytic nodes that will create two univariate Individuals-and-Moving-Range charts. While those charts show a few outliers, those would be expected given that the input file has 1,000 observations (note that the default 3-Sigma limits are used, so a few false-positive alarms can be expected).
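The expected number of false alarms can be checked with a short calculation. The sketch below assumes independent, normally distributed in-control values on a single Individuals chart; real data will only approximately meet these assumptions.

```python
from statistics import NormalDist

n_observations = 1000

# Probability that one in-control value falls outside +/- 3 sigma.
p_outside = 2.0 * (1.0 - NormalDist().cdf(3.0))

# Expected number of false alarms, and the chance of seeing at least one.
expected_false_alarms = n_observations * p_outside
p_at_least_one = 1.0 - (1.0 - p_outside) ** n_observations

print(f"{p_outside:.4f} {expected_false_alarms:.1f} {p_at_least_one:.2f}")
```

Under these assumptions, roughly 0.27% of in-control points fall outside 3-sigma limits, so two or three alarms per 1,000 observations are unremarkable on their own.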

The Workspace will also generate a default Hotelling T2 chart, which clearly shows a group of observations that are out-of-control. A categorized scatterplot is also generated where those outliers are identified as belonging to Test Station: FXZ_15.

The Multivariate CUSUM Chart also computed in this Workspace illustrates the very strong signal associated with the errant test station, and demonstrates the sensitivity of this method for identifying shift and drift in multivariate space.

References

AIAG (2005). Statistical Process Control. Automotive Industry Action Group AIAG.
Healy, J. D. (1987). A note on multivariate CUSUM procedures. Technometrics, 29, 4, 409-412.
Lewicki, P., Hill, T., Qazaz, C. (2007). Multivariate quality control. Quality Magazine, April, 2007.
Lowry, C. A., Woodall, W. H. , Champ, C. W., and Rigdon, S. E. (1992). A multivariate exponentially weighted moving average control chart. Technometrics, 34, 46-53.
Montgomery, D. C. (2012). Statistical Quality Control. Wiley.
Shewhart, W. A. (1945). Statistical Method from the Viewpoint of Quality Control. University of Washington.

Figures

Figure 1: Simple Chart in Spotfire

This is a typical control chart created with Spotfire for a continuous process. A template for creating such control charts with Spotfire can be downloaded from the Spotfire Community Exchange: Quality Control Charts template for Spotfire.

Figure 2: Outliers in Multivariate Space

When a process is continuously monitored via multiple sensors, then typically the measurements for different sensors are correlated. It is often much easier to identify outliers in multivariate space, compared to univariate one-variable-at-a-time control charting. In this data set multiple test stations measure two outcomes Y1 and Y2. One of the test stations clearly produces a distinct cluster of Outliers in multivariate space, even though none of those points would be flagged as outliers when only looking at each measurement (Y1, Y2) separately.

Figure 3: Multivariate CUSUM Chart, Detecting a Strong Signal

This chart illustrates a default Multivariate Cumulative Sum (CUSUM) Chart for the data shown in Figure 2. Note that the multivariate outliers depicted in Figure 2 cannot be identified via standard univariate (Individuals-and-Moving-Range) control charting. However, the Multivariate CUSUM chart is very sensitive to small shifts and drifts of consecutive observations in multivariate space, and provides a powerful method for monitoring steady states (or prediction residuals), for example, to detect emerging problems and required maintenance before expensive failures occur.

Attachments

primer_for_bi_iot_part2_examples.zip
