Phase I and Phase II

library(shewhartr)
library(dplyr)

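A control chart serves two purposes that are easy to confuse, and confusing them is one of the most common causes of misleading charts in practice. Woodall (2000) crystallised the distinction. In Phase I, historical data are analysed retrospectively to bring the process into a state of statistical control and to estimate its parameters and control limits. In Phase II, those limits are held fixed and used to monitor new observations prospectively for departures from the in-control state.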

Many R packages collapse the two phases into a single function call, which has two consequences. First, every “monitoring” run silently re-estimates the limits from the data being monitored, so a real shift that develops gradually can drag the limits along with it and never trigger an alarm. Second, the Phase II diagnostic table ends up containing “this baseline itself violates the rules” entries, which are Phase I findings and do not belong there. Tools that don’t separate the phases routinely mix the two.
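To make the first consequence concrete, here is a minimal base-R sketch that does not use shewhartr. It assumes simple I-chart limits (mean ± 3 × average moving range / 1.128), a hypothetical helper imr_limits(), and artificial data whose level creeps up by 2 units per monitoring batch. Limits frozen from the baseline eventually flag the drift; limits re-estimated from each batch never do.

```r
# Hypothetical helper: I-chart limits from the average moving range
# (d2 = 1.128 is the bias-correction constant for ranges of size 2).
imr_limits <- function(x) {
  sigma_hat <- mean(abs(diff(x))) / 1.128
  centre <- mean(x)
  c(lcl = centre - 3 * sigma_hat, cl = centre, ucl = centre + 3 * sigma_hat)
}

# Hypothetical helper: number of points outside the given limits.
count_beyond <- function(x, lim) sum(x < lim[["lcl"]] | x > lim[["ucl"]])

wiggle <- rep(c(-1, 1), 5)                    # 10-point alternating pattern
base_x <- 500 + rep(wiggle, 6)                # 60 in-control baseline points
batches <- lapply(1:4, function(b) 500 + 2 * b + wiggle)  # level creeps upward

fixed <- imr_limits(base_x)                   # Phase I: estimated once
alarms_fixed <- sapply(batches, count_beyond, lim = fixed)
alarms_refit <- sapply(batches, function(x) count_beyond(x, imr_limits(x)))

alarms_fixed  # 0 0 5 10
alarms_refit  # 0 0 0 0
```

Because each refitted chart centres itself on its own batch, the drift is absorbed into the limits; only the frozen limits see it.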

shewhartr keeps the two phases distinct in code: calibrate() performs Phase I and monitor() performs Phase II.

A worked example

bottle_fill has 100 observations. Imagine the first 60 are our historical baseline, gathered while the process was thought to be in control, and the next 40 are new data we want to monitor.

baseline <- bottle_fill[1:60, ]
new_obs  <- bottle_fill[61:100, ]

Phase I: calibration

calib <- calibrate(
  baseline,
  value         = ml,
  index         = observation,
  chart         = "i_mr",
  trim_outliers = TRUE
)
calib
#> 
#> ── Shewhart chart I-MR (individuals & moving range) ────────────────────────────
#> • Observations / subgroups: 60
#> • Phase: "phase_1"
#> • Sigma estimate ("mr"): 1.288
#> 
#> 
#> ── Control limits ──
#> # A tibble: 6 × 3
#>   chart line   value
#>   <chr> <chr>  <dbl>
#> 1 I     CL    500.  
#> 2 I     UCL   504.  
#> 3 I     LCL   496.  
#> 4 MR    CL      1.45
#> 5 MR    UCL     4.75
#> 6 MR    LCL     0
#> ── Rule violations ──
#> 
#> ✔ No violations across 2 rules: "nelson_1_beyond_3s" and "nelson_2_nine_same".

trim_outliers = TRUE enables iterative trimming: any observation that violates the rules is removed, the limits are recomputed from the remaining data, and the process repeats until no violations remain. The procedure is described in Montgomery (2019), Section 6.2.3.
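The loop can be sketched in base R as follows. This is not the shewhartr implementation: it assumes only Nelson rule 1 (a point beyond 3-sigma limits), whereas calibrate() also applies rule 2, and imr_fit() and trim_imr() are hypothetical names.

```r
# Hypothetical helper: centre line and MR-based sigma (d2 = 1.128 for n = 2).
imr_fit <- function(xs) {
  c(centre = mean(xs), sigma_hat = mean(abs(diff(xs))) / 1.128)
}

# Hypothetical sketch of iterative trimming using rule 1 only:
# flag points beyond centre +/- 3 * sigma_hat, drop them, refit, repeat.
trim_imr <- function(x, max_iter = 10) {
  keep <- rep(TRUE, length(x))
  for (i in seq_len(max_iter)) {
    est <- imr_fit(x[keep])
    out <- keep & abs(x - est[["centre"]]) > 3 * est[["sigma_hat"]]
    if (!any(out)) break   # converged: nothing left beyond the limits
    keep <- keep & !out    # moving ranges will now bridge the removed points
  }
  c(list(keep = keep), as.list(imr_fit(x[keep])))
}

x_sim <- c(rep(c(9.5, 10.5), 10), 20, rep(c(9.5, 10.5), 10))
res <- trim_imr(x_sim)
which(!res$keep)  # 21
res$sigma_hat     # 1 / 1.128, about 0.8865
```

The first pass estimates sigma at about 1.29 because the spike at position 21 inflates two moving ranges; after it is dropped, the refit sigma falls to about 0.89 and nothing else is flagged.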

calib$phase
#> [1] "phase_1"
calib$n          # potentially less than nrow(baseline) if trimming dropped points
#> [1] 60
broom::tidy(calib)
#> # A tibble: 6 × 3
#>   chart line   value
#>   <chr> <chr>  <dbl>
#> 1 I     CL    500.  
#> 2 I     UCL   504.  
#> 3 I     LCL   496.  
#> 4 MR    CL      1.45
#> 5 MR    UCL     4.75
#> 6 MR    LCL     0

Phase II: monitoring

alarms <- monitor(new_obs, calib)
alarms$phase
#> [1] "phase_2"
alarms$violations
#> # A tibble: 0 × 5
#> # ℹ 5 variables: position <int>, rule <chr>, description <chr>, value <dbl>,
#> #   severity <chr>

monitor() does not re-estimate the limits. It applies the calibrated limits to the new data, evaluates the same rule set, and returns a new chart object whose phase element is "phase_2".
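The essential property is that Phase II only reads the limits. A minimal base-R sketch, assuming limits already estimated in Phase I (illustrative round numbers here) and hypothetical functions nelson_1() and nelson_2() that mimic the two rules named in the calibration printout; this is not the shewhartr source.

```r
# Frozen Phase I limits (illustrative values; never re-estimated below).
lim <- c(lcl = 496, cl = 500, ucl = 504)

# Hypothetical rule 1: positions of points beyond the control limits.
nelson_1 <- function(x, lim) which(x > lim[["ucl"]] | x < lim[["lcl"]])

# Hypothetical rule 2: positions at which nine or more consecutive
# points have fallen on the same side of the centre line.
nelson_2 <- function(x, cl, run = 9) {
  side <- sign(x - cl)  # +1 above, -1 below, 0 exactly on the line
  count <- 0
  flagged <- integer(0)
  for (i in seq_along(x)) {
    same <- i > 1 && side[i] != 0 && side[i] == side[i - 1]
    count <- if (same) count + 1 else as.integer(side[i] != 0)
    if (count >= run) flagged <- c(flagged, i)
  }
  flagged
}

new_x <- c(499, 501, 499, rep(501.5, 10), 505)
nelson_1(new_x, lim)          # 14
nelson_2(new_x, lim[["cl"]])  # 12 13 14
```

The run of ten points at 501.5 never crosses a limit, yet rule 2 catches the sustained shift from the ninth point onward.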

You can plot the monitored series with autoplot():

autoplot(alarms)

Why the trim step matters

Suppose your baseline contains a single contaminated observation, say from the time the operator forgot to recalibrate the scale. If you include it, the two moving ranges that involve it will be inflated, so the average moving range and hence sigma will be overestimated. The limits will then be too wide, and future alarms will be too lenient. The trim step iteratively removes such contamination from the calibration data.
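A small base-R illustration with artificial data and a hypothetical helper sigma_mr(): corrupting a single reading changes two moving ranges, which is enough to inflate the MR-based sigma, and therefore the width of the 3-sigma limits, by almost 50 percent.

```r
# Hypothetical helper: MR-based sigma estimate, average moving range / d2.
sigma_mr <- function(x) mean(abs(diff(x))) / 1.128

x_clean <- rep(c(9.5, 10.5), 20)  # 40 points; every moving range equals 1
x_bad <- x_clean
x_bad[20] <- 20                   # one mis-recorded reading

sigma_mr(x_clean)  # about 0.887
sigma_mr(x_bad)    # about 1.318: two moving ranges jump from 1 to 10.5
```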

You can disable trimming if you don’t trust automatic outlier removal:

calib_no_trim <- calibrate(baseline, value = ml,
                           index = observation,
                           chart = "i_mr",
                           trim_outliers = FALSE)
calib_no_trim$sigma_hat
#> [1] 1.288322
calib$sigma_hat       # potentially smaller after trimming
#> [1] 1.288322

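Here the two estimates are identical because the baseline had no rule violations, so trimming removed nothing. The general principle (Tukey 1977) is that an analyst should look at the data before trusting any automated calibration. Use shewhart_diagnostics() and the rule-violation table to interrogate your baseline before declaring it “in control”.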
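One quick look that needs no package is to compare the overall standard deviation of the baseline with the short-term, MR-based sigma. This is a common rule of thumb rather than a shewhartr feature, and sigma_ratio() is a hypothetical name: a ratio well above 1 suggests the baseline contains a trend or shift, while a stable series gives a ratio near or below 1.

```r
# Hypothetical check: overall SD divided by the MR-based (short-term) sigma.
sigma_ratio <- function(x) sd(x) / (mean(abs(diff(x))) / 1.128)

jitter_pat <- rep(c(-0.5, 0.5), 25)                       # 50 points
x_stable <- 10 + jitter_pat
x_trend <- 10 + seq(0, 10, length.out = 50) + jitter_pat  # drifting baseline

sigma_ratio(x_stable)  # below 1
sigma_ratio(x_trend)   # well above 1 (about 3.4)
```

A high ratio does not say what went wrong, only that the baseline deserves a closer look before it is used to set limits.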

When this matters most

The Phase I / Phase II distinction is most consequential when:

References