---
title: "Variables charts: I-MR, Xbar-R, Xbar-S"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Variables charts: I-MR, Xbar-R, Xbar-S}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse  = TRUE,
  comment   = "#>",
  fig.width = 7,
  fig.height = 4.2
)
```

```{r setup, message = FALSE}
library(shewhartr)
library(dplyr)
library(ggplot2)
```

This vignette covers the three classical Shewhart charts for
continuous measurements (variables): I-MR for individual observations,
Xbar-R for small rational subgroups, and Xbar-S for larger or
unequal-sized subgroups. The choice between them depends entirely on
the structure of your data.

## When to use which

| Situation                                              | Chart                |
|--------------------------------------------------------|----------------------|
| Single measurements, no rational subgroup              | `shewhart_i_mr()`   |
| Subgroups of size 2-10, equal sizes                    | `shewhart_xbar_r()` |
| Subgroups of size > 10, or unequal sizes               | `shewhart_xbar_s()` |

The cutoff at 10 reflects how well the range estimates sigma. For
small subgroups the range-based estimator is almost as efficient as
the one based on the standard deviation, and it is simpler to compute.
Its relative efficiency falls to roughly 0.85 at $n = 10$ and keeps
declining as $n$ grows, so the standard deviation is preferred for
larger subgroups (Montgomery 2019, Section 6.4.1).
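
The efficiency comparison between the two estimators can be checked
with a short simulation. The sketch below uses only base R; the
helpers `d2()` and `c4()` are illustrative and are not `shewhartr`
functions. It draws normal subgroups of several sizes and compares the
sampling variance of the unbiased per-subgroup estimators $R/d_2(n)$
and $S/c_4(n)$.

```{r}
# Illustrative helpers (not shewhartr functions).
# d2(n): expected range of n standard normal draws, by numerical integration.
d2 <- function(n) {
  integrate(function(x) 1 - pnorm(x)^n - (1 - pnorm(x))^n, -Inf, Inf)$value
}
# c4(n): bias-correction constant for the sample standard deviation.
c4 <- function(n) sqrt(2 / (n - 1)) * exp(lgamma(n / 2) - lgamma((n - 1) / 2))

set.seed(1)
sizes_sim <- c(2, 5, 10, 20)
rel_eff <- sapply(sizes_sim, function(n) {
  x <- matrix(rnorm(20000 * n), ncol = n)        # 20,000 subgroups of size n
  sigma_r <- apply(x, 1, function(v) diff(range(v))) / d2(n)
  sigma_s <- apply(x, 1, sd) / c4(n)
  var(sigma_s) / var(sigma_r)                    # efficiency of R relative to S
})
names(rel_eff) <- sizes_sim
round(rel_eff, 2)
```

For $n = 2$ the two estimators are algebraically identical, so the
ratio is 1. For larger subgroups the ratio should drift below 1 as the
range discards more of the information in each subgroup.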

## I-MR

`bottle_fill` is a 100-point series of fill volumes. By default, sigma
is estimated from the mean moving range of span 2:

$$
\hat{\sigma} = \frac{\overline{\mathrm{MR}}}{d_2(2)} = \frac{\overline{\mathrm{MR}}}{1.128}.
$$

```{r}
fit <- shewhart_i_mr(bottle_fill, value = ml, index = observation)
broom::glance(fit)
```

```{r, eval = FALSE}
autoplot(fit)
```
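
The moving-range estimate can also be reproduced by hand, which makes
it easier to see where the limits come from. This is a minimal sketch
in base R on a simulated series (an assumption for illustration, not
the `bottle_fill` data), using the standard constants $d_2(2) = 1.128$
and $D_4(2) = 3.267$.

```{r}
set.seed(2)
x_sim <- rnorm(100, mean = 500, sd = 2)  # hypothetical fill volumes

mr        <- abs(diff(x_sim))            # moving ranges of span 2
sigma_hat <- mean(mr) / 1.128            # d2(2) = 1.128
center    <- mean(x_sim)

c(lcl    = center - 3 * sigma_hat,
  center = center,
  ucl    = center + 3 * sigma_hat,
  mr_ucl = 3.267 * mean(mr))             # D4(2) = 3.267 for the MR panel
```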

### Robust alternatives

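The mean moving range is sensitive to outliers. Three alternative
estimators are available through `sigma_method`. Of these,
`"median_mr"` and `"biweight"` are robust, while `"sd"` is included
for comparison: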

```{r}
fit_med <- shewhart_i_mr(bottle_fill, value = ml,
                         sigma_method = "median_mr")
fit_bw  <- shewhart_i_mr(bottle_fill, value = ml,
                         sigma_method = "biweight")
fit_sd  <- shewhart_i_mr(bottle_fill, value = ml,
                         sigma_method = "sd")

dplyr::bind_rows(
  broom::glance(fit)     |> mutate(method = "mr (default)"),
  broom::glance(fit_med) |> mutate(method = "median_mr"),
  broom::glance(fit_bw)  |> mutate(method = "biweight"),
  broom::glance(fit_sd)  |> mutate(method = "sd")
) |>
  select(method, sigma_hat, n_violations)
```

The biweight is Tukey's M-estimator with a redescending influence
function (Tukey 1977; Mosteller & Tukey 1977). Use it when you
suspect a small fraction of contaminated observations and don't want
them to inflate sigma.
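
To see why a robust estimator helps, compare how a single gross
outlier moves the mean and the median moving range. The sketch below
is base R on simulated data. The helper names are illustrative, and
the scaling constant $\sqrt{2}\,\Phi^{-1}(0.75) \approx 0.954$ for the
median moving range is the usual normal-theory value; that
`sigma_method = "median_mr"` uses exactly this constant is an
assumption.

```{r}
# Illustrative helpers (not shewhartr functions).
sigma_mean_mr   <- function(x) mean(abs(diff(x))) / 1.128
sigma_median_mr <- function(x) median(abs(diff(x))) / (sqrt(2) * qnorm(0.75))

set.seed(3)
clean  <- rnorm(100, mean = 500, sd = 2)  # hypothetical in-control series
spiked <- clean
spiked[50] <- spiked[50] + 30             # one gross outlier

rbind(
  mean_mr   = c(clean = sigma_mean_mr(clean),   spiked = sigma_mean_mr(spiked)),
  median_mr = c(clean = sigma_median_mr(clean), spiked = sigma_median_mr(spiked))
)
```

The outlier enters two adjacent moving ranges, which pulls the mean
upward, while the median of the 99 moving ranges barely moves.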

## Xbar-R

`tablet_weight` records 25 batches of 5 tablets each.

```{r}
fit_xbar <- shewhart_xbar_r(tablet_weight,
                            value    = weight,
                            subgroup = subgroup)
broom::tidy(fit_xbar)
```

The Xbar limits use the constant $A_2(5) = 0.577$; the R limits use
$D_3(5) = 0$ and $D_4(5) = 2.114$. These constants come from
Montgomery (2019), Appendix VI, and are exposed via
`shewhart_constants()`:

```{r}
shewhart_constants(c(2, 5, 10, 25))
```
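
The tabulated constants are not arbitrary: they follow from the mean
$d_2$ and standard deviation $d_3$ of the relative range, via
$A_2 = 3 / (d_2 \sqrt{n})$, $D_3 = \max(0,\, 1 - 3 d_3 / d_2)$ and
$D_4 = 1 + 3 d_3 / d_2$. A quick check for $n = 5$, using the standard
tabulated values $d_2 = 2.326$ and $d_3 = 0.864$ (the variable names
are illustrative):

```{r}
n_sub5 <- 5
d2_5   <- 2.326   # tabulated mean of the relative range for n = 5
d3_5   <- 0.864   # tabulated standard deviation of the relative range for n = 5

consts_5 <- c(
  A2 = 3 / (d2_5 * sqrt(n_sub5)),
  D3 = max(0, 1 - 3 * d3_5 / d2_5),
  D4 = 1 + 3 * d3_5 / d2_5
)
round(consts_5, 3)
```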

`shewhart_xbar_r()` insists on equal subgroup sizes (otherwise the
constants don't apply uniformly). For unequal sizes use `shewhart_xbar_s()`
with `sigma_method = "pooled_sd"`.

## Xbar-S

The S chart plots the sample standard deviation $S$ of each subgroup.
Sigma is estimated from the average $\bar{S}$, corrected for bias:

$$
\hat{\sigma} = \frac{\bar{S}}{c_4(n)}, \qquad c_4(n) = \sqrt{\frac{2}{n-1}} \cdot \frac{\Gamma(n/2)}{\Gamma((n-1)/2)}.
$$
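
The constant $c_4(n)$ can be evaluated directly from the formula
above. The helper below is an illustrative base R sketch, not a
`shewhartr` function. It works on the log-gamma scale so that large
$n$ does not overflow, and it also derives the usual S-chart factors
$B_3 = \max(0,\, 1 - 3\sqrt{1 - c_4^2}/c_4)$ and
$B_4 = 1 + 3\sqrt{1 - c_4^2}/c_4$.

```{r}
# Illustrative helper (not a shewhartr function).
c4 <- function(n) sqrt(2 / (n - 1)) * exp(lgamma(n / 2) - lgamma((n - 1) / 2))

n_sub <- c(2, 5, 10, 25)
ratio <- 3 * sqrt(1 - c4(n_sub)^2) / c4(n_sub)
data.frame(
  n  = n_sub,
  c4 = round(c4(n_sub), 4),
  B3 = round(pmax(0, 1 - ratio), 3),
  B4 = round(1 + ratio, 3)
)
```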

```{r}
# Same data, but using S instead of R (preferred for n > 10)
n_per <- 5
df <- tablet_weight |>
  group_by(subgroup) |>
  filter(n() == n_per) |>     # in case of any incomplete batches
  ungroup()

fit_xbs <- shewhart_xbar_s(df, value = weight, subgroup = subgroup)
broom::glance(fit_xbs)
```

For variable subgroup sizes, request the pooled estimator with
`sigma_method = "pooled_sd"`:

```{r, eval = FALSE}
fit_xbs_pooled <- shewhart_xbar_s(df, value = weight, subgroup = subgroup,
                                  sigma_method = "pooled_sd")
```
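
A pooled standard deviation weights each within-subgroup variance by
its degrees of freedom, which is why it can handle unequal subgroup
sizes. The base R sketch below shows the textbook calculation on
simulated data. Whether `shewhartr` applies an additional bias
correction for `"pooled_sd"` is not shown here, so treat this as an
illustration rather than the package's exact computation.

```{r}
# Hypothetical subgroups of unequal size (not tablet_weight).
set.seed(4)
sizes  <- c(4, 5, 5, 3, 6)
groups <- rep(seq_along(sizes), times = sizes)
w      <- rnorm(sum(sizes), mean = 250, sd = 1.5)

n_i  <- as.vector(tapply(w, groups, length))
s2_i <- as.vector(tapply(w, groups, var))

# Within-subgroup variances weighted by their degrees of freedom.
sigma_pooled <- sqrt(sum((n_i - 1) * s2_i) / sum(n_i - 1))

# Xbar limits are computed per subgroup and widen as the subgroup shrinks.
xbar_bar <- mean(w)
data.frame(
  subgroup = seq_along(sizes),
  n        = n_i,
  lcl      = xbar_bar - 3 * sigma_pooled / sqrt(n_i),
  ucl      = xbar_bar + 3 * sigma_pooled / sqrt(n_i)
)
```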

## Reading the results

Every variables chart returns a `shewhart_chart` object. Calling
`broom::augment()` on it gives the following per-observation columns:

* `.value` — the plotted statistic (individual, Xbar, etc.)
* `.center`, `.sigma`, `.upper`, `.lower` — control-line series
* For two-panel charts, additional `.mr*`, `.r*`, `.s*` columns for
  the lower panel
* `.flag_<rule>` and `.flag_any` — TRUE/FALSE for each runs rule
  configured

The `rules` argument controls which rules are flagged. By default,
Nelson 1 (point beyond 3 sigma) and Nelson 2 (9 points same side) are
checked, and we recommend keeping at least these two. Add more with:

```{r}
fit_full <- shewhart_i_mr(bottle_fill, value = ml,
                          rules = c("nelson_1_beyond_3s",
                                    "nelson_2_nine_same",
                                    "nelson_3_six_trend",
                                    "nelson_5_two_of_three",
                                    "nelson_6_four_of_five"))
fit_full$violations
```
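
For intuition, the first two Nelson rules are easy to express in base
R on a standardized series $z = (x - \text{center}) / \hat{\sigma}$.
The helpers below are illustrative and not part of `shewhartr`. In
particular, flagging the ninth and every later point of a run (rather
than all nine points) is an assumption about convention.

```{r}
# Illustrative helpers (not shewhartr functions).
flag_beyond_3s <- function(z) abs(z) > 3
flag_nine_same <- function(z) {
  runs <- rle(sign(z))             # consecutive points on the same side
  sequence(runs$lengths) >= 9      # TRUE from the 9th point of a run onwards
}

set.seed(5)
z_demo <- c(rnorm(20), rnorm(15, mean = 1.5))  # hypothetical shift after point 20
which(flag_beyond_3s(z_demo) | flag_nine_same(z_demo))
```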

Each additional rule increases the chart's sensitivity at the cost of
a shorter in-control ARL, that is, more frequent false alarms. See
`vignette("arl-simulation")` for how to quantify this trade-off.
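
As a rough illustration of that trade-off, the in-control ARL can be
estimated by Monte Carlo: simulate standard normal series and record
when each rule set first signals. This is a base R sketch with a
hypothetical helper `first_signal()`; it does not use the package's
own simulation tools and only covers Nelson 1 and Nelson 2.

```{r}
# Illustrative helper (not a shewhartr function): index of the first signal
# in a standardized in-control series z.
first_signal <- function(z, use_rule_2) {
  hit_1 <- which(abs(z) > 3)[1]                    # Nelson 1
  hit_2 <- NA
  if (use_rule_2) {
    runs <- rle(sign(z))
    long <- which(runs$lengths >= 9)[1]            # first run of 9 or more
    if (!is.na(long)) {
      hit_2 <- sum(runs$lengths[seq_len(long - 1)]) + 9
    }
  }
  min(c(hit_1, hit_2, Inf), na.rm = TRUE)          # Inf if the series never signals
}

set.seed(6)
sims <- replicate(500, {
  z <- rnorm(5000)
  c(rule_1 = first_signal(z, FALSE), rules_1_2 = first_signal(z, TRUE))
})
arl_hat <- rowMeans(sims)   # Monte Carlo estimates of the in-control ARL
arl_hat
```

With Nelson 1 alone the estimate should land near the theoretical
value of about 370. Adding Nelson 2 can only bring the first signal
forward, so its estimated ARL is shorter.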

## References

* Montgomery, D. C. (2019). *Introduction to Statistical Quality
  Control* (8th ed.). Wiley. Chapter 6.
* Tukey, J. W. (1977). *Exploratory Data Analysis*. Addison-Wesley.
* Mosteller, F., & Tukey, J. W. (1977). *Data Analysis and
  Regression*. Addison-Wesley.
* Wheeler, D. J., & Chambers, D. S. (1992). *Understanding Statistical
  Process Control* (2nd ed.). SPC Press.
* Cryer, J. D., & Ryan, T. P. (1990). The Estimation of Sigma for an
  X Chart: $\overline{\mathrm{MR}}/d_2$ or $S/c_4$? *Journal of Quality
  Technology*, 22(3), 187-192.
