Individual Olink® NPX datasets are normalized using either plate control normalization or intensity normalization. Plate control normalization is generally used for single-plate projects, while intensity normalization is used for multi-plate projects. Intensity normalization additionally assumes that all samples within a project are fully randomized.
When the samples within a project are not fully randomized, or when a study is split into separate batches, an additional normalization step is needed to make the data comparable, because NPX is a relative measurement. The joint analysis of two or more Olink® NPX datasets therefore often requires an additional batch correction step to remove technical variation; this step is referred to as bridging.
Bridging is also needed if Olink® NPX datasets are:
plate control normalized only and run conditions (e.g. lab and reagent lots) have changed.
intensity normalized but from two different sample populations.
To bridge two or more Olink® NPX datasets, bridging samples are needed to calculate the assay-specific adjustment factors between datasets. Bridging samples are samples shared among datasets, that is, samples that are analyzed in both datasets. The recommended number of bridging samples for each platform is shown in the table below. Olink® NPX datasets without shared samples should not be combined using the bridging approach described below.
Platform | Bridging samples |
---|---|
Target 96 | 8-16 |
Explore 384 Cardiometabolic, Inflammation, Neurology, and Oncology | 8-16 |
Explore 384 Cardiometabolic II, Inflammation II, Neurology II, and Oncology II | 16-24 |
The following tutorial is designed to give you an overview of the kinds of data combining methods that are possible using the Olink® bridging procedure. Before starting bridging, it is important to check if the same sample IDs were assigned to the bridging samples.
Prior to running the second study, bridging samples must be selected from the reference study and added to the second study. These samples can be selected using the olink_bridgeselector() function in Olink Analyze. The function selects a number of bridging samples based on the reference data, choosing samples that pass QC and have high detectability. To cover the range of the data, the samples are ordered by mean NPX value and selected across this range. When running the selector, Olink recommends starting at sampleMissingFreq = 0.10, which represents a maximum of 10% of data below LOD per sample. If too few samples are returned, increase the value to 20%. For alternative matrices and specific disease types, it may be necessary to increase sampleMissingFreq to higher levels.
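The selection logic described above can be illustrated with a small base R sketch. This is a simplified stand-in, not the implementation of olink_bridgeselector(): the helper select_bridges(), the toy sample_summary table, and the rule of taking evenly spaced positions along the ordered mean NPX values are all assumptions made for illustration.

```r
# Toy per-sample summary (hypothetical values, not taken from npx_data1).
sample_summary <- data.frame(
  SampleID = paste0("S", 1:40),
  PercAssaysBelowLOD = rep(c(0.02, 0.05, 0.08, 0.12, 0.18), times = 8),
  MeanNPX = seq(5.80, by = 0.02, length.out = 40),
  QC_Warning = "Pass"
)

# Hypothetical helper: filter on QC and detectability, then spread the
# selection across the range of mean NPX values.
select_bridges <- function(summary_df, sampleMissingFreq = 0.10, n = 16) {
  # Keep samples that pass QC and stay within the allowed share below LOD.
  eligible <- summary_df[summary_df$QC_Warning == "Pass" &
                           summary_df$PercAssaysBelowLOD <= sampleMissingFreq, ]
  if (nrow(eligible) < n) {
    stop("Only ", nrow(eligible), " eligible samples; ",
         "consider raising sampleMissingFreq.")
  }
  # Order by mean NPX and take n evenly spaced positions across the range.
  eligible <- eligible[order(eligible$MeanNPX), ]
  idx <- round(seq(1, nrow(eligible), length.out = n))
  eligible[idx, c("SampleID", "PercAssaysBelowLOD", "MeanNPX")]
}

picked <- select_bridges(sample_summary, sampleMissingFreq = 0.10, n = 16)
nrow(picked)
```

If the helper stops because too few samples are eligible, rerunning it with sampleMissingFreq = 0.20 mirrors the escalation recommended above.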
In this example, we will demonstrate how to select 16 bridging samples using npx_data1, which will act as the reference data.
SampleID | PercAssaysBelowLOD | MeanNPX |
---|---|---|
A20 | 0.05 | 6.54 |
A53 | 0.04 | 6.28 |
B36 | 0.07 | 6.20 |
B35 | 0.07 | 6.24 |
B20 | 0.06 | 6.46 |
B79 | 0.06 | 6.11 |
B55 | 0.08 | 6.26 |
B62 | 0.05 | 6.08 |
A59 | 0.03 | 6.07 |
B9 | 0.06 | 5.99 |
B13 | 0.05 | 6.32 |
A77 | 0.07 | 6.22 |
B65 | 0.05 | 6.39 |
B34 | 0.05 | 5.87 |
A47 | 0.07 | 6.15 |
A67 | 0.04 | 6.35 |
Bridging datasets are standard Olink® NPX tables. They can be loaded using the read_NPX() function with a default Olink Software NPX file as input.
data1 <- read_NPX("~/NPX_file1_location.xlsx")
data2 <- read_NPX("~/NPX_file2_location.xlsx")
To demonstrate how bridging works, we will use the example datasets (npx_data1 and npx_data2) from the Olink Analyze package. This workflow also uses functions from the dplyr, stringr, and ggplot2 packages.
First, confirm that there are overlapping sample IDs within the study. It is important that the sample IDs are the same in both NPX files.
SampleID |
---|
A13 |
A29 |
A30 |
A36 |
A45 |
A46 |
A52 |
A63 |
A71 |
A73 |
B3 |
B4 |
B37 |
B45 |
B63 |
B75 |
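A minimal sketch of this check follows, using two hypothetical ID vectors in place of the real data; in the tutorial, npx_data1$SampleID and npx_data2$SampleID would take their place. The expected count of 3 is an assumption of the toy example.

```r
# Hypothetical sample IDs from two NPX files (stand-ins for the SampleID columns).
ids_study1 <- c("A1", "A13", "A29", "B3", "CONTROL_SAMPLE_AS 1")
ids_study2 <- c("C7", "A13", "A29", "B3", "CONTROL_SAMPLE_AS 1")

# Shared IDs, with external control samples excluded.
shared_ids <- intersect(ids_study1, ids_study2)
shared_ids <- shared_ids[!grepl("CONTROL_SAMPLE", shared_ids)]

# Compare against the number of bridging samples that were actually run.
expected_n <- 3
if (length(shared_ids) != expected_n) {
  warning("Found ", length(shared_ids), " shared IDs, expected ", expected_n,
          "; check for renamed or mistyped sample IDs.")
}
shared_ids
```

A mismatch between the two counts usually points to IDs that were renamed or typed differently in one of the files.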
Then, get an overview of the datasets that are going to be bridged. For example, plot and compare the NPX distributions of the datasets. Knowing how the studies compare with each other before bridging makes it possible to judge afterwards whether the bridging was successful.
# Load the packages used throughout this workflow
library(OlinkAnalyze)
library(dplyr)
library(stringr)
library(ggplot2)
# Load datasets
npx_1 <- npx_data1 %>%
  mutate(dataset = "data1")
npx_2 <- npx_data2 %>%
  mutate(dataset = "data2")
npx_df <- bind_rows(npx_1, npx_2)
# Plot NPX density before bridging normalization
npx_df %>%
  mutate(Panel = gsub("Olink ", "", Panel)) %>%
  ggplot(aes(x = NPX, fill = dataset)) +
  geom_density(alpha = 0.4) +
  facet_grid(~Panel) +
  olink_fill_discrete(coloroption = c("red", "darkblue")) +
  set_plot_theme() +
  ggtitle("Before bridging normalization: NPX distribution") +
  theme(axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        strip.text = element_text(size = 16),
        legend.title = element_blank(),
        legend.position = "top")
Use a PCA plot to visualize sample-to-sample distance before bridging. Typically, the project (the dataset a sample comes from) accounts for most of the observed variation within the combined datasets at this point.
## Before bridging
#### Extract bridging samples
overlapping_samples <- data.frame(SampleID = intersect(npx_1$SampleID, npx_2$SampleID)) %>%
  filter(!str_detect(SampleID, "CONTROL_SAMPLE")) %>% # Remove control samples
  pull(SampleID)
npx_before_br <- npx_data1 %>%
  dplyr::filter(!str_detect(SampleID, "CONTROL_SAMPLE")) %>% # Remove control samples
  dplyr::mutate(Type = if_else(SampleID %in% overlapping_samples,
                               paste0("20200001 Bridge"),
                               paste0("20200001 Sample"))) %>%
  rbind({
    npx_data2 %>%
      filter(!str_detect(SampleID, "CONTROL_SAMPLE")) %>% # Remove control samples
      mutate(Type = if_else(SampleID %in% overlapping_samples,
                            paste0("20200002 Bridge"),
                            paste0("20200002 Sample"))) %>%
      # Rename bridging samples in the second dataset so SampleIDs stay unique
      mutate(SampleID = if_else(SampleID %in% overlapping_samples,
                                paste0(SampleID, "_new"),
                                SampleID))
  })
### PCA plot
OlinkAnalyze::olink_pca_plot(df = npx_before_br,
                             color_g = "Type",
                             byPanel = TRUE)
PCA plot of combined datasets before bridging
We can use the olink_normalization_bridge() function to bridge two datasets. The bridging procedure first calculates, for each assay, the median of the paired NPX differences between the bridging samples; this median is the adjustment factor. The adjustment factors are then used to adjust NPX values between the two datasets. In this process, one dataset is considered the reference dataset (df1) and its NPX values remain unaltered. The other dataset is considered the new dataset (df2) and is adjusted to the reference dataset based on the adjustment factors.
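The arithmetic behind the adjustment can be worked through on toy data. The sketch below is an illustration in base R, not the package implementation; it assumes the adjustment factor is the median of (reference NPX minus new NPX) per assay and that this factor is added to every NPX value of the new dataset. The data frames ref and new and all values in them are hypothetical.

```r
# Toy long-format data: two assays, three bridging samples (B1-B3),
# plus one non-bridging sample (N1) in the new dataset.
ref <- data.frame(
  SampleID = rep(c("B1", "B2", "B3"), times = 2),
  OlinkID  = rep(c("OID00001", "OID00002"), each = 3),
  NPX      = c(5.0, 6.0, 7.0, 3.0, 3.5, 4.0)
)
new <- data.frame(
  SampleID = rep(c("B1", "B2", "B3", "N1"), times = 2),
  OlinkID  = rep(c("OID00001", "OID00002"), each = 4),
  NPX      = c(4.4, 5.5, 6.6, 5.0, 3.2, 3.9, 4.1, 2.0)
)

# Pair the bridging samples by sample and assay (inner join drops N1).
paired <- merge(ref, new, by = c("SampleID", "OlinkID"),
                suffixes = c("_ref", "_new"))

# Adjustment factor per assay: median of the paired differences.
paired$Diff <- paired$NPX_ref - paired$NPX_new
adj <- aggregate(Diff ~ OlinkID, data = paired, FUN = median)
names(adj)[names(adj) == "Diff"] <- "Adj_factor"

# Shift every sample of the new dataset; the reference stays untouched.
new_adj <- merge(new, adj, by = "OlinkID")
new_adj$NPX_bridged <- new_adj$NPX + new_adj$Adj_factor
adj
```

Because the factor is estimated only from the bridging samples but applied to all samples of the new dataset, the non-bridging sample N1 is shifted by the same per-assay amount as B1 to B3.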
The output of the olink_normalization_bridge() function is an NPX table with the adjusted NPX values in the column NPX.
olink_normalization_bridge() is a wrapper for and supersedes olink_normalization().
olink_normalization_bridge() creates a new column, Project, to distinguish the reference dataset from the other dataset. It is up to the user to define which dataset is the reference dataset and to specify the names of the bridge samples. The resulting dataset will contain the reference dataset, which is identical to the input reference data and has adjustment factors of 0, and the newly bridged dataset.
# Find shared samples
npx_1 <- npx_data1 %>%
  mutate(dataset = "data1")
npx_2 <- npx_data2 %>%
  mutate(dataset = "data2")
overlap_samples <- data.frame(SampleID = intersect(npx_1$SampleID, npx_2$SampleID)) %>%
  filter(!str_detect(SampleID, "CONTROL_SAMPLE")) %>% # Remove control samples
  pull(SampleID)
overlap_samples_list <- list("DF1" = overlap_samples,
                             "DF2" = overlap_samples)
# Perform bridging normalization
npx_br_data <- olink_normalization_bridge(project_1_df = npx_1,
                                          project_2_df = npx_2,
                                          bridge_samples = overlap_samples_list,
                                          project_1_name = "20200001",
                                          project_2_name = "20200002",
                                          project_ref_name = "20200001")
dplyr::glimpse(npx_br_data)
#> Rows: 61,824
#> Columns: 19
#> $ Index <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 1…
#> $ OlinkID <chr> "OID01216", "OID01216", "OID01216", "OID01216", "OID0121…
#> $ UniProt <chr> "O00533", "O00533", "O00533", "O00533", "O00533", "O0053…
#> $ Assay <chr> "CHL1", "CHL1", "CHL1", "CHL1", "CHL1", "CHL1", "CHL1", …
#> $ MissingFreq <dbl> 0.01875, 0.01875, 0.01875, 0.01875, 0.01875, 0.01875, 0.…
#> $ Panel_Version <chr> "v.1201", "v.1201", "v.1201", "v.1201", "v.1201", "v.120…
#> $ PlateID <chr> "Example_Data_1_CAM.csv", "Example_Data_1_CAM.csv", "Exa…
#> $ QC_Warning <chr> "Pass", "Pass", "Pass", "Pass", "Pass", "Pass", "Pass", …
#> $ LOD <dbl> 2.368467, 2.368467, 2.368467, 2.368467, 2.368467, 2.3684…
#> $ NPX <dbl> 12.956143, 11.269477, 25.451070, 14.453038, 7.628712, 6.…
#> $ Subject <chr> "ID1", "ID1", "ID1", "ID2", "ID2", "ID2", "ID3", "ID3", …
#> $ Treatment <chr> "Untreated", "Untreated", "Untreated", "Untreated", "Unt…
#> $ Site <chr> "Site_D", "Site_D", "Site_D", "Site_C", "Site_C", "Site_…
#> $ Time <chr> "Baseline", "Week.6", "Week.12", "Baseline", "Week.6", "…
#> $ Project <chr> "20200001", "20200001", "20200001", "20200001", "2020000…
#> $ Panel <chr> "Olink Cardiometabolic", "Olink Cardiometabolic", "Olink…
#> $ dataset <chr> "data1", "data1", "data1", "data1", "data1", "data1", "d…
#> $ Adj_factor <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
#> $ SampleID <chr> "A1", "A2", "A3", "A4", "A5", "A6", "A7", "A8", "CONTROL…
olink_normalization_bridge() also supports data where the shared samples (bridge samples) are not named the same in both projects. In this case, overlap_sample_list will contain two arrays of equal length, where entries at the same index correspond to the same sample. For example, if a sample had the SampleID Sample_1_Aliquot_1 in the first batch and Sample_1_Aliquot_2 in the second batch, then the overlap sample list should look like the following.
overlap_sample_list <- list("DF1" = c("A1", "A2", "A3", "Sample_1_Aliquot_1"),
                            "DF2" = c("A1", "A2", "A3", "Sample_1_Aliquot_2"))
First, check NPX distribution in datasets after bridging normalization.
# Plot NPX density after bridging normalization
npx_br_data %>%
  mutate(Panel = gsub("Olink ", "", Panel)) %>%
  ggplot2::ggplot(ggplot2::aes(x = NPX, fill = dataset)) +
  ggplot2::geom_density(alpha = 0.4) +
  ggplot2::facet_grid(~Panel) +
  olink_fill_discrete(coloroption = c("red", "darkblue")) +
  set_plot_theme() +
  ggplot2::ggtitle("After bridging normalization: NPX distribution") +
  theme(axis.title.x = element_blank(),
        axis.title.y = element_blank(),
        strip.text = element_text(size = 16),
        legend.title = element_blank(),
        legend.position = "top")
Then, summarize the number of assays whose adjustment factors fall within certain ranges. High adjustment factors can result from variations between projects, such as panel versions or technical modifications. The cutoff for a deviating adjustment factor is subjective and depends on a variety of factors, including the distribution of adjustment factors.
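One way to produce such a summary is to bin the absolute adjustment factors and count the assays per bin. The sketch below uses hypothetical per-assay factors, and the break points are arbitrary choices for illustration, not recommended cutoffs.

```r
# Hypothetical per-assay adjustment factors for the non-reference project.
adj <- data.frame(
  OlinkID = sprintf("OID%05d", 1:8),
  Adj_factor = c(0.05, -0.12, 0.31, -0.48, 0.07, 0.95, -1.40, 0.22)
)

# Bin the absolute adjustment factors (break points chosen for illustration).
adj$Range <- cut(abs(adj$Adj_factor),
                 breaks = c(0, 0.25, 0.5, 1, Inf),
                 labels = c("0-0.25", "0.25-0.5", "0.5-1", ">1"),
                 include.lowest = TRUE)

# Number of assays per range.
range_counts <- table(adj$Range)
range_counts

# Assays in the upper bins are candidates for a closer look.
flagged <- adj$OlinkID[abs(adj$Adj_factor) > 0.5]
```

With the tutorial data, the per-assay factors could be taken from the bridged table, for example by keeping the distinct OlinkID and Adj_factor pairs of project 20200002.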
Such assays can be visualized individually with violin plots and may warrant further investigation to confirm they are still comparable between projects. For example, if a violin plot exhibits a different range or truncated distribution, this may suggest that the assay is below LOD or at hook in one of the data sets. However, as long as the bridge samples are not at hook or below LOD, this should not impact the bridging quality. For projects with differing clinical phenotypes, it is more informative to look at the similarities between the bridging samples than the similarity between the datasets.
The distribution of CHL1 between projects is visualized for demonstration purposes.
# Bridge sample data
bridge_samples <- npx_1 %>%
  rbind(npx_2) %>%
  filter(SampleID %in% overlapping_samples) %>%
  filter(Assay == "CHL1") %>%
  mutate(Assay_OID = paste(Assay, OlinkID, sep = "\n"))
# Generate violin plot for CHL1
npx_data1 %>%
  mutate(Project = "20200001") %>%
  bind_rows({
    npx_data2 %>%
      mutate(Project = "20200002")
  }) %>%
  filter(Assay == "CHL1") %>%
  filter(!str_detect(SampleID, "CONTROL")) %>% # Remove control samples
  mutate(Assay_OID = paste(Assay, OlinkID, sep = "\n")) %>%
  ggplot2::ggplot(aes(Project, NPX)) +
  ggplot2::geom_violin(aes(fill = Project)) +
  geom_point(data = bridge_samples, position = position_jitter(0.1)) +
  set_plot_theme() +
  # Applied after set_plot_theme() so the theme does not reset the legend setting
  theme(legend.position = "none") +
  facet_wrap(. ~ Assay_OID, scales = "free_y")
Another way to determine whether bridging decreased the variability between projects is to calculate the CV before and after bridging. Note that the CV calculation formula differs for Target 96 and Explore projects.
# CV formula used for Explore projects
explore_cv <- function(npx, na.rm = FALSE) {
  sqrt(exp((log(2) * sd(npx, na.rm = na.rm))^2) - 1) * 100
}
# CV formula used for Target 96 projects (computed on the linear scale)
t96_cv <- function(NPX, na.rm = TRUE) {
  100 * sd(2^NPX, na.rm = na.rm) / mean(2^NPX, na.rm = na.rm)
}
tech <- "Explore"
# CV of the control samples before bridging
cv_before <- npx_1 %>%
  rbind(npx_2) %>%
  filter(str_detect(SampleID, "CONTROL")) %>%
  filter(NPX > LOD) %>%
  group_by(OlinkID) %>%
  mutate(CV = ifelse(tech == "Explore", explore_cv(NPX), t96_cv(NPX))) %>%
  ungroup() %>%
  distinct(OlinkID, CV)
# CV of the control samples after bridging
cv_after <- npx_br_data %>%
  filter(str_detect(SampleID, "CONTROL")) %>%
  filter(NPX > LOD) %>%
  group_by(OlinkID) %>%
  mutate(CV = ifelse(tech == "Explore", explore_cv(NPX), t96_cv(NPX))) %>%
  ungroup() %>%
  distinct(OlinkID, CV)
# Compare the CV distributions
cv_before %>%
  mutate(Analysis = "Before") %>%
  rbind((cv_after %>%
           mutate(Analysis = "After"))) %>%
  ggplot2::ggplot(ggplot2::aes(x = CV, fill = Analysis)) +
  ggplot2::geom_density(alpha = 0.7) +
  set_plot_theme() +
  olink_fill_discrete() +
  ggplot2::theme(text = ggplot2::element_text(size = 20)) +
  ggplot2::xlim(-50, 400)
Finally, use a PCA plot to check whether bridging normalization has corrected the batch effects. In the example below, samples from datasets 1 and 2 form separate clusters before bridging because of batch effects, whereas after bridging they appear as one cluster in the PCA plot. This indicates that bridging normalization has sufficiently removed the batch effects between the two datasets.
## After bridging
### Generate unique SampleIDs
npx_after_br <- npx_br_data %>%
  dplyr::mutate(Type = ifelse(SampleID %in% overlapping_samples,
                              paste(Project, "Bridge"),
                              paste(Project, "Sample"))) %>%
  dplyr::mutate(SampleID = paste0(Project, PlateID, SampleID))
### PCA plot
OlinkAnalyze::olink_pca_plot(df = npx_after_br,
                             color_g = "Type",
                             byPanel = TRUE)
PCA plot of combined datasets after bridging
Normalized data can be exported in long format using write.table. Note that two columns are added during the bridging process, so for the exported format to match the input format, the Project and Adj_factor columns need to be removed. To export only the new project, dplyr::filter can be used to filter by Project.
# Keep the bridged (non-reference) project and drop the columns added by bridging
new_normalized_data <- npx_br_data %>%
  dplyr::filter(Project == "20200002") %>%
  dplyr::select(-Project, -Adj_factor)
# Write the table separately so new_normalized_data keeps the data, not the NULL returned by write.table
write.table(new_normalized_data, file = "New_Normalized_NPX_data.csv", sep = ";")
We are always happy to help. Email us with any questions:
biostat@olink.com for statistical services and general stats questions
biostattools@olink.com for Olink Analyze and Shiny app support
support@olink.com for Olink lab product and technical support
info@olink.com for more information