Authors: Benjamin Luke & Stephanie Gamble
Savannah River National Laboratory
Contact stephanie.gamble@srnl.doe.gov
The goal of MethodOpt is to employ a sophisticated multi-objective, multivariate method optimization technique in the convenient context of a graphical user interface (GUI). It offers experimental design, plotting, and analysis tools to lead the user through the entire method optimization procedure.
The most convenient way to install MethodOpt from CRAN is by calling
{install.packages("MethodOpt")}
in RStudio. (Note we assume the user has RStudio installed on their hardware.)
In this vignette we will cover a comprehensive example demonstrating the basic procedure when using MethodOpt. We will use data collected during gas-chromatography mass-spectrometry (GCMS) method optimization as our test case.
After installing MethodOpt, the user can initiate the GUI by calling
{MethodOpt::MethodOpt()}
The Shiny session will open the GUI screen to the fractional factorial experimental design (FFD) tab.
The first step in the method optimization process is to generate a fractional factorial experimental design (FFD). This allows a screening process to be performed in order to identify parameters that significantly impact the desired objectives (more on this below). The fractional factorial design also allows for a substantially reduced number of screening methods to be performed compared to a full factorial design.
From the FFD tab screen, the user should input all parameters that are experimentally relevant for the optimization. In our GCMS test case, we input split ratio, column flow, inlet temperature, injection volume, auxiliary line temperature, oven ramp rate, and film thickness. We also input each parameter’s corresponding high and low values, that is, a range of values within which we predict the true optimal value will lie. This range should be an informed estimate, which an experienced scientist can usually make. Pressing “Generate FFD” will render the FFD.
Note that a mathematical version and an experimental version are available under separate subtabs. The mathematical table is a more traditional FFD with “+1” and “-1” indicating highs and lows. The experimental version contains the actual high and low values entered by the user. We expect this table to be much more useful for the practicing scientist.
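The relationship between the mathematical and experimental tables can be sketched in base R. This is a minimal illustration, not MethodOpt's internal code: the parameter ranges are made-up values, and the 2^(7-4) generators (D = AB, E = AC, F = BC, G = ABC) are one standard textbook choice that may differ from the design MethodOpt renders.

```r
# Hypothetical parameter table; names and ranges are illustrative only.
params <- data.frame(
  name = c("split_ratio", "column_flow", "inlet_temp", "injection_vol",
           "aux_temp", "ramp_rate", "film_thickness"),
  low  = c(5, 1.0, 200, 0.5, 250, 5, 0.25),
  high = c(50, 2.0, 280, 2.0, 300, 20, 1.00)
)

# Full factorial in three base factors (8 runs, coded -1/+1).
base_design <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1))

# "Mathematical" table: the remaining four columns are products of the
# base columns (assumed generators, see the note above).
coded <- with(base_design, data.frame(
  A, B, C, D = A * B, E = A * C, F = B * C, G = A * B * C
))
names(coded) <- as.character(params$name)

# "Experimental" table: replace each -1/+1 with the parameter's low/high value.
experimental <- as.data.frame(Map(
  function(x, lo, hi) ifelse(x < 0, lo, hi),
  coded, params$low, params$high
))

print(coded)
print(experimental)
```

Seven parameters are screened in 8 runs here, where a full two-level factorial would need 2^7 = 128.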
As can be seen, input values are also tabulated in the sidebar. For convenience, any value in the table can be deleted or modified within the program (to avoid having to quit MethodOpt and start over). Simply double click any value in the table to modify it. The change will be recorded internally, but a new table will have to be rendered for the table to reflect the change. A row of data can also be deleted entirely by highlighting that row and clicking “Delete Row.”
Once the table is finalized by the user, it can be downloaded as a Comma Separated Value (CSV) file by selecting “Download FFD Experimental.” This makes it easy for the experimentalist to print the design and use it alongside their equipment. In any case, the table should be downloaded, as it will be uploaded to the program later during the analysis of the screening data.
An analysis of variance (ANOVA) test should be carried out following the completion of the screening experiments. To do this, the user should open MethodOpt again by MethodOpt::MethodOpt() and navigate to the “Data Analysis” tab.
At the top of the analysis tab there is a field to upload data. Selecting “Browse…” will open an upload handler. All of the screening data should be uploaded by simultaneously selecting multiple files. For traditional spectral data, this will be intensity versus time (that is, time on the x-axis and intensity on the y-axis). The format of the raw data is important. Only CSV files are accepted; a non-CSV file will be rejected. The program reads the first column of the file as the independent variable (i.e., x values) and the second column as the dependent variable, and this must be considered in data preparation. There is, however, freedom to delay the start of the data in the CSV file by an arbitrary number of rows. See the image of the example data below.
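To check that a data file matches this layout before uploading it, something like the following base R sketch may help. The helper `read_spectrum` and the file contents are hypothetical; the sketch only mirrors the described convention (first column x, second column y, optional leading rows to skip) and is not the reader MethodOpt uses.

```r
# Hypothetical helper: read one two-column CSV, skipping `skip` non-data rows.
read_spectrum <- function(path, skip = 0) {
  raw <- read.csv(path, header = FALSE, skip = skip)
  data.frame(x = as.numeric(raw[[1]]), y = as.numeric(raw[[2]]))
}

# Demonstration file with two non-data rows before the data begins.
tmp <- tempfile(fileext = ".csv")
writeLines(c("Instrument export", "Time,Intensity",
             "0.0,10", "0.1,12", "0.2,55"), tmp)

spec <- read_spectrum(tmp, skip = 2)
print(spec)
```

If `skip` is too small, the non-data rows end up in the table and the numeric conversion produces `NA` values, which is a quick sign that the row skip needs adjusting.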
Uploaded data will be tabulated in the order it was input. This order should reflect the order in which the experiments were done according to the FFD. If it does not, the files should be either reorganized before input, or they can be reorganized in the table by clicking and dragging a file’s index to its appropriate location. It is recommended, however, to upload the data in the proper order from the beginning.
There is a series of subtabs within the Data Analysis tab. These are “Plot,” “Identify Peaks,” “Objectives,” “ANOVA,” and “Optimization.” We will consider each in turn.
The Plot tab is primarily a visual tool. By selecting (highlighting) a file from the table, its plot is automatically rendered. If the importance of data organization was not obvious before, it will be now. The plot will display the first column of the selected file’s data as the x-axis and the second column as the y-axis. There is also a sidebar panel that contains some data manipulation tools. These include options to skip rows (skip the first n rows of the files being read; this must be used if there are non-data rows at the start of data files, as described above), change x- and y-axis labels, and manipulate the data by applying a logarithmic y-axis scale, subtracting the (calculated) baseline from the original data, and overlaying the baseline in red.
Any of the plots may be downloaded as a PNG image. Each plot can be downloaded individually as needed by selecting “Download Plot,” or all of the plots can be downloaded in a compressed file by selecting “Download All.” If the user selects “Download All,” every plot will be downloaded according to the settings applied in the sidebar.
The only input in the Plot tab that the user must consider is the row skip, because the same setting is applied in the rest of the analyses in the program. Otherwise, interaction with the Plot tab is optional (though probably useful).
The peaks of the input data must be properly identified in order to calculate the objectives in the next section. There are two ways to do this. One way is by means of a peak finding algorithm built into MethodOpt. It will identify n peaks, where n is an integer input by the user. Selecting any file will render the file’s plot with the peaks identified by red dots, and a table listing the peaks with corresponding times. The peak finding algorithm is not perfect, however, and will sometimes (perhaps frequently) misidentify peaks. This will happen more often if the analytes are not easily distinguishable from the rest of the signal. Because of this limitation, the user can select any misidentified peaks from the table, delete them, and advance to the next viable peak found by the algorithm. However, there is a limited number of alternates, and the proper peaks may still not be identified by the time the list of “backups” is exhausted, though this is unlikely if the spectra are clean.
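To make the idea of "the n largest peaks" concrete, here is a deliberately simple local-maximum search in base R. It is a stand-in for illustration only; MethodOpt's built-in algorithm is not described in this vignette and is likely more sophisticated. The function `find_peaks` and the synthetic signal are assumptions.

```r
# Hypothetical helper: return the n tallest local maxima, ordered by time.
find_peaks <- function(x, y, n) {
  i <- 2:(length(y) - 1)
  is_peak <- y[i] > y[i - 1] & y[i] >= y[i + 1]   # higher than both neighbours
  idx <- i[is_peak]
  idx <- idx[order(y[idx], decreasing = TRUE)]     # tallest first
  idx <- sort(head(idx, n))                        # keep n, restore time order
  data.frame(time = x[idx], intensity = y[idx])
}

# Synthetic, noise-free signal with Gaussian peaks at t = 2 and t = 5.
t <- seq(0, 8, by = 0.01)
signal <- 100 * exp(-(t - 2)^2 / 0.02) + 60 * exp(-(t - 5)^2 / 0.02)

peaks <- find_peaks(t, signal, n = 2)
print(peaks)
```

On noisy data a search like this also picks up spurious maxima, which illustrates why misidentified peaks may need to be deleted by hand or, preferably, located from a retention time file.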
The (strongly) preferred method of peak identification is by uploading a retention time file. This will streamline the process significantly compared to the first method. As with other files uploaded to MethodOpt, there are formatting requirements that must be met for the retention times to be read properly. The retention time file must also be CSV; the first row must list the analyte names, and the first column must list the method names. A method name is an identifier specifying which set of parameters (formally, the “method”) was used in the screening process to generate the corresponding data. See the image of the example data.
Retention time data will be uploaded by selecting “Browse…” in the sidebar and selecting the appropriate file. Once the upload is complete (the program will flag the file should the format be incorrect in an obvious way), one can select “SEARCH” and the peaks will be identified on each data set. Each plot can be rendered with peaks marked in red by selecting its file from the table.
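A retention time table in the layout described above (analyte names in the first row, method names in the first column) might look like the text embedded in this base R sketch. The header label `method`, the analyte names, and all numbers are invented for illustration, and the nearest-point lookup is only an assumed way of turning retention times into positions on a spectrum, not necessarily what “SEARCH” does.

```r
# Hypothetical retention time file contents.
rt_csv <- c(
  "method,analyte_A,analyte_B",
  "method_01,2.00,5.00",
  "method_02,2.10,5.20"
)
rt <- read.csv(text = rt_csv, row.names = 1, check.names = FALSE)

# Hypothetical helper: index of the sampled x value closest to each target time.
nearest_index <- function(x, targets) {
  vapply(targets, function(tt) which.min(abs(x - tt)), integer(1))
}

# Time axis of a spectrum sampled every 0.01 time units.
t <- seq(0, 8, by = 0.01)

# Locate the analytes of method_02 on that axis.
idx <- nearest_index(t, unlist(rt["method_02", ]))
print(t[idx])
```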
After peaks are identified, the proper objectives should be selected. There are several objective options built into MethodOpt. Any number may be selected simultaneously, with one caveat. Conflicting objectives—namely, minimize peak width and maximize peak area—may not be selected concurrently. Selecting objectives in the sidebar will automatically render their values.
Maximizing the retention time separation requires an additional input—specifying which retention times to separate. The user should be sure to select two adjacent analytes from the corresponding drop down box. Up to three pairs of retention times may be separated.
The analysis of variance (ANOVA) test is the next step in the procedure following the selection of the objectives, accomplished under the ANOVA subtab. The user is required to upload the FFD used in the screening experiments. This should ideally be identical to the FFD that the user downloaded from MethodOpt in the first stage of the process. (Here again, the format of the uploaded file is very important.)
The user should select an alpha value, the significance level for the test (0.05 is a common choice). A smaller alpha demands stronger evidence before a parameter is declared significant. Pressing “Run ANOVA” will calculate the significance status of each parameter for each objective.
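The role of alpha can be illustrated with base R's `aov`. This is a sketch under stated assumptions, not MethodOpt's implementation: the design is a small half-fraction in four coded parameters, and the response values are synthetic, constructed so that only parameter A has a real effect.

```r
# Half-fraction design in four coded parameters (D = ABC), 8 runs.
design <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1))
design$D <- with(design, A * B * C)

# Synthetic objective values: a strong effect of A plus small fixed "noise".
noise <- c(0.2, -0.1, 0.3, -0.3, 0.1, -0.2, 0.05, -0.05)
design$y <- 10 + 5 * design$A + noise

# Main-effects ANOVA for this one objective.
fit <- aov(y ~ A + B + C + D, data = design)
tab <- summary(fit)[[1]]
p_values <- setNames(tab[["Pr(>F)"]], trimws(rownames(tab)))

# A parameter is flagged as significant when its p-value is below alpha.
alpha <- 0.05
significant <- names(p_values)[!is.na(p_values) & p_values < alpha]
print(significant)
```

With several objectives, the same test would be repeated once per objective, giving a significance status for each parameter-objective pair.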
The third phase of the process is to create a three-level Box-Behnken experimental design (BBD). This is done in a way very similar to creating the FFD. The significant parameters identified by the ANOVA test are input along with their low, middle, and high values. These values represent an informed range within which the true optimum is expected to lie, much like the low and high values from the FFD. When all of the parameters have been input with their corresponding low, middle, and high values, the BBD can be generated by selecting “Generate BBD.”
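For readers unfamiliar with the structure of a BBD, the following base R sketch builds one by the usual pairwise construction: every pair of parameters is run at all four -1/+1 combinations while the remaining parameters sit at their middle level, and center runs are appended. The function `box_behnken`, the number of center runs, and the parameter ranges are assumptions; the construction shown matches the standard design for three to five parameters, and the table MethodOpt generates may be ordered or replicated differently.

```r
# Hypothetical helper: coded Box-Behnken design (-1, 0, +1) for k = 3 to 5.
box_behnken <- function(k, n_center = 3) {
  pairs <- combn(k, 2)
  corners <- as.matrix(expand.grid(c(-1, 1), c(-1, 1)))
  blocks <- lapply(seq_len(ncol(pairs)), function(j) {
    m <- matrix(0, nrow = 4, ncol = k)   # all parameters at the middle level
    m[, pairs[, j]] <- corners           # one pair varied over its corners
    m
  })
  rbind(do.call(rbind, blocks), matrix(0, nrow = n_center, ncol = k))
}

# Illustrative low / middle / high values for three significant parameters.
ranges <- data.frame(
  name = c("split_ratio", "column_flow", "inlet_temp"),
  low  = c(5, 1.0, 200),
  mid  = c(25, 1.5, 240),
  high = c(50, 2.0, 280)
)

coded <- box_behnken(3)

# Translate coded levels (-1, 0, +1) to positions (1, 2, 3) in low/mid/high.
lv <- as.matrix(ranges[, c("low", "mid", "high")])
experimental <- sapply(seq_len(ncol(coded)), function(j) {
  unname(lv[j, coded[, j] + 2])
})
colnames(experimental) <- as.character(ranges$name)
print(experimental)
```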
Parameters and their numeric values can also be modified by double clicking their entries in the table where they are listed. They can also be deleted entirely by highlighting the parameter’s row and selecting “Delete Row.” The BBD must be regenerated to reflect these changes, however.
This table should be downloaded like the FFD was, as it will be uploaded in the next stage. Experiments should be run according to each method’s instructions and data should be saved in the same format as was required for the above ANOVA section.
When the data has been collected from the BBD experiments, the user is ready to proceed to the last part of the optimization procedure—the optimization itself. This is done by navigating back to the “Data Analysis” tab. A substantial amount of the procedure for the optimization overlaps with what has already been said in the Analysis of Variance sections above, so to avoid repetition, this section will be abbreviated with references to what has already been written.
The user begins by uploading the raw data from the BBD experiments, following the procedure described in the Plot section above. Then the user will proceed to identify the proper peaks. This can be done by using the peak search algorithm in MethodOpt or by the preferred method of uploading a properly formatted retention time CSV file, as described in the Identify Peaks section. The user will then select objectives just as described earlier.
Now the procedure shifts gears from that described in the Analysis of Variance section. Rather than using the prepared information from the Plot, Identify Peaks, and Objectives tabs to perform an ANOVA test, the user will skip the ANOVA subtab and use the Optimization tab.
The Optimization tab has a sidebar with a couple of fields. The first step is to upload the BBD file that was generated in the earlier section. The file does not have to be exactly the BBD generated by MethodOpt, but its format must be identical, so using the generated file is the easiest option. Error messages will be shown if the format is incorrect.
Also in the sidebar is a physical limit input. Any of the parameters may have physical ranges of validity, either because of instrument/machine limits or real-world physical limitations. These ranges must be specified in the optimization process because the unbounded solution may otherwise fall outside of them. Any number of parameters can be selected from the input box. (The parameters are programmatically obtained by reading the BBD.) The default upper and lower boundaries (i.e., the ranges of validity) are determined by the lows and highs from the BBD. These can be edited by double clicking their values and adjusting.
Once the BBD is uploaded and all necessary physical limits are input, the parameters can be optimized by selecting “Optimize.” If no physical limits are input, then only an unbounded solution will render. If limits are input, then both an unbounded and a bounded solution will render. Additionally, the objective prediction at the optimum value will be displayed (this is for the bounded solution, if available—otherwise it is for the unbounded solution).
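The difference between the unbounded and bounded solutions can be seen in a small base R sketch. Everything here is an assumption made for illustration: a three-level grid in two coded parameters stands in for an uploaded BBD, the single objective is synthetic with its true maximum at (0.4, 1.5), and a quadratic response surface fitted with `lm` and maximized with `optim` is one common approach that may differ from what MethodOpt does internally.

```r
# Stand-in design: two coded parameters at three levels each.
design <- expand.grid(x1 = c(-1, 0, 1), x2 = c(-1, 0, 1))

# Synthetic objective whose maximum lies at x1 = 0.4, x2 = 1.5,
# i.e. outside the tested range of x2.
design$y <- 50 - 3 * (design$x1 - 0.4)^2 - 2 * (design$x2 - 1.5)^2

# Second-order (quadratic) response surface model.
fit <- lm(y ~ x1 + x2 + I(x1^2) + I(x2^2) + x1:x2, data = design)

# optim() minimizes, so return the negative predicted objective.
neg_prediction <- function(p) {
  -unname(predict(fit, newdata = data.frame(x1 = p[1], x2 = p[2])))
}

# Unbounded solution: free to leave the tested range.
unbounded <- optim(c(0, 0), neg_prediction, method = "BFGS")

# Bounded solution: limits supplied as lower/upper bounds per parameter.
bounded <- optim(c(0, 0), neg_prediction, method = "L-BFGS-B",
                 lower = c(-1, -1), upper = c(1, 1))

print(unbounded$par)
print(bounded$par)
print(-bounded$value)   # predicted objective at the bounded optimum
```

Here the bounded solution sits on the upper limit of the second parameter, showing how a physical limit changes the reported optimum and its predicted objective value.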