| Type: | Package |
| Title: | Execute and View Data Quality Checks on OMOP CDM Database |
| Version: | 2.8.6 |
| Date: | 2026-01-22 |
| Author: | Katy Sadowski [aut, cre], Clair Blacketer [aut], Maxim Moinat [aut], Ajit Londhe [aut], Anthony Sena [aut], Anthony Molinaro [aut], Frank DeFalco [aut], Pavel Grafkin [aut] |
| Maintainer: | Katy Sadowski <sadowski@ohdsi.org> |
| Description: | Assesses data quality in Observational Medical Outcomes Partnership Common Data Model (OMOP CDM) databases. Executes data quality checks and provides an R 'shiny' application to view the results. |
| License: | Apache License 2.0 |
| Config/build/clean-inst-doc: | FALSE |
| VignetteBuilder: | knitr |
| Config/testthat/edition: | 3 |
| URL: | https://github.com/OHDSI/DataQualityDashboard |
| BugReports: | https://github.com/OHDSI/DataQualityDashboard/issues |
| Depends: | R (≥ 3.2.2), DatabaseConnector (≥ 2.0.2) |
| Imports: | magrittr, ParallelLogger, dplyr, jsonlite, rJava, SqlRender (≥ 1.10.1), plyr, stringr, rlang, tidyselect, readr |
| Suggests: | testthat, knitr, rmarkdown, markdown, shiny, ggplot2, Eunomia (≥ 2.0.0), duckdb, R.utils, devtools |
| RoxygenNote: | 7.3.2 |
| Encoding: | UTF-8 |
| NeedsCompilation: | no |
| Packaged: | 2026-01-24 18:39:14 UTC; katysadowski |
| Repository: | CRAN |
| Date/Publication: | 2026-01-28 18:50:23 UTC |
Applies the 'Not Applicable' status to a single check
Description
Applies the 'Not Applicable' status to a single check
Usage
.applyNotApplicable(x)
Arguments
x |
Results from a single check |
Value
A numeric value (0 or 1) indicating whether the check is not applicable
Determines whether a check should be marked notApplicable and, if so, the notApplicableReason
Description
Determines whether a check should be marked notApplicable and, if so, the notApplicableReason
Usage
.calculateNotApplicableStatus(checkResults)
Arguments
checkResults |
A dataframe containing the results of the data quality checks |
Value
A dataframe with updated check results including notApplicable status and reasons
Determines whether all checks required for the 'Not Applicable' status are present in checkNames
Description
Determines whether all checks required for the 'Not Applicable' status are present in checkNames
Usage
.containsNAchecks(checkNames)
Arguments
checkNames |
A character vector of check names |
Value
A logical value indicating whether all required checks are present
Internal function to evaluate the data quality checks against given thresholds.
Description
Internal function to evaluate the data quality checks against given thresholds.
Usage
.evaluateThresholds(checkResults, tableChecks, fieldChecks, conceptChecks)
Arguments
checkResults |
A dataframe containing the results of the data quality checks |
tableChecks |
A dataframe containing the table checks |
fieldChecks |
A dataframe containing the field checks |
conceptChecks |
A dataframe containing the concept checks |
Value
A dataframe with updated check results including pass/fail status and threshold values
Internal function to define the id of each check.
Description
Internal function to define the id of each check.
Usage
.getCheckId(
checkLevel,
checkName,
cdmTableName,
cdmFieldName = NA,
conceptId = NA,
unitConceptId = NA
)
Arguments
checkLevel |
The level of the check. Options are table, field, or concept |
checkName |
The name of the data quality check |
cdmTableName |
The name of the CDM data table the quality check is applied to |
cdmFieldName |
The name of the field in the CDM data table the quality check is applied to |
conceptId |
The concept id the quality check is applied to |
unitConceptId |
The unit concept id the quality check is applied to |
Value
A character string representing the unique check ID
Determines whether all checks expected for calculating the 'Not Applicable' status are present
Description
Determines whether all checks expected for calculating the 'Not Applicable' status are present
Usage
.hasNAchecks(checkResults)
Arguments
checkResults |
A dataframe containing the results of the data quality checks |
Value
A logical value indicating whether all required checks are present
Internal function to determine if the connection needs auto commit
Description
Internal function to determine if the connection needs auto commit
Usage
.needsAutoCommit(connectionDetails, connection)
Arguments
connectionDetails |
A connectionDetails object for connecting to the CDM database |
connection |
A connection for connecting to the CDM database using the DatabaseConnector::connect(connectionDetails) function. |
Value
A logical value indicating if the connection needs auto commit
Internal function to send the fully qualified sql to the database and return the numerical result.
Description
Internal function to send the fully qualified sql to the database and return the numerical result.
Usage
.processCheck(
connection,
connectionDetails,
check,
checkDescription,
sql,
outputFolder
)
Arguments
connection |
A connection for connecting to the CDM database using the DatabaseConnector::connect(connectionDetails) function. |
connectionDetails |
A connectionDetails object for connecting to the CDM database. |
check |
The data quality check |
checkDescription |
The description of the data quality check |
sql |
The fully qualified sql for the data quality check |
outputFolder |
The folder to output logs and SQL files to. |
Value
A dataframe containing the check results
Internal function to read threshold files
Description
Internal function to read threshold files
Usage
.readThresholdFile(checkThresholdLoc, defaultLoc)
Arguments
checkThresholdLoc |
The location of the threshold file |
defaultLoc |
The default location of the threshold file |
Value
A dataframe containing the threshold data
Internal function to put the results of each quality check into a dataframe.
Description
Internal function to put the results of each quality check into a dataframe.
Usage
.recordResult(
result = NULL,
check,
checkDescription,
sql,
executionTime = NA,
warning = NA,
error = NA
)
Arguments
result |
The result of the data quality check |
check |
The data quality check |
checkDescription |
The description of the data quality check |
sql |
The fully qualified sql for the data quality check |
executionTime |
The total time it took to execute the data quality check |
warning |
Any warnings returned from the server |
error |
Any errors returned from the server |
Value
A dataframe containing the check results
Internal function to run and process each data quality check.
Description
Internal function to run and process each data quality check.
Usage
.runCheck(
checkDescription,
tableChecks,
fieldChecks,
conceptChecks,
connectionDetails,
connection,
cdmDatabaseSchema,
vocabDatabaseSchema,
resultsDatabaseSchema,
writeTableName,
cohortDatabaseSchema,
cohortTableName,
cohortDefinitionId,
outputFolder,
sqlOnlyUnionCount,
sqlOnlyIncrementalInsert,
sqlOnly
)
Arguments
checkDescription |
The description of the data quality check |
tableChecks |
A dataframe containing the table checks |
fieldChecks |
A dataframe containing the field checks |
conceptChecks |
A dataframe containing the concept checks |
connectionDetails |
A connectionDetails object for connecting to the CDM database |
connection |
A connection for connecting to the CDM database using the DatabaseConnector::connect(connectionDetails) function. |
cdmDatabaseSchema |
The fully qualified database name of the CDM schema |
vocabDatabaseSchema |
The fully qualified database name of the vocabulary schema (default is to set it as the cdmDatabaseSchema) |
resultsDatabaseSchema |
The fully qualified database name of the results schema |
writeTableName |
The table to write DQD results to. Used when sqlOnly or writeToTable is TRUE. |
cohortDatabaseSchema |
The schema where the cohort table is located. |
cohortTableName |
The name of the cohort table. |
cohortDefinitionId |
The cohort definition id for the cohort you wish to run the DQD on. The package assumes a standard OHDSI cohort table called 'Cohort' |
outputFolder |
The folder to output logs and SQL files to |
sqlOnlyUnionCount |
(OPTIONAL) How many SQL commands to union before inserting them into the output table (can speed processing when queries are run in parallel). Default is 1. |
sqlOnlyIncrementalInsert |
(OPTIONAL) Boolean to determine whether to insert check results and associated metadata into the output table. Default is FALSE (for backwards compatibility with versions <= v2.2.0) |
sqlOnly |
Should the SQLs be executed (FALSE) or just returned (TRUE)? |
Value
A dataframe containing the check results or SQL queries (NULL if sqlOnlyIncrementalInsert is TRUE)
Internal function to summarize the results of the DQD run.
Description
Internal function to summarize the results of the DQD run.
Usage
.summarizeResults(checkResults)
Arguments
checkResults |
A dataframe containing the results of the checks after running against the database |
Value
A list containing summary statistics of the check results
Internal function to write the check results to a csv file.
Description
Internal function to write the check results to a csv file.
Usage
.writeResultsToCsv(
checkResults,
csvPath,
columns = c("checkId", "failed", "passed", "isError", "notApplicable", "checkName",
"checkDescription", "thresholdValue", "notesValue", "checkLevel", "category",
"subcategory", "context", "checkLevel", "cdmTableName", "cdmFieldName", "conceptId",
"unitConceptId", "numViolatedRows", "pctViolatedRows", "numDenominatorRows",
"executionTime", "notApplicableReason", "error", "queryText"),
delimiter = ","
)
Arguments
checkResults |
A dataframe containing the fully summarized data quality check results |
csvPath |
The path where the csv file should be written |
columns |
The columns to be included in the csv file. Default is all columns in the checkResults dataframe. |
delimiter |
The delimiter for the file. Default is comma. |
Value
NULL (writes results to CSV file)
Write DQD results to json
Description
Write DQD results to json
Usage
.writeResultsToJson(result, outputFolder, outputFile)
Arguments
result |
A DQD results object (list) |
outputFolder |
The output folder |
outputFile |
The output filename |
Value
NULL (writes results to JSON file)
Internal function to write the check results to a table in the database. Requires write access to the database
Description
Internal function to write the check results to a table in the database. Requires write access to the database
Usage
.writeResultsToTable(
connectionDetails,
resultsDatabaseSchema,
checkResults,
writeTableName,
cohortDefinitionId
)
Arguments
connectionDetails |
A connectionDetails object for connecting to the CDM database |
resultsDatabaseSchema |
The fully qualified database name of the results schema |
checkResults |
A dataframe containing the fully summarized data quality check results |
writeTableName |
The name of the table to be written to the database. Default is "dqdashboard_results". |
cohortDefinitionId |
(OPTIONAL) The cohort definition id for the cohort you wish to run the DQD on. The package assumes a standard OHDSI cohort table called 'Cohort' with the fields cohort_definition_id and subject_id. |
Value
NULL (writes results to database table)
Convert JSON results file case
Description
Convert a DQD JSON results file between camelCase and (all-caps) snake_case. Enables viewing of pre-v2.1.0 results files in later DQD versions, and vice versa.
Usage
convertJsonResultsFileCase(
jsonFilePath,
writeToFile,
outputFolder = NA,
outputFile = "",
targetCase
)
Arguments
jsonFilePath |
Path to the JSON results file to be converted |
writeToFile |
Whether or not to write the converted results back to a file (must be either TRUE or FALSE) |
outputFolder |
The folder to output the converted JSON results file to |
outputFile |
(OPTIONAL) File to write converted results JSON object to. Default is name of input file with a "_camel" or "_snake" postfix |
targetCase |
Case into which the results file parameters should be converted (must be either "camel" or "snake") |
Value
DQD results object (a named list)
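A minimal sketch of converting an older results file to camelCase for viewing in a current DQD version; the file path and output folder shown here are hypothetical.

```r
library(DataQualityDashboard)

# Convert a pre-v2.1.0 (snake_case) results file to camelCase and write it
# out; the converted object is also returned invisibly as a named list.
results <- convertJsonResultsFileCase(
  jsonFilePath = "old_results.json",   # hypothetical input path
  writeToFile = TRUE,
  outputFolder = "output",             # hypothetical output folder
  targetCase = "camel"
)
```

With the default outputFile, the converted file is written with a "_camel" (or "_snake") postfix appended to the input file name.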
Execute DQ checks
Description
This function will connect to the database, generate the sql scripts, and run the data quality checks against the database. By default, results will be written to a json file as well as a database table.
Usage
executeDqChecks(
connectionDetails,
cdmDatabaseSchema,
resultsDatabaseSchema,
vocabDatabaseSchema = cdmDatabaseSchema,
cdmSourceName,
numThreads = 1,
sqlOnly = FALSE,
sqlOnlyUnionCount = 1,
sqlOnlyIncrementalInsert = FALSE,
outputFolder,
outputFile = "",
verboseMode = FALSE,
writeToTable = TRUE,
writeTableName = "dqdashboard_results",
writeToCsv = FALSE,
csvFile = "",
checkLevels = c("TABLE", "FIELD", "CONCEPT"),
checkNames = c(),
checkSeverity = c("fatal", "convention", "characterization"),
cohortDefinitionId = c(),
cohortDatabaseSchema = resultsDatabaseSchema,
cohortTableName = "cohort",
tablesToExclude = c("CONCEPT", "VOCABULARY", "CONCEPT_ANCESTOR",
"CONCEPT_RELATIONSHIP", "CONCEPT_CLASS", "CONCEPT_SYNONYM", "RELATIONSHIP", "DOMAIN"),
cdmVersion = "5.3",
tableCheckThresholdLoc = "default",
fieldCheckThresholdLoc = "default",
conceptCheckThresholdLoc = "default"
)
Arguments
connectionDetails |
A connectionDetails object for connecting to the CDM database |
cdmDatabaseSchema |
The fully qualified database name of the CDM schema |
resultsDatabaseSchema |
The fully qualified database name of the results schema |
vocabDatabaseSchema |
The fully qualified database name of the vocabulary schema (default is to set it as the cdmDatabaseSchema) |
cdmSourceName |
The name of the CDM data source |
numThreads |
The number of concurrent threads to use to execute the queries |
sqlOnly |
Should the SQLs be executed (FALSE) or just returned (TRUE)? |
sqlOnlyUnionCount |
(OPTIONAL) In sqlOnlyIncrementalInsert mode, how many SQL commands to union in each query to insert check results into results table (can speed processing when queries done in parallel). Default is 1. |
sqlOnlyIncrementalInsert |
(OPTIONAL) In sqlOnly mode, boolean to determine whether to generate SQL queries that insert check results and associated metadata into results table. Default is FALSE (for backwards compatibility to <= v2.2.0) |
outputFolder |
The folder to output logs, SQL files, and JSON results file to |
outputFile |
(OPTIONAL) File to write results JSON object |
verboseMode |
Boolean to determine if the console will show all execution steps. Default is FALSE |
writeToTable |
Boolean to indicate if the check results will be written to the dqdashboard_results table in the resultsDatabaseSchema. Default is TRUE |
writeTableName |
The name of the results table. Defaults to 'dqdashboard_results'. Used when sqlOnly or writeToTable is TRUE. |
writeToCsv |
Boolean to indicate if the check results will be written to a csv file. Default is FALSE |
csvFile |
(OPTIONAL) CSV file to write results |
checkLevels |
Choose which DQ check levels to execute. Default is all 3 (TABLE, FIELD, CONCEPT) |
checkNames |
(OPTIONAL) Choose which check names to execute. Names can be found in inst/csv/OMOP_CDM_v[cdmVersion]_Check_Descriptions.csv. Note that "cdmTable", "cdmField" and "measureValueCompleteness" are always executed. |
checkSeverity |
Choose which DQ check severity levels to execute. Default is all 3 (fatal, convention, characterization) |
cohortDefinitionId |
The cohort definition id for the cohort you wish to run the DQD on. The package assumes a standard OHDSI cohort table with the fields cohort_definition_id and subject_id. |
cohortDatabaseSchema |
The schema where the cohort table is located. |
cohortTableName |
The name of the cohort table. Defaults to 'cohort'. |
tablesToExclude |
(OPTIONAL) Choose which CDM tables to exclude from the execution. |
cdmVersion |
The CDM version to target for the data source. Options are "5.2", "5.3", or "5.4". By default, "5.3" is used. |
tableCheckThresholdLoc |
The location of the threshold file for evaluating the table checks. If not specified the default thresholds will be applied. |
fieldCheckThresholdLoc |
The location of the threshold file for evaluating the field checks. If not specified the default thresholds will be applied. |
conceptCheckThresholdLoc |
The location of the threshold file for evaluating the concept checks. If not specified the default thresholds will be applied. |
Value
A list object of results
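A minimal sketch of a full run against the Eunomia example dataset (assumes the Eunomia package is installed; Eunomia stores the CDM in the "main" schema, and the source name and output folder below are illustrative choices).

```r
library(DataQualityDashboard)

# Eunomia ships a small example OMOP CDM in a local database.
connectionDetails <- Eunomia::getEunomiaConnectionDetails()

results <- executeDqChecks(
  connectionDetails = connectionDetails,
  cdmDatabaseSchema = "main",        # Eunomia's CDM schema
  resultsDatabaseSchema = "main",
  cdmSourceName = "Eunomia",         # illustrative source name
  outputFolder = "output",           # logs and JSON results land here
  writeToTable = FALSE               # skip writing to a database table
)
```

The returned list can be inspected directly, and the JSON file written to outputFolder can be passed to viewDqDashboard to browse the results.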
List DQ checks
Description
Details on all checks defined by the DataQualityDashboard Package.
Usage
listDqChecks(
cdmVersion = "5.3",
tableCheckThresholdLoc = "default",
fieldCheckThresholdLoc = "default",
conceptCheckThresholdLoc = "default"
)
Arguments
cdmVersion |
The CDM version to target for the data source. By default, 5.3 is used. |
tableCheckThresholdLoc |
The location of the threshold file for evaluating the table checks. If not specified the default thresholds will be applied. |
fieldCheckThresholdLoc |
The location of the threshold file for evaluating the field checks. If not specified the default thresholds will be applied. |
conceptCheckThresholdLoc |
The location of the threshold file for evaluating the concept checks. If not specified the default thresholds will be applied. |
Value
A list containing check descriptions, table checks, field checks, and concept checks
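A short sketch of browsing the check definitions without touching a database; the element names shown in the comment reflect the Value description above.

```r
library(DataQualityDashboard)

# Retrieve all check definitions and default thresholds for CDM v5.4.
checks <- listDqChecks(cdmVersion = "5.4")

# The result is a list of data frames: check descriptions plus the
# table-, field-, and concept-level threshold tables.
names(checks)
```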
Re-evaluate Thresholds
Description
Re-evaluate an existing DQD result against an updated thresholds file.
Usage
reEvaluateThresholds(
jsonFilePath,
outputFolder,
outputFile,
tableCheckThresholdLoc = "default",
fieldCheckThresholdLoc = "default",
conceptCheckThresholdLoc = "default",
cdmVersion = "5.3"
)
Arguments
jsonFilePath |
Path to the JSON results file generated using the execute function |
outputFolder |
The folder to output new JSON result file to |
outputFile |
File to write results JSON object to |
tableCheckThresholdLoc |
The location of the threshold file for evaluating the table checks. If not specified the default thresholds will be applied. |
fieldCheckThresholdLoc |
The location of the threshold file for evaluating the field checks. If not specified the default thresholds will be applied. |
conceptCheckThresholdLoc |
The location of the threshold file for evaluating the concept checks. If not specified the default thresholds will be applied. |
cdmVersion |
The CDM version to target for the data source. By default, 5.3 is used. |
Value
A list containing the re-evaluated DQD results
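A minimal sketch of re-scoring an existing run against a custom table-level threshold file, avoiding a full re-execution; all file paths here are hypothetical.

```r
library(DataQualityDashboard)

# Re-evaluate a previous run's results against updated table thresholds.
reEvaluated <- reEvaluateThresholds(
  jsonFilePath = file.path("output", "results.json"),     # prior run output
  outputFolder = "output",
  outputFile = "results_reevaluated.json",
  tableCheckThresholdLoc = "my_table_thresholds.csv"      # hypothetical custom thresholds
)
```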
View DQ Dashboard
Description
View DQ Dashboard
Usage
viewDqDashboard(jsonPath, launch.browser = NULL, display.mode = NULL, ...)
Arguments
jsonPath |
The fully-qualified path to the JSON file produced by executeDqChecks |
launch.browser |
Passed on to shiny::runApp() |
display.mode |
Passed on to shiny::runApp() |
... |
Extra parameters for shiny::runApp() like "port" or "host" |
Value
NULL (launches Shiny application)
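A one-line sketch of launching the dashboard on a results file from a prior run; the path is hypothetical.

```r
library(DataQualityDashboard)

# Launch the Shiny dashboard on a previously generated JSON results file.
viewDqDashboard(jsonPath = file.path("output", "results.json"))
```

Extra shiny::runApp() arguments such as port can be passed through `...`, e.g. `viewDqDashboard(jsonPath, port = 8080)`.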
Write DQD results database table to json
Description
Write DQD results database table to json
Usage
writeDBResultsToJson(
connection,
resultsDatabaseSchema,
cdmDatabaseSchema,
writeTableName,
outputFolder,
outputFile
)
Arguments
connection |
A connection object |
resultsDatabaseSchema |
The fully qualified database name of the results schema |
cdmDatabaseSchema |
The fully qualified database name of the CDM schema |
writeTableName |
Name of DQD results table in the database to read from |
outputFolder |
The folder to output the json results file to |
outputFile |
The output filename of the json results file |
Value
NULL (writes results to JSON file)
Write JSON Results to CSV file
Description
Write JSON Results to CSV file
Usage
writeJsonResultsToCsv(
jsonPath,
csvPath,
columns = c("checkId", "failed", "passed", "isError", "notApplicable", "checkName",
"checkDescription", "thresholdValue", "notesValue", "checkLevel", "category",
"subcategory", "context", "checkLevel", "cdmTableName", "cdmFieldName", "conceptId",
"unitConceptId", "numViolatedRows", "pctViolatedRows", "numDenominatorRows",
"executionTime", "notApplicableReason", "error", "queryText"),
delimiter = ","
)
Arguments
jsonPath |
Path to the JSON results file generated using the execute function |
csvPath |
Path to the CSV output file |
columns |
(OPTIONAL) List of desired columns |
delimiter |
(OPTIONAL) CSV delimiter |
Value
NULL (writes results to CSV file)
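A minimal sketch of exporting a results file to CSV with a trimmed column set; the paths are hypothetical and the column names are drawn from the default columns argument above.

```r
library(DataQualityDashboard)

# Export selected columns of a JSON results file to CSV.
writeJsonResultsToCsv(
  jsonPath = file.path("output", "results.json"),
  csvPath = file.path("output", "results.csv"),
  columns = c("checkId", "checkName", "failed", "pctViolatedRows")
)
```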
Write JSON Results to SQL Table
Description
Write JSON Results to SQL Table
Usage
writeJsonResultsToTable(
connectionDetails,
resultsDatabaseSchema,
jsonFilePath,
writeTableName = "dqdashboard_results",
cohortDefinitionId = c(),
singleTable = FALSE
)
Arguments
connectionDetails |
A connectionDetails object for connecting to the CDM database |
resultsDatabaseSchema |
The fully qualified database name of the results schema |
jsonFilePath |
Path to the JSON results file generated using the execute function |
writeTableName |
Name of table in the database to write results to |
cohortDefinitionId |
If writing results for a single cohort this is the ID that will be appended to the table name |
singleTable |
If TRUE, writes all results to a single table. If FALSE (default), writes to 3 separate tables by check level (table, field, concept) (NOTE this default behavior will be deprecated in the future) |
Value
NULL (writes results to database table)
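A sketch of loading a JSON results file into a single database table; the connection parameters and schema names are hypothetical, and createConnectionDetails comes from DatabaseConnector.

```r
library(DataQualityDashboard)

# Hypothetical connection details; substitute your own dbms/server/credentials.
connectionDetails <- DatabaseConnector::createConnectionDetails(
  dbms = "postgresql",
  server = "localhost/cdm",
  user = "user",
  password = "password"
)

# Write all results into one table rather than three per-level tables.
writeJsonResultsToTable(
  connectionDetails = connectionDetails,
  resultsDatabaseSchema = "results",
  jsonFilePath = file.path("output", "results.json"),
  writeTableName = "dqdashboard_results",
  singleTable = TRUE
)
```

Setting singleTable = TRUE avoids the per-check-level table layout that the documentation notes is slated for deprecation.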