Validate Drug Purchase Data
check_purchases.Rd
This function checks the structure and content of drug purchase data (data.frame or data.table) for use in pre2dup workflows.
It helps users detect errors in advance, such as missing or invalid records, incorrect formats, or dates outside a specified range.
If all checks pass and return_data = TRUE, the function returns a validated data.table with the required columns and proper types.
Usage
check_purchases(
dt,
pre_person_id = NULL,
pre_atc = NULL,
pre_package_id = NULL,
pre_date = NULL,
pre_ratio = NULL,
pre_ddd = NULL,
date_range = NULL,
print_all = FALSE,
return_data = FALSE
)
Arguments
- dt
data.frame or data.table containing drug purchase records.
- pre_person_id
Character. Column name for the person identifier.
- pre_atc
Character. Column name for the ATC code.
- pre_package_id
Character. Column name for the package identifier (e.g., Vnr in Nordic data).
- pre_date
Character. Column name for the drug purchase date.
- pre_ratio
Character. Column name for the amount of drug purchased: for whole packages, number of packages; for partial supplies, the proportion of a package (e.g., 0.5 for 14 tablets from a 28-tablet package).
- pre_ddd
Character. Column name for defined daily dose (DDD) of the purchase.
- date_range
Character vector of length 2. Date range for purchase dates (e.g., c("1995-01-01", "2018-12-31")). Default is NULL (no date range check).
- print_all
Logical. If TRUE, all row numbers that caused warnings are printed; if FALSE, only the first 5 problematic rows are printed.
- return_data
Logical. If TRUE and no errors are detected, returns a data.table with the validated columns and proper types. If FALSE, only a message is printed.
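As an illustration of the pre_ratio convention described above, the following sketch derives the ratio from a dispensed quantity and a package size. The column names tablets_dispensed and package_size are hypothetical and are not part of the function's interface; source data may already contain the ratio directly.

```r
# Hypothetical raw dispensing records; these column names are assumptions.
raw <- data.frame(
  tablets_dispensed = c(14, 28, 56),
  package_size      = c(28, 28, 28)
)

# Ratio = dispensed quantity / package size:
# 14 of 28 tablets is a partial supply (0.5),
# 56 of 28 tablets is two whole packages (2).
raw$ratios <- raw$tablets_dispensed / raw$package_size

print(raw)
```

The resulting ratios column is the kind of value that would be passed by name through pre_ratio.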
Value
If return_data = TRUE, returns a data.table containing only the validated columns, with converted types.
If errors are detected, the function stops and prints error messages.
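Because the function stops on errors, a script that should continue after a failed validation can wrap the call in tryCatch. The sketch below is one possible pattern, not a documented workflow; it assumes the package providing check_purchases() is attached and skips the call otherwise.

```r
# Example data as in the Examples section of this page
purchases <- data.frame(
  ID     = c(rep(100001, 3), rep(100002, 3)),
  ATC    = c(rep("N06AX11", 3), rep("N05AH03", 3)),
  vnr    = c(rep(48580, 3), rep(145698, 3)),
  dates  = as.Date(c("1998-07-04", "1998-07-27", "1998-08-28",
                     "2000-01-12", "2000-02-05", "2000-02-24")),
  ratios = c(0.5, 2, 2, 1, 0.5, 2),
  ddds   = c(7.5, 30, 30, 28, 14, 56)
)

validated <- NULL

# Assumption: check_purchases() is available on the search path.
if (exists("check_purchases", mode = "function")) {
  validated <- tryCatch(
    check_purchases(
      dt = purchases,
      pre_person_id = "ID",
      pre_atc = "ATC",
      pre_package_id = "vnr",
      pre_date = "dates",
      pre_ratio = "ratios",
      pre_ddd = "ddds",
      return_data = TRUE
    ),
    error = function(e) {
      # Report the failure and fall back to NULL instead of stopping the script
      message("Validation failed: ", conditionMessage(e))
      NULL
    }
  )
}

# Continue only when validation succeeded
if (!is.null(validated)) str(validated)
```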
Details
The following checks are performed:
- Existence and naming of required columns
- No missing or duplicated records
- Each package has a unique ATC code
- Validity of person identifiers (numeric or non-numeric, no missing values)
- Validity of ATC codes (no missing or invalid values)
- Validity of package IDs and purchase ratio (numeric, no missing values)
- DDD values: missing allowed, but not zero or negative
- All purchase dates must be present, convertible, and within the specified range (if given)
- Sufficient DDD coverage per ATC (with user confirmation if below threshold)
If any errors are found, the function stops execution and prints all error messages.
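To make some of these checks concrete, the base R sketch below reproduces a few of them by hand on the example data from this page. It illustrates what the checks mean and is not the function's actual implementation; all helper variable names are hypothetical, and the DDD coverage threshold used by the function is not stated here, so only the coverage itself is computed.

```r
purchases <- data.frame(
  ID     = c(rep(100001, 3), rep(100002, 3)),
  ATC    = c(rep("N06AX11", 3), rep("N05AH03", 3)),
  vnr    = c(rep(48580, 3), rep(145698, 3)),
  dates  = as.Date(c("1998-07-04", "1998-07-27", "1998-08-28",
                     "2000-01-12", "2000-02-05", "2000-02-24")),
  ratios = c(0.5, 2, 2, 1, 0.5, 2),
  ddds   = c(7.5, 30, 30, 28, 14, 56)
)
date_range <- as.Date(c("1995-01-01", "2018-12-31"))

# Duplicated records: rows identical to an earlier row
duplicated_rows <- which(duplicated(purchases))

# Each package ID should map to exactly one ATC code
atc_per_package <- tapply(purchases$ATC, purchases$vnr,
                          function(x) length(unique(x)))
ambiguous_packages <- names(atc_per_package)[atc_per_package > 1]

# DDD may be missing, but must not be zero or negative
bad_ddd_rows <- which(!is.na(purchases$ddds) & purchases$ddds <= 0)

# Dates must be present and fall within the given range
bad_date_rows <- which(is.na(purchases$dates) |
                         purchases$dates < date_range[1] |
                         purchases$dates > date_range[2])

# Share of purchases with a non-missing DDD, per ATC code
ddd_coverage <- tapply(!is.na(purchases$ddds), purchases$ATC, mean)

# Empty results mean the corresponding check found no problems
list(
  duplicated_rows    = duplicated_rows,
  ambiguous_packages = ambiguous_packages,
  bad_ddd_rows       = bad_ddd_rows,
  bad_date_rows      = bad_date_rows,
  ddd_coverage       = ddd_coverage
)
```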
Examples
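# Two persons with three purchases each
ID <- c(rep(100001, 3), rep(100002, 3))
ATC <- c(rep("N06AX11", 3), rep("N05AH03", 3))
# Package identifiers (Vnr); each package maps to a single ATC code
vnr <- c(rep(48580, 3), rep(145698, 3))
dates <- as.Date(c("1998-07-04", "1998-07-27", "1998-08-28", "2000-01-12", "2000-02-05", "2000-02-24"))
# Packages purchased (0.5 = half a package) and the DDDs of each purchase
ratios <- c(0.5, 2, 2, 1, 0.5, 2)
ddds <- c(7.5, 30, 30, 28, 14, 56)
purchases <- data.frame(ID, ATC, vnr, dates, ratios, ddds)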
check_purchases(
dt = purchases,
pre_person_id = "ID",
pre_atc = "ATC",
pre_package_id = "vnr",
pre_date = "dates",
pre_ratio = "ratios",
pre_ddd = "ddds",
date_range = c("1995-01-01", "2018-12-31"),
print_all = TRUE,
return_data = TRUE
)
#> Checks passed for 'purchases'
#>        ID     ATC    vnr      dates ratios  ddds date_pre
#>    <fctr>  <char>  <int>     <Date>  <num> <num>    <int>
#> 1: 100001 N06AX11  48580 1998-07-04    0.5   7.5    10411
#> 2: 100001 N06AX11  48580 1998-07-27    2.0  30.0    10434
#> 3: 100001 N06AX11  48580 1998-08-28    2.0  30.0    10466
#> 4: 100002 N05AH03 145698 2000-01-12    1.0  28.0    10968
#> 5: 100002 N05AH03 145698 2000-02-05    0.5  14.0    10992
#> 6: 100002 N05AH03 145698 2000-02-24    2.0  56.0    11011