Introduction to PRE2DUP-R
introduction.Rmd
This vignette provides a brief introduction to the
PRE2DUPR
package, which is designed to construct treatment
periods from drug purhchases data with PRE2DUP algorithm. The package
includes functions for validating data and running the PRE2DUP.
Installation
To install the PRE2DUPR
package, you can use the
following command in R:
install.packages("devtools")
devtools::install_github("piavat/PRE2DUP-R")
To use the PRE2DUPR
package, you can start by loading it
into your R session:
library(PRE2DUPR)
Data
The PRE2DUPR
package comes with example datasets that
you can use to test the functionality of the PRE2DUP algorithm. The
datasets include:
-
purchases_example
: A dataset containing drug purchase records. -
hospitalizations_example
: A dataset containing hospital admission records. -
package_parameters_example
: A dataset containing package characteristics. -
ATC_parameters_example
: A dataset containing ATC code characteristics.
All data types have associated functions to validate the input before
running pre2dup
. These functions are called internally by
the program, so you don’t need to run them manually unless you want to
check your data beforehand.
It is recommended to run these checks in advance to detect and
correct errors more easily and efficiently. Note that the internal
checks in pre2dup
will display only the first five rows
with detected errors. When run separately, all rows with issues can be
listed by adjusting the function parameter
print_all = TRUE
.
Drug purchases data
Drug purchases are records with information about the purchase of drugs, including the person who made the purchase, the drug’s ATC code, the package ID, the date of purchase, the number of packages purchased, and the amount in DDDs (Defined Daily Doses).
Data validation
Function check_purchases
checks the data before running
the PRE2DUP algorithm. It ensures that the dataset meets the necessary
requirements for the algorithm to function correctly.
check_purchases(dt = purchases_example,
pre_person_id = "id",
pre_atc = "ATC",
pre_package_id = "vnr",
pre_date = "purchase_date",
pre_ratio = "n_packages",
pre_ddd = "amount",
print_all = TRUE)
# Checks passed for ‘purchases_example’
Hospitalizations data
Hospitalizations are records of hospital admissions, including the person ID, admission date, and discharge date. This data is used to assess the impact of hospitalizations on drug exposure periods.
Data validation
Function check_hospitalizations
checks the data before
running the PRE2DUP algorithm.
check_hospitalizations(dt = hospitalizations_example,
hosp_person_id = "id",
hosp_admission = "hospital_start",
hosp_discharge = "hospital_end",
print_all = TRUE)
# Checks passed for ‘hospitalizations_example’
Package parameters
Package parameters are used to define the characteristics of drug packages. The parameter file specifies the identifying number, ATC code, and the minimum, usual, and maximum duration of a package, as well as the usual and minimum dose in defined daily doses (DDDs).
Intruction show to create package parameters Package Parameters tutorial.
Data validation
Function check_package_parameters
checks the data before
running the PRE2DUP algorithm.
check_package_parameters(dt = package_parameters_example,
pack_atc = "ATC",
pack_id = "vnr",
pack_ddd_low = "lower_ddd",
pack_ddd_usual = "usual_ddd",
pack_dur_min = "minimum_dur",
pack_dur_usual = "usual_dur",
pack_dur_max = "maximum_dur",
print_all = FALSE)
# Checks passed for ‘package_parameters_example’
ATC parameters
ATC parameters are used to define the characteristics of ATC codes when package-specific information is not available. The ATC parameters file specifies the partial or full ATC code, the lower limit of daily dose, the usual daily dose, and the minimum and maximum allowed treatment durations. Package example data ATC_parameters can be used as such or as an example of how to create your own ATC code characteristics dataset.
Data validation
Function check_atc_parameters
checks the ATC parameters
data before running the PRE2DUP algorithm.
check_atc_parameters(dt = ATC_parameters,
atc_class = "partial_atc",
atc_ddd_low = "lower_ddd_atc",
atc_ddd_usual = "usual_ddd_atc",
atc_dur_min = "minimum_dur_atc",
atc_dur_max = "maximum_dur_atc",
print_all = TRUE)
# Checks passed for ‘ATC_parameters’.
Running the PRE2DUP
The PRE2DUP algorithm for creation of drug use periods is run using
the pre2dup
function. This function will process your drug
purchase data, hospitalizations, package parameters, and ATC parameters
to estimate drug exposure.
outdata <- pre2dup(
pre_data = purchases_example,
pre_person_id = "id",
pre_atc = "ATC",
pre_package_id = "vnr",
pre_date = "purchase_date",
pre_ratio = "n_packages",
pre_ddd = "amount",
package_parameters = package_parameters_example,
pack_atc = "ATC",
pack_id = "vnr",
pack_ddd_low = "lower_ddd",
pack_ddd_usual ="usual_ddd",
pack_dur_min = "minimum_dur",
pack_dur_usual = "usual_dur",
pack_dur_max = "maximum_dur",
atc_parameters = ATC_parameters,
atc_class = "partial_atc",
atc_ddd_low = "lower_ddd_atc",
atc_ddd_usual = "usual_ddd_atc",
atc_dur_min = "minimum_dur_atc",
atc_dur_max = "maximum_dur_atc",
hosp_data = hospitalizations_example,
hosp_person_id = "id",
hosp_admission = "hospital_start",
hosp_discharge = "hospital_end",
date_range = c("2025-01-01", "2025-12-31"),
global_gap_max = 300,
global_min = 5,
global_max = 300,
global_max_single = 150,
global_ddd_high = 10,
global_hosp_max = 30,
weight_past = 1,
weight_current = 4,
weight_next = 1,
weight_first_last = 5,
calculate_pack_dur_usual = T,
days_covered = 5,
post_process_perc = 1)
# Step 1/6: Checking parameters and datasets...
# Checks passed for ‘pre_data’
# Checks passed for ‘package_parameters’
# Checks passed for ‘atc_parameters’.
# Checks passed for ‘hosp_data’
# Preparing hospitalization data and merging overlapping hospitalizations.
# Step 2/6: Calculating purchase durations...
# Step 3/6: Stockpiling assessment...
# Step 4/6: Calculating common package durations in data...
# Refill lengths couldn't be re-estimated, probably due to too small data size.
# Step 5/6: Preparing drug use periods...
# Step 6/6: Post-processing drug use periods...
# Current post processing percentage: 1
# Drug use periods calculated. 7 periods created for 5 persons.
# Drug use periods are stored in the `periods` element of the output list.
dup <- outdata$periods
dup
# period id ATC dup_start dup_end dup_days dup_hospital_days dup_n_purchases dup_last_purchase dup_total_DDD dup_temporal_average_DDDs
# <int> <fctr> <char> <Date> <Date> <num> <num> <int> <Date> <num> <num>
# 1: 1 1 N05AH02 2025-01-01 2025-04-14 104 0 3 2025-03-08 99.99 0.961
# 2: 2 2 N05AH02 2025-01-15 2025-04-28 104 5 3 2025-03-22 99.99 0.961
# 3: 3 3 N05AH02 2025-02-01 2025-05-15 104 0 3 2025-04-08 99.99 0.961
# 4: 4 3 N05AH04 2025-01-05 2025-08-26 233 0 2 2025-04-15 200.00 0.858
# 5: 5 4 N05AH02 2025-01-10 2025-04-23 104 0 3 2025-03-17 99.99 0.961
# 6: 6 4 N05AH04 2025-01-20 2025-09-10 233 0 2 2025-04-30 200.00 0.858
# 7: 7 5 N05AH04 2025-01-01 2025-08-22 233 38 2 2025-04-11 200.00 0.858
# Package parameters with updated common durations is not returned, because the common durations were not calculated in this example.
outdata$pack_info
# NULL
Workflow when using estimated usual package durations from data
The pre2dup
function has an option to estimate the usual
package durations from the data. This is useful when you want to derive
package durations based on the actual purchase patterns in your dataset.
Set calculate_pack_dur_usual = T
, run the program and
update package parameters based on common duration in data.
# Make data so big, that it can calculate common durations
id <- sort(rep(1:5, each = 20))
vnr <- rep(c(rep(30627, 10), rep(41738, 10)), 5)
ATC <- rep(c(rep("N05AH02", 10), rep("N05AH04", 10)), 5)
d40 <- as.Date("2020-01-01") + 40*1:10
d120 <- as.Date("2022-01-01") + 120*1:10
purchase_date <- rep(c(d40, d120), 5)
n_packages <- rep(1, 100)
amount <- rep(c(rep(33, 10), rep(80, 10)), 5)
purchases_data <- data.frame(id, vnr, ATC, purchase_date, n_packages, amount)
# This example uses function default values, they can be changed to fit your needs.
outdata <- pre2dup(
pre_data = purchases_data,
pre_person_id = "id",
pre_atc = "ATC",
pre_package_id = "vnr",
pre_date = "purchase_date",
pre_ratio = "n_packages",
pre_ddd = "amount",
package_parameters = package_parameters_example,
pack_atc = "ATC",
pack_id = "vnr",
pack_ddd_low = "lower_ddd",
pack_ddd_usual ="usual_ddd",
pack_dur_min = "minimum_dur",
pack_dur_usual = "usual_dur",
pack_dur_max = "maximum_dur",
atc_parameters = ATC_parameters,
atc_class = "partial_atc",
atc_ddd_low = "lower_ddd_atc",
atc_ddd_usual = "usual_ddd_atc",
atc_dur_min = "minimum_dur_atc",
atc_dur_max = "maximum_dur_atc",
hosp_data = hospitalizations_example,
hosp_person_id = "id",
hosp_admission = "hospital_start",
hosp_discharge = "hospital_end",
date_range = c("2020-01-01", "2025-12-31"),
global_gap_max = 300,
global_min = 5,
global_max = 300,
global_max_single = 150,
global_ddd_high = 10,
global_hosp_max = 30,
weight_past = 1,
weight_current = 4,
weight_next = 1,
weight_first_last = 5,
calculate_pack_dur_usual = T,
days_covered = 5,
post_process_perc = 1)
# Step 1/6: Checking parameters and datasets...
# Checks passed for ‘pre_data’
# Checks passed for ‘package_parameters’
# Checks passed for ‘atc_parameters’.
# Checks passed for ‘hosp_data’
# Preparing hospitalization data and merging overlapping hospitalizations.
# Step 2/6: Calculating purchase durations...
# Step 3/6: Stockpiling assessment...
# Step 4/6: Calculating common package durations in data...
# Step 5/6: Preparing drug use periods...
# Step 6/6: Post-processing drug use periods...
# Current post processing percentage: 1
# Drug use periods calculated. 10 periods created for 5 persons.
# Program returns periods, but not used for now
dup <- outdata$periods
# Program returns updated package parameters
updated_params <- outdata$pack_info
# Check new common durations
updated_params[!is.na(updated_params$common_duration), ]
# vnr ATC product_name strength strength_num packagesize packsize_num drug_form_harmonized ddd_per_pack minimum_dur usual_dur maximum_dur lower_ddd usual_ddd common_duration
# 6 30627 N05AH02 LEPONEX 100MG 100 100 100 TABLET 33.33333 25 33.33 100 0.3333 1.00 40
# 8 41738 N05AH04 KETIPINOR 300MG 300 100FOL 100 TABLET 75.00000 50 100.00 200 0.3750 0.75 120
# Make a new common duration column selecting the common duration from the updated package parameters by your choice
updated_params$usual_duration_new <- ifelse(
!is.na(updated_params$common_duration),
updated_params$common_duration,
updated_params$usual_dur
)
# Run PRE2DUP with the updated package parameters
outdata <- pre2dup(
pre_data = purchases_data,
pre_person_id = "id",
pre_atc = "ATC",
pre_package_id = "vnr",
pre_date = "purchase_date",
pre_ratio = "n_packages",
pre_ddd = "amount",
package_parameters = updated_params,
pack_atc = "ATC",
pack_id = "vnr",
pack_ddd_low = "lower_ddd",
pack_ddd_usual ="usual_ddd",
pack_dur_min = "minimum_dur",
pack_dur_usual = "usual_duration_new", # This is the new column
pack_dur_max = "maximum_dur",
atc_parameters = ATC_parameters,
atc_class = "partial_atc",
atc_ddd_low = "lower_ddd_atc",
atc_ddd_usual = "usual_ddd_atc",
atc_dur_min = "minimum_dur_atc",
atc_dur_max = "maximum_dur_atc",
hosp_data = hospitalizations_example,
hosp_person_id = "id",
hosp_admission = "hospital_start",
hosp_discharge = "hospital_end",
date_range = c("2020-01-01", "2025-12-31"),
global_gap_max = 300,
global_min = 5,
global_max = 300,
global_max_single = 150,
global_ddd_high = 10,
global_hosp_max = 30,
weight_past = 1,
weight_current = 4,
weight_next = 1,
weight_first_last = 5,
calculate_pack_dur_usual = F,# Done already
days_covered = 5,
post_process_perc = 1)
# Step 1/6: Checking parameters and datasets...
# Checks passed for ‘pre_data’
# Checks passed for ‘package_parameters’
# Checks passed for ‘atc_parameters’.
# Checks passed for ‘hosp_data’
# Preparing hospitalization data and merging overlapping hospitalizations.
# Step 2/6: Calculating purchase durations...
# Step 3/6: Stockpiling assessment...
# Step 4/6: Common package duration calculation was not selected in the parameters; skipping this step.
# Step 5/6: Preparing drug use periods...
# Step 6/6: Post-processing drug use periods...
# Current post processing percentage: 1
# Drug use periods calculated. 10 periods created for 5 persons.
# The final output
final_periods <- outdata$periods
final_periods
# period id ATC dup_start dup_end dup_days dup_hospital_days dup_n_purchases dup_last_purchase dup_total_DDD dup_temporal_average_DDDs
# <int> <fctr> <char> <Date> <Date> <num> <num> <int> <Date> <num> <num>
# 1: 1 1 N05AH02 2020-02-10 2021-03-23 408 0 10 2021-02-04 330 0.809
# 2: 2 1 N05AH04 2022-05-01 2025-09-05 1224 0 10 2025-04-15 800 0.654
# 3: 3 2 N05AH02 2020-02-10 2021-03-23 408 0 10 2021-02-04 330 0.809
# 4: 4 2 N05AH04 2022-05-01 2025-09-05 1224 5 10 2025-04-15 800 0.654
# 5: 5 3 N05AH02 2020-02-10 2021-03-23 408 0 10 2021-02-04 330 0.809
# 6: 6 3 N05AH04 2022-05-01 2025-09-05 1224 0 10 2025-04-15 800 0.654
# 7: 7 4 N05AH02 2020-02-10 2021-03-23 408 0 10 2021-02-04 330 0.809
# 8: 8 4 N05AH04 2022-05-01 2025-09-05 1224 0 10 2025-04-15 800 0.654
# 9: 9 5 N05AH02 2020-02-10 2021-03-23 408 0 10 2021-02-04 330 0.809
# 10: 10 5 N05AH04 2022-05-01 2025-09-05 1224 38 10 2025-04-15 800 0.654
For any questions or support, feel free to reach out to the package maintainers
…