This vignette provides a brief introduction to the
PRE2DUPR package, which constructs treatment periods
from drug purchase data using the PRE2DUP algorithm. The package
includes functions for validating the input data and for running PRE2DUP.
Installation
To install the PRE2DUPR package, you can use the
following command in R:
install.packages("devtools")
devtools::install_github("piavat/PRE2DUP-R")
To use the PRE2DUPR package, start by loading it
into your R session:
library(PRE2DUPR)
Data
The PRE2DUPR package comes with example datasets that
you can use to test the functionality of the PRE2DUP algorithm. The
datasets include:
- purchases_example: A dataset containing drug purchase records.
- hospitalizations_example: A dataset containing hospital admission records.
- package_parameters_example: A dataset containing package characteristics.
- ATC_parameters: A dataset containing ATC code characteristics.
Each data type has an associated function that validates the input before
pre2dup is run. pre2dup calls these functions internally,
so you do not need to run them manually unless you want to check
your data beforehand.
Running the checks in advance is nevertheless recommended, because it
makes errors easier to detect and correct. The internal
checks in pre2dup display only the first five rows
with detected errors, whereas the stand-alone check functions list all
rows with issues when called with
print_all = TRUE.
Drug purchases data
Drug purchases are records with information about the purchase of drugs, including the person who made the purchase, the drug’s ATC code, the package ID, the date of purchase, the number of packages purchased, and the amount in DDDs (Defined Daily Doses).
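As a minimal sketch of what such a dataset can look like, the base R code below builds a small purchases table. The column names follow the example call in this vignette (id, ATC, vnr, purchase_date, n_packages, amount); the values and the object name purchases_toy are made up for illustration, and the checks shown are plain base R, not the package's own validation.

```r
# Hypothetical toy data; column names mirror those used in this vignette.
purchases_toy <- data.frame(
  id            = c(1, 1, 2),
  ATC           = c("N05AH02", "N05AH02", "N05AH04"),
  vnr           = c(30627, 30627, 41738),
  purchase_date = as.Date(c("2025-01-01", "2025-02-05", "2025-01-10")),
  n_packages    = c(1, 1, 2),
  amount        = c(33.33, 33.33, 150),   # purchased amount in DDDs
  stringsAsFactors = FALSE
)

# Inspect column types: the purchase date should be of class Date.
str(purchases_toy)

# Flag rows with any missing value (none in this toy example).
has_missing <- !complete.cases(purchases_toy)
purchases_toy[has_missing, ]
```

Storing the purchase date as a Date rather than as character makes later date arithmetic unambiguous.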
Data validation
The function check_purchases checks the drug purchase data before
the PRE2DUP algorithm is run. It verifies that the dataset meets the
requirements the algorithm needs to function correctly.
check_purchases(dt = purchases_example,
pre_person_id = "id",
pre_atc = "ATC",
pre_package_id = "vnr",
pre_date = "purchase_date",
pre_ratio = "n_packages",
pre_ddd = "amount",
drop_atcs = TRUE,
print_all = TRUE)
# Checks passed for ‘purchases_example’
Hospitalizations data
Hospitalizations are records of hospital admissions, including the person ID, admission date, and discharge date. This data is used to assess the impact of hospitalizations on drug exposure periods.
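The log output of pre2dup in this vignette mentions that overlapping hospitalizations are merged. The sketch below illustrates that idea in base R on made-up data. The column names follow the example call (id, hospital_start, hospital_end); the helper merge_stays and the object hosp_toy are hypothetical, and this is not necessarily how the package implements the step.

```r
# Hypothetical toy data: person 1 has two overlapping stays and one separate stay.
hosp_toy <- data.frame(
  id             = c(1, 1, 1, 2),
  hospital_start = as.Date(c("2025-01-10", "2025-01-12", "2025-03-01", "2025-02-01")),
  hospital_end   = as.Date(c("2025-01-15", "2025-01-20", "2025-03-05", "2025-02-03"))
)

# Basic sanity check: discharge must not precede admission.
stopifnot(all(hosp_toy$hospital_end >= hosp_toy$hospital_start))

# Illustrative helper (an assumption, not a package function):
# merge overlapping stays of ONE person into continuous blocks.
merge_stays <- function(df) {
  df <- df[order(df$hospital_start), ]
  start_num <- as.numeric(df$hospital_start)
  end_num   <- as.numeric(df$hospital_end)
  # Latest discharge among all earlier stays (-Inf for the first stay)
  prev_end <- c(-Inf, head(cummax(end_num), -1))
  # A new block begins when admission is later than every earlier discharge
  block <- cumsum(start_num > prev_end)
  data.frame(
    id             = df$id[1],
    hospital_start = as.Date(as.numeric(tapply(start_num, block, min)),
                             origin = "1970-01-01"),
    hospital_end   = as.Date(as.numeric(tapply(end_num, block, max)),
                             origin = "1970-01-01")
  )
}

# Apply the helper person by person and stack the results.
merged <- do.call(rbind, lapply(split(hosp_toy, hosp_toy$id), merge_stays))
rownames(merged) <- NULL
merged
```

In this sketch, stays that merely touch (admission on the day after a discharge) are kept separate; whether such stays should be joined is a design choice that depends on the data source.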
Data validation
The function check_hospitalizations checks the hospitalization data before
the PRE2DUP algorithm is run.
check_hospitalizations(dt = hospitalizations_example,
hosp_person_id = "id",
hosp_admission = "hospital_start",
hosp_discharge = "hospital_end",
print_all = TRUE)
# Checks passed for ‘hospitalizations_example’
Package parameters
Package parameters are used to define the characteristics of drug packages. The parameter file specifies the identifying number, ATC code, and the minimum, usual, and maximum duration of a package, as well as the usual and minimum dose in defined daily doses (DDDs).
Instructions for creating package parameters are given in the Package Parameters tutorial.
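To make the relationship between durations and doses concrete, the sketch below uses the two package rows printed in the workflow section of this vignette (vnr 30627 and 41738). In those rows, usual_ddd equals ddd_per_pack divided by usual_dur, and lower_ddd equals ddd_per_pack divided by maximum_dur. Whether every row of a parameter file should be derived this way is an assumption; the object name params is illustrative.

```r
# Values copied from the two example rows shown in this vignette.
params <- data.frame(
  vnr          = c(30627, 41738),
  ddd_per_pack = c(33.33333, 75),
  minimum_dur  = c(25, 50),
  usual_dur    = c(33.33, 100),
  maximum_dur  = c(100, 200)
)

# Durations should be ordered: minimum <= usual <= maximum.
stopifnot(all(params$minimum_dur <= params$usual_dur),
          all(params$usual_dur <= params$maximum_dur))

# Usual daily dose: DDDs in the package spread over the usual duration.
params$usual_ddd <- round(params$ddd_per_pack / params$usual_dur, 2)

# Lower daily dose: DDDs in the package spread over the maximum duration.
params$lower_ddd <- round(params$ddd_per_pack / params$maximum_dur, 4)

params
```

The computed usual_ddd (1.00 and 0.75) and lower_ddd (0.3333 and 0.375) agree with the values printed for these two packages.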
Data validation
The function check_package_parameters checks the package parameters before
the PRE2DUP algorithm is run.
check_package_parameters(dt = package_parameters_example,
pack_atc = "ATC",
pack_id = "vnr",
pack_ddd_low = "lower_ddd",
pack_ddd_usual = "usual_ddd",
pack_dur_min = "minimum_dur",
pack_dur_usual = "usual_dur",
pack_dur_max = "maximum_dur",
print_all = FALSE)
# Checks passed for ‘package_parameters_example’
ATC parameters
ATC parameters are used to define the characteristics of ATC codes when package-specific information is not available. The ATC parameters file specifies the partial or full ATC code, the lower limit of daily dose, the usual daily dose, and the minimum and maximum allowed treatment durations. The example dataset ATC_parameters that ships with the package can be used as is, or as a template for creating your own ATC code characteristics dataset.
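One way to picture how a partial ATC code can apply to a full code is prefix matching. The sketch below is an illustration under an assumption: that a purchase's ATC code is matched to the longest partial code it starts with. The helper match_atc, the toy table atc_toy, and its values are hypothetical; only the column names partial_atc and usual_ddd_atc come from this vignette, and the package's actual matching rule may differ.

```r
# Hypothetical ATC parameter rows: one class-level code, one full code.
atc_toy <- data.frame(
  partial_atc   = c("N05A", "N05AH02"),
  usual_ddd_atc = c(1, 0.8),
  stringsAsFactors = FALSE
)

# Illustrative helper (an assumption, not a package function):
# return the longest partial code that is a prefix of `code`, or NA.
match_atc <- function(code, classes) {
  is_prefix <- substring(code, 1, nchar(classes)) == classes
  hits <- classes[is_prefix]
  if (length(hits) == 0) return(NA_character_)
  hits[which.max(nchar(hits))]
}

purchased <- c("N05AH02", "N05AH04", "C10AA05")
matched <- vapply(purchased, match_atc, character(1),
                  classes = atc_toy$partial_atc)
matched
```

Here N05AH02 matches its full code, N05AH04 falls back to the class-level code N05A, and C10AA05 has no matching row.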
Data validation
The function check_atc_parameters checks the ATC parameters
before the PRE2DUP algorithm is run.
check_atc_parameters(dt = ATC_parameters,
atc_class = "partial_atc",
atc_ddd_low = "lower_ddd_atc",
atc_ddd_usual = "usual_ddd_atc",
atc_dur_min = "minimum_dur_atc",
atc_dur_max = "maximum_dur_atc",
print_all = TRUE)
# Checks passed for ‘ATC_parameters’.
Running the PRE2DUP
The PRE2DUP algorithm for creating drug use periods is run with
the pre2dup function. This function processes your drug
purchase data, hospitalizations, package parameters, and ATC parameters
to estimate drug exposure.
outdata <- pre2dup(
pre_data = purchases_example,
pre_person_id = "id",
pre_atc = "ATC",
pre_package_id = "vnr",
pre_date = "purchase_date",
pre_ratio = "n_packages",
pre_ddd = "amount",
package_parameters = package_parameters_example,
pack_atc = "ATC",
pack_id = "vnr",
pack_ddd_low = "lower_ddd",
pack_ddd_usual ="usual_ddd",
pack_dur_min = "minimum_dur",
pack_dur_usual = "usual_dur",
pack_dur_max = "maximum_dur",
atc_parameters = ATC_parameters,
atc_class = "partial_atc",
atc_ddd_low = "lower_ddd_atc",
atc_ddd_usual = "usual_ddd_atc",
atc_dur_min = "minimum_dur_atc",
atc_dur_max = "maximum_dur_atc",
hosp_data = hospitalizations_example,
hosp_person_id = "id",
hosp_admission = "hospital_start",
hosp_discharge = "hospital_end",
date_range = c("2025-01-01", "2025-12-31"),
global_gap_max = 300,
global_min = 5,
global_max = 300,
global_max_single = 150,
global_ddd_high = 10,
global_hosp_max = 30,
days_covered = 5,
weight_past = 1,
weight_current = 4,
weight_next = 1,
weight_first_last = 5,
drop_atcs = TRUE,
data_to_return = "periods",
post_process_perc = 1)
# Step 1/6: Checking parameters and datasets...
# Checks passed for ‘pre_data’
# Checks passed for ‘package_parameters’
# Checks passed for ‘atc_parameters’.
# Checks passed for ‘hosp_data’
# Preparing hospitalization data and merging overlapping hospitalizations.
# Step 2/6: Calculating purchase durations...
# Step 3/6: Stockpiling assessment...
# Step 4/6: Common package duration calculation was not selected in function call; skipping this step.
# Step 5/6: Preparing drug use periods...
# Step 6/6: Post-processing drug use periods...
# Current post processing percentage: 1
# Drug use periods calculated. 7 periods created for 5 persons.
# Returning drug use periods.
# Drug use periods
outdata
# period id ATC dup_start dup_end dup_days dup_hospital_days dup_n_purchases dup_last_purchase dup_total_DDD dup_temporal_average_DDDs
# <int> <fctr> <char> <Date> <Date> <num> <num> <int> <Date> <num> <num>
# 1: 1 1 N05AH02 2025-01-01 2025-04-14 104 0 3 2025-03-08 99.99 0.961
# 2: 2 2 N05AH02 2025-01-15 2025-04-28 104 5 3 2025-03-22 99.99 0.961
# 3: 3 3 N05AH02 2025-02-01 2025-05-15 104 0 3 2025-04-08 99.99 0.961
# 4: 4 3 N05AH04 2025-01-05 2025-08-26 233 0 2 2025-04-15 200.00 0.858
# 5: 5 4 N05AH02 2025-01-10 2025-04-23 104 0 3 2025-03-17 99.99 0.961
# 6: 6 4 N05AH04 2025-01-20 2025-09-10 233 0 2 2025-04-30 200.00 0.858
# 7: 7 5 N05AH04 2025-01-01 2025-08-22 233 38 2 2025-04-11 200.00 0.858
Workflow when using estimated usual package durations from data
The pre2dup function can also calculate typical
package durations from the drug purchase data. This is useful when you
want the usual package durations in the package parameters to reflect
actual purchase patterns. First, run pre2dup with the
argument data_to_return = "parameters". The function then
returns the package parameter file with an additional column,
common_duration, containing the typical package durations. Use this
information to update the usual package duration and the usual daily DDD
in the parameters, and then run pre2dup again with
data_to_return = "periods" to create the drug exposure periods.
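The example data in this section place purchases 40 days apart for one package and 120 days apart for the other. The base R sketch below recreates those dates, confirms the refill intervals, and then recomputes the usual daily DDD from the ddd_per_pack values printed in this vignette. It only looks at raw intervals between consecutive purchases; the package's own common-duration calculation may involve more than this, so treat the sketch as an illustration of the idea rather than of the implementation.

```r
# Purchase dates as constructed in this section's example data.
d40  <- as.Date("2020-01-01") + 40 * 1:10
d120 <- as.Date("2022-01-01") + 120 * 1:10

# Days between consecutive purchases of the same package.
gaps40  <- as.numeric(diff(d40))
gaps120 <- as.numeric(diff(d120))
table(gaps40)    # every interval is 40 days
table(gaps120)   # every interval is 120 days

# DDDs per package as printed in the parameter output of this vignette,
# and the common durations reported for the same packages.
ddd_per_pack    <- c(`30627` = 33.33333, `41738` = 75)
common_duration <- c(`30627` = 40, `41738` = 120)

# Updated usual daily DDD: package content spread over the common duration.
usual_ddd_new <- ddd_per_pack / common_duration
round(usual_ddd_new, 3)
```

With a longer common duration than the original usual duration, the updated usual daily DDD is lower than the original value for both packages.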
# Step 1: Calculate common package durations in drug purchases data
id <- sort(rep(1:5, each = 20))
vnr <- rep(c(rep(30627, 10), rep(41738, 10)), 5)
ATC <- rep(c(rep("N05AH02", 10), rep("N05AH04", 10)), 5)
d40 <- as.Date("2020-01-01") + 40*1:10
d120 <- as.Date("2022-01-01") + 120*1:10
purchase_date <- rep(c(d40, d120), 5)
n_packages <- rep(1, 100)
amount <- rep(c(rep(33, 10), rep(80, 10)), 5)
purchases_data <- data.frame(id, vnr, ATC, purchase_date, n_packages, amount)
# Run pre2dup with data_to_return = "parameters".
updated_params <- pre2dup(
pre_data = purchases_data,
pre_person_id = "id",
pre_atc = "ATC",
pre_package_id = "vnr",
pre_date = "purchase_date",
pre_ratio = "n_packages",
pre_ddd = "amount",
package_parameters = package_parameters_example,
pack_atc = "ATC",
pack_id = "vnr",
pack_ddd_low = "lower_ddd",
pack_ddd_usual ="usual_ddd",
pack_dur_min = "minimum_dur",
pack_dur_usual = "usual_dur",
pack_dur_max = "maximum_dur",
atc_parameters = ATC_parameters,
atc_class = "partial_atc",
atc_ddd_low = "lower_ddd_atc",
atc_ddd_usual = "usual_ddd_atc",
atc_dur_min = "minimum_dur_atc",
atc_dur_max = "maximum_dur_atc",
hosp_data = hospitalizations_example,
hosp_person_id = "id",
hosp_admission = "hospital_start",
hosp_discharge = "hospital_end",
date_range = c("2020-01-01", "2025-12-31"),
global_gap_max = 300,
global_min = 5,
global_max = 300,
global_max_single = 150,
global_ddd_high = 10,
global_hosp_max = 30,
days_covered = 5,
weight_past = 1,
weight_current = 4,
weight_next = 1,
weight_first_last = 5,
drop_atcs = TRUE,
data_to_return = "parameters")
# Step 1/6: Checking parameters and datasets...
# Checks passed for ‘pre_data’
# Checks passed for ‘package_parameters’
# Checks passed for ‘atc_parameters’.
# Checks passed for ‘hosp_data’
# Preparing hospitalization data and merging overlapping hospitalizations.
# Step 2/6: Calculating purchase durations...
# Step 3/6: Stockpiling assessment...
# Step 4/6: Calculating common package durations in data...
# Common package durations calculated, returning updated package parameters.
# Step 2: Check new common durations and use as new usual duration
updated_params[!is.na(updated_params$common_duration), ]
# vnr ATC product_name strength strength_num packagesize packsize_num drug_form_harmonized ddd_per_pack minimum_dur usual_dur maximum_dur lower_ddd usual_ddd common_duration
# 6 30627 N05AH02 LEPONEX 100MG 100 100 100 TABLET 33.33333 25 33.33 100 0.3333 1.00 40
# 8 41738 N05AH04 KETIPINOR 300MG 300 100FOL 100 TABLET 75.00000 50 100.00 200 0.3750 0.75 120
# Create a new usual duration column: use the common duration where one was calculated, otherwise keep the existing usual duration
updated_params$usual_duration_new <- ifelse(
!is.na(updated_params$common_duration),
updated_params$common_duration,
updated_params$usual_dur
)
# Also update the usual daily DDD to match the new usual duration
updated_params$usual_ddd_new <- updated_params$ddd_per_pack/updated_params$usual_duration_new
# Step 3: Run pre2dup with the updated package parameters and with data_to_return = "periods"
final_periods <- pre2dup(
pre_data = purchases_data,
pre_person_id = "id",
pre_atc = "ATC",
pre_package_id = "vnr",
pre_date = "purchase_date",
pre_ratio = "n_packages",
pre_ddd = "amount",
package_parameters = updated_params,
pack_atc = "ATC",
pack_id = "vnr",
pack_ddd_low = "lower_ddd",
pack_ddd_usual ="usual_ddd_new", # New column
pack_dur_min = "minimum_dur",
pack_dur_usual = "usual_duration_new", # New column
pack_dur_max = "maximum_dur",
atc_parameters = ATC_parameters,
atc_class = "partial_atc",
atc_ddd_low = "lower_ddd_atc",
atc_ddd_usual = "usual_ddd_atc",
atc_dur_min = "minimum_dur_atc",
atc_dur_max = "maximum_dur_atc",
hosp_data = hospitalizations_example,
hosp_person_id = "id",
hosp_admission = "hospital_start",
hosp_discharge = "hospital_end",
date_range = c("2020-01-01", "2025-12-31"),
global_gap_max = 300,
global_min = 5,
global_max = 300,
global_max_single = 150,
global_ddd_high = 10,
global_hosp_max = 30,
days_covered = 5,
weight_past = 1,
weight_current = 4,
weight_next = 1,
weight_first_last = 5,
drop_atcs = TRUE,
data_to_return = "periods",
post_process_perc = 1)
# Step 1/6: Checking parameters and datasets...
# Checks passed for ‘pre_data’
# Checks passed for ‘package_parameters’
# Checks passed for ‘atc_parameters’.
# Checks passed for ‘hosp_data’
# Preparing hospitalization data and merging overlapping hospitalizations.
# Step 2/6: Calculating purchase durations...
# Step 3/6: Stockpiling assessment...
# Step 4/6: Common package duration calculation was not selected in function call; skipping this step.
# Step 5/6: Preparing drug use periods...
# Step 6/6: Post-processing drug use periods...
# Current post processing percentage: 1
# Drug use periods calculated. 10 periods created for 5 persons.
# Returning drug use periods.
# The final output
final_periods
# period id ATC dup_start dup_end dup_days dup_hospital_days dup_n_purchases dup_last_purchase dup_total_DDD dup_temporal_average_DDDs
# 1: 1 1 N05AH02 2020-02-10 2021-03-23 408 0 10 2021-02-04 330 0.809
# 2: 2 1 N05AH04 2022-05-01 2025-09-05 1224 0 10 2025-04-15 800 0.654
# 3: 3 2 N05AH02 2020-02-10 2021-03-23 408 0 10 2021-02-04 330 0.809
# 4: 4 2 N05AH04 2022-05-01 2025-09-05 1224 5 10 2025-04-15 800 0.654
# 5: 5 3 N05AH02 2020-02-10 2021-03-23 408 0 10 2021-02-04 330 0.809
# 6: 6 3 N05AH04 2022-05-01 2025-09-05 1224 0 10 2025-04-15 800 0.654
# 7: 7 4 N05AH02 2020-02-10 2021-03-23 408 0 10 2021-02-04 330 0.809
# 8: 8 4 N05AH04 2022-05-01 2025-09-05 1224 0 10 2025-04-15 800 0.654
# 9: 9 5 N05AH02 2020-02-10 2021-03-23 408 0 10 2021-02-04 330 0.809
# 10: 10 5 N05AH04 2022-05-01 2025-09-05 1224 38 10 2025-04-15 800 0.654
For any questions or support, feel free to reach out to the package maintainers
…