Introduction to PRE2DUP-R

This vignette provides a brief introduction to the PRE2DUPR package, which constructs drug treatment periods from drug purchase data using the PRE2DUP algorithm. The package includes functions for validating input data and for running PRE2DUP.

Installation

To install the PRE2DUPR package from GitHub, run the following commands in R:

install.packages("devtools")
devtools::install_github("piavat/PRE2DUP-R")

To use the PRE2DUPR package, you can start by loading it into your R session:

library(PRE2DUPR)

Data

The PRE2DUPR package comes with example datasets that you can use to test the functionality of the PRE2DUP algorithm. The datasets include:

purchases_example: A dataset containing drug purchase records.
hospitalizations_example: A dataset containing hospital admission records.
package_parameters_example: A dataset containing package characteristics.
ATC_parameters: A dataset containing ATC code characteristics.

Each input data type has an associated function for validating it before running pre2dup. These functions are also run internally by pre2dup, so you do not need to run them manually unless you want to check your data beforehand.

Running these checks in advance is recommended, because errors are easier to detect and correct that way. Note that the internal checks in pre2dup display only the first five rows with detected errors. When a check function is run separately, all rows with issues can be listed by setting print_all = TRUE.

Drug purchases data

Drug purchase records contain information about each purchase: the person who made the purchase, the drug’s ATC code, the package ID, the date of purchase, the number of packages purchased, and the amount in DDDs (Defined Daily Doses).
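As a sketch of the expected input shape, the base-R snippet below builds a tiny purchases table using the same column names as in the check_purchases() call in this vignette, then runs a few generic sanity checks. The values are hypothetical, and the checks are only illustrative; they are not the checks that check_purchases() actually performs.

```r
# Hypothetical purchase records in the shape used by purchases_example.
# Column names follow the check_purchases() call in this vignette;
# the values are illustrative, not package data.
purchases_sketch <- data.frame(
  id = c(1, 1, 2),
  ATC = c("N05AH02", "N05AH02", "N05AH04"),
  vnr = c(30627, 30627, 41738),
  purchase_date = as.Date(c("2025-01-01", "2025-02-10", "2025-01-05")),
  n_packages = c(1, 1, 2),
  amount = c(33.33, 33.33, 150)
)

# Generic base-R sanity checks (assumed, not the package's own checks):
stopifnot(
  !anyNA(purchases_sketch),                          # no missing values
  inherits(purchases_sketch$purchase_date, "Date"),  # dates stored as Date
  all(purchases_sketch$n_packages > 0),              # positive package counts
  all(purchases_sketch$amount > 0)                   # positive DDD amounts
)
str(purchases_sketch)
```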

Data validation

The function check_purchases checks the drug purchases data before the PRE2DUP algorithm is run. It ensures that the dataset meets the requirements the algorithm needs to function correctly.

check_purchases(dt = purchases_example, 
                pre_person_id = "id",
                pre_atc = "ATC",
                pre_package_id = "vnr",
                pre_date = "purchase_date",
                pre_ratio = "n_packages",
                pre_ddd = "amount",
                drop_atcs = TRUE,
                print_all = TRUE)
                
# Checks passed for ‘purchases_example’

Hospitalizations data

Hospitalizations are records of hospital admissions, including the person ID, admission date, and discharge date. This data is used to assess the impact of hospitalizations on drug exposure periods.
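The pre2dup log reports that overlapping hospitalizations are merged before use. As a rough illustration of that preprocessing idea, the base-R sketch below merges overlapping stays per person with a standard sort-and-sweep interval merge. The records and the helper merge_stays() are hypothetical, and the package's own implementation may differ, for example in how stays on adjacent days are handled.

```r
# Hypothetical hospitalization records; person 1 has two overlapping stays.
hosp <- data.frame(
  id = c(1, 1, 2),
  hospital_start = as.Date(c("2025-02-01", "2025-02-05", "2025-03-10")),
  hospital_end   = as.Date(c("2025-02-07", "2025-02-12", "2025-03-14"))
)

# Hypothetical helper: merge stays of one person that overlap or share a day.
merge_stays <- function(d) {
  d <- d[order(d$hospital_start), ]
  starts <- d$hospital_start[1]
  ends <- d$hospital_end[1]
  for (i in seq_len(nrow(d))[-1]) {
    k <- length(ends)
    if (d$hospital_start[i] <= ends[k]) {
      # Overlaps the current merged stay: extend its end if needed
      ends[k] <- max(ends[k], d$hospital_end[i])
    } else {
      # Disjoint: start a new merged stay
      starts <- c(starts, d$hospital_start[i])
      ends <- c(ends, d$hospital_end[i])
    }
  }
  data.frame(id = d$id[1], hospital_start = starts, hospital_end = ends)
}

# Apply per person and stack the results
merged <- do.call(rbind, lapply(split(hosp, hosp$id), merge_stays))
merged
```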

Data validation

The function check_hospitalizations checks the hospitalizations data before the PRE2DUP algorithm is run.

check_hospitalizations(dt = hospitalizations_example,
                       hosp_person_id = "id",
                       hosp_admission = "hospital_start",
                       hosp_discharge = "hospital_end",
                       print_all = TRUE)
                
# Checks passed for ‘hospitalizations_example’

Package parameters

Package parameters are used to define the characteristics of drug packages. The parameter file specifies the identifying number, ATC code, and the minimum, usual, and maximum duration of a package, as well as the usual and minimum dose in defined daily doses (DDDs).

Instructions for creating package parameters are given in the Package Parameters tutorial.
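In the two package rows printed in the usual-durations workflow later in this vignette (vnr 30627 and 41738), usual_ddd equals ddd_per_pack divided by usual_dur, and lower_ddd equals ddd_per_pack divided by maximum_dur (for example, 75 / 200 = 0.375). This is an observation from those rows, not a confirmed package rule. The base-R sketch below only reproduces the two rows under that assumption, which may help when building your own parameter file.

```r
# Two package rows as printed in this vignette (ddd_per_pack 33.33333 and 75).
pack_sketch <- data.frame(
  vnr = c(30627, 41738),
  ATC = c("N05AH02", "N05AH04"),
  ddd_per_pack = c(100 / 3, 75),
  usual_dur = c(33.33, 100),
  maximum_dur = c(100, 200)
)

# Assumed relationships (observed in the example rows, not documented):
# usual daily dose = DDD per pack / usual duration
pack_sketch$usual_ddd <- pack_sketch$ddd_per_pack / pack_sketch$usual_dur
# lower daily dose = DDD per pack / maximum duration
pack_sketch$lower_ddd <- pack_sketch$ddd_per_pack / pack_sketch$maximum_dur

print(round(pack_sketch[, c("vnr", "usual_ddd", "lower_ddd")], 4))
```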

Data validation

The function check_package_parameters checks the package parameters before the PRE2DUP algorithm is run.

check_package_parameters(dt = package_parameters_example, 
                         pack_atc = "ATC",
                         pack_id = "vnr",
                         pack_ddd_low = "lower_ddd", 
                         pack_ddd_usual = "usual_ddd",
                         pack_dur_min = "minimum_dur",
                         pack_dur_usual = "usual_dur", 
                         pack_dur_max = "maximum_dur",
                         print_all = FALSE)
                
# Checks passed for ‘package_parameters_example’

ATC parameters

ATC parameters define the characteristics of ATC codes and are used when package-specific information is not available. The ATC parameters file specifies the partial or full ATC code, the lower limit of the daily dose, the usual daily dose, and the minimum and maximum allowed treatment durations. The example dataset ATC_parameters included in the package can be used as is or as a template for creating your own ATC code characteristics dataset.
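This section does not spell out how a partial ATC code is matched to a full one. One plausible reading, assumed here for illustration only, is prefix matching in which the most specific (longest) matching code wins. The helper match_partial_atc() and the codes below are hypothetical; consult the package documentation for the actual matching rule.

```r
# Hypothetical partial/full ATC codes, as could appear in a partial_atc column
partial_codes <- c("N05A", "N05AH", "N05AH04")

# Hypothetical helper: return the longest code in `partial_codes` that is a
# prefix of `atc`, or NA if none matches (assumed rule, not the package's).
match_partial_atc <- function(atc, partial_codes) {
  hits <- partial_codes[startsWith(atc, partial_codes)]
  if (length(hits) == 0) return(NA_character_)
  hits[which.max(nchar(hits))]
}

match_partial_atc("N05AH02", partial_codes)  # "N05AH"
match_partial_atc("N05AH04", partial_codes)  # "N05AH04"
match_partial_atc("C09AA02", partial_codes)  # NA
```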

Data validation

The function check_atc_parameters checks the ATC parameters data before the PRE2DUP algorithm is run.

check_atc_parameters(dt = ATC_parameters,
                     atc_class = "partial_atc",
                     atc_ddd_low = "lower_ddd_atc",
                     atc_ddd_usual = "usual_ddd_atc", 
                     atc_dur_min = "minimum_dur_atc",
                     atc_dur_max = "maximum_dur_atc",
                     print_all = TRUE)
                
# Checks passed for ‘ATC_parameters’.

Running the PRE2DUP

The PRE2DUP algorithm, which creates drug use periods, is run with the pre2dup function. The function processes the drug purchase data, hospitalizations, package parameters, and ATC parameters to estimate periods of drug exposure.

outdata <- pre2dup(
  pre_data = purchases_example,
  pre_person_id = "id",
  pre_atc = "ATC",
  pre_package_id = "vnr",
  pre_date = "purchase_date",
  pre_ratio = "n_packages",
  pre_ddd = "amount",
  package_parameters = package_parameters_example,
  pack_atc = "ATC",
  pack_id = "vnr",
  pack_ddd_low = "lower_ddd",
  pack_ddd_usual = "usual_ddd",
  pack_dur_min = "minimum_dur",
  pack_dur_usual = "usual_dur",
  pack_dur_max = "maximum_dur",
  atc_parameters = ATC_parameters,
  atc_class = "partial_atc",
  atc_ddd_low = "lower_ddd_atc",
  atc_ddd_usual = "usual_ddd_atc",
  atc_dur_min = "minimum_dur_atc",
  atc_dur_max = "maximum_dur_atc",
  hosp_data = hospitalizations_example,
  hosp_person_id = "id",
  hosp_admission = "hospital_start",
  hosp_discharge = "hospital_end",
  date_range = c("2025-01-01", "2025-12-31"),
  global_gap_max = 300,
  global_min = 5,
  global_max = 300,
  global_max_single = 150,
  global_ddd_high = 10,
  global_hosp_max = 30,
  days_covered = 5,
  weight_past = 1,
  weight_current = 4,
  weight_next = 1,
  weight_first_last = 5,
  drop_atcs = TRUE,
  data_to_return = "periods",
  post_process_perc = 1)

# Step 1/6: Checking parameters and datasets...
# Checks passed for ‘pre_data’
# Checks passed for ‘package_parameters’
# Checks passed for ‘atc_parameters’.
# Checks passed for ‘hosp_data’
# Preparing hospitalization data and merging overlapping hospitalizations.
# Step 2/6: Calculating purchase durations...
# Step 3/6: Stockpiling assessment...
# Step 4/6: Common package duration calculation was not selected in function call; skipping this step.
# Step 5/6: Preparing drug use periods...
# Step 6/6: Post-processing drug use periods...
# Current post processing percentage: 1
# Drug use periods calculated. 7 periods created for 5 persons.
# Returning drug use periods.

# Drug use periods
outdata
#   period     id     ATC  dup_start    dup_end dup_days dup_hospital_days dup_n_purchases dup_last_purchase dup_total_DDD dup_temporal_average_DDDs
# <int> <fctr>  <char>     <Date>     <Date>    <num>             <num>           <int>            <Date>         <num>                     <num>
# 1:      1      1 N05AH02 2025-01-01 2025-04-14      104                 0               3        2025-03-08         99.99                     0.961
# 2:      2      2 N05AH02 2025-01-15 2025-04-28      104                 5               3        2025-03-22         99.99                     0.961
# 3:      3      3 N05AH02 2025-02-01 2025-05-15      104                 0               3        2025-04-08         99.99                     0.961
# 4:      4      3 N05AH04 2025-01-05 2025-08-26      233                 0               2        2025-04-15        200.00                     0.858
# 5:      5      4 N05AH02 2025-01-10 2025-04-23      104                 0               3        2025-03-17         99.99                     0.961
# 6:      6      4 N05AH04 2025-01-20 2025-09-10      233                 0               2        2025-04-30        200.00                     0.858
# 7:      7      5 N05AH04 2025-01-01 2025-08-22      233                38               2        2025-04-11        200.00                     0.858
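In the printed rows, dup_temporal_average_DDDs appears to equal dup_total_DDD divided by dup_days, rounded to three decimals (for example, 99.99 / 104 ≈ 0.961). This is an observation from the example output rather than a documented definition. The base-R snippet below re-checks it using values copied from rows 1 and 4 of the output, which can serve as a quick sanity check on your own results.

```r
# Values copied from rows 1 and 4 of the printed outdata above
periods <- data.frame(
  dup_days = c(104, 233),
  dup_total_DDD = c(99.99, 200.00),
  dup_temporal_average_DDDs = c(0.961, 0.858)
)

# Assumed relationship: temporal average = total DDD / period length in days
recomputed <- round(periods$dup_total_DDD / periods$dup_days, 3)
all.equal(recomputed, periods$dup_temporal_average_DDDs)
```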

Workflow when using estimated usual package durations from data

The pre2dup function can also estimate common (typical) package durations from the drug purchases data. This is useful when you want to base the usual package durations in the package parameters on actual purchase patterns. First, run pre2dup with the argument data_to_return = "parameters"; it then returns the package parameters with an additional column, common_duration, containing the estimated durations. Use these values to update the usual package duration and the usual daily DDD in the parameters, and then run pre2dup again with data_to_return = "periods" to create the drug exposure periods.

# Step 1: Create example purchase data and calculate common package durations
id <- rep(1:5, each = 20)
vnr <- rep(c(rep(30627, 10), rep(41738, 10)), 5)
ATC <- rep(c(rep("N05AH02", 10), rep("N05AH04", 10)), 5)
d40 <- as.Date("2020-01-01") + 40 * 1:10
d120 <- as.Date("2022-01-01") + 120 * 1:10
purchase_date <- rep(c(d40, d120), 5)
n_packages <- rep(1, 100)
amount <- rep(c(rep(33, 10), rep(80, 10)), 5)

purchases_data <- data.frame(id, vnr, ATC, purchase_date, n_packages, amount)

# Run pre2dup with data_to_return = "parameters". 
updated_params <- pre2dup(
  pre_data = purchases_data,
  pre_person_id = "id",
  pre_atc = "ATC",
  pre_package_id = "vnr",
  pre_date = "purchase_date",
  pre_ratio = "n_packages",
  pre_ddd = "amount",
  package_parameters = package_parameters_example,
  pack_atc = "ATC",
  pack_id = "vnr",
  pack_ddd_low = "lower_ddd",
  pack_ddd_usual = "usual_ddd",
  pack_dur_min = "minimum_dur",
  pack_dur_usual = "usual_dur",
  pack_dur_max = "maximum_dur",
  atc_parameters = ATC_parameters,
  atc_class = "partial_atc",
  atc_ddd_low = "lower_ddd_atc",
  atc_ddd_usual = "usual_ddd_atc",
  atc_dur_min = "minimum_dur_atc",
  atc_dur_max = "maximum_dur_atc",
  hosp_data = hospitalizations_example,
  hosp_person_id = "id",
  hosp_admission = "hospital_start",
  hosp_discharge = "hospital_end",
  date_range = c("2020-01-01", "2025-12-31"),
  global_gap_max = 300,
  global_min = 5,
  global_max = 300,
  global_max_single = 150,
  global_ddd_high = 10,
  global_hosp_max = 30,
  days_covered = 5,
  weight_past = 1,
  weight_current = 4,
  weight_next = 1,
  weight_first_last = 5,
  drop_atcs = TRUE,
  data_to_return = "parameters")


# Step 1/6: Checking parameters and datasets...
# Checks passed for ‘pre_data’
# Checks passed for ‘package_parameters’
# Checks passed for ‘atc_parameters’.
# Checks passed for ‘hosp_data’
# Preparing hospitalization data and merging overlapping hospitalizations.
# Step 2/6: Calculating purchase durations...
# Step 3/6: Stockpiling assessment...
# Step 4/6: Calculating common package durations in data...
# Common package durations calculated, returning updated package parameters.

# Step 2: Inspect the estimated common durations and use them as the new usual durations
updated_params[!is.na(updated_params$common_duration), ]
#     vnr     ATC product_name strength strength_num packagesize packsize_num drug_form_harmonized ddd_per_pack minimum_dur usual_dur maximum_dur lower_ddd usual_ddd common_duration
# 6 30627 N05AH02      LEPONEX    100MG          100         100          100               TABLET     33.33333          25     33.33         100    0.3333      1.00              40
# 8 41738 N05AH04    KETIPINOR    300MG          300      100FOL          100               TABLET     75.00000          50    100.00         200    0.3750      0.75             120
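The estimated common durations (40 and 120 days) match the purchase intervals built into the synthetic data, where d40 and d120 space purchases 40 and 120 days apart. The base-R sketch below recreates only those date vectors to confirm the intervals, then computes the resulting usual daily DDD (DDD per pack divided by the new usual duration) for the two packages, mirroring the usual_ddd_new step of this workflow.

```r
# Recreate the purchase dates of the synthetic data (same as d40 and d120)
d40  <- as.Date("2020-01-01") + 40 * 1:10
d120 <- as.Date("2022-01-01") + 120 * 1:10
unique(as.numeric(diff(d40)))   # 40
unique(as.numeric(diff(d120)))  # 120

# New usual daily DDD = DDD per pack / new usual (common) duration
ddd_per_pack <- c(`30627` = 100 / 3, `41738` = 75)
common_duration <- c(`30627` = 40, `41738` = 120)
new_usual_ddd <- ddd_per_pack / common_duration
round(new_usual_ddd, 4)  # 30627: 0.8333, 41738: 0.625
```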

# Create a new usual-duration column: use the common duration where one was estimated, otherwise keep the original usual duration
updated_params$usual_duration_new <- ifelse(
  !is.na(updated_params$common_duration),
  updated_params$common_duration,
  updated_params$usual_dur
)

# Recalculate the usual daily DDD to match the new usual duration
updated_params$usual_ddd_new <- updated_params$ddd_per_pack/updated_params$usual_duration_new

# Step 3: Run pre2dup with the updated package parameters and with data_to_return = "periods"
final_periods <- pre2dup(
  pre_data = purchases_data,
  pre_person_id = "id",
  pre_atc = "ATC",
  pre_package_id = "vnr",
  pre_date = "purchase_date",
  pre_ratio = "n_packages",
  pre_ddd = "amount",
  package_parameters = updated_params,
  pack_atc = "ATC",
  pack_id = "vnr",
  pack_ddd_low = "lower_ddd",
  pack_ddd_usual = "usual_ddd_new", # New column
  pack_dur_min = "minimum_dur",
  pack_dur_usual = "usual_duration_new", # New column
  pack_dur_max = "maximum_dur",
  atc_parameters = ATC_parameters,
  atc_class = "partial_atc",
  atc_ddd_low = "lower_ddd_atc",
  atc_ddd_usual = "usual_ddd_atc",
  atc_dur_min = "minimum_dur_atc",
  atc_dur_max = "maximum_dur_atc",
  hosp_data = hospitalizations_example,
  hosp_person_id = "id",
  hosp_admission = "hospital_start",
  hosp_discharge = "hospital_end",
  date_range = c("2020-01-01", "2025-12-31"),
  global_gap_max = 300,
  global_min = 5,
  global_max = 300,
  global_max_single = 150,
  global_ddd_high = 10,
  global_hosp_max = 30,
  days_covered = 5,
  weight_past = 1,
  weight_current = 4,
  weight_next = 1,
  weight_first_last = 5,
  drop_atcs = TRUE,
  data_to_return = "periods",
  post_process_perc = 1)

# Step 1/6: Checking parameters and datasets...
# Checks passed for ‘pre_data’
# Checks passed for ‘package_parameters’
# Checks passed for ‘atc_parameters’.
# Checks passed for ‘hosp_data’
# Preparing hospitalization data and merging overlapping hospitalizations.
# Step 2/6: Calculating purchase durations...
# Step 3/6: Stockpiling assessment...
# Step 4/6: Common package duration calculation was not selected in function call; skipping this step.
# Step 5/6: Preparing drug use periods...
# Step 6/6: Post-processing drug use periods...
# Current post processing percentage: 1
# Drug use periods calculated. 10 periods created for 5 persons.
# Returning drug use periods.

# The final output
final_periods
#    period     id     ATC  dup_start    dup_end dup_days dup_hospital_days dup_n_purchases dup_last_purchase dup_total_DDD dup_temporal_average_DDDs
# 1:      1      1 N05AH02 2020-02-10 2021-03-23      408                 0              10        2021-02-04           330                     0.809
# 2:      2      1 N05AH04 2022-05-01 2025-09-05     1224                 0              10        2025-04-15           800                     0.654
# 3:      3      2 N05AH02 2020-02-10 2021-03-23      408                 0              10        2021-02-04           330                     0.809
# 4:      4      2 N05AH04 2022-05-01 2025-09-05     1224                 5              10        2025-04-15           800                     0.654
# 5:      5      3 N05AH02 2020-02-10 2021-03-23      408                 0              10        2021-02-04           330                     0.809
# 6:      6      3 N05AH04 2022-05-01 2025-09-05     1224                 0              10        2025-04-15           800                     0.654
# 7:      7      4 N05AH02 2020-02-10 2021-03-23      408                 0              10        2021-02-04           330                     0.809
# 8:      8      4 N05AH04 2022-05-01 2025-09-05     1224                 0              10        2025-04-15           800                     0.654
# 9:      9      5 N05AH02 2020-02-10 2021-03-23      408                 0              10        2021-02-04           330                     0.809
# 10:     10     5 N05AH04 2022-05-01 2025-09-05     1224                38              10        2025-04-15           800                     0.654

For any questions or support, feel free to contact the package maintainers.

…