Estimate Drug Use Periods from Drug Purchase Data
pre2dup.Rd
Estimates drug use periods from individual drug purchase data. Hospitalization data can optionally be incorporated. The estimation uses parameters defined at the package level and at the Anatomical Therapeutic Chemical (ATC) classification code level. The function supports individuals with varied purchase patterns, including stockpiling behavior.
Usage
pre2dup(
pre_data,
pre_person_id,
pre_atc,
pre_package_id,
pre_date,
pre_ratio,
pre_ddd,
package_parameters,
pack_atc,
pack_id,
pack_ddd_low,
pack_ddd_usual,
pack_dur_min,
pack_dur_usual,
pack_dur_max,
atc_parameters,
atc_class,
atc_ddd_low,
atc_ddd_usual,
atc_dur_min,
atc_dur_max,
hosp_data = NULL,
hosp_person_id = NULL,
hosp_admission = NULL,
hosp_discharge = NULL,
date_range = NULL,
global_gap_max = 300,
global_min = 5,
global_max = 300,
global_max_single = 150,
global_ddd_high = 10,
global_hosp_max = 30,
days_covered = 5,
weight_past = 1,
weight_current = 4,
weight_next = 1,
weight_first_last = 5,
calculate_pack_dur_usual = FALSE,
post_process_perc = 1
)
Arguments
- pre_data
data.frame or data.table containing drug purchases.
- pre_person_id
character, name of the column containing person id.
- pre_atc
character, name of the column containing ATC code.
- pre_package_id
character, name of the column containing package id.
- pre_date
character, name of the column containing purchase date.
- pre_ratio
character, name of the column containing the purchased amount expressed as a number of packages (a ratio relative to one full package).
- pre_ddd
character, name of the column containing defined daily doses (DDD) of the purchase.
- package_parameters
data.frame or data.table containing package parameters.
- pack_atc
character, name of the column containing ATC code.
- pack_id
character, name of the column containing package id.
- pack_ddd_low
character, name of the column containing lower limit of daily DDD.
- pack_ddd_usual
character, name of the column containing usual daily DDD.
- pack_dur_min
character, name of the column containing minimum duration of the package.
- pack_dur_usual
character, name of the column containing usual duration of the package.
- pack_dur_max
character, name of the column containing maximum duration of the package.
- atc_parameters
data.frame or data.table containing ATC parameters.
- atc_class
character, name of the column containing ATC class.
- atc_ddd_low
character, name of the column containing lower limit of daily DDD for the ATC class.
- atc_ddd_usual
character, name of the column containing usual daily DDD for the ATC class.
- atc_dur_min
character, name of the column containing minimum duration for the ATC class.
- atc_dur_max
character, name of the column containing maximum duration for the ATC class.
- hosp_data
data.frame or data.table containing hospitalizations.
- hosp_person_id
character, name of the column containing person id.
- hosp_admission
character, name of the column containing admission date.
- hosp_discharge
character, name of the column containing discharge date.
- date_range
character vector of two dates giving the start and end of the purchase data.
- global_gap_max
numeric, maximum gap between purchases, default 300.
- global_min
numeric, minimum duration of a purchase, default 5.
- global_max
numeric, maximum duration of a purchase, default 300.
- global_max_single
numeric, maximum duration of a single purchase, default 150.
- global_ddd_high
numeric, maximum daily DDD allowed for a purchase in any ATC class, default 10.
- global_hosp_max
numeric, maximum number of hospital days to be considered when estimating the exposure duration, default 30.
- days_covered
numeric, maximum number of days to be added to the exposure duration to cover the gap between purchases, default 5.
- weight_past
numeric, weight for the past purchase in sliding average calculation, default 1.
- weight_current
numeric, weight for the current purchase in sliding average calculation, default 4.
- weight_next
numeric, weight for the next purchase in sliding average calculation, default 1.
- weight_first_last
numeric, weight for the first and last purchase in sliding average calculation, default 5.
- calculate_pack_dur_usual
logical (TRUE or FALSE), whether to recalculate the usual duration of each package from the purchase frequency in the data, default FALSE.
- post_process_perc
numeric, percentage of the data to be used in post-processing, default 1.
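The four weight_* arguments above describe a sliding weighted average over neighbouring purchases. The sketch below shows one plausible reading of them; how pre2dup applies the weights internally is not stated on this page, so the helper `sliding_weighted_mean` and the choice of averaged quantity (a per-purchase daily DDD value) are assumptions made for illustration only.

```r
# Hypothetical helper, NOT part of the package: one possible interpretation
# of weight_past, weight_current, weight_next and weight_first_last.
sliding_weighted_mean <- function(x, weight_past = 1, weight_current = 4,
                                  weight_next = 1, weight_first_last = 5) {
  n <- length(x)
  if (n < 2) return(x)  # nothing to average against
  out <- numeric(n)
  for (i in seq_len(n)) {
    if (i == 1) {
      # First purchase: no past neighbour, so it gets weight_first_last.
      out[i] <- (weight_first_last * x[1] + weight_next * x[2]) /
        (weight_first_last + weight_next)
    } else if (i == n) {
      # Last purchase: no next neighbour, so it gets weight_first_last.
      out[i] <- (weight_past * x[n - 1] + weight_first_last * x[n]) /
        (weight_past + weight_first_last)
    } else {
      # Interior purchase: past, current and next all contribute.
      out[i] <- (weight_past * x[i - 1] + weight_current * x[i] +
                   weight_next * x[i + 1]) /
        (weight_past + weight_current + weight_next)
    }
  }
  out
}

# Assumed per-purchase daily DDD values for one person and one ATC class.
daily_ddd <- c(1.0, 0.5, 1.5, 1.0)
smoothed <- sliding_weighted_mean(daily_ddd)
print(round(smoothed, 3))
```

With the default weights the current purchase dominates, so a single unusual purchase is pulled toward its neighbours without being overridden by them.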
Value
A list of two elements. The main element is periods: a data.table with one row per drug use period, including person, ATC, period start/end dates, duration, number of purchases, and total DDD. If calculate_pack_dur_usual = TRUE, an additional element pack_info contains updated package parameter information.
Details
Before starting to estimate the drug use periods, the function validates the input data and arguments by checking for missing values and unacceptable duplicates. It will stop execution if such issues are detected, with the following exceptions:
- Up to 10% of DDD values per ATC class may be missing in the drug purchase data.
- Up to 10% of package parameter records per ATC class may be missing.
If either threshold is exceeded, the function prompts the user to decide whether to continue. If the user agrees, ATC classes with insufficient data are excluded, and the function proceeds with the remaining data.
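The missing-DDD tolerance can be checked by hand before calling the function. The following is a minimal sketch in base R, assuming a purchase table with the column names used in the Examples section (ATC and amount); the data values are made up and the code is not the package's own validation routine.

```r
# Hypothetical purchase data; column names follow the Examples section.
purchases <- data.frame(
  ATC    = c("N05AH02", "N05AH02", "N05AH02", "N05AH04", "N05AH04"),
  amount = c(33.33, NA, 33.33, 100, 100)
)

# Share of missing DDD values within each ATC class.
missing_share <- tapply(is.na(purchases$amount), purchases$ATC, mean)

# ATC classes whose missing share exceeds the 10% tolerance.
over_limit <- names(missing_share)[missing_share > 0.10]

print(round(missing_share, 3))
print(over_limit)
```

Here one of the three N05AH02 purchases lacks a DDD value, so that class exceeds the tolerance while N05AH04 does not.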
Five methods are available for estimating the duration of each purchase, listed in order of preference:
- Main method: Based on purchased daily doses (DDDs), temporal average of daily DDDs, and individual purchase patterns.
- Package DDD method: Based on purchased DDDs and the usual daily DDD for the specific package.
- Package duration method: Based on the usual duration of the package, considering the proportion of the package purchased.
- ATC-level DDD method: Based on purchased DDDs and usual daily DDDs at the ATC level.
- Minimum ATC duration method: Based on the minimum duration defined for the ATC group.
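The four fallback methods listed after the main method can be read as a simple cascade. The sketch below illustrates that reading for a single purchase; the helper `fallback_duration`, its argument names, and the rule that a method is skipped when its inputs are missing or non-positive are assumptions for illustration, and the main method is deliberately not reproduced.

```r
# Hypothetical helper, NOT part of the package: tries the fallback methods
# in order of preference and returns the first duration that can be computed.
fallback_duration <- function(ddd, ratio, pack_ddd_usual, pack_dur_usual,
                              atc_ddd_usual, atc_dur_min) {
  usable <- function(x) !is.na(x) && x > 0
  if (usable(ddd) && usable(pack_ddd_usual)) {
    # Package DDD method: purchased DDDs / usual daily DDD of the package.
    return(list(method = "package DDD", days = ddd / pack_ddd_usual))
  }
  if (usable(ratio) && usable(pack_dur_usual)) {
    # Package duration method: usual duration scaled by the purchased share.
    return(list(method = "package duration", days = ratio * pack_dur_usual))
  }
  if (usable(ddd) && usable(atc_ddd_usual)) {
    # ATC-level DDD method: purchased DDDs / usual daily DDD of the ATC class.
    return(list(method = "ATC-level DDD", days = ddd / atc_ddd_usual))
  }
  # Minimum ATC duration method: last resort.
  list(method = "minimum ATC duration", days = atc_dur_min)
}

a <- fallback_duration(30, 1, 1, 30, 1, 7)        # package DDD known
b <- fallback_duration(NA, 0.5, NA, 30, 1, 7)     # no DDDs, half a package
c3 <- fallback_duration(60, NA, NA, NA, 2, 7)     # only ATC-level DDD known
d <- fallback_duration(NA, NA, NA, NA, NA, 7)     # nothing but the ATC minimum
print(c(a$method, b$method, c3$method, d$method))
print(c(a$days, b$days, c3$days, d$days))
```

Each later method needs less purchase-specific information than the one before it, which is why the estimates become coarser further down the list.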
Periods that are close in time can be joined in a post-processing step controlled by post_process_perc. The post-processing percentage decreases by 0.1 at each estimation round to prevent very long calculation times for large datasets.
In addition to estimating drug use periods, the function can also calculate common package durations from the purchase data. These calculated durations can be used to verify and adjust the usual duration parameters of packages. After making corrections, re-run the function to recalculate drug use periods using the updated package parameters.
See also
Each data type has its own check function. pre2dup runs these checks internally, but validating the data before calling pre2dup is recommended, because errors are then faster and easier to detect and handle.
check_purchases, check_hospitalizations, check_package_parameters, check_atc_parameters
Examples
period_data <- pre2dup(
  pre_data = purchases_example, pre_person_id = "id",
  pre_atc = "ATC", pre_package_id = "vnr", pre_date = "purchase_date",
  pre_ratio = "n_packages", pre_ddd = "amount",
  package_parameters = package_parameters_example,
  pack_atc = "ATC", pack_id = "vnr", pack_ddd_low = "lower_ddd",
  pack_ddd_usual = "usual_ddd", pack_dur_min = "minimum_dur",
  pack_dur_usual = "usual_dur", pack_dur_max = "maximum_dur",
  atc_parameters = ATC_parameters, atc_class = "partial_atc",
  atc_ddd_low = "lower_ddd_atc", atc_ddd_usual = "usual_ddd_atc",
  atc_dur_min = "minimum_dur_atc", atc_dur_max = "maximum_dur_atc",
  hosp_data = hospitalizations_example, hosp_person_id = "id",
  hosp_admission = "hospital_start", hosp_discharge = "hospital_end",
  date_range = c("2025-01-01", "2025-12-31"),
  global_gap_max = 300, global_min = 5, global_max = 300,
  global_max_single = 150, global_ddd_high = 10,
  global_hosp_max = 30, weight_past = 1, weight_current = 4,
  weight_next = 1, weight_first_last = 5,
  calculate_pack_dur_usual = TRUE,
  days_covered = 5,
  post_process_perc = 1
)
#> Step 1/6: Checking parameters and datasets...
#> Checks passed for 'pre_data'
#> Checks passed for 'package_parameters'
#> Checks passed for 'atc_parameters'.
#> Checks passed for 'hosp_data'
#> Preparing hospitalization data and merging overlapping hospitalizations.
#> Step 2/6: Calculating purchase durations...
#> Step 3/6: Stockpiling assessment...
#> Step 4/6: Calculating common package durations in data...
#> Refill lengths couldn't be re-estimated, probably due to too small data size.
#> Step 5/6: Preparing drug use periods...
#> Step 6/6: Post-processing drug use periods...
#> Current post processing percentage: 1
#> Drug use periods calculated. 7 periods created for 5 persons.
period_data$periods
#> Key: <period>
#>    period     id     ATC  dup_start    dup_end dup_days dup_hospital_days
#>     <int> <fctr>  <char>     <Date>     <Date>    <num>             <num>
#> 1:      1      1 N05AH02 2025-01-01 2025-04-14      104                 0
#> 2:      2      2 N05AH02 2025-01-15 2025-04-28      104                 5
#> 3:      3      3 N05AH02 2025-02-01 2025-05-15      104                 0
#> 4:      4      3 N05AH04 2025-01-05 2025-08-26      233                 0
#> 5:      5      4 N05AH02 2025-01-10 2025-04-23      104                 0
#> 6:      6      4 N05AH04 2025-01-20 2025-09-10      233                 0
#> 7:      7      5 N05AH04 2025-01-01 2025-08-22      233                38
#>    dup_n_purchases dup_last_purchase dup_total_DDD dup_temporal_average_DDDs
#>              <int>            <Date>         <num>                     <num>
#> 1:               3        2025-03-08         99.99                     0.961
#> 2:               3        2025-03-22         99.99                     0.961
#> 3:               3        2025-04-08         99.99                     0.961
#> 4:               2        2025-04-15        200.00                     0.858
#> 5:               3        2025-03-17         99.99                     0.961
#> 6:               2        2025-04-30        200.00                     0.858
#> 7:               2        2025-04-11        200.00                     0.858
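The periods table can be summarised with ordinary R tools. As a minimal sketch, the snippet below rebuilds three columns of the printed output by hand so that it runs without the package; in a real session the same call would presumably be applied to period_data$periods directly.

```r
# Values copied from the printed periods table above (id, ATC, dup_days only).
periods <- data.frame(
  id       = c(1, 2, 3, 3, 4, 4, 5),
  ATC      = c("N05AH02", "N05AH02", "N05AH02", "N05AH04",
               "N05AH02", "N05AH04", "N05AH04"),
  dup_days = c(104, 104, 104, 233, 104, 233, 233)
)

# Total period days per person, summed over all ATC classes.
days_per_person <- aggregate(dup_days ~ id, data = periods, FUN = sum)
print(days_per_person)
```

Note that this sum counts period days per drug, so overlapping periods of different ATC classes for the same person are counted twice rather than as distinct calendar days.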