Validate Hospitalization Data
This function checks the structure and content of hospitalization data (data.frame or data.table) for use in pre2dup workflows.
It validates required columns, data types, date consistency, and chronological logic (admission before discharge).
If all checks pass and return_data = TRUE, it returns a cleaned data.table with the required columns and types.
Usage
check_hospitalizations(
  dt,
  hosp_person_id = NULL,
  hosp_admission = NULL,
  hosp_discharge = NULL,
  date_range = NULL,
  print_all = FALSE,
  return_data = FALSE
)
Arguments
- dt
data.frame or data.table containing hospitalization records.
- hosp_person_id
Character. Column name for the person identifier.
- hosp_admission
Character. Column name for the hospital admission date.
- hosp_discharge
Character. Column name for the hospital discharge date.
- date_range
Character vector of length 2 giving the earliest and latest allowed dates for hospitalizations (e.g., c("1995-01-01", "2025-12-31")). Default is NULL (no date range check).
- print_all
Logical. If TRUE, the numbers of all rows that triggered warnings are printed; if FALSE, only the first 5 are printed.
- return_data
Logical. If TRUE and no errors are detected, returns a data.table with the validated columns and proper types. If FALSE, only a message is printed.
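The print_all behaviour can be pictured with a small base-R sketch. format_problem_rows() is a hypothetical helper written for illustration only, not a pre2dup function, and its message wording is an assumption; it just shows how a report can be cut down to the first 5 row numbers unless print_all = TRUE.

```r
# Hypothetical helper (not part of pre2dup): build a message listing the
# offending row numbers, truncated to the first `max_shown` unless print_all = TRUE.
format_problem_rows <- function(rows, what, print_all = FALSE, max_shown = 5L) {
  shown <- if (print_all) rows else utils::head(rows, max_shown)
  msg <- sprintf("%d row(s) with %s: %s", length(rows), what,
                 paste(shown, collapse = ", "))
  # Signal that the list was truncated
  if (!print_all && length(rows) > max_shown) {
    msg <- paste0(msg, ", ...")
  }
  msg
}

bad_rows <- c(3L, 8L, 12L, 15L, 21L, 40L, 41L)
short_msg <- format_problem_rows(bad_rows, "discharge before admission")
full_msg <- format_problem_rows(bad_rows, "discharge before admission",
                                print_all = TRUE)
short_msg
# returns "7 row(s) with discharge before admission: 3, 8, 12, 15, 21, ..."
```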
Value
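If return_data = TRUE, returns a data.table containing only the validated columns (named pid_hosp, admission_date and discharge_date, as in the example below), with dates converted to integers (days since 1970-01-01) and overlapping hospitalizations combined.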
If errors are detected, the function stops and prints error messages.
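A minimal base-R sketch of the two transformations described above: converting dates to integer day counts and combining overlapping stays per person. merge_overlapping_stays() is hypothetical and is not the pre2dup implementation (which returns a data.table). In particular, treating a stay as overlapping when its admission falls on or before the previous stay's discharge day is an assumption.

```r
# Hypothetical base-R sketch (not the pre2dup implementation).
# Assumption: a stay overlaps the previous one for the same person when its
# admission day is on or before that stay's (running) discharge day.
merge_overlapping_stays <- function(pid, admission, discharge) {
  # Dates become integer day counts since 1970-01-01
  adm <- as.integer(as.Date(admission))
  dis <- as.integer(as.Date(discharge))
  # Sort by person, then by admission day
  ord <- order(pid, adm)
  pid <- pid[ord]; adm <- adm[ord]; dis <- dis[ord]
  out_pid <- pid[0]; out_adm <- integer(0); out_dis <- integer(0)
  for (i in seq_along(adm)) {
    n <- length(out_adm)
    if (n > 0 && out_pid[n] == pid[i] && adm[i] <= out_dis[n]) {
      # Overlap: extend the current combined stay
      out_dis[n] <- max(out_dis[n], dis[i])
    } else {
      # No overlap: start a new combined stay
      out_pid <- c(out_pid, pid[i])
      out_adm <- c(out_adm, adm[i])
      out_dis <- c(out_dis, dis[i])
    }
  }
  data.frame(pid_hosp = out_pid, admission_date = out_adm,
             discharge_date = out_dis)
}

stays <- merge_overlapping_stays(
  pid       = c(1, 1, 1, 2),
  admission = c("2023-01-01", "2023-01-10", "2023-03-01", "2023-01-05"),
  discharge = c("2023-01-15", "2023-01-20", "2023-03-05", "2023-01-08")
)
stays
# 3 rows: person 1 has 19358-19377 and 19417-19421; person 2 has 19362-19365
```

The integer encoding matches the example output: as.integer(as.Date("2023-01-01")) is 19358. Person 1's first two stays (1 to 15 January and 10 to 20 January) overlap and collapse into a single stay.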
Details
The following checks are performed:
- Existence and naming of required columns
- Validity of person identifiers (numeric or non-numeric; no missing values)
- Admission and discharge dates are present and convertible to dates
- Admission date is strictly before discharge date
- All dates are within the specified range (if given)
- Overlapping hospitalizations are combined
If any errors are found, the function stops execution and prints all error messages.
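The row-level checks listed above can be approximated as follows. find_hosp_problems() is a hypothetical helper, not part of pre2dup. It assumes dates are Date objects or ISO "YYYY-MM-DD" strings, and it returns the offending row numbers for each check rather than stopping, so it illustrates the logic only.

```r
# Hypothetical sketch (not the pre2dup implementation) of the checks in Details.
# Assumption: dates are Date objects or ISO "YYYY-MM-DD" strings.
find_hosp_problems <- function(df, id, adm, dis, date_range = NULL) {
  # Required columns must exist before any row-level check makes sense
  missing_cols <- setdiff(c(id, adm, dis), names(df))
  if (length(missing_cols) > 0) {
    stop("Missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  # Strings that do not match the format become NA instead of raising an error
  to_date <- function(x) {
    if (inherits(x, "Date")) x else as.Date(as.character(x), format = "%Y-%m-%d")
  }
  start <- to_date(df[[adm]])
  end <- to_date(df[[dis]])
  problems <- list(
    missing_id = which(is.na(df[[id]])),
    bad_admission = which(is.na(start)),
    bad_discharge = which(is.na(end)),
    # Strict check: admission must be earlier than discharge
    not_chronological = which(!is.na(start) & !is.na(end) & start >= end)
  )
  if (!is.null(date_range)) {
    lims <- as.Date(date_range)
    outside <- function(d) !is.na(d) & (d < lims[1] | d > lims[2])
    problems$out_of_range <- which(outside(start) | outside(end))
  }
  problems
}

hosp <- data.frame(
  PID   = c(1, NA, 2, 3),
  Entry = c("2023-01-01", "2023-02-01", "2023-03-10", "not a date"),
  Leave = c("2023-01-15", "2023-02-15", "2023-03-05", "2023-04-01")
)
problems <- find_hosp_problems(hosp, "PID", "Entry", "Leave",
                               date_range = c("2023-01-01", "2023-12-31"))
str(problems)
```

Here row 2 has a missing identifier, row 4 has an unparseable admission date, and row 3 is flagged because its admission (10 March) comes after its discharge (5 March).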
Examples
PID <- c(1, 1, 2, 2)
Entry <- c("2023-01-01", "2023-02-01", "2023-01-01", "2023-02-01")
Leave <- c("2023-01-15", "2023-02-15", "2023-01-10", "2023-02-10")
hospital_data <- data.frame(PID, Entry, Leave)
hospitalizations <- check_hospitalizations(
  hospital_data,
  hosp_person_id = "PID",
  hosp_admission = "Entry",
  hosp_discharge = "Leave",
  return_data = TRUE
)
#> Checks passed for 'hospital_data'
#> Preparing hospitalization data and merging overlapping hospitalizations.
hospitalizations
#> pid_hosp admission_date discharge_date
#> <fctr> <int> <int>
#> 1: 1 19358 19372
#> 2: 1 19389 19403
#> 3: 2 19358 19367
#> 4: 2 19389 19398