
This function checks the structure and content of hospitalization data (a data.frame or data.table) for use in pre2dup workflows. It validates the required columns, data types, date validity, and chronological order (admission before discharge). If all checks pass and return_data = TRUE, it returns a cleaned data.table with the required columns and types.

Usage

check_hospitalizations(
  dt,
  hosp_person_id = NULL,
  hosp_admission = NULL,
  hosp_discharge = NULL,
  date_range = NULL,
  print_all = FALSE,
  return_data = FALSE
)

Arguments

dt

data.frame or data.table containing hospitalization records.

hosp_person_id

Character. Column name for the person identifier.

hosp_admission

Character. Column name for hospital admission date.

hosp_discharge

Character. Column name for hospital discharge date.

date_range

Character vector of length 2 giving the earliest and latest allowed hospitalization dates (e.g., c("1995-01-01", "2025-12-31")). Default is NULL (no date range check).

print_all

Logical. If TRUE, all row numbers that caused warnings are printed; if FALSE, only the first 5 problematic rows are printed.

return_data

Logical. If TRUE and no errors are detected, returns a data.table with the validated columns and proper types. If FALSE, only a message is printed.
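To make the roles of date_range and print_all concrete, here is a minimal base-R sketch of a range check and of truncated row reporting. The helpers rows_outside_range() and format_rows() are hypothetical illustrations, not functions exported by pre2dup, and the sketch assumes dates are ISO "YYYY-MM-DD" strings.

```r
# Hypothetical helper (not part of pre2dup): row numbers whose admission or
# discharge date falls outside a two-element date_range (inclusive bounds).
rows_outside_range <- function(admission, discharge, date_range) {
  lims <- as.Date(date_range)
  adm <- as.Date(admission)
  dis <- as.Date(discharge)
  which(adm < lims[1] | adm > lims[2] | dis < lims[1] | dis > lims[2])
}

# Hypothetical helper: format row numbers for a message, showing only the
# first 5 unless print_all = TRUE (mirroring the print_all argument).
format_rows <- function(rows, print_all = FALSE) {
  shown <- if (print_all) rows else head(rows, 5)
  txt <- paste(shown, collapse = ", ")
  if (length(shown) < length(rows)) txt <- paste0(txt, ", ...")
  txt
}

rows_outside_range(c("2023-01-01", "1990-05-01"),
                   c("2023-01-15", "1990-05-10"),
                   c("1995-01-01", "2025-12-31"))
#> [1] 2
```

Whether pre2dup treats out-of-range dates as errors or warnings is not shown here; the sketch only locates the rows.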

Value

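If return_data = TRUE, returns a data.table containing only the validated columns, renamed to pid_hosp, admission_date and discharge_date (see Examples), with the person identifier stored as a factor, dates converted to integers (days since 1970-01-01), and overlapping hospitalizations combined. If errors are detected, the function stops and prints the error messages.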
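Combining overlapping hospitalizations is an interval-merging step. The following base-R sketch shows one way to do it per person; merge_overlaps() is a hypothetical helper, not pre2dup's implementation, and it assumes that a stay overlaps the previous one when its admission day is on or before the running discharge day (merely adjacent stays are kept separate, which may differ from pre2dup's rule).

```r
# Hypothetical helper (not part of pre2dup): merge overlapping stays per person.
# Assumption: a stay overlaps when its admission day <= running discharge day.
merge_overlaps <- function(pid, admission, discharge) {
  d <- data.frame(pid = pid,
                  adm = as.integer(as.Date(admission)),
                  dis = as.integer(as.Date(discharge)))
  d <- d[order(d$pid, d$adm), ]   # sort by person, then admission day
  out <- list()
  for (p in unique(d$pid)) {
    s <- d[d$pid == p, ]
    cur_adm <- s$adm[1]
    cur_dis <- s$dis[1]
    for (i in seq_len(nrow(s))[-1]) {
      if (s$adm[i] <= cur_dis) {
        # Overlap: extend the current merged stay.
        cur_dis <- max(cur_dis, s$dis[i])
      } else {
        # Gap: close the current stay and start a new one.
        out[[length(out) + 1]] <- data.frame(pid = p, admission_date = cur_adm,
                                             discharge_date = cur_dis)
        cur_adm <- s$adm[i]
        cur_dis <- s$dis[i]
      }
    }
    # Close the last stay for this person.
    out[[length(out) + 1]] <- data.frame(pid = p, admission_date = cur_adm,
                                         discharge_date = cur_dis)
  }
  do.call(rbind, out)
}

merged <- merge_overlaps(c(1, 1, 2),
                         c("2023-01-01", "2023-01-10", "2023-01-01"),
                         c("2023-01-15", "2023-01-20", "2023-01-05"))
```

Here the two stays of person 1 overlap and collapse into a single row, so merged has two rows. On the data in the Examples section no stays overlap, which is why all four rows are kept there.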

Details

The following checks are performed:

  • Existence and naming of required columns

  • Person identifiers are present for every row (no missing values); both numeric and non-numeric identifiers are accepted

  • Admission and discharge dates are present and convertible to dates

  • Admission date is strictly before discharge date

  • All dates are within the specified range (if given)

  • Overlapping hospitalizations of the same person are combined in the returned data

If any errors are found, the function stops execution and prints all error messages.
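The "collect every error, then stop" behaviour can be sketched in base R as follows. validate_hosp() is a hypothetical illustration covering a subset of the listed checks, not pre2dup's code; it assumes dates are character strings in "YYYY-MM-DD" format.

```r
# Hypothetical sketch (not part of pre2dup): run several checks, collect all
# error messages, and stop once with the combined list.
validate_hosp <- function(dt, id, adm, dis) {
  missing_cols <- setdiff(c(id, adm, dis), names(dt))
  if (length(missing_cols) > 0) {
    # Later checks need the columns, so stop immediately here.
    stop("Missing columns: ", paste(missing_cols, collapse = ", "), call. = FALSE)
  }
  errors <- character(0)
  if (anyNA(dt[[id]])) errors <- c(errors, "Missing person identifiers.")
  # Assumption: ISO dates; strings that do not match become NA.
  a <- as.Date(dt[[adm]], format = "%Y-%m-%d")
  d <- as.Date(dt[[dis]], format = "%Y-%m-%d")
  if (anyNA(a) || anyNA(d)) errors <- c(errors, "Missing or unparseable dates.")
  bad_order <- which(a >= d)   # admission must be strictly before discharge
  if (length(bad_order) > 0) {
    errors <- c(errors, paste0("Admission not before discharge in rows: ",
                               paste(bad_order, collapse = ", ")))
  }
  if (length(errors) > 0) stop(paste(errors, collapse = "\n"), call. = FALSE)
  invisible(TRUE)
}
```

Stopping only after all checks have run lets a single call report every problem at once instead of one per run.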

Examples

PID <- c(1, 1, 2, 2)
Entry <- c("2023-01-01", "2023-02-01", "2023-01-01", "2023-02-01")
Leave <- c("2023-01-15", "2023-02-15", "2023-01-10", "2023-02-10")
hospital_data <- data.frame(PID, Entry, Leave)

hospitalizations <- check_hospitalizations(
  hospital_data,
  hosp_person_id = "PID",
  hosp_admission = "Entry",
  hosp_discharge = "Leave",
  return_data = TRUE
)
#> Checks passed for 'hospital_data'
#> Preparing hospitalization data and merging overlapping hospitalizations.
hospitalizations
#>    pid_hosp admission_date discharge_date
#>      <fctr>          <int>          <int>
#> 1:        1          19358          19372
#> 2:        1          19389          19403
#> 3:        2          19358          19367
#> 4:        2          19389          19398
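The returned dates are integer day counts since 1970-01-01. As a small sketch using values copied from the example output above, base R's as.Date() converts them back to calendar dates, and subtracting the integers gives the length of each stay in days.

```r
# Integer dates copied from the first two rows of the example output.
admission_date <- c(19358L, 19389L)
discharge_date <- c(19372L, 19403L)

as.Date(admission_date, origin = "1970-01-01")
#> [1] "2023-01-01" "2023-02-01"

discharge_date - admission_date   # stay length in days
#> [1] 14 14
```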