Retrieve the peptide metadata table into DuckDB, forcing atomic types

This function uses the phiperio logging utilities for consistent, ASCII-only progress messages and timing. Long-running steps are bracketed with .ph_with_timing(), and informational/warning/error messages are emitted via .ph_log_info(), .ph_log_ok(), .ph_warn(), and .ph_abort().

On the first call, downloads the RDS file, sanitizes column types (logical, character, numeric), and writes the result to an on-disk DuckDB cache.
Subsequent calls return a lazy tbl_dbi without loading the table into R memory.

Usage

get_peptide_library(force_refresh = FALSE)

Arguments

force_refresh: Logical. If TRUE, re-downloads and rebuilds the cache.

Value

A dplyr::tbl_dbi pointing to the peptide_meta table. The returned object carries an attribute "duckdb_con" with the open DBI connection.
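
A short sketch of how such a return value is typically used. Because the real table needs a download, the snippet builds a stand-in: an in-memory DuckDB with a toy peptide_meta table. The column names (peptide_id, is_control) are hypothetical and are not taken from the package schema; only the "duckdb_con" attribute name comes from the description above.

```r
library(dplyr)

# Stand-in for the real cache: an in-memory DuckDB with a toy peptide_meta.
# Column names here are hypothetical, not the package's actual schema.
con <- DBI::dbConnect(duckdb::duckdb(), dbdir = ":memory:")
DBI::dbWriteTable(con, "peptide_meta", data.frame(
  peptide_id = c("p1", "p2", "p3"),
  is_control = c(TRUE, FALSE, FALSE)
))

lib <- tbl(con, "peptide_meta")
attr(lib, "duckdb_con") <- con  # mimic the documented attribute

# Verbs are translated to SQL; nothing is pulled into R until collect().
controls <- lib |>
  filter(is_control) |>
  select(peptide_id) |>
  collect()

# Close the connection through the attribute when finished.
DBI::dbDisconnect(attr(lib, "duckdb_con"), shutdown = TRUE)
```

With the real function, the same pattern would start from lib <- get_peptide_library() instead of the stand-in table.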

Details

Caching: A persistent DuckDB database is created under the user cache directory (via tools::R_user_dir("phiperio", "cache")). You can override this location with options(phiperio.cache_dir = "..."). The force_refresh argument bypasses the fast path and rebuilds the cache.
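
The precedence between the option and the default location might be resolved along these lines. This is a minimal sketch, not the package's actual code: resolve_cache_dir() is a hypothetical helper, and only the option name and the tools::R_user_dir() call come from the description above.

```r
# Hypothetical helper: illustrates the documented precedence only.
resolve_cache_dir <- function() {
  # An explicit option wins; otherwise fall back to the per-user cache dir.
  dir <- getOption(
    "phiperio.cache_dir",
    default = tools::R_user_dir("phiperio", "cache")
  )
  # Create the directory on first use so a database file can be written there.
  dir.create(dir, recursive = TRUE, showWarnings = FALSE)
  normalizePath(dir, mustWork = FALSE)
}

# Usage: point the cache at a temporary location for one session.
old <- options(phiperio.cache_dir = file.path(tempdir(), "phiperio-cache"))
cache_dir <- resolve_cache_dir()
options(old)
```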

Sanitization: Columns are stripped of attributes, list-columns are flattened, and textual "NaN" and numeric NaN values are coerced to NA. Binary 0/1 fields are converted to logical, "TRUE"/"FALSE" strings (case-insensitive) are converted to logical, and numeric-looking character columns (beyond trivial 0/1) are converted to numeric. All other atomic types are preserved.
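
The rules above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: sanitize_column() is a hypothetical name, factors are assumed to become their labels, and list elements are assumed to collapse into one ";"-separated string.

```r
# Hypothetical helper: one possible reading of the sanitization rules.
sanitize_column <- function(x) {
  # Assumption: factors become their labels, not their integer codes.
  if (is.factor(x)) x <- as.character(x)

  # Assumption: each list element collapses to one ";"-separated string.
  if (is.list(x)) {
    x <- vapply(x, function(el) {
      if (length(el) == 0L || all(is.na(el))) return(NA_character_)
      paste(as.character(el), collapse = ";")
    }, character(1))
  }

  attributes(x) <- NULL  # drop names, classes, labels

  if (is.numeric(x)) {
    x[is.nan(x)] <- NA
    vals <- x[!is.na(x)]
    # Binary 0/1 columns become logical.
    if (length(vals) > 0L && all(vals %in% c(0, 1))) return(as.logical(x))
    return(x)
  }

  if (is.character(x)) {
    x[!is.na(x) & x == "NaN"] <- NA
    keep <- !is.na(x)
    vals <- x[keep]
    if (length(vals) == 0L) return(x)
    if (all(vals %in% c("0", "1"))) return(x == "1")
    if (all(toupper(vals) %in% c("TRUE", "FALSE"))) {
      out <- rep(NA, length(x))
      out[keep] <- toupper(vals) == "TRUE"
      return(out)
    }
    # Convert only when every non-missing value parses as a number.
    num <- suppressWarnings(as.numeric(vals))
    if (!anyNA(num)) return(as.numeric(x))
  }
  x
}

# Usage on a small data frame with one column per rule.
raw <- data.frame(
  flag  = c(0, 1, NA),
  truth = c("true", "FALSE", NA),
  score = c("1.5", "NaN", "3"),
  label = c("a", "b", "NaN"),
  stringsAsFactors = FALSE
)
clean <- raw
clean[] <- lapply(raw, sanitize_column)
```

The order of the checks matters: the 0/1 test runs before the general numeric conversion so that binary flags end up logical rather than numeric.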

Integrity check: If a SHA-256 checksum is provided, a warning is logged when the downloaded file’s checksum does not match the expected value.
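
A warn-but-continue checksum comparison could look like the sketch below. It assumes the digest package is installed and uses base warning() as a stand-in for the package's own logging helpers; verify_sha256() and its expected argument are hypothetical names, since the text above does not say where the expected checksum comes from.

```r
# Hypothetical helper: `expected` stands in for wherever the checksum comes from.
verify_sha256 <- function(path, expected = NULL) {
  # Nothing to compare against: skip the check.
  if (is.null(expected)) return(invisible(NA))

  # file = TRUE hashes the file contents rather than the path string.
  actual <- digest::digest(path, algo = "sha256", file = TRUE)
  ok <- identical(tolower(actual), tolower(expected))

  if (!ok) {
    # Mirror the documented behavior: warn on mismatch, do not abort.
    warning(
      sprintf("SHA-256 mismatch for %s: expected %s, got %s",
              basename(path), expected, actual),
      call. = FALSE
    )
  }
  invisible(ok)
}
```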

Examples

lib <- get_peptide_library()
#> [13:45:33] INFO  Retrieving peptide metadata into DuckDB cache
#>                  -> get_peptide_library(force_refresh = FALSE)
#> [13:45:33] INFO  Opened DuckDB connection
#>                    - cache dir:
#>                      /home/runner/.cache/R/phiperio/peptide_meta/phip_cache.duckdb
#>                    - table: peptide_meta
#> [13:45:33] OK    Using cached peptide_meta (fast path)
#> [13:45:33] OK    Retrieving peptide metadata into DuckDB cache - done
#>                  -> elapsed: 0.045s