Retrieve the peptide metadata table into DuckDB, forcing atomic types
Source: R/peptide-library.R
get_peptide_library.Rd
This function uses the phiperio logging utilities for
consistent, ASCII-only progress messages and timing. Long-running steps are
bracketed with .ph_with_timing(), and informational/warning/error
messages are emitted via .ph_log_info(), .ph_log_ok(), .ph_warn(),
and .ph_abort().
Downloads the RDS file once, sanitizes column types (logical, character, numeric), and writes the result into a DuckDB cache on disk.
Subsequent calls return a lazy
tbl_dbi without loading the data into R memory.
Value
A dplyr::tbl_dbi pointing to the peptide_meta table. The returned
object carries an attribute "duckdb_con" with the open DBI connection.
Details
Caching: A persistent DuckDB database is created under the user cache
directory (via tools::R_user_dir("phiperio", "cache")). You can override
this location with options(phiperio.cache_dir = "..."). The
force_refresh argument bypasses the fast path and rebuilds the cache.
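The cache location lookup described above can be sketched in base R. This is a minimal illustration, not the package's actual code: resolve_cache_dir() is a hypothetical helper, and the sketch assumes the option is simply consulted first, with tools::R_user_dir() as the fallback. Any subdirectory or file name the package appends under that directory is not modeled here.

```r
# Hypothetical helper (not part of phiperio): return the option value if set,
# otherwise fall back to the per-user cache directory.
resolve_cache_dir <- function() {
  getOption(
    "phiperio.cache_dir",
    default = tools::R_user_dir("phiperio", which = "cache")
  )
}

# Override the location for the current session, then restore the old setting.
old <- options(phiperio.cache_dir = file.path(tempdir(), "phiperio-cache"))
resolve_cache_dir()
options(old)
```

Setting the option before the first call keeps the DuckDB file out of the default user cache, which can be convenient on shared or ephemeral machines.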
Sanitization: Columns are stripped of attributes, list-columns are
flattened, textual "NaN" and numeric NaN are coerced to NA. Binary 0/1
fields are converted to logical, "TRUE"/"FALSE" (case-insensitive) are
converted to logical, and numeric-looking character columns (beyond trivial
0/1) are converted to numeric. All other atomic types are preserved.
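The sanitization rules above might look roughly like the following per-column sketch. It is an approximation under stated assumptions, not the package's implementation: sanitize_column() is a hypothetical name, list elements are assumed to be collapsed into a single ";"-separated string, and character "0"/"1" columns are assumed to become logical in the same way as numeric 0/1 columns.

```r
# Hypothetical per-column sanitizer illustrating the documented rules.
sanitize_column <- function(x) {
  # Flatten list-columns (assumption: collapse each element to one string).
  if (is.list(x)) {
    x <- vapply(x, function(el) {
      if (length(el) == 0L) NA_character_
      else paste(as.character(el), collapse = ";")
    }, character(1))
  }
  # Strip attributes; factors are converted to their labels first.
  if (is.factor(x)) x <- as.character(x)
  attributes(x) <- NULL

  if (is.numeric(x)) {
    x[is.nan(x)] <- NA                      # numeric NaN -> NA
    vals <- unique(x[!is.na(x)])
    if (length(vals) > 0L && all(vals %in% c(0, 1))) return(as.logical(x))
    return(x)
  }

  if (is.character(x)) {
    x[!is.na(x) & x == "NaN"] <- NA_character_   # textual "NaN" -> NA
    vals <- unique(x[!is.na(x)])
    if (length(vals) == 0L) return(x)
    # "TRUE"/"FALSE" in any letter case -> logical
    if (all(toupper(vals) %in% c("TRUE", "FALSE"))) return(toupper(x) == "TRUE")
    # character "0"/"1" -> logical (assumption, see lead-in)
    if (all(vals %in% c("0", "1"))) return(x == "1")
    # numeric-looking strings -> numeric
    num <- suppressWarnings(as.numeric(vals))
    if (!anyNA(num)) return(as.numeric(x))
  }
  x  # all other atomic types are returned unchanged
}

# Apply to every column of a small example data frame.
df <- data.frame(flag = c(0, 1, NaN), score = c("1.5", "2", "NaN"))
df[] <- lapply(df, sanitize_column)
str(df)
```

Checking the set of distinct non-missing values, rather than every element, keeps the type decision cheap on long columns.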
Integrity check: If a SHA-256 checksum is provided, a warning is logged when the downloaded file’s checksum does not match the expected value.
Examples
lib <- get_peptide_library()
#> [13:45:33] INFO Retrieving peptide metadata into DuckDB cache
#> -> get_peptide_library(force_refresh = FALSE)
#> [13:45:33] INFO Opened DuckDB connection
#> - cache dir:
#> /home/runner/.cache/R/phiperio/peptide_meta/phip_cache.duckdb
#> - table: peptide_meta
#> [13:45:33] OK Using cached peptide_meta (fast path)
#> [13:45:33] OK Retrieving peptide metadata into DuckDB cache - done
#> -> elapsed: 0.045s