Constrained Ordination (db-rda / cap) on Distance Matrix
Source:R/beta_compute.R
compute_capscale.RdPerforms distance-based redundancy analysis (constrained pcoa,
a.k.a. cap) on a distance matrix using vegan::capscale, with
optional negative eigenvalue correction. Returns constrained sample scores,
eigenvalues, variance partitioning, and feature associations.
Arguments
- dist_obj
A
distobject returned bycompute_distance(). The normalized abundance matrix used to compute the distances is expected to be attached as an attribute"abundances"(samples in rows, features in columns).- ps
A
phip_dataobject or a table providing sample-level metadata. This table must containsample_idand all variables referenced on the right-hand side offormula. Variable detection usesall.names(terms(formula), functions = FALSE), so transformed terms likelog(age)are supported as long asageexists.- formula
An R formula specifying the constraints (independent variables) for the ordination, e.g.
~ sex + age. Do not include a response on the left-hand side; the distance matrix is provided viadist_obj.- neg_correction
One of
"none","lingoes","cailliez". Method for negative eigenvalue correction. Default is"none". Passed to theaddargument ofvegan::capscale().- top_features
Integer scalar. Number of top features to return in associations (selected per constrained axis by absolute association, then unioned). Default is 30.
- permutations
Integer scalar. Number of permutations for per-term permutation tests via
vegan::anova.cca(by = "term"). Default is 999.- feature_assoc
character scalar. Type of feature-axis association to return.
"weighted_average"returns weighted-average feature scores (centroid of sample scores weighted by feature abundance)."correlation"returns feature-axis correlations."regression"returns regression slopes for axis scores on feature abundance."none"skips feature associations.
Value
A list of class "beta_capscale" with elements:
- sample_coords
Tibble of sample scores on constrained axes (
CAP1,CAP2, ...). Containssample_idand coordinates.- eigenvalues
Numeric vector of eigenvalues of the constrained axes.
- variance_partition
Tibble with total inertia and inertia partitioned into constrained and unconstrained components, with their proportion of total.
- feature_associations
Tibble of top feature-axis associations for constrained axes (possibly empty if the
"abundances"attribute is missing or cannot be aligned). To limit runtime/memory, associations are computed for at most 10 constrained axes.- r2
Numeric scalar. Unadjusted R-squared from
vegan::RsquareAdj().- r2_adj
Numeric scalar. Adjusted R-squared from
vegan::RsquareAdj().- perm_terms
Tibble of per-term permutation tests from
vegan::anova.cca(by = "term").- cap_model
The full
vegan::capscalemodel object.
Examples
# \donttest{
ps <- load_example_data("small_mixture")
#> [18:40:33] INFO Constructing <phip_data> object
#> -> create_data()
#> [18:40:33] INFO Fetching peptide metadata library via get_peptide_library()
#> [18:40:33] INFO Retrieving peptide metadata into DuckDB cache
#> -> get_peptide_library(force_refresh = FALSE)
#> [18:40:33] INFO Opened DuckDB connection
#> - cache dir:
#> /home/runner/.cache/R/phiperio/peptide_meta/phip_cache.duckdb
#> - table: peptide_meta
#> [18:40:33] OK Using cached peptide_meta (fast path)
#> [18:40:33] OK Retrieving peptide metadata into DuckDB cache - done
#> -> elapsed: 0.039s
#> [18:40:33] OK Peptide metadata acquired
#> [18:40:33] INFO Validating <phip_data>
#> -> validate_phip_data()
#> [18:40:33] INFO Checking structural requirements (shape & mandatory columns)
#> [18:40:33] INFO Checking outcome family availability (exist / fold_change /
#> raw_counts)
#> [18:40:33] INFO Checking collisions with reserved names
#> - subject_id, sample_id, timepoint, peptide_id, exist,
#> fold_change, counts_input, counts_hit
#> [18:40:33] INFO Ensuring all columns are atomic (no list-cols)
#> [18:40:33] INFO Checking key uniqueness
#> [18:40:33] INFO Validating value ranges & types for outcomes
#> [18:40:33] INFO Assessing sparsity (NA/zero prevalence vs threshold)
#> - warn threshold: 50%
#> [18:40:33] INFO Checking peptide_id coverage against peptide_library
#> Warning: [18:40:33] WARN peptide_id not found in peptide_library (e.g. 10003)
#> -> peptide library coverage.
#> [18:40:33] INFO Checking full grid completeness (peptide * sample)
#> Warning: [18:40:33] WARN Counts table is not a full peptide * sample grid.
#> -> grid completeness
#> - observed rows: 78200
#> - expected rows: 156000.
#> Warning: [18:40:33] WARN Grid remains incomplete (auto_expand = FALSE).
#> -> grid completeness
#> - observed rows: 78200
#> - expected rows: 156000.
#> [18:40:33] OK Validating <phip_data> - done
#> -> elapsed: 0.534s
#> [18:40:33] OK Constructing <phip_data> object - done
#> -> elapsed: 0.574s
# compute distance matrix
val_col <- "fold_change"
dist_bc <- compute_distance(
ps,
value_col = val_col,
method_normalization = "hellinger",
distance = "bray",
n_threads = 2L
)
#> [18:40:33] INFO building abundance matrix from `ps` using `fold_change`.
#> [18:40:33] INFO building pivot spec (sample_id x peptide_id).
#> [18:40:33] INFO Collecting long table (sample_id, peptide_id, value).
#> -> compute_distance
#> [18:40:33] INFO Pivoting to wide abundance matrix in R.
#> -> compute_distance
#> [18:40:33] INFO abundance matrix has 43 samples and 5 features after
#> preprocessing.
#> [18:40:33] INFO computing distance: bray
#> [18:40:33] INFO distance matrix computation complete.
# pick a simple constraint that exists in the example data (fallback order)
dat <- ps
cand <- c("group", "big_group", "type_person", "sex", "age")
rhs_var <- cand[cand %in% dplyr::tbl_vars(dat)][1]
cap_res <- compute_capscale(
dist_bc,
ps = ps,
formula = stats::as.formula(paste0("~ ", rhs_var)),
neg_correction = "none",
top_features = 30L
)
#> [18:40:33] INFO building metadata from `ps$data_long`.
#> [18:40:33] INFO fitting constrained ordination (cap/db-rda)
#> - formula: ~group
#> [18:40:34] INFO extracting constrained sample scores.
#> [18:40:34] INFO computing variance partitioning and permutation tests.
#> [18:40:34] INFO computing feature associations: weighted_average.
#> [18:40:34] INFO cap analysis complete.
cap_res$variance_partition
#> # A tibble: 3 × 3
#> component inertia proportion
#> <chr> <dbl> <dbl>
#> 1 Total 11.3 1
#> 2 Constrained 10.1 0.899
#> 3 Unconstrained 1.27 0.113
cap_res$sample_coords
#> # A tibble: 43 × 2
#> sample_id CAP1
#> <chr> <dbl>
#> 1 A_T1_1 -0.766
#> 2 B_T1_1 0.669
#> 3 A_T1_10 -0.779
#> 4 B_T1_10 0.678
#> 5 A_T1_11 -0.749
#> 6 B_T1_11 0.666
#> 7 A_T1_12 -0.786
#> 8 B_T1_12 0.664
#> 9 A_T1_13 -0.781
#> 10 B_T1_13 0.634
#> # ℹ 33 more rows
cap_res$feature_associations
#> # A tibble: 5 × 2
#> feature CAP1
#> <chr> <dbl>
#> 1 16196 -0.764
#> 2 16627 0.666
#> 3 18003 -0.763
#> 4 24799 -0.765
#> 5 5243 0.662
# }