Performs t-distributed stochastic neighbor embedding (t-SNE) on a sample distance matrix to create low-dimensional embeddings for visualization. Returns t-SNE coordinates with optional sample metadata attached.
Usage
compute_tsne(
  ps,
  dist_obj,
  dims = 3L,
  perplexity = 30,
  theta = 0.5,
  max_iter = 1000L,
  meta_cols = NULL,
  seed = NULL,
  check_duplicates = FALSE,
  ...
)
Arguments
- ps
A phip_data object or a table providing sample-level metadata. This table must contain sample_id and any columns specified in meta_cols.
- dist_obj
A sample distance object. Either:
  - A dist object (e.g., from compute_distance()).
  - A numeric, symmetric matrix with row/column names corresponding to sample_ids.
- dims
Integer scalar. Number of t-SNE dimensions to compute (2 or 3). Default is 3 to enable both 2D and 3D visualizations.
- perplexity
Numeric scalar. Perplexity parameter for t-SNE. Must be smaller than the number of samples; Rtsne::Rtsne() additionally requires 3 * perplexity <= n - 1 for n samples. If too large, it is automatically reduced with a warning.
- theta
Numeric scalar. Speed/accuracy tradeoff for Barnes-Hut approximation. Default is 0.5.
- max_iter
Integer scalar. Maximum number of t-SNE iterations. Default is 1000.
- meta_cols
Character vector. Column names from ps to attach as metadata. If NULL, uses ps$meta$extra_cols when available.
- seed
Integer scalar. Random seed for reproducibility. If provided, the seed is temporarily set and restored after computation.
- check_duplicates
Logical scalar. Whether to check for duplicate points. Default is FALSE for distance input.
- ...
Additional arguments passed to Rtsne::Rtsne().
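The automatic perplexity reduction described for the perplexity argument can be sketched as follows. This is a minimal illustration, not the package's implementation: cap_perplexity() is a hypothetical helper, and the cap of floor((n - 1) / 3) is an assumption chosen because it satisfies Rtsne's requirement and matches the warning shown in the Examples (15 reduced to 14 for 43 samples).

```r
# Hypothetical helper, not part of the package: cap perplexity so that
# 3 * perplexity <= n - 1, the condition Rtsne::Rtsne() enforces.
cap_perplexity <- function(perplexity, n) {
  max_ok <- floor((n - 1) / 3)
  if (perplexity > max_ok) {
    warning(sprintf("Perplexity (%s) is high for n = %s; reducing to %s.",
                    perplexity, n, max_ok))
    perplexity <- max_ok
  }
  perplexity
}

capped <- suppressWarnings(cap_perplexity(15, 43))  # 14
unchanged <- cap_perplexity(10, 43)                 # 10
```

Choosing perplexity well below this cap avoids the warning altogether.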
Value
A tibble of class c("phip_tsne", "tbl_df", "tbl", "data.frame") with columns:
- sample_id: Sample identifier from distance labels.
- tSNE1, tSNE2: First two t-SNE dimensions.
- tSNE3: Third dimension if dims >= 3, otherwise NA.
- Additional metadata columns as specified in meta_cols.
Attributes:
- "distance": The original dist_obj.
- "tsne_params": List of t-SNE parameters and function call.
- "meta_cols": Character vector of metadata columns used.
Details
This function runs t-SNE in distance mode (is_distance = TRUE),
using the supplied distance matrix directly. The distance computation
should be performed separately using compute_distance().
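For orientation, the distance-mode call described above corresponds roughly to the direct Rtsne call below. This is a sketch under assumptions: the random matrix and stats::dist() stand in for real data and compute_distance(), and the argument values are illustrative rather than the ones compute_tsne() passes internally.

```r
# Illustrative data: 40 samples with 5 features each (not real PhIP-Seq data).
set.seed(1)
x <- matrix(rnorm(40 * 5), nrow = 40,
            dimnames = list(paste0("s", 1:40), NULL))
d <- dist(x)  # stand-in for the output of compute_distance()

if (requireNamespace("Rtsne", quietly = TRUE)) {
  fit <- Rtsne::Rtsne(
    as.matrix(d),
    dims = 2,
    perplexity = 10,          # 3 * 10 <= 40 - 1, so Rtsne accepts it
    theta = 0.5,
    max_iter = 500,
    is_distance = TRUE,       # input is a distance matrix, not features
    check_duplicates = FALSE
  )
  # fit$Y holds one row of coordinates per sample, in input order.
  emb <- data.frame(sample_id = labels(d), fit$Y)
  names(emb)[-1] <- c("tSNE1", "tSNE2")
  head(emb)
}
```

With is_distance = TRUE, Rtsne skips its PCA preprocessing step and uses the supplied distances as they are.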
t-SNE is a visualization method: it preserves local neighborhoods rather than global distances, axes are not directly interpretable, and embeddings can change with different seeds or perplexity settings.
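Because embeddings vary with the seed, fixing seed makes a run repeatable. One way to set a seed temporarily and restore the caller's random number state afterwards is sketched below; with_temp_seed() is a hypothetical helper written for illustration, and the package may implement this differently.

```r
# Hypothetical helper, not part of the package: evaluate `expr` under a fixed
# seed, then put the global RNG state back the way it was.
with_temp_seed <- function(seed, expr) {
  has_seed <- exists(".Random.seed", envir = globalenv(), inherits = FALSE)
  old_seed <- if (has_seed) get(".Random.seed", envir = globalenv()) else NULL
  on.exit({
    if (has_seed) {
      assign(".Random.seed", old_seed, envir = globalenv())
    } else if (exists(".Random.seed", envir = globalenv(), inherits = FALSE)) {
      rm(".Random.seed", envir = globalenv())
    }
  }, add = TRUE)
  set.seed(seed)
  expr  # lazily evaluated here, after the seed has been set
}

set.seed(123)
before <- .Random.seed
a <- with_temp_seed(42, runif(3))
b <- with_temp_seed(42, runif(3))
identical(a, b)                  # TRUE: same seed, same draws
identical(before, .Random.seed)  # TRUE: caller's RNG state is untouched
```

Restoring the state means a call with a fixed seed does not alter random draws made later in the same session.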
Samples with missing values in metadata columns are retained in the t-SNE
result but will have NA values for those metadata columns.
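The retention behaviour described above is what a left join keyed on sample_id produces. The base R sketch below illustrates the idea with made-up coordinates and metadata; whether compute_tsne() uses a join internally is an assumption.

```r
# Made-up coordinates for three samples.
coords <- data.frame(
  sample_id = c("s1", "s2", "s3"),
  tSNE1 = c(0.1, -1.2, 2.3),
  tSNE2 = c(1.0, 0.4, -0.7)
)
# Metadata where s2 has a missing value and s3 has no row at all.
meta <- data.frame(
  sample_id = c("s1", "s2"),
  timepoint = c("T1", NA_character_)
)

# Look up each embedded sample in the metadata; unmatched samples get NA.
idx <- match(coords$sample_id, meta$sample_id)
res <- cbind(coords, meta[idx, "timepoint", drop = FALSE])
rownames(res) <- NULL
res
```

All three samples keep their coordinates; only the timepoint column is NA for s2 and s3.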
Examples
# \donttest{
# Build example phip_data object
ps <- load_example_data("small_mixture")
# Compute distance matrix
val_col <- "fold_change"
d <- compute_distance(
  ps,
  value_col = val_col,
  method_normalization = "hellinger",
  distance = "bray",
  n_threads = 2L
)
#> [18:40:37] INFO building abundance matrix from `ps` using `fold_change`.
#> [18:40:37] INFO building pivot spec (sample_id x peptide_id).
#> [18:40:37] INFO Collecting long table (sample_id, peptide_id, value).
#> -> compute_distance
#> [18:40:37] INFO Pivoting to wide abundance matrix in R.
#> -> compute_distance
#> [18:40:37] INFO abundance matrix has 43 samples and 5 features after
#> preprocessing.
#> [18:40:37] INFO computing distance: bray
#> [18:40:37] INFO distance matrix computation complete.
# Compute t-SNE embeddings
tsne_res <- compute_tsne(
  ps = ps,
  dist_obj = d,
  dims = 3L,
  perplexity = 15,
  meta_cols = c("subject_id", "timepoint"),
  seed = 42
)
#> Warning: [18:40:37] WARN Perplexity (15) is high for n = 43; reducing to 14.
#> [18:40:37] INFO Running t-SNE with dims = 3, perplexity = 14 on 43 samples
#> (distance input).
#> [18:40:37] INFO Attaching metadata columns to t-SNE result: subject_id,
#> timepoint
#> [18:40:37] INFO t-SNE embedding computation finished.
# View results
head(tsne_res)
#> # A tibble: 6 × 6
#>   sample_id tSNE1 tSNE2 tSNE3 subject_id timepoint
#>   <chr>     <dbl> <dbl> <dbl> <chr>      <chr>
#> 1 A_T1_1    -6.33  3.08  27.8 1          T1
#> 2 B_T1_1     5.25 -2.96 -27.0 1          T1
#> 3 A_T1_10   -6.63  3.05  28.9 10         T1
#> 4 B_T1_10    4.96 -3.00 -24.6 10         T1
#> 5 A_T1_11   -5.01  3.78  30.2 11         T1
#> 6 B_T1_11    4.93 -2.61 -27.1 11         T1
# }