Type: Package
Title: Block-Wise Rank in Similarity Graph Edge-Count Two-Sample Test (BRISE)
Version: 0.1.0
Maintainer: Kejian Zhang <kejianzhang@u.nus.edu>
Description: Implements the Block-wise Rank in Similarity Graph Edge-count test (BRISE), a rank-based two-sample test designed for block-wise missing data. The method constructs (pattern) pair-wise similarity graphs and derives quadratic test statistics with asymptotic chi-square distribution or permutation-based p-values. It provides both vectorized and congregated versions for flexible inference. The methodology is described in Zhang, Liang, Maile, and Zhou (2025) <doi:10.48550/arXiv.2508.17411>.
License: GPL-2 | GPL-3 [expanded from: GPL (≥ 2)]
Encoding: UTF-8
Depends: R (≥ 3.5.0)
Imports: stats
Suggests: testthat (≥ 3.0.0), knitr, rmarkdown
RoxygenNote: 7.3.3
Config/testthat/edition: 3
NeedsCompilation: no
Packaged: 2025-09-24 10:54:58 UTC; keenov
Author: Kejian Zhang [aut, cre], Doudou Zhou [aut]
Repository: CRAN
Date/Publication: 2025-10-01 07:00:26 UTC

Block-wise Rank In Similarity graph Edge-count (BRISE) Test

Description

BRISE implements a two-sample test that handles block-wise missingness. It identifies missing-data patterns, constructs a block-wise dissimilarity matrix, induces ranks via a k-nearest-neighbor-style graph, and computes a quadratic statistic in one of two versions: the congregated form (‘con’) or the vectorized form (‘vec’). Permutation p-values are optionally available.

Usage

BRISE(
  X = NULL,
  Y = NULL,
  D = NULL,
  ptn_list = NULL,
  k = 10,
  perm = 0,
  skip = 1,
  ver = "con"
)

Arguments

X

Numeric matrix (m × p) of observations for X (Sample 1). Optional if D and ptn_list are provided.

Y

Numeric matrix (n × p) of observations for Y (Sample 2). Optional if D and ptn_list are provided.

D

Numeric square dissimilarity matrix (N × N), where N = m + n. Required when X and Y are not given.

ptn_list

List of integer vectors. Each element contains indices (1…N) of observations that share the same missing-data pattern.

k

Positive integer. Neighborhood size offset for rank truncation in nearest-neighbor ranking. Default is 10.

perm

Integer. Number of permutations for computing permutation p-value. Default is 0 (no permutation).

skip

Integer (0 or 1). When set to 1 (default), the rank-based dissimilarity is skipped for pairs with no shared observed variables; when set to 0, it is computed for those pairs (slower).

ver

Character. Version of the test statistic: "con" (congregated form, default) or "vec" (vectorized form).
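
The ptn_list argument groups row indices of the pooled sample by missing-data pattern. The following is a minimal sketch of how such a list could be assembled by hand, assuming missing entries are coded as NA in a pooled matrix Z (both Z and the NA coding are assumptions made for illustration; the package's own input convention may differ).

```r
# Assumption: NA marks a missing entry in this toy pooled sample (N = 6, p = 4).
Z <- matrix(1, nrow = 6, ncol = 4)
Z[1:2, 1:2] <- NA                       # rows 1-2 miss the first block of columns
Z[5:6, 3:4] <- NA                       # rows 5-6 miss the second block of columns

# One string key per row recording which entries are observed (1) or missing (0).
pattern_key <- apply(!is.na(Z), 1, function(obs) paste(as.integer(obs), collapse = ""))

# Group the row indices 1..N by key; unname() leaves a plain list of integer vectors.
ptn_list <- unname(split(seq_len(nrow(Z)), pattern_key))
print(ptn_list)
```

Each element of the resulting list holds the indices of observations that share one pattern, which is the shape described for ptn_list above.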

Details

If both X and Y are supplied, Identify_mods is used to detect missing patterns and reorganize variables by modality. The dissimilarity matrix D is then constructed via Blockdist. Patterns with too few observations in either sample (e.g. fewer than 2) and patterns that are very small relative to the largest pattern are filtered out for robustness. A symmetric rank matrix is built from truncated nearest-neighbor ranks. Under ver="con" the congregated statistic (two degrees of freedom) is used; under ver="vec" a higher-dimensional vector statistic is used. Asymptotic p-values use chi-square approximations; if perm > 0, empirical permutation p-values are also computed.
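
To make the idea of a symmetric matrix of truncated nearest-neighbor ranks concrete, here is a rough sketch that ignores the pattern structure entirely. The helper truncated_rank_matrix is hypothetical. The weighting (nearest neighbor gets weight k, the k-th nearest gets 1, all others 0) and the averaging used to symmetrize are assumptions, not a description of the package internals.

```r
# Hypothetical helper, not part of the package.
truncated_rank_matrix <- function(D, k) {
  N <- nrow(D)
  R <- matrix(0, N, N)
  for (i in seq_len(N)) {
    d <- D[i, ]
    d[i] <- Inf                               # never select i as its own neighbor
    nn <- order(d)[seq_len(min(k, N - 1))]    # indices of the k nearest points
    R[i, nn] <- rev(seq_along(nn))            # nearest point gets the largest weight
  }
  (R + t(R)) / 2                              # symmetrize by averaging
}

set.seed(1)
pts <- matrix(rnorm(20), nrow = 10)           # 10 toy points in 2 dimensions
D <- as.matrix(dist(pts))                     # Euclidean dissimilarities
R <- truncated_rank_matrix(D, k = 3)
```

In this sketch every row contributes the weights k, k - 1, ..., 1, so entries beyond the k nearest neighbors are zero and the diagonal stays zero.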

Value

A list with elements:

test.statistic

Numeric. The computed test statistic.

pval.approx

Numeric. Asymptotic p-value (chi-square based).

Cov

Covariance matrix used in computing the test statistic.

pval.perm

(Optional) Permutation p-value if perm > 0.

References

Zhang, K., Liang, M., Maile, R. & Zhou, D. (2025). Two-Sample Testing with Block-wise Missingness in Multi-source Data. arXiv preprint arXiv:2508.17411.

See Also

BRISE_Rank, Cov_mu.c, Cov_mu.v

Examples

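set.seed(1)
# Two samples of 50 observations on 200 features; Y has a mean shift of 0.3.
X <- matrix(rnorm(50*200, mean = 0), nrow = 50)
Y <- matrix(rnorm(50*200, mean = 0.3), nrow = 50)
# Zero out whole blocks of columns to mimic block-wise missingness.
X[1:20, 1:100] <- 0
X[30:50, 101:200] <- 0
Y[1:10, 1:100] <- 0
Y[30:40, 101:200] <- 0
# Congregated test with 1000 permutations.
out <- BRISE(X = X, Y = Y, k = 5, perm = 1000, ver = "con")
print(out$test.statistic)
print(out$pval.approx)
print(out$pval.perm)  # returned because perm > 0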



Rank Induction within- and cross-pattern similarity blocks

Description

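Compute row-wise ranks of a similarity matrix for two cases:

"row": a within-pattern block (Sii), which is square and has its diagonal suppressed.

"rowij": a cross-pattern block (Sij), which is rectangular and has no diagonal to suppress.

Ranks are computed row-wise with rank() and then shifted by 1 (i.e., the function returns rank - 1).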

Usage

BRISE_Rank(S, method = "row")

Arguments

S

Numeric similarity matrix: (Sii) (square) when method = "row"; (Sij) (rectangular) when method = "rowij". Larger values indicate greater similarity.

method

Character, either "row" (within-pattern (Sii), diagonal suppressed) or "rowij" (cross-pattern (Sij), no diagonal to suppress).

Value

A numeric matrix with the same dimensions as S containing row-wise ranks minus one.
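
The row-wise ranking described above can be imitated in base R. This is only an illustrative stand-in: row_rank_sketch is a hypothetical name, and setting the diagonal to -Inf is one assumed way of suppressing it, so the package's BRISE_Rank may behave differently (for instance in how ties are handled).

```r
# Hypothetical stand-in for illustration; not the package's BRISE_Rank.
row_rank_sketch <- function(S, method = "row") {
  if (method == "row") diag(S) <- -Inf   # assumed way to suppress the diagonal
  t(apply(S, 1, rank)) - 1               # rank each row, then shift by one
}

# Toy within-pattern similarity block (larger means more similar).
S <- matrix(c(0, 3, 1,
              3, 0, 2,
              1, 2, 0), nrow = 3, byrow = TRUE)
R_within <- row_rank_sketch(S, method = "row")

# Toy cross-pattern block: 2 observations against 3 observations.
S_cross <- matrix(c(5, 1, 3,
                    2, 6, 4), nrow = 2, byrow = TRUE)
R_cross <- row_rank_sketch(S_cross, method = "rowij")
```

With the diagonal pushed to -Inf, each diagonal entry receives rank 1 and therefore 0 after the shift, while the most similar neighbor in a row receives the largest value.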


Block-wise Statistic (Congregated Form)

Description

For the congregated version of BRISE (“con”), computes the within-sample sums of the rank matrix R (i.e. Ux, Uy) over all observations in X and Y.

Usage

BRISE_c.stat(R, sample1ID, sample2ID)

Arguments

R

Numeric symmetric rank matrix with zero diagonal.

sample1ID

Integer vector of indices for X.

sample2ID

Integer vector of indices for Y.

Value

Numeric vector c(Ux, Uy), the within-sample rank sums for the two samples.
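
As a rough illustration of within-sample rank sums, the sketch below adds up the entries of R among the X indices and among the Y indices. The helper within_sample_sums is hypothetical, and whether the package counts each unordered pair once or twice is not stated here, so the scaling is an assumption.

```r
# Hypothetical helper: sums of R restricted to each sample.
within_sample_sums <- function(R, sample1ID, sample2ID) {
  c(Ux = sum(R[sample1ID, sample1ID]),
    Uy = sum(R[sample2ID, sample2ID]))
}

# Toy symmetric rank matrix with zero diagonal (N = 4).
R <- matrix(0, 4, 4)
R[1, 2] <- R[2, 1] <- 2     # pair inside sample 1
R[3, 4] <- R[4, 3] <- 1     # pair inside sample 2
R[1, 3] <- R[3, 1] <- 5     # between-sample entry, ignored by both sums

U <- within_sample_sums(R, sample1ID = 1:2, sample2ID = 3:4)
```

Because R is symmetric, each within-sample pair contributes twice to these sums in the sketch.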


Block-wise Statistic (Vectorized Form)

Description

For the vectorized version of BRISE (“vec”), computes the within-sample rank sums for each pair of patterns. Returns a concatenated vector of (Ux_ab, Uy_ab) for all blocks (a, b) with a>b.

Usage

BRISE_v.stat(R, sample1ID, sample2ID, ptn_list)

Arguments

R

Numeric symmetric rank matrix (N × N).

sample1ID

Integer vector. Indices of observations in X.

sample2ID

Integer vector. Indices of observations in Y.

ptn_list

List of integer vectors: each element indexes observations sharing the same missing pattern.

Value

Numeric vector containing the sums of R entries within X and Y, for each pattern pair.
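
The pattern-pair sums can be sketched in the same spirit. The helper pattern_pair_sums is hypothetical; it follows the a > b indexing stated in the description and splits each pattern into its X and Y members with intersect(). The ordering and naming of the output are assumptions.

```r
# Hypothetical helper: within-sample sums of R for each pattern pair (a, b), a > b.
pattern_pair_sums <- function(R, sample1ID, sample2ID, ptn_list) {
  out <- numeric(0)
  for (a in seq_along(ptn_list)) {
    for (b in seq_len(a - 1)) {
      xa <- intersect(ptn_list[[a]], sample1ID)   # X members of pattern a
      xb <- intersect(ptn_list[[b]], sample1ID)   # X members of pattern b
      ya <- intersect(ptn_list[[a]], sample2ID)   # Y members of pattern a
      yb <- intersect(ptn_list[[b]], sample2ID)   # Y members of pattern b
      pair <- c(sum(R[xa, xb]), sum(R[ya, yb]))
      names(pair) <- paste0(c("Ux_", "Uy_"), a, b)
      out <- c(out, pair)
    }
  }
  out
}

# Toy data: 4 observations, 2 patterns, each with one X and one Y member.
R <- matrix(0, 4, 4)
R[1, 2] <- R[2, 1] <- 2
R[3, 4] <- R[4, 3] <- 1
ptn_list <- list(c(1, 3), c(2, 4))
V <- pattern_pair_sums(R, sample1ID = 1:2, sample2ID = 3:4, ptn_list = ptn_list)
```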


Block-wise Distance Matrix Construction

Description

Constructs a symmetric dissimilarity matrix that accounts for missing-data patterns. For pairs of observations that share a modality, standard Euclidean distances are used. For pairs without shared observed features (based on modality), a rank-based dissimilarity is computed only if skip = 0.

Usage

Blockdist(data, m, n, d, ptn_list, mod_id, modality, mod_bound, skip = 1)

Arguments

data

List with X and Y matrices.

m

Integer. Number of rows (observations) in X.

n

Integer. Number of rows in Y.

d

Integer. Number of features (columns).

ptn_list

List of integer vectors: each element indexes observations sharing the same missing pattern.

mod_id

Binary matrix (N × modality) indicating modality membership per observation.

modality

Integer. Number of modalities.

mod_bound

Integer vector. Feature index boundaries for each modality block.

skip

Integer (0 or 1). If set to 1, the dissimilarity for modality-disjoint pairs is skipped. If set to 0, a rank-based dissimilarity is computed for those pairs.

Value

Numeric symmetric matrix (N × N) of pairwise dissimilarities.
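
For a single pair of observations, the Euclidean part of the construction might look like the sketch below. The helper shared_modality_dist is hypothetical. It assumes mod_bound stores the last column index of each modality and that obs_x and obs_y are the corresponding rows of mod_id. It returns NA for modality-disjoint pairs instead of attempting the rank-based fallback, and any normalization the package may apply is not reproduced.

```r
# Hypothetical helper: Euclidean distance over columns of shared modalities.
shared_modality_dist <- function(x, y, obs_x, obs_y, mod_bound) {
  shared <- which(obs_x == 1 & obs_y == 1)             # modalities both observe
  if (length(shared) == 0) return(NA_real_)            # modality-disjoint pair
  starts <- c(1, mod_bound[-length(mod_bound)] + 1)    # first column of each modality
  cols <- unlist(lapply(shared, function(g) starts[g]:mod_bound[g]))
  sqrt(sum((x[cols] - y[cols])^2))
}

mod_bound <- c(2, 4)                    # modality 1 = columns 1-2, modality 2 = columns 3-4
x <- c(1, 2, 3, 4)
y <- c(1, 0, 9, 9)
d_shared   <- shared_modality_dist(x, y, obs_x = c(1, 1), obs_y = c(1, 0), mod_bound)
d_disjoint <- shared_modality_dist(x, y, obs_x = c(1, 0), obs_y = c(0, 1), mod_bound)
```

In the first call only modality 1 is shared, so the distance uses columns 1 and 2 alone.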


Covariance and Expectation (Congregated Form)

Description

Computes the 2×2 covariance matrix and expectation vector (mu) for the congregated BRISE statistic (Ux, Uy), under the pattern-wise permutation null distribution.

Usage

Cov_mu.c(R, m_, n_, ptn_list)

Arguments

R

Numeric symmetric rank matrix (N × N).

m_

Integer vector. Number of observations from X in each pattern.

n_

Integer vector. Number of observations from Y in each pattern.

ptn_list

List of integer vectors: each element indexes observations sharing the same missing pattern.

Value

List with two elements:

Cov

2×2 covariance matrix for (Ux, Uy).

mu

Numeric vector of length 2 giving the expected values of (Ux, Uy) under the null.
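
The way a covariance matrix and an expectation vector can feed a quadratic statistic with a chi-square approximation is sketched below. The centered quadratic form and the use of two degrees of freedom are assumptions based on the description of the congregated test. The helper quad_stat and the numbers are made up, and the element names simply mirror those listed for BRISE.

```r
# Hypothetical helper: centered quadratic form with a chi-square(2) p-value.
quad_stat <- function(U, mu, Cov) {
  diff <- U - mu
  stat <- as.numeric(t(diff) %*% solve(Cov, diff))   # (U - mu)' Cov^{-1} (U - mu)
  list(test.statistic = stat,
       pval.approx = pchisq(stat, df = 2, lower.tail = FALSE))
}

# Made-up inputs for illustration only.
U   <- c(12, 9)
mu  <- c(10, 10)
Cov <- matrix(c(4, 1,
                1, 4), nrow = 2, byrow = TRUE)
res <- quad_stat(U, mu, Cov)
```

Using solve(Cov, diff) avoids forming the inverse explicitly, which is the usual choice for small quadratic forms like this one.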


Covariance and Expectation (Vectorized Form)

Description

Computes the asymptotic covariance matrix and expectation vector (mu) for the vectorized BRISE statistic under the pattern-wise permutation null distribution, based on the rank matrix R and the list of pattern indices. Used to form the quadratic statistic and its chi-square approximation.

Usage

Cov_mu.v(R, m_, n_, ptn_list)

Arguments

R

Numeric symmetric rank matrix (N × N).

m_

Integer vector. Number of observations from X in each pattern.

n_

Integer vector. Number of observations from Y in each pattern.

ptn_list

List of integer vectors: each element indexes observations sharing the same missing pattern.

Value

List with two elements:

Cov

Covariance matrix corresponding to the vector of pair-wise statistics.

mu

Expectation vector for those pair-wise statistics under the null.
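
One way to build intuition for the pattern-wise permutation null is to estimate the mean and covariance of a statistic by Monte Carlo, shuffling sample labels only within each pattern. The sketch below does that for a toy pattern-pair statistic. It is a sanity-check device under stated assumptions, not the analytic formulas the package uses, and permute_within_patterns, mc_null_moments, and stat_fun are all hypothetical names.

```r
# Hypothetical helper: shuffle sample labels separately inside each pattern.
permute_within_patterns <- function(labels, ptn_list) {
  for (idx in ptn_list) {
    labels[idx] <- labels[idx][sample.int(length(idx))]
  }
  labels
}

# Hypothetical helper: Monte Carlo mean and covariance of a statistic under the null.
mc_null_moments <- function(stat_fun, labels, ptn_list, B = 500) {
  draws <- replicate(B, stat_fun(permute_within_patterns(labels, ptn_list)))
  draws <- matrix(draws, ncol = B)              # one column per permutation
  list(mu = rowMeans(draws), Cov = cov(t(draws)))
}

set.seed(1)
N <- 12
R <- matrix(runif(N * N), N)
R <- (R + t(R)) / 2                             # toy symmetric matrix standing in for ranks
diag(R) <- 0
ptn_list <- list(1:6, 7:12)                     # two patterns of six observations
labels <- rep(c(1, 2, 1, 2), each = 3)          # 3 from X and 3 from Y in each pattern
in1 <- seq_len(N) %in% ptn_list[[1]]

# Toy statistic: within-sample sums of R between pattern 2 and pattern 1.
stat_fun <- function(lab) {
  c(sum(R[lab == 1 & !in1, lab == 1 & in1]),
    sum(R[lab == 2 & !in1, lab == 2 & in1]))
}

mom <- mc_null_moments(stat_fun, labels, ptn_list, B = 500)
perm_labels <- permute_within_patterns(labels, ptn_list)
```

Permuting within patterns keeps the number of X and Y observations in each pattern fixed, which is why m_ and n_ are natural inputs to an analytic version.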


Identify Data Modalities

Description

Detects modalities across the combined data (samples X and Y), rearranges variables/columns by modality, and produces identification structures used downstream for blockwise operations.

Usage

Identify_mods(data, m, n, d)

Arguments

data

List with components X and Y (numeric matrices).

m

Integer. Number of rows in X.

n

Integer. Number of rows in Y.

d

Integer. Number of features (columns) in X (and Y).

Value

List with components:

rearr_data

List with rearranged X, Y after grouping features by modality.

modality

Integer. Number of distinct missing-data modalities.

mod_bound

Integer vector. Cumulative boundaries of modalities among the features.

mod_id

Binary matrix (N × modality) indicating, for each observation, whether each modality is observed (1) or missing (0).
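
The structures listed above can be mimicked on toy data. The sketch below groups columns that are missing for exactly the same observations into one modality, then derives a column reordering, cumulative boundaries, and an indicator matrix. It assumes NA marks a missing entry and that boundaries store the last column of each modality; both are assumptions for illustration, and the package's detection rule may differ.

```r
# Assumption: NA marks a missing entry in this toy pooled matrix (N = 5, d = 6).
Z <- matrix(1, nrow = 5, ncol = 6)
Z[1:2, c(1, 4)] <- NA          # columns 1 and 4 are missing together
Z[4:5, c(2, 3)] <- NA          # columns 2 and 3 are missing together

# Columns with an identical missingness profile across rows form one modality.
col_key <- apply(is.na(Z), 2, function(miss) paste(as.integer(miss), collapse = ""))
mod_of  <- match(col_key, unique(col_key))     # modality label for each column

col_order <- order(mod_of)                     # column order grouped by modality
rearr     <- Z[, col_order]                    # rearranged data
mod_bound <- cumsum(tabulate(mod_of))          # last column index of each modality
modality  <- length(mod_bound)                 # number of modalities

# mod_id[i, g] is 1 if observation i has modality g observed, else 0.
mod_id <- sapply(seq_len(modality), function(g) {
  as.integer(!is.na(Z[, which(mod_of == g)[1]]))
})
```

Here columns 1 and 4 form the first modality, columns 2 and 3 the second, and the fully observed columns 5 and 6 the third.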