| Title: | Program for Inferring Immunoglobulin Allele Similarity Clusters and Genotypes |
| Version: | 1.2.0 |
| Author: | Ayelet Peres [aut, cre], William Lees [aut], Gur Yaari [aut, cph] |
| Maintainer: | Ayelet Peres <ayelet.peres@yale.edu> |
| Description: | Improves genotype inference and downstream Adaptive Immune Receptor Repertoire Sequence data analysis. Inference of allele similarity clusters, an alternative naming scheme and genotype inference for immunoglobulin heavy chain repertoires. The main tools are allele similarity clusters and allele-based genotype inference. The first tool is designed to reduce the ambiguity within the immunoglobulin heavy chain V alleles. The ambiguity is caused by duplicated or similar alleles which are shared among different genes. The second tool is an allele-based genotype that determines the presence of an allele based on a threshold derived from a naive population. See Peres et al. (2023) <doi:10.1093/nar/gkad603>. |
| License: | CC BY-SA 4.0 |
| Encoding: | UTF-8 |
| Depends: | R (≥ 3.5.0) |
| LinkingTo: | Rcpp |
| SystemRequirements: | GNU make |
| Imports: | Biostrings (≥ 2.62.0), DECIPHER (≥ 2.22.0), alakazam (≥ 1.2.0), dendextend (≥ 1.9.0), data.table (≥ 1.12.2), tigger (≥ 1.0.0), methods (≥ 3.4.4), rlang (≥ 0.4.0), zen4R (≥ 0.7), RColorBrewer (≥ 1.1.2), ggplot2 (≥ 3.3.6), circlize (≥ 0.4.15), R6 (≥ 2.5.1), jsonlite (≥ 1.8.3), Rcpp (≥ 0.11.0), magrittr, igraph (≥ 1.3.0), stringdist (≥ 0.9.0), cluster (≥ 2.1.0), ape (≥ 5.0) |
| Suggests: | knitr, rmarkdown, tidyr, htmltools, stringi, bookdown, ComplexHeatmap, dplyr, ggtree (≥ 3.0.0), testthat (≥ 3.0.0), parallel |
| RoxygenNote: | 7.3.3 |
| Collate: | 'Data.R' 'GermlineCluster-class.R' 'RcppExports.R' 'piglet.R' 'allele_cluster.R' 'utils.R' 'allele_genotype.R' 'community_detection.R' 'piglet-package.R' 'utils-pipe.R' 'visualization.R' |
| LazyData: | true |
| BuildVignettes: | true |
| VignetteBuilder: | knitr |
| Config/testthat/edition: | 3 |
| NeedsCompilation: | yes |
| Packaged: | 2026-02-13 16:44:03 UTC; ayelet |
| Repository: | CRAN |
| Date/Publication: | 2026-02-17 22:30:02 UTC |
piglet: Program for Inferring Immunoglobulin Allele Similarity Clusters and Genotypes
Description
Improves genotype inference and downstream Adaptive Immune Receptor Repertoire Sequence data analysis. Inference of allele similarity clusters, an alternative naming scheme and genotype inference for immunoglobulin heavy chain repertoires. The main tools are allele similarity clusters and allele-based genotype inference. The first tool is designed to reduce the ambiguity within the immunoglobulin heavy chain V alleles. The ambiguity is caused by duplicated or similar alleles which are shared among different genes. The second tool is an allele-based genotype that determines the presence of an allele based on a threshold derived from a naive population. See Peres et al. (2023) doi:10.1093/nar/gkad603.
Author(s)
Maintainer: Ayelet Peres ayelet.peres@yale.edu
Authors:
William Lees william@lees.org.uk
Gur Yaari gur.yaari@yale.edu [copyright holder]
Pipe operator
Description
See magrittr::%>% for details.
Usage
lhs %>% rhs
Arguments
lhs |
A value or the magrittr placeholder. |
rhs |
A function call using the magrittr semantics. |
Value
The result of calling rhs(lhs).
Create IUIS labels with markers for split groups
Description
Internal function to create IUIS labels with superscript markers when multiple ASC groups split a single IUIS subgroup.
Usage
.create_iuis_labels_with_markers(iuis_subgroups, asc_subgroups)
Arguments
iuis_subgroups |
Vector of IUIS subgroup names |
asc_subgroups |
Vector of corresponding ASC subgroup names |
Value
Character vector of labels with markers
Find resolution for target cluster count
Description
Uses binary search to find a resolution parameter that produces approximately the target number of clusters.
Usage
.getNClusters(
g,
n_cluster,
range_min = 0,
range_max = 6,
max_steps = 20,
method = "leiden"
)
Arguments
g |
An igraph graph object with weighted edges |
n_cluster |
Target number of clusters |
range_min |
Minimum resolution to search. Default is 0. |
range_max |
Maximum resolution to search. Default is 6. |
max_steps |
Maximum number of search iterations. Default is 20. |
method |
Community detection method: "leiden" or "louvain". Default is "leiden". |
Value
A list containing:
partition: The community detection result
clusters: Number of clusters found
best_resolution: The resolution parameter used
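The binary search over the resolution parameter can be illustrated with a minimal base R sketch. It does not call the package: `count_clusters()` is a hypothetical stand-in for a community detection run, assumed to return more clusters as the resolution grows, and `find_resolution_sketch()` is an illustrative name rather than the internal implementation.

```r
# Hypothetical stand-in for a community detection call: returns the number of
# clusters found at a given resolution (assumed to increase with resolution).
count_clusters <- function(resolution) {
  1L + as.integer(floor(resolution * 4))
}

# Bisect the resolution range until the target cluster count is reached
# or the step budget is exhausted.
find_resolution_sketch <- function(n_cluster, range_min = 0, range_max = 6,
                                   max_steps = 20) {
  lo <- range_min
  hi <- range_max
  mid <- NA_real_
  found <- NA_integer_
  for (step in seq_len(max_steps)) {
    mid <- (lo + hi) / 2
    found <- count_clusters(mid)
    if (found == n_cluster) break
    # Too few clusters: search higher resolutions; too many: search lower ones.
    if (found < n_cluster) lo <- mid else hi <- mid
  }
  list(clusters = found, best_resolution = mid)
}

res <- find_resolution_sketch(n_cluster = 10)
print(res)
```

With a real graph the cluster count is not guaranteed to be monotone in the resolution, which is presumably why the function is described as finding only approximately the target number of clusters.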
GermlineCluster class
Description
An S3 class returned by inferAlleleClusters that stores allele
similarity clusters and related objects.
Human IGHV germlines
Description
A character vector of all 498 human IGHV germline gene segment alleles
in IMGT Gene-db release July 2022, with an additional 25 undocumented alleles from VDJbase.
Usage
HVGERM
Format
Values correspond to IMGT-gapped nucleotide sequences (with nucleotides capitalized and gaps represented by '.').
References
Xochelli et al. (2014) Immunoglobulin heavy variable (IGHV) genes and alleles: new entities, new names and implications for research and prognostication in chronic lymphocytic leukaemia. Immunogenetics. 67(1):61-6.
Allele similarity cluster naming scheme
Description
For a given cluster, the function collapses similar sequences and renames the sequences based on the ASC naming scheme.
Usage
alleleClusterNames(cluster, allele.cluster.table, germ.dist, chain, segment)
Arguments
cluster |
A vector with the cluster identifier - the family and allele cluster number. |
allele.cluster.table |
A data.frame with the list of all germline sequences and their clusters. |
germ.dist |
A matrix with the germline distance between the germline set sequences. |
chain |
A character with the chain identifier: IGH/IGL/IGK/TRB/TRA... (Currently only IGH is supported) |
segment |
A character with the segment identifier: IGHV/IGHD/IGHJ.... (Currently only IGHV is supported) |
Value
A data.frame with the clusters renamed alleles based on the ASC scheme.
Allele similarity cluster table
Description
A data.table of the allele similarity cluster table based on the
HVGERM and hv_functionality germline reference set. This is not the latest
version of the allele similarity cluster table. For the latest version, refer to the
Zenodo DOI or use the recentAlleleClusters function.
Usage
allele_cluster_table
Format
An object of class data.table (inherits from data.frame) with 286 rows and 5 columns.
References
Peres, et al (2022) doi:10.1101/2022.12.26.521922
Alleles nucleotide position difference
Description
Compares the sequences of two alleles (reference and sample alleles) and returns the differential nucleotide positions of the sample allele.
Usage
allele_diff(
reference_allele,
sample_allele,
position_threshold = 0,
snps = TRUE
)
Arguments
reference_allele |
The nucleotide sequence of the reference allele, character object. |
sample_allele |
The nucleotide sequence of the sample allele, character object. |
position_threshold |
The position from which to start checking for differential positions. If zero, all positions are checked. Default is zero. |
snps |
If TRUE, returns each SNP with its position (e.g., A2G, where A is the reference base and G is the sample base). If FALSE, returns only the positions. Default is TRUE. |
Details
The function uses C++ code to reduce the run time of large comparisons.
Value
A character vector of the differential nucleotide positions of the sample allele.
Examples
{
reference_allele = "AAGG"
sample_allele = "ATGA"
# setting position_threshold = 0 will return all differences
diff <- allele_diff(reference_allele, sample_allele)
# "A2T", "G4A"
print(diff)
# setting position_threshold = 3 will return the differences from position three onward
diff <- allele_diff(reference_allele, sample_allele, position_threshold = 3)
# "G4A"
print(diff)
# setting snps = FALSE will return the differences as indices
diff <- allele_diff(reference_allele, sample_allele, snps = FALSE)
# 2, 4
print(diff)
}
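For readers who want to see the comparison logic spelled out, here is a minimal base R sketch of the same idea. It is not the package's C++ implementation: `diff_positions_sketch()` is a hypothetical helper that, as a simplifying assumption, treats gap and ambiguous characters like any other character.

```r
# Illustrative re-implementation in base R (not the package code).
diff_positions_sketch <- function(reference_allele, sample_allele,
                                  position_threshold = 0, snps = TRUE) {
  ref <- strsplit(reference_allele, "")[[1]]
  smp <- strsplit(sample_allele, "")[[1]]
  # Compare only the overlapping part if the lengths differ.
  n <- min(length(ref), length(smp))
  pos <- which(ref[seq_len(n)] != smp[seq_len(n)])
  # Keep differences at or after the threshold position.
  pos <- pos[pos >= position_threshold]
  if (snps) paste0(ref[pos], pos, smp[pos]) else pos
}

print(diff_positions_sketch("AAGG", "ATGA"))
print(diff_positions_sketch("AAGG", "ATGA", position_threshold = 3))
print(diff_positions_sketch("AAGG", "ATGA", snps = FALSE))
```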
Calculate differences between characters in columns of germs and return their indices as an int vector.
Description
Calculate differences between characters in columns of germs and return their indices as an int vector.
Usage
allele_diff_indices(germs, X = 0L, non_mismatch_chars_nullable = NULL)
Arguments
germs |
A vector of strings representing germ sequences. |
X |
The threshold index from which to return differences as indices. |
non_mismatch_chars_nullable |
A set of characters that are ignored when comparing sequences (default: 'N', '.', '-'). |
Value
A vector of integers containing indices of differing columns.
Examples
germs = c("ATCG", "ATCC")
X = 3
result = allele_diff_indices(germs, X)
# the two sequences differ only at position 4
Calculate SNPs or their count for each germline-input sequence pair with optional parallel execution.
Description
Calculate SNPs or their count for each germline-input sequence pair with optional parallel execution.
Usage
allele_diff_indices_parallel(
germs,
inputs,
X = 0L,
parallel = FALSE,
return_count = FALSE
)
Arguments
germs |
A vector of strings representing germline sequences. |
inputs |
A vector of strings representing input sequences. |
X |
The threshold index from which to return SNP indices or counts (default: 0). |
parallel |
A boolean flag to enable parallel processing (default: FALSE). |
return_count |
A boolean flag to return the count of mutations instead of their indices (default: FALSE). |
Value
A list of integer vectors (if return_count = FALSE) or a vector of integers (if return_count = TRUE).
Calculate SNPs or their count for each germline-input sequence pair with optional parallel execution.
Description
This function compares germline sequences (germs) and input sequences (inputs)
and identifies single nucleotide polymorphisms (SNPs) or their counts, with optional parallel execution.
The comparison ignores specified non-mismatch characters (e.g., gaps or ambiguous bases).
Usage
allele_diff_indices_parallel2(
germs,
inputs,
X = 0L,
parallel = FALSE,
return_count = FALSE,
non_mismatch_chars_nullable = NULL
)
Arguments
germs |
A vector of strings representing germline sequences. |
inputs |
A vector of strings representing input sequences. |
X |
The threshold index from which to return SNP indices or counts (default: 0). |
parallel |
A boolean flag to enable parallel processing (default: FALSE). |
return_count |
A boolean flag to return the count of mutations instead of their indices (default: FALSE). |
non_mismatch_chars_nullable |
A set of characters that are ignored when comparing sequences (default: 'N', '.', '-'). |
Value
A list of integer vectors (if return_count = FALSE) or a vector of integers (if return_count = TRUE).
Examples
# Example usage
germs <- c("ATCG", "ATCC")
inputs <- c("ATTG", "ATTA")
X <- 0
# Return indices of SNPs
result_indices <- allele_diff_indices_parallel2(germs, inputs, X,
parallel = TRUE, return_count = FALSE)
print(result_indices) # ATCG vs ATTG differ at position 3; ATCC vs ATTA differ at positions 3 and 4
# Return counts of SNPs
result_counts <- allele_diff_indices_parallel2(germs, inputs, X,
parallel = FALSE, return_count = TRUE)
print(result_counts) # c(1, 2)
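The effect of the ignored characters is easiest to see on a small case. The base R sketch below is a hypothetical helper, not the package's Rcpp code: it assumes germline and input sequences are paired element-wise, that positions are 1-based, and that a position is skipped whenever either sequence carries one of the ignored characters.

```r
# Illustrative pairwise comparison in base R (assumptions noted above).
pair_diff_sketch <- function(germs, inputs, X = 0L, return_count = FALSE,
                             ignore = c("N", ".", "-")) {
  stopifnot(length(germs) == length(inputs))
  out <- Map(function(g, s) {
    g <- strsplit(g, "")[[1]]
    s <- strsplit(s, "")[[1]]
    n <- min(length(g), length(s))
    g <- g[seq_len(n)]
    s <- s[seq_len(n)]
    # A mismatch counts only when neither base is an ignored character.
    pos <- which(g != s & !(g %in% ignore) & !(s %in% ignore))
    pos[pos >= X]
  }, germs, inputs)
  out <- unname(out)
  if (return_count) lengths(out) else out
}

germs <- c("ATCG", "ATCC")
inputs <- c("ATTG", "ATTA")
print(pair_diff_sketch(germs, inputs))
print(pair_diff_sketch(germs, inputs, return_count = TRUE))
# The gap at position 2 is skipped, leaving only the mismatch at position 4.
print(pair_diff_sketch("A.CG", "ATCA"))
```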
Calculate differences between characters in columns of germs and return them as a string vector.
Description
Calculate differences between characters in columns of germs and return them as a string vector.
Usage
allele_diff_strings(germs, X = 0L, non_mismatch_chars_nullable = NULL)
Arguments
germs |
A vector of strings representing germ sequences. |
X |
The threshold index from which to return differences as strings. |
non_mismatch_chars_nullable |
A set of characters that are ignored when comparing sequences (default: 'N', '.', '-'). |
Value
A vector of strings containing differences between characters in columns.
Examples
germs = c("ATCG", "ATCC")
X = 3
result = allele_diff_strings(germs, X)
# the two sequences differ only at position 4 (G in the first, C in the second)
Allele thresholds table
Description
A data.table of the allele thresholds table. The V alleles are based on the
HVGERM and hv_functionality germline reference set. The D, and the J are based on
the AIRR-C reference set (https://zenodo.org/records/10489725). The table contains these columns: allele - the IUIS allele name,
asc_allele - the allele name based on allele similarity clusters (only for V), threshold - the genotype threshold for the alleles.
Usage
allele_threshold_table
Format
An object of class data.table (inherits from data.frame) with 262 rows and 4 columns.
References
Peres, et al (2022) doi:10.1101/2022.12.26.521922
FWR1 artificial dataset generator
Description
A function to artificially create an IGHV reference set with framework1 (FWR1) primers (see Details).
Usage
artificialFRW1Germline(
germline_set,
mask_primer = TRUE,
trimm_primer = FALSE,
quite = FALSE
)
Arguments
germline_set |
A germline set distance matrix created by |
mask_primer |
Logical (TRUE by default). If TRUE, the primer region of the germline sequence is masked with Ns. |
trimm_primer |
Logical (FALSE by default). If TRUE, the primer region is trimmed from the germline sequence and mask_primer is ignored. |
quite |
Logical (FALSE by default). If TRUE, informative messages are suppressed. |
Details
The FRW1 primers used in this function were taken from the BIOMED-2 protocol. For more information on the protocol and primer design, see: van Dongen, J., Langerak, A., Brüggemann, M. et al. Design and standardization of PCR primers and protocols for detection of clonal immunoglobulin and T-cell receptor gene recombinations in suspect lymphoproliferations: Report of the BIOMED-2 Concerted Action BMH4-CT98-3936. Leukemia 17, 2257–2317 (2003). https://doi.org/10.1038/sj.leu.2403202
Value
A list with the input germline set allele and the trimmed/masked sequences.
Assign allele similarity clusters
Description
assignAlleleClusters uses the allele clusters annotation to change the preliminary allele
assignments to the new annotations before inferring a genotype.
Usage
assignAlleleClusters(
data,
alleleClusterTable,
v_call = "v_call",
from_col = "imgt_allele",
to_col = "new_allele"
)
Arguments
data |
data.frame in AIRR format, containing V allele calls from a single subject and the sample IMGT-gapped V(D)J sequences under seq. |
alleleClusterTable |
A data.frame of the allele clusters new annotations relative to the original reference set. See details. |
v_call |
name of the V allele call column. Default is |
from_col |
name of the column in alleleClusterTable to use as the source for the dictionary. Default is |
to_col |
name of the column in alleleClusterTable to use as the target for the dictionary. Default is |
Value
A modified input data.frame with the new assigned
Examples
# preferably obtain the latest ASC cluster table
# asc_archive <- recentAlleleClusters(doi="10.5281/zenodo.7429773", get_file = TRUE)
# allele_cluster_table <- extractASCTable(archive_file = asc_archive)
# example allele similarity cluster table
data(allele_cluster_table)
# loading TIgGER AIRR-seq B cell data
data <- tigger::AIRRDb
asc_data <- assignAlleleClusters(data, allele_cluster_table)
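Conceptually, the from_col and to_col arguments define a lookup dictionary from the original allele names to the cluster-based names. The base R sketch below illustrates that idea only; the table contents are made-up toy values, `translate_calls_sketch()` is a hypothetical helper, and the handling of comma-separated multiple assignments and unknown alleles is an assumption rather than documented package behaviour.

```r
# Toy cluster table (values invented for illustration only).
cluster_table <- data.frame(
  imgt_allele = c("IGHV1-69*01", "IGHV1-69D*01", "IGHV3-23*01"),
  new_allele  = c("IGHVF1-G1*01", "IGHVF1-G1*01", "IGHVF2-G5*01"),
  stringsAsFactors = FALSE
)

translate_calls_sketch <- function(calls, table,
                                   from_col = "imgt_allele",
                                   to_col = "new_allele") {
  # Named vector used as a dictionary: names are the source alleles.
  dict <- setNames(table[[to_col]], table[[from_col]])
  vapply(strsplit(calls, ",", fixed = TRUE), function(alleles) {
    mapped <- dict[alleles]
    # Assumption: alleles missing from the table keep their original name.
    mapped[is.na(mapped)] <- alleles[is.na(mapped)]
    paste(unique(unname(mapped)), collapse = ",")
  }, character(1))
}

calls <- c("IGHV1-69*01,IGHV1-69D*01", "IGHV3-23*01", "IGHV9-99*01")
translated <- translate_calls_sketch(calls, cluster_table)
print(translated)
```

In this toy case the two calls that map to the same cluster name collapse into a single assignment, which is the kind of ambiguity reduction the cluster annotation is meant to provide.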
Compute distance matrix
Description
Compute a pairwise distance matrix between sequences using stringdist.
Usage
compute_distance(
sequences,
method = c("hamming", "lv"),
trim_3prime = NULL,
quiet = TRUE,
return_type = c("dist", "matrix")
)
Arguments
sequences |
A named character vector of sequences |
method |
Distance method: "hamming" or "lv" (Levenshtein). Default is "hamming". |
trim_3prime |
Optional position to trim sequences from 3' end |
quiet |
Logical. Suppress messages. Default is TRUE. |
return_type |
One of "dist" (default) or "matrix" |
Value
A dist object or matrix of pairwise distances
See Also
igDistance for more distance options
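As a reference point for what a Hamming distance matrix contains, here is a minimal base R sketch. It does not use stringdist and is not the package implementation; `hamming_matrix_sketch()` is a hypothetical helper that requires equal-length sequences.

```r
# Pairwise Hamming distances (number of mismatching positions) in base R.
hamming_matrix_sketch <- function(sequences) {
  chars <- strsplit(sequences, "")
  # Hamming distance is only defined for sequences of equal length.
  stopifnot(length(unique(lengths(chars))) == 1)
  n <- length(sequences)
  m <- matrix(0, n, n, dimnames = list(names(sequences), names(sequences)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      m[i, j] <- sum(chars[[i]] != chars[[j]])
    }
  }
  m
}

seqs <- c(a = "ATCG", b = "ATCC", c = "TTCC")
m <- hamming_matrix_sketch(seqs)
d <- as.dist(m)  # the "dist" form of the same information
print(m)
```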
Leiden community detection
Description
Performs community detection on a weighted graph using the Leiden algorithm with CPM (Constant Potts Model) objective function.
Usage
detect_communities_leiden(g, resolution = 1)
Arguments
g |
An igraph graph object with weighted edges |
resolution |
Resolution parameter for Leiden algorithm. Higher values produce more communities. Default is 1.0. |
Details
The Leiden algorithm is a community detection method that optimizes a quality function (here CPM). It guarantees connected communities and is generally faster than Louvain while producing better quality partitions.
Value
An igraph communities object
See Also
distance_to_graph, optimize_resolution
Examples
data(HVGERM)
d <- igDistance(HVGERM[1:10], method = "hamming")
g <- distance_to_graph(d)
comm <- detect_communities_leiden(g, resolution = 0.5)
Convert distance matrix to weighted graph
Description
Converts a distance matrix to a weighted igraph object using a log transform that spreads small distances and produces weights in [0,1].
Usage
distance_to_graph(distance_matrix)
Arguments
distance_matrix |
A distance matrix or dist object |
Details
The transformation uses a log-based similarity measure:
Normalize distances by max distance
Apply -log transform to convert to similarity
Normalize similarities to [0,1] range
Create weighted undirected graph
Value
An igraph object with weighted edges
See Also
detect_communities_leiden, igClust
Examples
data(HVGERM)
d <- igDistance(HVGERM[1:10], method = "hamming")
g <- distance_to_graph(d)
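The transformation steps listed in Details can be sketched in base R as follows. This is an approximation under stated assumptions, not the package code: `distance_to_weights_sketch()` is a hypothetical helper, and the small constant `eps` is an assumption added here so that zero distances do not produce an infinite logarithm.

```r
distance_to_weights_sketch <- function(distance_matrix, eps = 1e-3) {
  d <- as.matrix(distance_matrix)
  d_norm <- d / max(d)                              # normalize by max distance
  sim <- -log(d_norm + eps)                         # -log turns distance into similarity
  sim <- (sim - min(sim)) / (max(sim) - min(sim))   # rescale to [0, 1]
  diag(sim) <- 0                                    # assumption: no self-loops
  sim
}

ids <- c("a", "b", "c")
d <- matrix(c(0, 1, 2,
              1, 0, 1,
              2, 1, 0), nrow = 3, dimnames = list(ids, ids))
w <- distance_to_weights_sketch(d)
print(round(w, 3))
```

The resulting matrix can be read as a weighted adjacency matrix; with igraph installed, `igraph::graph_from_adjacency_matrix(w, mode = "undirected", weighted = TRUE)` would turn it into a graph object. Because of the log transform, small distances are spread over a wider range of weights than large ones.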
Extracts the allele cluster table from the archive file.
Description
Extracts the allele cluster table from the archive file.
Usage
extractASCTable(archive_file = NULL)
Arguments
archive_file |
A path to the ASC archive file. Default is NULL (see Details). |
Details
For downloading the latest archive file with the updated allele cluster table, use the function recentAlleleClusters.
Value
Returns the allele cluster table.
The table columns:
new_allele - the ASC given allele name
func_group - the ASC cluster number
imgt_allele - the original IUIS/IMGT allele name
thresh - the allele threshold for ASC-based genotype inference
amplicon_length - is the original length of the reference set.
Examples
asc_archive <- recentAlleleClusters(doi="10.5281/zenodo.7429773", get_file = TRUE)
allele_cluster_table <- extractASCTable(archive_file = asc_archive)
Generate allele similarity reference set
Description
Generates the allele clusters reference set based on the clustering from ighvClust. The function collapses similar alleles and assigns them to their respective allele clusters and family clusters. See Details for the naming scheme.
Usage
generateReferenceSet(
germline_distance,
germline_set,
alleleClusterTable,
trim_3prime_side = NULL
)
Arguments
germline_distance |
A germline set distance matrix created by ighvDistance. |
germline_set |
A character list of the IMGT aligned IGHV allele sequences. See details for curating options. |
alleleClusterTable |
A data.frame of the alleles and their clusters created by ighvClust. |
trim_3prime_side |
If a 3' trim position is supplied, duplicated sequences are checked for differential positions past the trim position. Default is NULL, which disables the check. See Details. |
Details
Each allele is named by this scheme: IGHVF1-G1*01, where IGH = chain, V = region, F1 = family cluster numbering, G1 = allele cluster numbering, and 01 = allele numbering (given by clustering order, with no connection to expression).
If alleles differ at a nucleotide position past the trimming position used for the clustering, the alleles are separated and annotated with the differentiating position. For example, say A1*01 and A1*02 are similar up to position 318 and are thus collapsed in the clusters to G1*01. Upon checking the sequences past the trim position (318), a differentiating nucleotide is seen at position 319: A1*01 has a G and A1*02 has a T. The alleles are then separated, and the new annotation is A1*01 = G1*01 and A1*02 = G1*01_G319T, where the first nucleotide indicates the original base, the number indicates the position, and the last nucleotide indicates the base it changed into.
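To make the naming scheme concrete, the following base R sketch splits an ASC-style name into its parts. `parse_asc_name_sketch()` is a hypothetical helper, not a package function, and the regular expression encodes an assumption about the name layout (an IG chain, one segment letter, F and G numbers, an allele number, and an optional SNP suffix after an underscore).

```r
parse_asc_name_sketch <- function(x) {
  # Split off an optional SNP suffix such as "_G319T".
  parts <- strsplit(x, "_", fixed = TRUE)[[1]]
  core <- parts[1]
  snp <- if (length(parts) > 1) parts[2] else NA_character_
  pattern <- "^(IG[HKL])([VDJ])F([0-9]+)-G([0-9]+)\\*([0-9]+)$"
  m <- regmatches(core, regexec(pattern, core))[[1]]
  if (length(m) == 0) stop("name does not follow the assumed ASC layout: ", x)
  list(
    chain   = m[2],
    region  = m[3],
    family  = as.integer(m[4]),  # F number: family cluster
    cluster = as.integer(m[5]),  # G number: allele cluster
    allele  = m[6],              # allele number within the cluster
    snp     = snp                # differentiating position past the trim, if any
  )
}

parsed <- parse_asc_name_sketch("IGHVF1-G1*01_G319T")
str(parsed)
```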
Value
A list with the re-named germline set, and a table of the allele clusters and thresholds.
Converts IGHV germline set to ASC germline set.
Description
Converts IGHV germline set to ASC germline set.
Usage
germlineASC(allele_cluster_table, germline)
Arguments
allele_cluster_table |
The allele cluster table. |
germline |
An IGHV germline set with matching names to the "imgt_allele" column in the allele_cluster_table. |
Value
Returns the IGHV germline set with the ASC allele names.
Examples
# preferably obtain the latest ASC cluster table
# asc_archive <- recentAlleleClusters(doi="10.5281/zenodo.7429773", get_file = TRUE)
# allele_cluster_table <- extractASCTable(archive_file = asc_archive)
data(HVGERM)
# example allele similarity cluster table
data(allele_cluster_table)
asc_germline <- germlineASC(allele_cluster_table, germline = HVGERM)
Human IGHV germlines functionality description
Description
A data.table of all 498 human IGHV germline gene segment alleles
in IMGT Gene-db release July 2022, with an additional 25 undocumented alleles from VDJbase.
The first column is the allele name, the second column is the functionality annotation, the
third column is the nt sequence and the last column is the aa sequence.
Usage
hv_functionality
Format
An object of class data.table (inherits from data.frame) with 521 rows and 4 columns.
References
Xochelli et al. (2014) Immunoglobulin heavy variable (IGHV) genes and alleles: new entities, new names and implications for research and prognostication in chronic lymphocytic leukaemia. Immunogenetics. 67(1):61-6.
Allele similarity clustering
Description
Cluster the distance matrix to create allele clusters. Supports both hierarchical clustering (default) and Leiden community detection.
Usage
igClust(
germline_distance,
method = c("hierarchical", "leiden"),
family_threshold = 75,
allele_cluster_threshold = 95,
cluster_method = "complete",
resolution = NULL,
target_clusters = NULL,
optimize_silhouette = TRUE,
ncores = 1,
quiet = FALSE
)
Arguments
germline_distance |
A germline set distance matrix created by |
method |
Clustering method. One of "hierarchical" (default) or "leiden". |
family_threshold |
The similarity threshold for family level (hierarchical only). Default is 75. |
allele_cluster_threshold |
The similarity threshold for allele cluster level (hierarchical only). Default is 95. |
cluster_method |
The hierarchical clustering linkage method. Default is "complete". |
resolution |
Resolution parameter for Leiden clustering. If NULL, will be optimized. |
target_clusters |
Target number of clusters for Leiden optimization. Default is NULL. |
optimize_silhouette |
Logical. Optimize resolution using silhouette score (Leiden only). Default is TRUE. |
ncores |
Number of cores for parallel processing (Leiden only). Default is 1. |
quiet |
Logical. Suppress messages. Default is FALSE. |
Value
A named list that includes:
alleleClusterTable: data.frame of allele clusters
threshold: list of threshold parameters
hclustAlleleCluster: hierarchical clustering object (hierarchical method)
communityObject: community detection result (Leiden method)
graphObject: igraph object (Leiden method)
silhouetteScore: silhouette score (Leiden method)
resolutionParameter: resolution used (Leiden method)
See Also
igDistance, inferAlleleClusters
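The two-level hierarchical clustering can be pictured with base R's `hclust` and `cutree`. The sketch below is not the package implementation. It assumes that distances are expressed as a fraction of differing positions and that a similarity threshold of X percent corresponds to cutting the tree at height 1 - X/100; the distance values are toy numbers.

```r
alleles <- c("a1", "a2", "a3", "a4")
# Toy distance matrix (fraction of differing positions, invented values).
d <- matrix(c(0.00, 0.02, 0.10, 0.40,
              0.02, 0.00, 0.12, 0.42,
              0.10, 0.12, 0.00, 0.38,
              0.40, 0.42, 0.38, 0.00),
            nrow = 4, dimnames = list(alleles, alleles))

hc <- hclust(as.dist(d), method = "complete")

family_threshold <- 75
allele_cluster_threshold <- 95
# Assumed mapping from similarity threshold (percent) to tree height.
family <- cutree(hc, h = 1 - family_threshold / 100)
cluster <- cutree(hc, h = 1 - allele_cluster_threshold / 100)

print(data.frame(allele = alleles, family = family, allele_cluster = cluster))
```

In this toy case a1, a2 and a3 fall into one family, while only a1 and a2 are close enough to share an allele cluster.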
Germline set alleles distance
Description
Calculates the distance between pairs of alleles based on their aligned germline sequences. Supports multiple distance methods for different segment types.
Usage
igDistance(
germline_set,
AA = FALSE,
method = c("decipher", "hamming", "lv"),
trim_3prime = NULL,
return_type = c("matrix", "dist"),
quiet = TRUE
)
Arguments
germline_set |
A character vector of aligned allele sequences. See details for curating options. |
AA |
Logical (FALSE by default). If TRUE, calculate the distance based on amino acid sequences. |
method |
Distance calculation method. One of "decipher" (default), "hamming", or "lv" (Levenshtein).
|
trim_3prime |
Optional position to trim sequences from 3' end before distance calculation |
return_type |
One of "matrix" (default) or "dist" to return a dist object |
quiet |
Logical (TRUE by default). Suppress informative messages |
Details
The aligned IMGT IGHV allele germline set can be downloaded from the IMGT site https://www.imgt.org/ under the section genedb.
For V segments, the "decipher" method is recommended as it handles alignment gaps properly. For D and J segments which may have variable lengths, the "lv" (Levenshtein) method is appropriate.
Value
A matrix or dist object of the computed distances between allele pairs.
See Also
ighvDistance for backward compatibility wrapper
Examples
data(HVGERM)
# Using DECIPHER method (default, for V segments)
d1 <- igDistance(HVGERM[1:10], method = "decipher")
# Using Hamming distance
d2 <- igDistance(HVGERM[1:10], method = "hamming")
# Using Levenshtein distance (good for D/J segments)
d3 <- igDistance(HVGERM[1:10], method = "lv")
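The reason Levenshtein distance suits variable-length segments can be seen with base R's `utils::adist`, which computes a generalized Levenshtein distance. The sequences below are short made-up strings rather than real D or J alleles, and this sketch does not call the package.

```r
# Toy sequences of unequal length (invented for illustration).
seqs <- c(j1 = "ACTGG", j2 = "ACTGGTT", j3 = "ACAGG")

# Hamming distance is undefined here because the lengths differ,
# whereas edit distance counts insertions and deletions as well as substitutions.
lv <- adist(seqs)
print(lv)
```

Here j1 and j2 differ by two insertions, j1 and j3 by one substitution, and j2 and j3 by one substitution plus two insertions.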
Allele similarity clustering (deprecated)
Description
This function is deprecated. Use igClust instead.
Usage
ighvClust(
germline_distance,
family_threshold = 75,
allele_cluster_threshold = 95,
cluster_method = "complete"
)
Arguments
germline_distance |
A germline set distance matrix created by |
family_threshold |
The similarity threshold for family level (hierarchical only). Default is 75. |
allele_cluster_threshold |
The similarity threshold for allele cluster level (hierarchical only). Default is 95. |
cluster_method |
The hierarchical clustering linkage method. Default is "complete". |
Value
A named list with clustering results.
See Also
igClust for the current implementation
Germline set alleles distance (deprecated)
Description
This function is deprecated. Use igDistance instead.
Usage
ighvDistance(germline_set, AA = FALSE)
Arguments
germline_set |
A character list of aligned IGHV allele sequences. |
AA |
Logical (FALSE by default). If TRUE, calculate the distance based on amino acid sequences. |
Value
A matrix of computed distances between allele pairs.
See Also
igDistance for the current implementation
Allele similarity cluster
Description
A wrapper function to infer the allele clusters. Supports both hierarchical clustering (default) and Leiden community detection.
Usage
inferAlleleClusters(
germline_set,
locus = NULL,
clustering_method = c("hierarchical", "leiden"),
distance_method = c("decipher", "hamming", "lv"),
trim_3prime_side = 318,
mask_5prime_side = 0,
family_threshold = 75,
allele_cluster_threshold = 95,
cluster_method = "complete",
resolution = NULL,
target_clusters = NULL,
optimize_silhouette = TRUE,
ncores = 1,
aa_set = FALSE,
quiet = FALSE
)
Arguments
germline_set |
A character vector of Ig sequence alleles (must be gapped by IMGT scheme for optimal results). |
locus |
The locus type. One of "IGHV", "IGKV", "IGLV", "IGHD", "IGHJ", "IGKJ", "IGLJ". Default is NULL (auto-detected from sequence names). |
clustering_method |
Clustering method. One of "hierarchical" (default) or "leiden". |
distance_method |
Distance calculation method. One of "decipher" (default), "hamming", or "lv". |
trim_3prime_side |
Position to trim sequences from 3' end. Default is 318; NULL uses full length. |
mask_5prime_side |
Length to mask from 5' side. Default is 0. |
family_threshold |
Similarity threshold for family level (hierarchical only). Default is 75. |
allele_cluster_threshold |
Similarity threshold for allele cluster level (hierarchical only). Default is 95. |
cluster_method |
Hierarchical clustering linkage method. Default is "complete". |
resolution |
Resolution parameter for Leiden clustering. Default is NULL (auto-optimized). |
target_clusters |
Target number of clusters for Leiden optimization. Default is NULL. |
optimize_silhouette |
Optimize resolution using silhouette score (Leiden only). Default is TRUE. |
ncores |
Number of cores for parallel processing (Leiden only). Default is 1. |
aa_set |
Logical. Is the sequence set amino acids? Default is FALSE. |
quiet |
Logical. Suppress messages. Default is FALSE. |
Details
The distance between pairs of allele sequences is calculated, then the alleles are clustered. For hierarchical clustering, two similarity thresholds define family and allele clusters. For Leiden clustering, community detection identifies clusters at a specified resolution.
The allele cluster names follow this scheme: IGHVF1-G1*01 - IGH = chain, V = region, F1 = family cluster numbering, G1 = allele cluster numbering, 01 = allele numbering (by clustering order)
For V segments, the "decipher" distance method is recommended. For D and J segments with variable lengths, "lv" (Levenshtein) is more appropriate.
Value
An object of class GermlineCluster containing:
germlineSet: Modified germline set (3' trimming and 5' masking)
alleleClusterSet: Renamed germline set with ASC names
alleleClusterTable: data.frame of allele similarity clusters
threshold: List of threshold parameters
hclustAlleleCluster: hclust object (hierarchical method)
clusteringMethod: Method used ("hierarchical" or "leiden")
communityObject: Community object (Leiden method)
graphObject: igraph object (Leiden method)
silhouetteScore: Silhouette score (Leiden method)
resolutionParameter: Resolution used (Leiden method)
locus: Locus identifier
See Also
igDistance, igClust, plot.GermlineCluster
Examples
# load the initial germline set
data(HVGERM)
germline <- HVGERM[!grepl("^[.]", HVGERM)]
# Hierarchical clustering (default)
asc <- inferAlleleClusters(germline)
# Leiden community detection
asc_leiden <- inferAlleleClusters(germline[1:50],
clustering_method = "leiden",
target_clusters = 10)
## plotting the clusters
plot(asc)
Allele based genotype inference
Description
inferGenotypeAllele infers an individual's genotype using the allele-based method.
The method uses allele-specific thresholds to determine the presence of an allele in the genotype.
More specifically, based on the allele frequency, repertoire depth, and the specific allele threshold, a confidence level (Z score) is calculated
for the presence of the allele in the genotype. The user can select the confidence level for the genotype inference.
Usage
inferGenotypeAllele(
data,
allele_threshold_table = NULL,
call = "v_call",
asc_annotation = FALSE,
single_assignment = FALSE,
translate_to_asc = FALSE,
germline_db = NA,
find_unmutated = FALSE,
seq = "sequence_alignment",
default_allele_threshold = 1e-04,
quiet = TRUE
)
Arguments
data |
data.frame in AIRR format, containing allele calls from a single subject and the sample IMGT-gapped V(D)J sequences under seq. |
allele_threshold_table |
A data.frame of the alleles and their thresholds. |
call |
Name of the V, D, or J allele call column, i.e., v_call, d_call, or j_call. Default is
asc_annotation |
Logical (FALSE by default). Are the allele calls annotated with the allele similarity clusters. |
single_assignment |
If TRUE, the method only considers sequences with a single assignment for the genotype inference. |
translate_to_asc |
For V allele calls, collapse identical alleles for the genotype inference. Default is FALSE. |
germline_db |
named vector of sequences containing the germline sequences named in V allele calls and the alleleClusterTable. Only required if find_unmutated is TRUE. |
find_unmutated |
if TRUE, use germline_db to find which samples are unmutated. Not needed if V allele calls only represent unmutated samples. |
seq |
name of the column in data with the aligned, IMGT-numbered, V(D)J nucleotide sequence. Default is sequence_alignment. |
default_allele_threshold |
The default allele threshold for the genotype inference, in case the allele threshold is not in the |
quiet |
Logical (TRUE by default). If TRUE, informative messages are suppressed.
Details
In naive repertoires, allele calls with more than one assignment are rare. Hence, if the data represents the naive repertoire of a subject,
it is recommended to use the find_unmutated=TRUE option to remove mutated sequences. For non-naive populations, allele calls with multiple assignments
are treated as belonging to all groups.
Value
A data.frame with the inferred V genotype. The table contains the following columns:
allele: The alleles in the allele_threshold_table.
counts: The number of reads for each allele.
depth: The total number of reads in the genotype (sum of counts).
threshold: The population driven allele thresholds for genotype presence.
z_score: The confidence level for the presence of the allele in the genotype.
asc_allele: If translate_to_asc is TRUE, the ASC allele value from allele_threshold_table.
See Also
inferAlleleClusters will infer the allele clusters based on a supplied V reference set and set the default allele threshold of 1e-04. See recentAlleleClusters to obtain the latest version of the IGHV allele clusters and the naive population based allele threshold.
Examples
# loading TIgGER AIRR-seq b cell data
data <- tigger::AIRRDb
# allele threshold table
data(allele_threshold_table)
data(HVGERM)
# inferring the genotype
genotype <- inferGenotypeAllele(
data = data,
allele_threshold_table = allele_threshold_table,
germline_db = HVGERM, find_unmutated=TRUE)
# filter alleles with z_score >= 0
head(genotype[genotype$z_score >= 0,])
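The confidence level described for inferGenotypeAllele can be pictured as a one-sample z-test on a proportion. The sketch below is only an assumption about the general form of such a score (a normal approximation to the binomial) and may differ from the formula the package actually uses. The helper name z_score_sketch and the toy counts and thresholds are hypothetical.

```r
# Hypothetical helper: NOT part of the package API.
# Assumes the score compares the observed read count of an allele with the
# count expected at the threshold frequency, scaled by the binomial SD.
z_score_sketch <- function(counts, depth, threshold) {
  expected <- depth * threshold
  (counts - expected) / sqrt(depth * threshold * (1 - threshold))
}

# Toy genotype table (made-up numbers): three alleles, 1000 reads in total.
toy <- data.frame(
  allele = c("IGHV1-2*02", "IGHV1-2*04", "IGHV1-2*06"),
  counts = c(600, 399, 1),
  threshold = c(1e-04, 1e-04, 0.01)
)
toy$depth <- sum(toy$counts)
toy$z_score <- z_score_sketch(toy$counts, toy$depth, toy$threshold)

# Keep alleles whose score is non-negative, mirroring the z_score >= 0 filter.
toy[toy$z_score >= 0, ]
```

Under this assumed form, an allele observed exactly at its threshold frequency scores 0, so a cutoff of 0 keeps alleles at or above their threshold and a higher cutoff demands stronger evidence.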
Allele similarity cluster based genotype inference Testing function
Description
inferGenotypeAllele_asc infers an individual's genotype based on the allele-based method.
The method utilizes the allele-specific threshold to determine the presence of an allele in the genotype.
More specifically, the absolute frequency of each allele is calculated and checked against the threshold.
Usage
inferGenotypeAllele_asc(
data,
alleleClusterTable,
v_call = "v_call",
single_assignment = FALSE,
germline_db = NA,
find_unmutated = FALSE,
seq = "sequence_alignment",
confidence_level = NULL,
default_allele_threshold = 1e-04
)
Arguments
data |
data.frame in AIRR format, containing V allele calls from a single subject and the sample IMGT-gapped V(D)J sequences under seq. |
alleleClusterTable |
A data.frame of the allele similarity clusters thresholds. |
v_call |
name of the V allele call column. Default is v_call. |
single_assignment |
if TRUE, the method only considers sequences with a single assignment for the genotype inference. |
germline_db |
named vector of sequences containing the germline sequences named in V allele calls and the alleleClusterTable. Only required if find_unmutated is TRUE. |
find_unmutated |
if TRUE, use germline_db to find which samples are unmutated. Not needed if V allele calls only represent unmutated samples. |
seq |
name of the column in data with the aligned, IMGT-numbered, V(D)J nucleotide sequence. Default is sequence_alignment. |
confidence_level |
The confidence level on which to filter the inferred genotype alleles. Default is NULL, meaning filtering only based on allele threshold. |
default_allele_threshold |
The default allele threshold for the genotype inference, in case the allele threshold is not in the alleleClusterTable. |
Details
In naive repertoires, allele calls with more than one assignment are rare. Hence, if the data represents the naive repertoire of a subject,
it is recommended to use the find_unmutated=TRUE option to remove mutated sequences. For non-naive populations, allele calls with multiple assignments
are treated as belonging to all groups.
Value
A data.frame with the inferred V genotype. The table contains the following columns:
| gene | alleles | imgt_alleles | counts | absolute_fraction | absolute_threshold | genotyped_alleles | genotype_imgt_alleles |
| allele cluster | the present alleles in the repertoire | the imgt nomenclature of the alleles | the number of reads for each allele | the absolute fraction of the alleles | the population driven allele thresholds for genotype presence | the alleles which entered the genotype | the imgt nomenclature of the alleles |
See Also
inferAlleleClusters will infer the allele clusters based on a supplied V reference set and set the default allele threshold of 1e-04. See recentAlleleClusters to obtain the latest version of the IGHV allele clusters and the naive population based allele threshold.
Examples
# loading TIgGER AIRR-seq b cell data
data <- tigger::AIRRDb
# preferably obtain the latest ASC cluster table
# asc_archive <- recentAlleleClusters(doi="10.5281/zenodo.7429773", get_file = TRUE)
# allele_cluster_table <- extractASCTable(archive_file = asc_archive)
# example allele similarity cluster table
data(allele_cluster_table)
data(HVGERM)
# reforming the germline set
asc_germline <- germlineASC(allele_cluster_table, germline = HVGERM)
# assigning the ASC alleles
asc_data <- assignAlleleClusters(data, allele_cluster_table)
# inferring the genotype
asc_genotype <- inferGenotypeAllele_asc(
data = asc_data,
alleleClusterTable = allele_cluster_table,
germline_db = asc_germline, find_unmutated=TRUE)
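The frequency check described for inferGenotypeAllele_asc, including the rule that multiply-assigned calls count towards every listed allele, can be sketched in base R. This is a minimal illustration under stated assumptions, not the package implementation: the helper allele_fractions_sketch, the use of all reads as the denominator, the `>=` comparison, and the toy calls and thresholds are all hypothetical.

```r
# Hypothetical helper: NOT part of the package API.
# Counts each allele once per read it is assigned to, so a read with a
# comma-separated call such as "A*01,A*03" contributes to both alleles.
allele_fractions_sketch <- function(calls, thresholds,
                                    default_threshold = 1e-04) {
  alleles <- unlist(strsplit(calls, ",", fixed = TRUE))
  counts <- table(alleles)
  out <- data.frame(
    allele = names(counts),
    counts = as.integer(counts),
    stringsAsFactors = FALSE
  )
  # Assumption: the fraction is taken over all reads in the repertoire.
  out$absolute_fraction <- out$counts / length(calls)
  # Fall back to the default threshold for alleles missing from the table.
  thr <- thresholds[out$allele]
  thr[is.na(thr)] <- default_threshold
  out$absolute_threshold <- unname(thr)
  # Assumption: an allele enters the genotype when it reaches its threshold.
  out$in_genotype <- out$absolute_fraction >= out$absolute_threshold
  out
}

# Toy input: 10 reads, the last one assigned to two alleles.
calls <- c(rep("IGHVF1-G1*01", 6), rep("IGHVF1-G1*02", 3),
           "IGHVF1-G1*01,IGHVF1-G1*03")
thresholds <- c("IGHVF1-G1*01" = 0.001, "IGHVF1-G1*03" = 0.2)
res <- allele_fractions_sketch(calls, thresholds)
res
```

In this toy input the allele without a table entry falls back to the default threshold, which is the role default_allele_threshold plays in the function signature.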
Insert gaps into an ungapped sequence based on a gapped reference sequence.
Description
This function inserts gaps (e.g., . or -) into an ungapped sequence (ungapped)
to match the positions of gaps in a reference sequence (gapped). It ensures that
the aligned sequence has the same gap structure as the reference.
Usage
insert_gaps2_vec(gapped, ungapped, parallel = FALSE)
Arguments
gapped |
A vector of strings representing the reference sequences with gaps. |
ungapped |
A vector of strings representing the sequences without gaps. |
parallel |
A boolean flag to enable parallel processing (default: FALSE). |
Value
A vector of strings with gaps inserted to match the gapped reference.
Examples
# Example usage
gapped <- c("caggtc..aact", "caggtc---aact")
ungapped <- c("caggtcaact", "caggtcaact")
# Sequential execution
result <- insert_gaps2_vec(gapped, ungapped, parallel = FALSE)
print(result) # "caggtc..aact", "caggtc---aact"
# Parallel execution
result_parallel <- insert_gaps2_vec(gapped, ungapped, parallel = TRUE)
print(result_parallel)
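For readers who want to see the idea behind insert_gaps2_vec without the compiled code, the following base R sketch reproduces the gap-copying logic for one sequence at a time. It is an illustration only and not the package implementation. The helper insert_gaps_sketch is hypothetical, and it assumes the ungapped sequence has exactly as many bases as the reference has non-gap positions.

```r
# Hypothetical helper: NOT the package implementation.
# Gap characters of the reference stay in place; every other position is
# overwritten with the next base of the ungapped sequence.
insert_gaps_sketch <- function(gapped, ungapped, gap_chars = c(".", "-")) {
  ref <- strsplit(gapped, "", fixed = TRUE)[[1]]
  bases <- strsplit(ungapped, "", fixed = TRUE)[[1]]
  is_gap <- ref %in% gap_chars
  # Assumption: lengths must agree; real data may need trimming or padding.
  stopifnot(sum(!is_gap) == length(bases))
  ref[!is_gap] <- bases
  paste(ref, collapse = "")
}

gapped <- c("caggtc..aact", "caggtc---aact")
ungapped <- c("caggtcaact", "caggtcaact")
# Apply pairwise over the two vectors.
sketch_result <- mapply(insert_gaps_sketch, gapped, ungapped,
                        USE.NAMES = FALSE)
print(sketch_result)
```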
Create a GermlineCluster object
Description
GermlineCluster is an S3 class that stores the output of
inferAlleleClusters. It contains the allele cluster table,
clustering objects, and threshold parameters used for inference.
Usage
new_germline_cluster(
germlineSet,
alleleClusterSet,
alleleClusterTable,
threshold,
hclustAlleleCluster = NULL,
clusteringMethod = "hierarchical",
communityObject = NULL,
graphObject = NULL,
distanceMatrix = NULL,
silhouetteScore = NA_real_,
resolutionParameter = NA_real_,
locus = "IGHV"
)
Arguments
germlineSet |
The original germline set provided. |
alleleClusterSet |
The renamed germline set with allele clusters. |
alleleClusterTable |
The allele cluster table. |
threshold |
The threshold used for family and allele clusters. |
hclustAlleleCluster |
A hierarchical clustering object for the germline set, or NULL. |
clusteringMethod |
The clustering method used, either "hierarchical" or "leiden". |
communityObject |
A community detection object for Leiden clustering, or NULL. |
graphObject |
An igraph graph object for Leiden clustering, or NULL. |
distanceMatrix |
The distance matrix used for clustering, or NULL. |
silhouetteScore |
The silhouette score for community detection. |
resolutionParameter |
The resolution parameter used for Leiden clustering. |
locus |
The locus identifier, for example "IGHV". |
Value
An object of class "GermlineCluster".
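As a rough picture of what an S3 constructor of this shape does, the sketch below builds a toy object with a class attribute and a matching print method. It mirrors the general pattern only; the class name ToyCluster, the function new_toy_cluster, and the reduced set of fields are hypothetical and say nothing about the internals of GermlineCluster.

```r
# Hypothetical toy class: illustrates the S3 constructor pattern only.
new_toy_cluster <- function(alleleClusterTable, threshold,
                            clusteringMethod = "hierarchical") {
  stopifnot(is.data.frame(alleleClusterTable), is.numeric(threshold))
  structure(
    list(
      alleleClusterTable = alleleClusterTable,
      threshold = threshold,
      clusteringMethod = clusteringMethod
    ),
    class = "ToyCluster"
  )
}

# S3 print method: dispatched automatically when the object is printed.
print.ToyCluster <- function(x, ...) {
  cat("ToyCluster with", nrow(x$alleleClusterTable), "alleles;",
      "method:", x$clusteringMethod, "\n")
  invisible(x)
}

obj <- new_toy_cluster(
  data.frame(new_allele = c("IGHVF1-G1*01", "IGHVF1-G1*02")),
  threshold = 0.25
)
print(obj)
```

Because the object is a plain list with a class attribute, its fields can be read with `$`, and generics such as print, summary and plot can each be given a class-specific method.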
Optimize resolution parameter using silhouette score
Description
Performs a grid search over resolution parameters and selects the one that maximizes the silhouette score.
Usage
optimize_resolution(
g,
distance_matrix,
target_clusters = 80,
resolution_range_low = 0.1,
resolution_range_high = 0.5,
max_steps = 20,
ncores = 1
)
Arguments
g |
An igraph graph object with weighted edges |
distance_matrix |
The distance matrix (as dist object) used for silhouette calculation |
target_clusters |
Target number of clusters for initial tuning. Default is 80. |
resolution_range_low |
Fractional range below tuned resolution. Default is 0.1. |
resolution_range_high |
Fractional range above tuned resolution. Default is 0.5. |
max_steps |
Maximum steps for initial tuning. Default is 20. |
ncores |
Number of cores for parallel processing. Default is 1. |
Value
A list containing:
- results: data.frame with Resolution, ClusterCount, Silhouette
- partitions: list of membership vectors for each resolution
- best_resolution: optimal resolution parameter
- best_partition: membership vector at optimal resolution
- best_clusters: number of clusters at optimal resolution
See Also
detect_communities_leiden, igClust
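The selection step of optimize_resolution (score every candidate parameter by mean silhouette width and keep the best) can be illustrated without Leiden clustering. The sketch below swaps in a different technique on purpose: it cuts a hierarchical tree at a grid of heights instead of varying a Leiden resolution, and it runs on simulated points, so it shows the grid search only. The toy data, the grid, and the helper score_one are hypothetical.

```r
library(cluster)  # provides silhouette()

set.seed(42)
# Toy data: three well-separated groups of 2-D points.
x <- rbind(
  matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 3, sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 6, sd = 0.3), ncol = 2)
)
d <- dist(x)
hc <- hclust(d, method = "average")

# Grid of candidate cut heights (a stand-in for the resolution grid).
grid <- seq(0.2, max(hc$height), length.out = 15)

# Returns the cluster count and mean silhouette width for one cut height.
score_one <- function(h) {
  memb <- cutree(hc, h = h)
  k <- length(unique(memb))
  # The silhouette is only defined for 2 <= k <= n - 1 clusters.
  if (k < 2 || k > nrow(x) - 1) return(c(k, NA_real_))
  c(k, mean(silhouette(memb, d)[, "sil_width"]))
}

scores <- t(vapply(grid, score_one, numeric(2)))
results <- data.frame(
  CutHeight = grid,
  ClusterCount = scores[, 1],
  Silhouette = scores[, 2]
)
# which.max() skips NA scores, so invalid grid points are never selected.
best <- results[which.max(results$Silhouette), ]
best
```

The results table plays the same role as the results element of the returned list: one row per candidate value, from which the best-scoring row is picked.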
The Program for Ig clusters (PIgLET) package
Description
PIgLET is a suite of computational tools that improves genotype inference and downstream AIRR-seq data analysis. The package has two main tools. The first is Allele Clusters, a tool designed to reduce the ambiguity within the IGHV alleles. The ambiguity is caused by duplicated or similar alleles which are shared among different genes. The second tool is an allele based genotype, which determines the presence of an allele based on a threshold derived from a naive population.
Allele Similarity Cluster
This section provides the functions that support the main tool of creating the allele similarity clusters from an IGHV germline set.
- inferAlleleClusters: The main function of the section to create the allele clusters based on a germline set.
- ighvDistance: Calculate the distance between IGHV aligned germline sequences.
- ighvClust: Hierarchical clustering of the distance matrix from ighvDistance.
- generateReferenceSet: Generate the allele clusters reference set.
- plotAlleleCluster: Plots the hierarchical clustering.
- artificialFRW1Germline: Artificially create an IGHV reference set with framework1 (FWR1) primers.
Allele based genotype
This section provides the functions to infer the IGHV genotype using the allele based method and the allele clusters thresholds.
- inferGenotypeAllele: Infer the IGHV genotype using the allele based method.
- assignAlleleClusters: Renames the v allele calls based on the new allele clusters.
- germlineASC: Converts IGHV germline set to ASC germline set.
- recentAlleleClusters: Download the most recent version of the allele clusters table archive from zenodo.
- extractASCTable: Extracts the allele cluster table from the zenodo archive file.
- zenodoArchive: An R6 object to query the zenodo api.
Plot method for GermlineCluster
Description
Plot method for GermlineCluster
Usage
## S3 method for class 'GermlineCluster'
plot(x, y = NULL, cex = 1, seed = 9999, ...)
Arguments
x |
GermlineCluster object |
y |
Not used |
cex |
Controls the size of the allele label. Default is 1. |
seed |
Set a seed number for drawing the dendrogram. Default 9999. |
... |
Additional arguments passed to plotting functions |
Value
A plot of the allele clusters dendrogram
Plotting the dendrogram of the clusters
Description
Plotting the dendrogram of the clusters
Usage
plotAlleleCluster(x, y = NULL, cex = 1, seed = 9999)
Arguments
x |
The GermlineCluster object. See inferAlleleClusters |
y |
NULL. Not in use. |
cex |
Controls the size of the allele label. Default is 1. |
seed |
Set a seed number for drawing the dendrogram. Default 9999. |
Value
A plot of the allele clusters dendrogram
Compare hierarchical and Leiden clustering
Description
Creates a comparison visualization showing cluster assignments from both methods.
Usage
plotClusterComparison(hierarchical_result, leiden_result, ...)
Arguments
hierarchical_result |
GermlineCluster object from hierarchical clustering |
leiden_result |
GermlineCluster object from Leiden clustering |
... |
Additional arguments |
Value
A ggplot object showing cluster agreement
Plot community network
Description
Creates a network visualization of allele clusters from community detection.
Usage
plotCommunityNetwork(
x,
layout = c("fr", "kk", "circle"),
node_color = "cluster",
node_size = "degree",
edge_alpha = 0.3,
show_labels = TRUE,
label_size = 3,
...
)
Arguments
x |
A GermlineCluster object with Leiden clustering |
layout |
Network layout: "fr" (Fruchterman-Reingold, default), "kk" (Kamada-Kawai), or "circle" |
node_color |
Variable for node color: "cluster" (default), "family", or a color value |
node_size |
Variable for node size: "degree" (default), "fixed", or a numeric value |
edge_alpha |
Alpha transparency for edges. Default is 0.3. |
show_labels |
Logical. Show node labels. Default is TRUE. |
label_size |
Size of node labels. Default is 3. |
... |
Additional arguments |
Details
This function creates a network visualization showing:
- Nodes representing alleles, colored by cluster
- Edges weighted by sequence similarity
- Layout optimized by specified algorithm
Value
A ggplot object
See Also
inferAlleleClusters, detect_communities_leiden
Examples
data(HVGERM)
asc <- inferAlleleClusters(HVGERM[1:30],
clustering_method = "leiden",
target_clusters = 5)
plotCommunityNetwork(asc)
Plot silhouette optimization results
Description
Creates a plot showing silhouette score and cluster count across resolution values.
Usage
plotSilhouetteOptimization(optimization_result, highlight_best = TRUE, ...)
Arguments
optimization_result |
Result from optimize_resolution. |
highlight_best |
Logical. Highlight optimal resolution. Default is TRUE. |
... |
Additional arguments |
Value
A ggplot object
Examples
data(HVGERM)
d <- igDistance(HVGERM[1:30], method = "hamming")
g <- distance_to_graph(d)
opt <- optimize_resolution(g, d, target_clusters = 5)
plotSilhouetteOptimization(opt)
Plot truncated tree visualization
Description
Creates a circular or dendrogram tree visualization collapsed to ASC subgroup level, with optional heatmap annotations showing family assignments.
Usage
plotTruncatedTree(
x,
layout = c("circular", "dendrogram"),
collapse_to = c("asc_subgroup", "iuis_subgroup", "family"),
label_style = c("asc", "iuis", "both"),
show_threshold_line = TRUE,
threshold = 0.25,
tip_size_by = "n_alleles",
tip_color_by = "present",
show_heatmap = TRUE,
label_size = 7,
...
)
Arguments
x |
A GermlineCluster object from inferAlleleClusters. |
layout |
Tree layout: "circular" (default) or "dendrogram" |
collapse_to |
Level to collapse tree: "asc_subgroup" (default, based on ASC names), "iuis_subgroup" (based on original IUIS gene names), or "family" |
label_style |
Label style for tips: "asc" (default, show ASC names like IGHVF1-G1), "iuis" (show IUIS names with superscript markers if ASC splits IUIS group), or "both" (show both names) |
show_threshold_line |
Logical. Show threshold line on tree. Default is TRUE. |
threshold |
Threshold height for threshold line (0-1 scale). Default is 0.25. |
tip_size_by |
Variable for tip point size: "n_alleles" (default), "fixed", or NULL |
tip_color_by |
Variable for tip point color: "present" (default), "fraction_novel", or NULL |
show_heatmap |
Logical. Show heatmap annotation for IUIS vs ASC families. Default is TRUE. |
label_size |
Size of tip labels. Default is 7. |
... |
Additional arguments passed to ggtree |
Details
This function creates a publication-quality tree visualization that:
- Renames tree tips from original allele names to ASC names (new_allele)
- Collapses alleles to ASC subgroup level (single representative per ASC group)
- Shows tip point size by number of alleles in cluster
- Adds optional heatmap track showing IUIS vs ASC family assignments
- Draws threshold line at specified height
When using label_style = "iuis", if multiple ASC groups split a single IUIS
subgroup, the labels are marked with superscript letters (e.g., IGHV1-2^A, IGHV1-2^B)
to distinguish them.
Requires the ggtree package to be installed.
Value
A ggplot/ggtree object
See Also
inferAlleleClusters, plot.GermlineCluster
Examples
data(HVGERM)
asc <- inferAlleleClusters(HVGERM[1:50])
# Basic truncated tree with ASC labels
if (requireNamespace("ggtree", quietly = TRUE)) {
plotTruncatedTree(asc, show_heatmap = FALSE)
# With IUIS labels (marked if ASC splits IUIS group)
plotTruncatedTree(asc, label_style = "iuis", show_heatmap = FALSE)
}
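The superscript marking described in the Details for label_style = "iuis" can be pictured with a small base R sketch. It is a guess at the labelling rule as stated in the text, not the code used by plotTruncatedTree. The toy mapping table and its column names asc and iuis are hypothetical.

```r
# Hypothetical mapping between ASC groups and IUIS subgroups (toy values).
tab <- data.frame(
  asc = c("IGHVF1-G1", "IGHVF1-G2", "IGHVF1-G3"),
  iuis = c("IGHV1-2", "IGHV1-2", "IGHV1-3"),
  stringsAsFactors = FALSE
)

# Number of ASC groups sharing each IUIS subgroup, and a running index
# within each subgroup.
n_split <- ave(seq_along(tab$iuis), tab$iuis, FUN = length)
idx <- ave(seq_along(tab$iuis), tab$iuis, FUN = seq_along)

# Only IUIS subgroups split across several ASC groups get a letter suffix.
tab$label <- ifelse(n_split > 1,
                    paste0(tab$iuis, "^", LETTERS[idx]),
                    tab$iuis)
tab
```

Here IGHV1-2 is split across two ASC groups and receives the suffixes ^A and ^B, while IGHV1-3 maps to a single ASC group and keeps its plain name.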
Print method for GermlineCluster
Description
Print method for GermlineCluster
Usage
## S3 method for class 'GermlineCluster'
print(x, ...)
Arguments
x |
A GermlineCluster object |
... |
Additional arguments (ignored) |
Value
Invisibly returns x
Retrieving allele similarity clusters Zenodo archive
Description
A wrapper function for zenodoArchive that downloads the most recent allele similarity clusters and thresholds from the zenodo archive.
The clusters and thresholds are based on https://yaarilab.github.io/IGHV_reference_book/
At the moment, this is only available for the human IGHV reference set.
Usage
recentAlleleClusters(
doi = "10.5281/zenodo.7401189",
path,
get_file = FALSE,
quite = FALSE
)
Arguments
doi |
The doi for the archive to download. Default is the IGHV set. |
path |
The output folder for saving the archive files. Default is to a temporary directory. |
get_file |
Logical (FALSE by default). Do you want to return the path for the file downloaded? |
quite |
Logical (FALSE by default). Do you want to suppress informative messages? |
Value
If get_file is TRUE, the function returns the path to the archive file
Examples
recentAlleleClusters(doi="10.5281/zenodo.7401189")
Summary method for GermlineCluster
Description
Summary method for GermlineCluster
Usage
## S3 method for class 'GermlineCluster'
summary(object, ...)
Arguments
object |
A GermlineCluster object |
... |
Additional arguments (ignored) |
Value
A list with summary statistics
zenodoArchive
Description
zenodoArchive
Format
R6Class object.
Value
Object of R6Class for modelling a zenodoArchive for ASC cluster files
Public fields
doi: zenodoArchive doi, NULL if not supplied.
all_versions: zenodoArchive whether to return all versions, true when not specified.
sort: zenodoArchive how to sort the records, mostrecent when not specified.
page: zenodoArchive which page to pull in query, 1 when not specified.
size: zenodoArchive how many records per page, 20 when not specified.
zenodoVersions: zenodoArchive doi available versions, a storing variable.
zenodoQuery: zenodoArchive doi version query, a storing variable.
download_file: zenodoArchive doi download files, a storing variable.
download_url: zenodoArchive doi download urls, a storing variable.
Methods
Public methods
Method new()
initializes the zenodoArchive
Usage
zenodoArchive$new( doi, page = 1, size = 20, all_versions = "true", sort = "mostrecent" )
Arguments
doi |
A zenodo doi. To retrieve all records supply a concept doi (a generic doi common to all versions). |
page |
Which page to query. Default is 1. |
size |
How many records per page. Default is 20. |
all_versions |
Whether to return all concept doi versions. If true returns all, if false returns the latest. Default is true. |
sort |
Which sorting to apply on the records. Default is mostrecent. Possible sortings: "bestmatch", "mostrecent", "-mostrecent" (ascending), "version", "-version" (ascending). |
Method clean_doi()
cleans the doi record for query
Usage
zenodoArchive$clean_doi(doi = self$doi)
Arguments
doi |
The zenodo archive doi |
Returns
the clean doi
Method zenodo_query()
Query the zenodo archive according to the initial parameters.
Usage
zenodoArchive$zenodo_query(...)
Arguments
... |
Accepts the self created by initialize. |
Returns
a list with the query values.
Method get_versions()
Extract all concept doi available versions.
Usage
zenodoArchive$get_versions(...)
Arguments
... |
Accepts the self created by initialize. |
Returns
a data.frame of the available versions.
Method get_version_files()
get the chosen doi archive version available files
Usage
zenodoArchive$get_version_files(version = "latest")
Arguments
version |
Which archive version files to get. Defaults to latest. To see all available versions use get_versions. |
Returns
a list of the available files in the archive version.
Method download_zenodo_files()
download the files available in the chosen doi archive version
Usage
zenodoArchive$download_zenodo_files( file = NULL, path = tempdir(), version = "latest", all_files = F, get_file_path = F, quite = F )
Arguments
file |
If supplied, downloads the specific file from the archive. |
path |
The output folder for saving the archive files. Default is to a temporary directory. |
version |
Which archive version files to get. Defaults to latest. To see all available versions use get_versions. |
all_files |
Logical (FALSE by default). Do you want to download all files in the archive? |
get_file_path |
Logical (FALSE by default). Do you want to return the path for the file downloaded? |
quite |
Logical (FALSE by default). Do you want to suppress informative messages? |
Returns
If get_file_path is TRUE, the function returns the path to the archive file
Method clone()
The objects of this class are cloneable with this method.
Usage
zenodoArchive$clone(deep = FALSE)
Arguments
deep |
Whether to make a deep clone. |
Examples
zenodo_archive <- zenodoArchive$new(
doi = "10.5281/zenodo.7401189"
)
# view available versions in the archive
archive_versions <- zenodo_archive$get_versions()
# Getting the available files in the latest zenodo archive version
files <- zenodo_archive$get_version_files()
# downloading the first file from the latest archive version
zenodo_archive$download_zenodo_files()