zero_col() and single_col():
get_colnames():
data.frame, tibble, and
data.table.
GaussSuppression() updates:
auto_subSumAny to optionally
disable the automatic switch of the singleton method.
GaussSuppression() for use in a
new algorithm for linked tables.
DummyDuplicated() improvements:
any_duplicated_rows():
base::anyDuplicated(), implemented
similarly to RowGroups().
GaussSuppression() function is updated with a new
parameter: cell_grouping.
x input structured as a block-diagonal
matrix.
table_id and
auto_anySumNOTprimary.
formula_term_labels().
?formula_utils.
output_term_labels().
term_labels in
tables_by_formulas(), used to include term labels in the
output.
Matrix are moved from Depends to Imports
stringr
SSBtools has no dependencies beyond
standard R packages.
stringr::str_split() with base R alternatives
in WildcardGlobbingVector() and
HierarchicalWildcardGlobbing().
invert parameter in
WildcardGlobbingVector() and
HierarchicalWildcardGlobbing().
GaussSuppression(), along with a minor fix.
?NumSingleton for details on the
improvement to elimination (4th character).
RbindAll().
RbindAll() now correctly handles data frames with 0
rows instead of producing an error.
RbindAll() now also accepts NULL as
input.
tables_by_formulas()
ModelMatrix()
function and its formula parameter.
data.frame that contains the results for
all tables.
model_aggregate()
avoid_hierarchical, input_in_output,
and total are direct parameters to
model_aggregate().
ModelMatrix() parameters
(avoidHierarchical, inputInOutput, and
total) had to be set via the mm_args
parameter. Old code remains functional.
tibble and data.table
input (parameter data).
as.data.frame() to ensure consistent behavior.
model_aggregate()
can now be sped up.
aggregate_pkg = "data.table" to
utilize this possibility. Also note the related new parameter
aggregate_base_order.
aggregate_na, to control
handling of missing values in grouping variables.
NAomit parameter to
Formula2ModelMatrix(), which makes it meaningful to include
NAs in the grouping variables.
aggregate_na = TRUE, NAs in grouping variables are
retained during pre-aggregation.
GaussSuppression() – now removes
duplicate rows
removeDuplicated
parameter.
ModelMatrix() that uses the
hierarchies parameter together with
inputInOutput = FALSE.
printXdim, which
can be used to print information about dimensional changes to the
console.
map_hierarchies_to_data()
when_overwritten.
add_comment.
hierarchies_as_vars()
drop_codes and
include_codes.
combine_formulas() is
fixed
"+" operator,
filter_by_variable() and
names_by_variable() are functions to
Extend0fromModelMatrixInput(), marked as internal, is a
specialized version of Extend0()
ModelMatrix().
AutoHierarchies() has been updated to recognize common
from-to names, and the sign variable is now optional.
See the new parameter autoNames for details on
common from-to names.
Also note the new parameter autoLevel, with a
default value (TRUE) that ensures the function behaves as
it always has.
NAs in the ‘to’ variable are now allowed to support common hierarchies, and rows where ‘to’ == ‘from’ are also allowed. Such rows are removed before processing the hierarchy, with a warning when relevant (Codes removed due to ‘to’ == ‘from’ or ‘to’ == NA).
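The row removal described above can be illustrated with a minimal base R sketch. This is not the code used inside AutoHierarchies(); the data frame `hier`, its codes, and the message text are hypothetical and serve only to show which rows would count as `to == from` or `to == NA`.

```r
# Hypothetical from-to table; the codes are illustrative only.
hier <- data.frame(
  from = c("A", "B", "C", "Total"),
  to   = c("Total", "Total", NA, "Total"),
  stringsAsFactors = FALSE
)

# Flag rows with a missing 'to' or with 'to' equal to 'from'.
# TRUE | NA evaluates to TRUE, so the flag vector contains no NA.
drop <- is.na(hier$to) | hier$to == hier$from

# Keep the remaining rows and report which codes were removed.
cleaned <- hier[!drop, , drop = FALSE]
removed_codes <- hier$from[drop]
if (length(removed_codes) > 0) {
  message("Codes removed: ", paste(removed_codes, collapse = ", "))
}
```

In this sketch the rows for "C" (missing 'to') and "Total" (mapped to itself) are dropped, leaving the two ordinary parent-child rows.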
Output from functions like get_klass() in the klassR package or
hier_create() in the sdcHierarchies
package can now be used directly as input.
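Example of usage:
library(SSBtools)        # AutoHierarchies()
library(klassR)          # get_klass()
library(sdcHierarchies)  # hier_create()
a <- get_klass(classification = "24")
b <- hier_create(root = "Total", nodes = LETTERS[1:5])
AutoHierarchies(list(tree = a, letter = b))
hierarchies_as_vars():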
vars_to_hierarchies():
hierarchies_as_vars().
map_hierarchies_to_data():
hierarchies_as_vars() to transform hierarchies,
followed by mapping to the dataset.
max_contribution() with wrapper
n_contributors().
MaxContribution() and
Ncontributors() developed in the GaussSuppression
package.
table_all_integers().
total_collapse().
substitute_formula_vars().
?formula_utils.formula_include_hierarchies(),
which has been renamed for clarity and corrected to produce the intended
output.
FormulaSums() when
viaSparseMatrix = TRUE.
NAomit.
viaSparseMatrix = FALSE) already
handled this correctly.
Extend0().
hierarchical = FALSE.
FormulaSelection() and its
identical wrapper formula_selection().
FormulaSelection() and thereby the
identical wrapper formula_selection() have been
generalized.
logical: When TRUE,
the logical selection vector is returned.
FormulaSelection() is now a generic function, allowing
methods for other input objects to be added.
GaussSuppression() function and related
functionality have now been documented in a “Privacy in Statistical
Databases 2024” paper.
data.table package is listed under
Suggests and can be utilized in two functions. See below.
aggregate_by_pkg()
data.table.
include_na: A logical value
indicating whether NA values in the grouping variables
should be included in the aggregation. Default is
FALSE.
NAomit is a new parameter to RowGroups() and
Formula2ModelMatrix()/FormulaSums().
ModelMatrix().
pkg is a new parameter to RowGroups()
"base" (default) or
"data.table" (for improved speed).
Formula2ModelMatrix()/FormulaSums().
ModelMatrix().
Matrix::sparseMatrix() instead of building the transposed
matrix with rbind() based on numerous
Matrix::fac2sparse() calls.
rowGroupsPackage, to
data.table.
ModelMatrix() is fixed.
viaOrdinary = TRUE, model.matrix() or
sparse.model.matrix() was called twice.
combine_formulas() is improved
ModelMatrix() function and related functionality
for hierarchical computations have now been documented in a paper in The
R Journal.
remove_empty is an explicit parameter to
model_aggregate().
mm_args
parameter. Old code works as before.
?formula_utils
Extend0() to allow even more advanced
possibilities by varGroups-attribute.
GaussSuppression(),
"anySum" in
GaussSuppression() to align with best theory.
singletonMethod to either "anySumOld" or
"anySumNOTprimaryOld".
quantile_weighted().
quantile_weighted(x=c(0,2,0), weights = c(1,1,0))
correctly outputs the 50% value as 1.
CheckInput() or check_input().