Help for package BMisc

Title:

Miscellaneous Functions for Panel Data, Quantiles, and Printing Results

Version:

1.4.8

Description:

These are miscellaneous functions for working with panel data, quantiles, and printing results. For panel data, the package includes functions for making a panel data set balanced (that is, dropping individuals that have missing observations in any time period), converting id numbers to row numbers, and treating repeated cross sections as panel data under the assumption of rank invariance. For quantiles, there are functions to make distribution functions from a set of data points (this is particularly useful when a distribution function is created in several steps), to combine distribution functions based on some external weights, and to invert distribution functions. Finally, there are several other miscellaneous functions for obtaining weighted means, weighted distribution functions, and weighted quantiles; to generate summary statistics and their differences for two groups; and to add or drop covariates from formulas.

Depends:

R (≥ 3.1.0)

Imports:

data.table, dplyr, Rcpp, caret, tidyr

License:

GPL-2

Suggests:

testthat (≥ 3.0.0), plm, tibble

Encoding:

UTF-8

RoxygenNote:

7.3.2

Config/testthat/edition:

LinkingTo:

Rcpp, RcppArmadillo

URL:

https://bcallaway11.github.io/BMisc/

BugReports:

https://github.com/bcallaway11/BMisc/issues

NeedsCompilation:

yes

Packaged:

2025-02-04 14:52:24 UTC; bmc43193

Author:

Brantly Callaway [aut, cre]

Maintainer:

Brantly Callaway <brantly.callaway@uga.edu>

Repository:

CRAN

Date/Publication:

2025-02-04 15:20:01 UTC

BMisc

Description

A set of miscellaneous helper functions

Author(s)

Maintainer: Brantly Callaway brantly.callaway@uga.edu

TorF

Description

A function to replace NAs with FALSE in a vector of logicals

Usage

TorF(cond, use_isTRUE = FALSE)

Arguments

cond

a vector of conditions to check

use_isTRUE

whether or not to use a vectorized version of isTRUE. This is generally slower but covers more cases.

Value

logical vector
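
A minimal base-R sketch of the default behaviour described above (NA entries become FALSE). The helper name na_to_false is hypothetical and this is not the package's source; it only illustrates why such a function is handy when a comparison involves missing values.

```r
# Hypothetical helper for illustration only (not the BMisc implementation)
na_to_false <- function(cond) {
  stopifnot(is.logical(cond))
  cond[is.na(cond)] <- FALSE # treat "unknown" as "condition not met"
  cond
}

x <- c(1, NA, 3)
keep <- na_to_false(x > 2) # x > 2 alone would be FALSE, NA, TRUE
x[keep]
```

Subsetting with the cleaned vector avoids the NA rows that a raw logical index containing NA would produce.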

addCovToFormla

Description

Legacy version of 'add_cov_to_formula', please use that function instead. This function will eventually be deleted.

Usage

addCovToFormla(covs, formla)

Arguments

covs

should be a list of variable names

Add a Covariate to a Formula

Description

add_cov_to_formula adds some covariates to a formula; covs should be a list of variable names

Usage

add_cov_to_formula(covs, formula)

Arguments

covs

should be a list of variable names

formula

which formula to add covariates to

Value

formula

Examples

ff <- y ~ x
add_cov_to_formula(list("w", "z"), ff)

ff <- ~x
add_cov_to_formula("z", ff)
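
For comparison, the same kind of result can be sketched with base R's update(). This is only an illustration of the idea of appending terms to a formula, not a description of how add_cov_to_formula is implemented.

```r
ff <- y ~ x

# "." stands for the existing left- and right-hand sides, so this appends w and z
ff_new <- update(ff, . ~ . + w + z)

# Programmatic variant: build the added terms from a character vector
covs <- c("w", "z")
ff_new2 <- update(ff, paste(". ~ . +", paste(covs, collapse = " + ")))

ff_new
ff_new2
```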

blockBootSample

Description

Legacy name for the function 'block_boot_sample', please use that function going forward. This function will eventually be deleted.

Usage

blockBootSample(data, idname)

Arguments

data

data.frame from which you want to bootstrap

idname

column in data which contains an individual identifier

Block Bootstrap

Description

make draws of all observations with the same id in a panel data context. This is useful for bootstrapping with panel data.

Usage

block_boot_sample(data, idname)

Arguments

data

data.frame from which you want to bootstrap

idname

column in data which contains an individual identifier

Value

data.frame bootstrapped from the original dataset; this data.frame will contain new ids

Examples


data("LaborSupply", package = "plm")
bbs <- block_boot_sample(LaborSupply, "id")
nrow(bbs)
head(bbs$id)
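
The idea behind a block bootstrap for panel data can be sketched in base R as follows. The toy data frame and variable names are made up for illustration, and this is a sketch of the technique rather than the package's code: ids are drawn with replacement, every row belonging to a drawn id is kept together, and each draw receives a new id so that repeated draws remain distinct units.

```r
set.seed(1)
panel <- data.frame(id = rep(1:4, each = 3), t = rep(1:3, 4), y = rnorm(12))

ids <- unique(panel$id)
draws <- sample(ids, length(ids), replace = TRUE) # resample units, not rows

boot <- do.call(rbind, lapply(seq_along(draws), function(i) {
  block <- panel[panel$id == draws[i], ] # all periods for the drawn unit
  block$id <- i # new id, so a unit drawn twice counts as two units
  block
}))

nrow(boot)
table(boot$id)
```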

check_staggered

Description

A function to check if treatment is staggered in a panel data set.

Usage

check_staggered(df, idname, treatname)

Arguments

df

the data.frame used in the function

idname

name of column that holds the unit id

treatname

name of column with the treatment indicator

Value

a logical indicating whether treatment is staggered
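
A rough base-R sketch of such a check, under two assumptions that are not stated in this help page: "staggered" is taken to mean that a unit never switches from treated back to untreated, and the rows for each unit are already sorted by time. The data and the helper name is_staggered_unit are hypothetical.

```r
df <- data.frame(
  id = rep(1:3, each = 3),
  D  = c(0, 0, 1, # unit 1: treated in the last period
         0, 1, 1, # unit 2: treated from the second period on
         0, 1, 0) # unit 3: switches back to untreated
)

# A unit is consistent with staggered adoption if its indicator never decreases
is_staggered_unit <- function(d) all(diff(d) >= 0)

by_unit <- tapply(df$D, df$id, is_staggered_unit)
by_unit
all(by_unit) # FALSE here because of unit 3
```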

check_staggered_inner

Description

A helper function to check if treatment is staggered in a panel data set.

Usage

check_staggered_inner(this_df, treatname)

Arguments

this_df

a data.frame, for this function it should be specific to a particular unit

treatname

name of column with the treatment indicator

Check Function

Description

The check function used for optimizing to get quantiles

Usage

checkfun(a, tau)

Arguments

a

vector to compute quantiles for

tau

a value between 0 and 1; e.g., 0.5 gives the median

Value

numeric value

Examples

x <- rnorm(100)
x[which.min(checkfun(x, 0.5))] ## should be around 0
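
To make the link between the check function and quantiles explicit, here is a small self-contained sketch. It uses the standard quantile-regression check function, rho(u) = u * (tau - 1{u <= 0}), written out by hand under the name rho (a hypothetical helper; it is assumed, not confirmed here, that checkfun uses the same formula). The value q that minimizes the summed loss over the data is the tau-th sample quantile.

```r
rho <- function(u, tau) u * (tau - (u <= 0)) # standard check function

set.seed(1)
x <- rnorm(101) # odd sample size, so the median is a unique data point

# Summed check loss of candidate value q at tau = 0.5
obj <- function(q) sum(rho(x - q, 0.5))

# Search over the data points themselves
qhat <- x[which.min(sapply(x, obj))]
qhat
median(x)
```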

combineDfs

Description

Legacy version of 'combine_ecdfs', please use that function instead. This function will eventually be deleted.

Usage

combineDfs(y.seq, dflist, pstrat = NULL, ...)

Arguments

y.seq

sequence of possible y values

dflist

list of distribution functions to combine

...

additional arguments that can be passed to BMisc::make_dist

Combine Two Distribution Functions

Description

Combines distribution functions using the weights given by 'weights'

Usage

combine_ecdfs(y.seq, dflist, weights = NULL, ...)

Arguments

y.seq

sequence of possible y values

dflist

list of distribution functions to combine

weights

a vector of weights to put on each distribution function; if weights are not provided then equal weight is given to each distribution function

...

additional arguments that can be passed to BMisc::make_dist

Value

ecdf

Examples

x <- rnorm(100)
y <- rnorm(100, 1, 1)
Fx <- ecdf(x)
Fy <- ecdf(y)
both <- combine_ecdfs(seq(-2, 3, 0.1), list(Fx, Fy))
plot(Fx, col = "green")
plot(Fy, col = "blue", add = TRUE)
plot(both, add = TRUE)
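
One way to think about combining distribution functions with weights is as a pointwise weighted average over a grid. The sketch below does this in base R with hypothetical weights; it illustrates the concept only and is not taken from the package source.

```r
set.seed(1)
Fx <- ecdf(rnorm(100))
Fy <- ecdf(rnorm(100, 1, 1))

y.seq <- seq(-2, 3, 0.1)
w <- c(0.25, 0.75) # hypothetical weights that sum to 1

# Pointwise mixture of the two distribution functions on the grid
mix <- w[1] * Fx(y.seq) + w[2] * Fy(y.seq)
range(mix)
```

Because each input is nondecreasing and bounded by 0 and 1, and the weights are nonnegative and sum to 1, the mixture is again a valid distribution function on the grid.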

compareBinary

Description

Legacy version of 'compare_binary', please use that function instead. This function will eventually be deleted.

Usage

compareBinary(
  x,
  on,
  dta,
  w = rep(1, nrow(dta)),
  report = c("diff", "levels", "both")
)

Arguments

x

variables to run regression on

on

binary variable

dta

the data to use

w

weights

report

which type of report to make; diff is the difference between the two variables by group

Compare Variables across Groups

Description

compare_binary takes in a variable, e.g., union, and runs a bivariate regression of x on treatment (for summary statistics)

Usage

compare_binary(
  x,
  on,
  dta,
  w = rep(1, nrow(dta)),
  report = c("diff", "levels", "both")
)

Arguments

x

variables to run regression on

on

binary variable

dta

the data to use

w

weights

report

which type of report to make; diff is the difference between the two variables by group

Value

matrix of results
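
The reason a bivariate regression is useful for summary statistics is that, with a binary regressor, the slope equals the difference in group means. A short base-R check with simulated data (the variable names union and treat are made up, and no weights are used):

```r
set.seed(1)
dta <- data.frame(
  union = rbinom(200, 1, 0.3),
  treat = rep(0:1, 100)
)

fit <- lm(union ~ treat, data = dta)
slope <- unname(coef(fit)["treat"])

# Difference in group means, computed directly
diff_means <- mean(dta$union[dta$treat == 1]) - mean(dta$union[dta$treat == 0])

c(slope = slope, diff_means = diff_means)
```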

Compare a single variable across two groups

Description

compare_binary_inner takes in a variable, e.g., union, and runs a bivariate regression of x on treatment (for summary statistics)

Usage

compare_binary_inner(
  x,
  on,
  dta,
  w = rep(1, nrow(dta)),
  report = c("diff", "levels", "both")
)

Arguments

x

variables to run regression on

on

binary variable

dta

the data to use

w

weights

report

which type of report to make; diff is the difference between the two variables by group

Value

matrix of results

Cross Section to Panel

Description

Turn repeated cross sections data into panel data by imposing rank invariance; does not require that the inputs have the same length

Usage

cs2panel(cs1, cs2, yname)

Arguments

cs1

data frame, the first cross section

cs2

data frame, the second cross section

yname

the name of the variable to calculate difference for (should be the same in each dataset)

Value

the change in outcomes over time

dropCovFromFormla

Description

Legacy version of 'drop_cov_from_formula', please use that function instead. This function will eventually be deleted.

Usage

dropCovFromFormla(covs, formla)

Arguments

covs

should be a list of variable names

drop_collinear

Description

A function to check for multicollinearity and drop collinear terms from a matrix

Usage

drop_collinear(matrix)

Arguments

matrix

a matrix for which the function will remove collinear columns

Value

a matrix with collinear columns removed

Drop a Covariate from a Formula

Description

drop_cov_from_formula drops some covariates from a formula; covs should be a list of variable names

Usage

drop_cov_from_formula(covs, formula)

Arguments

covs

should be a list of variable names

formula

the formula to drop covariates from

Value

formula

Examples

ff <- y ~ x + w + z
drop_cov_from_formula(list("w", "z"), ff)

drop_cov_from_formula("z", ff)

element_wise_mult

Description

This is a function that takes in two matrices of dimension nxB and nxk and returns a Bxk matrix that comes from element-wise multiplication of every column in the first matrix times the entire second matrix and then averaging over the n-dimension. It is equivalent to (but faster than) the following R code: 'sapply(1:biters, function(b) sqrt(n)*colMeans(Umat[,b]*inf.func))'. This function is particularly useful for fast computations using the multiplier bootstrap.

Usage

element_wise_mult(U, inf_func)

Arguments

U

nxB matrix (e.g., this could be a matrix of Rademacher weights for B bootstrap iterations using the multiplier bootstrap)

inf_func

nxk matrix (e.g., this could be a matrix containing the influence function for different parameter estimates)

Value

a Bxk matrix
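
The base-R expression quoted in the Description can be run directly on simulated inputs. The sketch below uses made-up dimensions and Rademacher draws; note that sapply() returns a kxB matrix, so it is transposed here to get the Bxk shape. It also shows an algebraically equivalent matrix form, since sqrt(n) times a column mean equals a column sum divided by sqrt(n).

```r
set.seed(1)
n <- 200; k <- 3; B <- 5
inf_func <- matrix(rnorm(n * k), n, k)
U <- matrix(sample(c(-1, 1), n * B, replace = TRUE), n, B) # Rademacher weights

# Column-by-column version from the Description, transposed to B x k
res <- t(sapply(seq_len(B), function(b) sqrt(n) * colMeans(U[, b] * inf_func)))

# Same numbers in one matrix product: t(U) %*% inf_func / sqrt(n)
alt <- crossprod(U, inf_func) / sqrt(n)

dim(res)
```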

getListElement

Description

Legacy version of 'get_list_element', please use that function instead. This function will eventually be deleted.

Usage

getListElement(listolists, whichone = 1)

Arguments

listolists

a list

whichone

which item to get out of each list (can be numeric or name)

getWeightedDf

Description

Legacy version of 'weighted_ecdf', please use that function instead. This function will eventually be deleted.

Usage

getWeightedDf(y, y.seq = NULL, weights = NULL, norm = TRUE)

Arguments

y

a vector to compute the distribution function for

y.seq

an optional vector of values to compute the distribution function for; the default is to use all unique values of y

weights

the vector of weights; can be NULL, in which case the unweighted distribution function is returned

norm

normalize the weights so that they have mean of 1, default is to normalize

getWeightedMean

Description

Legacy version of 'weighted_mean', please use that function instead. This function will eventually be deleted.

Usage

getWeightedMean(y, weights = NULL, norm = TRUE)

Arguments

y

a vector to compute the mean for

weights

the vector of weights, can be NULL, then will just return mean

norm

normalize the weights so that they have mean of 1, default is to normalize

getWeightedQuantiles

Description

Legacy version of 'weighted_quantile', please use that function instead. This function will eventually be deleted.

Usage

getWeightedQuantiles(tau, cvec, weights = NULL, norm = TRUE)

Arguments

tau

a vector of values between 0 and 1

cvec

a vector to compute quantiles for

weights

the weights, weighted.checkfun normalizes the weights to sum to 1.

norm

normalize the weights so that they have mean of 1, default is to normalize

get_Yi1

Description

A function to calculate outcomes for units in the first time period that is available in a panel data setting (this function can also be used to recover covariates, etc. in the first period).

Usage

get_Yi1(df, idname, yname, tname, gname)

Arguments

df

the data.frame used in the function

idname

name of column that holds the unit id

yname

name of column containing the outcome (or other variable) for which to calculate its outcome in the immediate pre-treatment period

tname

name of column that holds the time period

gname

name of column containing the unit's group

get_Yi1_inner

Description

Calculates a unit's outcome in the first time period. This function operates on a data.frame that is already local to a particular unit.

Usage

get_Yi1_inner(this_df, yname, tname, gname)

Arguments

this_df

a data.frame, for this function it should be specific to a particular unit

yname

name of column containing the outcome (or other variable) for which to calculate its outcome in the immediate pre-treatment period

tname

name of column that holds the time period

gname

name of column containing the unit's group

get_YiGmin1

Description

A function to calculate outcomes for units in the period right before they become treated (this function can also be used to recover covariates, etc. in the period right before a unit becomes treated). For units that do not participate in the treatment (and therefore have group==0), they are assigned their outcome in the last period.

Usage

get_YiGmin1(df, idname, yname, tname, gname)

Arguments

df

the data.frame used in the function

idname

name of column that holds the unit id

yname

name of column containing the outcome (or other variable) for which to calculate its outcome in the immediate pre-treatment period

tname

name of column that holds the time period

gname

name of column containing the unit's group

get_YiGmin1_inner

Description

Calculates a unit's outcome (or also can be used for a covariate) in the period right before it becomes treated. The unit's group must be specified at this point. This function operates on a data.frame that is already local to a particular unit.

Usage

get_YiGmin1_inner(this_df, yname, tname, gname)

Arguments

this_df

a data.frame, for this function it should be specific to a particular unit

yname

name of column containing the outcome (or other variable) for which to calculate its outcome in the immediate pre-treatment period

tname

name of column that holds the time period

gname

name of column containing the unit's group

get_Yibar

Description

A function to calculate the average outcome across all time periods separately for each unit in a panel data setting (this function can also be used to recover covariates, etc.).

Usage

get_Yibar(df, idname, yname)

Arguments

df

the data.frame used in the function

idname

name of column that holds the unit id

yname

name of column containing the outcome (or other variable) for which to calculate its outcome in the immediate pre-treatment period

get_Yibar_inner

Description

Calculates a unit's average outcome across all periods. This function operates on a data.frame that is already local to a particular unit.

Usage

get_Yibar_inner(this_df, yname)

Arguments

this_df

a data.frame, for this function it should be specific to a particular unit

yname

name of column containing the outcome (or other variable) for which to calculate its outcome in the immediate pre-treatment period

get_Yibar_pre

Description

A function to calculate average outcomes for units in their pre-treatment periods (this function can also be used to recover pre-treatment averages of covariates, etc.). For units that do not participate in the treatment (and therefore have group==0), the function calculates their overall average outcome.

Usage

get_Yibar_pre(df, idname, yname, tname, gname)

Arguments

df

the data.frame used in the function

idname

name of column that holds the unit id

yname

name of column containing the outcome (or other variable) for which to calculate its outcome in the immediate pre-treatment period

tname

name of column that holds the time period

gname

name of column containing the unit's group

get_Yibar_pre_inner

Description

Calculates a unit's average outcome in pre-treatment periods (or also can be used for a covariate). The unit's group must be specified at this point. This function operates on a data.frame that is already local to a particular unit.

Usage

get_Yibar_pre_inner(this_df, yname, tname, gname)

Arguments

this_df

a data.frame, for this function it should be specific to a particular unit

yname

name of column containing the outcome (or other variable) for which to calculate its outcome in the immediate pre-treatment period

tname

name of column that holds the time period

gname

name of column containing the unit's group

get_Yit

Description

A function to calculate outcomes for units in a particular time period 'tp' in a panel data setting (this function can also be used to recover covariates, etc. in that period).

Usage

get_Yit(df, tp, idname, yname, tname)

Arguments

df

the data.frame used in the function

tp

The time period for which to get the outcome

idname

name of column that holds the unit id

yname

name of column containing the outcome (or other variable) for which to calculate its outcome in the immediate pre-treatment period

tname

name of column that holds the time period

Value

a vector of outcomes in period t, the vector will have the length nT (i.e., this is returned for each element in the panel, not for a particular period)

get_Yit_inner

Description

Calculates a unit's outcome in some particular period 'tp'. This function operates on a data.frame that is already local to a particular unit.

Usage

get_Yit_inner(this_df, tp, yname, tname)

Arguments

this_df

a data.frame, for this function it should be specific to a particular unit

tp

The time period for which to get the outcome

yname

name of column containing the outcome (or other variable) for which to calculate its outcome in the immediate pre-treatment period

tname

name of column that holds the time period

get_first_difference

Description

A function that calculates the first difference in a panel data setting. If the data.frame that is passed in has nxT rows, the resulting vector will also have nxT elements with one element for each unit set to be NA.

Usage

get_first_difference(df, idname, yname, tname)

Arguments

df

the data.frame used in the function

idname

name of column that holds the unit id

yname

name of column containing the outcome (or other variable) for which to calculate its outcome in the immediate pre-treatment period

tname

name of column that holds the time period
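
A base-R sketch of a within-unit first difference on a toy panel. The data are made up, and it is an assumption of this sketch (not something this page states) that the NA falls in each unit's first period. The result has one element per row of the input, as described above.

```r
df <- data.frame(
  id = rep(1:2, each = 3),
  t  = rep(1:3, 2),
  y  = c(1, 3, 6, 10, 10, 13)
)

df <- df[order(df$id, df$t), ] # differences only make sense in time order

# ave() applies the function within each id and keeps the original row layout
df$dy <- ave(df$y, df$id, FUN = function(v) c(NA, diff(v)))
df
```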

get_group

Description

A function to calculate a unit's group in a panel data setting with a binary treatment and staggered treatment adoption and where there is a column in the data indicating whether or not a unit is treated

Usage

get_group(df, idname, tname, treatname)

Arguments

df

the data.frame used in the function

idname

name of column that holds the unit id

tname

name of column that holds the time period

treatname

name of column with the treatment indicator
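
As an illustration, the sketch below computes a group for each unit in base R under the common convention that a unit's group is the first period in which it is treated, with 0 for units that are never treated (the group==0 convention appears elsewhere in this manual; defining the group as the first treated period is an assumption here). The helper name first_treated and the data are hypothetical.

```r
df <- data.frame(
  id = rep(1:3, each = 3),
  t  = rep(2001:2003, 3),
  D  = c(0, 0, 1, # first treated in 2003
         0, 1, 1, # first treated in 2002
         0, 0, 0) # never treated
)

# First treated period, or 0 if the unit is never treated
first_treated <- function(tt, d) if (any(d == 1)) min(tt[d == 1]) else 0

g <- sapply(split(df, df$id), function(u) first_treated(u$t, u$D))
g
```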

get_group_inner

Description

Calculates the group for a particular unit

Usage

get_group_inner(this_df, tname, treatname)

Arguments

this_df

a data.frame, for this function it should be specific to a particular unit

tname

name of column that holds the time period

treatname

name of column with the treatment indicator

get_lagYi

Description

A function that calculates lagged outcomes in a panel data setting. If the data.frame that is passed in has nxT rows, the resulting vector will also have nxT elements with one element for each unit set to be NA.

Usage

get_lagYi(df, idname, yname, tname, nlags = 1)

Arguments

df

the data.frame used in the function

idname

name of column that holds the unit id

yname

name of column containing the outcome (or other variable) for which to calculate its outcome in the immediate pre-treatment period

tname

name of column that holds the time period

nlags

The number of periods to lag. The default is 1, which computes the lag from the previous period.

Return Particular Element from Each Element in a List

Description

a function to take a list and get a particular part out of each element in the list

Usage

get_list_element(listolists, whichone = 1)

Arguments

listolists

a list

whichone

which item to get out of each list (can be numeric or name)

Value

list of all the elements 'whichone' from each list

Examples

len <- 100 # number elements in list
lis <- lapply(1:len, function(l) list(x = (-l), y = l^2)) # create list
get_list_element(lis, "x")[1] # should be equal to -1
get_list_element(lis, 1)[1] # should be equal to -1

get_principal_components

Description

A function to calculate unit-specific principal components, given panel data

Usage

get_principal_components(
  xformula,
  data,
  idname,
  tname,
  n_components = NULL,
  ret_wide = FALSE,
  ret_id = FALSE
)

Arguments

xformula

a formula specifying the variables to use in the principal component analysis

data

a data.frame containing the panel data

idname

the name of the column containing the unit id

tname

the name of the column containing the time period

n_components

the number of principal components to retain, the default is NULL which will result in all principal components being retained

ret_wide

whether to return the data in wide format (where the number of rows is equal to n = length(unique(data[[idname]])) or long format (where the number of rows is equal to nT = nrow(data)). The default is FALSE, so that long data is returned by default.

ret_id

whether to return the id column in the output data.frame. The default is FALSE.

Value

a data.frame containing the original data with the principal components appended

Take particular id and convert to row number

Description

id2rownum takes an id and converts it to the right row number in the dataset; ids should be unique in the dataset; that is, don't pass the function panel data in which the same id appears more than once

Usage

id2rownum(id, data, idname)

Arguments

id

a particular id

data

data frame

idname

unique id

Convert Vector of ids into Vector of Row Numbers

Description

ids2rownum takes a vector of ids and converts it to the right row numbers in the dataset; ids should be unique in the dataset; that is, don't pass the function panel data in which the same id appears more than once

Usage

ids2rownum(ids, data, idname)

Arguments

ids

vector of ids

data

data frame

idname

unique id

Value

vector of row numbers

Examples

ids <- seq(1, 1000, length.out = 100)
ids <- ids[order(runif(100))]
df <- data.frame(id = ids)
ids2rownum(df$id, df, "id")
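
The underlying lookup can be sketched with base R's match(), which returns the position of the first match only. That is one way to see why ids need to be unique in the data. This is an illustration of the idea, not the package's implementation, and the small data frame is made up.

```r
df <- data.frame(id = c(30, 10, 20), y = c(3, 1, 2))

# Row numbers of ids 10 and 20 within df
rows <- match(c(10, 20), df$id)
rows
df[rows, ]
```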

invertEcdf

Description

Legacy function for 'invert_ecdf', please use that function instead. This function will eventually be deleted.

Usage

invertEcdf(df)

Arguments

df

an ecdf object

Invert Ecdf

Description

take an ecdf object and invert it to get a step-quantile function

Usage

invert_ecdf(df)

Arguments

df

an ecdf object

Value

stepfun object that contains the quantiles of the df
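
A minimal sketch of how an ecdf can be inverted into a step-quantile function using base R's knots() and stepfun(). The object name Q is hypothetical and this is not the package source; it is only meant to show what "inverting" amounts to. The result is compared with quantile(type = 1), which is the inverse of the empirical distribution function.

```r
set.seed(1)
x <- rnorm(50)
Fx <- ecdf(x)

xs <- knots(Fx) # sorted unique data points
ps <- Fx(xs)    # distribution function evaluated at those points

# Q(p) = xs[i] for p in (ps[i-1], ps[i]]; right = TRUE closes intervals on the right
Q <- stepfun(ps, c(xs, xs[length(xs)]), right = TRUE)

p <- c(0.11, 0.37, 0.93)
Q(p)
quantile(x, p, type = 1)
```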

lhs.vars

Description

Legacy version of 'lhs_vars', please use that function instead. This function will eventually be deleted.

Usage

lhs.vars(formla)

Left-hand Side Variables

Description

Take a formula and return a vector of the variables on the left hand side, it will return NULL for a one sided formula

Usage

lhs_vars(formula)

Arguments

formula

a formula

Value

vector of variable names

Examples

ff <- yvar ~ x1 + x2
lhs_vars(ff)

makeBalancedPanel

Description

Legacy version of 'make_balanced_panel', please use that function name going forward, though this will still work for now.

Usage

makeBalancedPanel(data, idname, tname, return_data.table = FALSE)

Arguments

data

data.frame used in function

idname

unique id

tname

time period name

return_data.table

if TRUE, make_balanced_panel will return a data.table rather than a data.frame. Default is FALSE.

makeDist

Description

Legacy name of 'make_dist' function, please use that function instead. This function will eventually be deleted.

Usage

makeDist(
  x,
  Fx,
  sorted = FALSE,
  rearrange = FALSE,
  force01 = FALSE,
  method = "constant"
)

Arguments

x

vector of values

Fx

vector of the distribution function values

sorted

boolean indicating whether or not x is already sorted; computation is somewhat faster if already sorted

rearrange

boolean indicating whether or not to monotonize the distribution function

force01

boolean indicating whether or not to force the values of the distribution function (i.e. Fx) to be between 0 and 1

method

which method to pass to approxfun to approximate the distribution function. Default is "constant"; other possible choice is "linear". "constant" returns a step function, just like an empirical cdf; "linear" linearly interpolates between neighboring points.

Balance a Panel Data Set

Description

This function drops observations from data.frame that are not part of balanced panel data set.

Usage

make_balanced_panel(data, idname, tname, return_data.table = FALSE)

Arguments

data

data.frame used in function

idname

unique id

tname

time period name

return_data.table

if TRUE, make_balanced_panel will return a data.table rather than a data.frame. Default is FALSE.

Value

data.frame that is a balanced panel

Examples

id <- rep(seq(1, 100), each = 2) # individual ids for setting up a two period panel
t <- rep(seq(1, 2), 100) # time periods
y <- rnorm(200) # outcomes
dta <- data.frame(id = id, t = t, y = y) # make into data frame
dta <- dta[-7, ] # drop the 7th row from the dataset (which creates an unbalanced panel)
dta <- make_balanced_panel(dta, idname = "id", tname = "t")
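
For intuition, balancing can be sketched in base R by counting observations per id and keeping only ids observed in every period. This sketch assumes there are no duplicated id-period rows and uses a smaller toy panel than the example above; it is not the package's implementation.

```r
dta <- data.frame(id = rep(1:4, each = 2), t = rep(1:2, 4), y = 1:8)
dta <- dta[-7, ] # id 4 loses its first period, so the panel is unbalanced

n_periods <- length(unique(dta$t))

# Number of rows per id, repeated on every row of that id
counts <- ave(rep(1, nrow(dta)), dta$id, FUN = sum)

balanced <- dta[counts == n_periods, ]
balanced
```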

Make a Distribution Function

Description

turn vectors of values and their distribution function values into an ecdf. Vectors should be the same length and both increasing.

Usage

make_dist(
  x,
  Fx,
  sorted = FALSE,
  rearrange = FALSE,
  force01 = FALSE,
  method = "constant"
)

Arguments

x

vector of values

Fx

vector of the distribution function values

sorted

boolean indicating whether or not x is already sorted; computation is somewhat faster if already sorted

rearrange

boolean indicating whether or not to monotonize the distribution function

force01

boolean indicating whether or not to force the values of the distribution function (i.e. Fx) to be between 0 and 1

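method

which method to pass to approxfun to approximate the distribution function. Default is "constant"; other possible choice is "linear". "constant" returns a step function, just like an empirical cdf; "linear" linearly interpolates between neighboring points.

Value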

ecdf

Examples

y <- rnorm(100)
y <- y[order(y)]
u <- runif(100)
u <- u[order(u)]
F <- make_dist(y, u)
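
The difference between the "constant" and "linear" methods can be seen by calling approxfun directly on a small hand-made set of points. The values below are made up, and the yleft/yright settings are choices made for this sketch rather than something taken from make_dist.

```r
x  <- c(-1, 0, 1, 2)
Fx <- c(0.1, 0.4, 0.8, 1)

# Step function: holds the value at the nearest point to the left (f = 0)
F_step <- approxfun(x, Fx, method = "constant", yleft = 0, yright = 1, f = 0)

# Linear interpolation between neighboring points
F_lin <- approxfun(x, Fx, method = "linear", yleft = 0, yright = 1)

F_step(0.5) # value carried forward from x = 0
F_lin(0.5)  # halfway between the values at x = 0 and x = 1
```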

multiplier_bootstrap

Description

A function that takes in an influence function (an nxk matrix) and the number of bootstrap iterations and returns a Bxk matrix of bootstrap results. This function uses Rademacher weights.

Usage

multiplier_bootstrap(inf_func, biters)

Arguments

inf_func

nxk matrix (e.g., this could be a matrix containing the influence function for different parameter estimates)

biters

the number of bootstrap iterations

Value

a Bxk matrix

Matrix-Vector Multiplication

Description

This function multiplies a matrix by a vector and returns a numeric vector.

Usage

mv_mult(A, v)

Arguments

A

an nxk matrix.

v

a vector (can be stored as numeric or as a kx1 matrix)

Value

A numeric vector resulting from the multiplication of the matrix by the vector.

Examples

A <- matrix(1:9, nrow = 3, ncol = 3)
v <- c(2, 4, 6)
mv_mult(A, v)
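
For reference, the same product computed with base R's %*% operator and flattened to a plain numeric vector:

```r
A <- matrix(1:9, nrow = 3, ncol = 3)
v <- c(2, 4, 6)

# %*% returns a 3x1 matrix; as.numeric() drops the dimensions
base_result <- as.numeric(A %*% v)
base_result
```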

orig2t

Description

A helper function to switch from original time periods to "new" time periods (which are just time periods going from 1 to total number of available periods). This allows for periods not being exactly spaced apart by 1.

Usage

orig2t(orig, original_time.periods)

Arguments

orig

a vector of original time periods to convert to new time periods.

original_time.periods

vector containing all original time periods.

Value

new time period converted from original time period
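
The mapping between original and "new" time periods can be sketched with match() and plain indexing. The period values are made up and the sketch assumes the vector of original periods is sorted and free of duplicates; it illustrates the idea rather than the package's code.

```r
original_periods <- c(2001, 2003, 2007) # unevenly spaced original periods

# Original -> new: position of each period in the sorted list
new_t <- match(c(2003, 2007), original_periods)
new_t

# New -> original: index back into the same vector
back <- original_periods[new_t]
back
```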

orig2t_inner

Description

A helper function to switch from original t values to "new" t values (which are just time periods going from 1 to total number of available periods).

Usage

orig2t_inner(orig, original_time.periods)

Arguments

orig

a single original time period to convert to new time period

original_time.periods

vector containing all original time periods.

Panel Data to Repeated Cross Sections

Description

panel2cs takes a 2 period dataset and turns it into a cross sectional dataset. The data includes the change in time varying variables between the time periods. The default functionality is to keep all the variables from period 1 and add all the variables listed by name in timevars from period 2 to those.

Usage

panel2cs(data, timevars, idname, tname)

Arguments

data

data.frame used in function

timevars

vector of names of variables to keep

idname

unique id

tname

time period name

Value

data.frame

Panel Data to Repeated Cross Sections

Description

panel2cs2 takes a 2 period dataset and turns it into a cross sectional dataset; i.e., long to wide. This function considers a particular case where there is some outcome whose value can change over time. It returns the dataset from the first period with the outcome in the second period and the change in outcomes over time appended to it.

Usage

panel2cs2(data, yname, idname, tname, balance_panel = TRUE)

Arguments

data

data.frame used in function

yname

name of outcome variable that can change over time

idname

unique id

tname

time period name

balance_panel

whether to ensure that panel is balanced. Default is TRUE, but code runs somewhat faster if this is set to be FALSE.

Value

data from first period with .y0 (outcome in first period), .y1 (outcome in second period), and .dy (change in outcomes over time) appended to it
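
A base-R sketch of the long-to-wide step for a balanced two-period panel, using the column names .y0, .y1, and .dy listed above. The toy data are made up and this is an illustration of the reshaping, not the package's implementation.

```r
dta <- data.frame(
  id = rep(1:3, each = 2),
  t  = rep(1:2, 3),
  y  = c(1, 2, 5, 5, 3, 7)
)

first  <- dta[dta$t == 1, ]
second <- dta[dta$t == 2, ]

wide <- first
wide$.y0 <- first$y
wide$.y1 <- second$y[match(first$id, second$id)] # align second period by id
wide$.dy <- wide$.y1 - wide$.y0
wide
```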

Right-hand Side of Formula

Description

Take a formula and return the right hand side of the formula

Usage

rhs(formula)

Arguments

formula

a formula

Value

a one sided formula

Examples

ff <- yvar ~ x1 + x2
rhs(ff)

rhs.vars

Description

Legacy version of 'rhs_vars', please use that function instead. This function will eventually be deleted.

Usage

rhs.vars(formla)

Arguments

formla

a formula

Right-hand Side Variables

Description

Take a formula and return a vector of the variables on the right hand side

Usage

rhs_vars(formula)

Arguments

formula

a formula

Value

vector of variable names

Examples

ff <- yvar ~ x1 + x2
rhs_vars(ff)

ff <- y ~ x1 + I(x1^2)
rhs_vars(ff)

source_all

Description

Source all the files in a folder

Usage

source_all(fldr)

Arguments

fldr

path to a folder

Subsample of Observations from Panel Data

Description

returns a subsample of a panel data set; in particular drops all observations that are not in keepids. If keepids is not set, randomly keeps nkeep ids.

Usage

subsample(dta, idname, tname, keepids = NULL, nkeep = NULL)

Arguments

dta

a data.frame which is a balanced panel

idname

the name of the id variable

tname

the name of the time variable

keepids

which ids to keep

nkeep

how many ids to keep (only used if keepids is not set); the default is the number of unique ids

Value

a data.frame that contains a subsample of dta

Examples


data("LaborSupply", package = "plm")
nrow(LaborSupply)
unique(LaborSupply$year)
ss <- subsample(LaborSupply, "id", "year", nkeep = 100)
nrow(ss)

t2orig

Description

A helper function to switch from "new" t values to original t values. This allows for periods not being exactly spaced apart by 1.

Usage

t2orig(t, original_time.periods)

Arguments

t

a vector of time periods to convert back to original time periods.

original_time.periods

vector containing all original time periods.

Value

original time period converted from new time period

t2orig_inner

Description

A helper function to switch from "new" t values to original t values for a single t.

Usage

t2orig_inner(t, original_time.periods)

Arguments

t

a single time period to convert back to original time

original_time.periods

vector containing all original time periods.

time_invariant_to_panel

Description

This function takes a time-invariant variable and repeats it for each period in a panel data set.

Usage

time_invariant_to_panel(x, df, idname, balanced_panel = TRUE)

Arguments

x

a vector of length equal to the number of unique ids in df.

df

the data.frame used in the function

idname

name of column that holds the unit id

balanced_panel

a logical indicating whether the panel is balanced. If TRUE, the function will optimize the repetition process. Default is TRUE.

Value

a vector of length equal to the number of rows in df.
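
The expansion can be sketched with match(). This sketch assumes that x is ordered in the same way as unique(df[[idname]]), which is an assumption of the illustration rather than something stated here; the data are made up.

```r
df <- data.frame(id = rep(c("a", "b", "c"), each = 2), t = rep(1:2, 3))
x <- c(10, 20, 30) # one value per unit, in the order of unique(df$id)

# Look up each row's unit and copy that unit's value
x_long <- x[match(df$id, unique(df$id))]
x_long
```

Using match() rather than rep() means the sketch does not rely on the panel being balanced or sorted by id.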

Variable Names to Formula

Description

take a name for a y variable and a vector of names for x variables and turn them into a formula

Usage

toformula(yname, xnames)

Arguments

yname

the name of the y variable

xnames

vector of names for x variables

Value

a formula

Examples

toformula("yvar", c("x1", "x2"))

## should return yvar ~ 1
toformula("yvar", rhs.vars(~1))

weighted.checkfun

Description

Legacy version of 'weighted_checkfun', please use that function instead. This function will eventually be deleted.

Usage

weighted.checkfun(q, cvec, tau, weights)

Arguments

q

the value to check

cvec

vector of data to compute quantiles for

tau

a value between 0 and 1; for example, 0.5 gives the median

weights

the weights, weighted.checkfun normalizes the weights to sum to 1.

Weighted Check Function

Description

Computes a weighted version of the check function.

Usage

weighted_checkfun(q, cvec, tau, weights)

Arguments

q

the value to check

cvec

vector of data to compute quantiles for

tau

a value between 0 and 1; for example, 0.5 gives the median

weights

the weights; weighted_checkfun normalizes the weights to sum to 1.

Value

numeric
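For intuition, a weighted check function can be sketched in base R. The sketch assumes the standard quantile-regression check function, rho_tau(u) = u * (tau - 1{u < 0}), and that the weighted version is the weighted sum of rho_tau(cvec - q) with weights rescaled to sum to 1. The helper names are hypothetical and the package's own implementation may differ in details.

```r
# Hypothetical base-R sketch, not the BMisc implementation.
# Standard quantile-regression check function.
checkfun_sketch <- function(u, tau) u * (tau - (u < 0))

# Assumption: the weighted version combines the check function of
# (cvec - q) using weights rescaled to sum to 1.
weighted_checkfun_sketch <- function(q, cvec, tau, weights) {
  w <- weights / sum(weights)
  sum(w * checkfun_sketch(cvec - q, tau))
}

cvec <- c(1, 2, 3, 4, 10)
w <- rep(1, length(cvec))
weighted_checkfun_sketch(3, cvec, 0.5, w)   # loss at the median
weighted_checkfun_sketch(10, cvec, 0.5, w)  # larger loss away from it
```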

weighted_combine_list

Description

A function that takes a list of either vectors or matrices and computes a weighted average of them, with one weight for each element of the list.

Usage

weighted_combine_list(l, w, normalize_weights = TRUE)

Arguments

l

a list that contains either vectors or matrices of the same dimension that are to be combined

w

a vector of weights, the weights should have the same number of elements as 'length(l)'

normalize_weights

whether or not to force the weights to sum to 1; the default is TRUE

Value

matrix or vector corresponding to the weighted average of all of the elements in 'l'
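The computation can be sketched in base R as below. The helper name combine_list_sketch is hypothetical; the sketch only illustrates an elementwise weighted average and is not the package's code.

```r
# Hypothetical base-R sketch, not the BMisc implementation.
combine_list_sketch <- function(l, w, normalize_weights = TRUE) {
  stopifnot(length(l) == length(w))
  if (normalize_weights) w <- w / sum(w)
  # scale each list element by its weight, then add them elementwise
  Reduce(`+`, Map(function(el, wt) wt * el, l, w))
}

l <- list(matrix(1, 2, 2), matrix(3, 2, 2))
combine_list_sketch(l, w = c(1, 3))
# a 2 x 2 matrix with every entry equal to 0.25 * 1 + 0.75 * 3 = 2.5
```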

Weighted Distribution Function

Description

Get a distribution function from a vector of values after applying some weights

Usage

weighted_ecdf(y, y.seq = NULL, weights = NULL, norm = TRUE)

Arguments

y

a vector to compute the distribution function for

y.seq

an optional vector of values to compute the distribution function for; the default is to use all unique values of y

weights

the vector of weights; if NULL, the unweighted distribution function is returned

norm

whether to normalize the weights so that they have a mean of 1; the default is TRUE

Value

ecdf
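A weighted distribution function can be sketched in base R as follows. This sketch returns a data.frame of evaluation points and distribution values rather than a function object, which is an assumption made for readability; the helper name weighted_ecdf_sketch is hypothetical.

```r
# Hypothetical base-R sketch, not the BMisc implementation.
weighted_ecdf_sketch <- function(y, y.seq = NULL, weights = NULL) {
  if (is.null(weights)) weights <- rep(1, length(y))
  weights <- weights / mean(weights)  # rescale to mean-one weights
  if (is.null(y.seq)) y.seq <- sort(unique(y))
  # weighted share of observations at or below each evaluation point
  Fy <- vapply(y.seq, function(ys) mean(weights * (y <= ys)), numeric(1))
  data.frame(y = y.seq, Fy = Fy)
}

y <- c(1, 2, 2, 5)
w <- c(1, 1, 1, 3)
weighted_ecdf_sketch(y, weights = w)
# Fy is 1/6 at y = 1, 0.5 at y = 2, and 1 at y = 5
```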

Weighted Mean

Description

Get the mean applying some weights

Usage

weighted_mean(y, weights = NULL, norm = TRUE)

Arguments

y

a vector to compute the mean for

weights

the vector of weights; if NULL, the unweighted mean is returned

norm

whether to normalize the weights so that they have a mean of 1; the default is TRUE

Value

the weighted mean
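The role of the norm argument can be illustrated with a short base-R sketch. It assumes the weighted mean is computed as the average of weights * y after the optional rescaling; the helper name weighted_mean_sketch is hypothetical and the package's code may differ.

```r
# Hypothetical base-R sketch, not the BMisc implementation.
weighted_mean_sketch <- function(y, weights = NULL, norm = TRUE) {
  if (is.null(weights)) weights <- rep(1, length(y))
  if (norm) weights <- weights / mean(weights)  # mean-one weights
  mean(weights * y)
}

y <- c(1, 2, 3)
w <- c(1, 1, 2)
weighted_mean_sketch(y, w)       # 2.25
stats::weighted.mean(y, w)       # 2.25, the same value
weighted_mean_sketch(y)          # 2, the ordinary mean
```

Under this assumption, mean-one weights make the result agree with stats::weighted.mean(); without the rescaling, the result would depend on the scale of the weights.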

weighted_quantile

Description

A function to recover quantiles of a vector using weights.

Usage

weighted_quantile(tau, cvec, weights = NULL, norm = TRUE)

Arguments

tau

a vector of values between 0 and 1

cvec

a vector to compute quantiles for

weights

the weights, weighted.checkfun normalizes the weights to sum to 1.

norm

whether to normalize the weights so that they have a mean of 1; the default is TRUE

Value

vector of quantiles
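One way to obtain weighted quantiles is to minimize a weighted check function. The base-R sketch below does this by a simple search over the observed data points instead of a numerical optimizer; that search strategy, the form of the check function, and the helper names are all assumptions for illustration rather than the package's implementation.

```r
# Hypothetical base-R sketch, not the BMisc implementation.
checkfun_sketch <- function(u, tau) u * (tau - (u < 0))

weighted_quantile_sketch <- function(tau, cvec, weights = NULL) {
  if (is.null(weights)) weights <- rep(1, length(cvec))
  weights <- weights / mean(weights)
  loss <- function(q) mean(weights * checkfun_sketch(cvec - q, tau))
  # the loss is convex and piecewise linear in q, so a minimizer is
  # attained at one of the observed values
  cand <- sort(unique(cvec))
  one_tau <- function(tt) {
    tau <<- tt
    cand[which.min(vapply(cand, loss, numeric(1)))]
  }
  vapply(tau, one_tau, numeric(1))
}

cvec <- c(1, 2, 3, 4, 10)
weighted_quantile_sketch(0.5, cvec)                             # 3
weighted_quantile_sketch(0.5, cvec, weights = c(1, 1, 1, 1, 10)) # 10
```

Placing a large weight on the last observation pulls the weighted median to that observation, which is the behavior a weighted quantile is meant to capture.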

Quantile of a Weighted Check Function

Description

Finds the quantile by optimizing the weighted check function

Usage

weighted_quantile_inner(tau, cvec, weights = NULL, norm = TRUE)

Arguments

tau

a value between 0 and 1; for example, 0.5 gives the median

cvec

a vector to compute quantiles for

weights

the weights, weighted.checkfun normalizes the weights to sum to 1.

norm

whether to normalize the weights so that they have a mean of 1; the default is TRUE
