Title: | Miscellaneous Functions for Panel Data, Quantiles, and Printing Results |
Version: | 1.4.8 |
Description: | These are miscellaneous functions for working with panel data, quantiles, and printing results. For panel data, the package includes functions for making a panel data set balanced (that is, dropping individuals that have missing observations in any time period), converting id numbers to row numbers, and treating repeated cross sections as panel data under the assumption of rank invariance. For quantiles, there are functions to make distribution functions from a set of data points (this is particularly useful when a distribution function is created in several steps), to combine distribution functions based on some external weights, and to invert distribution functions. Finally, there are several other miscellaneous functions for obtaining weighted means, weighted distribution functions, and weighted quantiles; for generating summary statistics and their differences for two groups; and for adding or dropping covariates from formulas. |
Depends: | R (≥ 3.1.0) |
Imports: | data.table, dplyr, Rcpp, caret, tidyr |
License: | GPL-2 |
Suggests: | testthat (≥ 3.0.0), plm, tibble |
Encoding: | UTF-8 |
RoxygenNote: | 7.3.2 |
Config/testthat/edition: | 3 |
LinkingTo: | Rcpp, RcppArmadillo |
URL: | https://bcallaway11.github.io/BMisc/ |
BugReports: | https://github.com/bcallaway11/BMisc/issues |
NeedsCompilation: | yes |
Packaged: | 2025-02-04 14:52:24 UTC; bmc43193 |
Author: | Brantly Callaway [aut, cre] |
Maintainer: | Brantly Callaway <brantly.callaway@uga.edu> |
Repository: | CRAN |
Date/Publication: | 2025-02-04 15:20:01 UTC |
BMisc
Description
A set of miscellaneous helper functions
Author(s)
Maintainer: Brantly Callaway brantly.callaway@uga.edu
See Also
Useful links: https://bcallaway11.github.io/BMisc/ ; report bugs at https://github.com/bcallaway11/BMisc/issues
TorF
Description
A function to replace NAs with FALSE in a vector of logicals
Usage
TorF(cond, use_isTRUE = FALSE)
Arguments
cond |
a vector of conditions to check |
use_isTRUE |
whether or not to use a vectorized version of isTRUE. This is generally slower but covers more cases. |
Value
logical vector
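As a rough illustration of the behaviour described above, the following base-R sketch replaces NA entries of a logical vector with FALSE. The helper name na_to_false is hypothetical, and this is not the package's implementation.

```r
# Hypothetical stand-in for illustration only (not BMisc's code)
na_to_false <- function(cond) {
  cond[is.na(cond)] <- FALSE  # treat "unknown" as "condition not met"
  cond
}

x <- c(1, NA, 3)
res <- na_to_false(x > 2)
print(res)  # FALSE FALSE TRUE
```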
addCovToFormla
Description
Legacy version of 'add_cov_to_formula', please use that function instead. This function will eventually be deleted.
Usage
addCovToFormla(covs, formla)
Arguments
covs |
should be a list of variable names |
Add a Covariate to a Formula
Description
add_cov_to_formula adds some covariates to a formula; covs should be a list of variable names
Usage
add_cov_to_formula(covs, formula)
Arguments
covs |
should be a list of variable names |
formula |
which formula to add covariates to |
Value
formula
Examples
ff <- y ~ x
add_cov_to_formula(list("w", "z"), ff)
ff <- ~x
add_cov_to_formula("z", ff)
blockBootSample
Description
Legacy name for the function 'block_boot_sample', please use that function going forward. This function will eventually be deleted
Usage
blockBootSample(data, idname)
Arguments
data |
data.frame from which you want to bootstrap |
idname |
column in data which contains an individual identifier |
Block Bootstrap
Description
make draws of all observations with the same id in a panel data context. This is useful for bootstrapping with panel data.
Usage
block_boot_sample(data, idname)
Arguments
data |
data.frame from which you want to bootstrap |
idname |
column in data which contains an individual identifier |
Value
data.frame bootstrapped from the original dataset; this data.frame will contain new ids
Examples
data("LaborSupply", package = "plm")
bbs <- block_boot_sample(LaborSupply, "id")
nrow(bbs)
head(bbs$id)
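To make the idea of drawing whole units concrete, here is a minimal base-R sketch of a block bootstrap draw on a small made-up panel. The function name block_draw and the toy data are assumptions for illustration; the package's own implementation may differ.

```r
set.seed(1)
# Toy balanced panel: 4 units observed in 3 periods
panel <- data.frame(id = rep(1:4, each = 3), t = rep(1:3, 4), y = rnorm(12))

# Hypothetical helper: resample ids with replacement and keep every
# row belonging to each drawn id, relabelling the draws with new ids
block_draw <- function(data, idname) {
  ids <- unique(data[[idname]])
  drawn <- sample(ids, length(ids), replace = TRUE)
  pieces <- lapply(seq_along(drawn), function(i) {
    block <- data[data[[idname]] == drawn[i], , drop = FALSE]
    block[[idname]] <- i  # new id, so repeated draws stay distinct
    block
  })
  do.call(rbind, pieces)
}

boot <- block_draw(panel, "id")
nrow(boot)      # same number of rows as the balanced original
table(boot$id)  # every new id keeps all 3 of its periods
```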
check_staggered
Description
A function to check if treatment is staggered in a panel data set.
Usage
check_staggered(df, idname, treatname)
Arguments
df |
the data.frame used in the function |
idname |
name of column that holds the unit id |
treatname |
name of column with the treatment indicator |
Value
a logical indicating whether treatment is staggered
check_staggered_inner
Description
A helper function to check if treatment is staggered in a panel data set.
Usage
check_staggered_inner(this_df, treatname)
Arguments
this_df |
a data.frame, for this function it should be specific to a particular unit |
treatname |
name of column with the treatment indicator |
Check Function
Description
The check function used for optimizing to get quantiles
Usage
checkfun(a, tau)
Arguments
a |
vector to compute quantiles for |
tau |
between 0 and 1, ex. .5 implies get the median |
Value
numeric value
Examples
x <- rnorm(100)
x[which.min(checkfun(x, 0.5))] ## should be around 0
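The link between the check function and quantiles can be sketched in base R. The form of check_loss below is the standard check (pinball) loss and is assumed here rather than copied from the package source; minimizing its average over candidate values recovers a sample quantile.

```r
# Standard check loss; an assumed form, not taken from BMisc
check_loss <- function(u, tau) u * (tau - (u < 0))

set.seed(42)
x <- rnorm(501)

# Average check loss of the deviations x - q, tried at each data point q
obj <- sapply(x, function(q) mean(check_loss(x - q, tau = 0.5)))
q_hat <- x[which.min(obj)]

c(q_hat, median(x))  # the minimizer should match the sample median
```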
combineDfs
Description
Legacy version of 'combine_ecdfs', please use that function instead. This function will eventually be deleted.
Usage
combineDfs(y.seq, dflist, pstrat = NULL, ...)
Arguments
y.seq |
sequence of possible y values |
dflist |
list of distribution functions to combine |
... |
additional arguments that can be passed to BMisc::make_dist |
Combine Two Distribution Functions
Description
Combines distribution functions using the weights given by 'weights'
Usage
combine_ecdfs(y.seq, dflist, weights = NULL, ...)
Arguments
y.seq |
sequence of possible y values |
dflist |
list of distribution functions to combine |
weights |
a vector of weights to put on each distribution function; if weights are not provided then equal weight is given to each distribution function |
... |
additional arguments that can be passed to BMisc::make_dist |
Value
ecdf
Examples
x <- rnorm(100)
y <- rnorm(100, 1, 1)
Fx <- ecdf(x)
Fy <- ecdf(y)
both <- combine_ecdfs(seq(-2, 3, 0.1), list(Fx, Fy))
plot(Fx, col = "green")
plot(Fy, col = "blue", add = TRUE)
plot(both, add = TRUE)
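For intuition, a weighted combination of distribution functions can be sketched in base R as a pointwise weighted average evaluated on a grid. The weights and object names below are illustrative assumptions, and the package may construct its result differently.

```r
set.seed(1)
Fx <- ecdf(rnorm(100))
Fy <- ecdf(rnorm(100, 1, 1))

y_seq <- seq(-2, 3, 0.1)
w <- c(0.25, 0.75)  # illustrative weights that sum to 1

# Pointwise weighted average of the two distribution functions
mix_vals <- w[1] * Fx(y_seq) + w[2] * Fy(y_seq)

# Wrap the grid values in a right-continuous step function
mix <- stepfun(y_seq, c(0, mix_vals))
mix(0)  # combined distribution function evaluated at 0
```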
compareBinary
Description
Legacy version of 'compare_binary', please use that function instead. This function will eventually be deleted.
Usage
compareBinary(
x,
on,
dta,
w = rep(1, nrow(dta)),
report = c("diff", "levels", "both")
)
Arguments
x |
variables to run regression on |
on |
binary variable |
dta |
the data to use |
w |
weights |
report |
which type of report to make; diff is the difference between the two variables by group |
Compare Variables across Groups
Description
compare_binary takes in a variable, e.g. union, and runs a bivariate regression of x on treatment (for summary statistics)
Usage
compare_binary(
x,
on,
dta,
w = rep(1, nrow(dta)),
report = c("diff", "levels", "both")
)
Arguments
x |
variables to run regression on |
on |
binary variable |
dta |
the data to use |
w |
weights |
report |
which type of report to make; diff is the difference between the two variables by group |
Value
matrix of results
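The reason a bivariate regression is useful for summary statistics is that the slope on a binary regressor equals the difference in group means. A small unweighted base-R check on made-up data (variable names are assumptions):

```r
set.seed(1)
# Made-up data: a binary outcome "union" and a binary group "treat"
dta <- data.frame(union = rbinom(200, 1, 0.4), treat = rep(0:1, 100))

# Slope from regressing the variable on the binary indicator
fit <- lm(union ~ treat, data = dta)
diff_reg <- unname(coef(fit)["treat"])

# Direct difference in group means
diff_means <- mean(dta$union[dta$treat == 1]) - mean(dta$union[dta$treat == 0])

c(diff_reg, diff_means)  # the two numbers agree
```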
Compare a single variable across two groups
Description
compare_binary_inner takes in a variable, e.g. union, and runs a bivariate regression of x on treatment (for summary statistics)
Usage
compare_binary_inner(
x,
on,
dta,
w = rep(1, nrow(dta)),
report = c("diff", "levels", "both")
)
Arguments
x |
variables to run regression on |
on |
binary variable |
dta |
the data to use |
w |
weights |
report |
which type of report to make; diff is the difference between the two variables by group |
Value
matrix of results
Cross Section to Panel
Description
Turn repeated cross sections data into panel data by imposing rank invariance; does not require that the inputs have the same length
Usage
cs2panel(cs1, cs2, yname)
Arguments
cs1 |
data frame, the first cross section |
cs2 |
data frame, the second cross section |
yname |
the name of the variable to calculate difference for (should be the same in each dataset) |
Value
the change in outcomes over time
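One way to picture rank invariance with cross sections of different lengths: each first-period observation is paired with the second-period value at the same rank, and the change is their difference. The sketch below uses base R only and is an assumption about the general idea, not the package's exact procedure.

```r
y1 <- c(3, 1, 2)             # outcomes in the first cross section
y2 <- c(10, 30, 20, 40, 50)  # outcomes in the second cross section

# Rank (empirical distribution function value) of each first-period outcome
ranks <- ecdf(y1)(y1)

# Second-period outcome at the same rank (type 1 = inverse of the ecdf)
y2_matched <- unname(quantile(y2, probs = ranks, type = 1))

change <- y2_matched - y1
change  # 47 19 38
```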
dropCovFromFormla
Description
Legacy version of 'drop_cov_from_formula', please use that function instead. This function will eventually be deleted.
Usage
dropCovFromFormla(covs, formla)
Arguments
covs |
should be a list of variable names |
drop_collinear
Description
A function to check for multicollinearity and drop collinear terms from a matrix
Usage
drop_collinear(matrix)
Arguments
matrix |
a matrix for which the function will remove collinear columns |
Value
a matrix with collinear columns removed
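A common base-R way to detect and drop collinear columns is a pivoted QR decomposition. The sketch below shows that technique on a small made-up matrix; whether the package uses the same approach is not stated in this documentation.

```r
# Column b is exactly 2 * a, so the matrix has rank 2
X <- cbind(a = c(1, 2, 3, 4), b = c(2, 4, 6, 8), c = c(1, 0, 1, 0))

qrX <- qr(X)
# The first `rank` pivot entries index a set of linearly independent columns
keep <- sort(qrX$pivot[seq_len(qrX$rank)])
X_reduced <- X[, keep, drop = FALSE]

colnames(X_reduced)  # "a" "c"
```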
Drop a Covariate from a Formula
Description
drop_cov_from_formula drops some covariates from a formula; covs should be a list of variable names
Usage
drop_cov_from_formula(covs, formula)
Arguments
covs |
should be a list of variable names |
formula |
the formula to drop covariates from |
Value
formula
Examples
ff <- y ~ x + w + z
drop_cov_from_formula(list("w", "z"), ff)
drop_cov_from_formula("z", ff)
element_wise_mult
Description
This is a function that takes in two matrices of dimension nxB and nxk and returns a Bxk matrix that comes from element-wise multiplication of every column in the first matrix times the entire second matrix and then averaging over the n-dimension. It is equivalent to (but faster than) the following R code: 'sapply(1:biters, function(b) sqrt(n)*colMeans(Umat[,b]*inf.func))' . This function is particularly useful for fast computations using the multiplier bootstrap.
Usage
element_wise_mult(U, inf_func)
Arguments
U |
nxB matrix (e.g., this could be a matrix of Rademacher weights for B bootstrap iterations using the multiplier bootstrap) |
inf_func |
nxk matrix (e.g., this could be a matrix containing the influence function for different parameter estimates) |
Value
a Bxk matrix
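The R expression quoted in the description can be made runnable on simulated inputs. In the sketch below the loop version is transposed so that it has B rows and k columns, and it is compared with an equivalent single matrix product; all object names and the simulated data are assumptions for illustration.

```r
set.seed(1)
n <- 50; k <- 2; B <- 5
inf_func <- matrix(rnorm(n * k), n, k)                        # n x k
U <- matrix(sample(c(-1, 1), n * B, replace = TRUE), n, B)    # n x B

# Loop form from the description; sapply returns k x B, so transpose
res_loop <- t(sapply(seq_len(B), function(b) sqrt(n) * colMeans(U[, b] * inf_func)))

# Same quantity as one matrix product: (U' inf_func) / sqrt(n)
res_mat <- crossprod(U, inf_func) / sqrt(n)

dim(res_mat)                 # 5 2
all.equal(res_loop, res_mat) # TRUE
```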
getListElement
Description
Legacy version of 'get_list_element', please use that function instead. This function will eventually be deleted.
Usage
getListElement(listolists, whichone = 1)
Arguments
listolists |
a list |
whichone |
which item to get out of each list (can be numeric or name) |
getWeightedDf
Description
Legacy version of 'weighted_ecdf', please use that function instead. This function will eventually be deleted.
Usage
getWeightedDf(y, y.seq = NULL, weights = NULL, norm = TRUE)
Arguments
y |
a vector of values to compute the distribution function for |
y.seq |
an optional vector of values to compute the distribution function for; the default is to use all unique values of y |
weights |
the vector of weights, can be NULL, then will just return mean |
norm |
normalize the weights so that they have mean of 1, default is to normalize |
getWeightedMean
Description
Legacy version of 'weighted_mean', please use that function instead. This function will eventually be deleted.
Usage
getWeightedMean(y, weights = NULL, norm = TRUE)
Arguments
y |
a vector to compute the mean for |
weights |
the vector of weights, can be NULL, then will just return mean |
norm |
normalize the weights so that they have mean of 1, default is to normalize |
getWeightedQuantiles
Description
Legacy version of 'weighted_quantile', please use that function instead. This function will eventually be deleted.
Usage
getWeightedQuantiles(tau, cvec, weights = NULL, norm = TRUE)
Arguments
tau |
a vector of values between 0 and 1 |
cvec |
a vector to compute quantiles for |
weights |
the weights, weighted.checkfun normalizes the weights to sum to 1. |
norm |
normalize the weights so that they have mean of 1, default is to normalize |
get_Yi1
Description
A function to calculate outcomes for units in the first time period that is available in a panel data setting (this function can also be used to recover covariates, etc. in the first period).
Usage
get_Yi1(df, idname, yname, tname, gname)
Arguments
df |
the data.frame used in the function |
idname |
name of column that holds the unit id |
yname |
name of column containing the outcome (or other variable) for which to calculate its outcome in the immediate pre-treatment period |
tname |
name of column that holds the time period |
gname |
name of column containing the unit's group |
get_Yi1_inner
Description
Calculates a unit's outcome in the first time period. This function operates on a data.frame that is already local to a particular unit.
Usage
get_Yi1_inner(this_df, yname, tname, gname)
Arguments
this_df |
a data.frame, for this function it should be specific to a particular unit |
yname |
name of column containing the outcome (or other variable) for which to calculate its outcome in the immediate pre-treatment period |
tname |
name of column that holds the time period |
gname |
name of column containing the unit's group |
get_YiGmin1
Description
A function to calculate outcomes for units in the period right before they become treated (this function can also be used to recover covariates, etc. in the period right before a unit becomes treated). For units that do not participate in the treatment (and therefore have group==0), they are assigned their outcome in the last period.
Usage
get_YiGmin1(df, idname, yname, tname, gname)
Arguments
df |
the data.frame used in the function |
idname |
name of column that holds the unit id |
yname |
name of column containing the outcome (or other variable) for which to calculate its outcome in the immediate pre-treatment period |
tname |
name of column that holds the time period |
gname |
name of column containing the unit's group |
get_YiGmin1_inner
Description
Calculates a unit's outcome (or also can be used for a covariate) in the period right before it becomes treated. The unit's group must be specified at this point. This function operates on a data.frame that is already local to a particular unit.
Usage
get_YiGmin1_inner(this_df, yname, tname, gname)
Arguments
this_df |
a data.frame, for this function it should be specific to a particular unit |
yname |
name of column containing the outcome (or other variable) for which to calculate its outcome in the immediate pre-treatment period |
tname |
name of column that holds the time period |
gname |
name of column containing the unit's group |
get_Yibar
Description
A function to calculate the average outcome across all time periods separately for each unit in a panel data setting (this function can also be used to recover covariates, etc.).
Usage
get_Yibar(df, idname, yname)
Arguments
df |
the data.frame used in the function |
idname |
name of column that holds the unit id |
yname |
name of column containing the outcome (or other variable) for which to calculate its outcome in the immediate pre-treatment period |
get_Yibar_inner
Description
Calculates a unit's average outcome across all periods. This function operates on a data.frame that is already local to a particular unit.
Usage
get_Yibar_inner(this_df, yname)
Arguments
this_df |
a data.frame, for this function it should be specific to a particular unit |
yname |
name of column containing the outcome (or other variable) for which to calculate its outcome in the immediate pre-treatment period |
get_Yibar_pre
Description
A function to calculate average outcomes for units in their pre-treatment periods (this function can also be used to recover pre-treatment averages of covariates, etc.). For units that do not participate in the treatment (and therefore have group==0), the function calculates their overall average outcome.
Usage
get_Yibar_pre(df, idname, yname, tname, gname)
Arguments
df |
the data.frame used in the function |
idname |
name of column that holds the unit id |
yname |
name of column containing the outcome (or other variable) for which to calculate its outcome in the immediate pre-treatment period |
tname |
name of column that holds the time period |
gname |
name of column containing the unit's group |
get_Yibar_pre_inner
Description
Calculates a unit's average outcome in pre-treatment periods (or also can be used for a covariate). The unit's group must be specified at this point. This function operates on a data.frame that is already local to a particular unit.
Usage
get_Yibar_pre_inner(this_df, yname, tname, gname)
Arguments
this_df |
a data.frame, for this function it should be specific to a particular unit |
yname |
name of column containing the outcome (or other variable) for which to calculate its outcome in the immediate pre-treatment period |
tname |
name of column that holds the time period |
gname |
name of column containing the unit's group |
get_Yit
Description
A function to calculate outcomes for units in a particular time period 'tp' in a panel data setting (this function can also be used to recover covariates, etc. in that period).
Usage
get_Yit(df, tp, idname, yname, tname)
Arguments
df |
the data.frame used in the function |
tp |
The time period for which to get the outcome |
idname |
name of column that holds the unit id |
yname |
name of column containing the outcome (or other variable) for which to calculate its outcome in the immediate pre-treatment period |
tname |
name of column that holds the time period |
Value
a vector of outcomes in period t, the vector will have the length nT (i.e., this is returned for each element in the panel, not for a particular period)
get_Yit_inner
Description
Calculates a unit's outcome in some particular period 'tp'. This function operates on a data.frame that is already local to a particular unit.
Usage
get_Yit_inner(this_df, tp, yname, tname)
Arguments
this_df |
a data.frame, for this function it should be specific to a particular unit |
tp |
The time period for which to get the outcome |
yname |
name of column containing the outcome (or other variable) for which to calculate its outcome in the immediate pre-treatment period |
tname |
name of column that holds the time period |
get_first_difference
Description
A function that calculates the first difference in a panel data setting. If the data.frame that is passed in has nxT rows, the resulting vector will also have nxT elements with one element for each unit set to be NA.
Usage
get_first_difference(df, idname, yname, tname)
Arguments
df |
the data.frame used in the function |
idname |
name of column that holds the unit id |
yname |
name of column containing the outcome (or other variable) for which to calculate its outcome in the immediate pre-treatment period |
tname |
name of column that holds the time period |
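The shape of the result (one NA per unit, same length as the data) can be seen in a small base-R sketch that lags the outcome within each unit and differences it. The toy data and variable names are assumptions; this is not the package's code.

```r
# Toy panel: 2 units, 3 periods each
df <- data.frame(id = rep(1:2, each = 3), t = rep(1:3, 2), y = c(1, 4, 9, 2, 3, 5))
df <- df[order(df$id, df$t), ]  # make sure rows are sorted within unit

# Lag within unit: shift each unit's outcomes down by one period
lag_y <- ave(df$y, df$id, FUN = function(v) c(NA, v[-length(v)]))
first_diff <- df$y - lag_y

first_diff  # NA 3 5 NA 1 2
```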
get_group
Description
A function to calculate a unit's group in a panel data setting with a binary treatment and staggered treatment adoption and where there is a column in the data indicating whether or not a unit is treated
Usage
get_group(df, idname, tname, treatname)
Arguments
df |
the data.frame used in the function |
idname |
name of column that holds the unit id |
tname |
name of column that holds the time period |
treatname |
name of column with the treatment indicator |
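A sketch of how a group could be derived from a treatment indicator, assuming that a unit's group is the first period in which it is treated and that never-treated units get group 0 (the convention mentioned elsewhere in this manual). The data and names are made up, and the package's implementation may differ.

```r
# Toy panel: unit 1 never treated, unit 2 treated from 2002, unit 3 from 2001
df <- data.frame(
  id = rep(1:3, each = 3),
  period = rep(2001:2003, 3),
  treat = c(0, 0, 0,  0, 1, 1,  1, 1, 1)
)

# First treated period per unit; 0 if the unit is never treated (assumed convention)
first_treated <- tapply(seq_len(nrow(df)), df$id, function(rows) {
  treated_periods <- df$period[rows][df$treat[rows] == 1]
  if (length(treated_periods) == 0) 0 else min(treated_periods)
})

# Expand back to one value per row of the panel
df$group <- as.numeric(first_treated[as.character(df$id)])
unique(df[, c("id", "group")])
```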
get_group_inner
Description
Calculates the group for a particular unit
Usage
get_group_inner(this_df, tname, treatname)
Arguments
this_df |
a data.frame, for this function it should be specific to a particular unit |
tname |
name of column that holds the time period |
treatname |
name of column with the treatment indicator |
get_lagYi
Description
A function that calculates lagged outcomes in a panel data setting. If the data.frame that is passed in has nxT rows, the resulting vector will also have nxT elements with one element for each unit set to be NA
Usage
get_lagYi(df, idname, yname, tname, nlags = 1)
Arguments
df |
the data.frame used in the function |
idname |
name of column that holds the unit id |
yname |
name of column containing the outcome (or other variable) for which to calculate its outcome in the immediate pre-treatment period |
tname |
name of column that holds the time period |
nlags |
The number of periods to lag. The default is 1, which computes the lag from the previous period. |
Return Particular Element from Each Element in a List
Description
a function to take a list and get a particular part out of each element in the list
Usage
get_list_element(listolists, whichone = 1)
Arguments
listolists |
a list |
whichone |
which item to get out of each list (can be numeric or name) |
Value
list of all the elements 'whichone' from each list
Examples
len <- 100 # number elements in list
lis <- lapply(1:len, function(l) list(x = (-l), y = l^2)) # create list
get_list_element(lis, "x")[1] # should be equal to -1
get_list_element(lis, 1)[1] # should be equal to -1
get_principal_components
Description
A function to calculate unit-specific principal components, given panel data
Usage
get_principal_components(
xformula,
data,
idname,
tname,
n_components = NULL,
ret_wide = FALSE,
ret_id = FALSE
)
Arguments
xformula |
a formula specifying the variables to use in the principal component analysis |
data |
a data.frame containing the panel data |
idname |
the name of the column containing the unit id |
tname |
the name of the column containing the time period |
n_components |
the number of principal components to retain, the default is NULL which will result in all principal components being retained |
ret_wide |
whether to return the data in wide format (where the number of rows is equal to n = length(unique(data[[idname]]))) or long format (where the number of rows is equal to nT = nrow(data)). The default is FALSE, so that long data is returned by default. |
ret_id |
whether to return the id column in the output data.frame. The default is FALSE. |
Value
a data.frame containing the original data with the principal components appended
Take particular id and convert to row number
Description
id2rownum takes an id and converts it to the right row number in the dataset; ids should be unique in the dataset, that is, don't pass the function panel data in which the same id appears more than once
Usage
id2rownum(id, data, idname)
Arguments
id |
a particular id |
data |
data frame |
idname |
unique id |
Convert Vector of ids into Vector of Row Numbers
Description
ids2rownum takes a vector of ids and converts it to the right row numbers in the dataset; ids should be unique in the dataset, that is, don't pass the function panel data in which the same id appears more than once
Usage
ids2rownum(ids, data, idname)
Arguments
ids |
vector of ids |
data |
data frame |
idname |
unique id |
Value
vector of row numbers
Examples
ids <- seq(1, 1000, length.out = 100)
ids <- ids[order(runif(100))]
df <- data.frame(id = ids)
ids2rownum(df$id, df, "id")
invertEcdf
Description
Legacy function for 'invert_ecdf', please use that function instead. This function will eventually be deleted.
Usage
invertEcdf(df)
Arguments
df |
an ecdf object |
Invert Ecdf
Description
take an ecdf object and invert it to get a step-quantile function
Usage
invert_ecdf(df)
Arguments
df |
an ecdf object |
Value
stepfun object that contains the quantiles of the df
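Inverting an ecdf into a step quantile function can be sketched with base R's stepfun. The construction below is one possible approach under the usual left-continuous quantile convention and is not taken from the package source.

```r
x <- c(3, 1, 2, 4)
Fx <- ecdf(x)

xs <- knots(Fx)  # sorted distinct values: 1 2 3 4
ps <- Fx(xs)     # cumulative probabilities: 0.25 0.50 0.75 1.00

# Step function in the probability: Q(p) = smallest value with F(value) >= p
Q <- stepfun(ps, c(xs, xs[length(xs)]), right = TRUE)

Q(c(0.25, 0.5, 0.6, 1))  # 1 2 3 4
```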
lhs.vars
Description
Legacy version of 'lhs_vars', please use that function instead. This function will eventually be deleted.
Usage
lhs.vars(formla)
Left-hand Side Variables
Description
Take a formula and return a vector of the variables on the left hand side, it will return NULL for a one sided formula
Usage
lhs_vars(formula)
Arguments
formula |
a formula |
Value
vector of variable names
Examples
ff <- yvar ~ x1 + x2
lhs_vars(ff)
makeBalancedPanel
Description
Legacy version of 'make_balanced_panel', please use that function name going forward, though this will still work for now.
Usage
makeBalancedPanel(data, idname, tname, return_data.table = FALSE)
Arguments
data |
data.frame used in function |
idname |
unique id |
tname |
time period name |
return_data.table |
if TRUE, make_balanced_panel will return a data.table rather than a data.frame. Default is FALSE. |
makeDist
Description
Legacy name of 'make_dist' function, please use that function instead. This function will eventually be deleted.
Usage
makeDist(
x,
Fx,
sorted = FALSE,
rearrange = FALSE,
force01 = FALSE,
method = "constant"
)
Arguments
x |
vector of values |
Fx |
vector of the distribution function values |
sorted |
boolean indicating whether or not x is already sorted; computation is somewhat faster if already sorted |
rearrange |
boolean indicating whether or not to monotonize the distribution function |
force01 |
boolean indicating whether or not to force the values of the distribution function (i.e. Fx) to be between 0 and 1 |
method |
which method to pass to |
Balance a Panel Data Set
Description
This function drops observations from a data.frame that are not part of a balanced panel data set.
Usage
make_balanced_panel(data, idname, tname, return_data.table = FALSE)
Arguments
data |
data.frame used in function |
idname |
unique id |
tname |
time period name |
return_data.table |
if TRUE, make_balanced_panel will return a data.table rather than a data.frame. Default is FALSE. |
Value
data.frame that is a balanced panel
Examples
id <- rep(seq(1, 100), each = 2) # individual ids for setting up a two period panel
t <- rep(seq(1, 2), 100) # time periods
y <- rnorm(200) # outcomes
dta <- data.frame(id = id, t = t, y = y) # make into data frame
dta <- dta[-7, ] # drop the 7th row from the dataset (which creates an unbalanced panel)
dta <- make_balanced_panel(dta, idname = "id", tname = "t")
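What balancing does can be illustrated without the package: keep only the units that are observed in every time period. A minimal base-R sketch with made-up data:

```r
# Toy two-period panel with 3 units; dropping row 3 leaves unit 2 incomplete
dta <- data.frame(id = rep(1:3, each = 2), t = rep(1:2, 3), y = 1:6)
dta <- dta[-3, ]

n_periods <- length(unique(dta$t))
obs_per_id <- table(dta$id)

# Units observed in all periods
complete_ids <- names(obs_per_id)[obs_per_id == n_periods]
balanced <- dta[as.character(dta$id) %in% complete_ids, ]

unique(balanced$id)  # 1 3
```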
Make a Distribution Function
Description
turn vectors of values and their distribution function values into an ecdf. Vectors should be the same length and both increasing.
Usage
make_dist(
x,
Fx,
sorted = FALSE,
rearrange = FALSE,
force01 = FALSE,
method = "constant"
)
Arguments
x |
vector of values |
Fx |
vector of the distribution function values |
sorted |
boolean indicating whether or not x is already sorted; computation is somewhat faster if already sorted |
rearrange |
boolean indicating whether or not to monotonize the distribution function |
force01 |
boolean indicating whether or not to force the values of the distribution function (i.e. Fx) to be between 0 and 1 |
method |
which method to pass to |
Value
ecdf
Examples
y <- rnorm(100)
y <- y[order(y)]
u <- runif(100)
u <- u[order(u)]
F <- make_dist(y, u)
multiplier_bootstrap
Description
A function that takes in an influence function (an nxk matrix) and the number of bootstrap iterations and returns a Bxk matrix of bootstrap results. This function uses Rademacher weights.
Usage
multiplier_bootstrap(inf_func, biters)
Arguments
inf_func |
nxk matrix (e.g., this could be a matrix containing the influence function for different parameter estimates) |
biters |
the number of bootstrap iterations |
Value
a Bxk matrix
Matrix-Vector Multiplication
Description
This function multiplies a matrix by a vector and returns a numeric vector.
Usage
mv_mult(A, v)
Arguments
A |
an nxk matrix. |
v |
a vector (can be stored as numeric or as a kx1 matrix) |
Value
A numeric vector resulting from the multiplication of the matrix by the vector.
Examples
A <- matrix(1:9, nrow = 3, ncol = 3)
v <- c(2, 4, 6)
mv_mult(A, v)
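For comparison, the same product can be computed in base R with %*% and flattened to a numeric vector, which is a convenient way to check results by hand:

```r
A <- matrix(1:9, nrow = 3, ncol = 3)  # filled column by column
v <- c(2, 4, 6)

# Base-R matrix product, converted from a 3 x 1 matrix to a plain vector
res <- as.numeric(A %*% v)
res  # 60 72 84
```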
orig2t
Description
A helper function to switch from original time periods to "new" time periods (which are just time periods going from 1 to total number of available periods). This allows for periods not being exactly spaced apart by 1.
Usage
orig2t(orig, original_time.periods)
Arguments
orig |
a vector of original time periods to convert to new time periods. |
original_time.periods |
vector containing all original time periods. |
Value
new time period converted from original time period
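The mapping between original and "new" time periods amounts to a position lookup. The base-R sketch below uses hypothetical helper names (to_new, to_orig) and assumes the original periods are supplied in increasing order.

```r
original_periods <- c(2001, 2003, 2007)  # unevenly spaced periods

# Hypothetical helpers for illustration
to_new  <- function(orig) match(orig, original_periods)  # original -> 1, 2, 3, ...
to_orig <- function(t) original_periods[t]               # 1, 2, 3, ... -> original

to_new(c(2003, 2007))           # 2 3
to_orig(to_new(c(2003, 2007)))  # 2003 2007
```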
orig2t_inner
Description
A helper function to switch from original t values to "new" t values (which are just time periods going from 1 to total number of available periods).
Usage
orig2t_inner(orig, original_time.periods)
Arguments
orig |
a single original time period to convert to new time period |
original_time.periods |
vector containing all original time periods. |
Panel Data to Repeated Cross Sections
Description
panel2cs takes a 2 period dataset and turns it into a cross sectional dataset. The data includes the change in time varying variables between the time periods. The default functionality is to keep all the variables from period 1 and add all the variables listed by name in timevars from period 2 to those.
Usage
panel2cs(data, timevars, idname, tname)
Arguments
data |
data.frame used in function |
timevars |
vector of names of variables to keep |
idname |
unique id |
tname |
time period name |
Value
data.frame
Panel Data to Repeated Cross Sections
Description
panel2cs2 takes a 2 period dataset and turns it into a cross sectional dataset; i.e., long to wide. This function considers a particular case where there is some outcome whose value can change over time. It returns the dataset from the first period with the outcome in the second period and the change in outcomes over time appended to it
Usage
panel2cs2(data, yname, idname, tname, balance_panel = TRUE)
Arguments
data |
data.frame used in function |
yname |
name of outcome variable that can change over time |
idname |
unique id |
tname |
time period name |
balance_panel |
whether to ensure that panel is balanced. Default is TRUE, but code runs somewhat faster if this is set to be FALSE. |
Value
data from first period with .y0 (outcome in first period), .y1 (outcome in second period), and .dy (change in outcomes over time) appended to it
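The long-to-wide step described above can be sketched in base R for a balanced two-period panel. The column names .y0, .y1 and .dy follow the Value description; the toy data and the rest of the code are assumptions, not the package's implementation.

```r
# Toy balanced two-period panel
df <- data.frame(id = rep(1:3, each = 2), t = rep(1:2, 3), y = c(1, 2, 5, 3, 0, 4))
df <- df[order(df$id, df$t), ]

first  <- df[df$t == min(df$t), ]  # first-period rows, one per unit
second <- df[df$t == max(df$t), ]  # second-period rows, same unit order

first$.y0 <- first$y
first$.y1 <- second$y
first$.dy <- first$.y1 - first$.y0

first$.dy  # 1 -2 4
```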
Right-hand Side of Formula
Description
Take a formula and return the right hand side of the formula
Usage
rhs(formula)
Arguments
formula |
a formula |
Value
a one sided formula
Examples
ff <- yvar ~ x1 + x2
rhs(ff)
rhs.vars
Description
Legacy version of 'rhs_vars', please use that function instead. This function will eventually be deleted.
Usage
rhs.vars(formla)
Arguments
formla |
a formula |
Right-hand Side Variables
Description
Take a formula and return a vector of the variables on the right hand side
Usage
rhs_vars(formula)
Arguments
formula |
a formula |
Value
vector of variable names
Examples
ff <- yvar ~ x1 + x2
rhs_vars(ff)
ff <- y ~ x1 + I(x1^2)
rhs_vars(ff)
source_all
Description
Source all the files in a folder
Usage
source_all(fldr)
Arguments
fldr |
path to a folder |
Subsample of Observations from Panel Data
Description
returns a subsample of a panel data set; in particular drops all observations that are not in keepids. If it is not set, randomly keeps nkeep observations.
Usage
subsample(dta, idname, tname, keepids = NULL, nkeep = NULL)
Arguments
dta |
a data.frame which is a balanced panel |
idname |
the name of the id variable |
tname |
the name of the time variable |
keepids |
which ids to keep |
nkeep |
how many ids to keep (only used if |
Value
a data.frame that contains a subsample of dta
Examples
data("LaborSupply", package = "plm")
nrow(LaborSupply)
unique(LaborSupply$year)
ss <- subsample(LaborSupply, "id", "year", nkeep = 100)
nrow(ss)
t2orig
Description
A helper function to switch from "new" t values to original t values. This allows for periods not being exactly spaced apart by 1.
Usage
t2orig(t, original_time.periods)
Arguments
t |
a vector of time periods to convert back to original time periods. |
original_time.periods |
vector containing all original time periods. |
Value
original time period converted from new time period
t2orig_inner
Description
A helper function to switch from "new" t values to original t values for a single t.
Usage
t2orig_inner(t, original_time.periods)
Arguments
t |
a single time period to convert back to original time |
original_time.periods |
vector containing all original time periods. |
time_invariant_to_panel
Description
This function takes a time-invariant variable and repeats it for each period in a panel data set.
Usage
time_invariant_to_panel(x, df, idname, balanced_panel = TRUE)
Arguments
x |
a vector of length equal to the number of unique ids in df. |
df |
the data.frame used in the function |
idname |
name of column that holds the unit id |
balanced_panel |
a logical indicating whether the panel is balanced. If TRUE, the function will optimize the repetition process. Default is TRUE. |
Value
a vector of length equal to the number of rows in df.
Variable Names to Formula
Description
take a name for a y variable and a vector of names for x variables and turn them into a formula
Usage
toformula(yname, xnames)
Arguments
yname |
the name of the y variable |
xnames |
vector of names for x variables |
Value
a formula
Examples
toformula("yvar", c("x1", "x2"))
## should return yvar ~ 1
toformula("yvar", rhs_vars(~1))
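For readers who want to see the idea without the package, base R's reformulate builds a formula from variable names in a similar spirit. This is a related base-R tool shown for comparison, not the package's implementation.

```r
# Build a formula from a response name and a vector of regressor names
f <- reformulate(c("x1", "x2"), response = "yvar")
deparse(f)  # "yvar ~ x1 + x2"

# Intercept-only right-hand side
f0 <- reformulate("1", response = "yvar")
deparse(f0)  # "yvar ~ 1"
```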
weighted.checkfun
Description
Legacy version of 'weighted_checkfun', please use that function instead. This function will eventually be deleted.
Usage
weighted.checkfun(q, cvec, tau, weights)
Arguments
q |
the value to check |
cvec |
vector of data to compute quantiles for |
tau |
between 0 and 1, ex. .5 implies get the median |
weights |
the weights, weighted.checkfun normalizes the weights to sum to 1. |
Weighted Check Function
Description
Weights the check function
Usage
weighted_checkfun(q, cvec, tau, weights)
Arguments
q |
the value to check |
cvec |
vector of data to compute quantiles for |
tau |
between 0 and 1, ex. .5 implies get the median |
weights |
the weights, weighted.checkfun normalizes the weights to sum to 1. |
Value
numeric
weighted_combine_list
Description
A function that takes in either a list of vectors or matrices and computes a weighted average of them, where the weights are applied to every element in the list.
Usage
weighted_combine_list(l, w, normalize_weights = TRUE)
Arguments
l |
a list that contains either vectors or matrices of the same dimension that are to be combined |
w |
a vector of weights, the weights should have the same number of elements as 'length(l)' |
normalize_weights |
whether or not to force the weights to sum to 1, default is true |
Value
matrix or vector corresponding to the weighted average of all of the elements in 'l'
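A weighted average of same-sized matrices can be written in base R with Map and Reduce. The sketch below normalizes the weights to sum to 1 first; the inputs are made up and the code is only an illustration of the computation described.

```r
l <- list(matrix(1, 2, 2), matrix(3, 2, 2))
w <- c(1, 3)

w <- w / sum(w)  # normalize weights to sum to 1: 0.25, 0.75

# Scale each element of the list by its weight, then add them up
res <- Reduce(`+`, Map(`*`, l, w))
res  # every entry is 0.25 * 1 + 0.75 * 3 = 2.5
```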
Weighted Distribution Function
Description
Get a distribution function from a vector of values after applying some weights
Usage
weighted_ecdf(y, y.seq = NULL, weights = NULL, norm = TRUE)
Arguments
y |
a vector of values to compute the distribution function for |
y.seq |
an optional vector of values to compute the distribution function for; the default is to use all unique values of y |
weights |
the vector of weights, can be NULL, then will just return mean |
norm |
normalize the weights so that they have mean of 1, default is to normalize |
Value
ecdf
Weighted Mean
Description
Get the mean applying some weights
Usage
weighted_mean(y, weights = NULL, norm = TRUE)
Arguments
y |
a vector to compute the mean for |
weights |
the vector of weights, can be NULL, then will just return mean |
norm |
normalize the weights so that they have mean of 1, default is to normalize |
Value
the weighted mean
weighted_quantile
Description
function to recover quantiles of a vector with weights
Usage
weighted_quantile(tau, cvec, weights = NULL, norm = TRUE)
Arguments
tau |
a vector of values between 0 and 1 |
cvec |
a vector to compute quantiles for |
weights |
the weights, weighted.checkfun normalizes the weights to sum to 1. |
norm |
normalize the weights so that they have mean of 1, default is to normalize |
Value
vector of quantiles
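A weighted quantile can be found by minimizing a weighted check function, which is the approach the surrounding entries describe. The base-R sketch below assumes the standard check loss and searches over the data points; it is an illustration, not the package's optimizer.

```r
# Standard check loss; an assumed form, not taken from BMisc
check_loss <- function(u, tau) u * (tau - (u < 0))

y <- c(1, 2, 3, 4)
w <- c(1, 1, 1, 5)
w <- w / sum(w)  # normalize weights to sum to 1

# Weighted check-function objective, tried at each data point
obj <- sapply(y, function(q) sum(w * check_loss(y - q, tau = 0.5)))
w_median <- y[which.min(obj)]

w_median  # 4, because the last observation carries most of the weight
```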
Quantile of a Weighted Check Function
Description
Finds the quantile by optimizing the weighted check function
Usage
weighted_quantile_inner(tau, cvec, weights = NULL, norm = TRUE)
Arguments
tau |
between 0 and 1, ex. .5 implies get the median |
cvec |
a vector to compute quantiles for |
weights |
the weights, weighted.checkfun normalizes the weights to sum to 1. |
norm |
normalize the weights so that they have mean of 1, default is to normalize |