Title: | Statistical Inference in High Dimensional Regression |
---|---|
Description: | The goal of SIHR is to provide inference procedures in the high-dimensional generalized linear regression setting for: (1) linear functionals <doi:10.48550/arXiv.1904.12891> <doi:10.48550/arXiv.2012.07133>, (2) conditional average treatment effects, (3) quadratic functionals <doi:10.48550/arXiv.1909.01503>, (4) inner product, (5) distance. |
Authors: | Zhenyu Wang [aut], Prabrisha Rakshit [aut], Tony Cai [aut], Zijian Guo [aut, cre] |
Maintainer: | Zijian Guo <[email protected]> |
License: | GPL-3 |
Version: | 2.1.0 |
Built: | 2024-11-26 04:53:05 UTC |
Source: | https://github.com/zywang0701/sihr |
Computes the bias-corrected estimator of the difference of linear combinations of the regression vectors in high-dimensional generalized linear regressions, together with the corresponding standard error.
CATE( X1, y1, X2, y2, loading.mat, model = c("linear", "logistic", "logistic_alter"), intercept = TRUE, intercept.loading = FALSE, beta.init1 = NULL, beta.init2 = NULL, lambda = NULL, mu = NULL, prob.filter = 0.05, rescale = 1.1, verbose = FALSE )
X1 |
Design matrix for the first sample, of dimension |
y1 |
Outcome vector for the first sample, of length |
X2 |
Design matrix for the second sample, of dimension |
y2 |
Outcome vector for the second sample, of length |
loading.mat |
Loading matrix, nrow= |
model |
The high dimensional regression model, either |
intercept |
Should intercept(s) be fitted for the initial estimators
(default = |
intercept.loading |
Should intercept term be included for the
|
beta.init1 |
The initial estimator of the regression vector for the 1st
data (default = |
beta.init2 |
The initial estimator of the regression vector for the 2nd
data (default = |
lambda |
The tuning parameter in fitting initial model. If |
mu |
The dual tuning parameter used in the construction of the
projection direction. If |
prob.filter |
The threshold of estimated probabilities for filtering observations in logistic regression. (default = 0.05) |
rescale |
The factor to enlarge the standard error to account for the finite sample bias. (default = 1.1) |
verbose |
Should intermediate message(s) be printed (default =
|
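To make the role of loading.mat concrete, the following base-R sketch evaluates the target quantity, a difference of linear combinations, at the coefficient vectors used to simulate data in the example for this function. It does not call CATE. The vectors beta1 and beta2 and the "first sample minus second sample" sign convention are assumptions made for illustration and are not confirmed by this page.

```r
# Illustrative coefficient vectors (assumed; intercepts omitted), taken from
# the simulation design used in the CATE example
beta1 <- c(0.5, 1, 0, 0, 0)
beta2 <- c(0.48, 1.1, 0, 0, 0)

# One loading per column; each loading has one entry per covariate,
# as in the example
loading1 <- c(1, 1, rep(0, 3))
loading2 <- c(-0.5, -1, rep(0, 3))
loading.mat <- cbind(loading1, loading2)

# Difference of linear combinations, one value per loading
# (the sign convention is an assumption)
target <- drop(crossprod(loading.mat, beta1 - beta2))
print(target)
```

Each column of loading.mat yields one entry of the result, which is why the returned vectors have one element per loading.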
A list consisting of plugin estimators, debiased estimators, and confidence intervals. For logistic regression, it also returns those items after probability transformation.
est.plugin.vec |
The vector of plugin (biased) estimators for the
linear combination of regression coefficients, length of |
est.debias.vec |
The vector of bias-corrected estimators for the linear
combination of regression coefficients, length of |
se.vec |
The vector of standard errors of the bias-corrected estimators,
length of |
prob.debias.vec |
The vector of bias-corrected estimators after probability
transformation, length of |
prob.se.vec |
The vector of standard errors of the bias-corrected
estimators after probability transformation, length of |
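The probability transformation mentioned for the logistic models can be pictured with the logistic (expit) function. The sketch below is a base-R illustration under stated assumptions, not SIHR's internal code: the linear-scale values eta1 and eta2 are made-up numbers, and taking the difference of the two transformed values (first minus second) is an assumed convention.

```r
# Hypothetical linear-scale estimates for one loading in each sample
eta1 <- 1.50
eta2 <- 1.58

# The logistic (expit) transform maps a linear predictor to a probability
prob1 <- plogis(eta1)
prob2 <- plogis(eta2)

# Difference on the probability scale (sign convention is an assumption)
prob.diff <- prob1 - prob2
print(c(prob1 = prob1, prob2 = prob2, prob.diff = prob.diff))
```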
X1 <- matrix(rnorm(100 * 5), nrow = 100, ncol = 5)
y1 <- -0.5 + X1[, 1] * 0.5 + X1[, 2] * 1 + rnorm(100)
X2 <- matrix(rnorm(90 * 5), nrow = 90, ncol = 5)
y2 <- -0.4 + X2[, 1] * 0.48 + X2[, 2] * 1.1 + rnorm(90)
loading1 <- c(1, 1, rep(0, 3))
loading2 <- c(-0.5, -1, rep(0, 3))
loading.mat <- cbind(loading1, loading2)
Est <- CATE(X1, y1, X2, y2, loading.mat, model = "linear")
## compute confidence intervals
ci(Est, alpha = 0.05, alternative = "two.sided")
## summary statistics
summary(Est)
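The ci() call in the example requests a two-sided interval at alpha = 0.05. As a rough picture of what such an interval looks like, here is a minimal sketch of a normal-approximation (Wald) interval built from an estimate and its standard error. Whether ci() is computed exactly this way is an assumption, since this page does not show its internals; the helper wald_ci and the numbers are hypothetical.

```r
# Hypothetical helper: two-sided normal-approximation interval
wald_ci <- function(est, se, alpha = 0.05) {
  z <- qnorm(1 - alpha / 2)  # upper alpha/2 quantile of the standard normal
  cbind(lower = est - z * se, upper = est + z * se)
}

# Made-up estimates and standard errors, one per loading
est <- c(-0.08, 0.09)
se <- c(0.20, 0.25)
print(wald_ci(est, se, alpha = 0.05))
```

A larger standard error, for instance one inflated by a factor such as rescale, widens the interval proportionally.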
Inference for the weighted quadratic functional of the difference between the regression vectors (excluding the intercept term) in high-dimensional generalized linear regressions.
Dist( X1, y1, X2, y2, G, A = NULL, model = c("linear", "logistic", "logistic_alter"), intercept = TRUE, beta.init1 = NULL, beta.init2 = NULL, split = TRUE, lambda = NULL, mu = NULL, prob.filter = 0.05, rescale = 1.1, tau = c(0.25, 0.5, 1), verbose = FALSE )
X1 |
Design matrix for the first sample, of dimension |
y1 |
Outcome vector for the first sample, of length |
X2 |
Design matrix for the second sample, of dimension |
y2 |
Outcome vector for the second sample, of length |
G |
The set of indices, |
A |
The matrix A in the quadratic form, of dimension
|
model |
The high dimensional regression model, either |
intercept |
Should intercept(s) be fitted for the initial estimators
(default = |
beta.init1 |
The initial estimator of the regression vector for the 1st
data (default = |
beta.init2 |
The initial estimator of the regression vector for the 2nd
data (default = |
split |
Whether to use sample splitting when computing the initial estimators.
It takes effect only when |
lambda |
The tuning parameter in fitting initial model. If |
mu |
The dual tuning parameter used in the construction of the
projection direction. If |
prob.filter |
The threshold of estimated probabilities for filtering observations in logistic regression. (default = 0.05) |
rescale |
The factor to enlarge the standard error to account for the finite sample bias. (default = 1.1) |
tau |
The enlargement factor for asymptotic variance of the
bias-corrected estimator to handle super-efficiency. It allows for a scalar
or vector. (default = |
verbose |
Should intermediate message(s) be printed. (default =
|
est.plugin |
The plugin (biased) estimator for the quadratic form
of the regression vectors restricted to |
est.debias |
The bias-corrected estimator of the quadratic form of the regression vectors |
se |
Standard errors of the bias-corrected estimator,
length of |
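As a concrete picture of the weighted quadratic functional of the difference, the base-R sketch below evaluates it at the coefficient vectors used to simulate data in the example for this function. It does not call Dist; beta1 and beta2 are assumed illustrative values with intercepts omitted, and an explicit matrix A is supplied rather than relying on the default.

```r
# Illustrative coefficient vectors (assumed), from the simulation design
beta1 <- c(0.5, 1, 0, 0, 0)
beta2 <- c(0.48, 1.1, 0, 0, 0)

# Index set and weighting matrix as in the example
G <- c(1, 2)
A <- matrix(c(1.5, 0.8, 0.8, 1.5), nrow = 2, ncol = 2)

# Difference of the two vectors restricted to the index set G
gamma.G <- beta1[G] - beta2[G]

# Weighted quadratic functional gamma_G' A gamma_G
dist.target <- drop(t(gamma.G) %*% A %*% gamma.G)
print(dist.target)
```

Because this A is positive definite, the quantity is non-negative and equals zero only when the two vectors agree on G.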
X1 <- matrix(rnorm(100 * 5), nrow = 100, ncol = 5)
y1 <- -0.5 + X1[, 1] * 0.5 + X1[, 2] * 1 + rnorm(100)
X2 <- matrix(rnorm(90 * 5), nrow = 90, ncol = 5)
y2 <- -0.4 + X2[, 1] * 0.48 + X2[, 2] * 1.1 + rnorm(90)
G <- c(1, 2)
A <- matrix(c(1.5, 0.8, 0.8, 1.5), nrow = 2, ncol = 2)
Est <- Dist(X1, y1, X2, y2, G, A, model = "linear")
## compute confidence intervals
ci(Est, alpha = 0.05, alternative = "two.sided")
## summary statistics
summary(Est)
Inference for the weighted inner product of the regression vectors in high-dimensional generalized linear regressions.
InnProd( X1, y1, X2, y2, G, A = NULL, model = c("linear", "logistic", "logistic_alter"), intercept = TRUE, beta.init1 = NULL, beta.init2 = NULL, split = TRUE, lambda = NULL, mu = NULL, prob.filter = 0.05, rescale = 1.1, tau = c(0.25, 0.5, 1), verbose = FALSE )
X1 |
Design matrix for the first sample, of dimension |
y1 |
Outcome vector for the first sample, of length |
X2 |
Design matrix for the second sample, of dimension |
y2 |
Outcome vector for the second sample, of length |
G |
The set of indices, |
A |
The matrix A in the quadratic form, of dimension
|
model |
The high dimensional regression model, either |
intercept |
Should intercept(s) be fitted for the initial estimators
(default = |
beta.init1 |
The initial estimator of the regression vector for the 1st
data (default = |
beta.init2 |
The initial estimator of the regression vector for the 2nd
data (default = |
split |
Whether to use sample splitting when computing the initial estimators.
It takes effect only when |
lambda |
The tuning parameter in fitting initial model. If |
mu |
The dual tuning parameter used in the construction of the
projection direction. If |
prob.filter |
The threshold of estimated probabilities for filtering observations in logistic regression. (default = 0.05) |
rescale |
The factor to enlarge the standard error to account for the finite sample bias. (default = 1.1) |
tau |
The enlargement factor for asymptotic variance of the
bias-corrected estimator to handle super-efficiency. It allows for a scalar
or vector. (default = |
verbose |
Should intermediate message(s) be printed. (default =
|
est.plugin |
The plugin (biased) estimator for the inner product
form of the regression vectors restricted to |
est.debias |
The bias-corrected estimator of the inner product form of the regression vectors |
se |
Standard errors of the bias-corrected estimator,
length of |
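The weighted inner product can be illustrated in the same spirit. The base-R sketch below evaluates it at the coefficient vectors used to simulate data in the example for this function. It does not call InnProd; beta1 and beta2 are assumed illustrative values with intercepts omitted, and A is given explicitly.

```r
# Illustrative coefficient vectors (assumed), from the simulation design
beta1 <- c(0.5, 1, 0, 0, 0)
beta2 <- c(0.48, 1.1, 0, 0, 0)

# Index set and weighting matrix as in the example
G <- c(1, 2)
A <- matrix(c(1.5, 0.8, 0.8, 1.5), nrow = 2, ncol = 2)

# Weighted inner product beta1_G' A beta2_G
innprod.target <- drop(t(beta1[G]) %*% A %*% beta2[G])
print(innprod.target)
```

Unlike the quadratic functional of a difference, this quantity can be negative when the two vectors point in opposing directions on G.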
X1 <- matrix(rnorm(100 * 5), nrow = 100, ncol = 5)
y1 <- -0.5 + X1[, 1] * 0.5 + X1[, 2] * 1 + rnorm(100)
X2 <- matrix(rnorm(90 * 5), nrow = 90, ncol = 5)
y2 <- -0.4 + X2[, 1] * 0.48 + X2[, 2] * 1.1 + rnorm(90)
G <- c(1, 2)
A <- matrix(c(1.5, 0.8, 0.8, 1.5), nrow = 2, ncol = 2)
Est <- InnProd(X1, y1, X2, y2, G, A, model = "linear")
## compute confidence intervals
ci(Est, alpha = 0.05, alternative = "two.sided")
## summary statistics
summary(Est)
Inference for linear combinations of the regression vector in high-dimensional generalized linear regression.
LF( X, y, loading.mat, model = c("linear", "logistic", "logistic_alter"), intercept = TRUE, intercept.loading = FALSE, beta.init = NULL, lambda = NULL, mu = NULL, prob.filter = 0.05, rescale = 1.1, verbose = FALSE )
X |
Design matrix, of dimension |
y |
Outcome vector, of length |
loading.mat |
Loading matrix, nrow= |
model |
The high dimensional regression model, either |
intercept |
Should intercept be fitted for the initial estimator
(default = |
intercept.loading |
Should intercept term be included for the loading
(default = |
beta.init |
The initial estimator of the regression vector (default =
|
lambda |
The tuning parameter in fitting initial model. If |
mu |
The dual tuning parameter used in the construction of the
projection direction. If |
prob.filter |
The threshold of estimated probabilities for filtering observations in logistic regression. (default = 0.05) |
rescale |
The factor to enlarge the standard error to account for the finite sample bias. (default = 1.1) |
verbose |
Should intermediate message(s) be printed. (default =
|
est.plugin.vec |
The vector of plugin (biased) estimators for the
linear combination of regression coefficients, length of
|
est.debias.vec |
The vector of bias-corrected estimators for the linear
combination of regression coefficients, length of |
se.vec |
The vector of standard errors of the bias-corrected estimators,
length of |
proj.mat |
The matrix of projection directions; each column corresponding to a loading of interest. |
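The relationship between the plugin estimator, a projection direction, and the bias-corrected estimator can be sketched in a low-dimensional linear model. Everything below is an illustrative assumption rather than SIHR's implementation: the one-step correction is the textbook form for linear models, the initial estimator is a deliberately shrunken least-squares fit instead of a penalized one, no intercept is fitted, and the direction u is obtained by inverting the sample covariance, which is only possible when the number of covariates is smaller than the sample size.

```r
set.seed(1)
n <- 100
p <- 5
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
y <- X[, 1] * 0.5 + X[, 2] * 1 + rnorm(n)
loading <- c(1, 1, rep(0, 3))

# Deliberately biased initial estimator: least squares shrunk toward zero
beta.ols <- drop(solve(crossprod(X), crossprod(X, y)))
beta.init <- 0.5 * beta.ols

# Low-dimensional stand-in for the projection direction (assumption):
# u solves Sigma.hat %*% u = loading
Sigma.hat <- crossprod(X) / n
u <- drop(solve(Sigma.hat, loading))

# Plugin estimator and one-step bias correction using the residuals
est.plugin <- sum(loading * beta.init)
resid.init <- drop(y - X %*% beta.init)
est.debias <- est.plugin + sum(u * drop(crossprod(X, resid.init))) / n

print(c(plugin = est.plugin, debiased = est.debias))
```

In this simplified setting the correction recovers the least-squares value of the linear combination exactly, which shows how the projection direction removes the bias carried by the initial estimator. In high dimensions the sample covariance cannot be inverted, which is why a projection direction has to be constructed separately.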
X <- matrix(rnorm(100 * 5), nrow = 100, ncol = 5)
y <- -0.5 + X[, 1] * 0.5 + X[, 2] * 1 + rnorm(100)
loading1 <- c(1, 1, rep(0, 3))
loading2 <- c(-0.5, -1, rep(0, 3))
loading.mat <- cbind(loading1, loading2)
Est <- LF(X, y, loading.mat, model = "linear")
## compute confidence intervals
ci(Est, alpha = 0.05, alternative = "two.sided")
## summary statistics
summary(Est)
Inference for quadratic forms of the regression vector in high-dimensional generalized linear regressions.
QF( X, y, G, A = NULL, model = c("linear", "logistic", "logistic_alter"), intercept = TRUE, beta.init = NULL, split = TRUE, lambda = NULL, mu = NULL, prob.filter = 0.05, rescale = 1.1, tau = c(0.25, 0.5, 1), verbose = FALSE )
X |
Design matrix, of dimension |
y |
Outcome vector, of length |
G |
The set of indices, |
A |
The matrix A in the quadratic form, of dimension
|
model |
The high dimensional regression model, either |
intercept |
Should intercept be fitted for the initial estimator
(default = |
beta.init |
The initial estimator of the regression vector (default =
|
split |
Whether to use sample splitting when computing the initial estimator.
It takes effect only when |
lambda |
The tuning parameter in fitting initial model. If |
mu |
The dual tuning parameter used in the construction of the
projection direction. If |
prob.filter |
The threshold of estimated probabilities for filtering observations in logistic regression. (default = 0.05) |
rescale |
The factor to enlarge the standard error to account for the finite sample bias. (default = 1.1) |
tau |
The enlargement factor for asymptotic variance of the
bias-corrected estimator to handle super-efficiency. It allows for a scalar
or vector. (default = |
verbose |
Should intermediate message(s) be printed. (default =
|
est.plugin |
The plugin (biased) estimator for the quadratic form of the
regression vector restricted to |
est.debias |
The bias-corrected estimator of the quadratic form of the regression vector |
se |
Standard errors of the bias-corrected estimator,
length of |
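For a single regression vector, the quadratic form restricted to an index set can be evaluated directly. The base-R sketch below uses the coefficients from the simulation in the example for this function. It does not call QF; beta is an assumed illustrative vector, and A is given explicitly.

```r
# Illustrative coefficient vector (assumed), from the simulation design
beta <- c(0.5, 1, 0, 0, 0)

# Index set and weighting matrix as in the example
G <- c(1, 2)
A <- matrix(c(1.5, 0.8, 0.8, 1.5), nrow = 2, ncol = 2)

# Quadratic form beta_G' A beta_G
qf.target <- drop(t(beta[G]) %*% A %*% beta[G])
print(qf.target)
```

Coefficients outside G do not enter the quantity, so changing beta[3:5] leaves the result unchanged.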
X <- matrix(rnorm(100 * 5), nrow = 100, ncol = 5)
y <- X[, 1] * 0.5 + X[, 2] * 1 + rnorm(100)
G <- c(1, 2)
A <- matrix(c(1.5, 0.8, 0.8, 1.5), nrow = 2, ncol = 2)
Est <- QF(X, y, G, A, model = "linear")
## compute confidence intervals
ci(Est, alpha = 0.05, alternative = "two.sided")
## summary statistics
summary(Est)