Title: | Linear Models for Panel Data |
---|---|
Description: | A set of estimators for models and (robust) covariance matrices, and tests for panel data econometrics, including within/fixed effects, random effects, between, first-difference, nested random effects as well as instrumental-variable (IV) and Hausman-Taylor-style models, panel generalized method of moments (GMM) and general FGLS models, mean groups (MG), demeaned MG, and common correlated effects (CCEMG) and pooled (CCEP) estimators with common factors, variable coefficients and limited dependent variables models. Test functions include model specification, serial correlation, cross-sectional dependence, panel unit root and panel Granger (non-)causality. Typical references are general econometrics text books such as Baltagi (2021), Econometric Analysis of Panel Data (<doi:10.1007/978-3-030-53953-5>), Hsiao (2014), Analysis of Panel Data (<doi:10.1017/CBO9781139839327>), and Croissant and Millo (2018), Panel Data Econometrics with R (<doi:10.1002/9781119504641>). |
Authors: | Yves Croissant [aut], Giovanni Millo [aut], Kevin Tappe [aut, cre], Ott Toomet [ctb], Christian Kleiber [ctb], Achim Zeileis [ctb], Arne Henningsen [ctb], Liviu Andronic [ctb], Nina Schoenfelder [ctb] |
Maintainer: | Kevin Tappe <[email protected]> |
License: | GPL (>=2) |
Version: | 2.6-9999 |
Built: | 2025-01-17 15:24:36 UTC |
Source: | https://github.com/ycroissant/plm |
plm is a package for R which intends to make the estimation of linear panel models straightforward. plm provides functions to estimate a wide variety of models and to make (robust) inference.
For a gentle and comprehensive introduction to the package, please see the package's vignette.
The main functions to estimate models are:
plm
: panel data estimators using lm
on transformed data,
pvcm
: variable coefficients models
pgmm
: generalized method of moments (GMM) estimation for panel
data,
pggls
: estimation of general feasible generalized least squares models,
pmg
: mean groups (MG), demeaned MG and common correlated effects
(CCEMG) estimators,
pcce
: estimators for common correlated effects mean groups (CCEMG) and
pooled (CCEP) for panel data with common factors,
pldv
: panel estimators for limited dependent variables.
Next to the model estimation functions, the package offers several functions for statistical tests related to panel data/models.
Multiple functions for (robust) variance–covariance matrices are at hand as well.
The package also provides data sets to demonstrate functions and to
replicate some text book/paper results. Use
data(package="plm")
to view a list of available data sets in
the package.
Maintainer: Kevin Tappe [email protected]
Authors:
Yves Croissant [email protected]
Giovanni Millo [email protected]
Other contributors:
Ott Toomet [email protected] [contributor]
Christian Kleiber [email protected] [contributor]
Achim Zeileis [email protected] [contributor]
Arne Henningsen [email protected] [contributor]
Liviu Andronic [contributor]
Nina Schoenfelder [contributor]
Useful links:
Report bugs at https://github.com/ycroissant/plm/issues
data("Produc", package = "plm") zz <- plm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, data = Produc, index = c("state","year")) summary(zz) # replicates some results from Baltagi (2013), table 3.1 data("Grunfeld", package = "plm") p <- plm(inv ~ value + capital, data = Grunfeld, model="pooling") wi <- plm(inv ~ value + capital, data = Grunfeld, model="within", effect = "twoways") swar <- plm(inv ~ value + capital, data = Grunfeld, model="random", effect = "twoways") amemiya <- plm(inv ~ value + capital, data = Grunfeld, model = "random", random.method = "amemiya", effect = "twoways") walhus <- plm(inv ~ value + capital, data = Grunfeld, model = "random", random.method = "walhus", effect = "twoways")
data("Produc", package = "plm") zz <- plm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, data = Produc, index = c("state","year")) summary(zz) # replicates some results from Baltagi (2013), table 3.1 data("Grunfeld", package = "plm") p <- plm(inv ~ value + capital, data = Grunfeld, model="pooling") wi <- plm(inv ~ value + capital, data = Grunfeld, model="within", effect = "twoways") swar <- plm(inv ~ value + capital, data = Grunfeld, model="random", effect = "twoways") amemiya <- plm(inv ~ value + capital, data = Grunfeld, model = "random", random.method = "amemiya", effect = "twoways") walhus <- plm(inv ~ value + capital, data = Grunfeld, model = "random", random.method = "walhus", effect = "twoways")
Angrist and Newey's version of the Chamberlain test
aneweytest(formula, data, subset, na.action, index = NULL, ...)
aneweytest(formula, data, subset, na.action, index = NULL, ...)
formula |
a symbolic description for the model to be estimated, |
data |
a |
subset |
see |
na.action |
see |
index |
the indexes, |
... |
further arguments. |
Angrist and Newey's test is based on the results of the artifactual regression of the within residuals on the covariates for all the periods.
An object of class "htest"
.
Yves Croissant
Angrist JD, Newey WK (1991). “Over-identification tests in earnings functions with fixed effects.” Journal of Business & Economic Statistics, 9(3), 317–323.
piest()
for Chamberlain's test
data("RiceFarms", package = "plm") aneweytest(log(goutput) ~ log(seed) + log(totlabor) + log(size), RiceFarms, index = "id")
data("RiceFarms", package = "plm") aneweytest(log(goutput) ~ log(seed) + log(totlabor) + log(size), RiceFarms, index = "id")
a panel of 46 observations from 1963 to 1992
A data frame containing :
state abbreviation
the year
price per pack of cigarettes
population
population above the age of 16
consumer price index (1983=100)
per capita disposable income
cigarette sales in packs per capita
minimum price in adjoining states per pack of cigarettes
total number of observations : 1380
observation : regional
country : United States
Online complements to Baltagi (2001):
https://www.wiley.com/legacy/wileychi/baltagi/
Online complements to Baltagi (2013):
https://bcs.wiley.com/he-bcs/Books?action=resource&bcsId=4338&itemId=1118672321&resourceId=13452
Baltagi BH (2001). Econometric Analysis of Panel Data, 3rd edition. John Wiley and Sons ltd.
Baltagi BH (2013). Econometric Analysis of Panel Data, 5th edition. John Wiley and Sons ltd.
Baltagi B, Levin D (1992). “Cigarette taxation: Raising revenues and reducing consumption.” Structural Change and Economic Dynamics, 3(2), 321-335. https://EconPapers.repec.org/RePEc:eee:streco:v:3:y:1992:i:2:p:321-335.
Baltagi BH, Griffin JM, Xiong W (2000). “To Pool or Not to Pool: Homogeneous Versus Heterogeneous Estimators Applied to Cigarette Demand.” The Review of Economics and Statistics, 82(1), 117-126. doi:10.1162/003465300558551, https://doi.org/10.1162/003465300558551.
Cross-sectionally augmented Im, Pesaran and Shin (IPS) test for unit roots in panel models.
cipstest( x, lags = 2L, type = c("trend", "drift", "none"), model = c("cmg", "mg", "dmg"), truncated = FALSE, ... )
cipstest( x, lags = 2L, type = c("trend", "drift", "none"), model = c("cmg", "mg", "dmg"), truncated = FALSE, ... )
x |
an object of class |
lags |
integer, lag order for Dickey-Fuller augmentation, |
type |
one of |
model |
one of |
truncated |
logical, specifying whether to calculate the
truncated version of the test (default: |
... |
further arguments passed to |
Pesaran's (Pesaran 2007) cross-sectionally augmented version of
the IPS unit root test (Im et al. 2003) (H0: pseries
has a unit root) is a so-called second-generation panel unit root test: it
is in fact robust against cross-sectional dependence, provided that the default
model="cmg"
is calculated. Else one can obtain the standard
(model="mg"
) or cross-sectionally demeaned (model="dmg"
)
versions of the IPS test.
Argument type
controls how the test is executed:
"none"
: no intercept, no trend (Case I in (Pesaran 2007)),
"drift"
: with intercept, no trend (Case II),
"trend"
(default): with intercept, with trend (Case III).
An object of class "htest"
.
Giovanni Millo
Im KS, Pesaran MH, Shin Y (2003).
“Testing for unit roots in heterogenous panels.”
Journal of Econometrics, 115(1), 53-74.
Pesaran MH (2007).
“A simple panel unit root test in the presence of cross-section dependence.”
Journal of Applied Econometrics, 22(2), 265–312.
data("Produc", package = "plm") Produc <- pdata.frame(Produc, index=c("state", "year")) ## check whether the gross state product (gsp) is trend-stationary cipstest(Produc$gsp, type = "trend")
data("Produc", package = "plm") Produc <- pdata.frame(Produc, index=c("state", "year")) ## check whether the gross state product (gsp) is trend-stationary cipstest(Produc$gsp, type = "trend")
Computes the cross–sectional correlation matrix
cortab(x, grouping, groupnames = NULL, value = "statistic", ...)
cortab(x, grouping, groupnames = NULL, value = "statistic", ...)
x |
an object of class |
grouping |
grouping variable, |
groupnames |
a character vector of group names, |
value |
to complete, |
... |
further arguments. |
A matrix with average correlation coefficients within a group (diagonal) and between groups (off-diagonal).
data("Grunfeld", package = "plm") pGrunfeld <- pdata.frame(Grunfeld) grp <- c(rep(1, 100), rep(2, 50), rep(3, 50)) # make 3 groups cortab(pGrunfeld$value, grouping = grp, groupnames = c("A", "B", "C"))
data("Grunfeld", package = "plm") pGrunfeld <- pdata.frame(Grunfeld) grp <- c(rep(1, 100), rep(2, 50), rep(3, 50)) # make 3 groups cortab(pGrunfeld$value, grouping = grp, groupnames = c("A", "B", "C"))
a panel of 90 observational units (counties) from 1981 to 1987
A data frame containing :
county identifier
year from 1981 to 1987
crimes committed per person
'probability' of arrest
'probability' of conviction
'probability' of prison sentence
average sentence, days
police per capita
people per square mile
tax revenue per capita
factor. One of 'other', 'west' or 'central'.
factor. (Also called "urban".) Does the individual reside in a SMSA (standard metropolitan statistical area)?
percentage minority in 1980
weekly wage in construction
weekly wage in transportation, utilities, communications
weekly wage in wholesale and retail trade
weekly wage in finance, insurance and real estate
weekly wage in service industry
weekly wage in manufacturing
weekly wage in federal government
weekly wage in state government
weekly wage in local government
offence mix: face-to-face/other
percentage of young males (between ages 15 to 24)
log of crimes committed per person
log of 'probability' of arrest
log of 'probability' of conviction
log of 'probability' of prison sentence
log of average sentence, days
log of police per capita
log of people per square mile
log of tax revenue per capita
log of percentage minority in 1980
log of weekly wage in construction
log of weekly wage in transportation, utilities, communications
log of weekly wage in wholesale and retail trade
log of weekly wage in finance, insurance and real estate
log of weekly wage in service industry
log of weekly wage in manufacturing
log of weekly wage in federal government
log of weekly wage in state government
log of weekly wage in local government
log of offence mix: face-to-face/other
log of percentage of young males (between ages 15 to 24)
total number of observations : 630
observation : regional
country : United States
The variables l* (lcrmrte, lprbarr, ...) contain the pre-computed logarithms of the base variables as found in the original data set. Note that these values slightly differ from what R's log() function yields for the base variables. In order to reproduce examples from the literature, the pre-computed logs need to be used, otherwise the results differ slightly.
Journal of Applied Econometrics Data Archive (complements Baltagi (2006)):
http://qed.econ.queensu.ca/jae/2006-v21.4/baltagi/
Online complements to Baltagi (2001):
https://www.wiley.com/legacy/wileychi/baltagi/
Online complements to Baltagi (2013):
https://bcs.wiley.com/he-bcs/Books?action=resource&bcsId=4338&itemId=1118672321&resourceId=13452
See also Journal of Applied Econometrics data archive entry for Baltagi (2006) at http://qed.econ.queensu.ca/jae/2006-v21.4/baltagi/.
Cornwell C, Trumbull WN (1994). “Estimating the economic model of crime with panel data.” Review of Economics and Statistics, 76, 360–366.
Baltagi BH (2006). “Estmating an economic model of crime using panel data from North Carolina.” Journal of Applied Econometrics, 21(4).
Baltagi BH (2001). Econometric Analysis of Panel Data, 3rd edition. John Wiley and Sons ltd.
Baltagi BH (2013). Econometric Analysis of Panel Data, 5th edition. John Wiley and Sons ltd.
Little helper functions to aid users to detect linear dependent columns in a two-dimensional data structure, especially in a (transformed) model matrix - typically useful in interactive mode during model building phase.
detect.lindep(object, ...) ## S3 method for class 'matrix' detect.lindep(object, suppressPrint = FALSE, ...) ## S3 method for class 'data.frame' detect.lindep(object, suppressPrint = FALSE, ...) ## S3 method for class 'plm' detect.lindep(object, suppressPrint = FALSE, ...) ## S3 method for class 'plm' alias(object, ...) ## S3 method for class 'pdata.frame' alias( object, model = c("pooling", "within", "Between", "between", "mean", "random", "fd"), effect = c("individual", "time", "twoways"), ... )
detect.lindep(object, ...) ## S3 method for class 'matrix' detect.lindep(object, suppressPrint = FALSE, ...) ## S3 method for class 'data.frame' detect.lindep(object, suppressPrint = FALSE, ...) ## S3 method for class 'plm' detect.lindep(object, suppressPrint = FALSE, ...) ## S3 method for class 'plm' alias(object, ...) ## S3 method for class 'pdata.frame' alias( object, model = c("pooling", "within", "Between", "between", "mean", "random", "fd"), effect = c("individual", "time", "twoways"), ... )
object |
for |
... |
further arguments. |
suppressPrint |
for |
model |
(see |
effect |
(see |
Linear dependence of columns/variables is (usually) readily avoided when
building one's model. However, linear dependence is sometimes not obvious
and harder to detect for less experienced applied statisticians. The so
called "dummy variable trap" is a common and probably the best–known
fallacy of this kind (see e. g. Wooldridge (2016), sec. 7-2.). When building
linear models with lm
or plm
's pooling
model, linear
dependence in one's model is easily detected, at times post hoc.
However, linear dependence might also occur after some transformations of
the data, albeit it is not present in the untransformed data. The within
transformation (also called fixed effect transformation) used in the
"within"
model can result in such linear dependence and this is
harder to come to mind when building a model. See Examples for two
examples of linear dependent columns after the within transformation: ex. 1)
the transformed variables have the opposite sign of one another; ex. 2) the
transformed variables are identical.
During plm
's model estimation, linear dependent columns and their
corresponding coefficients in the resulting object are silently dropped,
while the corresponding model frame and model matrix still contain the
affected columns. The plm object contains an element aliased
which
indicates any such aliased coefficients by a named logical.
Both functions, detect.lindep
and alias
, help to
detect linear dependence and accomplish almost the same:
detect.lindep
is a stand alone implementation while
alias
is a wrapper around
stats::alias.lm()
, extending the alias
generic to classes "plm"
and "pdata.frame"
.
alias
hinges on the availability of the package
MASS on the system. Not all arguments of alias.lm
are supported. Output of alias
is more informative as it
gives the linear combination of dependent columns (after data
transformations, i. e., after (quasi)-demeaning) while
detect.lindep
only gives columns involved in the linear
dependence in a simple format (thus being more suited for automatic
post–processing of the information).
For detect.lindep
: A named numeric vector containing column
numbers of the linear dependent columns in the object after data
transformation, if any are present. NULL
if no linear dependent
columns are detected.
For alias
: return value of stats::alias.lm()
run on the
(quasi-)demeaned model, i. e., the information outputted applies to
the transformed model matrix, not the original data.
function detect.lindep
was called detect_lin_dep
initially but renamed for naming consistency later.
Kevin Tappe
Wooldridge JM (2013). Introductory Econometrics: a modern approach, 5th edition. South-Western (Cengage Learning).
stats::alias()
, stats::model.matrix()
and especially
plm
's model.matrix()
for (transformed) model matrices,
plm's model.frame()
.
### Example 1 ### # prepare the data data("Cigar" , package = "plm") Cigar[ , "fact1"] <- c(0,1) Cigar[ , "fact2"] <- c(1,0) Cigar.p <- pdata.frame(Cigar) # setup a formula and a model frame form <- price ~ 0 + cpi + fact1 + fact2 mf <- model.frame(Cigar.p, form) # no linear dependence in the pooling model's model matrix # (with intercept in the formula, there would be linear dependence) detect.lindep(model.matrix(mf, model = "pooling")) # linear dependence present in the FE transformed model matrix modmat_FE <- model.matrix(mf, model = "within") detect.lindep(modmat_FE) mod_FE <- plm(form, data = Cigar.p, model = "within") detect.lindep(mod_FE) alias(mod_FE) # => fact1 == -1*fact2 plm(form, data = mf, model = "within")$aliased # "fact2" indicated as aliased # look at the data: after FE transformation fact1 == -1*fact2 head(modmat_FE) all.equal(modmat_FE[ , "fact1"], -1*modmat_FE[ , "fact2"]) ### Example 2 ### # Setup the data: # Assume CEOs stay with the firms of the Grunfeld data # for the firm's entire lifetime and assume some fictional # data about CEO tenure and age in year 1935 (first observation # in the data set) to be at 1 to 10 years and 38 to 55 years, respectively. # => CEO tenure and CEO age increase by same value (+1 year per year). data("Grunfeld", package = "plm") set.seed(42) # add fictional data Grunfeld$CEOtenure <- c(replicate(10, seq(from=s<-sample(1:10, 1), to=s+19, by=1))) Grunfeld$CEOage <- c(replicate(10, seq(from=s<-sample(38:65, 1), to=s+19, by=1))) # look at the data head(Grunfeld, 50) form <- inv ~ value + capital + CEOtenure + CEOage mf <- model.frame(pdata.frame(Grunfeld), form) # no linear dependent columns in original data/pooling model modmat_pool <- model.matrix(mf, model="pooling") detect.lindep(modmat_pool) mod_pool <- plm(form, data = Grunfeld, model = "pooling") alias(mod_pool) # CEOtenure and CEOage are linear dependent after FE transformation # (demeaning per individual) modmat_FE <- model.matrix(mf, model="within") detect.lindep(modmat_FE) mod_FE <- plm(form, data = Grunfeld, model = "within") detect.lindep(mod_FE) alias(mod_FE) # look at the transformed data: after FE transformation CEOtenure == 1*CEOage head(modmat_FE, 50) all.equal(modmat_FE[ , "CEOtenure"], modmat_FE[ , "CEOage"])
### Example 1 ### # prepare the data data("Cigar" , package = "plm") Cigar[ , "fact1"] <- c(0,1) Cigar[ , "fact2"] <- c(1,0) Cigar.p <- pdata.frame(Cigar) # setup a formula and a model frame form <- price ~ 0 + cpi + fact1 + fact2 mf <- model.frame(Cigar.p, form) # no linear dependence in the pooling model's model matrix # (with intercept in the formula, there would be linear dependence) detect.lindep(model.matrix(mf, model = "pooling")) # linear dependence present in the FE transformed model matrix modmat_FE <- model.matrix(mf, model = "within") detect.lindep(modmat_FE) mod_FE <- plm(form, data = Cigar.p, model = "within") detect.lindep(mod_FE) alias(mod_FE) # => fact1 == -1*fact2 plm(form, data = mf, model = "within")$aliased # "fact2" indicated as aliased # look at the data: after FE transformation fact1 == -1*fact2 head(modmat_FE) all.equal(modmat_FE[ , "fact1"], -1*modmat_FE[ , "fact2"]) ### Example 2 ### # Setup the data: # Assume CEOs stay with the firms of the Grunfeld data # for the firm's entire lifetime and assume some fictional # data about CEO tenure and age in year 1935 (first observation # in the data set) to be at 1 to 10 years and 38 to 55 years, respectively. # => CEO tenure and CEO age increase by same value (+1 year per year). data("Grunfeld", package = "plm") set.seed(42) # add fictional data Grunfeld$CEOtenure <- c(replicate(10, seq(from=s<-sample(1:10, 1), to=s+19, by=1))) Grunfeld$CEOage <- c(replicate(10, seq(from=s<-sample(38:65, 1), to=s+19, by=1))) # look at the data head(Grunfeld, 50) form <- inv ~ value + capital + CEOtenure + CEOage mf <- model.frame(pdata.frame(Grunfeld), form) # no linear dependent columns in original data/pooling model modmat_pool <- model.matrix(mf, model="pooling") detect.lindep(modmat_pool) mod_pool <- plm(form, data = Grunfeld, model = "pooling") alias(mod_pool) # CEOtenure and CEOage are linear dependent after FE transformation # (demeaning per individual) modmat_FE <- model.matrix(mf, model="within") detect.lindep(modmat_FE) mod_FE <- plm(form, data = Grunfeld, model = "within") detect.lindep(mod_FE) alias(mod_FE) # look at the transformed data: after FE transformation CEOtenure == 1*CEOage head(modmat_FE, 50) all.equal(modmat_FE[ , "CEOtenure"], modmat_FE[ , "CEOage"])
An unbalanced panel of 140 observations from 1976 to 1984
A data frame containing :
firm index
year
the sector of activity
employment
wages
capital
output
total number of observations : 1031
observation : firms
country : United Kingdom
Arellano M, Bond S (1991). “Some Tests of Specification for Panel Data : Monte Carlo Evidence and an Application to Employment Equations.” Review of Economic Studies, 58, 277–297.
This function enables the estimation of the variance components of a panel model.
ercomp(object, ...) ## S3 method for class 'plm' ercomp(object, ...) ## S3 method for class 'pdata.frame' ercomp( object, effect = c("individual", "time", "twoways", "nested"), method = NULL, models = NULL, dfcor = NULL, index = NULL, ... ) ## S3 method for class 'formula' ercomp( object, data, effect = c("individual", "time", "twoways", "nested"), method = NULL, models = NULL, dfcor = NULL, index = NULL, ... ) ## S3 method for class 'ercomp' print(x, digits = max(3, getOption("digits") - 3), ...)
ercomp(object, ...) ## S3 method for class 'plm' ercomp(object, ...) ## S3 method for class 'pdata.frame' ercomp( object, effect = c("individual", "time", "twoways", "nested"), method = NULL, models = NULL, dfcor = NULL, index = NULL, ... ) ## S3 method for class 'formula' ercomp( object, data, effect = c("individual", "time", "twoways", "nested"), method = NULL, models = NULL, dfcor = NULL, index = NULL, ... ) ## S3 method for class 'ercomp' print(x, digits = max(3, getOption("digits") - 3), ...)
object |
a |
... |
further arguments. |
effect |
the effects introduced in the model, see |
method |
method of estimation for the variance components, see
|
models |
the models used to estimate the variance components (an alternative to the previous argument), |
dfcor |
a numeric vector of length 2 indicating which degree of freedom should be used, |
index |
the indexes, |
data |
a |
x |
an |
digits |
digits, |
An object of class "ercomp"
: a list containing
sigma2
a named numeric with estimates of the variance
components,
theta
contains the parameter(s) used for
the transformation of the variables: For a one-way model, a
numeric corresponding to the selected effect (individual or
time); for a two-ways model a list of length 3 with the
parameters. In case of a balanced model, the numeric has length
1 while for an unbalanced model, the numerics' length equal the
number of observations.
Yves Croissant
Amemiya T (1971). “The Estimation of the Variances in a Variance–Components Model.” International Economic Review, 12, 1–13.
Nerlove M (1971). “Further Evidence on the Estimation of Dynamic Economic Relations from a Time–Series of Cross–Sections.” Econometrica, 39, 359–382.
Swamy PAVB, Arora SS (1972). “The Exact Finite Sample Properties of the Estimators of Coefficients in the Error Components Regression Models.” Econometrica, 40, 261–275.
Wallace TD, Hussain A (1969). “The Use of Error Components Models in Combining Cross Section With Time Series Data.” Econometrica, 37(1), 55–72.
plm()
where the estimates of the variance components are
used if a random effects model is estimated
data("Produc", package = "plm") # an example of the formula method ercomp(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, data = Produc, method = "walhus", effect = "time") # same with the plm method z <- plm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, data = Produc, random.method = "walhus", effect = "time", model = "random") ercomp(z) # a two-ways model ercomp(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, data = Produc, method = "amemiya", effect = "twoways")
data("Produc", package = "plm") # an example of the formula method ercomp(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, data = Produc, method = "walhus", effect = "time") # same with the plm method z <- plm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, data = Produc, random.method = "walhus", effect = "time", model = "random") ercomp(z) # a two-ways model ercomp(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, data = Produc, method = "amemiya", effect = "twoways")
Function to extract the fixed effects from a plm
object and
associated summary method.
## S3 method for class 'plm' fixef( object, effect = NULL, type = c("level", "dfirst", "dmean"), vcov = NULL, ... ) ## S3 method for class 'fixef' print( x, digits = max(3, getOption("digits") - 2), width = getOption("width"), ... ) ## S3 method for class 'fixef' summary(object, ...) ## S3 method for class 'summary.fixef' print( x, digits = max(3, getOption("digits") - 2), width = getOption("width"), ... ) ## S3 method for class 'pggls' fixef( object, effect = NULL, type = c("level", "dfirst", "dmean"), vcov = NULL, ... )
## S3 method for class 'plm' fixef( object, effect = NULL, type = c("level", "dfirst", "dmean"), vcov = NULL, ... ) ## S3 method for class 'fixef' print( x, digits = max(3, getOption("digits") - 2), width = getOption("width"), ... ) ## S3 method for class 'fixef' summary(object, ...) ## S3 method for class 'summary.fixef' print( x, digits = max(3, getOption("digits") - 2), width = getOption("width"), ... ) ## S3 method for class 'pggls' fixef( object, effect = NULL, type = c("level", "dfirst", "dmean"), vcov = NULL, ... )
effect |
one of |
type |
one of |
vcov |
a variance–covariance matrix furnished by the user or a function to calculate one (see Examples), |
... |
further arguments. |
x , object
|
an object of class |
digits |
digits, |
width |
the maximum length of the lines in the print output, |
Function fixef
calculates the fixed effects and returns an object
of class c("fixef", "numeric")
. By setting the type
argument,
the fixed effects may be returned in levels ("level"
), as
deviations from the first value of the index ("dfirst"
), or as
deviations from the overall mean ("dmean"
). If the argument
vcov
was specified, the standard errors (stored as attribute "se"
in the return value) are the respective robust standard errors.
For two-way fixed-effect models, argument effect
controls which
of the fixed effects are to be extracted: "individual"
, "time"
, or
the sum of individual and time effects ("twoways"
).
NB: See Examples for how the sum of effects can be split in an individual
and a time component.
For one-way models, the effects of the model are extracted and the
argument effect
is disrespected.
The associated summary
method returns an extended object of class
c("summary.fixef", "matrix")
with more information (see sections
Value and Examples).
References with formulae (except for the two-ways unbalanced case) are, e.g., Greene (2012), Ch. 11.4.4, p. 364, formulae (11-25); Wooldridge (2010), Ch. 10.5.3, pp. 308-309, formula (10.58).
For function fixef
, an object of class c("fixef", "numeric")
is returned: It is a numeric vector containing
the fixed effects with attribute se
which contains the
standard errors. There are two further attributes: attribute
type
contains the chosen type (the value of argument type
as a character); attribute df.residual
holds the residual
degrees of freedom (integer) from the fixed effects model (plm
object) on which fixef
was run. For the two-way unbalanced case, only
attribute type
is added.
For function summary.fixef
, an object of class
c("summary.fixef", "matrix")
is returned: It is a matrix with four
columns in this order: the estimated fixed effects, their standard
errors and associated t–values and p–values.
For the two-ways unbalanced case, the matrix contains only the estimates.
The type of the fixed effects and the standard errors in the
summary.fixef object correspond to was requested in the fixef
function by arguments type
and vcov
, respectively.
Yves Croissant
Greene WH (2012).
Econometric Analysis, 7th edition.
Prentice Hall.
Wooldridge JM (2010).
Econometric Analysis of Cross–Section and Panel Data, 2nd edition.
MIT Press.
within_intercept()
for the overall intercept of fixed
effect models along its standard error, plm()
for plm objects
and within models (= fixed effects models) in general. See
ranef()
to extract the random effects from a random effects
model.
data("Grunfeld", package = "plm") gi <- plm(inv ~ value + capital, data = Grunfeld, model = "within") fixef(gi) summary(fixef(gi)) summary(fixef(gi))[ , c("Estimate", "Pr(>|t|)")] # only estimates and p-values # relationship of type = "dmean" and "level" and overall intercept fx_level <- fixef(gi, type = "level") fx_dmean <- fixef(gi, type = "dmean") overallint <- within_intercept(gi) all.equal(overallint + fx_dmean, fx_level, check.attributes = FALSE) # TRUE # extract time effects in a twoways effects model gi_tw <- plm(inv ~ value + capital, data = Grunfeld, model = "within", effect = "twoways") fixef(gi_tw, effect = "time") # with supplied variance-covariance matrix as matrix, function, # and function with additional arguments fx_level_robust1 <- fixef(gi, vcov = vcovHC(gi)) fx_level_robust2 <- fixef(gi, vcov = vcovHC) fx_level_robust3 <- fixef(gi, vcov = function(x) vcovHC(x, method = "white2")) summary(fx_level_robust1) # gives fixed effects, robust SEs, t- and p-values # calc. fitted values of oneway within model: fixefs <- fixef(gi)[index(gi, which = "id")] fitted_by_hand <- fixefs + gi$coefficients["value"] * gi$model$value + gi$coefficients["capital"] * gi$model$capital # calc. fittes values of twoway unbalanced within model via effects: gtw_u <- plm(inv ~ value + capital, data = Grunfeld[-200, ], effect = "twoways") yhat <- as.numeric(gtw_u$model[ , 1] - gtw_u$residuals) # reference pred_beta <- as.numeric(tcrossprod(coef(gtw_u), as.matrix(gtw_u$model[ , -1]))) pred_effs <- as.numeric(fixef(gtw_u, "twoways")) # sum of ind and time effects all.equal(pred_effs + pred_beta, yhat) # TRUE # Splits of summed up individual and time effects: # use one "level" and one "dfirst" ii <- index(gtw_u)[[1L]]; it <- index(gtw_u)[[2L]] eff_id_dfirst <- c(0, as.numeric(fixef(gtw_u, "individual", "dfirst")))[ii] eff_ti_dfirst <- c(0, as.numeric(fixef(gtw_u, "time", "dfirst")))[it] eff_id_level <- as.numeric(fixef(gtw_u, "individual"))[ii] eff_ti_level <- as.numeric(fixef(gtw_u, "time"))[it] all.equal(pred_effs, eff_id_level + eff_ti_dfirst) # TRUE all.equal(pred_effs, eff_id_dfirst + eff_ti_level) # TRUE
data("Grunfeld", package = "plm") gi <- plm(inv ~ value + capital, data = Grunfeld, model = "within") fixef(gi) summary(fixef(gi)) summary(fixef(gi))[ , c("Estimate", "Pr(>|t|)")] # only estimates and p-values # relationship of type = "dmean" and "level" and overall intercept fx_level <- fixef(gi, type = "level") fx_dmean <- fixef(gi, type = "dmean") overallint <- within_intercept(gi) all.equal(overallint + fx_dmean, fx_level, check.attributes = FALSE) # TRUE # extract time effects in a twoways effects model gi_tw <- plm(inv ~ value + capital, data = Grunfeld, model = "within", effect = "twoways") fixef(gi_tw, effect = "time") # with supplied variance-covariance matrix as matrix, function, # and function with additional arguments fx_level_robust1 <- fixef(gi, vcov = vcovHC(gi)) fx_level_robust2 <- fixef(gi, vcov = vcovHC) fx_level_robust3 <- fixef(gi, vcov = function(x) vcovHC(x, method = "white2")) summary(fx_level_robust1) # gives fixed effects, robust SEs, t- and p-values # calc. fitted values of oneway within model: fixefs <- fixef(gi)[index(gi, which = "id")] fitted_by_hand <- fixefs + gi$coefficients["value"] * gi$model$value + gi$coefficients["capital"] * gi$model$capital # calc. fittes values of twoway unbalanced within model via effects: gtw_u <- plm(inv ~ value + capital, data = Grunfeld[-200, ], effect = "twoways") yhat <- as.numeric(gtw_u$model[ , 1] - gtw_u$residuals) # reference pred_beta <- as.numeric(tcrossprod(coef(gtw_u), as.matrix(gtw_u$model[ , -1]))) pred_effs <- as.numeric(fixef(gtw_u, "twoways")) # sum of ind and time effects all.equal(pred_effs + pred_beta, yhat) # TRUE # Splits of summed up individual and time effects: # use one "level" and one "dfirst" ii <- index(gtw_u)[[1L]]; it <- index(gtw_u)[[2L]] eff_id_dfirst <- c(0, as.numeric(fixef(gtw_u, "individual", "dfirst")))[ii] eff_ti_dfirst <- c(0, as.numeric(fixef(gtw_u, "time", "dfirst")))[it] eff_id_level <- as.numeric(fixef(gtw_u, "individual"))[ii] eff_ti_level <- as.numeric(fixef(gtw_u, "time"))[it] all.equal(pred_effs, eff_id_level + eff_ti_dfirst) # TRUE all.equal(pred_effs, eff_id_dfirst + eff_ti_level) # TRUE
A panel of 18 observations from 1960 to 1978
A data frame containing :
a factor with 18 levels
the year
logarithm of motor gasoline consumption per car
logarithm of real per-capita income
logarithm of real motor gasoline price
logarithm of the stock of cars per capita
total number of observations : 342
observation : country
country : OECD
Online complements to Baltagi (2001):
https://www.wiley.com/legacy/wileychi/baltagi/
Online complements to Baltagi (2013):
https://bcs.wiley.com/he-bcs/Books?action=resource&bcsId=4338&itemId=1118672321&resourceId=13452
Baltagi BH (2001). Econometric Analysis of Panel Data, 3rd edition. John Wiley and Sons ltd.
Baltagi BH (2013). Econometric Analysis of Panel Data, 5th edition. John Wiley and Sons ltd.
Baltagi BH, Griffin JM (1983). “Gasoline demand in the OECD: An application of pooling and testing procedures.” European Economic Review, 22(2), 117 - 137. ISSN 0014-2921, https://doi.org/10.1016/0014-2921(83)90077-6.
A balanced panel of 10 observational units (firms) from 1935 to 1954
A data frame containing :
observation
date
gross Investment
value of the firm
stock of plant and equipment
total number of observations : 200
observation : production units
country : United States
The Grunfeld data as provided in package plm
is the
same data as used in Baltagi (2001), see Examples below.
NB:
Various versions of the Grunfeld data circulate
online. Also, various text books (and also varying among editions)
and papers use different subsets of the original Grunfeld data,
some of which contain errors in a few data points compared to the
original data used by Grunfeld (1958) in his PhD thesis. See
Kleiber/Zeileis (2010) and its accompanying website for a
comparison of various Grunfeld data sets in use.
Online complements to Baltagi (2001):
https://www.wiley.com/legacy/wileychi/baltagi/
https://www.wiley.com/legacy/wileychi/baltagi/supp/Grunfeld.fil
Online complements to Baltagi (2013):
https://bcs.wiley.com/he-bcs/Books?action=resource&bcsId=4338&itemId=1118672321&resourceId=13452
Baltagi BH (2001). Econometric Analysis of Panel Data, 3rd edition. John Wiley and Sons ltd.
Baltagi BH (2013). Econometric Analysis of Panel Data, 5th edition. John Wiley and Sons ltd.
Grunfeld Y (1958). The determinants of corporate investment. Ph.D. thesis, Department of Economics, University of Chicago.
Kleiber C, Zeileis A (2010). “The Grunfeld Data at 50.” German Economic Review, 11, 404-417. https://doi.org/10.1111/j.1468-0475.2010.00513.x.
website accompanying the paper with various variants of the Grunfeld data: https://www.zeileis.org/grunfeld/.
For the complete Grunfeld data (11 firms), see
AER::Grunfeld, in the AER
package.
## Not run: # Compare plm's Grunfeld data to Baltagi's (2001) Grunfeld data: data("Grunfeld", package="plm") Grunfeld_baltagi2001 <- read.csv("http://www.wiley.com/legacy/wileychi/ baltagi/supp/Grunfeld.fil", sep="", header = FALSE) library(compare) compare::compare(Grunfeld, Grunfeld_baltagi2001, allowAll = T) # same data set ## End(Not run)
## Not run: # Compare plm's Grunfeld data to Baltagi's (2001) Grunfeld data: data("Grunfeld", package="plm") Grunfeld_baltagi2001 <- read.csv("http://www.wiley.com/legacy/wileychi/ baltagi/supp/Grunfeld.fil", sep="", header = FALSE) library(compare) compare::compare(Grunfeld, Grunfeld_baltagi2001, allowAll = T) # same data set ## End(Not run)
The presence of an intercept is checked using the formula which is either provided as the argument of the function or extracted from a fitted model.
has.intercept(object, ...) ## Default S3 method: has.intercept(object, data = NULL, ...) ## S3 method for class 'formula' has.intercept(object, data = NULL, ...) ## S3 method for class 'Formula' has.intercept(object, rhs = NULL, data = NULL, ...) ## S3 method for class 'panelmodel' has.intercept(object, ...) ## S3 method for class 'plm' has.intercept(object, rhs = 1L, ...)
has.intercept(object, ...) ## Default S3 method: has.intercept(object, data = NULL, ...) ## S3 method for class 'formula' has.intercept(object, data = NULL, ...) ## S3 method for class 'Formula' has.intercept(object, rhs = NULL, data = NULL, ...) ## S3 method for class 'panelmodel' has.intercept(object, ...) ## S3 method for class 'plm' has.intercept(object, rhs = 1L, ...)
object |
a |
... |
further arguments. |
data |
default is |
rhs |
an integer (length > 1 is possible), indicating the parts of right
hand sides of the formula to be evaluated for the presence of an
intercept or |
a logical
A cross-section
A dataframe containing:
median value of owner–occupied homes
crime rate
proportion of 25,000 square feet residential lots
proportion of no–retail business acres
is the tract bounds the Charles River?
annual average nitrogen oxide concentration in parts per hundred million
average number of rooms
proportion of owner units built prior to 1940
weighted distances to five employment centers in the Boston area
index of accessibility to radial highways
full value property tax rate ($/$10,000)
pupil/teacher ratio
proportion of blacks in the population
proportion of population that is lower status
town identifier
number of observations : 506
observation : regional
country : United States
Online complements to Baltagi (2001):
https://www.wiley.com/legacy/wileychi/baltagi/
Online complements to Baltagi (2013):
https://bcs.wiley.com/he-bcs/Books?action=resource&bcsId=4338&itemId=1118672321&resourceId=13452
Baltagi BH (2001). Econometric Analysis of Panel Data, 3rd edition. John Wiley and Sons ltd.
Baltagi BH (2013). Econometric Analysis of Panel Data, 5th edition. John Wiley and Sons ltd.
Besley DA, Kuh E, Welsch RE (1980). Regression diagnostics: identifying influential data and sources of collinearity. John Wiley and Sons ltd. Wiley series in probability and statistics.
Harrison D, Rubinfeld DL (1978). “Hedonic housing prices and the demand for clean air.” Journal of Environmental Economics and Management, 5, 81-102.
This function extracts the information about the structure of the individual and time dimensions of panel data. Grouping information can also be extracted if the panel data were created with a grouping variable.
## S3 method for class 'pindex' index(x, which = NULL, ...) ## S3 method for class 'pdata.frame' index(x, which = NULL, ...) ## S3 method for class 'pseries' index(x, which = NULL, ...) ## S3 method for class 'panelmodel' index(x, which = NULL, ...)
## S3 method for class 'pindex' index(x, which = NULL, ...) ## S3 method for class 'pdata.frame' index(x, which = NULL, ...) ## S3 method for class 'pseries' index(x, which = NULL, ...) ## S3 method for class 'panelmodel' index(x, which = NULL, ...)
x |
an object of class |
which |
the index(es) to be extracted (see details), |
... |
further arguments. |
Panel data are stored in a "pdata.frame"
which has an "index"
attribute. Fitted models in "plm"
have a "model"
element which
is also a "pdata.frame"
and therefore also has an "index"
attribute. Finally, each series, once extracted from a
"pdata.frame"
, becomes of class "pseries"
, which also has this
"index"
attribute. "index"
methods are available for all these
objects. The argument "which"
indicates which index should be
extracted. If which = NULL
, all indexes are extracted. "which"
can also be a vector of length 1, 2, or 3 (3 only if the pdata
frame was constructed with an additional group index) containing
either characters (the names of the individual variable and/or of
the time variable and/or the group variable or "id"
and "time"
)
and "group"
or integers (1 for the individual index, 2 for the
time index, and 3 for the group index (the latter only if the pdata
frame was constructed with such).)
A vector or an object of class c("pindex","data.frame")
containing either one index, individual and time index, or (any
combination of) individual, time and group indexes.
Yves Croissant
data("Grunfeld", package = "plm") Gr <- pdata.frame(Grunfeld, index = c("firm", "year")) m <- plm(inv ~ value + capital, data = Gr) index(Gr, "firm") index(Gr, "time") index(Gr$inv, c(2, 1)) index(m, "id") # with additional group index data("Produc", package = "plm") pProduc <- pdata.frame(Produc, index = c("state", "year", "region")) index(pProduc, 3) index(pProduc, "region") index(pProduc, "group")
data("Grunfeld", package = "plm") Gr <- pdata.frame(Grunfeld, index = c("firm", "year")) m <- plm(inv ~ value + capital, data = Gr) index(Gr, "firm") index(Gr, "time") index(Gr$inv, c(2, 1)) index(m, "id") # with additional group index data("Produc", package = "plm") pProduc <- pdata.frame(Produc, index = c("state", "year", "region")) index(pProduc, 3) index(pProduc, "region") index(pProduc, "group")
This function checks if the data are balanced, i.e., if each individual has the same time periods
is.pbalanced(x, ...) ## Default S3 method: is.pbalanced(x, y, ...) ## S3 method for class 'data.frame' is.pbalanced(x, index = NULL, ...) ## S3 method for class 'pdata.frame' is.pbalanced(x, ...) ## S3 method for class 'pseries' is.pbalanced(x, ...) ## S3 method for class 'pggls' is.pbalanced(x, ...) ## S3 method for class 'pcce' is.pbalanced(x, ...) ## S3 method for class 'pmg' is.pbalanced(x, ...) ## S3 method for class 'pgmm' is.pbalanced(x, ...) ## S3 method for class 'panelmodel' is.pbalanced(x, ...)
is.pbalanced(x, ...) ## Default S3 method: is.pbalanced(x, y, ...) ## S3 method for class 'data.frame' is.pbalanced(x, index = NULL, ...) ## S3 method for class 'pdata.frame' is.pbalanced(x, ...) ## S3 method for class 'pseries' is.pbalanced(x, ...) ## S3 method for class 'pggls' is.pbalanced(x, ...) ## S3 method for class 'pcce' is.pbalanced(x, ...) ## S3 method for class 'pmg' is.pbalanced(x, ...) ## S3 method for class 'pgmm' is.pbalanced(x, ...) ## S3 method for class 'panelmodel' is.pbalanced(x, ...)
x |
an object of class |
... |
further arguments. |
y |
(only in default method) the time index variable (2nd index variable), |
index |
only relevant for |
Balanced data are data for which each individual has the same time periods.
The returned values of the is.pbalanced(object)
methods are identical
to pdim(object)$balanced
. is.pbalanced
is provided as a short
cut and is faster than pdim(object)$balanced
because it avoids those
computations performed by pdim
which are unnecessary to determine the
balancedness of the data.
A logical indicating whether the data associated with
object x
are balanced (TRUE
) or not
(FALSE
).
punbalancedness()
for two measures of
unbalancedness, make.pbalanced()
to make data
balanced; is.pconsecutive()
to check if data are
consecutive; make.pconsecutive()
to make data
consecutive (and, optionally, also balanced).pdim()
to check the dimensions of a 'pdata.frame'
(and other objects), pvar()
to check for individual
and time variation of a 'pdata.frame' (and other objects),
pseries()
, data.frame()
,
pdata.frame()
.
# take balanced data and make it unbalanced # by deletion of 2nd row (2nd time period for first individual) data("Grunfeld", package = "plm") Grunfeld_missing_period <- Grunfeld[-2, ] is.pbalanced(Grunfeld_missing_period) # check if balanced: FALSE pdim(Grunfeld_missing_period)$balanced # same # pdata.frame interface pGrunfeld_missing_period <- pdata.frame(Grunfeld_missing_period) is.pbalanced(Grunfeld_missing_period) # pseries interface is.pbalanced(pGrunfeld_missing_period$inv)
# take balanced data and make it unbalanced # by deletion of 2nd row (2nd time period for first individual) data("Grunfeld", package = "plm") Grunfeld_missing_period <- Grunfeld[-2, ] is.pbalanced(Grunfeld_missing_period) # check if balanced: FALSE pdim(Grunfeld_missing_period)$balanced # same # pdata.frame interface pGrunfeld_missing_period <- pdata.frame(Grunfeld_missing_period) is.pbalanced(Grunfeld_missing_period) # pseries interface is.pbalanced(pGrunfeld_missing_period$inv)
This function checks for each individual if its associated time periods are consecutive (no "gaps" in time dimension per individual)
is.pconsecutive(x, ...) ## Default S3 method: is.pconsecutive(x, id, time, na.rm.tindex = FALSE, ...) ## S3 method for class 'data.frame' is.pconsecutive(x, index = NULL, na.rm.tindex = FALSE, ...) ## S3 method for class 'pseries' is.pconsecutive(x, na.rm.tindex = FALSE, ...) ## S3 method for class 'pdata.frame' is.pconsecutive(x, na.rm.tindex = FALSE, ...) ## S3 method for class 'panelmodel' is.pconsecutive(x, na.rm.tindex = FALSE, ...)
is.pconsecutive(x, ...) ## Default S3 method: is.pconsecutive(x, id, time, na.rm.tindex = FALSE, ...) ## S3 method for class 'data.frame' is.pconsecutive(x, index = NULL, na.rm.tindex = FALSE, ...) ## S3 method for class 'pseries' is.pconsecutive(x, na.rm.tindex = FALSE, ...) ## S3 method for class 'pdata.frame' is.pconsecutive(x, na.rm.tindex = FALSE, ...) ## S3 method for class 'panelmodel' is.pconsecutive(x, na.rm.tindex = FALSE, ...)
x |
usually, an object of class |
... |
further arguments. |
id , time
|
only relevant for default method: vectors specifying the id and time dimensions, i. e., a sequence of individual and time identifiers, each as stacked time series, |
na.rm.tindex |
logical indicating whether any |
index |
only relevant for |
(p)data.frame, pseries and estimated panelmodel objects can be tested if
their time periods are consecutive per individual. For evaluation of
consecutiveness, the time dimension is interpreted to be numeric, and the
data are tested for being a regularly spaced sequence with distance 1
between the time periods for each individual (for each individual the time
dimension can be interpreted as sequence t, t+1, t+2, ... where t is an
integer). As such, the "numerical content" of the time index variable is
considered for consecutiveness, not the "physical position" of the various
observations for an individuals in the (p)data.frame/pseries (it is not
about "neighbouring" rows). If the object to be evaluated is a pseries or a
pdata.frame, the time index is coerced from factor via as.character to
numeric, i.e., the series
as.numeric(as.character(index(<pseries/pdata.frame>)[[2]]))]
is
evaluated for gaps.
The default method also works for argument x
being an arbitrary
vector (see Examples), provided one can supply arguments id
and time
, which need to ordered as stacked time series. As only
id
and time
are really necessary for the default method to
evaluate the consecutiveness, x = NULL
is also possible. However, if
the vector x
is also supplied, additional input checking for equality
of the lengths of x
, id
and time
is performed, which is
safer.
For the data.frame interface, the data is ordered in the appropriate way (stacked time series) before the consecutiveness is evaluated. For the pdata.frame and pseries interface, ordering is not performed because both data types are already ordered in the appropriate way when created.
Note: Only the presence of the time period itself in the object is tested,
not if there are any other variables. NA
values in individual index
are not examined but silently dropped - In this case, it is not clear which
individual is meant by id value NA
, thus no statement about
consecutiveness of time periods for those "NA
-individuals" is
possible.
A named logical
vector (names are those of the
individuals). The i-th element of the returned vector
corresponds to the i-th individual. The values of the i-th
element can be:
TRUE |
if the i-th individual has consecutive time periods, |
FALSE |
if the i-th individual has non-consecutive time periods, |
"NA" |
if there are any NA values in time index of
the i-th the individual; see also argument |
Kevin Tappe
make.pconsecutive()
to make data consecutive
(and, as an option, balanced at the same time) and
make.pbalanced()
to make data balanced.pdim()
to check the dimensions of a 'pdata.frame'
(and other objects), pvar()
to check for individual
and time variation of a 'pdata.frame' (and other objects),
lag()
for lagged (and leading) values of a
'pseries' object.
pseries()
, data.frame()
, pdata.frame()
,
for class 'panelmodel' see plm()
and pgmm()
.
data("Grunfeld", package = "plm") is.pconsecutive(Grunfeld) is.pconsecutive(Grunfeld, index=c("firm", "year")) # delete 2nd row (2nd time period for first individual) # -> non consecutive Grunfeld_missing_period <- Grunfeld[-2, ] is.pconsecutive(Grunfeld_missing_period) all(is.pconsecutive(Grunfeld_missing_period)) # FALSE # delete rows 1 and 2 (1st and 2nd time period for first individual) # -> consecutive Grunfeld_missing_period_other <- Grunfeld[-c(1,2), ] is.pconsecutive(Grunfeld_missing_period_other) # all TRUE # delete year 1937 (3rd period) for _all_ individuals Grunfeld_wo_1937 <- Grunfeld[Grunfeld$year != 1937, ] is.pconsecutive(Grunfeld_wo_1937) # all FALSE # pdata.frame interface pGrunfeld <- pdata.frame(Grunfeld) pGrunfeld_missing_period <- pdata.frame(Grunfeld_missing_period) is.pconsecutive(pGrunfeld) # all TRUE is.pconsecutive(pGrunfeld_missing_period) # first FALSE, others TRUE # panelmodel interface (first, estimate some models) mod_pGrunfeld <- plm(inv ~ value + capital, data = Grunfeld) mod_pGrunfeld_missing_period <- plm(inv ~ value + capital, data = Grunfeld_missing_period) is.pconsecutive(mod_pGrunfeld) is.pconsecutive(mod_pGrunfeld_missing_period) nobs(mod_pGrunfeld) # 200 nobs(mod_pGrunfeld_missing_period) # 199 # pseries interface pinv <- pGrunfeld$inv pinv_missing_period <- pGrunfeld_missing_period$inv is.pconsecutive(pinv) is.pconsecutive(pinv_missing_period) # default method for arbitrary vectors or NULL inv <- Grunfeld$inv inv_missing_period <- Grunfeld_missing_period$inv is.pconsecutive(inv, id = Grunfeld$firm, time = Grunfeld$year) is.pconsecutive(inv_missing_period, id = Grunfeld_missing_period$firm, time = Grunfeld_missing_period$year) # (not run) demonstrate mismatch lengths of x, id, time # is.pconsecutive(x = inv_missing_period, id = Grunfeld$firm, time = Grunfeld$year) # only id and time are needed for evaluation is.pconsecutive(NULL, id = Grunfeld$firm, time = Grunfeld$year)
data("Grunfeld", package = "plm") is.pconsecutive(Grunfeld) is.pconsecutive(Grunfeld, index=c("firm", "year")) # delete 2nd row (2nd time period for first individual) # -> non consecutive Grunfeld_missing_period <- Grunfeld[-2, ] is.pconsecutive(Grunfeld_missing_period) all(is.pconsecutive(Grunfeld_missing_period)) # FALSE # delete rows 1 and 2 (1st and 2nd time period for first individual) # -> consecutive Grunfeld_missing_period_other <- Grunfeld[-c(1,2), ] is.pconsecutive(Grunfeld_missing_period_other) # all TRUE # delete year 1937 (3rd period) for _all_ individuals Grunfeld_wo_1937 <- Grunfeld[Grunfeld$year != 1937, ] is.pconsecutive(Grunfeld_wo_1937) # all FALSE # pdata.frame interface pGrunfeld <- pdata.frame(Grunfeld) pGrunfeld_missing_period <- pdata.frame(Grunfeld_missing_period) is.pconsecutive(pGrunfeld) # all TRUE is.pconsecutive(pGrunfeld_missing_period) # first FALSE, others TRUE # panelmodel interface (first, estimate some models) mod_pGrunfeld <- plm(inv ~ value + capital, data = Grunfeld) mod_pGrunfeld_missing_period <- plm(inv ~ value + capital, data = Grunfeld_missing_period) is.pconsecutive(mod_pGrunfeld) is.pconsecutive(mod_pGrunfeld_missing_period) nobs(mod_pGrunfeld) # 200 nobs(mod_pGrunfeld_missing_period) # 199 # pseries interface pinv <- pGrunfeld$inv pinv_missing_period <- pGrunfeld_missing_period$inv is.pconsecutive(pinv) is.pconsecutive(pinv_missing_period) # default method for arbitrary vectors or NULL inv <- Grunfeld$inv inv_missing_period <- Grunfeld_missing_period$inv is.pconsecutive(inv, id = Grunfeld$firm, time = Grunfeld$year) is.pconsecutive(inv_missing_period, id = Grunfeld_missing_period$firm, time = Grunfeld_missing_period$year) # (not run) demonstrate mismatch lengths of x, id, time # is.pconsecutive(x = inv_missing_period, id = Grunfeld$firm, time = Grunfeld$year) # only id and time are needed for evaluation is.pconsecutive(NULL, id = Grunfeld$firm, time = Grunfeld$year)
This function checks if an object qualifies as a pseries
is.pseries(object)
is.pseries(object)
object |
object to be checked for pseries features |
A "pseries"
is a wrapper around a "basic class" (numeric, factor,
logical, character, or complex).
To qualify as a pseries, an object needs to have the following features:
class contains "pseries"
and there are at least two classes
("pseries"
and the basic class),
have an appropriate index attribute (defines the panel structure),
any of is.numeric
, is.factor
, is.logical
, is.character
,
is.complex
is TRUE
.
A logical indicating whether the object is a pseries (TRUE
)
or not (FALSE
).
pseries()
for some computations on pseries and some
further links.
# Create a pdata.frame and extract a series, which becomes a pseries data("EmplUK", package = "plm") Em <- pdata.frame(EmplUK) z <- Em$output class(z) # pseries as indicated by class is.pseries(z) # and confirmed by check # destroy index of pseries and re-check attr(z, "index") <- NA is.pseries(z) # now FALSE
# Create a pdata.frame and extract a series, which becomes a pseries data("EmplUK", package = "plm") Em <- pdata.frame(EmplUK) z <- Em$output class(z) # pseries as indicated by class is.pseries(z) # and confirmed by check # destroy index of pseries and re-check attr(z, "index") <- NA is.pseries(z) # now FALSE
A panel of 532 observations from 1979 to 1988
A data frame containing :
log of annual hours worked
log of hourly wage
number of children
age
bad health
id
year
number of observations : 5320
Online complements to Ziliak (1997).
Journal of Business Economics and Statistics web site: https://amstat.tandfonline.com/loi/ubes20/.
Colin Cameron A, K. Trivedi P (2005). Microeconometrics: Methods and Applications. Cambridge University Press. ISBN 0521848059, doi:10.1017/CBO9780511811241.
Ziliak JP (1997). “Efficient Estimation with Panel Data When Instruments Are Predetermined: An Empirical Comparison of Moment-Condition Estimators.” Journal of Business & Economic Statistics, 15(4), 419–431. ISSN 07350015.
lag, lead, and diff functions for class pseries.
lead(x, k = 1L, ...) ## S3 method for class 'pseries' lag(x, k = 1L, shift = c("time", "row"), ...) ## S3 method for class 'pseries' lead(x, k = 1L, shift = c("time", "row"), ...) ## S3 method for class 'pseries' diff(x, lag = 1L, shift = c("time", "row"), ...)
lead(x, k = 1L, ...) ## S3 method for class 'pseries' lag(x, k = 1L, shift = c("time", "row"), ...) ## S3 method for class 'pseries' lead(x, k = 1L, shift = c("time", "row"), ...) ## S3 method for class 'pseries' diff(x, lag = 1L, shift = c("time", "row"), ...)
x |
a |
k |
an integer, the number of lags for the |
... |
further arguments (currently none evaluated). |
shift |
character, either |
lag |
integer, the number of lags for the |
This set of functions perform lagging, leading (lagging in the
opposite direction), and differencing operations on pseries
objects, i. e., they take the panel structure of the data into
account by performing the operations per individual.
Argument shift
controls the shifting of observations to be used
by methods lag
, lead
, and diff
:
shift = "time"
(default): Methods respect the
numerical value in the time dimension of the index. The time
dimension needs to be interpretable as a sequence t, t+1, t+2,
... where t is an integer (from a technical viewpoint,
as.numeric(as.character(index(your_pdata.frame)[[2]]))
needs to
result in a meaningful integer).
shift = "row":
Methods perform the shifting operation based
solely on the "physical position" of the observations,
i.e., neighbouring rows are shifted per individual. The value in the
time index is not relevant in this case.
For consecutive time periods per individual, a switch of shifting behaviour results in no difference. Different return values will occur for non-consecutive time periods per individual ("holes in time"), see also Examples.
An object of class pseries
, if the argument specifying the lag
has length 1 (argument k
in functions lag
and lead
,
argument lag
in function diff
).
A matrix containing the various series in its columns, if the argument specifying the lag has length > 1.
The sign of k
in lag.pseries
results in inverse behaviour
compared to stats::lag()
and zoo::lag.zoo()
.
Yves Croissant and Kevin Tappe
To check if the time periods are consecutive per
individual, see is.pconsecutive()
.
For further function for 'pseries' objects: between()
,
Between(), Within()
, summary.pseries()
,
print.summary.pseries()
, as.matrix.pseries()
.
# First, create a pdata.frame data("EmplUK", package = "plm") Em <- pdata.frame(EmplUK) # Then extract a series, which becomes additionally a pseries z <- Em$output class(z) # compute the first and third lag, and the difference lagged twice lag(z) lag(z, 3L) diff(z, 2L) # compute negative lags (= leading values) lag(z, -1L) lead(z, 1L) # same as line above identical(lead(z, 1L), lag(z, -1L)) # TRUE # compute more than one lag and diff at once (matrix returned) lag(z, c(1L,2L)) diff(z, c(1L,2L)) ## demonstrate behaviour of shift = "time" vs. shift = "row" # delete 2nd time period for first individual (1978 is missing (not NA)): Em_hole <- Em[-2L, ] is.pconsecutive(Em_hole) # check: non-consecutive for 1st individual now # original non-consecutive data: head(Em_hole$emp, 10) # for shift = "time", 1-1979 contains the value of former 1-1977 (2 periods lagged): head(lag(Em_hole$emp, k = 2L, shift = "time"), 10L) # for shift = "row", 1-1979 contains NA (2 rows lagged (and no entry for 1976): head(lag(Em_hole$emp, k = 2L, shift = "row"), 10L)
# First, create a pdata.frame data("EmplUK", package = "plm") Em <- pdata.frame(EmplUK) # Then extract a series, which becomes additionally a pseries z <- Em$output class(z) # compute the first and third lag, and the difference lagged twice lag(z) lag(z, 3L) diff(z, 2L) # compute negative lags (= leading values) lag(z, -1L) lead(z, 1L) # same as line above identical(lead(z, 1L), lag(z, -1L)) # TRUE # compute more than one lag and diff at once (matrix returned) lag(z, c(1L,2L)) diff(z, c(1L,2L)) ## demonstrate behaviour of shift = "time" vs. shift = "row" # delete 2nd time period for first individual (1978 is missing (not NA)): Em_hole <- Em[-2L, ] is.pconsecutive(Em_hole) # check: non-consecutive for 1st individual now # original non-consecutive data: head(Em_hole$emp, 10) # for shift = "time", 1-1979 contains the value of former 1-1977 (2 periods lagged): head(lag(Em_hole$emp, k = 2L, shift = "time"), 10L) # for shift = "row", 1-1979 contains NA (2 rows lagged (and no entry for 1976): head(lag(Em_hole$emp, k = 2L, shift = "row"), 10L)
Contrast-coded dummy matrix (treatment coding) created from a factor
make.dummies(x, ...) ## Default S3 method: make.dummies(x, base = 1L, base.add = TRUE, ...) ## S3 method for class 'data.frame' make.dummies(x, col, base = 1L, base.add = TRUE, ...) ## S3 method for class 'pdata.frame' make.dummies(x, col, base = 1L, base.add = TRUE, ...)
make.dummies(x, ...) ## Default S3 method: make.dummies(x, base = 1L, base.add = TRUE, ...) ## S3 method for class 'data.frame' make.dummies(x, col, base = 1L, base.add = TRUE, ...) ## S3 method for class 'pdata.frame' make.dummies(x, col, base = 1L, base.add = TRUE, ...)
x |
a factor from which the dummies are created (x is coerced to factor if not yet a factor) for the default method or a data data frame/pdata.frame for the respective method. |
... |
further arguments. |
base |
integer or character, specifies the reference level (base), if
integer it refers to position in |
base.add |
logical, if |
col |
character (only for the data frame and pdata.frame methods), to specify the column which is used to derive the dummies from, |
This function creates a matrix of dummies from the levels of a factor in treatment coding. In model estimations, it is usually preferable to not create the dummy matrix prior to estimation but to simply specify a factor in the formula and let the estimation function handle the creation of the dummies.
This function is merely a convenience wrapper around stats::contr.treatment
to ease the dummy matrix creation process shall the dummy matrix be explicitly
required. See Examples for a use case in LSDV (least squares dummy variable)
model estimation.
The default method uses a factor as main input (or something coercible to a
factor) to derive the dummy matrix from. Methods for data frame and pdata.frame
are available as well and have the additional argument col
to specify the
the column from which the dummies are created; both methods merge the dummy
matrix to the data frame/pdata.frame yielding a ready-to-use data set.
See also Examples for use cases.
For the default method, a matrix containing the contrast-coded
dummies (treatment coding),
dimensions are n x n where n = length(levels(x))
if argument
base.add = TRUE
or n = length(levels(x)-1)
if base.add = FALSE
;
for the data frame and pdata.frame method, a data frame or pdata.frame,
respectively, with the dummies appropriately merged to the input as
last columns (column names are derived from the name of the column
used to create the dummies and its levels).
Kevin Tappe
stats::contr.treatment()
, stats::contrasts()
library(plm) data("Grunfeld", package = "plm") Grunfeld <- Grunfeld[1:100, ] # reduce data set (down to 5 firms) ## default method make.dummies(Grunfeld$firm) # gives 5 x 5 matrix (5 firms, base level incl.) make.dummies(Grunfeld$firm, base = 2L, base.add = FALSE) # gives 5 x 4 matrix ## data frame method Grun.dummies <- make.dummies(Grunfeld, col = "firm") ## pdata.frame method pGrun <- pdata.frame(Grunfeld) pGrun.dummies <- make.dummies(pGrun, col = "firm") ## Model estimation: ## estimate within model (individual/firm effects) and LSDV models (firm dummies) # within model: plm(inv ~ value + capital, data = pGrun, model = "within") ## LSDV with user-created dummies by make.dummies: form_dummies <- paste0("firm", c(1:5), collapse = "+") form_dummies <- formula(paste0("inv ~ value + capital + ", form_dummies)) plm(form_dummies, data = pGrun.dummies, model = "pooling") # last dummy is dropped # LSDV via factor(year) -> let estimation function generate dummies: plm(inv ~ value + capital + factor(firm), data = pGrun, model = "pooling")
library(plm) data("Grunfeld", package = "plm") Grunfeld <- Grunfeld[1:100, ] # reduce data set (down to 5 firms) ## default method make.dummies(Grunfeld$firm) # gives 5 x 5 matrix (5 firms, base level incl.) make.dummies(Grunfeld$firm, base = 2L, base.add = FALSE) # gives 5 x 4 matrix ## data frame method Grun.dummies <- make.dummies(Grunfeld, col = "firm") ## pdata.frame method pGrun <- pdata.frame(Grunfeld) pGrun.dummies <- make.dummies(pGrun, col = "firm") ## Model estimation: ## estimate within model (individual/firm effects) and LSDV models (firm dummies) # within model: plm(inv ~ value + capital, data = pGrun, model = "within") ## LSDV with user-created dummies by make.dummies: form_dummies <- paste0("firm", c(1:5), collapse = "+") form_dummies <- formula(paste0("inv ~ value + capital + ", form_dummies)) plm(form_dummies, data = pGrun.dummies, model = "pooling") # last dummy is dropped # LSDV via factor(year) -> let estimation function generate dummies: plm(inv ~ value + capital + factor(firm), data = pGrun, model = "pooling")
This function makes the data balanced, i.e., each individual has the same time periods, by filling in or dropping observations
make.pbalanced( x, balance.type = c("fill", "shared.times", "shared.individuals"), ... ) ## S3 method for class 'pdata.frame' make.pbalanced( x, balance.type = c("fill", "shared.times", "shared.individuals"), ... ) ## S3 method for class 'pseries' make.pbalanced( x, balance.type = c("fill", "shared.times", "shared.individuals"), ... ) ## S3 method for class 'data.frame' make.pbalanced( x, balance.type = c("fill", "shared.times", "shared.individuals"), index = NULL, ... )
make.pbalanced( x, balance.type = c("fill", "shared.times", "shared.individuals"), ... ) ## S3 method for class 'pdata.frame' make.pbalanced( x, balance.type = c("fill", "shared.times", "shared.individuals"), ... ) ## S3 method for class 'pseries' make.pbalanced( x, balance.type = c("fill", "shared.times", "shared.individuals"), ... ) ## S3 method for class 'data.frame' make.pbalanced( x, balance.type = c("fill", "shared.times", "shared.individuals"), index = NULL, ... )
x |
an object of class |
balance.type |
character, one of |
... |
further arguments. |
index |
only relevant for |
(p)data.frame and pseries objects are made balanced, meaning each
individual has the same time periods. Depending on the value of
balance.type
, the balancing is done in different ways:
balance.type = "fill"
(default): The union
of available time periods over all individuals is taken (w/o
NA
values). Missing time periods for an individual are
identified and corresponding rows (elements for pseries) are
inserted and filled with NA
for the non–index variables
(elements for a pseries). This means, only time periods present
for at least one individual are inserted, if missing.
balance.type = "shared.times"
: The intersect of available time
periods over all individuals is taken (w/o NA
values). Thus, time
periods not available for all individuals are discarded, i. e., only time
periods shared by all individuals are left in the result).
balance.type = "shared.individuals"
: All available time periods
are kept and those individuals are dropped for which not all time periods
are available, i. e., only individuals shared by all time periods are left
in the result (symmetric to "shared.times"
).
The data are not necessarily made consecutive (regular time series
with distance 1), because balancedness does not imply
consecutiveness. For making the data consecutive, use
make.pconsecutive()
(and, optionally, set argument
balanced = TRUE
to make consecutive and balanced, see also
Examples for a comparison of the two functions.
Note: Rows of (p)data.frames (elements for pseries) with NA
values in individual or time index are not examined but silently
dropped before the data are made balanced. In this case, it cannot
be inferred which individual or time period is meant by the missing
value(s) (see also Examples). Especially, this means:
NA
values in the first/last position of the original time
periods for an individual are dropped, which are usually meant to
depict the beginning and ending of the time series for that
individual. Thus, one might want to check if there are any
NA
values in the index variables before applying
make.pbalanced
, and especially check for NA
values in the
first and last position for each individual in original data and,
if so, maybe set those to some meaningful begin/end value for the
time series.
An object of the same class as the input x
, i.e., a
pdata.frame, data.frame or a pseries which is made balanced
based on the index variables. The returned data are sorted as a
stacked time series.
Kevin Tappe
is.pbalanced()
to check if data are balanced;
is.pconsecutive()
to check if data are consecutive;
make.pconsecutive()
to make data consecutive (and,
optionally, also balanced).punbalancedness()
for two measures of unbalancedness, pdim()
to check
the dimensions of a 'pdata.frame' (and other objects),
pvar()
to check for individual and time variation
of a 'pdata.frame' (and other objects), lag()
for
lagging (and leading) values of a 'pseries' object.pseries()
, data.frame()
,
pdata.frame()
.
# take data and make it unbalanced # by deletion of 2nd row (2nd time period for first individual) data("Grunfeld", package = "plm") nrow(Grunfeld) # 200 rows Grunfeld_missing_period <- Grunfeld[-2, ] pdim(Grunfeld_missing_period)$balanced # check if balanced: FALSE make.pbalanced(Grunfeld_missing_period) # make it balanced (by filling) make.pbalanced(Grunfeld_missing_period, balance.type = "shared.times") # (shared periods) nrow(make.pbalanced(Grunfeld_missing_period)) nrow(make.pbalanced(Grunfeld_missing_period, balance.type = "shared.times")) # more complex data: # First, make data unbalanced (and non-consecutive) # by deletion of 2nd time period (year 1936) for all individuals # and more time periods for first individual only Grunfeld_unbalanced <- Grunfeld[Grunfeld$year != 1936, ] Grunfeld_unbalanced <- Grunfeld_unbalanced[-c(1,4), ] pdim(Grunfeld_unbalanced)$balanced # FALSE all(is.pconsecutive(Grunfeld_unbalanced)) # FALSE g_bal <- make.pbalanced(Grunfeld_unbalanced) pdim(g_bal)$balanced # TRUE unique(g_bal$year) # all years but 1936 nrow(g_bal) # 190 rows head(g_bal) # 1st individual: years 1935, 1939 are NA # NA in 1st, 3rd time period (years 1935, 1937) for first individual Grunfeld_NA <- Grunfeld Grunfeld_NA[c(1, 3), "year"] <- NA g_bal_NA <- make.pbalanced(Grunfeld_NA) head(g_bal_NA) # years 1935, 1937: NA for non-index vars nrow(g_bal_NA) # 200 # pdata.frame interface pGrunfeld_missing_period <- pdata.frame(Grunfeld_missing_period) make.pbalanced(Grunfeld_missing_period) # pseries interface make.pbalanced(pGrunfeld_missing_period$inv) # comparison to make.pconsecutive g_consec <- make.pconsecutive(Grunfeld_unbalanced) all(is.pconsecutive(g_consec)) # TRUE pdim(g_consec)$balanced # FALSE head(g_consec, 22) # 1st individual: no years 1935/6; 1939 is NA; # other indviduals: years 1935-1954, 1936 is NA nrow(g_consec) # 198 rows g_consec_bal <- make.pconsecutive(Grunfeld_unbalanced, balanced = TRUE) all(is.pconsecutive(g_consec_bal)) # TRUE pdim(g_consec_bal)$balanced # TRUE head(g_consec_bal) # year 1936 is NA for all individuals nrow(g_consec_bal) # 200 rows head(g_bal) # no year 1936 at all nrow(g_bal) # 190 rows
# take data and make it unbalanced # by deletion of 2nd row (2nd time period for first individual) data("Grunfeld", package = "plm") nrow(Grunfeld) # 200 rows Grunfeld_missing_period <- Grunfeld[-2, ] pdim(Grunfeld_missing_period)$balanced # check if balanced: FALSE make.pbalanced(Grunfeld_missing_period) # make it balanced (by filling) make.pbalanced(Grunfeld_missing_period, balance.type = "shared.times") # (shared periods) nrow(make.pbalanced(Grunfeld_missing_period)) nrow(make.pbalanced(Grunfeld_missing_period, balance.type = "shared.times")) # more complex data: # First, make data unbalanced (and non-consecutive) # by deletion of 2nd time period (year 1936) for all individuals # and more time periods for first individual only Grunfeld_unbalanced <- Grunfeld[Grunfeld$year != 1936, ] Grunfeld_unbalanced <- Grunfeld_unbalanced[-c(1,4), ] pdim(Grunfeld_unbalanced)$balanced # FALSE all(is.pconsecutive(Grunfeld_unbalanced)) # FALSE g_bal <- make.pbalanced(Grunfeld_unbalanced) pdim(g_bal)$balanced # TRUE unique(g_bal$year) # all years but 1936 nrow(g_bal) # 190 rows head(g_bal) # 1st individual: years 1935, 1939 are NA # NA in 1st, 3rd time period (years 1935, 1937) for first individual Grunfeld_NA <- Grunfeld Grunfeld_NA[c(1, 3), "year"] <- NA g_bal_NA <- make.pbalanced(Grunfeld_NA) head(g_bal_NA) # years 1935, 1937: NA for non-index vars nrow(g_bal_NA) # 200 # pdata.frame interface pGrunfeld_missing_period <- pdata.frame(Grunfeld_missing_period) make.pbalanced(Grunfeld_missing_period) # pseries interface make.pbalanced(pGrunfeld_missing_period$inv) # comparison to make.pconsecutive g_consec <- make.pconsecutive(Grunfeld_unbalanced) all(is.pconsecutive(g_consec)) # TRUE pdim(g_consec)$balanced # FALSE head(g_consec, 22) # 1st individual: no years 1935/6; 1939 is NA; # other indviduals: years 1935-1954, 1936 is NA nrow(g_consec) # 198 rows g_consec_bal <- make.pconsecutive(Grunfeld_unbalanced, balanced = TRUE) all(is.pconsecutive(g_consec_bal)) # TRUE pdim(g_consec_bal)$balanced # TRUE head(g_consec_bal) # year 1936 is NA for all individuals nrow(g_consec_bal) # 200 rows head(g_bal) # no year 1936 at all nrow(g_bal) # 190 rows
This function makes the data consecutive for each individual (no "gaps" in time dimension per individual) and, optionally, also balanced
make.pconsecutive(x, ...) ## S3 method for class 'data.frame' make.pconsecutive(x, balanced = FALSE, index = NULL, ...) ## S3 method for class 'pdata.frame' make.pconsecutive(x, balanced = FALSE, ...) ## S3 method for class 'pseries' make.pconsecutive(x, balanced = FALSE, ...)
make.pconsecutive(x, ...) ## S3 method for class 'data.frame' make.pconsecutive(x, balanced = FALSE, index = NULL, ...) ## S3 method for class 'pdata.frame' make.pconsecutive(x, balanced = FALSE, ...) ## S3 method for class 'pseries' make.pconsecutive(x, balanced = FALSE, ...)
x |
an object of class |
... |
further arguments. |
balanced |
logical, indicating whether the data should additionally be made balanced (default: FALSE), |
index |
only relevant for |
(p)data.frame and pseries objects are made consecutive, meaning their time
periods are made consecutive per individual. For consecutiveness, the time
dimension is interpreted to be numeric, and the data are extended to a
regularly spaced sequence with distance 1 between the time periods for each
individual (for each individual the time dimension become a sequence t, t+1,
t+2, ..., where t is an integer). Non–index variables are filled with
NA
for the inserted elements (rows for (p)data.frames, vector
elements for pseries).
With argument balanced = TRUE
, additionally to be made consecutive,
the data also can be made a balanced panel/pseries. Note: This means
consecutive AND balanced; balancedness does not imply consecutiveness. In
the result, each individual will have the same time periods in their time
dimension by taking the min and max of the time index variable over all
individuals (w/o NA
values) and inserting the missing time periods.
Looking at the number of rows of the resulting (pdata.frame) (elements for
pseries), this results in nrow(make.pconsecutive(<.>, balanced = FALSE))
<=
nrow(make.pconsecutive(<.>, balanced = TRUE))
. For making the data only
balanced, i.e., not demanding consecutiveness at the same time, use
make.pbalanced()
(see Examples for a comparison)).
Note: rows of (p)data.frames (elements for pseries) with NA
values in
individual or time index are not examined but silently dropped before the
data are made consecutive. In this case, it is not clear which individual or
time period is meant by the missing value(s). Especially, this means: If
there are NA
values in the first/last position of the original time
periods for an individual, which usually depicts the beginning and ending of
the time series for that individual, the beginning/end of the resulting time
series is taken to be the min and max (w/o NA
values) of the original
time series for that individual, see also Examples. Thus, one might
want to check if there are any NA
values in the index variables
before applying make.pconsecutive
, and especially check for NA
values
in the first and last position for each individual in original data and, if
so, maybe set those to some meaningful begin/end value for the time series.
An object of the same class as the input x
, i.e., a
pdata.frame, data.frame or a pseries which is made
time–consecutive based on the index variables. The returned
data are sorted as a stacked time series.
Kevin Tappe
is.pconsecutive()
to check if data are
consecutive; make.pbalanced()
to make data only
balanced (not consecutive).punbalancedness()
for two measures of unbalancedness, pdim()
to check
the dimensions of a 'pdata.frame' (and other objects),
pvar()
to check for individual and time variation
of a 'pdata.frame' (and other objects), lag()
for
lagged (and leading) values of a 'pseries' object.pseries()
, data.frame()
,
pdata.frame()
.
# take data and make it non-consecutive # by deletion of 2nd row (2nd time period for first individual) data("Grunfeld", package = "plm") nrow(Grunfeld) # 200 rows Grunfeld_missing_period <- Grunfeld[-2, ] is.pconsecutive(Grunfeld_missing_period) # check for consecutiveness make.pconsecutive(Grunfeld_missing_period) # make it consecutiveness # argument balanced: # First, make data non-consecutive and unbalanced # by deletion of 2nd time period (year 1936) for all individuals # and more time periods for first individual only Grunfeld_unbalanced <- Grunfeld[Grunfeld$year != 1936, ] Grunfeld_unbalanced <- Grunfeld_unbalanced[-c(1,4), ] all(is.pconsecutive(Grunfeld_unbalanced)) # FALSE pdim(Grunfeld_unbalanced)$balanced # FALSE g_consec_bal <- make.pconsecutive(Grunfeld_unbalanced, balanced = TRUE) all(is.pconsecutive(g_consec_bal)) # TRUE pdim(g_consec_bal)$balanced # TRUE nrow(g_consec_bal) # 200 rows head(g_consec_bal) # 1st individual: years 1935, 1936, 1939 are NA g_consec <- make.pconsecutive(Grunfeld_unbalanced) # default: balanced = FALSE all(is.pconsecutive(g_consec)) # TRUE pdim(g_consec)$balanced # FALSE nrow(g_consec) # 198 rows head(g_consec) # 1st individual: years 1935, 1936 dropped, 1939 is NA # NA in 1st, 3rd time period (years 1935, 1937) for first individual Grunfeld_NA <- Grunfeld Grunfeld_NA[c(1, 3), "year"] <- NA g_NA <- make.pconsecutive(Grunfeld_NA) head(g_NA) # 1936 is begin for 1st individual, 1937: NA for non-index vars nrow(g_NA) # 199, year 1935 from original data is dropped # pdata.frame interface pGrunfeld_missing_period <- pdata.frame(Grunfeld_missing_period) make.pconsecutive(Grunfeld_missing_period) # pseries interface make.pconsecutive(pGrunfeld_missing_period$inv) # comparison to make.pbalanced (makes the data only balanced, not consecutive) g_bal <- make.pbalanced(Grunfeld_unbalanced) all(is.pconsecutive(g_bal)) # FALSE pdim(g_bal)$balanced # TRUE nrow(g_bal) # 190 rows
# take data and make it non-consecutive # by deletion of 2nd row (2nd time period for first individual) data("Grunfeld", package = "plm") nrow(Grunfeld) # 200 rows Grunfeld_missing_period <- Grunfeld[-2, ] is.pconsecutive(Grunfeld_missing_period) # check for consecutiveness make.pconsecutive(Grunfeld_missing_period) # make it consecutiveness # argument balanced: # First, make data non-consecutive and unbalanced # by deletion of 2nd time period (year 1936) for all individuals # and more time periods for first individual only Grunfeld_unbalanced <- Grunfeld[Grunfeld$year != 1936, ] Grunfeld_unbalanced <- Grunfeld_unbalanced[-c(1,4), ] all(is.pconsecutive(Grunfeld_unbalanced)) # FALSE pdim(Grunfeld_unbalanced)$balanced # FALSE g_consec_bal <- make.pconsecutive(Grunfeld_unbalanced, balanced = TRUE) all(is.pconsecutive(g_consec_bal)) # TRUE pdim(g_consec_bal)$balanced # TRUE nrow(g_consec_bal) # 200 rows head(g_consec_bal) # 1st individual: years 1935, 1936, 1939 are NA g_consec <- make.pconsecutive(Grunfeld_unbalanced) # default: balanced = FALSE all(is.pconsecutive(g_consec)) # TRUE pdim(g_consec)$balanced # FALSE nrow(g_consec) # 198 rows head(g_consec) # 1st individual: years 1935, 1936 dropped, 1939 is NA # NA in 1st, 3rd time period (years 1935, 1937) for first individual Grunfeld_NA <- Grunfeld Grunfeld_NA[c(1, 3), "year"] <- NA g_NA <- make.pconsecutive(Grunfeld_NA) head(g_NA) # 1936 is begin for 1st individual, 1937: NA for non-index vars nrow(g_NA) # 199, year 1935 from original data is dropped # pdata.frame interface pGrunfeld_missing_period <- pdata.frame(Grunfeld_missing_period) make.pconsecutive(Grunfeld_missing_period) # pseries interface make.pconsecutive(pGrunfeld_missing_period$inv) # comparison to make.pbalanced (makes the data only balanced, not consecutive) g_bal <- make.pbalanced(Grunfeld_unbalanced) all(is.pconsecutive(g_bal)) # FALSE pdim(g_bal)$balanced # TRUE nrow(g_bal) # 190 rows
A panel of 545 observations from 1980 to 1987
A data frame containing :
identifier
year
years of schooling
years of experience (computed as age-6-school
)
wage set by collective bargaining?
a factor with levels black, hisp, other
married?
health problem?
log of hourly wage
a factor with 12 levels
a factor with 9 levels
a factor with levels rural_area, north_east, northern_central, south
total number of observations : 4360
observation : individuals
country : United States
Journal of Applied Econometrics data archive http://qed.econ.queensu.ca/jae/1998-v13.2/vella-verbeek/.
Vella F, Verbeek M (1998). “Whose wages do unions raise? A dynamic model of unionism and wage rate determination for young men.” Journal of Applied Econometrics, 13, 163–183.
Verbeek M (2004). A Guide to Modern Econometrics, 2nd edition. Wiley.
Methods to create model frame and model matrix for panel data.
## S3 method for class 'pdata.frame' model.frame( formula, data = NULL, ..., lhs = NULL, rhs = NULL, dot = "previous" ) ## S3 method for class 'pdata.frame' formula(x, ...) ## S3 method for class 'plm' model.matrix(object, ...) ## S3 method for class 'pdata.frame' model.matrix( object, model = c("pooling", "within", "Between", "Sum", "between", "mean", "random", "fd"), effect = c("individual", "time", "twoways", "nested"), rhs = 1, theta = NULL, cstcovar.rm = NULL, ... )
## S3 method for class 'pdata.frame' model.frame( formula, data = NULL, ..., lhs = NULL, rhs = NULL, dot = "previous" ) ## S3 method for class 'pdata.frame' formula(x, ...) ## S3 method for class 'plm' model.matrix(object, ...) ## S3 method for class 'pdata.frame' model.matrix( object, model = c("pooling", "within", "Between", "Sum", "between", "mean", "random", "fd"), effect = c("individual", "time", "twoways", "nested"), rhs = 1, theta = NULL, cstcovar.rm = NULL, ... )
data |
a |
... |
further arguments. |
lhs |
inherited from package |
rhs |
inherited from package |
dot |
inherited from package |
x |
a |
object , formula
|
an object of class |
model |
one of |
effect |
the effects introduced in the model, one of
|
theta |
the parameter for the transformation if |
cstcovar.rm |
remove the constant columns, one of |
The lhs
and rhs
arguments are inherited from Formula
, see
there for more details.
The model.frame
methods return a
pdata.frame
object suitable as an input to plm's
model.matrix
.
The model.matrix
methods builds a model matrix
with transformations performed as specified by the model
and
effect
arguments (and theta
if model = "random"
is
requested), in this case the supplied data
argument should be a
model frame created by plm's model.frame
method. If not, it is
tried to construct the model frame from the data. Constructing the
model frame first ensures proper NA
handling, see Examples.
The model.frame
methods return a pdata.frame
.
The
model.matrix
methods return a matrix
.
Yves Croissant
pmodel.response()
for (transformed) response
variable.Formula::Formula()
from package Formula
,
especially for the lhs
and rhs
arguments.
# First, make a pdata.frame data("Grunfeld", package = "plm") pGrunfeld <- pdata.frame(Grunfeld) # then make a model frame from a formula and a pdata.frame form <- inv ~ value mf <- model.frame(pGrunfeld, form) # then construct the (transformed) model matrix (design matrix) # from model frame modmat <- model.matrix(mf, model = "within") ## retrieve model frame and model matrix from an estimated plm object fe_model <- plm(form, data = pGrunfeld, model = "within") model.frame(fe_model) model.matrix(fe_model) # same as constructed before all.equal(mf, model.frame(fe_model), check.attributes = FALSE) # TRUE all.equal(modmat, model.matrix(fe_model), check.attributes = FALSE) # TRUE
# First, make a pdata.frame data("Grunfeld", package = "plm") pGrunfeld <- pdata.frame(Grunfeld) # then make a model frame from a formula and a pdata.frame form <- inv ~ value mf <- model.frame(pGrunfeld, form) # then construct the (transformed) model matrix (design matrix) # from model frame modmat <- model.matrix(mf, model = "within") ## retrieve model frame and model matrix from an estimated plm object fe_model <- plm(form, data = pGrunfeld, model = "within") model.frame(fe_model) model.matrix(fe_model) # same as constructed before all.equal(mf, model.frame(fe_model), check.attributes = FALSE) # TRUE all.equal(modmat, model.matrix(fe_model), check.attributes = FALSE) # TRUE
Test of serial correlation for models estimated by GMM
mtest(object, ...) ## S3 method for class 'pgmm' mtest(object, order = 1L, vcov = NULL, ...)
mtest(object, ...) ## S3 method for class 'pgmm' mtest(object, order = 1L, vcov = NULL, ...)
object |
an object of class |
... |
further arguments (currently unused). |
order |
integer: the order of the serial correlation, |
vcov |
a matrix of covariance for the coefficients or a function to compute it, |
The Arellano–Bond test is a test of correlation based on the residuals of
the estimation. By default, the computation is done with the standard
covariance matrix of the coefficients. A robust estimator of a
covariance matrix can be supplied with the vcov
argument.
Note that mtest
computes like DPD for Ox and xtabond do, i.e., uses for two-steps
models the one-step model's residuals which were used to construct the efficient
two-steps estimator, see (Arellano and Bond 2012), p. 32, footnote 7;
As noted by (Arellano and Bond 1991) (p. 282), the m statistic is rather
flexible and can be defined with any consistent GMM estimator which gives leeway
for implementation, but the test's asymptotic power depends on the estimator's efficiency.
(Arellano and Bond 1991) (see their footnote 9) used
DPD98 for Gauss ((Arellano and Bond 1998)) as did
(Windmeijer 2005) (see footnote 10) for the basis of his
covariance correction, both with a slightly different implementation.
Hence some results for mtest
with two-step models diverge from original papers,
see examples below.
An object of class "htest"
.
Yves Croissant
Arellano M, Bond S (1991).
“Some Tests of Specification for Panel Data : Monte Carlo Evidence and an Application to Employment Equations.”
Review of Economic Studies, 58, 277–297.
Arellano M, Bond S (1998).
“Dynamic panel data estimation using DPD98 for GAUSS: a guide for users.”
unpublished, https://ifs.org.uk/publications/dpd-gauss.
Arellano M, Bond S (2012).
“Panel data estimation using DPD for Ox.”
unpublished, https://www.doornik.com/download/oxmetrics7/Ox_Packages/dpd.pdf.
Windmeijer F (2005).
“A Finite Sample Correction for the Variance of Linear Efficient Two–Steps GMM Estimators.”
Journal of Econometrics, 126, 25–51.
data("EmplUK", package = "plm") # Arellano/Bond 1991, Table 4, column (a1) ab.a1 <- pgmm(log(emp) ~ lag(log(emp), 1:2) + lag(log(wage), 0:1) + lag(log(capital), 0:2) + lag(log(output), 0:2) | lag(log(emp), 2:99), data = EmplUK, effect = "twoways", model = "onestep") mtest(ab.a1, 1L) mtest(ab.a1, 2L, vcov = vcovHC) # Windmeijer (2005), table 2, onestep with corrected std. err ab.b.onestep <- pgmm(log(emp) ~ lag(log(emp), 1:2) + lag(log(wage), 0:1) + log(capital) + lag(log(output), 0:1) | lag(log(emp), 2:99), data = EmplUK, effect = "twoways", model = "onestep") mtest(ab.b.onestep, 1L, vcov = vcovHC) mtest(ab.b.onestep, 2L, vcov = vcovHC) # Arellano/Bond 1991, Table 4, column (a2) ab.a2 <- pgmm(log(emp) ~ lag(log(emp), 1:2) + lag(log(wage), 0:1) + lag(log(capital), 0:2) + lag(log(output), 0:2) | lag(log(emp), 2:99), data = EmplUK, effect = "twoways", model = "twosteps") mtest(ab.a2, 1L) mtest(ab.a2, 2L) # while a la Arellano/Bond (1991) -0.434
data("EmplUK", package = "plm") # Arellano/Bond 1991, Table 4, column (a1) ab.a1 <- pgmm(log(emp) ~ lag(log(emp), 1:2) + lag(log(wage), 0:1) + lag(log(capital), 0:2) + lag(log(output), 0:2) | lag(log(emp), 2:99), data = EmplUK, effect = "twoways", model = "onestep") mtest(ab.a1, 1L) mtest(ab.a1, 2L, vcov = vcovHC) # Windmeijer (2005), table 2, onestep with corrected std. err ab.b.onestep <- pgmm(log(emp) ~ lag(log(emp), 1:2) + lag(log(wage), 0:1) + log(capital) + lag(log(output), 0:1) | lag(log(emp), 2:99), data = EmplUK, effect = "twoways", model = "onestep") mtest(ab.b.onestep, 1L, vcov = vcovHC) mtest(ab.b.onestep, 2L, vcov = vcovHC) # Arellano/Bond 1991, Table 4, column (a2) ab.a2 <- pgmm(log(emp) ~ lag(log(emp), 1:2) + lag(log(wage), 0:1) + lag(log(capital), 0:2) + lag(log(output), 0:2) | lag(log(emp), 2:99), data = EmplUK, effect = "twoways", model = "twosteps") mtest(ab.a2, 1L) mtest(ab.a2, 2L) # while a la Arellano/Bond (1991) -0.434
This function extracts the total number of 'observations' from a fitted panel model.
## S3 method for class 'panelmodel' nobs(object, ...) ## S3 method for class 'pgmm' nobs(object, ...)
## S3 method for class 'panelmodel' nobs(object, ...) ## S3 method for class 'pgmm' nobs(object, ...)
object |
a |
... |
further arguments. |
The number of observations is usually the length of the residuals
vector. Thus, nobs
gives the number of observations actually
used by the estimation procedure. It is not necessarily the number
of observations of the model frame (number of rows in the model
frame), because sometimes the model frame is further reduced by the
estimation procedure. This is, e.g., the case for first–difference
models estimated by plm(..., model = "fd")
where the model
frame does not yet contain the differences (see also
Examples).
A single number, normally an integer.
# estimate a panelmodel data("Produc", package = "plm") z <- plm(log(gsp)~log(pcap)+log(pc)+log(emp)+unemp,data=Produc, model="random", subset = gsp > 5000) nobs(z) # total observations used in estimation pdim(z)$nT$N # same information pdim(z) # more information about the dimensions (no. of individuals and time periods) # illustrate difference between nobs and pdim for first-difference model data("Grunfeld", package = "plm") fdmod <- plm(inv ~ value + capital, data = Grunfeld, model = "fd") nobs(fdmod) # 190 pdim(fdmod)$nT$N # 200
# estimate a panelmodel data("Produc", package = "plm") z <- plm(log(gsp)~log(pcap)+log(pc)+log(emp)+unemp,data=Produc, model="random", subset = gsp > 5000) nobs(z) # total observations used in estimation pdim(z)$nT$N # same information pdim(z) # more information about the dimensions (no. of individuals and time periods) # illustrate difference between nobs and pdim for first-difference model data("Grunfeld", package = "plm") fdmod <- plm(inv ~ value + capital, data = Grunfeld, model = "fd") nobs(fdmod) # 190 pdim(fdmod)$nT$N # 200
A panel of 104 quarterly observations from 1973Q1 to 1998Q4
A data frame containing :
country codes: a factor with 17 levels
the quarter index, 1973Q1-1998Q4
log spot exchange rate vs. USD
log price level
short term interest rate
long term interest rate
log price differential vs. USA
U.S. short term interest rate
U.S. long term interest rate
total number of observations : 1768
observation : country
country : OECD
Coakley J, Fuertes A, Smith R (2006). “Unobserved heterogeneity in panel time series models.” Computational Statistics & Data Analysis, 50(9), 2361–2380.
Coakley J, Fuertes A, Smith R (2006). “Unobserved heterogeneity in panel time series models.” Computational Statistics & Data Analysis, 50(9), 2361–2380.
Driscoll JC, Kraay AC (1998). “Consistent covariance matrix estimation with spatially dependent panel data.” Review of economics and statistics, 80(4), 549–560.
Test of serial correlation for (the idiosyncratic component of) the errors in panel models.
pbgtest(x, ...) ## S3 method for class 'panelmodel' pbgtest(x, order = NULL, type = c("Chisq", "F"), ...) ## S3 method for class 'formula' pbgtest( x, order = NULL, type = c("Chisq", "F"), data, model = c("pooling", "random", "within"), ... )
pbgtest(x, ...) ## S3 method for class 'panelmodel' pbgtest(x, order = NULL, type = c("Chisq", "F"), ...) ## S3 method for class 'formula' pbgtest( x, order = NULL, type = c("Chisq", "F"), data, model = c("pooling", "random", "within"), ... )
x |
an object of class |
... |
further arguments (see |
order |
an integer indicating the order of serial correlation
to be tested for. |
type |
type of test statistic to be calculated; either
|
data |
only relevant for formula interface: data set for which
the respective panel model (see |
model |
only relevant for formula interface: compute test
statistic for model |
This Lagrange multiplier test uses the auxiliary model on
(quasi-)demeaned data taken from a model of class plm
which may
be a pooling
(default for formula interface), random
or
within
model. It performs a Breusch–Godfrey test (using bgtest
from package lmtest on the residuals of the
(quasi-)demeaned model, which should be serially uncorrelated under
the null of no serial correlation in idiosyncratic errors, as
illustrated in Wooldridge (2010). The function
takes the demeaned data, estimates the model and calls bgtest
.
Unlike most other tests for serial correlation in panels, this one allows to choose the order of correlation to test for.
An object of class "htest"
.
The argument order
defaults to the minimum number of
observations over the time dimension, while for
lmtest::bgtest
it defaults to 1
.
Giovanni Millo
Breusch TS (1978). “Testing for autocorrelation in dynamic linear models.” Australian Economic Papers, 17(31), 334–355.
Godfrey LG (1978). “Testing against general autoregressive and moving average error models when the regressors include lagged dependent variables.” Econometrica, 46(6), 1293–1301.
Wooldridge JM (2002). Econometric Analysis of Cross–Section and Panel Data. MIT Press.
Wooldridge JM (2010). Econometric Analysis of Cross–Section and Panel Data, 2nd edition. MIT Press.
Wooldridge JM (2013). Introductory Econometrics: a modern approach, 5th edition. South-Western (Cengage Learning). Sec. 12.2, pp. 421–422.
For the original test in package lmtest see
lmtest::bgtest()
. See pdwtest()
for the analogous
panel Durbin–Watson test. See pbltest()
, pbsytest()
,
pwartest()
and pwfdtest()
for other serial correlation
tests for panel models.
data("Grunfeld", package = "plm") g <- plm(inv ~ value + capital, data = Grunfeld, model = "random") # panelmodel interface pbgtest(g) pbgtest(g, order = 4) # formula interface pbgtest(inv ~ value + capital, data = Grunfeld, model = "random") # F test statistic (instead of default type="Chisq") pbgtest(g, type="F") pbgtest(inv ~ value + capital, data = Grunfeld, model = "random", type = "F")
data("Grunfeld", package = "plm") g <- plm(inv ~ value + capital, data = Grunfeld, model = "random") # panelmodel interface pbgtest(g) pbgtest(g, order = 4) # formula interface pbgtest(inv ~ value + capital, data = Grunfeld, model = "random") # F test statistic (instead of default type="Chisq") pbgtest(g, type="F") pbgtest(inv ~ value + capital, data = Grunfeld, model = "random", type = "F")
Baltagi and Li (1995)'s Lagrange multiplier test for AR(1) or MA(1) idiosyncratic errors in panel models with random effects.
pbltest(x, ...) ## S3 method for class 'formula' pbltest(x, data, alternative = c("twosided", "onesided"), index = NULL, ...) ## S3 method for class 'plm' pbltest(x, alternative = c("twosided", "onesided"), ...)
pbltest(x, ...) ## S3 method for class 'formula' pbltest(x, data, alternative = c("twosided", "onesided"), index = NULL, ...) ## S3 method for class 'plm' pbltest(x, alternative = c("twosided", "onesided"), ...)
x |
a model formula or an estimated random–effects model of
class |
... |
further arguments. |
data |
for the formula interface only: a |
alternative |
one of |
index |
the index of the |
This is a Lagrange multiplier test for the null of no serial
correlation, against the alternative of either an AR(1) or a MA(1)
process, in the idiosyncratic component of the error term in a
random effects panel model (as the analytical expression of the
test turns out to be the same under both alternatives,
(see Baltagi and Li 1995 and Baltagi and Li 1997). The
alternative
argument, defaulting to twosided
, allows testing
for positive serial correlation only, if set to onesided
.
An object of class "htest"
.
Giovanni Millo
Baltagi B, Li Q (1995). “Testing AR(1) Against MA(1) Disturbances in an Error Component Model.” Journal of Econometrics, 68, 133–151.
Baltagi B, Li Q (1997). “Monte Carlo Results on Pure and Pretest Estimators of an Error Components Model With Autocorrelated Disturbances.” Annales d'Economie et de Statistique, 48, 69–82.
pdwtest()
, pbnftest()
, pbgtest()
,
pbsytest()
, pwartest()
and
pwfdtest()
for other serial correlation tests for
panel models.
data("Grunfeld", package = "plm") # formula interface pbltest(inv ~ value + capital, data = Grunfeld) # plm interface re_mod <- plm(inv ~ value + capital, data = Grunfeld, model = "random") pbltest(re_mod) pbltest(re_mod, alternative = "onesided")
data("Grunfeld", package = "plm") # formula interface pbltest(inv ~ value + capital, data = Grunfeld) # plm interface re_mod <- plm(inv ~ value + capital, data = Grunfeld, model = "random") pbltest(re_mod) pbltest(re_mod, alternative = "onesided")
Tests for AR(1) disturbances in panel models.
pbnftest(x, ...) ## S3 method for class 'panelmodel' pbnftest(x, test = c("bnf", "lbi"), ...) ## S3 method for class 'formula' pbnftest( x, data, test = c("bnf", "lbi"), model = c("pooling", "within", "random"), ... )
pbnftest(x, ...) ## S3 method for class 'panelmodel' pbnftest(x, test = c("bnf", "lbi"), ...) ## S3 method for class 'formula' pbnftest( x, data, test = c("bnf", "lbi"), model = c("pooling", "within", "random"), ... )
x |
an object of class |
... |
only relevant for formula interface: further arguments
to specify the model to test (arguments passed on to plm()),
e.g., |
test |
a character indicating the test to be performed, either
|
data |
a |
model |
a character indicating on which type of model the test
shall be performed ( |
The default, test = "bnf"
, gives the (modified) BNF statistic,
the generalised Durbin-Watson statistic for panels. For balanced
and consecutive panels, the reference is
Bhargava/Franzini/Narendranathan (1982). The modified BNF is given
for unbalanced and/or non-consecutive panels (d1 in formula 16 of
Baltagi and Wu (1999)).
test = "lbi"
yields Baltagi–Wu's LBI statistic
(Baltagi and Wu 1999), the locally best invariant test which
is based on the modified BNF statistic.
No specific variants of these tests are available for random effect models. As the within estimator is consistent also under the random effects assumptions, the test for random effect models is performed by taking the within residuals.
No p-values are given for the statistics as their distribution is quite difficult. Bhargava et al. (1982) supply tabulated bounds for p = 0.05 for the balanced case and consecutive case.
For large N, (Bhargava et al. 1982) suggest it is sufficient to check whether the BNF statistic is < 2 to test against positive serial correlation.
An object of class "htest"
.
Kevin Tappe
Baltagi BH (2013). Econometric Analysis of Panel Data, 5th edition. John Wiley and Sons ltd.
Baltagi BH, Wu PX (1999). “Unequally Spaced Panel Data Regressions with AR(1) Disturbances.” Econometric Theory, 15(6), 814–823. ISSN 02664666, 14694360.
Bhargava A, Franzini L, Narendranathan W (1982). “Serial Correlation and the Fixed Effects Model.” The Review of Economic Studies, 49(4), 533–549. ISSN 00346527, 1467937X.
pdwtest()
for the original Durbin–Watson test using
(quasi-)demeaned residuals of the panel model without taking
the panel structure into account. pbltest()
, pbsytest()
,
pwartest()
and pwfdtest()
for other serial correlation
tests for panel models.
data("Grunfeld", package = "plm") # formula interface, replicate Baltagi/Wu (1999), table 1, test case A: data_A <- Grunfeld[!Grunfeld[["year"]] %in% c("1943", "1944"), ] pbnftest(inv ~ value + capital, data = data_A, model = "within") pbnftest(inv ~ value + capital, data = data_A, test = "lbi", model = "within") # replicate Baltagi (2013), p. 101, table 5.1: re <- plm(inv ~ value + capital, data = Grunfeld, model = "random") pbnftest(re) pbnftest(re, test = "lbi")
data("Grunfeld", package = "plm") # formula interface, replicate Baltagi/Wu (1999), table 1, test case A: data_A <- Grunfeld[!Grunfeld[["year"]] %in% c("1943", "1944"), ] pbnftest(inv ~ value + capital, data = data_A, model = "within") pbnftest(inv ~ value + capital, data = data_A, test = "lbi", model = "within") # replicate Baltagi (2013), p. 101, table 5.1: re <- plm(inv ~ value + capital, data = Grunfeld, model = "random") pbnftest(re) pbnftest(re, test = "lbi")
Test for residual serial correlation (or individual random effects) locally robust vs. individual random effects (serial correlation) for panel models and joint test of serial correlation and the random effect specification by Baltagi and Li.
pbsytest(x, ...) ## S3 method for class 'formula' pbsytest( x, data, ..., test = c("ar", "re", "j"), re.normal = if (test == "re") TRUE else NULL ) ## S3 method for class 'panelmodel' pbsytest( x, test = c("ar", "re", "j"), re.normal = if (test == "re") TRUE else NULL, ... )
pbsytest(x, ...) ## S3 method for class 'formula' pbsytest( x, data, ..., test = c("ar", "re", "j"), re.normal = if (test == "re") TRUE else NULL ) ## S3 method for class 'panelmodel' pbsytest( x, test = c("ar", "re", "j"), re.normal = if (test == "re") TRUE else NULL, ... )
x |
an object of class |
... |
further arguments. |
data |
a |
test |
a character string indicating which test to perform:
first–order serial correlation ( |
re.normal |
logical, only relevant for |
These Lagrange multiplier tests are robust vs. local misspecification of the alternative hypothesis, i.e., they test the null of serially uncorrelated residuals against AR(1) residuals in a pooling model, allowing for local departures from the assumption of no random effects; or they test the null of no random effects allowing for local departures from the assumption of no serial correlation in residuals. They use only the residuals of the pooled OLS model and correct for local misspecification as outlined in Bera et al. (2001).
For test = "re"
, the default (re.normal = TRUE
) is to compute
a one-sided test which is expected to lead to a more powerful test
(asymptotically N(0,1) distributed). Setting re.normal = FALSE
gives
the two-sided test (asymptotically chi-squared(2) distributed). Argument
re.normal
is irrelevant for all other values of test
.
The joint test of serial correlation and the random effect
specification (test = "j"
) is due to
Baltagi and Li (1991) (also mentioned in
Baltagi and Li (1995), pp. 135–136) and is added
for convenience under this same function.
The unbalanced version of all tests are derived in Sosa-Escudero and Bera (2008). The functions implemented are suitable for balanced as well as unbalanced panel data sets.
A concise treatment of the statistics for only balanced panels is given in Baltagi (2013), p. 108.
Here is an overview of how the various values of the test
argument relate to the literature:
test = "ar"
:
in Bera
et al. (2001), p. 9 (balanced)
in Baltagi (2013), p.
108 (balanced)
in Sosa-Escudero/Bera (2008), p. 73
(unbalanced)
test = "re", re.normal = TRUE
(default) (one-sided test,
asymptotically N(0,1) distributed):
in Bera
et al. (2001), p. 11 (balanced)
in Sosa-Escudero/Bera
(2008), p. 75 (unbalanced)
test = "re", re.normal = FALSE
(two-sided test, asymptotically
chi-squared(2) distributed):
in Bera et al.
(2001), p. 7 (balanced)
in Baltagi (2013), p. 108
(balanced)
in Sosa-Escudero/Bera (2008), p. 73
(unbalanced)
test = "j"
:
in Bera et al.
(2001), p. 10 (balanced)
in Baltagi/Li (2001), p. 279
(balanced)
in Baltagi and Li (1995), pp. 135–136
(balanced)
in Baltagi (2013), p. 108 (balanced)
in Sosa-Escudero/Bera (2008), p. 74 (unbalanced)
An object of class "htest"
.
Giovanni Millo (initial implementation) & Kevin Tappe (extension to unbalanced panels)
Bera AK, Sosa–Escudero W, Yoon M (2001). “Tests for the Error Component Model in the Presence of Local Misspecification.” Journal of Econometrics, 101, 1–23.
Baltagi BH (2013). Econometric Analysis of Panel Data, 5th edition. John Wiley and Sons ltd.
Baltagi B, Li Q (1991). “A Joint Test for Serial Correlation and Random Individual Effects.” Statistics and Probability Letters, 11, 277–280.
Baltagi B, Li Q (1995). “Testing AR(1) Against MA(1) Disturbances in an Error Component Model.” Journal of Econometrics, 68, 133–151.
Sosa-Escudero W, Bera AK (2008). “Tests for Unbalanced Error-Components Models under Local Misspecification.” The Stata Journal, 8(1), 68-78. doi:10.1177/1536867X0800800105, https://doi.org/10.1177/1536867X0800800105.
plmtest()
for individual and/or time random effects
tests based on a correctly specified model; pbltest()
,
pbgtest()
and pdwtest()
for serial correlation tests
in random effects models.
## Bera et. al (2001), p. 13, table 1 use ## a subset of the original Grunfeld ## data which contains three errors -> construct this subset: data("Grunfeld", package = "plm") Grunsubset <- rbind(Grunfeld[1:80, ], Grunfeld[141:160, ]) Grunsubset[Grunsubset$firm == 2 & Grunsubset$year %in% c(1940, 1952), ][["inv"]] <- c(261.6, 645.2) Grunsubset[Grunsubset$firm == 2 & Grunsubset$year == 1946, ][["capital"]] <- 232.6 ## default is AR testing (formula interface) pbsytest(inv ~ value + capital, data = Grunsubset, index = c("firm", "year")) pbsytest(inv ~ value + capital, data = Grunsubset, index = c("firm", "year"), test = "re") pbsytest(inv ~ value + capital, data = Grunsubset, index = c("firm", "year"), test = "re", re.normal = FALSE) pbsytest(inv ~ value + capital, data = Grunsubset, index = c("firm", "year"), test = "j") ## plm interface mod <- plm(inv ~ value + capital, data = Grunsubset, model = "pooling") pbsytest(mod)
## Bera et. al (2001), p. 13, table 1 use ## a subset of the original Grunfeld ## data which contains three errors -> construct this subset: data("Grunfeld", package = "plm") Grunsubset <- rbind(Grunfeld[1:80, ], Grunfeld[141:160, ]) Grunsubset[Grunsubset$firm == 2 & Grunsubset$year %in% c(1940, 1952), ][["inv"]] <- c(261.6, 645.2) Grunsubset[Grunsubset$firm == 2 & Grunsubset$year == 1946, ][["capital"]] <- 232.6 ## default is AR testing (formula interface) pbsytest(inv ~ value + capital, data = Grunsubset, index = c("firm", "year")) pbsytest(inv ~ value + capital, data = Grunsubset, index = c("firm", "year"), test = "re") pbsytest(inv ~ value + capital, data = Grunsubset, index = c("firm", "year"), test = "re", re.normal = FALSE) pbsytest(inv ~ value + capital, data = Grunsubset, index = c("firm", "year"), test = "j") ## plm interface mod <- plm(inv ~ value + capital, data = Grunsubset, model = "pooling") pbsytest(mod)
Common Correlated Effects Mean Groups (CCEMG) and Pooled (CCEP) estimators for panel data with common factors (balanced or unbalanced)
pcce( formula, data, subset, na.action, model = c("mg", "p"), index = NULL, trend = FALSE, ... ) ## S3 method for class 'pcce' summary(object, vcov = NULL, ...) ## S3 method for class 'summary.pcce' print( x, digits = max(3, getOption("digits") - 2), width = getOption("width"), ... ) ## S3 method for class 'pcce' residuals(object, type = c("defactored", "standard"), ...) ## S3 method for class 'pcce' model.matrix(object, ...) ## S3 method for class 'pcce' pmodel.response(object, ...)
pcce( formula, data, subset, na.action, model = c("mg", "p"), index = NULL, trend = FALSE, ... ) ## S3 method for class 'pcce' summary(object, vcov = NULL, ...) ## S3 method for class 'summary.pcce' print( x, digits = max(3, getOption("digits") - 2), width = getOption("width"), ... ) ## S3 method for class 'pcce' residuals(object, type = c("defactored", "standard"), ...) ## S3 method for class 'pcce' model.matrix(object, ...) ## S3 method for class 'pcce' pmodel.response(object, ...)
formula |
a symbolic description of the model to be estimated, |
data |
a |
subset |
see |
na.action |
see |
model |
one of |
index |
the indexes, see |
trend |
logical specifying whether an individual-specific trend has to be included, |
... |
further arguments. |
object , x
|
an object of class |
vcov |
a variance-covariance matrix furnished by the user or a function to calculate one, |
digits |
digits, |
width |
the maximum length of the lines in the print output, |
type |
one of |
pcce
is a function for the estimation of linear panel models by
the Common Correlated Effects Mean Groups or Pooled estimator,
consistent under the hypothesis of unobserved common factors and
idiosyncratic factor loadings. The CCE estimator works by
augmenting the model by cross-sectional averages of the dependent
variable and regressors in order to account for the common factors,
and adding individual intercepts and possibly trends.
An object of class c("pcce", "panelmodel")
containing:
coefficients |
the vector of coefficients, |
residuals |
the vector of (defactored) residuals, |
stdres |
the vector of (raw) residuals, |
tr.model |
the transformed data after projection on H, |
fitted.values |
the vector of fitted values, |
vcov |
the covariance matrix of the coefficients, |
df.residual |
degrees of freedom of the residuals, |
model |
a data.frame containing the variables used for the estimation, |
call |
the call, |
indcoef |
the matrix of individual coefficients from separate time series regressions, |
r.squared |
numeric, the R squared. |
Giovanni Millo
Kapetanios G, Pesaran MH, Yamagata T (2011). “Panels with non-stationary multifactor error structures.” Journal of Econometrics, 160(2), 326–348.
Holly S, Pesaran MH, Yamagata T (2010). “A spatio-temporal model of house prices in the USA.” Journal of Econometrics, 158(1), 160–173.
data("Produc", package = "plm") ccepmod <- pcce(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, data = Produc, model="p") summary(ccepmod) summary(ccepmod, vcov = vcovHC) # use argument vcov for robust std. errors ccemgmod <- pcce(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, data = Produc, model="mg") summary(ccemgmod)
data("Produc", package = "plm") ccepmod <- pcce(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, data = Produc, model="p") summary(ccepmod) summary(ccepmod, vcov = vcovHC) # use argument vcov for robust std. errors ccemgmod <- pcce(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, data = Produc, model="mg") summary(ccemgmod)
Pesaran's CD or Breusch–Pagan's LM (local or global) tests for cross sectional dependence in panel models
pcdtest(x, ...) ## S3 method for class 'formula' pcdtest( x, data, index = NULL, model = NULL, test = c("cd", "sclm", "bcsclm", "lm", "rho", "absrho"), w = NULL, ... ) ## S3 method for class 'panelmodel' pcdtest( x, test = c("cd", "sclm", "bcsclm", "lm", "rho", "absrho"), w = NULL, ... ) ## S3 method for class 'pseries' pcdtest( x, test = c("cd", "sclm", "bcsclm", "lm", "rho", "absrho"), w = NULL, ... )
pcdtest(x, ...) ## S3 method for class 'formula' pcdtest( x, data, index = NULL, model = NULL, test = c("cd", "sclm", "bcsclm", "lm", "rho", "absrho"), w = NULL, ... ) ## S3 method for class 'panelmodel' pcdtest( x, test = c("cd", "sclm", "bcsclm", "lm", "rho", "absrho"), w = NULL, ... ) ## S3 method for class 'pseries' pcdtest( x, test = c("cd", "sclm", "bcsclm", "lm", "rho", "absrho"), w = NULL, ... )
x |
an object of class |
... |
further arguments to be passed on for model estimation to |
data |
a |
index |
an optional numerical index, if |
model |
an optional character string indicating which type of
model to estimate; if left to |
test |
the type of test statistic to be returned. One of
|
w |
either |
These tests are originally meant to use the residuals of separate
estimation of one time–series regression for each cross-sectional
unit in order to check for cross–sectional dependence (model = NULL
).
If a different model specification (model = "within"
, "random"
,
...) is assumed consistent, one can resort to its residuals for
testing (which is common, e.g., when the time dimension's length is
insufficient for estimating the heterogeneous model).
If the time
dimension is insufficient and model = NULL
, the function defaults
to estimation of a within
model and issues a warning. The main
argument of this function may be either a model of class
panelmodel
or a formula
and data frame
; in the second case,
unless model
is set to NULL
, all usual parameters relative to
the estimation of a plm
model may be passed on. The test is
compatible with any consistent panelmodel
for the data at hand,
with any specification of effect
(except for test = "bcsclm"
which
requires a within model with either individual or two-ways effect).
E.g., specifying effect = "time"
or effect = "twoways"
allows
to test for residual cross-sectional dependence after the introduction
of time fixed effects to account for common shocks.
A local version of either test can be computed by supplying a
proximity matrix (elements coercible to logical
) with argument
w
which provides information on whether any pair of individuals
are neighbours or not. If w
is supplied, only neighbouring pairs
will be used in computing the test; else, w
will default to
NULL
and all observations will be used. The matrix need not be
binary, so commonly used "row–standardized" matrices can be
employed as well. nb
objects from spdep must instead be
transformed into matrices by spdep's function nb2mat
before using.
The methods implemented are suitable also for unbalanced panels.
Pesaran's CD test (test="cd"
), Breusch and Pagan's LM test
(test="lm"
), and its scaled version (test="sclm"
) are all
described in Pesaran (2004) (and complemented by
Pesaran (2005)). The bias-corrected scaled test (test="bcsclm"
)
is due to (Baltagi et al. 2012) and only valid for
within models including the individual effect (it's unbalanced
version uses max(Tij) for T) in the bias-correction term).
Breusch and Pagan (1980) is the original source for
the LM test.
The test on a pseries
is the same as a test on a pooled
regression model of that variable on a constant, i.e.,
pcdtest(some_pseries)
is equivalent to pcdtest(plm(some_var ~ 1, data = some_pdata.frame, model = "pooling")
and also equivalent to
pcdtest(some_var ~ 1, data = some_data)
, where some_var
is
the variable name in the data which corresponds to some_pseries
.
An object of class "htest"
.
Baltagi BH, Feng Q, Kao C (2012). “A Lagrange Multiplier test for cross-sectional dependence in a fixed effects panel data model.” Journal of Econometrics, 170(1), 164 - 177. ISSN 0304-4076, https://doi.org/10.1016/j.jeconom.2012.04.004.
Breusch TS, Pagan AR (1980). “The Lagrange Multiplier Test and Its Applications to Model Specification in Econometrics.” Review of Economic Studies, 47, 239–253.
Pesaran MH (2004). “General Diagnostic Tests for Cross Section Dependence in Panels.” CESifo Working Paper Series, 1229.
Pesaran MH (2015). “Testing Weak Cross-Sectional Dependence in Large Panels.” Econometric Reviews, 34(6-10), 1089-1117. doi:10.1080/07474938.2014.956623, https://doi.org/10.1080/07474938.2014.956623.
data("Grunfeld", package = "plm") ## test on heterogeneous model (separate time series regressions) pcdtest(inv ~ value + capital, data = Grunfeld, index = c("firm", "year")) ## test on two-way fixed effects homogeneous model pcdtest(inv ~ value + capital, data = Grunfeld, model = "within", effect = "twoways", index = c("firm", "year")) ## test on panelmodel object g <- plm(inv ~ value + capital, data = Grunfeld, index = c("firm", "year")) pcdtest(g) ## scaled LM test pcdtest(g, test = "sclm") ## test on pseries pGrunfeld <- pdata.frame(Grunfeld) pcdtest(pGrunfeld$value) ## local test ## define neighbours for individual 2: 1, 3, 4, 5 in lower triangular matrix w <- matrix(0, ncol= 10, nrow=10) w[2,1] <- w[3,2] <- w[4,2] <- w[5,2] <- 1 pcdtest(g, w = w)
data("Grunfeld", package = "plm") ## test on heterogeneous model (separate time series regressions) pcdtest(inv ~ value + capital, data = Grunfeld, index = c("firm", "year")) ## test on two-way fixed effects homogeneous model pcdtest(inv ~ value + capital, data = Grunfeld, model = "within", effect = "twoways", index = c("firm", "year")) ## test on panelmodel object g <- plm(inv ~ value + capital, data = Grunfeld, index = c("firm", "year")) pcdtest(g) ## scaled LM test pcdtest(g, test = "sclm") ## test on pseries pGrunfeld <- pdata.frame(Grunfeld) pcdtest(pGrunfeld$value) ## local test ## define neighbours for individual 2: 1, 3, 4, 5 in lower triangular matrix w <- matrix(0, ncol= 10, nrow=10) w[2,1] <- w[3,2] <- w[4,2] <- w[5,2] <- 1 pcdtest(g, w = w)
An object of class 'pdata.frame' is a data.frame with an index attribute that describes its individual and time dimensions.
pdata.frame( x, index = NULL, drop.index = FALSE, row.names = TRUE, stringsAsFactors = FALSE, replace.non.finite = FALSE, drop.NA.series = FALSE, drop.const.series = FALSE, drop.unused.levels = FALSE, ... ) ## S3 replacement method for class 'pdata.frame' x$name <- value ## S3 method for class 'pdata.frame' x[i, j, drop] ## S3 method for class 'pdata.frame' x[[y]] ## S3 method for class 'pdata.frame' x$y ## S3 method for class 'pdata.frame' print(x, ...) ## S3 method for class 'pdata.frame' as.list(x, keep.attributes = FALSE, ...) ## S3 method for class 'pdata.frame' as.data.frame( x, row.names = NULL, optional = FALSE, keep.attributes = TRUE, ... )
pdata.frame( x, index = NULL, drop.index = FALSE, row.names = TRUE, stringsAsFactors = FALSE, replace.non.finite = FALSE, drop.NA.series = FALSE, drop.const.series = FALSE, drop.unused.levels = FALSE, ... ) ## S3 replacement method for class 'pdata.frame' x$name <- value ## S3 method for class 'pdata.frame' x[i, j, drop] ## S3 method for class 'pdata.frame' x[[y]] ## S3 method for class 'pdata.frame' x$y ## S3 method for class 'pdata.frame' print(x, ...) ## S3 method for class 'pdata.frame' as.list(x, keep.attributes = FALSE, ...) ## S3 method for class 'pdata.frame' as.data.frame( x, row.names = NULL, optional = FALSE, keep.attributes = TRUE, ... )
x |
a |
index |
this argument indicates the individual and time indexes. See Details, |
drop.index |
logical, indicates whether the indexes are to be excluded from the resulting pdata.frame, |
row.names |
|
stringsAsFactors |
logical, indicating whether character vectors are to be converted to factors, |
replace.non.finite |
logical, indicating whether values for
which |
drop.NA.series |
logical, indicating whether all- |
drop.const.series |
logical, indicating whether constant
columns are to be removed from the pdata.frame (defaults to
|
drop.unused.levels |
logical, indicating whether unused levels
of factors are to be dropped (defaults to |
... |
further arguments passed on to internal usage of |
name |
the name of the |
value |
the name of the variable to include, |
i |
see |
j |
see |
drop |
see |
y |
one of the columns of the |
keep.attributes |
logical, only for as.list and as.data.frame methods, indicating whether the elements of the returned list/columns of the data.frame should have the pdata.frame's attributes added (default: FALSE for as.list, TRUE for as.data.frame), |
optional |
see |
The index
argument indicates the dimensions of the panel. It can
be:
a vector of two character strings which contains the names of the individual and of the time indexes,
a character string which is the name of the individual index variable. In this case, the time index is created automatically and a new variable called "time" is added, assuming consecutive and ascending time periods in the order of the original data,
an integer, the number of individuals. In this case, the data need to be a balanced panel and be organized as a stacked time series (successive blocks of individuals, each block being a time series for the respective individual) assuming consecutive and ascending time periods in the order of the original data. Two new variables are added: "id" and "time" which contain the individual and the time indexes.
The "[["
and "$"
extract a series from the pdata.frame
. The
"index"
attribute is then added to the series and a class
attribute "pseries"
is added. The "["
method behaves as for
data.frame
, except that the extraction is also applied to the
index
attribute. A safe way to extract the index attribute is to
use the function index()
for 'pdata.frames' (and other objects).
as.data.frame
removes the index attribute from the pdata.frame
and adds it to each column. For its argument row.names
set to
FALSE
row names are an integer series, TRUE
gives "fancy" row
names; if a character (with length of the resulting data frame),
the row names will be the character's elements.
as.list
behaves by default identical to
base::as.list.data.frame()
which means it drops the
attributes specific to a pdata.frame; if a list of pseries is
wanted, the attribute keep.attributes
can to be set to
TRUE
. This also makes lapply
work as expected on a pdata.frame
(see also Examples).
a pdata.frame
object: this is a data.frame
with an
index
attribute which is a data.frame
with two variables,
the individual and the time indexes, both being factors. The
resulting pdata.frame is sorted by the individual index, then
by the time index.
Yves Croissant
index()
to extract the index variables from a
'pdata.frame' (and other objects), pdim()
to check the
dimensions of a 'pdata.frame' (and other objects), pvar()
to
check for each variable if it varies cross-sectionally and over
time. To check if the time periods are consecutive per
individual, see is.pconsecutive()
.
# Gasoline contains two variables which are individual and time # indexes data("Gasoline", package = "plm") Gas <- pdata.frame(Gasoline, index = c("country", "year"), drop.index = TRUE) # Hedonic is an unbalanced panel, townid is the individual index data("Hedonic", package = "plm") Hed <- pdata.frame(Hedonic, index = "townid", row.names = FALSE) # In case of balanced panel, it is sufficient to give number of # individuals data set 'Wages' is organized as a stacked time # series data("Wages", package = "plm") Wag <- pdata.frame(Wages, 595) # lapply on a pdata.frame by making it a list of pseries first lapply(as.list(Wag[ , c("ed", "lwage")], keep.attributes = TRUE), lag)
# Gasoline contains two variables which are individual and time # indexes data("Gasoline", package = "plm") Gas <- pdata.frame(Gasoline, index = c("country", "year"), drop.index = TRUE) # Hedonic is an unbalanced panel, townid is the individual index data("Hedonic", package = "plm") Hed <- pdata.frame(Hedonic, index = "townid", row.names = FALSE) # In case of balanced panel, it is sufficient to give number of # individuals data set 'Wages' is organized as a stacked time # series data("Wages", package = "plm") Wag <- pdata.frame(Wages, 595) # lapply on a pdata.frame by making it a list of pseries first lapply(as.list(Wag[ , c("ed", "lwage")], keep.attributes = TRUE), lag)
This function checks the number of individuals and time observations in the panel and whether it is balanced or not.
pdim(x, ...) ## Default S3 method: pdim(x, y, ...) ## S3 method for class 'data.frame' pdim(x, index = NULL, ...) ## S3 method for class 'pdata.frame' pdim(x, ...) ## S3 method for class 'pseries' pdim(x, ...) ## S3 method for class 'pggls' pdim(x, ...) ## S3 method for class 'pcce' pdim(x, ...) ## S3 method for class 'pmg' pdim(x, ...) ## S3 method for class 'pgmm' pdim(x, ...) ## S3 method for class 'panelmodel' pdim(x, ...) ## S3 method for class 'pdim' print(x, ...)
pdim(x, ...) ## Default S3 method: pdim(x, y, ...) ## S3 method for class 'data.frame' pdim(x, index = NULL, ...) ## S3 method for class 'pdata.frame' pdim(x, ...) ## S3 method for class 'pseries' pdim(x, ...) ## S3 method for class 'pggls' pdim(x, ...) ## S3 method for class 'pcce' pdim(x, ...) ## S3 method for class 'pmg' pdim(x, ...) ## S3 method for class 'pgmm' pdim(x, ...) ## S3 method for class 'panelmodel' pdim(x, ...) ## S3 method for class 'pdim' print(x, ...)
x |
a |
... |
further arguments. |
y |
a vector, |
index |
see |
pdim
is called by the estimation functions and can be also used
stand-alone.
An object of class pdim
containing the following
elements:
nT |
a list containing |
Tint |
a list containing two vectors (of type integer): |
balanced |
a logical value: |
panel.names |
a list of character vectors: |
Calling pdim
on an estimated panelmodel
object
and on the corresponding (p)data.frame
used for this
estimation does not necessarily yield the same result. When
called on an estimated panelmodel
, the number of
observations (individual, time) actually used for model
estimation are taken into account. When called on a
(p)data.frame
, the rows in the (p)data.frame
are
considered, disregarding any NA
values in the dependent or
independent variable(s) which would be dropped during model
estimation.
Yves Croissant
is.pbalanced()
to just determine balancedness
of data (slightly faster than pdim
),punbalancedness()
for measures of
unbalancedness,nobs()
,
pdata.frame()
,pvar()
to check for
each variable if it varies cross-sectionally and over time.
# There are 595 individuals data("Wages", package = "plm") pdim(Wages, 595) # Gasoline contains two variables which are individual and time # indexes and are the first two variables data("Gasoline", package="plm") pdim(Gasoline) # Hedonic is an unbalanced panel, townid is the individual index data("Hedonic", package = "plm") pdim(Hedonic, "townid") # An example of the panelmodel method data("Produc", package = "plm") z <- plm(log(gsp)~log(pcap)+log(pc)+log(emp)+unemp,data=Produc, model="random", subset = gsp > 5000) pdim(z)
# There are 595 individuals data("Wages", package = "plm") pdim(Wages, 595) # Gasoline contains two variables which are individual and time # indexes and are the first two variables data("Gasoline", package="plm") pdim(Gasoline) # Hedonic is an unbalanced panel, townid is the individual index data("Hedonic", package = "plm") pdim(Hedonic, "townid") # An example of the panelmodel method data("Produc", package = "plm") z <- plm(log(gsp)~log(pcap)+log(pc)+log(emp)+unemp,data=Produc, model="random", subset = gsp > 5000) pdim(z)
Test of serial correlation for (the idiosyncratic component of) the errors in panel models.
pdwtest(x, ...) ## S3 method for class 'panelmodel' pdwtest(x, ...) ## S3 method for class 'formula' pdwtest(x, data, ...)
pdwtest(x, ...) ## S3 method for class 'panelmodel' pdwtest(x, ...) ## S3 method for class 'formula' pdwtest(x, data, ...)
x |
an object of class |
... |
further arguments to be passed on to |
data |
a |
This Durbin–Watson test uses the auxiliary model on
(quasi-)demeaned data taken from a model of class plm
which may
be a pooling
(the default), random
or within
model. It
performs a Durbin–Watson test (using dwtest
from package
lmtest on the residuals of the (quasi-)demeaned model,
which should be serially uncorrelated under the null of no serial
correlation in idiosyncratic errors. The function takes the
demeaned data, estimates the model and calls dwtest
. Thus, this
test does not take the panel structure of the residuals into
consideration; it shall not be confused with the generalized
Durbin-Watson test for panels in pbnftest
.
An object of class "htest"
.
Giovanni Millo
Durbin J, Watson GS (1950). “Testing for Serial Correlation in Least Squares Regression: I.” Biometrika, 37(3/4), 409–428. ISSN 00063444.
Durbin J, Watson GS (1951). “Testing for serial correlation in least squares regression. II.” Biometrika, 38(1-2), 159-178. ISSN 0006-3444, doi:10.1093/biomet/38.1-2.159.
Durbin J, Watson GS (1971). “Testing for Serial Correlation in Least Squares Regression. III.” Biometrika, 58(1), 1–19. ISSN 00063444.
Wooldridge JM (2002). Econometric Analysis of Cross–Section and Panel Data. MIT Press.
Wooldridge JM (2010). Econometric Analysis of Cross–Section and Panel Data, 2nd edition. MIT Press.
lmtest::dwtest()
for the Durbin–Watson test
in lmtest, pbgtest()
for the analogous
Breusch–Godfrey test for panel models,
lmtest::bgtest()
for the Breusch–Godfrey test for
serial correlation in the linear model. pbltest()
,
pbsytest()
, pwartest()
and
pwfdtest()
for other serial correlation tests for
panel models.
For the Durbin-Watson test generalized to panel data models see
pbnftest()
.
data("Grunfeld", package = "plm") g <- plm(inv ~ value + capital, data = Grunfeld, model="random") pdwtest(g) pdwtest(g, alternative="two.sided") ## formula interface pdwtest(inv ~ value + capital, data=Grunfeld, model="random")
data("Grunfeld", package = "plm") g <- plm(inv ~ value + capital, data = Grunfeld, model="random") pdwtest(g) pdwtest(g, alternative="two.sided") ## formula interface pdwtest(inv ~ value + capital, data=Grunfeld, model="random")
Test of individual and/or time effects based on the comparison of the
within
and the pooling
model.
pFtest(x, ...) ## S3 method for class 'formula' pFtest(x, data, ...) ## S3 method for class 'plm' pFtest(x, z, ...)
pFtest(x, ...) ## S3 method for class 'formula' pFtest(x, data, ...) ## S3 method for class 'plm' pFtest(x, z, ...)
x |
an object of class |
... |
further arguments. |
data |
a |
z |
an object of class |
For the plm
method, the argument of this function is two plm
objects, the first being a within model, the second a pooling
model. The effects tested are either individual, time or twoways,
depending on the effects introduced in the within model.
An object of class "htest"
.
Yves Croissant
plmtest()
for Lagrange multiplier tests of individuals
and/or time effects.
data("Grunfeld", package="plm") gp <- plm(inv ~ value + capital, data = Grunfeld, model = "pooling") gi <- plm(inv ~ value + capital, data = Grunfeld, effect = "individual", model = "within") gt <- plm(inv ~ value + capital, data = Grunfeld, effect = "time", model = "within") gd <- plm(inv ~ value + capital, data = Grunfeld, effect = "twoways", model = "within") pFtest(gi, gp) pFtest(gt, gp) pFtest(gd, gp) pFtest(inv ~ value + capital, data = Grunfeld, effect = "twoways")
data("Grunfeld", package="plm") gp <- plm(inv ~ value + capital, data = Grunfeld, model = "pooling") gi <- plm(inv ~ value + capital, data = Grunfeld, effect = "individual", model = "within") gt <- plm(inv ~ value + capital, data = Grunfeld, effect = "time", model = "within") gd <- plm(inv ~ value + capital, data = Grunfeld, effect = "twoways", model = "within") pFtest(gi, gp) pFtest(gt, gp) pFtest(gd, gp) pFtest(inv ~ value + capital, data = Grunfeld, effect = "twoways")
General FGLS estimators for panel data (balanced or unbalanced)
pggls( formula, data, subset, na.action, effect = c("individual", "time"), model = c("within", "pooling", "fd"), index = NULL, ... ) ## S3 method for class 'pggls' summary(object, ...) ## S3 method for class 'summary.pggls' print( x, digits = max(3, getOption("digits") - 2), width = getOption("width"), ... ) ## S3 method for class 'pggls' residuals(object, ...)
pggls( formula, data, subset, na.action, effect = c("individual", "time"), model = c("within", "pooling", "fd"), index = NULL, ... ) ## S3 method for class 'pggls' summary(object, ...) ## S3 method for class 'summary.pggls' print( x, digits = max(3, getOption("digits") - 2), width = getOption("width"), ... ) ## S3 method for class 'pggls' residuals(object, ...)
formula |
a symbolic description of the model to be estimated, |
data |
a |
subset |
see |
na.action |
see |
effect |
the effects introduced in the model, one of
|
model |
one of |
index |
the indexes, see |
... |
further arguments. |
object , x
|
an object of class |
digits |
digits, |
width |
the maximum length of the lines in the print output, |
pggls
is a function for the estimation of linear panel models by
general feasible generalized least squares, either with or without
fixed effects. General FGLS is based on a two-step estimation
process: first a model is estimated by OLS (model = "pooling"
),
fixed effects (model = "within"
) or first differences
(model = "fd"
), then its residuals are used to estimate an error
covariance matrix for use in a feasible-GLS analysis. This framework allows
the error covariance structure inside every group
(if effect = "individual"
, else symmetric) of observations to be fully
unrestricted and is therefore robust against any type of intragroup
heteroskedasticity and serial correlation. Conversely, this
structure is assumed identical across groups and thus general FGLS
estimation is inefficient under groupwise heteroskedasticity. Note
also that this method requires estimation of
variance parameters, thus efficiency requires N >> T
(if
effect = "individual"
, else the opposite).
If model = "within"
(the default) then a FEGLS (fixed effects GLS, see
Wooldridge, Ch. 10.5) is estimated; if model = "fd"
a FDGLS
(first-difference GLS). Setting model = "pooling"
produces an unrestricted
FGLS model (see ibid.) (model = "random"
does the same, but using this value
is deprecated and included only for retro–compatibility reasons).
An object of class c("pggls","panelmodel")
containing:
coefficients |
the vector of coefficients, |
residuals |
the vector of residuals, |
fitted.values |
the vector of fitted values, |
vcov |
the covariance matrix of the coefficients, |
df.residual |
degrees of freedom of the residuals, |
model |
a data.frame containing the variables used for the estimation, |
call |
the call, |
sigma |
the estimated intragroup (or cross-sectional, if
|
Giovanni Millo
Im KS, Ahn SC, Schmidt P, Wooldridge JM (1999). “Efficient estimation of panel data models with strictly exogenous explanatory variables.” Journal of Econometrics, 93(1), 177 - 201. ISSN 0304-4076, https://doi.org/10.1016/S0304-4076(99)00008-1.
Kiefer NM (1980). “Estimation of fixed effect models for time series of cross-sections with arbitrary intertemporal covariance.” Journal of Econometrics, 14(2), 195–202.
Wooldridge JM (2002). Econometric Analysis of Cross–Section and Panel Data. MIT Press.
Wooldridge JM (2010). Econometric Analysis of Cross–Section and Panel Data, 2nd edition. MIT Press.
data("Produc", package = "plm") zz_wi <- pggls(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, data = Produc, model = "within") summary(zz_wi) zz_pool <- pggls(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, data = Produc, model = "pooling") summary(zz_pool) zz_fd <- pggls(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, data = Produc, model = "fd") summary(zz_fd)
data("Produc", package = "plm") zz_wi <- pggls(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, data = Produc, model = "within") summary(zz_wi) zz_pool <- pggls(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, data = Produc, model = "pooling") summary(zz_pool) zz_fd <- pggls(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, data = Produc, model = "fd") summary(zz_fd)
Generalized method of moments estimation for static or dynamic models with panel data.
pgmm( formula, data, subset, na.action, effect = c("twoways", "individual"), model = c("onestep", "twosteps"), collapse = FALSE, lost.ts = NULL, transformation = c("d", "ld"), fsm = switch(transformation, d = "G", ld = "full"), index = NULL, ... ) ## S3 method for class 'pgmm' coef(object, ...) ## S3 method for class 'pgmm' summary(object, robust = TRUE, time.dummies = FALSE, ...) ## S3 method for class 'summary.pgmm' print( x, digits = max(3, getOption("digits") - 2), width = getOption("width"), ... )
pgmm( formula, data, subset, na.action, effect = c("twoways", "individual"), model = c("onestep", "twosteps"), collapse = FALSE, lost.ts = NULL, transformation = c("d", "ld"), fsm = switch(transformation, d = "G", ld = "full"), index = NULL, ... ) ## S3 method for class 'pgmm' coef(object, ...) ## S3 method for class 'pgmm' summary(object, robust = TRUE, time.dummies = FALSE, ...) ## S3 method for class 'summary.pgmm' print( x, digits = max(3, getOption("digits") - 2), width = getOption("width"), ... )
formula |
a symbolic description for the model to be estimated. The preferred interface is now to indicate a multi–part formula, the first two parts describing the covariates and the GMM instruments and, if any, the third part the 'normal' instruments, |
data |
a |
subset |
see |
na.action |
see |
effect |
the effects introduced in the model, one of
|
model |
one of |
collapse |
if |
lost.ts |
the number of lost time series: if |
transformation |
the kind of transformation to apply to the
model: either |
fsm |
character of length 1 to specify type of weighting matrix
for the first step /the |
index |
the indexes, |
... |
further arguments. |
object , x
|
an object of class |
robust |
for pgmm's summary method: if |
time.dummies |
for pgmm's summary method: if |
digits |
digits, |
width |
the maximum length of the lines in the print output. |
pgmm
estimates a model for panel data with a generalized method
of moments (GMM) estimator. The description of the model to
estimate is provided with a multi–part formula which is (or which
is coerced to) a Formula
object. The first right–hand side part
describes the covariates. The second one, which is mandatory,
describes the GMM instruments. The third one, which is optional,
describes the 'normal' instruments. By default, all the variables
of the model which are not used as GMM instruments are used as
normal instruments with the same lag structure as the one specified
in the model.
y~lag(y, 1:2)+lag(x1, 0:1)+lag(x2, 0:2) | lag(y, 2:99)
is similar to
y~lag(y, 1:2)+lag(x1, 0:1)+lag(x2, 0:2) | lag(y, 2:99) | lag(x1, 0:1)+lag(x2, 0:2)
and indicates that all lags from 2 of y
are used
as GMM instruments.
transformation
indicates how the model should be transformed for
the estimation. "d"
gives the "difference GMM" model
(see Arellano and Bond 1991), "ld"
the "system GMM" model
(see Blundell and Bond 1998).
pgmm
is an attempt to adapt GMM estimators available within the
DPD library for GAUSS (see Arellano and Bond 1998) and Ox
(see Arellano and Bond 2012) and within the xtabond2
library for Stata (see Roodman 2009).
An object of class c("pgmm","panelmodel")
, which has the
following elements:
coefficients |
the vector (or the list for fixed effects) of coefficients, |
residuals |
the list of residuals for each individual, |
vcov |
the covariance matrix of the coefficients, |
fitted.values |
the vector of fitted values, |
df.residual |
degrees of freedom of the residuals, |
model |
a list containing the variables used for the estimation for each individual, |
W |
a list containing the instruments for each individual (a matrix per list element) (two lists in case of system GMM, |
A1 |
the weighting matrix for the one–step estimator, |
A2 |
the weighting matrix for the two–steps estimator, |
call |
the call. |
It has print
, summary
and print.summary
methods.
Yves Croissant
Arellano M, Bond S (1991).
“Some Tests of Specification for Panel Data : Monte Carlo Evidence and an Application to Employment Equations.”
Review of Economic Studies, 58, 277–297.
Arellano M, Bond S (1998).
“Dynamic panel data estimation using DPD98 for GAUSS: a guide for users.”
unpublished, https://ifs.org.uk/publications/dpd-gauss.
Arellano M, Bond S (2012).
“Panel data estimation using DPD for Ox.”
unpublished, https://www.doornik.com/download/oxmetrics7/Ox_Packages/dpd.pdf.
Blundell R, Bond S (1998).
“Initial Conditions and Moment Restrictions in Dynamic Panel Data Models.”
Journal of Econometrics, 87, 115–143.
Roodman D (2009).
“How to do xtabond2: An introduction to difference and system GMM in Stata.”
The Stata Journal, 9, 86-136.
https://www.stata-journal.com/article.html?article=st0159.
Windmeijer F (2005).
“A Finite Sample Correction for the Variance of Linear Efficient Two–Steps GMM Estimators.”
Journal of Econometrics, 126, 25–51.
sargan()
for the Hansen–Sargan test and mtest()
for
Arellano–Bond's test of serial correlation. vcovHC.pgmm for the robust
inference.
data("EmplUK", package = "plm") # Arellano/Bond 1991, Table 4, column (a1) (has robust SEs) ab.a1 <- pgmm(log(emp) ~ lag(log(emp), 1:2) + lag(log(wage), 0:1) + lag(log(capital), 0:2) + lag(log(output), 0:2) | lag(log(emp), 2:99), data = EmplUK, effect = "twoways", model = "onestep") summary(ab.a1, robust = TRUE) # Arellano/Bond 1991, Table 4, column (a2) (has non-robust SEs) ab.a2 <- pgmm(log(emp) ~ lag(log(emp), 1:2) + lag(log(wage), 0:1) + lag(log(capital), 0:2) + lag(log(output), 0:2) | lag(log(emp), 2:99), data = EmplUK, effect = "twoways", model = "twosteps") summary(ab.a2, robust = FALSE) # Arellano and Bond (1991), table 4 col. b / # Windmeijer (2005), table 2, std. errc ab.b <- pgmm(log(emp) ~ lag(log(emp), 1:2) + lag(log(wage), 0:1) + log(capital) + lag(log(output), 0:1) | lag(log(emp), 2:99), data = EmplUK, effect = "twoways", model = "twosteps") summary(ab.b, robust = FALSE) # Arellano/Bond summary(ab.b, robust = TRUE) # Windmeijer ## Blundell and Bond (1998) table 4 (cf. DPD for OX p. 12 col. 4) bb.4 <- pgmm(log(emp) ~ lag(log(emp), 1)+ lag(log(wage), 0:1) + lag(log(capital), 0:1) | lag(log(emp), 2:99) + lag(log(wage), 2:99) + lag(log(capital), 2:99), data = EmplUK, effect = "twoways", model = "onestep", transformation = "ld") summary(bb.4, robust = TRUE) ## Not run: ## Same with the old formula or dynformula interface ## Arellano and Bond (1991), table 4, col. b ab.b <- pgmm(log(emp) ~ log(wage) + log(capital) + log(output), lag.form = list(2,1,0,1), data = EmplUK, effect = "twoways", model = "twosteps", gmm.inst = ~log(emp), lag.gmm = list(c(2,99))) summary(ab.b, robust = FALSE) ## Blundell and Bond (1998) table 4 (cf. DPD for OX p. 12 col. 4) bb.4 <- pgmm(dynformula(log(emp) ~ log(wage) + log(capital), list(1,1,1)), data = EmplUK, effect = "twoways", model = "onestep", gmm.inst = ~log(emp) + log(wage) + log(capital), lag.gmm = c(2,99), transformation = "ld") summary(bb.4, robust = TRUE) ## End(Not run)
data("EmplUK", package = "plm") # Arellano/Bond 1991, Table 4, column (a1) (has robust SEs) ab.a1 <- pgmm(log(emp) ~ lag(log(emp), 1:2) + lag(log(wage), 0:1) + lag(log(capital), 0:2) + lag(log(output), 0:2) | lag(log(emp), 2:99), data = EmplUK, effect = "twoways", model = "onestep") summary(ab.a1, robust = TRUE) # Arellano/Bond 1991, Table 4, column (a2) (has non-robust SEs) ab.a2 <- pgmm(log(emp) ~ lag(log(emp), 1:2) + lag(log(wage), 0:1) + lag(log(capital), 0:2) + lag(log(output), 0:2) | lag(log(emp), 2:99), data = EmplUK, effect = "twoways", model = "twosteps") summary(ab.a2, robust = FALSE) # Arellano and Bond (1991), table 4 col. b / # Windmeijer (2005), table 2, std. errc ab.b <- pgmm(log(emp) ~ lag(log(emp), 1:2) + lag(log(wage), 0:1) + log(capital) + lag(log(output), 0:1) | lag(log(emp), 2:99), data = EmplUK, effect = "twoways", model = "twosteps") summary(ab.b, robust = FALSE) # Arellano/Bond summary(ab.b, robust = TRUE) # Windmeijer ## Blundell and Bond (1998) table 4 (cf. DPD for OX p. 12 col. 4) bb.4 <- pgmm(log(emp) ~ lag(log(emp), 1)+ lag(log(wage), 0:1) + lag(log(capital), 0:1) | lag(log(emp), 2:99) + lag(log(wage), 2:99) + lag(log(capital), 2:99), data = EmplUK, effect = "twoways", model = "onestep", transformation = "ld") summary(bb.4, robust = TRUE) ## Not run: ## Same with the old formula or dynformula interface ## Arellano and Bond (1991), table 4, col. b ab.b <- pgmm(log(emp) ~ log(wage) + log(capital) + log(output), lag.form = list(2,1,0,1), data = EmplUK, effect = "twoways", model = "twosteps", gmm.inst = ~log(emp), lag.gmm = list(c(2,99))) summary(ab.b, robust = FALSE) ## Blundell and Bond (1998) table 4 (cf. DPD for OX p. 12 col. 4) bb.4 <- pgmm(dynformula(log(emp) ~ log(wage) + log(capital), list(1,1,1)), data = EmplUK, effect = "twoways", model = "onestep", gmm.inst = ~log(emp) + log(wage) + log(capital), lag.gmm = c(2,99), transformation = "ld") summary(bb.4, robust = TRUE) ## End(Not run)
Test for Granger (non-)causality in panel data.
pgrangertest( formula, data, test = c("Ztilde", "Zbar", "Wbar"), order = 1L, index = NULL )
pgrangertest( formula, data, test = c("Ztilde", "Zbar", "Wbar"), order = 1L, index = NULL )
formula |
a |
data |
a |
test |
a character to request the statistic to be returned,
either |
order |
integer(s) giving the number of lags to include in the test's auxiliary regressions, the length of order must be either 1 (same lag order for all individuals) or equal to the number of individuals (to specify a lag order per individual), |
index |
only relevant if |
The panel Granger (non-)causality test is a combination of Granger tests (Granger 1969) performed per individual. The test is developed by Dumitrescu and Hurlin (2012), a shorter exposition is given in Lopez and Weber (2017).
The formula formula
describes the direction of the (panel) Granger
causation where y ~ x
means "x (panel) Granger causes y".
By setting argument test
to either "Ztilde"
(default) or
"Zbar"
, two different statistics can be requested. "Ztilde"
gives the standardised statistic recommended by Dumitrescu/Hurlin (2012) for
fixed T samples. If set to "Wbar"
, the intermediate Wbar statistic
(average of individual Granger chi-square statistics) is given which is used
to derive the other two.
The Zbar statistic is not suitable for unbalanced panels. For the Wbar statistic, no p-value is available.
The implementation uses lmtest::grangertest()
from
package lmtest to perform the individual Granger tests.
An object of class c("pgrangertest", "htest")
. Besides
the usual elements of a htest
object, it contains the data
frame indgranger
which carries the Granger test statistics
per individual along the associated p-values, degrees of
freedom, and the specified lag order.
Kevin Tappe
Dumitrescu E, Hurlin C (2012). “Testing for Granger non-causality in heterogeneous panels.” Economic Modelling, 29(4), 1450 - 1460. ISSN 0264-9993, https://doi.org/10.1016/j.econmod.2012.02.014.
Granger CWJ (1969). “Investigating Causal Relations by Econometric Models and Cross-spectral Methods.” Econometrica, 37(3), 424–438. ISSN 00129682, 14680262.
Lopez L, Weber S (2017). “Testing for Granger causality in panel data.” Stata Journal, 17(4), 972-984. https://www.stata-journal.com/article.html?article=st0507.
lmtest::grangertest()
for the original (non-panel)
Granger causality test in lmtest.
## not meaningful, just to demonstrate usage ## H0: 'value' does not Granger cause 'inv' for all invididuals data("Grunfeld", package = "plm") pgrangertest(inv ~ value, data = Grunfeld) pgrangertest(inv ~ value, data = Grunfeld, order = 2L) pgrangertest(inv ~ value, data = Grunfeld, order = 2L, test = "Zbar") # varying lag order (last individual lag order 3, others lag order 2) (pgrt <- pgrangertest(inv ~ value, data = Grunfeld, order = c(rep(2L, 9), 3L))) # chisq statistics per individual pgrt$indgranger
## not meaningful, just to demonstrate usage ## H0: 'value' does not Granger cause 'inv' for all invididuals data("Grunfeld", package = "plm") pgrangertest(inv ~ value, data = Grunfeld) pgrangertest(inv ~ value, data = Grunfeld, order = 2L) pgrangertest(inv ~ value, data = Grunfeld, order = 2L, test = "Zbar") # varying lag order (last individual lag order 3, others lag order 2) (pgrt <- pgrangertest(inv ~ value, data = Grunfeld, order = c(rep(2L, 9), 3L))) # chisq statistics per individual pgrt$indgranger
Simes' test of intersection of individual hypothesis tests (Simes (1986)) applied to panel unit root tests as suggested by Hanck (2013).
phansitest(object, alpha = 0.05) ## S3 method for class 'phansitest' print(x, cutoff = 10L, ...)
phansitest(object, alpha = 0.05) ## S3 method for class 'phansitest' print(x, cutoff = 10L, ...)
object |
either a numeric containing p-values of individual unit root
test results (does not need to be sorted) or a suitable |
alpha |
numeric, the pre-specified significance level (defaults to |
x |
an object of class |
cutoff |
integer, cutoff value for printing of enumeration of individuals with rejected individual H0, for print method only, |
... |
further arguments (currently not used). |
Simes' approach to testing is combining p-values from single hypothesis tests with a global (intersected) hypothesis. Hanck (2013) mentions it can be applied to any panel unit root test which yields a p-value for each individual series. The test is robust versus general patterns of cross-sectional dependence.
Further, this approach allows to discriminate between individuals for which
the individual H0 (unit root present for individual series) is rejected/is
not rejected by Hommel's procedure (Hommel (1988)) for
family-wise error rate control (FWER) at a pre-specified significance level
via argument
alpha
(defaulting to 0.05
), i.e., it controls
for the multiplicity in testing.
The function phansitest
takes as main input object
either a plain numeric
containing p-values of individual tests or a purtest
object which holds
a suitable pre-computed panel unit root test (one that produces p-values per
individual series).
The function's return value (see section Value) is a list with detailed evaluation of the applied Simes test.
The associated print
method prints a verbal evaluation.
For phansitest
, an object of class c("phansitest", "list")
which
is a list with the elements:
id
: integer, the identifier of the individual (integer sequence referring to
position in input),
name
: character, name of the input's individual (if it has a name,
otherwise "1", "2", "3", ...),
p
: numeric, p-values as input (either the numeric or extracted from
the purtest object),
p.hommel
: numeric, p-values after Hommel's transformation,
rejected
: logical, indicating for which individual the individual null
hypothesis is rejected (TRUE
)/non-rejected (FALSE
) (after controlling
for multiplicity),
rejected.no
: integer, giving the total number of rejected individual series,
alpha
: numeric, the input alpha
.
Kevin Tappe
Hanck C (2013).
“An Intersection Test for Panel Unit Roots.”
Econometric Reviews, 32, 183-203.
Hommel G (1988).
“A stage wise rejective multiple test procedure based on a modified Bonferroni test.”
Biometrika, 75, 383-386.
Simes RJ (1986).
“An improved Bonferroni procedure for multiple tests of significance.”
Biometrika, 73, 751-754.
### input is numeric (p-values) #### example from Hanck (2013), Table 11 (left side) pvals <- c(0.0001,0.0001,0.0001,0.0001,0.0001,0.0001,0.0050,0.0050,0.0050, 0.0050,0.0175,0.0175,0.0200,0.0250,0.0400,0.0500,0.0575,0.2375,0.2475) countries <- c("Argentina","Sweden","Norway","Mexico","Italy","Finland","France", "Germany","Belgium","U.K.","Brazil","Australia","Netherlands", "Portugal","Canada", "Spain","Denmark","Switzerland","Japan") names(pvals) <- countries h <- phansitest(pvals) print(h) # (explicitly) prints test's evaluation print(h, cutoff = 3L) # print only first 3 rejected ids h$rejected # logical indicating the individuals with rejected individual H0 ### input is a (suitable) purtest object data("Grunfeld", package = "plm") y <- data.frame(split(Grunfeld$inv, Grunfeld$firm)) obj <- purtest(y, pmax = 4, exo = "intercept", test = "madwu") phansitest(obj)
### input is numeric (p-values) #### example from Hanck (2013), Table 11 (left side) pvals <- c(0.0001,0.0001,0.0001,0.0001,0.0001,0.0001,0.0050,0.0050,0.0050, 0.0050,0.0175,0.0175,0.0200,0.0250,0.0400,0.0500,0.0575,0.2375,0.2475) countries <- c("Argentina","Sweden","Norway","Mexico","Italy","Finland","France", "Germany","Belgium","U.K.","Brazil","Australia","Netherlands", "Portugal","Canada", "Spain","Denmark","Switzerland","Japan") names(pvals) <- countries h <- phansitest(pvals) print(h) # (explicitly) prints test's evaluation print(h, cutoff = 3L) # print only first 3 rejected ids h$rejected # logical indicating the individuals with rejected individual H0 ### input is a (suitable) purtest object data("Grunfeld", package = "plm") y <- data.frame(split(Grunfeld$inv, Grunfeld$firm)) obj <- purtest(y, pmax = 4, exo = "intercept", test = "madwu") phansitest(obj)
The Hausman–Taylor estimator is an instrumental variable estimator without external instruments (function deprecated).
pht( formula, data, subset, na.action, model = c("ht", "am", "bms"), index = NULL, ... ) ## S3 method for class 'pht' summary(object, ...) ## S3 method for class 'summary.pht' print( x, digits = max(3, getOption("digits") - 2), width = getOption("width"), subset = NULL, ... )
pht( formula, data, subset, na.action, model = c("ht", "am", "bms"), index = NULL, ... ) ## S3 method for class 'pht' summary(object, ...) ## S3 method for class 'summary.pht' print( x, digits = max(3, getOption("digits") - 2), width = getOption("width"), subset = NULL, ... )
formula |
a symbolic description for the model to be estimated, |
data |
a |
subset |
see |
na.action |
see |
model |
one of |
index |
the indexes, |
... |
further arguments. |
object , x
|
an object of class |
digits |
digits, |
width |
the maximum length of the lines in the print output, |
pht
estimates panels models using the Hausman–Taylor estimator,
Amemiya–MaCurdy estimator, or Breusch–Mizon–Schmidt estimator, depending
on the argument model
. The model is specified as a two–part formula,
the second part containing the exogenous variables.
An object of class c("pht", "plm", "panelmodel")
.
A "pht"
object contains the same elements as plm
object, with a further argument called varlist
which
describes the typology of the variables. It has summary
and
print.summary
methods.
The function pht
is deprecated. Please use function plm
to estimate Taylor–Hausman models like this with a three-part
formula as shown in the example:plm(<formula>, random.method = "ht", model = "random", inst.method = "baltagi")
. The Amemiya–MaCurdy estimator and the
Breusch–Mizon–Schmidt estimator is computed likewise with
plm
.
Yves Croissant
(Amemiya and MaCurdy 1986)
(Baltagi 2013)
(Breusch et al. 1989)
(Hausman and Taylor 1981)
## replicates Baltagi (2005, 2013), table 7.4; Baltagi (2021), table 7.5 ## preferred way with plm() data("Wages", package = "plm") ht <- plm(lwage ~ wks + south + smsa + married + exp + I(exp ^ 2) + bluecol + ind + union + sex + black + ed | bluecol + south + smsa + ind + sex + black | wks + married + union + exp + I(exp ^ 2), data = Wages, index = 595, random.method = "ht", model = "random", inst.method = "baltagi") summary(ht) am <- plm(lwage ~ wks + south + smsa + married + exp + I(exp ^ 2) + bluecol + ind + union + sex + black + ed | bluecol + south + smsa + ind + sex + black | wks + married + union + exp + I(exp ^ 2), data = Wages, index = 595, random.method = "ht", model = "random", inst.method = "am") summary(am) ## deprecated way with pht() for HT #ht <- pht(lwage ~ wks + south + smsa + married + exp + I(exp^2) + # bluecol + ind + union + sex + black + ed | # sex + black + bluecol + south + smsa + ind, # data = Wages, model = "ht", index = 595) #summary(ht) # deprecated way with pht() for AM #am <- pht(lwage ~ wks + south + smsa + married + exp + I(exp^2) + # bluecol + ind + union + sex + black + ed | # sex + black + bluecol + south + smsa + ind, # data = Wages, model = "am", index = 595) #summary(am)
## replicates Baltagi (2005, 2013), table 7.4; Baltagi (2021), table 7.5 ## preferred way with plm() data("Wages", package = "plm") ht <- plm(lwage ~ wks + south + smsa + married + exp + I(exp ^ 2) + bluecol + ind + union + sex + black + ed | bluecol + south + smsa + ind + sex + black | wks + married + union + exp + I(exp ^ 2), data = Wages, index = 595, random.method = "ht", model = "random", inst.method = "baltagi") summary(ht) am <- plm(lwage ~ wks + south + smsa + married + exp + I(exp ^ 2) + bluecol + ind + union + sex + black + ed | bluecol + south + smsa + ind + sex + black | wks + married + union + exp + I(exp ^ 2), data = Wages, index = 595, random.method = "ht", model = "random", inst.method = "am") summary(am) ## deprecated way with pht() for HT #ht <- pht(lwage ~ wks + south + smsa + married + exp + I(exp^2) + # bluecol + ind + union + sex + black + ed | # sex + black + bluecol + south + smsa + ind, # data = Wages, model = "ht", index = 595) #summary(ht) # deprecated way with pht() for AM #am <- pht(lwage ~ wks + south + smsa + married + exp + I(exp^2) + # bluecol + ind + union + sex + black + ed | # sex + black + bluecol + south + smsa + ind, # data = Wages, model = "am", index = 595) #summary(am)
Specification test for panel models.
phtest(x, ...) ## S3 method for class 'formula' phtest( x, data, model = c("within", "random"), effect = c("individual", "time", "twoways"), method = c("chisq", "aux"), index = NULL, vcov = NULL, ... ) ## S3 method for class 'panelmodel' phtest(x, x2, ...)
phtest(x, ...) ## S3 method for class 'formula' phtest( x, data, model = c("within", "random"), effect = c("individual", "time", "twoways"), method = c("chisq", "aux"), index = NULL, vcov = NULL, ... ) ## S3 method for class 'panelmodel' phtest(x, x2, ...)
x |
an object of class |
... |
further arguments to be passed on (currently none). |
data |
a |
model |
a character vector containing the names of two models (length(model) must be 2), |
effect |
a character specifying the effect to be introduced to both models,
one of |
method |
one of |
index |
an optional vector of index variables, |
vcov |
an optional covariance function, |
x2 |
an object of class |
The Hausman test (sometimes also called Durbin–Wu–Hausman test)
is based on the difference of the vectors of coefficients of two
different models. The panelmodel
method computes the original
version of the test based on a quadratic form
(Hausman 1978). The formula
method, if
method = "chisq"
(default), computes the original version of the
test based on a quadratic form; if method ="aux"
then the
auxiliary-regression-based version as in Wooldridge (2010),
Sec.10.7.3. Only the latter can be robustified by specifying a robust
covariance estimator as a function through the argument vcov
(see
Examples).
The effect
argument is only relevant for the formula method/interface and
is then applied to both models. For the panelmodel method/interface, the test
is run with the effects of the already estimated models.
The equivalent tests in the one-way case using a between
model (either "within vs. between" or "random vs. between")
(see Hausman and Taylor 1981 or Baltagi 2013 Sec.4.3) can also
be performed by phtest
, but only for test = "chisq"
, not for
the regression-based test. NB: These equivalent tests using the
between model do not extend to the two-ways case. There are,
however, some other equivalent tests,
(see Kang 1985 or Baltagi 2013 Sec.4.3.7),
but those are unsupported by phtest
.
An object of class "htest"
.
Yves Croissant, Giovanni Millo
Hausman JA (1978). “Specification Tests in Econometrics.” Econometrica, 46, 1251–1271.
Hausman JA, Taylor WE (1981). “Panel Data and Unobservable Individual Effects.” Econometrica, 49, 1377–1398.
Kang S (1985). “A note on the equivalence of specification tests in the two-factor multivariate variance components model.” Journal of Econometrics, 28(2), 193 - 203. ISSN 0304-4076, https://doi.org/10.1016/0304-4076(85)90119-8.
Wooldridge JM (2010). Econometric Analysis of Cross–Section and Panel Data, 2nd edition. MIT Press.
Baltagi BH (2013). Econometric Analysis of Panel Data, 5th edition. John Wiley and Sons ltd.
data("Gasoline", package = "plm") form <- lgaspcar ~ lincomep + lrpmg + lcarpcap wi <- plm(form, data = Gasoline, model = "within") re <- plm(form, data = Gasoline, model = "random") phtest(wi, re) phtest(form, data = Gasoline) phtest(form, data = Gasoline, effect = "time") # Regression-based Hausman test phtest(form, data = Gasoline, method = "aux") # robust Hausman test with vcov supplied as a function and # with additional parameters phtest(form, data = Gasoline, method = "aux", vcov = vcovHC) phtest(form, data = Gasoline, method = "aux", vcov = function(x) vcovHC(x, method="white2", type="HC3"))
data("Gasoline", package = "plm") form <- lgaspcar ~ lincomep + lrpmg + lcarpcap wi <- plm(form, data = Gasoline, model = "within") re <- plm(form, data = Gasoline, model = "random") phtest(wi, re) phtest(form, data = Gasoline) phtest(form, data = Gasoline, effect = "time") # Regression-based Hausman test phtest(form, data = Gasoline, method = "aux") # robust Hausman test with vcov supplied as a function and # with additional parameters phtest(form, data = Gasoline, method = "aux", vcov = vcovHC) phtest(form, data = Gasoline, method = "aux", vcov = function(x) vcovHC(x, method="white2", type="HC3"))
General estimator useful for testing the within specification
piest(formula, data, subset, na.action, index = NULL, robust = TRUE, ...) ## S3 method for class 'piest' print(x, ...) ## S3 method for class 'piest' summary(object, ...) ## S3 method for class 'summary.piest' print( x, digits = max(3, getOption("digits") - 2), width = getOption("width"), subset = NULL, ... )
piest(formula, data, subset, na.action, index = NULL, robust = TRUE, ...) ## S3 method for class 'piest' print(x, ...) ## S3 method for class 'piest' summary(object, ...) ## S3 method for class 'summary.piest' print( x, digits = max(3, getOption("digits") - 2), width = getOption("width"), subset = NULL, ... )
formula |
a symbolic description for the model to be estimated, |
data |
a |
subset |
see |
na.action |
see |
index |
the indexes, |
robust |
logical, if |
... |
further arguments. |
object , x
|
an object of class |
digits |
number of digits for printed output, |
width |
the maximum length of the lines in the printed output, |
The Chamberlain method consists in using the covariates of all the periods as regressors. It allows to test the within specification.
An object of class "piest"
.
Yves Croissant
Chamberlain G (1982). “Multivariate regression models for panel data.” Journal of Econometrics, 18, 5–46.
data("RiceFarms", package = "plm") pirice <- piest(log(goutput) ~ log(seed) + log(totlabor) + log(size), RiceFarms, index = "id") summary(pirice)
data("RiceFarms", package = "plm") pirice <- piest(log(goutput) ~ log(seed) + log(totlabor) + log(size), RiceFarms, index = "id") summary(pirice)
Fixed and random effects estimators for truncated or censored limited dependent variable
pldv( formula, data, subset, weights, na.action, model = c("fd", "random", "pooling"), index = NULL, R = 20, start = NULL, lower = 0, upper = +Inf, objfun = c("lsq", "lad"), sample = c("cens", "trunc"), ... )
pldv( formula, data, subset, weights, na.action, model = c("fd", "random", "pooling"), index = NULL, R = 20, start = NULL, lower = 0, upper = +Inf, objfun = c("lsq", "lad"), sample = c("cens", "trunc"), ... )
formula |
a symbolic description for the model to be estimated, |
data |
a |
subset |
see |
weights |
see |
na.action |
see |
model |
one of |
index |
the indexes, see |
R |
the number of points for the gaussian quadrature, |
start |
a vector of starting values, |
lower |
the lower bound for the censored/truncated dependent variable, |
upper |
the upper bound for the censored/truncated dependent variable, |
objfun |
the objective function for the fixed effect model ( |
sample |
|
... |
further arguments. |
pldv
computes two kinds of models: a LSQ/LAD estimator for the
first-difference model (model = "fd"
) and a maximum likelihood estimator
with an assumed normal distribution for the individual effects
(model = "random"
or "pooling"
).
For maximum-likelihood estimations, pldv
uses internally function
maxLik::maxLik()
(from package maxLik).
For model = "fd"
, an object of class c("plm", "panelmodel")
, for
model = "random"
and model = "pooling"
an object of class c("maxLik", "maxim")
.
Yves Croissant
Honoré BE (1992). “Trimmed LAD and least squares estimation of truncated and censored regression models with fixed effects.” Econometrica, 60(3).
## as these examples take a bit of time, do not run them automatically ## Not run: data("Donors", package = "pder") library("plm") pDonors <- pdata.frame(Donors, index = "id") # replicate Landry/Lange/List/Price/Rupp (2010), online appendix, table 5a, models A and B modA <- pldv(donation ~ treatment + prcontr, data = pDonors, model = "random", method = "bfgs") summary(modA) modB <- pldv(donation ~ treatment * prcontr - prcontr, data = pDonors, model = "random", method = "bfgs") summary(modB) ## End(Not run)
## as these examples take a bit of time, do not run them automatically ## Not run: data("Donors", package = "pder") library("plm") pDonors <- pdata.frame(Donors, index = "id") # replicate Landry/Lange/List/Price/Rupp (2010), online appendix, table 5a, models A and B modA <- pldv(donation ~ treatment + prcontr, data = pDonors, model = "random", method = "bfgs") summary(modA) modB <- pldv(donation ~ treatment * prcontr - prcontr, data = pDonors, model = "random", method = "bfgs") summary(modB) ## End(Not run)
Linear models for panel data estimated using the lm
function on
transformed data.
plm( formula, data, subset, weights, na.action, effect = c("individual", "time", "twoways", "nested"), model = c("within", "random", "ht", "between", "pooling", "fd"), random.method = NULL, random.models = NULL, random.dfcor = NULL, inst.method = c("bvk", "baltagi", "am", "bms"), restrict.matrix = NULL, restrict.rhs = NULL, index = NULL, ... ) ## S3 method for class 'plm.list' print( x, digits = max(3, getOption("digits") - 2), width = getOption("width"), ... ) ## S3 method for class 'panelmodel' terms(x, ...) ## S3 method for class 'panelmodel' vcov(object, ...) ## S3 method for class 'panelmodel' fitted(object, ...) ## S3 method for class 'panelmodel' residuals(object, ...) ## S3 method for class 'panelmodel' df.residual(object, ...) ## S3 method for class 'panelmodel' coef(object, ...) ## S3 method for class 'panelmodel' print( x, digits = max(3, getOption("digits") - 2), width = getOption("width"), ... ) ## S3 method for class 'panelmodel' update(object, formula., ..., evaluate = TRUE) ## S3 method for class 'panelmodel' deviance(object, model = NULL, ...) ## S3 method for class 'plm' formula(x, ...) ## S3 method for class 'plm' plot( x, dx = 0.2, N = NULL, seed = 1, within = TRUE, pooling = TRUE, between = FALSE, random = FALSE, ... ) ## S3 method for class 'plm' residuals(object, model = NULL, effect = NULL, ...) ## S3 method for class 'plm' fitted(object, model = NULL, effect = NULL, ...)
plm( formula, data, subset, weights, na.action, effect = c("individual", "time", "twoways", "nested"), model = c("within", "random", "ht", "between", "pooling", "fd"), random.method = NULL, random.models = NULL, random.dfcor = NULL, inst.method = c("bvk", "baltagi", "am", "bms"), restrict.matrix = NULL, restrict.rhs = NULL, index = NULL, ... ) ## S3 method for class 'plm.list' print( x, digits = max(3, getOption("digits") - 2), width = getOption("width"), ... ) ## S3 method for class 'panelmodel' terms(x, ...) ## S3 method for class 'panelmodel' vcov(object, ...) ## S3 method for class 'panelmodel' fitted(object, ...) ## S3 method for class 'panelmodel' residuals(object, ...) ## S3 method for class 'panelmodel' df.residual(object, ...) ## S3 method for class 'panelmodel' coef(object, ...) ## S3 method for class 'panelmodel' print( x, digits = max(3, getOption("digits") - 2), width = getOption("width"), ... ) ## S3 method for class 'panelmodel' update(object, formula., ..., evaluate = TRUE) ## S3 method for class 'panelmodel' deviance(object, model = NULL, ...) ## S3 method for class 'plm' formula(x, ...) ## S3 method for class 'plm' plot( x, dx = 0.2, N = NULL, seed = 1, within = TRUE, pooling = TRUE, between = FALSE, random = FALSE, ... ) ## S3 method for class 'plm' residuals(object, model = NULL, effect = NULL, ...) ## S3 method for class 'plm' fitted(object, model = NULL, effect = NULL, ...)
formula |
a symbolic description for the model to be estimated, |
data |
a |
subset |
see |
weights |
see |
na.action |
see |
effect |
the effects introduced in the model, one of
|
model |
one of |
random.method |
method of estimation for the variance
components in the random effects model, one of |
random.models |
an alternative to the previous argument, the models used to compute the variance components estimations are indicated, |
random.dfcor |
a numeric vector of length 2 indicating which degree of freedom should be used, |
inst.method |
the instrumental variable transformation: one of
|
restrict.matrix |
a matrix which defines linear restrictions on the coefficients, |
restrict.rhs |
the right hand side vector of the linear restrictions on the coefficients, |
index |
the indexes, |
... |
further arguments. |
x , object
|
an object of class |
digits |
number of digits for printed output, |
width |
the maximum length of the lines in the printed output, |
formula. |
a new formula for the update method, |
evaluate |
a boolean for the update method, if |
dx |
the half–length of the individual lines for the plot method (relative to x range), |
N |
the number of individual to plot, |
seed |
the seed which will lead to individual selection, |
within |
if |
pooling |
if |
between |
if |
random |
if |
plm
is a general function for the estimation of linear panel
models. It supports the following estimation methods: pooled OLS
(model = "pooling"
), fixed effects ("within"
), random effects
("random"
), first–differences ("fd"
), and between
("between"
). It supports unbalanced panels and two–way effects
(although not with all methods).
For random effects models, four estimators of the transformation
parameter are available by setting random.method
to one of
"swar"
(Swamy and Arora 1972) (default), "amemiya"
(Amemiya 1971), "walhus"
(Wallace and Hussain 1969), or "nerlove"
(Nerlove 1971) (see below for Hausman-Taylor instrumental
variable case).
The nested random effect model ((Baltagi et al. 2001))
is estimated by setting model = "random"
and effect = "nested"
,
requiring the data to be indexed by a third index in which the "individual"
dimension is nested (see section Examples and the vignette
"Estimation of error components models with the plm function".)
For first–difference models, the intercept is maintained (which
from a specification viewpoint amounts to allowing for a trend in
the levels model). The user can exclude it from the estimated
specification the usual way by adding "-1"
to the model formula.
Instrumental variables estimation is obtained using two–part
formulas, the second part indicating the instrumental variables
used. This can be a complete list of instrumental variables or an
update of the first part. If, for example, the model is y ~ x1 + x2 + x3
, with x1
and x2
endogenous and z1
and z2
external
instruments, the model can be estimated with:
formula = y~x1+x2+x3 | x3+z1+z2
,
formula = y~x1+x2+x3 | . -x1-x2+z1+z2
.
If an instrument variable estimation is requested, argument
inst.method
selects the instrument variable transformation
method:
"bvk"
(default) for Balestra and Varadharajan–Krishnakumar (1987),
"baltagi"
for Baltagi (1981),
"am"
for Amemiya and MaCurdy (1986),
"bms"
for Breusch et al. (1989).
The Hausman–Taylor estimator (Hausman and Taylor 1981) is
computed with arguments random.method = "ht"
, model = "random"
,
inst.method = "baltagi"
(the other way with only model = "ht"
is deprecated).
See also the vignettes for introductions to model estimations (and more) with examples.
An object of class "plm"
.
A "plm"
object has the following elements :
coefficients |
the vector of coefficients, |
vcov |
the variance–covariance matrix of the coefficients, |
residuals |
the vector of residuals (these are the residuals of the (quasi-)demeaned model), |
weights |
(only for weighted estimations) weights as specified, |
df.residual |
degrees of freedom of the residuals, |
formula |
an object of class |
model |
the model frame as a |
ercomp |
an object of class |
aliased |
named logical vector indicating any aliased
coefficients which are silently dropped by |
call |
the call. |
It has print
, summary
and print.summary
methods. The
summary
method creates an object of class "summary.plm"
that
extends the object it is run on with information about (inter alia) F
statistic and (adjusted) R-squared of model, standard errors, t–values, and
p–values of coefficients, (if supplied) the furnished vcov, see
summary.plm()
for further details.
Yves Croissant
Amemiya T (1971). “The Estimation of the Variances in a Variance–Components Model.” International Economic Review, 12, 1–13.
Amemiya T, MaCurdy TE (1986). “Instrumental-Variable Estimation of an Error-Components Model.” Econometrica, 54(4), 869-80.
Balestra P, Varadharajan–Krishnakumar J (1987). “Full Information Estimations of a System of Simultaneous Equations With Error Components.” Econometric Theory, 3, 223–246.
Baltagi BH (1981). “Simultaneous Equations With Error Components.” Journal of Econometrics, 17, 21–49.
Baltagi BH, Song SH, Jung BC (2001). “The unbalanced nested error component regression model.” Journal of Econometrics, 101, 357-381.
Baltagi BH (2013). Econometric Analysis of Panel Data, 5th edition. John Wiley and Sons ltd.
Breusch TS, Mizon GE, Schmidt P (1989). “Efficient Estimation Using Panel Data.” Econometrica, 57(3), 695-700.
Hausman JA, Taylor WE (1981). “Panel Data and Unobservable Individual Effects.” Econometrica, 49, 1377–1398.
Nerlove M (1971). “Further Evidence on the Estimation of Dynamic Economic Relations from a Time–Series of Cross–Sections.” Econometrica, 39, 359–382.
Swamy PAVB, Arora SS (1972). “The Exact Finite Sample Properties of the Estimators of Coefficients in the Error Components Regression Models.” Econometrica, 40, 261–275.
Wallace TD, Hussain A (1969). “The Use of Error Components Models in Combining Cross Section With Time Series Data.” Econometrica, 37(1), 55–72.
summary.plm()
for further details about the associated
summary method and the "summary.plm" object both of which provide some model
tests and tests of coefficients. fixef()
to compute the fixed
effects for "within" models (=fixed effects models). predict.plm()
for
predicted values.
data("Produc", package = "plm") zz <- plm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, data = Produc, index = c("state","year")) summary(zz) # replicates some results from Baltagi (2013), table 3.1 data("Grunfeld", package = "plm") p <- plm(inv ~ value + capital, data = Grunfeld, model = "pooling") wi <- plm(inv ~ value + capital, data = Grunfeld, model = "within", effect = "twoways") swar <- plm(inv ~ value + capital, data = Grunfeld, model = "random", effect = "twoways") amemiya <- plm(inv ~ value + capital, data = Grunfeld, model = "random", random.method = "amemiya", effect = "twoways") walhus <- plm(inv ~ value + capital, data = Grunfeld, model = "random", random.method = "walhus", effect = "twoways") # summary and summary with a furnished vcov (passed as matrix, # as function, and as function with additional argument) summary(wi) summary(wi, vcov = vcovHC(wi)) summary(wi, vcov = vcovHC) summary(wi, vcov = function(x) vcovHC(x, method = "white2")) ## nested random effect model # replicate Baltagi/Song/Jung (2001), p. 378 (table 6), columns SA, WH # == Baltagi (2013), pp. 204-205 data("Produc", package = "plm") pProduc <- pdata.frame(Produc, index = c("state", "year", "region")) form <- log(gsp) ~ log(pc) + log(emp) + log(hwy) + log(water) + log(util) + unemp summary(plm(form, data = pProduc, model = "random", effect = "nested")) summary(plm(form, data = pProduc, model = "random", effect = "nested", random.method = "walhus")) ## Instrumental variable estimations # replicate Baltagi (2013/2021), p. 133/162, table 7.1 data("Crime", package = "plm") FE2SLS <- plm(lcrmrte ~ lprbarr + lpolpc + lprbconv + lprbpris + lavgsen + ldensity + lwcon + lwtuc + lwtrd + lwfir + lwser + lwmfg + lwfed + lwsta + lwloc + lpctymle + lpctmin + region + smsa + factor(year) | . - lprbarr - lpolpc + ltaxpc + lmix, data = Crime, model = "within") G2SLS <- update(FE2SLS, model = "random", inst.method = "bvk") EC2SLS <- update(G2SLS, model = "random", inst.method = "baltagi") ## Hausman-Taylor estimator and Amemiya-MaCurdy estimator # replicate Baltagi (2005, 2013), table 7.4; Baltagi (2021), table 7.5 data("Wages", package = "plm") ht <- plm(lwage ~ wks + south + smsa + married + exp + I(exp ^ 2) + bluecol + ind + union + sex + black + ed | bluecol + south + smsa + ind + sex + black | wks + married + union + exp + I(exp ^ 2), data = Wages, index = 595, random.method = "ht", model = "random", inst.method = "baltagi") summary(ht) am <- plm(lwage ~ wks + south + smsa + married + exp + I(exp ^ 2) + bluecol + ind + union + sex + black + ed | bluecol + south + smsa + ind + sex + black | wks + married + union + exp + I(exp ^ 2), data = Wages, index = 595, random.method = "ht", model = "random", inst.method = "am") summary(am)
data("Produc", package = "plm") zz <- plm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, data = Produc, index = c("state","year")) summary(zz) # replicates some results from Baltagi (2013), table 3.1 data("Grunfeld", package = "plm") p <- plm(inv ~ value + capital, data = Grunfeld, model = "pooling") wi <- plm(inv ~ value + capital, data = Grunfeld, model = "within", effect = "twoways") swar <- plm(inv ~ value + capital, data = Grunfeld, model = "random", effect = "twoways") amemiya <- plm(inv ~ value + capital, data = Grunfeld, model = "random", random.method = "amemiya", effect = "twoways") walhus <- plm(inv ~ value + capital, data = Grunfeld, model = "random", random.method = "walhus", effect = "twoways") # summary and summary with a furnished vcov (passed as matrix, # as function, and as function with additional argument) summary(wi) summary(wi, vcov = vcovHC(wi)) summary(wi, vcov = vcovHC) summary(wi, vcov = function(x) vcovHC(x, method = "white2")) ## nested random effect model # replicate Baltagi/Song/Jung (2001), p. 378 (table 6), columns SA, WH # == Baltagi (2013), pp. 204-205 data("Produc", package = "plm") pProduc <- pdata.frame(Produc, index = c("state", "year", "region")) form <- log(gsp) ~ log(pc) + log(emp) + log(hwy) + log(water) + log(util) + unemp summary(plm(form, data = pProduc, model = "random", effect = "nested")) summary(plm(form, data = pProduc, model = "random", effect = "nested", random.method = "walhus")) ## Instrumental variable estimations # replicate Baltagi (2013/2021), p. 133/162, table 7.1 data("Crime", package = "plm") FE2SLS <- plm(lcrmrte ~ lprbarr + lpolpc + lprbconv + lprbpris + lavgsen + ldensity + lwcon + lwtuc + lwtrd + lwfir + lwser + lwmfg + lwfed + lwsta + lwloc + lpctymle + lpctmin + region + smsa + factor(year) | . - lprbarr - lpolpc + ltaxpc + lmix, data = Crime, model = "within") G2SLS <- update(FE2SLS, model = "random", inst.method = "bvk") EC2SLS <- update(G2SLS, model = "random", inst.method = "baltagi") ## Hausman-Taylor estimator and Amemiya-MaCurdy estimator # replicate Baltagi (2005, 2013), table 7.4; Baltagi (2021), table 7.5 data("Wages", package = "plm") ht <- plm(lwage ~ wks + south + smsa + married + exp + I(exp ^ 2) + bluecol + ind + union + sex + black + ed | bluecol + south + smsa + ind + sex + black | wks + married + union + exp + I(exp ^ 2), data = Wages, index = 595, random.method = "ht", model = "random", inst.method = "baltagi") summary(ht) am <- plm(lwage ~ wks + south + smsa + married + exp + I(exp ^ 2) + bluecol + ind + union + sex + black + ed | bluecol + south + smsa + ind + sex + black | wks + married + union + exp + I(exp ^ 2), data = Wages, index = 595, random.method = "ht", model = "random", inst.method = "am") summary(am)
dynformula
, pht
, plm.data
, and pvcovHC
are
deprecated functions which could be removed from plm in a near future.
pvcovHC(x, ...) plm.data(x, indexes = NULL) dynformula(formula, lag.form = NULL, diff.form = NULL, log.form = NULL) ## S3 method for class 'dynformula' formula(x, ...) ## S3 method for class 'dynformula' print(x, ...)
pvcovHC(x, ...) plm.data(x, indexes = NULL) dynformula(formula, lag.form = NULL, diff.form = NULL, log.form = NULL) ## S3 method for class 'dynformula' formula(x, ...) ## S3 method for class 'dynformula' print(x, ...)
x |
an object of class |
... |
further arguments. |
indexes |
a vector (of length one or two) indicating the (individual and time) indexes (see Details); |
formula |
a formula, |
lag.form |
a list containing the lag structure of each variable in the formula, |
diff.form |
a vector (or a list) of logical values indicating whether variables should be differenced, |
log.form |
a vector (or a list) of logical values indicating whether variables should be in logarithms. |
data |
a |
dynformula
was used to construct a dynamic formula which was the
first argument of pgmm
. pgmm
uses now multi-part formulas.
pht
estimates the Hausman-Taylor model, which can now be estimated
using the more general plm
function.
plm.data
is replaced by pdata.frame
.
pvcovHC
is replaced by vcovHC
.
detect_lin_dep
was renamed to detect.lindep
.
A significant speed up can be gained by using fast (panel) data transformation
functions from package collapse
.
An additional significant speed up for the two-way fixed effects case can be
achieved if package fixest
or lfe
is installed (package collapse
needs to be installed for the fast mode in any case).
By default, this speed up is enabled.
Option plm.fast
can be used to enable/disable the speed up. The option is
evaluated prior to execution of supported transformations (see below), so
option("plm.fast" = TRUE)
enables the speed up while
option("plm.fast" = FALSE)
disables the speed up.
To have it always switched off, put options("plm.fast" = FALSE)
in your
.Rprofile file.
See Examples for how to use the option and for a benchmarking example.
For long, package plm
used base R implementations and R-based code. The
package collapse
provides fast data transformation functions written
in C/C++, among them some especially suitable for panel data.
Having package collapse
installed is a requirement for the speed up, so
this package is a hard dependency for package plm
.
Availability of packages fixest
and lfe
is checked for once when
package plm is attached and the additional speed up for the two-way fixed
effect case is enabled automatically (fixest
wins over lfe
),
given one of the packages is detected and options("plm.fast" = TRUE)
(default) is set. If so, the packages' fast algorithms to partial out fixed
effects are used (fixest::demean
(via collapse::fhdwithin
),
lfe::demeanlist
). Both packages are 'Suggests' dependencies.
Users might experience neglectable numerical differences between enabled and disabled fast mode and base R implementation, depending on the platform and the additional packages installed.
Currently, these basic functions benefit from the speed-up, used as building
blocks in most model estimation functions, e.g., in plm
(more functions are
under investigation):
between,
Between,
Sum,
Within,
lag, lead, and diff,
pseriesfy,
pdiff (internal function).
## Not run: ### A benchmark of plm without and with speed-up library("plm") library("collapse") library("microbenchmark") rm(list = ls()) data("wlddev", package = "collapse") form <- LIFEEX ~ PCGDP + GINI # produce big data set (taken from collapse's vignette) wlddevsmall <- get_vars(wlddev, c("iso3c","year","OECD","PCGDP","LIFEEX","GINI","ODA")) wlddevsmall$iso3c <- as.character(wlddevsmall$iso3c) data <- replicate(100, wlddevsmall, simplify = FALSE) rm(wlddevsmall) uniquify <- function(x, i) { x$iso3c <- paste0(x$iso3c, i) x } data <- unlist2d(Map(uniquify, data, as.list(1:100)), idcols = FALSE) data <- pdata.frame(data, index = c("iso3c", "year")) pdim(data) # Balanced Panel: n = 21600, T = 59, N = 1274400 // but many NAs # data <- na.omit(data) # pdim(data) # Unbalanced Panel: n = 13300, T = 1-31, N = 93900 times <- 1 # no. of repetitions for benchmark - this takes quite long! onewayFE <- microbenchmark( {options("plm.fast" = FALSE); plm(form, data = data, model = "within")}, {options("plm.fast" = TRUE); plm(form, data = data, model = "within")}, times = times) summary(onewayFE, unit = "relative") ## two-ways FE benchmark requires pkg fixest and lfe ## (End-users shall only set option plm.fast. Option plm.fast.pkg.FE.tw shall ## _not_ be set by the end-user, it is determined automatically when pkg plm ## is attached; however, it needs to be set explicitly in this example for the ## benchmark.) if(requireNamespace("fixest", quietly = TRUE) && requireNamespace("lfe", quietly = TRUE)) { twowayFE <- microbenchmark( {options("plm.fast" = FALSE); plm(form, data = data, model = "within", effect = "twoways")}, {options("plm.fast" = TRUE, "plm.fast.pkg.FE.tw" = "collapse"); plm(form, data = data, model = "within", effect = "twoways")}, {options("plm.fast" = TRUE, "plm.fast.pkg.FE.tw" = "fixest"); plm(form, data = data, model = "within", effect = "twoways")}, {options("plm.fast" = TRUE, "plm.fast.pkg.FE.tw" = "lfe"); plm(form, data = data, model = "within", effect = "twoways")}, times = times) summary(twowayFE, unit = "relative") } onewayRE <- microbenchmark( {options("plm.fast" = FALSE); plm(form, data = data, model = "random")}, {options("plm.fast" = TRUE); plm(form, data = data, model = "random")}, times = times) summary(onewayRE, unit = "relative") twowayRE <- microbenchmark( {options("plm.fast" = FALSE); plm(form, data = data, model = "random", effect = "twoways")}, {options("plm.fast" = TRUE); plm(form, data = data, model = "random", effect = "twoways")}, times = times) summary(twowayRE, unit = "relative") ## End(Not run)
## Not run: ### A benchmark of plm without and with speed-up library("plm") library("collapse") library("microbenchmark") rm(list = ls()) data("wlddev", package = "collapse") form <- LIFEEX ~ PCGDP + GINI # produce big data set (taken from collapse's vignette) wlddevsmall <- get_vars(wlddev, c("iso3c","year","OECD","PCGDP","LIFEEX","GINI","ODA")) wlddevsmall$iso3c <- as.character(wlddevsmall$iso3c) data <- replicate(100, wlddevsmall, simplify = FALSE) rm(wlddevsmall) uniquify <- function(x, i) { x$iso3c <- paste0(x$iso3c, i) x } data <- unlist2d(Map(uniquify, data, as.list(1:100)), idcols = FALSE) data <- pdata.frame(data, index = c("iso3c", "year")) pdim(data) # Balanced Panel: n = 21600, T = 59, N = 1274400 // but many NAs # data <- na.omit(data) # pdim(data) # Unbalanced Panel: n = 13300, T = 1-31, N = 93900 times <- 1 # no. of repetitions for benchmark - this takes quite long! onewayFE <- microbenchmark( {options("plm.fast" = FALSE); plm(form, data = data, model = "within")}, {options("plm.fast" = TRUE); plm(form, data = data, model = "within")}, times = times) summary(onewayFE, unit = "relative") ## two-ways FE benchmark requires pkg fixest and lfe ## (End-users shall only set option plm.fast. Option plm.fast.pkg.FE.tw shall ## _not_ be set by the end-user, it is determined automatically when pkg plm ## is attached; however, it needs to be set explicitly in this example for the ## benchmark.) if(requireNamespace("fixest", quietly = TRUE) && requireNamespace("lfe", quietly = TRUE)) { twowayFE <- microbenchmark( {options("plm.fast" = FALSE); plm(form, data = data, model = "within", effect = "twoways")}, {options("plm.fast" = TRUE, "plm.fast.pkg.FE.tw" = "collapse"); plm(form, data = data, model = "within", effect = "twoways")}, {options("plm.fast" = TRUE, "plm.fast.pkg.FE.tw" = "fixest"); plm(form, data = data, model = "within", effect = "twoways")}, {options("plm.fast" = TRUE, "plm.fast.pkg.FE.tw" = "lfe"); plm(form, data = data, model = "within", effect = "twoways")}, times = times) summary(twowayFE, unit = "relative") } onewayRE <- microbenchmark( {options("plm.fast" = FALSE); plm(form, data = data, model = "random")}, {options("plm.fast" = TRUE); plm(form, data = data, model = "random")}, times = times) summary(onewayRE, unit = "relative") twowayRE <- microbenchmark( {options("plm.fast" = FALSE); plm(form, data = data, model = "random", effect = "twoways")}, {options("plm.fast" = TRUE); plm(form, data = data, model = "random", effect = "twoways")}, times = times) summary(twowayRE, unit = "relative") ## End(Not run)
Test of individual and/or time effects for panel models.
plmtest(x, ...) ## S3 method for class 'plm' plmtest( x, effect = c("individual", "time", "twoways"), type = c("honda", "bp", "ghm", "kw"), ... ) ## S3 method for class 'formula' plmtest( x, data, ..., effect = c("individual", "time", "twoways"), type = c("honda", "bp", "ghm", "kw") )
plmtest(x, ...) ## S3 method for class 'plm' plmtest( x, effect = c("individual", "time", "twoways"), type = c("honda", "bp", "ghm", "kw"), ... ) ## S3 method for class 'formula' plmtest( x, data, ..., effect = c("individual", "time", "twoways"), type = c("honda", "bp", "ghm", "kw") )
x |
an object of class |
... |
further arguments passed to |
effect |
a character string indicating which effects are
tested: individual effects ( |
type |
a character string indicating the test to be performed:
|
data |
a |
These Lagrange multiplier tests use only the residuals of the
pooling model. The first argument of this function may be either a
pooling model of class plm
or an object of class formula
describing the model. For input within (fixed effects) or random
effects models, the corresponding pooling model is calculated
internally first as the tests are based on the residuals of the
pooling model.
The "bp"
test for unbalanced panels was derived in
Baltagi and Li (1990)
(1990), the "kw"
test for unbalanced panels in
Baltagi et al. (1998).
The "ghm"
test and the "kw"
test were extended to two-way
effects in Baltagi et al. (1992).
For a concise overview of all these statistics see Baltagi (2003), Sec. 4.2, pp. 68–76 (for balanced panels) and Sec. 9.5, pp. 200–203 (for unbalanced panels).
An object of class "htest"
.
For the King-Wu statistics ("kw"
), the oneway statistics
("individual"
and "time"
) coincide with the respective
Honda statistics ("honda"
); twoway statistics of "kw"
and
"honda"
differ.
Yves Croissant (initial implementation), Kevin Tappe (generalization to unbalanced panels)
Baltagi BH (2013). Econometric Analysis of Panel Data, 5th edition. John Wiley and Sons ltd.
Baltagi BH, Li Q (1990). “A Lagrange multiplier test for the error components model with incomplete panels.” Econometric Reviews, 9, 103–107.
Baltagi BH, Chang YJ, Li Q (1992). “Monte Carlo results on several new and existing tests for the error components model.” Journal of Econometrics, 54, 95–120.
Baltagi B, Chang YA, Li Q (1998). “Testing for random individual and time effects using unbalanced panel data.” Advances in econometrics, 13, 1-20.
Breusch TS, Pagan AR (1980). “The Lagrange Multiplier Test and Its Applications to Model Specification in Econometrics.” Review of Economic Studies, 47, 239–253.
Gourieroux C, Holly A, Monfort A (1982). “Likelihood Ratio Test, Wald Test, and Kuhn–Tucker Test in Linear Models With Inequality Constraints on the Regression Parameters.” Econometrica, 50, 63–80.
Honda Y (1985). “Testing the Error Components Model With Non–Normal Disturbances.” Review of Economic Studies, 52, 681–690.
King ML, Wu PX (1997). “Locally Optimal One–Sided Tests for Multiparameter Hypothese.” Econometric Reviews, 33, 523–529.
pFtest()
for individual and/or time effects tests based
on the within model.
data("Grunfeld", package = "plm") g <- plm(inv ~ value + capital, data = Grunfeld, model = "pooling") plmtest(g) plmtest(g, effect="time") plmtest(inv ~ value + capital, data = Grunfeld, type = "honda") plmtest(inv ~ value + capital, data = Grunfeld, type = "bp") plmtest(inv ~ value + capital, data = Grunfeld, type = "bp", effect = "twoways") plmtest(inv ~ value + capital, data = Grunfeld, type = "ghm", effect = "twoways") plmtest(inv ~ value + capital, data = Grunfeld, type = "kw", effect = "twoways") Grunfeld_unbal <- Grunfeld[1:(nrow(Grunfeld)-1), ] # create an unbalanced panel data set g_unbal <- plm(inv ~ value + capital, data = Grunfeld_unbal, model = "pooling") plmtest(g_unbal) # unbalanced version of test is indicated in output
data("Grunfeld", package = "plm") g <- plm(inv ~ value + capital, data = Grunfeld, model = "pooling") plmtest(g) plmtest(g, effect="time") plmtest(inv ~ value + capital, data = Grunfeld, type = "honda") plmtest(inv ~ value + capital, data = Grunfeld, type = "bp") plmtest(inv ~ value + capital, data = Grunfeld, type = "bp", effect = "twoways") plmtest(inv ~ value + capital, data = Grunfeld, type = "ghm", effect = "twoways") plmtest(inv ~ value + capital, data = Grunfeld, type = "kw", effect = "twoways") Grunfeld_unbal <- Grunfeld[1:(nrow(Grunfeld)-1), ] # create an unbalanced panel data set g_unbal <- plm(inv ~ value + capital, data = Grunfeld_unbal, model = "pooling") plmtest(g_unbal) # unbalanced version of test is indicated in output
Mean Groups (MG), Demeaned MG (DMG) and Common Correlated Effects MG (CCEMG) estimators for heterogeneous panel models, possibly with common factors (CCEMG)
pmg( formula, data, subset, na.action, model = c("mg", "cmg", "dmg"), index = NULL, trend = FALSE, ... ) ## S3 method for class 'pmg' summary(object, ...) ## S3 method for class 'summary.pmg' print( x, digits = max(3, getOption("digits") - 2), width = getOption("width"), ... ) ## S3 method for class 'pmg' residuals(object, ...)
pmg( formula, data, subset, na.action, model = c("mg", "cmg", "dmg"), index = NULL, trend = FALSE, ... ) ## S3 method for class 'pmg' summary(object, ...) ## S3 method for class 'summary.pmg' print( x, digits = max(3, getOption("digits") - 2), width = getOption("width"), ... ) ## S3 method for class 'pmg' residuals(object, ...)
formula |
a symbolic description of the model to be estimated, |
data |
a |
subset |
see |
na.action |
see |
model |
one of |
index |
the indexes, see |
trend |
logical specifying whether an individual-specific trend has to be included, |
... |
further arguments. |
object , x
|
an object of class |
digits |
digits, |
width |
the maximum length of the lines in the print output, |
pmg
is a function for the estimation of linear panel models with
heterogeneous coefficients by various Mean Groups estimators. Setting
argument model = "mg"
specifies the standard Mean Groups estimator, based on the
average of individual time series regressions. If model = "dmg"
the data are demeaned cross-sectionally, which is believed to
reduce the influence of common factors (and is akin to what is done
in homogeneous panels when model = "within"
and effect = "time"
).
Lastly, if model = "cmg"
the CCEMG estimator is
employed which is consistent under the hypothesis of
unobserved common factors and idiosyncratic factor loadings; it
works by augmenting the model by cross-sectional averages of the
dependent variable and regressors in order to account for the
common factors, and adding individual intercepts and possibly
trends.
An object of class c("pmg", "panelmodel")
containing:
coefficients |
the vector of coefficients, |
residuals |
the vector of residuals, |
fitted.values |
the vector of fitted values, |
vcov |
the covariance matrix of the coefficients, |
df.residual |
degrees of freedom of the residuals, |
model |
a data.frame containing the variables used for the estimation, |
r.squared |
numeric, the R squared, |
call |
the call, |
indcoef |
the matrix of individual coefficients from separate time series regressions. |
Giovanni Millo
Pesaran MH (2006). “Estimation and inference in large heterogeneous panels with a multifactor error structure.” Econometrica, 74(4), 967–1012.
data("Produc", package = "plm") ## Mean Groups estimator mgmod <- pmg(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, data = Produc) summary(mgmod) ## demeaned Mean Groups dmgmod <- pmg(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, data = Produc, model = "dmg") summary(dmgmod) ## Common Correlated Effects Mean Groups ccemgmod <- pmg(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, data = Produc, model = "cmg") summary(ccemgmod)
data("Produc", package = "plm") ## Mean Groups estimator mgmod <- pmg(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, data = Produc) summary(mgmod) ## demeaned Mean Groups dmgmod <- pmg(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, data = Produc, model = "dmg") summary(dmgmod) ## Common Correlated Effects Mean Groups ccemgmod <- pmg(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, data = Produc, model = "cmg") summary(ccemgmod)
pmodel.response has several methods to conveniently extract the response of several objects.
pmodel.response(object, ...) ## S3 method for class 'plm' pmodel.response(object, ...) ## S3 method for class 'data.frame' pmodel.response(object, ...) ## S3 method for class 'formula' pmodel.response(object, data, ...)
pmodel.response(object, ...) ## S3 method for class 'plm' pmodel.response(object, ...) ## S3 method for class 'data.frame' pmodel.response(object, ...) ## S3 method for class 'formula' pmodel.response(object, data, ...)
object |
an object of class |
... |
further arguments. |
data |
a |
The model response is extracted from a pdata.frame
(where the
response must reside in the first column; this is the case for a
model frame), a Formula
+ data
(being a model frame) or a plm
object, and the
transformation specified by effect
and model
is applied to
it.
Constructing the model frame first ensures proper NA
handling and the response being placed in the first column, see
also Examples for usage.
A pseries except if model responses' of a "between"
or "fd"
model as these models "compress" the data (the number
of observations used in estimation is smaller than the original
data due to the specific transformation). A numeric is returned
for the "between"
and "fd"
model.
Yves Croissant
plm
's model.matrix()
for (transformed)
model matrix and the corresponding model.frame()
method to construct a model frame.
# First, make a pdata.frame data("Grunfeld", package = "plm") pGrunfeld <- pdata.frame(Grunfeld) # then make a model frame from a Formula and a pdata.frame form <- inv ~ value + capital mf <- model.frame(pGrunfeld, form) # retrieve (transformed) response directly from model frame resp_mf <- pmodel.response(mf, model = "within", effect = "individual") # retrieve (transformed) response from a plm object, i.e., an estimated model fe_model <- plm(form, data = pGrunfeld, model = "within") pmodel.response(fe_model) # same as constructed before all.equal(resp_mf, pmodel.response(fe_model), check.attributes = FALSE) # TRUE
# First, make a pdata.frame data("Grunfeld", package = "plm") pGrunfeld <- pdata.frame(Grunfeld) # then make a model frame from a Formula and a pdata.frame form <- inv ~ value + capital mf <- model.frame(pGrunfeld, form) # retrieve (transformed) response directly from model frame resp_mf <- pmodel.response(mf, model = "within", effect = "individual") # retrieve (transformed) response from a plm object, i.e., an estimated model fe_model <- plm(form, data = pGrunfeld, model = "within") pmodel.response(fe_model) # same as constructed before all.equal(resp_mf, pmodel.response(fe_model), check.attributes = FALSE) # TRUE
A Chow test for the poolability of the data.
pooltest(x, ...) ## S3 method for class 'plm' pooltest(x, z, ...) ## S3 method for class 'formula' pooltest(x, data, ...)
pooltest(x, ...) ## S3 method for class 'plm' pooltest(x, z, ...) ## S3 method for class 'formula' pooltest(x, data, ...)
x |
an object of class |
... |
further arguments passed to plm. |
z |
an object of class |
data |
a |
pooltest
is a F test of stability (or Chow test) for the
coefficients of a panel model. For argument x
, the estimated
plm
object should be a "pooling"
model or a "within"
model
(the default); intercepts are assumed to be identical in the first
case and different in the second case.
An object of class "htest"
.
Yves Croissant
data("Gasoline", package = "plm") form <- lgaspcar ~ lincomep + lrpmg + lcarpcap gasw <- plm(form, data = Gasoline, model = "within") gasp <- plm(form, data = Gasoline, model = "pooling") gasnp <- pvcm(form, data = Gasoline, model = "within") pooltest(gasw, gasnp) pooltest(gasp, gasnp) pooltest(form, data = Gasoline, effect = "individual", model = "within") pooltest(form, data = Gasoline, effect = "individual", model = "pooling")
data("Gasoline", package = "plm") form <- lgaspcar ~ lincomep + lrpmg + lcarpcap gasw <- plm(form, data = Gasoline, model = "within") gasp <- plm(form, data = Gasoline, model = "pooling") gasnp <- pvcm(form, data = Gasoline, model = "within") pooltest(gasw, gasnp) pooltest(gasp, gasnp) pooltest(form, data = Gasoline, effect = "individual", model = "within") pooltest(form, data = Gasoline, effect = "individual", model = "pooling")
Predicted values of response based on plm models.
## S3 method for class 'plm' predict( object, newdata = NULL, na.fill = !inherits(newdata, "pdata.frame"), ... )
## S3 method for class 'plm' predict( object, newdata = NULL, na.fill = !inherits(newdata, "pdata.frame"), ... )
object |
An object of class |
newdata |
An optional pdata.frame in which to look for variables to be
used for prediction. If |
na.fill |
A logical, only relevant if |
... |
further arguments. |
predict
calculates predicted values by evaluating the regression function of
a plm model for newdata
or, if newdata = NULL
, it returns the fitted values
the plm model.
The fixed effects (within) model is somewhat special in prediction as it has
fixed effects estimated per individual, time period (one-way) or both (two-ways
model) which should to be respected when predicting values relating to these
fixed effects in the model: To do so, it is recommended to supply a pdata.frame
(and not a plain data.frame) in newdata
as it describes the relationship
between the data supplied to the individual. and/or time periods. In case
the newdata
´'s pdata.frame has out-of-sample data (data contains individuals
and/or time periods not contained in the original model), it is not clear
how values are to be predicted and the result will contain NA
values for these out-of-sample data. Argument na.fill
can be set to TRUE
to apply the original model's weighted mean of fixed effects for the
out-of-sample data to derive a prediction.
If a plain data.frame is given in newdata
for a fixed effects model, the
weighted mean is used for all fixed effects as newdata
for prediction as a
plain data.frame cannot describe any relation to individuals/time periods
(na.fill
is automatically set to TRUE
and the function warns).
See also Examples.
A numeric (or a pseries if newdata
is a pdata.frame) carrying the
predicted values with length equal to the number of rows as the data
supplied in newdata
and with names the row names of newdata
or, if
newdata = NULL
, the fitted values the original model given in object
.
library(plm) data("Grunfeld", package = "plm") # fit a fixed effect model fit.fe <- plm(inv ~ value + capital, data = Grunfeld, model = "within") # generate 55 new observations of three firms used for prediction: # * firm 1 with years 1935:1964 (has out-of-sample years 1955:1964), # * firm 2 with years 1935:1949 (all in sample), # * firm 11 with years 1935:1944 (firm 11 is out-of-sample) set.seed(42L) new.value2 <- runif(55, min = min(Grunfeld$value), max = max(Grunfeld$value)) new.capital2 <- runif(55, min = min(Grunfeld$capital), max = max(Grunfeld$capital)) newdata <- data.frame(firm = c(rep(1, 30), rep(2, 15), rep(11, 10)), year = c(1935:(1935+29), 1935:(1935+14), 1935:(1935+9)), value = new.value2, capital = new.capital2) # make pdata.frame newdata.p <- pdata.frame(newdata, index = c("firm", "year")) ## predict from fixed effect model with new data as pdata.frame predict(fit.fe, newdata = newdata.p) ## set na.fill = TRUE to have the weighted mean used to for fixed effects -> no NA values predict(fit.fe, newdata = newdata.p, na.fill = TRUE) ## predict with plain data.frame from fixed effect model: uses mean fixed effects ## for prediction and, thus, yields different result with a warning predict(fit.fe, newdata = newdata)
library(plm) data("Grunfeld", package = "plm") # fit a fixed effect model fit.fe <- plm(inv ~ value + capital, data = Grunfeld, model = "within") # generate 55 new observations of three firms used for prediction: # * firm 1 with years 1935:1964 (has out-of-sample years 1955:1964), # * firm 2 with years 1935:1949 (all in sample), # * firm 11 with years 1935:1944 (firm 11 is out-of-sample) set.seed(42L) new.value2 <- runif(55, min = min(Grunfeld$value), max = max(Grunfeld$value)) new.capital2 <- runif(55, min = min(Grunfeld$capital), max = max(Grunfeld$capital)) newdata <- data.frame(firm = c(rep(1, 30), rep(2, 15), rep(11, 10)), year = c(1935:(1935+29), 1935:(1935+14), 1935:(1935+9)), value = new.value2, capital = new.capital2) # make pdata.frame newdata.p <- pdata.frame(newdata, index = c("firm", "year")) ## predict from fixed effect model with new data as pdata.frame predict(fit.fe, newdata = newdata.p) ## set na.fill = TRUE to have the weighted mean used to for fixed effects -> no NA values predict(fit.fe, newdata = newdata.p, na.fill = TRUE) ## predict with plain data.frame from fixed effect model: uses mean fixed effects ## for prediction and, thus, yields different result with a warning predict(fit.fe, newdata = newdata)
A panel of 48 observations from 1970 to 1986
A data frame containing :
the state
the year
the region
public capital stock
highway and streets
water and sewer facilities
other public buildings and structures
private capital stock
gross state product
labor input measured by the employment in non–agricultural payrolls
state unemployment rate
total number of observations : 816
observation : regional
country : United States
Online complements to Baltagi (2001):
https://www.wiley.com/legacy/wileychi/baltagi/
Online complements to Baltagi (2013):
https://bcs.wiley.com/he-bcs/Books?action=resource&bcsId=4338&itemId=1118672321&resourceId=13452
Baltagi BH (2001). Econometric Analysis of Panel Data, 3rd edition. John Wiley and Sons ltd.
Baltagi BH (2013). Econometric Analysis of Panel Data, 5th edition. John Wiley and Sons ltd.
Baltagi BH, Pinnoi N (1995). “Public capital stock and state productivity growth: further evidence from an error components model.” Empirical Economics, 20, 351-359.
Munnell A (1990). “Why Has Productivity Growth Declined? Productivity and Public Investment.” New England Economic Review, 3–22.
A class for panel series for which several useful computations and data transformations are available.
## S3 method for class 'pseries' print(x, ...) ## S3 method for class 'pseries' as.matrix(x, idbyrow = TRUE, ...) ## S3 method for class 'pseries' plot( x, plot = c("lattice", "superposed"), scale = FALSE, transparency = TRUE, col = "blue", lwd = 1, ... ) ## S3 method for class 'pseries' summary(object, ...) ## S3 method for class 'summary.pseries' plot(x, ...) ## S3 method for class 'summary.pseries' print(x, ...) Sum(x, ...) ## Default S3 method: Sum(x, effect, ...) ## S3 method for class 'pseries' Sum(x, effect = c("individual", "time", "group"), ...) ## S3 method for class 'matrix' Sum(x, effect, ...) Between(x, ...) ## Default S3 method: Between(x, effect, ...) ## S3 method for class 'pseries' Between(x, effect = c("individual", "time", "group"), ...) ## S3 method for class 'matrix' Between(x, effect, ...) between(x, ...) ## Default S3 method: between(x, effect, ...) ## S3 method for class 'pseries' between(x, effect = c("individual", "time", "group"), ...) ## S3 method for class 'matrix' between(x, effect, ...) Within(x, ...) ## Default S3 method: Within(x, effect, ...) ## S3 method for class 'pseries' Within(x, effect = c("individual", "time", "group", "twoways"), ...) ## S3 method for class 'matrix' Within(x, effect, ...)
## S3 method for class 'pseries' print(x, ...) ## S3 method for class 'pseries' as.matrix(x, idbyrow = TRUE, ...) ## S3 method for class 'pseries' plot( x, plot = c("lattice", "superposed"), scale = FALSE, transparency = TRUE, col = "blue", lwd = 1, ... ) ## S3 method for class 'pseries' summary(object, ...) ## S3 method for class 'summary.pseries' plot(x, ...) ## S3 method for class 'summary.pseries' print(x, ...) Sum(x, ...) ## Default S3 method: Sum(x, effect, ...) ## S3 method for class 'pseries' Sum(x, effect = c("individual", "time", "group"), ...) ## S3 method for class 'matrix' Sum(x, effect, ...) Between(x, ...) ## Default S3 method: Between(x, effect, ...) ## S3 method for class 'pseries' Between(x, effect = c("individual", "time", "group"), ...) ## S3 method for class 'matrix' Between(x, effect, ...) between(x, ...) ## Default S3 method: between(x, effect, ...) ## S3 method for class 'pseries' between(x, effect = c("individual", "time", "group"), ...) ## S3 method for class 'matrix' between(x, effect, ...) Within(x, ...) ## Default S3 method: Within(x, effect, ...) ## S3 method for class 'pseries' Within(x, effect = c("individual", "time", "group", "twoways"), ...) ## S3 method for class 'matrix' Within(x, effect, ...)
x , object
|
a |
... |
further arguments, e. g., |
idbyrow |
if |
plot , scale , transparency , col , lwd
|
plot arguments, |
effect |
for the pseries methods: character string indicating the
|
The functions between
, Between
, Within
, and Sum
perform specific
data transformations, i. e., the between, within, and sum transformation,
respectively.
between
returns a vector/matrix containing the individual means (over
time) with the length of the vector equal to the number of
individuals (if effect = "individual"
(default); if effect = "time"
,
it returns the time means (over individuals)). Between
duplicates the values and returns a vector/matrix which length/number of rows
is the number of total observations. Within
returns a vector/matrix
containing the values in deviation from the individual means
(if effect = "individual"
, from time means if effect = "time"
), the so
called demeaned data. Sum
returns a vector/matrix with sum per individual
(over time) or the sum per time period (over individuals) with
effect = "individual"
or effect = "time"
, respectively, and has length/
number of rows of the total observations (like Between
).
For between
, Between
, Within
, and Sum
in presence of NA values it
can be useful to supply na.rm = TRUE
as an additional argument to
keep as many observations as possible in the resulting transformation.
na.rm is passed on to the mean()/sum() function used by these transformations
(i.e., it does not remove NAs prior to any processing!), see also
Examples.
All these functions return an object of class pseries
or a matrix,
except:between
, which returns a numeric vector or a matrix;
as.matrix
, which returns a matrix.
Yves Croissant
is.pseries()
to check if an object is a pseries. For
more functions on class 'pseries' see lag()
, lead()
,
diff()
for lagging values, leading values (negative lags) and
differencing.
# First, create a pdata.frame data("EmplUK", package = "plm") Em <- pdata.frame(EmplUK) # Then extract a series, which becomes additionally a pseries z <- Em$output class(z) # obtain the matrix representation as.matrix(z) # compute the between and within transformations between(z) Within(z) # Between and Sum replicate the values for each time observation Between(z) Sum(z) # between, Between, Within, and Sum transformations on other dimension between(z, effect = "time") Between(z, effect = "time") Within(z, effect = "time") Sum(z, effect = "time") # NA treatment for between, Between, Within, and Sum z2 <- z z2[length(z2)] <- NA # set last value to NA between(z2, na.rm = TRUE) # non-NA value for last individual Between(z2, na.rm = TRUE) # only the NA observation is lost Within(z2, na.rm = TRUE) # only the NA observation is lost Sum(z2, na.rm = TRUE) # only the NA observation is lost sum(is.na(Between(z2))) # 9 observations lost due to one NA value sum(is.na(Between(z2, na.rm = TRUE))) # only the NA observation is lost sum(is.na(Within(z2))) # 9 observations lost due to one NA value sum(is.na(Within(z2, na.rm = TRUE))) # only the NA observation is lost sum(is.na(Sum(z2))) # 9 observations lost due to one NA value sum(is.na(Sum(z2, na.rm = TRUE))) # only the NA observation is lost
# First, create a pdata.frame data("EmplUK", package = "plm") Em <- pdata.frame(EmplUK) # Then extract a series, which becomes additionally a pseries z <- Em$output class(z) # obtain the matrix representation as.matrix(z) # compute the between and within transformations between(z) Within(z) # Between and Sum replicate the values for each time observation Between(z) Sum(z) # between, Between, Within, and Sum transformations on other dimension between(z, effect = "time") Between(z, effect = "time") Within(z, effect = "time") Sum(z, effect = "time") # NA treatment for between, Between, Within, and Sum z2 <- z z2[length(z2)] <- NA # set last value to NA between(z2, na.rm = TRUE) # non-NA value for last individual Between(z2, na.rm = TRUE) # only the NA observation is lost Within(z2, na.rm = TRUE) # only the NA observation is lost Sum(z2, na.rm = TRUE) # only the NA observation is lost sum(is.na(Between(z2))) # 9 observations lost due to one NA value sum(is.na(Between(z2, na.rm = TRUE))) # only the NA observation is lost sum(is.na(Within(z2))) # 9 observations lost due to one NA value sum(is.na(Within(z2, na.rm = TRUE))) # only the NA observation is lost sum(is.na(Sum(z2))) # 9 observations lost due to one NA value sum(is.na(Sum(z2, na.rm = TRUE))) # only the NA observation is lost
This function takes a pdata.frame and turns all of its columns into objects of class pseries.
pseriesfy(x, ...)
pseriesfy(x, ...)
x |
an object of class |
... |
further arguments (currently not used). |
Background: Initially created pdata.frames have as columns the pure/basic class (e.g., numeric, factor, character). When extracting a column from such a pdata.frame, the extracted column is turned into a pseries.
At times, it can be convenient to apply data transformation operations on
such a pseriesfy
-ed pdata.frame, see Examples.
A pdata.frame like the input pdata.frame but with all columns turned into pseries.
library("plm") data("Grunfeld", package = "plm") pGrun <- pdata.frame(Grunfeld[ , 1:4], drop.index = TRUE) pGrun2 <- pseriesfy(pGrun) # pseriesfy-ed pdata.frame # compare classes of columns lapply(pGrun, class) lapply(pGrun2, class) # When using with() with(pGrun, lag(value)) # dispatches to base R's lag() with(pGrun2, lag(value)) # dispatches to plm's lag() respect. panel structure # When lapply()-ing lapply(pGrun, lag) # dispatches to base R's lag() lapply(pGrun2, lag) # dispatches to plm's lag() respect. panel structure # as.list(., keep.attributes = TRUE) on a non-pseriesfy-ed # pdata.frame is similar and dispatches to plm's lag lapply(as.list(pGrun, keep.attributes = TRUE), lag)
library("plm") data("Grunfeld", package = "plm") pGrun <- pdata.frame(Grunfeld[ , 1:4], drop.index = TRUE) pGrun2 <- pseriesfy(pGrun) # pseriesfy-ed pdata.frame # compare classes of columns lapply(pGrun, class) lapply(pGrun2, class) # When using with() with(pGrun, lag(value)) # dispatches to base R's lag() with(pGrun2, lag(value)) # dispatches to plm's lag() respect. panel structure # When lapply()-ing lapply(pGrun, lag) # dispatches to base R's lag() lapply(pGrun2, lag) # dispatches to plm's lag() respect. panel structure # as.list(., keep.attributes = TRUE) on a non-pseriesfy-ed # pdata.frame is similar and dispatches to plm's lag lapply(as.list(pGrun, keep.attributes = TRUE), lag)
This function reports unbalancedness measures for panel data as defined in Ahrens and Pincus (1981) and Baltagi et al. (2001).
punbalancedness(x, ...) ## S3 method for class 'pdata.frame' punbalancedness(x, ...) ## S3 method for class 'data.frame' punbalancedness(x, index = NULL, ...) ## S3 method for class 'panelmodel' punbalancedness(x, ...)
punbalancedness(x, ...) ## S3 method for class 'pdata.frame' punbalancedness(x, ...) ## S3 method for class 'data.frame' punbalancedness(x, index = NULL, ...) ## S3 method for class 'panelmodel' punbalancedness(x, ...)
x |
a |
... |
further arguments. |
index |
only relevant for |
punbalancedness
returns measures for the unbalancedness of a
panel data set.
For two-dimensional data:
The two measures of
Ahrens and Pincus (1981) are calculated, called
"gamma" () and "nu" (
).
If the panel data are balanced, both measures equal 1. The more
"unbalanced" the panel data, the lower the measures (but > 0). The
upper and lower bounds as given in Ahrens and Pincus (1981)
are:, and for
more precisely
, with
being
the number of individuals (as in
pdim(x)$nT$n
).
For nested panel data (meaning including a grouping variable):
The extension of the above measures by
Baltagi et al. (2001), p. 368, are
calculated:
c1: measure of subgroup (individual) unbalancedness,
c2: measure of time unbalancedness,
c3: measure of group unbalancedness due to each group size.
Values are 1 if the data are balanced and become smaller as the data become more unbalanced.
An application of the measure "gamma" is found in e. g. Baltagi et al. (2001), pp. 488-491, and Baltagi and Chang (1994), pp. 78–87, where it is used to measure the unbalancedness of various unbalanced data sets used for Monte Carlo simulation studies. Measures c1, c2, c3 are used for similar purposes in Baltagi et al. (2001).
In the two-dimensional case, punbalancedness
uses output of
pdim()
to calculate the two unbalancedness measures, so inputs to
punbalancedness
can be whatever pdim
works on. pdim
returns
detailed information about the number of individuals and time
observations (see pdim()
).
A named numeric containing either two or three entries, depending on the panel structure inputted:
For the two-dimensional panel structure, the entries are called
gamma
and nu
,
For a nested panel structure, the entries are called c1
, c2
,
c3
.
Calling punbalancedness
on an estimated panelmodel
object
and on the corresponding (p)data.frame
used for this
estimation does not necessarily yield the same result (true
also for pdim
). When called on an estimated panelmodel
, the
number of observations (individual, time) actually used for
model estimation are taken into account. When called on a
(p)data.frame
, the rows in the (p)data.frame
are
considered, disregarding any NA
values in the dependent or
independent variable(s) which would be dropped during model
estimation.
Kevin Tappe
Ahrens H, Pincus R (1981). “On Two Measures of Unbalancedness in a One-Way Model and Their Relation to Efficiency.” Biometrical Journal, 23(3), 227-235. doi:10.1002/bimj.4710230302.
Baltagi BH, Chang YJ (1994). “Incomplete panels: a comparative study of alternative estimators for the unbalanced one-way error component regression model.” Journal of Econometrics, 62, 67-89.
Baltagi BH, Song SH, Jung BC (2001). “The unbalanced nested error component regression model.” Journal of Econometrics, 101, 357-381.
Baltagi BH, Song SH, Jung BC (2002). “A comparative study of alternative estimators for the unbalanced two-way error component regression model.” The Econometrics Journal, 5(2), 480–493. ISSN 13684221, 1368423X.
# Grunfeld is a balanced panel, Hedonic is an unbalanced panel data(list=c("Grunfeld", "Hedonic"), package="plm") # Grunfeld has individual and time index in first two columns punbalancedness(Grunfeld) # c(1,1) indicates balanced panel pdim(Grunfeld)$balanced # TRUE # Hedonic has individual index in column "townid" (in last column) punbalancedness(Hedonic, index="townid") # c(0.472, 0.519) pdim(Hedonic, index="townid")$balanced # FALSE # punbalancedness on estimated models plm_mod_pool <- plm(inv ~ value + capital, data = Grunfeld) punbalancedness(plm_mod_pool) plm_mod_fe <- plm(inv ~ value + capital, data = Grunfeld[1:99, ], model = "within") punbalancedness(plm_mod_fe) # replicate results for panel data design no. 1 in Ahrens/Pincus (1981), p. 234 ind_d1 <- c(1,1,1,2,2,2,3,3,3,3,3,4,4,4,4,4,4,4,5,5,5,5,5,5,5) time_d1 <- c(1,2,3,1,2,3,1,2,3,4,5,1,2,3,4,5,6,7,1,2,3,4,5,6,7) df_d1 <- data.frame(individual = ind_d1, time = time_d1) punbalancedness(df_d1) # c(0.868, 0.887) # example for a nested panel structure with a third index variable # specifying a group (states are grouped by region) and without grouping data("Produc", package = "plm") punbalancedness(Produc, index = c("state", "year", "region")) punbalancedness(Produc, index = c("state", "year"))
# Grunfeld is a balanced panel, Hedonic is an unbalanced panel data(list=c("Grunfeld", "Hedonic"), package="plm") # Grunfeld has individual and time index in first two columns punbalancedness(Grunfeld) # c(1,1) indicates balanced panel pdim(Grunfeld)$balanced # TRUE # Hedonic has individual index in column "townid" (in last column) punbalancedness(Hedonic, index="townid") # c(0.472, 0.519) pdim(Hedonic, index="townid")$balanced # FALSE # punbalancedness on estimated models plm_mod_pool <- plm(inv ~ value + capital, data = Grunfeld) punbalancedness(plm_mod_pool) plm_mod_fe <- plm(inv ~ value + capital, data = Grunfeld[1:99, ], model = "within") punbalancedness(plm_mod_fe) # replicate results for panel data design no. 1 in Ahrens/Pincus (1981), p. 234 ind_d1 <- c(1,1,1,2,2,2,3,3,3,3,3,4,4,4,4,4,4,4,5,5,5,5,5,5,5) time_d1 <- c(1,2,3,1,2,3,1,2,3,4,5,1,2,3,4,5,6,7,1,2,3,4,5,6,7) df_d1 <- data.frame(individual = ind_d1, time = time_d1) punbalancedness(df_d1) # c(0.868, 0.887) # example for a nested panel structure with a third index variable # specifying a group (states are grouped by region) and without grouping data("Produc", package = "plm") punbalancedness(Produc, index = c("state", "year", "region")) punbalancedness(Produc, index = c("state", "year"))
purtest
implements several testing procedures that have been proposed
to test unit root hypotheses with panel data.
purtest( object, data = NULL, index = NULL, test = c("levinlin", "ips", "madwu", "Pm", "invnormal", "logit", "hadri"), exo = c("none", "intercept", "trend"), lags = c("SIC", "AIC", "Hall"), pmax = 10, Hcons = TRUE, q = NULL, dfcor = FALSE, fixedT = TRUE, ips.stat = NULL, ... ) ## S3 method for class 'purtest' print(x, ...) ## S3 method for class 'purtest' summary(object, ...) ## S3 method for class 'summary.purtest' print(x, ...)
purtest( object, data = NULL, index = NULL, test = c("levinlin", "ips", "madwu", "Pm", "invnormal", "logit", "hadri"), exo = c("none", "intercept", "trend"), lags = c("SIC", "AIC", "Hall"), pmax = 10, Hcons = TRUE, q = NULL, dfcor = FALSE, fixedT = TRUE, ips.stat = NULL, ... ) ## S3 method for class 'purtest' print(x, ...) ## S3 method for class 'purtest' summary(object, ...) ## S3 method for class 'summary.purtest' print(x, ...)
object , x
|
Either a |
data |
a |
index |
the indexes, |
test |
the test to be computed: one of |
exo |
the exogenous variables to introduce in the augmented
Dickey–Fuller (ADF) regressions, one of: no exogenous
variables ( |
lags |
the number of lags to be used for the augmented
Dickey-Fuller regressions: either a single value integer (the number of
lags for all time series), a vector of integers (one for each
time series), or a character string for an automatic
computation of the number of lags, based on the AIC
( |
pmax |
maximum number of lags (irrelevant for |
Hcons |
logical, only relevant for |
q |
the bandwidth for the estimation of the long-run variance
(only relevant for |
dfcor |
logical, indicating whether the standard deviation of the regressions is to be computed using a degrees-of-freedom correction, |
fixedT |
logical, indicating whether the individual ADF
regressions are to be computed using the same number of
observations (irrelevant for |
ips.stat |
|
... |
further arguments (can set argument |
All these tests except "hadri"
are based on the estimation of
augmented Dickey-Fuller (ADF) regressions for each time series. A
statistic is then computed using the t-statistics associated with
the lagged variable. The Hadri residual-based LM statistic is the
cross-sectional average of the individual KPSS statistics
Kwiatkowski et al. (1992), standardized by their
asymptotic mean and standard deviation.
Several Fisher-type tests that combine p-values from tests based on ADF regressions per individual are available:
"madwu"
is the inverse chi-squared test
Maddala and Wu (1999), also called P test by
Choi (2001).
"Pm"
is the modified P test proposed by
Choi (2001) for large N,
"invnormal"
is the inverse normal test by Choi (2001), and
"logit"
is the logit test by Choi (2001).
The individual p-values for the Fisher-type tests are approximated as described in MacKinnon (1996) if the package urca (Pfaff (2008)) is available, otherwise as described in MacKinnon (1994).
For the test statistic tbar of the test of Im/Pesaran/Shin (2003)
(ips.stat = "tbar"
), no p-value is given but 1%, 5%, and 10% critical
values are interpolated from paper's tabulated values via inverse distance
weighting (printed and contained in the returned value's element
statistic$ips.tbar.crit
).
Hadri's test, the test of Levin/Lin/Chu, and the tbar statistic of
Im/Pesaran/Shin are not applicable to unbalanced panels; the tbar statistic
is not applicable when lags > 0
is given.
The exogenous instruments of the tests (where applicable) can be specified in several ways, depending on how the data is handed over to the function:
For the formula
/data
interface (if data
is a data.frame
,
an additional index
argument should be specified); the formula
should be of the form: y ~ 0
, y ~ 1
, or y ~ trend
for a test
with no exogenous variables, with an intercept, or with individual
intercepts and time trend, respectively. The exo
argument is
ignored in this case.
For the data.frame
, matrix
, and pseries
interfaces: in
these cases, the exogenous variables are specified using the exo
argument.
With the associated summary
and print
methods, additional
information can be extracted/displayed (see also Value).
For purtest: An object of class "purtest"
: a list with the elements
named:
"statistic"
(a "htest"
object),
"call"
,
"args"
,
"idres"
(containing results from the individual regressions),
"adjval"
(containing the simulated means and variances needed to compute
the statistic, for test = "levinlin"
and "ips"
, otherwise NULL
),
"sigma2"
(short-run and long-run variance for test = "levinlin"
,
otherwise NULL
).
Yves Croissant and for "Pm", "invnormal", and "logit" Kevin Tappe
Choi I (2001).
“Unit root tests for panel data.”
Journal of International Money and Finance, 20(2), 249 - 272.
ISSN 0261-5606, https://doi.org/10.1016/S0261-5606(00)00048-6.
Hadri K (2000).
“Testing for stationarity in heterogeneous panel data.”
The Econometrics Journal, 3(2), 148–161.
ISSN 13684221, 1368423X.
Hall A (1994).
“Testing for a unit root in time series with pretest data-based model selection.”
Journal of Business & Economic Statistics, 12(4), 461–470.
Im KS, Pesaran MH, Shin Y (2003).
“Testing for unit roots in heterogenous panels.”
Journal of Econometrics, 115(1), 53-74.
Kwiatkowski D, Phillips PC, Schmidt P, Shin Y (1992).
“Testing the null hypothesis of stationarity against the alternative of a unit root: How sure are we that economic time series have a unit root?”
Journal of Econometrics, 54(1), 159 - 178.
ISSN 0304-4076, https://doi.org/10.1016/0304-4076(92)90104-Y.
Levin A, Lin CF, Chu CSJ (2002).
“Unit root tests in panel data : asymptotic and finite-sample properties.”
Journal of Econometrics, 108, 1-24.
MacKinnon JG (1994).
“Approximate Asymptotic Distribution Functions for Unit-Root and Cointegration Tests.”
Journal of Business & Economic Statistics, 12(2), 167–176.
ISSN 07350015.
MacKinnon JG (1996).
“Numerical Distribution Functions for Unit Root and Cointegration Tests.”
Journal of Applied Econometrics, 11(6), 601–618.
ISSN 08837252.
Maddala GS, Wu S (1999).
“A comparative study of unit root tests with panel data and a new simple test.”
Oxford Bulletin of Economics and Statistics, 61, 631-52.
Pfaff B (2008).
Analysis of Integrated and Cointegrated Time Series with R, Second edition.
Springer, New York.
ISBN 0-387-27960-1, https://CRAN.r-project.org/package=urca.
data("Grunfeld", package = "plm") y <- data.frame(split(Grunfeld$inv, Grunfeld$firm)) # individuals in columns purtest(y, pmax = 4, exo = "intercept", test = "madwu") ## same via pseries interface pGrunfeld <- pdata.frame(Grunfeld, index = c("firm", "year")) purtest(pGrunfeld$inv, pmax = 4, exo = "intercept", test = "madwu") ## same via formula interface purtest(inv ~ 1, data = Grunfeld, index = c("firm", "year"), pmax = 4, test = "madwu")
data("Grunfeld", package = "plm") y <- data.frame(split(Grunfeld$inv, Grunfeld$firm)) # individuals in columns purtest(y, pmax = 4, exo = "intercept", test = "madwu") ## same via pseries interface pGrunfeld <- pdata.frame(Grunfeld, index = c("firm", "year")) purtest(pGrunfeld$inv, pmax = 4, exo = "intercept", test = "madwu") ## same via formula interface purtest(inv ~ 1, data = Grunfeld, index = c("firm", "year"), pmax = 4, test = "madwu")
This function checks for each variable of a panel if it varies cross-sectionally and over time.
pvar(x, ...) ## S3 method for class 'matrix' pvar(x, index = NULL, ...) ## S3 method for class 'data.frame' pvar(x, index = NULL, ...) ## S3 method for class 'pdata.frame' pvar(x, ...) ## S3 method for class 'pseries' pvar(x, ...) ## S3 method for class 'pvar' print(x, ...)
pvar(x, ...) ## S3 method for class 'matrix' pvar(x, index = NULL, ...) ## S3 method for class 'data.frame' pvar(x, index = NULL, ...) ## S3 method for class 'pdata.frame' pvar(x, ...) ## S3 method for class 'pseries' pvar(x, ...) ## S3 method for class 'pvar' print(x, ...)
x |
a |
... |
further arguments. |
index |
see |
For (p)data.frame and matrix interface: All-NA
columns are removed
prior to calculation of variation due to coercing to pdata.frame
first.
An object of class pvar
containing the following
elements:
id.variation |
a logical vector with |
time.variation |
a logical vector with |
id.variation_anyNA |
a logical vector with |
time.variation_anyNA |
a logical vector with |
pvar
can be time consuming for “big” panels. As a fast alternative
collapse::varying()
from package collapse could be used.
Yves Croissant
pdim()
to check the dimensions of a 'pdata.frame' (and
other objects),
# Gasoline contains two variables which are individual and time # indexes and are the first two variables data("Gasoline", package = "plm") pvar(Gasoline) # Hedonic is an unbalanced panel, townid is the individual index; # the drop.index argument is passed to pdata.frame data("Hedonic", package = "plm") pvar(Hedonic, "townid", drop.index = TRUE) # same using pdata.frame Hed <- pdata.frame(Hedonic, "townid", drop.index = TRUE) pvar(Hed) # Gasoline with pvar's matrix interface Gasoline_mat <- as.matrix(Gasoline) pvar(Gasoline_mat) pvar(Gasoline_mat, index=c("country", "year"))
# Gasoline contains two variables which are individual and time # indexes and are the first two variables data("Gasoline", package = "plm") pvar(Gasoline) # Hedonic is an unbalanced panel, townid is the individual index; # the drop.index argument is passed to pdata.frame data("Hedonic", package = "plm") pvar(Hedonic, "townid", drop.index = TRUE) # same using pdata.frame Hed <- pdata.frame(Hedonic, "townid", drop.index = TRUE) pvar(Hed) # Gasoline with pvar's matrix interface Gasoline_mat <- as.matrix(Gasoline) pvar(Gasoline_mat) pvar(Gasoline_mat, index=c("country", "year"))
Estimators for random and fixed effects models with variable coefficients.
pvcm( formula, data, subset, na.action, effect = c("individual", "time"), model = c("within", "random"), index = NULL, ... ) ## S3 method for class 'pvcm' summary(object, ...) ## S3 method for class 'summary.pvcm' print( x, digits = max(3, getOption("digits") - 2), width = getOption("width"), ... )
pvcm( formula, data, subset, na.action, effect = c("individual", "time"), model = c("within", "random"), index = NULL, ... ) ## S3 method for class 'pvcm' summary(object, ...) ## S3 method for class 'summary.pvcm' print( x, digits = max(3, getOption("digits") - 2), width = getOption("width"), ... )
formula |
a symbolic description for the model to be estimated, |
data |
a |
subset |
see |
na.action |
see |
effect |
the effects introduced in the model: one of
|
model |
one of |
index |
the indexes, see |
... |
further arguments. |
object , x
|
an object of class |
digits |
digits, |
width |
the maximum length of the lines in the print output, |
pvcm
estimates variable coefficients models. Individual or time
effects are introduced, respectively, if effect = "individual"
(default) or effect = "time"
.
Coefficients are assumed to be fixed if model = "within"
, i.e., separate
pooled OLS models are estimated per individual (effect = "individual"
)
or per time period (effect = "time"
). Coefficients are assumed to be
random if model = "random"
and the model by
Swamy (1970) is estimated; it is a generalized least
squares model which uses the results of the OLS models estimated per
individual/time dimension (coefficient estimates are weighted averages of the
single OLS estimates with weights inversely proportional to the
variance-covariance matrices). The corresponding unbiased single coefficients,
variance-covariance matrices, and standard errors of the random coefficients
model are available in the returned object (see Value).
A test for parameter stability (homogeneous coefficients) of the random coefficients model is printed in the model's summary and is available in the returned object (see Value).
pvcm
objects have print
, summary
and print.summary
methods.
An object of class c("pvcm", "panelmodel")
, which has the
following elements:
coefficients |
the vector (numeric) of coefficients (or data frame for fixed effects), |
residuals |
the vector (numeric) of residuals, |
fitted.values |
the vector of fitted values, |
vcov |
the covariance matrix of the coefficients (a list for
fixed effects model ( |
df.residual |
degrees of freedom of the residuals, |
model |
a data frame containing the variables used for the estimation, |
call |
the call, |
args |
the arguments of the call, |
random coefficients model only (model = "random"
):
Delta |
the estimation of the covariance matrix of the coefficients, |
single.coefs |
matrix of unbiased coefficients of single estimations, |
single.vcov |
list of variance-covariance matrices for |
single.std.error |
matrix of standard errors of |
chisq.test |
htest object: parameter stability test (homogeneous coefficients), |
separate OLS estimations only (model = "within"
):
std.error |
a data frame containing standard errors for all coefficients for each single regression. |
Yves Croissant, Kevin Tappe
Swamy PAVB (1970). “Efficient Inference in a Random Coefficient Regression Model.” Econometrica, 38, 311–323.
Swamy PAVB (1971). Statistical Inference in Random Coefficient Regression Models. Springer.
Greene WH (2018). Econometric Analysis, 8th edition. Prentice Hall.
Poi BP (2003). “From the help desk: Swamy’s random-coefficients model.” Stata Journal, 3(3), 302–308.
Kleiber C, Zeileis A (2010). “The Grunfeld Data at 50.” German Economic Review, 11, 404-417. https://doi.org/10.1111/j.1468-0475.2010.00513.x.
data("Produc", package = "plm") zw <- pvcm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, data = Produc, model = "within") zr <- pvcm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, data = Produc, model = "random") ## replicate Greene (2018), p. 452, table 11.22/(2012), p. 419, table 11.14 summary(pvcm(log(gsp) ~ log(pc) + log(hwy) + log(water) + log(util) + log(emp) + unemp, data = Produc, model = "random")) ## replicate Poi (2003) (need data adjustment, remaining tiny diffs are due ## Poi's data set having more digits, not justified by the original Grunfeld data) data(Grunfeld) # need firm = 1, 4, 3, 8, 2 Gr.Poi.2003 <- Grunfeld[c(1:20, 61:80, 41:60, 141:160, 21:40), ] Gr.Poi.2003$firm <- rep(1:5, each = 20) Gr.Poi.2003[c(86, 98), "inv"] <- c(261.6, 645.2) Gr.Poi.2003[c(92), "capital"] <- c(232.6) mod.poi <- pvcm(inv ~ value + capital, data = Gr.Poi.2003, model = "random") summary(mod.poi) print(mod.poi$single.coefs) print(mod.poi$single.std.err) ## Not run: # replicate Swamy (1971), p. 166, table 5.2 data(Grunfeld, package = "AER") # 11 firm Grunfeld data needed from package AER gw <- pvcm(invest ~ value + capital, data = Grunfeld, index = c("firm", "year")) # close replication of Swamy (1970), (7.4) [remaining diffs likely due to less # precise numerical methods in the 1970, as supposed in Kleiber/Zeileis (2010), p. 9] gr <- pvcm(invest ~ value + capital, data = Grunfeld, index = c("firm", "year"), model = "random") ## End(Not run)
data("Produc", package = "plm") zw <- pvcm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, data = Produc, model = "within") zr <- pvcm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, data = Produc, model = "random") ## replicate Greene (2018), p. 452, table 11.22/(2012), p. 419, table 11.14 summary(pvcm(log(gsp) ~ log(pc) + log(hwy) + log(water) + log(util) + log(emp) + unemp, data = Produc, model = "random")) ## replicate Poi (2003) (need data adjustment, remaining tiny diffs are due ## Poi's data set having more digits, not justified by the original Grunfeld data) data(Grunfeld) # need firm = 1, 4, 3, 8, 2 Gr.Poi.2003 <- Grunfeld[c(1:20, 61:80, 41:60, 141:160, 21:40), ] Gr.Poi.2003$firm <- rep(1:5, each = 20) Gr.Poi.2003[c(86, 98), "inv"] <- c(261.6, 645.2) Gr.Poi.2003[c(92), "capital"] <- c(232.6) mod.poi <- pvcm(inv ~ value + capital, data = Gr.Poi.2003, model = "random") summary(mod.poi) print(mod.poi$single.coefs) print(mod.poi$single.std.err) ## Not run: # replicate Swamy (1971), p. 166, table 5.2 data(Grunfeld, package = "AER") # 11 firm Grunfeld data needed from package AER gw <- pvcm(invest ~ value + capital, data = Grunfeld, index = c("firm", "year")) # close replication of Swamy (1970), (7.4) [remaining diffs likely due to less # precise numerical methods in the 1970, as supposed in Kleiber/Zeileis (2010), p. 9] gr <- pvcm(invest ~ value + capital, data = Grunfeld, index = c("firm", "year"), model = "random") ## End(Not run)
Wald-style Chi-square test and F test of slope coefficients being zero jointly, including robust versions of the tests.
pwaldtest(x, ...) ## S3 method for class 'plm' pwaldtest( x, test = c("Chisq", "F"), vcov = NULL, df2adj = (test == "F" && !is.null(vcov) && missing(.df2)), .df1, .df2, ... ) ## S3 method for class 'pvcm' pwaldtest(x, ...) ## S3 method for class 'pgmm' pwaldtest(x, param = c("coef", "time", "all"), vcov = NULL, ...)
pwaldtest(x, ...) ## S3 method for class 'plm' pwaldtest( x, test = c("Chisq", "F"), vcov = NULL, df2adj = (test == "F" && !is.null(vcov) && missing(.df2)), .df1, .df2, ... ) ## S3 method for class 'pvcm' pwaldtest(x, ...) ## S3 method for class 'pgmm' pwaldtest(x, param = c("coef", "time", "all"), vcov = NULL, ...)
x |
an estimated model of which the coefficients should be
tested (usually of class |
... |
further arguments (currently none). |
test |
a character, indicating the test to be performed, may
be either |
vcov |
|
df2adj |
logical, only relevant for |
.df1 |
a numeric, used if one wants to overwrite the first degrees of freedom parameter in the performed test (usually not used), |
.df2 |
a numeric, used if one wants to overwrite the second degrees of freedom parameter for the F test (usually not used), |
param |
(for pgmm method only): select the parameters to be tested:
|
pwaldtest
can be used stand–alone with a plm object, a pvcm object,
and a pgmm object (for pvcm objects only the 'random' type is valid and no
further arguments are processed; for pgmm objects only arguments param
and vcov
are valid). It is also used in
summary.plm()
to produce the F statistic and the Chi-square
statistic for the joint test of coefficients and in summary.pgmm()
.
pwaldtest
performs the test if the slope coefficients of a panel
regression are jointly zero. It does not perform general purpose
Wald-style tests (for those, see lmtest::waldtest()
(from package
lmtest) or car::linearHypothesis()
(from package
car)).
If a user specified variance-covariance matrix/function is given in
argument vcov
, the robust version of the tests are carried out.
In that case, if the F test is requested (test = "F"
) and no
overwriting of the second degrees of freedom parameter is given (by
supplying argument (.df2
)), the adjustment of the second degrees
of freedom parameter is performed by default. The second degrees of
freedom parameter is adjusted to be the number of unique elements
of the cluster variable - 1, e. g., the number of individuals minus 1.
For the degrees of freedom adjustment of the F test in general,
see e. g. Cameron and Miller (2015), section VII;
(Andreß et al. 2013), pp. 126, footnote 4.
The degrees of freedom adjustment requires the vcov object supplied
or created by a supplied function to carry an attribute called
"cluster" with a known clustering described as a character (for now
this could be either "group"
or "time"
). The vcovXX functions
of the package plm provide such an attribute for their
returned variance–covariance matrices. No adjustment is done for
unknown descriptions given in the attribute "cluster" or when the
attribute "cluster" is not present. Robust vcov objects/functions
from package clubSandwich work as inputs to pwaldtest
's
F test because a they are translated internally to match the needs
described above.
An object of class "htest"
, except for pvcm's within model for which
a data.frame with results of the Wald chi-square tests and F tests per
regression is returned.
Yves Croissant (initial implementation) and Kevin Tappe (extensions: vcov argument and F test's df2 adjustment)
Wooldridge JM (2010). Econometric Analysis of Cross–Section and Panel Data, 2nd edition. MIT Press.
Andreß H, Golsch K, Schmidt-Catran A (2013). Applied Panel Data Analysis for Economic and Social Surveys. Springer. doi:10.1007/978-3-642-32914-2.
Cameron AC, Miller DL (2015). “A Practitioner's Guide to Cluster-Robust Inference.” Journal of Human Resources, 50(2), 317-372. https://ideas.repec.org/a/uwp/jhriss/v50y2015i2p317-372.html.
vcovHC()
for an example of the vcovXX functions, a robust
estimation for the variance–covariance matrix; summary.plm()
data("Grunfeld", package = "plm") mod_fe <- plm(inv ~ value + capital, data = Grunfeld, model = "within") mod_re <- plm(inv ~ value + capital, data = Grunfeld, model = "random") pwaldtest(mod_fe, test = "F") pwaldtest(mod_re, test = "Chisq") # with robust vcov (matrix, function) pwaldtest(mod_fe, vcov = vcovHC(mod_fe)) pwaldtest(mod_fe, vcov = function(x) vcovHC(x, type = "HC3")) pwaldtest(mod_fe, vcov = vcovHC(mod_fe), df2adj = FALSE) # w/o df2 adjustment # example without attribute "cluster" in the vcov vcov_mat <- vcovHC(mod_fe) attr(vcov_mat, "cluster") <- NULL # remove attribute pwaldtest(mod_fe, vcov = vcov_mat) # no df2 adjustment performed
data("Grunfeld", package = "plm") mod_fe <- plm(inv ~ value + capital, data = Grunfeld, model = "within") mod_re <- plm(inv ~ value + capital, data = Grunfeld, model = "random") pwaldtest(mod_fe, test = "F") pwaldtest(mod_re, test = "Chisq") # with robust vcov (matrix, function) pwaldtest(mod_fe, vcov = vcovHC(mod_fe)) pwaldtest(mod_fe, vcov = function(x) vcovHC(x, type = "HC3")) pwaldtest(mod_fe, vcov = vcovHC(mod_fe), df2adj = FALSE) # w/o df2 adjustment # example without attribute "cluster" in the vcov vcov_mat <- vcovHC(mod_fe) attr(vcov_mat, "cluster") <- NULL # remove attribute pwaldtest(mod_fe, vcov = vcov_mat) # no df2 adjustment performed
Test of serial correlation for (the idiosyncratic component of) the errors in fixed–effects panel models.
pwartest(x, ...) ## S3 method for class 'formula' pwartest(x, data, ...) ## S3 method for class 'panelmodel' pwartest(x, ...)
pwartest(x, ...) ## S3 method for class 'formula' pwartest(x, data, ...) ## S3 method for class 'panelmodel' pwartest(x, ...)
x |
an object of class |
... |
further arguments to be passed on to |
data |
a |
As Wooldridge (2010), Sec. 10.5.4 observes, under
the null of no serial correlation in the errors, the residuals of a
FE model must be negatively serially correlated, with
for each
. He suggests basing a test for this null hypothesis on a
pooled regression of FE residuals on their first lag:
. Rejecting the restriction
makes us conclude against the original null of no serial
correlation.
pwartest
estimates the within
model and retrieves residuals,
then estimates an AR(1) pooling
model on them. The test statistic
is obtained by applying a F test to the latter model to test the
above restriction on , setting the covariance matrix to
vcovHC
with the option method="arellano"
to control for serial
correlation.
Unlike the pbgtest()
and pdwtest()
, this test does
not rely on large–T asymptotics and has therefore good properties in
“short” panels. Furthermore, it is robust to general heteroskedasticity.
An object of class "htest"
.
Giovanni Millo
Wooldridge JM (2002). Econometric Analysis of Cross–Section and Panel Data. MIT Press.
Wooldridge JM (2010). Econometric Analysis of Cross–Section and Panel Data, 2nd edition. MIT Press.
pwfdtest()
, pdwtest()
, pbgtest()
, pbltest()
,
pbsytest()
.
data("EmplUK", package = "plm") pwartest(log(emp) ~ log(wage) + log(capital), data = EmplUK) # pass argument 'type' to vcovHC used in test pwartest(log(emp) ~ log(wage) + log(capital), data = EmplUK, type = "HC3")
data("EmplUK", package = "plm") pwartest(log(emp) ~ log(wage) + log(capital), data = EmplUK) # pass argument 'type' to vcovHC used in test pwartest(log(emp) ~ log(wage) + log(capital), data = EmplUK, type = "HC3")
First–differencing–based test of serial correlation for (the idiosyncratic component of) the errors in either levels or first–differenced panel models.
pwfdtest(x, ...) ## S3 method for class 'formula' pwfdtest(x, data, ..., h0 = c("fd", "fe")) ## S3 method for class 'panelmodel' pwfdtest(x, ..., h0 = c("fd", "fe"))
pwfdtest(x, ...) ## S3 method for class 'formula' pwfdtest(x, data, ..., h0 = c("fd", "fe")) ## S3 method for class 'panelmodel' pwfdtest(x, ..., h0 = c("fd", "fe"))
x |
an object of class |
... |
further arguments to be passed on to |
data |
a |
h0 |
the null hypothesis: one of |
As Wooldridge (2010), Sec. 10.6.3 observes, if the
idiosyncratic errors in the model in levels are uncorrelated (which
we label hypothesis "fe"
), then the errors of the model in first
differences (FD) must be serially correlated with
for each
. If
on the contrary the levels model's errors are a random walk, then
there must be no serial correlation in the FD errors (hypothesis
"fd"
). Both the fixed effects (FE) and the first–differenced
(FD) estimators remain consistent under either assumption, but the
relative efficiency changes: FE is more efficient under "fe"
, FD
under "fd"
.
Wooldridge (ibid.) suggests basing a test for either hypothesis on
a pooled regression of FD residuals on their first lag:
. Rejecting the restriction
makes us
conclude against the null of no serial correlation in errors of the
levels equation (
"fe"
). The null hypothesis of no serial
correlation in differenced errors ("fd"
) is tested in a similar
way, but based on the zero restriction on (
). Rejecting
"fe"
favours the use of the first–differences
estimator and the contrary, although it is possible that both be
rejected.
pwfdtest
estimates the fd
model (or takes an fd
model as
input for the panelmodel interface) and retrieves its residuals,
then estimates an AR(1) pooling
model on them. The test statistic
is obtained by applying a F test to the latter model to test the
relevant restriction on , setting the covariance matrix
to
vcovHC
with the option method="arellano"
to control for
serial correlation.
Unlike the pbgtest
and pdwtest
, this test does not rely on
large–T asymptotics and has therefore good properties in ”short”
panels. Furthermore, it is robust to general
heteroskedasticity. The "fe"
version can be used to test for
error autocorrelation regardless of whether the maintained
specification has fixed or random effects
(see Drukker 2003).
An object of class "htest"
.
Giovanni Millo
Drukker DM (2003). “Testing for Serial Correlation in Linear Panel–Data Models.” The Stata Journal, 3(2), 168-177.
Wooldridge JM (2002). Econometric Analysis of Cross–Section and Panel Data. MIT Press. Sec. 10.6.3, pp. 282–283.
Wooldridge JM (2010). Econometric Analysis of Cross–Section and Panel Data, 2nd edition. MIT Press. Sec. 10.6.3, pp. 319–320
pdwtest
, pbgtest
, pwartest
,
data("EmplUK" , package = "plm") pwfdtest(log(emp) ~ log(wage) + log(capital), data = EmplUK) pwfdtest(log(emp) ~ log(wage) + log(capital), data = EmplUK, h0 = "fe") # pass argument 'type' to vcovHC used in test pwfdtest(log(emp) ~ log(wage) + log(capital), data = EmplUK, type = "HC3", h0 = "fe") # same with panelmodel interface mod <- plm(log(emp) ~ log(wage) + log(capital), data = EmplUK, model = "fd") pwfdtest(mod) pwfdtest(mod, h0 = "fe") pwfdtest(mod, type = "HC3", h0 = "fe")
data("EmplUK" , package = "plm") pwfdtest(log(emp) ~ log(wage) + log(capital), data = EmplUK) pwfdtest(log(emp) ~ log(wage) + log(capital), data = EmplUK, h0 = "fe") # pass argument 'type' to vcovHC used in test pwfdtest(log(emp) ~ log(wage) + log(capital), data = EmplUK, type = "HC3", h0 = "fe") # same with panelmodel interface mod <- plm(log(emp) ~ log(wage) + log(capital), data = EmplUK, model = "fd") pwfdtest(mod) pwfdtest(mod, h0 = "fe") pwfdtest(mod, type = "HC3", h0 = "fe")
Semi-parametric test for the presence of (individual or time) unobserved effects in panel models.
pwtest(x, ...) ## S3 method for class 'formula' pwtest(x, data, effect = c("individual", "time"), ...) ## S3 method for class 'panelmodel' pwtest(x, effect = c("individual", "time"), ...)
pwtest(x, ...) ## S3 method for class 'formula' pwtest(x, data, effect = c("individual", "time"), ...) ## S3 method for class 'panelmodel' pwtest(x, effect = c("individual", "time"), ...)
x |
an object of class |
... |
further arguments passed to |
data |
a |
effect |
the effect to be tested for, one of |
This semi-parametric test checks the null hypothesis of zero correlation between errors of the same group. Therefore, it has power both against individual effects and, more generally, any kind of serial correlation.
The test relies on large-N asymptotics. It is valid under error heteroskedasticity and departures from normality.
The above is valid if effect="individual"
, which is the most
likely usage. If effect="time"
, symmetrically, the test relies on
large-T asymptotics and has power against time effects and, more
generally, against cross-sectional correlation.
If the panelmodel interface is used, the inputted model must be a pooling model.
An object of class "htest"
.
Giovanni Millo
Wooldridge JM (2002). Econometric Analysis of Cross–Section and Panel Data. MIT Press.
Wooldridge JM (2010). Econometric Analysis of Cross–Section and Panel Data, 2nd edition. MIT Press.
pbltest()
, pbgtest()
,
pdwtest()
, pbsytest()
, pwartest()
,
pwfdtest()
for tests for serial correlation in panel models.
plmtest()
for tests for random effects.
data("Produc", package = "plm") ## formula interface pwtest(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, data = Produc) pwtest(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, data = Produc, effect = "time") ## panelmodel interface # first, estimate a pooling model, than compute test statistics form <- formula(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp) pool_prodc <- plm(form, data = Produc, model = "pooling") pwtest(pool_prodc) # == effect="individual" pwtest(pool_prodc, effect="time")
data("Produc", package = "plm") ## formula interface pwtest(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, data = Produc) pwtest(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, data = Produc, effect = "time") ## panelmodel interface # first, estimate a pooling model, than compute test statistics form <- formula(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp) pool_prodc <- plm(form, data = Produc, model = "pooling") pwtest(pool_prodc) # == effect="individual" pwtest(pool_prodc, effect="time")
This function computes R squared or adjusted R squared for plm objects. It allows to define on which transformation of the data the (adjusted) R squared is to be computed and which method for calculation is used.
r.squared(object, model = NULL, type = c("cor", "rss", "ess"), dfcor = FALSE)
r.squared(object, model = NULL, type = c("cor", "rss", "ess"), dfcor = FALSE)
object |
an object of class |
model |
on which transformation of the data the R-squared is to be
computed. If |
type |
indicates method which is used to compute R squared. One of |
dfcor |
if |
A numerical value. The R squared or adjusted R squared of the model estimated on the transformed data, e. g., for the within model the so called "within R squared".
plm()
for estimation of various models;
summary.plm()
which makes use of r.squared
.
data("Grunfeld", package = "plm") p <- plm(inv ~ value + capital, data = Grunfeld, model = "pooling") r.squared(p) r.squared(p, dfcor = TRUE)
data("Grunfeld", package = "plm") p <- plm(inv ~ value + capital, data = Grunfeld, model = "pooling") r.squared(p) r.squared(p, dfcor = TRUE)
Function to calculate the random effects from a plm
object
(random effects model).
## S3 method for class 'plm' ranef(object, effect = NULL, ...)
## S3 method for class 'plm' ranef(object, effect = NULL, ...)
object |
an object of class |
effect |
|
... |
further arguments (currently not used). |
Function ranef
calculates the random effects of a fitted random
effects model. For one-way models, the effects of the estimated
model are extracted (either individual or time effects). For
two-way models, extracting the individual effects is the default
(both, argument effect = NULL
and effect = "individual"
will
give individual effects). Time effects can be extracted by setting
effect = "time"
.
Not all random effect model types are supported (yet?).
A named numeric with the random effects per dimension (individual or time).
Kevin Tappe
fixef()
to extract the fixed effects from a fixed
effects model (within model).
data("Grunfeld", package = "plm") m1 <- plm(inv ~ value + capital, data = Grunfeld, model = "random") ranef(m1) # individual random effects # compare to random effects by ML estimation via lme from package nlme library(nlme) m2 <- lme(inv ~ value + capital, random = ~1|firm, data = Grunfeld) cbind("plm" = ranef(m1), "lme" = unname(ranef(m2))) # two-ways RE model, calculate individual and time random effects data("Cigar", package = "plm") tw <- plm(sales ~ pop + price, data = Cigar, model = "random", effect = "twoways") ranef(tw) # individual random effects ranef(tw, effect = "time") # time random effects
data("Grunfeld", package = "plm") m1 <- plm(inv ~ value + capital, data = Grunfeld, model = "random") ranef(m1) # individual random effects # compare to random effects by ML estimation via lme from package nlme library(nlme) m2 <- lme(inv ~ value + capital, random = ~1|firm, data = Grunfeld) cbind("plm" = ranef(m1), "lme" = unname(ranef(m2))) # two-ways RE model, calculate individual and time random effects data("Cigar", package = "plm") tw <- plm(sales ~ pop + price, data = Cigar, model = "random", effect = "twoways") ranef(tw) # individual random effects ranef(tw, effect = "time") # time random effects
a panel of 171 observations
A dataframe containing :
the farm identifier
the total area cultivated with rice, measured in hectares
land status, on of 'owner'
(non sharecroppers,
owner operators or leaseholders or both), 'share'
(sharecroppers), 'mixed'
(mixed of the two previous status)
one of 'trad'
(traditional varieties),
'high'
(high yielding varieties) and 'mixed'
(mixed
varieties)
bIMAS is an intensification program; one of
'no'
(non-bimas farmer), 'yes'
(bimas farmer) or
'mixed'
(part but not all of farmer's land was registered to
be in the bimas program)
seed in kilogram
urea in kilogram
phosphate in kilogram
pesticide cost in Rupiah
price of seed in Rupiah per kg
price of urea in Rupiah per kg
price of phosphate in Rupiah per kg
hired labor in hours
family labor in hours
total labor (excluding harvest labor)
labor wage in Rupiah per hour
gross output of rice in kg
net output, gross output minus harvesting cost (paid in terms of rice)
price of rough rice in Rupiah per kg
one of 'wargabinangun'
, 'langan'
,
'gunungwangi'
, 'malausma'
, 'sukaambit'
,
'ciwangi'
number of observations : 1026
observation : farms
country : Indonesia
Feng Q, Horrace WC (2012). “Alternative technical efficiency measures: Skew, bias and scale.” Journal of Applied Econometrics, 27(2), 253-268. doi:10.1002/jae.1190.
A test of overidentifying restrictions for models estimated by GMM.
sargan(object, weights = c("twosteps", "onestep"))
sargan(object, weights = c("twosteps", "onestep"))
object |
an object of class |
weights |
the weighting matrix to be used for the computation of the test. |
The Hansen–Sargan test ("J test") calculates the quadratic form of the moment restrictions that is minimized while computing the GMM estimator. It follows asymptotically a chi-square distribution with number of degrees of freedom equal to the difference between the number of moment conditions and the number of coefficients.
An object of class "htest"
.
Yves Croissant
(Hansen 1982)
(Sargan 1958)
data("EmplUK", package = "plm") ar <- pgmm(log(emp) ~ lag(log(emp), 1:2) + lag(log(wage), 0:1) + lag(log(capital), 0:2) + lag(log(output), 0:2) | lag(log(emp), 2:99), data = EmplUK, effect = "twoways", model = "twosteps") sargan(ar)
data("EmplUK", package = "plm") ar <- pgmm(log(emp) ~ lag(log(emp), 1:2) + lag(log(wage), 0:1) + lag(log(capital), 0:2) + lag(log(output), 0:2) | lag(log(emp), 2:99), data = EmplUK, effect = "twoways", model = "twosteps") sargan(ar)
A panel of 738 observations from 1983 to 1990
A data frame containing:
firm index
year
log of employment
log of wages
log of real output
log of intermediate inputs
log of real capital stock
real cash flow
total number of observations: 5904
observation: firms
country: Spain
Journal of Business Economics and Statistics data archive:
https://amstat.tandfonline.com/loi/ubes20/.
Alonso-Borrego C, Arellano M (1999). “Symmetrically Normalized Instrumental-Variable Estimation Using Panel Data.” Journal of Business and Economic Statistics, 17(1), 36-49.
A panel of 125 observations from 1960 to 1985
A data frame containing :
the year
the country name (factor)
OPEC member?
communist regime?
country's population (in thousands)
real GDP per capita (in 1985 US dollars)
saving rate (in percent)
total number of observations : 3250
observation : country
country : World
Online supplements to Hayashi (2000).
http://fhayashi.fc2web.com/datasets.htm
Hayashi F (2000). Econometrics. Princeton University Press.
Summers R, Heston A (1991). “The Penn World Table (Mark 5): An Expanded Set of International Comparisons, 1950–1988.” The Quarterly Journal of Economics, 106, 327-68. doi:10.2307/2937941.
The summary method for plm objects generates some more information about estimated plm models.
## S3 method for class 'plm.list' summary(object, ...) ## S3 method for class 'summary.plm.list' coef(object, eq = NULL, ...) ## S3 method for class 'summary.plm.list' print( x, digits = max(3, getOption("digits") - 2), width = getOption("width"), ... ) ## S3 method for class 'plm' summary(object, vcov = NULL, ...) ## S3 method for class 'summary.plm' print( x, digits = max(3, getOption("digits") - 2), width = getOption("width"), subset = NULL, ... )
## S3 method for class 'plm.list' summary(object, ...) ## S3 method for class 'summary.plm.list' coef(object, eq = NULL, ...) ## S3 method for class 'summary.plm.list' print( x, digits = max(3, getOption("digits") - 2), width = getOption("width"), ... ) ## S3 method for class 'plm' summary(object, vcov = NULL, ...) ## S3 method for class 'summary.plm' print( x, digits = max(3, getOption("digits") - 2), width = getOption("width"), subset = NULL, ... )
object |
an object of class |
... |
further arguments. |
eq |
the selected equation for list objects |
x |
an object of class |
digits |
number of digits for printed output, |
width |
the maximum length of the lines in the printed output, |
vcov |
a variance–covariance matrix furnished by the user or a function to calculate one (see Examples), |
subset |
a character or numeric vector indicating a subset of
the table of coefficients to be printed for
|
The summary
method for plm objects (summary.plm
) creates an
object of class c("summary.plm", "plm", "panelmodel")
that
extends the plm object it is run on with various information about
the estimated model like (inferential) statistics, see
Value. It has an associated print method
(print.summary.plm
).
An object of class c("summary.plm", "plm", "panelmodel")
. Some of its elements are carried over from the
associated plm object and described there
(plm()
). The following elements are new or changed
relative to the elements of a plm object:
fstatistic |
'htest' object: joint test of significance of
coefficients (F or Chi-square test) (robust statistic in case of
supplied argument |
coefficients |
a matrix with the estimated coefficients,
standard errors, t–values, and p–values, if argument |
vcov |
the "regular" variance–covariance matrix of the coefficients (class "matrix"), |
rvcov |
only present if argument |
r.squared |
a named numeric containing the R-squared ("rsq") and the adjusted R-squared ("adjrsq") of the model, |
df |
an integer vector with 3 components, (p, n-p, p*), where p is the number of estimated (non-aliased) coefficients of the model, n-p are the residual degrees of freedom (n being number of observations), and p* is the total number of coefficients (incl. any aliased ones). |
Yves Croissant
plm()
for estimation of various models; vcovHC()
for
an example of a robust estimation of variance–covariance
matrix; r.squared()
for the function to calculate R-squared;
stats::print.power.htest()
for some information about class
"htest"; fixef()
to compute the fixed effects for "within"
(=fixed effects) models and within_intercept()
for an
"overall intercept" for such models; pwaldtest()
data("Produc", package = "plm") zz <- plm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, data = Produc, index = c("state","year")) summary(zz) # summary with a furnished vcov, passed as matrix, as function, and # as function with additional argument data("Grunfeld", package = "plm") wi <- plm(inv ~ value + capital, data = Grunfeld, model="within", effect = "individual") summary(wi, vcov = vcovHC(wi)) summary(wi, vcov = vcovHC) summary(wi, vcov = function(x) vcovHC(x, method = "white2")) # extract F statistic wi_summary <- summary(wi) Fstat <- wi_summary[["fstatistic"]] # extract estimates and p-values est <- wi_summary[["coefficients"]][ , "Estimate"] pval <- wi_summary[["coefficients"]][ , "Pr(>|t|)"] # print summary only for coefficient "value" print(wi_summary, subset = "value")
data("Produc", package = "plm") zz <- plm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, data = Produc, index = c("state","year")) summary(zz) # summary with a furnished vcov, passed as matrix, as function, and # as function with additional argument data("Grunfeld", package = "plm") wi <- plm(inv ~ value + capital, data = Grunfeld, model="within", effect = "individual") summary(wi, vcov = vcovHC(wi)) summary(wi, vcov = vcovHC) summary(wi, vcov = function(x) vcovHC(x, method = "white2")) # extract F statistic wi_summary <- summary(wi) Fstat <- wi_summary[["fstatistic"]] # extract estimates and p-values est <- wi_summary[["coefficients"]][ , "Estimate"] pval <- wi_summary[["coefficients"]][ , "Pr(>|t|)"] # print summary only for coefficient "value" print(wi_summary, subset = "value")
Unconditional Robust covariance matrix estimators a la Beck and Katz for panel models (a.k.a. Panel Corrected Standard Errors (PCSE)).
vcovBK(x, ...) ## S3 method for class 'plm' vcovBK( x, type = c("HC0", "HC1", "HC2", "HC3", "HC4"), cluster = c("group", "time"), diagonal = FALSE, ... )
vcovBK(x, ...) ## S3 method for class 'plm' vcovBK( x, type = c("HC0", "HC1", "HC2", "HC3", "HC4"), cluster = c("group", "time"), diagonal = FALSE, ... )
x |
an object of class |
... |
further arguments. |
type |
the weighting scheme used, one of |
cluster |
one of |
diagonal |
a logical value specifying whether to force non-diagonal elements to zero, |
vcovBK
is a function for estimating a robust covariance matrix of
parameters for a panel model according to the
Beck and Katz (1995) method, a.k.a. Panel
Corrected Standard Errors (PCSE), which uses an unconditional
estimate of the error covariance across time periods (groups)
inside the standard formula for coefficient
covariance. Observations may be clustered either by "group"
to
account for timewise heteroskedasticity and serial correlation or
by "time"
to account for cross-sectional heteroskedasticity and
correlation. It must be borne in mind that the Beck and Katz
formula is based on N- (T-) asymptotics and will not be appropriate
elsewhere.
The diagonal
logical argument can be used, if set to
TRUE
, to force to zero all non-diagonal elements in the
estimated error covariances; this is appropriate if both serial and
cross–sectional correlation are assumed out, and yields a
timewise- (groupwise-) heteroskedasticity–consistent estimator.
Weighting schemes specified by type
are analogous to those in
sandwich::vcovHC()
in package sandwich and are
justified theoretically (although in the context of the standard
linear model) by MacKinnon and White (1985) and
Cribari–Neto (2004) (see Zeileis 2004).
The main use of vcovBK
(and the other variance-covariance estimators
provided in the package vcovHC
, vcovNW
, vcovDC
, vcovSCC
) is to pass
it to plm's own functions like summary
, pwaldtest
, and phtest
or
together with testing functions from the lmtest
and car
packages. All of
these typically allow passing the vcov
or vcov.
parameter either as a
matrix or as a function, e.g., for Wald–type testing: argument vcov.
to
coeftest()
, argument vcov
to waldtest()
and other methods in the
lmtest package; and argument vcov.
to
linearHypothesis()
in the car package (see the
examples), see (see also Zeileis 2004), 4.1-2, and examples below.
An object of class "matrix"
containing the estimate of
the covariance matrix of coefficients.
Giovanni Millo
Beck N, Katz JN (1995). “What to do (and not to do) with time-series cross-section data.” American Political Science Review, 89(03), 634–647.
Cribari–Neto F (2004). “Asymptotic Inference Under Heteroskedasticity of Unknown Form.” Computational Statistics & Data Analysis, 45, 215–233.
Greene WH (2003). Econometric Analysis, 5th edition. Prentice Hall.
MacKinnon JG, White H (1985). “Some Heteroskedasticity–Consistent Covariance Matrix Estimators With Improved Finite Sample Properties.” Journal of Econometrics, 29, 305–325.
Zeileis A (2004). “Econometric Computing With HC and HAC Covariance Matrix Estimators.” Journal of Statistical Software, 11(10), 1–17. https://www.jstatsoft.org/article/view/v011i10.
sandwich::vcovHC()
from the sandwich
package for weighting schemes (type
argument).
data("Produc", package="plm") zz <- plm(log(gsp)~log(pcap)+log(pc)+log(emp)+unemp, data=Produc, model="random") summary(zz, vcov = vcovBK) summary(zz, vcov = function(x) vcovBK(x, type="HC1")) ## standard coefficient significance test library(lmtest) coeftest(zz) ## robust significance test, cluster by group ## (robust vs. serial correlation), default arguments coeftest(zz, vcov.=vcovBK) ## idem with parameters, pass vcov as a function argument coeftest(zz, vcov.=function(x) vcovBK(x, type="HC1")) ## idem, cluster by time period ## (robust vs. cross-sectional correlation) coeftest(zz, vcov.=function(x) vcovBK(x, type="HC1", cluster="time")) ## idem with parameters, pass vcov as a matrix argument coeftest(zz, vcov.=vcovBK(zz, type="HC1")) ## joint restriction test waldtest(zz, update(zz, .~.-log(emp)-unemp), vcov=vcovBK) ## Not run: ## test of hyp.: 2*log(pc)=log(emp) library(car) linearHypothesis(zz, "2*log(pc)=log(emp)", vcov.=vcovBK) ## End(Not run)
data("Produc", package="plm") zz <- plm(log(gsp)~log(pcap)+log(pc)+log(emp)+unemp, data=Produc, model="random") summary(zz, vcov = vcovBK) summary(zz, vcov = function(x) vcovBK(x, type="HC1")) ## standard coefficient significance test library(lmtest) coeftest(zz) ## robust significance test, cluster by group ## (robust vs. serial correlation), default arguments coeftest(zz, vcov.=vcovBK) ## idem with parameters, pass vcov as a function argument coeftest(zz, vcov.=function(x) vcovBK(x, type="HC1")) ## idem, cluster by time period ## (robust vs. cross-sectional correlation) coeftest(zz, vcov.=function(x) vcovBK(x, type="HC1", cluster="time")) ## idem with parameters, pass vcov as a matrix argument coeftest(zz, vcov.=vcovBK(zz, type="HC1")) ## joint restriction test waldtest(zz, update(zz, .~.-log(emp)-unemp), vcov=vcovBK) ## Not run: ## test of hyp.: 2*log(pc)=log(emp) library(car) linearHypothesis(zz, "2*log(pc)=log(emp)", vcov.=vcovBK) ## End(Not run)
High-level convenience wrapper for double-clustering robust covariance matrix estimators a la Thompson (2011) and Cameron et al. (2011) for panel models.
vcovDC(x, ...) ## S3 method for class 'plm' vcovDC(x, type = c("HC0", "sss", "HC1", "HC2", "HC3", "HC4"), ...)
vcovDC(x, ...) ## S3 method for class 'plm' vcovDC(x, type = c("HC0", "sss", "HC1", "HC2", "HC3", "HC4"), ...)
x |
an object of class |
... |
further arguments |
type |
the weighting scheme used, one of |
vcovDC
is a function for estimating a robust covariance matrix of
parameters for a panel model with errors clustering along both dimensions.
The function is a convenience wrapper simply summing a group- and a
time-clustered covariance matrix and subtracting a diagonal one a la
White.
Weighting schemes specified by type
are analogous to those in
sandwich::vcovHC()
in package sandwich and are
justified theoretically (although in the context of the standard
linear model) by MacKinnon and White (1985) and
Cribari–Neto (2004) (see Zeileis 2004).
The main use of vcovDC
(and the other variance-covariance estimators
provided in the package vcovHC
, vcovBK
, vcovNW
, vcovSCC
) is to pass
it to plm's own functions like summary
, pwaldtest
, and phtest
or
together with testing functions from the lmtest
and car
packages. All of
these typically allow passing the vcov
or vcov.
parameter either as a
matrix or as a function, e.g., for Wald–type testing: argument vcov.
to
coeftest()
, argument vcov
to waldtest()
and other methods in the
lmtest package; and argument vcov.
to
linearHypothesis()
in the car package (see the
examples), see (see also Zeileis 2004), 4.1-2, and examples below.
An object of class "matrix"
containing the estimate of
the covariance matrix of coefficients.
Giovanni Millo
Cameron AC, Gelbach JB, Miller DL (2011). “Robust inference with multiway clustering.” Journal of Business & Economic Statistics, 29(2).
Cribari–Neto F (2004). “Asymptotic Inference Under Heteroskedasticity of Unknown Form.” Computational Statistics & Data Analysis, 45, 215–233.
MacKinnon JG, White H (1985). “Some Heteroskedasticity–Consistent Covariance Matrix Estimators With Improved Finite Sample Properties.” Journal of Econometrics, 29, 305–325.
Thompson SB (2011). “Simple formulas for standard errors that cluster by both firm and time.” Journal of Financial Economics, 99(1), 1–10.
Zeileis A (2004). “Econometric Computing With HC and HAC Covariance Matrix Estimators.” Journal of Statistical Software, 11(10), 1–17. https://www.jstatsoft.org/article/view/v011i10.
sandwich::vcovHC()
from the sandwich
package for weighting schemes (type
argument).
data("Produc", package="plm") zz <- plm(log(gsp)~log(pcap)+log(pc)+log(emp)+unemp, data=Produc, model="pooling") ## as function input to plm's summary method (with and without additional arguments): summary(zz, vcov = vcovDC) summary(zz, vcov = function(x) vcovDC(x, type="HC1", maxlag=4)) ## standard coefficient significance test library(lmtest) coeftest(zz) ## DC robust significance test, default coeftest(zz, vcov.=vcovDC) ## idem with parameters, pass vcov as a function argument coeftest(zz, vcov.=function(x) vcovDC(x, type="HC1", maxlag=4)) ## joint restriction test waldtest(zz, update(zz, .~.-log(emp)-unemp), vcov=vcovDC) ## Not run: ## test of hyp.: 2*log(pc)=log(emp) library(car) linearHypothesis(zz, "2*log(pc)=log(emp)", vcov.=vcovDC) ## End(Not run)
data("Produc", package="plm") zz <- plm(log(gsp)~log(pcap)+log(pc)+log(emp)+unemp, data=Produc, model="pooling") ## as function input to plm's summary method (with and without additional arguments): summary(zz, vcov = vcovDC) summary(zz, vcov = function(x) vcovDC(x, type="HC1", maxlag=4)) ## standard coefficient significance test library(lmtest) coeftest(zz) ## DC robust significance test, default coeftest(zz, vcov.=vcovDC) ## idem with parameters, pass vcov as a function argument coeftest(zz, vcov.=function(x) vcovDC(x, type="HC1", maxlag=4)) ## joint restriction test waldtest(zz, update(zz, .~.-log(emp)-unemp), vcov=vcovDC) ## Not run: ## test of hyp.: 2*log(pc)=log(emp) library(car) linearHypothesis(zz, "2*log(pc)=log(emp)", vcov.=vcovDC) ## End(Not run)
Generic Lego building block for robust covariance matrix estimators of the vcovXX kind for panel models.
vcovG(x, ...) ## S3 method for class 'plm' vcovG( x, type = c("HC0", "sss", "HC1", "HC2", "HC3", "HC4"), cluster = c("group", "time"), l = 0, inner = c("cluster", "white", "diagavg"), ... ) ## S3 method for class 'pcce' vcovG( x, type = c("HC0", "sss", "HC1", "HC2", "HC3", "HC4"), cluster = c("group", "time"), l = 0, inner = c("cluster", "white", "diagavg"), ... )
vcovG(x, ...) ## S3 method for class 'plm' vcovG( x, type = c("HC0", "sss", "HC1", "HC2", "HC3", "HC4"), cluster = c("group", "time"), l = 0, inner = c("cluster", "white", "diagavg"), ... ) ## S3 method for class 'pcce' vcovG( x, type = c("HC0", "sss", "HC1", "HC2", "HC3", "HC4"), cluster = c("group", "time"), l = 0, inner = c("cluster", "white", "diagavg"), ... )
x |
an object of class |
... |
further arguments |
type |
the weighting scheme used, one of |
cluster |
one of |
l |
lagging order, defaulting to zero |
inner |
the function to be applied to the residuals inside the
sandwich: one of |
vcovG
is the generic building block for use by higher–level
wrappers vcovHC()
, vcovSCC()
, vcovDC()
, and vcovNW()
. The
main use of vcovG
is to be used internally by the former, but it
is made available in the user space for use in non–standard
combinations. For more documentation, see see wrapper functions
mentioned.
An object of class "matrix"
containing the estimate
of the covariance matrix of coefficients.
Giovanni Millo
Millo G (2017). “Robust standard error estimators for panel models: A unifying approach.” Journal of Statistical Software, 82(3), 1–27.
vcovHC()
, vcovSCC()
,
vcovDC()
, vcovNW()
, and
vcovBK()
albeit the latter does not make use of
vcovG.
data("Produc", package="plm") zz <- plm(log(gsp)~log(pcap)+log(pc)+log(emp)+unemp, data=Produc, model="pooling") ## reproduce Arellano's covariance matrix vcovG(zz, cluster="group", inner="cluster", l=0) ## define custom covariance function ## (in this example, same as vcovHC) myvcov <- function(x) vcovG(x, cluster="group", inner="cluster", l=0) summary(zz, vcov = myvcov) ## use in coefficient significance test library(lmtest) ## robust significance test coeftest(zz, vcov. = myvcov)
data("Produc", package="plm") zz <- plm(log(gsp)~log(pcap)+log(pc)+log(emp)+unemp, data=Produc, model="pooling") ## reproduce Arellano's covariance matrix vcovG(zz, cluster="group", inner="cluster", l=0) ## define custom covariance function ## (in this example, same as vcovHC) myvcov <- function(x) vcovG(x, cluster="group", inner="cluster", l=0) summary(zz, vcov = myvcov) ## use in coefficient significance test library(lmtest) ## robust significance test coeftest(zz, vcov. = myvcov)
Robust covariance matrix estimators a la White for panel models.
## S3 method for class 'plm' vcovHC( x, method = c("arellano", "white1", "white2"), type = c("HC0", "sss", "HC1", "HC2", "HC3", "HC4"), cluster = c("group", "time"), ... ) ## S3 method for class 'pcce' vcovHC( x, method = c("arellano", "white1", "white2"), type = c("HC0", "sss", "HC1", "HC2", "HC3", "HC4"), cluster = c("group", "time"), ... ) ## S3 method for class 'pgmm' vcovHC(x, ...)
## S3 method for class 'plm' vcovHC( x, method = c("arellano", "white1", "white2"), type = c("HC0", "sss", "HC1", "HC2", "HC3", "HC4"), cluster = c("group", "time"), ... ) ## S3 method for class 'pcce' vcovHC( x, method = c("arellano", "white1", "white2"), type = c("HC0", "sss", "HC1", "HC2", "HC3", "HC4"), cluster = c("group", "time"), ... ) ## S3 method for class 'pgmm' vcovHC(x, ...)
x |
an object of class |
method |
one of |
type |
the weighting scheme used, one of |
cluster |
one of |
... |
further arguments. |
vcovHC
is a function for estimating a robust covariance matrix of
parameters for a fixed effects or random effects panel model
according to the White method
(White 1980, 1984; Arellano 1987). Observations may be
clustered by "group"
("time"
) to account for serial
(cross-sectional) correlation.
All types assume no intragroup (serial) correlation between errors
and allow for heteroskedasticity across groups (time periods). As
for the error covariance matrix of every single group of
observations, "white1"
allows for general heteroskedasticity but
no serial (cross–sectional) correlation; "white2"
is "white1"
restricted to a common variance inside every group (time period)
(see Greene 2003, Sec. 13.7.1-2, Greene 2012, Sec. 11.6.1-2
and Wooldridge 2002, Sec. 10.7.2); "arellano"
(see
ibid. and the original ref. Arellano 1987) allows a fully general
structure w.r.t. heteroskedasticity and serial (cross–sectional)
correlation.
Weighting schemes specified by type
are analogous to those in
sandwich::vcovHC()
in package sandwich and are
justified theoretically (although in the context of the standard
linear model) by MacKinnon and White (1985) and
Cribari–Neto (2004)
(Zeileis 2004). type = "sss"
employs the small sample
correction as used by Stata.
The main use of vcovHC
(and the other variance-covariance estimators
provided in the package vcovBK
, vcovNW
, vcovDC
, vcovSCC
) is to pass
it to plm's own functions like summary
, pwaldtest
, and phtest
or
together with testing functions from the lmtest
and car
packages. All of
these typically allow passing the vcov
or vcov.
parameter either as a
matrix or as a function, e.g., for Wald–type testing: argument vcov.
to
coeftest()
, argument vcov
to waldtest()
and other methods in the
lmtest package; and argument vcov.
to
linearHypothesis()
in the car package (see the
examples), see (see also Zeileis 2004), 4.1-2, and examples below.
A method for pgmm
objects, vcovHC.pgmm
, is also provided and gives the robust
variance-covariances matrix, in case of a two-steps panel GMM model with the
small-sample correction proposed by Windmeijer (2005).
An object of class "matrix"
containing the estimate of
the asymptotic covariance matrix of coefficients.
The function pvcovHC
is deprecated. Use vcovHC
for the
same functionality.
Giovanni Millo & Yves Croissant
Arellano M (1987). “Computing Robust Standard Errors for Within-groups Estimators.” Oxford bulletin of Economics and Statistics, 49(4), 431–434.
Cribari–Neto F (2004). “Asymptotic Inference Under Heteroskedasticity of Unknown Form.” Computational Statistics & Data Analysis, 45, 215–233.
Greene WH (2003). Econometric Analysis, 5th edition. Prentice Hall.
Greene WH (2012). Econometric Analysis, 7th edition. Prentice Hall.
MacKinnon JG, White H (1985). “Some Heteroskedasticity–Consistent Covariance Matrix Estimators With Improved Finite Sample Properties.” Journal of Econometrics, 29, 305–325.
Windmeijer F (2005). “A Finite Sample Correction for the Variance of Linear Efficient Two–Steps GMM Estimators.” Journal of Econometrics, 126, 25–51.
White H (1984). Asymptotic Theory for Econometricians. New York: Academic press. chap. 6
White H (1980). “A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity.” Econometrica, 48(4), 817–838.
Wooldridge JM (2002). Econometric Analysis of Cross–Section and Panel Data. MIT Press.
Zeileis A (2004). “Econometric Computing With HC and HAC Covariance Matrix Estimators.” Journal of Statistical Software, 11(10), 1–17. https://www.jstatsoft.org/article/view/v011i10.
sandwich::vcovHC()
from the sandwich
package for weighting schemes (type
argument).
data("Produc", package = "plm") zz <- plm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, data = Produc, model = "random") ## as function input to plm's summary method (with and without additional arguments): summary(zz, vcov = vcovHC) summary(zz, vcov = function(x) vcovHC(x, method="arellano", type="HC1")) ## standard coefficient significance test library(lmtest) coeftest(zz) ## robust significance test, cluster by group ## (robust vs. serial correlation) coeftest(zz, vcov.=vcovHC) ## idem with parameters, pass vcov as a function argument coeftest(zz, vcov.=function(x) vcovHC(x, method="arellano", type="HC1")) ## idem, cluster by time period ## (robust vs. cross-sectional correlation) coeftest(zz, vcov.=function(x) vcovHC(x, method="arellano", type="HC1", cluster="group")) ## idem with parameters, pass vcov as a matrix argument coeftest(zz, vcov.=vcovHC(zz, method="arellano", type="HC1")) ## joint restriction test waldtest(zz, update(zz, .~.-log(emp)-unemp), vcov=vcovHC) ## Not run: ## test of hyp.: 2*log(pc)=log(emp) library(car) linearHypothesis(zz, "2*log(pc)=log(emp)", vcov.=vcovHC) ## End(Not run) ## Robust inference for CCE models data("Produc", package = "plm") ccepmod <- pcce(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, data = Produc, model="p") summary(ccepmod, vcov = vcovHC) ## Robust inference for GMM models data("EmplUK", package="plm") ar <- pgmm(log(emp) ~ lag(log(emp), 1:2) + lag(log(wage), 0:1) + log(capital) + lag(log(capital), 2) + log(output) + lag(log(output),2) | lag(log(emp), 2:99), data = EmplUK, effect = "twoways", model = "twosteps") rv <- vcovHC(ar) mtest(ar, order = 2, vcov = rv)
data("Produc", package = "plm") zz <- plm(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, data = Produc, model = "random") ## as function input to plm's summary method (with and without additional arguments): summary(zz, vcov = vcovHC) summary(zz, vcov = function(x) vcovHC(x, method="arellano", type="HC1")) ## standard coefficient significance test library(lmtest) coeftest(zz) ## robust significance test, cluster by group ## (robust vs. serial correlation) coeftest(zz, vcov.=vcovHC) ## idem with parameters, pass vcov as a function argument coeftest(zz, vcov.=function(x) vcovHC(x, method="arellano", type="HC1")) ## idem, cluster by time period ## (robust vs. cross-sectional correlation) coeftest(zz, vcov.=function(x) vcovHC(x, method="arellano", type="HC1", cluster="group")) ## idem with parameters, pass vcov as a matrix argument coeftest(zz, vcov.=vcovHC(zz, method="arellano", type="HC1")) ## joint restriction test waldtest(zz, update(zz, .~.-log(emp)-unemp), vcov=vcovHC) ## Not run: ## test of hyp.: 2*log(pc)=log(emp) library(car) linearHypothesis(zz, "2*log(pc)=log(emp)", vcov.=vcovHC) ## End(Not run) ## Robust inference for CCE models data("Produc", package = "plm") ccepmod <- pcce(log(gsp) ~ log(pcap) + log(pc) + log(emp) + unemp, data = Produc, model="p") summary(ccepmod, vcov = vcovHC) ## Robust inference for GMM models data("EmplUK", package="plm") ar <- pgmm(log(emp) ~ lag(log(emp), 1:2) + lag(log(wage), 0:1) + log(capital) + lag(log(capital), 2) + log(output) + lag(log(output),2) | lag(log(emp), 2:99), data = EmplUK, effect = "twoways", model = "twosteps") rv <- vcovHC(ar) mtest(ar, order = 2, vcov = rv)
Nonparametric robust covariance matrix estimators a la Newey and West for panel models with serial correlation.
vcovNW(x, ...) ## S3 method for class 'plm' vcovNW( x, type = c("HC0", "sss", "HC1", "HC2", "HC3", "HC4"), maxlag = NULL, wj = function(j, maxlag) 1 - j/(maxlag + 1), ... ) ## S3 method for class 'pcce' vcovNW( x, type = c("HC0", "sss", "HC1", "HC2", "HC3", "HC4"), maxlag = NULL, wj = function(j, maxlag) 1 - j/(maxlag + 1), ... )
vcovNW(x, ...) ## S3 method for class 'plm' vcovNW( x, type = c("HC0", "sss", "HC1", "HC2", "HC3", "HC4"), maxlag = NULL, wj = function(j, maxlag) 1 - j/(maxlag + 1), ... ) ## S3 method for class 'pcce' vcovNW( x, type = c("HC0", "sss", "HC1", "HC2", "HC3", "HC4"), maxlag = NULL, wj = function(j, maxlag) 1 - j/(maxlag + 1), ... )
x |
an object of class |
... |
further arguments |
type |
the weighting scheme used, one of |
maxlag |
either |
wj |
weighting function to be applied to lagged terms, |
vcovNW
is a function for estimating a robust covariance matrix of
parameters for a panel model according to the
Newey and West (1987) method. The function works
as a restriction of the Driscoll and Kraay (1998) covariance (see
vcovSCC()
) to no cross–sectional correlation.
Weighting schemes specified by type
are analogous to those in
sandwich::vcovHC()
in package sandwich and are
justified theoretically (although in the context of the standard
linear model) by MacKinnon and White (1985) and
Cribari–Neto (2004) (see Zeileis 2004).
The main use of vcovNW
(and the other variance-covariance estimators
provided in the package vcovHC
, vcovBK
, vcovDC
, vcovSCC
) is to pass
it to plm's own functions like summary
, pwaldtest
, and phtest
or
together with testing functions from the lmtest
and car
packages. All of
these typically allow passing the vcov
or vcov.
parameter either as a
matrix or as a function, e.g., for Wald–type testing: argument vcov.
to
coeftest()
, argument vcov
to waldtest()
and other methods in the
lmtest package; and argument vcov.
to
linearHypothesis()
in the car package (see the
examples), see (see also Zeileis 2004), 4.1-2, and examples below.
An object of class "matrix"
containing the estimate of
the covariance matrix of coefficients.
Giovanni Millo
Cribari–Neto F (2004). “Asymptotic Inference Under Heteroskedasticity of Unknown Form.” Computational Statistics & Data Analysis, 45, 215–233.
Driscoll JC, Kraay AC (1998). “Consistent covariance matrix estimation with spatially dependent panel data.” Review of economics and statistics, 80(4), 549–560.
MacKinnon JG, White H (1985). “Some Heteroskedasticity–Consistent Covariance Matrix Estimators With Improved Finite Sample Properties.” Journal of Econometrics, 29, 305–325.
Newey WK, West KD (1987). “A Simple, Positive Semi-definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix.” Econometrica, 55(3), 703–08.
Zeileis A (2004). “Econometric Computing With HC and HAC Covariance Matrix Estimators.” Journal of Statistical Software, 11(10), 1–17. https://www.jstatsoft.org/article/view/v011i10.
sandwich::vcovHC()
from the sandwich package
for weighting schemes (type
argument).
data("Produc", package="plm") zz <- plm(log(gsp)~log(pcap)+log(pc)+log(emp)+unemp, data=Produc, model="pooling") ## as function input to plm's summary method (with and without additional arguments): summary(zz, vcov = vcovNW) summary(zz, vcov = function(x) vcovNW(x, method="arellano", type="HC1")) ## standard coefficient significance test library(lmtest) coeftest(zz) ## NW robust significance test, default coeftest(zz, vcov.=vcovNW) ## idem with parameters, pass vcov as a function argument coeftest(zz, vcov.=function(x) vcovNW(x, type="HC1", maxlag=4)) ## joint restriction test waldtest(zz, update(zz, .~.-log(emp)-unemp), vcov=vcovNW) ## Not run: ## test of hyp.: 2*log(pc)=log(emp) library(car) linearHypothesis(zz, "2*log(pc)=log(emp)", vcov.=vcovNW) ## End(Not run)
data("Produc", package="plm") zz <- plm(log(gsp)~log(pcap)+log(pc)+log(emp)+unemp, data=Produc, model="pooling") ## as function input to plm's summary method (with and without additional arguments): summary(zz, vcov = vcovNW) summary(zz, vcov = function(x) vcovNW(x, method="arellano", type="HC1")) ## standard coefficient significance test library(lmtest) coeftest(zz) ## NW robust significance test, default coeftest(zz, vcov.=vcovNW) ## idem with parameters, pass vcov as a function argument coeftest(zz, vcov.=function(x) vcovNW(x, type="HC1", maxlag=4)) ## joint restriction test waldtest(zz, update(zz, .~.-log(emp)-unemp), vcov=vcovNW) ## Not run: ## test of hyp.: 2*log(pc)=log(emp) library(car) linearHypothesis(zz, "2*log(pc)=log(emp)", vcov.=vcovNW) ## End(Not run)
Nonparametric robust covariance matrix estimators a la Driscoll and Kraay for panel models with cross-sectional and serial correlation.
vcovSCC(x, ...) ## S3 method for class 'plm' vcovSCC( x, type = c("HC0", "sss", "HC1", "HC2", "HC3", "HC4"), cluster = "time", maxlag = NULL, inner = c("cluster", "white", "diagavg"), wj = function(j, maxlag) 1 - j/(maxlag + 1), ... ) ## S3 method for class 'pcce' vcovSCC( x, type = c("HC0", "sss", "HC1", "HC2", "HC3", "HC4"), cluster = "time", maxlag = NULL, inner = c("cluster", "white", "diagavg"), wj = function(j, maxlag) 1 - j/(maxlag + 1), ... )
vcovSCC(x, ...) ## S3 method for class 'plm' vcovSCC( x, type = c("HC0", "sss", "HC1", "HC2", "HC3", "HC4"), cluster = "time", maxlag = NULL, inner = c("cluster", "white", "diagavg"), wj = function(j, maxlag) 1 - j/(maxlag + 1), ... ) ## S3 method for class 'pcce' vcovSCC( x, type = c("HC0", "sss", "HC1", "HC2", "HC3", "HC4"), cluster = "time", maxlag = NULL, inner = c("cluster", "white", "diagavg"), wj = function(j, maxlag) 1 - j/(maxlag + 1), ... )
x |
an object of class |
... |
further arguments |
type |
the weighting scheme used, one of |
cluster |
switch for vcovG; set at |
maxlag |
either |
inner |
the function to be applied to the residuals inside the
sandwich: |
wj |
weighting function to be applied to lagged terms, |
vcovSCC
is a function for estimating a robust covariance matrix
of parameters for a panel model according to the
Driscoll and Kraay (1998) method, which is consistent
with cross–sectional and serial correlation in a T-asymptotic
setting and irrespective of the N dimension. The use with random
effects models is undocumented.
Weighting schemes specified by type
are analogous to those in
sandwich::vcovHC()
in package sandwich and are
justified theoretically (although in the context of the standard
linear model) by MacKinnon and White (1985) and
Cribari–Neto (2004) (see Zeileis 2004)).
The main use of vcovSCC
(and the other variance-covariance estimators
provided in the package vcovHC
, vcovBK
, vcovNW
, vcovDC
) is to pass
it to plm's own functions like summary
, pwaldtest
, and phtest
or
together with testing functions from the lmtest
and car
packages. All of
these typically allow passing the vcov
or vcov.
parameter either as a
matrix or as a function, e.g., for Wald–type testing: argument vcov.
to
coeftest()
, argument vcov
to waldtest()
and other methods in the
lmtest package; and argument vcov.
to
linearHypothesis()
in the car package (see the
examples), (see also Zeileis 2004), 4.1-2, and examples below.
An object of class "matrix"
containing the estimate of
the covariance matrix of coefficients.
Giovanni Millo, partially ported from Daniel Hoechle's (2007) Stata code
Cribari–Neto F (2004). “Asymptotic Inference Under Heteroskedasticity of Unknown Form.” Computational Statistics & Data Analysis, 45, 215–233.
Driscoll JC, Kraay AC (1998). “Consistent covariance matrix estimation with spatially dependent panel data.” Review of economics and statistics, 80(4), 549–560.
Hoechle D (2007). “Robust standard errors for panel regressions with cross-sectional dependence.” Stata Journal, 7(3), 281-312. https://ideas.repec.org/a/tsj/stataj/v7y2007i3p281-312.html.
MacKinnon JG, White H (1985). “Some Heteroskedasticity–Consistent Covariance Matrix Estimators With Improved Finite Sample Properties.” Journal of Econometrics, 29, 305–325.
Zeileis A (2004). “Econometric Computing With HC and HAC Covariance Matrix Estimators.” Journal of Statistical Software, 11(10), 1–17. https://www.jstatsoft.org/article/view/v011i10.
sandwich::vcovHC()
from the sandwich
package for weighting schemes (type
argument).
data("Produc", package="plm") zz <- plm(log(gsp)~log(pcap)+log(pc)+log(emp)+unemp, data=Produc, model="pooling") ## as function input to plm's summary method (with and without additional arguments): summary(zz, vcov = vcovSCC) summary(zz, vcov = function(x) vcovSCC(x, method="arellano", type="HC1")) ## standard coefficient significance test library(lmtest) coeftest(zz) ## SCC robust significance test, default coeftest(zz, vcov.=vcovSCC) ## idem with parameters, pass vcov as a function argument coeftest(zz, vcov.=function(x) vcovSCC(x, type="HC1", maxlag=4)) ## joint restriction test waldtest(zz, update(zz, .~.-log(emp)-unemp), vcov=vcovSCC) ## Not run: ## test of hyp.: 2*log(pc)=log(emp) library(car) linearHypothesis(zz, "2*log(pc)=log(emp)", vcov.=vcovSCC) ## End(Not run)
data("Produc", package="plm") zz <- plm(log(gsp)~log(pcap)+log(pc)+log(emp)+unemp, data=Produc, model="pooling") ## as function input to plm's summary method (with and without additional arguments): summary(zz, vcov = vcovSCC) summary(zz, vcov = function(x) vcovSCC(x, method="arellano", type="HC1")) ## standard coefficient significance test library(lmtest) coeftest(zz) ## SCC robust significance test, default coeftest(zz, vcov.=vcovSCC) ## idem with parameters, pass vcov as a function argument coeftest(zz, vcov.=function(x) vcovSCC(x, type="HC1", maxlag=4)) ## joint restriction test waldtest(zz, update(zz, .~.-log(emp)-unemp), vcov=vcovSCC) ## Not run: ## test of hyp.: 2*log(pc)=log(emp) library(car) linearHypothesis(zz, "2*log(pc)=log(emp)", vcov.=vcovSCC) ## End(Not run)
A panel of 595 individuals from 1976 to 1982, taken from the Panel Study of
Income Dynamics (PSID).
The data are organized as a stacked time
series/balanced panel, see Examples on how to convert to a
pdata.frame
.
A data frame containing:
years of full-time work experience.
weeks worked.
blue collar?
works in a manufacturing industry?
resides in the south?
resides in a standard metropolitan statistical area?
married?
a factor with levels "male"
and "female"
individual's wage set by a union contract?
years of education.
is the individual black?
logarithm of wage.
total number of observations : 4165
observation : individuals
country : United States
Online complements to Baltagi (2001):
https://www.wiley.com/legacy/wileychi/baltagi/
Online complements to Baltagi (2013):
https://bcs.wiley.com/he-bcs/Books?action=resource&bcsId=4338&itemId=1118672321&resourceId=13452
Baltagi BH (2001). Econometric Analysis of Panel Data, 3rd edition. John Wiley and Sons ltd.
Baltagi BH (2013). Econometric Analysis of Panel Data, 5th edition. John Wiley and Sons ltd.
Cornwell C, Rupert P (1988). “Efficient Estimation With Panel Data: an Empirical Comparison of Instrumental Variables Estimators.” Journal of Applied Econometrics, 3, 149–155.
# data set 'Wages' is organized as a stacked time series/balanced panel data("Wages", package = "plm") Wag <- pdata.frame(Wages, index=595)
# data set 'Wages' is organized as a stacked time series/balanced panel data("Wages", package = "plm") Wag <- pdata.frame(Wages, index=595)
This function gives an overall intercept for within models and its accompanying standard error or a within model with the overall intercept
within_intercept(object, ...) ## S3 method for class 'plm' within_intercept(object, vcov = NULL, return.model = FALSE, ...)
within_intercept(object, ...) ## S3 method for class 'plm' within_intercept(object, vcov = NULL, return.model = FALSE, ...)
object |
object of class |
... |
further arguments (currently none). |
vcov |
if not |
return.model |
a logical to indicate whether only the overall intercept
( |
The (somewhat artificial) intercept for within models (fixed effects models) was made popular by Stata of StataCorp (see Gould 2013), EViews of IHS, and gretl (see Cottrell and Lucchetti 2021, p. 200-201, listing 23.1), see for treatment in the literature, e.g., Greene (2012), Ch. 11.4.4, p. 364. It can be considered an overall intercept in the within model framework and is the weighted mean of fixed effects (see Examples for the relationship).
within_intercept
estimates a new model which is
computationally more demanding than just taking the weighted
mean. However, with within_intercept
one also gets the
associated standard error and it is possible to get an overall
intercept for two-way fixed effect models.
Users can set argument vcov
to a function to calculate a
specific (robust) variance–covariance matrix and get the
respective (robust) standard error for the overall intercept,
e.g., the function vcovHC()
, see examples for
usage. Note: The argument vcov
must be a function, not a
matrix, because the model to calculate the overall intercept for
the within model is different from the within model itself.
If argument return.model = TRUE
is set, the full model object is returned,
while in the default case only the intercept is returned.
Depending on argument return.model
: If FALSE
(default), a named
numeric
of length one: The overall intercept for the estimated within model
along attribute "se" which contains the standard error for the intercept.
If return.model = TRUE
, the full model object, a within model with the
overall intercept (NB: the model identifies itself as a pooling model, e.g.,
in summary()).
Kevin Tappe
Cottrell A, Lucchetti R (2021).
“Gretl User’s Guide.”
https://gretl.sourceforge.net/.
Gould W (2013).
“How can there be an intercept in the fixed-effects model estimated by xtreg, fe?”
https://www.stata.com/support/faqs/statistics/intercept-in-fixed-effects-model/.
Greene WH (2012).
Econometric Analysis, 7th edition.
Prentice Hall.
fixef()
to extract the fixed effects of a within model.
data("Hedonic", package = "plm") mod_fe <- plm(mv ~ age + crim, data = Hedonic, index = "townid") overallint <- within_intercept(mod_fe) attr(overallint, "se") # standard error # overall intercept is the weighted mean of fixed effects in the # one-way case weighted.mean(fixef(mod_fe), pdim(mod_fe)$Tint$Ti) ### relationship of type="dmean", "level" and within_intercept ## one-way balanced case data("Grunfeld", package = "plm") gi <- plm(inv ~ value + capital, data = Grunfeld, model = "within") fx_level <- fixef(gi, type = "level") fx_dmean <- fixef(gi, type = "dmean") overallint <- within_intercept(gi) all.equal(overallint + fx_dmean, fx_level, check.attributes = FALSE) # TRUE ## two-ways unbalanced case gtw_u <- plm(inv ~ value + capital, data = Grunfeld[-200, ], effect = "twoways") int_tw_u <- within_intercept(gtw_u) fx_dmean_tw_i_u <- fixef(gtw_u, type = "dmean", effect = "individual")[index(gtw_u)[[1L]]] fx_dmean_tw_t_u <- fixef(gtw_u, type = "dmean", effect = "time")[index(gtw_u)[[2L]]] fx_level_tw_u <- as.numeric(fixef(gtw_u, "twoways", "level")) fx_level_tw_u2 <- int_tw_u + fx_dmean_tw_i_u + fx_dmean_tw_t_u all.equal(fx_level_tw_u, fx_level_tw_u2, check.attributes = FALSE) # TRUE ## overall intercept with robust standard error within_intercept(gi, vcov = function(x) vcovHC(x, method="arellano", type="HC0")) ## have a model returned mod_fe_int <- within_intercept(gi, return.model = TRUE) summary(mod_fe_int) # replicates Stata's robust standard errors exactly as model is with intercept summary(mod_fe_int, vcov = function(x) vcovHC(x, type = "sss"))
data("Hedonic", package = "plm") mod_fe <- plm(mv ~ age + crim, data = Hedonic, index = "townid") overallint <- within_intercept(mod_fe) attr(overallint, "se") # standard error # overall intercept is the weighted mean of fixed effects in the # one-way case weighted.mean(fixef(mod_fe), pdim(mod_fe)$Tint$Ti) ### relationship of type="dmean", "level" and within_intercept ## one-way balanced case data("Grunfeld", package = "plm") gi <- plm(inv ~ value + capital, data = Grunfeld, model = "within") fx_level <- fixef(gi, type = "level") fx_dmean <- fixef(gi, type = "dmean") overallint <- within_intercept(gi) all.equal(overallint + fx_dmean, fx_level, check.attributes = FALSE) # TRUE ## two-ways unbalanced case gtw_u <- plm(inv ~ value + capital, data = Grunfeld[-200, ], effect = "twoways") int_tw_u <- within_intercept(gtw_u) fx_dmean_tw_i_u <- fixef(gtw_u, type = "dmean", effect = "individual")[index(gtw_u)[[1L]]] fx_dmean_tw_t_u <- fixef(gtw_u, type = "dmean", effect = "time")[index(gtw_u)[[2L]]] fx_level_tw_u <- as.numeric(fixef(gtw_u, "twoways", "level")) fx_level_tw_u2 <- int_tw_u + fx_dmean_tw_i_u + fx_dmean_tw_t_u all.equal(fx_level_tw_u, fx_level_tw_u2, check.attributes = FALSE) # TRUE ## overall intercept with robust standard error within_intercept(gi, vcov = function(x) vcovHC(x, method="arellano", type="HC0")) ## have a model returned mod_fe_int <- within_intercept(gi, return.model = TRUE) summary(mod_fe_int) # replicates Stata's robust standard errors exactly as model is with intercept summary(mod_fe_int, vcov = function(x) vcovHC(x, type = "sss"))