--- title: Model components for fitted models with plm author: - name: Yves Croissant date: '`r Sys.Date()`' output: rmarkdown::html_vignette bibliography: ../inst/REFERENCES.bib vignette: > %\VignetteIndexEntry{Model components for fitted models with plm} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, echo=FALSE} library("knitr") opts_chunk$set(message = FALSE, warning = FALSE) ``` plm tries to follow as close as possible the way models are fitted using `lm`. This relies on the following steps, using the `formula`-`data` with some modifications: - compute internally the `model.frame` by getting the relevant arguments (`formula`, `data`, `subset`, `weights`, `na.action` and `offset`) and the supplementary argument, - extract from the `model.frame` the response `y` (with `pmodel.response`) and the model matrix `X` (with `model.matrix`), - call the (non-exported) estimation function `plm.fit` with `X` and `y` as arguments. Panel data has a special structure which is described by an `index` argument. This argument can be used in the `pdata.frame` function which returns a `pdata.frame` object. A `pdata.frame` can be used as input to the `data` argument of `plm`. If the `data` argument of `plm` is an ordinary `data.frame`, the `index` argument can also be supplied as an argument of `plm`. In this case, the `pdata.frame` function is called internally to transform the data. Next, the `formula`, which is the first and mandatory argument of `plm` is coerced to a `Formula` object. `model.frame` is then called, but with the `data` argument in the first position (a `pdata.frame` object) and the `formula` in the second position. This unusual order of the arguments enables to use a specific `model.frame.pdata.frame` method defined in `plm`. As for the `model.frame.formula` method, a `data.frame` is returned, with a `terms` attribute. Next, the `X` matrix is extracted using `model.matrix`. The usual way to do so is to feed the function with two arguments, a `formula` or a `terms` object and a `data.frame` created with `model.frame`. `lm` uses something like `model.matrix(terms(mf), mf)` where `mf` is a `data.frame` created with `model.frame`. Therefore, `model.matrix` needs actually one argument and not two and we therefore wrote a `model.matrix.pdata.frame` which does the job ; the method first checks that the argument has a `term` attribute, extracts the `terms` (actually the `formula`) and then computes the model's matrix `X`. The response `y` is usually extracted using `model.response`, with a `data.frame` created with `model.frame` as first argument, but it is not generic. We therefore created a generic called `pmodel.response` and provide a `pmodel.response.pdata.frame` method. We illustrate these features using a simplified (in terms of covariates) example with the `SeatBelt` data set: ```{r } library("plm") data("SeatBelt", package = "pder") SeatBelt$occfat <- with(SeatBelt, log(farsocc / (vmtrural + vmturban))) pSB <- pdata.frame(SeatBelt) ``` We start with an OLS (pooling) specification: ```{r } formols <- occfat ~ log(usage) + log(percapin) mfols <- model.frame(pSB, formols) Xols <- model.matrix(mfols) y <- pmodel.response(mfols) coef(lm.fit(Xols, y)) ``` which is equivalent to: ```{r } coef(plm(formols, SeatBelt, model = "pooling")) ``` Next, we use an instrumental variables specification. Variable `usage` is endogenous and instrumented by three variables indicating the law context: `ds`, `dp`, and `dsp`. The model is described using a two-parts formula, the first part of the RHS describing the covariates and the second part the instruments. The following two formulations can be used: ```{r } formiv1 <- occfat ~ log(usage) + log(percapin) | log(percapin) + ds + dp + dsp formiv2 <- occfat ~ log(usage) + log(percapin) | . - log(usage) + ds + dp + dsp ``` The second formulation has two advantages: - in the common case when a lot of covariates are instruments, these covariates don't need to be indicated in the second RHS part of the formula, - the endogenous variables clearly appear as they are proceeded by a `-` sign in the second RHS part of the formula. The formula is coerced to a `Formula`, using the `Formula` package. `model.matrix.pdata.frame` then internally calls `model.matrix.Formula` in order to extract the covariates and instruments model matrices: ```{r } mfSB1 <- model.frame(pSB, formiv1) X1 <- model.matrix(mfSB1, rhs = 1) W1 <- model.matrix(mfSB1, rhs = 2) head(X1, 3) ; head(W1, 3) ``` For the second (and preferred formulation), the `dot` argument should be set and is passed to the `Formula` methods. `.` has actually two meanings: - all available covariates, - the previous covariates used while updating a formula. which correspond respectively to `dot = "seperate"` (the default) and `dot = "previous"`. See the difference between the following two examples: ```{r } library("Formula") head(model.frame(Formula(formiv2), SeatBelt), 3) head(model.frame(Formula(formiv2), SeatBelt, dot = "previous"), 3) ``` In the first case, all the covariates are returned by `model.frame` as the `.` is understood by default as "everything". In `plm`, the `dot` argument is internally set to `previous` so that the end-user doesn't have to worry about these subtleties. ```{r } mfSB2 <- model.frame(pSB, formiv2) X2 <- model.matrix(mfSB2, rhs = 1) W2 <- model.matrix(mfSB2, rhs = 2) head(X2, 3) ; head(W2, 3) ``` The IV estimator can then be obtained as a 2SLS estimator: First, regress the covariates on the instruments and get the fitted values: ```{r } HX1 <- lm.fit(W1, X1)$fitted.values head(HX1, 3) ``` Next, regress the response on these fitted values: ```{r } coef(lm.fit(HX1, y)) ``` The same can be achieved in one command by using the `formula`-`data` interface with `plm`: ```{r } coef(plm(formiv1, SeatBelt, model = "pooling")) ``` or with the `ivreg` function from package `AER` (or with the newer function `ivreg` in package `ivreg` superseding `AER::ivreg()`): ```{r } coef(AER::ivreg(formiv1, data = SeatBelt)) ``` ```{r eval = FALSE, include = FALSE} X2 <- model.matrix(Formula(form1), mfSB, rhs = 2, dot = "previous") formols <- occfat ~ log(usage) + log(percapin) | . - log(usage) + ds + dp + dsp form1 <- occfat ~ log(usage) + log(percapin) + log(unemp) + log(meanage) + log(precentb) + log(precenth) + log(densrur) + log(densurb) + log(viopcap) + log(proppcap) + log(vmtrural) + log(vmturban) + log(fueltax) + lim65 + lim70p + mlda21 + bac08 form2 <- . ~ . | . - log(usage) + ds + dp +dsp jorm1 <- occfat ~ log(usage) + log(percapin) + log(unemp) + log(meanage) + log(precentb) + log(precenth) + log(densrur) + log(densurb) + log(viopcap) + log(proppcap) + log(vmtrural) + log(vmturban) + log(fueltax) + lim65 + lim70p + mlda21 + bac08 | . - log(usage) + ds + dp + dsp jorm2 <- noccfat ~ . | . ```