Title: | Indexed Data Frames |
---|---|
Description: | Provides extended data frames, with a special data frame column which contains two indexes, with potentially a nesting structure. |
Authors: | Yves Croissant [aut, cre] |
Maintainer: | Yves Croissant |
License: | GPL (>=2) |
Version: | 0.1-0 |
Built: | 2024-11-20 05:05:46 UTC |
Source: | https://github.com/ycroissant/dfidx |
Data frames for which observations are defined by two (potentially nested) indexes and for which series therefore have a natural tabular representation
dfidx( data, idx = NULL, drop.index = TRUE, as.factor = NULL, pkg = NULL, fancy.row.names = FALSE, subset = NULL, idnames = NULL, shape = c("long", "wide"), choice = NULL, varying = NULL, sep = ".", opposite = NULL, levels = NULL, ranked = FALSE, name, position, ... )
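Argument | Description |
---|---|
data | a data frame |
idx | the indexes: if NULL (the default), the first two columns; otherwise a character vector or a list of two names (possibly named, to define nesting variables) or, for balanced data, the name of the first index or the number of its distinct values |
drop.index | if TRUE (the default), the index series are removed from the data frame as stand-alone columns |
as.factor | should the indexes be coerced to factors? |
pkg | if set, the resulting dfidx object gets an additional, package-specific class, so that other packages can write specific methods |
fancy.row.names | if TRUE, fancy row names are computed |
subset | a logical which defines a subset of rows to return |
idnames | the names of the indexes |
shape | either "long" (the default) or "wide" |
choice | the choice |
varying, sep | relevant for data sets in wide format, these arguments are passed to reshape |
opposite | return the opposite of the series |
levels | the levels for the second index |
ranked | a boolean for ranked data |
name | name of the idx column |
position | position of the idx column |
... | further arguments |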
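Since varying and sep are handed over to reshape, the base R sketch below shows what these two arguments do on a tiny, made-up wide data set (the column names and values are assumptions for illustration, not taken from munnell_wide): each time-varying column name such as gsp_1970 is split at sep into a series name and a value of the second index. It only illustrates reshape itself, not the internals of dfidx.

```r
# A tiny, made-up data set in wide format: one row per state and one
# column per (series, year) pair, the two parts separated by "_"
wide <- data.frame(state      = c("AL", "AZ"),
                   gsp_1970   = c(10.5, 9.8),
                   gsp_1971   = c(11.2, 10.1),
                   unemp_1970 = c(4.2, 6.0),
                   unemp_1971 = c(5.1, 5.5))

# `varying` lists the time-varying columns (the dfidx example passes their
# positions, 3:36); `sep` tells reshape to split "gsp_1970" into the
# series "gsp" and the period 1970
long <- reshape(wide, direction = "long",
                varying = names(wide)[-1], sep = "_",
                idvar = "state", timevar = "year")

# Long format: one row per (state, year) pair, one column per series
long[order(long$state, long$year), c("state", "year", "gsp", "unemp")]
```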
Indexes are stored as a data.frame column in the resulting dfidx object.
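A data frame column is an ordinary base R feature. The sketch below, with made-up values, builds one by hand to illustrate the storage idea described above; it is not how dfidx builds its idx column internally, which may differ.

```r
# A plain data frame with two series (made-up values)
df <- data.frame(gsp   = c(10.5, 11.2, 9.8, 10.1),
                 unemp = c(4.2, 5.1, 6.0, 5.5))

# Store the two indexes together in a single column which is itself
# a data frame
df$idx <- data.frame(state = c("AL", "AL", "AZ", "AZ"),
                     year  = c(1970L, 1971L, 1970L, 1971L))

ncol(df)        # 3: gsp, unemp and the data frame column idx
names(df$idx)   # "state" "year"
# Selecting rows also selects the matching rows of the index column
df[df$unemp > 5, ]$idx
```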
an object of class "dfidx"
Yves Croissant
# the first two columns contain the indexes
mn <- dfidx(munnell)
# explicitly indicate the two indexes using either a vector or a
# list of two characters
mn <- dfidx(munnell, idx = c("state", "year"))
mn <- dfidx(munnell, idx = list("state", "year"))
# rename one or both indexes
mn <- dfidx(munnell, idnames = c(NA, "period"))
# for balanced data (with observations ordered by the first, then
# by the second index), either use the name of the first index
mn <- dfidx(munnell, idx = "state", idnames = c("state", "year"))
# or an integer equal to the number of distinct values of the first index
mn <- dfidx(munnell, idx = 48, idnames = c("state", "year"))
# Indicate the values of the second index using the levels argument
mn <- dfidx(munnell, idx = 48, idnames = c("state", "year"), levels = 1970:1986)
# Nesting structure for both indexes
mn <- dfidx(munnell, idx = c(region = "state", president = "year"))
# Data in wide format
mn <- dfidx(munnell_wide, idx = c(region = "state"), varying = 3:36, sep = "_", idnames = c(NA, "year"))
# Customize the name and the position of the `idx` column
#dfidx(munnell, position = 3, name = "index")
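For balanced data ordered by the first index and then by the second one, the number of distinct values of the first index is enough to rebuild both indexes, which is why idx = 48 works for munnell (48 states times the 17 years 1970:1986). The base R sketch below illustrates that arithmetic with a hypothetical helper, balanced_index(), which is not part of dfidx and not necessarily how the package does it.

```r
# Hypothetical helper (not part of dfidx): rebuild two indexes for a
# balanced data set of `n` rows, ordered by the first index and then by
# the second one, given the number `n1` of distinct values of the first
balanced_index <- function(n, n1, levels = NULL) {
  if (n %% n1 != 0) stop("the data are not balanced")
  n2 <- n %/% n1                        # number of values of the second index
  if (is.null(levels)) levels <- seq_len(n2)
  if (length(levels) != n2) stop("`levels` must have ", n2, " elements")
  data.frame(id1 = rep(seq_len(n1), each = n2),   # 1 1 ... 1 2 2 ...
             id2 = rep(levels, times = n1))       # 1970 ... 1986 1970 ...
}

# 48 states observed over 17 years (1970 to 1986), i.e. 816 rows
ids <- balanced_index(816, 48, levels = 1970:1986)
head(ids, 3)
```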
Methods of dplyr verbs for dfidx objects. The default methods don't work because most of these functions return either a tibble or a data.frame, but not a dfidx.
## S3 method for class 'dfidx'
arrange(.data, ...)
## S3 method for class 'dfidx'
filter(.data, ...)
## S3 method for class 'dfidx'
slice(.data, ...)
## S3 method for class 'dfidx'
mutate(.data, ...)
## S3 method for class 'dfidx'
transmute(.data, ...)
## S3 method for class 'dfidx'
select(.data, ...)
Argument | Description |
---|---|
.data | a dfidx object |
... | further arguments |
These methods always keep the data frame column that contains the indexes and return a dfidx object.
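A quick check of this documented behaviour, assuming the dfidx and dplyr packages are installed (the requireNamespace() guard makes the snippet a no-op otherwise). Calling the verbs as dplyr::filter() and dplyr::select() avoids a clash with stats::filter() when dplyr is not attached; S3 dispatch still reaches the dfidx methods.

```r
# Assumes dfidx and dplyr are installed; otherwise nothing is run
if (requireNamespace("dfidx", quietly = TRUE) &&
    requireNamespace("dplyr", quietly = TRUE)) {
  library(dfidx)
  mn <- dfidx(munnell)
  # dplyr:: avoids stats::filter when dplyr is not attached; S3 dispatch
  # still selects the dfidx methods
  high <- dplyr::filter(mn, unemp > 10)
  sub  <- dplyr::select(mn, gsp, unemp)
  # both results should still be dfidx objects carrying the idx column
  print(inherits(high, "dfidx"))
  print(inherits(sub, "dfidx"))
}
```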
an object of class "dfidx"
Yves Croissant
library("dplyr")
mn <- dfidx(munnell)
select(mn, - gsp, - water)
mutate(mn, lgsp = log(gsp), lgsp2 = lgsp ^ 2)
transmute(mn, lgsp = log(gsp), lgsp2 = lgsp ^ 2)
arrange(mn, desc(unemp), labor)
filter(mn, unemp > 10)
pull(mn, gsp)
slice(mn, c(1:2, 5:7))
The index of a dfidx is a data.frame containing the different series which define the two indexes (possibly with a nesting structure). It is stored as a "sticky" data.frame column of the data.frame and is also inherited by the series (of class "xseries") extracted from a dfidx.
idx(x, n = NULL, m = NULL)
## S3 method for class 'dfidx'
idx(x, n = NULL, m = NULL)
## S3 method for class 'idx'
idx(x, n = NULL, m = NULL)
## S3 method for class 'xseries'
idx(x, n = NULL, m = NULL)
## S3 method for class 'idx'
format(x, size = 4, ...)
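Argument | Description |
---|---|
x | a dfidx, idx or xseries object |
n, m | n selects the index (1 or 2, ignoring the nesting variables) and m, if greater than 1, one of its nesting variables |
size | the number of characters of the indexes for the format method |
... | further arguments (for now unused) |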
idx is a generic with methods for dfidx, idx and xseries objects.
A data.frame containing the indexes, or a series if a specific index is selected.
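The sketch below, assuming dfidx is installed (the snippet does nothing otherwise), shows idx() applied to a dfidx with and without selecting a single index, and to a series extracted from it, using the nested indexes of the example further down.

```r
# Assumes dfidx is installed; otherwise nothing is run
if (requireNamespace("dfidx", quietly = TRUE)) {
  library(dfidx)
  mn <- dfidx(munnell, idx = c(region = "state", president = "year"))
  ids     <- idx(mn)      # all the indexes, one row per observation
  first   <- idx(mn, 1)   # a single index, returned as a series
  gsp     <- mn$gsp       # an extracted series (class xseries) ...
  gsp_ids <- idx(gsp)     # ... carries the same index
  print(ids, n = 3)
  print(head(first, 3))
}
```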
Yves Croissant
mn <- dfidx(munnell, idx = c(region = "state", president = "year"))
idx(mn)
gsp <- mn$gsp
idx(gsp)
# get the first index
idx(mn, 1)
# get the nesting variable of the first index
idx(mn, 1, 2)
This function extracts the names of the indexes or the name of a specific index (or, if n is NULL, the position of the idx column).
idx_name(x, n = 1, m = NULL)
## S3 method for class 'dfidx'
idx_name(x, n = NULL, m = NULL)
## S3 method for class 'idx'
idx_name(x, n = NULL, m = NULL)
## S3 method for class 'xseries'
idx_name(x, n = NULL, m = NULL)
Argument | Description |
---|---|
x | a dfidx, idx or xseries object |
n | the index to be extracted (1 or 2, ignoring the nesting variables) |
m | if > 1, a nesting variable |
If n is NULL, a named integer which gives the position of the idx column in the dfidx object; otherwise, a character of length 1.
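A short check of the two kinds of return value, assuming dfidx is installed (the snippet does nothing otherwise).

```r
# Assumes dfidx is installed; otherwise nothing is run
if (requireNamespace("dfidx", quietly = TRUE)) {
  library(dfidx)
  mn <- dfidx(munnell, idx = c(region = "state", president = "year"))
  pos   <- idx_name(mn)     # n = NULL: named position of the idx column
  first <- idx_name(mn, 1)  # a single name
  print(pos)
  print(first)
}
```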
Yves Croissant
mn <- dfidx(munnell, idx = c(region = "state", president = "year"))
# get the position of the idx column
idx_name(mn)
# get the name of the first index
idx_name(mn, 1)
# get the name of the second index
idx_name(mn, 2)
# get the name of the nesting variable for the second index
idx_name(mn, 2, 2)
A dfidx is a data.frame with a "sticky" data.frame column which contains the indexes. Specific methods are provided for the functions that extract rows and/or columns of a data.frame.
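In a plain data.frame, nothing makes such a column sticky. The base R sketch below (made-up values) shows that selecting columns drops the index column and that extracting a series returns a bare vector; this is the behaviour the dfidx methods listed here are designed to avoid.

```r
# Base R only: a data frame with a data frame column called idx
df <- data.frame(gsp = c(10.5, 11.2), unemp = c(4.2, 5.1))
df$idx <- data.frame(state = c("AL", "AL"), year = c(1970L, 1971L))

# Column selection with the data.frame method keeps only the requested
# columns, so the index column is lost ...
names(df["gsp"])     # "gsp"
# ... and an extracted series is a bare numeric vector, with no index
attributes(df$gsp)   # NULL
```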
## S3 method for class 'dfidx'
x[i, j, drop]
## S3 method for class 'dfidx'
as.data.frame(x, row.names = NULL, optional = FALSE, ...)
## S3 method for class 'dfidx'
print(x, ..., n = 10L)
## S3 method for class 'dfidx'
head(x, n = 10L, ...)
## S3 method for class 'dfidx'
x[[y]]
## S3 method for class 'dfidx'
x$y
## S3 replacement method for class 'dfidx'
object$y <- value
## S3 replacement method for class 'dfidx'
object[[y]] <- value
## S3 method for class 'xseries'
print(x, ..., n = 10L)
## S3 method for class 'idx'
print(x, ..., n = 10L)
## S3 method for class 'dfidx'
mean(x, ...)
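Argument | Description |
---|---|
x, object | a dfidx object (or an xseries or idx object for the corresponding print methods) |
i | the row index |
j | the column index |
drop | if TRUE, the result is returned as a vector when a single column is selected |
row.names, optional | arguments of the as.data.frame generic |
... | further arguments |
n | the number of rows for the print method |
y | the name or the position of the series one wishes to extract |
value | the value for the replacement method |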
as.data.frame and mean return a data.frame; [[ and $ return a vector (of class xseries); [ returns either a dfidx or a vector; $<- and [[<- modify the values of an existing column or create a new column of a dfidx object; print is called for its side effect.
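The sketch below, assuming dfidx is installed (the snippet does nothing otherwise), checks a few of these return values on munnell; it only restates what is documented above.

```r
# Assumes dfidx is installed; otherwise nothing is run
if (requireNamespace("dfidx", quietly = TRUE)) {
  library(dfidx)
  mn <- dfidx(munnell)
  two <- mn[c("gsp", "unemp")]         # `[`: a dfidx, with its index
  gsp <- mn$gsp                        # `$`: a series of class xseries
  mn$lgsp <- log(as.numeric(mn$gsp))   # `$<-`: creates a new column
  plain <- as.data.frame(mn)           # a data.frame
  print(c(inherits(two, "dfidx"), inherits(gsp, "xseries"),
          "lgsp" %in% names(mn)))
}
```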
Yves Croissant
mn <- dfidx(munnell)
# extract a series (returned as an xseries object)
mn$gsp
# or
mn[["gsp"]]
# extract a subset of series (returned as a dfidx object)
mn[c("gsp", "unemp")]
# extract a subset of rows and columns
mn[mn$unemp > 10, c("utilities", "water")]
# dfidx, idx and xseries objects have print methods with, like tibbles,
# an n argument
print(mn, n = 3)
print(idx(mn), n = 3)
print(mn$gsp, n = 3)
# a dfidx object can be coerced to a data.frame
head(as.data.frame(mn))
Specific model.frame and model.matrix methods are provided for dfidx objects. This leads to an unusual order of arguments: the first two arguments of the model.frame method are a dfidx and a formula, and the only main argument of the model.matrix method is a dfidx, which should be the result of a call to the model.frame method, i.e., it should have a terms attribute.
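A sketch of the resulting call order, assuming dfidx is installed (the snippet does nothing otherwise): the dfidx comes first and the formula second, and the model frame, which should carry a terms attribute, is then passed to model.matrix().

```r
# Assumes dfidx is installed; otherwise nothing is run
if (requireNamespace("dfidx", quietly = TRUE)) {
  library(dfidx)
  mn <- dfidx(munnell)
  # the dfidx comes first and the formula second
  mf <- model.frame(mn, gsp ~ privatecap | publiccap + utilities | unemp + labor)
  # model.matrix() expects a model frame carrying a terms attribute
  print(!is.null(attr(mf, "terms")))
  X <- model.matrix(mf, rhs = 1)
  print(dim(X))
}
```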
## S3 method for class 'dfidx'
model.frame(
  formula,
  data = NULL,
  ...,
  lhs = NULL,
  rhs = NULL,
  dot = "previous",
  alt.subset = NULL,
  reflevel = NULL,
  balanced = FALSE
)
## S3 method for class 'dfidx'
model.matrix(object, ..., lhs = NULL, rhs = 1, dot = "separate")
## S3 method for class 'dfidx_matrix'
print(x, ..., n = 10L)
Argument | Description |
---|---|
formula | a dfidx object |
data | a formula |
..., lhs, rhs, dot | see the model.frame and model.matrix methods of the Formula package |
alt.subset | a subset of levels for the second index |
reflevel | a user-defined first level for the second index |
balanced | a boolean indicating whether the resulting data.frame has to be balanced |
object | a dfidx object |
x | a model matrix |
n | the number of lines to print |
A dfidx object for the model.frame method and a matrix for the model.matrix method.
Yves Croissant
mn <- dfidx(munnell)
mf <- model.frame(mn, gsp ~ privatecap | publiccap + utilities | unemp + labor)
model.matrix(mf, rhs = 1)
model.matrix(mf, rhs = 2)
model.matrix(mf, rhs = 1:3)
unfold_idx takes a dfidx, includes the indexes as stand-alone columns, removes the idx column and returns a data.frame with an ids attribute that contains the information about the indexes. fold_idx performs the opposite operation.
unfold_idx(x)
fold_idx(x, pkg = NULL)
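Argument | Description |
---|---|
x | a dfidx object for unfold_idx, a data.frame (as returned by unfold_idx) for fold_idx |
pkg | if not NULL, plays the same role as the pkg argument of dfidx |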
A data.frame for the unfold_idx function and a dfidx object for the fold_idx function.
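A round trip between the two representations, assuming dfidx is installed (the snippet does nothing otherwise); the ids attribute of the unfolded data.frame carries the information about the indexes.

```r
# Assumes dfidx is installed; otherwise nothing is run
if (requireNamespace("dfidx", quietly = TRUE)) {
  library(dfidx)
  mn  <- dfidx(munnell, idx = c(region = "state", "year"))
  mn2 <- unfold_idx(mn)     # a data.frame, indexes as ordinary columns
  str(attr(mn2, "ids"))     # information about the indexes
  mn3 <- fold_idx(mn2)      # back to a dfidx
  print(c(is.data.frame(mn2), inherits(mn3, "dfidx")))
}
```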
Yves Croissant
mn <- dfidx(munnell, idx = c(region = "state", "year"), position = 3, name = "index")
mn2 <- unfold_idx(mn)
attr(mn2, "ids")
mn3 <- fold_idx(mn2)
identical(mn, mn3)