Correlation structures

Correlation structures

Before fitting the model, you must specify the correlation between observations (a CorrStructure). It determines the calculation of covariance matrices. The default is always Heteroscedastic, i.e. independent but not identically distributed observations.

All constructors accept the Boolean keyword corrected (omitted in the following), which defaults to true. If true, a finite-sample adjustment is applied to the covariance matrix. The adjustment factor is n / (n - 1), where n is the number of clusters for clustered data and the number of observations otherwise.

Four subtypes are currently available: Homoscedastic, Heteroscedastic, Clustered and CrossCorrelated.

Homoscedastic

Homoscedastic(; expected::Bool = false)

Observations are independent and identically distributed. The optional keyword argument expected controls the estimation of the covariance matrix of maximum-likelihood estimators: false uses the observed information matrix, whereas "true" uses the outer product of the gradient. Only linear and maximum-likelihood estimators support homoscedastic errors.

Heteroscedastic

Heteroscedastic()

Observations are independent, but they may differ in distribution. This structure leads to sandwich covariance matrices (a.k.a. Huber-Eicker-White).

Clustered

Clustered(DF::DataFrame, cluster::Symbol)

Observations are independent across clusters, but they may differ in their joint distribution within clusters. cluster specifies the column of the DataFrame to cluster on.

CrossCorrelated

This structure accommodates other correlation structures. The first argument determines the precise pattern.

Two-way clustering

CrossCorrelated("Two-way clustering", DF::DataFrame, c₁::Symbol, c₂::Symbol)

if two observations share any cluster, they may be arbitrarily correlated.

Correlation across time

CrossCorrelated("Time",
        DF::DataFrame,
        time::Symbol,
        bandwidth::Real,
        kernel::Function = parzen
    )

The maximum possible correlation between two observations declines with the time difference between them. The actual correlation is arbitrary below that limit. (See Conley (1999).) The bandwidth and the kernel function control the upper bound. time specifies the column of DF that contains the date of each observation (of type Date).

The following kernels are predefined for convenience: Bartlett (bartlett), Parzen (parzen), Truncated (truncated) and Tukey-Hanning (tukeyhanning). See Andrews (1991) for formulae.

Warning

The resulting covariance matrices differ from the Newey-West estimator, which assumes independence across units (though observations for the same unit may correlate across time).

Correlation across space

CrossCorrelated("Space",
        DF::DataFrame,
        latitude::Symbol,
        longitude::Symbol,
        bandwidth::Real,
        kernel::Function = parzen
    )

The maximum possible correlation between two observations declines with the spatial distance between them. The actual correlation is arbitrary below that limit. (See Conley (1999).) The bandwidth and the kernel function control the upper bound. latitude and longitude specify the columns of DF that contain the coordinates of each observation in decimal degrees (of type Float64).

The following kernels are predefined for convenience: Bartlett (bartlett), Parzen (parzen), Truncated (truncated) and Tukey-Hanning (tukeyhanning). See Andrews (1991) for formulae.

Correlation across time and space

CrossCorrelated("Time and space",
        DF::DataFrame,
        time::Symbol,
        bandwidth_time::Real,
        latitude::Symbol,
        longitude::Symbol,
        bandwidth_space::Real,
        kernel::Function = parzen
    )

The maximum possible correlation between two observations declines with the time difference and the spatial distance between them. The actual correlation is arbitrary below that limit. (See Conley (1999).) The bandwidths and the kernel function control the upper bound. time specifies the column of DF that contains the date of each observation. latitude and longitude specify the columns of DF that contain the coordinates of each observation in decimal degrees (Float64).

The following kernels are predefined for convenience: Bartlett (bartlett), Parzen (parzen), Truncated (truncated) and Tukey-Hanning (tukeyhanning). See Andrews (1991) for formulae.