Title: | Statistical Methodology for Graphical Extreme Value Models |
---|---|
Description: | Statistical methodology for sparse multivariate extreme value models. Methods are provided for exact simulation and statistical inference for multivariate Pareto distributions on graphical structures as described in the paper 'Graphical Models for Extremes' by Engelke and Hitz (2020) <doi:10.1111/rssb.12355>. |
Authors: | Sebastian Engelke [aut, cre], Adrien S. Hitz [aut], Nicola Gnecco [aut], Manuel Hentschel [aut] |
Maintainer: | Sebastian Engelke <[email protected]> |
License: | GPL-3 |
Version: | 0.3.3 |
Built: | 2024-11-14 23:24:41 UTC |
Source: | https://github.com/sebastian-engelke/graphicalextremes |
Checks whether the matrix given is a valid Huesler-Reiss parameter matrix
in the role of ,
, or
, respectively.
checkGamma(Gamma, alert = NULL, tol = get_small_tol(), returnBoolean = FALSE) checkSigmaTheta( M, k, full, matrixName = "Sigma", tol = get_small_tol(), alert = NULL, returnBoolean = FALSE ) checkTheta( Theta, k = NULL, full = FALSE, tol = get_small_tol(), alert = NULL, returnBoolean = FALSE ) checkSigma( Sigma, k = NULL, full = FALSE, tol = get_small_tol(), alert = NULL, returnBoolean = FALSE ) checkMatrix( M, name = c("Gamma", "Sigma", "Theta")[1], k = NULL, full = FALSE, tol = get_small_tol(), alert = NULL, returnBoolean = FALSE ) is_valid_Gamma(M, tol = get_small_tol()) is_valid_Theta(Theta, k = NULL, full = FALSE, tol = get_small_tol()) is_valid_Sigma(Sigma, k = NULL, full = FALSE, tol = get_small_tol())
checkGamma(Gamma, alert = NULL, tol = get_small_tol(), returnBoolean = FALSE) checkSigmaTheta( M, k, full, matrixName = "Sigma", tol = get_small_tol(), alert = NULL, returnBoolean = FALSE ) checkTheta( Theta, k = NULL, full = FALSE, tol = get_small_tol(), alert = NULL, returnBoolean = FALSE ) checkSigma( Sigma, k = NULL, full = FALSE, tol = get_small_tol(), alert = NULL, returnBoolean = FALSE ) checkMatrix( M, name = c("Gamma", "Sigma", "Theta")[1], k = NULL, full = FALSE, tol = get_small_tol(), alert = NULL, returnBoolean = FALSE ) is_valid_Gamma(M, tol = get_small_tol()) is_valid_Theta(Theta, k = NULL, full = FALSE, tol = get_small_tol()) is_valid_Sigma(Sigma, k = NULL, full = FALSE, tol = get_small_tol())
Gamma |
Numeric |
alert |
Passed to |
tol |
Numeric scalar. Values below this are considered as zero, when zeros are required (e.g. row-sums). |
returnBoolean |
Logical scalar, set to |
M |
Numeric matrix, |
k |
|
full |
Logical. If |
matrixName |
Name of the matrix to be used in alerts/error messages. |
Theta |
Numeric |
Sigma |
Numeric |
name |
Name of the input matrix, indicating which other function to call. |
The function is_valid_*
are a wrapper around check*
, with arguments
alert=FALSE
and returnBoolean=TRUE
.
For check*
, the input matrix, passed through ensure_matrix_symmetry_and_truncate_zeros
.
For is_valid_*
, a boolean indicating whether the input is a valid parameter matrix.
Other input validation functions:
check_graph()
,
check_partial_matrix_and_graph()
,
ensure_matrix_symmetry()
and
Transforms between the extremal correlation and the variogram
.
Only valid for Huesler-Reiss distributions.
Done element-wise, no checks of the entire matrix structure are performed.
chi2Gamma(chi) Gamma2chi(Gamma)
chi2Gamma(chi) Gamma2chi(Gamma)
chi |
Numeric vector or matrix with entries between 0 and 1. |
Gamma |
Numeric vector or matrix with non-negative entries. |
The formula for transformation from to
is element-wise
where is the inverse of the standard normal distribution function.
The formula for transformation from to
is element-wise
where is the standard normal distribution function.
Numeric vector or matrix containing the implied .
Numeric vector or matrix containing the implied .
Other parameter matrix transformations:
Gamma2Sigma()
,
Gamma2graph()
,
par2Matrix()
Given a graph
and a (partial) variogram matrix Gamma
, returns a full
variogram matrix that agrees with Gamma
in entries corresponding to edges
of graph
and whose corresponding precision matrix, obtained by
Gamma2Theta()
, has zeros in entries corresponding to non-edges of graph
.
For results on the existence and uniqueness of this completion, see
Hentschel et al. (2022).
complete_Gamma(Gamma, graph = NULL, ...)
complete_Gamma(Gamma, graph = NULL, ...)
Gamma |
Numeric |
graph |
|
... |
Further arguments passed to |
If graph
is decomposable, Gamma
only needs to be specified on
the edges of the graph, other entries are ignored.
If graph
is not decomposable, the graphical completion algorithm requires
a fully specified (but non-graphical) variogram matrix Gamma
to begin with.
If not initial completion is provided, the function edmcr::npf()
can be used to compute one. The package edmcr
might need to be installed
manually from GitHub!
Completed variogram matrix.
Hentschel M, Engelke S, Segers J (2022). “Statistical Inference for Hüsler-Reiss Graphical Models Through Matrix Completions.” doi:10.48550/ARXIV.2210.14292, https://arxiv.org/abs/2210.14292.
Other matrix completion related topics:
complete_Gamma_decomposable()
,
complete_Gamma_general()
,
complete_Gamma_general_demo()
,
complete_Gamma_general_split()
## Block graph: Gamma <- rbind( c(0, .5, NA, NA), c(.5, 0, 1, 1.5), c(NA, 1, 0, .8), c(NA, 1.5, .8, 0) ) complete_Gamma(Gamma) ## Alternative representation of the same completion problem: my_graph <- igraph::graph_from_adjacency_matrix(rbind( c(0, 1, 0, 0), c(1, 0, 1, 1), c(0, 1, 0, 1), c(0, 1, 1, 0) ), mode = "undirected") Gamma_vec <- c(.5, 1, 1.5, .8) complete_Gamma(Gamma_vec, my_graph) ## Decomposable graph: G <- rbind( c(0, 5, 7, 6, NA), c(5, 0, 14, 15, NA), c(7, 14, 0, 5, 5), c(6, 15, 5, 0, 6), c(NA, NA, 5, 6, 0) ) complete_Gamma(G) ## Non-decomposable graph: G <- rbind( c(0, 5, 7, 6, 6), c(5, 0, 14, 15, 13), c(7, 14, 0, 5, 5), c(6, 15, 5, 0, 6), c(6, 13, 5, 6, 0) ) g <- igraph::make_ring(5) complete_Gamma(G, g)
## Block graph: Gamma <- rbind( c(0, .5, NA, NA), c(.5, 0, 1, 1.5), c(NA, 1, 0, .8), c(NA, 1.5, .8, 0) ) complete_Gamma(Gamma) ## Alternative representation of the same completion problem: my_graph <- igraph::graph_from_adjacency_matrix(rbind( c(0, 1, 0, 0), c(1, 0, 1, 1), c(0, 1, 0, 1), c(0, 1, 1, 0) ), mode = "undirected") Gamma_vec <- c(.5, 1, 1.5, .8) complete_Gamma(Gamma_vec, my_graph) ## Decomposable graph: G <- rbind( c(0, 5, 7, 6, NA), c(5, 0, 14, 15, NA), c(7, 14, 0, 5, 5), c(6, 15, 5, 0, 6), c(NA, NA, 5, 6, 0) ) complete_Gamma(G) ## Non-decomposable graph: G <- rbind( c(0, 5, 7, 6, 6), c(5, 0, 14, 15, 13), c(7, 14, 0, 5, 5), c(6, 15, 5, 0, 6), c(6, 13, 5, 6, 0) ) g <- igraph::make_ring(5) complete_Gamma(G, g)
Given a decomposable graph
and incomplete variogram matrix Gamma
,
returns the full Gamma
matrix implied by the conditional independencies.
complete_Gamma_decomposable(Gamma, graph = NULL)
complete_Gamma_decomposable(Gamma, graph = NULL)
Gamma |
A variogram matrix that is specified on the edges of |
graph |
|
A complete variogram matrix that agrees with Gamma
on the entries
corresponding to edges in graph
and the diagonals.
The corresponding matrix produced by
Gamma2Theta()
has zeros
in the remaining entries.
Other matrix completion related topics:
complete_Gamma()
,
complete_Gamma_general()
,
complete_Gamma_general_demo()
,
complete_Gamma_general_split()
Given a non-decomposable graph
, and (non-graphical) variogram matrix Gamma
,
modifies Gamma
in non-edge entries, such that the resulting matrix is a
variogram matrix with graphical structure described by graph
.
complete_Gamma_general( Gamma, graph, N = 10000, tol = get_large_tol(), check_tol = 100 )
complete_Gamma_general( Gamma, graph, N = 10000, tol = get_large_tol(), check_tol = 100 )
Gamma |
Numeric |
graph |
|
N |
Maximum number of iterations. |
tol |
Numeric scalar. Tolerance to be used when completing submatrices. |
check_tol |
Numeric/integer scalar. How often to check the tolerance when completing submatrices. |
A completed variogram matrix.
Other matrix completion related topics:
complete_Gamma()
,
complete_Gamma_decomposable()
,
complete_Gamma_general_demo()
,
complete_Gamma_general_split()
Given a graph
and variogram matrix Gamma
, returns the full Gamma
matrix implied by the conditional independencies.
DEMO VERSION: Returns a lot of details and allows specifying the graph list
that is used. Is way slower than other functions.
complete_Gamma_general_demo(Gamma, graph, N = 1000, tol = 0, gList = NULL)
complete_Gamma_general_demo(Gamma, graph, N = 1000, tol = 0, gList = NULL)
Gamma |
A complete variogram matrix (without any graphical structure). |
graph |
An |
N |
The maximal number of iterations of the algorithm. |
tol |
The tolerance to use when checking for zero entries in |
gList |
A list of graphs to be used instead of the output from |
A nested list, containing the following details.
The "error term" is the maximal absolute value of Theta
in a non-edge entry.
graph , N , tol
|
As in the input |
gList |
As in the input or computed by |
Gamma0 , Theta0 , err0
|
Initial |
iterations |
A nested list, containing the following infos for each performed iteration:
|
Other matrix completion related topics:
complete_Gamma()
,
complete_Gamma_decomposable()
,
complete_Gamma_general()
,
complete_Gamma_general_split()
Given a non-decomposable graph
, and (non-graphical) variogram matrix Gamma
,
modifies Gamma
in non-edge entries, such that the resulting matrix is a
variogram matrix with graphical structure described by graph
.
Does so by splitting graph
at complete separators into smaller subgraphs,
and calling complete_Gamma_general
for each subgraph/submatrix,
using multiple cores if available.
complete_Gamma_general_split( Gamma, graph, N = 10000, sub_tol = get_large_tol() * 0.001, check_tol = 100, mc_cores_overwrite = NULL, final_tol = get_large_tol() )
complete_Gamma_general_split( Gamma, graph, N = 10000, sub_tol = get_large_tol() * 0.001, check_tol = 100, mc_cores_overwrite = NULL, final_tol = get_large_tol() )
Gamma |
Numeric |
graph |
|
N |
Maximum number of iterations. |
sub_tol |
Numeric scalar. Tolerance to be used when completing submatrices.
Should be smaller than |
check_tol |
Numeric/integer scalar. How often to check the tolerance when completing submatrices. |
mc_cores_overwrite |
|
final_tol |
Numeric scalar. Check convergence of the final result with this tolerance. Skipped if this value is < 0. |
A completed variogram matrix.
Other matrix completion related topics:
complete_Gamma()
,
complete_Gamma_decomposable()
,
complete_Gamma_general()
,
complete_Gamma_general_demo()
A dataset containing river discharge data for tributaries of the Danube.
danube
danube
A named list
with four entries
data_clustered
A numeric matrix, containing pre-processed discharge data for each gauging station
data_raw
A numeric matrix, containing daily (raw) discharge data for each gauging station
info
A data frame, containing information about each gauging station
flow_edges
A two-column numeric matrix. Each row contains the indices (in info
)
of a pair of gauging stations that are directly connected by a river.
To obtain the matrix data_clustered
, the daily discharge data from the summer months of
1960 to 2010, given in data_raw
, was declustered, yielding between seven and ten observations per year.
Each row corresponds to one observation from this declustered time series,
the non-unique rownames indicate which year an observation is from.
Each column corresponds to one of the gauging stations,
with column indices in data_raw
/data_clustered
corresponding to row indices in info
.
See (Asadi et al. 2015) for details on the preprocessing and declustering.
info
is a data frame containing the following information for
each of the gauging stations or its corresponding catchment area:
RivNames
Name of the river at the gauging station
Lat
, Long
Coordinates of the gauging station
Lat_Center
, Long_Center
Coordinates of the center of the catchment corresponding to the gauging station
Alt
Mean altitude of the catchment
Area
Area of the catchment corresponding to the gauging station
Slope
Mean slope of the catchment
PlotCoordX
, PlotCoordY
X-Y-coordinates which can be used to arrange the gauging stations when plotting a flow graph.
Bavarian Environmental Agency https://www.gkd.bayern.de.
Asadi P, Davison AC, Engelke S (2015). “Extremes on river networks.” Ann. Appl. Stat., 9(4), 2023 – 2050. doi:10.1214/15-AOAS863.
Other danubeData:
getDanubeFlowGraph()
,
plotDanube()
Other datasets:
flights
dim(danube$data_clustered) colnames(danube$info)
dim(danube$data_clustered) colnames(danube$info)
Transforms the data
matrix empirically to the multivariate Pareto scale.
data2mpareto(data, p, na.rm = FALSE)
data2mpareto(data, p, na.rm = FALSE)
data |
Numeric |
p |
Numeric between 0 and 1. Probability used for the quantile to threshold the data. |
na.rm |
Logical. If rows containing NAs should be removed. |
The columns of the data
matrix are first transformed empirically to
standard Pareto distributions. Then, only the observations where at least
one component exceeds the p
-quantile of the standard Pareto distribution
are kept. Those observations are finally divided by the p
-quantile
of the standard Pareto distribution to standardize them to the multivariate Pareto scale.
If na.rm
is FALSE
, missing entries are left as such during the transformation of univariate marginals.
In the thresholding step, missing values are considered as -Inf
.
Numeric matrix, where
m
is the number
of rows in the original data
matrix that are above the threshold.
Other parameter estimation methods:
emp_chi()
,
emp_chi_multdim()
,
emp_vario()
,
emtp2()
,
fmpareto_HR_MLE()
,
fmpareto_graph_HR()
,
loglik_HR()
Other structure estimation methods:
eglatent()
,
eglearn()
,
emst()
,
fit_graph_to_Theta()
n <- 20 d <- 4 p <- .8 G <- cbind( c(0, 1.5, 1.5, 2), c(1.5, 0, 2, 1.5), c(1.5, 2, 0, 1.5), c(2, 1.5, 1.5, 0) ) set.seed(123) my_data <- rmstable(n, "HR", d = d, par = G) data2mpareto(my_data, p)
n <- 20 d <- 4 p <- .8 G <- cbind( c(0, 1.5, 1.5, 2), c(1.5, 0, 2, 1.5), c(1.5, 2, 0, 1.5), c(2, 1.5, 1.5, 0) ) set.seed(123) my_data <- rmstable(n, "HR", d = d, par = G) data2mpareto(my_data, p)
Following the methodology from Engelke and Taeb (2024), fits an extremal graph structure with latent variables.
eglatent( Gamma, lam1_list = c(0.1, 0.15, 0.19, 0.205), lam2_list = c(2), refit = TRUE, verbose = FALSE )
eglatent( Gamma, lam1_list = c(0.1, 0.15, 0.19, 0.205), lam2_list = c(2), refit = TRUE, verbose = FALSE )
Gamma |
conditionally negative semidefinite matrix. This will be typically the empirical variogram matrix. |
lam1_list |
Numeric vector of non-negative regularization parameters for eglatent.
Default is |
lam2_list |
Numeric vector of non-negative regularization parameters for eglatent.
Default is |
refit |
Logical scalar, if TRUE then the model is refit on the estimated graph to obtain an estimate of the Gamma matrix on that graph.
Default is |
verbose |
Logical scalar, indicating whether to print progress updates. |
The function fits one model for each combination
of values in lam1_list
and lam2_list
. All returned objects
have one entry per model. List consisting of:
#'
graph |
A list of |
rk |
Numeric vector containing the estimated ranks of the latent variables. |
G_est |
A list of numeric estimated |
G_refit |
A list of numeric estimated |
lambdas |
A list containing the values of |
Engelke S, Taeb A (2024). “Extremal graphical modeling with latent variables.” 2403.09604.
Other structure estimation methods:
data2mpareto()
,
eglearn()
,
emst()
,
fit_graph_to_Theta()
Following the methodology from Engelke et al. (2022), fits an extremal graph structure using the neighborhood selection approach (see Meinshausen and Bühlmann (2006)) or graphical lasso (see Friedman et al. (2008)).
eglearn( data, p = NULL, rholist = c(0.1, 0.15, 0.19, 0.205), reg_method = c("ns", "glasso"), complete_Gamma = FALSE )
eglearn( data, p = NULL, rholist = c(0.1, 0.15, 0.19, 0.205), reg_method = c("ns", "glasso"), complete_Gamma = FALSE )
data |
Numeric |
p |
Numeric between 0 and 1 or |
rholist |
Numeric vector of non-negative regularization parameters
for the lasso.
Default is |
reg_method |
One of |
complete_Gamma |
Whether you want to try fto complete Gamma matrix.
Default is |
List made of:
graph |
A list of |
Gamma |
A list of numeric estimated |
rholist |
The list of penalty coefficients. |
graph_ic |
A list of |
Gamma_ic |
A list of numeric |
Engelke S, Lalancette M, Volgushev S (2022).
“Learning extremal graphical structures in high dimensions.”
doi:10.48550/ARXIV.2111.00840, Available from https://arxiv.org/abs/2111.00840., 2111.00840, https://arxiv.org/abs/2111.00840.
Friedman J, Hastie T, Tibshirani R (2008).
“Sparse inverse covariance estimation with the graphical lasso.”
Biostatistics, 9(3), 432–441.
Meinshausen N, Bühlmann P (2006).
“High-dimensional graphs and variable selection with the Lasso.”
Ann. Statist., 34(3), 1436 – 1462.
doi:10.1214/009053606000000281.
Other structure estimation methods:
data2mpareto()
,
eglatent()
,
emst()
,
fit_graph_to_Theta()
set.seed(2) m <- generate_random_model(d=6) y <- rmpareto(n=500, par=m$Gamma) ret <- eglearn(y)
set.seed(2) m <- generate_random_model(d=6) y <- rmpareto(n=500, par=m$Gamma) ret <- eglearn(y)
Estimates empirically the matrix of bivariate extremal correlation coefficients .
emp_chi(data, p = NULL) emp_chi_pairwise(data, p = NULL, verbose = FALSE)
emp_chi(data, p = NULL) emp_chi_pairwise(data, p = NULL, verbose = FALSE)
data |
Numeric |
p |
Numeric scalar between 0 and 1 or |
verbose |
Print verbose progress information |
emp_chi_pairwise
calls emp_chi
for each pair of observations.
This is more robust if the data contains many NA
s, but can take rather long.
Numeric matrix . The matrix contains the
bivariate extremal coefficients
, for
.
Other parameter estimation methods:
data2mpareto()
,
emp_chi_multdim()
,
emp_vario()
,
emtp2()
,
fmpareto_HR_MLE()
,
fmpareto_graph_HR()
,
loglik_HR()
n <- 100 d <- 4 p <- .8 Gamma <- cbind( c(0, 1.5, 1.5, 2), c(1.5, 0, 2, 1.5), c(1.5, 2, 0, 1.5), c(2, 1.5, 1.5, 0) ) set.seed(123) my_data <- rmstable(n, "HR", d = d, par = Gamma) emp_chi(my_data, p)
n <- 100 d <- 4 p <- .8 Gamma <- cbind( c(0, 1.5, 1.5, 2), c(1.5, 0, 2, 1.5), c(1.5, 2, 0, 1.5), c(2, 1.5, 1.5, 0) ) set.seed(123) my_data <- rmstable(n, "HR", d = d, par = Gamma) emp_chi(my_data, p)
Estimates the d
-dimensional extremal correlation coefficient empirically.
emp_chi_multdim(data, p = NULL)
emp_chi_multdim(data, p = NULL)
data |
Numeric |
p |
Numeric scalar between 0 and 1 or |
Numeric scalar. The empirical d
-dimensional extremal correlation coefficient
for the
data
.
Other parameter estimation methods:
data2mpareto()
,
emp_chi()
,
emp_vario()
,
emtp2()
,
fmpareto_HR_MLE()
,
fmpareto_graph_HR()
,
loglik_HR()
n <- 100 d <- 2 p <- .8 G <- cbind( c(0, 1.5), c(1.5, 0) ) set.seed(123) my_data <- rmstable(n, "HR", d = d, par = G) emp_chi_multdim(my_data, p)
n <- 100 d <- 2 p <- .8 G <- cbind( c(0, 1.5), c(1.5, 0) ) set.seed(123) my_data <- rmstable(n, "HR", d = d, par = G) emp_chi_multdim(my_data, p)
of a Huesler-Reiss distributionEstimates the variogram of the Huesler-Reiss distribution empirically.
emp_vario(data, k = NULL, p = NULL) emp_vario_pairwise(data, k = NULL, p = NULL, verbose = FALSE)
emp_vario(data, k = NULL, p = NULL) emp_vario_pairwise(data, k = NULL, p = NULL, verbose = FALSE)
data |
Numeric |
k |
Integer between 1 and |
p |
Numeric between 0 and 1 or |
verbose |
Print verbose progress information |
emp_vario_pairwise
calls emp_vario
for each pair of observations.
This is more robust if the data contains many NA
s, but can take rather long.
Numeric matrix. The estimated variogram of the Huesler-Reiss distribution.
Other parameter estimation methods:
data2mpareto()
,
emp_chi()
,
emp_chi_multdim()
,
emtp2()
,
fmpareto_HR_MLE()
,
fmpareto_graph_HR()
,
loglik_HR()
G <- generate_random_Gamma(d=5) y <- rmpareto(n=100, par=G) Ghat <- emp_vario(y)
G <- generate_random_Gamma(d=5) y <- rmpareto(n=100, par=G) Ghat <- emp_vario(y)
Fits an extremal minimum spanning tree, where the edge weights are:
negative maximized log-likelihoods of the bivariate Huesler-Reiss distributions,
if method = "ML"
. See Engelke and Hitz (2020) for details.
empirical extremal variogram, if method = "vario"
. See Engelke and Volgushev (2022) for details.
empirical extremal correlation, if method = "chi"
. See Engelke and Volgushev (2022) for details.
emst(data, p = NULL, method = c("vario", "ML", "chi"), cens = FALSE)
emst(data, p = NULL, method = c("vario", "ML", "chi"), cens = FALSE)
data |
Numeric |
p |
Numeric between 0 and 1 or |
method |
One of |
cens |
Logical. This argument is considered only if |
List consisting of:
graph |
An |
Gamma |
Numeric |
Engelke S, Hitz AS (2020).
“Graphical models for extremes (with discussion).”
J. R. Stat. Soc. Ser. B Stat. Methodol., 82, 871–932.
Engelke S, Volgushev S (2022).
“Structure learning for extremal tree models.”
J. R. Stat. Soc. Ser. B Stat. Methodol..
doi:10.1111/rssb.12556, Forthcoming, https://rss.onlinelibrary.wiley.com/doi/pdf/10.1111/rssb.12556.
Other structure estimation methods:
data2mpareto()
,
eglatent()
,
eglearn()
,
fit_graph_to_Theta()
## Fitting a 4-dimensional HR minimum spanning tree my_graph <- igraph::graph_from_adjacency_matrix( rbind( c(0, 1, 0, 0), c(1, 0, 1, 1), c(0, 1, 0, 0), c(0, 1, 0, 0) ), mode = "undirected" ) n <- 100 Gamma_vec <- c(.5, 1.4, .8) complete_Gamma(Gamma = Gamma_vec, graph = my_graph) ## full Gamma matrix set.seed(123) my_data <- rmpareto_tree(n, "HR", tree = my_graph, par = Gamma_vec) my_fit <- emst(my_data, p = NULL, method = "ML", cens = FALSE)
## Fitting a 4-dimensional HR minimum spanning tree my_graph <- igraph::graph_from_adjacency_matrix( rbind( c(0, 1, 0, 0), c(1, 0, 1, 1), c(0, 1, 0, 0), c(0, 1, 0, 0) ), mode = "undirected" ) n <- 100 Gamma_vec <- c(.5, 1.4, .8) complete_Gamma(Gamma = Gamma_vec, graph = my_graph) ## full Gamma matrix set.seed(123) my_data <- rmpareto_tree(n, "HR", tree = my_graph, par = Gamma_vec) my_fit <- emst(my_data, p = NULL, method = "ML", cens = FALSE)
This function implements a block descent algorithm to find the maximum of the Gaussian log-likelihood under the constraint that the concentration matrix is a Laplacian matrix. See Röttger et al. (2021) for details.
emtp2(Gamma, tol = 1e-06, verbose = TRUE, initial_point = TRUE)
emtp2(Gamma, tol = 1e-06, verbose = TRUE, initial_point = TRUE)
Gamma |
conditionally negative semidefinite matrix. This will be typically the empirical variogram matrix. |
tol |
The convergence tolerance. The algorithm terminates when the sum of absolute differences between two iterations is below |
verbose |
if TRUE (default) the output will be printed. |
initial_point |
if TRUE (default), the algorithm will construct an initial point before the iteration steps. |
A list consisting of:
G_emtp2 |
The optimal value of the variogram matrix |
it |
The number of iterations |
Röttger F, Engelke S, Zwiernik P (2021). “Total positivity in multivariate extremes.” doi:10.48550/ARXIV.2112.14727, https://arxiv.org/abs/2112.14727.
Other parameter estimation methods:
data2mpareto()
,
emp_chi()
,
emp_chi_multdim()
,
emp_vario()
,
fmpareto_HR_MLE()
,
fmpareto_graph_HR()
,
loglik_HR()
Ensures the symmetry of a square matrix by averaging it with its transpose. Optionally verifies that the matrix was close to symmetric before.
Makes sure zeros are "numerically zero", by truncating all small values.
ensure_matrix_symmetry(M, checkTol = Inf, alert = NULL) truncate_zeros(M, tol = get_small_tol()) ensure_matrix_symmetry_and_truncate_zeros( M, tol = get_small_tol(), checkTol = Inf )
ensure_matrix_symmetry(M, checkTol = Inf, alert = NULL) truncate_zeros(M, tol = get_small_tol()) ensure_matrix_symmetry_and_truncate_zeros( M, tol = get_small_tol(), checkTol = Inf )
M |
Numeric square matrix. |
checkTol |
Positive scalar. If the maximum absolute difference between |
alert |
Passed to |
tol |
All entries with absolute value below this value are truncated to zero. |
The adjusted value of M
.
Other input validation functions:
checkGamma()
,
check_graph()
,
check_partial_matrix_and_graph()
Fits a graph to an empirical Gamma matrix by computing the corresponding
Theta matrix using Gamma2Theta()
and greedily chooses m
edges that
correspond to high absolute values in Theta.
fit_graph_to_Theta(data, m = NULL, Gamma_emp = NULL)
fit_graph_to_Theta(data, m = NULL, Gamma_emp = NULL)
data |
The (standardized) data from which to compute Gamma |
m |
The number of edges to add, defaults to the number of edges in a tree |
Gamma_emp |
The empirical Gamma matrix
(can be |
A list containing an [igraph::graph
] object and a fitted Gamma
matrix
Other structure estimation methods:
data2mpareto()
,
eglatent()
,
eglearn()
,
emst()
Convert a numeric matrix containing flight counts between airports to a data frame containing a list of connections.
flightCountMatrixToConnectionList(nFlightsPerConnection, directed = TRUE)
flightCountMatrixToConnectionList(nFlightsPerConnection, directed = TRUE)
nFlightsPerConnection |
A square, numeric matrix with identical column- and row-names. Each entry represents the number of flights from the airport indexing the row to the airport indexing the column in some arbitrary time period. |
directed |
Logical scalar. Whether flights A->B and B->A should be considered separately. |
A data frame with columns departureAirport
, arrivalAirport
, nFlights
.
Each row represents one connection with >=1 flights in the input matrix.
Other flight data related topics:
flights
,
getFlightDelayData()
,
getFlightGraph()
,
plotFlights()
flightCountMatrixToConnectionList(flights$flightCounts[1:100, 1:100, 1])
flightCountMatrixToConnectionList(flights$flightCounts[1:100, 1:100, 1])
A dataset containing daily total delays of major U.S. airlines. The raw data was obtained from the U.S. Bureau of Transportation Statistics, and pre-processed as described in Hentschel et al. (2022). Note: The CRAN version of this package contains only data from 2010-2013. The full dataset is available in the GitHub version of this package.
flights
flights
A named list
with three entries:
airports
A data.frame
, containing information about US airports
delays
A numeric matrix, containing daily aggregated delays at the airports in the dataset
flightCounts
A numeric array, containing yearly flight numbers between airports in the dataset
flightCounts
is a three-dimensional array, containing the number of flights in the dataset
between each pair of airports, aggregated on a yearly basis.
Each entry is the total number of flights between the departure airport (row)
and destination airport (column) in a given year (dimension 3).
This array does not contain any NA
s, even if an airport did not operate
at all in a given year, which is simply indicated by zeros.
delays
is a three-dimensional array containing daily total positive delays,
in minutes, of incoming and outgoing flights respectively.
Each column corresponds to an airport in the dataset and each row corresponds
to a day. The third dimension has length two, 'arrivals'
containing delays of
incoming flights and 'departures'
containing delays of outgoing flights.
Zeros indicate that there were flights arriving/departing at that airport
on a given day, but none of them had delays. NA
s indicate that there were
no flights arriving/departing at that airport on that day at all.
airports
is a data frame containing the following information about a number of US airports.
Some entries are missing, which is indicated by NA
s.
IATA
3-letter IATA code
Name
name of the airport
City
main city served by the airport
Country
country or territory where the airport is located (mostly "United States"
)
ICAO
4-letter ICAO code
Latitude
latitude of the airport, in decimal degrees
Longitude
longitude of the airport, in decimal degrees
Altitude
altitude of the airport, in feet
Timezone
timezone of the airport, in hours offset from UTC
DST
Daylight savings time used at the airport. 'A'=US/Canada, 'N'=None.
Timezone2
name of the timezone of the airport
Raw delays data:
Fields/Forms used in the raw data:
Airports (includes license information):
Hentschel M, Engelke S, Segers J (2022). “Statistical Inference for Hüsler-Reiss Graphical Models Through Matrix Completions.” doi:10.48550/ARXIV.2210.14292, https://arxiv.org/abs/2210.14292.
Other flight data related topics:
flightCountMatrixToConnectionList()
,
getFlightDelayData()
,
getFlightGraph()
,
plotFlights()
Other datasets:
danube
# Get total number of flights in the dataset: totalFlightCounts <- apply(flights$flightCounts, c(1,2), sum) # Get number of flights for specific years in the dataset: flightCounts_10_11 <- apply(flights$flightCounts[,,c('2010', '2011')], c(1,2), sum) # Get list of connections from 2008: connections_10 <- flightCountMatrixToConnectionList(flights$flightCounts[,,'2010'])
# Get total number of flights in the dataset: totalFlightCounts <- apply(flights$flightCounts, c(1,2), sum) # Get number of flights for specific years in the dataset: flightCounts_10_11 <- apply(flights$flightCounts[,,c('2010', '2011')], c(1,2), sum) # Get list of connections from 2008: connections_10 <- flightCountMatrixToConnectionList(flights$flightCounts[,,'2010'])
Fits the parameter matrix (variogram) of a multivariate Huesler-Reiss Pareto distribution with a given graphical structure, using maximum-likelihood estimation or the empirical variogram.
fmpareto_graph_HR( data, graph, p = NULL, method = c("vario", "ML"), handleCliques = c("average", "full", "sequential"), ... )
fmpareto_graph_HR( data, graph, p = NULL, method = c("vario", "ML"), handleCliques = c("average", "full", "sequential"), ... )
data |
Numeric |
graph |
Undirected, connected [ |
p |
Numeric between 0 and 1 or |
method |
One of |
handleCliques |
How to handle cliques and separators in the graph. See details. |
... |
Arguments passed to |
If handleCliques='average'
, the marginal parameter matrix is estimated for
each maximal clique of the graph
and then combined into a partial parameter
matrix by taking the average of entries from overlapping cliques. Lastly,
the full parameter matrix is computed using complete_Gamma()
.
If handleCliques='full'
, first the full parameter matrix is estimated using the
specified method
and then the non-edge entries are adjusted such that the
final parameter matrix has the graphical structure indicated by graph
.
If handleCliques='sequential'
, graph
must be decomposable, and
method='ML'
must be specified. The parameter matrix is first estimated on
the (recursive) separators and then on the rest of the cliques, keeping
previously estimated entries fixed.
If method='ML'
, the computational cost is mostly influenced by the total size
of the graph (if handleCliques='full'
) or the size of the cliques,
and can already take a significant amount of time for modest dimensions (e.g. d=3
).
The estimated parameter matrix.
Other parameter estimation methods:
data2mpareto()
,
emp_chi()
,
emp_chi_multdim()
,
emp_vario()
,
emtp2()
,
fmpareto_HR_MLE()
,
loglik_HR()
Fits the parameters of a multivariate Huesler-Reiss Pareto distribution using (censored) maximum likelihood estimation.
fmpareto_HR_MLE( data, p = NULL, cens = FALSE, init = NULL, fixParams = integer(0), useTheta = TRUE, maxit = 100, graph = NULL, optMethod = "BFGS", nAttemptsFixInit = 3 )
fmpareto_HR_MLE( data, p = NULL, cens = FALSE, init = NULL, fixParams = integer(0), useTheta = TRUE, maxit = 100, graph = NULL, optMethod = "BFGS", nAttemptsFixInit = 3 )
data |
Numeric |
p |
Numeric scalar between 0 and 1 or |
cens |
Logical scalar. If true, then censored likelihood contributions are used for
components below the threshold. This is computationally expensive and by default |
init |
Numeric vector or numeric matrix. Initial parameter values in the optimization.
If |
fixParams |
Numeric or logical vector. Indices of the parameter vectors that are kept
fixed (identical to |
useTheta |
Logical. Whether to perform the MLE optimization in terms of Theta or Gamma. |
maxit |
Positive integer. The maximum number of iterations in the optimization. |
graph |
Graph object from |
optMethod |
String. A valid optimization method used by the function
stats::optim. By default, |
nAttemptsFixInit |
Numeric. If |
Only the parameters corresponding to edges in graph
are optimized, the remaining
entries are implied by the graphical structure. If graph
is NULL
, the complete graph is used.
The optimization is done either in terms of the variogram (Gamma) or precision matrix (Theta),
depending on the value of useTheta
. If graph
is non-decomposable,
useTheta=TRUE
is significantly faster, otherwise they are similar in performance.
List consisting of:
convergence |
Logical. Indicates whether the optimization converged or not. |
Gamma |
Numeric |
Theta |
Numeric |
par |
Numeric vector. Optimal parameters, including fixed parameters. |
par_opt |
Numeric. Optimal parameters, excluding fixed parameters. |
nllik |
Numeric. Optimal value of the negative log-likelihood function. |
hessian |
Numeric matrix. Estimated Hessian matrix of the estimated parameters. |
Other parameter estimation methods:
data2mpareto()
,
emp_chi()
,
emp_chi_multdim()
,
emp_vario()
,
emtp2()
,
fmpareto_graph_HR()
,
loglik_HR()
Creates a graph object representing the graph structure implied by a parameter matrix.
Gamma2graph(Gamma, tol = get_large_tol(), check = TRUE) Sigma2graph(Sigma, tol = get_large_tol(), k = NULL, full = FALSE, check = TRUE) Theta2graph(Theta, tol = get_large_tol(), k = NULL, full = FALSE, check = TRUE) partialMatrixToGraph(M)
Gamma2graph(Gamma, tol = get_large_tol(), check = TRUE) Sigma2graph(Sigma, tol = get_large_tol(), k = NULL, full = FALSE, check = TRUE) Theta2graph(Theta, tol = get_large_tol(), k = NULL, full = FALSE, check = TRUE) partialMatrixToGraph(M)
Gamma |
Numeric |
tol |
Numeric scalar. Entries in the precision matrix with absolute value smaller than this are considered to be zero. |
check |
Whether to check the inputs and call |
Sigma |
Numeric |
k |
|
full |
Logical. If |
Theta |
Numeric |
M |
Partial matrix with |
An igraph::graph
object.
Other parameter matrix transformations:
Gamma2Sigma()
,
chi2Gamma()
,
par2Matrix()
Converts between different matrices that parametrize the same
Huesler-Reiss distribution:
,
,
,
,
.
The
matrices
and
can also be given/returned
as
matrices with the kth row and column filled with zeros.
Gamma2Sigma(Gamma, k = NULL, full = FALSE, check = TRUE) Gamma2Theta(Gamma, k = NULL, full = FALSE, check = TRUE) Sigma2Gamma(Sigma, k = NULL, full = FALSE, check = TRUE) Theta2Gamma(Theta, k = NULL, full = FALSE, check = TRUE) Sigma2Theta( Sigma, k1 = NULL, k2 = NULL, full1 = FALSE, full2 = FALSE, check = TRUE ) Theta2Sigma( Theta, k1 = NULL, k2 = NULL, full1 = FALSE, full2 = FALSE, check = TRUE ) Theta2Theta( Theta, k1 = NULL, k2 = NULL, full1 = FALSE, full2 = FALSE, check = TRUE ) Sigma2Sigma( Sigma, k1 = NULL, k2 = NULL, full1 = FALSE, full2 = FALSE, check = TRUE ) Gamma2Gamma(Gamma, check = TRUE) matrix2matrix( M, name1 = c("Gamma", "Sigma", "Theta")[1], name2 = c("Gamma", "Sigma", "Theta")[1], k1 = NULL, k2 = NULL, full1 = FALSE, full2 = FALSE, check = TRUE )
Gamma2Sigma(Gamma, k = NULL, full = FALSE, check = TRUE) Gamma2Theta(Gamma, k = NULL, full = FALSE, check = TRUE) Sigma2Gamma(Sigma, k = NULL, full = FALSE, check = TRUE) Theta2Gamma(Theta, k = NULL, full = FALSE, check = TRUE) Sigma2Theta( Sigma, k1 = NULL, k2 = NULL, full1 = FALSE, full2 = FALSE, check = TRUE ) Theta2Sigma( Theta, k1 = NULL, k2 = NULL, full1 = FALSE, full2 = FALSE, check = TRUE ) Theta2Theta( Theta, k1 = NULL, k2 = NULL, full1 = FALSE, full2 = FALSE, check = TRUE ) Sigma2Sigma( Sigma, k1 = NULL, k2 = NULL, full1 = FALSE, full2 = FALSE, check = TRUE ) Gamma2Gamma(Gamma, check = TRUE) matrix2matrix( M, name1 = c("Gamma", "Sigma", "Theta")[1], name2 = c("Gamma", "Sigma", "Theta")[1], k1 = NULL, k2 = NULL, full1 = FALSE, full2 = FALSE, check = TRUE )
Gamma |
Numeric |
k |
|
full |
Logical. If |
check |
Whether to check the inputs and call |
Sigma |
Numeric |
Theta |
Numeric |
k1 |
|
k2 |
|
full1 |
Logical. If |
full2 |
Logical. If |
M |
Numeric matrix, |
name1 |
Name of the input representation. |
name2 |
Name of the output representation. |
If k
, k1
, or k2
is NULL
, the corresponding full*
argument is ignored.
Gamma2Gamma
only checks and returns the input.
matrix2matrix
is a wrapper function that calls the corresponding
conversion function implied by name1
, name2
.
The desired parameter matrix corresponding to the specified inputs.
Other parameter matrix transformations:
Gamma2graph()
,
chi2Gamma()
,
par2Matrix()
Generate random graphs with different structures. These do not follow well-defined distributions and are mostly meant for quickly generating test models.
generate_random_chordal_graph( d, cMin = 2, cMax = 6, sMin = 1, sMax = 4, block_graph = FALSE, ... ) generate_random_connected_graph( d, m = NULL, p = 2/(d + 1), maxTries = 1000, ... ) generate_random_tree(d) generate_random_cactus(d, cMin = 2, cMax = 6)
generate_random_chordal_graph( d, cMin = 2, cMax = 6, sMin = 1, sMax = 4, block_graph = FALSE, ... ) generate_random_connected_graph( d, m = NULL, p = 2/(d + 1), maxTries = 1000, ... ) generate_random_tree(d) generate_random_cactus(d, cMin = 2, cMax = 6)
d |
Number of vertices in the graph |
cMin |
Minimal size of cliques/blocks (last one might be smaller if necessary) |
cMax |
Maximal size of cliques/blocks |
sMin |
Minimal size of separators |
sMax |
Maximal size of separators |
block_graph |
Force |
... |
Ignored, only allowed for compatibility |
m |
Number of edges in the graph (specify this or |
p |
Probability of each edge being in the graph (specify this or |
maxTries |
Maximum number of tries to produce a connected Erdoes-Renyi graph |
generate_random_chordal_graph
generates a random chordal graph by starting with a (small) complete graph
and then adding new cliques until the specified size is reached.
The sizes of cliques and separators can be specified.
generate_random_connected_graph
first tries to generate an Erdoes-Renyi graph, if that fails, falls back
to producing a tree and adding random edges to that tree.
generate_random_cactus
generates a random cactus graph (mostly useful for benchmarking).
An [igraph::graph
] object
Other example generation functions:
generate_random_Gamma()
,
generate_random_graphical_Gamma()
,
generate_random_integer_Gamma()
,
generate_random_model()
,
generate_random_spd_matrix()
Generates a valid Gamma matrix with a given dimension
generate_random_Gamma(d, ...)
generate_random_Gamma(d, ...)
d |
Size of the matrix |
... |
Further arguments passed to |
Other example generation functions:
generate_random_chordal_graph()
,
generate_random_graphical_Gamma()
,
generate_random_integer_Gamma()
,
generate_random_model()
,
generate_random_spd_matrix()
Generates a valid Gamma matrix with conditional independence structure specified by a graph
generate_random_graphical_Gamma(graph, ...)
generate_random_graphical_Gamma(graph, ...)
graph |
An |
... |
Further arguments passed to |
Other example generation functions:
generate_random_Gamma()
,
generate_random_chordal_graph()
,
generate_random_integer_Gamma()
,
generate_random_model()
,
generate_random_spd_matrix()
Generates a random variogram Matrix by producing a matrix
B
with random
integer entries between -b
and b
, computing S = B %*% t(B)
,
and passing this S
to Sigma2Gamma()
.
This process is repeated with an increasing b
until a valid Gamma matrix
is produced.
generate_random_integer_Gamma(d, b = 2, b_step = 1)
generate_random_integer_Gamma(d, b = 2, b_step = 1)
d |
Number of rows/columns in the output matrix |
b |
Initial |
b_step |
By how much |
A numeric variogram matrix with integer entries
Other example generation functions:
generate_random_Gamma()
,
generate_random_chordal_graph()
,
generate_random_graphical_Gamma()
,
generate_random_model()
,
generate_random_spd_matrix()
generate_random_integer_Gamma(5, 2, 0.1)
generate_random_integer_Gamma(5, 2, 0.1)
Generates a random connected graph and Gamma matrix with conditional independence structure corresponding to that graph.
generate_random_model(d, graph_type = "general", ...)
generate_random_model(d, graph_type = "general", ...)
d |
Number of vertices in the graph |
graph_type |
|
... |
Further arguments passed to functions generating the graph and Gamma matrix |
Other example generation functions:
generate_random_Gamma()
,
generate_random_chordal_graph()
,
generate_random_graphical_Gamma()
,
generate_random_integer_Gamma()
,
generate_random_spd_matrix()
set.seed(1) d <- 12 generate_random_model(d, 'tree') generate_random_model(d, 'block') generate_random_model(d, 'decomposable') generate_random_model(d, 'general') generate_random_model(d, 'complete')
set.seed(1) d <- 12 generate_random_model(d, 'tree') generate_random_model(d, 'block') generate_random_model(d, 'decomposable') generate_random_model(d, 'general') generate_random_model(d, 'complete')
Generates a random symmetric positive definite matrix.
This is done by generating a random
matrix
B
,
then computing B %*% t(B)
,
and then normalizing the matrix to approximately single digit entries.
generate_random_spd_matrix(d, bMin = -10, bMax = 10, ...)
generate_random_spd_matrix(d, bMin = -10, bMax = 10, ...)
d |
Number of rows/columns |
bMin |
Minimum value of entries in |
bMax |
Maximum value of entries in |
... |
Ignored, only allowed for compatibility |
Other example generation functions:
generate_random_Gamma()
,
generate_random_chordal_graph()
,
generate_random_graphical_Gamma()
,
generate_random_integer_Gamma()
,
generate_random_model()
Get a function that can be used to alert the user of invalid inputs.
Returns the value implied by the overwrite
argument,
or the option "graphicalExtremes.default.alert"
,
falling back to warning()
if neither is specified.
get_alert_function(overwrite = NULL)
get_alert_function(overwrite = NULL)
overwrite |
|
A function that takes an arbitrary number of strings as arguments.
Other default parameters:
get_mc_cores()
,
get_small_tol()
Helper function that returns the number of cores to be used in parallel computations.
Will always be 1 on Windows. On other systems, this value can be set using
setOption('graphicalExtremes.mc.cores', ...)
.
get_mc_cores(overwrite = NULL)
get_mc_cores(overwrite = NULL)
overwrite |
Use this value (if it is valid and not on Windows) |
An integer to be used as number of cores
Other default parameters:
get_alert_function()
,
get_small_tol()
Helper function that returns the tolerance to be used in internal computations.
get_small_tol(overwrite = NULL) get_large_tol(overwrite = NULL)
get_small_tol(overwrite = NULL) get_large_tol(overwrite = NULL)
overwrite |
|
There are two different tolerances used in the package, for details see
graphicalExtremes-package
. The default values for these tolerances can be
set using the options "graphicalExtremes.tol.small"
and
"graphicalExtremes.tol.large"
.
A non-negative numerical scalar
Other default parameters:
get_alert_function()
,
get_mc_cores()
Returns an igraph::graph
object representing the flow graph of the danube
dataset.
getDanubeFlowGraph(stationIndices = NULL, directed = FALSE)
getDanubeFlowGraph(stationIndices = NULL, directed = FALSE)
stationIndices |
Logical or numerical vector. Indicating which stations to include. |
directed |
Logical. Whether the graph should be directed (in the direction of flow). |
An igraph::graph
object.
Other danubeData:
danube
,
plotDanube()
Get filtered flight delay data, containing only a selection of dates and airports. Currently, all possible selections correspond to the case study in Hentschel et al. (2022).
getFlightDelayData( what = c("delays", "IATAs", "dates"), airportFilter = c("all", "tcCluster", "tcAll"), dateFilter = c("all", "tcTrain", "tcTest", "tcAll"), delayFilter = c("totals", "arrivals", "departures")[1] )
getFlightDelayData( what = c("delays", "IATAs", "dates"), airportFilter = c("all", "tcCluster", "tcAll"), dateFilter = c("all", "tcTrain", "tcTest", "tcAll"), delayFilter = c("totals", "arrivals", "departures")[1] )
what |
Whether to get the array of delays (numerical),
or just the vector of airport codes ( |
airportFilter |
Which airports to include. Specify exactly one. See details below. |
dateFilter |
Which dates to include. Specify exactly one. See details below. |
delayFilter |
Which kinds of delays to include. Specify one or more.
Possible values are |
The provided lists of airports and dates correspond to the ones used in
the case study of Hentschel et al. (2022).
The argument airportFilter="tcCluster"
corresponds to the airports in the analyzed "Texas Cluster",
airportFilter="tcAll"
corresponds to all airports used in the previous clustering step,
airportFilter="all"
corresponds to all airports in the dataset.
Similarly, dateFilter="tcTrain"
selects the dates from the training set,
dateFilter="tcTest"
the ones from the test/validation set.
To get the union of these sets, specify dateFilter="tcAll"
.
To get all dates in the dataset (possibly more than for "tcAll"),
specify dateFilter="all"
.
If what="IATAs"
or what="dates"
, a character vector.
If required, it can be converted to Date
objects using as.Date()
.
If what="delays"
, a three-dimensional array or two-dimensional matrix,
with dimensions corresponding to dates, airports, and delay types.
Hentschel M, Engelke S, Segers J (2022). “Statistical Inference for Hüsler-Reiss Graphical Models Through Matrix Completions.” doi:10.48550/ARXIV.2210.14292, https://arxiv.org/abs/2210.14292.
Other flight data related topics:
flightCountMatrixToConnectionList()
,
flights
,
getFlightGraph()
,
plotFlights()
Convert the info from flights$flightCounts
to an igraph::graph
object.
getFlightGraph(IATAs = NULL, years = NULL, minNFlights = 1, directed = FALSE)
getFlightGraph(IATAs = NULL, years = NULL, minNFlights = 1, directed = FALSE)
IATAs |
Character vector. IATA codes of airports to include |
years |
Character vector. Years to include (as strings). |
minNFlights |
Numerical scalar. Minimum number of flights on a connection to be included as an edge. |
directed |
Logical scalar. Whether flights A->B and B->A should be considered separately. |
An igraph::graph
object containing a vertex for each airport
and an edge whenever there are at least minNFlights
between two airports.
Other flight data related topics:
flightCountMatrixToConnectionList()
,
flights
,
getFlightDelayData()
,
plotFlights()
g <- getFlightGraph()
g <- getFlightGraph()
Computes (censored) Huesler-Reiss log-likelihood, AIC, and BIC values.
loglik_HR(data, p = NULL, graph = NULL, Gamma, cens = FALSE)
loglik_HR(data, p = NULL, graph = NULL, Gamma, cens = FALSE)
data |
Numeric |
p |
Numeric between 0 and 1 or |
graph |
An [ |
Gamma |
Numeric |
cens |
Boolean. If true, then censored log-likelihood is computed.
By default, |
Numeric vector c("loglik"=..., "aic"=..., "bic"=...)
with the evaluated
log-likelihood, AIC, and BIC values.
Other parameter estimation methods:
data2mpareto()
,
emp_chi()
,
emp_chi_multdim()
,
emp_vario()
,
emtp2()
,
fmpareto_HR_MLE()
,
fmpareto_graph_HR()
Plotting function to visualize the river flow data from the danube
dataset.
Requires ggplot2
to be installed.
plotDanube( stationIndices = NULL, graph = NULL, directed = NULL, plotStations = TRUE, plotConnections = TRUE, labelStations = FALSE, returnGGPlot = FALSE, useStationVolume = FALSE, useConnectionVolume = FALSE, mapCountries = c("Germany"), vertexColors = NULL, vertexShapes = NULL, edgeColors = NULL, xyRatio = NULL, clipMap = 1.2, useLatex = FALSE, edgeAlpha = 0.2 ) plotDanubeIGraph( stationIndices = NULL, graph = NULL, directed = NULL, labelStations = TRUE, vertexColors = NULL, vertexShapes = NULL, edgeColors = NULL, ... )
plotDanube( stationIndices = NULL, graph = NULL, directed = NULL, plotStations = TRUE, plotConnections = TRUE, labelStations = FALSE, returnGGPlot = FALSE, useStationVolume = FALSE, useConnectionVolume = FALSE, mapCountries = c("Germany"), vertexColors = NULL, vertexShapes = NULL, edgeColors = NULL, xyRatio = NULL, clipMap = 1.2, useLatex = FALSE, edgeAlpha = 0.2 ) plotDanubeIGraph( stationIndices = NULL, graph = NULL, directed = NULL, labelStations = TRUE, vertexColors = NULL, vertexShapes = NULL, edgeColors = NULL, ... )
stationIndices |
Logical or numerical vector, indicating the stations to be plotted. |
graph |
An |
directed |
Logical. Whether to consider the flow graph as directed. |
plotStations |
Logical. Whether to plot the stations. |
plotConnections |
Logical. Whether to plot the connections. |
labelStations |
Logical. Whether to label stations. |
returnGGPlot |
If |
useStationVolume |
Logical. Whether to indicate flow volume at a station by circle size. |
useConnectionVolume |
Logical. Whether to indicate flow volume on a connection by line width. |
mapCountries |
Which country borders to show using |
vertexColors |
Vector with color information for vertices. |
vertexShapes |
Vector with shape information for vertices. |
edgeColors |
Vector with color information for edges. |
xyRatio |
Approximate X-Y-ratio (w.r.t. distance on the ground) of the area shown in the plot. |
clipMap |
Logical or numeric scalar. Whether to ignore the map image when determining the axis limits of the plot. If it is a positive scalar, the plot limits are extended by that factor. |
useLatex |
Whether to format numbers etc. as latex code (useful when plotting to tikz). |
edgeAlpha |
Numeric scalar between 0 and 1. The alpha value to be used when plotting edges/connections. |
... |
Passed through to |
The values of vertexColors
, vertexShapes
, and edgeColors
are interpreted differently
by ggplot2::geom_point
/ggplot2::geom_segment
and igraph::plot.igraph()
.
plotDanube
uses a combination of ggplot2
functions to plot the graph.
plotDanubeIGraph
uses igraph::plot.igraph
to plot the graph.
Other danubeData:
danube
,
getDanubeFlowGraph()
# Basic plot graphicalExtremes::plotDanube() # Plot flow volumes graphicalExtremes::plotDanube( clipMap = 1.2, useConnectionVolume = TRUE, useStationVolume = TRUE ) # Plot other graph structures nStations <- nrow(graphicalExtremes::danube$info) g <- igraph::erdos.renyi.game(nStations, nStations, 'gnm') graphicalExtremes::plotDanube( clipMap = 1.2, graph = g )
# Basic plot graphicalExtremes::plotDanube() # Plot flow volumes graphicalExtremes::plotDanube( clipMap = 1.2, useConnectionVolume = TRUE, useStationVolume = TRUE ) # Plot other graph structures nStations <- nrow(graphicalExtremes::danube$info) g <- igraph::erdos.renyi.game(nStations, nStations, 'gnm') graphicalExtremes::plotDanube( clipMap = 1.2, graph = g )
Plotting function to visualize the flight connections from the flights
dataset.
This function requires the package ggplot2
to be installed.
plotFlights( airportIndices = NULL, airports_sel = NULL, connections_sel = NULL, graph = NULL, plotAirports = TRUE, plotConnections = TRUE, labelAirports = FALSE, returnGGPlot = FALSE, useAirportNFlights = FALSE, useConnectionNFlights = FALSE, minNFlights = 0, map = "state", vertexColors = NULL, vertexShapes = NULL, edgeColors = NULL, xyRatio = NULL, clipMap = FALSE, useLatex = FALSE, edgeAlpha = 0.2 )
plotFlights( airportIndices = NULL, airports_sel = NULL, connections_sel = NULL, graph = NULL, plotAirports = TRUE, plotConnections = TRUE, labelAirports = FALSE, returnGGPlot = FALSE, useAirportNFlights = FALSE, useConnectionNFlights = FALSE, minNFlights = 0, map = "state", vertexColors = NULL, vertexShapes = NULL, edgeColors = NULL, xyRatio = NULL, clipMap = FALSE, useLatex = FALSE, edgeAlpha = 0.2 )
airportIndices |
The indices of the airports (w.r.t. |
airports_sel |
The airports to plot. Might be further subset by arguments |
connections_sel |
A three columns data frame as output by |
graph |
An optional |
plotAirports |
Logical. Whether to plot the airports specified. |
plotConnections |
Logical. Whether to plot the connections specified. |
labelAirports |
Logical. Whether to show the IATA code next to each plotted airport. |
returnGGPlot |
If |
useAirportNFlights |
Logical. Whether to vary the size of the circles representing airports in the plot, according to the number of flights at that airport. |
useConnectionNFlights |
Logical. Whether to vary the size of the edges representing connections in the plot, according to the number of flights on that connection. |
minNFlights |
Numeric scalar. Only plot connections with at least this many flights. |
map |
String or |
vertexColors |
Optional vector, named with IATA codes, to be used as colors for the vertices/airports. |
vertexShapes |
Optional vector, named with IATA codes, to be used as shapes for the vertices/airports. Is coerced to |
edgeColors |
Optional vector or symmetric matrix (character or numeric), to be used as colors for edges/connections.
If this is a vector, its entries must match the plotted connections (in the order specified in |
xyRatio |
Approximate X-Y-ratio (w.r.t. distance on the ground) of the area shown in the plot. |
clipMap |
Logical or numeric scalar. Whether to ignore the map image when determining the axis limits of the plot. If it is a positive scalar, the plot limits are extended by that factor. |
useLatex |
Whether to format numbers etc. as latex code (useful when plotting to tikz). |
edgeAlpha |
Numeric scalar between 0 and 1. The alpha value to be used when plotting edges/connections. |
If returnGGPlot
is TRUE
, a ggplot2::ggplot
object, otherwise NULL
.
Other flight data related topics:
flightCountMatrixToConnectionList()
,
flights
,
getFlightDelayData()
,
getFlightGraph()
# Plot all airports in the dataset plotFlights(plotConnections = FALSE, map = 'world') # Plot a selection of airports plotFlights(c('JFK', 'SFO', 'LAX'), useConnectionNFlights = TRUE, useAirportNFlights = TRUE) # Plot airports with a custom connections graph IATAs <- c('ACV', 'BFL', 'EUG', 'SFO', 'MRY') graph <- igraph::make_full_graph(length(IATAs)) plotFlights(IATAs, graph=graph, clipMap = 1.5)
# Plot all airports in the dataset plotFlights(plotConnections = FALSE, map = 'world') # Plot a selection of airports plotFlights(c('JFK', 'SFO', 'LAX'), useConnectionNFlights = TRUE, useAirportNFlights = TRUE) # Plot airports with a custom connections graph IATAs <- c('ACV', 'BFL', 'EUG', 'SFO', 'MRY') graph <- igraph::make_full_graph(length(IATAs)) plotFlights(IATAs, graph=graph, clipMap = 1.5)
Simulates exact samples of a multivariate Pareto distribution.
rmpareto( n, model = c("HR", "logistic", "neglogistic", "dirichlet"), d = NULL, par )
rmpareto( n, model = c("HR", "logistic", "neglogistic", "dirichlet"), d = NULL, par )
n |
Number of simulations. |
model |
The parametric model type; one of:
|
d |
Dimension of the multivariate Pareto distribution.
In some cases this can be |
par |
Respective parameter for the given
|
The simulation follows the algorithm in Engelke and Hitz (2020). For details on the parameters of the Huesler-Reiss, logistic and negative logistic distributions see Dombry et al. (2016), and for the Dirichlet distribution see Coles and Tawn (1991).
Numeric matrix of simulations of the
multivariate Pareto distribution.
Coles S, Tawn JA (1991).
“Modelling extreme multivariate events.”
J. R. Stat. Soc. Ser. B Stat. Methodol., 53, 377–392.
Dombry C, Engelke S, Oesting M (2016).
“Exact simulation of max-stable processes.”
Biometrika, 103, 303–317.
Engelke S, Hitz AS (2020).
“Graphical models for extremes (with discussion).”
J. R. Stat. Soc. Ser. B Stat. Methodol., 82, 871–932.
Other sampling functions:
rmpareto_tree()
,
rmstable()
,
rmstable_tree()
## A 4-dimensional HR distribution n <- 10 d <- 4 G <- cbind( c(0, 1.5, 1.5, 2), c(1.5, 0, 2, 1.5), c(1.5, 2, 0, 1.5), c(2, 1.5, 1.5, 0) ) rmpareto(n, "HR", d = d, par = G) ## A 3-dimensional logistic distribution n <- 10 d <- 3 theta <- .6 rmpareto(n, "logistic", d, par = theta) ## A 5-dimensional negative logistic distribution n <- 10 d <- 5 theta <- 1.5 rmpareto(n, "neglogistic", d, par = theta) ## A 4-dimensional Dirichlet distribution n <- 10 d <- 4 alpha <- c(.8, 1, .5, 2) rmpareto(n, "dirichlet", d, par = alpha)
## A 4-dimensional HR distribution n <- 10 d <- 4 G <- cbind( c(0, 1.5, 1.5, 2), c(1.5, 0, 2, 1.5), c(1.5, 2, 0, 1.5), c(2, 1.5, 1.5, 0) ) rmpareto(n, "HR", d = d, par = G) ## A 3-dimensional logistic distribution n <- 10 d <- 3 theta <- .6 rmpareto(n, "logistic", d, par = theta) ## A 5-dimensional negative logistic distribution n <- 10 d <- 5 theta <- 1.5 rmpareto(n, "neglogistic", d, par = theta) ## A 4-dimensional Dirichlet distribution n <- 10 d <- 4 alpha <- c(.8, 1, .5, 2) rmpareto(n, "dirichlet", d, par = alpha)
Simulates exact samples of a multivariate Pareto distribution that is an extremal graphical model on a tree as defined in Engelke and Hitz (2020).
rmpareto_tree(n, model = c("HR", "logistic", "dirichlet")[1], tree, par)
rmpareto_tree(n, model = c("HR", "logistic", "dirichlet")[1], tree, par)
n |
Number of simulations. |
model |
The parametric model type; one of:
|
tree |
Graph object from |
par |
Respective parameter for the given
|
The simulation follows the algorithm in Engelke and Hitz (2020). For details on the parameters of the Huesler-Reiss, logistic and negative logistic distributions see Dombry et al. (2016), and for the Dirichlet distribution see Coles and Tawn (1991).
Numeric matrix of simulations of the
multivariate Pareto distribution.
Coles S, Tawn JA (1991).
“Modelling extreme multivariate events.”
J. R. Stat. Soc. Ser. B Stat. Methodol., 53, 377–392.
Dombry C, Engelke S, Oesting M (2016).
“Exact simulation of max-stable processes.”
Biometrika, 103, 303–317.
Engelke S, Hitz AS (2020).
“Graphical models for extremes (with discussion).”
J. R. Stat. Soc. Ser. B Stat. Methodol., 82, 871–932.
Other sampling functions:
rmpareto()
,
rmstable()
,
rmstable_tree()
## A 4-dimensional HR tree model my_tree <- igraph::graph_from_adjacency_matrix(rbind( c(0, 1, 0, 0), c(1, 0, 1, 1), c(0, 1, 0, 0), c(0, 1, 0, 0) ), mode = "undirected" ) n <- 10 Gamma_vec <- c(.5, 1.4, .8) set.seed(123) rmpareto_tree(n, "HR", tree = my_tree, par = Gamma_vec) ## A 4-dimensional Dirichlet model with asymmetric edge distributions alpha <- cbind(c(.2, 1, .5), c(1.5, .6, .8)) rmpareto_tree(n, model = "dirichlet", tree = my_tree, par = alpha)
## A 4-dimensional HR tree model my_tree <- igraph::graph_from_adjacency_matrix(rbind( c(0, 1, 0, 0), c(1, 0, 1, 1), c(0, 1, 0, 0), c(0, 1, 0, 0) ), mode = "undirected" ) n <- 10 Gamma_vec <- c(.5, 1.4, .8) set.seed(123) rmpareto_tree(n, "HR", tree = my_tree, par = Gamma_vec) ## A 4-dimensional Dirichlet model with asymmetric edge distributions alpha <- cbind(c(.2, 1, .5), c(1.5, .6, .8)) rmpareto_tree(n, model = "dirichlet", tree = my_tree, par = alpha)
Simulates exact samples of a multivariate max-stable distribution.
rmstable(n, model = c("HR", "logistic", "neglogistic", "dirichlet")[1], d, par)
rmstable(n, model = c("HR", "logistic", "neglogistic", "dirichlet")[1], d, par)
n |
Number of simulations. |
model |
The parametric model type; one of:
|
d |
Dimension of the multivariate Pareto distribution. |
par |
Respective parameter for the given
|
The simulation follows the extremal function algorithm in Dombry et al. (2016). For details on the parameters of the Huesler-Reiss, logistic and negative logistic distributions see Dombry et al. (2016), and for the Dirichlet distribution see Coles and Tawn (1991).
Numeric matrix of simulations of the
multivariate max-stable distribution.
Coles S, Tawn JA (1991).
“Modelling extreme multivariate events.”
J. R. Stat. Soc. Ser. B Stat. Methodol., 53, 377–392.
Dombry C, Engelke S, Oesting M (2016).
“Exact simulation of max-stable processes.”
Biometrika, 103, 303–317.
Other sampling functions:
rmpareto()
,
rmpareto_tree()
,
rmstable_tree()
## A 4-dimensional HR distribution n <- 10 d <- 4 G <- cbind( c(0, 1.5, 1.5, 2), c(1.5, 0, 2, 1.5), c(1.5, 2, 0, 1.5), c(2, 1.5, 1.5, 0) ) rmstable(n, "HR", d = d, par = G) ## A 3-dimensional logistic distribution n <- 10 d <- 3 theta <- .6 rmstable(n, "logistic", d, par = theta) ## A 5-dimensional negative logistic distribution n <- 10 d <- 5 theta <- 1.5 rmstable(n, "neglogistic", d, par = theta) ## A 4-dimensional Dirichlet distribution n <- 10 d <- 4 alpha <- c(.8, 1, .5, 2) rmstable(n, "dirichlet", d, par = alpha)
## A 4-dimensional HR distribution n <- 10 d <- 4 G <- cbind( c(0, 1.5, 1.5, 2), c(1.5, 0, 2, 1.5), c(1.5, 2, 0, 1.5), c(2, 1.5, 1.5, 0) ) rmstable(n, "HR", d = d, par = G) ## A 3-dimensional logistic distribution n <- 10 d <- 3 theta <- .6 rmstable(n, "logistic", d, par = theta) ## A 5-dimensional negative logistic distribution n <- 10 d <- 5 theta <- 1.5 rmstable(n, "neglogistic", d, par = theta) ## A 4-dimensional Dirichlet distribution n <- 10 d <- 4 alpha <- c(.8, 1, .5, 2) rmstable(n, "dirichlet", d, par = alpha)
Simulates exact samples of a multivariate max-stable distribution that is an extremal graphical model on a tree as defined in Engelke and Hitz (2020).
rmstable_tree(n, model = c("HR", "logistic", "dirichlet")[1], tree, par)
rmstable_tree(n, model = c("HR", "logistic", "dirichlet")[1], tree, par)
n |
Number of simulations. |
model |
The parametric model type; one of:
|
tree |
Graph object from |
par |
Respective parameter for the given
|
The simulation follows a combination of the extremal function algorithm in Dombry et al. (2016) and the theory in Engelke and Hitz (2020) to sample from a single extremal function. For details on the parameters of the Huesler-Reiss, logistic and negative logistic distributions see Dombry et al. (2016), and for the Dirichlet distribution see Coles and Tawn (1991).
Numeric matrix of simulations of the
multivariate max-stable distribution.
Coles S, Tawn JA (1991).
“Modelling extreme multivariate events.”
J. R. Stat. Soc. Ser. B Stat. Methodol., 53, 377–392.
Dombry C, Engelke S, Oesting M (2016).
“Exact simulation of max-stable processes.”
Biometrika, 103, 303–317.
Engelke S, Hitz AS (2020).
“Graphical models for extremes (with discussion).”
J. R. Stat. Soc. Ser. B Stat. Methodol., 82, 871–932.
Other sampling functions:
rmpareto()
,
rmpareto_tree()
,
rmstable()
## A 4-dimensional HR tree model my_tree <- igraph::graph_from_adjacency_matrix(rbind( c(0, 1, 0, 0), c(1, 0, 1, 1), c(0, 1, 0, 0), c(0, 1, 0, 0) ), mode = "undirected" ) n <- 10 Gamma_vec <- c(.5, 1.4, .8) rmstable_tree(n, "HR", tree = my_tree, par = Gamma_vec) ## A 4-dimensional Dirichlet model with asymmetric edge distributions alpha <- cbind(c(.2, 1, .5), c(1.5, .6, .8)) rmstable_tree(n, model = "dirichlet", tree = my_tree, par = alpha)
## A 4-dimensional HR tree model my_tree <- igraph::graph_from_adjacency_matrix(rbind( c(0, 1, 0, 0), c(1, 0, 1, 1), c(0, 1, 0, 0), c(0, 1, 0, 0) ), mode = "undirected" ) n <- 10 Gamma_vec <- c(.5, 1.4, .8) rmstable_tree(n, "HR", tree = my_tree, par = Gamma_vec) ## A 4-dimensional Dirichlet model with asymmetric edge distributions alpha <- cbind(c(.2, 1, .5), c(1.5, .6, .8)) rmstable_tree(n, model = "dirichlet", tree = my_tree, par = alpha)