
Computing loss curves and confidence intervals #2845

Open

raoanirudh (Member) opened this issue Jun 5, 2017 · 1 comment

Introduction

Currently, different algorithms are used for producing the asset loss curves and the aggregate loss curves for an event based risk calculation. The reason for using different algorithms is that it was previously difficult to store and transfer the large volumes of data contained in typical event loss tables. However, after the recent performance improvements, both can be computed in a post-processing stage from the event loss tables generated during the main calculation.

Generation of the asset loss curves currently requires the parameter loss_ratios to be set in the job configuration file. For each asset and each loss category, the engine counts the number of exceedances of each loss ratio specified in loss_ratios; dividing these counts by the total effective investigation time gives the corresponding rates of exceedance, which are finally converted to probabilities of exceedance.

The aggregate loss curves algorithm currently uses the parameter loss_curve_resolution, which is assigned a default value of 50 if not specified in the job configuration file, to divide the range [0, max_loss] into loss_curve_resolution - 1 steps. The number of exceedances of each of the intermediate loss values is then counted, divided by the effective investigation time, and converted to probabilities of exceedance.
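
As a point of reference, the counting approach described in the two paragraphs above can be sketched as follows. This is a simplified illustration, not the engine's actual implementation: the helper name is hypothetical, and the Poisson conversion from rates to probabilities over a time_span window is an assumption.

import numpy

def poes_from_event_losses(event_losses, loss_values, eff_time, time_span=1.):
    # Hypothetical helper sketching the counting approach: for each
    # reference loss value, count how many entries of the event loss
    # table exceed it, convert counts to rates of exceedance, and
    # convert rates to probabilities of exceedance assuming Poissonian
    # occurrence over `time_span` years
    event_losses = numpy.asarray(event_losses)
    rates = numpy.array([(event_losses > val).sum() / eff_time
                         for val in loss_values])
    return 1 - numpy.exp(-rates * time_span)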

Apart from the differing algorithms, another drawback of the current method of computing the loss curves is that users interested in obtaining the loss values at specific probabilities of exceedance or at specific 'return periods' must do additional post-processing on the output loss curve files to get the desired losses through interpolation.

The methodology proposed here will harmonize the loss curve calculations at the asset and portfolio levels, and permit users to specify a set of return periods in the job configuration file, at which the loss curves will be computed. Optionally, the user can also request confidence intervals for the average (annual) losses and loss curves.

Required changes in the configuration parameters

  • Remove the parameter loss_curve_resolution
  • Remove the parameter loss_ratios
  • Change the default value for risk_investigation_time from None to 1, so that OpenQuake will compute average annual losses by default
  • Introduce the parameter return_periods. As default values for return_periods, use an array of the form [5., 10., 25., 50., 100., 250., 500., 1000., ...], with its upper bound limited by (ses_per_logic_tree_path × investigation_time)
    These default return periods could be generated by a method similar to the one shown in the snippet below:
import numpy

U32 = numpy.uint32

def return_periods(eff_time, num_losses):
    """
    Generate return periods of the form 1, 2, 5, 10, 20, 50, ...,
    bounded below by eff_time/num_losses and above by eff_time.
    """
    max_period = U32(eff_time)
    min_period = U32(eff_time / num_losses)
    period = 1
    periods = []
    loop = True
    while loop:
        for val in [1, 2, 5]:  # mantissas of the 1-2-5 sequence
            if period * val >= min_period:
                if period * val > max_period:
                    loop = False  # past the upper bound: stop
                    break
                periods.append(period * val)
        period *= 10  # move to the next decade
    return numpy.array(periods, U32)

...where eff_time is the effective investigation time for the realization,
i.e., eff_time = (ses_per_logic_tree_path × investigation_time)
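
For instance, with an effective investigation time of 10,000 years and 10,000 simulated losses, the method above yields return periods from 1 year up to the effective investigation time:

return_periods(10000, 10000).tolist()
# -> [1, 2, 5, 10, 20, 50, 100, 200, 500, 1000, 2000, 5000, 10000]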

Calculation of loss curves

The following procedure applies to a single branch of the logic tree, and will need to be repeated for each branch realized in the calculation.

The loss curves are computed from the event loss tables using the function:

import numpy

def losses_by_period(losses, return_periods, eff_time):
    # empirical return periods of the sorted losses: the largest loss
    # has return period eff_time, the smallest eff_time / len(losses)
    periods = eff_time / numpy.arange(len(losses), 0., -1)
    # interpolate the sorted losses at the requested return periods,
    # linearly in the logarithm of the return period
    return numpy.interp(numpy.log(return_periods),
                        numpy.log(periods), numpy.sort(losses))

For the portfolio ('aggregate') loss curves, this process needs to be repeated for each loss category. For the asset loss curves, this process needs to be repeated for each asset in the exposure model, for each loss category.
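
As a small, self-contained illustration (the numbers are synthetic, chosen so the interpolation can be verified by hand), ten equally spaced losses over an effective investigation time of 100 years give:

losses = numpy.array([10., 20., 30., 40., 50., 60., 70., 80., 90., 100.])
losses_by_period(losses, [10, 25, 50, 100], eff_time=100)
# -> [10., 70., 90., 100.]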

Calculation of confidence intervals

For any risk metric estimated via sampling, we are interested in answering some questions concerning the accuracy of the estimate, such as the following:

  1. What is the 95% confidence interval for the estimate of the risk metric?
  2. What is the minimum number of stochastic event sets (or effective investigation time) that should be simulated in order to keep the width (or half-width) of the 95% confidence interval below a certain chosen value? (A sketch of one approach follows this list.)
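
For question 2, a minimal sketch of one standard approach, based on the normal approximation to the confidence interval half-width; the helper name is hypothetical and not part of the engine:

import numpy
import scipy.stats as st

def required_sample_size(losses, half_width, confidence=0.95):
    # estimate, from a pilot sample of losses, how many samples are
    # needed for the confidence interval of the mean loss to have at
    # most the given half-width: n >= (z * s / h) ** 2
    z = st.norm.ppf((1 + confidence) / 2)  # two-sided critical value
    s = numpy.std(losses, ddof=1)          # sample standard deviation
    return int(numpy.ceil((z * s / half_width) ** 2))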

Confidence intervals for average losses

The confidence interval for average losses can be obtained with little additional computational overhead, and thus can be calculated for all jobs where avg_losses has been set to true. The confidence interval for average losses is obtained as follows:

import numpy
import scipy.stats as st

def average_loss_with_confidence(losses, confidence=0.95):
    """
    Reads an array of losses and returns the average loss
    and the confidence bounds

    :param losses: the array of losses
    :param confidence: the confidence level (default = 95%)
    """
    n = len(losses)
    # two-sided Student's t interval around the sample mean
    ci = st.t.interval(confidence, n - 1,
                       loc=numpy.mean(losses), scale=st.sem(losses))
    return numpy.mean(losses), ci
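
For example (synthetic data, purely illustrative):

losses = numpy.random.lognormal(mean=10., sigma=1., size=5000)
avg, (lower, upper) = average_loss_with_confidence(losses)
# avg is the sample mean; (lower, upper) brackets it at 95% confidence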

The user may specify the confidence level at which the confidence intervals will be computed by setting the parameter confidence_level in the job configuration file. The default confidence level is 0.95. Valid values are floats in the range (0., 1.).

Confidence intervals for loss curves

Unlike the previous case involving the estimator for the average annual loss, there is no simple formula for the standard error here, since the loss-curve estimator cannot be expressed as the mean of a random variable. However, we can obtain a bootstrap estimate of the standard error, and bootstrap estimates of the confidence interval, as demonstrated in the following function:

import numpy

def loss_curve_with_confidence(losses, return_periods, confidence=0.95, B=1000):
    """
    Reads an array of losses and returns the annual loss exceedance curve
    and the confidence bounds

    :param losses: the array of simulated losses
    :param return_periods: array of return periods at which the loss curve is computed
    :param confidence: the confidence level (default = 95%)
    :param B: the number of bootstrap samples (default = 1000)
    """
    n = len(losses)
    return_periods = numpy.asarray(return_periods)
    # 0-based ranks of the sorted losses exceeded once per return period
    ranks = (n / return_periods - 1).astype(int)
    loss_curve = numpy.sort(losses)[::-1][ranks]
    bootstrap_return_period_losses = numpy.zeros((len(return_periods), B))
    for b in range(B):
        # resample the losses with replacement and recompute the curve
        bootstrap_losses = numpy.random.choice(losses, n, replace=True)
        bootstrap_sorted_losses = numpy.sort(bootstrap_losses)[::-1]
        bootstrap_return_period_losses[:, b] = bootstrap_sorted_losses[ranks]
    ci_left = numpy.percentile(
        bootstrap_return_period_losses, (1 - confidence) / 2 * 100, axis=1)
    ci_right = numpy.percentile(
        bootstrap_return_period_losses, (1 + confidence) / 2 * 100, axis=1)
    ci = numpy.vstack((return_periods, ci_left, ci_right))
    return loss_curve, ci

The computation of confidence intervals for loss curves can add significant overhead due to the bootstrap sampling involved, and so should be disabled by default. The user may request the computation of confidence intervals for loss curves by setting loss_curve_confidence_intervals = true in the job configuration file. The number of bootstrap samples defaults to B=1000, which is sufficient for all expected use cases; exposing this parameter to the user is not necessary.
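
Putting the proposed configuration changes together, the risk section of a job configuration file might contain a fragment like the following sketch (the parameter spellings follow the names proposed above; the values shown are purely illustrative):

return_periods = 5, 10, 25, 50, 100, 250, 500, 1000
risk_investigation_time = 1
avg_losses = true
confidence_level = 0.95
loss_curve_confidence_intervals = true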

For more details on the background and methodology for the computation of loss exceedance curves and confidence intervals, please see the attached document:
Convergence of Loss Metrics.pdf

raoanirudh (Member Author) commented:

A typical loss exceedance curve, plotted four ways:

[Figures: ca-res-afe-vs-loss-lin-lin, ca-res-afe-vs-loss-log-lin, ca-res-loss-vs-rp-lin-lin, ca-res-loss-vs-rp-lin-log]

In estimating the loss value corresponding to a specified return period, interpolation between the loss values for the two closest available return periods should be done in log-space, rather than in linear-space.
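
Concretely, for a target return period T lying between the two closest available periods T1 < T < T2, with corresponding losses L1 and L2, an interpolation consistent with the losses_by_period function above is linear in log(T). A sketch (the helper name is hypothetical):

import math

def interp_loss_log_period(T, T1, L1, T2, L2):
    # interpolate linearly in the logarithm of the return period,
    # matching the numpy.interp(log(...), log(...), ...) call above
    w = (math.log(T) - math.log(T1)) / (math.log(T2) - math.log(T1))
    return L1 + w * (L2 - L1)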
