
Secondary perils performance improvements: Filter sites before and/or after hazard calculations #6480

cossatot opened this issue Jan 21, 2021 · 5 comments


It seems that the risk calculations for the secondary perils analysis are very computationally expensive, because many samples of displacement have to be generated at each site to get a full representation of the probable losses. One way to reduce this burden is to reduce the number of sites that are being considered. Depending on the study area of course, many sites will have little to no risk of landsliding or liquefaction, and therefore they can be filtered and excluded from further analysis. Though the same discussion applies to landsliding, I'll just refer to liquefaction for now.

The probability of liquefaction (PL) depends on both site characteristics and earthquake characteristics (PGA, magnitude).

There are two sensible places to filter; these are not mutually exclusive (but should have a lot of overlap). The first is before the hazard analysis, as a pre-processing step: high values for PGA and magnitude can be assumed (say, PGA = 1 g, M = 7) and PL can be calculated very quickly for all sites. Sites with values below a threshold are removed from all further analysis, including the hazard calculations. The second is after the hazard calculations: sites with low probabilities (or low displacements) could be removed on a per-event basis or based on the aggregated (mean, etc.) probabilities. I think a post-hazard filter would be preferable, as it would be more accurate and would in all likelihood filter out more sites (including those where the expected ground motions are low but the site susceptibility is moderate).

Either way, there should be some threshold on probability (or displacement, or both?) that has a sensible default and can be modified in the job.ini, as different users will surely have different preferences.
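
For illustration, a pre-hazard filter along these lines might look something like the sketch below; the model call, the threshold value, and the parameter name are all placeholders rather than actual engine code:

```python
# Placeholder for whichever liquefaction model is in use; assumed to return
# the probability of liquefaction given shaking and site properties.
def prob_liquefaction(pga, magnitude, site):
    raise NotImplementedError

# Example threshold; in practice this would have a default and be
# overridable in the job.ini (hypothetical parameter name).
LIQ_PROB_THRESHOLD = 0.05

def prefilter_sites(sites, pga=1.0, magnitude=7.0,
                    threshold=LIQ_PROB_THRESHOLD):
    """Keep only sites whose probability of liquefaction exceeds the
    threshold under conservatively high shaking (PGA in g)."""
    return [site for site in sites
            if prob_liquefaction(pga, magnitude, site) >= threshold]
```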

I don't know much about how the engine would handle the filtering of sites (I assume this is something that is already done sometimes).

Tagging @micheles @VSilva

@micheles

Please don't do anything on this, @cossatot

First we need a real example, and then we can figure out where the bottlenecks are. My feeling is that working on the hazard side will be of little benefit, and that instead we will have to change the algorithm on the risk side (i.e. no extra simulations, just multiply the loss by the probability of liquefaction; that will scale to millions of events).
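
Schematically, something like the following (just an illustration with made-up array shapes, not engine code):

```python
import numpy as np

def expected_liq_losses(shaking_losses, prob_liq):
    """shaking_losses, prob_liq: arrays of shape (num_events, num_sites).
    Scale each event's ground-shaking loss by the probability of
    liquefaction instead of running extra per-event simulations."""
    return shaking_losses * prob_liq

# Tiny example: 3 events, 2 sites
losses = np.array([[10., 0.], [5., 2.], [0., 8.]])
p_liq = np.array([[0.2, 0.0], [0.1, 0.05], [0.0, 0.3]])
print(expected_liq_losses(losses, p_liq))  # element-wise expected losses
```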

cossatot commented Jan 21, 2021

> Please don't do anything on this, @cossatot

I am happy to comply with this.

I started the issue on Vitor's request. It can be a place for discussion or it can be closed; that is up to you all.

micheles commented Jan 22, 2021

Don't get me wrong: we need to think of ways to improve the hazard part too, but right now the issue on the risk side is one hundred times worse. In particular, given how the calculators work now, we need to store the entire event loss table in memory: if there are 50 or 100 additional simulations we will need 50-100 times more memory. It will never work. What could work is to use the same approach as the ebrisk calculator, which only requires storing the event loss table on disk, not in memory. But even then the problem would reappear in the post-processing phase, when the event loss table has to be read back. We need to think of a sensible algorithm to use. For instance, we could have two calculators: one computing the average and one focused only on the extreme events, discarding everything else. But that requires time and work, and it is useless to discuss this in the abstract, before we have a concrete example to look at. It could even be that I am too pessimistic.

Right now the problem is the number of events, not the number of sites or assets. If we could say: forget about event based, we want to support scenarios only, with the limits num_seismic_events < 1000 and num_liq_simulations < 100, then it could work as currently specified, at least on the risk side. The hazard side would still require some work to address the case of very fine grids, along the lines of the ideas you are suggesting here. But if we are talking about event based, there will be millions of events and the risk part will be extremely challenging, much more so than the hazard part.

cossatot commented Jan 22, 2021

Yeah, that sounds challenging. I wonder if a stratified sampling approach would work, where the events (and/or liquefaction simulations) are sampled proportionally to their capacity for damage and then the results (the loss table, I guess) are adjusted ex post to correct for the sampling bias.
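
Roughly what I have in mind, as a pure sketch with made-up names and a made-up damage proxy:

```python
import numpy as np

rng = np.random.default_rng(42)

def sample_and_reweight(damage_proxy, losses, n_samples):
    """Sample events proportionally to a damage proxy (e.g. an
    expected-loss score), then reweight the sampled losses by the inverse
    of their sampling probability so the aggregate estimate stays unbiased.
    Assumes numpy arrays and a positive proxy for any event with loss > 0."""
    p = damage_proxy / damage_proxy.sum()        # sampling probabilities
    idx = rng.choice(len(losses), size=n_samples, p=p)
    weights = 1.0 / (n_samples * p[idx])         # importance weights
    return float(np.sum(losses[idx] * weights))  # estimate of the total loss
```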

As far as the results are concerned, maybe the loss table itself is not particularly useful if it has several million cells? How does one interpret that? There has to be some sort of aggregation to make it a useful product, so that it is tractable for a human brain regardless of how much silicon effort is needed to produce it. Perhaps this can be done in a map-reduce manner, or by in-place modification of the aggregated variables that are what the final users need (i.e. final_val += simulation_result). But I am guessing here, since I have never dealt with these results.
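
Something along these lines, perhaps (again just a guess at how it could look, not how the engine actually does it):

```python
from collections import defaultdict

# Keep a running reduction as results arrive from workers, instead of
# materialising the full event loss table (map-reduce style).
agg_loss = defaultdict(float)  # aggregated loss per asset (or per tag)

def reduce_chunk(chunk):
    """chunk: iterable of (asset_id, loss) pairs produced by one task."""
    for asset_id, loss in chunk:
        agg_loss[asset_id] += loss  # final_val += simulation_result
```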

@micheles

Today @VSilva noted that we need to store only mean and std after the liquefaction sampling, so that should solve most of my concerns.
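
Concretely, that could be a streaming accumulator, e.g. a minimal Welford-style sketch (not engine code), so the individual liquefaction samples never have to be kept:

```python
import math

class MeanStd:
    """Streaming (Welford-style) accumulator: only the running mean and the
    sum of squared deviations are stored, never the individual samples."""
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0

    def add(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def std(self):
        return math.sqrt(self.m2 / self.n) if self.n > 1 else 0.0
```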
