Suggest the user how to reduce the logic tree for a site-specific analysis #6867

Open
micheles opened this issue Jun 15, 2021 · 1 comment

micheles commented Jun 15, 2021

Our Canadian friends want to run risk calculations on Vancouver and want to know how many samples they should take from the 21,000+ realizations of the full model. This is currently hard to guess and involves running many very slow calculations to manually check the stability of the results.
We could instead run a classical calculation on the site of interest with full enumeration (if possible, otherwise with a lot of samples) and then call a view:

oq show clusterize_hcurves:<k>

that would group similar hazard curves into clusters (using scipy.cluster.vq.kmeans2) and print a representative for each cluster.
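As a rough illustration of what such a view might do internally, here is a minimal sketch. The function name `clusterize_hcurves`, the synthetic curves, and the deterministic choice of starting centroids are all assumptions made for this example; only `scipy.cluster.vq.kmeans2` comes from the proposal itself.

```python
import numpy as np
from scipy.cluster.vq import kmeans2


def clusterize_hcurves(hcurves, k):
    """Cluster R hazard curves (array of shape (R, L)) into k groups.

    Returns (labels, reps): labels[r] is the cluster of realization r and
    reps[c] is the index of the realization closest to centroid c
    (-1 if cluster c ended up empty).
    """
    hcurves = np.asarray(hcurves, dtype=float)
    # Deterministic start (an assumption of this sketch): order the curves
    # by mean PoE and use k evenly spaced ones as initial centroids.
    order = np.argsort(hcurves.mean(axis=1))
    start = hcurves[order[np.linspace(0, len(order) - 1, k).astype(int)]]
    centroids, labels = kmeans2(hcurves, start, minit='matrix')
    reps = np.full(k, -1)
    for c in range(k):
        members = np.flatnonzero(labels == c)
        if members.size:
            dist = np.linalg.norm(hcurves[members] - centroids[c], axis=1)
            reps[c] = members[dist.argmin()]
    return labels, reps


# Synthetic stand-in for 30 realizations: 3 families of curves with 1% noise
rng = np.random.default_rng(0)
imls = np.logspace(-2, 0, 20)
families = [np.exp(-a * imls) for a in (2.0, 5.0, 10.0)]
hcurves = np.array([f * (1 + 0.01 * rng.standard_normal(imls.size))
                    for f in families for _ in range(10)])

labels, reps = clusterize_hcurves(hcurves, 3)
for c, r in enumerate(reps):
    print(f"cluster {c}: representative rlz {r}, size {(labels == c).sum()}")
```

In a real view the rows of `hcurves` would come from the classical calculation, and the representative index would be translated back into a logic tree path.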
A possible output syntax is the following, for a case with 2187 realizations (1 source model and 7 TRTs with 3 GMPEs each, so 3^7 = 2187) reduced to 9 clusters, assuming 5 of the 7 TRTs are not relevant:

0~0[345][678]9[CDE][FGH][IJK]
0~2[345][678]A[CDE][FGH][IJK]
0~1[345][678]B[CDE][FGH][IJK]
0~1[345][678]9[CDE][FGH][IJK]
0~0[345][678]B[CDE][FGH][IJK]
0~2[345][678]B[CDE][FGH][IJK]
0~2[345][678]9[CDE][FGH][IJK]
0~1[345][678]A[CDE][FGH][IJK]
0~0[345][678]A[CDE][FGH][IJK]
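To check the arithmetic behind this notation, the patterns can be expanded mechanically. The sketch below assumes that a bracketed group such as `[345]` means "any one of these branches" and that `~` separates the source model part from the GMPE part; `parse_pattern`, `count_paths` and `expand` are hypothetical helper names, not engine functions.

```python
import re
from itertools import product

CLUSTERS = [
    "0~0[345][678]9[CDE][FGH][IJK]", "0~2[345][678]A[CDE][FGH][IJK]",
    "0~1[345][678]B[CDE][FGH][IJK]", "0~1[345][678]9[CDE][FGH][IJK]",
    "0~0[345][678]B[CDE][FGH][IJK]", "0~2[345][678]B[CDE][FGH][IJK]",
    "0~2[345][678]9[CDE][FGH][IJK]", "0~1[345][678]A[CDE][FGH][IJK]",
    "0~0[345][678]A[CDE][FGH][IJK]",
]


def parse_pattern(pattern):
    """Return (sm_positions, gsim_positions), each a list of alternatives."""
    sm_part, gsim_part = pattern.split("~")

    def positions(part):
        # a position is either a single character or a bracketed set
        tokens = re.findall(r"\[[^\]]+\]|.", part)
        return [list(tok.strip("[]")) for tok in tokens]

    return positions(sm_part), positions(gsim_part)


def count_paths(pattern):
    """Number of realizations matched by one pattern."""
    sm, gsim = parse_pattern(pattern)
    n = 1
    for alternatives in sm + gsim:
        n *= len(alternatives)
    return n


def expand(pattern):
    """Yield every concrete path matched by the pattern, e.g. '0~0369CFI'."""
    sm, gsim = parse_pattern(pattern)
    for s in product(*sm):
        for g in product(*gsim):
            yield "".join(s) + "~" + "".join(g)


all_paths = {path for pat in CLUSTERS for path in expand(pat)}
print(count_paths(CLUSTERS[0]), len(all_paths))
```

Each pattern covers 3^5 = 243 realizations, and the 9 patterns together cover all 2187 paths without overlap, since only the first and fourth GMPE positions vary between clusters.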

We already have a view to connect one-letter abbreviations to the branch IDs:

$ oq show branch_ids
| logic_tree      | abbrev | branch_id |
|-----------------+--------+-----------|
| source_model_lt | 0      | b1        |
| gsim_lt         | 0      | b31       |
| gsim_lt         | 1      | b32       |
| gsim_lt         | 2      | b33       |
| gsim_lt         | 3      | b11       |
| gsim_lt         | 4      | b12       |
| gsim_lt         | 5      | b13       |
| gsim_lt         | 6      | b61       |
| gsim_lt         | 7      | b62       |
| gsim_lt         | 8      | b63       |
| gsim_lt         | 9      | b71       |
| gsim_lt         | A      | b72       |
| gsim_lt         | B      | b73       |
| gsim_lt         | C      | b21       |
| gsim_lt         | D      | b22       |
| gsim_lt         | E      | b23       |
| gsim_lt         | F      | b41       |
| gsim_lt         | G      | b42       |
| gsim_lt         | H      | b43       |
| gsim_lt         | I      | b51       |
| gsim_lt         | J      | b52       |
| gsim_lt         | K      | b53       |
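As an illustration of how the table would be used, the sketch below turns an abbreviated path into branch IDs. The dictionary is transcribed from the table above; the path `0~0369CFI` is just one hypothetical member of the first cluster, and `to_branch_ids` is a made-up helper name. Note that the abbreviation `0` appears in both logic trees, so the lookup has to be keyed by tree.

```python
GSIM_ABBREVS = "0123456789ABCDEFGHIJK"
GSIM_BRANCHES = [
    "b31", "b32", "b33", "b11", "b12", "b13", "b61", "b62", "b63",
    "b71", "b72", "b73", "b21", "b22", "b23", "b41", "b42", "b43",
    "b51", "b52", "b53",
]

# one lookup table per logic tree, as printed by `oq show branch_ids`
BRANCH_IDS = {
    "source_model_lt": {"0": "b1"},
    "gsim_lt": dict(zip(GSIM_ABBREVS, GSIM_BRANCHES)),
}


def to_branch_ids(path):
    """Translate a path like '0~0369CFI' into (sm_branches, gsim_branches)."""
    sm_part, gsim_part = path.split("~")
    sm = [BRANCH_IDS["source_model_lt"][c] for c in sm_part]
    gsim = [BRANCH_IDS["gsim_lt"][c] for c in gsim_part]
    return sm, gsim


print(to_branch_ids("0~0369CFI"))
# (['b1'], ['b31', 'b11', 'b61', 'b71', 'b21', 'b41', 'b51'])
```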

With this mapping, the files source_model_logic_tree.xml and gsim_logic_tree.xml can be edited by hand to reduce the logic tree from 2187 realizations to 9. The event_based_risk calculation can then be run on the reduced logic tree.

@mmpagani

This is a good idea. We need to think carefully about the metric used to calculate distances (typically a key problem in cluster analysis). I would also suggest letting the user define a range of probabilities, so that only the corresponding part of each hazard curve is used in the cluster analysis.
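To make the two points concrete, here is one possible metric, offered only as a sketch: the root-mean-square difference of log10(PoE), computed over the levels where both curves fall inside a user-defined probability range. The function name, the default range, and the choice of a logarithmic scale are all assumptions of this example, not a decision taken in this thread.

```python
import numpy as np


def curve_distance(poes_a, poes_b, pmin=1e-4, pmax=1e-1):
    """RMS difference of log10(PoE) over the levels where both curves
    lie inside [pmin, pmax]; NaN if no level qualifies."""
    a = np.asarray(poes_a, dtype=float)
    b = np.asarray(poes_b, dtype=float)
    mask = (a >= pmin) & (a <= pmax) & (b >= pmin) & (b <= pmax)
    if not mask.any():
        return float("nan")
    diff = np.log10(a[mask]) - np.log10(b[mask])
    return float(np.sqrt(np.mean(diff ** 2)))


a = np.array([0.5, 0.05, 0.005, 0.0005, 0.00005])
b = 2 * a  # same shape, twice the probability at every level
print(curve_distance(a, a), curve_distance(a, b))
```

Working in log space keeps the low-probability tail from being swamped by the large PoEs at low intensity levels. Since k-means uses Euclidean distance, the same effect could be obtained by clustering the log-transformed, truncated curves.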
