Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Aachen Turbine Test Case #2276

Open
jblueh opened this issue May 8, 2024 · 3 comments · May be fixed by #2293
Open

Aachen Turbine Test Case #2276

jblueh opened this issue May 8, 2024 · 3 comments · May be fixed by #2293
Labels

Comments

@jblueh
Copy link
Contributor

jblueh commented May 8, 2024

The Aachen turbine test case added in #2158 needs another look, for two reasons.

  1. The test case is subject to a yet unfixed memory access violation, as detected in the address sanitizer tests in Address sanitizer tests in the CI pipeline #2246. We explored a fix in c97610b, that remedies the address sanitzer detections but also changes the test results notably. It is unclear yet whether this is the correct way to fix it.
  2. @bigfooted observed non-deterministic behaviour in Fix update of dual-time solver for species transport. #2260. This might or might not be related to the memory access violation.

As soon as the memory access violation is fixed, address sanitizer testing can be re-enabled for the Aachen turbine test case (it was disabled in 0f3fc3e).

@alecappiello We can continue the discussion from #2246 here.

@jblueh jblueh added the bug label May 8, 2024
@joshkellyjak
Copy link
Contributor

As mentioned previously I would be surprised if the fix implemented in c97610b is the source of the issue as I would expect the other turbomachinery cases to fail as well. Looking at the testcase there would be two things I would try.

  1. Converge the solution very tightly, however I expect that this may be difficult, save the converged solution. Restart the solver and run for 1 iteration and compare the results of the two. Inspect the solution and see if there is anything obvious, this may give a hint as to where the problem lay. Maybe also include linear solver information in the output to see if the desired convergence is being achieved in the initial iterations.

  2. Furthermore inspect the residuals of the initial converged solution as it approaches the restart point, a jump in residuals could indicate that the solver is not converging smoothly before the termination of the initial solution. I suggest maybe increasing the value of ENTROPY_FIX_COEFF, say 0.1 and see what happens.

If you can't converge the solution tightly in point 1, apply the increased entropy fix in point 2 and retry. If you would like @alecappiello once you have tried this we can organise a call to discuss early next week.

@jblueh
Copy link
Contributor Author

jblueh commented May 23, 2024

@joshkellyjak The memory access violation described here does not occur unconditionally, it depends on the test case. The iterations of the inner loop for jSpan == 0 and jSpan == nSpanDonor - 1 (that we removed in the fix) only have an effect if the if-conditions in the loop body evaluate to true. I checked this for the serial test cases in the test file's "turbomachinery" section.

  • aachen_turbine_restart: iterations both with jSpan == 0 and jSpan == nSpanDonor - 1 have an effect, the latter trigger the memory access violations
  • jones_turbocharger_restart: some iterations with jSpan == 0 have an effect, but iterations for larger jSpan usually overwrite the results; also, the case jSpan == 0 does not trigger the memory access violation
  • axial_stage2D: the problematic code is not executed by this test case
  • transonic_stator_restart: the problematic code is not executed by this test case

I think these observations point to why only the Aachen test case had address sanitizer findings, and why only the results of the Aachen test case were affected by the attempted fix. The other test cases basically "do not care" whether we do the extra iterations.

@alecappiello Regarding your earlier question about the restart file. Restart files are to some extent specific to the version of SU2 they were obtained with, in the sense that newer versions of SU2 can behave differently if they use an old restart file. I was wondering if this is the case here, too. If the memory access violation had manifested itself in the restart file, a version of SU2 with the memory access violation fixed could give different results. The observations above point to why only the Aachen test case is affected.

@joshkellyjak
Copy link
Contributor

I'll try and find some time to have a look at this today. I think the issue is in the jSpan == nSpanDonor - 1. The final value in both the donor and target arrays are 1D values, the values at nSpanDonor - 2 are the values at the shroud. I think the original proposed fix is still needed here as further down you have an array accessing a postion of kSpan + 1 which iirc resulted in the memory access violation. I think what has happened here is the Aachen case triggers this condition, and results in an error in the calculation of the coefficient for the linear interpolation. When we fixed it, we changed the simulation enough that it throws the residuals off as the computational problem is inherently different. A good test for this would be to extract the values at the interface from the turbomachinery special output with and without this change to see if this is the case. When we were first designing this case @alecappiello had an issue getting the results in one of the zones to agree with experimental results, this could be why?

@jblueh jblueh linked a pull request May 29, 2024 that will close this issue
6 tasks
@jblueh jblueh linked a pull request May 29, 2024 that will close this issue
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants