Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SIDE_HIERARCHIC doesn't work with array variables #27610

Open
maxnezdyur opened this issue May 13, 2024 · 21 comments · May be fixed by #27639
Open

SIDE_HIERARCHIC doesn't work with array variables #27610

maxnezdyur opened this issue May 13, 2024 · 21 comments · May be fixed by #27639
Labels
C: Framework P: normal A defect affecting operation with a low possibility of significantly affects. T: defect An anomaly, which is anything that deviates from expectations.

Comments

@maxnezdyur
Copy link
Contributor

maxnezdyur commented May 13, 2024

Bug Description

SIDE_HIERARCHIC doesn't work with array variables while HIERARCHIC does. This seems like an error because of the similarity of the two variables.

Steps to Reproduce

[Problem]
  solve = false
[]

[Mesh]
  type = GeneratedMesh
  elem_type = QUAD8
  dim = 2
  nx = 1
  ny = 1
[]

[Variables]
  [HIERARCHIC_var]
    order = ELEVENTH
    family = HIERARCHIC
    components = 2
  []
# no error when SIDE_HIERARCHIC_var block is commented out or when there is only 1 component
  [SIDE_HIERARCHIC_var]
    order = ELEVENTH
    family = SIDE_HIERARCHIC
    components = 2
  []
[]

[Executioner]
  type = Steady
[]

Error:

libMesh terminating:
No index 384 in ghosted vector.
Vector contains [0,384)
And empty ghost array.

Stack frames: 20
0: libMesh::print_trace(std::ostream&)
1: libMesh::MacroFunctions::report_error(char const*, int, char const*, char const*, std::ostream&)
2: /data/nezdmn/projects/moose/framework/libmoose-dbg.so.0(+0x1e99dd9) [0x7fc6a0fc5dd9]
3: /data/nezdmn/projects/moose/framework/libmoose-dbg.so.0(+0x1e965eb) [0x7fc6a0fc25eb]
4: MooseVariableDataBase<Eigen::Matrix<double, -1, 1, 0, -1, 1> >::getArrayDoFValues(libMesh::NumericVector<double> const&, unsigned int, MooseArray<Eigen::Matrix<double, -1, 1, 0, -1, 1> >&) const
5: MooseVariableDataBase<Eigen::Matrix<double, -1, 1, 0, -1, 1> >::fetchDoFValues()
6: MooseVariableData<Eigen::Matrix<double, -1, 1, 0, -1, 1> >::computeValues()
7: MooseVariableFE<Eigen::Matrix<double, -1, 1, 0, -1, 1> >::computeElemValues()
8: SystemBase::reinitElem(libMesh::Elem const*, unsigned int)
9: FEProblemBase::reinitElem(libMesh::Elem const*, unsigned int)
10: ComputeInitialConditionThread::operator()(libMesh::StoredRange<libMesh::MeshBase::const_element_iterator, libMesh::Elem const*> const&)
11: /data/nezdmn/projects/moose/framework/libmoose-dbg.so.0(+0x22d2dd1) [0x7fc6a13fedd1]
12: FEProblemBase::projectSolution()
13: FEProblemBase::initialSetup()
14: Steady::init()
15: MooseApp::executeExecutioner()
16: MooseApp::run()
17: main

Impact

Prevents the use of SIDE_HIERARCHIC with array variables.

[Optional] Diagnostics

No response

@maxnezdyur maxnezdyur added P: normal A defect affecting operation with a low possibility of significantly affects. T: defect An anomaly, which is anything that deviates from expectations. labels May 13, 2024
@lindsayad
Copy link
Member

@roystgnr FYI

@roystgnr
Copy link
Contributor

Is this being run in parallel? You don't say anything about that in "Steps to Reproduce", but I'd be very surprised to see a ghosting error triggered by a serial run.
...
Whoa. I can trigger the error with a serial run. It looks as if we've somehow mistakenly decided to look for a 385th DoF (index 384) in a system with only 384 DoFs.

I'm somewhat baffled by this even on top of that. I wouldn't have been surprised if someone wrote an array variable implementation that breaks for some FE types - in the bad old days there was a lot of MOOSE that worked for MONOMIAL or LAGRANGE but broke outside of that - but once you've got something compatible with a p>2 C0 FEType (and hence with nodes that have more than one DoF each) you ought to practically get code that's compatible with everything else by accident.

Looking at the failure ... we're in MooseVariableDataBase.C, getArrayDoFValues, at a point where dof=384 has been calculated by starting from a _dof_indices for the element and then for some reason incrementing each entry by n=48 to try to get the corresponding index for the next component?

We're on a mesh with 1 QUAD8, so for the HIERARCHIC ELEVENTH variable we expect to have 1 DoF per component per vertex, 10 per component per side node, and 100 per component on the Elem. For the SIDE_HIERARCHIC ELEVENTH we expect 12 DoFs per component per side node. So that's a total of 2 DoFs times 4 vertices plus 44 times 4 sides plus 200, making up our 384 total.

The _dof_indices values are 288-299, 312-323, 336-347, 360-371, so that's got to be the indices for a SIDE_HIERARCHIC variable, 4 ranges of 12. Checking out dof_number ... yeah, those are the indices for the first component.

So ... who thought adding 48 to that would give us the indices for the second component? Its indices are 300-311, 324-335, 348-359, 372-383, offset by 12.

Are we even sure HIERARCHIC is working correctly here? It's starting to look to me like whoever wrote this code was basically second-guessing dof_number badly, didn't write any assertions, wrote some tests that passed for isoparametric Lagrange and for monomials and then called that a win? I see L2_LAGRANGE input files in tests, but still isoparametric (and single-variable) ... and yeah, 52 tests with not a single one that has multiple DoFs per node.

Which makes sense, looking at getArrayDoFValues again. It's got an isNodal() path to handle LAGRANGE, an else path to handle discontinuous variables, and nothing capable of handling anything else, except that thanks to the lack of assertions we only get an error message if things are bad enough to cause OOB access, not from merely scrambled indices. If we hit this method with a mere HIERARCHIC variable, then it goes down the isNodal() path, tries to add "1" to the first component's index to get the second component's index, thereby gets garbage instead, and merrily proceeds along its way.

@lindsayad
Copy link
Member

@YaqiWang

@YaqiWang
Copy link
Contributor

I guess only side variables reveal this bug?

@YaqiWang
Copy link
Contributor

This issue libMesh/libmesh#2114 and this #13417 could be related.

@roystgnr
Copy link
Contributor

If you never actually use your variable, then it takes a non-discontinuous non-isNodal() variable to hit an error with this stack trace. (I can never remember which variables satisfy isNodal(), but since Hierarchic FE are in general not nodal I don't feel too ashamed of not remembering nonsense...)

Scrambled indices will give garbage results eventually, though, even if they don't hit an assertion or a segfault first.

@YaqiWang
Copy link
Contributor

YaqiWang commented May 13, 2024

This is the logic for isNodal: _is_nodal = (_continuity == C_ZERO || _continuity == C_ONE);, _continuity is from FEBase.

@roystgnr
Copy link
Contributor

#13417 is related in the sense that it has the same "is it elemental or is it nodal" false dichotomy. Most finite elements are not elemental (because their support spans multiple elements) and also not nodal (because they have zero or multiple components at some nodes, or because they have a component which is not the value at non-vertex nodes). There might be valid use cases for isNodal() but I'll bet they're still outnumbered by the use cases which are just bugs.

@lindsayad
Copy link
Member

I always giggle when I get to witness @roystgnr rants about MOOSE

@maxnezdyur
Copy link
Contributor Author

Are we even sure HIERARCHIC is working correctly here? It's starting to look to me like whoever wrote this code was basically second-guessing dof_number badly, didn't write any assertions, wrote some tests that passed for isoparametric Lagrange and for monomials and then called that a win? I see L2_LAGRANGE input files in tests, but still isoparametric (and single-variable) ... and yeah, 52 tests with not a single one that has multiple DoFs per node.

I don't think HIERARCHIC works properly past second order.
For the third order, 2 components, and a single element. The dofs_indices for the first component are
DOF Indices: 0 2 4 6 9 8 13 12 17 16 21 20 24 25 26 27. Using the var.componentDofIndices(dofs, 1) to get the dofs for the second component we can see that we will have problems. Since what happens if we take every dof and add 1 to it, we would have dofs indices in both components.

@roystgnr
Copy link
Contributor

I need to edit my rants better, or maybe only post the conclusions rather than the full stream-of-consciousness. I did get to that same "thereby gets garbage" conclusion eventually.

@maxnezdyur
Copy link
Contributor Author

Code below fixes HIERARCHIC, this is not efficient,....., but it works to correctly grab the components dofs and passes all current tests. The only assumption is that the ordering of dofs for component 0 and all other components are the same. To fix SIDE_HIERARCHIC would need to just change getArrayDoFValues which hardcodes the original way to find dof indices.
This is not to say the way below should be used but I think I am grasping how the arrayVariables are set up.

std::vector<dof_id_type>
MooseVariableBase::componentDofIndices(const std::vector<dof_id_type> & dof_indices,
                                       unsigned int component) const
{
  // Initialize the new_dof_indices vector with the input dof_indices
  std::vector<dof_id_type> new_dof_indices(dof_indices);

  // Only perform mapping if the component is not 0
  if (component != 0)
  {
    // Grab all DoFs for component 0; this will be used as a reference
    std::vector<dof_id_type> comp_0_dofs;
    _dof_map.local_variable_indices(comp_0_dofs, _mesh, _var_num);

    // Determine the variable number for the specified component
    auto comp_num = _var_num + component;

    // Grab all DoFs for the specified component
    std::vector<dof_id_type> comp_n_dofs;
    _dof_map.local_variable_indices(comp_n_dofs, _mesh, comp_num);

    // Create a map to store the index of each DoF in comp_0_dofs
    std::unordered_map<dof_id_type, size_t> index_map;
    for (size_t i = 0; i < comp_0_dofs.size(); ++i)
    {
      index_map[comp_0_dofs[i]] = i;
    }
    // We expect that the ordering of the dofs from one compoent to another
    // is the same, relatively.
    // Map the input dof_indices to the corresponding DoFs in comp_n_dofs
    for (size_t i = 0; i < dof_indices.size(); ++i)
    {
      // Find the index of the current DoF in comp_0_dofs using the map
      auto it = index_map.find(dof_indices[i]);
      if (it != index_map.end())
      {
        // Use the found index to get the corresponding DoF in comp_n_dofs
        new_dof_indices[i] = comp_n_dofs[it->second];
      }
      else
      {
        mooseError("DoF not found in comp_0_dofs");
      }
    }
  }

  return new_dof_indices;
}

maxnezdyur added a commit to maxnezdyur/moose that referenced this issue May 15, 2024
maxnezdyur added a commit to maxnezdyur/moose that referenced this issue May 15, 2024
@maxnezdyur maxnezdyur linked a pull request May 15, 2024 that will close this issue
@YaqiWang
Copy link
Contributor

Does L2_ HIERARCHIC have this issue?

@YaqiWang
Copy link
Contributor

Maybe we can have a function to return how many dofs associated with an object and a function to return which object a dof is associated with? In this way, we can use the number of dofs of the object to stride the dofs for higher components and consolidate nodal and elemental variables as well. Using map may cause performance penalty (not sure how much the impact could be).

@maxnezdyur
Copy link
Contributor Author

Doesn't seem like there are issues with L2_HIERARCHIC.
For certain variables there isn't a single stride that would work.
Below is the dof indices for 3rd order HIERARCHIC.
DOF Comp 0 Indices: 0 2 4 6 9 8 13 12 17 16 21 20 24 25 26 27
The stride in this case changes. The first four entries the stride is 1 then the next 8 the stride is 2 then the last 4 the stride is 4, IIRC.

@roystgnr
Copy link
Contributor

Maybe we can have a function to return how many dofs associated with an object

This is dof_object->n_comp(sys_num, var_num).

and a function to return which object a dof is associated with?

This would require an inverse lookup table that in the best case would double the memory we use for indexing. The typical case would be a proportion similar to the number of variables you have in one variable group. (IIRC this was in the thousands for some of your use cases?)

@YaqiWang
Copy link
Contributor

this was in the thousands for some of your use cases?

The number of components for an array variable could be thousands, but the number of local dofs on an element for a component is very limited. We often at most use third order polynomials.

@roystgnr
Copy link
Contributor

number of variables you have in one variable group

is not the number of dofs you have in one variable. Thousands of components in an array variable would mean thousands of variables in one variable group. In the forward case we use the component number to avoid storing thousands of indices instead of one, but the reverse lookup would not have the a priori information needed to make the same optimization.

@YaqiWang
Copy link
Contributor

Not sure what you mean the reverse lookup. When we call DofMap::dof_indices, we possibly can also retrieve the necessary information for how all the dofs are assembled together, which can be used for helping obtain dofs of variables in the same variable group (i.e components in any array variable) later without calling this function.

@roystgnr
Copy link
Contributor

Not sure what you mean the reverse lookup

The normal lookup here is the function which returns which dofs are associated with an object; hence

a function to return which object a dof is associated with

is the reverse of that.

But,

When we call DofMap::dof_indices

It sounds like you're talking about a local reverse lookup, not global? Why not just loop over nodes once, then, and operate on indices node-wise, when you know what node they're associated with and you know they're contiguous there? Even with third-order polynomials (wait, how are you using array variables with these, if you never added HIERARCHIC support?) a lookup table for indices of a thousand variables could be an unnecessary 64kb in 3D.

@YaqiWang
Copy link
Contributor

YaqiWang commented May 15, 2024

When I added the code, I thought the stride is either 1 for nodal variables or number of dofs of one variable for elemental variables. Right, I never added HIERARCHIC support. In the case @maxnezdyur gave above with 0 2 4 6 9 8 13 12 17 16 21 20 24 25 26 27, if we can obtain another stride vector like 1 1 1 1 2 2 2 2 2 2 2 2 4 4 4 4, then the code in MooseVariableDataBase<OutputType>::getArrayDoFValues can be updated.

I think I know what you mean now. MooseVariableBase::componentDofIndices is a bit difficult to fix. It currently assumes the dof indices passed in for an elemental variable are only for all local dofs of an element without an assertion. Maybe we should address libMesh/libmesh#2114.

maxnezdyur added a commit to maxnezdyur/moose that referenced this issue May 20, 2024
maxnezdyur added a commit to maxnezdyur/moose that referenced this issue May 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C: Framework P: normal A defect affecting operation with a low possibility of significantly affects. T: defect An anomaly, which is anything that deviates from expectations.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants