[CIR][CIRGen] Improve switch support for unrecheable code #528

Open
wenpen wants to merge 3 commits into main from switch_support_single_case

wenpen
Contributor

@wenpen wenpen commented Apr 1, 2024

Support non-block case statements and statements that don't belong to any case region. Fixes #520 and #521.

@Lancern
Collaborator

Lancern commented Apr 2, 2024

Actually, the body of a switch statement doesn't have to be a case, default, or compound statement at all:

int f(int x) {
  switch (x)
    return 1;
  return 2;
}

This is accepted by clang (f returns 2 unconditionally) and since you're working on this I believe we can further resolve this in this PR.
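A minimal sketch of the claim above, using the same function. The comment is added here for illustration; the behavior follows from the statement not being labeled by any case or default, so the switch never transfers control to it.

```cpp
#include <cassert>

int f(int x) {
  switch (x)
    return 1; // not labeled by any case/default, so control never reaches it
  return 2;
}
```

Calling `f` with any argument (for example `f(0)` or `f(1)`) yields 2. Compilers may warn about the unreachable statement, but the program is well-formed.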

@wenpen wenpen force-pushed the switch_support_single_case branch from 8cba98d to b5eaa96 Compare April 2, 2024 12:39
@wenpen
Contributor Author

wenpen commented Apr 2, 2024

@Lancern Added CaseOpKind_NE and buildCaseNoneStmt() to handle the unreachable statements.
Thanks for pointing it out!

@wenpen wenpen force-pushed the switch_support_single_case branch 2 times, most recently from a7b0446 to 5915d80 Compare April 2, 2024 13:05
Member

@bcardosolopes bcardosolopes left a comment

Hey, thanks for your first patch and for working on this.

clang/test/CIR/CodeGen/switch.cpp (outdated, resolved)
@bcardosolopes bcardosolopes changed the title switchStmt support non-block substatement [CIR][CIRGen] Support switch with non-block substatements Apr 3, 2024
@bcardosolopes
Member

Since you are already working on this and they are all related, this PR should also fix the other issues and include test cases from #521 and #522.

@Lancern
Collaborator

Lancern commented Apr 4, 2024

Well, things are a little tricky when the body of a switch statement is not a case, default, or compound statement. Such statements are NOT simply unreachable; consider:

void foo(int x) {
  switch (x)
    while (condition()) {
  case 42:
      do_something();
    }
}

The while statement is indeed reachable when x is 42. In such a case switch behaves just like a goto.
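To make the "switch behaves like a goto" reading concrete, here is a hedged sketch that compares the shape above with a hand-written goto version. The `budget` parameter is a hypothetical stand-in for `condition()`, and the returned counter stands in for calls to `do_something()`; neither is part of the original example.

```cpp
#include <cassert>

// Same shape as the example above: the case label sits inside the loop body.
int via_switch(int x, int budget) {
  int calls = 0;
  switch (x)
    while (budget-- > 0) {
  case 42:
      ++calls; // stands in for do_something()
    }
  return calls;
}

// Hand-written equivalent: the switch is a computed jump to a label.
int via_goto(int x, int budget) {
  int calls = 0;
  if (x == 42)
    goto case_42;
  goto done; // no matching case and no default: skip the whole body
  while (budget-- > 0) {
  case_42:
    ++calls;
  }
done:
  return calls;
}
```

With `x == 42` both functions enter the loop body directly, run it once before the condition is first evaluated, and then keep looping while the budget lasts. With any other `x`, both skip the loop entirely.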

@wenpen
Contributor Author

wenpen commented Apr 8, 2024

Things become more complex when we consider case labels that cross scopes: the current definition of SwitchOp is not enough to express the semantics.

The definition assumes that the number of case attributes equals the number of regions, but unfortunately the regions may be nested.
For example, the following code needs 2 case attributes but only 1 SwitchOp region (or, put differently, 2 nested regions).

switch (x)
  case 9: {
    x++;
    case 7:
      x++;
  }

I'm not sure what a reasonable SwitchOp definition would be. A preliminary idea is to use labels and branches somehow, so that the CIR looks like:

cir.scope {
  cir.switch (...) [
    9: ^bb0
    7: ^bb1
  ] {
    ^bb0:
      cir.scope {
        ...
        ^bb1:
          ...
      }
  }
}

Also, I noticed that goto doesn't support branching across scopes either; maybe the problems are related?
Do you have any suggestions?
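For reference, the source-level behavior of the `case 9` / `case 7` fragment in the comment above can be checked with a small sketch. The wrapper function `bump` and its return value are assumptions added for illustration; the switch body is unchanged.

```cpp
#include <cassert>

// Two case labels, one compound statement: `case 7` lives inside the
// block that `case 9` labels.
int bump(int x) {
  switch (x)
    case 9: {
      x++;
      case 7:
        x++;
    }
  return x;
}
```

Entering at `case 9` runs both increments, entering at `case 7` runs only the second, and any other value skips the block. This is the behavior that two case attributes sharing one (nested) region would have to express.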

@Lancern
Collaborator

Lancern commented Apr 8, 2024

Since switch statements are more or less syntactic sugar for goto, we may learn some ideas from the way goto is implemented in CIR: #508.

@bcardosolopes
Member

> It becomes a little complex when we consider case across scope

Oh right, you don't need to solve this problem for the moment, let's focus on the testcases that don't involve scope crossing. @gitoleg is working on #508, once that lands we can do incremental work to fix it.

@wenpen wenpen force-pushed the switch_support_single_case branch 2 times, most recently from 1e75d7b to e19aa5a Compare April 10, 2024 10:42
@wenpen wenpen marked this pull request as draft April 10, 2024 10:55
@bcardosolopes
Member

Let me know when this comes out of Draft state and I'll take a look again

@wenpen wenpen force-pushed the switch_support_single_case branch 2 times, most recently from 072a50b to c9f7e9a Compare April 12, 2024 06:01
@wenpen wenpen changed the title [CIR][CIRGen] Support switch with non-block substatements [CIR][CIRGen] Enhance switch Apr 12, 2024
@wenpen wenpen marked this pull request as ready for review April 12, 2024 06:25
@wenpen
Contributor Author

wenpen commented Apr 12, 2024

I appreciate the suggestions!
This PR is ready for review now. @Lancern @bcardosolopes

Member

@bcardosolopes bcardosolopes left a comment

Great, almost there. A few more inline comments.

clang/test/CIR/CodeGen/switch.cpp (outdated, resolved)
clang/lib/CIR/CodeGen/CIRGenStmt.cpp (outdated, resolved)
clang/lib/CIR/CodeGen/CIRGenFunction.h (outdated, resolved)
clang/lib/CIR/CodeGen/CIRGenStmt.cpp (outdated, resolved)
clang/lib/CIR/CodeGen/CIRGenFunction.h (outdated, resolved)
@@ -275,14 +276,124 @@ void sw12(int a) {
// CHECK-NEXT: cir.break
// CHECK-NEXT: }

void fallthrough(int x) {
void sw13(int a) {
Member

These testcases are great. Can you add a couple more variations where extra nested switch shows up? Just wanna make sure the lexical scope here is doing the right thing.

@wenpen
Contributor Author

wenpen commented Apr 17, 2024

Comments addressed.
I found some new corner cases, so I rewrote some functions and added a few unit tests.

@wenpen wenpen force-pushed the switch_support_single_case branch 2 times, most recently from 9ddab69 to ef970b7 Compare April 26, 2024 07:23
@wenpen
Contributor Author

wenpen commented Apr 26, 2024

Hi, this PR is ready for review, thanks~

Member

@bcardosolopes bcardosolopes left a comment

Thanks for the update, much easier for me to understand the approach now that the refactoring part is gone. More comments inline.

@@ -976,7 +1000,8 @@ mlir::LogicalResult CIRGenFunction::buildSwitchBody(
builder.setInsertionPointToEnd(lastCaseBlock);
res = buildStmt(c, /*useCurrentScope=*/!isa<CompoundStmt>(c));
} else {
llvm_unreachable("statement doesn't belong to any case region, NYI");
checkCaseNoneStmt(*c);
Member

We should just call buildStmt() instead or rewrite more logic in a way that CIRGen just falls out naturally. Like I mentioned in previous reviews, this is dispatching another visitor logic just for the sake of grabbing information that could just be handled in our regular CIRGen visiting path.

Looking at checkCaseNoneStmt impl, specifically Stmt::CaseStmtClass/DefaultStmtClass: you should add a buildCaseStmt and buildDefaultStmt and call them from buildSimpleStmt. That code should already be part of CIRGen emission, and not something that does a side checking. If you need this info, you can walk LexScopes up to find if we are in a switch or not.

Contributor Author

@wenpen wenpen Apr 29, 2024

If we call buildStmt(), then for a ReturnStmt we need to get currLexScope->RetBlocks. We expect the block to belong to the case region, but a CaseNoneStmt has no region.
I'll keep working to find a solution. I pushed a WIP commit in case you want to have a look at the draft.

@wenpen wenpen force-pushed the switch_support_single_case branch from ef970b7 to 5223c3c Compare April 29, 2024 09:01
@wenpen wenpen marked this pull request as draft April 29, 2024 09:14
lanza pushed a commit that referenced this pull request Apr 29, 2024
Make logic cleaner and more extensible.

Separate collecting `SwitchStmt` information and building op logic into
different functions.
Add more UT to cover nested switch, which also worked before this pr.

This pr is split from #528.
@bcardosolopes
Member

@wenpen still working on this? I don't usually look at draft PRs, just trying to make sure there isn't something I should be looking at here.

@wenpen
Contributor Author

wenpen commented May 9, 2024

@bcardosolopes Yes, I've just been a little busy recently. I will update the PR and request a review from you in the coming days, thanks~

@wenpen wenpen force-pushed the switch_support_single_case branch from 5223c3c to 9ae5d1f Compare May 10, 2024 11:06
Comment on lines 470 to 473
// TODO: Rewrite the logic to handle ReturnStmt inside SwitchStmt, then
// clean up the code below.
if (currLexScope->IsInsideCaseNoneStmt)
return mlir::success();
Contributor Author

I found many code samples that fail due to an incorrect terminator in a block, e.g.

switch (a) {
case 0:
  break;
  int x = 1;
}
switch (a) {
case 0:
  return 0;
  return 1;
  int x = 1;
}
for (;;) {
  break;
  int x = 1;
}

It looks like another large piece of work, so I just skip ReturnStmt here.
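As a reference point for what the generated code has to preserve, the three fragments above can be wrapped into functions and checked at the source level. This is only a sketch: the function names and return values are assumptions, and the trailing `int x = 1;` declarations are replaced by observable assignments so that the dead code would be visible if it ever ran.

```cpp
#include <cassert>

int after_break(int a) {
  int result = 0;
  switch (a) {
  case 0:
    result = 1;
    break;
    result = 99; // dead: follows the break in the same case
  }
  return result;
}

int after_return(int a) {
  switch (a) {
  case 0:
    return 0;
    return 1; // dead: follows the first return
  }
  return -1;
}

int after_loop_break() {
  int iterations = 0;
  for (;;) {
    ++iterations;
    break;
    ++iterations; // dead: follows the break
  }
  return iterations;
}
```

In each function the statement after the terminator never executes, so an emitter that places it in the same block after the terminator produces invalid IR even though the source is well-formed.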

Member

Right, can you file a new issue and list these?

Member

I'm opposed to return mlir::success(); because it just silently skips something we don't know how to handle. I'd rather these things fail or crash, so that it's clear they aren't implemented. What happens when you remove this return?

@wenpen wenpen requested a review from bcardosolopes May 11, 2024 06:14
@wenpen wenpen marked this pull request as ready for review May 11, 2024 06:59
@@ -328,6 +328,14 @@ mlir::LogicalResult CIRGenFunction::buildLabelStmt(const clang::LabelStmt &S) {
// IsEHa: not implemented.
assert(!(getContext().getLangOpts().EHAsynch && S.isSideEntry()));

// TODO: After support case stmt crossing scopes, we should build LabelStmt
Member

Any TODO in CIRGen should be TODO(cir)

@@ -2027,6 +2031,8 @@ class CIRGenFunction : public CIRGenTypeCache {
// Scope entry block tracking
mlir::Block *getEntryBlock() { return EntryBlock; }

bool IsInsideCaseNoneStmt = false;
Member

I don't think we need this, reasons below.

// and clean LexicalScope::IsInsideCaseNoneStmt.
for (auto *lexScope = currLexScope; lexScope;
lexScope = lexScope->getParentScope()) {
assert(!lexScope->IsInsideCaseNoneStmt &&
Member

What happens if you remove this code? Also, why doesn't it work to just walk the scope up until you find a switch?

Contributor Author

First, we won't need this assert anymore if we can keep the case-none statement somehow, as you suggested.

> What happens if you remove this code?

Removing this code won't cause incorrect behavior currently (we don't support goto in that case yet), but I think it may produce strange error messages in the future.

switch (x) {
foo:
  x = 1;
  break;
case 2:
  goto foo;
}

We need to avoid erasing the CaseNoneStmt containing the label foo.

> why doesn't it work to just walk the scope up until you find a switch?

Consider the code below: we need to guarantee that the removed Stmt doesn't contain any LabelStmt, whether or not the LabelStmt is inside another nested switch.

switch(x) {
  switch(x) {
  case 1:
foo:
    break;
  }
  break;
case 1:
  goto foo;
}
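Both label examples in the comment above are reachable at the source level, which is why the statements before the first case cannot simply be dropped. A hedged sketch follows; the wrapper functions and the `visited` counter are assumptions added so the control flow can be observed, and are not part of the original snippets.

```cpp
#include <cassert>

// The label sits before the first case of the switch; `case 2` jumps back to it.
int label_before_case(int x) {
  switch (x) {
  foo:
    x = 1;
    break;
  case 2:
    goto foo;
  }
  return x;
}

// The label sits inside a nested switch that itself precedes the first
// case of the outer switch.
int label_in_nested_switch(int x) {
  int visited = 0;
  switch (x) {
    switch (x) {
    case 1: // belongs to the inner switch
    foo:
      visited += 1;
      break; // leaves the inner switch only
    }
    visited += 10;
    break;
  case 1: // belongs to the outer switch
    goto foo;
  }
  return visited;
}
```

In the second function, an argument of 1 selects the outer `case 1`, jumps to `foo` inside the inner switch, leaves the inner switch through its `break`, and then runs the statement that follows it. Any other argument skips the whole outer body.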

@@ -704,6 +717,22 @@ CIRGenFunction::buildSwitchCase(const SwitchCase &S, mlir::Type condType,
llvm_unreachable("expect case or default stmt");
}

mlir::LogicalResult CIRGenFunction::buildCaseNoneStmt(const Stmt *S) {
// Create orphan region to skip over the case none stmts.
Member

Because you are creating an orphan region, this means that anything emitted inside buildCaseNoneStmt will never execute, right? The problem with an orphan region is that it won't get attached to anything, so it really adds no value (not even for unreachable code analysis). If so, it's better to just split the current basic block A into two, B and C: A should jump to C, and you emit the code in B.

Contributor Author

I didn't find a good place to hold the block of CaseNoneStmt.

For example

void f(int x) {
  switch(x) {
    break;
  }
}

There is no region inside the SwitchOp, so we have to put the break block outside the SwitchOp, which causes a verification failure: 'cir.break' op must be within a loop or switch.

Did I misunderstand something? Looking forward to your suggestions~

Member

I understand your point, but if you go for the current approach you might as well skip this codegen entirely, because what you are emitting won't ever be attached to anything. I think it's safer to mimic the original codegen here. What does Clang currently do in the original (OG) codegen?

Member

Perhaps we should create a SwitchOp with at least one default region and delete that at the end if it ends up unused?

@wenpen wenpen force-pushed the switch_support_single_case branch from f726860 to 7a61b3c Compare May 17, 2024 05:48
Contributor Author

@wenpen wenpen left a comment

I also feel that the solution for #521 is not very natural, so I'll be happy to modify it if you have some ideas. Alternatively, I could revert the change and only solve #520 in this PR, if you think the definition of SwitchOp should be changed first. Thanks!

Comment on lines +470 to +474
// TODO(cir): Rewrite the logic to handle ReturnStmt inside SwitchStmt, then
// clean up the code below.
if (currLexScope->IsInsideCaseNoneStmt)
return mlir::success();

Contributor Author

> I'm opposed to return mlir::success(); because it just silently skips something we don't know how to handle. I'd rather these things fail or crash, so that it's clear they aren't implemented. What happens when you remove this return?

buildReturnStmt() assumes there is exactly one return block per region and one region per lexical scope; the only exception is a switch scope, which has multiple regions. The related code is:

    mlir::Block *getOrCreateRetBlock(CIRGenFunction &CGF, mlir::Location loc) {
      unsigned int regionIdx = 0;
      if (isSwitch())
        regionIdx = SwitchRegions.size() - 1;
      if (regionIdx >= RetBlocks.size())
        return createRetBlock(CGF, loc);
      return &*RetBlocks.back();
    }

So if we remove the return here, the following code will crash: SwitchRegions is empty, so regionIdx is computed as size() - 1 with size() == 0 (effectively -1), and we end up calling RetBlocks.back() on an empty RetBlocks.

int f(int x) {
  switch(x) {
    return 0;
  }
  return 1;
}

By the way, I believe the current handling of switch in getOrCreateRetBlock() is incorrect, and it should also be fixed after changing the definition of SwitchOp.
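The index computation discussed above can be illustrated in isolation. This is not the CIRGen code: a plain `std::vector<int>` stands in for the region list, and both function names are hypothetical. It only shows what `size() - 1` yields for an empty container when stored in an `unsigned`, next to a guarded variant that makes the "no region yet" state explicit.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Mirrors the shape of `regionIdx = SwitchRegions.size() - 1`.
// With an empty vector the subtraction wraps around instead of giving -1.
unsigned uncheckedLastIndex(const std::vector<int> &regions) {
  return static_cast<unsigned>(regions.size() - 1);
}

// Guarded variant: reports failure instead of producing a wrapped index.
bool checkedLastIndex(const std::vector<int> &regions, std::size_t &index) {
  if (regions.empty())
    return false;
  index = regions.size() - 1;
  return true;
}
```

Whatever the final fix in CIRGen looks like, a switch with no case region at the point a return is emitted is a state the lookup would need to handle explicitly rather than through index arithmetic.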

Member

> By the way, I believe the current implementation of getOrCreateRetBlock() about switch is incorrect and also should be solved after changing definition of SwitchOp.

Right, we should fix the logic, not take shortcuts like returning mlir::success(). Can you elaborate on what you mean by changing the definition of SwitchOp?

@bcardosolopes
Member

I'm going to resume reviewing this, sorry for the delay!

@bcardosolopes bcardosolopes changed the title [CIR][CIRGen] Enhance switch [CIR][CIRGen] Improve switch support for unrecheable code Jun 6, 2024
@bcardosolopes
Member

I landed #611 which has some comments related to this PR (cc: @piggynl)

Successfully merging this pull request may close these issues.

Assertion failure on switch statement with non-block substatement