-
Notifications
You must be signed in to change notification settings - Fork 28k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-48273][SQL] Fix late rewrite of PlanWithUnresolvedIdentifier #46580
Conversation
@@ -254,7 +254,7 @@ class Analyzer(override val catalogManager: CatalogManager) extends RuleExecutor | |||
TypeCoercion.typeCoercionRules | |||
} | |||
|
|||
override def batches: Seq[Batch] = Seq( | |||
private val earlyBatches: Seq[Batch] = Seq( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's keep it as def
. It may not be a big deal but I'd like to keep the previous behavior as much as we can: we create a new instance of the rule for each analysis execution.
import org.apache.spark.sql.catalyst.trees.TreePattern.UNRESOLVED_IDENTIFIER | ||
import org.apache.spark.sql.types.StringType | ||
|
||
/** | ||
* Resolves the identifier expressions and builds the original plans/expressions. | ||
*/ | ||
object ResolveIdentifierClause extends Rule[LogicalPlan] with AliasHelper with EvalHelper { | ||
class ResolveIdentifierClause(executor: RuleExecutor[LogicalPlan]) extends Rule[LogicalPlan] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto, let's pass in the rules and create RuleExecutor
within this rule, as the passed rules may be new instances for each analysis execution
I think this fixes https://issues.apache.org/jira/browse/SPARK-46625 as well. Can we add a test to verify? |
Checked locally, seems like these changes don't resolve the issue:
|
|
||
override def apply(plan: LogicalPlan): LogicalPlan = plan.resolveOperatorsUpWithPruning( | ||
_.containsAnyPattern(UNRESOLVED_IDENTIFIER)) { | ||
case p: PlanWithUnresolvedIdentifier if p.identifierExpr.resolved => | ||
p.planBuilder.apply(evalIdentifierExpr(p.identifierExpr)) | ||
val executor = new RuleExecutor[LogicalPlan] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We only need to create one RuleExecutor
instance once in this rule.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added, please check again.
thanks, merging to master/3.5! |
### What changes were proposed in this pull request? `PlanWithUnresolvedIdentifier` is rewritten later in analysis which causes rules like `SubstituteUnresolvedOrdinals` to miss the new plan. This causes following queries to fail: ``` create temporary view identifier('v1') as (select my_col from (values (1), (2), (1) as (my_col)) group by 1); -- cache table identifier('t1') as (select my_col from (values (1), (2), (1) as (my_col)) group by 1); -- create table identifier('t2') as (select my_col from (values (1), (2), (1) as (my_col)) group by 1); insert into identifier('t2') select my_col from (values (3) as (my_col)) group by 1; ``` Fix this by explicitly applying rules after plan rewrite. ### Why are the changes needed? To fix the described bug. ### Does this PR introduce _any_ user-facing change? Yes, it fixes the mentioned problematic queries. ### How was this patch tested? Updated existing `identifier-clause.sql` golden file. ### Was this patch authored or co-authored using generative AI tooling? No. Closes #46580 from nikolamand-db/SPARK-48273. Authored-by: Nikola Mandic <nikola.mandic@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com> (cherry picked from commit 731a2cf) Signed-off-by: Wenchen Fan <wenchen@databricks.com>
there are new failing cases in
|
fixing at #46794 |
…tifier-clause.sql ### What changes were proposed in this pull request? A followup of #46580 . It's better to create non-Hive tables in the tests, so that it's backport safe, as old branches creates hive table by default. ### Why are the changes needed? fix branch-3.5 CI ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? N/A ### Was this patch authored or co-authored using generative AI tooling? no Closes #46794 from cloud-fan/test. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com>
…tifier-clause.sql ### What changes were proposed in this pull request? A followup of #46580 . It's better to create non-Hive tables in the tests, so that it's backport safe, as old branches creates hive table by default. ### Why are the changes needed? fix branch-3.5 CI ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? N/A ### Was this patch authored or co-authored using generative AI tooling? no Closes #46794 from cloud-fan/test. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com> (cherry picked from commit cf47293) Signed-off-by: Wenchen Fan <wenchen@databricks.com>
### What changes were proposed in this pull request? This is a followup of #46580 to update golden files ### Why are the changes needed? fix CI ### Does this PR introduce _any_ user-facing change? no ### How was this patch tested? N/A ### Was this patch authored or co-authored using generative AI tooling? no Closes #46800 from cloud-fan/test. Authored-by: Wenchen Fan <wenchen@databricks.com> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
What changes were proposed in this pull request?
PlanWithUnresolvedIdentifier
is rewritten later in analysis which causes rules likeSubstituteUnresolvedOrdinals
to miss the new plan. This causes following queries to fail:Fix this by explicitly applying rules after plan rewrite.
Why are the changes needed?
To fix the described bug.
Does this PR introduce any user-facing change?
Yes, it fixes the mentioned problematic queries.
How was this patch tested?
Updated existing
identifier-clause.sql
golden file.Was this patch authored or co-authored using generative AI tooling?
No.