{"payload":{"feedbackUrl":"https://github.com/orgs/community/discussions/53140","repo":{"id":17165658,"defaultBranch":"master","name":"spark","ownerLogin":"apache","currentUserCanPush":false,"isFork":false,"isEmpty":false,"createdAt":"2014-02-25T08:00:08.000Z","ownerAvatar":"https://avatars.githubusercontent.com/u/47359?v=4","public":true,"private":false,"isOrgOwned":true},"refInfo":{"name":"","listCacheKey":"v0:1717372277.0","currentOid":""},"activityList":{"items":[{"before":"c4f720dfb41919dade7002b49462b3ff6b91eb22","after":"88b8dc29e100a51501701ffdffbcd0eff1f97c98","ref":"refs/heads/master","pushedAt":"2024-06-05T09:41:06.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"yaooqinn","name":"Kent Yao","path":"/yaooqinn","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/8326978?s=80&v=4"},"commit":{"message":"[SPARK-46937][SQL][FOLLOWUP] Properly check registered function replacement\n\n### What changes were proposed in this pull request?\n\nA followup of https://github.com/apache/spark/pull/44976 . `ConcurrentHashMap#put` has a different semantic than the scala map, and it returns null if the key is new. We should update the checking code accordingly.\n\n### Why are the changes needed?\n\navoid wrong warning messages\n\n### Does this PR introduce _any_ user-facing change?\n\nno\n\n### How was this patch tested?\n\nmanual\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nno\n\nCloses #46876 from cloud-fan/log.\n\nAuthored-by: Wenchen Fan \nSigned-off-by: Kent Yao ","shortMessageHtmlLink":"[SPARK-46937][SQL][FOLLOWUP] Properly check registered function repla…"}},{"before":"7f99f2cbd7d2d637f15b8444aebae3f9630ed3ab","after":"d3a324d63f82ffc4a4818bb1bfe7485d12f1dada","ref":"refs/heads/branch-3.5","pushedAt":"2024-06-05T08:35:25.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"yaooqinn","name":"Kent Yao","path":"/yaooqinn","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/8326978?s=80&v=4"},"commit":{"message":"[SPARK-48535][SS] Update config docs to indicate possibility of data loss/corruption issue if skip nulls for stream-stream joins config is enabled\n\n### What changes were proposed in this pull request?\nUpdate config docs to indicate possibility of data loss/corruption issue if skip nulls for stream-stream joins config is enabled\n\n### Why are the changes needed?\nClarifying the implications of turning off this config after a certain Spark version\n\n### Does this PR introduce _any_ user-facing change?\nNo\n\n### How was this patch tested?\nN/A - config doc only change\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo\n\nCloses #46875 from anishshri-db/task/SPARK-48535.\n\nAuthored-by: Anish Shrigondekar \nSigned-off-by: Kent Yao \n(cherry picked from commit c4f720dfb41919dade7002b49462b3ff6b91eb22)\nSigned-off-by: Kent Yao ","shortMessageHtmlLink":"[SPARK-48535][SS] Update config docs to indicate possibility of data …"}},{"before":"db527ac346f2f6f6dbddefe292a24848d1120172","after":"c4f720dfb41919dade7002b49462b3ff6b91eb22","ref":"refs/heads/master","pushedAt":"2024-06-05T08:34:51.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"yaooqinn","name":"Kent Yao","path":"/yaooqinn","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/8326978?s=80&v=4"},"commit":{"message":"[SPARK-48535][SS] Update config docs to indicate possibility of data loss/corruption issue if skip nulls for stream-stream joins config is enabled\n\n### What changes were proposed in this pull request?\nUpdate config docs to indicate possibility of data loss/corruption issue if skip nulls for stream-stream joins config is enabled\n\n### Why are the changes needed?\nClarifying the implications of turning off this config after a certain Spark version\n\n### Does this PR introduce _any_ user-facing change?\nNo\n\n### How was this patch tested?\nN/A - config doc only change\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo\n\nCloses #46875 from anishshri-db/task/SPARK-48535.\n\nAuthored-by: Anish Shrigondekar \nSigned-off-by: Kent Yao ","shortMessageHtmlLink":"[SPARK-48535][SS] Update config docs to indicate possibility of data …"}},{"before":"adbfd17318bf50b34d03f62ccd04219b18a41103","after":"db527ac346f2f6f6dbddefe292a24848d1120172","ref":"refs/heads/master","pushedAt":"2024-06-05T05:20:44.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"yaooqinn","name":"Kent Yao","path":"/yaooqinn","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/8326978?s=80&v=4"},"commit":{"message":"Revert \"[SPARK-48505][CORE] Simplify the implementation of `Utils#isG1GC`\"\n\nThis reverts commit abbe301d7645217f22641cf3a5c41502680e65be.","shortMessageHtmlLink":"Revert \"[SPARK-48505][CORE] Simplify the implementation of `Utils#isG…"}},{"before":"4075ce6771206ac8957029566c8d4196bcb8a87b","after":"adbfd17318bf50b34d03f62ccd04219b18a41103","ref":"refs/heads/master","pushedAt":"2024-06-05T05:11:15.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"HyukjinKwon","name":"Hyukjin Kwon","path":"/HyukjinKwon","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6477701?s=80&v=4"},"commit":{"message":"[SPARK-48533][CONNECT][PYTHON][TESTS] Add test for cached schema\n\n### What changes were proposed in this pull request?\nAdd test for cached schema, to make Spark Classic's mapInXXX also works within `SparkConnectSQLTestCase`, also add a new `contextmanager` for `os.environ`\n\n### Why are the changes needed?\ntest coverage\n\n### Does this PR introduce _any_ user-facing change?\nno, test only\n\n### How was this patch tested?\nCI\n\n### Was this patch authored or co-authored using generative AI tooling?\nno\n\nCloses #46871 from zhengruifeng/test_cached_schema.\n\nAuthored-by: Ruifeng Zheng \nSigned-off-by: Hyukjin Kwon ","shortMessageHtmlLink":"[SPARK-48533][CONNECT][PYTHON][TESTS] Add test for cached schema"}},{"before":"a17ab572cfdaefdb4a988908aa923c33f3ed58e1","after":"4075ce6771206ac8957029566c8d4196bcb8a87b","ref":"refs/heads/master","pushedAt":"2024-06-05T03:12:20.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"HyukjinKwon","name":"Hyukjin Kwon","path":"/HyukjinKwon","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6477701?s=80&v=4"},"commit":{"message":"[SPARK-48374][PYTHON][TESTS][FOLLOW-UP] Explicitly enable ANSI mode for non-ANSI build\n\n### What changes were proposed in this pull request?\n\nThis PR proposes to explicitly set ANSI mode in `test_toArrow_error` test.\n\n### Why are the changes needed?\n\nTo make non-ANSI build passing https://github.com/apache/spark/actions/runs/9342888897/job/25711689943:\n\n```\n\n======================================================================\nFAIL [0.180s]: test_toArrow_error (pyspark.sql.tests.connect.test_parity_arrow.ArrowParityTests.test_toArrow_error)\n----------------------------------------------------------------------\nTraceback (most recent call last):\n File \"/__w/spark/spark/python/pyspark/sql/tests/test_arrow.py\", line 1207, in test_toArrow_error\n with self.assertRaises(ArithmeticException):\nAssertionError: ArithmeticException not raised\n\n----------------------------------------------------------------------\nRan 88 tests in 17.797s\n```\n\n### Does this PR introduce _any_ user-facing change?\n\nNo, test-only.\n\n### How was this patch tested?\n\nManually.\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nNo.\n\nCloses #46872 from HyukjinKwon/SPARK-48374-followup.\n\nAuthored-by: Hyukjin Kwon \nSigned-off-by: Hyukjin Kwon ","shortMessageHtmlLink":"[SPARK-48374][PYTHON][TESTS][FOLLOW-UP] Explicitly enable ANSI mode f…"}},{"before":"33aa467f75824ed8460d514ca1e37f559d3cc405","after":"a17ab572cfdaefdb4a988908aa923c33f3ed58e1","ref":"refs/heads/master","pushedAt":"2024-06-05T00:07:26.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"HyukjinKwon","name":"Hyukjin Kwon","path":"/HyukjinKwon","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6477701?s=80&v=4"},"commit":{"message":"[MINOR][DOCS] Fix a typo in core-migration-guide.md\n\n### What changes were proposed in this pull request?\n\n Fix a typo in core-migration-guide.md:\n\n- agressively -> aggressively\n\n### Why are the changes needed?\n\nFix mistakes.\n\n### Does this PR introduce _any_ user-facing change?\n\nNo.\n\n### How was this patch tested?\n\nPassed GA.\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nNo.\n\nCloses #46864 from wayneguow/typo.\n\nAuthored-by: Wei Guo \nSigned-off-by: Hyukjin Kwon ","shortMessageHtmlLink":"[MINOR][DOCS] Fix a typo in core-migration-guide.md"}},{"before":"e47ce476b9ac962d24fabfbe1b344d074403d45b","after":"33aa467f75824ed8460d514ca1e37f559d3cc405","ref":"refs/heads/master","pushedAt":"2024-06-04T23:55:07.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"HyukjinKwon","name":"Hyukjin Kwon","path":"/HyukjinKwon","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6477701?s=80&v=4"},"commit":{"message":"[SPARK-48523][DOCS] Add `grpc_max_message_size ` description to `client-connection-string.md`\n\n### What changes were proposed in this pull request?\nThe pr aims to\n- add `grpc_max_message_size` description to `client-connection-string.md`\n- rename `hostname` to `host`.\n- fix some typo.\n\n### Why are the changes needed?\n- In PR https://github.com/apache/spark/pull/45842, we extract a `constant` as a `parameter` for the connect client, and we need to update the related doc.\n- Make the parameter names in our doc consistent with those in the code,\n In the doc, it is called `hostname`, but in the code, it is called `host`\nhttps://github.com/apache/spark/blob/d273fdf37bc291aadf8677305bda2a91b593219f/connector/connect/common/src/main/scala/org/apache/spark/sql/connect/client/SparkConnectClientParser.scala#L36\n\n### Does this PR introduce _any_ user-facing change?\nYes, only for doc `client-connection-string.md`.\n\n### How was this patch tested?\nManually test.\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo.\n\nCloses #46862 from panbingkun/SPARK-48523.\n\nAuthored-by: panbingkun \nSigned-off-by: Hyukjin Kwon ","shortMessageHtmlLink":"[SPARK-48523][DOCS] Add grpc_max_message_size description to `clie…"}},{"before":"8a0927c07a1483bcd9125bdc2062a63759b0a337","after":"e47ce476b9ac962d24fabfbe1b344d074403d45b","ref":"refs/heads/master","pushedAt":"2024-06-04T23:53:19.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"HyukjinKwon","name":"Hyukjin Kwon","path":"/HyukjinKwon","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6477701?s=80&v=4"},"commit":{"message":"[SPARK-48485][CONNECT][SS] Support interruptTag and interruptAll in streaming queries\n\n### What changes were proposed in this pull request?\n\nThis PR proposes to support interruptTag and interruptAll in streaming queries\n\n### Why are the changes needed?\n\nIn order to provide a way to interrupt streaming queries in batch.\n\n### Does this PR introduce _any_ user-facing change?\n\nYes, `spark.interruptTag` and `spark.interruptAll` cancel streaming queries.\n\n### How was this patch tested?\n\nTBD\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nNo.\n\nCloses #46819 from HyukjinKwon/interrupt-all.\n\nAuthored-by: Hyukjin Kwon \nSigned-off-by: Hyukjin Kwon ","shortMessageHtmlLink":"[SPARK-48485][CONNECT][SS] Support interruptTag and interruptAll in s…"}},{"before":"c7caac9b10ca73316e4127aef6f3fd73eac5ecda","after":"8a0927c07a1483bcd9125bdc2062a63759b0a337","ref":"refs/heads/master","pushedAt":"2024-06-04T22:04:27.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"cloud-fan","name":"Wenchen Fan","path":"/cloud-fan","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/3182036?s=80&v=4"},"commit":{"message":"[SPARK-48307][SQL] InlineCTE should keep not-inlined relations in the original WithCTE node\n\n### What changes were proposed in this pull request?\n\nI noticed an outdated comment in the rule `InlineCTE`\n```\n // CTEs in SQL Commands have been inlined by `CTESubstitution` already, so it is safe to add\n // WithCTE as top node here.\n```\n\nThis is not true anymore after https://github.com/apache/spark/pull/42036 . It's not a big deal as we replace not-inlined CTE relations with `Repartition` during optimization, so it doesn't matter where we put the `WithCTE` node with not-inlined CTE relations, as it will disappear eventually. But it's still better to keep it at its original place, as third-party rules may be sensitive about the plan shape.\n\n### Why are the changes needed?\n\nto keep the plan shape as much as can after inlining CTE relations.\n\n### Does this PR introduce _any_ user-facing change?\n\nno\n\n### How was this patch tested?\n\nnew test\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nno\n\nCloses #46617 from cloud-fan/cte.\n\nLead-authored-by: Wenchen Fan \nCo-authored-by: Wenchen Fan \nSigned-off-by: Wenchen Fan ","shortMessageHtmlLink":"[SPARK-48307][SQL] InlineCTE should keep not-inlined relations in the…"}},{"before":"651f68782ab705f277b2548382900cdf986e017e","after":"c7caac9b10ca73316e4127aef6f3fd73eac5ecda","ref":"refs/heads/master","pushedAt":"2024-06-04T18:06:02.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"cloud-fan","name":"Wenchen Fan","path":"/cloud-fan","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/3182036?s=80&v=4"},"commit":{"message":"[SPARK-47972][SQL][FOLLOWUP] Restrict CAST expression for collations\n\n### What changes were proposed in this pull request?\nRemoval of immutable Seq import.\n\n### Why are the changes needed?\nThis import was added with https://github.com/apache/spark/pull/46474, but in reality is changing behaviour of other AstBuilder.scala rules and because of this needs to be removed.\n\n### Does this PR introduce _any_ user-facing change?\nNo.\n\n### How was this patch tested?\nExisting tests cover this.\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo.\n\nCloses #46860 from mihailom-db/SPARK-47972-FOLLOWUP.\n\nAuthored-by: Mihailo Milosevic \nSigned-off-by: Wenchen Fan ","shortMessageHtmlLink":"[SPARK-47972][SQL][FOLLOWUP] Restrict CAST expression for collations"}},{"before":"8b88f5ae10cc676a9778c186b12c691fa913088d","after":"651f68782ab705f277b2548382900cdf986e017e","ref":"refs/heads/master","pushedAt":"2024-06-04T17:28:55.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"dongjoon-hyun","name":"Dongjoon Hyun","path":"/dongjoon-hyun","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/9700541?s=80&v=4"},"commit":{"message":"[SPARK-48531][INFRA] Fix `Black` target version to Python 3.9\n\n### What changes were proposed in this pull request?\n\nThis PR aims to fix `Black` target version to `Python 3.9`.\n\n### Why are the changes needed?\n\nSince SPARK-47993 dropped Python 3.8 support officially at Apache Spark 4.0.0, we had better update target version to `Python 3.9`.\n\n- #46228\n\n`py39` is the version for `Python 3.9`.\n```\n$ black --help | grep target\n -t, --target-version [py33|py34|py35|py36|py37|py38|py39|py310|py311|py312]\n```\n\n### Does this PR introduce _any_ user-facing change?\n\nNo.\n\n### How was this patch tested?\n\nPass the CIs with Python linter.\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nNo.\n\nCloses #46867 from dongjoon-hyun/SPARK-48531.\n\nAuthored-by: Dongjoon Hyun \nSigned-off-by: Dongjoon Hyun ","shortMessageHtmlLink":"[SPARK-48531][INFRA] Fix Black target version to Python 3.9"}},{"before":"f4afa2215a1a390d9f099a26155fbefc5beefbe9","after":"8b88f5ae10cc676a9778c186b12c691fa913088d","ref":"refs/heads/master","pushedAt":"2024-06-04T13:33:06.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"LuciferYang","name":"YangJie","path":"/LuciferYang","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1475305?s=80&v=4"},"commit":{"message":"[SPARK-48522][BUILD] Update Stream Library to 2.9.8 and attach its NOTICE\n\n### What changes were proposed in this pull request?\n\nUpdate Stream Library to 2.9.8 and attach its NOTICE\n\n### Why are the changes needed?\n\nupdate dep and notice file\n\n### Does this PR introduce _any_ user-facing change?\n\nno\n\n### How was this patch tested?\n\npassing ci\n\n### Was this patch authored or co-authored using generative AI tooling?\nno\n\nCloses #46861 from yaooqinn/SPARK-48522.\n\nAuthored-by: Kent Yao \nSigned-off-by: yangjie01 ","shortMessageHtmlLink":"[SPARK-48522][BUILD] Update Stream Library to 2.9.8 and attach its NO…"}},{"before":"d273fdf37bc291aadf8677305bda2a91b593219f","after":"f4afa2215a1a390d9f099a26155fbefc5beefbe9","ref":"refs/heads/master","pushedAt":"2024-06-04T12:33:57.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"LuciferYang","name":"YangJie","path":"/LuciferYang","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1475305?s=80&v=4"},"commit":{"message":"[SPARK-48506][CORE] Compression codec short names are case insensitive except for event logging\n\n### What changes were proposed in this pull request?\n\nCompression codec short names, e.g. map statuses, broadcasts, shuffle, parquet/orc/avro outputs, are case insensitive except for event logging. Calling `org.apache.spark.io.CompressionCodec.getShortName` causes this issue.\n\nIn this PR, we make `CompressionCodec.getShortName` handle case sensitivity correctly.\n\n### Why are the changes needed?\n\nFeature parity\n\n### Does this PR introduce _any_ user-facing change?\n\nYes, spark.eventLog.compression.codec now accepts not only the lowercased form of lz4, lzf, snappy, and zstd, but also forms with any of the characters to be upcased。\n\n### How was this patch tested?\n\nnew tests\n### Was this patch authored or co-authored using generative AI tooling?\nno\n\nCloses #46847 from yaooqinn/SPARK-48506.\n\nAuthored-by: Kent Yao \nSigned-off-by: yangjie01 ","shortMessageHtmlLink":"[SPARK-48506][CORE] Compression codec short names are case insensitiv…"}},{"before":"90ee299925220fa564c90e1f688a0d13ba0ac79d","after":"d273fdf37bc291aadf8677305bda2a91b593219f","ref":"refs/heads/master","pushedAt":"2024-06-04T11:08:45.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"LuciferYang","name":"YangJie","path":"/LuciferYang","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1475305?s=80&v=4"},"commit":{"message":"[SPARK-48519][BUILD] Upgrade jetty to 11.0.21\n\n### What changes were proposed in this pull request?\nThis pr aims to upgrade jetty from 11.0.20 to 11.0.21.\n\n### Why are the changes needed?\nThe new version bring some bug fix like [Reduce ByteBuffer churning in HttpOutput](https://github.com/jetty/jetty.project/commit/fe94c9f8a40df49021b28280f708448870c5b420). The full release notes as follows:\n- https://github.com/jetty/jetty.project/releases/tag/jetty-11.0.21\n\n### Does this PR introduce _any_ user-facing change?\nNo\n\n### How was this patch tested?\nPass GitHub Actions\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo\n\nCloses #46843 from LuciferYang/jetty-11.0.21.\n\nAuthored-by: yangjie01 \nSigned-off-by: yangjie01 ","shortMessageHtmlLink":"[SPARK-48519][BUILD] Upgrade jetty to 11.0.21"}},{"before":"02c645607f4353df573cdba568e092c3ff4c359a","after":"90ee299925220fa564c90e1f688a0d13ba0ac79d","ref":"refs/heads/master","pushedAt":"2024-06-04T10:58:39.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"yaooqinn","name":"Kent Yao","path":"/yaooqinn","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/8326978?s=80&v=4"},"commit":{"message":"[SPARK-48518][CORE] Make LZF compression be able to run in parallel\n\n### What changes were proposed in this pull request?\n\nThis PR introduced a config that turns on LZF compression to parallel mode via using PLZFOutputStream.\n\nFYI, https://github.com/ning/compress?tab=readme-ov-file#parallel-processing\n\n### Why are the changes needed?\n\nImprove performance\n\n```\n[info] OpenJDK 64-Bit Server VM 17.0.10+0 on Mac OS X 14.5\n[info] Apple M2 Max\n[info] Compress large objects: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative\n[info] -----------------------------------------------------------------------------------------------------------------------------\n[info] Compression 1024 array values in 7 threads 12 13 1 0.1 11788.2 1.0X\n[info] Compression 1024 array values single-threaded 23 23 0 0.0 22512.7 0.5X\n```\n\n### Does this PR introduce _any_ user-facing change?\n\nno\n\n### How was this patch tested?\n\nbenchmark\n### Was this patch authored or co-authored using generative AI tooling?\nno\n\nCloses #46858 from yaooqinn/SPARK-48518.\n\nAuthored-by: Kent Yao \nSigned-off-by: Kent Yao ","shortMessageHtmlLink":"[SPARK-48518][CORE] Make LZF compression be able to run in parallel"}},{"before":"8cebb9b56cc3716fb5afaafa317751924f0f8062","after":"02c645607f4353df573cdba568e092c3ff4c359a","ref":"refs/heads/master","pushedAt":"2024-06-04T09:50:36.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"zhengruifeng","name":"Ruifeng Zheng","path":"/zhengruifeng","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/7322292?s=80&v=4"},"commit":{"message":"[SPARK-48512][PYTHON][TESTS] Refactor Python tests\n\n### What changes were proposed in this pull request?\n\nUse withSQLConf in tests when it is appropriate.\n\n### Why are the changes needed?\n\nEnforce good practice for setting config in test cases.\n\n### Does this PR introduce _any_ user-facing change?\n\nNO\n\n### How was this patch tested?\n\nexisting UT\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nNO\n\nCloses #46852 from amaliujia/refactor_pyspark.\n\nAuthored-by: Rui Wang \nSigned-off-by: Ruifeng Zheng ","shortMessageHtmlLink":"[SPARK-48512][PYTHON][TESTS] Refactor Python tests"}},{"before":"abbe301d7645217f22641cf3a5c41502680e65be","after":"8cebb9b56cc3716fb5afaafa317751924f0f8062","ref":"refs/heads/master","pushedAt":"2024-06-04T08:24:36.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"beliefer","name":"Jiaan Geng","path":"/beliefer","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/8486025?s=80&v=4"},"commit":{"message":"[SPARK-46937][SQL] Improve concurrency performance for FunctionRegistry\n\n### What changes were proposed in this pull request?\nThis PR propose to improve concurrency performance for `FunctionRegistry`.\n\n### Why are the changes needed?\nCurrently, `SimpleFunctionRegistryBase` adopted the `mutable.Map` caching function infos. The `SimpleFunctionRegistryBase` guarded by this so as ensure security under multithreading.\nBecause all the mutable state are related to `functionBuilders`, we can delegate security to `ConcurrentHashMap`.\n`ConcurrentHashMap ` has higher concurrency activity and responsiveness.\nAfter this change, `FunctionRegistry` have better perf than before.\n\n### Does this PR introduce _any_ user-facing change?\n'No'.\n\n### How was this patch tested?\nGA.\nThe benchmark test.\n```\nobject FunctionRegistryBenchmark extends BenchmarkBase {\n\n override def runBenchmarkSuite(mainArgs: Array[String]): Unit = {\n runBenchmark(\"FunctionRegistry\") {\n val iters = 1000000\n val threadNum = 4\n val functionRegistry = FunctionRegistry.builtin\n val names = FunctionRegistry.expressions.keys.toSeq\n val barrier = new CyclicBarrier(threadNum + 1)\n val threadPool = ThreadUtils.newDaemonFixedThreadPool(threadNum, \"test-function-registry\")\n val benchmark = new Benchmark(\"SimpleFunctionRegistry\", iters, output = output)\n\n benchmark.addCase(\"only read\") { _ =>\n for (_ <- 1 to threadNum) {\n threadPool.execute(new Runnable {\n val random = new Random()\n override def run(): Unit = {\n barrier.await()\n for (_ <- 1 to iters) {\n val name = names(random.nextInt(names.size))\n val fun = functionRegistry.lookupFunction(new FunctionIdentifier(name))\n assert(fun.map(_.getName).get == name)\n functionRegistry.listFunction()\n }\n barrier.await()\n }\n })\n }\n barrier.await()\n barrier.await()\n }\n\n benchmark.run()\n }\n }\n}\n```\nThe benchmark output before this PR.\n```\nJava HotSpot(TM) 64-Bit Server VM 17.0.9+11-LTS-201 on Mac OS X 10.14.6\nIntel(R) Core(TM) i5-5350U CPU 1.80GHz\nSimpleFunctionRegistry: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative\n------------------------------------------------------------------------------------------------------------------------\nonly read 54858 55043 261 0.0 54858.1 1.0X\n```\nThe benchmark output after this PR.\n```\nJava HotSpot(TM) 64-Bit Server VM 17.0.9+11-LTS-201 on Mac OS X 10.14.6\nIntel(R) Core(TM) i5-5350U CPU 1.80GHz\nSimpleFunctionRegistry: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative\n------------------------------------------------------------------------------------------------------------------------\nonly read 20202 20264 88 0.0 20202.1 1.0X\n```\n\n### Was this patch authored or co-authored using generative AI tooling?\n'No'.\n\nCloses #44976 from beliefer/SPARK-46937.\n\nAuthored-by: beliefer \nSigned-off-by: beliefer ","shortMessageHtmlLink":"[SPARK-46937][SQL] Improve concurrency performance for FunctionRegistry"}},{"before":"c852c4f72acb658ff0193f16b526c8f653188a4e","after":"abbe301d7645217f22641cf3a5c41502680e65be","ref":"refs/heads/master","pushedAt":"2024-06-04T07:41:47.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"yaooqinn","name":"Kent Yao","path":"/yaooqinn","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/8326978?s=80&v=4"},"commit":{"message":"[SPARK-48505][CORE] Simplify the implementation of `Utils#isG1GC`\n\n### What changes were proposed in this pull request?\nThis PR changes to use the result of `ManagementFactory.getGarbageCollectorMXBeans` to determine whether G1GC is used. When G1GC is used, `ManagementFactory.getGarbageCollectorMXBeans` will return two instances of `GarbageCollectorExtImpl`, their names are `G1 Young Generation` and `G1 Old Generation` respectively.\n\n### Why are the changes needed?\nSimplify the implementation.\n\n### Does this PR introduce _any_ user-facing change?\nNo\n\n### How was this patch tested?\nPass GitHub Actions\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo\n\nCloses #46783 from LuciferYang/refactor-isG1GC.\n\nLead-authored-by: yangjie01 \nCo-authored-by: YangJie \nSigned-off-by: Kent Yao ","shortMessageHtmlLink":"[SPARK-48505][CORE] Simplify the implementation of Utils#isG1GC"}},{"before":"6475ddfed7f4fc13ac362181c2a9d28f8f2454f7","after":"c852c4f72acb658ff0193f16b526c8f653188a4e","ref":"refs/heads/master","pushedAt":"2024-06-04T07:10:54.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"cloud-fan","name":"Wenchen Fan","path":"/cloud-fan","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/3182036?s=80&v=4"},"commit":{"message":"[SPARK-48318][SQL] Enable hash join support for all collations (complex types)\n\n### What changes were proposed in this pull request?\nEnable collation support for hash join on complex types.\n\n- Logical plan is rewritten in analysis to (recursively) replace all non-binary strings with CollationKey\n- CollationKey is a unary expression that transforms StringType to BinaryType\n- Collation keys allow correct & efficient string comparison under specific collation rules\n\n### Why are the changes needed?\nImprove JOIN performance for complex types containing collated strings.\n\n### Does this PR introduce _any_ user-facing change?\nNo.\n\n### How was this patch tested?\n\n- Unit tests for `CollationKey` in `CollationExpressionSuite`\n- E2e SQL tests for `RewriteCollationJoin` in `CollationSuite`\n- Various queries with JOIN in existing TPCDS collation test suite\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo.\n\nCloses #46722 from uros-db/hash-join-cmx.\n\nAuthored-by: Uros Bojanic <157381213+uros-db@users.noreply.github.com>\nSigned-off-by: Wenchen Fan ","shortMessageHtmlLink":"[SPARK-48318][SQL] Enable hash join support for all collations (compl…"}},{"before":"560c08332b35941260169124b4f522bdc82b84d8","after":"6475ddfed7f4fc13ac362181c2a9d28f8f2454f7","ref":"refs/heads/master","pushedAt":"2024-06-04T06:51:22.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"yaooqinn","name":"Kent Yao","path":"/yaooqinn","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/8326978?s=80&v=4"},"commit":{"message":"[SPARK-48514][BUILD][K8S] Upgrade `kubernetes-client` to 6.13.0\n\n### What changes were proposed in this pull request?\nUpgrade kubernetes-client from 6.12.1 to 6.13.0\n\n### Why are the changes needed?\nUpgrade Fabric8 Kubernetes Model to Kubernetes v1.30.0\n[Release log 6.13.0](https://github.com/fabric8io/kubernetes-client/releases/tag/v6.13.0)\n\n### Does this PR introduce _any_ user-facing change?\nNo.\n\n### How was this patch tested?\nPass GA\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo.\n\nCloses #46854 from bjornjorgensen/kubclient6.13.0.\n\nAuthored-by: Bjørn Jørgensen \nSigned-off-by: Kent Yao ","shortMessageHtmlLink":"[SPARK-48514][BUILD][K8S] Upgrade kubernetes-client to 6.13.0"}},{"before":"6272c0511d23740e4ac1152ab5d967fba50d6690","after":"560c08332b35941260169124b4f522bdc82b84d8","ref":"refs/heads/master","pushedAt":"2024-06-04T00:24:20.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"HyukjinKwon","name":"Hyukjin Kwon","path":"/HyukjinKwon","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6477701?s=80&v=4"},"commit":{"message":"[SPARK-48482][PYTHON] dropDuplicates and dropDuplicatesWIthinWatermark should accept variable length args\n\n### What changes were proposed in this pull request?\n\nIn scala, `dropDuplicates` and `dropDuplicatesWIthinWatermark` accepts varargs, i.e. `df.dropDuplicates(\"id\", \"value\")`. However this is not supported in Python, users have to wrap them with list. This PR fixes it.\n\n### Why are the changes needed?\n\nBetter API, integrated scala and python experience\n\n### Does this PR introduce _any_ user-facing change?\n\nYes, now users can use var args as parameters to `dropDuplicates` and `dropDuplicatesWithinWatermark`\n\n### How was this patch tested?\n\nAdded & modified existing unit tests\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nNo\n\nCloses #46817 from WweiL/dropDuplicates-accept-vararg.\n\nAuthored-by: Wei Liu \nSigned-off-by: Hyukjin Kwon ","shortMessageHtmlLink":"[SPARK-48482][PYTHON] dropDuplicates and dropDuplicatesWIthinWatermar…"}},{"before":"cfb79d9d44c404f6a55f19323b76c126601d3137","after":"6272c0511d23740e4ac1152ab5d967fba50d6690","ref":"refs/heads/master","pushedAt":"2024-06-03T23:44:48.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"HyukjinKwon","name":"Hyukjin Kwon","path":"/HyukjinKwon","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6477701?s=80&v=4"},"commit":{"message":"[SPARK-48508][CONNECT][PYTHON] Cache user specified schema in `DataFrame.{to, mapInPandas, mapInArrow}`\n\n### What changes were proposed in this pull request?\nCache user specified schema in `DataFrame.{to, mapInPandas, mapInArrow}`\n\n### Why are the changes needed?\nto avoid extra RPC to get the schema\n\n### Does this PR introduce _any_ user-facing change?\nno, it should only be an optimization\n\n### How was this patch tested?\nCI\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo\n\nCloses #46848 from zhengruifeng/py_user_define_schema.\n\nAuthored-by: Ruifeng Zheng \nSigned-off-by: Hyukjin Kwon ","shortMessageHtmlLink":"[SPARK-48508][CONNECT][PYTHON] Cache user specified schema in `DataFr…"}},{"before":"e4e8bb5936d305d27961c3a9c04d06ee1901977f","after":"cfb79d9d44c404f6a55f19323b76c126601d3137","ref":"refs/heads/master","pushedAt":"2024-06-03T23:44:01.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"HyukjinKwon","name":"Hyukjin Kwon","path":"/HyukjinKwon","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6477701?s=80&v=4"},"commit":{"message":"[SPARK-47933][PYTHON][FOLLOWUP] Correct the error message\n\n### What changes were proposed in this pull request?\nCorrect the error message\n\nCloses #46850 from zhengruifeng/nit_dispatch_col_method.\n\nAuthored-by: Ruifeng Zheng \nSigned-off-by: Hyukjin Kwon ","shortMessageHtmlLink":"[SPARK-47933][PYTHON][FOLLOWUP] Correct the error message"}},{"before":"f9542d008402f8cef96d5ec347583c7c1d30d840","after":"e4e8bb5936d305d27961c3a9c04d06ee1901977f","ref":"refs/heads/master","pushedAt":"2024-06-03T23:16:51.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"cloud-fan","name":"Wenchen Fan","path":"/cloud-fan","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/3182036?s=80&v=4"},"commit":{"message":"[SPARK-47972][SQL] Restrict CAST expression for collations\n\n### What changes were proposed in this pull request?\nBlock of syntax CAST(value AS STRING COLLATE collation_name).\n\n### Why are the changes needed?\nCurrent state of code allows for calls like CAST(1 AS STRING COLLATE UNICODE). We want to restrict CAST expression to only be able to cast to default collation string, and to only allow COLLATE expression to produce explicitly collated strings.\n\n### Does this PR introduce _any_ user-facing change?\nYes.\n\n### How was this patch tested?\nTest in CollationSuite.\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo.\n\nCloses #46474 from mihailom-db/SPARK-47972.\n\nAuthored-by: Mihailo Milosevic \nSigned-off-by: Wenchen Fan ","shortMessageHtmlLink":"[SPARK-47972][SQL] Restrict CAST expression for collations"}},{"before":"5baaa615ef852c2032342c5986b4aa15ccf74b25","after":"f9542d008402f8cef96d5ec347583c7c1d30d840","ref":"refs/heads/master","pushedAt":"2024-06-03T20:00:37.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"cloud-fan","name":"Wenchen Fan","path":"/cloud-fan","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/3182036?s=80&v=4"},"commit":{"message":"[SPARK-48413][SQL] ALTER COLUMN with collation\n\n### What changes were proposed in this pull request?\n\nAdd support for changing collation of a column with `ALTER COLUMN` command. Use existing support for `ALTER COLUMN` with type to enable changing collations of column. Syntax example:\n```\nALTER TABLE t1 ALTER COLUMN col TYPE STRING COLLATE UTF8_BINARY_LCASE\n```\n\n### Why are the changes needed?\n\nEnable changing collation on column.\n\n### Does this PR introduce _any_ user-facing change?\n\nYes, it adds support for changing collation of column.\n\n### How was this patch tested?\n\nAdded tests to `DDLSuite` and `DataTypeSuite`.\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nNo.\n\nCloses #46734 from nikolamand-db/SPARK-48413.\n\nAuthored-by: Nikola Mandic \nSigned-off-by: Wenchen Fan ","shortMessageHtmlLink":"[SPARK-48413][SQL] ALTER COLUMN with collation"}},{"before":"5d71ef0716f7a2d470d05bf3c04441382cd80138","after":"5baaa615ef852c2032342c5986b4aa15ccf74b25","ref":"refs/heads/master","pushedAt":"2024-06-03T18:49:17.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"gengliangwang","name":"Gengliang Wang","path":"/gengliangwang","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/1097932?s=80&v=4"},"commit":{"message":"[SPARK-47977] DateTimeUtils.timestampDiff and DateTimeUtils.timestampAdd should not throw INTERNAL_ERROR exception\n\n### What changes were proposed in this pull request?\n\nConvert `INTERNAL_ERROR` for `timestampAdd` and `timestampDiff` to error with class. Reusing `INVALID_PARAMETER_VALUE.DATETIME_UNIT` used when parsing expressions.\n\nThe change is needed since `timestampDiff` and `timestampAdd` expressions could be constructed without going through parser - e.g. PySpark creates `timestampDiff` through PythonSQLUtils.\n\n### Why are the changes needed?\n\nCorrect classification of errors\n\n### Does this PR introduce _any_ user-facing change?\n\nNo\n\n### How was this patch tested?\n\nExisting unit tests updated with new error class\n\n### Was this patch authored or co-authored using generative AI tooling?\n\nNo\n\nCloses #46210 from vitaliili-db/SPARK-47977.\n\nAuthored-by: Vitalii Li \nSigned-off-by: Gengliang Wang ","shortMessageHtmlLink":"[SPARK-47977] DateTimeUtils.timestampDiff and DateTimeUtils.timestamp…"}},{"before":"7e8b60b5ae7d6453bc1ce51b5112c975f9aa8757","after":"5d71ef0716f7a2d470d05bf3c04441382cd80138","ref":"refs/heads/master","pushedAt":"2024-06-03T17:51:15.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"cloud-fan","name":"Wenchen Fan","path":"/cloud-fan","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/3182036?s=80&v=4"},"commit":{"message":"[SPARK-48503][SQL] Fix invalid scalar subqueries with group-by on non-equivalent columns that were incorrectly allowed\n\n### What changes were proposed in this pull request?\n\nFixes CheckAnalysis to reject invalid scalar subquery group-bys that were previously allowed and returned wrong results.\n\nFor example, this query is not legal and should give an error, but instead we incorrectly allowed it and it returns wrong results prior to this PR (full repro with table data in the jira):\n\n```\nselect *, (select count(*) from y where y1 > x1 group by y1) from x;\n```\n\nIt returns two rows, even though there's only one row of x. The correct result is an error, because there is more than one row returned by the scalar subquery.\n\nAnother problem case is if the correlation condition is an equality but it's under another operator like an OUTER JOIN or UNION. Various other expressions that are not equi-joins between the inner and outer fields hit this too, e.g. `where y1 + y2 = x1 group by y1`. See the comments in the code and the tests for more examples.\n\nThis PR fixes the logic which checks for valid vs invalid group-bys. However, note that this new logic may block some queries that are actually valid, for example `a + 1 = outer(b)` is valid but would be rejected. Therefore, we add a conf flag that can be used to restore the legacy behavior, as well as logging for when the legacy behavior is used and differs from the new behavior. (In general, to accurately run valid queries and reject invalid queries, the check must be moved from compile-time to run-time - see https://issues.apache.org/jira/browse/SPARK-48501.)\n\nThis is a longstanding bug. The bug is in CheckAnalysis in checkAggregateInScalarSubquery. It allows grouping columns that are present in correlation predicates, but doesn’t check whether those predicates are equalities -  because when that code was written, non-equality correlation wasn’t allowed. Therefore, this bug has existed since non-equality correlation was added (~2 years ago).\n\n### Why are the changes needed?\nFix invalid queries returning wrong results\n\n### Does this PR introduce _any_ user-facing change?\nYes, block subqueries with invalid group-bys.\n\n### How was this patch tested?\nAdd tests\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo\n\nCloses #46839 from jchen5/scalar-subq-gby.\n\nAuthored-by: Jack Chen \nSigned-off-by: Wenchen Fan ","shortMessageHtmlLink":"[SPARK-48503][SQL] Fix invalid scalar subqueries with group-by on non…"}},{"before":"67ad55c46347750ae602868326964e102525b484","after":"7e8b60b5ae7d6453bc1ce51b5112c975f9aa8757","ref":"refs/heads/master","pushedAt":"2024-06-03T08:53:29.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"HyukjinKwon","name":"Hyukjin Kwon","path":"/HyukjinKwon","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6477701?s=80&v=4"},"commit":{"message":"[SPARK-48507][INFRA] Use Hadoop 3.3.6 winutils in `build_sparkr_window`\n\n### What changes were proposed in this pull request?\nThe pr aims to use `Hadoop 3.3.6` winutils in `build_sparkr_window`.\n\n### Why are the changes needed?\nLet's use the latest version.\nhttps://github.com/cdarlint/winutils/tree/master\n\n### Does this PR introduce _any_ user-facing change?\nNo.\n\n### How was this patch tested?\nN/A\n\n### Was this patch authored or co-authored using generative AI tooling?\nNo.\n\nCloses #46846 from panbingkun/SPARK-48507.\n\nAuthored-by: panbingkun \nSigned-off-by: Hyukjin Kwon ","shortMessageHtmlLink":"[SPARK-48507][INFRA] Use Hadoop 3.3.6 winutils in build_sparkr_window"}},{"before":"8d9d9c243ed141660e853b189982979d19f0ba9d","after":"67ad55c46347750ae602868326964e102525b484","ref":"refs/heads/master","pushedAt":"2024-06-03T08:06:01.000Z","pushType":"push","commitsCount":1,"pusher":{"login":"HyukjinKwon","name":"Hyukjin Kwon","path":"/HyukjinKwon","primaryAvatarUrl":"https://avatars.githubusercontent.com/u/6477701?s=80&v=4"},"commit":{"message":"[SPARK-48504][PYTHON][CONNECT] Parent Window class for Spark Connect and Spark Classic\n\n### What changes were proposed in this pull request?\n Parent Window class for Spark Connect and Spark Classic\n\n### Why are the changes needed?\nSame as https://github.com/apache/spark/pull/46129\n\n### Does this PR introduce _any_ user-facing change?\nSame as https://github.com/apache/spark/pull/46129\n\n### How was this patch tested?\nCI\n\n### Was this patch authored or co-authored using generative AI tooling?\nNO\n\nCloses #46841 from zhengruifeng/py_parent_window.\n\nAuthored-by: Ruifeng Zheng \nSigned-off-by: Hyukjin Kwon ","shortMessageHtmlLink":"[SPARK-48504][PYTHON][CONNECT] Parent Window class for Spark Connect …"}}],"hasNextPage":true,"hasPreviousPage":false,"activityType":"all","actor":null,"timePeriod":"all","sort":"DESC","perPage":30,"cursor":"djE6ks8AAAAEXO6b_QA","startCursor":null,"endCursor":null}},"title":"Activity · apache/spark"}