Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Broadcasted in-place assignment of matrix to subarray 3x slower in 1.11-alpha1 over 1.10 and nightly #53615

Open
BioTurboNick opened this issue Mar 5, 2024 · 10 comments
Labels
domain:arrays [a, r, r, a, y, s] kind:regression Regression in behavior compared to a previous version performance Must go faster status:bisect wanted
Milestone

Comments

@BioTurboNick
Copy link
Contributor

Discovered a slowdown in 1.11-alpha when copying an array to a subarray.

Since it seems nightly doesn't have it it's possible this is fixed and already marked for backport but I haven't found it - apologies for the noise if so.

using BenchmarkTools

A = rand(1000,1000);
B = zeros(1001,1001);

@btime $B[1:end - 1, 1:end - 1] .= $A;
# Julia 1.10.2: 526.900 μs
# Julia 1.11-alpha1: 1.560 ms
# Julia 1.12-DEV.136: 532.800 μs
@giordano
Copy link
Contributor

giordano commented Mar 5, 2024

Duplicate of #53158?

@giordano giordano added kind:regression Regression in behavior compared to a previous version domain:linear algebra Linear algebra status:bisect wanted performance Must go faster domain:arrays [a, r, r, a, y, s] and removed domain:linear algebra Linear algebra labels Mar 5, 2024
@giordano giordano added this to the 1.11 milestone Mar 5, 2024
@BioTurboNick
Copy link
Contributor Author

BioTurboNick commented Mar 5, 2024

Ah, I think it is. Thanks. EDIT: I'll take my hands off for the moment, didn't want to close prematurely.

@jishnub
Copy link
Contributor

jishnub commented Mar 6, 2024

There's also #53430

@abraemer
Copy link

abraemer commented Mar 12, 2024

Bisection shows that the regression was introduced by 9aa7980

I'll bisect again to find the fix as well and report back in a couple of hours.

Bisect script
#!/bin/bash
#=
make -s clean >> ~/buildlogs_julia.txt 2>&1
make -s -C deps uninstall >> ~/buildlogs_julia.txt 2>&1
if make -s -j 8 >> ~/buildlogs_julia.txt 2>&1 ; then
	echo "build success"
else
	echo "build failure"
	exit 125
fi
exec ~/julia/bisect/julia/julia --startup=no $0
#exec julia --startup=no $0 # test case v1.10.0
=#
#import Pkg; Pkg.add("BenchmarkTools")
using BenchmarkTools

A = rand(1000,1000);
B = zeros(1001,1001);

b = @benchmark $B[1:end - 1, 1:end - 1] .= $A;
time = minimum(b.times)
@info "" time
if time > 1.5e6 # local runtime is 1e6 before regression
	@info "BAD COMMIT (performance difference)"
	exit(1) # bad commit
else
	@info "GOOD COMMIT (no performance difference)"
	exit(0) # good commit
end

@giordano
Copy link
Contributor

Bisection shows that the regression was introduced by 9aa7980

...which is the same as #53158 (comment), which suggests again this is a duplicate of that ticket.

@abraemer
Copy link

Performance for this specific example was fixed by 1a90409 by @jishnub

So this is likely a duplicate of either #40962 or #53430

Bisect script
#!/bin/bash
#=
make -s clean >> ~/buildlogs_julia.txt 2>&1
make -s -C deps uninstall >> ~/buildlogs_julia.txt 2>&1
if make -s -j 8 >> ~/buildlogs_julia.txt 2>&1 ; then
	echo "build success"
else
	echo "build failure"
	exit 125
fi
exec ~/julia/bisect/julia/julia --startup=no $0
=#
#import Pkg; Pkg.add("BenchmarkTools")
using BenchmarkTools

A = rand(1000,1000);
B = zeros(1001,1001);

b = @benchmark $B[1:end - 1, 1:end - 1] .= $A;
time = minimum(b.times)
@info "" time
# we search for the commit that fixes the performance
# so in git bisect lingo:
# performance difference -> normal -> good
# no performance difference -> "bug" -> bad
if time > 1.5e6
	@info "GOOD COMMIT (performance difference)"
	exit(0) # good commit
else
	@info "BAD COMMIT (no performance difference)"
	exit(1) # bad commit
end

@jishnub
Copy link
Contributor

jishnub commented Mar 12, 2024

This is probably different from #53430. copyto_unalised! has four branches, one for each combination of source and destination index styles. In this, the source is IndexLinear whereas the destination is IndexCartesian, whereas in #53430, they're both IndexCartesian.

Btw, I don't think #53383 fully solves #53158, although it improves the performance by fixing the broken branches. A minor difference in performance still exists for me:

julia> a = zeros(40000,4000); b = rand(size(a)...);

julia> @btime $a[1:end, 1:end] .= $b;
  293.946 ms (0 allocations: 0 bytes) # v"1.10.2"
  314.381 ms (0 allocations: 0 bytes) # v"1.12.0-DEV.162"

This should probably be backported, but it uses the convenience function introduced in #53369, so that would be necessary as well.

@KristofferC
Copy link
Sponsor Member

KristofferC commented May 14, 2024

Interestingly, I get:

julia> @btime $B[1:end - 1, 1:end - 1] .= $A;
  601.689 μs (0 allocations: 0 bytes)

on the latest released 1.11 beta but

julia> @btime $B[1:end - 1, 1:end - 1] .= $A;
  927.958 μs (0 allocations: 0 bytes)

on the backport branch. So I'll have to check out what is going on there...

@KristofferC
Copy link
Sponsor Member

#51760 + #53383 (that was reverted on the backport branch) seemed to have fixed this...

@vtjnash
Copy link
Sponsor Member

vtjnash commented May 14, 2024

#53383 should be relanded (with different tests), as that as a bugfix

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain:arrays [a, r, r, a, y, s] kind:regression Regression in behavior compared to a previous version performance Must go faster status:bisect wanted
Projects
None yet
Development

No branches or pull requests

6 participants