Identical calls to getindex not optimised out #54281

juliaUser42 · 2024-04-27T00:07:17Z

Hi,
thanks for the great job you do on julia. I think I spotted a potential performance issue. Maybe this is know or expected or I am missing something. In any case, I could not find a relevant post that helped me to make sense of this.

The output of versioninfo()

Julia Version 1.10.2
Commit bd47eca (2024-03-01 10:14 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 64 × AMD EPYC 7313 16-Core Processor
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-15.0.7 (ORCJIT, znver3)
Threads: 1 default, 0 interactive, 1 GC (on 64 virtual cores)
Environment:
JULIA_DIR = /snap/julia/current

How you installed Julia

snap on Ubuntu

A minimal working example (MWE), also known as a minimum reproducible example

x::Vector{Float64} = rand(10000)
y::Vector{Float64} = rand(10000)
u::Vector{Int32} = rand(1:30000, 30000)
l::Vector{Int32} = rand(1:30000, 30000)
L::Vector{Float64} = rand(30000)
U::Vector{Float64} = rand(30000)

myFunction(x, y, u, l, L, U)
    @inbounds for f in 1:length(L)
        y[l[f]] += U[f] * x[u[f]]
        y[u[f]] += L[f] * x[l[f]]
    end
end

produces

.LBB0_24:                               # %L69.epil
	movsxd	rdx, dword ptr [r11 + 4*rdi]
	movsxd	rbx, dword ptr [rsi + 4*rdi]
	vmovsd	xmm0, qword ptr [rcx + 8*rdx - 8] # xmm0 = mem[0],zero
	vmulsd	xmm0, xmm0, qword ptr [r10 + 8*rdi]
	vaddsd	xmm0, xmm0, qword ptr [rax + 8*rbx - 8]
	vmovsd	qword ptr [rax + 8*rbx - 8], xmm0
	movsxd	rbx, dword ptr [rsi + 4*rdi]
	movsxd	rdx, dword ptr [r11 + 4*rdi]
	vmovsd	xmm0, qword ptr [rcx + 8*rbx - 8] # xmm0 = mem[0],zero
	vmulsd	xmm0, xmm0, qword ptr [r9 + 8*rdi]
	inc	rdi
	vaddsd	xmm0, xmm0, qword ptr [rax + 8*rdx - 8]
	vmovsd	qword ptr [rax + 8*rdx - 8], xmm0
	cmp	r8, rdi
	jne	.LBB0_24

The issue is that the identical calls to Base.getindex(Main.l, f) and Base.getindex(Main.u, f) are not optimised out and consequently executed twice. This is also evident in the output of @code_lowered and @code_typed.

Of course, this can be fixed easily by hand.

On the other hand, I would not expect having to do this because clang, icc and gcc perform this optimisation automatically all producing machine code similar to

myFunction(double*, double*, int*, int*, double*, double*, double*, int):
        mov     r9, qword ptr [rsp + 8]
        xor     r11d, r11d
.LBB0_1:                                # =>This Inner Loop Header: Depth=1
        movsd   xmm0, qword ptr [r9 + 8*r11]    # xmm0 = mem[0],zero
        movsxd  r10, dword ptr [rdx + 4*r11]
        mulsd   xmm0, qword ptr [rdi + 8*r10]
        movsxd  rax, dword ptr [rcx + 4*r11]
        addsd   xmm0, qword ptr [rsi + 8*rax]
        movsd   qword ptr [rsi + 8*rax], xmm0
        movsd   xmm0, qword ptr [r8 + 8*r11]    # xmm0 = mem[0],zero
        mulsd   xmm0, qword ptr [rdi + 8*rax]
        addsd   xmm0, qword ptr [rsi + 8*r10]
        movsd   qword ptr [rsi + 8*r10], xmm0
        add     r11, 1
        jmp     .LBB0_1

see https://gcc.godbolt.org/z/z8e99c8oj

The text was updated successfully, but these errors were encountered:

Keno · 2024-04-27T20:06:33Z

The C version relies on TBAA between double pointers and int pointers to prove noalias. Our current TBAA hierachy isn't fine enough to require this, although the whole ReinterpretArray change years ago was in preparation for making such a change. Still, making the change would probably require some careful evaluation, I'm sure some people are using TBAA violating workarounds.

juliaUser42 · 2024-04-28T14:30:51Z

Hi Keno,
Thank you for your quick answer. I am an engineer literate in assembly - not a compiler guy. For my purposes, it's enough to know the limits of what the compiler can do or cannot do ;-)

oscardssmith · 2024-04-29T05:07:50Z

to remove a little bit of the jargon from Keno's answer, the optimization that "obviously" can be done is only valid if writing to y doesn't write to either u or l. In this case, this should be possible to prove via tbaa (type based alias analysis) since y is a vector of floats, while the other arrays are vectors of Ints.

sgaure · 2024-04-30T08:15:51Z

Perhaps there should be a possibility for the user to tell the compiler there's no aliasing? A @noalias macro applied to a method, or a loop? I can predict disastrous consequences since it's really a call site thing, not a method or instance thing, so it should probably not be used in generic code. It would be nice for the more speed aware users, though. If I recall correctly, I think I've used a fortran compiler many years ago (HP, IBM, PGI?) with a compiler flag -noaliasing with this effect.

nsajko · 2024-04-30T08:36:16Z

there should be a possibility for the user to tell the compiler there's no aliasing

That already exists. There's both:

An experimental API in Base: Base.Experimental.Const and Base.Experimental.@aliasscope.
The user package UnsafeAssume.jl, which allows telling the compiler other things, too.

Also see #52851.

KristofferC · 2024-04-30T09:12:35Z

Still, making the change would probably require some careful evaluation, I'm sure some people are using TBAA violating workarounds.

After ReshapedArrays what (safe) ways are there to get aliasing between two Array of different data type? I know there is still vec so aliasing between matrix and vector of the same type is possible. Would it be hard to make that change and run a PkgEval?

oscardssmith added performance Must go faster domain:arrays [a, r, r, a, y, s] labels Apr 27, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Identical calls to getindex not optimised out #54281

Identical calls to getindex not optimised out #54281

juliaUser42 commented Apr 27, 2024 •

edited

Keno commented Apr 27, 2024

juliaUser42 commented Apr 28, 2024

oscardssmith commented Apr 29, 2024

sgaure commented Apr 30, 2024

nsajko commented Apr 30, 2024 •

edited

KristofferC commented Apr 30, 2024

Identical calls to getindex not optimised out #54281

Identical calls to getindex not optimised out #54281

Comments

juliaUser42 commented Apr 27, 2024 • edited

Keno commented Apr 27, 2024

juliaUser42 commented Apr 28, 2024

oscardssmith commented Apr 29, 2024

sgaure commented Apr 30, 2024

nsajko commented Apr 30, 2024 • edited

KristofferC commented Apr 30, 2024

juliaUser42 commented Apr 27, 2024 •

edited

nsajko commented Apr 30, 2024 •

edited