Optimizer regression on 1.11 #506

Closed
maleadt opened this issue Sep 6, 2023 · 3 comments

maleadt commented Sep 6, 2023

Error in testset "JuliaLang/julia#48097: kwcall inference in the presence of overlay method" on worker 4022595:
Test Failed at /home/maleadt/.julia/packages/GPUCompiler/3f5LS/test/native_tests.jl:534
  Expression: !(occursin("inttoptr", ir))
   Evaluated: !(occursin("inttoptr", "define void @julia_parent_21052() local_unnamed_addr #3 {\ntop:\n  %jlcallframe = alloca {}*, i32 4, align 8\n  %0 = call fastcc nonnull {}* @julia_typejoin_21060({}* readonly inttoptr (i64 140677987481232 to {}*), {}* readonly inttoptr (i64 140677987481296 to {}*))\n  %1 = load {}*, {}** bitcast (i8* getelementptr (i8, i8* @small_typeof, i64 64) to {}**), align 8\n  %2 = getelementptr inbounds {}*, {}** %jlcallframe, i32 0\n  store {}* %1, {}** %2, align 8\n  %3 = getelementptr inbounds {}*, {}** %jlcallframe, i32 1\n  store {}* inttoptr (i64 140677987481232 to {}*), {}** %3, align 8\n  %4 = getelementptr inbounds {}*, {}** %jlcallframe, i32 2\n  store {}* inttoptr (i64 140677987481296 to {}*), {}** %4, align 8\n  %5 = getelementptr inbounds {}*, {}** %jlcallframe, i32 3\n  store {}* %0, {}** %5, align 8\n  %6 = call nonnull {}* @jl_f_apply_type({}* null, {}** %jlcallframe, i32 4)\n  ret void\n}\n"))

MWE:

using GPUCompiler, LLVM

module TestRuntime
    # dummy methods
    signal_exception() = return
    malloc(sz) = C_NULL
    report_oom(sz) = return
    report_exception(ex) = return
    report_exception_name(ex) = return
    report_exception_frame(idx, func, file, line) = return
end

struct TestCompilerParams <: AbstractCompilerParams end
GPUCompiler.runtime_module(::CompilerJob{<:Any,TestCompilerParams}) = TestRuntime

child(; kwargs...) = return
function parent()
    child(; a=1f0, b=1.0)
    return
end

Base.Experimental.@MethodTable method_table
Base.Experimental.@overlay method_table @noinline Core.throw_inexacterror(f::Symbol, ::Type{T}, val) where {T} = return

source = methodinstance(typeof(parent), Tuple{})
target = NativeCompilerTarget()
params = TestCompilerParams()
config = CompilerConfig(target, params; kernel=false)
job = CompilerJob(source, config)

JuliaContext() do ctx
    ir, meta = GPUCompiler.compile(:llvm, job; validate=false)
    for f in functions(ir)
        if startswith(LLVM.name(f), "julia_parent")
            println(string(f))
        end
    end
end
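
For reference, the failing assertion boils down to checking that printed IR for leaked pointer constants. A minimal sketch of that check on top of the MWE above (the parent_ir name is illustrative, not taken from the test suite):

using Test

# Mirror the check from native_tests.jl: capture the IR of julia_parent as a
# string and assert that no raw heap pointers (`inttoptr` constants) remain.
parent_ir = JuliaContext() do ctx
    ir, meta = GPUCompiler.compile(:llvm, job; validate=false)
    only(string(f) for f in functions(ir) if startswith(LLVM.name(f), "julia_parent"))
end
@test !occursin("inttoptr", parent_ir)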

Before JuliaLang/julia#51092:

define void @julia_parent_3932() local_unnamed_addr #0 !dbg !4 {
top:
  ret void, !dbg !8
}

After:

define void @julia_parent_12207() local_unnamed_addr #3 !dbg !36 {
top:
  %jlcallframe = alloca {}*, i32 4, align 8
  %0 = call fastcc nonnull {}* @julia_typejoin_12215({}* readonly inttoptr (i64 139659080241488 to {}*), {}* readonly inttoptr (i64 139659080241552 to {}*)), !dbg !40
  %1 = load {}*, {}** bitcast (i8* getelementptr (i8, i8* @small_typeof, i64 64) to {}**), align 8, !dbg !63, !tbaa !64, !alias.scope !68, !noalias !71, !nonnull !39, !dereferenceable !76, !align !77
  %2 = getelementptr inbounds {}*, {}** %jlcallframe, i32 0, !dbg !63
  store {}* %1, {}** %2, align 8, !dbg !63
  %3 = getelementptr inbounds {}*, {}** %jlcallframe, i32 1, !dbg !63
  store {}* inttoptr (i64 139659080241488 to {}*), {}** %3, align 8, !dbg !63
  %4 = getelementptr inbounds {}*, {}** %jlcallframe, i32 2, !dbg !63
  store {}* inttoptr (i64 139659080241552 to {}*), {}** %4, align 8, !dbg !63
  %5 = getelementptr inbounds {}*, {}** %jlcallframe, i32 3, !dbg !63
  store {}* %0, {}** %5, align 8, !dbg !63
  %6 = call nonnull {}* @jl_f_apply_type({}* null, {}** %jlcallframe, i32 4), !dbg !63
  ret void, !dbg !78
}

So basically, JuliaLang/julia#48097 was reintroduced by JuliaLang/julia#51092. @aviatesk, any thoughts?
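
For context (an illustrative aside, not part of the original report): child(; a=1f0, b=1.0) is a keyword call, which lowers through Core.kwcall with a NamedTuple of the keyword values, and the typejoin/jl_f_apply_type residue above appears to come from Base's generic keyword-handling code that the native pipeline folds away but the external-interpreter pipeline no longer does. The lowering can be inspected directly:

# Show the lowered form of the keyword call in `parent`: it routes through
# Core.kwcall with a NamedTuple argument, whose generic handling in Base is
# what leaves the typejoin/apply_type calls behind when it isn't folded.
Meta.@lower child(; a=1f0, b=1.0)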


maleadt commented Jan 16, 2024

It looks like this doesn't even need overlay methods:

using GPUCompiler

cudacall(f, types::Type, args...; kwargs...) = nothing

function outer(f)
    @inline cudacall(f, Tuple{}; stream=Ref(42), shmem=1)
    return
end

struct TestCompilerParams <: AbstractCompilerParams end

function main()
    source = methodinstance(typeof(outer), Tuple{Nothing})
    target = NativeCompilerTarget()
    params = TestCompilerParams()
    config = CompilerConfig(target, params)
    job = CompilerJob(source, config)

    interp = GPUCompiler.get_interpreter(job)

    println("Native interpreter:")
    display(Base.code_ircode(outer, Tuple{Nothing}))
    println()
    println("GPUCompiler interpreter:")
    display(Base.code_ircode(outer, Tuple{Nothing}; interp))
    return
end
main()
Native interpreter:
1-element Vector{Any}:
7 1 ─     return nothing                                                    │
   => Nothing

GPUCompiler interpreter:
1-element Vector{Any}:
6 1 ─ %1 = invoke Base.typejoin(Int64::Any, Base.RefValue{Int64}::Any)::Any
  │        Core.apply_type(Base.Union, Int64, Base.RefValue{Int64}, %1)::Type
7 └──      return nothing                                  │
   => Nothing

@aviatesk Can you help me debug this? How would you approach it, or do you know what might be going on here?

maleadt changed the title from "LLVM codegen regression on 1.11" to "Optimizer regression on 1.11" on Jan 16, 2024

maleadt commented Jan 17, 2024

MWE without GPUCompiler:

const CC = Core.Compiler
using Core: MethodInstance, CodeInstance, CodeInfo, MethodTable


## code instance cache

struct CodeCache
    dict::IdDict{MethodInstance,Vector{CodeInstance}}

    CodeCache() = new(IdDict{MethodInstance,Vector{CodeInstance}}())
end

function CC.setindex!(cache::CodeCache, ci::CodeInstance, mi::MethodInstance)
    cis = get!(cache.dict, mi, CodeInstance[])
    push!(cis, ci)
end


## world view of the cache

function CC.haskey(wvc::CC.WorldView{CodeCache}, mi::MethodInstance)
    CC.get(wvc, mi, nothing) !== nothing
end

function CC.get(wvc::CC.WorldView{CodeCache}, mi::MethodInstance, default)
    # check the cache
    for ci in get!(wvc.cache.dict, mi, CodeInstance[])
        if ci.min_world <= wvc.worlds.min_world && wvc.worlds.max_world <= ci.max_world
            # TODO: if (code && (code == jl_nothing || jl_ir_flag_inferred((jl_array_t*)code)))
            src = if ci.inferred isa Vector{UInt8}
                ccall(:jl_uncompress_ir, Any, (Any, Ptr{Cvoid}, Any),
                       mi.def, C_NULL, ci.inferred)
            else
                ci.inferred
            end
            return ci
        end
    end

    return default
end

function CC.getindex(wvc::CC.WorldView{CodeCache}, mi::MethodInstance)
    r = CC.get(wvc, mi, nothing)
    r === nothing && throw(KeyError(mi))
    return r::CodeInstance
end

function CC.setindex!(wvc::CC.WorldView{CodeCache}, ci::CodeInstance, mi::MethodInstance)
    src = if ci.inferred isa Vector{UInt8}
        ccall(:jl_uncompress_ir, Any, (Any, Ptr{Cvoid}, Any),
              mi.def, C_NULL, ci.inferred)
    else
        ci.inferred
    end
    CC.setindex!(wvc.cache, ci, mi)
end


## interpreter

if isdefined(CC, :CachedMethodTable)
    const ExternalMethodTableView = CC.CachedMethodTable{CC.OverlayMethodTable}
    get_method_table_view(world::UInt, mt::MethodTable) =
        CC.CachedMethodTable(CC.OverlayMethodTable(world, mt))
else
    const ExternalMethodTableView = CC.OverlayMethodTable
    get_method_table_view(world::UInt, mt::MethodTable) = CC.OverlayMethodTable(world, mt)
end

struct ExternalInterpreter <: CC.AbstractInterpreter
    world::UInt
    method_table::ExternalMethodTableView

    code_cache
    inf_cache::Vector{CC.InferenceResult}
end

function ExternalInterpreter(world::UInt=Base.get_world_counter(); method_table, code_cache)
    @assert world <= Base.get_world_counter()
    method_table = get_method_table_view(world, method_table)
    inf_cache = Vector{CC.InferenceResult}()

    return ExternalInterpreter(world, method_table, code_cache, inf_cache)
end

CC.InferenceParams(interp::ExternalInterpreter) = CC.InferenceParams()
CC.OptimizationParams(interp::ExternalInterpreter) = CC.OptimizationParams()
CC.get_world_counter(interp::ExternalInterpreter) = interp.world
CC.get_inference_cache(interp::ExternalInterpreter) = interp.inf_cache
CC.code_cache(interp::ExternalInterpreter) = CC.WorldView(interp.code_cache, interp.world)

# No need to do any locking since we're not putting our results into the runtime cache
CC.lock_mi_inference(interp::ExternalInterpreter, mi::MethodInstance) = nothing
CC.unlock_mi_inference(interp::ExternalInterpreter, mi::MethodInstance) = nothing

function CC.add_remark!(interp::ExternalInterpreter, sv::CC.InferenceState, msg)
    @debug "Inference remark during External compilation of $(sv.linfo): $msg"
end

CC.may_optimize(interp::ExternalInterpreter) = true
CC.may_compress(interp::ExternalInterpreter) = true
CC.may_discard_trees(interp::ExternalInterpreter) = true
CC.verbose_stmt_info(interp::ExternalInterpreter) = false
CC.method_table(interp::ExternalInterpreter) = interp.method_table




# main

Base.Experimental.@MethodTable(GLOBAL_METHOD_TABLE)

inner(f, types::Type, args...; kwargs...) = nothing
outer(f) = @inline inner(f, Tuple{}; foo=Ref(42), bar=1)

function main()
    println("Native:")
    display(Base.code_ircode(outer, Tuple{Nothing}))

    println()

    println("External:")
    interp = ExternalInterpreter(; method_table=GLOBAL_METHOD_TABLE, code_cache=CodeCache())
    display(Base.code_ircode(outer, Tuple{Nothing}; interp))

    return
end

isinteractive() || main()
Native:
1-element Vector{Any}:
115 1 ─     return nothing                                                  │
     => Nothing

External:
1-element Vector{Any}:
115 1 ─ %1 = invoke Base.typejoin(Int64::Any, Base.RefValue{Int64}::Any)::Any
    │        Core.apply_type(Base.Union, Int64, Base.RefValue{Int64}, %1)::Type
    └──      return nothing                                  │
     => Nothing
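
One way to narrow down where the two pipelines diverge (an illustrative next step, assuming Base.code_ircode's optimize_until keyword accepts the pass name used below) is to stop the Julia-level optimizer early and compare the IR before inlining:

# Compare the pre-inlining IR under both interpreters: if they already differ
# at the "compact 1" stage, the divergence happens during inference (e.g.
# concrete evaluation of the kwarg machinery) rather than in inlining/SROA.
interp = ExternalInterpreter(; method_table=GLOBAL_METHOD_TABLE, code_cache=CodeCache())
println("Native (before inlining):")
display(Base.code_ircode(outer, Tuple{Nothing}; optimize_until="compact 1"))
println("External (before inlining):")
display(Base.code_ircode(outer, Tuple{Nothing}; interp, optimize_until="compact 1"))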


maleadt commented Jan 17, 2024

Closing this in favor of JuliaLang/julia#52938

maleadt closed this as completed on Jan 17, 2024