Optimizer regression on 1.11 #506

Closed
maleadt opened this issue Sep 6, 2023 · 3 comments

maleadt commented Sep 6, 2023

Error in testset "JuliaLang/julia#48097: kwcall inference in the presence of overlay method" on worker 4022595:
Test Failed at /home/maleadt/.julia/packages/GPUCompiler/3f5LS/test/native_tests.jl:534
  Expression: !(occursin("inttoptr", ir))
   Evaluated: !(occursin("inttoptr", "define void @julia_parent_21052() local_unnamed_addr #3 {\ntop:\n  %jlcallframe = alloca {}*, i32 4, align 8\n  %0 = call fastcc nonnull {}* @julia_typejoin_21060({}* readonly inttoptr (i64 140677987481232 to {}*), {}* readonly inttoptr (i64 140677987481296 to {}*))\n  %1 = load {}*, {}** bitcast (i8* getelementptr (i8, i8* @small_typeof, i64 64) to {}**), align 8\n  %2 = getelementptr inbounds {}*, {}** %jlcallframe, i32 0\n  store {}* %1, {}** %2, align 8\n  %3 = getelementptr inbounds {}*, {}** %jlcallframe, i32 1\n  store {}* inttoptr (i64 140677987481232 to {}*), {}** %3, align 8\n  %4 = getelementptr inbounds {}*, {}** %jlcallframe, i32 2\n  store {}* inttoptr (i64 140677987481296 to {}*), {}** %4, align 8\n  %5 = getelementptr inbounds {}*, {}** %jlcallframe, i32 3\n  store {}* %0, {}** %5, align 8\n  %6 = call nonnull {}* @jl_f_apply_type({}* null, {}** %jlcallframe, i32 4)\n  ret void\n}\n"))

MWE:

using GPUCompiler, LLVM

module TestRuntime
    # dummy methods
    signal_exception() = return
    malloc(sz) = C_NULL
    report_oom(sz) = return
    report_exception(ex) = return
    report_exception_name(ex) = return
    report_exception_frame(idx, func, file, line) = return
end

struct TestCompilerParams <: AbstractCompilerParams end
GPUCompiler.runtime_module(::CompilerJob{<:Any,TestCompilerParams}) = TestRuntime

child(; kwargs...) = return
function parent()
    child(; a=1f0, b=1.0)
    return
end

Base.Experimental.@MethodTable method_table
Base.Experimental.@overlay method_table @noinline Core.throw_inexacterror(f::Symbol, ::Type{T}, val) where {T} = return

source = methodinstance(typeof(parent), Tuple{})
target = NativeCompilerTarget()
params = TestCompilerParams()
config = CompilerConfig(target, params; kernel=false)
job = CompilerJob(source, config)

JuliaContext() do ctx
    ir, meta = GPUCompiler.compile(:llvm, job; validate=false)
    for f in functions(ir)
        if startswith(LLVM.name(f), "julia_parent")
            println(string(f))
        end
    end
end
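
For reference, the failing assertion boils down to checking that printed IR for leaked pointer constants. A minimal sketch of that check on top of the MWE above (the parent_ir name is illustrative, not taken from the test suite):

using Test

# Mirror the check from native_tests.jl: capture the IR of julia_parent as a
# string and assert that no raw heap pointers (`inttoptr` constants) remain.
parent_ir = JuliaContext() do ctx
    ir, meta = GPUCompiler.compile(:llvm, job; validate=false)
    only(string(f) for f in functions(ir) if startswith(LLVM.name(f), "julia_parent"))
end
@test !occursin("inttoptr", parent_ir)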

Before JuliaLang/julia#51092:

define void @julia_parent_3932() local_unnamed_addr #0 !dbg !4 {
top:
  ret void, !dbg !8
}

After:

define void @julia_parent_12207() local_unnamed_addr #3 !dbg !36 {
top:
  %jlcallframe = alloca {}*, i32 4, align 8
  %0 = call fastcc nonnull {}* @julia_typejoin_12215({}* readonly inttoptr (i64 139659080241488 to {}*), {}* readonly inttoptr (i64 139659080241552 to {}*)), !dbg !40
  %1 = load {}*, {}** bitcast (i8* getelementptr (i8, i8* @small_typeof, i64 64) to {}**), align 8, !dbg !63, !tbaa !64, !alias.scope !68, !noalias !71, !nonnull !39, !dereferenceable !76, !align !77
  %2 = getelementptr inbounds {}*, {}** %jlcallframe, i32 0, !dbg !63
  store {}* %1, {}** %2, align 8, !dbg !63
  %3 = getelementptr inbounds {}*, {}** %jlcallframe, i32 1, !dbg !63
  store {}* inttoptr (i64 139659080241488 to {}*), {}** %3, align 8, !dbg !63
  %4 = getelementptr inbounds {}*, {}** %jlcallframe, i32 2, !dbg !63
  store {}* inttoptr (i64 139659080241552 to {}*), {}** %4, align 8, !dbg !63
  %5 = getelementptr inbounds {}*, {}** %jlcallframe, i32 3, !dbg !63
  store {}* %0, {}** %5, align 8, !dbg !63
  %6 = call nonnull {}* @jl_f_apply_type({}* null, {}** %jlcallframe, i32 4), !dbg !63
  ret void, !dbg !78
}

So basically, JuliaLang/julia#48097 was reintroduced by JuliaLang/julia#51092. @aviatesk, any thoughts?
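
For context (an illustrative aside, not part of the original report): child(; a=1f0, b=1.0) is a keyword call, which lowers through Core.kwcall with a NamedTuple of the keyword values, and the typejoin/jl_f_apply_type residue above appears to come from Base's generic keyword-handling code that the native pipeline folds away but the external-interpreter pipeline no longer does. The lowering can be inspected directly:

# Show the lowered form of the keyword call in `parent`: it routes through
# Core.kwcall with a NamedTuple argument, whose generic handling in Base is
# what leaves the typejoin/apply_type calls behind when it isn't folded.
Meta.@lower child(; a=1f0, b=1.0)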


maleadt commented Jan 16, 2024

It looks like this doesn't even need overlay methods:

using GPUCompiler

cudacall(f, types::Type, args...; kwargs...) = nothing

function outer(f)
    @inline cudacall(f, Tuple{}; stream=Ref(42), shmem=1)
    return
end

struct TestCompilerParams <: AbstractCompilerParams end

function main()
    source = methodinstance(typeof(outer), Tuple{Nothing})
    target = NativeCompilerTarget()
    params = TestCompilerParams()
    config = CompilerConfig(target, params)
    job = CompilerJob(source, config)

    interp = GPUCompiler.get_interpreter(job)

    println("Native interpreter:")
    display(Base.code_ircode(outer, Tuple{Nothing}))
    println()
    println("GPUCompiler interpreter:")
    display(Base.code_ircode(outer, Tuple{Nothing}; interp))
    return
end
main()
Native interpreter:
1-element Vector{Any}:
7 1 ─     return nothing                                                    │
   => Nothing

GPUCompiler interpreter:
1-element Vector{Any}:
6 1 ─ %1 = invoke Base.typejoin(Int64::Any, Base.RefValue{Int64}::Any)::Any
  │        Core.apply_type(Base.Union, Int64, Base.RefValue{Int64}, %1)::Type
7 └──      return nothing                                  │
   => Nothing

@aviatesk Can you help me debug this? How would you approach it, or do you know what might be going on here?

maleadt changed the title from "LLVM codegen regression on 1.11" to "Optimizer regression on 1.11" on Jan 16, 2024

maleadt commented Jan 17, 2024

MWE without GPUCompiler:

const CC = Core.Compiler
using Core: MethodInstance, CodeInstance, CodeInfo, MethodTable


## code instance cache

struct CodeCache
    dict::IdDict{MethodInstance,Vector{CodeInstance}}

    CodeCache() = new(IdDict{MethodInstance,Vector{CodeInstance}}())
end

function CC.setindex!(cache::CodeCache, ci::CodeInstance, mi::MethodInstance)
    cis = get!(cache.dict, mi, CodeInstance[])
    push!(cis, ci)
end


## world view of the cache

function CC.haskey(wvc::CC.WorldView{CodeCache}, mi::MethodInstance)
    CC.get(wvc, mi, nothing) !== nothing
end

function CC.get(wvc::CC.WorldView{CodeCache}, mi::MethodInstance, default)
    # check the cache
    for ci in get!(wvc.cache.dict, mi, CodeInstance[])
        if ci.min_world <= wvc.worlds.min_world && wvc.worlds.max_world <= ci.max_world
            # TODO: if (code && (code == jl_nothing || jl_ir_flag_inferred((jl_array_t*)code)))
            src = if ci.inferred isa Vector{UInt8}
                ccall(:jl_uncompress_ir, Any, (Any, Ptr{Cvoid}, Any),
                       mi.def, C_NULL, ci.inferred)
            else
                ci.inferred
            end
            return ci
        end
    end

    return default
end

function CC.getindex(wvc::CC.WorldView{CodeCache}, mi::MethodInstance)
    r = CC.get(wvc, mi, nothing)
    r === nothing && throw(KeyError(mi))
    return r::CodeInstance
end

function CC.setindex!(wvc::CC.WorldView{CodeCache}, ci::CodeInstance, mi::MethodInstance)
    src = if ci.inferred isa Vector{UInt8}
        ccall(:jl_uncompress_ir, Any, (Any, Ptr{Cvoid}, Any),
              mi.def, C_NULL, ci.inferred)
    else
        ci.inferred
    end
    CC.setindex!(wvc.cache, ci, mi)
end


## interpreter

if isdefined(CC, :CachedMethodTable)
    const ExternalMethodTableView = CC.CachedMethodTable{CC.OverlayMethodTable}
    get_method_table_view(world::UInt, mt::MethodTable) =
        CC.CachedMethodTable(CC.OverlayMethodTable(world, mt))
else
    const ExternalMethodTableView = CC.OverlayMethodTable
    get_method_table_view(world::UInt, mt::MethodTable) = CC.OverlayMethodTable(world, mt)
end

struct ExternalInterpreter <: CC.AbstractInterpreter
    world::UInt
    method_table::ExternalMethodTableView

    code_cache
    inf_cache::Vector{CC.InferenceResult}
end

function ExternalInterpreter(world::UInt=Base.get_world_counter(); method_table, code_cache)
    @assert world <= Base.get_world_counter()
    method_table = get_method_table_view(world, method_table)
    inf_cache = Vector{CC.InferenceResult}()

    return ExternalInterpreter(world, method_table, code_cache, inf_cache)
end

CC.InferenceParams(interp::ExternalInterpreter) = CC.InferenceParams()
CC.OptimizationParams(interp::ExternalInterpreter) = CC.OptimizationParams()
CC.get_world_counter(interp::ExternalInterpreter) = interp.world
CC.get_inference_cache(interp::ExternalInterpreter) = interp.inf_cache
CC.code_cache(interp::ExternalInterpreter) = CC.WorldView(interp.code_cache, interp.world)

# No need to do any locking since we're not putting our results into the runtime cache
CC.lock_mi_inference(interp::ExternalInterpreter, mi::MethodInstance) = nothing
CC.unlock_mi_inference(interp::ExternalInterpreter, mi::MethodInstance) = nothing

function CC.add_remark!(interp::ExternalInterpreter, sv::CC.InferenceState, msg)
    @debug "Inference remark during External compilation of $(sv.linfo): $msg"
end

CC.may_optimize(interp::ExternalInterpreter) = true
CC.may_compress(interp::ExternalInterpreter) = true
CC.may_discard_trees(interp::ExternalInterpreter) = true
CC.verbose_stmt_info(interp::ExternalInterpreter) = false
CC.method_table(interp::ExternalInterpreter) = interp.method_table




# main

Base.Experimental.@MethodTable(GLOBAL_METHOD_TABLE)

inner(f, types::Type, args...; kwargs...) = nothing
outer(f) = @inline inner(f, Tuple{}; foo=Ref(42), bar=1)

function main()
    println("Native:")
    display(Base.code_ircode(outer, Tuple{Nothing}))

    println()

    println("External:")
    interp = ExternalInterpreter(; method_table=GLOBAL_METHOD_TABLE, code_cache=CodeCache())
    display(Base.code_ircode(outer, Tuple{Nothing}; interp))

    return
end

isinteractive() || main()
Native:
1-element Vector{Any}:
115 1 ─     return nothing                                                  │
     => Nothing

External:
1-element Vector{Any}:
115 1 ─ %1 = invoke Base.typejoin(Int64::Any, Base.RefValue{Int64}::Any)::Any
    │        Core.apply_type(Base.Union, Int64, Base.RefValue{Int64}, %1)::Type
    └──      return nothing                                  │
     => Nothing
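
One way to narrow down where the two pipelines diverge (an illustrative next step, assuming Base.code_ircode's optimize_until keyword accepts the pass name used below) is to stop the Julia-level optimizer early and compare the IR before inlining:

# Compare the pre-inlining IR under both interpreters: if they already differ
# at the "compact 1" stage, the divergence happens during inference (e.g.
# concrete evaluation of the kwarg machinery) rather than in inlining/SROA.
interp = ExternalInterpreter(; method_table=GLOBAL_METHOD_TABLE, code_cache=CodeCache())
println("Native (before inlining):")
display(Base.code_ircode(outer, Tuple{Nothing}; optimize_until="compact 1"))
println("External (before inlining):")
display(Base.code_ircode(outer, Tuple{Nothing}; interp, optimize_until="compact 1"))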


maleadt commented Jan 17, 2024

Closing this in favor of JuliaLang/julia#52938

maleadt closed this as completed on Jan 17, 2024