Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid allocations in RaiseEvent when there's no listeners #15580

Draft
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

MrJul
Copy link
Member

@MrJul MrJul commented May 1, 2024

What does the pull request do?

This PR adds new Interactive.RaiseEvent() overloads which avoid creating an event route and the corresponding event arguments if there are no listeners for a given RoutedEvent.

Note that while there's already an existing BuildEventRoute() method to avoid allocating the arguments when there isn't any listener, it doesn't prevent creating the route itself.

In addition, EventRoute.HasHandlers now returns true if there are no handlers inside the route but RoutedFinished handlers exist.

What is the current behavior?

Each time a RoutedEvent is raised, an EventRoute and a RoutedEventArgs (or derived) objects are always created.

What is the updated/expected behavior with this PR?

RoutedEventArgs are created only if necessary (when there's at least one listener).
EventRoute are never created anymore.

How was the solution implemented (if it's not obvious)?

A new LightweightEventRoute is introduced, which has the same behavior as an EventRoute, but is a struct, to avoid heap allocations.
This new type has RaiseEvent() methods taking an arguments factory alongside a context/state. The factory is only called if there are listeners for the event.
The expected usage is to have a static factory to avoid allocating an extra closure on each call (which would defeat the purpose of the method).

Since mutable structures are prone to errors (they must always be passed by reference to work correctly, which is hard to enforce in public API), LightweightEventRoute is internal and isn't exposed directly.
Instead, new public Interactive.RaiseEvent() overloads use the LightweightEventRoute.

The public EventRoute implementation now delegates its logic to LightweightEventRoute.

Notes

The first commit is the most interesting one as it implements the new RaiseEvent() overloads and the LightweightEventRoute type.
The other commits are using the new RaiseEvent() overloads where it's possible.

We have several places where it isn't possible to remove the arguments allocation currently without breaking changes.

For example, there's Control.OnLoaded/OnUnloaded/OnSizeChanged which takes the arguments directly, added in #11828. While there are good arguments exposed in this PR for this change, I think we should also optimize for the 99% case. There are usually hundreds of controls in a given visual tree, and only a few (if any) with a Loaded handler.

That said, we have tons of inconsistency here in the codebase. Most routed events don't have a matching virtual OnXxx() method. Some have. Some of these methods have a RoutedEventArgs parameter. Some don't. We should really unify the behavior here in v12, document it, and stick to it (while still avoiding unnecessary allocations when possible). I plan to open a proposal to discuss that.

@MrJul MrJul added area-perf api API addition labels May 1, 2024
@avaloniaui-bot
Copy link

You can test this PR using the following package version. 11.2.999-cibuild0047982-alpha. (feed url: https://nuget-feed-all.avaloniaui.net/v3/index.json) [PRBUILDID]

@cla-avalonia
Copy link
Collaborator

cla-avalonia commented May 1, 2024

  • All contributors have signed the CLA.

@MrJul
Copy link
Member Author

MrJul commented May 1, 2024

@cla-avalonia agree

return typedInputElement.RaiseEvent(routedEvent);
}

var eventArgs = new RoutedEventArgs(routedEvent);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't it make sense to pass a factory here too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would require adding all the new RaiseEvent overloads to the IInputElement interface, which I wasn't comfortable with, as I don't think it should be that specific.

Since IInputElement isn't client implementable, we should never get to that fallback anyways.

/// The created <see cref="RoutedEventArgs"/> used to raise the event,
/// or null if the event wasn't raised because it didn't have any listeners.
/// </returns>
public RoutedEventArgs? RaiseEvent<TContext>(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if passing the context by value combined with a delegate call is less expensive than dealing with a single allocation

Copy link
Member Author

@MrJul MrJul May 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's optimizing for the case where there are no listeners, where it's a clear win.

For the case where there are listeners, passing the state and calling the delegate is a little slower than a direct call, as you'd expect.

Note that the delegate+state pattern appears in .NET itself, for example in ILogger.Log<TState>() or String.Create<TState>().

Benchmarks

SmallEvent

Method Mean Error StdDev Gen0 Allocated
Standard_No_Listeners 22.091 ns 0.1672 ns 0.1564 ns 0.0024 40 B
Factory_No_Listeners 9.617 ns 0.0545 ns 0.0510 ns - -
Standard_With_Listeners 100.925 ns 0.8585 ns 0.8031 ns 0.0057 96 B
Factory_With_Listeners 111.128 ns 0.6346 ns 0.5936 ns 0.0057 96 B

LargeEvent:

Method Mean Error StdDev Gen0 Allocated
Standard_No_Listeners 39.63 ns 0.768 ns 0.854 ns 0.0076 128 B
Factory_No_Listeners 21.99 ns 0.115 ns 0.108 ns - -
Standard_With_Listeners 118.76 ns 1.438 ns 1.345 ns 0.0110 184 B
Factory_With_Listeners 126.16 ns 0.411 ns 0.343 ns 0.0110 184 B
Benchmark code
#nullable enable

using Avalonia.Controls;
using Avalonia.Input;
using Avalonia.Interactivity;
using BenchmarkDotNet.Attributes;

namespace Avalonia.Benchmarks;

[MemoryDiagnoser]
public class RaiseEvent_SmallEvent_Benchmarks
{
    private static readonly RoutedEvent<RoutedEventArgs> _smallEvent =
        RoutedEvent.Register<Button, RoutedEventArgs>("SomeEvent", RoutingStrategies.Bubble);

    private readonly Button _button = CreateButton(10);

    internal static Button CreateButton(int depth)
    {
        var button = new Button();

        for (int i = 0; i < depth; i++)
        {
            var child = new Button();
            button.Content = child;
            button = child;
        }

        return button;
    }

    [GlobalSetup(Targets = [nameof(Standard_With_Listeners), nameof(Factory_With_Listeners)])]
    public void SetupListeners()
        => _button.AddHandler(_smallEvent, (_, _) => { });

    [Benchmark]
    public void Standard_No_Listeners()
        => Standard();

    [Benchmark]
    public void Factory_No_Listeners()
        => Factory();

    [Benchmark]
    public void Standard_With_Listeners()
        => Standard();

    [Benchmark]
    public void Factory_With_Listeners()
        => Factory();

    private void Standard()
        => _button.RaiseEvent(new RoutedEventArgs(_smallEvent));

    private void Factory()
        => _button.RaiseEvent(_smallEvent);
}

[MemoryDiagnoser]
public class RaiseEvent_LargeEvent_Benchmarks
{
    private static readonly RoutedEvent<PointerEventArgs> _largeEvent =
        RoutedEvent.Register<Button, PointerEventArgs>("LargeEvent", RoutingStrategies.Bubble);

    private readonly Button _button = RaiseEvent_SmallEvent_Benchmarks.CreateButton(10);
    private readonly IPointer _pointer = new Pointer(1, PointerType.Mouse, true);

    [GlobalSetup(Targets = [nameof(Standard_With_Listeners), nameof(Factory_With_Listeners)])]
    public void SetupListeners()
        => _button.AddHandler(_largeEvent, (_, _) => { });

    [Benchmark]
    public void Standard_No_Listeners()
        => Standard();

    [Benchmark]
    public void Factory_No_Listeners()
        => Factory();

    [Benchmark]
    public void Standard_With_Listeners()
        => Standard();

    [Benchmark]
    public void Factory_With_Listeners()
        => Factory();

    private void Standard()
        => _button.RaiseEvent(new PointerEventArgs(
            _largeEvent,
            _button,
            _pointer,
            _button,
            new Point(123.0, 456.0),
            9876543210UL,
            new PointerPointProperties(RawInputModifiers.Control, PointerUpdateKind.LeftButtonPressed),
            KeyModifiers.Control));

    private void Factory()
        => _button.RaiseEvent(
            _largeEvent,
            static (e, ctx) => new PointerEventArgs(e, ctx.source, ctx.pointer, ctx.rootVisual, ctx.rootVisualPosition, ctx.timestamp, ctx.properties, ctx.modifiers),
            (source: _button,
                pointer: _pointer,
                rootVisual: _button,
                rootVisualPosition: new Point(123.0, 456.0),
                timestamp: 9876543210UL,
                properties: new PointerPointProperties(RawInputModifiers.Control, PointerUpdateKind.LeftButtonPressed),
                modifiers: KeyModifiers.Control));
}

So it's about 6-10% slower when there's listeners, and twice as fast when there's none.
Note that the handler does nothing in these benchmarks - real code will probably have heavier handlers.

If the overhead for listeners isn't acceptable, we can avoid the delegate call and context and use the LightweightEventRoute directly in the solution - it's simply less convenient to write than then RaiseEvent() handlers.

Copy link
Contributor

@TomEdwardsEnscape TomEdwardsEnscape left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside from the small performance drop identified above, the static delegate syntax is hard to use correctly.

Allocations could also be avoided with a null check, using syntax similar to this:

myControl.RaiseEvent(InputElement.TextInputEvent)?.With(new TextInputEventArgs() { Text = "foo" });

The return value of this RaiseEvent overload should be a nullable struct. If it's null, the event args object won't be constructed.

Perhaps a new name is needed for this design, since "RaiseEvent" won't actually raise the event.

@maxkatz6
Copy link
Member

maxkatz6 commented May 2, 2024

myControl.RaiseEvent(InputElement.TextInputEvent)?.With(new TextInputEventArgs() { Text = "foo" });

It reminds me event?.Invoke(sender, args) C# syntax. So, similar naming could be used here.

But I don't have specific preferences here, since old (current) syntax is still available and likely to be the most common to use.

@maxkatz6
Copy link
Member

maxkatz6 commented May 2, 2024

Aside from that, simple dev analyzer can be added to help with finding usages of slower RaiseEvent version.

@MrJul
Copy link
Member Author

MrJul commented May 2, 2024

Allocations could also be avoided with a null check, using syntax similar to this:

myControl.RaiseEvent(InputElement.TextInputEvent)?.With(new TextInputEventArgs() { Text = "foo" });

The return value of this RaiseEvent overload should be a nullable struct. If it's null, the event args object won't be constructed.

While I very much like the simpler syntax you're proposing, it comes with new caveats.
The event route has a PooledList which should be disposed to return the underlying array to the pool.

The struct could implement IDisposable. The syntax isn't as nice, but still acceptable imo:

using var route = myControl.TryBuildEventRoute(Event); // final method name to be defined
route?.RaiseEvent(new RoutedEventArgs(...));

Or we can dispose the list on raise:

myControl.TryBuildEventRoute(Event)?.Raise(new RoutedEventArgs(...));

Forgetting to call Raise wouldn't dispose the list, which is minor as it doesn't impact correctness, only performance.
However, calling Raise twice on the same struct won't do anything the second time, which isn't expected.

Personally, I prefer the first one, with the using, as it's still straightforward even though it isn't a one-liner.

@TomEdwardsEnscape
Copy link
Contributor

Both are OK, but I would prefer option B because I think it will be unusual for anyone to try to store the returned object and reuse it, especially if we choose a good name for the methods. "EventRouteBuilder", then "Complete"?

If someone does try to re-use the same struct, an exception can be thrown.

@MrJul
Copy link
Member Author

MrJul commented May 2, 2024

For the sake of discussion, I want to add that option A is a similar to what we have today with the existing BuildEventRoute:

using var route = BuildEventRoute();
route.RaiseEvent(args);

@MrJul
Copy link
Member Author

MrJul commented May 2, 2024

If someone does try to re-use the same struct, an exception can be thrown.

That works, but is a bit surprising.

That being said, I agree it should be unusual to store the returned object, and proper documentation comments should hopefully prevent incorrect usage. Outside of the Avalonia solution, only people caring about these extra allocations will use the new method anyway.

@robloo
Copy link
Contributor

robloo commented May 2, 2024

So unless this is a major bottleneck we might be going too far on the optimizations now. It's always a balance between optimization <-> maintainability and this seems error prone and not intuitive compared to what we've been doing.

@MrJul
Copy link
Member Author

MrJul commented May 2, 2024

So unless this is a major bottleneck we might be going too far on the optimizations now. It's always a balance between optimization <-> maintainability and this seems error prone and not intuitive compared to what we've been doing.

Yes, it's always a balance, and a difficult one. That's why .NET, especially since Core, has multiple levels of API. For example you can go as high as File.ReadAllBytes() or as low as RandomAccess.Read, while passing by a good middleground with FileStream. Each lower level API is more difficult to use, but more performant. It's the same thing here: I don't propose removing the simple RaiseEvent(args) that's been everywhere! We just need an alternative. (This is my personal opinion and other team members may think differently. Let's discuss!)

While it's not a major bottleneck, allocating is a death by a thousand paper cuts, as I like to remind each time. I don't mind allocations when they are useful, and kept to a minimum if possible (while still having maintainable code). In the case of this PR, launch any Avalonia application, and observe that 95% of raised events on startup have no listeners. Yes, that's a real number.

For example, should a StackPanel raise a SnapPointsChanged event each time its measured, when that event is only for stackpanel-as-items-container? Personally, I think it's a waste that is easily removed, even if it isn't a bottleneck. That event was what prompted me to start this work actually, as I really can't stand code that caters for the 1% case while still running for everyone. No, alone it's not a big deal. Do that for every component, and in a few years you've got a terrible unoptimized mess and people complaining that the framework is slow.

In my opinion, everything should be "pay to use" as much as possible. You're not using that feature? It's free, no extra allocations and minimum CPU time wasted. If possible, it should even be completely trimmed away when using AOT compilation (not applicable to this PR).

As someone looking at memory snapshots very frequently, it's quite hard to analyze and improve things when the majority of the allocations are noise coming from libraries out of your control. I ideally want a "clean state" here as much as possible. Avalonia should be efficient, and never be a bottleneck. I'm very committed to improve things regarding memory allocations and CPU usage in the coming months.

My rant aside, if the proposed API isn't ideal - which I agree it isn't - we can go back to simply exposing LightweightEventRoute, which was my very first iteration of this problem. The API will be similar to the existing EventRoute, minus the mutation methods to avoid problems with mutable structs. If that's still too error prone, we can make it internal (but I believe we should expose performant methods as much as convenient ones and let the user choose depending on its needs).

@jp2masa
Copy link
Contributor

jp2masa commented May 2, 2024

Option B could allow reusing the route with Raise vs RaiseAndComplete or Raise(..., bool complete = true), for example. As this is a lower-level API, the increased complexity is acceptable, I think.

@robloo
Copy link
Contributor

robloo commented May 3, 2024

@MrJul I agree with basically everything you said. My only concern is the API is a bit ugly and non-standard (unwieldy?) for this type of thing 😄 I would prioritize a cleaner API in this area to avoid errors rather than micro optimizations.

But like I said, I agree with you that it makes sense to have different levels of API. If we don't expose this API to app developers (keep it internal) I am even more OK with this. It allows us to improve it in the future if a better idea comes along without committing to it as a public API.

I can live with the PR as is but if I was a gatekeeper I would still want an improved API. I won't hold up the discussion any more on this unless I think of a better API myself.

@MrJul MrJul marked this pull request as draft May 3, 2024 08:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api API addition area-perf
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants