
rfc: introduce max partition policy to oneDNN Graph API #1694

Open. Wants to merge 5 commits into base: rfcs.
Conversation

ElaineBao (Contributor)

Description

This RFC introduces a max partition policy to the oneDNN Graph API.

Rendered version: link

- If not specified, the default partition policy remains the fusion policy.
- Max policy returns partitions that are maximal in size. The criteria for partitioning
are based on a pre-defined ops list and pre-defined rules. Connected ops
that meet the conditions of the pre-defined ops list and pre-defined rules
Contributor:

I think this sentence may be ambiguous. Do you mean that ops which meet the conditions of the pre-defined ops list and don't hit the pre-defined excluding rules will be selected into one partition?

Contributor (author):

> meet the conditions of pre-defined ops list and pre-defined rules will be selected into one partition

This means both the pre-defined ops list and the pre-defined rules need to be met for ops to be selected into one partition; violating either means they will not be selected.
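To illustrate the intended selection criterion, here is a small self-contained sketch. The real ops list and rules are backend-specific and internal, so `ALLOWED_OPS` and `violates_rules` below are purely hypothetical stand-ins: connected ops that are all on the pre-defined ops list and violate none of the pre-defined rules end up in one maximal partition.

```python
from collections import defaultdict

# Hypothetical stand-ins; the real list and rules are backend-specific.
ALLOWED_OPS = {"MatMul", "Add", "ReLU", "Multiply"}

def violates_rules(op):
    # Example exclusion rule: skip f64 ops.
    return op["dtype"] == "f64"

def max_partitions(ops, edges):
    """Group connected eligible ops into maximal partitions."""
    eligible = {name for name, op in ops.items()
                if op["kind"] in ALLOWED_OPS and not violates_rules(op)}
    adj = defaultdict(set)
    for a, b in edges:
        if a in eligible and b in eligible:
            adj[a].add(b)
            adj[b].add(a)
    partitions, seen = [], set()
    for name in eligible:          # flood-fill each unseen eligible op
        if name in seen:
            continue
        group, stack = set(), [name]
        while stack:
            n = stack.pop()
            if n not in group:
                group.add(n)
                stack.extend(adj[n] - group)
        seen |= group
        partitions.append(group)
    return partitions

ops = {
    "mm":   {"kind": "MatMul",  "dtype": "f32"},
    "bias": {"kind": "Add",     "dtype": "f32"},
    "act":  {"kind": "ReLU",    "dtype": "f32"},
    "sm":   {"kind": "Softmax", "dtype": "f32"},  # not on the ops list
    "mm2":  {"kind": "MatMul",  "dtype": "f32"},
}
edges = [("mm", "bias"), ("bias", "act"), ("act", "sm"), ("sm", "mm2")]
parts = max_partitions(ops, edges)
# "sm" breaks connectivity, so {"mm", "bias", "act"} and {"mm2"} come back
# as two separate partitions.
```

Under this reading, violating either condition (not on the list, or hitting a rule) keeps an op out of the partition.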

Contributor:

Yes, but I remember the rules are a negative list. If an op meets one of the rules, the op will be removed from the partition.

Contributor (author):

Oh, I got your point. But since the pre-defined rules are backend-specific, we cannot say it's always a negative list, so here I kept the wording general.

Contributor:

Will the pre-defined ops list and pre-defined rules be exposed through the API or API documentation? If not, how does the user know when to use the max policy? Or are you suggesting users always specify max policy for their models?

Contributor (author):

> Will the pre-defined ops list and pre-defined rules be exposed through the API or API documentation?

The fusion policy doesn't expose fusion patterns through the API, so I think the max policy should follow a similar strategy. For documentation, maybe we can provide a page like the pre-defined fusion patterns page, if it helps.

> If not, how does the user know when to use the max policy? Or are you suggesting users always specify max policy for their models?

I feel this discussion can be merged with https://github.com/oneapi-src/oneDNN/pull/1694/files#r1283768894

rfcs/20230801-max-partition-policy/README.md (resolved thread)
The expected behaviors when using max policy:

- If not specified, the default partition policy remains the fusion policy.
- Max policy returns partitions that are maximal in size. The criteria for partitioning
Contributor @igorsafo commented Aug 3, 2023:

What is the motivation for differentiating between max and fusion? How can a user decide which one should be used (not for a single model, but for overall integration)? Maybe "max" should be renamed to something like "experimental" ("experimental" is really a bad word; we already have semantics for experimental features) to indicate that users shouldn't use it until they know it brings performance for their particular case. Another option is to call it "all" if the intent is to take the whole graph and merge it into a single partition.

Contributor (author):

> What is the motivation for differentiation between max and fusion?

From my understanding, fusion defines a fixed number of patterns, while max's optimizations are not fixed: it will generate different optimizations for different models based on the model architecture, which might be good for out-of-the-box (OOB) models.

> How can a user decide which one should be used (not for a single model, but for overall integration)?

It depends on how much control the user wants to give to the library. Compared with fusion, max gives the library more control to optimize the graph.
But I can understand that users want better performance when they give the library more control, which is not guaranteed for the max policy since we are not able to test all the models in the world (please correct me if I'm wrong @ZhennanQin @niuxiaog).

> Maybe "max" should be renamed to something like "experimental" ("experimental" is really a bad word; we already have semantics for experimental features) to indicate that users shouldn't use it until they know it brings performance for their particular case. Another option is to call it "all" if the intent is to take the whole graph and merge it into a single partition.

"all" may not be accurate because taking the whole graph and merging it into a single partition is not guaranteed. "experimental" sounds like a doable option; what do you think? @ZhennanQin @TaoLv

Contributor:

The max policy is designed to support direct programming with the oneDNN Graph API to implement custom ops or custom fusion patterns. For example, an end user can build RMS_norm with small ops and pass it to oneDNN Graph with the max policy, and the compiler backend can generate a fused kernel for it. This is a frequent request for newly introduced activations or normalizations. I think "experimental" is not a good name since it can mean anything and doesn't reflect the actual behavior, but we can use a term other than "max" to make it clearer.


rfcs/20230801-max-partition-policy/README.md (outdated, resolved thread)
rfcs/20230801-max-partition-policy/README.md (resolved thread)
@yimeisun123:

  1. In the current oneDNN Graph implementation, my understanding is that the graph compiler backend already provides additional fusions compared to the DNNL backend. Will specifying the "max" policy bring in even more fusions in the graph compiler backend? What happens if we use the graph compiler backend without specifying the max policy?
  2. It would be good to know how you decide the op list and the policy.
  3. What would be a case where specifying the max policy has a negative impact?

Contributor @niuxiaog commented Aug 9, 2023:

> 1. In the current oneDNN Graph implementation, my understanding is that the graph compiler backend already provides additional fusions compared to the DNNL backend. Will specifying the "max" policy bring in even more fusions in the graph compiler backend? What happens if we use the graph compiler backend without specifying the max policy?
> 2. It would be good to know how you decide the op list and the policy.
> 3. What would be a case where specifying the max policy has a negative impact?

  1. Yes, specifying the "max" policy will bring in even more fusions in the graph compiler backend. If we use the graph compiler backend without specifying the max policy, the default fusion policy will be used.
  2. The op list is roughly the supported ops list of the graph compiler.
  3. Models in the model zoo will get performance improvements with the max policy; for untested models, there may be some regression.

Contributor @TaoLv commented Aug 10, 2023:

@niuxiaog

> The op list is roughly the supported ops list of the graph compiler.

That's internal. Users don't know from the API which ops are supported by the graph compiler.

> Models in the model zoo will get performance improvements with the max policy; for untested models, there may be some regression.

I believe framework integration is for general usage. How can frameworks or framework users know which model is from the model zoo and which is not? @chunyuan-w @sanchitintel @yucai-intel @retonym @wenzhe-nrv Please share how you plan to integrate and expose this feature. Thanks.

@TaoLv added the "RFC: A design document" label on Aug 10, 2023.

The expected behaviors when using max policy:

- Max policy relies on a predefined ops list to do partitioning. Connected ops
Member:

The choice between the max and fusion policies falls on the shoulders of the framework user. How is this expected to be exposed in PyTorch and TensorFlow?

How are users supposed to decide whether to use fusion or max? The explanation here talks mainly about implementation details, but what are the consequences visible to the framework user?

Contributor:

My understanding is:

The max policy is not expected to be exposed by frameworks; instead, it is meant to support framework developers in implementing their custom fusion kernels.

The fusion policy is the recommended way to use oneDNN Graph automatically, as we do graph partitioning according to pre-defined fusion patterns and deliver good performance. The max policy, in contrast, supports direct programming with the oneDNN Graph API: framework developers program the graph they want to fuse and pass it to oneDNN Graph with the max policy, just like writing a C program and compiling it with GCC.

The max policy can be used in the scenarios below:

  1. The default fusion policy doesn't meet the framework's requirements and the developer wants to change the fusion behavior. For example, for a 3-layer MLP, the fusion policy will take all three layers; but if the developer only wants oneDNN Graph to take the first two layers and leave the last layer un-partitioned, they can program with the max policy and define a graph that contains only the first two layers.
  2. Defining new fusion patterns. For example, if a framework developer wants to implement an RMSNorm op, they can build a graph with small ops and pass it to oneDNN Graph with the max policy, and a fused kernel will be returned.

In general, the max policy is not designed for framework users, but for framework developers.
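To make the RMSNorm scenario concrete: the snippet below is a plain-Python numeric sketch of the computation such a graph of small ops expresses. It does not call the oneDNN Graph API; the op kinds named in the comments (Pow, ReduceMean, Add, Rsqrt, Multiply) are only an assumed decomposition a developer might submit with the max policy to get a single fused kernel.

```python
import math

def rms_norm_small_ops(x, weight, eps=1e-6):
    """RMSNorm written as the elementary graph ops it decomposes into."""
    sq = [v * v for v in x]                # Pow(x, 2)
    mean = sum(sq) / len(sq)               # ReduceMean
    inv = 1.0 / math.sqrt(mean + eps)      # Add(eps) + Rsqrt
    return [v * inv * w for v, w in zip(x, weight)]  # Multiply, Multiply

x = [1.0, 2.0, 3.0, 4.0]
w = [1.0, 1.0, 1.0, 1.0]
y = rms_norm_small_ops(x, w)

# Reference formula: x / sqrt(mean(x^2) + eps), elementwise.
ms = sum(v * v for v in x) / len(x)
ref = [v / math.sqrt(ms + 1e-6) for v in x]
```

A fused kernel for the whole subgraph would compute exactly this in one pass instead of materializing each intermediate tensor.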

@Jianhui-Li:

My observation is that the framework side also likes to develop patterns, and sometimes they use custom ops to support these patterns, like flash attention, sparse computation, int4, etc. oneDNN Graph's max policy would help framework developers use oneDNN Graph instead of writing their own custom ops. This gives them a chance to use oneDNN Graph and compare with their custom op; if oneDNN Graph does better, then they just call oneDNN Graph with the max policy.


@Jianhui-Li, adding to your point (custom ops vs. patterns created by max partitioning), I feel there's another aspect as well: a pattern added manually by backend teams may sometimes perform better than the pattern automatically chosen by the max partitioning approach, and sometimes the opposite may be true.

So at runtime, should a framework (transparently to framework users) benchmark both approaches and eventually use the one that delivers better performance? If a framework can't do this sort of benchmarking, it may have to expose an API for users to manually check whether the max partitioning approach (if it's non-default) delivers better performance.

For example, the DNNL backend will not support max partition policy in the oneDNN v3.3
release, so when running with a combination of max partition policy and the DNNL
backend, max partition policy will fall back to fusion partition policy.
1. The validation for max partition policy will not be landed in oneDNN v3.3
Member:

We are not landing features without validation in oneDNN.

Contributor (author):

Sorry, the wording needs to be more precise; I'll refactor it.

We have limited validation for the max partition policy in the v3.3 release:

  • Basic functionality tests and correctness checks will be covered by gtests.
  • Due to validation resource limitations, benchdnn graph plans to provide performance tests in v3.3 as a stretch goal, if the feature and test cases are ready. (@chuanqi129 @thuang6 for more details)

@vpirogov Do you think this validation scope looks good to you?

@sanchitintel commented Aug 14, 2023:

Hi @ElaineBao, can framework developers tell whether custom patterns that don't already exist under the fusion policy (but can be added by a backend as pre-defined patterns) would perform better or worse than the max policy?

In other words, would the max policy enable better OOB support by not having to ask backend teams to add specific pre-defined patterns for new OOB models encountered in the future? Are there any heuristics that can be used to determine whether custom pre-defined patterns would perform better or worse than the patterns generated by the max policy? Thanks!

Contributor Author @ElaineBao:

Hi @sanchitintel, I guess you are asking how a framework could determine when to use the max policy and when to use the fusion policy.

The general rule is that the default recommended policy is still the fusion policy. But for a specific workload with custom patterns that are not covered by the fusion policy, the framework can try the max policy to see if it achieves better performance.

Labels: RFC (A design document)