
rfc: introduce max partition policy to oneDNN Graph API #1694

Open. Wants to merge 5 commits into base: rfcs.
Conversation

ElaineBao (Contributor)

Description

This RFC introduces a max partition policy to the oneDNN Graph API.

Rendered version: link

- If not specified, the default partition policy remains the fusion policy.
- Max policy returns partitions that are maximal in size. The criteria for partitioning
are based on a pre-defined ops list and pre-defined rules. Connected ops
that meet the conditions of the pre-defined ops list and pre-defined rules
Contributor:

I think this sentence may be ambiguous. Do you mean that ops which meet the conditions of the pre-defined ops list and don't hit the pre-defined excluding rules will be selected into one partition?

Contributor (author):

> meet the conditions of pre-defined ops list and pre-defined rules will be selected into one partition

This means both the pre-defined ops list and the pre-defined rules need to be met for ops to be selected into one partition; violating either means they will not be selected.
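To illustrate the intended selection criterion, here is a small self-contained sketch. The real ops list and rules are backend-specific and internal, so `ALLOWED_OPS` and `violates_rules` below are purely hypothetical stand-ins: connected ops that are all on the pre-defined ops list and violate none of the pre-defined rules end up in one maximal partition.

```python
from collections import defaultdict

# Hypothetical stand-ins; the real list and rules are backend-specific.
ALLOWED_OPS = {"MatMul", "Add", "ReLU", "Multiply"}

def violates_rules(op):
    # Example exclusion rule: skip f64 ops.
    return op["dtype"] == "f64"

def max_partitions(ops, edges):
    """Group connected eligible ops into maximal partitions."""
    eligible = {name for name, op in ops.items()
                if op["kind"] in ALLOWED_OPS and not violates_rules(op)}
    adj = defaultdict(set)
    for a, b in edges:
        if a in eligible and b in eligible:
            adj[a].add(b)
            adj[b].add(a)
    partitions, seen = [], set()
    for name in eligible:          # flood-fill each unseen eligible op
        if name in seen:
            continue
        group, stack = set(), [name]
        while stack:
            n = stack.pop()
            if n not in group:
                group.add(n)
                stack.extend(adj[n] - group)
        seen |= group
        partitions.append(group)
    return partitions

ops = {
    "mm":   {"kind": "MatMul",  "dtype": "f32"},
    "bias": {"kind": "Add",     "dtype": "f32"},
    "act":  {"kind": "ReLU",    "dtype": "f32"},
    "sm":   {"kind": "Softmax", "dtype": "f32"},  # not on the ops list
    "mm2":  {"kind": "MatMul",  "dtype": "f32"},
}
edges = [("mm", "bias"), ("bias", "act"), ("act", "sm"), ("sm", "mm2")]
parts = max_partitions(ops, edges)
# "sm" breaks connectivity, so {"mm", "bias", "act"} and {"mm2"} come back
# as two separate partitions.
```

Under this reading, violating either condition (not on the list, or hitting a rule) keeps an op out of the partition.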

Contributor:

Yes, but I remember the rules are a negative list. If an op meets one of the rules, the op will be removed from the partition.

Contributor (author):

Oh, I got your point. But since the pre-defined rules are backend-specific, we cannot say it's always a negative list, so here I kept the wording general.

Contributor:

Will the pre-defined ops list and pre-defined rules be exposed through the API or API documentation? If not, how does the user know when to use the max policy? Or are you suggesting users always specify max policy for their models?

Contributor (author):

> Will the pre-defined ops list and pre-defined rules be exposed through the API or API documentation?

The fusion policy doesn't expose fusion patterns through the API, so I think the max policy should follow a similar strategy. For documentation, maybe we can provide a page like the pre-defined fusion patterns page, if it helps.

> If not, how does the user know when to use the max policy? Or are you suggesting users always specify max policy for their models?

I feel this discussion can be merged with https://github.com/oneapi-src/oneDNN/pull/1694/files#r1283768894

rfcs/20230801-max-partition-policy/README.md (resolved thread)
The expected behaviors when using max policy:

- If not specified, the default partition policy remains the fusion policy.
- Max policy returns partitions that are maximal in size. The criteria for partitioning
Contributor @igorsafo commented Aug 3, 2023:

What is the motivation for differentiating between max and fusion? How can a user decide which one should be used (not for a single model, but for overall integration)? Maybe "max" should be renamed to something like "experimental" ("experimental" is really a bad word; we already have semantics for experimental features) to indicate that users shouldn't use it until they know it brings performance for their particular case. Another option is to call it "all" if the intent is to take the whole graph and merge it into a single partition.

Contributor (author):

> What is the motivation for differentiation between max and fusion?

From my understanding, fusion defines a fixed number of patterns, while max's optimizations are not fixed: it will generate different optimizations for different models based on the model architecture, which might be good for out-of-the-box (OOB) models.

> How can a user decide which one should be used (not for a single model, but for overall integration)?

It depends on how much control the user wants to give to the library. Compared with fusion, max gives the library more control to optimize the graph.
But I can understand that users want better performance when they give the library more control, which is not guaranteed for the max policy since we are not able to test all the models in the world (please correct me if I'm wrong @ZhennanQin @niuxiaog).

> Maybe "max" should be renamed to something like "experimental" ("experimental" is really a bad word; we already have semantics for experimental features) to indicate that users shouldn't use it until they know it brings performance for their particular case. Another option is to call it "all" if the intent is to take the whole graph and merge it into a single partition.

"all" may not be accurate because taking the whole graph and merging it into a single partition is not guaranteed. "experimental" sounds like a doable option; what do you think? @ZhennanQin @TaoLv

Contributor:

The max policy is designed to support direct programming with the oneDNN Graph API to implement custom ops or custom fusion patterns. For example, an end user can build RMS_norm with small ops and pass it to oneDNN Graph with the max policy, and the compiler backend can generate a fused kernel for it. This is a frequent request for newly introduced activations or normalizations. I think "experimental" is not a good name since it can mean anything and doesn't reflect the actual behavior, but we can use a term other than "max" to make it clearer.


rfcs/20230801-max-partition-policy/README.md (outdated, resolved thread)
rfcs/20230801-max-partition-policy/README.md (resolved thread)
@yimeisun123:

  1. In the current oneDNN Graph implementation, my understanding is that the graph compiler backend already provides additional fusions compared to the DNNL backend. Will specifying the "max" policy bring in even more fusions in the graph compiler backend? What happens if we use the graph compiler backend without specifying the max policy?
  2. It would be good to know how you decide the op list and the policy.
  3. What would be a case where specifying the max policy has a negative impact?

Contributor @niuxiaog commented Aug 9, 2023:

> 1. In the current oneDNN Graph implementation, my understanding is that the graph compiler backend already provides additional fusions compared to the DNNL backend. Will specifying the "max" policy bring in even more fusions in the graph compiler backend? What happens if we use the graph compiler backend without specifying the max policy?
> 2. It would be good to know how you decide the op list and the policy.
> 3. What would be a case where specifying the max policy has a negative impact?

  1. Yes, specifying the "max" policy will bring in even more fusions in the graph compiler backend. If we use the graph compiler backend without specifying the max policy, the default fusion policy will be used.
  2. The op list is roughly the supported ops list of the graph compiler.
  3. Models in the model zoo will get performance improvements with the max policy; for untested models, there may be some regression.

Contributor @TaoLv commented Aug 10, 2023:

@niuxiaog

> The op list is roughly the supported ops list of the graph compiler.

That's internal. Users don't know from the API which ops are supported by the graph compiler.

> Models in the model zoo will get performance improvements with the max policy; for untested models, there may be some regression.

I believe framework integration is for general usage. How can frameworks or framework users know which model is from the model zoo and which is not? @chunyuan-w @sanchitintel @yucai-intel @retonym @wenzhe-nrv Please share how you plan to integrate and expose this feature. Thanks.

@TaoLv added the "RFC: A design document" label on Aug 10, 2023.

The expected behaviors when using max policy:

- Max policy relies on a predefined ops list to do partitioning. Connected ops
Member:

The choice between the max and fusion policies falls on the shoulders of the framework user. How is this expected to be exposed in PyTorch and TensorFlow?

How are users supposed to decide whether to use fusion or max? The explanation here talks mainly about implementation details, but what are the consequences visible to the framework user?

Contributor:

My understanding is:

The max policy is not expected to be exposed by frameworks; instead, it is meant to support framework developers in implementing their custom fusion kernels.

The fusion policy is the recommended way to use oneDNN Graph automatically, as we do graph partitioning according to pre-defined fusion patterns and deliver good performance. The max policy, in contrast, supports direct programming with the oneDNN Graph API: framework developers program the graph they want to fuse and pass it to oneDNN Graph with the max policy, just like writing a C program and compiling it with GCC.

The max policy can be used in the scenarios below:

  1. The default fusion policy doesn't meet the framework's requirements and the developer wants to change the fusion behavior. For example, for a 3-layer MLP, the fusion policy will take all three layers; but if the developer only wants oneDNN Graph to take the first two layers and leave the last layer un-partitioned, they can program with the max policy and define a graph that contains only the first two layers.
  2. Defining new fusion patterns. For example, if a framework developer wants to implement an RMSNorm op, they can build a graph with small ops and pass it to oneDNN Graph with the max policy, and a fused kernel will be returned.

In general, the max policy is not designed for framework users, but for framework developers.
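To make the RMSNorm scenario concrete: the snippet below is a plain-Python numeric sketch of the computation such a graph of small ops expresses. It does not call the oneDNN Graph API; the op kinds named in the comments (Pow, ReduceMean, Add, Rsqrt, Multiply) are only an assumed decomposition a developer might submit with the max policy to get a single fused kernel.

```python
import math

def rms_norm_small_ops(x, weight, eps=1e-6):
    """RMSNorm written as the elementary graph ops it decomposes into."""
    sq = [v * v for v in x]                # Pow(x, 2)
    mean = sum(sq) / len(sq)               # ReduceMean
    inv = 1.0 / math.sqrt(mean + eps)      # Add(eps) + Rsqrt
    return [v * inv * w for v, w in zip(x, weight)]  # Multiply, Multiply

x = [1.0, 2.0, 3.0, 4.0]
w = [1.0, 1.0, 1.0, 1.0]
y = rms_norm_small_ops(x, w)

# Reference formula: x / sqrt(mean(x^2) + eps), elementwise.
ms = sum(v * v for v in x) / len(x)
ref = [v / math.sqrt(ms + 1e-6) for v in x]
```

A fused kernel for the whole subgraph would compute exactly this in one pass instead of materializing each intermediate tensor.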

@Jianhui-Li:

My observation is that the framework side also likes to develop patterns, and sometimes they use custom ops to support these patterns, like flash attention, sparse computation, int4, etc. oneDNN Graph's max policy would help framework developers use oneDNN Graph instead of writing their own custom ops. This gives them a chance to use oneDNN Graph and compare with their custom op; if oneDNN Graph does better, then they just call oneDNN Graph with the max policy.


@Jianhui-Li, adding to your point (custom ops vs. patterns created by max partitioning), I feel there's another aspect as well: a pattern added manually by backend teams may sometimes perform better than the pattern automatically chosen by the max partitioning approach, and sometimes the opposite may be true.

So at runtime, should a framework (transparently to framework users) benchmark both approaches and eventually use the one that delivers better performance? If a framework can't do this sort of benchmarking, it may have to expose an API for users to manually check whether the max partitioning approach (if it's non-default) delivers better performance.

For example, the DNNL backend will not support max partition policy in the oneDNN v3.3
release, so when running with a combination of max partition policy and the DNNL
backend, max partition policy will fall back to fusion partition policy.
1. The validation for max partition policy will not be landed in oneDNN v3.3
Member:

We are not landing features without validation in oneDNN.

Contributor (author):

Sorry, the wording needs to be more precise; I'll refactor it.

We have limited validation for the max partition policy in the v3.3 release:

  • Basic functionality tests and correctness checks will be covered by gtests.
  • Due to validation resource limitations, benchdnn graph plans to provide performance tests in v3.3 as a stretch goal, if the feature and test cases are ready. (@chuanqi129 @thuang6 for more details)

@vpirogov Do you think this validation scope looks good to you?

@sanchitintel commented Aug 14, 2023:

Hi @ElaineBao, can framework developers tell whether custom patterns that don't already exist under the fusion policy (but can be added by a backend as pre-defined patterns) would perform better or worse than the max policy?

In other words, would the max policy enable better OOB support by not having to ask backend teams to add specific pre-defined patterns for new OOB models encountered in the future? Are there any heuristics that can be used to determine whether custom pre-defined patterns would perform better or worse than the patterns generated by the max policy? Thanks!

Contributor Author @ElaineBao:

Hi @sanchitintel, I guess you are asking how a framework could determine when to use the max policy and when to use the fusion policy.

The general rule is that the default recommended policy is still the fusion policy. But for a specific workload with custom patterns that are not covered by the fusion policy, the framework can try the max policy to see if it achieves better performance.

Labels: RFC (A design document)