Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify CNI logging config #51074

Merged
merged 5 commits into from
May 21, 2024

Conversation

bleggett
Copy link
Contributor

@bleggett bleggett commented May 15, 2024

Please provide a description of this PR:

Fixes #50958

Currently, we use different scopes for all the subcomponents:

  • plugin install
  • sidecar repair
  • ambient agent
  • out-of-process CNI plugin

which means that default:XXXX does not work as you'd expect at all (it's in fact mostly useless), which leads to a lot of confusion when configuring istio-cni logging levels - you either have to use all:XXX or know your subcomponents. In practice using separate scopes hurts more than it helps.

Also, there's no real reason why we should ask the end-user to separately configure the out-of-process plugin log level, it should just inherit.

This moves all istio-cni logging to use the default scope, using labels for the different components, and removes the user-facing separate plugin logging config in favor of inheriting the level from the agent.

@bleggett bleggett requested review from a team as code owners May 15, 2024 18:23
@istio-policy-bot istio-policy-bot added the area/ambient Issues related to ambient mesh label May 15, 2024
@istio-testing istio-testing added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label May 15, 2024
@bleggett bleggett added area/networking/cni Istio CNI-related issues do-not-merge/hold Block automatic merging of a PR. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels May 15, 2024
@istio-testing istio-testing added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label May 15, 2024
@howardjohn
Copy link
Member

howardjohn commented May 15, 2024

I think there may be some misunderstand on how the scopes work? Or in my understanding of this change

If CNI should use default scope everywhere, why just CNI? That is inconsistent, and we should do the same everywhere. Scopes, though, are useful -- so I don't think we want to d othat

default:debug is a bad rule to set. that means "set everything without a scope to be debug". Logs without scope ~= Istio devs were lazy and forgot to scope their logs. Fixing those cases would be good.

Seems like what you are looking for is all:debug?

@bleggett
Copy link
Contributor Author

bleggett commented May 15, 2024

I think there may be some misunderstand on how the scopes work?

The main thing is that unless they can be nested and auto inherit the level from the parent scope (which AFAICT they cannot), in this component they're less useful than just labels. We get no real benefit from the segregation, there are no real cases where we'd only want plugin logs, but not ambient informer logs, for instance - that's not a practically useful level of granularity.

If CNI should use default scope everywhere, why just CNI? That is inconsistent, and we should do the same everywhere. Scopes, though, are useful -- so I don't think we want to do that

They can be, but

  • Their existing use in istio-cni is definitely not correct (this PR was prompted by a user debugging session being harder to set up than it really should have been, because of goofy scoping)
  • I don't know that in istio-cni specifically separate scoping is useful - it's one node agent with different features enabled via flag. 99.9% of the time, you want all enabled features to use the same logging level, but you still wanna know which section of code logged. So IMO labels are a better fit for that.

default:debug is a bad rule to set. that means "set everything without a scope to be debug". Logs without scope ~= Istio devs were lazy and forgot to scope their logs. Fixing those cases would be good.

I am fine creating a top-level cni scope, if default is not allowed - it wasn't clear to me that default is "stuff without a scope" - that implies we should never "set the level for the default scope" and if we do it's a bug/TODO.

Seems like what you are looking for is all:debug?

Yes, but in the values.yaml we set

    # change cni scope level to control logging out of istio-cni-node DaemonSet
    logging:
      level: default:info,cni:info

which (if you don't know our logging levels) is confusing - why would anyone know the difference between default and all? Why are we setting two different scopes for one component? etc etc.

It sounds like you're saying we should just basically not use default - that's fine, I'll use a scope of cni for this component.

@howardjohn
Copy link
Member

I still don't get it. If you don't care about scopes then set all? I don't see why we don't think anyone could possibly care about fine grained scopes. If the helm chart is confusing (it is) we can fix that without removing all the scopes

@bleggett
Copy link
Contributor Author

I still don't get it. If you don't care about scopes then set all? I don't see why we don't think anyone could possibly care about fine grained scopes. If the helm chart is confusing (it is) we can fix that without removing all the scopes

With the current scoping in istio-cni, no one except istio-cni developers has the context to know how to use them individually, realistically.

If no one but people so familiar with the code that they understand the implicit dependencies between the inner components can intelligently make use of the granular scopes, and those people don't use selective granular scopes (I don't, I doubt any other maintainers do for this component) - why do we use scopes vs labels? I can't think of a reason.

@howardjohn
Copy link
Member

The plugin stuff is weird though, since there are basically 2 plugin levels (what level the plugin sends over UDS, and what level the daemonset logs from the plugin); we could probably collapse that (either send all over UDS and filter in DS, or opposite)

@bleggett
Copy link
Contributor Author

The plugin stuff is weird though, since there are basically 2 plugin levels (what level the plugin sends over UDS, and what level the daemonset logs from the plugin); we could probably collapse that (either send all over UDS and filter in DS, or opposite)

Yes, it is weird. I considered just sending everything and filtering serverside (that is simpler) but that has ~some perf implications? Probably not much. If you wanna have a toggle just on one end, I'll do that.

@howardjohn
Copy link
Member

If a user doesn't understand the scopes and just wants "all the logs" for debugging, they should not use scopes and should use all.

Why would we remove scopes in CNI and not in all of istio?

@howardjohn
Copy link
Member

And do we remove them from ztunnel and envoy as well?

@bleggett
Copy link
Contributor Author

bleggett commented May 15, 2024

Why would we remove scopes in CNI and not in all of istio?
And do we remove them from ztunnel and envoy as well?

I don't have an opinion on that really, that's IMO strictly out of scope (no pun intended) for this PR - and it's not an argument for why they should be used here, either way.

Scopes make much more sense for things in the datapath IMO where you might have or want different log streams with hugely varying volumes that go to different places - access logs + other stuff, basically.

CNI isn't in the datapath. It doesn't have logs like that. You either want to trace the flow of pod add/pod remove events thru all the components, or you don't.

Labels give context, context is what we need here. We don't have a requirement for sub-component toggling in istio-cni, and probably never will (but if a compelling case is made, we can do it).

@howardjohn
Copy link
Member

I don't have an opinion on that really, that's IMO strictly out of scope (no pun intended) for this PR - and it's not an argument for why they should be used here, either way.

Keeping components of Istio consistent is always in scope 🙂 .

@bleggett
Copy link
Contributor Author

bleggett commented May 15, 2024

I don't have an opinion on that really, that's IMO strictly out of scope (no pun intended) for this PR - and it's not an argument for why they should be used here, either way.

Keeping components of Istio consistent is always in scope 🙂 .

Fair - for me it comes back to the datapath thing tho - istio-cni is not part of that, and consequently it doesn't have log streams that would be directed to different locations/destinations/stores, and so doesn't benefit from scoped toggles. In this scenario scopes are mostly just useful as labeling, so might as well use labels.

If you definitely want scopes - what scopes would you propose here? The existing scoping is excessively granular, to the point where it is unused.

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>
@istio-testing istio-testing added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels May 20, 2024
Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>
@bleggett bleggett changed the title Fix CNI log levels Simplify CNI logging config May 20, 2024
Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>
Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>
@istio-testing istio-testing added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels May 20, 2024
@bleggett
Copy link
Contributor Author

Couple of things we can do here (I'm open to any of these):

  • Use single cni scope for all of the istio-cni component instead of default scope (arguably no better than using default, requires a few more lines of explicit scope management), and use labels for subcomponent context.
  • Use single default scope for all of the istio-cni component, and use labels for subcomponent logging context (current PR approach).
  • Either of the above, and use no labels for subcomponent context (simplest)
  • None of the above - keep some degree of subcomponent-level independent log toggling (probably only useful to Istio devs given the nature of this component, current granularity is excessive and unused, if we really want this, we need to identify what scopes we want within the istio-cni component and where)

@howardjohn lmk your thoughts on this, or if there's another option I didn't identify.

@howardjohn
Copy link
Member

I would do (4), and we at the very least want scopes for the plugin vs the server

Signed-off-by: Benjamin Leggett <benjamin.leggett@solo.io>
@bleggett
Copy link
Contributor Author

bleggett commented May 20, 2024

I would do (4), and we at the very least want scopes for the plugin vs the server

I added those 2 scopes - note tho that the original impetus for this PR was to make the plugin auto-inherit the agent log level, and that doesn't change with this.

@bleggett bleggett removed the do-not-merge/hold Block automatic merging of a PR. label May 21, 2024
@bleggett
Copy link
Contributor Author

/test integ-security-multicluster

@istio-testing istio-testing merged commit 901a523 into istio:master May 21, 2024
28 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/ambient Issues related to ambient mesh area/networking/cni Istio CNI-related issues size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Consolidate logging in istio-cni
4 participants