
Add EventBridge Scheduler mode for policy lambdas #9273

Merged

merged 24 commits into cloud-custodian:main from periodic_eb_scheduler May 20, 2024

Conversation

scolebrook (Contributor)

Add an optional EB Scheduler implementation for periodic mode when the scheduler_role mode property is present. EB Scheduler uses an execution role to invoke a Lambda function instead of a Lambda permission (mentioned in lambda.rst with a link to the AWS docs). All three schedule types are supported, along with timezone (defaults to Etc/UTC), start_time, end_time, and group_name (defaults to default) properties. If specified, the schedule group must already exist.

Use of EB Scheduler requires additional actions for the deployment identity (covered in deployment.rst).

Converting an existing periodic policy from the EB Rule style to the EB Scheduler style causes the old rule to be deleted if it exists. Similarly, converting in the other direction causes the schedule to be deleted if it exists. This requires checking for the schedule in all groups, since the group the schedule was in is no longer known at that point.

These conversion checks result in extra API calls in the test_mu.py suite, so placebo_data.zip has been updated.

EB Scheduler allows 1,000,000 schedules per region, with more available upon request. EventBridge allows only 300 rules per region (as few as 100 in some regions), with more upon request. Moving periodic policies to EB Scheduler frees rule capacity for other kinds of rules, such as CloudTrail rules.
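For illustration, a schedule-mode policy might look roughly like the following sketch, expressed in the dict form c7n tests pass to load_policy. The mode type and property names (scheduler_role, timezone, start_time, end_time, group_name) are taken from this description and the thread below; treat the exact schema as an assumption, not the merged definition.

# Sketch of a policy using the EB Scheduler-backed mode; property names
# follow the PR description and may differ from the merged schema.
policy = {
    'name': 'offhours-stop',
    'resource': 'aws.ec2',
    'mode': {
        'type': 'schedule',                # EB Scheduler-backed mode
        'schedule': 'cron(0 22 * * ? *)',  # rate/cron/at are all supported
        'scheduler_role': 'arn:aws:iam::123456789012:role/custodian-scheduler',
        'timezone': 'America/New_York',    # defaults to Etc/UTC
        'group_name': 'custodian',         # defaults to "default"; must already exist
        'role': 'arn:aws:iam::123456789012:role/custodian-lambda',
    },
}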

scolebrook marked this pull request as draft February 6, 2024 03:14
scolebrook marked this pull request as ready for review February 6, 2024 15:48
scolebrook (Contributor, Author) commented Feb 6, 2024

@kapilt Looks like the ubuntu-latest/3.10 test is getting a 503 error when trying to upload to codecov. Most likely a "them" problem: they've just updated their status page to indicate they're having an issue. But once it clears, I can't retry the job myself.

scolebrook (Contributor, Author)

Addresses periodic mode in #7989

c7n/mu.py Outdated
# check if schedule exists for func.event_name and remove it to clean up
# when switching between scheduled event types
# must check all groups
EventBridgeScheduleSource.remove_from_groups(self.session.client('scheduler'), func)
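For context, a minimal sketch of what that cross-group check amounts to, using the boto3 scheduler client (illustrative only, not the PR's actual implementation):

import boto3

def remove_from_groups(client, schedule_name):
    # Sketch: scan every schedule group for a schedule with this name and
    # delete it. This per-group scan is the API cost the review comment
    # below suggests avoiding by tracking state in a tag.
    for page in client.get_paginator('list_schedule_groups').paginate():
        for group in page['ScheduleGroups']:
            try:
                client.delete_schedule(Name=schedule_name, GroupName=group['Name'])
            except client.exceptions.ResourceNotFoundException:
                continue  # no schedule by that name in this group

# usage: remove_from_groups(boto3.client('scheduler'), func.event_name)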
kapilt (Collaborator)

Since this is new behavior, we can store state on the lambda via a tag and avoid the extra API calls. That will also help clean up some of the test data set being modified.

scolebrook (Contributor, Author)

So to clarify: you're suggesting having a tag on the policy lambda function to store the fact that a schedule is used for periodic mode instead of a rule? Rule-based periodic mode and all other kinds of modes would not have the tag (unchanged from current behavior), so we'd just be checking for tag presence and adding/removing the tag when moving from one kind of periodic mode to the other. Am I understanding your suggestion correctly?

scolebrook (Contributor, Author)

Perhaps another approach would be to leave periodic mode untouched and have a new periodic-schedule mode.

kapilt (Collaborator)

A new mode would be useful to avoid extraneous API calls on migration, though I think you'll still want to use tag state to avoid iterating over the schedule groups. For migration purposes we could document a custodian policy that does the transition cleanup, albeit we'll also need support for schedule groups and schedules as resources for that.

scolebrook (Contributor, Author)

So: a new mode (periodic-schedule) to avoid all the extra work of modifying and testing periodic to support two methods of implementation depending on which properties are provided in the policy, plus either an additional custodian tag on the policy lambda to record the schedule group name (even if it's default), or perhaps another item in the custodian-info tag. I'm more concerned about the risk of breaking other things that depend on custodian-info containing only two colon-separated properties; I don't have enough visibility to know whether appending to custodian-info is a bad idea. @kapilt What do you think?

As for migration, if you are converting an existing policy from periodic to periodic-schedule, mu.py would remove the CWE rule, if I'm reading the code correctly, so that shouldn't be a problem. Schedule groups as resources would be useful for tagging enforcement, though. Another project for another day :-)!

kapilt (Collaborator)

For migration, I think we would do a follow-up PR to support additional resources for schedules and schedule groups, and then update the docs with guidance and sample policies that can do the cleanup. I'm not clear that the periodic to periodic-schedule conversion would get the CWE removal.

scolebrook (Contributor, Author)

Added a little to mugc.py so it can remove a schedule when one exists. This required adding an auto tag (custodian-schedule) to hold the group name, since mugc works with the function, not the policy. Also added a note to the lambda.rst doc pointing out that, when switching modes, a simple process is to comment out the policy, run mugc to remove all the pieces, then remove the comments and run custodian run again to redeploy.
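A rough sketch of what that mugc cleanup amounts to (a hypothetical helper; only the custodian-schedule tag holding the group name is from the comment above, the rest is illustrative):

import boto3

def remove_schedule(session, schedule_name, func_arn):
    # The custodian-schedule tag on the function records the schedule group
    # name, so mugc can delete the schedule directly instead of trawling
    # through every group.
    tags = session.client('lambda').list_tags(Resource=func_arn).get('Tags', {})
    group_name = tags.get('custodian-schedule')
    if not group_name:
        return  # not a schedule-mode function
    scheduler = session.client('scheduler')
    try:
        scheduler.delete_schedule(Name=schedule_name, GroupName=group_name)
    except scheduler.exceptions.ResourceNotFoundException:
        pass  # already gone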

c7n/policy.py Outdated (resolved)
scolebrook marked this pull request as draft March 6, 2024 05:01
scolebrook marked this pull request as ready for review March 6, 2024 15:08
scolebrook changed the title from "Add EventBridge Scheduler as an option for periodic mode policies" to "Add EventBridge Scheduler mode for policy lambdas" Mar 8, 2024
kapilt (Collaborator) commented Mar 21, 2024

Thanks for the changes; overall this looks good. It does feel like there is some extraneous recording data, though, with respect to the volume of JSON in the PR. I haven't dived deep enough to understand whether that's just due to extraneous functions in the recording or what the underlying cause is.

scolebrook (Contributor, Author)

@kapilt Found that some of the test data files you saw were from before the pivot from modifying periodic to creating the new schedule mode. Removed them and cleaned up some other leftovers from that pivot. Also added a new policy test to cover the custodian-schedule tag on schedule mode policy lambdas.

There are still lots of test data files for the two new mu tests. test_eb_schedule creates a policy, then checks the schedule; it then modifies the policy and checks the schedule twice, for two different kinds of changes. Similarly, test_pause_resume_sched_policy creates the policy, then disables the schedule before re-enabling it. This was all necessary to get good coverage. If you see anything unneeded in the two test cases that would let us simplify, let me know.
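For reference, the first test's shape is roughly the following sketch, following test_mu.py's placebo conventions; the names, mode properties, and assertions are illustrative rather than the PR's actual code:

def test_eb_schedule(self):
    factory = self.replay_flight_data('test_eb_schedule')
    p = self.load_policy(
        {'name': 'sched-pol',
         'resource': 'aws.ec2',
         'mode': {'type': 'schedule',
                  'schedule': 'rate(1 day)',
                  'scheduler_role': 'arn:aws:iam::123456789012:role/sched'}},
        session_factory=factory)
    mu.LambdaManager(factory).publish(mu.PolicyLambda(p))
    # the modify-and-republish steps described above replay further recorded calls
    client = factory().client('scheduler')
    schedule = client.get_schedule(Name='custodian-sched-pol', GroupName='default')
    self.assertEqual(schedule['State'], 'ENABLED')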

c7n/mu.py Outdated
# check for schedules with same func.event_name in any other group in case
# group_name has been changed
# there should only be one schedule across all groups for each policy lambda
self.remove_from_groups(self.client, func, params['GroupName'])
kapilt (Collaborator)

It feels like the group change should be computed as a delta between the tag on the actual function and the configured function, which would alleviate having to make API calls trawling through all the schedule groups to check for the function's schedule.

scolebrook (Contributor, Author)

@kapilt Thanks for spotting that. That was from before I added the tag with the group name in it (needed for mugc). The problem is that the function is changed first (actually deployed to AWS), and then the events get processed. So by the time we're here in add, the tag with the group name has already been changed to the new group name.

About the only way I can see to make this work would be to update _create_or_update in the LambdaManager class in mu.py (line 457). It gets the config of an existing function as part of its process, if the function actually exists. If it does, we could read the group name out of the tag before it's replaced and return that alongside the existing values that method returns. Something like old_group_name, which would default to None for a new function or for a function with any mode other than schedule.

Then the question is how to get that value into the add method for the EventBridgeScheduleSource class.

_create_or_update is called by LambdaManager.publish(), which then gets all the events at line 398 with a call to func.get_events(). This is where an instance of EventBridgeScheduleSource is created. Rather than adding old_group_name as an argument to the add method (which would mess with all the other subclasses of AWSEventBase), it's probably best to pass the old group name in when creating an EventBridgeScheduleSource (line 956). That way none of the other classes need modification, and the EventBridgeScheduleSource would have the old group name, or None if there isn't one. It could then check whether the old group name differs from the new one and remove the schedule in the old group.

It takes a lot of little tweaks to get the value of that tag all the way to where remove_from_groups happens, but it's probably the cleanest way to do it. It would remove all those extra API calls and leave just one call to remove a schedule, made only when we already know it's there and it's no longer wanted.

There's a lot to unpack there. But let me know if that is an approach you think would be acceptable.

kapilt (Collaborator)

Haven't had a chance to read fully (traveling today), but I was thinking we could encapsulate this via a provision event/facade abstraction carrying the old lambda state and the current lambda config/object, which the lambda manager could propagate to event provisioners on publish. That would provide the needed context, which the current interface lacks, to avoid the API calls.

scolebrook (Contributor, Author)

That's basically what my long verbiage was attempting to describe. :-D

I've been working on an implementation of this. This is how it currently works.

In LambdaManager, publish calls _create_or_update, which gets the current config of an existing function (if it's not a new function) before updating it. That config is returned to publish alongside the values already being returned, and the old group name is extracted from the tag, if it exists, with a regex. publish now has the previous function config available for any other purpose in the future.

publish then calls func.get_events, where func is a PolicyLambda or LambdaFunction instance, and iterates through the returned list of event objects. The PolicyLambda variety is where the event source objects get created, so we pass the old group name (or None) through to get_events, and if we're creating an EventBridgeScheduleSource we pass it on in the init and store it as a property. We can then delete an old schedule when needed during the call to the add method (see the sketch below).
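A minimal sketch of that flow (names and signatures are approximations, not the merged code):

# In LambdaManager.publish(), after _create_or_update() returns the previous
# function config, pull the old group name out of the custodian-schedule tag
# and hand it to get_events():
#
#     old_group_name = None
#     if existing:
#         old_group_name = extract_group(existing)  # hypothetical helper
#     for e in func.get_events(self.session_factory, old_group_name):
#         e.add(func)

class EventBridgeScheduleSource:
    def __init__(self, data, session, old_group_name=None):
        self.data = data
        self.session = session
        self.old_group_name = old_group_name  # None for new functions

    def add(self, func):
        group = self.data.get('group_name', 'default')
        if self.old_group_name and self.old_group_name != group:
            # The one remaining cleanup call: the schedule moved groups,
            # so delete it from the group it used to live in.
            self.session.client('scheduler').delete_schedule(
                Name=func.event_name, GroupName=self.old_group_name)
        # ... then create or update the schedule in `group` ...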

I've tried this out and it works without needing to modify any other tests. It results in a decrease of about 15% in the number of test JSON files for the two new tests (mostly in test_eb_schedule).

I'll push a commit shortly with this implementation for you to have a look at when you get time. Thanks for your help refining this feature. Safe travels.

scolebrook (Contributor, Author)

@kapilt Any additional changes needed on this?

kapilt assigned and then unassigned kapilt May 18, 2024
c7n/mu.py Outdated
for e in func.get_events(self.session_factory):
    # process any previous lambda configuration
    old_group_name = None
    if existing:
kapilt (Collaborator)

This logic should really be in the event implementation rather than hardcoding a particular event's pattern into the manager. The notion is to pass both the current and previous state of the function, so the event can do whatever specific logic it needs. As is, this just violates encapsulation.

kapilt (Collaborator)

I'll work on it.

kapilt (Collaborator) left a review:

lgtm

kapilt merged commit d459a38 into cloud-custodian:main May 20, 2024
22 checks passed
scolebrook (Contributor, Author)

@kapilt Thanks for your help pushing this across the finish line.

kapilt (Collaborator) commented May 20, 2024

@scolebrook no problem, thanks for working on it and making it happen.

scolebrook deleted the periodic_eb_scheduler branch June 12, 2024 00:30