DefaultMailbox size constantly increasing #2109
What happens if the application is left running? Will it ever release the memory? Usually, when a mailbox grows, it's a sign that the actor is busy doing work and new messages are arriving faster than the actor can handle. Without more context, it's hard to say what is going on here specifically.
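To make the "messages arriving faster than the actor can handle" dynamic concrete, here is a minimal language-agnostic sketch in Python (the application in this thread is .NET/Proto.Actor; the names `simulate`, `arrival_per_tick`, and `service_per_tick` are my own illustration, not any library API). Even a small per-tick deficit makes an unbounded mailbox grow without bound:

```python
import collections

def simulate(arrival_per_tick, service_per_tick, ticks):
    """Simulate an unbounded mailbox: enqueue arrivals, dequeue what the
    actor can service, and record the queue depth after each tick."""
    mailbox = collections.deque()
    depths = []
    for _ in range(ticks):
        for _ in range(arrival_per_tick):
            mailbox.append("msg")
        for _ in range(min(service_per_tick, len(mailbox))):
            mailbox.popleft()
        depths.append(len(mailbox))
    return depths

# A busy actor (3 in, 2 out per tick) accumulates backlog linearly:
print(simulate(3, 2, 5))  # [1, 2, 3, 4, 5]
# An actor that keeps up holds a flat depth:
print(simulate(2, 2, 5))  # [0, 0, 0, 0, 0]
```

The linear growth in the first case is the signature a constantly increasing mailbox usually points to.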
I've been trying to debug this behavior for days, and it's been hard, I should say. One thing I haven't mentioned is that I have the same application built on Akka.NET, and this has never been observed there after several months of usage. I basically built this version of the application from the Akka.NET version by replacing the actor system (and the actor-specific code, of course).

This is what I observe: if the application is left running, memory usage keeps increasing until I run out of it (64 GB). I've already tried forcing GC.Collect at regular intervals, regardless of the performance impact, but that didn't help either.

Let me also give you some more background on the application, so you get a better sense that it is not really that demanding performance-wise: the average size of each CSV file being sunk is ~30 KB, and the sinking routine runs every 30-60 seconds (that is, 30 seconds plus a random additional interval of up to 30 seconds, just to make sure the actors are not all sinking at the same time) across ~500 actors, so I doubt that messages are arriving faster than the actors can handle. There are peaks, but they only occur in the first few seconds or so while the application captures a bit of historical information, and even then CPU usage rarely goes above 20-30% on an 8-core PC. Before the ramp-up, RAM usage rarely goes above 0.5 GB, even in the first minutes. The only moment I see the CPU peaking to ~50% or so is exactly before the memory-consumption ramp-up starts; it is as if something triggers it. From then on, memory consumption ramps up fast and nothing seems to stop it.

Also, regarding your comment that 'some form of backpressure can mitigate this, e.g. reverse the flow and let the actor signal to the producer that it is ready to receive new work rather than just being force-fed work from the outside': please note that I kill the 'back' actors whenever the sinking operation finishes and, while the data is being sunk, there is already a 'front' actor receiving the new messages. This is handled by ~100 parent actors. What's really tricky to see is that:
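For reference, the "reverse the flow" suggestion quoted above can be sketched outside any actor framework as a pull-based loop in Python (all names here, `run`, `requests`, `out`, are hypothetical illustrations; the real application would express this with Proto.Actor messages). The producer emits one item per explicit demand signal, so the output queue depth can never exceed the consumer's outstanding requests:

```python
import queue
import threading

def run(limit):
    requests, out = queue.Queue(), queue.Queue()

    def producer():
        # Only produce when the consumer has signalled demand; None means stop.
        n = 0
        while requests.get() is not None:
            out.put(f"work-{n}")
            n += 1

    t = threading.Thread(target=producer, daemon=True)
    t.start()
    results = []
    for _ in range(limit):
        requests.put("ready")      # consumer signals it can take one item
        results.append(out.get())  # producer never runs ahead of demand
    requests.put(None)             # stop the producer
    t.join()
    return results

print(run(3))  # ['work-0', 'work-1', 'work-2']
```

The point of the pattern is that backlog lives on the producer side (or is never generated at all), rather than accumulating in the consumer's mailbox.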
I think this one is important:
Would you know what could be creating such nested instances of ConcurrentQueueSegment in the UnboundedMailboxQueue? Does the fact that all of its instances contain exactly 500 instances of the PingMessage indicate something?
I'm now doing an obvious test that I should have done before: I just let the application run without any data to process (that is, no CSV sinking, so no actor creation or killing). I see the same behavior: the number of ConcurrentQueueSegments with 500 ping messages starts increasing fast after ~20 min, as per the screenshots below. I'll try to create a minimal example and send it over shortly.
I was finally able to figure out what was happening: the issue was due to a dumb mistake on my end that was rather difficult to catch. Sorry about that, and thanks for your support.
I have an application that collects data from different sources and stores it into CSV files every 30 seconds. For this, it uses around 5k actors. 50% of these are what I call 'front actors', which are in charge of receiving the data and storing it in a list. The other 50% are 'back actors', which are in charge of sinking the data into the CSV files. Whenever a 30-second cycle ends: 1) the actor classified as a 'back actor' starts to receive the new data, 2) the 'front actor' sinks its in-memory data to the CSV files, 3) it sends itself a poison pill once that concludes, and 4) a new 'back actor' is created. It is important to say that the data flow is quite steady: there are some peaks, but it does not vary much in terms of MB/s. Also, no dead letters at all were observed.
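The four-step cycle above can be sketched with plain objects (a conceptual Python sketch under my own naming, `BufferingActor` and `end_of_cycle`; the real implementation uses Proto.Actor actors and poison pills):

```python
class BufferingActor:
    """Stands in for a 'front' actor: buffers incoming rows until the cycle ends."""
    def __init__(self):
        self.rows = []

    def receive(self, row):
        self.rows.append(row)

    def sink(self, sink_fn):
        sink_fn(list(self.rows))  # step 2: write buffered rows out (e.g. to CSV)
        self.rows.clear()         # step 3: the real actor would now poison-pill itself

def end_of_cycle(front, back, sink_fn):
    """Steps 1-4: the back actor takes over receiving, the old front sinks
    and dies, and a brand-new back actor is created."""
    front.sink(sink_fn)
    return back, BufferingActor()  # (new front, new back)

# Usage: one cycle of buffering then sinking.
sunk = []
front, back = BufferingActor(), BufferingActor()
front.receive("row-a")
front.receive("row-b")
front, back = end_of_cycle(front, back, sunk.append)
print(sunk)  # [['row-a', 'row-b']]
```

The role swap means there is always exactly one actor accepting new data while the previous one drains, which is why no dead letters are expected during the handover.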
It happens, though, that at some point (typically ~20 min after data collection starts), LOH and POH memory start to ramp up, as per the screenshot below:
When I analyze it with dotMemory, I see that 99% of that is due to the size of 'ConcurrentQueueSegment+Slot':
Screenshot taken when memory consumption was at ~1.3 GB:
Screenshot taken when memory consumption was at ~4.4 GB:
Some more snapshots taken:
When I analyze those thousands of 'ConcurrentQueueSegment+Slot' instances, I see that they are all composed of a 'PingMessage' I created internally. These are recurring messages sent to and used by parent actors in the actor hierarchy at a certain TimeSpan interval (using 'ActorContext.Scheduler().SendRepeatedly(delay, interval, ActorPid, recurringMessage)'). So it has nothing to do with the front and back actors; they don't even receive it. The PingMessages are sent to 'self'. Put another way, the PingMessages don't flow through the application: even though hundreds of actors are being killed every minute or so, the PingMessages are always there, are never touched by the sinking process, and the only place they exist is within the actor itself.
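As a rough illustration of why a repeating schedule can fill a queue (a Python sketch with a hypothetical `send_repeatedly`, not the Proto.Actor scheduler): if ticks keep being enqueued but the target never dequeues them, every elapsed interval leaves one more message sitting in the mailbox.

```python
import collections

def send_repeatedly(mailbox, message, intervals_elapsed):
    """Model of a repeating scheduler: one enqueue per elapsed interval,
    regardless of whether the target ever dequeues anything."""
    for _ in range(intervals_elapsed):
        mailbox.append(message)

mailbox = collections.deque()
send_repeatedly(mailbox, "PingMessage", 500)
print(len(mailbox))  # 500 undrained ticks accumulated in the mailbox
```

So a pile-up of identical scheduler messages usually means the ticks were delivered but never processed, e.g. because the receiving side stopped draining its queue.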
I also find it strange that this always happens ~20 min after the application starts, even though by that point thousands of sinking routines (and therefore thousands of PoisonPills) have already run. So this doesn't seem to be related only to the recurring messages: if it were, the size of 'ConcurrentQueueSegment+Slot' should have been increasing since the beginning of the application run.
So, this issue looks somehow related to the fact that I kill and create hundreds of actors every minute or so, but at the same time 99.99% of the messages are of type 'PingMessage', which the actor classes being killed/created are not even aware of.
Could anyone assist, please?