
DirectFileStore creates too many files for long running applications #143

Open
maximkosov opened this issue Jul 10, 2019 · 7 comments

@maximkosov
We use Passenger to run our Rails application. Worker processes are recreated after every 100 requests. We recently tried client_ruby with DirectFileStore. Each worker process has its own pid, so depending on server load there will be tens of thousands of files in the Prometheus work dir after a couple of hours or days without an app restart.

With 50k files in the Prometheus work dir, the /metrics route becomes very slow, with processing times of about 10 seconds, which can lead to Prometheus scraper timeouts.

Is there any workaround for long-running processes with DirectFileStore? One workaround I can think of is to just restart the app every few hours. Instead of restarting the whole application, we could wipe the Prometheus work directory every few hours, but that looks a little hacky to me.
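
For reference, the setup looks roughly like the sketch below. The store directory path is made up for illustration, and the `*.bin` glob in the wipe workaround is an assumption about the store's file extension; adjust it to whatever actually appears in the directory. Note the wipe resets all counters, which scrapers will see as a counter reset.

```ruby
# config/initializers/prometheus.rb
require 'prometheus/client'
require 'prometheus/client/data_stores/direct_file_store'

# Every worker process writes pid-suffixed files under this directory,
# so short-lived Passenger workers keep leaving new files behind.
Prometheus::Client.config.data_store =
  Prometheus::Client::DataStores::DirectFileStore.new(dir: '/tmp/prometheus')

# Hacky workaround (e.g. run from cron every few hours): wipe the store.
# NOTE: this resets all counters.
Dir.glob('/tmp/prometheus/*.bin').each { |f| File.delete(f) }
```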

@dmagliola
Collaborator

dmagliola commented Jul 10, 2019

This is related to issue #109, namely: how do you know whether a file is being written to by a process that's still alive? It's not exactly the same problem, since in this case we can't simply get rid of files from old processes (or else counters would go down), but I have a feeling both problems will be solved in a related way.
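
As a sketch of the liveness question (not something the library does today): if the writer's pid is embedded in the filename, you can probe it with signal 0. The pid-extraction regex below is an assumption about the file naming scheme, purely for illustration.

```ruby
# Sketch: guess whether the process that owns a store file is still alive.
# Assumes the pid is the numeric token right before the extension, which
# is an assumption about the naming scheme, not a documented contract.
def writer_alive?(path)
  pid = path[/(\d+)\.bin\z/, 1]&.to_i
  return false unless pid

  Process.kill(0, pid) # signal 0 checks existence without sending anything
  true
rescue Errno::ESRCH    # no such process
  false
rescue Errno::EPERM    # process exists but belongs to another user
  true
end
```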

@lawrencejones
Contributor

I would think an appropriate fix for this would be to have each process hold a shared lock on the file it manages, for as long as it needs the file.

Then, every time a file store tries to create a new file, it examines all the existing files in the directory that it can exclusively lock, compacts them into a new file, and deletes the originals.

Provided you think carefully about concurrency control, this would be a decent mechanism for solving it. But at that point I reckon you should probably be using an mmap system.
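
A minimal sketch of that flock-based scheme, assuming a flat directory of `*.bin` store files; `merge_into` is a hypothetical stand-in for the actual value-merging logic:

```ruby
STORE_DIR = '/tmp/prometheus' # assumed location of the store files

# Each live process keeps a shared lock on its own file for its lifetime.
# The returned handle must stay open: closing it releases the lock.
def claim_own_file(path)
  file = File.open(path, File::RDWR | File::CREAT)
  file.flock(File::LOCK_SH)
  file
end

# When creating a new file, try to exclusively lock every existing file.
# Any file we can lock has no live writer holding the shared lock, so we
# can fold its values into a compacted file and delete it.
def compact_dead_files(compacted_target)
  Dir.glob(File.join(STORE_DIR, '*.bin')).each do |path|
    File.open(path, File::RDWR) do |f|
      # LOCK_NB makes flock return false instead of blocking.
      next unless f.flock(File::LOCK_EX | File::LOCK_NB)

      merge_into(compacted_target, f) # hypothetical: sum values into target
      File.delete(path)
    end
  end
end
```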

@brian-brazil

Is there a locking system that will work across all OSes?

@lawrencejones
Contributor

“Is there a locking system that will work across all OSes?”

That would certainly be a consideration, and you'd probably end up with compaction as an optional configuration on the file store. But honestly, I'd rather see a more performant mmap approach before doing this. I just wanted to sketch out a possible implementation.

@brian-brazil

We haven't solved this, and we're using mmap over in Python. The reads were actually changed recently to not use mmap, for performance reasons.

@dmagliola
Collaborator

Another potential option would be to have one file per process, rather than one file per process/metric.

This doesn't fundamentally solve the issue, but it makes it orders of magnitude less problematic. The downside is that every metric increment now effectively takes a mutex, so it's not the best performer in multi-threaded scenarios.

But if, like most Ruby apps, each process runs single-threaded, or if it isn't incrementing counters very often, the performance penalty will probably be negligible. Each store save should be on the order of single-digit microseconds, after all.

This would need a separate store (I'm not proposing modifying the existing file store to do this), but that's the point of having swappable store backends, and it would be significantly easier to write than the compaction we're talking about here.
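
A rough sketch of the shape such a store might take; the class name, JSON file layout, and whole-file rewrite are illustrative assumptions, not the proposed design:

```ruby
require 'json'

# Sketch: one file per process holding all metrics, with every write
# serialized through a single Mutex.
class SingleFilePerProcessStore
  def initialize(dir:)
    @path   = File.join(dir, "metrics_#{Process.pid}.json")
    @values = Hash.new(0.0)
    @mutex  = Mutex.new
  end

  def increment(metric_key, by: 1.0)
    @mutex.synchronize do
      @values[metric_key] += by
      # Rewriting the whole file per increment is fine for a sketch; a real
      # store would write values at fixed offsets instead.
      File.write(@path, JSON.generate(@values))
    end
  end
end
```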

@Sinjo
Member

Sinjo commented Aug 19, 2019

I'm taking this out of the v0.10.0 milestone. I think we should come up with an answer to this problem, but I also think there's a lot of value in getting 0.10.0 out of alpha and into more people's hands in its current state.

I'm going to try to get through the documentation issues so we can get to a release.

After that, we can go one of a couple of ways:

  • If we find we need another round of breaking changes before 1.0, we can have a 0.11.0 release with alphas as needed.
  • If not, we'll basically promote what we've got here to 1.0.

dmagliola pushed a commit that referenced this issue Oct 14, 2019
This was a quick experiment in having all metrics for each process in the same file, because it seemed like it would be easy (just the changes in this commit).

However, even though tests pass, this doesn't actually work. See the next commit for why. If the change were just this, I'd say we should do it.

However, as you'll see in the next commit, it's more involved than that, and I'm not sure it's worth doing, at least not with this approach...

I'm just pushing this up so it doesn't get lost.