
pkg/blobserver/google/cloudstorage: add rate limiting #1285

Open · wants to merge 5 commits into master
Conversation

@bobg (Contributor) commented Dec 8, 2019

This solves rate-limit-exceeded errors when dumping large numbers of blobs into a Google Cloud Storage backend. The default limit is 1 request per 10ms and can be overridden with the new optional setting rate_limit.
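For readers skimming the diff, here is a minimal, self-contained sketch of the interval-style throttle the description implies (at most one request per parsed duration). The `gate` type and its names are illustrative, not the PR's actual code:

```go
package main

import (
	"fmt"
	"time"
)

// gate is a hypothetical throttle: each wait() blocks until the
// next tick, so callers issue at most one request per interval.
type gate struct {
	tick *time.Ticker
}

func newGate(interval time.Duration) *gate {
	return &gate{tick: time.NewTicker(interval)}
}

func (g *gate) wait() { <-g.tick.C }

func main() {
	// "10ms" is the PR's default, parsed the same way the diff
	// below parses the rate_limit setting.
	limit, _ := time.ParseDuration("10ms")
	g := newGate(limit)
	defer g.tick.Stop()

	for i := 0; i < 3; i++ {
		g.wait()
		fmt.Println("request", i, "at", time.Now())
	}
}
```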

@googlebot added the "cla: yes" label (Author has submitted the Google CLA) on Dec 8, 2019
@@ -733,14 +734,17 @@ func (b *lowBuilder) addGoogleDriveConfig(v string) error {
 	return nil
 }
 
-var errGCSUsage = errors.New(`genconfig: expected "googlecloudstorage" field to be of form "client_id:client_secret:refresh_token:bucket[/dir/]" or ":bucketname[/dir/]"`)
+var errGCSUsage = errors.New(`genconfig: expected "googlecloudstorage" field to be of form "client_id:client_secret:refresh_token:bucket[/dir/][:ratelimit]" or ":bucketname[/dir/]"`)
Contributor:

It's not obvious what the format/unit of ratelimit is.

I think we should probably hold off on making this configurable until the new configuration format is done. (I keep forgetting to finish it, but this is some extra motivation.)

@@ -755,7 +759,8 @@ func (b *lowBuilder) addGoogleCloudStorageConfig(v string) error {
 	if isReplica {
 		gsPrefix := "/sto-googlecloudstorage/"
 		b.addPrefix(gsPrefix, "storage-googlecloudstorage", args{
-			"bucket": bucket,
+			"bucket":     bucket,
+			"rate_limit": rate,
Contributor:

I think everything else in Perkeep (+ JSON in general?) uses fooBar case, not underscores?

Contributor (Author):

Solved this (and made the units obvious) by renaming the config item to qps.
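A qps-style setting maps naturally onto a token-bucket limiter. As a hedged illustration only (this uses golang.org/x/time/rate, which is not necessarily what the PR does; qps and the surrounding names are assumptions):

```go
package main

import (
	"context"
	"fmt"

	"golang.org/x/time/rate"
)

func main() {
	// qps stands in for the renamed config item discussed above.
	qps := 100.0
	limiter := rate.NewLimiter(rate.Limit(qps), 1)

	ctx := context.Background()
	for i := 0; i < 5; i++ {
		// Wait blocks just long enough to keep us under qps.
		_ = limiter.Wait(ctx)
		fmt.Println("request", i)
		// ... issue one GCS request here ...
	}
}
```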

@@ -87,6 +89,7 @@ func newFromConfig(_ blobserver.Loader, config jsonconfig.Obj) (blobserver.Stora
 	auth = config.RequiredObject("auth")
 	bucket = config.RequiredString("bucket")
 	cacheSize = config.OptionalInt64("cacheSize", 32<<20)
+	ratestr = config.OptionalString("rate_limit", "10ms") // no faster than 1 req per this duration
Contributor:

Hah, cacheSize above is without underscore and client_id and client_secret below use an underscore. Great. :)

Contributor:

Where does 10ms come from? Is that a Google-published number?

Contributor (Author):

Added a comment referencing the relevant GCS docs. Short answer: up to 1000 qps for writing and 5000 qps for reading is fine; above that, you're supposed to ramp up to it slowly. The default here is very conservative.

@@ -108,16 +111,22 @@ func newFromConfig(_ blobserver.Loader, config jsonconfig.Obj) (blobserver.Stora
if dirPrefix != "" && !strings.HasSuffix(dirPrefix, "/") {
dirPrefix += "/"
}

dur, err := time.ParseDuration(ratestr)
if err != nil {
Contributor:

You'll probably want to return a more descriptive error here, so the user knows it came from their rate-limit setting.
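For example, something along these lines (the exact wording is hypothetical; the point is to name the setting and echo the bad value):

```go
package main

import (
	"fmt"
	"time"
)

func parseRate(ratestr string) (time.Duration, error) {
	dur, err := time.ParseDuration(ratestr)
	if err != nil {
		// Wrap the parse error with the setting name and the
		// offending value so the user can find it in their config.
		return 0, fmt.Errorf("invalid rate_limit value %q: %v", ratestr, err)
	}
	return dur, nil
}

func main() {
	if _, err := parseRate("10x"); err != nil {
		fmt.Println(err) // invalid rate_limit value "10x": ...
	}
}
```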

@bobg (Contributor, Author) commented Dec 9, 2019

Update: this change does not in fact suffice to avoid rate-limit-exceeded errors. Digging into it a little further, I find that many (all?) write operations modify the same object: my PGP public-key blob! And it's the repeated modification of that one object that's happening too often for GCS's taste, even when the overall request rate is well within limits.

Offhand it looks like the GCS blobserver code will happily upload any blob anew even if it's already present. I'll see if I can whip up a fix for that (in a separate PR).
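The shape of that follow-up fix might look something like this sketch. Here `store`, `exists`, and `uploadIfAbsent` are hypothetical stand-ins, not Perkeep's actual blobserver API:

```go
package main

import (
	"context"
	"fmt"
)

// store is a toy stand-in for the GCS blobserver; in the real code,
// exists would be something like an object stat against the bucket.
type store struct{ have map[string]bool }

func (s *store) exists(ctx context.Context, ref string) (bool, error) {
	return s.have[ref], nil
}

func (s *store) upload(ctx context.Context, ref string, data []byte) error {
	s.have[ref] = true
	fmt.Println("uploaded", ref)
	return nil
}

// uploadIfAbsent skips the write, and therefore the per-object
// update limit the comment above describes, when the blob is
// already present.
func uploadIfAbsent(ctx context.Context, s *store, ref string, data []byte) error {
	ok, err := s.exists(ctx, ref)
	if err != nil {
		return err
	}
	if ok {
		return nil // already stored; don't rewrite the same object
	}
	return s.upload(ctx, ref, data)
}

func main() {
	s := &store{have: map[string]bool{}}
	ctx := context.Background()
	_ = uploadIfAbsent(ctx, s, "sha224-abc", []byte("blob"))
	_ = uploadIfAbsent(ctx, s, "sha224-abc", []byte("blob")) // no-op the second time
}
```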
