Analyse direct executor usage in the BridgeImpl #2006

Open
ivansenic opened this issue Jul 26, 2022 · 4 comments

@ivansenic
Contributor

We were doing some initial performance testing with the Docs API V2, and one of the most disturbing things for us so far was that the metrics and the tracing reports showed the DescribeKeyspace method on the Bridge taking too much time from the client's perspective.

In the current state, we observed the following metrics, which show that the method has:

  • server side average of 2.87ms
  • client side average of 7.10ms
grpc_server_processing_duration_seconds_count{method="DescribeKeyspace",methodType="UNARY",service="stargate.StargateBridge",statusCode="OK",} 400036.0
grpc_server_processing_duration_seconds_sum{method="DescribeKeyspace",methodType="UNARY",service="stargate.StargateBridge",statusCode="OK",} 1151.208204639
grpc_client_processing_duration_seconds_count{method="DescribeKeyspace",methodType="UNARY",module="sgv2-docsapi",service="stargate.StargateBridge",statusCode="OK",} 400036.0
grpc_client_processing_duration_seconds_sum{method="DescribeKeyspace",methodType="UNARY",module="sgv2-docsapi",service="stargate.StargateBridge",statusCode="OK",} 2842.293467427

It was quite disturbing to see a method that should take less than a millisecond recorded at 7ms on the client side, with over 4ms of overhead on top of the server-side time on the same machine. We started exploring and, most likely by accident, ended up changing the .directExecutor() usage in the BridgeImpl.
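
The averages quoted above can be reproduced by dividing each `_sum` (in seconds) by its matching `_count`. A minimal sketch of that arithmetic follows; the class and method names are hypothetical, and only the numeric values come from the metrics in this issue:

```java
// Hypothetical helper (not part of Stargate): derives a mean latency in
// milliseconds from a *_seconds_sum / *_seconds_count metric pair.
final class MeanLatency {

    static double meanMillis(double sumSeconds, double count) {
        return sumSeconds / count * 1000.0;
    }

    public static void main(String[] args) {
        // Values copied from the direct-executor metrics above.
        double server = meanMillis(1151.208204639, 400036.0);
        double client = meanMillis(2842.293467427, 400036.0);

        System.out.printf("server mean: %.2f ms%n", server);
        System.out.printf("client mean: %.2f ms%n", client);
        // The gap between the two is the overhead seen by the client.
        System.out.printf("client-side overhead: %.2f ms%n", client - server);
    }
}
```

The same calculation applies to any run, as long as each sum is paired with the count from the same metric series.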

We changed this to a dedicated executor with 16 threads, .executor(Executors.newScheduledThreadPool(16)), and got much lower times:

  • server side average of 0.73ms
  • client side average of 1.28ms
grpc_server_processing_duration_seconds_count{method="DescribeKeyspace",methodType="UNARY",service="stargate.StargateBridge",statusCode="OK",} 700102.0
grpc_server_processing_duration_seconds_sum{method="DescribeKeyspace",methodType="UNARY",service="stargate.StargateBridge",statusCode="OK",} 512.994728568
grpc_client_processing_duration_seconds_count{method="DescribeKeyspace",methodType="UNARY",module="sgv2-docsapi",service="stargate.StargateBridge",statusCode="OK",} 500099.0
grpc_client_processing_duration_seconds_sum{method="DescribeKeyspace",methodType="UNARY",module="sgv2-docsapi",service="stargate.StargateBridge",statusCode="OK",} 643.713077082
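
For context on the change described above: in grpc-java, `directExecutor()` runs application code directly on the transport thread, while `.executor(...)` hands the work to the supplied pool. The sketch below illustrates only that threading difference, using plain `java.util.concurrent` and no gRPC. `Runnable::run` stands in for a direct executor, and all class and method names are hypothetical:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicReference;

// Illustration only: shows which thread runs a task under each executor type.
final class ExecutorThreads {

    // Returns the name of the thread on which the executor ran a task.
    static String threadUsedBy(Executor executor) {
        AtomicReference<String> name = new AtomicReference<>();
        CountDownLatch done = new CountDownLatch(1);
        executor.execute(() -> {
            name.set(Thread.currentThread().getName());
            done.countDown();
        });
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return name.get();
    }

    // A direct executor runs the task inline, on the submitting thread.
    static boolean directRunsOnCaller() {
        Executor direct = Runnable::run;
        return Thread.currentThread().getName().equals(threadUsedBy(direct));
    }

    // A pooled executor runs the task on one of its own worker threads.
    static boolean pooledRunsOnCaller() {
        ExecutorService pool = Executors.newScheduledThreadPool(16);
        try {
            return Thread.currentThread().getName().equals(threadUsedBy(pool));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("direct runs on caller: " + directRunsOnCaller());
        System.out.println("pooled runs on caller: " + pooledRunsOnCaller());
    }
}
```

With a direct executor, any work done inside the handler occupies the submitting thread until it finishes, which is one plausible reason the timings differ; this sketch does not measure that effect.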

It's really unclear to me why this happens and whether the use of the direct executor is justified, but we also observed improved response times for the read and write scenarios in V2:

| | Direct Executor | Executor | Diff |
|---|---|---|---|
| Read | Mean=25.16ms (99.000ptile=42ms) | Mean=21.55ms (99.000ptile=31ms) | 16% faster |
| Write | Mean=39.69ms (99.000ptile=68ms) | Mean=36.25ms (99.000ptile=55ms) | 9% faster |

We should analyze if the direct executor should be used here or not.

Note that all data comes from the docs write and read scenarios, run with 60 nb threads on a single machine.

@ivansenic
Contributor Author

@mpenick Could you have a look at this please?

@ivansenic
Contributor Author

Showcasing differences in tracing examples:

Executor:
image

Direct executor:
image

@ivansenic
Contributor Author

Reconfirmation needed after #2008 is merged.

@ivansenic
Contributor Author

The theory for why this is happening is that the DescribeKeyspace calls are not offloaded to another thread, as is done for the CQL querying. This is most likely the reason for the strange timing. @mpenick and I think it would be best to:

  • keep the directExecutor()
  • use a new or existing executor for the processing of the gRPC calls that are not going to execute CQL queries

This should be done for all gRPC calls that do not execute a CQL query.
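
The proposal above could be sketched roughly as follows. This is not the BridgeImpl code, which is not shown in this issue: the `Supplier` stands in for a non-CQL handler such as DescribeKeyspace, and the executor, its size, and all names are assumptions made for illustration:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Hypothetical sketch: offload work that does not execute a CQL query
// to a separate executor instead of running it on the calling thread.
final class NonCqlOffload {

    // Runs the given work on the supplied executor and returns a future for the result.
    static <T> CompletableFuture<T> handle(Supplier<T> work, Executor executor) {
        return CompletableFuture.supplyAsync(work, executor);
    }

    // Checks that the offloaded work ran on a different thread than the caller.
    static boolean ranOffCallerThread() {
        // Assumed pool size; the issue does not specify one for this executor.
        ExecutorService nonCqlExecutor = Executors.newFixedThreadPool(4);
        try {
            String caller = Thread.currentThread().getName();
            String worker =
                handle(() -> Thread.currentThread().getName(), nonCqlExecutor).join();
            return !caller.equals(worker);
        } finally {
            nonCqlExecutor.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("ran off caller thread: " + ranOffCallerThread());
    }
}
```

In this shape the server could keep `directExecutor()`, and only the handlers that do their work inline would pay the cost of a thread hand-off.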
