CC service stopped consuming while reading from Kafka and giving kafkajs crashed error #185

Open
Rahulkala opened this issue Sep 4, 2020 · 1 comment · Fixed by #186
Labels: bug (Something isn't working), M-kafka (This issue is related to the kafka module), P2 (Priority 2)

Comments

@Rahulkala

We have a service consuming from Kafka that stopped consuming but did not fail.
All we saw in the logs was that KafkaJS crashed because the coordinator was loading. It crashed multiple times and restarted the consumer every time.
Even after the restarts, it was unable to connect to the partition leader and failed with the error below.

["This server is not the leader for that topic-partition"],"stack":["KafkaJSProtocolError: This server is not the leader for that topic-partition\\n at createErrorFromCode (/usr/src/node_modules/kafkajs/src/protocol/error.js:537:10)\\n at Object.parse (/usr/src/node_modules/kafkajs/src/protocol/requests/listOffsets/v2/response.js:43:11)\\n at Connection.send (/usr/src/node_modules/kafkajs/src/network/connection.js:311:35)\\n at runMicrotasks (\u003canonymous\u003e)\\n at processTicksAndRejections (internal/process/task_queues.js:93:5)\\n at async Broker.listOffsets (/usr/src/node_modules/kafkajs/src/broker/index.js:413:20)\\n at async /usr/src/node_modules/kafkajs/src/cluster/index.js:419:43\\n at async Promise.all (index 0)\\n at async Cluster.fetchTopicsOffset (/usr/src/node_modules/kafkajs/src/cluster/index.js:431:23)\\n at async /usr/src/node_modules/kafkajs/src/admin/index.js:193:22"]}}

"msg":["Kafkajs Crashed"],"err":[{"error":{"name":"KafkaJSProtocolError","retriable":true,"type":"GROUP_LOAD_IN_PROGRESS","code":14

When the Kafka consumer comes back, we see the following error:
"error":"This is not the correct coordinator for this group"

Checking the Kafka logs for the same period, we see that the consumer group was being disconnected from the Kafka coordinator.

Once the consumer was disconnected, the service kept retrying the connection and kept failing with the retriable "Kafkajs Crashed" error. As soon as we bounced the service, everything started working correctly.
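One way to avoid the endless crash/restart loop described above is to cap in-process restarts and exit once the cap is hit, so that the orchestrator performs the same bounce that fixed the problem manually. The following is a minimal sketch of such a policy, assuming a crash handler that can see the error's `type` and `retriable` fields (as in the logged KafkaJSProtocolError). `RestartBudget` and its parameters are hypothetical names, not part of kafkajs or of this service; in a kafkajs consumer it could be consulted from a `consumer.events.CRASH` listener, but that wiring is left out.

```typescript
// Hypothetical restart policy: allow at most `maxRestarts` crashes inside a
// sliding window of `windowMs` milliseconds before giving up.

// Fields read from a crash error (mirrors the logged KafkaJSProtocolError).
interface CrashError {
  type?: string;
  retriable?: boolean;
}

class RestartBudget {
  private readonly maxRestarts: number;
  private readonly windowMs: number;
  private crashTimes: number[] = [];

  constructor(maxRestarts: number, windowMs: number) {
    this.maxRestarts = maxRestarts;
    this.windowMs = windowMs;
  }

  // Returns true if the consumer may be restarted in-process after this crash,
  // false if the process should exit and let the orchestrator replace it.
  shouldRestart(err: CrashError, now: number = Date.now()): boolean {
    if (err.retriable === false) {
      return false; // a non-retriable error will not go away by restarting
    }
    // Drop crashes older than the window, then record this one.
    this.crashTimes = this.crashTimes.filter((t) => now - t < this.windowMs);
    this.crashTimes.push(now);
    return this.crashTimes.length <= this.maxRestarts;
  }
}

// Example: three restarts per minute are allowed, the fourth crash is not.
const budget = new RestartBudget(3, 60_000);
const loadError: CrashError = { type: "GROUP_LOAD_IN_PROGRESS", retriable: true };
console.log(budget.shouldRestart(loadError, 0)); // true
console.log(budget.shouldRestart(loadError, 1_000)); // true
console.log(budget.shouldRestart(loadError, 2_000)); // true
console.log(budget.shouldRestart(loadError, 3_000)); // false
```

The window matters: a single coordinator reload should still be retried in-process, while a retriable error that keeps recurring (as in this issue) eventually turns into a process restart.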

@sklose
Collaborator

sklose commented Sep 4, 2020

Initial investigation indicates that kafkajs' metadata (which records which broker leads each topic partition) may be stale, so every attempt to retrieve topic offsets fails. The fact that a restart fixed it points in that direction as well.
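If stale metadata is the cause, a narrower mitigation than a restart is to refresh metadata and retry once when the broker answers with the not-leader error. Below is a minimal sketch under two assumptions: a hypothetical `OffsetSource` interface (its method names are illustrative and not claimed to match kafkajs's public API), and an error object carrying `type: "NOT_LEADER_FOR_PARTITION"`, which is how KafkaJSProtocolError labels Kafka error code 6. `StaleMetadataSource` is a fake used only to demonstrate the behavior.

```typescript
type TopicOffsets = Array<{ partition: number; offset: string }>;

// Hypothetical interface over whatever client fetches topic offsets.
interface OffsetSource {
  fetchTopicOffsets(topic: string): Promise<TopicOffsets>;
  refreshMetadata(): Promise<void>;
}

// True if the error looks like Kafka's "not the leader" error (code 6).
function isNotLeaderError(err: unknown): boolean {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { type?: unknown }).type === "NOT_LEADER_FOR_PARTITION"
  );
}

// Fetch offsets; on a not-leader error, assume the cached metadata is stale,
// refresh it once and retry. Any other error, or a second failure, propagates.
async function fetchOffsetsWithRefresh(source: OffsetSource, topic: string): Promise<TopicOffsets> {
  try {
    return await source.fetchTopicOffsets(topic);
  } catch (err) {
    if (!isNotLeaderError(err)) {
      throw err;
    }
    await source.refreshMetadata();
    return await source.fetchTopicOffsets(topic);
  }
}

// Fake source for demonstration only: it fails with a not-leader error until
// its metadata has been refreshed, mimicking the stale-metadata theory.
class StaleMetadataSource implements OffsetSource {
  refreshCount = 0;

  async fetchTopicOffsets(_topic: string): Promise<TopicOffsets> {
    if (this.refreshCount === 0) {
      const err = new Error("This server is not the leader for that topic-partition");
      throw Object.assign(err, { type: "NOT_LEADER_FOR_PARTITION" });
    }
    return [{ partition: 0, offset: "42" }];
  }

  async refreshMetadata(): Promise<void> {
    this.refreshCount += 1;
  }
}

fetchOffsetsWithRefresh(new StaleMetadataSource(), "example-topic").then((offsets) => {
  console.log(offsets); // [ { partition: 0, offset: '42' } ]
});
```

Retrying only once keeps the sketch from masking a genuinely broken cluster: if a fresh metadata view still points at the wrong broker, the error surfaces to the caller.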

As a workaround (in case the root cause can't be found or fixed in kafkajs), we could terminate the Kafka input source if the code that fetches topic offsets for metrics keeps getting the "This server is not the leader for that topic-partition" error.
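The proposed workaround could look roughly like the watchdog below: it counts consecutive not-leader failures from the offset-fetching metrics code, resets on success, and calls a shutdown callback once a threshold is reached. This is a hypothetical sketch; `NotLeaderWatchdog`, the threshold, and the `onGiveUp` callback are illustrative names, and how the service actually terminates its Kafka input source is not specified in this issue.

```typescript
// Hypothetical watchdog for the workaround: give up on the Kafka input source
// once offset fetching has failed with the not-leader error N times in a row.
class NotLeaderWatchdog {
  private consecutiveFailures = 0;
  private gaveUp = false;
  private readonly threshold: number;
  private readonly onGiveUp: () => void;

  constructor(threshold: number, onGiveUp: () => void) {
    this.threshold = threshold;
    this.onGiveUp = onGiveUp;
  }

  // Call after every successful offset fetch: the streak is broken.
  recordSuccess(): void {
    this.consecutiveFailures = 0;
  }

  // Call after every failed offset fetch. Only the not-leader error counts;
  // other errors are ignored (neither counted nor resetting the streak).
  recordError(err: unknown): void {
    const type =
      typeof err === "object" && err !== null ? (err as { type?: unknown }).type : undefined;
    if (type !== "NOT_LEADER_FOR_PARTITION") {
      return;
    }
    this.consecutiveFailures += 1;
    if (this.consecutiveFailures >= this.threshold && !this.gaveUp) {
      this.gaveUp = true; // fire the callback only once
      this.onGiveUp();
    }
  }
}

// Example: stop after 3 consecutive not-leader errors.
const watchdog = new NotLeaderWatchdog(3, () => console.log("terminating Kafka input source"));
const notLeader = { type: "NOT_LEADER_FOR_PARTITION" };
watchdog.recordError(notLeader);
watchdog.recordSuccess(); // streak reset
watchdog.recordError(notLeader);
watchdog.recordError(notLeader);
watchdog.recordError(notLeader); // prints "terminating Kafka input source"
```

Whether unrelated errors should break the streak is a design choice; this sketch ignores them so that intermittent noise cannot hide a persistent not-leader condition. In a real service, `onGiveUp` might disconnect the consumer and let the process exit so it restarts with fresh metadata.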

@sklose added the bug and M-kafka labels on Sep 4, 2020
@sklose linked a pull request on Sep 8, 2020 that will close this issue
@plameniv added the P2 label on Jul 27, 2021

3 participants