100+ Consecutive fails on some nodes #29

Open
sadgb opened this issue Mar 14, 2019 · 6 comments

Comments

@sadgb

sadgb commented Mar 14, 2019

After investigating those nodes I was unable to find any clues.
CPU usage is low
Memory usage is less than 50%
Same configuration as the others; make restart didn't help
Server restart didn't help

The only thing I was able to find in the logs is:

chainpoint-node | 2019-01-20T12:39:21.196654386Z WARN : Calendar : Could not retrieve block range 25735 (blocks 2573500 to 2573599)
......
chainpoint-node | 2019-03-07T12:04:20.243424875Z WARN : Calendar : Could not retrieve block range 28231 (blocks 2823100 to 2823199)
chainpoint-node | 2019-03-13T06:44:41.190138103Z WARN : Calendar : Could not retrieve block range 28511 (blocks 2851100 to 2851199)

The problem starts in batches. For example, I have some nodes with 320-350 consecutive failures and a batch with 738-800 consecutive failures.

Please tell me how to find more info or how to fix this.
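
For anyone triaging similar output, here is a minimal Python sketch for pulling the failed block ranges out of these warnings. The regular expression is inferred only from the three sample lines quoted above, and the 100-blocks-per-range relationship is read off those same lines; neither is taken from Chainpoint documentation. The names `WARN_RE`, `parse_failed_ranges`, and `is_consistent` are hypothetical.

```python
import re

# Layout inferred from the three WARN lines quoted above (an assumption,
# not a documented log format).
WARN_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2}T[\d:.]+Z) WARN : Calendar : "
    r"Could not retrieve block range (?P<rng>\d+) "
    r"\(blocks (?P<first>\d+) to (?P<last>\d+)\)"
)


def parse_failed_ranges(log_text):
    """Return (timestamp, range_number, first_block, last_block) tuples."""
    return [
        (m["ts"], int(m["rng"]), int(m["first"]), int(m["last"]))
        for m in WARN_RE.finditer(log_text)
    ]


def is_consistent(rng, first, last, size=100):
    """Check the pattern seen in the samples: range N spans N*100 .. N*100+99."""
    return first == rng * size and last == first + size - 1


SAMPLE = (
    "chainpoint-node | 2019-01-20T12:39:21.196654386Z WARN : Calendar : "
    "Could not retrieve block range 25735 (blocks 2573500 to 2573599)\n"
    "chainpoint-node | 2019-03-07T12:04:20.243424875Z WARN : Calendar : "
    "Could not retrieve block range 28231 (blocks 2823100 to 2823199)\n"
    "chainpoint-node | 2019-03-13T06:44:41.190138103Z WARN : Calendar : "
    "Could not retrieve block range 28511 (blocks 2851100 to 2851199)\n"
)

if __name__ == "__main__":
    for ts, rng, first, last in parse_failed_ranges(SAMPLE):
        print(ts, rng, first, last, is_consistent(rng, first, last))
```

Running the same parser over a full log dump from an affected node would give a list of failed ranges with timestamps, which may be easier to compare across nodes than raw log text.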

@jacohend
Contributor

Hi @sadgb, thanks for reaching out. Could you send us your node IP and node version? This will help us debug the issue. You can send them to jacob@tierion.com if you don't want your information public on GitHub.

@sadgb
Author

sadgb commented Mar 15, 2019

All of them are 1.5.4.
I've sent you an email with the details.

@michael-iglesias
Contributor

Hello @sadgb,

Can you please provide us with a more complete account of what is being logged on the Nodes suffering from consecutive failures? We've noted in the log output pasted above that there is a considerable block range gap between the first and last reported failures: block range 28231 & block range 28511, respectively.

A more verbose snapshot of the log output, along with the steps that you've taken to try to remedy the issue, will give us a bit more insight into what is going on.

Feel free to post requested info here in this thread or email either jacob@tierion.com or miglesias@tierion.com.

Thanks,
Michael I.
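
The gap mentioned above can be quantified with a short Python sketch. It assumes each range covers 100 blocks, as the logged pairs suggest (for example, range 28231 is reported as blocks 2823100 to 2823199). The function name `range_gaps` is hypothetical. Because the pasted log contains an elision ("......"), the gap between 25735 and 28231 is only an upper bound on what the full log would show.

```python
def range_gaps(ranges):
    """For each pair of neighbouring failed ranges, return
    (earlier, later, number_of_ranges_strictly_between_them)."""
    ordered = sorted(ranges)
    return [(a, b, b - a - 1) for a, b in zip(ordered, ordered[1:])]


if __name__ == "__main__":
    BLOCKS_PER_RANGE = 100  # assumption read off the log lines, not documented
    for earlier, later, between in range_gaps([25735, 28231, 28511]):
        print(
            f"{earlier} -> {later}: {between} ranges "
            f"({between * BLOCKS_PER_RANGE} blocks) with no warning in the excerpt"
        )
```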

@sadgb
Author

sadgb commented Mar 19, 2019 via email

@sadgb
Author

sadgb commented Mar 19, 2019

Also, I would like to mention that the number of problem nodes has increased a bit.

@michael-iglesias
Contributor

@sadgb I think you may have forgotten to attach the file.
