Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add healthcheck to MariaDB container to ensure it is ready to accept external connections before reporting as ready #758

Merged
merged 2 commits into from
Dec 22, 2023

Conversation

wotnak
Copy link
Collaborator

@wotnak wotnak commented Dec 15, 2023

Currently, tests running in GitHub Actions sometimes randomly fail when connection to MariaDB server fails. It usually fails only in first tests, with later ones running without problems. A recent example can be seen in #751 (comment).

The problem is with MariaDB container being reported as ready before the actual MariaDB server is ready to accept external connections.

It can be fixed by adding a healthcheck to the MariaDB container that checks if MariaDB server accepts external connections before reporting the container as ready. To do that, MariaDB container image provides a helper script that can be used in a healthcheck.

This pr adds a healthcheck as described at https://mariadb.org/mariadb-server-docker-official-images-healthcheck-without-mysqladmin/ to the MariaDB container used in tests, which should prevent random failures caused by connectivity issues between MariaDB server and farmOS when MariaDB server is starting.

@mstenta
Copy link
Member

mstenta commented Dec 15, 2023

Wow @wotnak this has been an issue for so long! Thank you for contributing this fix!!

The tricky part has always been testing, because the failures are not consistent. I'll requeue the tests on this PR a few times to test it out...

@mstenta
Copy link
Member

mstenta commented Dec 15, 2023

Hmm MariaDB tests failed on attempt #3, but I don't think it was related to the issue this PR is fixing: https://github.com/farmOS/farmOS/actions/runs/7221675735/job/19681595880?pr=758

There was 1 error:

1) Drupal\Tests\farm_location\Functional\LocationTest::testLocationFieldVisibility
WebDriver\Exception\UnknownError: unknown error: session deleted because of page crash
from unknown error: cannot determine loading status
from tab crashed
  (Session info: headless chrome=98.0.4758.102)
  (Driver info: chromedriver=98.0.4758.102 (273bf7ac8c909cde36982d27f66f3c70846a3718-refs/branch-heads/4758@{#1151}),platform=Linux 6.2.0-1018-azure x86_64)

/opt/drupal/vendor/instaclick/php-webdriver/lib/WebDriver/Exception.php:198
/opt/drupal/vendor/instaclick/php-webdriver/lib/WebDriver/AbstractWebDriver.php:168
/opt/drupal/vendor/instaclick/php-webdriver/lib/WebDriver/Container.php:121
/opt/drupal/vendor/behat/mink-selenium2-driver/src/Selenium2Driver.php:508
/opt/drupal/vendor/behat/mink/src/Driver/CoreDriver.php:296
/opt/drupal/vendor/behat/mink/src/Element/ElementFinder.php:62
/opt/drupal/vendor/behat/mink/src/Element/Element.php:142
/opt/drupal/web/core/tests/Drupal/Tests/UiHelperTrait.php:469
/opt/drupal/web/core/tests/Drupal/Tests/UiHelperTrait.php:445
/opt/drupal/web/core/tests/Drupal/Tests/UiHelperTrait.php:244
/opt/drupal/web/profiles/farm/modules/core/location/tests/src/Functional/LocationTest.php:84
/opt/drupal/vendor/phpunit/phpunit/src/Framework/TestResult.php:728

ERRORS!
Tests: 111, Assertions: 3043, Errors: 1.

I will continue running tests a few more times.

@mstenta
Copy link
Member

mstenta commented Dec 15, 2023

Hmm - just failed again (on attempt #8) which seems to be the issue this PR is trying to prevent:

There was 1 error:

1) Drupal\Tests\farm_equipment\Functional\EquipmentFieldTest::testEquipmentField
Drupal\Core\Installer\Exception\InstallerException: Resolve all issues below to continue the installation. For help configuring your database server, see the <a href="https://www.drupal.org/docs/installing-drupal">installation handbook</a>, or contact your hosting provider.<ul><li>Failed to connect to your database server. The server reports the following message: <em class="placeholder">SQLSTATE[HY000] [2002] Connection refused [Tip: This message normally means that there is no MySQL server running on the system or that you are using an incorrect host name or port number when trying to connect to the server. You should also check that the TCP/IP port you are using has not been blocked by a firewall or port blocking service.] </em>.<ul><li>Is the database server running?</li><li>Does the database exist or does the database user have sufficient privileges to create the database?</li><li>Have you entered the correct database name?</li><li>Have you entered the correct username and password?</li><li>Have you entered the correct database hostname and port number?</li></ul></li></ul>

https://github.com/farmOS/farmOS/actions/runs/7221675735/job/19684915488

@wotnak
Copy link
Collaborator Author

wotnak commented Dec 15, 2023

Yes, looks like adding healthcheck doesn't fix the issue. Will later try to further investigate the cause of it.

@mstenta
Copy link
Member

mstenta commented Dec 15, 2023

Thanks @wotnak! It was a good thing to try!

@wotnak
Copy link
Collaborator Author

wotnak commented Dec 21, 2023

Done some more testing, and it looks like the health check works as it should, but by default dependency between docker compose services waits only on the dependent service start not for when it is declared as healthy.

Pushed an additional change that configures www service to wait for db service to be declared as healthy (for the healthcheck to pass, https://docs.docker.com/compose/startup-order/#control-startup) before starting www service and executing tests that require connection to the database.

From my local testing, this consistently fixes the issue with tests failing because of MariaDB database connection failure.

@mstenta
Copy link
Member

mstenta commented Dec 21, 2023

Ah ha! That makes perfect sense. Thanks @wotnak! I'll re-run the PR tests a bunch of times and see if this does the trick. 👍

@mstenta
Copy link
Member

mstenta commented Dec 21, 2023

I ran tests 10 times and all of them passed!

Requesting @paul121 and @symbioquine's reviews, but this looks good to me! Thanks @wotnak!!!

Copy link
Member

@paul121 paul121 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing!

Copy link
Collaborator

@symbioquine symbioquine left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

Worth noting that this has two caveats;

  1. It means the farmOS container can't start up in parallel with the DB starting up so it would be expected to increase the test run time marginally (As opposed to a strategy where just the tests themselves wait to start until both the DB and farmOS containers are ready. I think the strategy in this PR is more maintainable though.)
  2. The depends_on.condition functionality is part of docker-compose V2 so it might not work for some folks on older versions of Docker/Docker-compose

I don't think either of those is a show-stopper, just worth calling out.

@mstenta
Copy link
Member

mstenta commented Dec 22, 2023

Good points @symbioquine - thanks for the review!

Improving the performance and overall run time of tests would be a great thing to focus on next, especially given the fact that we ended up dropping paratest in 3.x recently as well. We've had to take some steps back in order to move forward, but I would love to revisit these things when we can.

In the meantime, this fix is welcome! Thanks again @wotnak!

@mstenta mstenta merged commit a2bd713 into farmOS:3.x Dec 22, 2023
1 of 2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants