Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Small issues with prerun.py #24

Open
themowski opened this issue Sep 5, 2023 · 2 comments
Open

Small issues with prerun.py #24

themowski opened this issue Sep 5, 2023 · 2 comments

Comments

@themowski
Copy link

Overview

While testing out creation of a custom CKAN 2.10 image, I found some issues with ckan-2.10/base/setup/prerun.py. At a quick glance, these issues also affect the CKAN 2.9 version of the file.

Issue: TypeError can be raised in except subprocess.CalledProcessError blocks

In both init_db() and init_datastore_db(), the following logic occurs:

except subprocess.CalledProcessError as e:
    if "OperationalError" in e.output:
        # (...)

If this except block is encountered, then the script will fail with a TypeError because the type of e.output is bytes, not str.

Demonstration

Using the ckan/ckan-docker setup, I modified prerun.py in the following way, to make the db_command inside init_db() invalid:

def init_db():

    db_command = ["ckan", "-c", ckan_ini, "db", "init3335"]

When I started the Docker containers, the script failed with a TypeError:

[prerun] Initializing or upgrading db - start
Traceback (most recent call last):
  File "/srv/app/prerun.py", line 100, in init_db
    subprocess.check_output(db_command, stderr=subprocess.STDOUT)
  File "/usr/lib/python3.10/subprocess.py", line 421, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ckan', '-c', '/srv/app/ckan.ini', 'db', 'init3335']' returned non-zero exit status 2.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/srv/app/prerun.py", line 206, in <module>
    init_db()
  File "/srv/app/prerun.py", line 103, in init_db
    if "OperationalError" in e.output:
TypeError: a bytes-like object is required, not 'str'

Suggested Fix

There are multiple ways that this could be addressed. One option is to modify the invocations of subprocess.Popen and subprocess.check_output to explicitly open the output stream in text mode. Based on how e.output is used, I think that this is probably what is intended. Per the official documentation for subprocess:

If encoding or errors are specified, or text is true, file objects for stdin, stdout and stderr are opened in text mode using the specified encoding and errors or the io.TextIOWrapper default. The universal_newlines argument is equivalent to text and is provided for backwards compatibility. By default, file objects are opened in binary mode.

To demonstrate, I implemented this change by adding text=True inside the subprocess.check_output() call:

def init_db():

    db_command = ["ckan", "-c", ckan_ini, "db", "init3335"]
    print("[prerun] Initializing or upgrading db - start")
    try:
        subprocess.check_output(db_command, stderr=subprocess.STDOUT, text=True)

Here are the results -- the TypeError does not happen anymore.

[prerun] Initializing or upgrading db - start
2023-09-05 18:51:14,168 INFO  [ckan.cli] Using configuration file /srv/app/ckan.ini
2023-09-05 18:51:14,168 INFO  [ckan.config.environment] Loading static files from public
2023-09-05 18:51:14,389 INFO  [ckan.config.environment] Loading templates from /srv/app/src/ckan/ckan/templates
2023-09-05 18:51:14,740 WARNI [ckanext.reclineview.plugin] The Recline-based views are deprecated andwill be removed in future versions
2023-09-05 18:51:14,766 INFO  [ckan.config.environment] Loading templates from /srv/app/src/ckan/ckan/templates
2023-09-05 18:51:15,828 WARNI [ckan.config.middleware.flask_app] Extensions are excluded from CSRF protection! We allow extensions to run without CSRF protection but it will be forced future releases. Read the documentation for more information on how to add CSRF protection to your extension.
Usage: ckan db [OPTIONS] COMMAND [ARGS]...
Try 'ckan db --help' for help.

Error: No such command 'init3335'.

Traceback (most recent call last):
  File "/srv/app/prerun.py", line 206, in <module>
    init_db()
  File "/srv/app/prerun.py", line 110, in init_db
    raise e
  File "/srv/app/prerun.py", line 100, in init_db
    subprocess.check_output(db_command, stderr=subprocess.STDOUT, text=True)
  File "/usr/lib/python3.10/subprocess.py", line 421, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ckan', '-c', '/srv/app/ckan.ini', 'db', 'init3335']' returned non-zero exit status 2.

It looks like text was added in Python 3.7, which is the minimum supported version for CKAN, so adding that flag should be a reasonable thing to do.

Final Note

As a final note, the documentation on subprocess Exceptions and the subprocess.CalledProcessError docs both seem to suggest that subprocess.Popen() will not raise a CalledProcessError, so it is possible that the except subprocess.CalledProcessError block in init_database_db() is actually unnecessary (or needs to catch a different Exception).

Issue: Unnecessary import of re inside check_solr_connection()

There is an unnecessary import of re in an except block inside check_solr_connection().

There is also a lot of unnecessary trailing whitespace on the lines in that same except block.

@themowski themowski changed the title Small issues with prerun.py Small issues with prerun.py Sep 5, 2023
@themowski
Copy link
Author

themowski commented Sep 14, 2023

I found an additional issue just now.

Issue: Missing return in multiple functions

The check_main_db_connection() and check_datastore_db_connection() functions contain if blocks that are missing a return, based on the message that is printed out.

The fix is easy, just add return after the print() statement.

@themowski
Copy link
Author

themowski commented Nov 29, 2023

I found an additional issue just now.

Issue: Missing return in multiple functions

The check_main_db_connection() and check_datastore_db_connection() functions contain if blocks that are missing a return, based on the message that is printed out.

The fix is easy, just add return after the print() statement.

Looks like this specific issue was resolved in PR #34. The original issues still seem to be relevant.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant