Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve B/R procedures #10792

Closed
etj opened this issue Mar 17, 2023 · 10 comments
Closed

Improve B/R procedures #10792

etj opened this issue Mar 17, 2023 · 10 comments

Comments

@etj
Copy link
Contributor

etj commented Mar 17, 2023

In the B/R code there are many points that may be improved, since they could lead to random issues.

  • Tables could be retrieved in a single step, closing the connection as soon as the needed data is retrieved:

    def dump_db(config, db_name, db_user, db_port, db_host, db_passwd, target_folder):
    """Dump Full DB into target folder"""
    db_host = db_host if db_host is not None else 'localhost'
    db_port = db_port if db_port is not None else 5432
    conn = get_db_conn(db_name, db_user, db_port, db_host, db_passwd)
    curs = conn.cursor()
    try:
    sql_dump = f"""SELECT tablename from pg_tables where tableowner = '{db_user}'"""
    curs.execute(sql_dump)
    pg_all_tables = [table[0] for table in curs.fetchall()]
    pg_tables = []
    if config.gs_data_datasetname_filter:
    for pat in config.gs_data_datasetname_filter:
    pg_tables += glob_filter(pg_all_tables, pat)
    elif config.gs_data_datasetname_exclude_filter:
    pg_tables = pg_all_tables
    for pat in config.gs_data_datasetname_exclude_filter:
    names = ','.join(glob_filter(pg_all_tables, pat))
    for exclude_table in names.split(','):
    pg_tables.remove(exclude_table)
    else:
    pg_tables = pg_all_tables
    print(f"Dumping existing GeoServer Vectorial Data: {pg_tables}")
    empty_folder(target_folder)
    for table in pg_tables:
    print(f"Dump Table: {db_name}:{table}")
    os.system(f"PGPASSWORD=\"{db_passwd}\" {config.pg_dump_cmd} -h {db_host} -p {str(db_port)} -U {db_user} "
    f"-F c -b -t '\"{str(table)}\"' -f {os.path.join(target_folder, f'{table}.dump {db_name}')}")
    conn.commit()
    except Exception:
    try:
    conn.rollback()
    except Exception:
    pass
    traceback.print_exc()
    finally:
    curs.close()
    conn.close()

  • The tables are dumped in binary format, so it is dependent on the PG version, and furthermore files can not be easily inspected:

    os.system(f"PGPASSWORD=\"{db_passwd}\" {config.pg_dump_cmd} -h {db_host} -p {str(db_port)} -U {db_user} "
    f"-F c -b -t '\"{str(table)}\"' -f {os.path.join(target_folder, f'{table}.dump {db_name}')}")

  • Transaction controls are badly implemented: if an operation fails, all the subsequand ones are automatically skipped by the dbms, so the commit/rollaback should be placed inside the loop:

    conn = get_db_conn(db_name, db_user, db_port, db_host, db_passwd)
    # curs = conn.cursor()
    try:
    included_extenstions = ['dump', 'sql']
    file_names = [fn for fn in os.listdir(source_folder)
    if any(fn.endswith(ext) for ext in included_extenstions)]
    for table in file_names:
    print(f"Restoring GeoServer Vectorial Data : {db_name}:{os.path.splitext(table)[0]} ")
    pg_rstcmd = (f"PGPASSWORD=\"{db_passwd}\" {config.pg_restore_cmd} -h {db_host} -p {str(db_port)} -U {db_user} "
    f"--role={db_user} -F c -t \"{os.path.splitext(table)[0]}\" {os.path.join(source_folder, table)} -d {db_name}"
    " -c" if not preserve_tables else "")
    os.system(pg_rstcmd)
    conn.commit()
    except Exception:
    try:
    conn.rollback()
    except Exception:
    pass
    traceback.print_exc()
    finally:
    conn.close()
    def remove_existing_tables(db_name, db_user, db_port, db_host, db_passwd):
    conn = get_db_conn(db_name, db_user, db_port, db_host, db_passwd)
    curs = conn.cursor()
    table_list = f"""SELECT tablename from pg_tables where tableowner = '{db_user}'"""
    try:
    curs.execute(table_list)
    pg_all_tables = [table[0] for table in curs.fetchall()]
    print(f"Dropping existing GeoServer Vectorial Data: {table_list}")
    for pg_table in pg_all_tables:
    print(f"Drop Table: {db_name}:{pg_table} ")
    try:
    curs.execute(f"DROP TABLE \"{pg_table}\" CASCADE")
    except Exception as e:
    print(f"Error Droping Table: {e}")
    conn.commit()
    except Exception as e:
    print(f"Error Removing GeoServer Vectorial Data Tables: {e}")
    try:
    conn.rollback()

  • When dealing with tables, if tables are in different schemas the procedure may break

  • Password are visibile in the process list since they are part of the command line:

    os.system(f"PGPASSWORD=\"{db_passwd}\" {config.pg_dump_cmd} -h {db_host} -p {str(db_port)} -U {db_user} "

  • Logging

    • is quite confused
    • uses print instead of logging
    • doesn't always say what is going to do, so for long tasks it seems it's freezed
  • GWC layers are not recreated

Specifications

  • GeoNode version: linked code is from 4.0.3
@gannebamm
Copy link
Contributor

@Cornacap we just talked about the wonkiness of B/R, maybe this will solve your issue?

@etj

This comment was marked as off-topic.

@Cornacap

This comment was marked as off-topic.

@etj
Copy link
Contributor Author

etj commented Mar 21, 2023

@Cornacap the restore procedure performs these steps:

  • create a backup of the target system
  • removes all the data from the target system
  • restore the content of the backup zip file into the target sytem

The backup of the target system is a safety step: if the restore procedure fails, it tries to restore the last known data in the system; if this step should fail, you still have the old data stored somewhere.

Anyway, the restore procedure should give you a detailed log of what's happening, in order to make it possible a post-mortem investigation.

@Cornacap

This comment was marked as off-topic.

@Cornacap

This comment was marked as off-topic.

@etj

This comment was marked as off-topic.

@Cornacap

This comment was marked as off-topic.

@etj

This comment was marked as off-topic.

@Cornacap

This comment was marked as off-topic.

etj added a commit to etj/geonode that referenced this issue Mar 23, 2023
etj added a commit to etj/geonode that referenced this issue Mar 23, 2023
etj added a commit to etj/geonode that referenced this issue Mar 24, 2023
afabiani added a commit that referenced this issue May 18, 2023
Co-authored-by: etj <etj@geo-solutions.it>
Co-authored-by: Alessio Fabiani <alessio.fabiani@geosolutionsgroup.com>
ridoo pushed a commit to Thuenen-GeoNode-Development/geonode that referenced this issue Jun 2, 2023
* Improve B/R procedures

* fix formatting of geoserver error message

* Fix black formatting

* Fix PEP wanrings

---------

Co-authored-by: Alessio Fabiani <alessio.fabiani@geosolutionsgroup.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants