Implement new catalog indexer-worker #4147

Open
Tracked by #3925
stacimc opened this issue Apr 17, 2024 · 0 comments · May be fixed by #4330
Labels
💻 aspect: code Concerns the software code in the repository ✨ goal: improvement Improvement to an existing user-facing feature 🟨 priority: medium Not blocking but should be addressed soon 🧱 stack: catalog Related to the catalog and Airflow DAGs


stacimc commented Apr 17, 2024

Problem

This issue tracks the creation of a new catalog-indexer-worker Docker image. It does not include adding any orchestration steps to the DAG, or any of the infrastructure work to actually create the auto-scaling groups (ASGs).

Description

First we will create a new indexer-worker directory under catalog/dags/data_refresh, which will contain the contents of the new indexer worker. This implementation already exists in the ingestion server. The relevant pieces can be pulled out and refactored slightly to fit the new, much smaller image. Broadly, this is the mapping of existing files to new files needed:

  • api.py will define the API for the worker, refactored from the existing indexer_worker.py. It must be refactored to add task state and a task_status endpoint, which takes a task_id and returns the status and progress of the given task.
  • indexer.py will contain the logic for the actual indexing task. It will be refactored from the existing indexer.py; specifically, all we need is the replicate function.
  • elasticsearch_models.py, pulled from the file of the same name in the ingestion server, defines the mapping from a database record to an Elasticsearch document.
  • Utility files with helper functions for connecting to Elasticsearch and Postgres (e.g. es_helpers.py).
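The task-state requirement for api.py could look something like the following sketch. This is illustrative only: the TaskInfo dataclass, the dict-based registry, and the function names are assumptions, not code from the ingestion server, and the real endpoint would be wired into whatever web framework the worker uses.

```python
import threading
import uuid
from dataclasses import dataclass, field


@dataclass
class TaskInfo:
    """In-memory record of a single indexing task's state (hypothetical shape)."""
    active: bool = True
    progress: float = 0.0  # percentage of records indexed so far
    lock: threading.Lock = field(default_factory=threading.Lock)


# Shared registry of tasks; the task_status endpoint reads from this.
TASKS: dict[str, TaskInfo] = {}


def start_task() -> str:
    """Register a new task and return its generated task_id."""
    task_id = uuid.uuid4().hex
    TASKS[task_id] = TaskInfo()
    return task_id


def update_progress(task_id: str, progress: float) -> None:
    """Called by the indexing loop as batches of records complete."""
    info = TASKS[task_id]
    with info.lock:
        info.progress = progress
        if progress >= 100.0:
            info.active = False


def task_status(task_id: str) -> dict:
    """Handler body for the task_status endpoint: report state for task_id."""
    info = TASKS[task_id]
    with info.lock:
        return {"task_id": task_id, "active": info.active, "progress": info.progress}
```

The lock-per-task detail matters because the indexing work runs off the request thread, so status reads and progress writes can race.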

The Dockerfile can be copied from the existing ingestion server. It should be updated to reference the new file structure, and to expose only a single port, which should be distinguished from the ports currently in use by the ingestion server (8001 and 8002). Other necessary files, including env.docker, .dockerignore, Pipfile, and gunicorn.conf.py can all be copied in from the existing ingestion server as well.

Finally we will update the monorepo’s root docker-compose.yml to add a new catalog-indexer-worker service. Its build context should point to the nested data_refresh/indexer_worker directory, and it should map the exposed port to enable the API to be reached by the catalog.
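Once the service and its port mapping exist, the catalog side only needs to make plain HTTP requests against the mapped port. A minimal polling helper might look like the sketch below; the /task_status/&lt;task_id&gt; route and the JSON shape are assumptions based on the description above, and the stub handler stands in for the real worker so the snippet is self-contained.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen


class StubStatusHandler(BaseHTTPRequestHandler):
    """Stands in for the worker's task_status endpoint (assumed response shape)."""

    def do_GET(self):
        # Echo the task_id from the path and report a finished task.
        task_id = self.path.rsplit("/", 1)[-1]
        body = json.dumps(
            {"task_id": task_id, "active": False, "progress": 100.0}
        ).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        # Keep the demo output quiet.
        pass


def poll_task_status(base_url: str, task_id: str) -> dict:
    """Fetch the status of a task from the worker's (assumed) API route."""
    with urlopen(f"{base_url}/task_status/{task_id}") as resp:
        return json.load(resp)


if __name__ == "__main__":
    server = HTTPServer(("127.0.0.1", 0), StubStatusHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    print(poll_task_status(f"http://127.0.0.1:{server.server_port}", "abc123"))
    server.shutdown()
```

In the real setup the base URL would point at the port mapped in docker-compose.yml rather than a local stub.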

When this work is complete, it should be possible to run just catalog/shell and curl the new indexer worker. The existing ingestion-server and indexer-worker services are unaffected (it is still possible to run legacy data refreshes locally and in production).

Additional context

See this section of the IP.

@stacimc stacimc added 🟨 priority: medium Not blocking but should be addressed soon ✨ goal: improvement Improvement to an existing user-facing feature 💻 aspect: code Concerns the software code in the repository 🧱 stack: catalog Related to the catalog and Airflow DAGs labels Apr 17, 2024
@stacimc stacimc self-assigned this May 7, 2024
@stacimc stacimc linked a pull request May 14, 2024 that will close this issue
Projects
Status: 🏗 In Progress