Vertica urns lack database name #10387

Open
heyromnivan opened this issue Apr 26, 2024 · 2 comments


heyromnivan commented Apr 26, 2024

With Vertica, URNs don't contain the database name.

In my case I'm trying to build joint lineage between Vertica and dbt, and the two don't connect. If I understand correctly, it's because tables described by dbt have URNs of the form urn:li:dataPlatform:vertica,dbname.schema.table, while tables ingested from Vertica have URNs of the form urn:li:dataPlatform:vertica,schema.table.
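
To make the mismatch concrete, here is a minimal sketch that builds both URN strings by hand. It assumes the standard DataHub dataset URN layout (platform, name, environment); the helper `dataset_urn` and the names `mydb`, `public`, and `orders` are hypothetical and only serve the illustration.

```python
def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Build a DataHub-style dataset URN string (hypothetical helper)."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


# Hypothetical database, schema, and table names.
dbt_side = dataset_urn("vertica", "mydb.public.orders")   # dbt: dbname.schema.table
vertica_side = dataset_urn("vertica", "public.orders")    # Vertica source: schema.table

print(dbt_side)
print(vertica_side)
print("same entity:", dbt_side == vertica_side)
```

Because the two strings differ, DataHub would treat them as two separate datasets, so lineage emitted by dbt never attaches to the tables ingested from Vertica.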

Originally posted by @heyromnivan in #5483 (comment)


This issue is stale because it has been open for 30 days with no activity. If you believe this is still an issue on the latest DataHub release please leave a comment with the version that you tested it with. If this is a question/discussion please head to https://slack.datahubproject.io. For feature requests please use https://feature-requests.datahubproject.io

github-actions bot added the stale label May 27, 2024
heyromnivan (Author) commented

Tested with version 0.13.1.

This does look like an issue to me, since it makes Vertica basically incompatible with any other metadata source. Even though Vertica itself doesn't allow multiple databases, it still has a database concept, and external tools (dbt, BI tools) are all designed to take the database name into account when constructing URNs.

The only workaround I found is to create a custom source that extends VerticaSource and overrides the get_identifier method.

from datahub.ingestion.api.decorators import config_class, platform_name
from datahub.ingestion.source.sql.vertica import VerticaConfig, VerticaSource
from vertica_sqlalchemy_dialect.base import VerticaInspector


# Copy all the decorators from the latest version of VerticaSource here.
@platform_name("Vertica")
@config_class(VerticaConfig)
class MyVerticaSource(VerticaSource):
    def get_identifier(
        self, *, schema: str, entity: str, inspector: VerticaInspector, **kwargs
    ) -> str:
        # Prefix the database name so identifiers become dbname.schema.table.
        db_name = self.get_db_name(inspector)
        return f"{db_name}.{schema}.{entity}"

This can only be used with CLI ingestion, which cannot be scheduled or run through the DataHub UI, so it has to be automated with some external tool.
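
For the external automation, one option is to run the ingestion from a Python script that a scheduler (cron, Airflow, etc.) invokes. The following is a rough sketch, not a tested setup: the module path `my_sources.vertica.MyVerticaSource`, the host, the database name, and the server URL are all hypothetical, and it assumes the custom source class can be referenced from a recipe by its fully qualified class name. Credentials are left out and would have to be added to the source config.

```python
def build_recipe(source_type: str, host_port: str, database: str, server: str) -> dict:
    """Assemble a recipe dict equivalent to a YAML recipe file."""
    return {
        "source": {
            "type": source_type,
            "config": {"host_port": host_port, "database": database},
        },
        "sink": {"type": "datahub-rest", "config": {"server": server}},
    }


def run_ingestion(recipe: dict) -> None:
    """Run the recipe programmatically; requires the datahub package."""
    # Imported lazily so the recipe can be built and inspected without datahub installed.
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(recipe)
    pipeline.run()
    pipeline.raise_from_status()  # fail loudly so the scheduler sees the error


# All values below are hypothetical placeholders.
recipe = build_recipe(
    source_type="my_sources.vertica.MyVerticaSource",
    host_port="vertica.example.com:5433",
    database="mydb",
    server="http://localhost:8080",
)
```

A scheduled job would then call `run_ingestion(recipe)`, with the module containing MyVerticaSource importable from the same environment.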

github-actions bot removed the stale label May 28, 2024