Introduce Amazon Comprehend Service #39592

@gopidesupavan gopidesupavan commented May 13, 2024

Added the Amazon Comprehend Start PII Entities Detection Job operator, along with its docs, hook, sensor, trigger, waiter, unit tests, and system test.

At present only the PII Entities Detection Job is supported. The remaining Comprehend services will follow.
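
The sensor, trigger, and waiter listed above all come down to polling the job status until it reaches a terminal state. The block below is a minimal sketch of that idea, not the provider's implementation: `wait_for_pii_job`, its parameters, and the choice to treat `STOP_REQUESTED` as a failure are assumptions made for illustration. The status strings themselves are the ones Comprehend reports for asynchronous jobs.

```python
import time
from typing import Callable

# Job states as reported by Comprehend for asynchronous jobs.
SUCCESS_STATES = {"COMPLETED"}
# Treating STOP_REQUESTED as a failure is a choice made for this sketch.
FAILURE_STATES = {"FAILED", "STOP_REQUESTED", "STOPPED"}


def wait_for_pii_job(
    get_status: Callable[[], str],
    delay: float = 60.0,
    max_attempts: int = 20,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until the job succeeds, fails, or attempts run out.

    Hypothetical helper: get_status is any zero-argument callable that
    returns the current job status string.
    """
    for attempt in range(max_attempts):
        status = get_status()
        if status in SUCCESS_STATES:
            return status
        if status in FAILURE_STATES:
            raise RuntimeError(f"PII entities detection job ended in state {status}")
        # Still SUBMITTED or IN_PROGRESS: wait before the next poll,
        # except after the final attempt.
        if attempt < max_attempts - 1:
            sleep(delay)
    raise TimeoutError(f"Job did not finish after {max_attempts} attempts")
```

With boto3, `get_status` could wrap `client.describe_pii_entities_detection_job(JobId=job_id)["PiiEntitiesDetectionJobProperties"]["JobStatus"]`, where `client` is a Comprehend client and `job_id` comes from the start call.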

https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/comprehend/client/start_pii_entities_detection_job.html

Sample DAG:

from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.operators.comprehend import ComprehendStartPiiEntitiesDetectionJobOperator

with DAG(
    dag_id="comprehend_testing",
    schedule=None,  # `schedule` replaces the deprecated `schedule_interval` (Airflow 2.4+)
    start_date=datetime(2021, 1, 1),
    tags=["comprehend pii entities detection"],
    catchup=False,
) as dag:
    pii_entities_detection_job = ComprehendStartPiiEntitiesDetectionJobOperator(
        task_id="pii_entities_detection_job",
        input_data_config={
            "S3Uri": "s3://aws-comprehend-testing-hpl7cy/sample_data.txt",
            "InputFormat": "ONE_DOC_PER_LINE",
        },
        output_data_config={"S3Uri": "s3://aws-comprehend-testing-hpl7cy/redacted_output/"},
        mode="ONLY_REDACTION",
        language_code="en",
        # Replace {ACCOUNT_ID} with your AWS account ID.
        data_access_role_arn="arn:aws:iam::{ACCOUNT_ID}:role/ComprehendRole",
        start_pii_entities_kwargs={
            "RedactionConfig": {
                "PiiEntityTypes": ["NAME", "ADDRESS"],
                "MaskMode": "REPLACE_WITH_PII_ENTITY_TYPE",
            }
        },
    )
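
For orientation, the operator arguments in the sample DAG line up with the parameters of the boto3 `start_pii_entities_detection_job` call linked above. The block below is a rough sketch of that mapping, assuming a hypothetical helper named `build_pii_job_request` (it is not part of the provider, and the hook may assemble the request differently).

```python
from __future__ import annotations

from typing import Any


def build_pii_job_request(
    input_data_config: dict[str, Any],
    output_data_config: dict[str, Any],
    mode: str,
    language_code: str,
    data_access_role_arn: str,
    start_pii_entities_kwargs: dict[str, Any] | None = None,
) -> dict[str, Any]:
    """Hypothetical helper: map operator-style arguments to boto3 request keys."""
    request: dict[str, Any] = {
        "InputDataConfig": input_data_config,
        "OutputDataConfig": output_data_config,
        "Mode": mode,
        "LanguageCode": language_code,
        "DataAccessRoleArn": data_access_role_arn,
    }
    # Extra boto3 parameters such as RedactionConfig or JobName pass through as-is.
    request.update(start_pii_entities_kwargs or {})
    return request


request = build_pii_job_request(
    input_data_config={
        "S3Uri": "s3://aws-comprehend-testing-hpl7cy/sample_data.txt",
        "InputFormat": "ONE_DOC_PER_LINE",
    },
    output_data_config={"S3Uri": "s3://aws-comprehend-testing-hpl7cy/redacted_output/"},
    mode="ONLY_REDACTION",
    language_code="en",
    data_access_role_arn="arn:aws:iam::{ACCOUNT_ID}:role/ComprehendRole",
    start_pii_entities_kwargs={
        "RedactionConfig": {
            "PiiEntityTypes": ["NAME", "ADDRESS"],
            "MaskMode": "REPLACE_WITH_PII_ENTITY_TYPE",
        }
    },
)
```

The resulting dictionary could then be passed to a Comprehend client as `client.start_pii_entities_detection_job(**request)`, which needs valid AWS credentials and a real account ID in the role ARN.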


@vincbeck vincbeck left a comment

Really good job! Thanks for following the good practises and everything. I just added one question but overall it is really good!

Review comment on tests/system/providers/amazon/aws/example_comprehend.py (outdated, resolved)

@o-nikolas o-nikolas left a comment

This is an awesome PR! Super thorough and ticks all the boxes. We'll use this as an example for future folks, great work! 😃

@gopidesupavan

> This is an awesome PR! Super thorough and ticks all the boxes. We'll use this as an example for future folks, great work! 😃

Thank you so much for reviewing this 😄. I have applied all your feedback.
The quick start guides are really helpful and well documented.

@vincbeck vincbeck left a comment

Awesome PR!

@gopidesupavan

> Awesome PR!

Thank you @vincbeck 😃

@vincbeck vincbeck merged commit 9dd7752 into apache:main May 15, 2024
41 checks passed