Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/80 new upload widget #83

Open
wants to merge 45 commits into
base: master
Choose a base branch
from
Open

Conversation

aivuk
Copy link
Contributor

@aivuk aivuk commented Dec 9, 2022

Adds new endpoint resource_create_with_schema to be used with new uploader widget at https://github.com/aivuk/ckan-uploader.

@aivuk aivuk added the feature label Dec 9, 2022
@aivuk aivuk requested a review from amercader December 9, 2022 22:52
@aivuk aivuk self-assigned this Dec 9, 2022
_run_sync_validation(
resource_id, local_upload=is_local_upload, new_resource=True)
for plugin in plugins.PluginImplementations(IDataValidation):
if not plugin.can_validate(context, data_dict):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why skip validation if one plugin can do it and another can't? Shouldn't we just let the one plugin handle it?

@amercader
Copy link
Member

amercader commented Dec 16, 2022

@aivuk Great work so far, but I think it's worth do a couple of refactors now to make our life easier down the line:

  • Let's call blueprint endpoints rather than the API directly. This will make easier to handle authentication, CSRF and removes the need to know the package_id at the component level. If we make them return the same output as now you'll only have to change the URL in the component. Below is an overview of the changes to get you started
  • Let's not override resource_form.html, otherwise we are overriding ckanext-scheming entirely. All necessary markup should be in ckan_uploader.html as this is the snippet that scheming will use. Let's not use hidden fields to pass params, you can pass them directly to the module with data- arguments, eg data-module-resource_id="{{ data.id }}"

I'll keep playing with it and provide more feedback

Snippet / template:

diff --git a/ckanext/validation/templates/scheming/form_snippets/ckan_uploader.html b/ckanext/validation/templates/scheming/form_snippets/ckan_uploader.html
index b601c59..7085341 100644
--- a/ckanext/validation/templates/scheming/form_snippets/ckan_uploader.html
+++ b/ckanext/validation/templates/scheming/form_snippets/ckan_uploader.html
@@ -1,6 +1,12 @@
-  <input name="id" value="{{ data.id }}" type="hidden"/>
-  <div class="form-group control-medium">
-    <label class="control-label" for="ckan_uploader">{{ _('File') }}</label>
-    <div id="ckan_uploader" data-module="ckan-uploader" data-module-upload_url="{{ config.get('ckan.site_url', '') }}"></div>
-  </div>
 
+<div class="form-group control-medium">
+  <label class="control-label" for="ckan_uploader">{{ _('File') }}</label>
+ <!-- TODO: arguments like resource_id -->
+  <div id="ckan_uploader" data-module="ckan-uploader" data-module-upload_url="{{ config.get('ckan.site_url', '') }}"></div>
+</div>
+
+{% asset 'ckanext-validation/ckan-uploader-js' %}
+{% asset 'ckanext-validation/ckan-uploader-css' %}

Blueprint changes:

diff --git a/ckanext/validation/blueprints.py b/ckanext/validation/blueprints.py
index 3ec0dc3..4d7ecde 100644
--- a/ckanext/validation/blueprints.py
+++ b/ckanext/validation/blueprints.py
@@ -2,7 +2,20 @@
 
 from flask import Blueprint
 
-from ckantoolkit import c, NotAuthorized, ObjectNotFound, abort, _, render, get_action
+from ckan.lib.navl.dictization_functions import unflatten
+from ckan.logic import tuplize_dict, clean_dict, parse_params
+
+from ckantoolkit import (
+    c, g,
+    NotAuthorized,
+    ObjectNotFound,
+    abort,
+    _,
+    render,
+    get_action,
+    request,
+)
+
 
 validation = Blueprint("validation", __name__)
 
@@ -44,3 +57,50 @@ def read(id, resource_id):
 validation.add_url_rule(
     "/dataset/<id>/resource/<resource_id>/validation", view_func=read
 )
+
+
+def _get_data():
+    data = clean_dict(
+	unflatten(tuplize_dict(parse_params(request.form)))
+    )
+    data.update(clean_dict(
+	unflatten(tuplize_dict(parse_params(request.files)))
+    ))
+
+
+
+def resource_file_create(id):
+
+    # Get data from the request
+    data_dict = _get_data()
+
+    # Call resource_create
+    # TODO: error handling
+    context = {
+        'user': g.user,
+    }
+    data_dict["package_id"] = id
+    resource = get_action("resource_create")(context, data_dict)
+
+    # If it's tabular (local OR remote), infer and store schema
+    if is_tabular(resource):
+        update_resource = t.get_action('resource_table_schema_infer')(
+            context, {'resource_id': resource_id, 'store_schema': True}
+        )
+
+    # Return resource
+    # TODO: set response format as JSON
+    return resource
+
+
+def resource_file_update(id, resource_id):
+    # TODO: same as create, you can reuse as much code as needed
+    pass
+
+
+validation.add_url_rule(
+    "/dataset/<id>/resource/file", view_func=resource_file_create, methods=["POST"]
+)
+validation.add_url_rule(
+    "/dataset/<id>/resource/<resource_id>/file", view_func=resource_file_update, methods=["POST"]
+)

@@ -145,9 +145,10 @@ def _process_schema_fields(self, data_dict):
schema_upload = data_dict.pop(u'schema_upload', None)
schema_url = data_dict.pop(u'schema_url', None)
schema_json = data_dict.pop(u'schema_json', None)
if isinstance(schema_upload, ALLOWED_UPLOAD_TYPES) and schema_upload.content_length > 0:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason to check content length is because a choice not to upload is sometimes represented as a zero-length file. Taking out the check might mean that a request is incorrectly treated as an upload when it isn't one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was fixed in d0f780

@aivuk
Copy link
Contributor Author

aivuk commented Jan 9, 2023

Great @amercader, I started to work on that but I still couldn't find a way to get the package_id from the upload component. If I change the uploader to be loaded from its own scheming template, I need to access this id to know which endpoint to use (create or update a resource)

@amercader
Copy link
Member

@aivuk You are right, the changes I suggested (which we should still implement) don't solve the uploader snippet not having access to the dataset id. We need to resort to extracting it from the URL. I'll discuss with Ian (scheming maintainer) to see if we can change the snippets upstream so the resource field snippets get the dataset id properly.
In the meantime let's do something like this in the snippet:

diff --git a/ckanext/validation/templates/scheming/form_snippets/ckan_uploader.html b/ckanext/validation/templates/scheming/form_snippets/ckan_uploader.html
index b601c59..d2224e9 100644
--- a/ckanext/validation/templates/scheming/form_snippets/ckan_uploader.html
+++ b/ckanext/validation/templates/scheming/form_snippets/ckan_uploader.html
@@ -1,3 +1,6 @@
+  {% set package_id = data.package_id or h.get_package_id_from_resource_url() %}
+
   <input name="id" value="{{ data.id }}" type="hidden"/>
   <div class="form-group control-medium">
     <label class="control-label" for="ckan_uploader">{{ _('File') }}</label>

To register the helper:

diff --git a/ckanext/validation/helpers.py b/ckanext/validation/helpers.py
index b6c856d..d43a489 100644
--- a/ckanext/validation/helpers.py
+++ b/ckanext/validation/helpers.py
@@ -1,8 +1,9 @@
 # encoding: utf-8
 import json
+import re
 
 from ckan.lib.helpers import url_for_static
-from ckantoolkit import url_for, _, config, asbool, literal, h
+from ckantoolkit import url_for, _, config, asbool, literal, h, request
 
 
 def get_validation_badge(resource, in_listing=False):
@@ -99,3 +100,9 @@ def bootstrap_version():
 
 def use_webassets():
     return int(h.ckan_version().split('.')[1]) >= 9
+
+
+def get_package_id_from_resource_url():
+    match = re.match("/dataset/(.*)/resource/", request.path)
+    if match:
+        return model.Package.get(match.group(1)).id
diff --git a/ckanext/validation/plugin/__init__.py b/ckanext/validation/plugin/__init__.py
index 5ea63d8..9c3211c 100644
--- a/ckanext/validation/plugin/__init__.py
+++ b/ckanext/validation/plugin/__init__.py
@@ -27,6 +27,7 @@ from ckanext.validation.helpers import (
     bootstrap_version,
     validation_dict,
     use_webassets,
+    get_package_id_from_resource_url,
 )
 from ckanext.validation.validators import (
     resource_schema_validator,
@@ -117,6 +118,7 @@ to create the database tables:
             u'bootstrap_version': bootstrap_version,
             u'validation_dict': validation_dict,
             u'use_webassets': use_webassets,
+            'get_package_id_from_resource_url': get_package_id_from_resource_url,
         }
 
     # IResourceController

@aivuk aivuk marked this pull request as ready for review February 1, 2023 14:09
@amercader
Copy link
Member

@aivuk great work! There are still some things to iron out though:

  • Remind me why do we need to modify the resource create and resource update actions included in ckanext-validation? IIRC
    we moved the infer schema logic to its own action to avoid having to touch these

  • There are problems in sync mode (ckanext.validation.run_on_create_sync/ckanext.validation.run_onupdatesync are True), because we are calling resource_create/resource_update to create a temp resource, to store the schema, etc and we don't want to run the validation at that point, just when the user actually saves the resource. I'll try to work on this as it's related to other issues I've solved in the past

  • I really don't like the UI of the File / URL fields:

    • Let's try to mimic as much as possible the standard file/ url picker field both in terms of design and functionality. I know it's more work but it does a good job of clearly indicating two different options (file or UtRL), allows to clear the value and switch to the other type ,etc

Screenshot 2023-02-02 at 15-11-34 Add resource Test dataset 23222-CKAN Catàleg Dados

  • It's good that the button label changes but I think it could be made clearer to the user what's happening:

    • Can we display a progress bar or at least a spinner while the file is uploaded and the schema inferred?
    • Perhaps when we have received the schema and the schema field is populated we could highlight the background slightly for a few seconds
  • Regardless of the UI, if I try to submit a URL to an external file I get an exception

This is as far as I got. I'll keep playing with it and give more feedback

Co-authored-by: Adrià Mercader <amercadero@gmail.com>
@aivuk
Copy link
Contributor Author

aivuk commented Feb 6, 2023

  • Remind me why do we need to modify the resource create and resource update actions included in ckanext-validation? IIRC
    we moved the infer schema logic to its own action to avoid having to touch these

In the past I was using the resource_create and resource_update to create/update a resource using the ckan-uploader widget. I did need to have the returned schema to use it on the schema field of the resource edit form. But currently, with the blueprint endpoint I'm not doing the infer in the resource_create / resource_update action anymore, but on the blueprint. I just commented the lines doing the inference in the actions, it's better to remove them

@aivuk
Copy link
Contributor Author

aivuk commented Feb 6, 2023

  • Can we display a progress bar or at least a spinner while the file is uploaded and the schema inferred?

  • Perhaps when we have received the schema and the schema field is populated we could highlight the background slightly for a few seconds

Sorry, forgot to change the color of the progress bar in my last update. Here is how it looks:

Screencast.from.02-06-2023.11.48.15.AM.webm

It's simple to change the color, we just need to change it on https://github.com/frictionlessdata/ckan-uploader/blob/main/src/lib/CkanUploader.svelte#L167

@amercader
Copy link
Member

Thanks @aivuk, btw this is the screenshot I meant to include in my previous message:
Screenshot 2023-02-06 at 12-08-39 Add resource - Test Data 1 - CKAN Demo

Another think I missed is adding tests. We should add some tests covering at least the endpoints (expected responses, resources created / updated etc)

@aivuk
Copy link
Contributor Author

aivuk commented Feb 8, 2023

@amercader My last updates is mimicking the UI from the old widget and I solved the bug when using external URLs, please look at it when possible. thanks!

Copy link
Contributor Author

@aivuk aivuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nicely done, I just don't know what did you change in the ckan-uploader.js (or if you changed anything at all or just built a new one).

It's failing on the CI tests. I will pull it and see if it's failing in my machine also


data = {
"url": url,
# CKAN does not refresh the format, see https://github.com/ckan/ckan/issues/7415
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with the new widget, I'm retrieving the format when the user replace the uploaded resource and changing it on the resource edit form. if the user save the changes the new format is updated but I agree with you, it would be better to do it in the backend

@amercader
Copy link
Member

@aivuk sorry I didn't mean to push a new build of uploader, was just playing around. I'll revert it.

I got these failures locally, I had to upgrade to the latest framework version to get rid of them. I'll push that too.
I'm cleaning up things a bit, I'll let you know when it's ready to review

These should be as in current master, ie identical to the upstream core
ones except for the bit where the sync validation is triggered
Rather than try to deal with complex logic on the plugin hooks with
dodgy context keys, it offers a way to shut down completely the
validation process in a specific block of code. Works by explicitly
setting all the validation config options to False and restoring the
previous values afterwards.
It makes more sense to do it when the user actually saves the final
resource, as the file or the schema might have changed
Refactor the schema file processing function to work both when an empty
file upload is sent (ie when pasting a text schema) and the tests (where
upload.content_length is always 0)
schema = schema_url
if schema_json:
schema = schema_json
import ipdb; ipdb.set_trace()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this line be here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants