CSW harvester OutputSchema config support #258 #259

Open
wants to merge 9 commits into
base: master

ccancellieri
Contributor

@ccancellieri ccancellieri commented Oct 22, 2021

This will close #258 by adding support for an additional parameter in the CSW JSON config:

"output_schema": "mdb"

mdb is the namespace prefix of the schema to use (in this case iso19115-3.2018):

{'mdb':'http://standards.iso.org/iso/19115/-3/mdb/2.0'}

Full Example below:

{
    "user": "ckan_admin",
    "cql": "dc:identifier = '0-----292--------------------------'",
    "output_schema": "mdb",
    "default_tags": [],
    "default_extras": {},
    "group_mapping": {},
    "read_only": false
}

With this option set, the CSW harvester will request the metadata in the configured output schema, which must be supported by the target CSW server.
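To illustrate how such a prefix could map to the outputSchema URI sent to the server, here is a minimal sketch. It is not the code from this PR: the `NAMESPACES` dict, the `resolve_output_schema` helper, and the `gmd` default are assumptions made for the example (the `mdb` URI is the one quoted above).

```python
import json

# Prefix-to-URI map (hypothetical name). The 'mdb' URI comes from the
# description above; 'gmd' is the ISO 19139 namespace.
NAMESPACES = {
    'gmd': 'http://www.isotc211.org/2005/gmd',
    'mdb': 'http://standards.iso.org/iso/19115/-3/mdb/2.0',
}


def resolve_output_schema(source_config, default='gmd'):
    """Return the outputSchema URI for a harvest source config (JSON string).

    Assumption: when "output_schema" is absent, the default prefix is used.
    """
    config = json.loads(source_config) if source_config else {}
    prefix = config.get('output_schema', default)
    try:
        return NAMESPACES[prefix]
    except KeyError:
        raise ValueError('Unknown output_schema prefix: {!r}'.format(prefix))
```

For example, `resolve_output_schema('{"output_schema": "mdb"}')` returns the `mdb` URI, while an empty config resolves to the `gmd` URI.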

@ccancellieri
Contributor Author

This may also help with:
#209
#210
#219

Member

@amercader amercader left a comment

This looks good and useful @ccancellieri. I just added some minor comments



# load config
self._set_source_config(harvest_object.source.config)
Member

Can you document the new output_schema option and its default value in here so others are aware of it?

https://github.com/ckan/ckanext-spatial/blob/master/doc/harvesters.rst

Contributor Author

done

Contributor Author

Added a fallback to the default in case the server does not support the iso19139 -> iso19115 transformation.
The fallback logs a message and switches back to the default, requesting iso19139 -> iso19139.
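The fallback described above could look roughly like the following sketch. It is illustrative only: `pick_output_schema` and its parameters are hypothetical names, and `supported` stands for whatever collection of schemas the server advertises.

```python
import logging

log = logging.getLogger(__name__)


def pick_output_schema(requested, supported, default='gmd'):
    """Return `requested` if the server advertises it, else log and fall back.

    Assumption: `supported` is any container of schema identifiers
    in the same form as `requested`.
    """
    if requested in supported:
        return requested
    log.warning(
        "Output schema '%s' not supported by target server, "
        "falling back to '%s'", requested, default)
    return default
```

With this shape the harvest job keeps running against servers that only offer the default schema, and the warning leaves a trace of why the configured schema was ignored.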

@@ -70,14 +70,36 @@ class CswService(OwsService):
    def __init__(self, endpoint=None):
        super(CswService, self).__init__(endpoint)
        self.sortby = SortBy([SortProperty('dc:identifier')])
        # check capabilities
        _cap = self.getcapabilities(endpoint)['response']
        self.capabilities=etree.ElementTree(etree.fromstring(_cap))
Member

Please try to follow PEP8 guidelines, especially the spacing around = and after , :)

Contributor Author

Sorry, I can't validate the whole project and my code editor is not helping me. Good catch, I'll fix it.

constraints = []
csw = self._ows(**kw)

# fetch target csw server capabilities for requested output schema
output_schemas=self._get_output_schemas('GetRecords')
Member

Can we move this call to the __init__() method to avoid duplication and multiple calls to GetCapabilities?
Something like:

def __init__(self, endpoint=None):
    _cap = self.getcapabilities(endpoint)['response']
    self.capabilities = etree.ElementTree(etree.fromstring(_cap))
    self.output_schemas = {
        'GetRecords': self._get_output_schemas('GetRecords'),
        'GetRecordById': self._get_output_schemas('GetRecordById'),
    }
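The body of `_get_output_schemas` is not shown in this thread. As a rough idea of what reading the advertised output schemas from a CSW 2.0.2 capabilities document can involve, here is a standalone sketch. It uses the standard library's ElementTree instead of lxml, and `get_output_schemas` is a hypothetical function, not the PR's method.

```python
import xml.etree.ElementTree as ET

# OWS namespace used by CSW 2.0.2 capabilities documents
OWS_NS = {'ows': 'http://www.opengis.net/ows'}


def get_output_schemas(capabilities_xml, operation):
    """List the outputSchema values advertised for one operation.

    `operation` is a name such as 'GetRecords' or 'GetRecordById'.
    Returns an empty list if the operation or parameter is not advertised.
    """
    root = ET.fromstring(capabilities_xml)
    path = (".//ows:Operation[@name='{}']"
            "/ows:Parameter[@name='outputSchema']/ows:Value").format(operation)
    return [v.text.strip() for v in root.findall(path, OWS_NS) if v.text]
```

Calling it once per operation at construction time, as suggested above, avoids repeating the GetCapabilities request for every record.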

Contributor Author

done

# check the cached target csw server capabilities for the requested output schema
output_schemas = self.output_schemas['GetRecordById']
if not output_schemas.get(outputschema):
    raise CswError('Output schema \'{}\' not supported by target server'.format(outputschema))
Contributor Author

I should probably be more tolerant here: log an ERROR and return.

@frafra
Contributor

frafra commented Feb 18, 2022

This is great :) Do you need any help with this PR?

@frafra
Contributor

frafra commented Mar 1, 2022

I get this generic error after applying this PR (rebased on master): Error contacting the CSW server: can only parse strings. I think there is a problem with the changes made to the __init__ function of CswService.

@ccancellieri
Contributor Author

Ciao @frafra, thanks for looking into this.
I think something bad could happen here:

record = self._xmd(etree.fromstring(csw.response))

Would you be able to check the response provided by the server?
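One way to make this kind of failure easier to diagnose is to validate the payload before handing it to the parser. The sketch below is only an illustration: `parse_response` is a hypothetical helper, and it uses the standard library's ElementTree rather than the lxml `etree` used in the harvester.

```python
import xml.etree.ElementTree as ET


def parse_response(payload):
    """Parse an XML payload, failing with a descriptive error if it is unusable.

    Assumption: `payload` is whatever the CSW client stored as its response;
    it may be None or empty when the request failed.
    """
    if not isinstance(payload, (str, bytes)) or not payload.strip():
        raise ValueError(
            'Expected a non-empty XML string from the CSW server, '
            'got {}'.format(repr(payload)[:80]))
    return ET.fromstring(payload)
```

A message like this points at the server response itself, instead of surfacing only the parser's generic complaint.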

I apologize, but I'm not using this plugin anymore. I changed approach, so my help on this will be very limited.

@frafra
Contributor

frafra commented Mar 4, 2022

@ccancellieri I think you are right, I will look into that.
Which approach have you taken, if I may ask? I am interested in harvesting data from GeoNetwork too.

@ccancellieri
Contributor Author

ccancellieri commented Mar 7, 2022 via email

markstuart added a commit to data-govt-nz/ckanext-spatial that referenced this pull request Jun 2, 2023
Development

Successfully merging this pull request may close these issues.

Using CSW harvester OutputSchema is ignored while gmd is imposed
3 participants