Optional omitted Complex Input with default format is generated erroneously (?) #633

Open
3 of 8 tasks
fmigneault opened this issue Oct 6, 2021 · 1 comment

Comments

@fmigneault
Collaborator

Description

@cehbrecht @tomkralidis @jachym
I would like to better understand the procedure of handling inputs (how they get generated) for the following specific use case.

Given a process that has the following input definitions:

[...]
<DataInputs>
  <Input minOccurs="0" maxOccurs="100">
	<ows:Identifier>dataset</ows:Identifier>
	<ows:Title>Dataset</ows:Title>
	<ows:Abstract>Enter a URL pointing to a NetCDF file (optional)</ows:Abstract>
	<ComplexData>
	  <Default>
		<Format>
		  <MimeType>application/x-netcdf</MimeType>
		</Format>
	  </Default>
	  <Supported>
		<Format>
		  <MimeType>application/x-netcdf</MimeType>
		</Format>
	  </Supported>
	</ComplexData>
  </Input>
  <Input minOccurs="0" maxOccurs="100">
	<ows:Identifier>dataset_opendap</ows:Identifier>
	<ows:Title>Remote OpenDAP Data URL</ows:Title>
	<ows:Abstract>Or provide a remote OpenDAP data URL, for example: http://my.opendap/thredds/dodsC/path/to/file.nc</ows:Abstract>
	<ows:Metadata xlink:href="https://www.iana.org/assignments/media-types/media-types.xhtml" xlink:title="application/x-ogc-dods" xlink:type="simple"/>
	<LiteralData>
	  <ows:DataType ows:reference="urn:ogc:def:dataType:OGC:1.1:string">string</ows:DataType>
	  <ows:AnyValue/>
	</LiteralData>
  </Input>
</DataInputs>
[...]

When I submit an execution that provides only the dataset_opendap input (a URL string), the process's _handler(self, request, response) method receives the following request.inputs:

request.inputs = {
  'dataset': [<pywps.inout.inputs.ComplexInput object at 0x7f140d941a10>], 
  'dataset_opendap': deque([<pywps.inout.inputs.LiteralInput object at 0x7f140d956c90>], maxlen=100)
}

My execution XML does not contain dataset, so it gets generated somehow by default following parsing.

<wps100:Execute xmlns:wps100="http://www.opengis.net/wps/1.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" service="WPS" version="1.0.0" xsi:schemaLocation="http://www.opengis.net/wps/1.0.0 http://schemas.opengis.net/wps/1.0.0/wpsExecute_request.xsd">
	<ows110:Identifier xmlns:ows110="http://www.opengis.net/ows/1.1">ncdump</ows110:Identifier>
	<wps100:DataInputs>
		<wps100:Input>
			<ows110:Identifier xmlns:ows110="http://www.opengis.net/ows/1.1">dataset_opendap</ows110:Identifier>
			<wps100:Data>
				<wps100:LiteralData>http://localhost8001/ows/proxy/thredds/dodsC/birdhouse/testdata/ta_Amon_MRI-CGCM3_decadal1980_r1i1p1_199101-200012.nc</wps100:LiteralData>
			</wps100:Data>
		</wps100:Input>
	</wps100:DataInputs>
	<wps100:ResponseForm>
		<wps100:ResponseDocument storeExecuteResponse="true" status="true" lineage="true">
			<wps100:Output asReference="true">
				<ows110:Identifier xmlns:ows110="http://www.opengis.net/ows/1.1">output</ows110:Identifier>
			</wps100:Output>
		</wps100:ResponseDocument>
	</wps100:ResponseForm>
</wps100:Execute>
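
To double-check which inputs an Execute request really carries, the identifiers can be listed with the standard library. This is only a minimal sketch: the XML below is a trimmed copy of the request above with a placeholder URL, and submitted_input_ids is a hypothetical helper, not part of PyWPS.

```python
import xml.etree.ElementTree as ET

# Namespace URIs taken from the Execute request; the prefixes are arbitrary.
NS = {
    "wps": "http://www.opengis.net/wps/1.0.0",
    "ows": "http://www.opengis.net/ows/1.1",
}

# Trimmed copy of the request; the literal URL is a placeholder.
EXECUTE_XML = """<wps100:Execute xmlns:wps100="http://www.opengis.net/wps/1.0.0"
    xmlns:ows110="http://www.opengis.net/ows/1.1" service="WPS" version="1.0.0">
  <ows110:Identifier>ncdump</ows110:Identifier>
  <wps100:DataInputs>
    <wps100:Input>
      <ows110:Identifier>dataset_opendap</ows110:Identifier>
      <wps100:Data>
        <wps100:LiteralData>http://example.invalid/file.nc</wps100:LiteralData>
      </wps100:Data>
    </wps100:Input>
  </wps100:DataInputs>
  <wps100:ResponseForm>
    <wps100:ResponseDocument>
      <wps100:Output asReference="true">
        <ows110:Identifier>output</ows110:Identifier>
      </wps100:Output>
    </wps100:ResponseDocument>
  </wps100:ResponseForm>
</wps100:Execute>"""


def submitted_input_ids(execute_xml: str) -> list:
    """Return the identifiers of inputs explicitly present under DataInputs."""
    root = ET.fromstring(execute_xml)
    # Only look below DataInputs/Input so the process and output
    # identifiers are not picked up.
    return [
        ident.text
        for ident in root.findall("wps:DataInputs/wps:Input/ows:Identifier", NS)
    ]


print(submitted_input_ids(EXECUTE_XML))  # ['dataset_opendap']
```

Only dataset_opendap is listed, so any dataset entry seen later in request.inputs cannot come from the request body.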

I'm trying to understand why the dataset input gets generated in request.inputs at all after parsing, since it is omitted completely from the request.
This input is causing me problems, because I need to do some post-processing to convert PyWPS inputs into my package definitions.

Is there a recommended way to detect omitted inputs, so that I can explicitly discard them and keep only real inputs with submitted data?
Is there some flag that would guarantee that this input is only the default definition and does not contain any actual data?

I cannot rely on the data field to detect omitted inputs, because it gets filled with the "default format" definition for application/x-netcdf, and the same content could also be submitted as real data:

{"mimeType": "application/x-netcdf", "encoding": null, "schema": null, "maximumMegabytes": null, "default": true}

The only field I could use to detect inputs to drop is file, which points to {workdir}/input instead of {workdir}/input_{uuid}, but this is very hackish and unreliable.
Any better guidance would be greatly appreciated.
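
For illustration, the workaround described above could be sketched as follows. This is a heuristic based only on the attribute values observed in this issue (data holding the format dict with 'default': True, uuid being None, file ending in /input); it is not a documented PyWPS API. The SimpleNamespace objects are hypothetical stand-ins for the real input objects, and attribute access on real PyWPS inputs may behave differently.

```python
import os
from types import SimpleNamespace


def looks_like_default_placeholder(inp) -> bool:
    """Heuristic: True when the input matches the placeholder seen in the dump."""
    data = getattr(inp, "data", None)
    # The placeholder carries the format definition instead of real content.
    only_format = (
        isinstance(data, dict)
        and "mimeType" in data
        and data.get("default") is True
    )
    # Observed in the dump: no uuid, and a file named 'input' rather than 'input_{uuid}'.
    no_uuid = getattr(inp, "uuid", None) is None
    generic_file = os.path.basename(getattr(inp, "file", "") or "") == "input"
    return only_format and no_uuid and generic_file


def drop_placeholders(inputs: dict) -> dict:
    """Return a copy of inputs without placeholder entries or emptied identifiers."""
    kept = {}
    for identifier, values in inputs.items():
        real = [v for v in values if not looks_like_default_placeholder(v)]
        if real:
            kept[identifier] = real
    return kept


# Hypothetical stand-ins mirroring the values shown in the issue.
placeholder = SimpleNamespace(
    data={"mimeType": "application/x-netcdf", "encoding": None, "schema": None,
          "maximumMegabytes": None, "default": True},
    uuid=None,
    file="/tmp/workdir/input",
)
submitted = SimpleNamespace(
    data="http://example.invalid/file.nc",
    uuid="1234",
    file="/tmp/workdir/input_1234",
)

filtered = drop_placeholders({"dataset": [placeholder], "dataset_opendap": [submitted]})
print(list(filtered))  # ['dataset_opendap']
```

Requiring all three signals at once reduces the risk of dropping a real input, but it remains a guess about internal state.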

Full contents of request.inputs["dataset"]:

{ComplexInput}  
    _data_format = {Format} 
        _encoding = {NoneType} None
        _extension = {NoneType} None
        _mime_type = {str} 'application/x-netcdf'
        _schema = {NoneType} None
        encoding = {str} ''
        extension = {str} ''
        json = {dict} {'mime_type': 'application/x-netcdf', 'encoding': '', 'schema': '', 'extension': ''}
        mime_type = {str} 'application/x-netcdf'
        schema = {str} ''
    _default = {dict} {'mimeType': 'application/x-netcdf', 'encoding': None, 'schema': None, 'maximumMegabytes': None, 'default': True}
    _default_type = {int} 3
    _iohandler = {DataHandler} 
        _data = {dict}  
        _file = {str} '/tmp/weaver-hybrid/pywps_process_pw40isee/input'
        _ref = {weakref} 
        _stream = {NoneType} None
        base64 = {str} 'Traceback (most recent call last):\n  File "/opt/pycharm-pro/plugins/python/helpers/pydev/_pydevd_bundle/pydevd_resolver.py", line 178, in _getPyDictionary\n    attr = getattr(var, n)\n  File "/home/francis/dev/miniconda/envs/weaver-py3/lib/python3.7/site-pa
        data = {dict} {'mimeType': 'application/x-netcdf', 'encoding': None, 'schema': None, 'maximumMegabytes': None, 'default': True}
        file = {str} '/tmp/weaver-hybrid/pywps_process_pw40isee/input'
        mem = {NoneType} None
        post_data = {str} 'Traceback (most recent call last):\n  File "/opt/pycharm-pro/plugins/python/helpers/pydev/_pydevd_bundle/pydevd_resolver.py", line 178, in _getPyDictionary\n    attr = getattr(var, n)\n  File "/home/francis/dev/miniconda/envs/weaver-py3/lib/python3.7/site-pa
        prop = {str} 'data'
        size = {int} 0
        stream = {StringIO} <_io.StringIO object at 0x7f140d8aa690>
        url = {str} 'file:///tmp/weaver-hybrid/pywps_process_pw40isee/input'
    _supported_formats = {tuple} 
    _workdir = {str} '/tmp/weaver-hybrid/pywps_process_pw40isee'
    abstract = {str} 'Enter a URL pointing to a NetCDF file (optional)'
    as_reference = {bool} False
    base64 = {str} 'Traceback (most recent call last):\n  File "/opt/pycharm-pro/plugins/python/helpers/pydev/_pydevd_bundle/pydevd_resolver.py", line 178, in _getPyDictionary\n    attr = getattr(var, n)\n  File "/home/francis/dev/miniconda/envs/weaver-py3/lib/python3.7/site-pa
    data = {dict} {'mimeType': 'application/x-netcdf', 'encoding': None, 'schema': None, 'maximumMegabytes': None, 'default': True}
    data_format = {Format} 
    data_set = {bool} True
    extension = {str} ''
    file = {str} '/tmp/weaver-hybrid/pywps_process_pw40isee/input'
    identifier = {str} 'dataset'
    inpt = {dict} {}
    json = {dict} {'identifier': 'dataset', 'title': 'Dataset', 'abstract': 'Enter a URL pointing to a NetCDF file (optional)', 'keywords': [], 'metadata': [], 'type': 'complex', 'data_format': {'mime_type': 'application/x-netcdf', 'encoding': '', 'schema': '', 'extension': ''}, 'asreference': False, 'supported_formats': [{'mime_type': 'application/x-netcdf', 'encoding': '', 'schema': '', 'extension': ''}], 'workdir': '/tmp/weaver-hybrid/pywps_process_pw40isee', 'mode': 0, 'min_occurs': 0, 'max_occurs': 100, 'translations': None, 'data': "", 'mimetype': 'application/x-netcdf'}
    keywords = {list} []
    max_occurs = {int} 100
    metadata = {list} []
    method = {str} ''
    min_occurs = {int} 0
    post_data = {str} 'Traceback (most recent call last):\n  File "/opt/pycharm-pro/plugins/python/helpers/pydev/_pydevd_bundle/pydevd_resolver.py", line 178, in _getPyDictionary\n    attr = getattr(var, n)\n  File "/home/francis/dev/miniconda/envs/weaver-py3/lib/python3.7/site-pa
    prop = {str} 'data'
    size = {int} 0
    source_type = {int} 3
    stream = {StringIO}  
    supported_formats = {tuple} 
    title = {str} 'Dataset'
    translations = {NoneType} None
    url = {str} 'file:///tmp/weaver-hybrid/pywps_process_pw40isee/input'
    uuid = {NoneType} None
    valid_mode = {int} 0
    workdir = {str} '/tmp/weaver-hybrid/pywps_process_pw40isee'
   

Environment

  • operating system:
  • Python version: 3.7
  • PyWPS version: tried with both 4.4.5 and 4.5.0, same result
  • source/distribution
      • git clone
      • Debian
      • PyPI
      • zip/tar.gz
      • other (please specify):
  • web server
      • Apache/mod_wsgi
      • CGI
      • other (please specify): gunicorn with Pyramid WebApp that nests PyWPS's service

Steps to Reproduce

Using this process:
https://github.com/bird-house/hummingbird/blob/master/hummingbird/processes/wps_ncdump.py

It is executed indirectly by Weaver using this definition:
https://github.com/crim-ca/weaver/blob/4.1.0/weaver/processes/wps_package.py#L758

@fmigneault
Collaborator Author

More detail...
Input dataset gets generated here:

pywps/pywps/app/Service.py

Lines 115 to 120 in 793ab34

if not request_inputs:
    if inpt._default is not None:
        if not inpt.data_set and isinstance(inpt, ComplexInput):
            inpt._set_default_value()

        data_inputs[inpt.identifier] = [inpt.clone()]

At that point, the following values are defined:

request_inputs = None
inpt._default = {'mimeType': 'application/x-netcdf', 'encoding': None, 'schema': None, 'maximumMegabytes': None, 'default': True}
inpt._default_type = SOURCE_TYPE.DATA
inpt.data_set = False

The method parameter wps_request.inputs contains the following:

[
  {
    'identifier': 'dataset_opendap', 
    'data': 'http://localhost8001/ows/proxy/thredds/dodsC/birdhouse/testdata/ta_Amon_MRI-CGCM3_decadal1980_r1i1p1_199101-200012.nc', 
    'uom': '', 
    'datatype': ''
  }
]

It looks like inpt._set_default_value() should not be called in this case, because what eventually gets set is not a default value but a default format definition.
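
A minimal sketch of the behaviour I would expect, assuming a default that only describes a format can be told apart from a default that carries data. FakeInput, is_format_descriptor and build_data_inputs are hypothetical names used for illustration, request inputs are assumed to be a dict keyed by identifier, and none of this is the actual PyWPS implementation.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

# Keys seen in the format dict from this issue; anything else is treated as data.
FORMAT_KEYS = {"mimeType", "encoding", "schema", "maximumMegabytes", "default"}


def is_format_descriptor(default: Any) -> bool:
    """Return True when `default` only describes a format and carries no data."""
    return (
        isinstance(default, dict)
        and "mimeType" in default
        and set(default) <= FORMAT_KEYS
    )


@dataclass
class FakeInput:
    """Hypothetical stand-in for a process input definition (not a PyWPS class)."""
    identifier: str
    default: Any = None


def build_data_inputs(process_inputs: List[FakeInput],
                      request_inputs: Dict[str, list]) -> Dict[str, list]:
    data_inputs: Dict[str, list] = {}
    for inpt in process_inputs:
        submitted = request_inputs.get(inpt.identifier)
        if submitted:
            data_inputs[inpt.identifier] = list(submitted)
        elif inpt.default is not None and not is_format_descriptor(inpt.default):
            # A real default value exists: use it for the omitted input.
            data_inputs[inpt.identifier] = [inpt.default]
        # Otherwise the input was omitted and has no value: leave it out.
    return data_inputs


process_inputs = [
    FakeInput("dataset", default={"mimeType": "application/x-netcdf", "encoding": None,
                                  "schema": None, "maximumMegabytes": None,
                                  "default": True}),
    FakeInput("dataset_opendap"),
    FakeInput("limit", default=10),  # hypothetical input with a genuine default value
]
request_inputs = {"dataset_opendap": ["http://example.invalid/file.nc"]}

result = build_data_inputs(process_inputs, request_inputs)
print(sorted(result))  # ['dataset_opendap', 'limit']
```

With this rule the omitted dataset input never reaches the handler, while an omitted input with a genuine default value is still filled in.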
