
NoSuchElementException when parsing xoai #66

Open
cmacdonald opened this issue Jul 31, 2017 · 8 comments

Comments

@cmacdonald

cmacdonald commented Jul 31, 2017

Stacktrace as follows:

Exception in thread "main" java.util.NoSuchElementException
	at org.codehaus.stax2.ri.Stax2EventReaderImpl.throwEndOfInput(Stax2EventReaderImpl.java:453)
	at org.codehaus.stax2.ri.Stax2EventReaderImpl.nextEvent(Stax2EventReaderImpl.java:242)
	at com.lyncode.xml.XmlReader.next(XmlReader.java:134)
	at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:43)
	at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
	at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
	at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
	at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
	at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
	at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
	at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
	at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
	at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
	at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
	at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
	at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
	at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
	at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
	at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
	at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
	at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
	at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
	at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
	at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
	at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
	at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
	at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
	at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
	at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parseElement(MetadataParser.java:44)
	at org.dspace.xoai.serviceprovider.parsers.MetadataParser.parse(MetadataParser.java:34)
	at org.dspace.xoai.serviceprovider.parsers.RecordParser.parse(RecordParser.java:56)
	at org.dspace.xoai.serviceprovider.parsers.ListRecordsParser.next(ListRecordsParser.java:60)
	at org.dspace.xoai.serviceprovider.handler.ListRecordHandler.nextIteration(ListRecordHandler.java:71)
	at org.dspace.xoai.serviceprovider.lazy.ItemIterator.hasNext(ItemIterator.java:32)
	at org.dspace.xoai.serviceprovider.lazy.ItemIterator.<init>(ItemIterator.java:22)
	at org.dspace.xoai.serviceprovider.ServiceProvider.listRecords(ServiceProvider.java:57)

Minimal reproducible example:

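// xoai service provider Context, default-constructed (declaration was missing from the original snippet)
Context context = new Context();
OAIClient oaiClient = new HttpOAIClient("http://repository.abertay.ac.uk/oai/request");
context.withOAIClient(oaiClient);
ServiceProvider ssoarOaiPmhEndpoint = new ServiceProvider(context);
ListRecordsParameters parameters = new ListRecordsParameters();
parameters.withMetadataPrefix("xoai");
// Fails inside listRecords() itself: the ItemIterator constructor calls hasNext(),
// which fetches and parses the first page of records (see the stack trace above).
ssoarOaiPmhEndpoint.listRecords(parameters);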

Example record: http://repository.abertay.ac.uk/oai/request?verb=GetRecord&metadataPrefix=xoai&identifier=oai:repository.abertay.ac.uk:10373/1861

parseElement fails while parsing the license element, for example:

<element name="license"><field name="bin">Tk9URTogVGhpcyBpcyB0aGUgZGVmYXVsdCBsaWNlbmNlIHRoYXQgdGhlIFVuaXZlcnNpdHkgb2YgQWJlcnRheSAKRHVuZGVlIHJlcXVpcmVzIGFsbCBzdWJtaXR0ZXJzIHRvIGdyYW50LgoKTk9OLUVYQ0xVU0lWRSBESVNUUklCVVRJT04gTElDRU5DRQoKQnkgYWdyZWVpbmcgYW5kIHN1Ym1pdHRpbmcgdGhpcyBsaWNlbmNlLCB5b3UgKHRoZSBhdXRob3IocyksIApjb3B5cmlnaHQgb3duZXIgb3Igbm9taW5hdGVkIGFnZW50KSBncmFudHMgdG8gVW5pdmVyc2l0eSBvZiBBYmVydGF5IApEdW5kZWUgKFVBRCkgdGhlIG5vbi1leGNsdXNpdmUgcmlnaHQgdG8gcmVwcm9kdWNlLCB0cmFuc2xhdGUgCihhcyBkZWZpbmVkIGJlbG93KSwgYW5kL29yIGRpc3RyaWJ1dGUgeW91ciBzdWJtaXNzaW9uIChpbmNsdWRpbmcgdGhlIAphYnN0cmFjdCkgd29ybGR3aWRlIGluIHByaW50IGFuZCBlbGVjdHJvbmljIGZvcm1hdCBhbmQgaW4gYW55IG1lZGl1bSwgCmluY2x1ZGluZyBidXQgbm90IGxpbWl0ZWQgdG8gYXVkaW8gb3IgdmlkZW8uCgpZb3UgYWdyZWUgdGhhdCBVQUQgbWF5LCB3aXRob3V0IGNoYW5naW5nIHRoZSBjb250ZW50LCB0cmFuc2xhdGUgdGhlCnN1Ym1pc3Npb24gdG8gYW55IG1lZGl1bSBvciBmb3JtYXQgZm9yIHRoZSBwdXJwb3NlIG9mIHByZXNlcnZhdGlvbi4gCllvdSBhbHNvIGFncmVlIHRoYXQgVUFEIG1heSBrZWVwIG1vcmUgdGhhbiBvbmUgY29weSBvZiB0aGlzIApzdWJtaXNzaW9uIGZvciBwdXJwb3NlcyBvZiBzZWN1cml0eSwgYmFjay11cCBhbmQgcHJlc2VydmF0aW9uLgoKWW91IHJlcHJlc2VudCB0aGF0IHRoZSBzdWJtaXNzaW9uIGlzIG9yaWdpbmFsIHdvcmssIGFuZCB0aGF0IHlvdQpoYXZlIHRoZSByaWdodCB0byBncmFudCB0aGUgcmlnaHRzIGNvbnRhaW5lZCBpbiB0aGlzIGxpY2VuY2UuIFlvdSAKYWxzbyByZXByZXNlbnQgdGhhdCB5b3VyIHN1Ym1pc3Npb24gZG9lcyBub3QsIHRvIHRoZSBiZXN0IG9mIHlvdXIgCmtub3dsZWRnZSwgaW5mcmluZ2UgdXBvbiBhbnlvbmUncyBjb3B5cmlnaHQuCgpJZiB0aGUgc3VibWlzc2lvbiBjb250YWlucyBtYXRlcmlhbCBmb3Igd2hpY2ggeW91IG9yIHlvdXIgcHVibGlzaGVyCmRvIG5vdCBob2xkIGNvcHlyaWdodCwgeW91IHJlcHJlc2VudCB0aGF0IHlvdSBoYXZlIG9idGFpbmVkIHRoZQp1bnJlc3RyaWN0ZWQgcGVybWlzc2lvbiBvZiB0aGUgY29weXJpZ2h0IG93bmVyIHRvIGdyYW50IFVBRCB0aGUKcmlnaHRzIHJlcXVpcmVkIGJ5IHRoaXMgbGljZW5jZSwgYW5kIHRoYXQgc3VjaCB0aGlyZC1wYXJ0eSBvd25lZAptYXRlcmlhbCBpcyBjbGVhcmx5IGlkZW50aWZpZWQgYW5kIGFja25vd2xlZGdlZCB3aXRoaW4gdGhlIHRleHQgb3IKY29udGVudCBvZiB0aGUgc3VibWlzc2lvbi4KCklGIFRIRSBTVUJNSVNTSU9OIElTIEJBU0VEIFVQT04gV09SSyBUSEFUIEhBUyBCRUVOIFNQT05TT1JFRCBPUiAKU1VQUE9SVEVEIEJZIEFOIE
FHRU5DWSBPUiBPUkdBTklaQVRJT04gT1RIRVIgVEhBTiBVQUQsIFlPVSBSRVBSRVNFTlQgClRIQVQgWU9VIEhBVkUgRlVMRklMTEVEIEFOWSBSSUdIVCBPRiBSRVZJRVcgT1IgT1RIRVIgT0JMSUdBVElPTlMgClJFUVVJUkVEIEJZIFNVQ0ggQ09OVFJBQ1QgT1IgQUdSRUVNRU5ULgoKVUFEIHdpbGwgY2xlYXJseSBpZGVudGlmeSB5b3VyIG5hbWUocykgYXMgdGhlIGF1dGhvcihzKSBvciBvd25lcihzKSAKb2YgdGhlIHN1Ym1pc3Npb24sIGFuZCB3aWxsIG5vdCBtYWtlIGFueSBhbHRlcmF0aW9uLCBvdGhlciB0aGFuIGFzIAphbGxvd2VkIGJ5IHRoaXMgbGljZW5jZSwgdG8geW91ciBzdWJtaXNzaW9uLgo=</field>
</element>

which contains the Base64 (MIME) encoded text of the license.
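Because the `bin` field is ordinary Base64 text, its content can be checked independently of the parser. A minimal sketch using only `java.util.Base64` (the class name `LicenseFieldDecoder` is hypothetical) decodes the first 44 characters of the payload quoted above:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class LicenseFieldDecoder {

    // Decodes the text content of an xoai <field name="bin"> element.
    // Assumes the payload is Base64 and the decoded bytes are UTF-8 text.
    static String decode(String base64) {
        // The MIME decoder ignores line breaks and other characters outside
        // the Base64 alphabet, so a wrapped payload still decodes.
        byte[] bytes = Base64.getMimeDecoder().decode(base64);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // First 44 characters of the license payload from this issue.
        String prefix = "Tk9URTogVGhpcyBpcyB0aGUgZGVmYXVsdCBsaWNlbmNl";
        System.out.println(decode(prefix)); // prints "NOTE: This is the default licence"
    }
}
```

`getMimeDecoder()` is used rather than `getDecoder()` because the strict decoder rejects line breaks, which matters if the payload has been wrapped across lines.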

Version: 4.2.1-SNAPSHOT, cloned from the Git repository today.

Any ideas?

@cmacdonald
Author

cmacdonald commented Jul 31, 2017

Removing that particular <element> makes no difference in a test parse. This xoai was generated by a DSpace repository. I'm puzzled.

@mmalmeida

Could you create a minimal test case with the failing example, using local XML rather than XML retrieved from the site, so that the example is stable?

Could you also confirm whether the XML is valid OAI-PMH?

@cmacdonald
Author

Minimal test case below, inserted inline because GitHub won't accept an XML attachment.

As to its validity, I can confirm it was created by a DSpace instance. Do you have an XOAI validator?

I will also update the issue title.

<?xml version="1.0" encoding="UTF-8"?><?xml-stylesheet type="text/xsl" href="static/style.xsl"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
	<responseDate>2017-07-31T20:10:46Z</responseDate>
	<request verb="GetRecord" identifier="oai:repository.somewhere.ac.uk:10373/1861"
		metadataPrefix="xoai">http://repository.somewhere.ac.uk/oai/request</request>
	<GetRecord>
		<record>
			<header>
				<identifier>oai:repository.somewhere.ac.uk:9999/999</identifier>
				<datestamp>2015-02-03T17:41:40Z</datestamp>
				<setSpec>com_10373_3</setSpec>
				<setSpec>col_10373_12</setSpec>
			</header>
			<metadata>
				<metadata xmlns="http://www.lyncode.com/xoai">
					<element name="dc">
						<!--  either this block -->
						<element name="contributor">
							<element name="author">
								<element name="none">
									<field name="value">Author1, First A.</field>
									<field name="value">Author2, Second</field>
									<field name="value">Author3, Third</field>
								</element>
							</element>
						</element>
						<!--  or this following commented block -->
						<!--  
						<element name="relation">
							<element name="ispartof">
								<element name="en">
									<field name="value">Another article 6(4)</field>
								</element>
							</element>
						</element>
						 -->
					</element>
				</metadata>
			</metadata>
		</record>
	</GetRecord>
</OAI-PMH>

@cmacdonald changed the title from "NoSuchElementException when parsing xoai on <element name="license"><field name="bin">" to "NoSuchElementException when parsing xoai" on Aug 1, 2017
@mwoodiupui
Member

Asking Google for "OAI validator" turns up quite a few hits. The only one I'm at all familiar with is OVAL: http://oval.base-search.net/

@cmacdonald
Author

Thank you for that pointer; I should have checked that too.

I have now checked the "offending" endpoint with http://oval.base-search.net/ and http://validator.oaipmh.com/. The latter reported no errors for ListRecords; the former reported one error, "No incremental harvesting (day granularity) of ListRecords", which I think is irrelevant here.

Output from a third validator can be found at http://oanet.cms.hu-berlin.de/validator/pages/validation_dini_results.xhtml?vid=ZUZaM2FscFM2NEpUY2lncHdZYno2QT09; I don't feel qualified to judge whether any of these results are relevant to the exception at hand.

@cmacdonald
Author

cmacdonald commented Aug 2, 2017

I believe the problem is triggered by <element> tags nested more than two levels deep in the DSpace-generated xoai.
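One way to test this hypothesis on a given record is to measure how deeply the xoai `<element>` tags nest. The sketch below uses only the JDK's StAX API (`javax.xml.stream`); the class name `XoaiDepth` is hypothetical, and it assumes the xoai namespace `http://www.lyncode.com/xoai` used in the test case above. On the contributor/author block of that test case it reports a depth of 4 (dc > contributor > author > none).

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class XoaiDepth {

    // Namespace of the xoai metadata format, as declared in the test case.
    static final String XOAI_NS = "http://www.lyncode.com/xoai";

    // Returns the deepest nesting of xoai <element> tags in the document.
    static int maxElementDepth(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_NAMESPACE_AWARE, Boolean.TRUE);
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        int depth = 0;
        int max = 0;
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (isXoaiElement(reader, event, XMLStreamConstants.START_ELEMENT)) {
                    depth++;
                    max = Math.max(max, depth);
                } else if (isXoaiElement(reader, event, XMLStreamConstants.END_ELEMENT)) {
                    depth--;
                }
            }
        } finally {
            reader.close();
        }
        return max;
    }

    // True if the current event is the wanted kind of tag for an xoai <element>.
    private static boolean isXoaiElement(XMLStreamReader r, int event, int wanted) {
        return event == wanted
                && "element".equals(r.getLocalName())
                && XOAI_NS.equals(r.getNamespaceURI());
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<metadata xmlns=\"http://www.lyncode.com/xoai\">"
                + "<element name=\"dc\"><element name=\"contributor\">"
                + "<element name=\"author\"><element name=\"none\">"
                + "<field name=\"value\">Author1, First A.</field>"
                + "</element></element></element></element></metadata>";
        System.out.println(maxElementDepth(xml)); // prints 4
    }
}
```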

@cmacdonald
Author

The problem lies in the underlying XmlReader, which consumes events even when they are not the ones being requested. After some hacking, the simplest fix I could find was to check in MetadataParser that the end of the document (EOD) had not been reached. If others agree, I can add a test case and make a pull request.

MetadataParser.diff.txt
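The idea of the guard can be illustrated with a plain StAX `XMLEventReader`. This is a sketch, not the contents of MetadataParser.diff.txt: `advanceTo` and `count` are hypothetical helpers that skip ahead to a named start tag, leave the match unconsumed via `peek()`, and return `false` at the end of the document instead of letting `nextEvent()` throw `NoSuchElementException`.

```java
import java.io.StringReader;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

public class GuardedReader {

    // Skips forward to the next start tag with the given local name.
    // peek() inspects each event before it is consumed, so a matching start tag
    // stays in the stream for the caller. Checking hasNext() first means that
    // reaching the end of the document returns false rather than letting
    // nextEvent() throw NoSuchElementException.
    static boolean advanceTo(XMLEventReader reader, String localName)
            throws XMLStreamException {
        while (reader.hasNext()) {
            XMLEvent event = reader.peek();
            if (event.isStartElement()
                    && event.asStartElement().getName().getLocalPart().equals(localName)) {
                return true;
            }
            reader.nextEvent(); // discard a non-matching event
        }
        return false; // end of document reached without a match
    }

    // Counts start tags with the given local name, stopping cleanly at end of input.
    static int count(String xml, String localName) throws XMLStreamException {
        XMLEventReader reader =
                XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
        int n = 0;
        try {
            while (advanceTo(reader, localName)) {
                reader.nextEvent(); // consume the matching start tag
                n++;
            }
        } finally {
            reader.close();
        }
        return n;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<element name=\"none\"><field name=\"value\">a</field>"
                + "<field name=\"value\">b</field></element>";
        System.out.println(count(xml, "field"));   // prints 2
        System.out.println(count(xml, "missing")); // prints 0, no exception
    }
}
```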

@cmacdonald
Author

cmacdonald commented Aug 6, 2017

I had a long plane journey, so I rewrote the traversal code underlying MetadataParser, which has problems parsing the following xoai structures:

  • elements within elements
  • elements nested within elements that also have fields

My revised MetadataParser can be found at cmacdonald@05f67f2

I have tested my own application code against example OAI output from Pure, DSpace and EPrints, and I can turn these into unit tests for xoai-serviceprovider.
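For illustration only, one way a traversal can handle both cases listed above is a recursive descent over StAX events in which each `<element>` may contain any mix of `<field>` children and nested `<element>` children. This is a sketch, not the code in commit 05f67f2; the class name `XoaiFlattener`, the `path[field]=value` output format, and the sample fragment are all hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class XoaiFlattener {

    // Flattens xoai metadata into "path[field]=value" strings,
    // e.g. "dc.contributor.author.none[value]=Author1, First A.".
    static List<String> flatten(String xml) throws XMLStreamException {
        XMLStreamReader r =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        List<String> out = new ArrayList<>();
        try {
            while (r.hasNext()) {
                // Top-level <element>s only; nested ones are consumed by readElement.
                if (r.next() == XMLStreamConstants.START_ELEMENT
                        && "element".equals(r.getLocalName())) {
                    readElement(r, r.getAttributeValue(null, "name"), out);
                }
            }
        } finally {
            r.close();
        }
        return out;
    }

    // Called with the reader on an <element> start tag; returns with it on the
    // matching end tag. Fields and child elements may be mixed in any order.
    private static void readElement(XMLStreamReader r, String path, List<String> out)
            throws XMLStreamException {
        while (r.hasNext()) {
            int event = r.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                String name = r.getAttributeValue(null, "name");
                if ("element".equals(r.getLocalName())) {
                    readElement(r, path + "." + name, out); // nested element
                } else if ("field".equals(r.getLocalName())) {
                    // getElementText() reads the text and leaves the reader on </field>.
                    out.add(path + "[" + name + "]=" + r.getElementText());
                }
            } else if (event == XMLStreamConstants.END_ELEMENT
                    && "element".equals(r.getLocalName())) {
                return; // end tag of this element
            }
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        // Illustrative fragment: "bundle" has both a field and a nested element.
        String xml = "<metadata xmlns=\"http://www.lyncode.com/xoai\">"
                + "<element name=\"bundles\"><element name=\"bundle\">"
                + "<field name=\"name\">ORIGINAL</field>"
                + "<element name=\"bitstreams\"><element name=\"bitstream\">"
                + "<field name=\"name\">paper.pdf</field>"
                + "</element></element></element></element></metadata>";
        flatten(xml).forEach(System.out::println);
        // prints:
        // bundles.bundle[name]=ORIGINAL
        // bundles.bundle.bitstreams.bitstream[name]=paper.pdf
    }
}
```

Recursion keeps the position bookkeeping local: each call owns exactly one `<element>` and returns on its end tag, so a field appearing before, between, or after child elements is handled the same way.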
