Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RDF Patch Binary Reader silently accepts some invalid patch files #2402

Open
rvesse opened this issue Apr 8, 2024 · 4 comments · May be fixed by #2408
Open

RDF Patch Binary Reader silently accepts some invalid patch files #2402

rvesse opened this issue Apr 8, 2024 · 4 comments · May be fixed by #2408
Labels

Comments

@rvesse
Copy link
Member

rvesse commented Apr 8, 2024

Version

5.0.0

What happened?

Consider the following test case:

    protected RDFPatch read(InputStream in) {
        return RDFPatchOps.readBinary(in);
    }

    @Test
    public void junk_01() {
        byte[] junkData = "junk".getBytes(StandardCharsets.UTF_8);
        RDFPatch patch = read(new ByteArrayInputStream(junkData));
        Assert.fail("Malformed binary stream should throw an error");
    }

Since the input is not a patch in RDF Thrift format it would be expected that an error would be thrown. Instead an empty patch is returned and the error is silently ignored.

This bit of the code seems relevant:

  try { row.read(protocol) ; }
  catch (TTransportException e) {
      if ( e.getType() == TTransportException.END_OF_FILE )
          break;
      throw new PatchException("Thrift exception", e);
  }

It seems like it always treats an EOF as an acceptable error, irregardless of whether the input was genuinely at EOF or EOF was simply encountered due to malformed/incomplete data.

Relevant output and stacktrace

No response

Are you interested in making a pull request?

Maybe

@rvesse rvesse added the bug label Apr 8, 2024
@afs
Copy link
Member

afs commented Apr 8, 2024

We can't see the "read". Is the callRDFPatchOps.readBinary?

@afs
Copy link
Member

afs commented Apr 8, 2024

Try "aaaa".

@rvesse
Copy link
Member Author

rvesse commented Apr 9, 2024

We can't see the "read". Is the callRDFPatchOps.readBinary?

Yes, sorry. Was adding test cases to existing Jena test classes and for this bug there happened to be a read() method defined in the test cases already. Added it to the original description for completeness

rvesse added a commit to rvesse/jena that referenced this issue Apr 9, 2024
Adds several test cases, some of which currently fail, that demonstrate
that the RDF Patch Binary reader can silently accept invalid streams
@rvesse rvesse linked a pull request Apr 9, 2024 that will close this issue
4 tasks
@rvesse
Copy link
Member Author

rvesse commented Apr 9, 2024

Try "aaaa".

Interestingly that one does fail - see draft PR #2408 for a few example test cases. Most notably, if I generate a valid binary patch and then arbitrarily truncate it's bytes, it does not error

rvesse added a commit to rvesse/jena that referenced this issue Apr 25, 2024
Adds several test cases, some of which currently fail, that demonstrate
that the RDF Patch Binary reader can silently accept invalid streams
rvesse added a commit to rvesse/jena that referenced this issue Apr 25, 2024
Previously the RDFPatchReaderBinary simply exited its read loop when it
detected an EOF error as Thrift doesn't have a clean way of apriori
detecting whether we've reached the EOF.  The downside of this approach
was that it meant the reader would silently ignore some genuinely
malformed inputs if they happened to be the right bytes for Thrift to
think it had encountered a field it knew about.

To address this a new method is introduced that inspects the
StackTraceElement's associated with the Thrift EOF exception to detect a
couple of cases where the input is clearly malformed and throw an
appropriate error.

Test cases around malformed patch inputs are also expanded to further
test this capability.
rvesse added a commit to rvesse/jena that referenced this issue Apr 25, 2024
Previously the RDFPatchReaderBinary simply exited its read loop when it
detected an EOF error as Thrift doesn't have a clean way of apriori
detecting whether we've reached the EOF.  The downside of this approach
was that it meant the reader would silently ignore some genuinely
malformed inputs if they happened to be the right bytes for Thrift to
think it had encountered a field it knew about.

To address this a new method is introduced that inspects the
StackTraceElement's associated with the Thrift EOF exception to detect a
couple of cases where the input is clearly malformed and throw an
appropriate error.

Test cases around malformed patch inputs are also expanded to further
test this capability.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants