FAILED test/test_misc/test_parse_file_guess_format.py::TestFileParserGuessFormat::test_warning #2748

Open
ncopa opened this issue Mar 20, 2024 · 2 comments

Comments

@ncopa

ncopa commented Mar 20, 2024

The test/test_misc/test_parse_file_guess_format.py::TestFileParserGuessFormat::test_warning fails on Alpine Linux edge x86_64:

Python 3.11.8

=================================== FAILURES ===================================
____________________ TestFileParserGuessFormat.test_warning ____________________

self = <rdflib.plugins.parsers.notation3.SinkParser object at 0x7fbecb88c910>
argstr = '<?xml version="1.0"?>\n\n<!--\n  Copyright World Wide Web Consortium, (Massachusetts Institute of\n  Technology, Inst...rdf:datatype="http://www.w3.org/2001/XMLSchema#integer" xml:lang="fr">10</eg:baz>\n </rdf:Description>\n\n</rdf:RDF>\n'
i = 330, res = []

    def uri_ref2(self, argstr: str, i: int, res: MutableSequence[Any]) -> int:
        """Generate uri from n3 representation.
    
        Note that the RDF convention of directly concatenating
        NS and local name is now used though I prefer inserting a '#'
        to make the namesapces look more like what XML folks expect.
        """
        qn: typing.List[Any] = []
        j = self.qname(argstr, i, qn)
        if j >= 0:
            pfx, ln = qn[0]
            if pfx is None:
                assert 0, "not used?"
                ns = self._baseURI + ADDED_HASH  # type: ignore[unreachable]
            else:
                try:
>                   ns = self._bindings[pfx]
E                   KeyError: 'Description'

rdflib/plugins/parsers/notation3.py:1232: KeyError

During handling of the above exception, another exception occurred:

self = <Graph identifier=Nb4b72901e98b4f9f86eddbb8ac3005d9 (<class 'rdflib.graph.Graph'>)>
source = <_io.BufferedReader name='/tmp/tmpmcthgvqs/no_file_ext'>
publicID = None, format = 'turtle', location = None, file = None, data = None
args = {}, could_not_guess_format = True
parser = <rdflib.plugins.parsers.notation3.TurtleParser object at 0x7fbec6ac2510>

    def parse(
        self,
        source: Optional[
            Union[IO[bytes], TextIO, InputSource, str, bytes, pathlib.PurePath]
        ] = None,
        publicID: Optional[str] = None,  # noqa: N803
        format: Optional[str] = None,
        location: Optional[str] = None,
        file: Optional[Union[BinaryIO, TextIO]] = None,
        data: Optional[Union[str, bytes]] = None,
        **args: Any,
    ) -> "Graph":
        """
        Parse an RDF source adding the resulting triples to the Graph.
    
        The source is specified using one of source, location, file or data.
    
        .. caution::
    
           This method can access directly or indirectly requested network or
           file resources, for example, when parsing JSON-LD documents with
           ``@context`` directives that point to a network location.
    
           When processing untrusted or potentially malicious documents,
           measures should be taken to restrict network and file access.
    
           For information on available security measures, see the RDFLib
           :doc:`Security Considerations </security_considerations>`
           documentation.
    
        :param source: An `InputSource`, file-like object, `Path` like object,
            or string. In the case of a string the string is the location of the
            source.
        :param location: A string indicating the relative or absolute URL of the
            source. `Graph`'s absolutize method is used if a relative location
            is specified.
        :param file: A file-like object.
        :param data: A string containing the data to be parsed.
        :param format: Used if format can not be determined from source, e.g.
            file extension or Media Type. Defaults to text/turtle. Format
            support can be extended with plugins, but "xml", "n3" (use for
            turtle), "nt" & "trix" are built in.
        :param publicID: the logical URI to use as the document base. If None
            specified the document location is used (at least in the case where
            there is a document location). This is used as the base URI when
            resolving relative URIs in the source document, as defined in `IETF
            RFC 3986
            <https://datatracker.ietf.org/doc/html/rfc3986#section-5.1.4>`_,
            given the source document does not define a base URI.
        :return: ``self``, i.e. the :class:`~rdflib.graph.Graph` instance.
    
        Examples:
    
        >>> my_data = '''
        ... <rdf:RDF
        ...   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        ...   xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
        ... >
        ...   <rdf:Description>
        ...     <rdfs:label>Example</rdfs:label>
        ...     <rdfs:comment>This is really just an example.</rdfs:comment>
        ...   </rdf:Description>
        ... </rdf:RDF>
        ... '''
        >>> import os, tempfile
        >>> fd, file_name = tempfile.mkstemp()
        >>> f = os.fdopen(fd, "w")
        >>> dummy = f.write(my_data)  # Returns num bytes written
        >>> f.close()
    
        >>> g = Graph()
        >>> result = g.parse(data=my_data, format="application/rdf+xml")
        >>> len(g)
        2
    
        >>> g = Graph()
        >>> result = g.parse(location=file_name, format="application/rdf+xml")
        >>> len(g)
        2
    
        >>> g = Graph()
        >>> with open(file_name, "r") as f:
        ...     result = g.parse(f, format="application/rdf+xml")
        >>> len(g)
        2
    
        >>> os.remove(file_name)
    
        >>> # default turtle parsing
        >>> result = g.parse(data="<http://example.com/a> <http://example.com/a> <http://example.com/a> .")
        >>> len(g)
        3
    
        """
    
        source = create_input_source(
            source=source,
            publicID=publicID,
            location=location,
            file=file,
            data=data,
            format=format,
        )
        if format is None:
            format = source.content_type
        could_not_guess_format = False
        if format is None:
            if (
                hasattr(source, "file")
                and getattr(source.file, "name", None)
                and isinstance(source.file.name, str)
            ):
                format = rdflib.util.guess_format(source.file.name)
            if format is None:
                format = "turtle"
                could_not_guess_format = True
        parser = plugin.get(format, Parser)()
        try:
            # TODO FIXME: Parser.parse should have **kwargs argument.
>           parser.parse(source, self, **args)

rdflib/graph.py:1492: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
rdflib/plugins/parsers/notation3.py:2021: in parse
    p.loadStream(stream)
rdflib/plugins/parsers/notation3.py:479: in loadStream
    return self.loadBuf(stream.read())  # Not ideal
rdflib/plugins/parsers/notation3.py:485: in loadBuf
    self.feed(buf)
rdflib/plugins/parsers/notation3.py:511: in feed
    i = self.directiveOrStatement(s, j)
rdflib/plugins/parsers/notation3.py:530: in directiveOrStatement
    j = self.statement(argstr, i)
rdflib/plugins/parsers/notation3.py:778: in statement
    j = self.property_list(argstr, i, r[0])
rdflib/plugins/parsers/notation3.py:1140: in property_list
    i = self.objectList(argstr, j, objs)
rdflib/plugins/parsers/notation3.py:1190: in objectList
    i = self.object(argstr, i, res)
rdflib/plugins/parsers/notation3.py:1487: in object
    j = self.subject(argstr, i, res)
rdflib/plugins/parsers/notation3.py:785: in subject
    return self.item(argstr, i, res)
rdflib/plugins/parsers/notation3.py:877: in item
    return self.path(argstr, i, res)
rdflib/plugins/parsers/notation3.py:884: in path
    j = self.nodeOrLiteral(argstr, i, res)
rdflib/plugins/parsers/notation3.py:1515: in nodeOrLiteral
    j = self.node(argstr, i, res)
rdflib/plugins/parsers/notation3.py:1102: in node
    j = self.uri_ref2(argstr, i, res)
rdflib/plugins/parsers/notation3.py:1240: in uri_ref2
    self.BadSyntax(argstr, i, 'Prefix "%s:" not bound' % (pfx))
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <rdflib.plugins.parsers.notation3.SinkParser object at 0x7fbecb88c910>
argstr = '<?xml version="1.0"?>\n\n<!--\n  Copyright World Wide Web Consortium, (Massachusetts Institute of\n  Technology, Inst...rdf:datatype="http://www.w3.org/2001/XMLSchema#integer" xml:lang="fr">10</eg:baz>\n </rdf:Description>\n\n</rdf:RDF>\n'
i = 330, msg = 'Prefix "Description:" not bound'

    def BadSyntax(self, argstr: str, i: int, msg: str) -> NoReturn:
>       raise BadSyntax(self._thisDoc, self.lines, argstr, i, msg)
E       rdflib.plugins.parsers.notation3.BadSyntax: <no detail available>

rdflib/plugins/parsers/notation3.py:1730: BadSyntax

During handling of the above exception, another exception occurred:

self = <test.test_misc.test_parse_file_guess_format.TestFileParserGuessFormat object at 0x7fbecac09790>

    def test_warning(self) -> None:
        g = Graph()
        graph_logger = logging.getLogger("rdflib")
    
        with TemporaryDirectory() as tmpdirname:
            newpath = Path(tmpdirname).joinpath("no_file_ext")
            copyfile(
                os.path.join(
                    TEST_DATA_DIR,
                    "suites",
                    "w3c",
                    "rdf-xml",
                    "datatypes",
                    "test001.rdf",
                ),
                str(newpath),
            )
            with pytest.raises(ParserError, match=r"Could not guess RDF format"):
                with pytest.warns(
                    UserWarning,
                    match="does not look like a valid URI, trying to serialize this will break.",
                ) as logwarning:
>                   g.parse(str(newpath))

test/test_misc/test_parse_file_guess_format.py:86: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <Graph identifier=Nb4b72901e98b4f9f86eddbb8ac3005d9 (<class 'rdflib.graph.Graph'>)>
source = <_io.BufferedReader name='/tmp/tmpmcthgvqs/no_file_ext'>
publicID = None, format = 'turtle', location = None, file = None, data = None
args = {}, could_not_guess_format = True
parser = <rdflib.plugins.parsers.notation3.TurtleParser object at 0x7fbec6ac2510>

    def parse(
        self,
        source: Optional[
            Union[IO[bytes], TextIO, InputSource, str, bytes, pathlib.PurePath]
        ] = None,
        publicID: Optional[str] = None,  # noqa: N803
        format: Optional[str] = None,
        location: Optional[str] = None,
        file: Optional[Union[BinaryIO, TextIO]] = None,
        data: Optional[Union[str, bytes]] = None,
        **args: Any,
    ) -> "Graph":
        """
        Parse an RDF source adding the resulting triples to the Graph.
    
        The source is specified using one of source, location, file or data.
    
        .. caution::
    
           This method can access directly or indirectly requested network or
           file resources, for example, when parsing JSON-LD documents with
           ``@context`` directives that point to a network location.
    
           When processing untrusted or potentially malicious documents,
           measures should be taken to restrict network and file access.
    
           For information on available security measures, see the RDFLib
           :doc:`Security Considerations </security_considerations>`
           documentation.
    
        :param source: An `InputSource`, file-like object, `Path` like object,
            or string. In the case of a string the string is the location of the
            source.
        :param location: A string indicating the relative or absolute URL of the
            source. `Graph`'s absolutize method is used if a relative location
            is specified.
        :param file: A file-like object.
        :param data: A string containing the data to be parsed.
        :param format: Used if format can not be determined from source, e.g.
            file extension or Media Type. Defaults to text/turtle. Format
            support can be extended with plugins, but "xml", "n3" (use for
            turtle), "nt" & "trix" are built in.
        :param publicID: the logical URI to use as the document base. If None
            specified the document location is used (at least in the case where
            there is a document location). This is used as the base URI when
            resolving relative URIs in the source document, as defined in `IETF
            RFC 3986
            <https://datatracker.ietf.org/doc/html/rfc3986#section-5.1.4>`_,
            given the source document does not define a base URI.
        :return: ``self``, i.e. the :class:`~rdflib.graph.Graph` instance.
    
        Examples:
    
        >>> my_data = '''
        ... <rdf:RDF
        ...   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        ...   xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
        ... >
        ...   <rdf:Description>
        ...     <rdfs:label>Example</rdfs:label>
        ...     <rdfs:comment>This is really just an example.</rdfs:comment>
        ...   </rdf:Description>
        ... </rdf:RDF>
        ... '''
        >>> import os, tempfile
        >>> fd, file_name = tempfile.mkstemp()
        >>> f = os.fdopen(fd, "w")
        >>> dummy = f.write(my_data)  # Returns num bytes written
        >>> f.close()
    
        >>> g = Graph()
        >>> result = g.parse(data=my_data, format="application/rdf+xml")
        >>> len(g)
        2
    
        >>> g = Graph()
        >>> result = g.parse(location=file_name, format="application/rdf+xml")
        >>> len(g)
        2
    
        >>> g = Graph()
        >>> with open(file_name, "r") as f:
        ...     result = g.parse(f, format="application/rdf+xml")
        >>> len(g)
        2
    
        >>> os.remove(file_name)
    
        >>> # default turtle parsing
        >>> result = g.parse(data="<http://example.com/a> <http://example.com/a> <http://example.com/a> .")
        >>> len(g)
        3
    
        """
    
        source = create_input_source(
            source=source,
            publicID=publicID,
            location=location,
            file=file,
            data=data,
            format=format,
        )
        if format is None:
            format = source.content_type
        could_not_guess_format = False
        if format is None:
            if (
                hasattr(source, "file")
                and getattr(source.file, "name", None)
                and isinstance(source.file.name, str)
            ):
                format = rdflib.util.guess_format(source.file.name)
            if format is None:
                format = "turtle"
                could_not_guess_format = True
        parser = plugin.get(format, Parser)()
        try:
            # TODO FIXME: Parser.parse should have **kwargs argument.
            parser.parse(source, self, **args)
        except SyntaxError as se:
            if could_not_guess_format:
>               raise ParserError(
                    "Could not guess RDF format for %r from file extension so tried Turtle but failed."
                    "You can explicitly specify format using the format argument."
                    % source
                )
E               rdflib.exceptions.ParserError: Could not guess RDF format for <_io.BufferedReader name='/tmp/tmpmcthgvqs/no_file_ext'> from file extension so tried Turtle but failed.You can explicitly specify format using the format argument.

rdflib/graph.py:1495: ParserError

During handling of the above exception, another exception occurred:

self = <test.test_misc.test_parse_file_guess_format.TestFileParserGuessFormat object at 0x7fbecac09790>

    def test_warning(self) -> None:
        g = Graph()
        graph_logger = logging.getLogger("rdflib")
    
        with TemporaryDirectory() as tmpdirname:
            newpath = Path(tmpdirname).joinpath("no_file_ext")
            copyfile(
                os.path.join(
                    TEST_DATA_DIR,
                    "suites",
                    "w3c",
                    "rdf-xml",
                    "datatypes",
                    "test001.rdf",
                ),
                str(newpath),
            )
            with pytest.raises(ParserError, match=r"Could not guess RDF format"):
>               with pytest.warns(
                    UserWarning,
                    match="does not look like a valid URI, trying to serialize this will break.",
                ) as logwarning:
E               Failed: DID NOT WARN. No warnings of type (<class 'UserWarning'>,) were emitted.
E                Emitted warnings: [].

test/test_misc/test_parse_file_guess_format.py:82: Failed
------------------------------ Captured log call -------------------------------
2024-03-20T11:05:23.295 WARNING  rdflib.term  term.py:287:__new__ file:///tmp/tmpmcthgvqs/?xml version="1.0"? does not look like a valid URI, trying to serialize this will break.
2024-03-20T11:05:23.295 WARNING  rdflib.term  term.py:287:__new__ !--
  Copyright World Wide Web Consortium, (Massachusetts Institute of
  Technology, Institut National de Recherche en Informatique et en
  Automatique, Keio University).
 
  All Rights Reserved.
 
  Please see the full Copyright clause at
  <http://www.w3.org/Consortium/Legal/copyright-software.html does not look like a valid URI, trying to serialize this will break.
=============================== warnings summary ===============================
test/test_literal/test_literal.py::test_ill_typed_literals[yes-http://www.w3.org/2001/XMLSchema#boolean-True]
  /home/ncopa/aports/community/py3-rdflib/src/rdflib-7.0.0/rdflib/term.py:1719: UserWarning: Parsing weird boolean, 'yes' does not map to True or False
    warnings.warn(

test/test_namespace/test_definednamespace.py::test_inspect[DFNSDefaults]
  /usr/lib/python3.11/inspect.py:2486: UserWarning: Code: _partialmethod is not defined in namespace DFNSDefaults
    partialmethod = obj._partialmethod

test/test_namespace/test_definednamespace.py::test_inspect[DFNSWarnNoFail]
  /usr/lib/python3.11/inspect.py:2486: UserWarning: Code: _partialmethod is not defined in namespace DFNSWarnNoFail
    partialmethod = obj._partialmethod

test/test_namespace/test_definednamespace.py::test_inspect[DFNSDefaultsEmpty]
  /usr/lib/python3.11/inspect.py:2486: UserWarning: Code: _partialmethod is not defined in namespace DFNSDefaultsEmpty
    partialmethod = obj._partialmethod

test/test_namespace/test_namespace.py::TestNamespacePrefix::test_closed_namespace
  /home/ncopa/aports/community/py3-rdflib/src/rdflib-7.0.0/test/test_namespace/test_namespace.py:228: UserWarning: DefinedNamespace does not address deprecated properties
    warn("DefinedNamespace does not address deprecated properties")

test/test_parsers/test_n3parse_of_rdf_lists.py::TestOWLCollectionTest::test_collection_rdfxml
  /home/ncopa/aports/community/py3-rdflib/src/rdflib-7.0.0/rdflib/plugins/serializers/rdfxml.py:280: UserWarning: Assertions on rdflib.term.BNode('N9c925dd1ada149b2a379055264dc7ed7') other than RDF.first and RDF.rest are ignored ... including RDF.List
    self.predicate(predicate, object, depth + 1)

test/test_roundtrip.py: 12 warnings
  /home/ncopa/aports/community/py3-rdflib/src/rdflib-7.0.0/rdflib/term.py:1585: UserWarning: Serializing weird numerical rdflib.term.Literal('xy.z', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#double'))
    warnings.warn("Serializing weird numerical %r" % self)

test/test_roundtrip.py: 12 warnings
  /home/ncopa/aports/community/py3-rdflib/src/rdflib-7.0.0/rdflib/term.py:1585: UserWarning: Serializing weird numerical rdflib.term.Literal('+1.0z', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#double'))
    warnings.warn("Serializing weird numerical %r" % self)

test/test_roundtrip.py: 12 warnings
  /home/ncopa/aports/community/py3-rdflib/src/rdflib-7.0.0/rdflib/term.py:1585: UserWarning: Serializing weird numerical rdflib.term.Literal('ab.c', datatype=rdflib.term.URIRef('http://www.w3.org/2001/XMLSchema#double'))
    warnings.warn("Serializing weird numerical %r" % self)

test/test_serializers/test_serializer.py: 10 warnings
test/test_tools/test_chunk_serializer.py: 4 warnings
  /home/ncopa/aports/community/py3-rdflib/src/rdflib-7.0.0/rdflib/plugins/serializers/nt.py:40: UserWarning: NTSerializer always uses UTF-8 encoding. Given encoding was: None
    warnings.warn(

test/test_util.py::TestUtilTermConvert::test_util_from_n3_expectliteralandlangdtype
  /usr/lib/python3.11/site-packages/_pytest/python.py:194: UserWarning: Code: fr is not defined in namespace XSD
    result = testfunction(**testargs)

test/test_util.py::TestUtilTermConvert::test_util_from_n3_not_escapes[\\I]
  /home/ncopa/aports/community/py3-rdflib/src/rdflib-7.0.0/rdflib/util.py:213: DeprecationWarning: invalid escape sequence '\I'
    value = value.encode("raw-unicode-escape").decode("unicode-escape")

test/test_w3c_spec/test_sparql10_w3c.py: 20 warnings
test/test_w3c_spec/test_sparql11_w3c.py: 50 warnings
  /home/ncopa/aports/community/py3-rdflib/src/rdflib-7.0.0/rdflib/term.py:1161: DeprecationWarning: NotImplemented should not be used in a boolean context
    return not self.__gt__(other) and not self.eq(other)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED test/test_misc/test_parse_file_guess_format.py::TestFileParserGuessFormat::test_warning - Failed: DID NOT WARN. No warnings of type (<class 'UserWarning'>,) were emi...
= 1 failed, 7277 passed, 59 skipped, 370 xfailed, 128 warnings in 107.85s (0:01:47) =
@edmondchuc
Contributor

Can you check what pytest version you are running? I think this may be related to #2727 if you are on pytest 8.

@ncopa
Author

ncopa commented Mar 20, 2024

This is the latest rdflib release, 7.0.0.

Yes, I believe it is definitely related to #2727 and pytest 8.
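
For anyone reading the traceback above: the "Captured log call" section shows the "does not look like a valid URI" message arriving through the `rdflib.term` logger, while `pytest.warns` reports `Emitted warnings: []`. That suggests the message is a log record rather than a `warnings.warn()` call, so a warnings-based check has nothing to match. Below is a minimal sketch of that distinction using only the standard library. The names `make_term`, `ListHandler` and the logger name `example.term` are hypothetical stand-ins for illustration, not rdflib code.

```python
import logging
import warnings

# Hypothetical stand-in for the logger seen in the captured log output.
logger = logging.getLogger("example.term")


class ListHandler(logging.Handler):
    """Collect rendered log messages in a list so they can be inspected."""

    def __init__(self) -> None:
        super().__init__()
        self.messages = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


def make_term(value: str) -> str:
    # Assumption: the problem is reported via logging, not warnings.warn().
    logger.warning(
        "%s does not look like a valid URI, trying to serialize this will break.",
        value,
    )
    return value


handler = ListHandler()
logger.addHandler(handler)
logger.propagate = False  # keep the message out of the root logger's output

# Record every warning raised through the warnings module inside this block.
with warnings.catch_warnings(record=True) as recorded:
    warnings.simplefilter("always")
    make_term("?xml version")

logger.removeHandler(handler)

print(len(recorded))     # number of warnings seen by the warnings machinery
print(handler.messages)  # messages seen by the logging machinery
```

In a pytest test, the equivalent check on log records would normally go through the built-in `caplog` fixture instead of `pytest.warns`. Whether that matches the fix discussed in #2727 is not something this thread confirms.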
