Unable to parse documents with un-quoted attribute values #217

Open
dreystone opened this issue Apr 4, 2022 · 2 comments

Comments

@dreystone

I encountered some minified pages in which the meta and link tags in the head had no quotes around their attribute values.

According to the WHATWG HTML specification, unquoted attribute values are a permitted part of HTML5 syntax:

https://html.spec.whatwg.org/multipage/syntax.html#attributes-2

Here is a code example which fails:

<!DOCTYPE html>
<html lang=en-US>
<head>
    <meta charset=utf-8><meta content="IE=edge" http-equiv=X-UA-Compatible>
    <meta content=unsafe-url name=referrer>
    <link href=/images/favicons/favicon--16x16.png rel=icon sizes=16x16 type=image/png>
</head>
<body>
    page contents
</body>
</html>
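
For reference, the unquoted attribute value syntax in the linked specification requires that the value be non-empty and contain no ASCII whitespace and none of the characters `"`, `'`, `=`, `<`, `>`, or a backtick. A minimal sketch of that rule follows. The helper `canBeUnquoted` and the constant `forbiddenInUnquotedValue` are hypothetical names used only for illustration, and are not part of SwiftSoup.

```swift
import Foundation

// Code points the HTML Standard forbids in an unquoted attribute value:
// ASCII whitespace (tab, LF, FF, CR, space), ", ', =, <, >, and `.
let forbiddenInUnquotedValue: Set<Unicode.Scalar> = [
    "\t", "\n", "\u{0C}", "\r", " ", "\"", "'", "=", "<", ">", "`",
]

/// Hypothetical helper (not a SwiftSoup API): returns true if `value`
/// may be written without quotes under the unquoted attribute value syntax.
func canBeUnquoted(_ value: String) -> Bool {
    // The value must be non-empty and contain no forbidden code point.
    return !value.isEmpty
        && !value.unicodeScalars.contains { forbiddenInUnquotedValue.contains($0) }
}

// Values taken from the example document above.
let samples = ["utf-8", "X-UA-Compatible", "/images/favicons/favicon--16x16.png", "IE=edge"]
for value in samples {
    print(value, canBeUnquoted(value))
}
```

Under this rule `IE=edge` is the only value in the example that needs quotes, because it contains `=`, which matches how the minifier left `content="IE=edge"` quoted.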
@scinfu
Owner

scinfu commented Apr 20, 2022

I parsed this HTML without problems.
Can you explain what does not work?

@nikolaykargin

This snippet was correctly parsed with the latest version of the library. Below is the code snippet and output.

import Foundation
import SwiftSoup

var html = """
<!DOCTYPE html>
<html lang=en-US>
<head>
    <meta charset=utf-8><meta content="IE=edge" http-equiv=X-UA-Compatible>
    <meta content=unsafe-url name=referrer>
    <link href=/images/favicons/favicon--16x16.png rel=icon sizes=16x16 type=image/png>
</head>
<body>
    page contents
</body>
</html>
"""

let doc = try SwiftSoup.parse(html)

let metaElements = try doc.select("head *")
for meta in metaElements {
    if let attributes = meta.getAttributes() {
        print(meta.tagName(), attributes.compactMap { "\($0.getKey())=\($0.getValue())" })
    }
}

print(try doc.body()?.text() ?? "–")

Output:

meta ["charset=utf-8"]
meta ["content=IE=edge", "http-equiv=X-UA-Compatible"]
meta ["content=unsafe-url", "name=referrer"]
link ["href=/images/favicons/favicon--16x16.png", "rel=icon", "sizes=16x16", "type=image/png"]
page contents
