Ampersand in the XML files #182

Open
MagdalenaZZ opened this issue Jul 7, 2021 · 1 comment

@MagdalenaZZ

On our FTP site there is a Reuters citation index file, e.g.
ftp://ftp.wormbase.org/pub/wormbase/releases/WS280/species/c_elegans/PRJNA13758/annotation/c_elegans.PRJNA13758.WS280.reuters_citation_index.xml.gz

We have been advised that:
"Please note that the XML file contains a few un-encoded ampersand characters, and would recommend that ampersands in text be encoded as &amp;. We've worked around it, but the XML specification says they should not be used as literals."

https://www.w3.org/TR/2004/REC-xml-20040204/REC-xml-20040204.xml
"The ampersand character (&) and the left angle bracket (<) must not appear in their literal form, except when used as markup delimiters, or within a comment, a processing instruction, or a CDATA section. If they are needed elsewhere, they must be escaped using either numeric character references or the strings "&amp;" and "&lt;" respectively. The right angle bracket (>) may be represented using the string "&gt;", and must, for compatibility, be escaped using either "&gt;" or a character reference when it appears in the string "]]>" in content, when that string is not marking the end of a CDATA section."
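
For illustration, here is a minimal Python sketch of the escaping rule quoted above. It is not the code that generates the citation index (this thread does not say which script or language does that); the helper name `make_title_element` and the sample title are made up for the example. It uses `xml.sax.saxutils.escape` from the standard library, which replaces `&`, `<` and `>` in text content.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET


def make_title_element(title: str) -> str:
    """Hypothetical helper: wrap text in a <title> element, escaping &, < and >."""
    return f"<title>{escape(title)}</title>"


# Made-up sample text containing characters that are illegal as literals in XML content.
raw = "Genes & development in C. elegans <review>"
xml_fragment = make_title_element(raw)

# A conforming parser decodes the entities again, so the original text is recovered.
parsed = ET.fromstring(xml_fragment)
print(xml_fragment)
print(parsed.text)
```

Because the parser decodes `&amp;`, `&lt;` and `&gt;` on its own, consumers of the file should not need a separate un-escaping step.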

@veb7vmehra

Hey there @MagdalenaZZ and @sdiamantakis, I have applied for a WormBase project in GSoC, and until the results come out I was thinking of exploring the existing codebase a bit. I came across this issue, which I think I understand and can probably contribute to. As far as I understand, the approach can be simple: use a conditional to encode the ampersands when creating the XML files, and the inverse conditional to decode them when extracting data from the files.
Please confirm whether I am thinking in the right direction. It would also be really helpful if you could point me to the file that generates this XML, so that I can make the changes and create a PR.
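
As a rough sketch of the "encode while creating" half of this idea, assuming the text may already contain some correctly encoded entities that must not be double-escaped: the regular expression below replaces only ampersands that do not start an entity or character reference. The names `BARE_AMP` and `encode_bare_ampersands` are hypothetical, and this is an illustration in Python, not the project's actual code.

```python
import re

# An ampersand that is NOT followed by an entity name or a numeric
# character reference terminated by ';' (e.g. &amp; &#38; &#x26;).
BARE_AMP = re.compile(
    r"&(?!(?:[A-Za-z_][A-Za-z0-9._-]*|#[0-9]+|#x[0-9A-Fa-f]+);)"
)


def encode_bare_ampersands(text: str) -> str:
    """Encode stray '&' as '&amp;' while leaving existing references untouched."""
    return BARE_AMP.sub("&amp;", text)


# Made-up sample mixing bare ampersands with already-encoded references.
sample = "Smith & Jones &amp; Co &#38; R&D"
fixed = encode_bare_ampersands(sample)
print(fixed)
```

Two caveats with this sketch: it treats anything shaped like `&name;` as an intentional entity reference even if that entity is undefined, and it should only be applied to text content, not to CDATA sections or comments, where literal ampersands are allowed.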
