Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unwanted underscores in url property value #4910

Open
thomas-topway-it opened this issue Jan 13, 2021 · 10 comments · May be fixed by #5396
Open

unwanted underscores in url property value #4910

thomas-topway-it opened this issue Jan 13, 2021 · 10 comments · May be fixed by #5396
Labels
bug Occurrence of an unintended or unanticipated behaviour that causes a vulnerability or fatal error

Comments

@thomas-topway-it
Copy link
Contributor

Setup

  • SMW version: 3.1.0
  • MW version: 1.34.0
  • PHP version: 7.3.25 (apache2handler)
  • DB system (MySQL, Blazegraph, etc.) and version: 10.4.17-MariaDB

Issue

Spaces in urls get mistakenly underscored by contrast to what stated here
https://www.semantic-mediawiki.org/wiki/Help:Service_links

Help:Datatype "URL", Help:Datatype "Annotation URI"
$1: URL-encoded value of the URL

by contrast to

Help:Datatype "Page"
$1: URL-encoded article name (no namespace), with underscores replaced by spaces

Steps to reproduce the observation (recommendation is to use the sandbox):

Create a URL data type and then add a link like the following

http://www.opal.unito.it/psixsite/Miscellanea%20di%20testi%20di%20genere%20diverso/Elenco%20opere/imgMIscE4813.pdf

the link will not work anymore because %20 get replaced with '_' . Here you can find a "working" example

https://culturaitaliana.org/wiki/Digital_libraries/Universit%C3%A0_degli_Studi_di_Torino,_Biblioteca_storica_di_ateneo_%22Arturo_Graf%22

field "Digital library digital resource example url"

This might be an unwanted effect of the following commit

#2201

@thomas-topway-it thomas-topway-it added the bug Occurrence of an unintended or unanticipated behaviour that causes a vulnerability or fatal error label Jan 13, 2021
@hexmode
Copy link
Member

hexmode commented Oct 4, 2021

Just saw this myself. Thanks for filing the bug.

@hexmode
Copy link
Member

hexmode commented Oct 14, 2021

Reverting the commit that @thomas-topway-it identified, fixes the problem.

@turnstyyle
Copy link

I am affected by this issue in SMW 4.0.2 (2022-03-24).

@krabina
Copy link
Contributor

krabina commented Jan 27, 2023

@thomas-topway-it has dug deep into this one. However, I think it needs manual testing. @turnstyyle - can you test and tell us if everything is working fine.
Also the PR #5396 needs reviewing. Might be something for @hexmode ?

@turnstyyle
Copy link

The changes in #5396 (8f6a918) fix the biggest problem for me. After the change, clicking a link from a URL property whose value contains a space character goes to the right place.

I noticed one more minor problem that may be related. URL properties whose value contains underscores render on the screen (i.e., when the property is set inline, or in contexts like Special:Browse) with the underscores changed to spaces. The link still goes to the right place (to the URL with underscores). For me, this is OK as long as the links work. This was also the case before #5396, and might be considered a separate issue.

In summary, this is what I see before the changes of #5396:

property value renders as href
https://example.com/one two
https://example.com/one two

https://example.com/one_two
https://example.com/one_two 🆗
https://example.com/one two

https://example.com/one_two

And this is what I see after #5396:

property value renders as href
https://example.com/one two
https://example.com/one two

https://example.com/one%20two
https://example.com/one_two 🆗
https://example.com/one two

https://example.com/one_two

@krabina
Copy link
Contributor

krabina commented Jan 30, 2023

Thank you for testing!

@thomas-topway-it
Copy link
Contributor Author

@turnstyyle so the second url in the second column theoretically should and could be
https://example.com/one_two (with underscore) -- what do you think ? It depends on this code

		if ( $this->m_mode !== SMW_URI_MODE_EMAIL && $linker !== null &&
			// replace underscore with a space only for wiki links
			// either internal or external 
			// in all other cases a solution like the following
			// https://css-tricks.com/snippets/css/prevent-long-urls-from-breaking-out-of-container/
			// shall instead be used
			// @fixme the following method could lead to false-positives !
			Title::newFromURL( $url ) ) {
				$context = str_replace( '_', ' ', $context );
		}

I think the most correct is to remove the underscore also in captions for non-wiki url, as in the example. However currently the code, as you can see in the comment, can lead to "false-positive", I don't know if there are ways to achieve that which don't lead to "false-positive" cases.

@turnstyyle
Copy link

@turnstyyle so the second url in the second column theoretically should and could be https://example.com/one_two (with underscore) -- what do you think ?

Yes, the fact that underscores in external URLs are rendered as spaces is unexpected and (in my opinion) undesirable.

It depends on this code

		if ( $this->m_mode !== SMW_URI_MODE_EMAIL && $linker !== null &&
			// replace underscore with a space only for wiki links
			// either internal or external 
			// in all other cases a solution like the following
			// https://css-tricks.com/snippets/css/prevent-long-urls-from-breaking-out-of-container/
			// shall instead be used
			// @fixme the following method could lead to false-positives !
			Title::newFromURL( $url ) ) {
				$context = str_replace( '_', ' ', $context );
		}

I think the most correct is to remove the underscore also in captions for non-wiki url, as in the example. However currently the code, as you can see in the comment, can lead to "false-positive", I don't know if there are ways to achieve that which don't lead to "false-positive" cases.

I don't understand what this code is trying to do. It looks like it wants to distinguish wiki links from other kinds of URLs? And relies on the Title::newFromURL method to do the distinguishing? I don't see how Title::newFromURL knows the difference between wiki links and other URLs—what's a URL that would be a true negative in that test?

If long URLs disturbing the layout is a concern, maybe CSS word-break will work.

@thomas-topway-it
Copy link
Contributor Author

hello @turnstyyle, yes, we could just remove the code supposed to identify an external wiki since it does not work, perhaps it was not the proper method.
Indeed, css' word-break or other directives should be always used without to change the semantic value of contents.
So this is the push request #5396

and it remains to address this #5396 (comment)

I had partially updated most of the tests, but because this is really tricky (mwjames created a lot of tests, with a content difficult to understand ) I think this should be supervised by at least another key-developer (otherwise a lot of work required to update all tests will be wasted)

@thomas-topway-it
Copy link
Contributor Author

perhaps my comment above contained an error

"I think the most correct is to remove the underscore "

should be

I think the most correct is to keep the underscore

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Occurrence of an unintended or unanticipated behaviour that causes a vulnerability or fatal error
Projects
None yet
4 participants