[wip] A command which displays images from the article #5

TiredSounds · 2016-02-29T03:33:28Z

I accidentally hit enter when first creating this!

Please allow me some time to do a proper write up for this, and I'll reopen it.

EDIT: Sorry for the wait, I've been at work. I'll respond to your comments below. I guess the writeup isn't necessary now that there is a dialogue going.

Summary of changes: - Added three functions which implement the image display: `showImages`, `fetchImageTargets`, `fetchImageUrls`. - Added :image command which calls `showImages`. - Modified _query to accept a custom URL kwarg. (necessary for `fetchImageTargets`.) Description: The new :image command calls a program with the fetched images as a list of space separated URLs. By default this is `feh` as it is a lightweight and nifty image viewer. As of this commit, this feature only works on Wikipedia. I may try to extend that, though it may prove difficult as the API is rather lacking in this department, and different wikis seems to use images differently. I'll need to do some more research. The program and arguments (flags etc) can be specified in the config file. For example: [images] program = ristretto arguments = foo, bar, baz This will lead to the following being called upon execution: subprocess.Popen(['ristretto', 'foo', 'bar', 'baz', 'url1', 'url2',...]) TODO: More testing. Try to improve coverage so that we get more relevant images. Implement some kind of error reporting as there is literally zero as of this commit.

`findImageTargets` now looks for another common notation used by Wikipedia for images, which generally takes the form: """image = foobar.ext image1 = barfoo.ext image2=barbaz.ext""" The default arguments given to `feh` have changed: `--scale-down` has replaced `--zoom fill`.

`page.title` now passed as an arg to `fetchImageTargets` and `fetchImageUrls` to make memoization possible.

As opposed to a comma and space separated list. The change is reflected in the man page.

The results of `fetchImageTargets` are now matched with the titles of the images from the current page, and the corresponding URL is used. This is as opposed to matching the targets with the URLs directly. This has increased coverage on certain pages. Supporting changes: - `fetchImageUrls` name changed to `fetchImageInfo`. - `fetchImageInfo` returns a list of 2-tuples with (title, url).

ids1024 · 2016-02-29T03:50:43Z

wikicurses/main.py

+
+    These are used to filter out non-relevant images from the API result.
+    """
+    url = re.sub(r'api.php', 'index.php', wiki.siteurl)


wiki.siteurl is the api url, not the url of the homepage.

Edit: Oops, somehow I misread the substitution as the reverse. But anyway, to support all wikis, the site url should be retrieved through the api instead of doing it this way.

Edit2: Actually, that might be a safe assumption. I'm not sure.

The index.php which is used to query for the raw data is (to my knowledge) always located in the same directory as api.php, but perhaps I'm mistaken?

I don't know if index.php is always at that path. It probably usually is, at least. I'm not really sure.

ids1024 · 2016-02-29T03:59:06Z

This made me think it might be good to add support for w3mimagedisplay to (optionally) display images in the terminal, like w3m or ranger can do. I don't know if that would play well with urwid.

That said, opening in an external viewer like in this PR would be good even if such functionality was available, for various reasons (works better, supports all terminals, not a miserable hack, full screen image viewing, only see images when desired, etc.).

ids1024 · 2016-02-29T04:22:04Z

I think fetchImageTargets should be changed to use the API (something like this, or alternately use a better method for getting the index url) and both it and fetchImageInfo should be moved to the _Article class in wiki.py. That will require changing _Article.__init__ and wiki.search to pass the wiki object as an argument to the article constructor.

But anyway, I think this is a good feature.

ids1024 · 2016-02-29T05:12:42Z

Oh, or there's this API endpoint: https://www.mediawiki.org/wiki/API:Images

TiredSounds · 2016-02-29T22:17:08Z

First of all thanks for the feedback, working in a vacuum I never know if I'm doing something crazy or not. I'm glad you think this is a worthwhile addition.

w3mimagedisplay would be cool but I don't think I can offer much help there :) My experience with urwid is almost zero.
Do you mean that I should use the API to fetch the raw markup? If so that makes sense, I didn't realise there was more than one method to do that.
In terms of using API:Images to fetch the targets, that won't work as it returns all images (relevant or not, for instance the wikipedia logo and other icons etc) on the page. The point of getting the 'targets' is just to look for the common patterns used for relevant images in the markup. In fact, the whole issue with the API being unable to return relevant images is frustrating. I've seen that it's been asked about by various people, and it would have avoided all this filtering lark. But alas.
Moving fetchImageInfo into _Article sounds good. I was unsure where to put these functions as you can see. I'll probably do that in the next commit.

Other stuff:

I'd like to support the "Picture of the day" page somehow. It would seem a bit ironic if it didn't!
Currently I'm using API:imageinfo to get the urls as you can see. None of the guys on #mediawiki knew an alternative for these specific purposes. Unfortunately imageinfo has known issues, one of which is that there is no way to increase the limit on the number of results returned! The iilimit parameter only affects revisions. This limits us to 10 images per page, but I figure 10 images are better than no images so I've done it this way for now. Please let me know if you think of something. I seem to be getting 6-10 images per page at the moment.

I'll be working on this on and off because of work, but hope to keep the commits coming regularly.

ids1024 · 2016-02-29T23:29:34Z

Yes, there is an API endpoint for getting the raw markup. If you can use the html instead, though, it would probably be better, since the client has already retrieved that. I don't know if that work work as well.

Ok, I see the problem with API:Images. It's too bad that they didn't make that function more useful. It actually makes me wonder what it is useful for...

Picture of the Day is slightly more complicated, as it is a feed rather than an article. If you didn't already know, you can open it with wikicurses -f potd, but just can't view the pictures. But anyway, feeds are handled by the _Featured class in wiki.py.

TiredSounds · 2016-03-01T20:18:11Z

I won't be able to use the html as it doesn't retain the patterns from the markup (such as [[Image|File:foo bar.png]]) which we need.

I'm removing the Wikipedia limit for the next commit. How would you prefer me to do this try, except? Shall I wrap the call to showImages? I'm also unsure what exceptions to expect at this point actually, and of course it's best not to catch everything...

lahwaacz · 2016-03-01T20:30:30Z

wikicurses/main.py

+    targets = []
+
+    ## of the form `[[Image:foobar.png]]`
+    for match in re.finditer(r'\b(?:File:|Image:)([^]|\n\r]+)', raw):


Namespaces are case-insensitive and can have localized aliases.

I'm not sure what specifically you're referring to, could you elaborate?

The regex matches only [[File:foobar]], but not [[file:foobar]]. Similar for the Image alias. Non-English wikis usually define other aliases, e.g. Bild in German. See this query:
https://de.wikipedia.org/w/api.php?action=query&meta=siteinfo&siprop=namespacealiases

Ah! I see. I'm still new to MediaWiki in general. Hm, well the case issue is easy, but I don't want to resolve the local aliases issue with another query. I'll have to think about a relatively foolproof solution to that.

The additional query is probably unavoidable, but it only needs to be issued once for the wiki. So make it a method of Wiki with @lru_cache.

TiredSounds · 2016-03-07T20:58:16Z

I'm back from Haskell land :) Got side tracked a bit.

Current TODO (in rough order of priority):

Remove Wikipedia limit, replace with a try; except.
Move fetch functions into _Article class
Look into merging fetch functions together so they use the same query (if the API allows that)
Use API instead of index to fetch the markup
Resolve local aliases for the markup with a cached query
Picture of the Day

pintert3 · 2023-01-30T14:43:23Z

Given the length of time this has had no activity, I wonder if it's still being worked on 7 years later

TiredSounds added 11 commits February 20, 2016 00:21

Add feature outline to TODO

35a79a6

Update TODO

a96cf37

Update TODO

9c5da2a

Hard limit :image command to wikipedia. Add notifications for errors.

60f7f71

Add memoization to image command.

2eb9606

`page.title` now passed as an arg to `fetchImageTargets` and `fetchImageUrls` to make memoization possible.

Add image command/configuration to man pages

d938844

Make args for image command a space separated list.

831e83b

As opposed to a comma and space separated list. The change is reflected in the man page.

Cleanup TODO

c4d4d97

TiredSounds closed this Feb 29, 2016

TiredSounds changed the title ~~[wip~~ [wip] A command which displays images from the article Feb 29, 2016

ids1024 reviewed Feb 29, 2016
View reviewed changes

TiredSounds reopened this Feb 29, 2016

lahwaacz reviewed Mar 1, 2016
View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[wip] A command which displays images from the article #5

[wip] A command which displays images from the article #5

TiredSounds commented Feb 29, 2016

ids1024 Feb 29, 2016

TiredSounds Feb 29, 2016

ids1024 Feb 29, 2016

ids1024 commented Feb 29, 2016

ids1024 commented Feb 29, 2016

ids1024 commented Feb 29, 2016

TiredSounds commented Feb 29, 2016

ids1024 commented Feb 29, 2016

TiredSounds commented Mar 1, 2016

lahwaacz Mar 1, 2016

TiredSounds Mar 1, 2016

lahwaacz Mar 1, 2016

TiredSounds Mar 2, 2016

ids1024 Mar 2, 2016

TiredSounds commented Mar 7, 2016

pintert3 commented Jan 30, 2023

[wip] A command which displays images from the article #5

Are you sure you want to change the base?

[wip] A command which displays images from the article #5

Conversation

TiredSounds commented Feb 29, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ids1024 commented Feb 29, 2016

ids1024 commented Feb 29, 2016

ids1024 commented Feb 29, 2016

TiredSounds commented Feb 29, 2016

ids1024 commented Feb 29, 2016

TiredSounds commented Mar 1, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

TiredSounds commented Mar 7, 2016

pintert3 commented Jan 30, 2023