Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[wip] A command which displays images from the article #5

Open
wants to merge 11 commits into
base: master
Choose a base branch
from

Conversation

TiredSounds
Copy link
Contributor

I accidentally hit enter when first creating this!

Please allow me some time to do a proper write up for this, and I'll reopen it.

EDIT: Sorry for the wait, I've been at work. I'll respond to your comments below. I guess the writeup isn't necessary now that there is a dialogue going.

TiredSounds added 11 commits February 20, 2016 00:21
Summary of changes:

- Added three functions which implement the image display:
  `showImages`, `fetchImageTargets`, `fetchImageUrls`.
- Added :image command which calls `showImages`.
- Modified _query to accept a custom URL kwarg.
  (necessary for `fetchImageTargets`.)

Description:

The new :image command calls a program with the fetched images as a list
of space separated URLs. By default this is `feh` as it is a lightweight
and nifty image viewer.

As of this commit, this feature only works on Wikipedia. I may try to
extend that, though it may prove difficult as the API is rather lacking
in this department, and different wikis seems to use images differently.
I'll need to do some more research.

The program and arguments (flags etc) can be specified in the config file.
For example:

[images]
program = ristretto
arguments = foo, bar, baz

This will lead to the following being called upon execution:

subprocess.Popen(['ristretto', 'foo', 'bar', 'baz', 'url1', 'url2',...])

TODO:

More testing. Try to improve coverage so that we get more relevant
images. Implement some kind of error reporting as there is literally
zero as of this commit.
`findImageTargets` now looks for another common notation
used by Wikipedia for images, which generally takes the form:

"""image    = foobar.ext
image1 = barfoo.ext
image2=barbaz.ext"""

The default arguments given to `feh` have changed:
`--scale-down` has replaced `--zoom fill`.
`page.title` now passed as an arg to `fetchImageTargets` and
`fetchImageUrls` to make memoization possible.
As opposed to a comma and space separated list. The change is reflected
in the man page.
The results of `fetchImageTargets` are now matched with the titles of
the images from the current page, and the corresponding URL is used.
This is as opposed to matching the targets with the URLs directly.

This has increased coverage on certain pages.

Supporting changes:

- `fetchImageUrls` name changed to `fetchImageInfo`.
- `fetchImageInfo` returns a list of 2-tuples with (title, url).
@TiredSounds TiredSounds changed the title [wip [wip] A command which displays images from the article Feb 29, 2016

These are used to filter out non-relevant images from the API result.
"""
url = re.sub(r'api.php', 'index.php', wiki.siteurl)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wiki.siteurl is the api url, not the url of the homepage.

Edit: Oops, somehow I misread the substitution as the reverse. But anyway, to support all wikis, the site url should be retrieved through the api instead of doing it this way.

Edit2: Actually, that might be a safe assumption. I'm not sure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The index.php which is used to query for the raw data is (to my knowledge) always located in the same directory as api.php, but perhaps I'm mistaken?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if index.php is always at that path. It probably usually is, at least. I'm not really sure.

@ids1024
Copy link
Owner

ids1024 commented Feb 29, 2016

This made me think it might be good to add support for w3mimagedisplay to (optionally) display images in the terminal, like w3m or ranger can do. I don't know if that would play well with urwid.

That said, opening in an external viewer like in this PR would be good even if such functionality was available, for various reasons (works better, supports all terminals, not a miserable hack, full screen image viewing, only see images when desired, etc.).

@ids1024
Copy link
Owner

ids1024 commented Feb 29, 2016

I think fetchImageTargets should be changed to use the API (something like this, or alternately use a better method for getting the index url) and both it and fetchImageInfo should be moved to the _Article class in wiki.py. That will require changing _Article.__init__ and wiki.search to pass the wiki object as an argument to the article constructor.

But anyway, I think this is a good feature.

@ids1024
Copy link
Owner

ids1024 commented Feb 29, 2016

Oh, or there's this API endpoint: https://www.mediawiki.org/wiki/API:Images

@TiredSounds
Copy link
Contributor Author

First of all thanks for the feedback, working in a vacuum I never know if I'm doing something crazy or not. I'm glad you think this is a worthwhile addition.

  • w3mimagedisplay would be cool but I don't think I can offer much help there :) My experience with urwid is almost zero.
  • Do you mean that I should use the API to fetch the raw markup? If so that makes sense, I didn't realise there was more than one method to do that.
  • In terms of using API:Images to fetch the targets, that won't work as it returns all images (relevant or not, for instance the wikipedia logo and other icons etc) on the page. The point of getting the 'targets' is just to look for the common patterns used for relevant images in the markup. In fact, the whole issue with the API being unable to return relevant images is frustrating. I've seen that it's been asked about by various people, and it would have avoided all this filtering lark. But alas.
  • Moving fetchImageInfo into _Article sounds good. I was unsure where to put these functions as you can see. I'll probably do that in the next commit.

Other stuff:

  • I'd like to support the "Picture of the day" page somehow. It would seem a bit ironic if it didn't!
  • Currently I'm using API:imageinfo to get the urls as you can see. None of the guys on #mediawiki knew an alternative for these specific purposes. Unfortunately imageinfo has known issues, one of which is that there is no way to increase the limit on the number of results returned! The iilimit parameter only affects revisions. This limits us to 10 images per page, but I figure 10 images are better than no images so I've done it this way for now. Please let me know if you think of something. I seem to be getting 6-10 images per page at the moment.

I'll be working on this on and off because of work, but hope to keep the commits coming regularly.

@TiredSounds TiredSounds reopened this Feb 29, 2016
@ids1024
Copy link
Owner

ids1024 commented Feb 29, 2016

Yes, there is an API endpoint for getting the raw markup. If you can use the html instead, though, it would probably be better, since the client has already retrieved that. I don't know if that work work as well.

Ok, I see the problem with API:Images. It's too bad that they didn't make that function more useful. It actually makes me wonder what it is useful for...

Picture of the Day is slightly more complicated, as it is a feed rather than an article. If you didn't already know, you can open it with wikicurses -f potd, but just can't view the pictures. But anyway, feeds are handled by the _Featured class in wiki.py.

@TiredSounds
Copy link
Contributor Author

I won't be able to use the html as it doesn't retain the patterns from the markup (such as [[Image|File:foo bar.png]]) which we need.

I'm removing the Wikipedia limit for the next commit. How would you prefer me to do this try, except? Shall I wrap the call to showImages? I'm also unsure what exceptions to expect at this point actually, and of course it's best not to catch everything...

targets = []

## of the form `[[Image:foobar.png]]`
for match in re.finditer(r'\b(?:File:|Image:)([^]|\n\r]+)', raw):
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Namespaces are case-insensitive and can have localized aliases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what specifically you're referring to, could you elaborate?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The regex matches only [[File:foobar]], but not [[file:foobar]]. Similar for the Image alias. Non-English wikis usually define other aliases, e.g. Bild in German. See this query:
https://de.wikipedia.org/w/api.php?action=query&meta=siteinfo&siprop=namespacealiases

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah! I see. I'm still new to MediaWiki in general. Hm, well the case issue is easy, but I don't want to resolve the local aliases issue with another query. I'll have to think about a relatively foolproof solution to that.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The additional query is probably unavoidable, but it only needs to be issued once for the wiki. So make it a method of Wiki with @lru_cache.

@TiredSounds
Copy link
Contributor Author

I'm back from Haskell land :) Got side tracked a bit.

Current TODO (in rough order of priority):

  • Remove Wikipedia limit, replace with a try; except.
  • Move fetch functions into _Article class
  • Look into merging fetch functions together so they use the same query (if the API allows that)
  • Use API instead of index to fetch the markup
  • Resolve local aliases for the markup with a cached query
  • Picture of the Day

@pintert3
Copy link

Given the length of time this has had no activity, I wonder if it's still being worked on 7 years later

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants