[wip] A command which displays images from the article #5
base: master
Summary of changes:
- Added three functions which implement the image display: `showImages`, `fetchImageTargets`, `fetchImageUrls`.
- Added an `:image` command which calls `showImages`.
- Modified `_query` to accept a custom URL kwarg (necessary for `fetchImageTargets`).

Description:
The new `:image` command calls a program with the fetched images as a list of space-separated URLs. By default this is `feh`, as it is a lightweight and nifty image viewer.

As of this commit, this feature only works on Wikipedia. I may try to extend that, though it may prove difficult: the API is rather lacking in this department, and different wikis seem to use images differently. I'll need to do some more research.

The program and arguments (flags etc.) can be specified in the config file. For example:

    [images]
    program = ristretto
    arguments = foo, bar, baz

This will lead to the following being called upon execution:

    subprocess.Popen(['ristretto', 'foo', 'bar', 'baz', 'url1', 'url2', ...])

TODO:
- More testing.
- Try to improve coverage so that we get more relevant images.
- Implement some kind of error reporting, as there is literally none as of this commit.
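A minimal sketch of how the `[images]` section above could be turned into the `Popen` argument list, assuming Python's `configparser` and the comma-separated `arguments` format shown in the example. `build_command`, `show_images` and the default values are hypothetical names for illustration, not the PR's actual code; the `FileNotFoundError` branch is just one possible shape for the missing error reporting listed in the TODO.

```python
import configparser
import subprocess

# Assumed defaults: the description names `feh` as the default program,
# and a later commit makes `--scale-down` its default argument.
DEFAULT_PROGRAM = "feh"
DEFAULT_ARGUMENTS = "--scale-down"


def build_command(config, urls):
    """Return the argv list: program, configured arguments, then the URLs.

    `arguments` is treated as a comma-separated list, as in the example.
    """
    section = config["images"] if config.has_section("images") else {}
    program = section.get("program", DEFAULT_PROGRAM)
    raw_args = section.get("arguments", DEFAULT_ARGUMENTS)
    args = [a.strip() for a in raw_args.split(",") if a.strip()]
    return [program, *args, *urls]


def show_images(config, urls):
    """Launch the viewer without blocking; report a missing program."""
    cmd = build_command(config, urls)
    try:
        subprocess.Popen(cmd)
    except FileNotFoundError:
        # Hypothetical error reporting: the PR currently has none.
        print(f"Image viewer not found: {cmd[0]}")
        return False
    return True
```

Keeping `build_command` separate from the `Popen` call makes the argument handling testable without actually starting a viewer.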
`findImageTargets` now looks for another common notation used by Wikipedia for images, which generally takes the form:

    image = foobar.ext
    image1 = barfoo.ext
    image2=barbaz.ext

The default arguments given to `feh` have changed: `--scale-down` has replaced `--zoom fill`.
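One way to pick up the `image = foobar.ext` notation is a line-anchored regex over the raw wikitext. This is a sketch under assumptions: that such parameters sit one per line inside an infobox, optionally preceded by `|`, and that `infobox_image_targets` is a hypothetical helper rather than the PR's own function.

```python
import re

# Matches lines such as "| image = foobar.ext", "image1 = barfoo.ext" or
# "|image2=barbaz.ext". [ \t] is used instead of \s so a match never
# crosses a line break; empty values and keys like "image_caption" are skipped.
INFOBOX_IMAGE_RE = re.compile(
    r"^[ \t]*\|?[ \t]*image\d*[ \t]*=[ \t]*([^|\s][^|\n\r]*?)[ \t]*$",
    re.MULTILINE | re.IGNORECASE,
)


def infobox_image_targets(raw):
    """Return the file names given as infobox `imageN = ...` values."""
    return [m.group(1) for m in INFOBOX_IMAGE_RE.finditer(raw)]
```

Values written as full links (`image = [[File:Foo.jpg|200px]]`) are deliberately not matched here, since the `File:`/`Image:` pattern already covers them.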
`page.title` is now passed as an argument to `fetchImageTargets` and `fetchImageUrls` to make memoization possible.
As opposed to a comma and space separated list. The change is reflected in the man page.
The results of `fetchImageTargets` are now matched with the titles of the images from the current page, and the corresponding URL is used, as opposed to matching the targets with the URLs directly. This has increased coverage on certain pages.

Supporting changes:
- `fetchImageUrls` renamed to `fetchImageInfo`.
- `fetchImageInfo` returns a list of 2-tuples `(title, url)`.
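Matching targets against titles mostly comes down to normalizing both sides the same way. A sketch, assuming `(title, url)` pairs shaped like `fetchImageInfo`'s result and the usual MediaWiki rules that underscores equal spaces and the first letter is case-insensitive (true on wikis with first-letter capitalization enabled, which is the common default). `normalize_title` and `match_targets` are hypothetical names.

```python
def normalize_title(name):
    """Underscores become spaces, outer whitespace is dropped, and the
    first letter is upper-cased (assumed first-letter case-insensitivity)."""
    name = name.replace("_", " ").strip()
    return name[:1].upper() + name[1:]


def match_targets(targets, image_info):
    """Return the URLs of `targets`, looked up in (title, url) pairs.

    API titles carry a namespace prefix ("File:Foo.jpg"), so everything up
    to the first colon is stripped before comparing.
    """
    by_name = {}
    for title, url in image_info:
        name = title.split(":", 1)[-1]
        by_name[normalize_title(name)] = url
    urls = []
    for target in targets:
        url = by_name.get(normalize_title(target))
        if url is not None and url not in urls:  # keep order, drop duplicates
            urls.append(url)
    return urls
```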
These are used to filter out non-relevant images from the API result.
"""
url = re.sub(r'api.php', 'index.php', wiki.siteurl)
`wiki.siteurl` is the API URL, not the URL of the homepage.
Edit: Oops, somehow I misread the substitution as the reverse. But anyway, to support all wikis, the site URL should be retrieved through the API instead of doing it this way.
Edit2: Actually, that might be a safe assumption. I'm not sure.
The index.php which is used to query for the raw data is (to my knowledge) always located in the same directory as api.php, but perhaps I'm mistaken?
I don't know if `index.php` is always at that path. It probably usually is, at least. I'm not really sure.
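Two ways the `index.php` URL could be derived, sketched under assumptions. The first keeps the PR's "index.php sits next to api.php" assumption but escapes the dot and anchors the pattern at the end (in `r'api.php'` the `.` matches any character). The second follows the suggestion to ask the API, assuming the `server` and `script` fields of `action=query&meta=siteinfo&siprop=general`; both function names are hypothetical.

```python
import re


def index_url_from_api_url(api_url):
    """Swap a trailing "api.php" for "index.php" (same-directory assumption)."""
    return re.sub(r"api\.php$", "index.php", api_url)


def index_url_from_siteinfo(general, scheme="https"):
    """Build the index.php URL from the siteinfo `general` block.

    `server` may be protocol-relative (e.g. "//en.wikipedia.org"), in which
    case `scheme` is prepended; `script` is assumed to be a path such as
    "/w/index.php".
    """
    server = general["server"]
    if server.startswith("//"):
        server = f"{scheme}:{server}"
    return server + general["script"]
```

The siteinfo variant costs one extra request, but, like the namespace aliases discussed further down, it only needs to be made once per wiki.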
This made me think it might be good to add support for That said, opening in an external viewer like in this PR would be good even if such functionality was available, for various reasons (works better, supports all terminals, not a miserable hack, full screen image viewing, only see images when desired, etc.).
I think But anyway, I think this is a good feature.
Oh, or there's this API endpoint: https://www.mediawiki.org/wiki/API:Images
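If that endpoint were used, `generator=images` can feed a page's images straight into `prop=imageinfo`, so titles and URLs come back in one request. A sketch assuming the default (`formatversion=1`) JSON layout, where `query.pages` is a dict keyed by page id; `image_info_params` and `parse_image_info` are hypothetical helpers that return the same `(title, url)` shape as `fetchImageInfo`.

```python
from urllib.parse import urlencode


def image_info_params(page_title):
    """API parameters listing the images used on `page_title` with their URLs."""
    return {
        "action": "query",
        "format": "json",
        "titles": page_title,
        "generator": "images",  # the images of `titles` become the result pages
        "gimlimit": "max",      # "gim" = generator prefix for list=images
        "prop": "imageinfo",
        "iiprop": "url",
    }


def parse_image_info(response):
    """Turn the JSON response into sorted (title, url) pairs.

    Files without `imageinfo` (e.g. missing files) are skipped.
    """
    pages = response.get("query", {}).get("pages", {})
    pairs = []
    for page in pages.values():
        info = page.get("imageinfo")
        if info:
            pairs.append((page["title"], info[0]["url"]))
    return sorted(pairs)


# Usage: api_url + "?" + urlencode(image_info_params("Some article"))
```

Very image-heavy pages may still need the API's `continue` parameters, which this sketch ignores.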
First of all, thanks for the feedback; working in a vacuum, I never know if I'm doing something crazy or not. I'm glad you think this is a worthwhile addition.
Other stuff:
I'll be working on this on and off because of work, but hope to keep the commits coming regularly.
Yes, there is an API endpoint for getting the raw markup. If you can use the HTML instead, though, it would probably be better, since the client has already retrieved that. I don't know if that would work as well.

Ok, I see the problem with Picture of the Day is slightly more complicated, as it is a feed rather than an article. If you didn't already know, you can open it with
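For comparison, a sketch of the two routes to the raw markup: the `index.php?action=raw` request this PR relies on, and the API route via `action=parse` with `prop=wikitext`. The helper names are hypothetical, and `extract_wikitext` assumes the response layouts of `formatversion=1` (text wrapped as `{"*": ...}`) and `formatversion=2` (plain string).

```python
from urllib.parse import urlencode


def raw_url_via_index(index_url, title):
    """index.php?action=raw returns the page's wikitext as plain text."""
    return index_url + "?" + urlencode({"title": title, "action": "raw"})


def wikitext_params(title):
    """API alternative: action=parse with prop=wikitext."""
    return {
        "action": "parse",
        "page": title,
        "prop": "wikitext",
        "format": "json",
        "formatversion": "2",
    }


def extract_wikitext(response):
    """Pull the wikitext out of an action=parse JSON response."""
    wikitext = response["parse"]["wikitext"]
    # formatversion=1 wraps the text as {"*": "..."}; version 2 uses a string.
    return wikitext["*"] if isinstance(wikitext, dict) else wikitext
```

The API route goes through `api.php`, so it would sidestep the question of where `index.php` lives.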
I won't be able to use the HTML, as it doesn't retain the patterns from the markup (such as I'm removing the Wikipedia limit for the next commit. How would you prefer me to do this: try/except? Shall I wrap the call to
targets = []

## of the form `[[Image:foobar.png]]`
for match in re.finditer(r'\b(?:File:|Image:)([^]|\n\r]+)', raw): |
Namespaces are case-insensitive and can have localized aliases.
I'm not sure what specifically you're referring to, could you elaborate?
The regex matches only `[[File:foobar]]`, but not `[[file:foobar]]`. Similarly for the `Image` alias. Non-English wikis usually define other aliases, e.g. `Bild` in German. See this query:
https://de.wikipedia.org/w/api.php?action=query&meta=siteinfo&siprop=namespacealiases
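Once the namespace names are known, the case issue and the alias issue can both be handled by building the alternation from those names and compiling with `re.IGNORECASE`. A sketch that keeps the shape of the PR's pattern; `file_link_regex` is a hypothetical helper, and the example names (`File`, `Image`, `Datei`, `Bild`) are assumed to come from the siteinfo query.

```python
import re


def file_link_regex(namespace_names):
    """Case-insensitive version of the `\\b(?:File:|Image:)...` pattern.

    `namespace_names` should hold every name of the file namespace
    (canonical, localized and aliases), e.g. ("File", "Image", "Datei", "Bild").
    """
    alternatives = "|".join(re.escape(name) for name in namespace_names)
    # Capture everything up to "]", "|" or a line break, as the PR's regex does.
    return re.compile(r"\b(?:%s)\s*:\s*([^]|\n\r]+)" % alternatives, re.IGNORECASE)
```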
Ah! I see. I'm still new to MediaWiki in general. Hm, well the case issue is easy, but I don't want to resolve the local aliases issue with another query. I'll have to think about a relatively foolproof solution to that.
The additional query is probably unavoidable, but it only needs to be issued once per wiki. So make it a method of `Wiki` with `@lru_cache`.
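A sketch of that suggestion: a cached method that fetches the file namespace's names once per `Wiki` object. The `Wiki` class here is a hypothetical stand-in, `query` is an assumed helper that takes API parameters and returns parsed JSON, and the response layout (`namespaces` keyed by id as a string, aliases as `{"id": ..., "*": ...}`) assumes the default `formatversion=1`.

```python
from functools import lru_cache

FILE_NAMESPACE = 6  # id of the File namespace in MediaWiki


class Wiki:
    """Hypothetical stand-in for the project's Wiki class."""

    def __init__(self, query):
        # `query(params) -> dict` is an assumed API helper.
        self._query = query

    @lru_cache(maxsize=None)
    def file_namespace_names(self):
        """All names of the file namespace; the API is hit once per instance."""
        data = self._query({
            "action": "query",
            "meta": "siteinfo",
            "siprop": "namespaces|namespacealiases",
            "format": "json",
        })["query"]
        ns = data["namespaces"][str(FILE_NAMESPACE)]
        # Localized name ("*") plus the canonical English one.
        names = {ns["*"], ns.get("canonical", ns["*"])}
        names.update(alias["*"] for alias in data["namespacealiases"]
                     if alias["id"] == FILE_NAMESPACE)
        return tuple(sorted(names))
```

`lru_cache` on a method keys the cache on `self` and keeps the instance alive, which is harmless for a long-lived `Wiki` object.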
I'm back from Haskell land :) Got sidetracked a bit. Current TODO (in rough order of priority):
Given how long this has gone without activity, I wonder if it's still being worked on 7 years later.
I accidentally hit enter when first creating this!
Please allow me some time to do a proper write up for this, and I'll reopen it.
EDIT: Sorry for the wait, I've been at work. I'll respond to your comments below. I guess the writeup isn't necessary now that there is a dialogue going.