Localization of scikit-image website content. #7296

Open
steppi opened this issue Jan 15, 2024 · 4 comments
Labels
📄 type: Documentation Updates, fixes and additions to documentation 🤖 type: Infrastructure CI, packaging, tools and automation 💬 Discussion


steppi commented Jan 15, 2024

Hi,

I'm working for Quansight Labs, helping set up infrastructure for translating content from the websites of core Scientific Python packages as part of the CZI Scientific Python Community & Communications Infrastructure grant. @jarrodmillman and @stefanv were the first and second authors on the grant proposal, but I'll give an overview for the sake of everyone else reading this.

The goal is to translate the brochure websites of at least 8 of the Scientific Python core projects into at least 3 commonly used languages. The list of them can be found here. By "brochure website", I mean the project website that gives a general overview of the package, as distinct from technical documentation like API references, examples, and tutorials. For scikit-image this is https://scikit-image.org/.

So far, translations have been completed and published for https://numpy.org. I've recently reached out to Pandas (pandas-dev/pandas#56301 (comment)) and scikit-learn (scikit-learn/scikit-learn#28105), and plan to reach out to maintainers of the remaining core projects over the next week. There's a lot of work involved in setting up translation infrastructure, finding and coordinating with qualified translators, and approving and publishing translated content. The hope is that a cross-functional team of Quansight employees together with volunteer translators and reviewers could take on much of the burden, minimizing the effort needed from core project maintainers themselves.

For translation management, we've been using Crowdin Enterprise. Crowdin have generously offered a free supported enterprise organization we can use for managing translations across the different projects. So far the support has been excellent. Crowdin can be synced with a GitHub repo containing content: segmented strings of content are uploaded to Crowdin for translation, and translations are sent back to the repo as commits to a running PR. For numpy.org, Crowdin was synced directly to the repo https://github.com/numpy/numpy.org hosting the website content. Based on things that have come up in the discussions with Pandas and scikit-learn maintainers, it seems it would be better to have a separate repo for managing translated content.

I'm just interested in getting the ball rolling here, and will give more info as things develop over the coming weeks. Here's a summary of the steps I think would be involved:

  1. Set up a repository for managing content that should be translated, with an automated process to get the latest content whenever changes are made. There may be multiple repos where content needs to be taken from. (For scikit-image much of it is in the docs folder from the primary repo, but I think at the least the index is in https://github.com/scikit-image/skimage-web.)

  2. Set up Crowdin integration with this repository. Markdown files can be segmented automatically. For Sphinx .rst files, GNU gettext can be used to generate .po files as described here: https://www.sphinx-doc.org/en/master/usage/advanced/intl.html.

  3. Colleagues from Quansight and/or I will help find and vet interested and qualified translators, and there will hopefully be large overlap between the translators for different projects.

  4. Publish translations on the core project website, with a drop-down selector to choose between languages. How this is done will depend on the static site generator used. For sites using the Scientific Python Hugo theme (thanks @jarrodmillman and @stefanv) like numpy.org, setting this up is almost automatic. I've found that scikit-image is using the pydata-sphinx theme. There, I think the version selector could be used, or code could be copied from it to make a separate language selector.
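
To make step 2 a bit more concrete for a Sphinx-based site, here is a rough sketch of the `conf.py` settings involved in the gettext workflow described in the Sphinx internationalization guide linked above. The `locale/` directory name and the default language are assumptions for illustration, not what scikit-image currently uses.

```python
# Sketch of Sphinx conf.py settings for gettext-based translation.
# The "locale/" directory name and the language code are assumptions.

# Directories (relative to conf.py) where Sphinx looks for translated
# message catalogs, laid out as <locale_dir>/<language>/LC_MESSAGES/.
locale_dirs = ["locale/"]

# Generate one message catalog per source document rather than one per
# top-level directory, so each uploaded file stays small.
gettext_compact = False

# Language to build by default; "en" builds the untranslated source.
language = "en"
```

With settings like these, `sphinx-build -b gettext` writes the .pot templates, and passing `-D language=<code>` to a normal HTML build selects a translated catalog. How the resulting files are synced with Crowdin would still need to be worked out.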

Please let me know if you have any questions, especially those of you who know much more about this than I do and would like to hear more specifics.

Member

lagru commented Jan 15, 2024

Thanks for reaching out @steppi and for the overview!

A few thoughts / comments from my side:

And a few questions:

  • While an (updated) document is being translated, is it practice to publish the source document already?

  • It looks like NumPy moved the translated pages, such as the installation guide, from its main doc to https://numpy.org and under the umbrella of the Hugo theme. I'm curious how their experience with this has been. I worry about uncoupling our installation guide from the rest of the documentation which is versioned.

@lagru lagru added 📄 type: Documentation Updates, fixes and additions to documentation 💬 Discussion 🤖 type: Infrastructure CI, packaging, tools and automation labels Jan 15, 2024
Author

steppi commented Jan 15, 2024

Thanks @lagru.

  • For our main documentation, we are indeed using the pydata-sphinx theme. The theme seems to have its own internationalization support but I'm not sure how mature that is.

That seems helpful. It could simplify the process of generating the .po files containing strings segmented for translation, compared with vanilla Sphinx.

If this was something already planned, it would definitely simplify adding translations. Let me know if there's anything I could do to help out with that.

  • While an (updated) document is being translated, is it practice to publish the source document already?

Yes, this is what we're doing for https://numpy.org, except in a few cases where the translations were added together with the English-language update (for example, the announcement that translations have been added). After a change is merged into main, new strings are uploaded to Crowdin for translation, and translators are notified that there's more work to do. I guess it's not ideal, but having to coordinate with translators before making updates seems like it would add a lot of overhead for maintainers.
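
Since the source can run ahead of the translations, it may help to be able to check how far behind a language is. Below is a minimal sketch, assuming the translated catalogs are available as .po files; `count_untranslated` is a hypothetical helper, not part of Crowdin or Sphinx. It only handles plain `msgid`/`msgstr` pairs (with continuation lines), so plural forms and fuzzy flags are ignored.

```python
def _unquote(token):
    """Strip the surrounding double quotes from a .po string token."""
    token = token.strip()
    return token[1:-1] if len(token) >= 2 else ""


def count_untranslated(po_text):
    """Return (untranslated, total) message counts for .po file text.

    Hypothetical helper: skips the header entry (empty msgid) and does
    not handle plural forms or fuzzy flags.
    """
    entries = []
    field = None  # which field a continuation line belongs to
    for raw in po_text.splitlines():
        line = raw.strip()
        if line.startswith("msgid "):
            entries.append({"msgid": _unquote(line[6:]), "msgstr": ""})
            field = "msgid"
        elif line.startswith("msgstr ") and entries:
            entries[-1]["msgstr"] = _unquote(line[7:])
            field = "msgstr"
        elif line.startswith('"') and field is not None:
            # Continuation of a multi-line msgid or msgstr.
            entries[-1][field] += _unquote(line)
        else:
            field = None
    messages = [e for e in entries if e["msgid"]]
    untranslated = sum(1 for e in messages if not e["msgstr"])
    return untranslated, len(messages)


sample = '''
msgid ""
msgstr ""
"Language: es\\n"

msgid "Install scikit-image"
msgstr "Instalar scikit-image"

msgid "Getting started"
msgstr ""
'''
print(count_untranslated(sample))  # (1, 2)
```

A count like this could, for example, feed a note on a translated page saying that some content is still shown in English.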

  • It looks like NumPy moved the translated pages, such as the installation guide, from its main doc to https://numpy.org and under the umbrella of the Hugo theme. I'm curious how their experience with this has been. I worry about uncoupling our installation guide from the rest of the documentation which is versioned.

I don't think there have been any issues for numpy.org or scipy.org. Someone has to stay on top of things to make sure the website isn't neglected when changes are being made, and although I haven't really been involved in that side of things, my impression is that it isn't too bad. Updating the website needs to be part of the process of producing a new release. At the least, an announcement with release highlights should always be added to the news section, and making any necessary changes to the installation guide should be part of the checklist when updating the website.

Let me know if you have any other questions.

Member

lagru commented Jan 16, 2024

Okay, I am glad then that this will probably not slow down updating documentation directly. If source version and translated version are out of sync, I guess there isn't some mechanism to make this transparent on the website?

Updating the website needs to be part of process of producing a new release.

I am more worried about something else. If we split into Sphinx-generated documentation for which we maintain previous versions but put some documents on a static website without a version switcher, then we lose access to older versions of those documents. E.g. if we moved our user guide there, users wouldn't have easy access to guides for previous versions. This isn't necessarily a blocker for me, but maybe a trade-off in a few cases.

How does NumPy address this? It seems they solve this problem by duplicating certain parts..?

Author

steppi commented Jan 16, 2024

I am more worried about something else. If we split into Sphinx-generated documentation for which we maintain previous versions but put some documents on a static website without a version switcher, then we lose access to older versions of those documents. E.g. if we moved our user guide there, users wouldn't have easy access to guides for previous versions. This isn't necessarily a blocker for me, but maybe a trade-off in a few cases.

How does NumPy address this? It seems they solve this problem by duplicating certain parts..?

Ah, I see your point. I agree there's a trade-off here. There are things which clearly need to be versioned, and these should go in the documentation. Ideally, the brochure website should contain content which is unlikely to change frequently. There is common info between the brochure website and the documentation, but instead of thinking of this as complete duplication, I think it's more that the brochure website should contain broad summaries and the documentation fleshes things out. Specific details are liable to change, but I think the website should just give a general idea, which should be more static.

I think past versions of numpy.org may be interesting for historical reasons, but typically information drops off when it's no longer relevant: for example, links to tutorials which no longer exist, installation info which is out of date, or links to communication channels which no longer exist. For historical research, the Internet Archive seems sufficient: https://web.archive.org/web/20240115000000*/numpy.org. Any information which is tied to specific versions, and will remain relevant for those versions into the future, should go in the documentation though.

In any case, I think this discussion about the website is separate from the translation issue. We can still set up the translation infrastructure for the current website as is, and any heavy lifting I'd need to do, I'll need to do anyway for other projects which will continue to generate their websites with Sphinx and host the code on the primary repo.
