Switch skycultures to the new format #3751

Open
wants to merge 21 commits into
base: master

Conversation

10110111
Contributor

This set of commits switches Stellarium to the new format of sky cultures used in the stellarium-skycultures repo.

The old format is no longer supported, but a tool is provided (util/skyculture-converter) that helps convert an old culture to the new one (with limited support for conversion of the description, mostly retaining HTML and only changing the heading structure to more or less follow the spec of the new format).

The sky cultures from the sky cultures repo are imported using a script, skycultures/update-skycultures.py.

Among the structural changes to this repo are:

  • skycultures/common_dso_names.fab and skycultures/common_star_names.fab now contain the common names that used to reside in modern_iau culture.
  • po/stellarium-skycultures now keeps translations of culture-specific names, while the common names are translated in po/stellarium-sky.
  • Sky culture descriptions are now translated using the files inside po/stellarium-skycultures-descriptions.
  • No localized description files exist any more; the translations from the single English source happen on the fly, with one .po entry per section.
  • Sky cultures aren't supposed to be translated via the old Transifex entries. They are supposed to be handled by the entries working with the sky cultures repo. AFAICT, this hasn't been done yet. All the translations that currently exist in that repo were done by Google Translate and weren't edited by hand, with a few exceptions.
    • Something may have to be done with this before merging this PR.
  • The "Modern" sky cultures that existed in Stellarium are called "Western" in that repo, and I didn't change the name when I imported them. One exception is the simple modern culture that I converted to the new format and pushed into that repo, for compatibility with the Stellarium default.
    • Something may have to be done with this before merging this PR. Maybe we could rename the cultures on import, but I'm afraid that the description may also need to be changed, and this implies changes in translations.
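The "one .po entry per section" idea can be illustrated with a minimal sketch. It assumes the English description is Markdown and that each heading starts a new section whose body becomes one msgid. The actual splitting rules in Stellarium may differ, and `split_sections`, `po_escape` and `to_po_entries` are hypothetical helper names, not functions from the code base.

```python
import re

def split_sections(markdown_text):
    """Split a Markdown description into (heading, body) pairs at heading lines."""
    sections = []
    heading = None
    body_lines = []
    for line in markdown_text.splitlines():
        match = re.match(r"^#{1,6}[ \t]+(.*)$", line)
        if match:
            if heading is not None:
                sections.append((heading, "\n".join(body_lines).strip()))
            heading = match.group(1).strip()
            body_lines = []
        elif heading is not None:
            body_lines.append(line)
    if heading is not None:
        sections.append((heading, "\n".join(body_lines).strip()))
    return sections

def po_escape(text):
    """Escape backslashes, double quotes and newlines for a PO string."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")

def to_po_entries(culture_id, sections):
    """Build one untranslated PO entry per non-empty section (assumed layout)."""
    entries = []
    for heading, body in sections:
        if not body:
            continue  # a heading with no text yields nothing to translate
        entries.append(
            f'#. {culture_id}: section "{heading}"\n'
            f'msgid "{po_escape(body)}"\n'
            'msgstr ""\n'
        )
    return entries
```

With this layout, editing one section of the English source invalidates only that section's translation rather than the whole description.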

The command used was:
for po in po/stellarium-sky/*.po; do msgmerge -o "$po.new" "$po" \
 po/stellarium-sky/stellarium-sky.pot && mv -v "$po.new" "$po"; done
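For anyone who prefers to run this from a script, the same loop can be sketched in Python. This is only an illustration of the shell command above: `msgmerge_command` and `merge_all` are hypothetical names, and the sketch assumes `msgmerge` from GNU gettext is on the PATH.

```python
import subprocess
from pathlib import Path

def msgmerge_command(po_path, pot_path):
    """Return the msgmerge argument list and the temporary output path."""
    po_path = Path(po_path)
    new_path = po_path.with_name(po_path.name + ".new")
    command = ["msgmerge", "-o", str(new_path), str(po_path), str(pot_path)]
    return command, new_path

def merge_all(po_dir="po/stellarium-sky", pot_name="stellarium-sky.pot"):
    """Merge the template into every .po file in po_dir, replacing each in place."""
    po_dir = Path(po_dir)
    pot_path = po_dir / pot_name
    for po_path in sorted(po_dir.glob("*.po")):
        command, new_path = msgmerge_command(po_path, pot_path)
        # check=True raises on failure, so the original file is only
        # replaced after a successful merge (like `&& mv` in the shell).
        # Unlike the shell loop, the first failure stops the whole run.
        subprocess.run(command, check=True)
        new_path.replace(po_path)
```

Calling `merge_all()` from the repository root would process the same files as the shell loop.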

This structure now contains new fields: sky culture ID and full path to
the directory containing the files of the sky culture.

The currentSkyCultureDir field in StelSkyCultureMgr is now useless and
is removed.

It's still quite raw, especially regarding the conversion of
descriptions and their translations, but at least it converts
the other data.

@gzotti
Member

gzotti commented May 20, 2024

OMG, translators will hate us for that. Back to start for everything? Review all Google translations again? Any chance to see the old translations?
This of course requires a rewrite of chapter 9 where the new format must be described in full glory.
I am lacking time currently for a thorough test/review, sorry. Please don't rush, should not go before 24.3.

@10110111
Contributor Author

Any chance to see the old translations?

The cultures in the external repo have some customized texts, so if we import them, the translations will have to change one way or another.

One way to go would be to start with converting all the current cultures to the new format, and only then replace them with the ones in the external repo. But anyway, something must be done with the translations at some point—now or after the separate import, and this does imply a large review.

Please don't rush, should not go before 24.3.

Yes, I expected this. The change is huge.

@xalioth
Member

xalioth commented May 21, 2024

Hello,

OMG, translators will hate us for that. Back to start for everything? Review all Google translations again? Any chance to see the old translations?

I think the old translations for object names (constellations etc..) should be more or less preserved with probably some errors (Ruslan can you confirm this?). But clearly the existing translations for the sky culture descriptions are lost. Most of the translations in the stellarium-skycultures repo were generated with google translate, and I still think auto-translation is the way to go for those long texts, but with better AI-based tools. Some tests I did showed that ChatGPT can perform remarkably well for many languages, much better than google translate (especially when passing a meaningful context in the prompt). For example I don't think I could do a better job than ChatGPT in French.

This of course requires a rewrite of chapter 9 where the new format must be described in full glory.

Yes, the repo already contains a documentation in the README.md. It's not enough but it's a good start.

@alex-w
Member

alex-w commented May 21, 2024

The regions in new format (and in Mobile and Web editions) are different in comparison to Desktop edition (or old format) - I think we should use one universal list for regions (at least for SC) for all editions of planetarium.

@10110111
Contributor Author

I think the old translations for object names (constellations etc..) should be more or less preserved with probably some errors (Ruslan can you confirm this?).

They don't seem to have been copied from the original sky cultures. E.g. in Anutan original:

#: skycultures/anutan/constellation_names.eng.fab:3
msgid "Bird of Flight"
msgstr "Птица полёта"

#: skycultures/anutan/constellation_names.eng.fab:4
msgid "The Tongs"
msgstr "Щипцы"

and new:

# Anutan constellation, native: Manu
msgid "Bird of Flight"
msgstr "Птица полета"

# Anutan constellation, native: Te Angaanga
msgid "The Tongs"
msgstr "щипцы"

The lack of the dieresis in the first name and failure to capitalize the second one compared to their old versions hint that they were translated independently.

Even worse, there are simply wrong translations, e.g.:

#: skycultures/anutan/constellation_names.eng.fab:10
msgid "Taro Plant"
msgstr "Таро (растение)"

becomes

# Anutan constellation, native: Taro
msgid "Taro Plant"
msgstr "Таро Завод"

Here in the new format the plant (vegetation) is translated with its second meaning (factory), and also is sloppy grammar-wise.
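A comparison like this could be automated across all cultures. The following is a minimal sketch using the three Anutan entries quoted above. It only handles single-line `msgid`/`msgstr` pairs without escapes, and `parse_simple_po` and `diverging` are hypothetical helpers. A real check would more likely rely on a proper PO parser.

```python
import re

# Matches only single-line msgid/msgstr pairs; multi-line strings are ignored.
ENTRY_RE = re.compile(r'^msgid "(.*)"\nmsgstr "(.*)"$', re.MULTILINE)

def parse_simple_po(text):
    """Map msgid -> msgstr for every single-line entry in the PO text."""
    return {msgid: msgstr for msgid, msgstr in ENTRY_RE.findall(text)}

def diverging(old, new):
    """Return {msgid: (old_translation, new_translation)} where the two differ."""
    shared = sorted(old.keys() & new.keys())
    return {key: (old[key], new[key]) for key in shared if old[key] != new[key]}

OLD_PO = '''#: skycultures/anutan/constellation_names.eng.fab:3
msgid "Bird of Flight"
msgstr "Птица полёта"

#: skycultures/anutan/constellation_names.eng.fab:4
msgid "The Tongs"
msgstr "Щипцы"

#: skycultures/anutan/constellation_names.eng.fab:10
msgid "Taro Plant"
msgstr "Таро (растение)"
'''

NEW_PO = '''# Anutan constellation, native: Manu
msgid "Bird of Flight"
msgstr "Птица полета"

# Anutan constellation, native: Te Angaanga
msgid "The Tongs"
msgstr "щипцы"

# Anutan constellation, native: Taro
msgid "Taro Plant"
msgstr "Таро Завод"
'''

differences = diverging(parse_simple_po(OLD_PO), parse_simple_po(NEW_PO))
for msgid, (old_text, new_text) in differences.items():
    print(f"{msgid}: {old_text!r} -> {new_text!r}")
```

All three sample entries are reported as diverging, which matches the manual comparison above.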

@gzotti
Member

gzotti commented May 21, 2024

This is why all these machine translations (which of course have no context) must be marked unreviewed and reviewed (again) by a human with fitting background knowledge. This is a huge effort. Of course, the unreviewed "candidates" can go into the releases as before, to be found by all users. Should we add a "You found a suspect translation? Go to [Transifex] to help!" button to make that even more visible? (Of course also a note in the 24.3/24.4/25.1 release notes, but who reads them :-) The user translation again needs review/approval, of course.
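One conventional way to mark an entry as needing review in a gettext file is the `fuzzy` flag. The sketch below adds it to every translated single-line entry. It is an assumption that this mechanism would be used here; the thread does not say whether the project would rely on `fuzzy` or on Transifex's own review state. `mark_fuzzy` is a hypothetical helper and it ignores multi-line `msgstr` values.

```python
def mark_fuzzy(po_text):
    """Add a '#, fuzzy' flag to every translated entry that lacks one."""
    blocks = po_text.strip("\n").split("\n\n")  # entries are blank-line separated
    result = []
    for block in blocks:
        lines = block.split("\n")
        translated = any(
            line.startswith('msgstr "') and line != 'msgstr ""' for line in lines
        )
        already = any(line.startswith("#,") and "fuzzy" in line for line in lines)
        if translated and not already:
            flag_index = next(
                (i for i, line in enumerate(lines) if line.startswith("#,")), None
            )
            msgid_index = next(
                (i for i, line in enumerate(lines) if line.startswith("msgid ")), None
            )
            if flag_index is not None:
                # Extend an existing flag line, e.g. '#, c-format'.
                lines[flag_index] = "#, fuzzy," + lines[flag_index][2:]
            elif msgid_index is not None:
                lines.insert(msgid_index, "#, fuzzy")
        result.append("\n".join(lines))
    return "\n\n".join(result) + "\n"
```

Running the function twice gives the same output as running it once, so it is safe to apply to a file that is already partly flagged.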

@xalioth
Member

xalioth commented May 21, 2024

I think it's better to improve the context passed to ChatGPT until everything is correct in the languages we know like Russian, German and French. Then use the same context for all languages to minimize the amount of errors.

Note that when I created the new format I tried to re-use the existing translations as much as I could, so I am not sure why it diverged in your examples..

@gzotti
Member

gzotti commented May 21, 2024

Major SCs may have "canonical" translations in use for decades in the major languages where relevant books appear. These should be preferred (with a note like "German translations following X.Y. (1976)"!) over self-made translation dabbles or AI tools.

@sushoff
Contributor

sushoff commented May 21, 2024

Immediate reactions/ thoughts:

  1. I definitely appreciate the new format
  2. I agree that machine translations are the future (I'm using DEEPL which is said to be better than GoogleTranslate but for implementing it in websites, e.g., requires a costly licence - unlike GoTranslate). People who want to read something in a foreign language typically know that the translation is not perfect but it still helps a lot, eases reading and makes us faster if we only have to cross-check the mysterious parts
  3. it's great that the new format comes with a translator from the old format. I guess you developed your translator independently. I seem to remember that Doina told me in November that she has a translator - possibly exchange with her (for review and different ideas/ feature comparison)?
  4. we need to translate all "old" SCs to the new format and compare them - some of them are updated only in the old format or the new - or is there a versioning/ version-comparison / merge-tool implemented in the code-translator?
  5. thanks for the reminder that the User Guide needs to be changed (that's life) but I don't think that's a problem: a "readme" is a good start and then somebody who tries and fails to do sth. (e.g. if I try to contribute a new SC or rework an existing one...) could write the chapter according to this experience. If the release of the new format is earlier, we should just delete the chapter in the User Guide (for the time being).

@sushoff
Contributor

sushoff commented May 21, 2024

Further comments on the format

  1. some descriptions provide a list of all names (const., stars) with more information - e.g. concerning the (un)certainty of their identification or star lore or other cultural background. some of these lists are illustrated. Consider, for instance, the "Egypt Dendera" SC: the description contains a table with all constellation images. These are, of course, the same images that are plotted on the map. BUT the new format now has two versions of them: one in the subfolder "illustrations" that is used in the map display and one directly in the folder "egypt_dendera". This is both a) ugly in terms of data storage and b) prone to mistakes if the SC is reworked: then, the image needs to be exchanged in two places.
  2. the ugliness ;) is reduced in the version we observe in "greek_leidenAratea" where the images for the description are stored in a separate folder - still the second issue (prone to errors) remains.

Can we find a solution for these cases to use the image in the "illustration" folder directly in the description?

This concerns the following SCs:
Aztec, Egypt:_dendera, Greek_Farnese, Greek-Leiden, Hawaiian, Maya, Northern Andes, Seri, Tibetian
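To find affected files in these cultures, a small script could list images outside `illustrations/` that are byte-identical to one inside it. This is a minimal sketch under stated assumptions: the folder name `illustrations` comes from the comment above, while the suffix list and the helper names are hypothetical. Copies that were resized or re-encoded would not be detected.

```python
import hashlib
from pathlib import Path

# Assumed set of image suffixes; extend as needed.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}

def file_digest(path):
    """SHA-256 of the file contents, used to detect byte-identical copies."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def find_duplicate_images(culture_dir, illustrations_name="illustrations"):
    """List (copy, original) path pairs, both relative to culture_dir."""
    culture_dir = Path(culture_dir)
    illustrations = culture_dir / illustrations_name
    by_digest = {}
    for path in sorted(illustrations.glob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            by_digest.setdefault(file_digest(path), path)
    duplicates = []
    for path in sorted(culture_dir.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        if illustrations in path.parents:
            continue  # skip the reference copies themselves
        original = by_digest.get(file_digest(path))
        if original is not None:
            duplicates.append(
                (path.relative_to(culture_dir), original.relative_to(culture_dir))
            )
    return duplicates
```

Each reported pair is a candidate for replacing the copy with a reference to the file in `illustrations/`.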

  1. Furthermore:

Should we define a sort of template or "standard" ("one to rule them all" will not really work but maybe guideline?) for the description

  1. due to the merging/ copying process, there are now "MODERN" and "WESTERN" which does not really make sense. I still vote for "MODERN" because "western" is only defined per epoch (e.g. "east/west roman empire" or "east/west Franconia" or "eastern/ western Han" (as times) or "east/west church"=Rome vs. orthodox for the definition of the easter date etc., or "east/west of the iron curtain" ...) and does not at all make sense on trans-epochal and global scale.

@gzotti
Member

gzotti commented May 21, 2024

I think in our context "Western" has always predated the Iron Curtain meaning by centuries. What is commonly understood by "western" is European scholarship from the age of enlightenment but rooted in European antiquity (traditionally executed in universities and Academies of Science from Lisbon to St. Petersburg), as opposed to e.g. Islamic, Chinese, Indian, and indigenous traditions in other continents which are, in western scholarship, usually dealt with in "ethnographic studies".

Still, we have agreed to rename all Western* to Modern*.

@xalioth
Member

xalioth commented May 21, 2024

  1. some descriptions provide a list of all names (const., stars) with more information - e.g. concerning the (un)certainty of their identification or star lore or other cultural background. some of these lists are illustrated. Consider, for instance, the "Egypt Dendera" SC: the description contains a table with all constellation images. These are, of course, the same images that are plotted on the map. BUT the new format now has two versions of them: one in the subfolder "illustrations" that is used in the map display and one directly in the folder "egypt_dendera". This is both a) ugly in terms of data storage and b) prone to mistakes if the SC is reworked: then, the image needs to be exchanged in two places.
  2. the ugliness ;) is reduced in the version we observe in "greek_leidenAratea" where the images for the description are stored in a separate folder - still the second issue (prone to errors) remains.

Can we find a solution for these cases to use the image in the "illustration" folder directly in the description?

This concerns the following SCs: Aztec, Egypt:_dendera, Greek_Farnese, Greek-Leiden, Hawaiian, Maya, Northern Andes, Seri, Tibetian

Yes, we should use the images from the illustrations/ subfolder directly in the description. There is nothing preventing this from a technical point of view. In general, in the new format I really encourage avoiding sections dedicated to individual constellations outside the already existing ## Constellations section. The code then cross-matches the content with the content of the index.json file, so it's usually not even necessary to link to the image at all.

  1. Furthermore:

Should we define a sort of template or "standard" ("one to rule them all" will not really work but maybe guideline?) for the description

It's already like that. The template for the markdown file has a strict structure with mandatory sections.
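A strict structure like this can be checked mechanically. Here is a minimal sketch of such a check; the list of required headings is passed in by the caller because this thread only names the `## Constellations` section, so any other heading used with it is an assumption to be verified against the README of the sky cultures repo. `headings` and `missing_sections` are hypothetical helper names.

```python
import re

def headings(markdown_text, level=2):
    """Return the titles of all headings of the given level, in order."""
    pattern = re.compile(
        r"^" + "#" * level + r"[ \t]+(.+?)[ \t]*$", re.MULTILINE
    )
    return [match.group(1) for match in pattern.finditer(markdown_text)]

def missing_sections(markdown_text, required):
    """Return the required level-2 section titles absent from the text."""
    present = set(headings(markdown_text))
    return [name for name in required if name not in present]

# Example: only "Constellations" is taken from the discussion above;
# "Introduction" and "Description" are placeholder names for illustration.
sample = "# Anutan\n\n## Introduction\n\nText.\n\n## Constellations\n\nMore.\n"
print(missing_sections(sample, ["Introduction", "Description", "Constellations"]))
```

A check of this kind could run in CI on every description file, so that a missing section is caught before translations are generated.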

@xalioth
Member

xalioth commented May 21, 2024

I think in our context "Western" has always predated the Iron Curtain meaning by centuries. What is commonly understood by "western" is European scholarship from the age of enlightenment but rooted in European antiquity (traditionally executed in universities and Academies of Science from Lisbon to St. Petersburg), as opposed to e.g. Islamic, Chinese, Indian, and indigenous traditions in other continents which are, in western scholarship, usually dealt with in "ethnographic studies".

Still, we have agreed to rename all Western* to Modern*.

Yes.. In Stellarium Mobile we didn't switch because this work predated the renaming. I am a bit worried to do that now because in practice the "Modern" name seems to be annoying some users.. I have seen angry emails.. But I guess we will also need to switch.. Hopefully we won't receive too many bad reviews..

@10110111
Contributor Author

People who want to read something in a foreign language typically know that the translation is not perfect

Everyone I know who uses localized software expects the translations to be good—at least made by people who speak both the source and the target languages. They definitely don't think of it as "reading something in a foreign language". Moreover, many users don't even read in foreign languages well enough (or at all) to be able to cross-check anything.

In my view, using an unedited machine translation is just a mark of poor quality of the product (which unfortunately applies to lots of commercial software nowadays, even those products that used to have great localizations two decades ago).


Anyway, I'm now going to switch to a bit more conservative approach for this PR and convert all "old" sky cultures to the new format, so that we could handle the switch to the new ones in a separate thread, with all the problems of the translations.

@gzotti
Member

gzotti commented May 21, 2024

Still, we have agreed to rename all Western* to Modern*.

To be more precise, "Modern" are those from the 20th century and later that obey IAU constellations and borders. These are our default and some variants ("single presentations" after Rey, S&T, Hlad, others?). What did we decide on European 17-19th century atlases? (Or are they just "Hevelius", "Bayer", "Bode (1782)", "Bode (1801)" etc.?)

In this respect, we could still call our default (classic Stellarium) "Default" or even "Stellarium", pointing out the originality of Johan's figure set [which has been taken over successfully outside the project] and giving us all liberties about what to include, and the others "Modern-S&T", "Modern-Rey" etc.

@sushoff
Contributor

sushoff commented May 21, 2024

I think in our context "Western" has always predated the Iron Curtain meaning by centuries. What is commonly understood by "western" is European scholarship from the age of enlightenment but rooted in European antiquity (traditionally executed in universities and Academies of Science from Lisbon to St. Petersburg), as opposed to e.g. Islamic, Chinese, Indian, and indigenous traditions in other continents which are, in western scholarship, usually dealt with in "ethnographic studies".

Still, we have agreed to rename all Western* to Modern*.

your opinion!

in research "western" has been used in recent decades by scholars west of the iron curtain (=western europe + n.america)

@sushoff
Contributor

sushoff commented May 21, 2024

People who want to read something in a foreign language typically know that the translation is not perfect

Everyone I know who uses localized software expects the translations to be good—at least made by people who speak both the source and the target languages. They definitely don't think of it as "reading something in a foreign language". Moreover, many users don't even read in foreign languages well enough (or at all) to be able to cross-check anything.

In my view, using an unedited machine translation is just a mark of poor quality of the product (which unfortunately applies to lots of commercial software nowadays, even those products that used to have great localizations two decades ago).

Anyway, I'm now going to switch to a bit more conservative approach for this PR and convert all "old" sky cultures to the new format, so that we could handle the switch to the new ones in a separate thread, with all the problems of the translations.

hmmmm...
yes, you're right: software is different. The cases that I faced recently were websites (e.g. institutional websites where we are looking for information). There, we all agreed that people with poor language skills can still understand the website in a foreign language. For instance, NASA and other US institutes provide terrific educational & outreach material (of course, in English). With AI-translation, a Spanish primary school teacher can still use this.

Thinking of software: I think, you are right, that's a bit different. we expect the translation to be good enough that we don't need to understand the technology before reading the text that explains it (which makes the text useless).

@gzotti
Member

gzotti commented May 21, 2024

So, is Western Physics much different from Physics researched in Beijing?

@sushoff
Contributor

sushoff commented May 21, 2024

So, is Western Physics much different from Physics researched in Beijing?

in my childhood, we called it "modern physics"/ "modern science" and not "western": that's what I am saying. if you want to politically frame a term (which was done at that time), you need to find different terms for things that have nothing to do with the negatively framed terms: like science.

China has Confucianism in addition to modern physics.

@sushoff
Contributor

sushoff commented May 21, 2024

I think in our context "Western" has always predated the Iron Curtain meaning by centuries. What is commonly understood by "western" is European scholarship from the age of enlightenment but rooted in European antiquity (traditionally executed in universities and Academies of Science from Lisbon to St. Petersburg), as opposed to e.g. Islamic, Chinese, Indian, and indigenous traditions in other continents which are, in western scholarship, usually dealt with in "ethnographic studies".
Still, we have agreed to rename all Western* to Modern*.

Yes.. In Stellarium Mobile we didn't switch because this work predated the renaming. I am a bit worried to do that now because in practice the "Modern" name seems to be annoying some users.. I have seen angry emails.. But I guess we will also need to switch.. Hopefully we won't receive too many bad reviews..

yes, I hope so, too... maybe point them to me in this case.

In the 1990s we (east-germans) have undergone a linguistic re-education: suddenly, many terms were used differently and some terms were "forbidden" or meant sth. else ... as this influenced me rather deeply, I think a lot about the terms. I certainly do not want to 'always go back' but in contrast, I am embracing change. However, I think, in some cases the "newer" version does not really make sense. In case of the "western", I have the impression that it is both, a) too politically charged in whatever direction ('good' for one is 'bad' for others) and b) sometimes really confusing (because, e.g.. depending on the context "western" means different things: sometimes, I really have to think about the meaning of a sentence).

@gzotti
Member

gzotti commented May 21, 2024

Sure, you call that my opinion. But I feel I am not alone. The rest of the world still uses and understands the term "Western Science" without problems. Quick example: https://en.wikipedia.org/wiki/The_Beginnings_of_Western_Science

This is fully non-political. Sorry, but maybe it was your childhood experience that was politicized by the powers around you then, when everything from the "West", even the European science tradition, had to be presented in a bad light or needed a new name in the GDR. But even the Soviet A-bomb is based on "Western" 20th century physics. (Not only thanks to Klaus Fuchs. The physics behind it was discovered in the European physics tradition of science, in North America, while in Germany a non-Einsteinian "German Physics" was tried and failed instead. There is probably just one unpolitical way nature behaves, and our scientific understanding (call it European, Western, Modern or what you want) seems to provide the best model, despite shortcomings).

The political East/West separation is a post-1945 (no "Eastern Bloc" before that) thingy that we had all hoped to have overcome in 1991. Before that there was of course the Christian East/West divide which had a strong influence on traditions and beliefs, but royal courts were closely related from the UK to Russia, which of course was also an imperialistic monarchy by undisputed Grace of God that tried its best to be European (Western). I cannot say whether "East" was then not rather understood as "oriental, Ottoman" etc.

OK, we have gone largely off-topic, and I would stop here. Above, I had suggested possibly renaming our own default "Modern" SC into "Stellarium" (to give us all liberties on style and displayed objects), and use Modern-* for those IAU-constellation aware SCs where traces of Western-* naming may still be found. I did not suggest renaming anything back to Western-* because of your expected opposition, although almost everybody was OK with that name.

@sushoff
Contributor

sushoff commented May 22, 2024

Sure, you call that my opinion. But I feel I am not alone. The rest of the world still uses and understands the term "Western Science" without problems. Quick example: https://en.wikipedia.org/wiki/The_Beginnings_of_Western_Science

This is fully non-political. Sorry, but maybe it was your childhood experience that was politicized by the powers around you then, when everything from the "West", even the European science tradition, had to be presented in a bad light or needed a new name in the GDR. But even the Soviet A-bomb is based on "Western" 20th century physics. (Not only thanks to Klaus Fuchs. The physics behind it was discovered in the European physics tradition of science, in North America, while in Germany a non-Einsteinian "German Physics" was tried and failed instead. There is probably just one unpolitical way nature behaves, and our scientific understanding (call it European, Western, Modern or what you want) seems to provide the best model, despite shortcomings).

The political East/West separation is a post-1945 (no "Eastern Bloc" before that) thingy that we had all hoped to have overcome in 1991. Before that there was of course the Christian East/West divide which had a strong influence on traditions and beliefs, but royal courts were closely related from the UK to Russia, which of course was also an imperialistic monarchy by undisputed Grace of God that tried its best to be European (Western). I cannot say whether "East" was then not rather understood as "oriental, Ottoman" etc.

OK, we have gone largely off-topic, and I would stop here. Above, I had suggested possibly renaming our own default "Modern" SC into "Stellarium" (to give us all liberties on style and displayed objects), and use Modern-* for those IAU-constellation aware SCs where traces of Western-* naming may still be found. I did not suggest renaming anything back to Western-* because of your expected opposition, although almost everybody was OK with that name.

The term "Oriental" also depends on context: sometimes it is China, sometimes it is West Asia. That is why there are terms like "Near East", "Middle East" and "Far East", which don't make sense... "East" and "west" are defined by Aristotle as directions (clear for more than 2000 years). The sense comes in when you define the vertex where the vector starts. ... I really have more important things to do.

Let's just happily disagree ... we will never have a consensus here.

@github-actions bot added the "has conflicts" label May 30, 2024

This pull request has conflicts, please resolve those before we can evaluate the pull request.

Labels
has conflicts The pull request has conflicts
5 participants