Dimensionality Reduction T4 - Use of problematic Iris dataset #364

Open
da5nsy opened this issue Jul 17, 2020 · 1 comment
Labels
Content-Issue Problems and clarifications in the content (e.g., text, videos, slides)

da5nsy commented Jul 17, 2020

W1D5 Dimensionality Reduction Tutorial 4: Part 1
https://youtu.be/2Zb93aOWioM?t=147

https://en.wikipedia.org/wiki/Iris_flower_data_set:

Fisher's paper was published in the journal, the Annals of Eugenics, creating controversy about the continued use of the Iris dataset for teaching statistical techniques today.

https://armchairecology.blog/iris-dataset/

One of the points of the paper (and of the journal, and of Fisher’s leading role in developing biometry and biostatistics) was to propose a methodological framework to delineate desirable traits, in support of eugenics programs. One does not publish in the Annals of Eugenics in 1936 on a misunderstanding.

A penguin-based alternative:
https://twitter.com/allison_horst/status/1270046399418138625
https://allisonhorst.github.io/palmerpenguins/articles/pca.html

Palmer Penguins is an R package but there are instructions for using it in Python here:
https://towardsdatascience.com/data-analysis-in-python-getting-started-with-pandas-8cbcc1500c83
I understand that pandas is banned here, but I'd be shocked if this hasn't been added into a package that is already used (and if it hasn't, could it be?)
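In case it helps, here is a minimal sketch of reading the penguin measurements without pandas, using only the Python standard library. It assumes the CSV has the column names used by the palmerpenguins data (`species`, `bill_length_mm`, `bill_depth_mm`, `flipper_length_mm`, `body_mass_g`) and marks missing values as `NA` or an empty field. The helper names `load_features` and `zscore_columns` are hypothetical, and the rows in `SAMPLE` are made-up placeholders, not real measurements.

```python
import csv
import io
import statistics

# Assumed column names; adjust if the CSV you use differs.
FEATURES = ["bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g"]
MISSING = {"", "NA", "NaN", "nan"}


def load_features(csv_text, features=FEATURES):
    """Return (species labels, rows of floats), skipping rows with missing values."""
    labels, rows = [], []
    for record in csv.DictReader(io.StringIO(csv_text)):
        values = [record[name].strip() for name in features]
        if any(v in MISSING for v in values):
            continue  # drop incomplete rows rather than imputing
        labels.append(record["species"])
        rows.append([float(v) for v in values])
    return labels, rows


def zscore_columns(rows):
    """Standardize each column to zero mean and unit sample standard deviation."""
    columns = list(zip(*rows))
    means = [statistics.mean(c) for c in columns]
    stds = [statistics.stdev(c) for c in columns]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


# Placeholder rows for illustration only; NOT real penguin measurements.
SAMPLE = """species,island,bill_length_mm,bill_depth_mm,flipper_length_mm,body_mass_g,sex
A,X,40.0,18.0,190,3500,male
A,X,NA,NA,NA,NA,NA
B,Y,46.0,14.0,215,5000,female
C,Z,49.0,19.0,196,3800,female
"""

labels, rows = load_features(SAMPLE)
scaled = zscore_columns(rows)
```

The standardized rows could then be converted to an array and passed to whatever PCA code the tutorial already uses; the species labels are kept separately for coloring the projection.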

Other non-penguin based alternatives are probably also available.

@mwaskom mwaskom added the W1D5 label Jul 17, 2020

mwaskom commented Jul 17, 2020

Penguins is indeed very fun and serves the same pedagogical goals.

@spirosChv spirosChv added W1D4 and removed W1D5 labels Jun 18, 2022
@spirosChv spirosChv changed the title from "W1D5T4 - Use of problematic Iris dataset" to "Dimensionality Reduction T4 - Use of problematic Iris dataset" Jun 18, 2022
@spirosChv spirosChv added Tutorial and Content-Issue labels and removed the W1D4 label Jul 6, 2023