chapter one / not understanding pd.read parameters #588

Open
KhidirA opened this issue Aug 25, 2020 · 3 comments

@KhidirA
KhidirA commented Aug 25, 2020

Hey all,
I'm trying to run the code example from chapter one of the book.
I know it says I need some familiarity with the libraries (which I kind of have from a Coursera machine learning course), but I failed to understand line 8.
I know the first parameter locates the file, but what does the second one do?
Also, can anyone explain the next line too? What do its parameters mean?

@Praful932

Hi @KhidirA, could you specify which notebook, or better, paste the code in a code block here?

@pdx97
pdx97 commented Jan 17, 2021

@KhidirA, exactly which parameter are you not able to understand? Can you show the code here, along with the exact line number?

@ageron
Owner

ageron commented Mar 24, 2021

Hi @KhidirA ,

If I understand correctly, you were confused about the arguments to the pd.read_csv() function in chapter 1:

oecd_bli = pd.read_csv(datapath + "oecd_bli_2015.csv", thousands=',')
gdp_per_capita = pd.read_csv(datapath + "gdp_per_capita.csv", thousands=',', delimiter='\t',
                             encoding='latin1', na_values="n/a")

This function loads a CSV file. Here's what the arguments mean:

  • The first argument is the path to the file we want to load.
  • The thousands=',' argument specifies that commas are thousands separators, so "1,000,000" is parsed as the number 1000000 (one million).
  • The delimiter='\t' argument specifies that the fields in the second file are separated by tabs (\t), not commas. So despite its .csv extension, that file is actually a TSV (tab-separated values) file rather than a CSV (comma-separated values) file.
  • The encoding='latin1' argument means that the file is encoded using the Latin-1 encoding. If you don't know what text encoding is, please check out this introduction.
  • Lastly, the na_values="n/a" argument says that any field equal to "n/a" should be treated as a missing value (NaN).
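
To see these arguments in action without downloading the book's data files, here is a minimal sketch that builds a tiny tab-separated, Latin-1 encoded "file" in memory. The rows, the country names, and the variable names (raw, df, plain) are made up for illustration and are not taken from the book's datasets.

```python
import io

import pandas as pd

# Hypothetical sample data (not from the book): tab-separated fields,
# commas as thousands separators, "n/a" for a missing value,
# and a non-ASCII character, all encoded as Latin-1 bytes.
raw = (
    "Country\tGDP per capita\n"
    "Côte d'Ivoire\t1,400.5\n"
    "France\t37,675.0\n"
    "Atlantis\tn/a\n"
).encode("latin1")

# Same keyword arguments as the gdp_per_capita call in chapter 1.
df = pd.read_csv(
    io.BytesIO(raw),
    thousands=",",      # "1,400.5" is parsed as the number 1400.5
    delimiter="\t",     # fields are separated by tabs, not commas
    encoding="latin1",  # decode the bytes as Latin-1
    na_values="n/a",    # "n/a" becomes a missing value (NaN)
)

# For comparison: without thousands=",", the numbers stay as text.
plain = pd.read_csv(io.BytesIO(raw), delimiter="\t", encoding="latin1")

print(df)
print(df.dtypes)
print(plain.dtypes)
```

With thousands=',' the "GDP per capita" column comes out numeric, with NaN in the last row. Without it, pandas cannot convert "1,400.5" to a number, so the column is left as text.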

If you search for "pandas read_csv" on Google, you'll find the pandas documentation page for this function, which explains these arguments as well as many others.

Hope this helps.
