02: Filtering and Sorting/Chipotle, Step-4 & 5 #65

Open
realyashnag opened this issue Oct 19, 2018 · 7 comments

Comments

@realyashnag

realyashnag commented Oct 19, 2018

In 02-Filtering_and_Sorting/Chipotle, step 4 and 5,

  • The solution ignores rows where quantity is not 1, so it treats a line's item_price as the unit price even when the line covers several units.
  • The unit prices can be extracted with:
    chipo['item_price'] = chipo['item_price']/chipo['quantity']
    chipo['quantity'] = 1 #Dividing item_price by quantity, therefore let quantity be 1
    chipo.drop_duplicates(['item_name'], keep='first', inplace=True)
    chipo.sort_values(by='item_price', ascending=False, inplace=True)
    display(chipo[['item_name', 'item_price']])

I'm also a beginner at Pandas, please let me know about any stupid thing that I missed. Thanks.

@bobfang1992

I agree with you. I think the answer provided is simply wrong.

@maticalderini

I agree too. However, to get the number of products costing more than $10.00, I believe you could use a simpler command:
chipo.loc[chipo['item_price']/chipo['quantity'] > 10, 'item_name'].drop_duplicates().shape[0]
Hope this helps other newcomers.

@bromero26

I ran into the same issue. The question needs to be reconsidered, or the proposed solution changed. I believe the number of unique products with a price higher than 10 is 31. I used this code, after converting the item_price column to float:
chipo[chipo['item_price']>10]['item_name'].nunique()
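
The conversion to float mentioned above is not shown in the thread. A minimal sketch of one way to do it, assuming item_price is stored as strings with a leading dollar sign (the format of the rows the owner quotes in this thread); the sample rows are made up for illustration:

```python
import pandas as pd

# Made-up sample rows; only the "$x.xx" string format is taken from the thread.
chipo = pd.DataFrame({
    "item_name": ["Canned Soda", "Chicken Burrito", "Chicken Burrito"],
    "quantity": [2, 1, 1],
    "item_price": ["$2.18", "$10.98", "$8.49 "],
})

# Strip surrounding whitespace and the leading "$", then convert to float.
chipo["item_price"] = (
    chipo["item_price"].str.strip().str.lstrip("$").astype(float)
)

# Same expression as in the comment above, now applied to numeric prices.
n_expensive = chipo[chipo["item_price"] > 10]["item_name"].nunique()
print(n_expensive)
```

Note that this count still compares the line price, not the unit price, so it does not account for quantity.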

@rahimnathwani

@bromero26 I get the same answer as you using a slightly more verbose method:

min_max_price_per_item = chipo.groupby('item_name').agg({'item_price': ['max', 'min']})
# items whose highest recorded price exceeds $10
min_max_price_per_item[min_max_price_per_item[('item_price', 'max')] > 10].shape[0]

But all of those high prices are caused by extras or specific configurations. If you stick to the basics, any item can be had for less than $10:

# items whose lowest recorded price exceeds $10
min_max_price_per_item[min_max_price_per_item[('item_price', 'min')] > 10].shape[0]

The question is not well-formed. It could be asking:

  1. Which products did at least one person order for more than $10? (A: 31)
  2. Which products always cost at least $10, regardless of choice_description? (A: 0)
  3. Which product combinations (combination of item_name and choice_description) cost at least $10? (A: 777)

@AndreaAmico

@rahimnathwani I agree with you, the question is not well-formed. Still, I believe that @maticalderini's suggestion is correct, since the line price appears to scale linearly with quantity. Check using bottled water as an example:

(
    chipo.query('item_name == "Bottled Water"')[['quantity', 'item_price']]
    .groupby('quantity')
    .agg(['mean', 'std'])
    .item_price
    .reset_index()
    .plot(x='quantity', y='mean', yerr='std', kind='scatter')
)
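
The same linearity can also be checked numerically instead of visually. A rough sketch on made-up rows (the prices are illustrative, not taken from the dataset): if the line price scales with quantity, each item ends up with a single distinct unit price.

```python
import pandas as pd

# Illustrative rows only; item_price is assumed to be numeric already.
chipo = pd.DataFrame({
    "item_name": ["Bottled Water"] * 3 + ["Canned Soda"] * 2,
    "quantity": [1, 2, 3, 1, 2],
    "item_price": [1.50, 3.00, 4.50, 1.09, 2.18],
})

# Round to cents so floating-point noise does not create spurious distinct values.
chipo["price_per_item"] = (chipo["item_price"] / chipo["quantity"]).round(2)

# Number of distinct unit prices per item; 1 means price is linear in quantity.
unit_prices = chipo.groupby("item_name")["price_per_item"].nunique()
is_linear = bool((unit_prices == 1).all())
print(unit_prices)
print(is_linear)
```

On the real data an item can legitimately have several unit prices because of its variants, so grouping by item_name and choice_description together may be the fairer check.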

Normalizing the cost I get these values:

chipo['price_per_item'] = chipo.item_price/chipo.quantity
A1 = chipo.query('price_per_item > 10').item_name.nunique()
A2 = (chipo.groupby('item_name').price_per_item.min()>10).sum()
chipo['name_with_variants'] = chipo.item_name+chipo.choice_description
A3 = (chipo.groupby('name_with_variants').price_per_item.min()>10).sum()

print(f'A1:{A1}, A2:{A2}, A3:{A3}')

A1:25, A2:0, A3:707
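
One caveat about name_with_variants, offered as a sketch and not as a correction of the numbers above: if choice_description is missing (NaN) for some rows, the concatenated key is NaN as well, and groupby drops NaN keys by default, so those items would not be counted in A3. Whether this applies to the real data is an assumption here; the rows below are made up.

```python
import pandas as pd

# Made-up rows; "Chips" has no choice_description (missing value).
chipo = pd.DataFrame({
    "item_name": ["Chips", "Steak Bowl", "Steak Bowl"],
    "choice_description": [None, "[Rice]", "[Rice, Guacamole]"],
    "quantity": [1, 1, 2],
    "item_price": [2.15, 8.99, 23.50],
})
chipo["price_per_item"] = chipo["item_price"] / chipo["quantity"]

# Plain concatenation: a missing description makes the whole key missing.
naive_key = chipo["item_name"] + chipo["choice_description"]
# Filling missing descriptions with "" keeps those rows in the groupby.
safe_key = chipo["item_name"] + chipo["choice_description"].fillna("")

# Number of distinct products seen by each grouping.
n_naive = chipo.groupby(naive_key)["price_per_item"].min().shape[0]
n_safe = chipo.groupby(safe_key)["price_per_item"].min().shape[0]
print(n_naive, n_safe)
```

With the filled key, the item without a description stays in the grouping as its own product.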

@guipsamora
Owner

guipsamora commented Oct 13, 2019

Hi everyone, thank you for the comments and feedback. I agree too that this question is not very clear.

Some clarifications:

  1. There is a clear distinction between order_id, quantity and product. For example, within the same order_id you can order a product in a quantity greater than 1, which affects the price shown on that line.

Example:
order_id | quantity | item_name | choice_description | item_price
9 | 2 | Canned Soda | [Sprite] | $2.18
14 | 1 | Canned Soda | [Dr. Pepper] | $1.09

Canned Soda costs $1.09. If I buy 10 sodas, the line will show $10.90, which is greater than $10, but that doesn't mean that the product Canned Soda costs more than $10.

That is why quantity needs to be considered for this exercise.

  2. In order to simplify the exercise, take the combination of item_name + choice_description as "one product".

Example:

order_id | quantity | item_name | choice_description | item_price
12 | 1 | Chicken Burrito | [[Tomatillo-Green Chili Salsa (Medium), Tomati... | $10.98
8 | 1 | Chicken Burrito | [Tomatillo-Green Chili Salsa (Medium), [Pinto ... | $8.49

"Chicken Burrito" is the "main" product but depending on the additional items it will cost more or less than $10, so to simplify take the combination item_name + choice_description as "one product".
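
One possible reading of the two clarifications, as a sketch and not the repository's official solution: compute a unit price per row, treat each (item_name, choice_description) pair as one product, and count the pairs whose unit price exceeds $10. The soda rows come from the Canned Soda example above; the burrito descriptions are truncated in the thread, so the labels "variant A" and "variant B" below are hypothetical placeholders.

```python
import pandas as pd

chipo = pd.DataFrame({
    "order_id": [9, 14, 12, 8],
    "quantity": [2, 1, 1, 1],
    "item_name": ["Canned Soda", "Canned Soda",
                  "Chicken Burrito", "Chicken Burrito"],
    # "variant A"/"variant B" are hypothetical stand-ins for the truncated descriptions.
    "choice_description": ["[Sprite]", "[Dr. Pepper]", "variant A", "variant B"],
    "item_price": [2.18, 1.09, 10.98, 8.49],
})

# Clarification 1: divide the line price by quantity to get the unit price.
chipo["unit_price"] = chipo["item_price"] / chipo["quantity"]

# Clarification 2: one product = one (item_name, choice_description) pair.
# dropna=False (pandas >= 1.1) keeps rows whose choice_description is missing.
products = chipo.groupby(
    ["item_name", "choice_description"], dropna=False
)["unit_price"].max()

n_over_10 = int((products > 10).sum())
print(n_over_10)
```

Taking the maximum unit price per pair is a design choice for this sketch; if the unit price of a pair is constant, min or first would give the same count.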

Considering that, what is your suggestion? Send me a PR! 😉

@newera-001

In the "Getting and Knowing your Data" part, the URL does not work. What should I do?
