[WIP] Ch6 #40

canyon289 · 2019-07-22T15:11:48Z

No description provided.

AlexAndorra · 2019-07-23T09:14:25Z

Thank you for this hard and useful work, @canyon289 and @aloctavodia!
Ravin, can I help you, if you need it? Particularly with chapters 6 (Mixture Models) and 7 (Gaussian Processes). I just finished the exercises on these models for Richard McElreath's book, so it's still fresh in my head 😉
Of course, I can also help with other chapters, but it looks like they are nearly ready.

canyon289 · 2019-07-23T14:41:15Z

Hey @AlexAndorra, would you be open to reviewing as I finish the exercises? Notebooks are hard to work on in parallel, but an extra set of eyes would be helpful!

AlexAndorra · 2019-07-23T15:02:12Z

Sure, would be happy to!
Which chapters are the most pressing for review?

canyon289 · 2019-07-23T15:05:23Z

Any of the ones in the PR, really. Just a heads-up though: Osvaldo and I are reeeaaalllllyyyy slow on this. We've taken months, and sometimes it takes multiple weeks to respond to each other. If we're slow with you too, please don't be frustrated :)

AlexAndorra · 2019-07-23T15:16:35Z

Ha ha, I totally understand! I'm doing this in my spare time too, which is quite cyclical (but, sadly, rather hard to predict, even with a Bayesian model...).
Will take a look at the PRs ASAP (may have to re-read some chapters first 😝).

canyon289 · 2019-07-23T15:18:50Z

No worries. Thank you for your time. We really appreciate your efforts :)

AlexAndorra · 2019-07-24T09:08:00Z

Going through the exercises, a first thought is that it would be useful for readers to have the text of the questions from the book, in addition to the answers.
If @aloctavodia is OK with that, I can do a PR to add them to the 4 notebooks already merged. I'm not sure I can do it on the NBs currently in review though (maybe with the Review NB app?)...

canyon289 · 2019-07-24T13:39:19Z

If @aloctavodia agrees, I can add them to the ones in review to make things easier.

AlexAndorra · 2019-07-24T14:50:09Z

That would be great!
Also, giving the Review NB app access to this repo would make the review process easier, I think. But I guess we have to wait for our dear Osvaldo to do that ;)

aloctavodia · 2019-07-26T19:53:20Z

Happy to see you both working together, I agree with both of you! :-)
I am slowly transitioning from vacation to going back to work :-)

AlexAndorra · 2019-07-26T20:05:11Z

Ha ha, good to have you back, Osvaldo (and also good luck 😉)! I'll do the PR and reviews ASAP then! FYI, I'm doing the transition in the opposite direction 😜


canyon289 · 2019-07-28T19:21:34Z

So, @aloctavodia or @AlexAndorra, I could use some help! In this section, for questions 4 to 6, I'm getting a ton of divergences and I'm not really sure what to do about it. I tried increasing the tuning steps and the number of samples, but no luck :(

AlexAndorra · 2019-07-30T13:12:12Z

Thank you Ravin! I'll try to take a look this week or the next ;)

AlexAndorra · 2019-08-05T08:15:27Z

@aloctavodia do you think you could set up the Review NB App on this repo? I think it would make the review process easier ;) Thank you!

AlexAndorra · 2019-08-06T10:54:33Z

So I managed to take a look at your NB @canyon289 🎉 Here are my (deep) thoughts (sorry it's not very convenient, but without the Review NB app I don't see any other way):

In general, it looks like all the models needed more informative priors. Iterating with some prior predictive checks, I think I found priors (particularly on the Dirichlet) that let the models sample fairly smoothly (though it's quite possible this isn't the best solution, and @aloctavodia may come up with something better).
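A prior predictive check on the Dirichlet weights can be sketched with plain NumPy, without building a PyMC3 model: draw weight vectors from a symmetric Dirichlet and look at how spread out a single weight is. This is a minimal illustration assuming 3 components; the helper name `dirichlet_weight_spread` is hypothetical, and the concentration values are simply a flat prior (α = 1) versus the α = 10 used in the Exercise 1 model in this thread.

```python
import numpy as np

def dirichlet_weight_spread(alpha, k, draws=4000, seed=123):
    """Hypothetical helper: draw mixture weights from a symmetric
    Dirichlet(alpha, ..., alpha) with k components and return the
    mean and standard deviation of the first weight."""
    rng = np.random.default_rng(seed)
    w = rng.dirichlet(np.full(k, alpha), size=draws)  # shape (draws, k); rows sum to 1
    return w[:, 0].mean(), w[:, 0].std()

# Flat prior (alpha=1) vs. the more regularizing alpha=10
for alpha in (1.0, 10.0):
    mean, sd = dirichlet_weight_spread(alpha, k=3)
    print(f"alpha={alpha:>4}: weight mean={mean:.3f}, sd={sd:.3f}")
```

Both priors center each weight on 1/k, but the larger α keeps the weights away from 0 and 1, which discourages the sampler from emptying out a component.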
Exercise 1: there was a label-switching problem, plus divergences in all three models (a few for 2 and 3 clusters, a lot for 4 clusters). Adopting more regularizing priors and ordering the means solved it for 2 and 3 clusters, but the 4-cluster model still has a few divergences, despite my numerous attempts. I suspect it is overparametrized, as it tries to find 4 clusters in 3-cluster data. Here is the modified model:

```python
import numpy as np
import pymc3 as pm

# `vals` (observed data) and `cluster` (number of components) are defined earlier in the notebook
with pm.Model() as two_components:
    p = pm.Dirichlet("p", a=np.array([10.] * cluster))

    # One mean per component, with prior centers spread across the data range;
    # the ordered transform keeps the means sorted to avoid label switching
    means = pm.Normal("means", mu=np.linspace(vals.min(), vals.max(), cluster),
                      sd=10., shape=cluster, transform=pm.distributions.transforms.ordered)

    # A single standard deviation shared by all components
    sd = pm.HalfCauchy("sd", 1.)
    y = pm.NormalMixture("y", w=p, mu=means, sd=sd, observed=vals)

    trace = pm.sample(1000, tune=5000, cores=2, random_seed=123)
```
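The "ordering the means" fix relies on a transform that maps an unconstrained vector to a strictly increasing one. A NumPy sketch of that idea follows; to my understanding PyMC3's `ordered` transform uses the same construction (first value kept as-is, later values added as exponentiated gaps), but the function names here are hypothetical and this is not the library code.

```python
import numpy as np

def to_ordered(y):
    """Map an unconstrained vector y to a strictly increasing vector x:
    x[0] = y[0], x[i] = x[i-1] + exp(y[i])."""
    y = np.asarray(y, dtype=float)
    steps = np.concatenate([y[:1], np.exp(y[1:])])  # first value, then positive gaps
    return np.cumsum(steps)

def from_ordered(x):
    """Inverse map: recover the unconstrained vector from an increasing vector."""
    x = np.asarray(x, dtype=float)
    return np.concatenate([x[:1], np.log(np.diff(x))])

# Any unconstrained vector yields increasing means, so "component 0" always
# refers to the smallest mean and the labels are pinned down by the ordering
y = np.array([-1.0, 0.5, -2.0])
means = to_ordered(y)
print(means)                # strictly increasing
print(from_ordered(means))  # round-trips back to y
```

Because the sampler works on the unconstrained `y`, it can move freely while the means stay sorted.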

Exercise 2: I just wanted to note that az.compare(traces) throws a ValueError for me: could not broadcast input array from shape (300,1) into shape (300). Is it the same for you? Other than that, nothing to report. The WAIC-LOO comparison still (rightly) ranks the 3-cluster model first, and the traces look a lot healthier with the new model.
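The message itself is NumPy's standard error for writing an array that has an extra length-1 axis into a 1-D buffer. Where exactly this happens inside `az.compare` is not established in this thread; a plausible guess is pointwise values stored with shape (n_obs, 1) instead of (n_obs,). The sketch below only reproduces the NumPy error with 300 made-up values and is not ArviZ code.

```python
import numpy as np

n_obs = 300
target = np.empty(n_obs)                                      # 1-D buffer, shape (300,)
pointwise = np.random.default_rng(0).normal(size=(n_obs, 1))  # extra trailing axis, shape (300, 1)

try:
    target[:] = pointwise  # (300, 1) cannot be broadcast into (300,)
except ValueError as err:
    print("ValueError:", err)

# Dropping the length-1 axis makes the shapes agree
target[:] = pointwise.squeeze(axis=-1)
```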
Exercise 4: This one is a real head-scratcher... I tried ordering the means and changing the gamma prior, but nothing worked. I think the model is misspecified, but I really don't know where...
Exercise 5: Here, adopting regularizing priors and ordering the means made the model sample smoothly. The traces look better, although they are not perfect: I think the model has trouble distinguishing two species that have very similar probabilities. Plus, the third species' probability seems to be close to 0. Here is the model:

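```python
import numpy as np
import pymc3 as pm

# `sepal_length` (observed data) and `clusters` (number of components) are defined earlier in the notebook
with pm.Model() as model_mg:
    # Regularizing Dirichlet prior on the mixture weights
    p = pm.Dirichlet('p', a=np.array([5.] * clusters))

    # Ordered means, with prior centers spread evenly across the range of the data
    means = pm.Normal('means', mu=np.linspace(sepal_length.min(), sepal_length.max(), clusters),
                      sd=10., shape=clusters, transform=pm.distributions.transforms.ordered)

    # A single standard deviation shared by all components
    sd = pm.HalfNormal('sd', sd=10.)

    y = pm.NormalMixture('y', w=p, mu=means, sd=sd, observed=sepal_length)

    sepal_trace = pm.sample(1000, tune=6000, cores=2, nuts_kwargs={"target_accept": 0.9})
```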
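To see why label switching happens at all, it may help to write out the likelihood a normal mixture evaluates. A NumPy sketch, assuming one shared `sd` for all components as in the models above; the function names and the five data values are made up for illustration. Permuting the weights and means together leaves the likelihood unchanged, so the likelihood alone cannot tell the sampler which component is "first"; ordering the means removes that symmetry.

```python
import numpy as np

def normal_logpdf(x, mu, sd):
    """Log density of Normal(mu, sd), vectorized via NumPy broadcasting."""
    return -0.5 * np.log(2 * np.pi) - np.log(sd) - 0.5 * ((x - mu) / sd) ** 2

def normal_mixture_loglik(x, w, mu, sd):
    """Total log-likelihood of x under sum_k w[k] * Normal(mu[k], sd),
    computed with a log-sum-exp over components for numerical stability."""
    x = np.asarray(x, dtype=float)[:, None]           # shape (n, 1)
    log_terms = np.log(w) + normal_logpdf(x, mu, sd)  # shape (n, k)
    m = log_terms.max(axis=1, keepdims=True)
    return float(np.sum(m[:, 0] + np.log(np.exp(log_terms - m).sum(axis=1))))

x = np.array([4.9, 5.1, 5.8, 6.3, 6.9])  # made-up sepal-length-like values
w = np.array([0.3, 0.4, 0.3])
mu = np.array([5.0, 5.9, 6.6])
perm = np.array([2, 0, 1])               # relabel the components
print(normal_mixture_loglik(x, w, mu, 0.4))
print(normal_mixture_loglik(x, w[perm], mu[perm], 0.4))  # same value
```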

Exercise 6: Same thing as above, with the same regularizing priors, but I have a question here. If I understood correctly, this model clusters the species based on sepal_width and sepal_length independently (we could actually build two separate models and sample from them independently). Is there a way to use both features at the same time in the same model, i.e., cluster the species based on both sepal_width and sepal_length?
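One common way to use both features at once (not something settled in this thread) is a single mixture over 2-D observations: each flower has one latent component, and that component has a mean for both sepal length and sepal width. The NumPy sketch below contrasts the log-likelihood of such a joint mixture with two independent 1-D mixtures. Diagonal covariance, weights shared between the two separate mixtures, the function names, and all numbers are assumptions for illustration only.

```python
import numpy as np

def normal_logpdf(x, mu, sd):
    return -0.5 * np.log(2 * np.pi) - np.log(sd) - 0.5 * ((x - mu) / sd) ** 2

def logsumexp(a, axis):
    m = a.max(axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.exp(a - m).sum(axis=axis))

def joint_mixture_loglik(X, w, mu, sd):
    """One mixture over both features: each row of X (n, 2) belongs to a single
    component k, whose mean mu[k] = (length_k, width_k) is used for BOTH columns.
    sd has shape (2,): one shared sd per feature (diagonal covariance)."""
    # (n, 1, 2) against (1, k, 2) -> (n, k, 2); sum the per-feature log densities
    comp = normal_logpdf(X[:, None, :], mu[None, :, :], sd).sum(axis=-1)  # (n, k)
    return float(logsumexp(np.log(w) + comp, axis=1).sum())

def separate_mixtures_loglik(X, w, mu, sd):
    """Two unrelated 1-D mixtures, one per column: a flower may be assigned
    to different components for length and for width."""
    total = 0.0
    for j in range(X.shape[1]):
        comp = normal_logpdf(X[:, j][:, None], mu[:, j], sd[j])  # (n, k)
        total += logsumexp(np.log(w) + comp, axis=1).sum()
    return float(total)

X = np.array([[5.0, 3.4], [6.5, 2.9], [5.1, 3.5], [6.7, 3.0]])  # made-up (length, width) rows
w = np.array([0.5, 0.5])
mu = np.array([[5.0, 3.4], [6.6, 3.0]])  # row k = (length mean, width mean) of component k
sd = np.array([0.3, 0.2])
print(joint_mixture_loglik(X, w, mu, sd))
print(separate_mixtures_loglik(X, w, mu, sd))
```

The two numbers differ because the joint version ties the assignment of length and width together; with a single component they coincide.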

canyon289 · 2019-08-07T00:53:59Z

@AlexAndorra Thanks for the feedback here and on Ch7.

For question 2: I'm using the ArviZ version in the environment.yml file. At one point some code was introduced in ArviZ that raised the exception you're referring to, but I think it's gone now.

For Exercise 6, I'm actually not sure. I thought the way I specified the model is how MultiObserved works in ArviZ. I'll ask on the PyMC Discourse to be sure.

As for the rest I appreciate you going through them! I'll implement your suggestions this weekend.

PyMCheers!

AlexAndorra · 2019-08-07T17:26:31Z

You're welcome Ravin ;)

FYI, I'm using ArviZ's latest version (0.4.1), which suggests that something new in that release broke this specific code... Should I open an issue on ArviZ's repo?

For exercise 6, my knowledge stops there on the matter, but I'm curious about the answer. I'll follow your question on Discourse!

canyon289 · 2019-08-10T15:26:31Z

Fixed exercise 2 per Alex's notes. Thank you!

canyon289 added 3 commits July 12, 2019 09:21

Start Chapter 6 exercises

e173d76

Add Exercise 1 and 2

285fb9b

Add WIP dp solution

4c9fbdd

canyon289 added 2 commits July 28, 2019 12:08

Add all exercises

acba8b3

Update plots for Chapter 6

7b7cc63

Add Chapter 6 Exercise 2 edits

9880a76

AlexAndorra mentioned this pull request Aug 24, 2019

Add exercises questions to merged chapters #46

Merged
