Spatio-Temporal Prediction #1254

Open
kmirijan opened this issue Apr 11, 2022 · 2 comments

@kmirijan

Hey everyone, this is less of a feature request and more of a question. I hope this is the right place to ask this.

I am working on a project where I attempt to predict heroin overdose rates in the US by county using spreg, and I've run into two issues.

The first issue is that spreg models don't seem to have a predict function, so I can't actually predict with it. I'll be honest, I don't even know if the models in the spreg package are the right ones for this problem anyway.

The other problem I'm running into is that I can't find a way to use both spatial and temporal components at the same time.
If I arrange my data like this, I can't calculate spatial weights, since you can't do that with coincident points, which is what happens when you have multiple years of data for the same county:

| County | Death Rate | Median Income | Year | geometry |
|---|---|---|---|---|
| Alameda | 120 | 20000 | 2010 | Polygon(...) |
| Alameda | 121 | 20100 | 2011 | Polygon(...) |
| Los Angeles | 98 | 45000 | 2010 | Polygon(...) |
| Los Angeles | 99 | 45100 | 2011 | Polygon(...) |
| ... | ... | ... | ... | ... |

An example of a model I could use here that doesn't take spatial effects into account is OLSRegimes.

If I do try to include spatial effects, then I can't have the data split by year, because I need to ensure that the same geometry doesn't come up twice in the data. The data then has to look like this:

| County | Death Rate 2010 | Median Income 2010 | Death Rate 2011 | Median Income 2011 | geometry |
|---|---|---|---|---|---|
| Alameda | 120 | 20000 | 121 | 20100 | Polygon(...) |
| Los Angeles | 98 | 45000 | 99 | 45100 | Polygon(...) |
| ... | ... | ... | ... | ... | ... |
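
For readers following along, the reshaping from the one-row-per-county-year layout to this one-row-per-county layout could be sketched with pandas roughly as follows. The column names (`county`, `death_rate`, `median_income`, `year`) are assumptions made for illustration, the values are the example rows from the tables in this post, and the geometry column is left out for brevity.

```python
import pandas as pd

# Example rows from the tables in this post; column names are hypothetical.
long_df = pd.DataFrame({
    "county": ["Alameda", "Alameda", "Los Angeles", "Los Angeles"],
    "death_rate": [120, 121, 98, 99],
    "median_income": [20000, 20100, 45000, 45100],
    "year": [2010, 2011, 2010, 2011],
})

# One row per county, one column per (variable, year) pair.
wide = long_df.pivot(
    index="county",
    columns="year",
    values=["death_rate", "median_income"],
)

# Flatten the two-level column index into names like "death_rate_2010".
wide.columns = [f"{name}_{year}" for name, year in wide.columns]
wide = wide.reset_index()

print(wide)
```

Since each county's geometry does not change between years, it could presumably be merged back onto `wide` by county afterwards, so that each polygon appears only once when the weights are built.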

And the models I've used so far, like GM_Error_Het, just haven't performed well.

Is there something I'm doing wrong here? Am I using the wrong models? Is something about my data not right? Or does pysal simply not support this functionality, so that I have to look elsewhere?

@ljwolf
Member

ljwolf commented Apr 12, 2022

Hey @kmirijan! Thanks for the issue.

On prediction, we generally suggest people create the predictions themselves. For the models that don't have "lag" in their title, you should be able to construct predictions using the data matrix and the .betas attribute using matrix algebra (X_new @ regression.betas).
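
As a minimal sketch of that matrix-algebra prediction (not spreg's own API): the example below uses a plain NumPy least-squares fit as a stand-in for a fitted model, and assumes that `betas` is a `(k, 1)` column vector whose first entry is the intercept. If the fitted model was estimated with a constant term, `X_new` needs the same constant column prepended before multiplying. The data and the `predict` helper are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training data: 50 observations, 2 explanatory variables.
X = rng.normal(size=(50, 2))
y = 1.0 + X @ np.array([[2.0], [-0.5]]) + rng.normal(scale=0.1, size=(50, 1))

# Stand-in for a fitted model's `.betas`: a (k, 1) column vector with the
# intercept first. Here it comes from ordinary least squares via NumPy.
X_const = np.hstack([np.ones((X.shape[0], 1)), X])
betas, *_ = np.linalg.lstsq(X_const, y, rcond=None)


def predict(X_new, betas):
    """Prepend a constant column to X_new and return X_new @ betas."""
    X_new = np.asarray(X_new, dtype=float)
    X_new_const = np.hstack([np.ones((X_new.shape[0], 1)), X_new])
    return X_new_const @ betas


# Predictions for two new, made-up observations.
X_new = np.array([[0.0, 0.0], [1.0, 1.0]])
y_hat = predict(X_new, betas)
print(y_hat)
```

The columns of `X_new` have to be in the same order as the columns used at fit time, since the coefficients are matched by position rather than by name.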

On the other problem, you need to use dummy/one-hot encoded variables to express your year effects the way you request. Check out pandas.get_dummies, for example, on how to do this.
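
One way this might look in practice, as a rough sketch rather than a recommended specification: keep the data in the one-row-per-county-year layout and add one indicator column per year (dropping one year as the baseline), so that each year gets its own intercept shift. The column names are hypothetical and the values are the example rows from the original post.

```python
import pandas as pd

# Example rows from the original post; column names are hypothetical.
df = pd.DataFrame({
    "county": ["Alameda", "Alameda", "Los Angeles", "Los Angeles"],
    "death_rate": [120, 121, 98, 99],
    "median_income": [20000, 20100, 45000, 45100],
    "year": [2010, 2011, 2010, 2011],
})

# One 0/1 column per year; drop_first=True drops the first year (2010)
# so it acts as the baseline and the dummies are not collinear with a constant.
year_dummies = pd.get_dummies(
    df["year"], prefix="year", drop_first=True, dtype=float
)

# Design matrix: the original covariate plus the year indicators.
X = pd.concat([df[["median_income"]], year_dummies], axis=1)
y = df[["death_rate"]]

print(X)
```

With only two years there is a single indicator, `year_2011`; with more years there would be one column per year other than the baseline.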

Finally, you indeed might be using the wrong models. The spatial econometric methods in spreg generally are not focused on maximizing predictive accuracy the way a machine learning model might. Instead, they're interested in recovering "correct" estimates for parameters that are either biased (or made inefficient) by spatial dependence. A model like GM_Error_Het is trained to analyze data and account for the joint effects of heteroskedasticity and spatial dependence in the error term.

I'd suggest checking out our book, chapters 11 and 12, for a full treatment and discussion of how you can use geographic methods to improve prediction in other typical prediction-oriented models.

@kmirijan
Author

I've actually been reading the book! It's very good. I've been following Chapter 11 pretty closely, although I haven't looked at Chapter 12 all that much.

I'm a bit confused as to what you mean by expressing year effects with pandas.get_dummies? I've used it before, but I don't understand how the data would need to look to capture the year effects. Could you elaborate a bit more?

And in your opinion, what pysal models would work best for this kind of spatio-temporal prediction?
