Modify rosie adapter to load files according to starting year #476

Open
wants to merge 1 commit into
base: master

lucastt commented May 25, 2019

What is the purpose of this Pull Request?
Make it possible to specify the starting year for the loading step of Rosie's algorithms. With this change, it will be easier to contribute to the project on low-end machines.

What was done to achieve this purpose?
A function was added to compare the year of the CSV being loaded with the STARTING_YEAR parameter. If the file's year is greater than or equal to STARTING_YEAR, the file is loaded; otherwise it is skipped.
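For readers who want to picture the comparison, here is a minimal sketch, not the code in this PR. It assumes the year appears as a four-digit number in the CSV file name (for example `reimbursements-2017.csv`); the names `file_year` and `should_load`, the example value of `STARTING_YEAR`, and the choice to load files that carry no year are all hypothetical.

```python
import re
from pathlib import Path

# Hypothetical default; the PR reads STARTING_YEAR from a parameter.
STARTING_YEAR = 2017

# A standalone four-digit number in the file name is assumed to be the year.
YEAR_PATTERN = re.compile(r"(?<!\d)(\d{4})(?!\d)")


def file_year(path):
    """Return the first four-digit year in the file name, or None."""
    match = YEAR_PATTERN.search(Path(path).name)
    return int(match.group(1)) if match else None


def should_load(path, starting_year=STARTING_YEAR):
    """Decide whether a CSV should be loaded given the starting year."""
    year = file_year(path)
    if year is None:
        # Assumption: files without a year in their name are always loaded.
        return True
    return year >= starting_year


if __name__ == "__main__":
    # Made-up file names, only to show the filter in use.
    candidates = [
        "reimbursements-2015.csv",
        "reimbursements-2017.csv",
        "reimbursements-2018.csv",
        "companies.csv",
    ]
    print([name for name in candidates if should_load(name, 2017)])
```

Running the module filters the made-up list down to the files from the starting year onward, plus the file that has no year in its name.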

How to test if it really works?
During the load process, verify that only the files for the starting year and later years are loaded.

cuducos (Collaborator) commented May 25, 2019

I would raise a warning here. Some classifiers, such as the meal outlier, depend on the full dataset to be more accurate (it takes into account a longitudinal analysis of how much was spent in each restaurant to find the threshold for outliers). Thus I can only agree with the proposed change if we make sure classifiers with this feature load a model generated with the full dataset, and checking for that is complex.

I do think this might add too much complexity. I'm not sure it would be worth the risk of offering less accurate suspicions as output…
