Improve parallelisation of evaluations #481
Ping @tomMoral.
The proposed pattern couples the code that performs the evaluation (running the pipelines + parallelization) with the process that decides the split. I would recommend decoupling them further, in line with what scikit-learn does, so that the API is similar and the various concepts are easy to grasp. Basically, the evaluation could be organized like this:
```python
from copy import deepcopy

import joblib
import pandas as pd
from joblib import Parallel, delayed

memory = joblib.Memory(location="__cache__")


class Evaluation:
    def __init__(
        self,
        ...,
        n_nodes=1,  # number of data chunks to load in memory in parallel
        n_jobs=1,  # number of jobs per data chunk; one job fits one pipeline on one fold
        cv="intersubject",
    ):
        self.n_nodes = n_nodes
        self.n_jobs = n_jobs
        if isinstance(cv, str):  # make it easy if you want default parameters for cv
            cv = CV_CLASSES[cv]()
        self.cv = cv

    def process(self, pipelines, datasets):
        results = Parallel(n_jobs=self.n_jobs)(
            delayed(self.process_split)(p, d, train_idx, test_idx)
            for p in pipelines
            for d in datasets
            for train_idx, test_idx in self.cv.split(d)
        )
        return pd.DataFrame(results)

    @memory.cache
    def process_split(self, clf, dataset, train_idx, test_idx):
        clf = deepcopy(clf)
        X_train, X_test, y_train, y_test, metadata = self.paradigm.get_data(
            dataset, train_idx, test_idx
        )
        clf.fit(X_train, y_train)
        score = clf.score(X_test, y_test)
        return {"metadata": metadata, "clf": clf, "score": score}
```

Note that I changed the manual caching to use `joblib.Memory`.
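To make the `joblib.Memory` suggestion concrete, here is a minimal, self-contained illustration of how its caching behaves: the first call with a given set of arguments computes and stores the result on disk, and repeated calls load the cached result without re-running the function. The function and variable names are illustrative, not part of any proposed API.

```python
import tempfile
from joblib import Memory

# Cache directory; a temp dir here, "__cache__" in the proposal above.
memory = Memory(location=tempfile.mkdtemp(), verbose=0)

calls = []  # track how many times the function body actually runs

@memory.cache
def expensive_score(pipeline_name, fold):
    calls.append((pipeline_name, fold))  # only appended on a real computation
    return len(pipeline_name) + fold

expensive_score("csp+lda", 0)  # computed and stored
expensive_score("csp+lda", 0)  # loaded from cache; body not executed
print(len(calls))  # 1
```

One caveat worth noting: applying `@memory.cache` to an instance method (as in the sketch above) makes `self` part of the cache key, so caching plain functions is usually more robust.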
Thanks @tomMoral for your feedback!! But I am not sure this would completely work, because we have some quite specific constraints:
This is why I proposed this nested parallelism. Maybe an in-between would be to implement …
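To illustrate the nested parallelism mentioned here, the sketch below uses two levels of `joblib.Parallel`: an outer level over data chunks (`n_nodes`) and an inner level over (pipeline, fold) jobs within each chunk (`n_jobs`). All names and the placeholder scoring function are hypothetical, not MOABB's actual API.

```python
from joblib import Parallel, delayed

def fit_and_score(pipeline, fold):
    # placeholder for: clone pipeline, fit on train fold, score on test fold
    return {"pipeline": pipeline, "fold": fold, "score": 0.0}

def process_chunk(chunk_id, pipelines, folds, n_jobs):
    # inner level: one job fits one pipeline on one fold of this chunk
    return Parallel(n_jobs=n_jobs)(
        delayed(fit_and_score)(p, f) for p in pipelines for f in folds
    )

def process(chunks, pipelines, folds, n_nodes=2, n_jobs=2):
    # outer level: each "node" loads and processes one data chunk
    nested = Parallel(n_jobs=n_nodes)(
        delayed(process_chunk)(c, pipelines, folds, n_jobs) for c in chunks
    )
    # flatten the per-chunk result lists into one list of records
    return [r for chunk_results in nested for r in chunk_results]

results = process(chunks=[0, 1], pipelines=["csp+lda"], folds=[0, 1])
print(len(results))  # 2 chunks × 1 pipeline × 2 folds = 4
```

The outer level controls how many data chunks sit in memory at once, which is what distinguishes it from a single flat job list.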
After discussions at the braindecode code sprint, and following up on #460, I think we should break down the evaluations into something like this:
This would remove all the for loops we have in the different evaluations and allow for larger parallelisation.
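The "remove all the for loops" idea can be sketched as follows: instead of nesting loops over pipelines, datasets, and folds inside each evaluation, build one flat list of independent jobs and hand it to joblib, so every fit can run in parallel. This is a hedged sketch with illustrative names, not the actual MOABB evaluation code.

```python
from itertools import product
from joblib import Parallel, delayed

def run_one(pipeline, dataset, fold):
    # placeholder for: get data for this fold, fit the pipeline, score it
    return {"pipeline": pipeline, "dataset": dataset, "fold": fold}

pipelines = ["csp+lda", "tangent+lr"]
datasets = ["BNCI2014001"]
folds = range(3)

# one flat job list replaces three nested for loops
results = Parallel(n_jobs=2)(
    delayed(run_one)(p, d, f) for p, d, f in product(pipelines, datasets, folds)
)
print(len(results))  # 2 × 1 × 3 = 6
```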