Bootstrap Aggregation #229

jk1015 · 2022-07-07T13:24:46Z

Following on from the discussion in issue #199, here is an implementation of Bootstrap Aggregation that takes the approach laid out there. There is also an example showing how the provided methods can be used together with the existing linfa-trees package to implement a Random Forest.

Outside of the linfa-ensemble package, the only required change was to factor an additional trait, FromTargetArrayOwned, out of the existing FromTargetArray trait. This allows owned data to be used without specifying a lifetime, which was impossible under the previous FromTargetArray trait.

-Added an example using bootstrap aggregation to carry out Random Forest classification.
-Factored the linfa trait FromTargetArray to create an additional trait FromTargetArrayOwned.

codecov-commenter · 2022-07-16T06:07:08Z

Codecov Report

Base: 55.44% // Head: 55.11% // Decreases project coverage by 0.33% ⚠️

Coverage data is based on head (46722ec) compared to base (d4bd9c9).
Patch coverage: 0.00% of modified lines in pull request are covered.

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #229      +/-   ##
==========================================
- Coverage   55.44%   55.11%   -0.34%     
==========================================
  Files          95       97       +2     
  Lines        8774     9014     +240     
==========================================
+ Hits         4865     4968     +103     
- Misses       3909     4046     +137

Impacted Files	Coverage Δ
algorithms/linfa-ensemble/src/ensemble.rs	`0.00% <0.00%> (ø)`
src/dataset/impl_dataset.rs	`41.95% <ø> (ø)`
src/dataset/impl_targets.rs	`21.31% <ø> (ø)`
src/dataset/mod.rs	`87.93% <ø> (-1.04%)`	⬇️
algorithms/linfa-trees/src/decision_trees/iter.rs	`0.00% <0.00%> (-25.00%)`	⬇️
...thms/linfa-trees/src/decision_trees/hyperparams.rs	`19.56% <0.00%> (-6.53%)`	⬇️
algorithms/linfa-linear/src/glm/mod.rs	`55.55% <0.00%> (-2.99%)`	⬇️
algorithms/linfa-kernel/src/lib.rs	`61.39% <0.00%> (-1.88%)`	⬇️
...lgorithms/linfa-clustering/src/optics/algorithm.rs	`48.57% <0.00%> (-1.73%)`	⬇️
...rithms/linfa-clustering/src/k_means/hyperparams.rs	`43.33% <0.00%> (-1.67%)`	⬇️
... and 40 more

YuhanLiin · 2022-07-16T06:12:33Z

algorithms/linfa-ensemble/src/ensemble.rs

+}
+
+pub struct EnsembleLearnerParams<P> {
+    pub ensemble_size: usize,


Why do we need a separate field for ensemble_size? Isn't this value implied by bootstrap_proportion?

ensemble_size gives the number of models in the ensemble while bootstrap_proportion gives the proportion of the total number of training samples that should be given to each model for training. These should be distinct parameters.

Shouldn't bootstrap_proportion be the same as 1/ensemble_size?

Not necessarily, each model in the ensemble just needs its own random set of samples of training data from the complete training data set. There are no constraints on the size of this set other than it being non-empty, so we let the user tune this size as a hyperparameter.

OK so bootstrap_samples just grabs random sets of samples from the input and yields them infinitely. I thought it divided the input into random subsamples. This makes sense now.

Can you add this behaviour to the docs, along with a general description of EnsembleLearner? We should also have top level docs in src/lib.rs like with the other crates.
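To make the distinction between the two parameters concrete, here is a minimal, dependency-free sketch. It is not the code from this PR: `bootstrap_indices` and `XorShift64` are hypothetical names, the small generator only stands in for a real RNG, and the sketch assumes sampling with replacement and a sample size of `ceil(bootstrap_proportion * n_samples)`.

```rust
/// Tiny xorshift64 generator, used only to keep this sketch free of
/// external crates. The seed must be non-zero.
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Index in 0..n (the slight modulo bias is acceptable for an illustration).
    fn next_index(&mut self, n: usize) -> usize {
        (self.next_u64() % n as u64) as usize
    }
}

/// Hypothetical helper: draws `ensemble_size` independent bootstrap samples.
/// Each sample holds ceil(bootstrap_proportion * n_samples) row indices,
/// drawn with replacement (an assumption of this sketch).
fn bootstrap_indices(
    n_samples: usize,
    ensemble_size: usize,
    bootstrap_proportion: f64,
    rng: &mut XorShift64,
) -> Vec<Vec<usize>> {
    assert!(n_samples > 0 && bootstrap_proportion > 0.0);
    let sample_size = ((n_samples as f64) * bootstrap_proportion).ceil() as usize;
    (0..ensemble_size)
        .map(|_| (0..sample_size).map(|_| rng.next_index(n_samples)).collect())
        .collect()
}

fn main() {
    let mut rng = XorShift64(0x2545_F491_4F6C_DD1D);
    // 10 training rows, 4 models, each trained on a sample half the size of the data.
    let samples = bootstrap_indices(10, 4, 0.5, &mut rng);
    for (i, s) in samples.iter().enumerate() {
        println!("model {}: rows {:?}", i, s);
    }
}
```

The number of samples drawn (`ensemble_size`) and the size of each sample (`bootstrap_proportion`) vary independently, which is why the two cannot be derived from one another.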

YuhanLiin · 2022-07-16T07:11:58Z

algorithms/linfa-ensemble/src/ensemble.rs

+        let aggregated_predictions = self.aggregate_predictions(&mut predictions);
+
+        for (target, output) in y_array.axis_iter_mut(Axis(0)).zip(aggregated_predictions.into_iter()) {
+            for (t, o) in target.into_iter().zip(output[0].0.iter()) {
+                *t = *o;
+            }
+        }


Replace with this:

// prediction_maps has the same shape as y_array, but its elements are maps
let mut prediction_maps = y_array.map(|_| HashMap::new());
for prediction in predictions {
    let p_arr = prediction.as_targets();
    assert_eq!(p_arr.shape(), y_array.shape());
    // Insert each prediction value into the corresponding map
    Zip::from(&mut prediction_maps)
        .and(&p_arr)
        .for_each(|map, &val| *map.entry(val).or_insert(0) += 1);
}
// For each prediction, pick the result with the highest number of votes
y_array = prediction_maps.mapv(|map| map.into_iter().max_by_key(|&(_, count)| count).unwrap().0);

It picks out the predictions with the highest number of votes without the complexity of aggregate_predictions
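The same voting idea can be sketched without ndarray, using only the standard library. This is an illustration rather than the PR's implementation: `majority_vote` is a hypothetical name, and each model's predictions are assumed to be a plain `Vec` of labels, one per sample.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical helper: returns, for each sample, the label predicted by the
/// most models. `predictions[m][i]` is model `m`'s label for sample `i`.
fn majority_vote<T: Copy + Eq + Hash>(predictions: &[Vec<T>]) -> Vec<T> {
    let n_samples = predictions.first().map_or(0, |p| p.len());
    // One vote-count map per sample
    let mut votes: Vec<HashMap<T, usize>> = vec![HashMap::new(); n_samples];
    for prediction in predictions {
        assert_eq!(prediction.len(), n_samples);
        // Insert each predicted label into the map for its sample
        for (map, &label) in votes.iter_mut().zip(prediction) {
            *map.entry(label).or_insert(0) += 1;
        }
    }
    // For each sample, pick the label with the highest number of votes
    votes
        .into_iter()
        .map(|map| {
            map.into_iter()
                .max_by_key(|&(_, count)| count)
                .map(|(label, _)| label)
                .expect("at least one model voted")
        })
        .collect()
}

fn main() {
    // Three models, three samples
    let predictions = vec![vec![0, 1, 1], vec![0, 1, 0], vec![1, 1, 0]];
    println!("{:?}", majority_vote(&predictions));
}
```

Because `HashMap` iteration order is unspecified, ties between labels are broken arbitrarily in this sketch; a real implementation may want a deterministic rule.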

This comment still applies I believe.

HridayM25 · 2023-03-21T10:21:16Z

Hi!
Could you please guide me as to what is remaining in this?
Thank You!

YuhanLiin · 2023-03-22T00:24:27Z

Merge/rebase with the latest master and address the open review comments. That's pretty much it.

-Added bootstrap aggregation for general linfa classifiers.

d4c8658

-Added an example using bootstrap aggregation to carry out Random Forest classification.
-Factored the linfa trait FromTargetArray to create an additional trait FromTargetArrayOwned.

YuhanLiin reviewed Jul 16, 2022

YuhanLiin mentioned this pull request Aug 7, 2022

Random Forest and Ensemble Learning #199

Open

James Knight added 2 commits August 25, 2022 09:45

All fixes from PR review other than changes to predict_inplace

4bf178c

updated example

647083a

YuhanLiin reviewed Sep 6, 2022

Added ParamGuard with consuming builder to EnsembleLearnerParams

46722ec

YuhanLiin mentioned this pull request Oct 22, 2022

Roadmap #7

Open

24 tasks
