Convex-Optimizer-in-Deep-Learning

Author's notice:

Deep learning has advanced thanks to a variety of techniques such as layer design, gradient updates, initialization, non-linearities, and normalization, but optimization in deep learning is usually a non-convex problem.

GD: uses the full dataset per update, and gets stuck in local minima more easily (cost functions in deep learning are rarely that flat). It also requires a large amount of computation per step.

SGD: uses a single sample per update. This strategy introduces more noise, which helps training jump out of saddle points. The method tends to be slow, and performance oscillates dramatically.

Mini-batch GD: uses a subset of the dataset per update. It is usually not intuitive to find an optimal batch size: too large a batch is limited by GPU memory, while too small a batch is limited by training time. A sketch contrasting the three variants follows below.
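
To make the contrast concrete, here is a minimal NumPy sketch (my own illustration, not code from this repository) of the same training loop on a toy least-squares problem; full-batch GD, SGD, and mini-batch GD differ only in the batch_size argument.

```python
import numpy as np

# Toy least-squares problem: loss(w) = ||X w - y||^2 / (2 n).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = X @ rng.normal(size=10) + 0.1 * rng.normal(size=1000)

def train(batch_size, lr=0.05, epochs=20):
    """batch_size == len(X): full-batch GD; 1: SGD; in between: mini-batch GD."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        order = rng.permutation(len(X))
        for start in range(0, len(X), batch_size):
            idx = order[start:start + batch_size]
            grad = X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)  # gradient on this batch
            w -= lr * grad
    return w

for bs in (len(X), 32, 1):  # full-batch GD, mini-batch GD, SGD
    w = train(bs)
    print(f"batch_size={bs:4d}  final loss={np.mean((X @ w - y) ** 2) / 2:.4f}")
```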

1st-order methods: SGD, AdaGrad, Adam, ...; batch sizes are usually around the hundreds. 2nd-order methods: conjugate gradient, Newton, quasi-Newton, L-BFGS, ...; their errors skyrocket if the first-order (gradient) estimates are noticeably noisy. Even though a smaller batch can speed up convergence, it also introduces much more noise, which adversely affects these methods, so batch sizes for 2nd-order optimization are usually in the thousands or more. A sketch of how both families are typically driven follows below.
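
As a rough illustration, here is a hedged PyTorch sketch (my own example; the model, data, and hyperparameters are placeholders, not taken from this repository): Adam consumes mini-batches of 128 samples, while L-BFGS is given the full batch because its curvature estimates amplify gradient noise.

```python
import torch

torch.manual_seed(0)
X = torch.randn(2000, 10)                       # placeholder data
y = X @ torch.randn(10) + 0.1 * torch.randn(2000)
model = torch.nn.Linear(10, 1)                  # placeholder model
loss_fn = torch.nn.MSELoss()

# 1st-order (Adam): many cheap, noisy steps on small mini-batches.
adam = torch.optim.Adam(model.parameters(), lr=1e-2)
for start in range(0, len(X), 128):             # batch size in the hundreds
    xb, yb = X[start:start + 128], y[start:start + 128]
    adam.zero_grad()
    loss_fn(model(xb).squeeze(-1), yb).backward()
    adam.step()

# 2nd-order-style (L-BFGS): a few expensive steps, fed the full batch.
lbfgs = torch.optim.LBFGS(model.parameters(), lr=1.0, max_iter=20)

def closure():
    lbfgs.zero_grad()
    loss = loss_fn(model(X).squeeze(-1), y)     # full-batch loss
    loss.backward()
    return loss

lbfgs.step(closure)
print("final full-batch loss:", loss_fn(model(X).squeeze(-1), y).item())
```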

On the other hand, smaller batches are not that stable, so I always increase the learning rate to close this gap; otherwise the model may fail to converge because of severe oscillation.
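
To see why smaller batches are "not that stable", the short NumPy sketch below (again my own illustration, reusing the toy problem from the earlier snippet) measures how far a mini-batch gradient deviates from the full-batch gradient: the deviation grows sharply as the batch shrinks, which is the noise the learning rate has to cope with.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))                 # same toy problem as above
y = X @ rng.normal(size=10) + 0.1 * rng.normal(size=1000)
w = np.zeros(10)

def batch_grad(idx):
    """Gradient of the least-squares loss estimated on the rows in idx."""
    return X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)

full_grad = batch_grad(np.arange(len(X)))
for bs in (1, 32, 256):
    devs = [np.linalg.norm(batch_grad(rng.choice(len(X), bs, replace=False)) - full_grad)
            for _ in range(200)]
    print(f"batch_size={bs:4d}  mean deviation from full-batch gradient={np.mean(devs):.3f}")
```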

Experience from my DL book:

  1. increase the batch size slightly, step by step;
  2. use plain SGD near the end of training;
  3. use a small batch to introduce noise in the earlier stages, and a large batch to get rid of oscillation later (a schedule combining these tips is sketched below).
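
Below is a sketch of a training schedule that follows these three tips; the model, data, stage lengths, and learning rates are placeholders I chose for illustration (the README does not specify them), and I assume an adaptive optimizer such as Adam is used before the final switch to plain SGD.

```python
import torch

torch.manual_seed(0)
X = torch.randn(4096, 10)                       # placeholder data
y = X @ torch.randn(10) + 0.1 * torch.randn(4096)
model = torch.nn.Linear(10, 1)                  # placeholder model
loss_fn = torch.nn.MSELoss()

# (epochs, batch_size, optimizer) stages: small noisy batches early,
# step-by-step batch-size increases, plain SGD with a large batch at the end.
stages = [
    (5, 32,   torch.optim.Adam(model.parameters(), lr=1e-3)),
    (5, 128,  torch.optim.Adam(model.parameters(), lr=1e-3)),
    (5, 1024, torch.optim.SGD(model.parameters(), lr=1e-2)),   # SGD near the end
]

for epochs, batch_size, opt in stages:
    for _ in range(epochs):
        order = torch.randperm(len(X))
        for start in range(0, len(X), batch_size):
            idx = order[start:start + batch_size]
            opt.zero_grad()
            loss_fn(model(X[idx]).squeeze(-1), y[idx]).backward()
            opt.step()
    print(f"batch_size={batch_size:4d}  loss={loss_fn(model(X).squeeze(-1), y).item():.4f}")
```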

About

Optimization algorithms commonly used in deep learning: mini-batch GD, SGD, momentum and adaptive methods (AdaGrad, Adam, ...), and advanced methods (Newton, quasi-Newton, L-BFGS, ...).
