# Conclusion:-

So, I successfully completed my GSoC’18 Project :D. I had a wonderful summer working on a really cool project. I learned a lot, especially about collaborative work culture, object-oriented design and collaborating through Git, and above all I gained an in-depth understanding of various deep learning optimization algorithms and the math behind them.

I would like to thank all my mentors at CERN, especially Dr. Stefan Wunsch and Dr. Lorenzo Moneta, for guiding me throughout my journey!

I will continue contributing to the ROOT-TMVA project. I feel really proud to be one of the contributors to ROOT, the great software framework by CERN.

UPDATE: My code has been successfully integrated into the ROOT production release v6.16, and the complete release notes can be found here.

# List of PRs submitted:-

Here is the list of Pull Requests submitted to the root-project/root master branch as a result of the GSoC project.

1) [TMVA] API-Support for SGD Optimizer: PR #2309

3) [TMVA] Add new Evaluation Metric ( meanAbsoluteError between two matrices ): PR #2376

4) [TMVA] Refactor MethodDL Tests for Optimization: PR #2379

7) [TMVA] Add API Support for RMSProp Optimizer: PR #2440

The link to all PRs: Here

# Comparison of various optimizers and future work:-

In this blog post, I will be comparing all the optimizers on the same dataset that is used for performing classification using TMVA.

The above figures show the convergence of the training and testing errors of the various optimizers during the integration tests (MethodDL tests).

## Future Work:

1) Implement other optimizers such as Adamax, Nadam and Nesterov accelerated SGD.
2) Add weight decay of the learning rate to the optimizers.
3) Benchmark the individual optimizers against TensorFlow on separate datasets.

# Adam and RMSProp Optimizer - Implementation and Testing:-

In this blog post, I’ll explain the implementation of the Adam optimizer and of the RMSProp optimizer, with and without momentum.

## RMSProp Optimizer:

RMSProp is an unpublished adaptive learning rate method proposed by Geoff Hinton. The main idea is “Divide the gradient by a running average of its recent magnitude”. It is similar to Adadelta, but it was developed independently to overcome the disadvantages of the Adagrad algorithm.

Thus, the update is implemented as follows (similar to the TensorFlow implementation):

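$$
\begin{aligned}
V_t &= \rho \cdot V_{t-1} + (1 - \rho) \cdot \text{currentSquaredGradients} \\
W_t &= \text{momentum} \cdot W_{t-1} + \frac{\text{learningRate} \cdot \text{currentGradients}}{\sqrt{V_t + \epsilon}} \\
\theta &= \theta - W_t
\end{aligned}
$$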


So, one step of update is performed as,
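
As a rough illustration of the three RMSProp equations, here is a minimal pure-Python sketch of one update step. It is not the TMVA C++ implementation: the function name `rmsprop_step`, the use of plain lists of floats, and the default hyperparameter values are all assumptions made for this example.

```python
import math


def rmsprop_step(theta, grads, v, w, learning_rate=0.001, rho=0.9,
                 momentum=0.0, epsilon=1e-7):
    """Apply one RMSProp update in place (hypothetical helper, lists of floats)."""
    for i, g in enumerate(grads):
        # Vt = rho * Vt-1 + (1 - rho) * currentSquaredGradients
        v[i] = rho * v[i] + (1.0 - rho) * g * g
        # Wt = momentum * Wt-1 + learningRate * currentGradients / sqrt(Vt + epsilon)
        w[i] = momentum * w[i] + learning_rate * g / math.sqrt(v[i] + epsilon)
        # theta = theta - Wt
        theta[i] -= w[i]
    return theta


# Usage: a few steps on f(x) = x^2, whose gradient is 2x.
theta, v, w = [1.0], [0.0], [0.0]
for _ in range(10):
    rmsprop_step(theta, [2.0 * theta[0]], v, w, learning_rate=0.01)
print(theta)
```

Setting `momentum=0.0` gives the plain RMSProp variant, while a non-zero value carries part of the previous step `Wt-1` into the current one.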

## Testing RMSProp:

I used the same unit testing approach as for the SGD optimizer. Have a look at the “Testing the SGD optimizer” post.

The above figures show the convergence of the training and testing errors for the RMSProp optimizer, without and with momentum, during the unit tests.

## Adam Optimizer:

Adaptive Moment Estimation (Adam) is a method that computes adaptive learning rates for each parameter. It stores both a decaying average of the past gradients $m_t$, similar to momentum, and a decaying average of the past squared gradients $v_t$, similar to RMSProp and Adadelta. Thus, it combines the advantages of both methods. Adam is often used as the default optimizer choice in practice.

Thus, the update is implemented as follows (similar to the TensorFlow implementation):

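$$
\begin{aligned}
M_t &= \beta_1 \cdot M_{t-1} + (1 - \beta_1) \cdot \text{currentGradients} \\
V_t &= \beta_2 \cdot V_{t-1} + (1 - \beta_2) \cdot \text{currentSquaredGradients} \\
\alpha &= \text{learningRate} \cdot \frac{\sqrt{1 - \beta_2^t}}{1 - \beta_1^t} \\
\theta &= \theta - \frac{\alpha \cdot M_t}{\sqrt{V_t} + \epsilon}
\end{aligned}
$$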
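
For readers wondering where the factor in `alpha` comes from: it folds the usual Adam bias correction into the learning rate. The short derivation below is only a sketch; the symbols $\hat{m}_t$, $\hat{v}_t$ and $\hat{\epsilon}$ are notation introduced here for the bias-corrected estimates and are not taken from the TMVA code.

$$
\hat{m}_t = \frac{M_t}{1 - \beta_1^t}, \qquad \hat{v}_t = \frac{V_t}{1 - \beta_2^t}
$$

$$
\text{learningRate} \cdot \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \hat{\epsilon}}
= \text{learningRate} \cdot \frac{\sqrt{1 - \beta_2^t}}{1 - \beta_1^t} \cdot \frac{M_t}{\sqrt{V_t} + \hat{\epsilon}\sqrt{1 - \beta_2^t}}
= \frac{\alpha \cdot M_t}{\sqrt{V_t} + \epsilon}
$$

The last equality holds when $\epsilon = \hat{\epsilon}\sqrt{1 - \beta_2^t}$, so the two forms differ only in how the small constant in the denominator is scaled.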


So, one step of update is performed as,
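
As with RMSProp, a minimal pure-Python sketch of one Adam step may help make the equations concrete. It is not the TMVA C++ implementation: the function name `adam_step`, the list-of-floats representation, the 1-based step counter `t`, and the default hyperparameter values are all assumptions made for this example.

```python
import math


def adam_step(theta, grads, m, v, t, learning_rate=0.001, beta1=0.9,
              beta2=0.999, epsilon=1e-7):
    """Apply Adam update number t (counted from 1) in place (hypothetical helper)."""
    # alpha = learningRate * sqrt(1 - beta2^t) / (1 - beta1^t)
    alpha = learning_rate * math.sqrt(1.0 - beta2 ** t) / (1.0 - beta1 ** t)
    for i, g in enumerate(grads):
        # Mt = beta1 * Mt-1 + (1 - beta1) * currentGradients
        m[i] = beta1 * m[i] + (1.0 - beta1) * g
        # Vt = beta2 * Vt-1 + (1 - beta2) * currentSquaredGradients
        v[i] = beta2 * v[i] + (1.0 - beta2) * g * g
        # theta = theta - alpha * Mt / (sqrt(Vt) + epsilon)
        theta[i] -= alpha * m[i] / (math.sqrt(v[i]) + epsilon)
    return theta


# Usage: a few steps on f(x) = x^2, whose gradient is 2x.
theta, m, v = [1.0], [0.0], [0.0]
for t in range(1, 11):
    adam_step(theta, [2.0 * theta[0]], m, v, t, learning_rate=0.01)
print(theta)
```

One property worth noting: on the very first step, and with `epsilon` negligible, the update has magnitude `learningRate` whatever the scale of the gradient, because the bias-correction factors cancel the initial `(1 - beta)` weights.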