# Adam and RMSProp Optimizers - Implementation and Testing

In this blog post, I’ll explain the implementation of the Adam optimizer and the RMSProp optimizer, with and without momentum.

## RMSProp Optimizer:

RMSprop is an unpublished, adaptive learning rate method proposed by Geoff Hinton. The main idea is to “divide the gradient by a running average of its recent magnitude”. It is similar to Adadelta, but it was developed independently to overcome the disadvantages of the Adagrad algorithm.

Thus, the update is implemented as follows (similar to the TensorFlow implementation):

```
Vt    = rho * Vt-1 + (1 - rho) * currentSquaredGradients
Wt    = momentum * Wt-1 + (learningRate * currentGradients) / sqrt(Vt + epsilon)
theta = theta - Wt
```


So, one update step divides the current gradient by the root of the running average of squared gradients, then applies momentum to the resulting step.
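As a concrete sketch of the update rules above, here is a minimal NumPy version of one RMSProp-with-momentum step. This is my own simplification for illustration, not the actual TensorFlow kernel; the function name and signature are mine.

```python
import numpy as np

def rmsprop_update(theta, grad, v, w, learning_rate=0.001,
                   rho=0.9, momentum=0.9, epsilon=1e-7):
    """One RMSProp step with momentum, following the update rules above.

    v accumulates the running average of squared gradients; w is the
    momentum buffer holding the previous update step.
    """
    v = rho * v + (1.0 - rho) * grad ** 2
    w = momentum * w + learning_rate * grad / np.sqrt(v + epsilon)
    theta = theta - w
    return theta, v, w
```

Calling this in a loop with the gradient of, say, f(theta) = theta^2 drives theta toward the minimum at zero.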

## Testing RMSProp:

I used the same unit-testing approach as for the SGD optimizer; have a look at the Testing the SGD optimizer post.

The figures above show the convergence of the training and testing errors for the RMSProp optimizer, without and with momentum, during the unit tests.
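For illustration, a convergence check in the spirit of those unit tests might fit a one-parameter linear model with RMSProp and assert that the loss decreases. The data, variable names, and hyperparameters below are my own toy choices, not taken from the actual test suite.

```python
import numpy as np

# Toy data: y = 2x, so the optimal weight is w = 2.
rng = np.random.default_rng(0)
x = rng.standard_normal(100)
y = 2.0 * x

def loss(w):
    """Mean squared error of the one-parameter model w * x."""
    return float(np.mean((w * x - y) ** 2))

# RMSProp-with-momentum state and hyperparameters.
w_param, v, buf = 0.0, 0.0, 0.0
lr, rho, momentum, eps = 0.01, 0.9, 0.5, 1e-7

initial_loss = loss(w_param)
for _ in range(300):
    grad = float(np.mean(2.0 * (w_param * x - y) * x))  # dL/dw
    v = rho * v + (1.0 - rho) * grad ** 2
    buf = momentum * buf + lr * grad / np.sqrt(v + eps)
    w_param -= buf
final_loss = loss(w_param)
```

The test then simply asserts that `final_loss` is well below `initial_loss`.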

## Adam Optimizer:

Adaptive Moment Estimation (Adam) is a method that computes adaptive learning rates for each parameter. It stores both a decaying average of past gradients $m_t$, similar to momentum, and a decaying average of past squared gradients $v_t$, similar to RMSprop and Adadelta. Thus, it combines the advantages of both methods, and Adam is generally the default choice of optimizer for most applications.

Thus, the update is implemented as follows (similar to the TensorFlow implementation):

```
Mt    = beta1 * Mt-1 + (1 - beta1) * currentGradients
Vt    = beta2 * Vt-1 + (1 - beta2) * currentSquaredGradients
alpha = learningRate * sqrt(1 - beta2^t) / (1 - beta1^t)
theta = theta - alpha * Mt / (sqrt(Vt) + epsilon)
```


So, one update step rescales the first moment estimate by the root of the second moment estimate, with both bias corrections folded into the step size alpha.
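The rules above can be sketched as a minimal NumPy function for a single Adam step. Again, this is my own simplification for illustration, not the TensorFlow source; the function name and signature are mine.

```python
import numpy as np

def adam_update(theta, grad, m, v, t, learning_rate=0.001,
                beta1=0.9, beta2=0.999, epsilon=1e-8):
    """One Adam step; t is the 1-based step count.

    The bias corrections for m and v are folded into the step
    size alpha, as in the update rules above.
    """
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad ** 2
    alpha = learning_rate * np.sqrt(1.0 - beta2 ** t) / (1.0 - beta1 ** t)
    theta = theta - alpha * m / (np.sqrt(v) + epsilon)
    return theta, m, v
```

Because the bias correction makes alpha close to the full learning rate from the very first step, Adam takes reasonably sized steps even while m and v are still warming up from zero.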