Optimization Modules - Class Structure
01 Jun 2018
In this blog post, I describe a class structure that could be used to include the Optimizers in the TMVA submodule. The existing API needs to be modified slightly to accommodate the Optimizers, so I also describe the changes that need to be made to it.
At last, after discussing with the mentors, I have found a design for including the Optimizers in the existing framework, and it seems feasible.
Current workflow of one step of Optimizer’s update:
In the current workflow, the class MethodDL creates and initializes an instance of the class TDLGradientDescent as the minimizer. To perform one step of the update, the minimizer’s Step() function is called, which in turn calls the class TDeepNet’s Forward(), Backward() and Update() functions. The class TDeepNet’s Update() function iterates through its layers and calls the class VGeneralLayer’s Update() function for each layer. The class VGeneralLayer’s Update() function is the one that actually updates the weights and biases with the corresponding weightGradients and biasGradients, through its UpdateWeights() and UpdateBiases() functions.
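To make the call chain above concrete, here is a minimal sketch of it. This is not the TMVA code: ToyLayer, ToyNet and ToyGradientDescent are hypothetical stand-ins for VGeneralLayer, TDeepNet and TDLGradientDescent, the parameters are flat vectors of doubles instead of matrices, and Forward() and Backward() are left empty on the assumption that the gradients are already filled in.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for VGeneralLayer: flat parameter and gradient vectors.
struct ToyLayer {
    std::vector<double> weights, biases;
    std::vector<double> weightGradients, biasGradients;

    void UpdateWeights(double learningRate) {
        assert(weights.size() == weightGradients.size());
        for (std::size_t i = 0; i < weights.size(); ++i)
            weights[i] -= learningRate * weightGradients[i];
    }
    void UpdateBiases(double learningRate) {
        assert(biases.size() == biasGradients.size());
        for (std::size_t i = 0; i < biases.size(); ++i)
            biases[i] -= learningRate * biasGradients[i];
    }
    // The layer itself applies the hard-wired gradient descent rule.
    void Update(double learningRate) {
        UpdateWeights(learningRate);
        UpdateBiases(learningRate);
    }
};

// Hypothetical stand-in for TDeepNet.
struct ToyNet {
    std::vector<ToyLayer> layers;
    void Forward() { /* omitted in this sketch */ }
    void Backward() { /* omitted: gradients are assumed to be set already */ }
    void Update(double learningRate) {
        for (auto &layer : layers) layer.Update(learningRate);
    }
};

// Hypothetical stand-in for TDLGradientDescent.
struct ToyGradientDescent {
    double learningRate;
    void Step(ToyNet &net) const {
        net.Forward();
        net.Backward();
        net.Update(learningRate);
    }
};
```

Note that the update rule lives inside the layer here, which is exactly what makes it hard to plug in a different optimizer.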
Design issues and their solutions:
1) How to identify the type of update (SGD, Adam, etc.) in the class VGeneralLayer?
The current implementation performs only the stochastic gradient descent update by default, and there is no way of choosing the type of update based on the type of Optimizer. So the solution would be to pass a reference to the Optimizer object from the class MethodDL down to the class VGeneralLayer.
2) Where to store the additional variables, like the sum of past gradients and other parameters specific to each Optimizer?
Since these additional variables are needed for performing the updates, they need to be stored in the same class that actually performs the update. The current implementation performs the update in the class VGeneralLayer. Storing all the additional variables there would be clumsy, since keeping optimizer state is not the responsibility of the class VGeneralLayer. So I have decided to perform the actual update in the Optimizer class and store the additional variables in it, which is cleaner.
3) Do we need to move the update function from the class VGeneralLayer to the Optimizer class?
Yes, we should move the update function to the Optimizer class, since it’s the responsibility of the Optimizer to perform such an update.
4) Do we really need to test for convergence in the Optimizer class itself, or should the test move to the Train() method of class MethodDL?
The current implementation has the test for convergence in the Optimizer class itself. But testing for convergence is the responsibility of the training procedure, not of the Optimizer class. So the better option is to move the test for convergence to the Train() method of class MethodDL.
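As a rough illustration of point 4, the sketch below keeps the convergence bookkeeping in the training loop and leaves the optimizer with nothing but its update step. Everything here is an assumption made for illustration: ConvergenceMonitor and TrainUntilConverged are hypothetical names, and the stopping rule (stop after a given number of consecutive tests without improvement of the test error) is one plausible criterion, not necessarily the one MethodDL will use.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <limits>

// Hypothetical helper owned by the training procedure, not by the optimizer.
struct ConvergenceMonitor {
    std::size_t convergenceSteps;             // tests allowed without improvement
    std::size_t stepsWithoutImprovement = 0;
    double minimumError = std::numeric_limits<double>::max();

    explicit ConvergenceMonitor(std::size_t steps) : convergenceSteps(steps) {}

    // Record a new test error and report whether training should stop.
    bool HasConverged(double testError) {
        if (testError < minimumError) {
            minimumError = testError;
            stepsWithoutImprovement = 0;
        } else {
            ++stepsWithoutImprovement;
        }
        return stepsWithoutImprovement >= convergenceSteps;
    }
};

// Sketch of a training loop that owns the convergence test.
// `step` performs one optimizer update; `testError` evaluates the test error.
// Returns the number of epochs that were run.
std::size_t TrainUntilConverged(const std::function<void()> &step,
                                const std::function<double()> &testError,
                                std::size_t convergenceSteps,
                                std::size_t maxEpochs) {
    ConvergenceMonitor monitor(convergenceSteps);
    std::size_t epoch = 0;
    while (epoch < maxEpochs) {
        step();
        ++epoch;
        if (monitor.HasConverged(testError())) break;
    }
    return epoch;
}
```

With this split, swapping the optimizer does not touch the stopping logic, and changing the stopping rule does not touch any optimizer.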
Modified workflow of one step of Optimizer’s update:
In the modified workflow, the class MethodDL creates and initializes an instance of the class TSGD (i.e. the specific type of optimizer given in the option) as the minimizer. To perform one step of the update, we pass a reference to the minimizer object to the class TDeepNet’s Update() function. The class TDeepNet’s Update() function iterates through each of its layer objects and passes the same reference to the class VGeneralLayer’s Update() function. The class VGeneralLayer’s Update() function now performs the update by calling the minimizer’s Step() function, so the update that is performed depends on the type of the minimizer object. Here, the class TSGD’s Step() function is the one that actually updates the weights and biases with the corresponding weightGradients and biasGradients, through its UpdateWeights() and UpdateBiases() functions.
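The modified call chain could look roughly like the sketch below. Again, this is only an illustration under my own assumptions: ToyOptimizer, ToyLayer, ToyNet and ToySGD are hypothetical stand-ins for the optimizer base class, VGeneralLayer, TDeepNet and TSGD, and the parameters are flat vectors of doubles rather than matrices.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct ToyLayer;  // forward declaration, defined below

// Hypothetical stand-in for the optimizer base class.
struct ToyOptimizer {
    virtual ~ToyOptimizer() = default;
    virtual void Step(ToyLayer &layer) = 0;
};

// Hypothetical stand-in for VGeneralLayer.
struct ToyLayer {
    std::vector<double> weights, biases, weightGradients, biasGradients;
    // The layer no longer knows the update rule; it delegates to the optimizer.
    void Update(ToyOptimizer &optimizer) { optimizer.Step(*this); }
};

// Hypothetical stand-in for TDeepNet.
struct ToyNet {
    std::vector<ToyLayer> layers;
    // The same optimizer reference is handed to every layer.
    void Update(ToyOptimizer &optimizer) {
        for (auto &layer : layers) layer.Update(optimizer);
    }
};

// Hypothetical stand-in for TSGD: the update rule now lives in the optimizer.
struct ToySGD : ToyOptimizer {
    double learningRate;
    explicit ToySGD(double lr) : learningRate(lr) {}

    void Step(ToyLayer &layer) override {
        UpdateWeights(layer.weights, layer.weightGradients);
        UpdateBiases(layer.biases, layer.biasGradients);
    }
    void UpdateWeights(std::vector<double> &w, const std::vector<double> &g) const {
        assert(w.size() == g.size());
        for (std::size_t i = 0; i < w.size(); ++i) w[i] -= learningRate * g[i];
    }
    void UpdateBiases(std::vector<double> &b, const std::vector<double> &g) const {
        assert(b.size() == g.size());
        for (std::size_t i = 0; i < b.size(); ++i) b[i] -= learningRate * g[i];
    }
};
```

Because ToyLayer::Update() only sees the base class, a different optimizer can be dropped in without changing the layer or the net.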
In class MethodDL:
1) Modify the struct TTrainingSettings to include the option for the optimizer.
2) Create an instance of the particular type of optimizer in the Train() method based on the option specified. If no option is specified, the default choice would be the Adam optimizer.
In file Functions.h:
1) Create an enum class EOptimizer with the various optimizers.
1) Create a base class TOptimizer with the basic functions. (Refer to the class diagram below.)
2) Create a class for each optimizer, like class TSGD, class TAdam etc., extending from the base class TOptimizer.
3) Re-implement the existing class TDLGradientDescent with the new design as class TSGD.
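The pieces listed above could fit together roughly as in the following sketch. The class names EOptimizer, TOptimizer, TSGD and TAdam come from the plan above; everything else is my assumption for illustration: the enumerator names kSGD and kAdam, the CreateOptimizer() helper, the member names, and the simplification that Step() acts on one bare parameter vector instead of a layer. The Adam update uses the standard published formula with its usual default hyperparameters.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <memory>
#include <vector>

enum class EOptimizer { kSGD, kAdam };  // enumerator names are assumed

using Vec = std::vector<double>;

class TOptimizer {
public:
    explicit TOptimizer(double learningRate) : fLearningRate(learningRate) {}
    virtual ~TOptimizer() = default;
    // One update of a parameter vector with its gradient.
    virtual void Step(Vec &params, const Vec &grads) = 0;
protected:
    double fLearningRate;
};

class TSGD : public TOptimizer {
public:
    using TOptimizer::TOptimizer;
    void Step(Vec &params, const Vec &grads) override {
        assert(params.size() == grads.size());
        for (std::size_t i = 0; i < params.size(); ++i)
            params[i] -= fLearningRate * grads[i];
    }
};

// Adam keeps its extra state (moment estimates, step count) in the optimizer.
// This sketch tracks a single parameter vector only.
class TAdam : public TOptimizer {
public:
    explicit TAdam(double learningRate = 0.001, double beta1 = 0.9,
                   double beta2 = 0.999, double eps = 1e-8)
        : TOptimizer(learningRate), fBeta1(beta1), fBeta2(beta2), fEps(eps) {}

    void Step(Vec &params, const Vec &grads) override {
        assert(params.size() == grads.size());
        if (fM.empty()) {
            fM.assign(params.size(), 0.0);
            fV.assign(params.size(), 0.0);
        }
        ++fT;
        const double c1 = 1.0 - std::pow(fBeta1, fT);  // bias corrections
        const double c2 = 1.0 - std::pow(fBeta2, fT);
        for (std::size_t i = 0; i < params.size(); ++i) {
            fM[i] = fBeta1 * fM[i] + (1.0 - fBeta1) * grads[i];
            fV[i] = fBeta2 * fV[i] + (1.0 - fBeta2) * grads[i] * grads[i];
            params[i] -= fLearningRate * (fM[i] / c1) / (std::sqrt(fV[i] / c2) + fEps);
        }
    }
private:
    double fBeta1, fBeta2, fEps;
    int fT = 0;
    Vec fM, fV;
};

// Hypothetical factory: picks the optimizer from the option, Adam by default.
std::unique_ptr<TOptimizer> CreateOptimizer(EOptimizer type = EOptimizer::kAdam,
                                            double learningRate = 0.001) {
    switch (type) {
    case EOptimizer::kSGD: return std::unique_ptr<TOptimizer>(new TSGD(learningRate));
    case EOptimizer::kAdam: break;
    }
    return std::unique_ptr<TOptimizer>(new TAdam(learningRate));
}
```

In the real design the optimizer would need one set of moment vectors per layer and per parameter matrix; the single-vector state here just shows where that state would live.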
I hope that this post gives a clear understanding of the new design and its workflow. And at last I can start coding :D . So my next goal is to re-implement the basic stochastic gradient descent with my new design, along with unit tests, and make sure everything works well. So, from my next post, I’ll describe more of the coding side :P