An Introduction to Transfer Learning
Lorien Pratt published the first known paper on transfer learning in 1993. Since then, there has been a lot of research in this space. Over time, it has made deep learning more practical by reducing the data and training time a new task requires. Its importance and value escalated so much that in 2016, Andrew Ng had this to say about transfer learning:
“Transfer learning will become a key driver of machine learning success in the industry.”
–Andrew Ng, 2016 Conference on Neural Information Processing Systems
What is transfer learning?
Transfer learning is a method wherein a model developed for a particular task is used as a starting point for another task. By model, we mean a neural network that has been trained on data for one problem and retains the knowledge gained in solving it. For example, the knowledge gained in learning to recognize crocodiles can be used to recognize alligators because they have a lot of features in common.
What is a pre-trained model?
A pre-trained model is a model that someone else has already trained to solve a problem similar in nature to ours. Building a model from scratch is usually time-consuming. To avoid that, you can reuse a model that was trained on a similar problem as your starting point. Pre-trained models are rarely a perfect fit for the new task, but they serve as a good starting point and save the time and effort of starting from scratch.
How can I use the pre-trained model?
Many deep learning consulting firms have started using pre-trained models for their clients. Let’s take the example of image recognition and understand how we can use pre-trained models. There are four common ways to use one:
- As a classifier
The pre-trained model is used as-is, without any further training, to classify new images.
- As a standalone feature extractor
The pre-trained model, or a part of it, is used only to pre-process images and extract the relevant features, which are then passed to a separate model.
- As an integrated feature extractor
One or more pre-trained models are integrated into a new model, but during training, the layers of the pre-trained model are frozen.
- For weight initialization
One or more pre-trained models are integrated into a new model, and the layers of the pre-trained model are trained in tandem with the new model.
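The difference between the last three options can be sketched in a few lines of PyTorch. This is only an illustration under stated assumptions: `make_backbone`, `build_model`, and `trainable_count` are hypothetical helper names, and the tiny randomly initialized network stands in for a real pre-trained model whose weights would normally be loaded from a checkpoint.

```python
import torch
from torch import nn

torch.manual_seed(0)


def make_backbone() -> nn.Sequential:
    # Stand-in for a pre-trained network (assumption: real weights would be
    # loaded from a checkpoint instead of left at random initial values).
    return nn.Sequential(
        nn.Conv2d(3, 8, kernel_size=3, padding=1),
        nn.ReLU(),
        nn.AdaptiveAvgPool2d(1),
        nn.Flatten(),  # 8 features per image
    )


def build_model(backbone: nn.Module, num_classes: int, freeze: bool) -> nn.Sequential:
    # Attach a new classification head to the backbone.
    # freeze=True  -> integrated feature extractor (backbone layers frozen)
    # freeze=False -> weight initialization (backbone trained with the head)
    for param in backbone.parameters():
        param.requires_grad = not freeze
    return nn.Sequential(backbone, nn.Linear(8, num_classes))


def trainable_count(model: nn.Module) -> int:
    # Number of parameters that an optimizer would actually update.
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


images = torch.randn(4, 3, 32, 32)  # a dummy batch of four RGB images

# Standalone feature extractor: run the backbone once and hand the
# features to any separate model.
backbone = make_backbone().eval()
with torch.no_grad():
    features = backbone(images)

frozen_model = build_model(make_backbone(), num_classes=2, freeze=True)
init_model = build_model(make_backbone(), num_classes=2, freeze=False)

print(features.shape)                 # torch.Size([4, 8])
print(trainable_count(frozen_model))  # 18  (only the new head)
print(trainable_count(init_model))    # 242 (backbone and head)
```

In this sketch, the only thing that separates an integrated feature extractor from weight initialization is whether the backbone’s parameters still receive gradient updates.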
Ways to fine-tune the model
We can fine-tune the model on a new dataset. Below are three widely used approaches:
- Feature extraction
We use the model only for feature extraction. Here, we remove the output layer and then use the rest of the network as a fixed feature extractor for the new dataset.
- Model architecture extraction
We use the architecture of the model, but we initialize all the weights randomly and train the model from scratch on our dataset.
- Partial fine-tuning
Partial fine-tuning calls for freezing the weights of the initial layers of the model while we retrain the higher layers.
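The three approaches above might look like this in PyTorch. Treat it as a minimal sketch, not a recipe: `make_network` is a hypothetical helper, the small fully connected network stands in for a real pre-trained model, and the choice of which layers count as “initial” is an assumption made for illustration.

```python
import copy

import torch
from torch import nn

torch.manual_seed(0)


def make_network(num_classes: int = 10) -> nn.Sequential:
    return nn.Sequential(
        nn.Linear(16, 32), nn.ReLU(),  # initial layers (indices 0-1)
        nn.Linear(32, 32), nn.ReLU(),  # higher layers  (indices 2-3)
        nn.Linear(32, num_classes),    # output layer   (index 4)
    )


# Stand-in for a pre-trained model (assumption: weights would be loaded).
pretrained = make_network()
x = torch.randn(8, 16)
y = torch.randint(0, 10, (8,))

# 1. Feature extraction: drop the output layer, keep the rest fixed.
feature_extractor = copy.deepcopy(pretrained)[:-1]
for param in feature_extractor.parameters():
    param.requires_grad = False
features = feature_extractor(x)  # shape (8, 32), input for a new classifier

# 2. Model architecture extraction: same layers, fresh random weights.
fresh = make_network()

# 3. Partial fine-tuning: freeze the initial layers, retrain the higher ones.
partial = copy.deepcopy(pretrained)
for layer in partial[:2]:
    for param in layer.parameters():
        param.requires_grad = False

optimizer = torch.optim.SGD(
    [p for p in partial.parameters() if p.requires_grad], lr=0.1
)
before_frozen = partial[0].weight.clone()
before_trained = partial[2].weight.clone()

# One training step: only the unfrozen layers change.
loss = nn.functional.cross_entropy(partial(x), y)
loss.backward()
optimizer.step()

print(torch.equal(partial[0].weight, before_frozen))   # frozen layer unchanged
print(torch.equal(partial[2].weight, before_trained))  # higher layer updated
```

Passing only the parameters with `requires_grad=True` to the optimizer is one common way to make the frozen layers explicit. How many layers to freeze depends on how similar the new dataset is to the original one.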
Remember that transfer learning is still an evolving field, and implementing it well is easier with help from deep learning experts. You can contact AISmartz if you’re looking to implement robust deep learning solutions.