The advent of GPUs in Deep Learning
Evolution of Deep Learning - 2
Hello people!
Talking about the evolution of Deep Learning doesn’t make sense without talking about GPUs. The advent of GPUs completely changed the state of the field, and they are one of the main reasons behind the boom we are seeing in AI today. Without any further ado, let’s get started.
What are GPUs?
Graphics Processing Units (GPUs) [1][2][3] are chips that were originally designed in the 1990s to render video-game graphics. The major GPU manufacturers at present are Nvidia, AMD, and Intel. GPUs have many task-specific advantages over general-purpose CPUs. Their main application is rendering images at high frame rates, and they are fast at matrix and vector operations such as multiplication. The computations that happen at the pixel or voxel level are independent of each other, which gives GPUs a very high capacity for parallel processing. Combined with high memory bandwidth, this makes them very efficient at image-processing tasks. The detailed architecture of GPUs is out of the scope of this blog, but in the next sections let’s see why deep learning needed GPUs and what dramatic changes they brought to the field.
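To make the “fast at matrix operations” claim concrete, here is a minimal sketch (assuming PyTorch and a CUDA-capable GPU, neither of which this post requires) that times the same matrix multiplication on the CPU and on the GPU:

```python
import time
import torch

# Two large random matrices; matrix multiplication is exactly the kind
# of operation GPUs are built to parallelize.
a = torch.randn(4096, 4096)
b = torch.randn(4096, 4096)

# Time the multiplication on the CPU.
start = time.time()
c_cpu = a @ b
print(f"CPU: {time.time() - start:.3f} s")

# Time the same multiplication on the GPU (needs a CUDA-capable card).
if torch.cuda.is_available():
    a_gpu, b_gpu = a.cuda(), b.cuda()
    torch.cuda.synchronize()          # make sure the copy has finished
    start = time.time()
    c_gpu = a_gpu @ b_gpu
    torch.cuda.synchronize()          # wait for the GPU kernel to complete
    print(f"GPU: {time.time() - start:.3f} s")
```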
The biggest need in Deep Learning
According to Goodfellow et al. [1], which is considered the Bible of Deep Learning, the philosophy of connectionism is the basis for deep learning. It means that a single neuron cannot exhibit intelligence, but a group of neurons, a very large group, may be able to perform intelligent tasks. This suggests that for better performance, networks must be large, with many neurons. And networks that large are very data-hungry: they need a lot of training data.
Although we knew that larger networks and more data lead to better performance, something was still missing for AI to advance at a fast pace. Major breakthroughs like the perceptron and convolutional neural networks were made in the 20th century itself, yet it took many years to get where we are today. Back then, no one was able to scale up an architecture like LeNet. One answer to all of these questions is the lack of sufficient computational capacity.
For example, take a 256 x 256 x 3 image. Flattening it produces a 196,608 x 1 vector, and large datasets contain thousands or even millions of images, so imagine the huge pile of numbers involved. A single CPU is not sufficient to process data at that scale. To cope with the computational demands, many researchers tried optimizing algorithms and working with smaller models, but this made model performance suffer. So here is the biggest need of deep learning: high computational capacity. That need is satisfied by modern GPUs. I will explain the relationship between DL and GPUs in the next section.
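Here is a quick sketch of that arithmetic (using NumPy purely for illustration; it is not part of the argument above):

```python
import numpy as np

# A single 256 x 256 RGB image.
image = np.zeros((256, 256, 3))

# Flattening it gives a vector of 256 * 256 * 3 = 196,608 values.
flat = image.reshape(-1, 1)
print(flat.shape)                 # (196608, 1)

# A dataset of 1 million such images would hold ~196.6 billion numbers.
print(1_000_000 * flat.size)      # 196608000000
```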
Neural Networks == Gamers
The title of this section suggests that both neural networks and gamers need GPUs. We know why gamers need GPUs: GPUs were developed for them in the first place. Neural networks are related to GPUs in a similar way. A key characteristic of neural networks is that all the neurons in a layer are independent of each other [1], which makes them well suited to parallel processing, at which GPUs excel, as we discussed in the earlier sections. Hence the capabilities of GPUs, such as high parallel-processing capacity and high memory bandwidth, make them efficient for training neural nets.
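As a minimal illustration of how this parallelism is used in practice, here is a hedged sketch (assuming PyTorch; the layer sizes are arbitrary) that moves a small fully connected network and a batch of flattened images onto a GPU when one is available:

```python
import torch
import torch.nn as nn

# A small fully connected network; every neuron in a layer is computed
# independently, so each layer reduces to one big matrix multiplication.
model = nn.Sequential(
    nn.Linear(196_608, 512),
    nn.ReLU(),
    nn.Linear(512, 10),
)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)                          # move the weights to the GPU

batch = torch.randn(64, 196_608, device=device)   # a batch of flattened images
logits = model(batch)                             # forward pass runs on the GPU
print(logits.shape)                               # torch.Size([64, 10])
```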
This intuition was demonstrated experimentally by Steinkraus et al. [5], who trained a DNN on a GPU for the first time in 2005 and showed it was faster than CPU implementations (both single- and multi-CPU). In 2009, Raina et al. [4] trained deeper networks with millions of parameters and proved that GPUs can be used for complex architectures. Their work provided many insights, such as data parallelism, and they reported deep networks training up to 70 times faster than on dual-core CPUs. With this, deep learning’s biggest need was satisfied, and major developments followed. Recently, OpenAI launched its GPT-3 model with 175 billion parameters; networks this complex are only possible when computational capacity is no longer the limit.
Ease of Use
GPUs are not easy to program: they run at lower clock rates than CPUs and rely on thousands of threads. In the beginning, only a few people with a deep understanding of the hardware could program them. That changed when Nvidia introduced general-purpose GPUs (GPGPUs), which can be programmed with a C-like language called CUDA. Similar to CUDA, another framework called OpenCL emerged for general-purpose GPU computing. If GPGPUs had not arrived, perhaps only a handful of people would be using GPUs today.
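To give a feel for this thread-level style of programming, here is a minimal sketch of a GPU kernel written from Python with Numba’s CUDA support (Numba is my own choice for illustration; it is not mentioned in this post and needs a CUDA-capable GPU to run). Each of the thousands of threads processes one array element:

```python
import numpy as np
from numba import cuda

# A minimal CUDA-style kernel: one thread handles one element.
@cuda.jit
def add_kernel(x, y, out):
    i = cuda.grid(1)          # global index of this thread
    if i < x.size:
        out[i] = x[i] + y[i]

n = 1_000_000
x = np.arange(n, dtype=np.float32)
y = 2 * x

# Explicit host-to-device transfers, like in plain CUDA C.
d_x = cuda.to_device(x)
d_y = cuda.to_device(y)
d_out = cuda.device_array_like(x)

threads_per_block = 256
blocks = (n + threads_per_block - 1) // threads_per_block
add_kernel[blocks, threads_per_block](d_x, d_y, d_out)   # launch the kernel

out = d_out.copy_to_host()
print(out[:5])   # [ 0.  3.  6.  9. 12.]
```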
Even with the advent of GPGPUs and CUDA, writing efficient algorithms while keeping the GPU architecture and the flow of the algorithm in mind is still a very difficult task. This problem is addressed by frameworks like TensorFlow, PyTorch, and Theano. Nowadays, people train large models on multiple GPUs using techniques like data parallelism. So even though GPUs were difficult to handle at first, they have become accessible to many people today.
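As a sketch of how simple a framework makes multi-GPU data parallelism, here is one way to do it in PyTorch with nn.DataParallel (my choice for brevity; PyTorch’s documentation recommends DistributedDataParallel for serious training):

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(196_608, 512),
    nn.ReLU(),
    nn.Linear(512, 10),
)

device = "cuda" if torch.cuda.is_available() else "cpu"

# DataParallel splits each input batch across all visible GPUs,
# runs the forward pass on every replica, and gathers the outputs.
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)
model = model.to(device)

batch = torch.randn(256, 196_608, device=device)
logits = model(batch)
print(logits.shape)   # torch.Size([256, 10])
```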
Hence, this is how the advent of GPUs changed the whole field. Finally, thanks to the gamers of the 20th century: indirectly, they are one of the reasons behind the evolution of GPUs, which led to the “Evolution of Deep Learning” and the advanced developments in Artificial Intelligence we see today. I also acknowledge all the people whose work allowed me to gain this knowledge and share it with you. Thanks to them; their work is cited in the references section.
Thanks for reading. Feel free to leave a review and suggestions. Please visit my blog for more interesting posts coming soon and a free subscription:
https://backpropogation.blogspot.com/
Cheers!!!
References
- Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. “Deep Learning”. MIT Press, 2016. https://www.deeplearningbook.org/
- https://en.wikipedia.org/wiki/Graphics_processing_unit
- https://course.fast.ai/gpu_tutorial.html
- Raina, Rajat, Anand Madhavan, and Andrew Y. Ng. “Large-scale deep unsupervised learning using graphics processors.” Proceedings of the 26th annual international conference on machine learning. 2009.
- Steinkraus, Dave, Ian Buck, and P. Y. Simard. “Using GPUs for machine learning algorithms.” Eighth International Conference on Document Analysis and Recognition (ICDAR’05). IEEE, 2005.
- Cireşan, Dan Claudiu, et al. “Deep, big, simple neural nets for handwritten digit recognition.” Neural computation 22.12 (2010): 3207–3220.
- Photos from Unsplash