Artificial Intelligence and Neural Networks are not new concepts! So why, all of a sudden, have they become the next big thing that is changing our lives again in this decade?
Deep Learning is changing our lives
I’m sure you have noticed quantum leaps in the quality of a wide range of everyday technologies.
In Speech Recognition, the transcription of voice to text has made amazing advances, and it is already available on many different devices. We are increasingly interacting with “our” computers just by talking to them. There have also been spectacular advances in Natural Language Processing: for example, by simply clicking on the microphone icon of Google Translate, the system will transcribe what you dictate and translate it into another language. Google Translate now renders spoken sentences in one language into spoken sentences in another for 32 pairs of languages, and offers text translation for more than 100 languages. Advances in Computer Vision are also tremendous: our computers can now recognize images and generate descriptions for photos in seconds. All three areas are crucial to unleash improvements in robotics, drones, self-driving cars, etc. Artificial Intelligence (AI) is at the heart of today’s technological innovation.
From automatic speech recognition to driverless cars, AI is advancing rapidly and is, undoubtedly, the new way forward. Many of these breakthroughs have been made possible by a family of Artificial Intelligence techniques popularly known as Deep Learning (DL). That said, the greatest impact of DL may come when it is integrated into the whole toolbox of other AI techniques, such as Reinforcement Learning or Probabilistic Models. However, going into that detail is outside the scope of this presentation.
Artificial Intelligence and Neural Networks are not new concepts
John McCarthy coined the term Artificial Intelligence in the 1950s and was one of the founding fathers of the field, along with Marvin Minsky. In 1958, Frank Rosenblatt built a prototype neural net, which he called the Perceptron. Furthermore, the key ideas of Deep Learning for computer vision were already well understood in 1989, and fundamental Deep Learning algorithms for time series, such as LSTM, were developed in 1997. The history of the field is marked by periods of hype and high expectations alternating with periods of setback and disappointment. So why did Deep Learning only take off after 2012?
The Data Deluge
The neural networks used for Deep Learning typically require large data sets for training. One of the key drivers of its current progress is clearly the huge deluge of data available today. Thanks to the advent of Big Data, these models can be “trained” by exposing them to large data sets that were previously unavailable. Today, large companies work with image, video, and natural language datasets that could not have been collected without the Internet.
Beyond the increased availability of general data, specialist data resources have catalysed progress in this field. For instance, ImageNet, an open database of over 10 million hand-labelled images, has been freely available since 2009. But what makes ImageNet special is not just its large size; it is the yearly competition associated with it, an excellent way to motivate researchers and engineers.
The Figure shows accuracy trends for contest winners over the past several years. Traditional image recognition approaches employed hand-crafted computer vision (CV) classifiers trained on a number of instances of each object class. In 2012, by contrast, Alex Krizhevsky used a deep neural network, now known as AlexNet, which achieved an error rate more than 10 percentage points lower than that of the next closest solution. By 2015, the winning algorithm rivalled human capabilities, and today Deep Learning algorithms exceed human performance on this benchmark. The ImageNet data had been available since 2009, so what was the hidden enabler of that achievement in 2012?
New massively parallel hardware: Supercomputing
Because the field is guided by experimental findings rather than by theory, algorithmic advances only become possible when appropriate hardware (and data) is available to try new ideas or scale up old ones. Thanks to Moore’s law, nowadays we can solve problems that would have been intractable some years ago. For instance, the computer on which I ran my programs in 1982 was a Fujitsu that could perform one or two million operations per second. Thirty years later, in 2012, the MareNostrum supercomputer was “only” 1,000,000,000 times faster :-).
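To put the 1982-to-2012 comparison in perspective, the sketch below works out what a 1,000,000,000-fold speedup over 30 years implies, using only the rough figures from the text. The helper name `doubling_time_years` is hypothetical, and the comparison against a two-year doubling period is an assumption added for illustration (a common rule-of-thumb reading of Moore’s law), not a measured figure.

```python
import math

def doubling_time_years(speedup: float, years: float) -> float:
    """Hypothetical helper: years per doubling implied by a total speedup reached over `years`."""
    return years / math.log2(speedup)

# Rough figures taken from the text: about a 1e9x speedup from 1982 to 2012.
speedup = 1_000_000_000
years = 2012 - 1982

print(f"doublings needed: {math.log2(speedup):.1f}")                              # 29.9
print(f"implied doubling time: {doubling_time_years(speedup, years):.2f} years")  # 1.00

# Assumption for comparison: single-chip performance doubling every ~2 years.
single_chip_gain = 2 ** (years / 2)
print(f"gain from a 2-year doubling alone: {single_chip_gain:,.0f}x")              # 32,768x
```

Under these rough assumptions, a single processor improving at a two-year doubling rate would account for a factor of only about 33,000, which suggests that most of the gap comes from a supercomputer aggregating many processors that work in parallel.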
Until 2012, the decade-by-decade increase in the computational power of “my” computer came mainly from CPU improvements. Since then, the increase in computational power for Deep Learning has come not only from CPU improvements, but also from the realization that massively parallel systems such as graphics processing units (GPUs) were tens of times more efficient than traditional CPUs. Coming back to the previous section, in 2012 Alex Krizhevsky’s team trained their deep neural network AlexNet with GPUs. Since then, GPU use has spread, and now virtually all the groups use them, as shown in the Figure.
Originally, companies like Nvidia and AMD developed these fast and massively parallel chips to power the graphics of video games. This hardware came to benefit the scientific community, and in 2007 Nvidia launched CUDA, a programming interface for its line of GPUs. Centers like BSC started to use clusters of GPUs for various highly parallelizable numerical applications. Deep neural networks, consisting mostly of many small matrix multiplications, are also highly parallelizable, and some researchers, such as Alex Krizhevsky, started writing CUDA implementations of neural nets.
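Why matrix multiplications parallelize so well can be illustrated with a fully connected layer, whose forward pass (ignoring bias and activation) is y = x·W. In the minimal pure-Python sketch below, the helper names `dense_row` and `dense_layer_parallel` are hypothetical: each output row depends only on one input row and the shared weights, so the rows can be computed independently by separate workers. Python threads here merely stand in for GPU cores and will not give a real speedup for this CPU-bound code; a CUDA implementation would instead map such independent multiply-adds onto thousands of hardware threads.

```python
from concurrent.futures import ThreadPoolExecutor

def dense_row(x_row, weights):
    """One output row of y = x @ W, computed independently of all other rows."""
    n_out = len(weights[0])
    return [sum(x_row[k] * weights[k][j] for k in range(len(x_row)))
            for j in range(n_out)]

def dense_layer_parallel(x, weights, workers=4):
    """Forward pass of a fully connected layer (no bias, no activation).

    Rows of the batch x do not depend on each other, so each one is handed
    to a worker; this mimics, on a tiny scale, how a GPU spreads independent
    multiply-adds across many cores. pool.map keeps the original row order.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda row: dense_row(row, weights), x))

# A batch of 2 inputs with 3 features each, and a 3x2 weight matrix.
x = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
W = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]
print(dense_layer_parallel(x, W))  # [[4.0, 5.0], [10.0, 11.0]]
```

Because the result does not depend on how many workers compute it, the same code gives identical output with `workers=1`; only the scheduling changes, which is exactly the property that lets GPUs accelerate neural networks.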
Nowadays, the Deep Learning industry needs to go beyond general-purpose GPUs, and even GPU architectures are now being designed explicitly with Deep Learning algorithms in mind. Later in this course we will go into more detail about Supercomputing and the new trends that have emerged for Deep Learning.
In conclusion, COMPUTING POWER is the real enabler!
Democratization of Computing
However, what if my company does not have this computing power? We are now entering an era of democratized computation for companies! And what is “my/your” computer like now? Something like a data center of 28,000 square meters with thousands of servers inside. Do you know what this means? How many soccer fields does that occupy? About four. Huge, right?
For those “experts” who want to develop their own software, cloud services like Amazon Web Services provide GPU-driven deep-learning computation services. And Google … and IBM… and many other cloud providers! And for “less expert” people, various companies provide working, scalable implementations of ML/AI algorithms as a service (AI-as-a-Service).
Cloud computing has revolutionized industries by democratizing computation and completely changing the way businesses operate, and now it is changing the DL and AI landscape.
An open-source world for the Deep Learning community
A few years ago, doing Deep Learning required significant C++ and CUDA expertise. Nowadays, basic Python skills are enough. This has been possible thanks to the plentiful open-source software frameworks that have become available in the field and have greased the innovation process.
Many frameworks have emerged to ease the task of creating and training models. These frameworks use different programming languages and different strategies for training the models or computing the data, and they have different characteristics, such as support for distribution or GPU acceleration. Most of them are open source, and their popularity on GitHub is shown in the Figure (from Francesc Sastre’s Master’s Thesis).
A brief description of these frameworks can be found in this post.
Another crucial element is that an open-publication ethic appeared in the field, whereby many researchers publish their results immediately in repositories such as arxiv.org (from Cornell University) without waiting for peer-review approval. This allows everybody to learn about the most recent advances of the research community.
From now on, in this course we will go into more detail on how to train DL models using the Keras, PyTorch and TensorFlow programming models. Finally, we will review the supercomputing trends that address the demands of training huge Deep Learning models. I hope you enjoy this course.
This post was originally intended for the students of my DLAI course, although I think it may be of interest to other students too. I am going to share on this blog the teaching material I prepare for the part of the DLAI course that covers the basic principles of Deep Learning from a computational perspective.
update 29/09/17: Slides will be available next week.