Programming Models for Deep Learning

This post was originally intended for the students of my DLAI course, although I think it may be of interest to other students as well. In this blog I will share the teaching material I am preparing for the part of the DLAI course that covers the basic principles of Deep Learning from a computational perspective.

In recent years, as Deep Learning has grown in popularity, many frameworks have emerged to ease the task of creating and training models. They differ in the programming languages they use, in their strategies for training models and processing data, and in features such as distributed execution or GPU acceleration. Most of these frameworks are open source, and their popularity can be seen in the following Figure from Francesc Sastre:

Here is a brief description of the most active and popular Deep Learning frameworks and their characteristics.

Theano

Theano started as a CPU and GPU math compiler in Python, similar to NumPy but with faster evaluation and GPU support. It has a Python core and generates C++ and CUDA code for execution. It implements experimental support for multi-GPU model parallelism, but it needs external frameworks for data parallelism.
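
Theano's core idea, describing a computation symbolically and compiling it before running it, can be illustrated with a toy sketch. This is plain Python, not Theano's API: the classes `Var`, `Const`, `Add`, `Mul` and the `compile_function` helper are hypothetical (loosely playing the role of `theano.function`), and where Theano generates C++/CUDA code, this sketch simply generates Python source and compiles it with `eval`.

```python
from dataclasses import dataclass


class Expr:
    """Base class: arithmetic operators build graph nodes instead of computing values."""

    def __add__(self, other):
        return Add(self, as_expr(other))

    def __radd__(self, other):
        return Add(as_expr(other), self)

    def __mul__(self, other):
        return Mul(self, as_expr(other))

    def __rmul__(self, other):
        return Mul(as_expr(other), self)


@dataclass(eq=False)
class Var(Expr):
    name: str

    def to_source(self) -> str:
        return self.name


@dataclass(eq=False)
class Const(Expr):
    value: float

    def to_source(self) -> str:
        return repr(self.value)


@dataclass(eq=False)
class Add(Expr):
    left: Expr
    right: Expr

    def to_source(self) -> str:
        return f"({self.left.to_source()} + {self.right.to_source()})"


@dataclass(eq=False)
class Mul(Expr):
    left: Expr
    right: Expr

    def to_source(self) -> str:
        return f"({self.left.to_source()} * {self.right.to_source()})"


def as_expr(value) -> Expr:
    """Wrap plain numbers as constants so they can appear in the graph."""
    return value if isinstance(value, Expr) else Const(float(value))


def compile_function(inputs, output):
    """Generate source code for the whole graph once and turn it into a callable."""
    source = f"lambda {', '.join(v.name for v in inputs)}: {output.to_source()}"
    # eval is acceptable here only because the source is generated from our own graph.
    return eval(source), source


x, y = Var("x"), Var("y")
expr = x * y + x * 3  # builds a graph; nothing is computed yet
f, src = compile_function([x, y], expr)
print(src)          # lambda x, y: ((x * y) + (x * 3.0))
print(f(2.0, 5.0))  # 16.0
```

Separating graph construction from execution is what lets a compiler like Theano inspect and optimize the whole expression before generating code for it.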

Caffe

Caffe is a Deep Learning framework written in C++ with Python and MATLAB bindings. It is maintained and developed by the Berkeley Vision and Learning Center (BVLC). Caffe allows experimentation and seamless switching among platforms, easing development and deployment from prototyping machines to cloud environments. It does not have a distributed implementation and needs external tools such as CaffeOnSpark to run on a cluster.

Caffe2

However, Facebook Research has recently released Caffe2. Its GitHub page describes it as “an experimental refactoring of Caffe that allows a more flexible way to organize computation”. It supports multi-GPU and multi-machine distribution and is more focused on production environments, being able to run on different devices such as mobile phones or servers.

CNTK

The Microsoft Computational Network Toolkit (CNTK), also known as the Microsoft Cognitive Toolkit, is a Deep Learning framework for Windows and Linux. Its core is written in C++/CUDA and it has APIs for C++, C# and Python. It supports GPU, multi-GPU and distributed training. CNTK represents computations as graphs and makes it easy to build and combine popular model types. It implements stochastic gradient descent (SGD, error backpropagation) learning with automatic differentiation. CNTK has been available under an open-source license since April 2015. Microsoft also offers CNTK as a service on the Azure platform with GPU instances.
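
The two ingredients mentioned above, automatic differentiation and SGD, can be sketched without any framework. The following is plain Python, not CNTK's API: each operation records how to propagate gradients, `backward()` applies the chain rule in reverse topological order, and a simple SGD loop uses those gradients to fit a line. The `Value` class, the `sgd_fit` function and the toy data (points on y = 2x + 1) are illustrative assumptions.

```python
import random


class Value:
    """A scalar node in a computation graph that knows how to backpropagate its gradient."""

    def __init__(self, data, parents=()):
        self.data = float(data)
        self.grad = 0.0
        self._parents = parents
        self._backward_fn = None  # propagates self.grad to the parents

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))

        def backward_fn():
            self.grad += out.grad
            other.grad += out.grad

        out._backward_fn = backward_fn
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))

        def backward_fn():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward_fn = backward_fn
        return out

    def __sub__(self, other):
        return self + (other * -1.0)

    def backward(self):
        # Topologically sort the graph, then apply the chain rule from the output backwards.
        order, visited = [], set()

        def visit(node):
            if node not in visited:
                visited.add(node)
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            if node._backward_fn is not None:
                node._backward_fn()


def sgd_fit(data, lr=0.05, epochs=200, seed=0):
    """Fit y = w*x + b with per-example SGD on the squared error."""
    rng = random.Random(seed)
    data = list(data)
    w, b = Value(0.0), Value(0.0)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            err = (w * x + b) - y
            loss = err * err
            w.grad = b.grad = 0.0  # clear gradients from the previous step
            loss.backward()
            w.data -= lr * w.grad
            b.data -= lr * b.grad
    return w.data, b.data


points = [(x, 2.0 * x + 1.0) for x in (-1.0, -0.5, 0.0, 0.5, 1.0)]
w_fit, b_fit = sgd_fit(points)
print(round(w_fit, 3), round(b_fit, 3))  # should be close to 2.0 and 1.0
```

Production frameworks work on tensors rather than scalars, but the principle is the same: the forward pass builds the graph, and the backward pass reuses it to compute every gradient in one sweep.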

MXNet

MXNet is a multi-language framework for Machine Learning. It has a C++ core and Python, R, C++ and Julia bindings. It runs on CPUs and GPUs and natively supports GPU parallelism and distributed execution. MXNet supports both declarative and imperative programming and lets the user choose which one to use. It is computationally and memory efficient and runs on a variety of heterogeneous systems, ranging from mobile devices to distributed GPU clusters.
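
The difference between the two programming styles can be sketched in plain Python (this is not MXNet's `mx.nd`/`mx.sym` API): in imperative code each statement runs immediately, while in declarative code a graph is described first and executed later. The `Symbol`, `variable`, `constant`, `tanh` and `execute` names are illustrative assumptions.

```python
import math


# Imperative style: every line executes immediately, as in NumPy.
def imperative_forward(x: float) -> float:
    h = x * 2.0  # h is a concrete number right away
    return math.tanh(h) + 1.0


# Declarative (symbolic) style: describe the computation as a graph first.
class Symbol:
    def __init__(self, op, inputs=(), name=None, value=None):
        self.op, self.inputs, self.name, self.value = op, inputs, name, value

    def __mul__(self, other):
        return Symbol("mul", (self, constant(other)))

    def __add__(self, other):
        return Symbol("add", (self, constant(other)))


def variable(name):
    return Symbol("var", name=name)


def constant(c):
    return c if isinstance(c, Symbol) else Symbol("const", value=float(c))


def tanh(s):
    return Symbol("tanh", (s,))


def execute(sym, bindings):
    """Bind concrete inputs and evaluate the graph recursively."""
    if sym.op == "var":
        return bindings[sym.name]
    if sym.op == "const":
        return sym.value
    args = [execute(i, bindings) for i in sym.inputs]
    if sym.op == "mul":
        return args[0] * args[1]
    if sym.op == "add":
        return args[0] + args[1]
    if sym.op == "tanh":
        return math.tanh(args[0])
    raise ValueError(f"unknown op {sym.op}")


x = variable("x")
graph = tanh(x * 2.0) + 1.0  # only a description; nothing is computed yet
print(execute(graph, {"x": 0.5}) == imperative_forward(0.5))  # True
```

Deferring execution gives a declarative engine the whole graph to optimize (for example, to plan memory) before running it, while imperative code is easier to debug because intermediate values exist as soon as each line runs.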

Deeplearning4j

Deeplearning4j (DL4J) is an open-source, distributed Deep Learning library written for Java and Scala. Integrated with Hadoop and Spark, DL4J is designed to be used in business environments on distributed GPUs and CPUs. Its core is written in C/C++/CUDA, and it offers a Python API through Keras. It relies on Spark/Hadoop for distribution, supports multiple GPUs, and can run on Amazon Elastic MapReduce (EMR).

Keras

Keras is a high-level neural networks API, written in Python and capable of running on top of either TensorFlow or Theano. The user can change the backend (TensorFlow or Theano) without changing the model code, and Keras code can also be mixed with backend code. Keras exploits the features offered by the available backends and supports multi-GPU and distributed execution.
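
The backend switch can be illustrated with a small abstraction sketch (plain Python, not Keras code): model code calls only an abstract set of tensor operations, and configuration, not the model, decides which backend implements them. The backend classes, `dense_relu`, `get_backend` and the `TOY_BACKEND` environment variable are hypothetical; real Keras 2 reads its backend choice from its `keras.json` configuration file or the `KERAS_BACKEND` environment variable.

```python
import operator
import os
from typing import Dict, List, Protocol

Vector = List[float]
Matrix = List[List[float]]


class Backend(Protocol):
    def matvec(self, w: Matrix, x: Vector) -> Vector: ...
    def relu(self, x: Vector) -> Vector: ...


class LoopBackend:
    """Straightforward loop-based implementation of the operations."""

    def matvec(self, w, x):
        out = []
        for row in w:
            acc = 0.0
            for wi, xi in zip(row, x):
                acc += wi * xi
            out.append(acc)
        return out

    def relu(self, x):
        return [xi if xi > 0.0 else 0.0 for xi in x]


class FunctionalBackend:
    """The same operations implemented differently (map/sum instead of explicit loops)."""

    def matvec(self, w, x):
        return [sum(map(operator.mul, row, x)) for row in w]

    def relu(self, x):
        return [max(v, 0.0) for v in x]


BACKENDS: Dict[str, Backend] = {"loop": LoopBackend(), "functional": FunctionalBackend()}


def get_backend() -> Backend:
    # Hypothetical: the choice comes from the environment, never from the model code.
    return BACKENDS[os.environ.get("TOY_BACKEND", "loop")]


def dense_relu(backend: Backend, w: Matrix, x: Vector) -> Vector:
    """Model code: a dense layer with ReLU, written only against the backend interface."""
    return backend.relu(backend.matvec(w, x))


w = [[1.0, -4.0], [0.5, 0.5]]
x = [3.0, 1.0]
for name, backend in BACKENDS.items():
    print(name, dense_relu(backend, w, x))  # both backends print [0.0, 2.0]
```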

PyTorch and Torch

Torch is a flexible Machine Learning framework with a LuaJIT interface, launched in 2012. Its core is written in C, and it has OpenMP and CUDA implementations for high-performance requirements. It has modules for distributed training[92]. PyTorch was released recently. It is based on Python and is more focused on Deep Learning: it keeps Torch's functionality and adds new Deep Learning tools for building networks, loading data, optimization and training. We will present this framework in more detail.

TensorFlow

TensorFlow is a Machine Learning system that operates at large scale and in heterogeneous environments. TensorFlow uses dataflow graphs to represent computation, shared state, and the operations that mutate that state. It maps the nodes of a dataflow graph across many machines in a cluster, and within a machine across multiple computational devices, including multicore CPUs, general-purpose GPUs, and custom-designed ASICs known as Tensor Processing Units (TPUs). TensorFlow has Python, Java, Go and C++ bindings, and it supports distributed training using both model parallelism and data parallelism. This framework is maintained by Google and has great support from the community, as shown in the previous Figure.
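
Data parallelism can be sketched without any framework: every worker keeps a replica of the model, computes gradients on its own shard of the batch, and the gradients are combined before a single update of the shared parameters. This is a simplified, synchronous, single-process simulation under our own assumptions (a one-variable linear model and hypothetical helpers `shard`, `local_gradient` and `data_parallel_step`), not TensorFlow code; real systems run the workers on separate devices and combine gradients with a parameter server or an all-reduce.

```python
from typing import List, Sequence, Tuple

Example = Tuple[float, float]  # one (x, y) training pair


def shard(batch: Sequence[Example], num_workers: int) -> List[List[Example]]:
    """Split a batch into num_workers roughly equal shards (round-robin)."""
    return [list(batch[i::num_workers]) for i in range(num_workers)]


def local_gradient(w: float, b: float, examples: List[Example]) -> Tuple[float, float, int]:
    """Summed squared-error gradient of y ~ w*x + b on one shard, plus the shard size."""
    gw = gb = 0.0
    for x, y in examples:
        err = (w * x + b) - y
        gw += 2.0 * err * x
        gb += 2.0 * err
    return gw, gb, len(examples)


def data_parallel_step(w: float, b: float, batch: Sequence[Example],
                       num_workers: int, lr: float) -> Tuple[float, float]:
    """One synchronous data-parallel SGD step, simulated sequentially."""
    # Each simulated worker holds a full replica of the model and sees only its shard.
    results = [local_gradient(w, b, s) for s in shard(batch, num_workers)]
    # Aggregation (an all-reduce or a parameter server in a real cluster).
    total = sum(n for _, _, n in results)
    gw = sum(g for g, _, _ in results) / total
    gb = sum(g for _, g, _ in results) / total
    # One update of the shared parameters with the mean gradient over the whole batch.
    return w - lr * gw, b - lr * gb


batch = [(k / 4.0, 3.0 * (k / 4.0) - 0.5) for k in range(-4, 5)]
w, b = 0.0, 0.0
for _ in range(500):
    w, b = data_parallel_step(w, b, batch, num_workers=3, lr=0.1)
print(round(w, 3), round(b, 3))  # should be close to 3.0 and -0.5
```

Because the shard gradients are summed and divided by the total batch size, a step with three workers matches a single-worker full-batch step up to floating-point rounding. Model parallelism, in contrast, splits the model itself (for example, different layers) across devices.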

More frameworks

There are several other Deep Learning frameworks that leverage the NVIDIA Deep Learning SDK, including Kaldi, Lasagne (built on Theano), Leaf, MatConvNet, SoooA, Chainer, and more.

My thanks to Francesc Sastre for helping me prepare this post, which is based on his Master's thesis.

September 24th, 2017