tl;dr: No secret here: ConvNets still take some care to define, even across APIs with similar-sounding names in four deep learning frameworks. Here, you'll find an attempt to compare simple ConvNets in these frameworks, plus a little ConvNet conceptual breakdown. Lots of reference code.

Posted: 2018-05-13

Introduction

I've noticed recently that Keras, PyTorch, TensorFlow, and CNTK all use similar names for their model-building APIs - Sequential classes and Layer/Layers modules - which makes it a little confusing to switch from one framework to another. I was also curious how easy it would be to use these modules/APIs in each framework to define the same convolutional neural network (ConvNet).

Let's get through some terminology first. You can skip ahead to the code if you are already familiar with ConvNets on images. Note that the code originates from projects working with the MNIST handwritten digits dataset.

The neural network architecture used in this post is as follows.

  1. Convolutional layer
  2. Max pooling layer
  3. Convolutional layer
  4. Max pooling layer
  5. Fully connected or dense layer with 10 outputs and softmax activation (to get probabilities)

A convolutional layer creates a feature map by sliding a filter or kernel (which I like to think of as a "flashlight" shining on the image) across the input - stepping one unit at a time is a stride of 1, by the way. A good reference for this is in the CNTK Tutorial.

[Figure: Convolutional layer (Source)]
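
The spatial output size of a convolution follows a simple formula, which also explains the padding arithmetic in the PyTorch comments below. A quick illustration (this helper is mine, for the sake of example):

# Output size of a convolution along one spatial dimension:
# out = (in - kernel + 2 * padding) / stride + 1
def conv_output_size(in_size, kernel, stride=1, padding=0):
    return (in_size - kernel + 2 * padding) // stride + 1

# For the 28x28 MNIST images and 3x3 kernels used below:
print(conv_output_size(28, 3, stride=1, padding=1))  # 28 - 'same' padding preserves size
print(conv_output_size(28, 3, stride=1, padding=0))  # 26 - 'valid' (no padding) shrinks it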

A pooling layer subsamples its input feature map - in our case, the output of a convolutional layer that has already extracted salient features from the image.

[Figure: Pooling (Source)]
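
To make pooling concrete, here is a NumPy-only sketch (mine, not from any of the frameworks) of max pooling a 4x4 feature map with a 2x2 window and a stride of 2:

import numpy as np

# Each 2x2 block of the input is reduced to its maximum value.
fmap = np.array([[1, 3, 2, 4],
                 [5, 6, 1, 2],
                 [7, 2, 9, 0],
                 [1, 4, 3, 8]])
pooled = fmap.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)  # [[6 4]
               #  [7 9]]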

A fully connected (or dense) layer is defined such that every input unit is connected to every output unit, much like in a multilayer perceptron.

[Figure: Dense layer (Source)]
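
Underneath, a dense layer is just a matrix multiply plus a bias. A minimal NumPy sketch (illustrative only), with shapes matching the final layer of the networks below:

import numpy as np

x = np.random.rand(7 * 7 * 64)      # flattened feature maps from the last pooling layer
W = np.random.rand(10, 7 * 7 * 64)  # one row of weights per output class
b = np.random.rand(10)
logits = W @ x + b                  # ten raw scores, one per digit class
print(logits.shape)                 # (10,)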

Not represented in the code below, but important nonetheless, is dropout. Dropout randomly zeroes a percentage of the activations during training - helping to prevent overfitting in convolutional and, especially, dense layers.

[Figure: Dropout (Source)]
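
For intuition, here is a NumPy-only sketch of "inverted" dropout (illustrative only; each framework provides its own dropout layer, as noted in the sections below):

import numpy as np

rate = 0.5
activations = np.random.rand(8)
mask = np.random.rand(8) >= rate           # keep each unit with probability 1 - rate
dropped = activations * mask / (1 - rate)  # rescale survivors to preserve the expected value
print(dropped)                             # roughly half the entries are zeroed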

Remember, the power of a convolutional layer is that we don't have to do much upfront raw image processing. The layer(s) will subsequently find the most salient features for us.

In this post you will find ConvNets defined for four frameworks with adaptations to create easier comparisons (please leave comments as needed). The full example code can be found as a Jupyter notebook - Ref.

The Frameworks

Keras

Below is a ConvNet defined with the Sequential model in Keras (Ref). This is a snippet with only the model definition parts - see the References for the full code example.

"""
Adapted from:
https://github.com/keras-team/keras/blob/master/examples/mnist_cnn.py
"""

import keras
from keras.models import Sequential
from keras.layers import Dense, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras.optimizers import SGD

num_classes = 10           # ten digit classes in MNIST
input_shape = (28, 28, 1)  # 28x28 grayscale images, channels last

model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),
                 strides=(1, 1),
                 padding='same',
                 activation='relu',
                 input_shape=input_shape))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, kernel_size=(3, 3),
                 strides=(1, 1),
                 padding='same',
                 activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(num_classes, activation='softmax'))

sgd = SGD(lr=0.05, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', 
              optimizer=sgd,
              metrics=['accuracy'])

What you don't see is:

  • Fit/train (model.fit()) - a minimal sketch follows this list
  • Evaluate with given metric (model.evaluate())
  • To add dropout after the Conv2D() layer (or after the fully connected layer in any of these examples), a dropout layer is used, e.g., Dropout(0.5)
  • Sometimes another fully connected (dense) layer with, say, ReLU activation, is added right before the final fully connected layer.
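
A minimal sketch of those omitted steps, assuming x_train/y_train and x_test/y_test are already-loaded and preprocessed MNIST arrays (the names are mine, not from the original example):

model.fit(x_train, y_train,
          batch_size=128,
          epochs=10,
          validation_data=(x_test, y_test))
score = model.evaluate(x_test, y_test)
print('Test loss: {}, test accuracy: {}'.format(score[0], score[1]))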

PyTorch

Below is a ConvNet defined with the Sequential container in PyTorch (Ref). This is a snippet with only the model definition parts - see the References for the full code example.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvNetPyTorch(nn.Module):
    """Adapted from:
    https://github.com/rasbt/deep-learning-book/blob/master/code/model_zoo/pytorch_ipynb/convnet.ipynb
    """
    def __init__(self, num_classes=10):
        super(ConvNetPyTorch, self).__init__()
        self.layer1 = nn.Sequential(
            # 28x28x1 => 28x28x32
            nn.Conv2d(in_channels=1,
                      out_channels=32,
                      kernel_size=(3, 3),
                      stride=(1, 1),
                      padding=1), # (1(28-1) - 28 + 3) / 2 = 1
            nn.ReLU(),
            # 28x28x32 => 14x14x32
            nn.MaxPool2d(kernel_size=(2, 2),
                         stride=(2, 2),
                         padding=0)) # (2(14-1) - 28 + 2) = 0    
        self.layer2 = nn.Sequential(
            # 14x14x32 => 14x14x64
            nn.Conv2d(in_channels=32,
                      out_channels=64,
                      kernel_size=(3, 3),
                      stride=(1, 1),
                      padding=1), # (1(14-1) - 14 + 3) / 2 = 1   
            nn.ReLU(),
            # 14x14x64 => 7x7x64
            nn.MaxPool2d(kernel_size=(2, 2),
                         stride=(2, 2),
                         padding=0)) # (2(7-1) - 14 + 2) = 0
        self.linear_1 = nn.Linear(7*7*64, num_classes)

    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = out.reshape(out.size(0), -1)  # flatten: 7x7x64 => 3136
        logits = self.linear_1(out)
        probas = F.softmax(logits, dim=1)
        return logits, probas

num_classes = 10
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = ConvNetPyTorch(num_classes).to(device)

What you don't see is:

  • Fit/train (a manual training loop; model.train() sets training mode) - a minimal sketch follows this list
  • Evaluate with given metric (model.eval() sets evaluation mode)
  • To add dropout after the nn.ReLU() layer (or after the fully connected layer in any of these examples), a dropout layer is used, e.g., nn.Dropout(0.5)
  • Sometimes another fully connected (dense) layer with, say, ReLU activation, is added right before the final fully connected layer.
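
A minimal sketch of the omitted training loop, assuming train_loader is a DataLoader over MNIST (train_loader and num_epochs are my placeholders, not from the original example):

optimizer = torch.optim.SGD(model.parameters(), lr=0.05, momentum=0.9)
criterion = nn.CrossEntropyLoss()  # expects raw logits, not probabilities

num_epochs = 10
model.train()  # enable training mode (matters for dropout, batch norm)
for epoch in range(num_epochs):
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        logits, probas = model(images)
        loss = criterion(logits, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

model.eval()  # switch to evaluation mode before computing metrics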

TensorFlow

Below is a ConvNet defined with the Layers library and Estimators API in TensorFlow (Ref). This is a snippet with only the model definition parts - see the References for the full code example.

import tensorflow as tf

# Create the neural network
def convNetTensorFlow(x_dict, n_classes, reuse, is_training):
    """Adapted from:
    https://github.com/aymericdamien/TensorFlow-Examples/blob/master/examples/3_NeuralNetworks/convolutional_network.py
    """
    # Define a scope for reusing the variables
    with tf.variable_scope('ConvNet', reuse=reuse):
        # TF Estimator input is a dict, in case of multiple inputs
        x = x_dict['images']

        # MNIST data input is a 1-D vector of 784 features (28*28 pixels)
        # Reshape to match picture format [Height x Width x Channel]
        # Tensor input become 4-D: [Batch Size, Height, Width, Channel]
        x = tf.reshape(x, shape=[-1, 28, 28, 1])

        # Convolution Layer with 32 filters and a kernel size of 3
        # ('same' padding to match the Keras and PyTorch versions above)
        conv1 = tf.layers.conv2d(x, 32, 3, padding='same', activation=tf.nn.relu)
        # Max Pooling (down-sampling) with strides of 2 and kernel size of 2
        conv1 = tf.layers.max_pooling2d(conv1, 2, 2)

        # Convolution Layer with 64 filters and a kernel size of 3
        conv2 = tf.layers.conv2d(conv1, 64, 3, padding='same', activation=tf.nn.relu)
        # Max Pooling (down-sampling) with strides of 2 and kernel size of 2
        conv2 = tf.layers.max_pooling2d(conv2, 2, 2)

        # Flatten the data to a 1-D vector for the fully connected layer
        fc1 = tf.contrib.layers.flatten(conv2)

        # Output layer, class prediction
        logits = tf.layers.dense(fc1, n_classes, activation=None)

    return logits

"""...[snipped for brevity]"""

# Build the Estimator
model = tf.estimator.Estimator(model_fn)

What you don't see is:

  • Fit/train (model.train()) - a minimal sketch follows this list
  • Evaluate with given metric (model.evaluate())
  • To add dropout after the tf.layers.conv2d() layer (or after the fully connected layer in any of these examples), a dropout layer is used, e.g., tf.layers.dropout(inputs=net_layer, rate=0.5, training=is_training)
  • Sometimes another fully connected (dense) layer with, say, ReLU activation, is added right before the final fully connected layer.
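
A minimal sketch of the omitted steps with the TF 1.x Estimator API, assuming x_train/y_train and x_test/y_test are already-loaded MNIST arrays (the names are mine, not from the original example):

# Train the model for a fixed number of steps
input_fn = tf.estimator.inputs.numpy_input_fn(
    x={'images': x_train}, y=y_train,
    batch_size=128, num_epochs=None, shuffle=True)
model.train(input_fn, steps=2000)

# Evaluate on the test set
eval_fn = tf.estimator.inputs.numpy_input_fn(
    x={'images': x_test}, y=y_test,
    batch_size=128, shuffle=False)
print(model.evaluate(eval_fn))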

For more, see the TensorFlow entries in the References below.

Cognitive Toolkit (CNTK)

Below is a ConvNet defined with the Layer API in CNTK (Ref). This is a snippet with only the model definition parts - see the References for the full code example. (Note: as of this writing, CNTK runs on Windows and Linux only.)

import cntk as C

def convNetCNTK(features, num_output_classes):
    """https://cntk.ai/pythondocs/CNTK_103D_MNIST_ConvolutionalNeuralNetwork.html"""
    with C.layers.default_options(init=C.glorot_uniform(), activation=C.relu):
        model = C.layers.Sequential([
            C.layers.For(range(2), lambda i: [
                C.layers.Convolution((3,3), [32,64][i], pad=True),
                C.layers.MaxPooling((2,2), strides=(2,2))
                ]),
            C.layers.Dense(num_output_classes, activation=None)
        ])

    return model(features)

What you don't see is:

  • Fit/train (trainer = C.Trainer() and trainer.train_minibatch()) - a minimal sketch follows this list
  • Evaluate with given metric (out = C.softmax() and out.eval())
  • To add dropout after the C.layers.Convolution() layer (or after the fully connected layer in any of these examples), a dropout layer is used, e.g., C.layers.Dropout(0.5).
  • Sometimes another fully connected (dense) layer with, say, ReLU activation, is added right before the final fully connected layer.
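
A minimal sketch of the omitted steps (the variable names are my placeholders; the minibatch reading is left out, as in the original tutorial):

input_var = C.input_variable((1, 28, 28))  # CNTK expects channels-first input
label_var = C.input_variable(10)

z = convNetCNTK(input_var, num_output_classes=10)
loss = C.cross_entropy_with_softmax(z, label_var)
metric = C.classification_error(z, label_var)

lr_schedule = C.learning_rate_schedule(0.2, C.UnitType.minibatch)
learner = C.sgd(z.parameters, lr_schedule)
trainer = C.Trainer(z, (loss, metric), [learner])

# One update per minibatch of (features, labels):
# trainer.train_minibatch({input_var: features, label_var: labels})

# At evaluation time, wrap the logits in softmax to get probabilities:
out = C.softmax(z)
# probas = out.eval({input_var: features})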

Conclusion

No real conclusion, except to say that these frameworks do pretty much the same sorts of things and all offer different API layers, from high-level to low-level. I did find Keras to be the easiest, and just as fast as TensorFlow - albeit with a lot abstracted away, which is generally good if one needs a quick jumping-off point. Note that the benchmarks in the full code sample notebook are informal, not official.

The full code samples are in this Jupyter Notebook. There is certainly some room for improvement in the code and benchmarking, so if you have any ideas, please leave a comment.

References

Samples adapted in this post:

  1. Keras code sample with Sequential model Ref
  2. PyTorch code sample with Sequential container Ref
  3. TensorFlow code sample with Layers and Estimators APIs Ref and ConvNets Tutorial at this Doc
  4. CNTK code sample with Layer API Doc

A great book from which I took some of the concepts written in this post: Book and Code


Thanks for reading.

Appendix

Nice explanation of tensor layouts (PyTorch vs. TensorFlow) in a PyTorch forum post by Mamy Ratsimbazafy (Post):

Furthermore there might be a difference due to the tensor layouts:

PyTorch uses NCHW and TensorFlow uses NHWC. NCHW was the first layout supported
by cuDNN but presents a big challenge for optimization (due to access patterns
in convolutions, memory coalescing and such...). NHWC is easier to optimize for
convolutions but suffers in linear layers, iirc, because you have to physically
transpose/permute the dimensions.

Furthermore, due to its dynamic nature, PyTorch allocates new memory at each new
batch, while TensorFlow can just reuse previous memory locations since the size
is known in advance.

Memory is THE bottleneck in deep learning, not CPU; the big challenge is how to
feed data fast enough to the CPU and GPU to get the maximum GFLOPS throughput.
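
For intuition, converting between the two layouts in PyTorch is a single permute; the physical reordering the quote refers to only happens when the permuted view is made contiguous:

import torch

x_nchw = torch.randn(64, 1, 28, 28)  # PyTorch default layout: N, C, H, W
x_nhwc = x_nchw.permute(0, 2, 3, 1)  # TensorFlow default layout: N, H, W, C
print(x_nhwc.shape)                  # torch.Size([64, 28, 28, 1])
print(x_nhwc.is_contiguous())        # False - a view, not a copy
# x_nhwc.contiguous() would physically reorder memory - the cost mentioned above.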