Thursday, March 15, 2018

Example Python code using the DDL included in ddl-tensorflow: ddl_mnist.py


In the previous posting I wrote about how to use caffe-ibm with the DDL (Distributed Deep Learning) option.  This time the topic is ddl-tensorflow.

Unlike Caffe, TensorFlow requires you to write your application code in Python, and some example code showing how to write Python code that uses DDL is provided as well.  A Python code for the simplest case, MNIST training, is included in ddl-tensorflow.

First, install the PowerAI toolkit.  (Here I used the existing v4, not the latest v5.)

u0017649@sys-92312:~$ dpkg -l | grep mldl
ii  mldl-repo-local                                          4.0.0                                      ppc64el      IBM repository for Deep Learning tools for POWER linux

Confirm that tensorflow and ddl-tensorflow are available as deb packages from PowerAI, and install them with the apt-get command.

u0017649@sys-92312:~$ apt-cache pkgnames | grep tensor
ddl-tensorflow
tensorflow

u0017649@sys-92312:~$ sudo apt-get install tensorflow ddl-tensorflow
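
Before moving on, it does not hurt to verify that the freshly installed TensorFlow can actually be imported.  Below is a minimal check of my own, assuming that sourcing the ddl-tensorflow-activate script (the same one the README below tells you to source) puts the PowerAI TensorFlow build on the PYTHONPATH:

$ source /opt/DL/ddl-tensorflow/bin/ddl-tensorflow-activate

$ python -c "import tensorflow as tf; print(tf.__version__)"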

The related example code is in the directory below.  There are two examples: one for mnist and one for slim.

u0017649@sys-92312:~$ cd /opt/DL/ddl-tensorflow/examples

u0017649@sys-92312:/opt/DL/ddl-tensorflow/examples$ ls -ltr
total 8
drwxr-xr-x 7 root root 4096 Mar 15 08:18 slim
drwxr-xr-x 2 root root 4096 Mar 15 08:18 mnist

u0017649@sys-92312:/opt/DL/ddl-tensorflow/examples$ cd mnist

u0017649@sys-92312:/opt/DL/ddl-tensorflow/examples/mnist$ ls -ltr
total 16
-rw-r--r-- 1 root root  240 Aug  2  2017 README.md
-rw-r--r-- 1 root root 8681 Aug  2  2017 ddl_mnist.py

Reading the README.md for MNIST, it is simply a usage guide on how to run this code with mpirun.  Since this is a single node with 2 GPUs installed, there is no need to specify a separate rank file (rf) with the -rf option.  Because of how OpenMPI works, every GPU is treated as an independent learner, so 2 GPUs installed in one server and 2 GPUs installed one per server across two servers are handled in the same way; only the topology differs.  (An illustrative two-node command is sketched right after the README excerpt below.)

u0017649@sys-92312:/opt/DL/ddl-tensorflow/examples/mnist$ vi README.md
# HOW TO RUN

To run the IBM PowerAI Distributed Deep Learning MNIST training example:

        $ source /opt/DL/ddl-tensorflow/bin/ddl-tensorflow-activate

        $ mpirun -x PATH -x LD_LIBRARY_PATH -x PYTHONPATH -n 2 python ddl_mnist.py
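
For the two-server case mentioned above (one GPU in each of two servers), the same example could in principle be launched across both hosts with standard OpenMPI options.  The sketch below is my own, not part of the package README: host1 and host2 are hypothetical hostnames, passwordless ssh and an identical PowerAI installation on both hosts are assumed, and the first argument to ddl.init() in ddl_mnist.py (the number of GPUs per host, see the comments in the code below) would presumably have to be changed from 2 to 1.

$ source /opt/DL/ddl-tensorflow/bin/ddl-tensorflow-activate

$ mpirun -x PATH -x LD_LIBRARY_PATH -x PYTHONPATH -H host1,host2 -n 2 python ddl_mnist.py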


Below is the full content of ddl_mnist.py.


u0017649@sys-92312:/opt/DL/ddl-tensorflow/examples/mnist$ vi ddl_mnist.py
'''
Based on https://github.com/aymericdamien/TensorFlow-Examples/blob/master/examples/3_NeuralNetworks/convolutional_network.py:

A Convolutional Network implementation example using TensorFlow library.
This example is using the MNIST database of handwritten digits
(http://yann.lecun.com/exdb/mnist/)

Author: Aymeric Damien
Project: https://github.com/aymericdamien/TensorFlow-Examples/

Modifications:

*****************************************************************

Licensed Materials - Property of IBM

(C) Copyright IBM Corp. 2017. All Rights Reserved.

US Government Users Restricted Rights - Use, duplication or
disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

*****************************************************************
'''
import tensorflow as tf
import numpy as np

############################################################################
#   IBM PowerAI Distributed Deep Learning (DDL) setup
############################################################################

# Disable GPU memory preallocation
config = tf.ConfigProto()
config.gpu_options.allow_growth = True

############################################################################
#   DDL Initialize BEGIN
############################################################################
# Load DDL operator
ddl = tf.load_op_library('/opt/DL/ddl-tensorflow/lib/ddl_MDR.so')

# DDL initializes MPI on CPU
# ddl.init takes two inputs
# 1) the number of GPUs to utilize on each host in training.
#    this number is not the number of GPUs to use for each learner. It simply tells DDL that there are X GPUs in each host to be used for training
# 2) DDL options (refer to README for details)
with tf.Session(config=config) as sess:
    with tf.device('/cpu:0'):
        rank, size, gpuid = sess.run(ddl.init(2, mode = '-mode r:2 -dump_iter 100'))

# MPI info and assigned GPU
print [rank, size, gpuid]
############################################################################
#   DDL Initialize END
############################################################################

# Perform all TensorFlow computation within gpuid
with tf.device('/gpu:%d' %gpuid):
    ##############################################################################
    # Import MNIST data

    from tensorflow.examples.tutorials.mnist import input_data
    mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

    # Parameters
    learning_rate = 0.001
    training_iters = 200000
    batch_size = 100
    display_step = 1

    # Network Parameters
    n_input = 784 # MNIST data input (img shape: 28*28)
    n_classes = 10 # MNIST total classes (0-9 digits)
    dropout = 0.75 # Dropout, probability to keep units

    # tf Graph input
    x = tf.placeholder(tf.float32, [None, n_input])
    y = tf.placeholder(tf.float32, [None, n_classes])
    keep_prob = tf.placeholder(tf.float32) #dropout (keep probability)


    # Create some wrappers for simplicity
    def conv2d(x, W, b, strides=1):
        # Conv2D wrapper, with bias and relu activation
        x = tf.nn.conv2d(x, W, strides=[1, strides, strides, 1], padding='SAME')
        x = tf.nn.bias_add(x, b)
        return tf.nn.relu(x)


    def maxpool2d(x, k=2):
        # MaxPool2D wrapper
        return tf.nn.max_pool(x, ksize=[1, k, k, 1], strides=[1, k, k, 1],
                              padding='SAME')


    # Create model
    def conv_net(x, weights, biases, dropout):
        # Reshape input picture
        x = tf.reshape(x, shape=[-1, 28, 28, 1])

        # Convolution Layer
        conv1 = conv2d(x, weights['wc1'], biases['bc1'])
        # Max Pooling (down-sampling)
        conv1 = maxpool2d(conv1, k=2)

        # Convolution Layer
        conv2 = conv2d(conv1, weights['wc2'], biases['bc2'])
        # Max Pooling (down-sampling)
        conv2 = maxpool2d(conv2, k=2)

        # Fully connected layer
        # Reshape conv2 output to fit fully connected layer input
        fc1 = tf.reshape(conv2, [-1, weights['wd1'].get_shape().as_list()[0]])
        fc1 = tf.add(tf.matmul(fc1, weights['wd1']), biases['bd1'])
        fc1 = tf.nn.relu(fc1)
        # Apply Dropout
        fc1 = tf.nn.dropout(fc1, dropout)

        # Output, class prediction
        out = tf.add(tf.matmul(fc1, weights['out']), biases['out'])
        return out


    # Store layers weight & bias
    weights = {
        ############################################################################
        #   DDL BROADCAST BEGIN
        ############################################################################
        # This step ensures that all learners start with the same initial parameters

        # 5x5 conv, 1 input, 32 outputs
        'wc1': tf.Variable(ddl.bcast(tf.random_normal([5, 5, 1, 32]))),
        # 5x5 conv, 32 inputs, 64 outputs
        'wc2': tf.Variable(ddl.bcast(tf.random_normal([5, 5, 32, 64]))),
        # fully connected, 7*7*64 inputs, 1024 outputs
        'wd1': tf.Variable(ddl.bcast(tf.random_normal([7*7*64, 1024]))),
        # 1024 inputs, 10 outputs (class prediction)
        'out': tf.Variable(ddl.bcast(tf.random_normal([1024, n_classes])))
        ############################################################################
        #   DDL BROADCAST END
        ############################################################################
    }

    biases = {
        'bc1': tf.Variable(ddl.bcast(tf.random_normal([32]))),
        'bc2': tf.Variable(ddl.bcast(tf.random_normal([64]))),
        'bd1': tf.Variable(ddl.bcast(tf.random_normal([1024]))),
        'out': tf.Variable(ddl.bcast(tf.random_normal([n_classes])))
    }

    # Construct model
    pred = conv_net(x, weights, biases, keep_prob)

    # Define loss and optimizer
    cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y))
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)


    ############################################################################
    #   DDL ALLREDUCE BEGIN
    ############################################################################

    # Collect the gradients and the corresponding parameters w.r.t the given cost
    grads_and_vars = optimizer.compute_gradients(cost)

    # Separate out the tuple
    grads, vars = zip(*grads_and_vars)

    # This step takes the average of the gradients on all the learners
    grads_and_vars_ddl = zip(ddl.all_reduce_n(grads, op='avg'), vars)

    # Update the parameters with the averaged gradient
    objective = optimizer.apply_gradients(grads_and_vars_ddl)

    ############################################################################
    #   DDL ALLREDUCE END
    ############################################################################

    # Evaluate model
    correct_pred = tf.equal(tf.argmax(pred, 1), tf.argmax(y, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
    ##############################################################################

def split(a, n):
    k, m = divmod(len(a), n)
    return (a[i * k + min(i, m):(i + 1) * k + min(i + 1, m)] for i in xrange(n))

# Launch the graph
with tf.Session(config=config) as sess:
    sess.run(tf.global_variables_initializer())
    step = 1
    # Keep training until reach max iterations
    while step * batch_size < training_iters:

        # Each learner will read batch_size*size samples and
        # use only the portion corresponding to the current learner (or rank)

        batch_x, batch_y = mnist.train.next_batch(batch_size*size)

        batch_x = np.split(batch_x,size)[rank]
        batch_y = np.split(batch_y,size)[rank]

        # Run optimization op (backprop)
        sess.run(objective, feed_dict={x: batch_x, y: batch_y,
                                       keep_prob: dropout})
        if step % display_step == 0:
            # Calculate batch loss and accuracy
            loss, acc = sess.run([cost, accuracy], feed_dict={x: batch_x,
                                                              y: batch_y,
                                                              keep_prob: 1.})
            print("MPI "+str(rank)+"] Iter " + str(step*batch_size) + ", Minibatch Loss= " + \
                  "{:.6f}".format(loss) + ", Training Accuracy= " + \
                  "{:.5f}".format(acc))
        step += 1

    print("MPI "+str(rank)+"] Optimization Finished!")

    # Calculate accuracy for 256 mnist test images
    print("MPI "+str(rank)+"] Testing Accuracy:", \
        sess.run(accuracy, feed_dict={x: mnist.test.images[:256],
                                      y: mnist.test.labels[:256],
                                      keep_prob: 1.}))


I have uploaded a tar file of the example code directory, including slim, the other example mentioned above, at the link below.


The file uploaded at the link above is the ddl-examples.tgz created below, and the files it contains are as follows.

u0017649@sys-92312:/opt/DL/ddl-tensorflow$ sudo tar -zcvf ddl-examples.tgz doc examples
doc/
doc/README-API.md
doc/README.md
doc/LICENSE.pdf
doc/images/
doc/images/clones2.png
doc/images/cifar10_overview.png
examples/
examples/slim/
examples/slim/WORKSPACE
examples/slim/__init__.py
examples/slim/nets/
examples/slim/nets/__init__.py
examples/slim/nets/resnet_v1_test.py
examples/slim/nets/nets_factory_test.py
examples/slim/nets/alexnet.py
examples/slim/nets/inception_utils.py
examples/slim/nets/vgg.py
examples/slim/nets/mobilenet_v1.png
examples/slim/nets/vgg_test.py
examples/slim/nets/inception_v4_test.py
examples/slim/nets/resnet_utils.py
examples/slim/nets/inception_v2.py
examples/slim/nets/nets_factory.py
examples/slim/nets/mobilenet_v1.py
examples/slim/nets/inception_v1.py
examples/slim/nets/inception_resnet_v2.py
examples/slim/nets/inception_v2_test.py
examples/slim/nets/inception_v1_test.py
examples/slim/nets/resnet_v2.py
examples/slim/nets/alexnet_test.py
examples/slim/nets/inception_v4.py
examples/slim/nets/inception_v3.py
examples/slim/nets/inception_resnet_v2_test.py
examples/slim/nets/inception_v3_test.py
examples/slim/nets/resnet_v1.py
examples/slim/nets/inception.py
examples/slim/nets/mobilenet_v1_test.py
examples/slim/nets/overfeat.py
examples/slim/nets/overfeat_test.py
examples/slim/nets/cifarnet.py
examples/slim/nets/resnet_v2_test.py
examples/slim/nets/lenet.py
examples/slim/nets/mobilenet_v1.md
examples/slim/train-inception_v3.sh
examples/slim/download_and_convert_data.py
examples/slim/train-cifar10.sh
examples/slim/preprocessing/
examples/slim/preprocessing/__init__.py
examples/slim/preprocessing/preprocessing_factory.py
examples/slim/preprocessing/lenet_preprocessing.py
examples/slim/preprocessing/cifarnet_preprocessing.py
examples/slim/preprocessing/inception_preprocessing.py
examples/slim/preprocessing/vgg_preprocessing.py
examples/slim/README.md
examples/slim/eval_image_classifier.py
examples/slim/scripts/
examples/slim/scripts/train_lenet_on_mnist.sh
examples/slim/scripts/finetune_resnet_v1_50_on_flowers.sh
examples/slim/scripts/finetune_inception_v1_on_flowers.sh
examples/slim/scripts/finetune_inception_resnet_v2_on_flowers.sh
examples/slim/scripts/finetune_inception_v3_on_flowers.sh
examples/slim/scripts/train_cifarnet_on_cifar10.sh
examples/slim/export_inference_graph_test.py
examples/slim/slim_walkthrough.ipynb
examples/slim/deployment/
examples/slim/deployment/__init__.py
examples/slim/deployment/model_deploy_test.py
examples/slim/deployment/model_deploy.py
examples/slim/train-alexnet.sh
examples/slim/BUILD
examples/slim/datasets/
examples/slim/datasets/__init__.py
examples/slim/datasets/cifar10.py
examples/slim/datasets/dataset_utils.py
examples/slim/datasets/download_and_convert_flowers.py
examples/slim/datasets/download_and_convert_cifar10.py
examples/slim/datasets/dataset_factory.py
examples/slim/datasets/imagenet.py
examples/slim/datasets/mnist.py
examples/slim/datasets/flowers.py
examples/slim/datasets/download_and_convert_mnist.py
examples/slim/setup.py
examples/slim/train_image_classifier.py
examples/slim/export_inference_graph.py
examples/mnist/
examples/mnist/README.md
examples/mnist/ddl_mnist.py
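
To use this archive on another system, it can simply be unpacked in place; the slim training scripts listed above (for example examples/slim/train-cifar10.sh and the scripts under examples/slim/scripts) then appear under the extracted examples directory.  A minimal sketch:

$ tar -zxvf ddl-examples.tgz

$ ls examples/mnist examples/slim/scripts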
