Beyond "Deep Learning from Scratch" (ゼロから作るDeep Learning): Getting Comfortable with TensorFlow

I finished reading "Deep Learning from Scratch" (ゼロから作るDeep Learning).

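ゼロから作るDeep Learning ―Pythonで学ぶディープラーニングの理論と実装 (Deep Learning from Scratch: Theory and Implementation of Deep Learning in Python)

Author: 斎藤康毅 (Koki Saitoh)
Publisher: オライリージャパン (O'Reilly Japan)
Release date: 2016/09/24
Format: Paperback (softcover)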

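Now that I have a rough overview of deep learning, I want to get my hands dirty. There are plenty of libraries to choose from, but this time I'll take the next step after "Deep Learning from Scratch" with TensorFlow.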

TensorFlow

TensorFlow is an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) that flow between them.

https://www.tensorflow.org

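TensorFlow is not specific to deep learning; it is a library for numerical computation using data flow graphs.

According to the Distributed TensorFlow talk by Sato-san of Google, it is the open-source release of work developed by the Google Brain team.

I haven't dug into the details, but according to the Distributed TensorFlow talk, running TensorFlow on a cluster is also supported.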

TensorFlow Tutorial

With the big picture in hand, next I went through the tutorial.

The tutorial is split into two parts: "Build a Softmax Regression Model" and "Build a Multilayer Convolutional Network".

Here is a (somewhat rough) sketch of the model from Build a Softmax Regression Model.

[Figure: sketch of the softmax regression model (image f:id:kotaroito2002:20170316154146p:plain)]

It is a simple model: multiply the input X by the weights W, then add the bias b.
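
As a rough illustration of what this model computes, here is a minimal pure-Python sketch of the forward pass and the cross-entropy loss for a single input. The helper names (linear, softmax, cross_entropy) and the tiny 2-input, 3-class problem are made up for illustration; the tutorial itself uses TensorFlow ops on 784 inputs and 10 classes.

```python
import math

def linear(x, W, b):
    """Return x.W + b for one input vector x and a len(x)-by-len(b) weight matrix W."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) + b[j]
            for j in range(len(b))]

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, one_hot):
    """Negative log-probability assigned to the true class."""
    return -sum(t * math.log(p) for p, t in zip(probs, one_hot) if t)

# Hypothetical toy problem: 2 inputs, 3 classes, all-zero parameters.
x = [1.0, 2.0]
W = [[0.0] * 3 for _ in range(2)]
b = [0.0] * 3

probs = softmax(linear(x, W, b))
loss = cross_entropy(probs, [0, 0, 1])
print(probs, loss)
```

With all-zero parameters every class gets the same probability, so the loss starts at the logarithm of the number of classes and training has to push it down from there.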

Stitching together the code from the tutorial gives something like the following. I used tensorflow 0.12.1.

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

x = tf.placeholder(tf.float32, shape=[None, 784])
y_ = tf.placeholder(tf.float32, shape=[None, 10])

W = tf.Variable(tf.zeros([784,10]))
b = tf.Variable(tf.zeros([10]))
y = tf.matmul(x,W) + b

# Keyword arguments work in 0.12.1 and are required from TensorFlow 1.0 on.
cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    # Train for 1000 steps on mini-batches of 100 images.
    for i in range(1000):
        batch = mnist.train.next_batch(100)
        train_step.run(feed_dict={x: batch[0], y_: batch[1]})

    # Evaluate on the test set: the fraction of images whose predicted class matches the label.
    correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    print(accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels}))

The accuracy was 0.9151.

Note that the data returned by mnist.train.next_batch(100) is the MNIST images converted to numpy.ndarray.

batch = mnist.train.next_batch(100)
print(type(batch)) # <class 'tuple'>
print(type(batch[0])) # <class 'numpy.ndarray'>
print(batch[0].shape) # (100, 784)
print(batch[1].shape) # (100, 10)

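From the code and its output, batch[0] is a matrix in which each row holds the 28x28=784 pixel values of one image. The number of rows is the batch size, 100.

"Build a Multilayer Convolutional Network" runs if you simply copy the code, and writing it up would amount to little more than a translation, so I'll skip it.

Taking on Kaggle

With the tutorial done, next up is Kaggle, a competition site for data scientists.

I found Leaf Classification, which looked like a manageable task: classify leaves into 99 species. As shown below, the images are grayscale, and there are 990 training samples in total (10 per class).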

[Figure: sample grayscale leaf images (image f:id:kotaroito2002:20170316154050p:plain)]

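Kaggle provides features pre-extracted from the images, but I chose not to use them and instead tried classifying from the original image data with a Convolutional Neural Network.

Preprocessing

The original images all come in different sizes, so the first step is to bring them to a common size.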

import numpy as np
from PIL import Image
from skimage.transform import rescale, resize, rotate
from skimage.color import gray2rgb, rgb2gray

def load_image(path):        
    image_2d = np.array(Image.open(path))
    image_3d = gray2rgb(image_2d)
    
    return image_3d # np.array
    
def fit_image(image):
    fit_size = 138
    
    # rescale image
    max_size = np.maximum(image.shape[0], image.shape[1])
    scale = fit_size / max_size
    image_3d = rescale(image, scale, mode='reflect') 
    
    # fit
    margin = np.array((fit_size, fit_size)) - image_3d.shape[0:2]
    margin = np.round(margin / 2).astype(int)

    pos_x = (margin[0], margin[0] + image_3d.shape[0])
    pos_y = (margin[1], margin[1] + image_3d.shape[1])
    
    image_norm = np.zeros((fit_size, fit_size, 3))    
    image_norm[pos_x[0]:pos_x[1], pos_y[0]:pos_y[1], :] = image_3d
    
    return image_norm.astype(np.int32)

image = load_image('path/to/image')
image = fit_image(image)

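Here the image is rescaled so that the longer of its width and height becomes fit_size=138, and the remaining area is filled with zeros to make it square. The image is also positioned so that the padding is even on the top and bottom (or left and right). I used scikit-image for this.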
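
The geometry of that step can be checked without any image library. The following is a small sketch, assuming Python 3 division; fit_geometry is a hypothetical helper that only mirrors the size and offset arithmetic, not the pixel resampling that scikit-image performs.

```python
def fit_geometry(height, width, fit_size=138):
    """Return the rescaled (rows, cols) and the (top, left) offset that
    centers the rescaled image inside a fit_size x fit_size canvas."""
    scale = fit_size / max(height, width)
    rows, cols = round(height * scale), round(width * scale)
    top = round((fit_size - rows) / 2)
    left = round((fit_size - cols) / 2)
    return (rows, cols), (top, left)

# Hypothetical 276x92 (rows x cols) image: the long side shrinks to 138.
size, offset = fit_geometry(276, 92)
print(size, offset)  # (138, 46) (0, 46)
```

The long side ends up exactly at fit_size, and the short side gets the same amount of zero padding on both sides.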

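Next come the labels, which are the training targets. The Leaf Classification training labels are strings, so I convert them to integers to make them easier to handle in TensorFlow. scikit-learn's LabelEncoder looks like the right tool.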

import pandas as pd
from sklearn import preprocessing

df_train = pd.read_csv('train.csv')

labels = df_train['species'].values

le = preprocessing.LabelEncoder()
le.fit(labels)
le.transform(labels)

array([ 3, 49, 65, 94, 84, 40, 54, 78, 53, 89, 98, 16, 74, 50, 58, 31, 43,
        4, 75, 44, 83, 84, 13, 66, 15,  6, 73, 22, 73, 31, 36, 27, 94, 88,
       12, 28, 21, 25, 20, 60, 84, 65, 69, ....])
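
Conceptually, LabelEncoder assigns each distinct label its position in the sorted list of classes. A minimal pure-Python sketch of that mapping follows; fit_label_encoder and the species names are hypothetical, chosen only to keep the example short.

```python
def fit_label_encoder(labels):
    """Map each distinct label to its index in sorted order."""
    classes = sorted(set(labels))
    return {name: index for index, name in enumerate(classes)}

species = ['maple', 'oak', 'birch', 'maple']  # hypothetical label names
mapping = fit_label_encoder(species)
encoded = [mapping[name] for name in species]
print(encoded)  # [1, 2, 0, 1]
```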

TFRecords file

With the images and labels preprocessed, the next step is feeding them to TensorFlow. Loading everything into memory is not always an option, so TensorFlow provides a file format called TFRecords.

with tf.python_io.TFRecordWriter(path) as writer:
    # Raw int32 bytes of the image; tobytes() supersedes the deprecated tostring().
    image_raw = image.tobytes()

    example = tf.train.Example(features=tf.train.Features(feature={
            'label': tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
            'image_raw': tf.train.Feature(bytes_list=tf.train.BytesList(value=[image_raw]))}))

    writer.write(example.SerializeToString())

The data can now be read by TensorFlow. The complete code is in a gist.
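
One thing worth keeping in mind is that image_raw is just a flat run of bytes: the element type and the image shape are not stored with it, so whatever reads the record has to be told both. The sketch below illustrates this with the standard struct module standing in for NumPy's byte dump; the all-zero image is made up, and the sizes assume the 138x138x3 int32 images produced by fit_image.

```python
import struct

IMAGE_SIZE = 138
CHANNELS = 3
n_values = IMAGE_SIZE * IMAGE_SIZE * CHANNELS

# Hypothetical image, flattened to a list of integers.
pixels = [0] * n_values
pixels[0] = 255

# Serialize as native-order 32-bit signed integers (4 bytes each).
raw = struct.pack('=%di' % n_values, *pixels)
print(len(raw))  # 228528

# Decoding needs the same element type and count; neither is stored in `raw`.
decoded = list(struct.unpack('=%di' % n_values, raw))
```

Reading the same bytes back with a different element width would give a different number of values, which is why the dtype used when decoding has to match the one used when writing.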

Training

We start by reading the TFRecords and building mini-batches.

inputs

import random

IMAGE_SIZE = 138
INPUT_SIZE = 128

def inputs(files, distortion=True, batch_params={'size': 10, 'min_after_dequeue': 20}):
    fqueue = tf.train.string_input_producer(files, shuffle=True)
    reader = tf.TFRecordReader()
    key, value = reader.read(fqueue)

    features = tf.parse_single_example(value, features={
        'label': tf.FixedLenFeature([], tf.int64),
        'image_raw': tf.FixedLenFeature([], tf.string),
    })

    label = tf.cast(features['label'], tf.int32)
    image = tf.decode_raw(features['image_raw'], tf.int32)

    image = tf.reshape(image, [IMAGE_SIZE, IMAGE_SIZE, 3])
    image.set_shape([IMAGE_SIZE, IMAGE_SIZE, 3])
    image = tf.cast(image, tf.float32)
    
    if distortion:
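        # Note: cropsize is drawn once in Python, when the graph is built, not per example.
        # Integer division keeps the upper bound passed to randint an int.
        cropsize = random.randint(INPUT_SIZE, INPUT_SIZE + (IMAGE_SIZE - INPUT_SIZE) // 2)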
        framesize = INPUT_SIZE + (cropsize - INPUT_SIZE) * 2
        image = tf.image.resize_image_with_crop_or_pad(image, framesize, framesize)
        image = tf.random_crop(image, [cropsize, cropsize, 3])
        image = tf.image.random_flip_left_right(image)
        image = tf.image.random_flip_up_down(image)
        
    one_hot_label = tf.one_hot(label, depth=99, dtype=tf.float32)

    capacity = batch_params['min_after_dequeue'] + 3 * batch_params['size']
    
    images, labels = tf.train.shuffle_batch(
        [image, one_hot_label],
        batch_size= batch_params['size'],
        capacity=capacity,
        min_after_dequeue=batch_params['min_after_dequeue']
    )
    
    images = tf.image.resize_images(images, [INPUT_SIZE, INPUT_SIZE])
    return images, labels

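First, files is a list of TFRecords paths, from which a queue that reads the TFRecord files is created.

Next, the training data is randomly flipped and cropped to add variation. The labels are one-hot encoded with tf.one_hot.

Finally, tf.train.shuffle_batch builds the mini-batches.

capacity is, as the name says, the maximum number of elements the queue can hold. min_after_dequeue is the minimum number of elements left in the queue after a batch is dequeued, and it determines how well the elements are mixed. capacity is derived from min_after_dequeue and batch_size. (I believe I took this formula from somewhere…)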
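
To get a feel for how these two parameters interact, here is a simplified pure-Python model of a shuffling buffer. It is only a sketch of the idea, not how TensorFlow implements its queue: shuffle_batches is a hypothetical function, it runs in a single thread, and it drops any leftover items that do not fill a whole batch.

```python
import random

def shuffle_batches(stream, batch_size, min_after_dequeue, seed=0):
    """Yield batches sampled at random from a bounded buffer fed by `stream`."""
    rng = random.Random(seed)
    capacity = min_after_dequeue + 3 * batch_size  # same formula as in inputs()
    source = iter(stream)
    buffer = []
    exhausted = False
    while True:
        # Top the buffer up to its capacity while the source still has items.
        while not exhausted and len(buffer) < capacity:
            try:
                buffer.append(next(source))
            except StopIteration:
                exhausted = True
        if len(buffer) < batch_size:
            return
        # Draw batch_size items uniformly at random from the buffer.
        batch = []
        for _ in range(batch_size):
            index = rng.randrange(len(buffer))
            buffer[index], buffer[-1] = buffer[-1], buffer[index]
            batch.append(buffer.pop())
        yield batch

batches = list(shuffle_batches(range(100), batch_size=10, min_after_dequeue=20))
print(len(batches))  # 10
```

Because the buffer is refilled to capacity before each draw, at least min_after_dequeue items remain after a batch is taken while the source still has data. A larger min_after_dequeue means each batch is drawn from a wider window of the input, at the cost of more memory.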

inference

The model is based on Deep MNIST for Experts from the TensorFlow tutorials.

import math

def conv2d(x, W):
  return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

def max_pool_2x2(x):
  return tf.nn.max_pool(x, ksize=[1, 2, 2, 1],
                        strides=[1, 2, 2, 1], padding='SAME')

def inference(images, keep_prob):
    x = tf.image.rgb_to_grayscale(images)
    x_image = tf.reshape(x, [-1,128, 128,1])

    with tf.variable_scope("conv1") as scope:
        # He initialization: stddev = sqrt(2 / fan_in), with fan_in = 5 * 5 * 1
        stddev = math.sqrt(2.0 / (5 * 5 * 1))
        W_conv1 = tf.get_variable('weights', [5, 5, 1, 32], initializer=tf.random_normal_initializer(stddev=stddev))
        b_conv1 = tf.get_variable("biases", [32], initializer=tf.constant_initializer(0.0))

        h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
        h_pool1 = max_pool_2x2(h_conv1)  # 128x128 -> 64x64

    with tf.variable_scope("conv2") as scope:
        stddev = math.sqrt(2.0 / (5 * 5 * 32))
        W_conv2 = tf.get_variable('weights', [5, 5, 32, 64], initializer=tf.random_normal_initializer(stddev=stddev))
        b_conv2 = tf.get_variable("biases", [64], initializer=tf.constant_initializer(0.0))

        h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
        h_pool2 = max_pool_2x2(h_conv2)  # 64x64 -> 32x32

    with tf.variable_scope("fc1") as scope:
        stddev = math.sqrt(2.0 / (32 * 32 * 64))
        W_fc1 = tf.get_variable('weights', [32 * 32 * 64, 1024], initializer=tf.random_normal_initializer(stddev=stddev))
        b_fc1 = tf.get_variable("biases", [1024], initializer=tf.constant_initializer(0.0))

        h_pool2_flat = tf.reshape(h_pool2, [-1, 32*32*64])
        h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)

        h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

    with tf.variable_scope("fc2") as scope:
        # Note: this stddev is computed but not used; the initializer below uses 0.1.
        stddev = 1.0 / math.sqrt(1024)
        W_fc2 = tf.get_variable('weights', [1024, 99], initializer=tf.random_normal_initializer(stddev=0.1))
        b_fc2 = tf.get_variable("biases", [99], initializer=tf.constant_initializer(0.0))

    logits = tf.matmul(h_fc1_drop, W_fc2) + b_fc2

    return logits

The tutorial initialized the weights with a standard deviation of 0.1, but here I use He initialization instead.
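
For reference, He initialization draws weights with standard deviation sqrt(2 / n), where n is the number of inputs feeding each unit (the fan-in). The sketch below works out that value for the layer shapes in this model; he_stddev is a hypothetical helper and the fan-in values are simply read off the weight shapes in inference().

```python
import math

def he_stddev(fan_in):
    """Standard deviation of He initialization for a layer with fan_in inputs."""
    return math.sqrt(2.0 / fan_in)

# Two 2x2 max-pools with stride 2 halve each side twice: 128 -> 64 -> 32.
side = 128 // 2 // 2

fan_in = {
    'conv1': 5 * 5 * 1,       # 5x5 kernel, 1 input channel
    'conv2': 5 * 5 * 32,      # 5x5 kernel, 32 input channels
    'fc1': side * side * 64,  # flattened 32x32x64 feature map
}
for name, n in fan_in.items():
    print(name, n, round(he_stddev(n), 6))
```

The deeper the layer and the larger its fan-in, the smaller the initial weights become, which keeps the scale of the activations roughly steady from layer to layer.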

Also, I use tf.get_variable rather than tf.Variable. With tf.Variable, a new set of variables is allocated every time the inference function is called. I didn't notice this at first and lost quite a bit of time on it.
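
The difference is easier to see in a toy model. The class below is a conceptual sketch only, not TensorFlow's implementation: VariableStore is a hypothetical name-to-variable registry where, with reuse switched on, a lookup by name returns the existing variable instead of creating a new one.

```python
class VariableStore:
    """Toy name-to-variable registry with an explicit reuse flag."""

    def __init__(self):
        self._variables = {}
        self.reuse = False

    def get_variable(self, name, initial_value):
        if self.reuse:
            # Reuse mode: the variable must already exist.
            return self._variables[name]
        if name in self._variables:
            raise ValueError('%s already exists; enable reuse to share it' % name)
        self._variables[name] = [initial_value]  # a list stands in for a tensor
        return self._variables[name]

store = VariableStore()
w_train = store.get_variable('conv1/weights', 0.0)
store.reuse = True
w_valid = store.get_variable('conv1/weights', 0.0)
print(w_train is w_valid)  # True: both models see the same weights

# Plain construction, by contrast, makes a fresh object every time.
a = [0.0]
b = [0.0]
print(a is b)  # False
```

Sharing by name is what lets the training and validation graphs use one set of weights, so that what is learned on the training batches is what gets evaluated.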

The variable names can be checked as follows.

variables = tf.trainable_variables()
for v in variables:
    print(v.name)

loss

Dropout is applied to the first fully connected layer, but since I suspected overfitting, I also added L2 regularization.

def loss(logits, labels, l2=True):
    loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=labels, logits=logits))
    
    if l2:
        variables = tf.trainable_variables()
        l2_loss = tf.add_n([ tf.nn.l2_loss(v) for v in variables if 'bias' not in v.name ]) * 0.001
        loss = loss + l2_loss
    
    return loss
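
To make the penalty term concrete, here is a small pure-Python sketch of the same arithmetic. tf.nn.l2_loss computes half the sum of squares of a tensor; the variable names and values below are made up, with names shaped like those the scopes in inference() would produce.

```python
def l2_loss(values):
    """Half the sum of squares, the quantity tf.nn.l2_loss computes."""
    return sum(v * v for v in values) / 2.0

# Hypothetical flattened variables, keyed by name.
variables = {
    'inference/conv1/weights:0': [1.0, -2.0],
    'inference/conv1/biases:0': [10.0],
    'inference/fc2/weights:0': [3.0],
}

# Sum the penalty over weights only; names containing 'bias' are skipped.
penalty = 0.001 * sum(l2_loss(v) for name, v in variables.items()
                      if 'bias' not in name)
print(penalty)
```

The bias with value 10.0 contributes nothing, so only the weights are pulled toward zero, and the factor 0.001 controls how strongly.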

Learning

All that's left is training. Please excuse the messy code.

keep_prob = tf.placeholder(tf.float32)

images, labels = inputs(train_files,  batch_params={'size': 64,  'min_after_dequeue': 1000})
validation_images, validation_labels = inputs(validation_files, batch_params={'size': 100, 'min_after_dequeue': 1000})

with tf.variable_scope('inference') as scope:
    logits  = inference(images, keep_prob)

    scope.reuse_variables()
    validation_logits = inference(validation_images, keep_prob)

    cross_entropy = loss(logits, labels)
    train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)
    correct_prediction = tf.equal(tf.argmax(logits,1), tf.argmax(labels,1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

    validation_cross_entropy = loss(validation_logits, validation_labels)
    validation_correct_prediction = tf.equal(tf.argmax(validation_logits,1), tf.argmax(validation_labels,1))
    validation_accuracy = tf.reduce_mean(tf.cast(validation_correct_prediction, tf.float32))

    test_images, test_ids = test_inputs(test_files)
    test_logits = inference(test_images, keep_prob)
    test_prediction = tf.nn.softmax(test_logits)

scope.reuse_variables() is called so that the variables can be reused. For details, see Sharing Variables in the Programmer's Guide.

sess = tf.Session()
sess.run(tf.global_variables_initializer())


coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)

for i in range(2000):
    _, loss_value = sess.run([train_step, cross_entropy], feed_dict={keep_prob: 0.5})

    if i%100 == 0:
        train_acc, validation_acc = sess.run([accuracy, validation_accuracy], feed_dict={keep_prob: 1.0})
        print("step %d, loss value %g, training accuracy %g, validation accuracy %g"%(i, loss_value, train_acc, validation_acc))
        
        
coord.request_stop()
coord.join(threads)

sess.close()

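The full code is up in a gist.

After that it should just be a matter of letting training run. But it just won't converge... With less data it does converge (albeit with overfitting), so I don't think it's a bug...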

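Summary

I tried a Convolutional Neural Network based on the TensorFlow tutorial. I learned things I hadn't known before, such as TFRecords, building mini-batches, and variable reuse. It's a bit of a shame that I have no results to show, though.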

kotaroito's notes