Quick start: build an image recognizer from your own dataset

02-02

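Yesterday I showed you how to find datasets (if you missed it, see "What are some ways to find datasets when learning deep learning on your own?" on www.zhihu.com).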

Today, let's get hands-on and build a simple but powerful image recognizer from our own dataset, step by step.

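We'll use TensorFlow to create a convolutional neural network (CNN) as our image recognizer. If we relied only on our own photos, though, we would have to photograph each object many times from different angles, which is tedious. So instead of taking photos one at a time, we'll just shoot a video.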

https://www.zhihu.com/video/932600427841609728

Once the video is shot, we extract frames from it to build our own training dataset. You can extract the frames with FFmpeg:

ffmpeg -i <video_filename> -vf fps=1 output_image%d.jpg

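FFmpeg works on both macOS and Windows. To run the command above, open Terminal on a Mac (or a Command Prompt window on Windows), navigate to the folder that contains the video file, and run the command there. FFmpeg reads the video file and, because of fps=1, writes one image for every second of video. The images are saved as output_image1.jpg, output_image2.jpg, and so on.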
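If you shot several videos, typing the command once per video gets repetitive. Below is a minimal Python sketch, assuming FFmpeg is installed and on your PATH, that builds the same `ffmpeg -i ... -vf fps=1 ...` command and runs it for one video at a time. The helper names `build_ffmpeg_cmd` and `extract_frames` and the video file names are hypothetical. The output folders in the usage comments follow the images/banana and images/not_a_banana layout this post sets up further down. Each video also gets its own filename prefix, because two videos extracted into the same folder with the default output_image%d.jpg pattern would produce clashing file names.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_cmd(video, out_dir, fps=1, prefix="output_image"):
    """Argument list for: ffmpeg -i <video> -vf fps=<fps> <out_dir>/<prefix>%d.jpg"""
    pattern = str(Path(out_dir) / f"{prefix}%d.jpg")
    return ["ffmpeg", "-i", str(video), "-vf", f"fps={fps}", pattern]


def extract_frames(video, out_dir, fps=1, prefix="output_image"):
    """Create out_dir if needed, then let FFmpeg write `fps` frames per second into it."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    cmd = build_ffmpeg_cmd(video, out_dir, fps, prefix)
    print("Running:", " ".join(cmd))
    # Passing a list (no shell) avoids quoting problems with spaces in paths;
    # check=True raises CalledProcessError if ffmpeg exits with an error
    subprocess.run(cmd, check=True)


# Example usage (hypothetical file names; one prefix per video so frames
# from different videos in the same folder do not clash):
# extract_frames("banana.mp4", "images/banana", prefix="banana_")
# extract_frames("orange.mp4", "images/not_a_banana", prefix="orange_")
# extract_frames("mug.mp4", "images/not_a_banana", prefix="mug_")
```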

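I wanted to predict whether a given photo shows a banana (inspired by Jian-Yang's Not Hotdog app in the TV series Silicon Valley), so I shot a video of a bunch of bananas sitting on my desk, plus videos of a few other objects, such as an orange and my tea mug.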

https://www.zhihu.com/video/932600753323884544

It doesn't matter which objects you pick; just make sure each video shows only one object.

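To read the image files, we first need to organize the data into the right folder structure. Create a folder named images, and inside it create two subfolders: banana and not_a_banana. Put all the banana images in banana and all the other images in not_a_banana. You can also put each video file into its matching folder and run FFmpeg there. Here is a screenshot of the directory structure I created: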
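Before writing any TensorFlow code, it can help to create the folders and check how many images each class ended up with; if one folder holds far more images than the other, the classifier may lean toward that class. The sketch below uses only Python's standard library. `make_dataset_dirs` and `count_images` are hypothetical helper names, and the count includes files ending in .jpg or .jpeg in any letter case, the same extensions prepare_data looks for.

```python
from pathlib import Path

# Class folders as described above: images/banana and images/not_a_banana
CLASSES = ("banana", "not_a_banana")
JPEG_SUFFIXES = {".jpg", ".jpeg"}


def make_dataset_dirs(root="images"):
    """Create root/banana and root/not_a_banana if they do not exist yet."""
    for name in CLASSES:
        (Path(root) / name).mkdir(parents=True, exist_ok=True)


def count_images(root="images"):
    """Return {subfolder name: number of JPEG files} for every subfolder of root."""
    counts = {}
    for sub in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        counts[sub.name] = sum(
            1 for f in sub.iterdir()
            if f.is_file() and f.suffix.lower() in JPEG_SUFFIXES
        )
    return counts


# Example usage: make_dataset_dirs(); print(count_images())
```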

Now that all the images are saved in the right folders, it's time to write some code!

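In the code below I created two functions. The first, prepare_data, walks the image directory and returns two lists: the image file paths and the matching labels (each label is the name of the folder the image sits in). The second, read_image_array, reads the images and converts them into pixel values: it decodes each image in the list with tf.image.decode_jpeg, resizes it to 28×28 pixels, and stores the results together in one TensorFlow tensor.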

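I also wrote a function called read_single_image that reads a single image; we'll use it for prediction. Save these functions as read_image.py, because the training script further down imports them from the read_image module.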

# read_image.py: data-loading helpers imported by the training script.
# Written for the TensorFlow 1.x API. On TensorFlow 2.x, replace
# "import tensorflow as tf" with:
#     import tensorflow.compat.v1 as tf
#     tf.disable_v2_behavior()
import os

import tensorflow as tf


def prepare_data(img_dir):
    """Return (file_list, labels): the paths of all JPEG files in the
    subfolders of img_dir, and for each path the name of its subfolder."""
    image_dirs = [dirpath for (dirpath, dirnames, filenames)
                  in tf.gfile.Walk(os.path.join(os.getcwd(), img_dir))]
    file_list = []
    y_ = []
    # Ignoring the first directory as it is the base directory
    for image_dir in image_dirs[1:]:
        extensions = ['jpg', 'jpeg', 'JPG', 'JPEG']
        dir_name = os.path.basename(image_dir)
        tf.logging.info("Looking for images in '" + dir_name + "'")
        for extension in extensions:
            # Build a filename pattern such as images/banana/*.jpg so that
            # only JPEG files are picked up
            file_glob = os.path.join(image_dir, '*.' + extension)
            image_file_list = tf.gfile.Glob(file_glob)
            file_list.extend(image_file_list)
            # Every image in this folder gets the folder name as its label
            y_.extend([dir_name] * len(image_file_list))

    return file_list, y_


def read_single_image(image_loc):
    """Decode one JPEG as RGB, resize it to 28x28 and flatten it into a
    [1, 28*28*3] float32 tensor."""
    with tf.gfile.GFile(image_loc, 'rb') as f:
        image_bytes = f.read()
    image_decoded = tf.image.decode_jpeg(image_bytes, channels=3)
    resized_image = tf.reshape(
        tf.image.resize_images(image_decoded, [28, 28]), [1, 28 * 28 * 3])
    return resized_image


def read_image_array(image_loc_array):
    """Read every image in image_loc_array and join the results into a
    [number_of_images, 28*28*3] float32 tensor."""
    resized_image_array = [read_single_image(image_loc)
                           for image_loc in image_loc_array]
    return tf.concat(resized_image_array, axis=0)
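The number 2352 that shows up in the training script is just 28 × 28 × 3: read_single_image flattens each resized RGB image into one row, and the network later turns the rows back into 28×28×3 images with tf.reshape(x, [-1, 28, 28, 3]). The NumPy sketch below, with random stand-in pixels and hypothetical helper names, mimics those two reshapes to show that they round-trip exactly, since NumPy's reshape and tf.reshape both use row-major order.

```python
import numpy as np

HEIGHT, WIDTH, CHANNELS = 28, 28, 3
FLAT_SIZE = HEIGHT * WIDTH * CHANNELS  # 2352, the width of the model's input placeholder


def flatten_image(img):
    """(28, 28, 3) array -> (1, 2352) row, like tf.reshape(..., [1, 28*28*3])."""
    assert img.shape == (HEIGHT, WIDTH, CHANNELS)
    return img.reshape(1, FLAT_SIZE)


def unflatten_batch(batch):
    """(N, 2352) batch -> (N, 28, 28, 3), like tf.reshape(x, [-1, 28, 28, 3])."""
    return batch.reshape(-1, HEIGHT, WIDTH, CHANNELS)


# Three fake "images" with random pixel values in [0, 255)
rng = np.random.default_rng(0)
images = [rng.uniform(0, 255, size=(HEIGHT, WIDTH, CHANNELS)).astype(np.float32)
          for _ in range(3)]
batch = np.concatenate([flatten_image(im) for im in images], axis=0)
print(batch.shape)                                    # (3, 2352)
print(np.array_equal(unflatten_batch(batch)[1], images[1]))  # True
```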

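With the preprocessing functions in place, it's time to build our convolutional neural network. In this example I build a simple network with two convolutional layers (each followed by 2×2 max pooling) and two fully connected layers, with a dropout layer between the two fully connected layers. Dropout is a regularization technique: during training it randomly sets some of its inputs to zero, which helps prevent overfitting. I use cross-entropy as the loss function and minimize it with the Adam optimizer.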

I also added a final step that reads a single image and predicts whether it is a banana.
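To make dropout and the cross-entropy loss a bit more concrete, here is a NumPy sketch of both. It is an illustration, not TensorFlow's implementation: `dropout` follows the rule documented for the TensorFlow 1.x tf.nn.dropout (keep each value with probability keep_prob and scale the kept values by 1/keep_prob), and `softmax_cross_entropy` computes the per-example loss of tf.nn.softmax_cross_entropy_with_logits and averages it over the batch, as the training code does with tf.reduce_mean. The activations, labels and logits are made-up values.

```python
import numpy as np


def dropout(x, keep_prob, rng):
    """Inverted dropout: keep each value with probability keep_prob and
    scale the kept values by 1 / keep_prob; everything else becomes 0."""
    if keep_prob >= 1.0:
        return x  # keep_prob = 1.0 (evaluation/prediction) changes nothing
    mask = rng.random(x.shape) < keep_prob
    return np.where(mask, x / keep_prob, 0.0)


def softmax_cross_entropy(labels, logits):
    """Mean over the batch of -sum(labels * log(softmax(logits)))."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # avoids overflow in exp
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(np.mean(-(labels * log_probs).sum(axis=1)))


rng = np.random.default_rng(0)

# Training-time dropout on a batch of 4 fake fc_1 outputs (1024 units each)
activations = np.ones((4, 1024))
dropped = dropout(activations, keep_prob=0.8, rng=rng)
print("fraction kept:", np.mean(dropped > 0))  # close to 0.8
print("mean value:", dropped.mean())           # close to 1.0, since kept values become 1.25

# One-hot labels (class 0, class 1) and made-up logits for two images
labels = np.array([[1.0, 0.0], [0.0, 1.0]])
logits = np.array([[2.0, 0.0], [0.0, 0.0]])
print("loss:", softmax_cross_entropy(labels, logits))  # about 0.41
```

The scaling by 1/keep_prob is why the script can simply feed keep_prob: 1.0 when measuring accuracy: no rescaling is needed at evaluation time.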

# train_the_bananas.py
# Written for the TensorFlow 1.x API (see the note in read_image.py for TensorFlow 2.x).
import argparse
import sys

import tensorflow as tf
from sklearn import preprocessing

from read_image import prepare_data, read_image_array, read_single_image


def conv2d(x, W):
    # 2-D convolution with stride 1; 'SAME' padding keeps height and width unchanged
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')


def max_pool_2x2(x):
    # 2x2 max pooling with stride 2 halves the height and width
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')


def weight_variable(shape):
    initial = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(initial)


def bias_variable(shape):
    initial = tf.constant(0.1, shape=shape)
    return tf.Variable(initial)


def main(_):
    # Each input is a flattened 28x28 RGB image (28*28*3 = 2352 values);
    # each label is a one-hot vector over the two classes
    x = tf.placeholder(tf.float32, shape=[None, 2352])
    y_ = tf.placeholder(tf.float32, shape=[None, 2])

    # First convolution and pooling layer: 28x28x3 -> 28x28x31 -> 14x14x31
    conv_weight_1 = weight_variable([5, 5, 3, 31])
    conv_bias_1 = bias_variable([31])
    x_image = tf.reshape(x, [-1, 28, 28, 3])
    conv_1 = tf.nn.relu(conv2d(x_image, conv_weight_1) + conv_bias_1)
    pool_1 = max_pool_2x2(conv_1)

    # Second convolution and pooling layer: 14x14x31 -> 14x14x64 -> 7x7x64
    conv_weight_2 = weight_variable([5, 5, 31, 64])
    conv_bias_2 = bias_variable([64])
    conv_2 = tf.nn.relu(conv2d(pool_1, conv_weight_2) + conv_bias_2)
    pool_2 = max_pool_2x2(conv_2)

    # First fully connected layer: 7*7*64 -> 1024
    fc_weight_1 = weight_variable([7 * 7 * 64, 1024])
    fc_bias_1 = bias_variable([1024])
    pool_2_flat = tf.reshape(pool_2, [-1, 7 * 7 * 64])
    fc_1 = tf.nn.relu(tf.matmul(pool_2_flat, fc_weight_1) + fc_bias_1)

    # Dropout layer: keep_prob is fed as 0.8 while training and 1.0 otherwise
    keep_prob = tf.placeholder(tf.float32)
    custom_fc1_drop = tf.nn.dropout(fc_1, keep_prob)

    # Second fully connected layer: 1024 -> 2 logits, fed from the dropout output
    fc_weights_2 = weight_variable([1024, 2])
    fc_bias_2 = bias_variable([2])
    y_conv = tf.matmul(custom_fc1_drop, fc_weights_2) + fc_bias_2

    cross_entropy = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_conv))
    train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)

    correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    predicted = tf.argmax(y_conv, 1)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        file_list, y_image_label = prepare_data(FLAGS.image_dir)

        # Folder names -> integers (classes sorted alphabetically) -> one-hot vectors
        le = preprocessing.LabelEncoder()
        y_one_hot = tf.one_hot(le.fit_transform(y_image_label), depth=2)

        x_feed = sess.run(read_image_array(file_list))
        y_feed = sess.run(y_one_hot)

        # Full-batch training: every step uses all of the images
        for i in range(75):
            if i % 10 == 0:
                train_accuracy = accuracy.eval(feed_dict={
                    x: x_feed, y_: y_feed, keep_prob: 1.0})
                print('step %d, training accuracy %g' % (i, train_accuracy))
            train_step.run(feed_dict={x: x_feed, y_: y_feed, keep_prob: 0.8})

        if FLAGS.predict_image != "":
            x_single_img = sess.run(read_single_image(FLAGS.predict_image))
            # keep_prob must be fed here too; 1.0 turns dropout off for prediction
            prediction = sess.run(predicted, feed_dict={x: x_single_img, keep_prob: 1.0})
            print('You got %s' % le.inverse_transform(prediction)[0])


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '--image_dir',
        type=str,
        default='images',
        help='Path to folders of labeled images.'
    )
    parser.add_argument(
        '--predict_image',
        type=str,
        default='',
        help='Unknown image to classify after training.'
    )
    FLAGS, unparsed = parser.parse_known_args()
    tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
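If you change the input size or the number of filters, the hard-coded 7 * 7 * 64 in the first fully connected layer has to change too. The sketch below traces one image through the layers defined in main(), using the output-size rule for TensorFlow's 'SAME' padding (output size = ceil(input size / stride)). `trace_shapes` is a hypothetical helper written for this illustration; it only does the arithmetic and does not build any TensorFlow ops.

```python
import math


def same_output_size(size, stride):
    """Spatial output size of a TF conv/pool layer with padding='SAME'."""
    return math.ceil(size / stride)


def trace_shapes(height=28, width=28, channels=3):
    """Follow one image through the layers in main() and return (name, shape) pairs."""
    shapes = [("input", (height, width, channels))]
    # conv 5x5, stride 1, SAME: spatial size unchanged, 31 feature maps
    h, w = same_output_size(height, 1), same_output_size(width, 1)
    shapes.append(("conv_1", (h, w, 31)))
    # max pool 2x2, stride 2, SAME: spatial size halved
    h, w = same_output_size(h, 2), same_output_size(w, 2)
    shapes.append(("pool_1", (h, w, 31)))
    h, w = same_output_size(h, 1), same_output_size(w, 1)
    shapes.append(("conv_2", (h, w, 64)))
    h, w = same_output_size(h, 2), same_output_size(w, 2)
    shapes.append(("pool_2", (h, w, 64)))
    shapes.append(("pool_2_flat", (h * w * 64,)))  # input width of fc_weight_1
    shapes.append(("fc_1", (1024,)))
    shapes.append(("fc_2 (logits)", (2,)))
    return shapes


for name, shape in trace_shapes():
    print(f"{name:15s} {shape}")
```

For example, resizing the images to 56×56 instead would give a 14×14×64 output from pool_2, so the first fully connected layer would need 14 * 14 * 64 inputs.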

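Now that the CNN is written, let's run it and see how well it works. The command below first trains the network on the images in the images folder and then classifies banana.jpg: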

python train_the_bananas.py --predict_image banana.jpg
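The "You got ..." line the script prints comes from mapping the network's output back to a folder name. scikit-learn's LabelEncoder stores the class names sorted, so banana becomes 0 and not_a_banana becomes 1; tf.one_hot turns those integers into one-hot rows, and at prediction time le.inverse_transform maps the argmax index back to a name. The NumPy sketch below mirrors those steps with np.unique, which also sorts; the labels and logits are made-up example values.

```python
import numpy as np

# Example labels as prepare_data() would return them: one folder name per image
folder_labels = ["not_a_banana", "banana", "banana", "not_a_banana"]

# Like LabelEncoder.fit_transform: sort the distinct names, replace each by its index
classes, encoded = np.unique(folder_labels, return_inverse=True)
print(classes.tolist())   # ['banana', 'not_a_banana']
print(encoded.tolist())   # [1, 0, 0, 1]

# Like tf.one_hot(encoded, depth=2): row i has a 1.0 in column encoded[i]
one_hot = np.eye(len(classes), dtype=np.float32)[encoded]
print(one_hot.tolist())   # [[0.0, 1.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]

# Prediction: argmax over the two logits, then map the index back to a name
# (the script does this lookup with le.inverse_transform)
logits = np.array([[2.3, -0.7]])  # made-up logits for one test image
predicted = np.argmax(logits, axis=1)
print("You got %s" % classes[predicted][0])  # You got banana
```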

For the test sample, I grabbed a banana picture straight from the internet.

Oops, wrong picture. This is the one I actually used ↓↓↓

Here is the output I got:

As you can see, the model trained on our own dataset correctly recognizes the test image.

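That said, this is a very simple CNN, and it is nowhere near good enough to ship as a product: give it a more complex photo and the model fails to recognize it. I'll look into somewhat more advanced methods that let the model handle more complex images, so stay tuned for my next article.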

All the project files for this article are available here. Go build your own recognizer!

sanghapriya/not_a_banana (github.com)

The Minions say this article is great.

Reference: https://medium.com/@sangho/how-to-build-a-image-recogniser-using-your-own-dataset-22bb9f806e1d