Object Detection with the TensorFlow Object Detection API: Building Your Own Model

05-10

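Before building your own model, it is worth running the TensorFlow Object Detection API demo first:

JustDoIT: Object Detection with the TensorFlow Object Detection API (zhuanlan.zhihu.com)

I am a fan of Jay Chou (杰倫) and Eason Chan (奕迅), so let's build a model that detects the two of them.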

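1. Prepare the training and test data

In D:\python3\models-master\research\object_detection, create a new folder named images.

Inside images, create two folders, one named train and the other named test. The resulting structure is shown in the figure.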
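The same folder layout can also be created from Python. This is a minimal sketch: the helper name make_image_dirs is hypothetical, and the demo runs in a throwaway temporary directory rather than the real object_detection folder.

```python
import tempfile
from pathlib import Path


def make_image_dirs(base_dir):
    """Create images/train and images/test under base_dir and return both paths."""
    images = Path(base_dir) / "images"
    created = []
    for split in ("train", "test"):
        split_dir = images / split
        # parents=True also creates the images folder; exist_ok avoids errors on reruns
        split_dir.mkdir(parents=True, exist_ok=True)
        created.append(split_dir)
    return created


# Demo in a temporary directory (assumption: replace demo_root with your own
# object_detection path when using this for real).
demo_root = tempfile.mkdtemp()
train_dir, test_dir = make_image_dirs(demo_root)
print(train_dir)
print(test_dir)
```

Calling make_image_dirs a second time on the same folder is harmless, because existing folders are left as they are.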

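train holds 55 pictures of Jay Chou and 30 of Eason Chan (I had planned on 100 each, but the dorm network was too slow).

test holds 10 pictures of Jay Chou and 10 of Eason Chan.

Every picture is named image plus a number and saved as a .jpg file, as shown in the figure.

Each picture needs a label: an XML file containing the class label and the bounding-box position. LabelImg is a small tool that makes this quick and easy: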

tzutalin/labelImg (github.com)

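Click the circled link, download the latest release, and unzip it. The unzipped contents are shown in the figure.

Open the application. Its interface is shown in the figure.

Click Open Dir (circled, top left), choose the image folder, and click Select Folder (circled, bottom right). I chose the train folder here; the test folder needs exactly the same treatment as train, that is, XML files first and then a record file. The result is shown in the figure.

Then click Create RectBox on the left and draw a box around Jay Chou. A dialog pops up asking for the label; I entered ZJL, and for Eason Chan I enter CYX. The result is shown in the figure.

Click OK, then click Save on the left to generate the XML file for that picture, as shown in the figure.

The final result is shown in the figure.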
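To make the content of these XML files concrete, here is a small sketch that reads one annotation. The sample XML is an assumption modelled on the Pascal VOC layout that LabelImg writes; the file name, sizes, and coordinates are made up for illustration, and read_boxes is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Assumed example of a LabelImg (Pascal VOC style) annotation; values are invented.
SAMPLE_XML = """<annotation>
  <filename>image1.jpg</filename>
  <size><width>500</width><height>375</height><depth>3</depth></size>
  <object>
    <name>ZJL</name>
    <pose>Unspecified</pose>
    <truncated>0</truncated>
    <difficult>0</difficult>
    <bndbox><xmin>120</xmin><ymin>40</ymin><xmax>310</xmax><ymax>290</ymax></bndbox>
  </object>
</annotation>"""


def read_boxes(xml_text):
    """Return one (filename, label, xmin, ymin, xmax, ymax) tuple per labelled box."""
    root = ET.fromstring(xml_text)
    filename = root.findtext("filename")
    rows = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        rows.append((
            filename,
            obj.findtext("name"),
            int(box.findtext("xmin")),
            int(box.findtext("ymin")),
            int(box.findtext("xmax")),
            int(box.findtext("ymax")),
        ))
    return rows


boxes = read_boxes(SAMPLE_XML)
print(boxes)
```

Looking elements up by tag name keeps the code working even if the order of the child elements inside an object differs.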

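Next, merge all the XML files into a single CSV file. This takes a short Python script; copy the code below into a Python file.

Only three places need changing: the first two are the folder path and the third is the output file name, train.csv here (for the test folder, use the test path and test.csv):

os.chdir(r'D:\python3\models-master\research\object_detection\images\train')
path = r'D:\python3\models-master\research\object_detection\images\train'
xml_df.to_csv('train.csv', index=None)

The full script:

import os
import glob
import pandas as pd
import xml.etree.ElementTree as ET

# Changes 1 and 2: the folder that holds the XML files
os.chdir(r'D:\python3\models-master\research\object_detection\images\train')
path = r'D:\python3\models-master\research\object_detection\images\train'


def xml_to_csv(path):
    """Collect every labelled box from the XML files in path into one DataFrame."""
    xml_list = []
    for xml_file in glob.glob(path + '/*.xml'):
        tree = ET.parse(xml_file)
        root = tree.getroot()
        size = root.find('size')
        for member in root.findall('object'):
            # Look elements up by tag name instead of by position
            bndbox = member.find('bndbox')
            value = (root.find('filename').text,
                     int(size.find('width').text),
                     int(size.find('height').text),
                     member.find('name').text,
                     int(bndbox.find('xmin').text),
                     int(bndbox.find('ymin').text),
                     int(bndbox.find('xmax').text),
                     int(bndbox.find('ymax').text))
            xml_list.append(value)
    column_name = ['filename', 'width', 'height', 'class',
                   'xmin', 'ymin', 'xmax', 'ymax']
    xml_df = pd.DataFrame(xml_list, columns=column_name)
    return xml_df


def main():
    image_path = path
    xml_df = xml_to_csv(image_path)
    # Change 3: output file name (train.csv or test.csv)
    xml_df.to_csv('train.csv', index=None)
    print('Successfully converted xml to csv.')


main()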

Running the script produces a CSV file like the one shown in the figure.
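Before moving on, it can help to sanity-check the CSV. The sketch below counts boxes per class and flags rows whose label is unknown or whose box falls outside the image. The sample rows are invented, the column names follow the script above, and check_rows is a hypothetical helper.

```python
import csv
import io
from collections import Counter

# Invented sample rows using the column layout produced by the conversion script.
SAMPLE_CSV = """filename,width,height,class,xmin,ymin,xmax,ymax
image1.jpg,500,375,ZJL,120,40,310,290
image2.jpg,640,480,CYX,200,60,420,400
image3.jpg,500,375,ZJL,90,30,280,300
"""


def check_rows(csv_file, allowed=("ZJL", "CYX")):
    """Return (boxes per class, list of (line number, problem description))."""
    counts = Counter()
    problems = []
    # start=2 because line 1 of the file is the header
    for line_no, row in enumerate(csv.DictReader(csv_file), start=2):
        xmin, ymin, xmax, ymax = (int(row[k]) for k in ("xmin", "ymin", "xmax", "ymax"))
        width, height = int(row["width"]), int(row["height"])
        if row["class"] not in allowed:
            problems.append((line_no, "unknown class"))
        elif not (0 <= xmin < xmax <= width and 0 <= ymin < ymax <= height):
            problems.append((line_no, "box outside image"))
        counts[row["class"]] += 1
    return counts, problems


counts, problems = check_rows(io.StringIO(SAMPLE_CSV))
print(dict(counts), problems)
```

For a real file, pass an open file object, for example open('train.csv', newline=''), instead of the in-memory sample.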

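The TensorFlow Object Detection API takes its input in TFRecord format, so the CSV files have to be converted to record files. First copy the train.csv and test.csv generated above into D:\python3\models-master\research\object_detection\data, as shown in the figure.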

The CSV-to-record conversion is also done in Python. Copy the code below into a file named generate_TFR.py under D:\python3\models-master\research\object_detection.

"""
Usage:
  # From tensorflow/models/
  # Create train data:
  python generate_TFR.py --csv_input=data/train.csv --output_path=data/train.record
  # Create test data:
  python generate_TFR.py --csv_input=data/test.csv --output_path=data/test.record

Three places need to be changed; they are marked "Change 1" to "Change 3" below.
"""
import os
import io
import pandas as pd
import tensorflow as tf

from PIL import Image
from object_detection.utils import dataset_util
from collections import namedtuple, OrderedDict

# Change 1: the object_detection folder on your machine
# (no trailing backslash: a raw string cannot end with one)
os.chdir(r'D:\python3\models-master\research\object_detection')

flags = tf.app.flags
flags.DEFINE_string('csv_input', '', 'Path to the CSV input')
flags.DEFINE_string('output_path', '', 'Path to output TFRecord')
FLAGS = flags.FLAGS


# Change 3: return one integer per label; the same ids are used later
# in the label map file
# TO-DO replace this with label map
def class_text_to_int(row_label):
    if row_label == 'ZJL':
        return 1
    elif row_label == 'CYX':
        return 2
    else:
        return None


def split(df, group):
    # Group the CSV rows by file name so each image yields one example
    data = namedtuple('data', ['filename', 'object'])
    gb = df.groupby(group)
    return [data(filename, gb.get_group(x))
            for filename, x in zip(gb.groups.keys(), gb.groups)]


def create_tf_example(group, path):
    with tf.gfile.GFile(os.path.join(path, '{}'.format(group.filename)), 'rb') as fid:
        encoded_jpg = fid.read()
    encoded_jpg_io = io.BytesIO(encoded_jpg)
    image = Image.open(encoded_jpg_io)
    width, height = image.size

    filename = group.filename.encode('utf8')
    image_format = b'jpg'
    xmins = []
    xmaxs = []
    ymins = []
    ymaxs = []
    classes_text = []
    classes = []

    for index, row in group.object.iterrows():
        # Box coordinates are stored as fractions of the image size
        xmins.append(row['xmin'] / width)
        xmaxs.append(row['xmax'] / width)
        ymins.append(row['ymin'] / height)
        ymaxs.append(row['ymax'] / height)
        classes_text.append(row['class'].encode('utf8'))
        classes.append(class_text_to_int(row['class']))

    tf_example = tf.train.Example(features=tf.train.Features(feature={
        'image/height': dataset_util.int64_feature(height),
        'image/width': dataset_util.int64_feature(width),
        'image/filename': dataset_util.bytes_feature(filename),
        'image/source_id': dataset_util.bytes_feature(filename),
        'image/encoded': dataset_util.bytes_feature(encoded_jpg),
        'image/format': dataset_util.bytes_feature(image_format),
        'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
        'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
        'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
        'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
        'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
        'image/object/class/label': dataset_util.int64_list_feature(classes),
    }))
    return tf_example


def main(_):
    writer = tf.python_io.TFRecordWriter(FLAGS.output_path)
    # Change 2 (modified 2018-04-18): use 'images/train' when converting
    # train.csv and 'images/test' when converting test.csv
    path = os.path.join(os.getcwd(), 'images/test')
    examples = pd.read_csv(FLAGS.csv_input)
    grouped = split(examples, 'filename')
    for group in grouped:
        tf_example = create_tf_example(group, path)
        writer.write(tf_example.SerializeToString())

    writer.close()
    output_path = os.path.join(os.getcwd(), FLAGS.output_path)
    print('Successfully created the TFRecords: {}'.format(output_path))


if __name__ == '__main__':
    tf.app.run()
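The create_tf_example function in the script above divides each pixel coordinate by the image width or height, so boxes are stored as fractions between 0 and 1. A minimal sketch of that step on its own (the numbers are invented, and normalize_box is a hypothetical helper, not part of the API):

```python
def normalize_box(xmin, ymin, xmax, ymax, width, height):
    """Convert pixel coordinates into fractions of the image size."""
    return (xmin / width, ymin / height, xmax / width, ymax / height)


# A 500x375 image with a box drawn from (120, 40) to (310, 290)
box = normalize_box(120, 40, 310, 290, width=500, height=375)
print(box)
```

Because the stored values are relative, the same annotation stays valid when the image is resized during training.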

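Then open a command line via Start > Anaconda3 > Anaconda Prompt, change the working directory to models-master\research\object_detection, and run the commands below. Before each run, make sure the path in main() points at the matching images folder.

To convert train.csv:

python generate_TFR.py --csv_input=data/train.csv --output_path=data/train.record

To convert test.csv:

python generate_TFR.py --csv_input=data/test.csv --output_path=data/test.record

Output like that in the figure means the conversion succeeded.

That completes the data preparation.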

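2. Configuration file and model

Go to the Object Detection GitHub repository and pick a model:

tensorflow/models (github.com)

I chose ssd_mobilenet_v1_coco.config. Open it and copy its contents into a new file with the same name, ssd_mobilenet_v1_coco.config. Then create a folder named training under D:\python3\models-master\research\object_detection and put ssd_mobilenet_v1_coco.config into that training folder, as shown in the figure.

Open ssd_mobilenet_v1_coco.config in a text editor. Its contents are as follows.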

Five places need to be changed.

1. The training input reader:

train_input_reader: {
  tf_record_input_reader {
    input_path: "PATH_TO_BE_CONFIGURED/mscoco_train.record"
  }
  label_map_path: "PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt"
}

input_path is the path to the training data; here it becomes input_path: "data/train.record". label_map_path is the path to the label map; here it becomes label_map_path: "data/ZJL_CYX.pbtxt".

2. The evaluation input reader:

eval_input_reader: {
  tf_record_input_reader {
    input_path: "PATH_TO_BE_CONFIGURED/mscoco_val.record"
  }
  label_map_path: "PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt"
  shuffle: false
  num_readers: 1
}

input_path is the path to the test data; here it becomes input_path: "data/test.record". label_map_path is again label_map_path: "data/ZJL_CYX.pbtxt".

3. The number of classes:

ssd {
  num_classes: 90
  box_coder {
    faster_rcnn_box_coder {
      y_scale: 10.0
      x_scale: 10.0
      height_scale: 5.0
      width_scale: 5.0
    }
  }

num_classes is the number of label classes. There are only two here, Jay Chou and Eason Chan, so it becomes num_classes: 2.

4. The batch size:

train_config: {
  batch_size: 24
  optimizer {
    rms_prop_optimizer: {
      learning_rate: {
        exponential_decay_learning_rate {
          initial_learning_rate: 0.004
          decay_steps: 800720
          decay_factor: 0.95
        }
      }
      momentum_optimizer_value: 0.9
      decay: 0.9
      epsilon: 1.0
    }
  }

batch_size is the number of images used in each iteration. I set it to 1.

5. The fine-tuning checkpoint:

fine_tune_checkpoint: "ssd_mobilenet_v1_coco_11_06_2017/model.ckpt"
from_detection_checkpoint: true

Comment out or delete these two lines; otherwise, in my experience, training runs very slowly. Without them the model is trained from scratch instead of starting from the pretrained COCO checkpoint.

The original file in full, before these changes:

# SSD with Mobilenet v1 configuration for MSCOCO Dataset.
# Users should configure the fine_tune_checkpoint field in the train config as
# well as the label_map_path and input_path fields in the train_input_reader and
# eval_input_reader. Search for "PATH_TO_BE_CONFIGURED" to find the fields that
# should be configured.

model {
  ssd {
    num_classes: 90
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    anchor_generator {
      ssd_anchor_generator {
        num_layers: 6
        min_scale: 0.2
        max_scale: 0.95
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
        aspect_ratios: 3.0
        aspect_ratios: 0.3333
      }
    }
    image_resizer {
      fixed_shape_resizer {
        height: 300
        width: 300
      }
    }
    box_predictor {
      convolutional_box_predictor {
        min_depth: 0
        max_depth: 0
        num_layers_before_predictor: 0
        use_dropout: false
        dropout_keep_probability: 0.8
        kernel_size: 1
        box_code_size: 4
        apply_sigmoid_to_scores: false
        conv_hyperparams {
          activation: RELU_6,
          regularizer {
            l2_regularizer {
              weight: 0.00004
            }
          }
          initializer {
            truncated_normal_initializer {
              stddev: 0.03
              mean: 0.0
            }
          }
          batch_norm {
            train: true,
            scale: true,
            center: true,
            decay: 0.9997,
            epsilon: 0.001,
          }
        }
      }
    }
    feature_extractor {
      type: 'ssd_mobilenet_v1'
      min_depth: 16
      depth_multiplier: 1.0
      conv_hyperparams {
        activation: RELU_6,
        regularizer {
          l2_regularizer {
            weight: 0.00004
          }
        }
        initializer {
          truncated_normal_initializer {
            stddev: 0.03
            mean: 0.0
          }
        }
        batch_norm {
          train: true,
          scale: true,
          center: true,
          decay: 0.9997,
          epsilon: 0.001,
        }
      }
    }
    loss {
      classification_loss {
        weighted_sigmoid {
        }
      }
      localization_loss {
        weighted_smooth_l1 {
        }
      }
      hard_example_miner {
        num_hard_examples: 3000
        iou_threshold: 0.99
        loss_type: CLASSIFICATION
        max_negatives_per_positive: 3
        min_negatives_per_image: 0
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    normalize_loss_by_num_matches: true
    post_processing {
      batch_non_max_suppression {
        score_threshold: 1e-8
        iou_threshold: 0.6
        max_detections_per_class: 100
        max_total_detections: 100
      }
      score_converter: SIGMOID
    }
  }
}

train_config: {
  batch_size: 24
  optimizer {
    rms_prop_optimizer: {
      learning_rate: {
        exponential_decay_learning_rate {
          initial_learning_rate: 0.004
          decay_steps: 800720
          decay_factor: 0.95
        }
      }
      momentum_optimizer_value: 0.9
      decay: 0.9
      epsilon: 1.0
    }
  }
  fine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"
  from_detection_checkpoint: true
  # Note: The below line limits the training process to 200K steps, which we
  # empirically found to be sufficient enough to train the pets dataset. This
  # effectively bypasses the learning rate schedule (the learning rate will
  # never decay). Remove the below line to train indefinitely.
  num_steps: 200000
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    ssd_random_crop {
    }
  }
}

train_input_reader: {
  tf_record_input_reader {
    input_path: "PATH_TO_BE_CONFIGURED/mscoco_train.record"
  }
  label_map_path: "PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt"
}

eval_config: {
  num_examples: 8000
  # Note: The below line limits the evaluation process to 10 evaluations.
  # Remove the below line to evaluate indefinitely.
  max_evals: 10
}

eval_input_reader: {
  tf_record_input_reader {
    input_path: "PATH_TO_BE_CONFIGURED/mscoco_val.record"
  }
  label_map_path: "PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt"
  shuffle: false
  num_readers: 1
}
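The edits to the config file are plain text substitutions, so they can also be scripted. The sketch below works on a shortened stand-in for the file (TEMPLATE is an assumption, not the full config) and uses only string and regex replacement; patch_config is a hypothetical helper and no TensorFlow API is involved.

```python
import re

# Shortened stand-in for ssd_mobilenet_v1_coco.config (assumption: only the
# fields touched below are included).
TEMPLATE = '''model {
  ssd {
    num_classes: 90
  }
}
train_config: {
  batch_size: 24
}
train_input_reader: {
  tf_record_input_reader {
    input_path: "PATH_TO_BE_CONFIGURED/mscoco_train.record"
  }
  label_map_path: "PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt"
}
'''


def patch_config(text, num_classes, batch_size, train_record, label_map):
    """Return the config text with the class count, batch size, and paths replaced."""
    text = re.sub(r"num_classes: \d+", "num_classes: %d" % num_classes, text)
    text = re.sub(r"batch_size: \d+", "batch_size: %d" % batch_size, text)
    text = text.replace("PATH_TO_BE_CONFIGURED/mscoco_train.record", train_record)
    text = text.replace("PATH_TO_BE_CONFIGURED/mscoco_label_map.pbtxt", label_map)
    return text


patched = patch_config(TEMPLATE, num_classes=2, batch_size=1,
                       train_record="data/train.record",
                       label_map="data/ZJL_CYX.pbtxt")
print(patched)
```

Searching the patched text for PATH_TO_BE_CONFIGURED afterwards is a quick way to confirm that no placeholder was missed.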

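The data/ZJL_CYX.pbtxt file referred to in changes 1 and 2 above has to be created by hand. The easiest way is to copy an existing file and rename it, as shown in the figure.

Open the file and replace its contents with: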

item {
  name: "ZJL"
  id: 1
  display_name: "ZJL"
}
item {
  name: "CYX"
  id: 2
  display_name: "CYX"
}
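The ids in this label map have to agree with the integers returned by class_text_to_int in generate_TFR.py. One way to keep them in sync is to derive both from a single list of class names, as in this sketch (build_label_map and build_class_lookup are hypothetical helpers, not part of the API):

```python
def build_label_map(class_names):
    """Return label map text with ids starting at 1, in list order."""
    items = []
    for class_id, name in enumerate(class_names, start=1):
        items.append(
            'item {\n  name: "%s"\n  id: %d\n  display_name: "%s"\n}\n'
            % (name, class_id, name)
        )
    return "".join(items)


def build_class_lookup(class_names):
    """Return the label-to-id mapping matching build_label_map."""
    return {name: class_id for class_id, name in enumerate(class_names, start=1)}


CLASSES = ["ZJL", "CYX"]
label_map_text = build_label_map(CLASSES)
class_to_id = build_class_lookup(CLASSES)
print(label_map_text)
print(class_to_id)
```

Ids start at 1 here, matching the values used throughout this post.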

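The configuration is now complete, so training can start.

3. Train the model

Open a command line via Start > Anaconda3 > Anaconda Prompt, change the working directory to models-master\research\object_detection, and run: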

python train.py --logtostderr --train_dir=training/ --pipeline_config_path=training/ssd_mobilenet_v1_coco.config

If no error is reported, just wait for the results.

If you get the error TypeError: `pred` must be a Tensor, or a Python bool, or 1 or 0. Found instead: None, open the file circled in the figure and change is_training=None on line 109 to is_training=True.

If you get the error InvalidArgumentError: image_size must contain 3 elements[4], see the following post:

JustDoIT: TensorFlow Object Detection API: InvalidArgumentError: image_size must contain 3 elements[4] (zhuanlan.zhihu.com)

Training progress can be followed on a visualization page.

Open a command line via Start > Anaconda3 > Anaconda Prompt, change the working directory to models-master\research\object_detection, and run:

tensorboard --logdir=training

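The output is shown in the figure.

Copy the circled address into Firefox to open the interface shown in the figure.

It shows how training progresses at each iteration.

After a bit more than two hours of waiting, training reached 8000 iterations. Training may be interrupted or stall along the way, probably because GPU memory runs out; if so, enter the training command again and it continues from where it stopped rather than starting over.

The current model can already be tested, so close the command line. The folder D:\python3\models-master\research\object_detection contains export_inference_graph.py; running it requires passing the config file and the checkpoint as arguments.

Open a command line via Start > Anaconda3 > Anaconda Prompt, change the working directory to models-master\research\object_detection, and run: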

python export_inference_graph.py --input_type image_tensor --pipeline_config_path training/ssd_mobilenet_v1_coco.config --trained_checkpoint_prefix training/model.ckpt-31012 --output_directory ZJL_CYX_inference_graph

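Here --output_directory is the name of the folder the exported model is written to. The number in --trained_checkpoint_prefix must match a checkpoint that actually exists in the training folder.

After the command finishes, a ZJL_CYX_inference_graph folder appears under object_detection, with the contents shown in the figure.

At this point the model is built. Next comes testing it.

4. Test the model

The code below needs only a few changes.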
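The step number to use in --trained_checkpoint_prefix can be read off the file names in the training folder. A minimal sketch, assuming the usual model.ckpt-N.index naming of TensorFlow 1.x checkpoints; the file list is invented and latest_checkpoint_step is a hypothetical helper.

```python
import re


def latest_checkpoint_step(filenames):
    """Return the highest step N among files named model.ckpt-N.index, or None."""
    steps = []
    for name in filenames:
        match = re.fullmatch(r"model\.ckpt-(\d+)\.index", name)
        if match:
            steps.append(int(match.group(1)))
    return max(steps) if steps else None


# Invented listing of a training folder
example_files = [
    "checkpoint",
    "graph.pbtxt",
    "model.ckpt-7512.index",
    "model.ckpt-7512.meta",
    "model.ckpt-8000.index",
    "model.ckpt-8000.meta",
    "model.ckpt-8000.data-00000-of-00001",
]
step = latest_checkpoint_step(example_files)
prefix = "training/model.ckpt-%d" % step
print(prefix)
```

In practice the file list would come from os.listdir('training'), and the printed prefix is the value passed to --trained_checkpoint_prefix.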

1. Use the model built above, so nothing has to be downloaded:

# What model to download.
MODEL_NAME = 'ZJL_CYX_inference_graph'  # changed here
#MODEL_FILE = MODEL_NAME + '.tar.gz'
#DOWNLOAD_BASE = 'http://download.tensorflow.org/models/object_detection/'
# Path to frozen detection graph. This is the actual model that is used for the object detection.
PATH_TO_CKPT = MODEL_NAME + '/frozen_inference_graph.pb'
# List of the strings that is used to add correct label for each box.
PATH_TO_LABELS = os.path.join('data', 'ZJL_CYX.pbtxt')
NUM_CLASSES = 2  # only two labels

2. Change the path to the test pictures:

# If you want to test the code with your images, just add path to the images to the TEST_IMAGE_PATHS.
PATH_TO_TEST_IMAGES_DIR = 'test_images'
#TEST_IMAGE_PATHS = [ os.path.join(PATH_TO_TEST_IMAGES_DIR, 'image{}.jpg'.format(i)) for i in range(1, 3) ]
TEST_IMAGE_PATHS = os.listdir(r'D:\python3\models-master\research\object_detection\test_images')
os.chdir(r'D:\python3\models-master\research\object_detection\test_images')

The full script:

import numpy as np
import os
import six.moves.urllib as urllib
import sys
import tarfile
import tensorflow as tf
import zipfile

from collections import defaultdict
from io import StringIO
from matplotlib import pyplot as plt
from PIL import Image

# This is needed since the notebook is stored in the object_detection folder.
sys.path.append("..")
from object_detection.utils import ops as utils_ops

if tf.__version__ < '1.4.0':
    raise ImportError('Please upgrade your tensorflow installation to v1.4.* or later!')

# ## Env setup
# This is needed to display the images (requires an IPython console).
get_ipython().magic('matplotlib inline')

# ## Object detection imports
# Here are the imports from the object detection module.
from utils import label_map_util
from utils import visualization_utils as vis_util

# # Model preparation
# ## Variables
# Any model exported using the `export_inference_graph.py` tool can be loaded
# here simply by changing `PATH_TO_CKPT` to point to a new .pb file.
# By default we use an "SSD with Mobilenet" model here. See the
# [detection model zoo](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md)
# for a list of other models that can be run out-of-the-box with varying
# speeds and accuracies.

# What model to download.
MODEL_NAME = 'ZJL_CYX_inference_graph'
#MODEL_FILE = MODEL_NAME + '.tar.gz'
#DOWNLOAD_BASE = 'http://download.tensorflow.org/models/object_detection/'

# Path to frozen detection graph. This is the actual model that is used for the object detection.
PATH_TO_CKPT = MODEL_NAME + '/frozen_inference_graph.pb'

# List of the strings that is used to add correct label for each box.
PATH_TO_LABELS = os.path.join('data', 'ZJL_CYX.pbtxt')

NUM_CLASSES = 2

# ## Load a (frozen) Tensorflow model into memory.
detection_graph = tf.Graph()
with detection_graph.as_default():
    od_graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
        serialized_graph = fid.read()
        od_graph_def.ParseFromString(serialized_graph)
        tf.import_graph_def(od_graph_def, name='')

# ## Loading label map
# Label maps map indices to category names, so that when our convolution
# network predicts `5`, we know that this corresponds to `airplane`. Here we
# use internal utility functions, but anything that returns a dictionary
# mapping integers to appropriate string labels would be fine.
label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
categories = label_map_util.convert_label_map_to_categories(
    label_map, max_num_classes=NUM_CLASSES, use_display_name=True)
category_index = label_map_util.create_category_index(categories)


# ## Helper code
def load_image_into_numpy_array(image):
    (im_width, im_height) = image.size
    return np.array(image.getdata()).reshape(
        (im_height, im_width, 3)).astype(np.uint8)


# # Detection
# If you want to test the code with your images, just add path to the images to the TEST_IMAGE_PATHS.
PATH_TO_TEST_IMAGES_DIR = 'test_images'
#TEST_IMAGE_PATHS = [ os.path.join(PATH_TO_TEST_IMAGES_DIR, 'image{}.jpg'.format(i)) for i in range(1, 3) ]
# os.listdir returns bare file names, so change into the folder as well
TEST_IMAGE_PATHS = os.listdir(r'D:\python3\models-master\research\object_detection\test_images')
os.chdir(r'D:\python3\models-master\research\object_detection\test_images')

# Size, in inches, of the output images.
IMAGE_SIZE = (12, 8)


def run_inference_for_single_image(image, graph):
    with graph.as_default():
        with tf.Session() as sess:
            # Get handles to input and output tensors
            ops = tf.get_default_graph().get_operations()
            all_tensor_names = {output.name for op in ops for output in op.outputs}
            tensor_dict = {}
            for key in [
                'num_detections', 'detection_boxes', 'detection_scores',
                'detection_classes', 'detection_masks'
            ]:
                tensor_name = key + ':0'
                if tensor_name in all_tensor_names:
                    tensor_dict[key] = tf.get_default_graph().get_tensor_by_name(
                        tensor_name)
            if 'detection_masks' in tensor_dict:
                # The following processing is only for single image
                detection_boxes = tf.squeeze(tensor_dict['detection_boxes'], [0])
                detection_masks = tf.squeeze(tensor_dict['detection_masks'], [0])
                # Reframe is required to translate mask from box coordinates to
                # image coordinates and fit the image size.
                real_num_detection = tf.cast(tensor_dict['num_detections'][0], tf.int32)
                detection_boxes = tf.slice(detection_boxes, [0, 0], [real_num_detection, -1])
                detection_masks = tf.slice(detection_masks, [0, 0, 0], [real_num_detection, -1, -1])
                detection_masks_reframed = utils_ops.reframe_box_masks_to_image_masks(
                    detection_masks, detection_boxes, image.shape[0], image.shape[1])
                detection_masks_reframed = tf.cast(
                    tf.greater(detection_masks_reframed, 0.5), tf.uint8)
                # Follow the convention by adding back the batch dimension
                tensor_dict['detection_masks'] = tf.expand_dims(
                    detection_masks_reframed, 0)
            image_tensor = tf.get_default_graph().get_tensor_by_name('image_tensor:0')

            # Run inference
            output_dict = sess.run(tensor_dict,
                                   feed_dict={image_tensor: np.expand_dims(image, 0)})

            # all outputs are float32 numpy arrays, so convert types as appropriate
            output_dict['num_detections'] = int(output_dict['num_detections'][0])
            output_dict['detection_classes'] = output_dict[
                'detection_classes'][0].astype(np.uint8)
            output_dict['detection_boxes'] = output_dict['detection_boxes'][0]
            output_dict['detection_scores'] = output_dict['detection_scores'][0]
            if 'detection_masks' in output_dict:
                output_dict['detection_masks'] = output_dict['detection_masks'][0]
    return output_dict


for image_path in TEST_IMAGE_PATHS:
    image = Image.open(image_path)
    # the array based representation of the image will be used later in order
    # to prepare the result image with boxes and labels on it.
    image_np = load_image_into_numpy_array(image)
    # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
    image_np_expanded = np.expand_dims(image_np, axis=0)
    # Actual detection.
    output_dict = run_inference_for_single_image(image_np, detection_graph)
    # Visualization of the results of a detection.
    vis_util.visualize_boxes_and_labels_on_image_array(
        image_np,
        output_dict['detection_boxes'],
        output_dict['detection_classes'],
        output_dict['detection_scores'],
        category_index,
        instance_masks=output_dict.get('detection_masks'),
        use_normalized_coordinates=True,
        line_thickness=8)
    plt.figure(figsize=IMAGE_SIZE)
    plt.imshow(image_np)

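Copy the code above into a new Python file (I named mine ZJLCYX_test.py) and save it in the D:\python3\models-master\research\object_detection folder.

Then put the test pictures into the D:\python3\models-master\research\object_detection\test_images folder, as shown in the figure.

Finally, open Spyder and run ZJLCYX_test.py.

The results are shown in the figure.

With that, our own model is up and running.

In a few days I will read through the API source code and share my notes once I am done.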
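Besides drawing boxes, the detection output can be turned into a plain list of results. The sketch below assumes the output dictionary holds normalized boxes in [ymin, xmin, ymax, xmax] order together with class ids and scores; the sample values are invented, the 0.5 score cutoff is an arbitrary choice, and boxes_to_pixels is a hypothetical helper.

```python
def boxes_to_pixels(output_dict, image_width, image_height, class_names, min_score=0.5):
    """Keep detections scoring at least min_score and convert boxes to pixel coordinates."""
    results = []
    for box, cls, score in zip(output_dict["detection_boxes"],
                               output_dict["detection_classes"],
                               output_dict["detection_scores"]):
        if score < min_score:
            continue
        ymin, xmin, ymax, xmax = box
        results.append({
            "label": class_names.get(int(cls), "unknown"),
            "score": float(score),
            # (left, top, right, bottom) in pixels
            "box": (round(xmin * image_width), round(ymin * image_height),
                    round(xmax * image_width), round(ymax * image_height)),
        })
    return results


# Invented output for a 500x400 image: one confident detection, one weak one
fake_output = {
    "detection_boxes": [[0.1, 0.2, 0.8, 0.6], [0.0, 0.0, 0.3, 0.3]],
    "detection_classes": [1, 2],
    "detection_scores": [0.92, 0.12],
}
detections = boxes_to_pixels(fake_output, 500, 400, {1: "ZJL", 2: "CYX"})
print(detections)
```

Raising or lowering min_score trades missed detections against false positives, which is worth experimenting with on a small test set like this one.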