TensorFlow (1): Implementing AlexNet

01-25

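TensorFlow does not ship an implementation of a network as simple as AlexNet, so I implemented it myself as a way to get familiar with TensorFlow. This covers classifying test images with an ImageNet-pretrained model and training the network on my own training set. The post has three parts: network definition, data loading, and operation definition.

GitHub: qiansi/tensorflow-AlexNet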

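1. Network definition

The main parameters of the network are the input image and keep_prob, the dropout keep probability for the fully connected layers. Dropout is needed during training, but at test time the complete network is used, so dropout is turned off. Networks trained on different datasets differ only in the number of output neurons of the fc8 layer, so the number of classes is a parameter. skip_layer names the layers to skip when loading model parameters; those layers keep their default initialization. When training on our own data, we set skip_layer to the FC layers, so the rest of the network is initialized from the ImageNet-pretrained model and only the FC layers are trained. The weights path defaults to the ImageNet-pretrained model.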

import numpy as np
import tensorflow as tf


class AlexNet(object):
    def __init__(self, input_x, keep_prob, num_classes, skip_layer,
                 weights_path="Default"):
        # Initialize the parameters
        self.input_x = input_x
        self.keep_prob = keep_prob
        self.skip_layer = skip_layer
        if weights_path == "Default":
            self.weights_path = "bvlc_alexnet.npy"
        else:
            self.weights_path = weights_path
        self.num_classes = num_classes
        # Build the AlexNet graph
        self.create()
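To make the role of keep_prob concrete, here is a minimal pure-Python sketch of inverted dropout, the scheme in which surviving activations are scaled by 1/keep_prob (the scaling tf.nn.dropout applies). The helper name dropout and the sample values are illustrative assumptions, not part of the repository.

```python
import random


def dropout(values, keep_prob, rng):
    """Keep each value with probability keep_prob and scale the survivors
    by 1/keep_prob so the expected activation is unchanged."""
    if keep_prob >= 1.0:
        # Test time: the full network is used, nothing is dropped.
        return list(values)
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


activations = [1.0, 2.0, 3.0, 4.0]   # made-up activations
rng = random.Random(0)

train_out = dropout(activations, 0.5, rng)  # training: some units zeroed
test_out = dropout(activations, 1.0, rng)   # testing: identity

print("train:", train_out)
print("test: ", test_out)
```

With keep_prob = 1.0 the layer output passes through unchanged, which is why the same network definition can serve both training and testing.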

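Convolution layer definition

GPUs in 2012 were comparatively weak, so AlexNet trained the CNN across two GPUs. For some layers we therefore split the input and the convolution kernels into two groups, compute each group separately, and then concatenate the resulting feature maps. This is what the groups parameter is for.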
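The bookkeeping behind groups can be sketched without TensorFlow. The helper below (grouped_conv_shapes, a hypothetical name) only computes shapes and weight counts, assuming the [height, width, in_channels / groups, num_kernels] weight layout used in this post; it does not perform any convolution.

```python
def grouped_conv_shapes(kernel_size, in_channels, num_kernels, groups):
    """Return the shapes involved in a grouped convolution."""
    if in_channels % groups or num_kernels % groups:
        raise ValueError("channels and kernels must be divisible by groups")
    in_per_group = in_channels // groups     # input channels seen by each group
    out_per_group = num_kernels // groups    # kernels assigned to each group
    weight_shape = (kernel_size, kernel_size, in_per_group, num_kernels)
    num_weights = kernel_size * kernel_size * in_per_group * num_kernels
    return {
        "in_per_group": in_per_group,
        "out_per_group": out_per_group,
        "weight_shape": weight_shape,
        "num_weights": num_weights,
    }


# conv2 in AlexNet: 5x5 kernels, 96 input channels, 256 kernels, 2 groups
conv2 = grouped_conv_shapes(5, 96, 256, groups=2)
ungrouped = grouped_conv_shapes(5, 96, 256, groups=1)

print(conv2)
print("weights saved by grouping:", ungrouped["num_weights"] - conv2["num_weights"])
```

Each group only sees half of the input channels, so a two-group layer needs half as many weights as the ungrouped layer with the same number of kernels.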

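Note on TensorFlow padding: in AlexNet the first convolution layer uses "VALID" padding, while the other convolution layers use "SAME" padding, and the two modes produce feature maps of different sizes.

With "SAME", the input is padded as needed so that every stride position produces one output, so the output size is the input size divided by the stride, rounded up. With "VALID", no padding is added and every window must lie entirely inside the input, so the first window already occupies filter_shape positions. The output size is therefore (input size - filter_shape + 1) divided by the stride, rounded up.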

If padding == "SAME":
    output_spatial_shape[i] = ceil(input_spatial_shape[i] / strides[i])
If padding == "VALID":
    output_spatial_shape[i] = ceil((input_spatial_shape[i] - (spatial_filter_shape[i] - 1) * dilation_rate[i]) / strides[i])

    def conv(self, x, kernel_height, num_kernels, stride, name,
             padding="SAME", padding_num=0, groups=1):
        print("name is {} np.shape(input) {}".format(name, np.shape(x)))
        input_channels = int(np.shape(x)[-1])
        # Optional explicit zero padding on the two spatial axes
        if padding_num != 0:
            x = tf.pad(x, [[0, 0], [padding_num, padding_num],
                           [padding_num, padding_num], [0, 0]])
        convolve = lambda i, k: tf.nn.conv2d(
            i, k, strides=[1, stride, stride, 1], padding=padding)
        with tf.variable_scope(name) as scope:
            # Integer division: "/" would give a float shape entry in Python 3
            weights = tf.get_variable(
                "weights",
                shape=[kernel_height, kernel_height,
                       input_channels // groups, num_kernels])
            biases = tf.get_variable("biases", shape=[num_kernels])
            if groups == 1:
                conv = convolve(x, weights)
            else:
                # Split the input channels and the kernels into groups,
                # convolve each pair, then concatenate along the channel axis
                input_groups = tf.split(axis=3, num_or_size_splits=groups, value=x)
                weights_groups = tf.split(axis=3, num_or_size_splits=groups, value=weights)
                output_groups = [convolve(i, k)
                                 for i, k in zip(input_groups, weights_groups)]
                conv = tf.concat(axis=3, values=output_groups)
            # Add the biases and apply the activation function
            withBias = tf.reshape(tf.nn.bias_add(conv, biases),
                                  conv.get_shape().as_list())
            relu = tf.nn.relu(withBias)
            return relu
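The two padding rules can be checked with a few lines of plain Python. This is a minimal sketch that assumes a dilation rate of 1; conv_output_size is a hypothetical helper name, not a TensorFlow function.

```python
import math


def conv_output_size(input_size, filter_size, stride, padding):
    """Spatial output size of a convolution or pooling layer (dilation 1)."""
    if padding == "SAME":
        return math.ceil(input_size / stride)
    if padding == "VALID":
        return math.ceil((input_size - filter_size + 1) / stride)
    raise ValueError("padding must be 'SAME' or 'VALID'")


# First AlexNet convolution: 227x227 input, 11x11 kernel, stride 4
valid_size = conv_output_size(227, 11, 4, "VALID")
same_size = conv_output_size(227, 11, 4, "SAME")

print("VALID:", valid_size)
print("SAME: ", same_size)
```

For the same layer the two modes give different feature-map sizes, which is why the padding argument has to match the pretrained model layer by layer.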

Max pooling, LRN, and FC layer definitions

    def maxPooling(self, input, filter_size, stride, name, padding="SAME"):
        print("name is {} np.shape(input) {}".format(name, np.shape(input)))
        return tf.nn.max_pool(input,
                              ksize=[1, filter_size, filter_size, 1],
                              strides=[1, stride, stride, 1],
                              padding=padding, name=name)

    def lrn(self, input, radius, alpha, beta, name, bias=1.0):
        print("name is {} np.shape(input) {}".format(name, np.shape(input)))
        return tf.nn.local_response_normalization(
            input, depth_radius=radius, alpha=alpha, beta=beta,
            bias=bias, name=name)

    def fc(self, input, num_in, num_out, name, drop_ratio=0, relu=True):
        print("name is {} np.shape(input) {}".format(name, np.shape(input)))
        with tf.variable_scope(name) as scope:
            weights = tf.get_variable("weights", shape=[num_in, num_out],
                                      trainable=True)
            biases = tf.get_variable("biases", [num_out], trainable=True)
            # Linear part: input * weights + biases
            act = tf.nn.xw_plus_b(input, weights, biases, name=scope.name)
            # Use a separate name so the boolean relu argument is not shadowed
            out = tf.nn.relu(act) if relu else act
            if drop_ratio == 0:
                return out
            # tf.nn.dropout takes the keep probability, not the drop ratio
            return tf.nn.dropout(out, 1.0 - drop_ratio)

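That completes all the layer definitions we need. All that remains is to assemble the model from the parameters.

The complete network definition builds the model from the convolution, pooling, LRN, and FC layers defined above. The num_out of the last FC layer (fc8) is our number of classes.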

    def create(self):
        # layer 1
        conv1 = self.conv(self.input_x, 11, 96, 4, name="conv1", padding="VALID")
        pool1 = self.maxPooling(conv1, filter_size=3, stride=2, name="pool1",
                                padding="VALID")
        norm1 = self.lrn(pool1, 2, 2e-05, 0.75, name="norm1")
        # layer 2
        conv2 = self.conv(norm1, 5, 256, 1, name="conv2", padding_num=0, groups=2)
        pool2 = self.maxPooling(conv2, filter_size=3, stride=2, name="pool2",
                                padding="VALID")
        norm2 = self.lrn(pool2, 2, 2e-05, 0.75, name="norm2")
        # layer 3
        conv3 = self.conv(norm2, 3, 384, 1, name="conv3")
        # layer 4
        conv4 = self.conv(conv3, 3, 384, 1, name="conv4", groups=2)
        # layer 5
        conv5 = self.conv(conv4, 3, 256, 1, name="conv5", groups=2)
        pool5 = self.maxPooling(conv5, filter_size=3, stride=2, name="pool5",
                                padding="VALID")
        # layer 6
        flattened = tf.reshape(pool5, [-1, 6 * 6 * 256])
        fc6 = self.fc(input=flattened, num_in=6 * 6 * 256, num_out=4096,
                      name="fc6", drop_ratio=1.0 - self.keep_prob, relu=True)
        # layer 7
        fc7 = self.fc(input=fc6, num_in=4096, num_out=4096, name="fc7",
                      drop_ratio=1.0 - self.keep_prob, relu=True)
        # layer 8
        self.fc8 = self.fc(input=fc7, num_in=4096, num_out=self.num_classes,
                           name="fc8", drop_ratio=0, relu=False)
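As a sanity check on the 6*6*256 flattening in create, the spatial size can be traced through the layers by hand. The sketch below assumes a 227x227 input and the kernel sizes, strides, and padding modes listed in create; LRN layers are omitted because they do not change the shape. The names out_size, LAYERS, and trace are illustrative.

```python
import math


def out_size(size, kernel, stride, padding):
    """Spatial output size for one conv or pooling layer (dilation 1)."""
    if padding == "SAME":
        return math.ceil(size / stride)
    return math.ceil((size - kernel + 1) / stride)  # "VALID"


# (name, kernel size, stride, padding), copied from create()
LAYERS = [
    ("conv1", 11, 4, "VALID"),
    ("pool1", 3, 2, "VALID"),
    ("conv2", 5, 1, "SAME"),
    ("pool2", 3, 2, "VALID"),
    ("conv3", 3, 1, "SAME"),
    ("conv4", 3, 1, "SAME"),
    ("conv5", 3, 1, "SAME"),
    ("pool5", 3, 2, "VALID"),
]


def trace(input_size):
    """Return the spatial size after every layer."""
    sizes = {}
    size = input_size
    for name, kernel, stride, padding in LAYERS:
        size = out_size(size, kernel, stride, padding)
        sizes[name] = size
    return sizes


sizes = trace(227)
flattened = sizes["pool5"] ** 2 * 256  # pool5 has 256 channels

for name, size in sizes.items():
    print(name, size)
print("flattened length:", flattened)
```

The trace ends at a 6x6 feature map after pool5, which matches the num_in passed to fc6.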

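2. Data processing

Parameter initialization

We initialize the parameters of each layer from the ImageNet-pretrained model. The layers are stored as a dict inside a NumPy array, so we first load the model and then, using each layer's name, load its weights and biases.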

    # Load the pretrained weights
    def load_weights(self, session):
        # allow_pickle=True is required by newer NumPy versions to load a pickled dict
        weights_dict = np.load(self.weights_path, encoding="bytes",
                               allow_pickle=True).item()
        for op_name in weights_dict:
            # Layers listed in skip_layer keep their default initialization
            if op_name not in self.skip_layer:
                with tf.variable_scope(op_name, reuse=True):
                    for data in weights_dict[op_name]:
                        # 1-D arrays are biases, everything else is weights
                        if len(data.shape) == 1:
                            var = tf.get_variable("biases", trainable=False)
                            session.run(var.assign(data))
                        else:
                            var = tf.get_variable("weights", trainable=False)
                            session.run(var.assign(data))
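The selection logic of load_weights can be illustrated without TensorFlow or the weight file. The sketch below works on array shapes only; the shapes in example_shapes are illustrative stand-ins for the real file contents, and plan_weight_loading is a hypothetical helper, not part of the repository.

```python
def plan_weight_loading(shapes_by_layer, skip_layer):
    """List which (layer, variable) pairs would be assigned from the file."""
    plan = []
    for layer, shapes in shapes_by_layer.items():
        if layer in skip_layer:
            continue  # skipped layers keep their default initialization
        for shape in shapes:
            # 1-D arrays are treated as biases, everything else as weights
            variable = "biases" if len(shape) == 1 else "weights"
            plan.append((layer, variable, shape))
    return plan


# Illustrative shapes only; the real dict holds NumPy arrays
example_shapes = {
    "conv1": [(11, 11, 3, 96), (96,)],
    "fc7": [(4096, 4096), (4096,)],
    "fc8": [(4096, 1000), (1000,)],
}

plan = plan_weight_loading(example_shapes, skip_layer=["fc8"])
for entry in plan:
    print(entry)
```

Passing skip_layer as a list of layer names keeps the membership test exact; with a plain string, the in operator would match substrings instead.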

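Input image processing

For the input image, we first resize it to the network's input size and define a placeholder for the output. We then build the network, load the model, and run a forward pass to get the fc8 output. Finally, the highest-scoring fc8 entry is taken as the predicted class; the argmax of the raw scores is the same as the argmax of their softmax.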

import cv2


def test_image(path_image, num_class, path_classes, weights_path="Default"):
    # Read the image and resize it to the 227x227 network input
    x = cv2.imread(path_image)
    x = cv2.resize(x, (227, 227))
    x = x.astype(np.float32)
    x = np.reshape(x, [1, 227, 227, 3])
    # Placeholder for the output labels; not needed for inference itself
    y = tf.placeholder(tf.float32, [None, num_class])
    # keep_prob = 1.0 disables dropout at test time
    model = AlexNet(x, 1.0, num_class, skip_layer=[], weights_path=weights_path)
    score = model.fc8
    predicted = tf.argmax(score, 1)
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        model.load_weights(sess)
        label_id = sess.run(predicted)[0]
    # The with block closes the file, so no explicit close() is needed
    with open(path_classes) as f:
        lines = f.readlines()
    label = lines[label_id]
    print("image name is {} class_id is {} class_name is {}".format(
        path_image, label_id, label))
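The step from scores to a class label can be sketched in plain Python. The example below uses made-up scores and hypothetical helper names (softmax, argmax); it shows that taking the argmax of the raw fc8 scores gives the same class as taking the argmax of their softmax, because softmax preserves ordering.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


scores = [1.2, -0.3, 4.5, 0.7]   # made-up fc8 outputs for 4 classes
probs = softmax(scores)

print("probabilities:", probs)
print("class from scores:", argmax(scores))
print("class from softmax:", argmax(probs))
```

The softmax is only needed when the probabilities themselves are of interest, for example to report a confidence next to the label.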

Now let's test an image and look at the result.

test_image("C:/Users/Rain/finetune_alexnet_with_tensorflow/images/zebra.jpeg",1000,"caffe_classes.py")

3. Training the model on your own data

See GitHub: qiansi/tensorflow-AlexNet