TensorFlow Alchemy (1): Using GPUs

02-04

1. Supported devices

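TensorFlow supports two kinds of devices, CPU and GPU, each identified by a string:

"/cpu:0": the CPU of your machine
"/gpu:0": the first GPU of your machine
"/gpu:1": the second GPU of your machine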

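If a TensorFlow operation has both CPU and GPU implementations, the GPU devices are given priority when the operation is assigned to a device. For example, matmul has both CPU and GPU kernels; on a system with devices cpu:0 and gpu:0, gpu:0 is selected to run matmul.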
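The priority rule above can be pictured with a small toy model. This is only a sketch of the rule as stated, not TensorFlow's actual placement code; the function `pick_device` and its arguments are hypothetical names introduced for illustration.

```python
# Toy model of the "GPU first" rule; NOT TensorFlow's real placer.
# pick_device and its parameters are hypothetical names.

def pick_device(kernel_types, available_devices):
    """Return the first available device whose type has a kernel, GPU first."""
    for device_type in ("gpu", "cpu"):  # GPU is tried before CPU
        if device_type not in kernel_types:
            continue
        for device in available_devices:
            if device.startswith("/" + device_type + ":"):
                return device
    raise ValueError("no available device supports this operation")


# matmul has both CPU and GPU kernels, so gpu:0 wins when it is present.
print(pick_device({"cpu", "gpu"}, ["/cpu:0", "/gpu:0"]))  # /gpu:0
# An operation with only a CPU kernel stays on the CPU.
print(pick_device({"cpu"}, ["/cpu:0", "/gpu:0"]))  # /cpu:0
```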

2. Logging Device placement

To find out which devices your operations and tensors are assigned to, create the session with the log_device_placement configuration option set to True.

import tensorflow as tf  # TensorFlow 1.x API (tf.Session, tf.ConfigProto)
# Creates a graph.
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(c))

Output:

Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: Tesla K40c, pci bus
id: 0000:05:00.0
b: /job:localhost/replica:0/task:0/gpu:0
a: /job:localhost/replica:0/task:0/gpu:0
MatMul: /job:localhost/replica:0/task:0/gpu:0
[[ 22.  28.]
 [ 49.  64.]]
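The numeric result can be checked by hand without a GPU. The sketch below multiplies the same two matrices in plain Python, assuming the six constants fill the [2, 3] and [3, 2] shapes row by row; the helper `matmul` is a hypothetical stand-in, not the TensorFlow operation.

```python
def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x_ik * y_kj for x_ik, y_kj in zip(row, col))
             for col in zip(*y)]
            for row in x]


a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]    # shape [2, 3]
b = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # shape [3, 2]
c = matmul(a, b)
print(c)  # [[22.0, 28.0], [49.0, 64.0]]
```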

3. Manual device placement

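If you would like a particular operation to run on a device of your choice instead of the one selected automatically, use tf.device to create a device context; all operations created inside that context are assigned to the same device.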

import tensorflow as tf
# Creates a graph.
with tf.device('/cpu:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(c))

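Here a and b are assigned to cpu:0. Since no device was explicitly specified for the MatMul operation, the TensorFlow runtime picks one from the available devices (gpu:0 in this example).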

Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: Tesla K40c, pci bus
id: 0000:05:00.0
b: /job:localhost/replica:0/task:0/cpu:0
a: /job:localhost/replica:0/task:0/cpu:0
MatMul: /job:localhost/replica:0/task:0/gpu:0
[[ 22.  28.]
 [ 49.  64.]]
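When a graph has many operations, it can be easier to check placement programmatically than to read the log by eye. The following is a minimal sketch that parses a log shaped like the output shown here; `parse_placement` is a hypothetical helper, and the exact log format may differ between TensorFlow versions.

```python
def parse_placement(log_text):
    """Map each op name to its device from 'name: /job:...' lines."""
    placement = {}
    for line in log_text.splitlines():
        name, sep, device = line.partition(": ")
        # Only op lines have a value that starts with "/job:"; the header,
        # the "->" mapping line and its wrapped "id:" line are skipped.
        if sep and device.startswith("/job:"):
            placement[name] = device.rsplit("/", 1)[-1]
    return placement


log = """Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: Tesla K40c, pci bus
id: 0000:05:00.0
b: /job:localhost/replica:0/task:0/cpu:0
a: /job:localhost/replica:0/task:0/cpu:0
MatMul: /job:localhost/replica:0/task:0/gpu:0"""

print(parse_placement(log))
# {'b': 'cpu:0', 'a': 'cpu:0', 'MatMul': 'gpu:0'}
```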

4. Allowing GPU memory growth

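By default, TensorFlow maps nearly all of the GPU memory of every GPU visible to the process (which GPUs are visible is controlled by the CUDA_VISIBLE_DEVICES environment variable). This is done to use the relatively precious GPU memory on the devices more efficiently by reducing memory fragmentation.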
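One common way to restrict which GPUs a process can see is to set CUDA_VISIBLE_DEVICES from Python before TensorFlow is imported. The snippet is a sketch under that assumption; the value "0" (expose only the first GPU) is just an example.

```python
import os

# Must be set before TensorFlow (and therefore CUDA) is initialized in this
# process; "0" is an example value that exposes only the first GPU.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# import tensorflow as tf  # import only after the variable is set
print(os.environ["CUDA_VISIBLE_DEVICES"])  # 0
```

The same effect can be had by setting the variable in the shell that launches the script.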

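In some cases it is desirable for the process to allocate only a subset of the available memory, or to grow its memory usage only as the process needs it. TensorFlow provides two Config options on the Session to control this.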

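The first is the allow_growth option, which attempts to allocate only as much GPU memory as runtime allocations require: it starts out allocating very little memory, and as Sessions run and more GPU memory is needed, it extends the GPU memory region used by the TensorFlow process. Note that memory is not released once allocated, since releasing it can lead to even worse memory fragmentation. To turn this option on, set it in the ConfigProto: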

config = tf.ConfigProto()
config.gpu_options.allow_growth = True
session = tf.Session(config=config, ...)

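The second method is the per_process_gpu_memory_fraction option, which determines the fraction of the total memory that each visible GPU should allocate. For example, you can tell TensorFlow to allocate only 40% of the total memory of each GPU with the following setting: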

config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.4
session = tf.Session(config=config, ...)

This is useful when you want to bound the amount of GPU memory available to the TensorFlow process.
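To get a feel for what a fraction of 0.4 means in bytes, here is a back-of-the-envelope sketch. The 12 GiB card size is a hypothetical figure chosen for illustration, and `memory_budget_bytes` is not a TensorFlow function.

```python
def memory_budget_bytes(total_bytes, fraction):
    """Bytes a process may claim on one GPU for a given memory fraction."""
    return int(total_bytes * fraction)


total = 12 * 1024**3  # hypothetical card with 12 GiB of memory
budget = memory_budget_bytes(total, 0.4)
print(round(budget / 1024**3, 2))  # 4.8 (GiB)
```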

5. Using a single GPU on a multi-GPU system

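If you have more than one GPU in your system, the GPU with the lowest ID is selected by default. To run on a different GPU, specify the preference explicitly: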

import tensorflow as tf
# Creates a graph.
with tf.device('/gpu:2'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(c))

If the device you specified does not exist, you get an InvalidArgumentError:

InvalidArgumentError: Invalid argument: Cannot assign a device to node 'b':
Could not satisfy explicit device specification '/gpu:2'
   [[Node: b = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [3,2]
   values: 1 2 3...>, _device="/gpu:2"]()]]

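If you would like TensorFlow to automatically choose an existing and supported device to run the operations when the specified one does not exist, set allow_soft_placement to True in the configuration option when creating the session.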

import tensorflow as tf
# Creates a graph.
with tf.device('/gpu:2'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)
# Creates a session with allow_soft_placement and log_device_placement set
# to True.
sess = tf.Session(config=tf.ConfigProto(
    allow_soft_placement=True, log_device_placement=True))
# Runs the op.
print(sess.run(c))
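The difference between strict and soft placement can be sketched as a toy model. It assumes that the fallback prefers a GPU over the CPU; `resolve_device` is a hypothetical function written for illustration and does not reproduce TensorFlow's internal logic.

```python
# Toy model of strict vs. soft placement; NOT TensorFlow's real placer.
# resolve_device is a hypothetical name.

def resolve_device(requested, available, allow_soft_placement=False):
    """Return the device an op would run on under this simplified rule."""
    if requested in available:
        return requested
    if allow_soft_placement:
        # Assumed fallback: first available device, GPUs listed before CPUs.
        return sorted(available, key=lambda d: not d.startswith("/gpu:"))[0]
    raise ValueError(
        "Could not satisfy explicit device specification " + repr(requested))


devices = ["/cpu:0", "/gpu:0"]
print(resolve_device("/gpu:2", devices, allow_soft_placement=True))  # /gpu:0
try:
    resolve_device("/gpu:2", devices)
except ValueError as err:
    print(err)  # Could not satisfy explicit device specification '/gpu:2'
```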

6. Using multiple GPUs

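If you would like to run TensorFlow on multiple GPUs, you can construct your model in a multi-tower fashion, where each tower is assigned to a different GPU. For example: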

import tensorflow as tf
# Creates a graph.
c = []
for d in ['/gpu:2', '/gpu:3']:
    with tf.device(d):
        a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3])
        b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2])
        c.append(tf.matmul(a, b))
with tf.device('/cpu:0'):
    sum = tf.add_n(c)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
# Runs the op.
print(sess.run(sum))

Output:

Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: Tesla K20m, pci bus
id: 0000:02:00.0
/job:localhost/replica:0/task:0/gpu:1 -> device: 1, name: Tesla K20m, pci bus
id: 0000:03:00.0
/job:localhost/replica:0/task:0/gpu:2 -> device: 2, name: Tesla K20m, pci bus
id: 0000:83:00.0
/job:localhost/replica:0/task:0/gpu:3 -> device: 3, name: Tesla K20m, pci bus
id: 0000:84:00.0
Const_3: /job:localhost/replica:0/task:0/gpu:3
Const_2: /job:localhost/replica:0/task:0/gpu:3
MatMul_1: /job:localhost/replica:0/task:0/gpu:3
Const_1: /job:localhost/replica:0/task:0/gpu:2
Const: /job:localhost/replica:0/task:0/gpu:2
MatMul: /job:localhost/replica:0/task:0/gpu:2
AddN: /job:localhost/replica:0/task:0/cpu:0
[[  44.   56.]
 [  98.  128.]]
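The final matrix is simply the element-wise sum of the two tower outputs, each of which equals the single-GPU result [[22, 28], [49, 64]]. A plain-Python sketch of that reduction step follows; the helper `add_n` here is a hypothetical stand-in for the AddN operation, not the TensorFlow implementation.

```python
def add_n(matrices):
    """Element-wise sum of equally shaped matrices (lists of rows)."""
    return [[sum(values) for values in zip(*rows)]
            for rows in zip(*matrices)]


tower_output = [[22.0, 28.0], [49.0, 64.0]]  # result of one tower's matmul
total = add_n([tower_output, tower_output])  # one copy per GPU tower
print(total)  # [[44.0, 56.0], [98.0, 128.0]]
```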