Can the compute units in a GPU communicate with each other?

01-19

Is there something like a register that every compute unit can access?

The conventional way for units to communicate is through global memory. To speed this up, Intel added a very small message buffer to the Phi specifically for message passing.
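A minimal sketch of the conventional route, assuming a plain sum as the workload. Each block accumulates privately in shared memory, then one thread per block publishes the result to a single accumulator in global memory. The names `sum_via_global` and `run_sum` are made up for illustration.

```cuda
#include <cuda_runtime.h>
#include <vector>

__global__ void sum_via_global(const int *in, int n, int *total)
{
    __shared__ int partial;              // visible only inside this block
    if (threadIdx.x == 0) partial = 0;
    __syncthreads();

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) atomicAdd(&partial, in[i]);   // block-local accumulation
    __syncthreads();

    // One thread per block publishes to global memory,
    // the only memory space that every block can reach.
    if (threadIdx.x == 0) atomicAdd(total, partial);
}

// Hypothetical host helper: sums n ones (n > 0) on the GPU.
// Returns the sum, or -1 if any CUDA call fails.
int run_sum(int n)
{
    std::vector<int> host(n, 1);
    int *in = nullptr, *total = nullptr, result = -1;
    if (cudaMalloc(&in, n * sizeof(int)) != cudaSuccess) return -1;
    if (cudaMalloc(&total, sizeof(int)) != cudaSuccess) { cudaFree(in); return -1; }
    cudaMemcpy(in, host.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(total, 0, sizeof(int));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    sum_via_global<<<blocks, threads>>>(in, n, total);

    // The blocking copy also waits for the kernel to finish.
    if (cudaGetLastError() != cudaSuccess ||
        cudaMemcpy(&result, total, sizeof(int), cudaMemcpyDeviceToHost) != cudaSuccess)
        result = -1;
    cudaFree(in);
    cudaFree(total);
    return result;
}
```

Note that blocks here only combine results; none of them waits on another, so no barrier across blocks is needed.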

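There are unconventional approaches too. In CUDA, for example, you can do a global synchronization across units without relaunching the kernel, which also counts as a form of communication. It still needs global memory, though.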

    /* Assumes barnos is a volatile int array in global memory,
       one entry per block, zero-initialized */

    /* First, sync within each Block */
    __syncthreads();
    /* Pick a representative from each (here, 1D) block */
    if (threadIdx.x == 0) {
      /* Get my barrier number */
      int barno = barnos[blockIdx.x] + 1;
      int hisbarno;
      int who = (blockIdx.x + 1) % gridDim.x;
      /* Check in at barrier */
      barnos[blockIdx.x] = barno;
      /* Scan for all here or somebody passed */
      do {
        /* Wait for who */
        do {
          hisbarno = barnos[who];
        } while (hisbarno < barno);
        /* Bump to next who */
        if (++who >= gridDim.x) who = 0;
      } while ((hisbarno == barno) && (who != blockIdx.x));
      /* Tell others we are all here */
      barnos[blockIdx.x] = barno + 1;
    }
    /* Rejoin with rest of my Block */
    __syncthreads();
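A spin-wait barrier like the one above only terminates if every block of the grid is resident on the GPU at the same time; a block that has not been scheduled yet can never check in. As a hedged alternative (a different technique from the snippet above, not a rewrite of it), newer CUDA versions expose a grid-wide barrier through cooperative groups. The sketch below assumes a device that reports `cudaDevAttrCooperativeLaunch`; `neighbour_exchange` and `run_exchange` are illustrative names.

```cuda
#include <cooperative_groups.h>
#include <cuda_runtime.h>

namespace cg = cooperative_groups;

// Phase 1: each block writes a value into its own slot in global memory.
// Phase 2: after the grid-wide barrier, each block reads its neighbour's slot.
__global__ void neighbour_exchange(int *slots, int *out)
{
    cg::grid_group grid = cg::this_grid();
    if (threadIdx.x == 0) slots[blockIdx.x] = (int)blockIdx.x + 1;
    grid.sync();   // every block in the grid arrives here before any continues
    if (threadIdx.x == 0) out[blockIdx.x] = slots[(blockIdx.x + 1) % gridDim.x];
}

// Hypothetical host helper.
// Returns 1 if every block saw its neighbour's value, 0 if not,
// and -1 if cooperative launch is unsupported or a CUDA call fails.
int run_exchange()
{
    int dev = 0, coop = 0, sms = 0, perSm = 0;
    const int threads = 128;
    if (cudaGetDevice(&dev) != cudaSuccess) return -1;
    cudaDeviceGetAttribute(&coop, cudaDevAttrCooperativeLaunch, dev);
    if (!coop) return -1;

    // The grid must fit on the device all at once, so size it from occupancy.
    cudaDeviceGetAttribute(&sms, cudaDevAttrMultiProcessorCount, dev);
    cudaOccupancyMaxActiveBlocksPerMultiprocessor(&perSm, neighbour_exchange, threads, 0);
    const int blocks = sms * perSm;
    if (blocks < 1) return -1;

    int *slots = nullptr, *out = nullptr;
    if (cudaMallocManaged(&slots, blocks * sizeof(int)) != cudaSuccess) return -1;
    if (cudaMallocManaged(&out, blocks * sizeof(int)) != cudaSuccess) { cudaFree(slots); return -1; }

    void *args[] = { &slots, &out };
    cudaError_t err = cudaLaunchCooperativeKernel((void *)neighbour_exchange,
                                                  dim3(blocks), dim3(threads), args, 0, 0);
    if (err == cudaSuccess) err = cudaDeviceSynchronize();

    int ok = (err == cudaSuccess) ? 1 : -1;
    for (int b = 0; ok == 1 && b < blocks; ++b)
        if (out[b] != (b + 1) % blocks + 1) ok = 0;
    cudaFree(slots);
    cudaFree(out);
    return ok;
}
```

The data still travels through global memory (`slots`), so this changes how the blocks wait for each other, not where the message lives.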

Nvidia has implemented GPUDirect, which lets GPUs in the same host, or GPUs in different hosts, communicate with each other.

https://developer.nvidia.com/gpudirect
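For the same-host case, the CUDA runtime exposes peer-to-peer copies between two GPUs. The sketch below is only an illustration of that one path, assuming devices 0 and 1 exist; `copy_between_gpus` is a made-up helper name, and the cross-host case (which additionally involves network hardware and a communication library) is not shown.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: copies one non-negative int from GPU 0 to GPU 1
// and returns the value read back from GPU 1.
// Returns -1 if fewer than two GPUs are present or the copy fails.
int copy_between_gpus(int value)
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count < 2) return -1;

    // Ask whether device 0 can address device 1's memory directly.
    int canAccess = 0;
    cudaDeviceCanAccessPeer(&canAccess, 0, 1);

    int *src = nullptr, *dst = nullptr;
    cudaSetDevice(0);
    if (canAccess) cudaDeviceEnablePeerAccess(1, 0);   // second argument must be 0
    if (cudaMalloc(&src, sizeof(int)) != cudaSuccess) return -1;
    cudaMemcpy(src, &value, sizeof(int), cudaMemcpyHostToDevice);

    cudaSetDevice(1);
    if (cudaMalloc(&dst, sizeof(int)) != cudaSuccess) {
        cudaSetDevice(0);
        cudaFree(src);
        return -1;
    }

    // Works with or without peer access; with it, the copy can avoid
    // being staged through host memory.
    int back = -1;
    if (cudaMemcpyPeer(dst, 1, src, 0, sizeof(int)) == cudaSuccess)
        cudaMemcpy(&back, dst, sizeof(int), cudaMemcpyDeviceToHost);

    cudaFree(dst);
    cudaSetDevice(0);
    cudaFree(src);
    return back;
}
```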

Global memory is accessible to all compute units, but there is no register that all compute units can access.

Related documentation:

http://docs.nvidia.com/cuda/cuda-c-programming-guide/#memory-hierarchy