Linux thread count limits

Without further ado, here is the Python 3 code:

import time
import threading


def worker():
    # Keep each thread alive so it keeps counting against the limit.
    while True:
        time.sleep(1000)


threads = []
num_worker_threads = 19000
for i in range(num_worker_threads):
    t = threading.Thread(target=worker)
    try:
        t.start()
        threads.append(t)
        print(i)
    except RuntimeError:
        # Thread creation failed; pause so the process can be inspected.
        print("EX")
        time.sleep(2000)

for t in threads:
    t.join()

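Once a little over 10,000 threads had been created, an exception was thrown and thread creation failed. We all learned Linux from APUE, so the first suspect has to be the resource limits. Find the process ID first, then look at that process's limits (the contents of /proc/<pid>/limits):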

Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        0                    unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             60886                60886                processes
Max open files            65535                65535                files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       60886                60886                signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us

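As the output shows, both the process limit (60886) and the open-file limit (65535) are far above 10,000, so neither is likely to be the cause.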
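The same check can be scripted. The sketch below parses text in the /proc/<pid>/limits layout into a dictionary; `parse_limits` and `as_number` are hypothetical helper names, and splitting columns on runs of two or more spaces is an assumption that holds for the output shown above but could fail for a value wide enough to fill its whole column.

```python
import re


def parse_limits(text):
    """Parse the text of /proc/<pid>/limits into {name: (soft, hard)}."""
    limits = {}
    for line in text.splitlines()[1:]:  # skip the header row
        # Assumption: columns are separated by two or more spaces.
        fields = re.split(r"\s{2,}", line.strip())
        if len(fields) < 3:
            continue
        name, soft, hard = fields[:3]
        limits[name] = (soft, hard)
    return limits


def as_number(value):
    """Convert a limit value to a number; 'unlimited' becomes infinity."""
    return float("inf") if value == "unlimited" else int(value)


sample = """\
Limit                     Soft Limit           Hard Limit           Units
Max stack size            8388608              unlimited            bytes
Max processes             60886                60886                processes
Max open files            65535                65535                files
"""

limits = parse_limits(sample)
for name in ("Max processes", "Max open files"):
    soft, _hard = limits[name]
    print(name, as_number(soft), as_number(soft) > 10000)
```

For a live process, the text would come from reading `/proc/<pid>/limits` with the real process ID substituted.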

Next I brought out strace and found the relevant log line: clone returned EAGAIN.

[pid 21107] clone(child_stack=0x7f599afb0fb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f599afb19d0, tls=0x7f599afb1700, child_tidptr=0x7f599afb19d0) = -1 EAGAIN (Resource temporarily unavailable)
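On the Python side, a failed thread creation like this one surfaces as `RuntimeError` ("can't start new thread") from `Thread.start()`. The sketch below is a variant of the test script that stops at the first failure and then cleans up after itself; `count_startable_threads` is a hypothetical name, and the small limit in the demo call is only there to keep the example cheap.

```python
import threading


def count_startable_threads(limit):
    """Start up to `limit` idle threads; return how many actually started."""
    release = threading.Event()
    started = []
    try:
        for _ in range(limit):
            # Each thread just blocks until the event is set.
            t = threading.Thread(target=release.wait)
            try:
                t.start()
            except RuntimeError:
                # CPython raises RuntimeError when the OS refuses
                # to create another thread.
                break
            started.append(t)
    finally:
        release.set()  # let every idle thread exit
        for t in started:
            t.join()
    return len(started)


print(count_startable_threads(8))
```

Calling it with a large value such as 19000 would presumably return the ceiling this article is hunting for, rather than hanging in a long sleep.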

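Now that we know thread creation is failing inside clone (the same kernel path fork uses), let's check the kernel-wide limits: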

$ sysctl kernel.pid_max kernel.threads-max
kernel.pid_max = 32768
kernel.threads-max = 121773
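The two sysctl values are also exposed as files under /proc/sys/kernel, so they can be read without shelling out. This is a minimal sketch; `read_kernel_limits` is a hypothetical helper, and its `base` parameter exists only so the function can be pointed at another directory.

```python
import os

NAMES = ("pid_max", "threads-max")


def read_kernel_limits(base="/proc/sys/kernel"):
    """Return pid_max and threads-max as integers, read from `base`."""
    values = {}
    for name in NAMES:
        with open(os.path.join(base, name)) as f:
            values[name] = int(f.read().strip())
    return values


# Only attempt the read where the files actually exist (i.e. on Linux).
if all(os.path.exists(os.path.join("/proc/sys/kernel", n)) for n in NAMES):
    print(read_kernel_limits())
```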

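Both are still far above 10,000. I then suspected a memory shortage, but running out of memory should return ENOMEM instead. Not ready to give up, I went through the fork documentation.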

EAGAIN A system-imposed limit on the number of threads was encountered. There are a number of limits that may trigger this error:

 * the RLIMIT_NPROC soft resource limit (set via setrlimit(2)), which limits the number of processes and threads for a real user ID, was reached;

 * the kernel's system-wide limit on the number of processes and threads, /proc/sys/kernel/threads-max, was reached (see proc(5));

 * the maximum number of PIDs, /proc/sys/kernel/pid_max, was reached (see proc(5)); or

 * the PID limit (pids.max) imposed by the cgroup "process number" (PIDs) controller was reached.

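The first three have already been ruled out, so the fourth becomes the prime suspect. cgroup? APUE does not seem to cover that, so I learned it on the spot. With systemd-cgtop running, I ran the script above again and could clearly see the Tasks column stop at the same ceiling as the thread count. Next, check which cgroups the process belongs to: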

$ cat /proc/18570/cgroup
12:pids:/user.slice/user-1000.slice/session-c2.scope
11:memory:/
10:perf_event:/
9:devices:/user.slice
8:cpuset:/
7:blkio:/
6:net_cls,net_prio:/
5:cpu,cpuacct:/
4:hugetlb:/
3:freezer:/
2:rdma:/
1:name=systemd:/user.slice/user-1000.slice/session-c2.scope
0::/user.slice/user-1000.slice/session-c2.scope
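Each line of that file has the form hierarchy-ID:controller-list:path, so picking out the pids controller's path is a short parsing job. A minimal sketch, with `parse_cgroup` as a hypothetical helper name and a shortened copy of the output above as sample input:

```python
def parse_cgroup(text):
    """Map each controller name to its cgroup path."""
    paths = {}
    for line in text.splitlines():
        if not line:
            continue
        # Each line is "hierarchy-ID:controller-list:path".
        _hierarchy, controllers, path = line.split(":", 2)
        # A controller list may name several controllers ("net_cls,net_prio");
        # the "0::" entry has an empty list and is stored under "".
        for controller in controllers.split(","):
            paths[controller] = path
    return paths


sample = """\
12:pids:/user.slice/user-1000.slice/session-c2.scope
6:net_cls,net_prio:/
1:name=systemd:/user.slice/user-1000.slice/session-c2.scope
0::/user.slice/user-1000.slice/session-c2.scope
"""

print(parse_cgroup(sample)["pids"])
```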

Following the trail:

$ for f in `find /sys/fs/cgroup/pids/user.slice -name pids.max`; do echo "$f"; cat "$f"; done
/sys/fs/cgroup/pids/user.slice/pids.max
max
/sys/fs/cgroup/pids/user.slice/user-1000.slice/pids.max
10813
/sys/fs/cgroup/pids/user.slice/user-1000.slice/user@1000.service/pids.max
max
/sys/fs/cgroup/pids/user.slice/user-1000.slice/session-c2.scope/pids.max
max
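A process is bound by the pids.max of its own cgroup and of every ancestor, so the effective limit is the smallest value on the path up to the root. The sketch below walks that path; `effective_pids_max` is a hypothetical helper, and the default `root` assumes the cgroup v1 mount point seen in the output above.

```python
import os


def effective_pids_max(cgroup_path, root="/sys/fs/cgroup/pids"):
    """Return (limit, file) for the tightest pids.max from cgroup_path up to root."""
    tightest = (float("inf"), None)
    parts = [p for p in cgroup_path.split("/") if p]
    # Check the cgroup itself first, then each ancestor below the root.
    for depth in range(len(parts), 0, -1):
        candidate = os.path.join(root, *parts[:depth], "pids.max")
        if not os.path.exists(candidate):
            continue
        with open(candidate) as f:
            raw = f.read().strip()
        # The literal string "max" means no limit at this level.
        value = float("inf") if raw == "max" else int(raw)
        if value < tightest[0]:
            tightest = (value, candidate)
    return tightest
```

Called with the pids path from /proc/<pid>/cgroup, it should point at the one file in the listing that holds a number instead of max.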

The pids.max of user-1000.slice is 10813. After raising it to 20000, all 19,000 threads were created successfully.
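Putting the four candidate limits from the fork documentation side by side makes the result easy to see. The numbers below are the ones collected during this investigation; `tightest` is a hypothetical helper. Note that the limits count different things (per user, system-wide, per cgroup), so the smallest number is only a strong hint, not proof by itself.

```python
# Values collected during the investigation above.
candidate_limits = {
    "RLIMIT_NPROC (Max processes, soft)": 60886,
    "kernel.threads-max": 121773,
    "kernel.pid_max": 32768,
    "cgroup pids.max (user-1000.slice)": 10813,
}


def tightest(limits):
    """Return the (name, value) pair with the smallest value."""
    return min(limits.items(), key=lambda item: item[1])


name, value = tightest(candidate_limits)
print(f"{name} = {value}")
```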

Along the way I also read up on systemd's slice configuration:

UserTasksMax=

Sets the maximum number of OS tasks each user may run concurrently. This controls the TasksMax= setting of the per-user slice unit, see systemd.resource-control(5) for details. If assigned the special value "infinity", no tasks limit is applied. Defaults to 33%, which equals 10813 with the kernel's defaults on the host, but might be smaller in OS containers.
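The 10813 figure can be reproduced with simple arithmetic. The excerpt only says "the kernel's defaults", so the sketch below assumes the percentage is applied to kernel.pid_max (32768 in the sysctl output earlier) and rounded down; `tasks_max_from_percent` is a hypothetical helper, not a systemd API.

```python
def tasks_max_from_percent(percent, base):
    """Scale `base` by a whole-number percentage, rounding down."""
    return base * percent // 100


pid_max = 32768  # kernel.pid_max from the sysctl output earlier
print(tasks_max_from_percent(33, pid_max))  # 32768 * 33 // 100 = 10813
```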

Besides 10813, I also came across another magic number, 4915; I will leave that one for readers to track down themselves.
