mainloop: select() with timeout under load is very expensive
Earlier, in
2373e951 (NetRx,View: Avoid copying incoming frame to
intermediate netrx.framebuf - put it to view's overlay directly) I've
noticed that there are lots of read_hpet() calls in profile.
And they turned out to be related to select() calls (perf report -g graph):
9.79% rawv [kernel.kallsyms] [k] __copy_to_user_ll
|
--- __copy_to_user_ll
|
--9.52%-- copy_to_user
|
--9.38%-- memcpy_toiovec
skb_copy_datagram_iovec
udp_recvmsg
inet_recvmsg
sock_recvmsg
sys_recvfrom
sys_recv
sys_socketcall
sysenter_do_call
|
|--2.56%-- 0xa76ff424
|
|--2.48%-- 0xa770e424
|
|--2.21%-- 0xa7711424
|
--2.13%-- 0xa771d424
9.56% rawv [kernel.kallsyms] [k] read_hpet
|
--- read_hpet
|
--9.26%-- timekeeping_get_ns
|
--9.16%-- ktime_get_ts
|
|--3.17%-- poll_select_copy_remaining
| sys_select
| sysenter_do_call
| |
| |--0.91%-- 0xa770e424
| |
| |--0.82%-- 0xa7711424
| |
| |--0.73%-- 0xa76ff424
| |
| --0.71%-- 0xa771d424
|
|--2.99%-- poll_select_set_timeout
| sys_select
| sysenter_do_call
| |
| |--0.82%-- 0xa770e424
| |
| |--0.80%-- 0xa76ff424
| |
| |--0.72%-- 0xa7711424
| |
| --0.65%-- 0xa771d424
|
--2.97%-- select_estimate_accuracy
do_select
core_sys_select
sys_select
sysenter_do_call
|
|--0.79%-- 0xa76ff424
|
|--0.77%-- 0xa771d424
|
|--0.73%-- 0xa7711424
|
--0.69%-- 0xa770e424
6.65% rawv libc-2.13.so [.] __memcpy_ssse3
|
--- __memcpy_ssse3
So for every select() 3 calls are made to ktime_get_ts() which in turn
calls read_hpet(), to handle select timeout parameter. And that
read_hpet() calls cost us the same cycles as to copy udp data to
userspace - quite a lot!
Stop using select timeout, and receiver read_hpet() overhead is gone:
old:
9.79% rawv [kernel.kallsyms] [k] __copy_to_user_ll
9.56% rawv [kernel.kallsyms] [k] read_hpet
6.65% rawv libc-2.13.so [.] __memcpy_ssse3
6.27% swapper [kernel.kallsyms] [k] uvc_video_decode_data.isra.10
2.70% rawvtx [kernel.kallsyms] [k] __copy_from_user_ll
1.79% Xorg [unknown] [.] 0xa756657b
1.43% rawv [kernel.kallsyms] [k] do_select
1.28% rawv [kernel.kallsyms] [k] __schedule
1.21% rawv [kernel.kallsyms] [k] sysenter_past_esp
1.11% rawv [kernel.kallsyms] [k] __switch_to
1.09% swapper [kernel.kallsyms] [k] read_hpet
new:
10.43% rawv [kernel.kallsyms] [k] __copy_to_user_ll
7.81% rawv libc-2.13.so [.] __memcpy_ssse3
7.64% swapper [kernel.kallsyms] [k] uvc_video_decode_data.isra.10
3.45% rawvtx [kernel.kallsyms] [k] __copy_from_user_ll
2.18% Xorg [unknown] [.] 0xa76f2476
1.83% rawv [kernel.kallsyms] [k] do_select
1.61% rawv [kernel.kallsyms] [k] __schedule
1.49% rawv [kernel.kallsyms] [k] __switch_to
1.45% rawv [kernel.kallsyms] [k] sysenter_past_esp
1.34% swapper [kernel.kallsyms] [k] read_hpet
1.12% rawv rawv [.] mainloop()
but also note, that our EAGAIN trick doesn't really work: Under load we
seldomly fetch more than a couple of packets in one burst, and so go
back to select, back to recv(), back to select, back recv(), etc...
Later we'd better figure out how to get rid of calling lots of select()
at all.
Signed-off-by: Kirill Smelkov <kirr@navytux.spb.ru>