mainloop: select() with timeout under load is very expensive
commitb46d308a55427d1d4c2fe01965752c2604965d54
authorKirill Smelkov <kirr@navytux.spb.ru>
Sun, 17 Jun 2012 07:31:59 +0000 (17 11:31 +0400)
committerKirill Smelkov <kirr@navytux.spb.ru>
Sun, 17 Jun 2012 07:31:59 +0000 (17 11:31 +0400)
treed0c65d34a872b108438ba03dc93f5257f4ce9aeb
parent2373e951c4a65b1f291839cf24a731c6b9fe140c
mainloop: select() with timeout under load is very expensive

Earlier, in 2373e951 (NetRx,View: Avoid copying incoming frame to
intermediate netrx.framebuf - put it to view's overlay directly) I've
noticed that there are lots of read_hpet() calls in profile.

And they turned out to be related to select() calls (perf report -g graph):

     9.79%         rawv  [kernel.kallsyms]     [k] __copy_to_user_ll
                   |
                   --- __copy_to_user_ll
                      |
                       --9.52%-- copy_to_user
                                 |
                                  --9.38%-- memcpy_toiovec
                                            skb_copy_datagram_iovec
                                            udp_recvmsg
                                            inet_recvmsg
                                            sock_recvmsg
                                            sys_recvfrom
                                            sys_recv
                                            sys_socketcall
                                            sysenter_do_call
                                            |
                                            |--2.56%-- 0xa76ff424
                                            |
                                            |--2.48%-- 0xa770e424
                                            |
                                            |--2.21%-- 0xa7711424
                                            |
                                             --2.13%-- 0xa771d424
     9.56%         rawv  [kernel.kallsyms]     [k] read_hpet
                   |
                   --- read_hpet
                      |
                       --9.26%-- timekeeping_get_ns
                                 |
                                  --9.16%-- ktime_get_ts
                                            |
                                            |--3.17%-- poll_select_copy_remaining
                                            |          sys_select
                                            |          sysenter_do_call
                                            |          |
                                            |          |--0.91%-- 0xa770e424
                                            |          |
                                            |          |--0.82%-- 0xa7711424
                                            |          |
                                            |          |--0.73%-- 0xa76ff424
                                            |          |
                                            |           --0.71%-- 0xa771d424
                                            |
                                            |--2.99%-- poll_select_set_timeout
                                            |          sys_select
                                            |          sysenter_do_call
                                            |          |
                                            |          |--0.82%-- 0xa770e424
                                            |          |
                                            |          |--0.80%-- 0xa76ff424
                                            |          |
                                            |          |--0.72%-- 0xa7711424
                                            |          |
                                            |           --0.65%-- 0xa771d424
                                            |
                                             --2.97%-- select_estimate_accuracy
                                                       do_select
                                                       core_sys_select
                                                       sys_select
                                                       sysenter_do_call
                                                       |
                                                       |--0.79%-- 0xa76ff424
                                                       |
                                                       |--0.77%-- 0xa771d424
                                                       |
                                                       |--0.73%-- 0xa7711424
                                                       |
                                                        --0.69%-- 0xa770e424
     6.65%         rawv  libc-2.13.so          [.] __memcpy_ssse3
                   |
                   --- __memcpy_ssse3

So for every select() 3 calls are made to ktime_get_ts() which in turn
calls read_hpet(), to handle select timeout parameter. And that
read_hpet() calls cost us the same cycles as to copy udp data to
userspace - quite a lot!

Stop using select timeout, and receiver read_hpet() overhead is gone:

old:

     9.79%         rawv  [kernel.kallsyms]     [k] __copy_to_user_ll
     9.56%         rawv  [kernel.kallsyms]     [k] read_hpet
     6.65%         rawv  libc-2.13.so          [.] __memcpy_ssse3
     6.27%      swapper  [kernel.kallsyms]     [k] uvc_video_decode_data.isra.10
     2.70%       rawvtx  [kernel.kallsyms]     [k] __copy_from_user_ll
     1.79%         Xorg  [unknown]             [.] 0xa756657b
     1.43%         rawv  [kernel.kallsyms]     [k] do_select
     1.28%         rawv  [kernel.kallsyms]     [k] __schedule
     1.21%         rawv  [kernel.kallsyms]     [k] sysenter_past_esp
     1.11%         rawv  [kernel.kallsyms]     [k] __switch_to
     1.09%      swapper  [kernel.kallsyms]     [k] read_hpet

new:

    10.43%         rawv  [kernel.kallsyms]     [k] __copy_to_user_ll
     7.81%         rawv  libc-2.13.so          [.] __memcpy_ssse3
     7.64%      swapper  [kernel.kallsyms]     [k] uvc_video_decode_data.isra.10
     3.45%       rawvtx  [kernel.kallsyms]     [k] __copy_from_user_ll
     2.18%         Xorg  [unknown]             [.] 0xa76f2476
     1.83%         rawv  [kernel.kallsyms]     [k] do_select
     1.61%         rawv  [kernel.kallsyms]     [k] __schedule
     1.49%         rawv  [kernel.kallsyms]     [k] __switch_to
     1.45%         rawv  [kernel.kallsyms]     [k] sysenter_past_esp
     1.34%      swapper  [kernel.kallsyms]     [k] read_hpet
     1.12%         rawv  rawv                  [.] mainloop()

but also note, that our EAGAIN trick doesn't really work: Under load we
seldomly fetch more than a couple of packets in one burst, and so go
back to select, back to recv(), back to select, back recv(), etc...

Later we'd better figure out how to get rid of calling lots of select()
at all.

Signed-off-by: Kirill Smelkov <kirr@navytux.spb.ru>
net.cpp
rawv.cpp