NetRx,View: Avoid copying incoming frame to intermediate netrx.framebuf - put it to view's overlay directly
I.e. put video frames directly into the sink's framebuffer, thus avoiding
a lot of memcpy overhead. I've tried to do it without losing generality,
and the scheme is as follows:
1) for an incoming frame, the source asks the sinks whether one of them
can provide a framebuffer with the required frame metric (pixfmt,
w, h);
2) if yes, the incoming frame is put into that buffer directly;
3) upon notification, the sink which provided the buffer is
notified last, so that the buffer stays valid for the other sinks;
4) after the buffer-provider sink is notified, the buffer is returned to
that sink.
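For illustration only, the four steps can be sketched as below. This is
not the code from this patch: FrameMetric, Sink, Source, PlainSink,
OverlaySink and all method names are hypothetical, and the sketch assumes
2 bytes per pixel (as in a packed YUV format) just to size the buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical frame metric: pixel format plus dimensions.
struct FrameMetric {
    uint32_t pixfmt;
    int w, h;
    bool operator==(const FrameMetric &o) const {
        return pixfmt == o.pixfmt && w == o.w && h == o.h;
    }
    // Assumption for this sketch: 2 bytes per pixel.
    size_t bytes() const { return size_t(w) * size_t(h) * 2; }
};

// Hypothetical sink interface.
struct Sink {
    virtual ~Sink() {}
    // Step 1: return a buffer able to hold a frame with this metric, or nullptr.
    virtual uint8_t *provide_framebuf(const FrameMetric &) { return nullptr; }
    virtual void notify_frame(const uint8_t *frame, const FrameMetric &m) = 0;
    // Step 4: the buffer is handed back to the sink that provided it.
    virtual void release_framebuf(uint8_t *) {}
};

struct Source {
    std::vector<Sink *> sinks;
    std::vector<uint8_t> fallback;  // intermediate buffer, used only if no sink provides one

    // Deliver one frame; fill(buf, nbytes) writes the frame data into the chosen buffer.
    // Returns the sink that provided the buffer, or nullptr if the fallback was used.
    template <class Fill>
    Sink *deliver(const FrameMetric &m, Fill fill) {
        Sink *provider = nullptr;
        uint8_t *buf = nullptr;
        for (Sink *s : sinks) {               // step 1: ask the sinks for a buffer
            buf = s->provide_framebuf(m);
            if (buf) { provider = s; break; }
        }
        if (!buf) {                           // nobody could help: use the intermediate buffer
            fallback.resize(m.bytes());
            buf = fallback.data();
        }
        fill(buf, m.bytes());                 // step 2: frame goes into that buffer directly
        for (Sink *s : sinks)                 // step 3: notify everyone except the provider...
            if (s != provider) s->notify_frame(buf, m);
        if (provider) {
            provider->notify_frame(buf, m);   // ...and the provider last,
            provider->release_framebuf(buf);  // step 4: then return the buffer to it
        }
        return provider;
    }
};

// Demo sink without a framebuffer of its own; records the notification order.
struct PlainSink : Sink {
    std::vector<int> *log;
    int id;
    PlainSink(std::vector<int> *log_, int id_) : log(log_), id(id_) {}
    void notify_frame(const uint8_t *, const FrameMetric &) override { log->push_back(id); }
};

// Demo sink owning an overlay buffer which it lends out for matching frames.
struct OverlaySink : PlainSink {
    FrameMetric metric;
    std::vector<uint8_t> overlay;
    bool lent = false;
    OverlaySink(std::vector<int> *log_, int id_, const FrameMetric &m)
        : PlainSink(log_, id_), metric(m), overlay(m.bytes()) {}
    uint8_t *provide_framebuf(const FrameMetric &m) override {
        if (lent || !(m == metric)) return nullptr;
        lent = true;
        return overlay.data();
    }
    void release_framebuf(uint8_t *buf) override {
        assert(buf == overlay.data());
        lent = false;
    }
};
```

With an OverlaySink and a PlainSink attached to one Source, a frame whose
metric matches the overlay lands in the overlay with no intermediate
copy, and the PlainSink is notified before the OverlaySink. A frame with
any other metric falls back to the intermediate buffer.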
To measure how it affects performance, on my HP Mini 5103 with Atom N455
I ran
1) ./rawvtx -d /dev/video0 -t 127.255.255.255:5555 # rawvtx symlinked to rawv
and
2) 4 instances of ./rawv -r :5555 -v # with window manually hidden
(all programs were bound to cpu0 via schedtool -a 0x01)
and below is profile output:
before patch:
16.39% rawv libc-2.13.so [.] __memcpy_ssse3
8.92% rawv [kernel.kallsyms] [k] read_hpet
8.32% rawv [kernel.kallsyms] [k] __copy_to_user_ll
5.60% swapper [kernel.kallsyms] [k] uvc_video_decode_data.isra.10
2.54% rawvtx [kernel.kallsyms] [k] __copy_from_user_ll
2.02% Xorg [unknown] [.] 0xa752f414
1.26% rawv [kernel.kallsyms] [k] do_select
1.03% rawv [kernel.kallsyms] [k] sysenter_past_esp
1.01% rawv [kernel.kallsyms] [k] __switch_to
after patch:
10.03% rawv [kernel.kallsyms] [k] read_hpet
9.58% rawv [kernel.kallsyms] [k] __copy_to_user_ll
7.03% rawv libc-2.13.so [.] __memcpy_ssse3
6.22% swapper [kernel.kallsyms] [k] uvc_video_decode_data.isra.10
2.85% rawvtx [kernel.kallsyms] [k] __copy_from_user_ll
1.98% Xorg [unknown] [.] 0xa75c9108
1.25% rawv [kernel.kallsyms] [k] sysenter_past_esp
1.21% rawv [kernel.kallsyms] [k] do_select
1.19% rawv [kernel.kallsyms] [k] __schedule
1.17% rawv [kernel.kallsyms] [k] __switch_to
1.02% rawv rawv [.] mainloop()
i.e. roughly half of the memcpy overhead is gone (__memcpy_ssse3: 16.39% -> 7.03%).
NOTE: netrx.fragbuf, with memcpy from it to target frame, is still there.
NOTE2: a lot of time is also spent copying from kernel -> user space
(via udp_recv()); avoiding that would optimize things even more.
NOTE3: read_hpet needs investigating...
Signed-off-by: Kirill Smelkov <kirr@navytux.spb.ru>