ipfw: Add per-cpu table support.
This is intended to improve performance and reduce latency for
matching discrete addresses. The table itself is a radix tree.
For example: nginx, 1KB web object, 30K concurrent connections,
1 request/connection. ipfw is running on the server side.
Comparison between no-match rules and no-match table entries (tblent):
                   | perf-avg  | lat-avg | lat-stdev | lat-99%
                   |   (tps)   |  (ms)   |   (ms)    |  (ms)
-------------------+-----------+---------+-----------+---------
100 nomatch rules  | 184752.65 |   67.50 |      5.69 |   79.11
-------------------+-----------+---------+-----------+---------
100 nomatch tblent | 200754.53 |   61.18 |      5.72 |   73.10
-------------------+-----------+---------+-----------+---------
1K nomatch rules   |  90836.43 |  144.72 |     12.28 |  168.97
-------------------+-----------+---------+-----------+---------
1K nomatch tblent  | 199750.35 |   61.54 |      5.73 |   72.90
-------------------+-----------+---------+-----------+---------
10K nomatch rules  |  14836.69 |  864.46 |    157.49 | 1110.00
-------------------+-----------+---------+-----------+---------
10K nomatch tblent | 198412.93 |   62.17 |      5.66 |   73.08
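To put the first comparison in relative terms, the throughput ratio
between table entries and rules can be computed directly from the
figures above. This is a small Python sketch; the variable names are
illustrative only and are not part of ipfw:

```python
# Average throughput (tps) copied from the comparison table above.
rules_tps = {"100": 184752.65, "1K": 90836.43, "10K": 14836.69}
tblent_tps = {"100": 200754.53, "1K": 199750.35, "10K": 198412.93}

# Ratio of table-entry throughput to rule throughput at each size.
speedup = {n: tblent_tps[n] / rules_tps[n] for n in rules_tps}

for n, ratio in speedup.items():
    print(f"{n:>3} entries: tables sustain {ratio:.2f}x the tps of rules")
```

With these numbers the ratio is roughly 1.09x at 100 entries, 2.2x at
1K entries and 13.4x at 10K entries.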
Comparison between number of no-match table entries:
                   | perf-avg  | lat-avg | lat-stdev | lat-99%
                   |   (tps)   |  (ms)   |   (ms)    |  (ms)
-------------------+-----------+---------+-----------+---------
no-ipfw            | 210658.80 |   58.01 |      5.20 |   68.73
-------------------+-----------+---------+-----------+---------
100 nomatch tblent | 200754.53 |   61.18 |      5.72 |   73.10
-------------------+-----------+---------+-----------+---------
1K nomatch tblent  | 199750.35 |   61.54 |      5.73 |   72.90
-------------------+-----------+---------+-----------+---------
10K nomatch tblent | 198412.93 |   62.17 |      5.66 |   73.08
It scales well with the number of no-match table entries: throughput
drops by only about 1.2% going from 100 to 10K entries.
Even when compared with the no-ipfw case, the performance and latency
impact of ipfw after this commit is pretty small.
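The overhead relative to the no-ipfw baseline can be worked out from
the second table. This is a short Python sketch of that arithmetic; the
names are illustrative and not taken from the ipfw code:

```python
# Baseline (no-ipfw) and per-size results copied from the second table.
baseline_tps, baseline_lat_ms = 210658.80, 58.01
tblent = {
    "100": (200754.53, 61.18),
    "1K": (199750.35, 61.54),
    "10K": (198412.93, 62.17),
}

# Throughput loss in percent and added average latency in milliseconds.
tps_loss_pct = {
    n: (baseline_tps - tps) / baseline_tps * 100 for n, (tps, _) in tblent.items()
}
lat_added_ms = {n: lat - baseline_lat_ms for n, (_, lat) in tblent.items()}

for n in tblent:
    print(f"{n:>3} entries: -{tps_loss_pct[n]:.1f}% tps, +{lat_added_ms[n]:.2f} ms")
```

For these measurements the throughput loss stays between about 4.7%
and 5.8%, and the average latency rises by 3.17 ms to 4.16 ms.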