Post by Jason Thorpe
Post by Bang Jun-Young
BTW, how about making ffs(9) inline as well? Calling overhead seems
to be quite high on i386...
GCC will already inline ffs() if the CPU back-end provides the
appropriate pattern. The right answer, if GCC is not doing it on i386,
would be to add that pattern to i386.md.
I looked further and found that ffs() was properly inlined in the
kernel:
c01c02a4 <fdalloc>:
[snip]
c01c0348:  83 fa ff            cmp    $0xffffffff,%edx
c01c034b:  0f 84 5e 01 00 00   je     c01c04af <fdalloc+0x20b>
c01c0351:  f7 d2               not    %edx
c01c0353:  c1 e3 05            shl    $0x5,%ebx
c01c0356:  31 c0               xor    %eax,%eax
c01c0358:  0f bc d2            bsf    %edx,%edx
c01c035b:  0f 94 c0            sete   %al
c01c035e:  f7 d8               neg    %eax
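
For reference, here is a rough C sketch of the kind of source that could
produce a sequence like the one above: compare the word against
0xffffffff, invert it, then take ffs() of the result. This is only an
illustration, not the actual fdalloc() code. The helper name
first_zero_bit() is made up, and it assumes a compiler that provides
__builtin_ffs() (GCC does).

```c
#include <stdint.h>

/*
 * Hypothetical helper, not taken from the NetBSD sources.
 * Returns the 0-based index of the lowest clear bit in 'word',
 * or -1 if every bit is set.
 */
static inline int
first_zero_bit(uint32_t word)
{
	/* All bits set: nothing free (matches the cmp $0xffffffff / je). */
	if (word == 0xffffffffu)
		return -1;

	/*
	 * Invert so that clear bits become set bits (the 'not'), then
	 * let the compiler expand ffs() inline (the 'bsf').
	 * __builtin_ffs() is 1-based, so subtract 1.
	 */
	return __builtin_ffs((int)~word) - 1;
}
```

Whether the builtin is expanded to bsf or turned into a library call
depends on the compiler and target, so it is worth checking the
disassembly as above.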
So it's clear that the small speedup from the inlined ffz() shown in
provos' graph was due to an incomplete implementation, isn't it?
Jun-Young
--
Bang Jun-Young <***@NetBSD.org>