scalability enhancements to pool(9)

Discussion:

Chuck Silvers

2003-11-08 21:16:54 UTC

hi folks,

I found that there are yet more scalability problems that show up with
the fork+exit microbenchmark, this time in the pool code. there are two
problems:

(1) we have a single list for all the pages allocated to a pool, which we
traverse at various times looking for a page to allocate from.

(2) each pool has an 8-bucket hash table for page headers that don't
fit nicely in the page itself.
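
To make problem (2) concrete, here is a rough user-space sketch of an 8-bucket chained hash for off-page headers. It is not the pool(9) code: the struct layout, the hdr_* names, the 4 KB page size and the hash function are all assumptions for illustration. The point is only that each chain holds about N/8 headers once a pool has N pages, so a lookup walks a chain that grows with the pool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT   12                 /* assumed 4 KB pages */
#define PAGE_MASK    (((uintptr_t)1 << PAGE_SHIFT) - 1)
#define HASH_BUCKETS 8                  /* the "8-bucket hash table" */

/* Hypothetical off-page header describing one page of pool items. */
struct page_hdr {
    uintptr_t        page;              /* base address of the page */
    struct page_hdr *hash_next;         /* next header in the same bucket */
};

struct hdr_hash {
    struct page_hdr *bucket[HASH_BUCKETS];
};

/* Assumed hash: page number modulo the bucket count. */
static unsigned hdr_bucket(uintptr_t page)
{
    return (unsigned)((page >> PAGE_SHIFT) & (HASH_BUCKETS - 1));
}

static void hdr_insert(struct hdr_hash *h, struct page_hdr *ph)
{
    unsigned b = hdr_bucket(ph->page);

    assert((ph->page & PAGE_MASK) == 0);    /* must be page aligned */
    ph->hash_next = h->bucket[b];
    h->bucket[b] = ph;
}

/*
 * Find the header for the page containing addr.
 * *steps reports how many chain entries were visited.
 */
static struct page_hdr *hdr_lookup(struct hdr_hash *h, uintptr_t addr,
                                   unsigned *steps)
{
    uintptr_t page = addr & ~PAGE_MASK;
    struct page_hdr *ph;

    *steps = 0;
    for (ph = h->bucket[hdr_bucket(page)]; ph != NULL; ph = ph->hash_next) {
        (*steps)++;
        if (ph->page == page)
            return ph;
    }
    return NULL;
}
```

With 1024 consecutive pages inserted, every bucket in this sketch ends up with 128 headers, and a lookup for the oldest page in a bucket visits all of them.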

the diff at ftp://ftp.netbsd.org/pub/NetBSD/misc/chs/diff.pool
fixes these problems by:

(1) split the list of pages allocated to a pool into three lists:
completely full, partially full, and completely empty.
there is no longer any need to traverse any list looking for a
certain type of page.

(2) replace the hash table with a splay tree. yes, this is probably
not the ideal data structure for this, but it'll do until we have
something better, and it's not measurably worse in this context
than the hash table when there are few entries.
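
As a rough illustration of fix (1), the sketch below keeps each page on exactly one of three lists and moves it whenever its occupancy changes, so picking a page to allocate from is a constant-time look at a list head. This is a user-space toy and not the code in the diff: the struct fields and the pool_get/pool_put/pool_classify names are made up here, and "full" is assumed to mean that no free items remain on the page.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical page descriptor. */
struct page {
    struct page  *prev, *next;
    struct page **list;      /* head of the list this page is on, or NULL */
    unsigned      nfree;     /* items still free on this page */
    unsigned      nitems;    /* total items per page */
};

struct pool {
    struct page *full;       /* every item allocated */
    struct page *partial;    /* some items allocated */
    struct page *empty;      /* no items allocated */
};

static void page_unlink(struct page *pg)
{
    if (pg->prev != NULL)
        pg->prev->next = pg->next;
    else
        *pg->list = pg->next;            /* pg was the list head */
    if (pg->next != NULL)
        pg->next->prev = pg->prev;
    pg->prev = pg->next = NULL;
    pg->list = NULL;
}

static void page_link(struct page **head, struct page *pg)
{
    pg->prev = NULL;
    pg->next = *head;
    if (*head != NULL)
        (*head)->prev = pg;
    *head = pg;
    pg->list = head;
}

/* File a page on the list that matches its current occupancy. */
static void pool_classify(struct pool *pp, struct page *pg)
{
    struct page **want = pg->nfree == 0 ? &pp->full
                       : pg->nfree == pg->nitems ? &pp->empty
                       : &pp->partial;

    if (pg->list == want)
        return;
    if (pg->list != NULL)
        page_unlink(pg);
    page_link(want, pg);
}

/* Take one item without traversing anything. Returns the page used. */
static struct page *pool_get(struct pool *pp)
{
    struct page *pg = pp->partial != NULL ? pp->partial : pp->empty;

    if (pg == NULL)
        return NULL;         /* a real pool would allocate a new page here */
    assert(pg->nfree > 0);
    pg->nfree--;
    pool_classify(pp, pg);
    return pg;
}

/* Return one item to the page it came from. */
static void pool_put(struct pool *pp, struct page *pg)
{
    assert(pg->nfree < pg->nitems);
    pg->nfree++;
    pool_classify(pp, pg);
}
```

Preferring a partially used page over a completely free one is a choice made for this sketch; it tends to leave whole pages free so they can be given back.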

after these changes (and david's changes to child-tracking), we have
linear scaling for the fork+exit benchmark. the profile looks like:

  %   cumulative   self               self     total
 time   seconds   seconds     calls  us/call  us/call  name
 8.07      0.41      0.41   2475675     0.17     0.17  pvtree_SPLAY
 7.09      0.77      0.36                              Xspllower
 5.71      1.06      0.29     40032     7.24     7.24  memcpy
 3.74      1.25      0.19    663043     0.29     0.56  pmap_enter
 3.35      1.42      0.17    192230     0.88     7.16  uvm_fault
 2.36      1.54      0.12    769850     0.16     0.16  lockmgr
 2.17      1.65      0.11      8000    13.75    35.21  uvmspace_fork
 1.77      1.74      0.09     64044     1.41     6.53  pmap_remove_ptes
 1.57      1.82      0.08   2517396     0.03     0.03  uvm_rb_subtree_space
 1.38      1.89      0.07    184333     0.38     6.68  trap
 1.18      1.95      0.06    832428     0.07     0.07  pmap_extract
 1.18      2.01      0.06      8003     7.50     8.09  sigactsinit
 1.18      2.07      0.06                              Xtrap0e

-Chuck

Jason Thorpe

2003-11-08 23:09:18 UTC

Post by Chuck Silvers
after these changes (and david's changes to child-tracking), we have

Nice work, guys!

-- Jason R. Thorpe <***@wasabisystems.com>

David Laight

2003-11-09 17:37:31 UTC

Post by Chuck Silvers
7.09 0.77 0.36 Xspllower

Any thoughts about finding out where this is called from?

I've done this in the past by setting the histogram count to ~0,
detecting that when incrementing it, and going away to search a table
to find the rules for finding the caller.
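
A very rough model of that trick, for anyone who hasn't seen it. Everything here is an assumption made for illustration: the 16-bit counters, the prof_tick name, the rule table layout, and especially the "stack", which is just an array of bucket numbers standing in for a sampled return address. A real profiler would have to decode the actual frame.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 16
#define REDIRECT UINT16_MAX   /* ~0 in a 16-bit counter: "charge the caller" */

/* Hypothetical rule describing how to find the caller of one function. */
struct caller_rule {
    size_t bucket;    /* histogram bucket covering the function */
    size_t ra_slot;   /* slot of the sampled stack holding the caller's bucket */
};

static uint16_t hist[NBUCKETS];

/* Example table: ticks landing in bucket 3 are charged via stack slot 1. */
static const struct caller_rule rules[] = {
    { 3, 1 },
};

/* One profiling tick at pc_bucket, with a (simulated) stack sample. */
static void prof_tick(size_t pc_bucket, const size_t *stack)
{
    assert(pc_bucket < NBUCKETS);

    if (hist[pc_bucket] == REDIRECT) {
        /* Marked bucket: look up the rule and charge the caller instead. */
        for (size_t i = 0; i < sizeof(rules) / sizeof(rules[0]); i++) {
            if (rules[i].bucket == pc_bucket) {
                pc_bucket = stack[rules[i].ra_slot];
                break;
            }
        }
        assert(pc_bucket < NBUCKETS);
        if (hist[pc_bucket] == REDIRECT)
            return;           /* no rule, or caller is marked too: drop it */
    }

    /* Saturate one below the marker so a busy bucket never looks marked. */
    if (hist[pc_bucket] < REDIRECT - 1)
        hist[pc_bucket]++;
}
```

The marker costs nothing on the normal path beyond the compare that the increment already needs for overflow handling; the table search only happens for marked buckets.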

David

--
David Laight: ***@l8s.co.uk

Frank van der Linden

2003-11-09 17:50:09 UTC

Post by David Laight

Post by Chuck Silvers
7.09 0.77 0.36 Xspllower

Any thoughts about finding out where this is called from?

That's tough to say. It's called from splx(), but only if there are pending
interrupts. Could be from any splx(). If you want to know, copy/paste
spllower (and splraise) from the inlines in x86/include/intr.h into
real functions; this will show where they're called from.
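
For readers not familiar with the scheme, here is a toy user-space model of why the slow path only runs when something is pending. It is not the x86 code: the sim_* names, the eight levels, and the pending bitmask layout are assumptions for this sketch. An interrupt that arrives while masked is only recorded; the later splx() replays it, so the handler's time ends up under the spllower path.

```c
#include <assert.h>
#include <stdint.h>

#define NLEVELS 8   /* hypothetical number of interrupt priority levels */

static int      sim_cpl;               /* current priority level */
static uint32_t sim_ipending;          /* bit n set: level-n interrupt deferred */
static unsigned sim_handled[NLEVELS];  /* handler runs per level */
static unsigned sim_spllower_calls;    /* times the slow path ran */

static void sim_run_handler(int level)
{
    sim_handled[level]++;
}

/* An interrupt arrives: run it now if unmasked, otherwise just record it. */
static void sim_interrupt(int level)
{
    if (level > sim_cpl) {
        int s = sim_cpl;

        sim_cpl = level;
        sim_run_handler(level);
        sim_cpl = s;
    } else {
        sim_ipending |= (uint32_t)1 << level;
    }
}

/* Raise the level (never lowers it); returns the old level for sim_splx(). */
static int sim_splraise(int level)
{
    int old = sim_cpl;

    if (level > sim_cpl)
        sim_cpl = level;
    return old;
}

/* Slow path: replay every deferred interrupt that the new level unmasks. */
static void sim_spllower(int level)
{
    sim_spllower_calls++;
    for (int l = NLEVELS - 1; l > level; l--) {
        if (sim_ipending & ((uint32_t)1 << l)) {
            sim_ipending &= ~((uint32_t)1 << l);
            sim_cpl = l;
            sim_run_handler(l);
        }
    }
    sim_cpl = level;
}

/* Fast path: only take the slow path when something unmasked is pending. */
static void sim_splx(int level)
{
    uint32_t unmasked = ~(((uint32_t)1 << (level + 1)) - 1);

    if (sim_ipending & unmasked)
        sim_spllower(level);
    else
        sim_cpl = level;
}
```

In this model a splraise/splx pair with no interrupt in between never reaches sim_spllower(), which matches the "only if there are pending interrupts" behaviour described above.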

- Frank

--
Frank van der Linden ***@netbsd.org
===============================================================================
NetBSD. Free, Unix-like OS. > 45 different platforms. http://www.netbsd.org/

David Laight

2003-11-09 19:48:42 UTC

Post by Frank van der Linden

Post by David Laight

Post by Chuck Silvers
7.09 0.77 0.36 Xspllower

Any thoughts about finding out where this is called from?

I know that, but typically splx gets charged with all the time spent
with interrupts disabled. So what you really need to do is attribute
the time back to the caller of splx().

If we do a deferred splxxx (i.e. record an interrupt when it arrives and
defer handling it, instead of disabling it), maybe the profiler could save
the original interrupted PC but do the actual work when splx is finally called.

De-inlining things only makes it worse!
I've been known to inline stuff just to get better profiling - even though
code-bloat makes the whole system too slow.

David

--
David Laight: ***@l8s.co.uk