eddy+public+
2006-04-28 23:33:01 UTC
Greetings all,
Note: Some of these ramblings are ia32/amd64-focused, but the principles
are general.
While exploring PAE last November, I wound up browsing through uvm/pmap
code. I've had a few additional ideas, and would like some [more]
feedback.
/* Big Pages */
Begin by allocating memory with a 2M/4M stride (2M iff PAE, 4M iff
!PAE). Track wasted 4K [sub]pages. Split big pages into smaller ones
when needed, but avoid using page tables until then. Coalesce smaller
pages into bigger ones when free RAM permits.
Rationale: Hopefully less MMU management overhead and fewer TLB misses
while memory is plentiful. Fall back to standard behavior when needed.
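A rough sketch of the bookkeeping this implies (purely illustrative C;
the struct and function names are invented, not existing uvm/pmap code):

    /*
     * Track each 2M/4M big page plus a bitmap of its 4K subpages: hand
     * out whole big pages while RAM is plentiful, split on demand, and
     * coalesce again once every subpage has been freed.
     */
    #include <stdint.h>

    #define BIGPAGE_SHIFT 21                    /* 2M with PAE; 22 (4M) without */
    #define SUBPAGES      (1u << (BIGPAGE_SHIFT - 12))  /* 4K subpages per big page */

    struct bigpage {
        uint64_t bp_pa;                  /* physical base of the big page */
        uint32_t bp_nfree;               /* free 4K subpages remaining */
        uint8_t  bp_map[SUBPAGES / 8];   /* one bit per subpage: 1 = free */
        int      bp_split;               /* nonzero once demoted to 4K pages */
    };

    /* Demote: start handing out 4K subpages instead of the whole big page. */
    static void
    bigpage_split(struct bigpage *bp)
    {
        bp->bp_split = 1;
        /* page tables for this region are built lazily, only from here on */
    }

    /* Promote again once every subpage has come back. */
    static int
    bigpage_try_coalesce(struct bigpage *bp)
    {
        if (bp->bp_nfree != SUBPAGES)
            return 0;
        bp->bp_split = 0;                /* eligible for one large mapping again */
        return 1;
    }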
/* Fractional/Checkpointed Zeroing of Big Pages */
I whipped up a crude program that performed 1000 bzero(3) iterations on
a 2M chunk. Each iteration took about 9 ms on a PIII/500 notebook.
Should the idle-zero loop zero a fraction of a big page? What about
dedicating a PDE slot (Intel terminology) to the zero code?
Rationale: Several milliseconds -- although certainly less than 9 ms
on a faster CPU with optimized zeroing code -- is an eternity.
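Roughly what I have in mind (illustrative only; the names are invented
and this is not uvm's actual idle-zero code):

    /*
     * Each idle pass zeroes only a slice of a big page and checkpoints
     * the offset, so a 2M page is cleared across many short idle slices
     * instead of one ~9 ms bzero().
     */
    #include <stddef.h>
    #include <string.h>

    #define BIGPAGE_SIZE (2UL * 1024 * 1024)
    #define ZERO_CHUNK   (64UL * 1024)   /* work per idle pass (tunable) */

    struct zero_state {
        char  *zs_va;                    /* kernel mapping of the big page */
        size_t zs_done;                  /* bytes already zeroed (checkpoint) */
    };

    /* Zero one chunk; returns nonzero once the whole big page is clean. */
    static int
    idle_zero_step(struct zero_state *zs)
    {
        size_t n = BIGPAGE_SIZE - zs->zs_done;

        if (n > ZERO_CHUNK)
            n = ZERO_CHUNK;
        memset(zs->zs_va + zs->zs_done, 0, n);
        zs->zs_done += n;
        return zs->zs_done == BIGPAGE_SIZE;
    }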
/* Per-CPU Management */
Both of the above, as well as free page lists, should be per-CPU. Can a
CPU be forced to work with the memory closest to it? (Consider NUMA
performance, e.g. on multiprocessor Opteron systems.)
Rationale: Reduced inter-CPU contention. Assuming processes have
significant CPU affinity, using "nearby" memory would reduce both
interconnect bandwidth use and memory access time.
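For example (again purely illustrative; these are not existing uvm
structures):

    /*
     * Each CPU keeps its own big-page and 4K free lists, ideally filled
     * from physically "nearby" memory on NUMA machines, and falls back
     * to a shared, locked global pool only when the local lists run dry.
     */
    #include <stddef.h>

    struct vm_page;                      /* opaque here */

    struct pagelist {
        struct vm_page *pl_head;         /* linked free list */
        size_t          pl_count;
    };

    struct percpu_pages {
        struct pagelist pc_bigfree;      /* free 2M/4M pages, local node */
        struct pagelist pc_smallfree;    /* free 4K pages, local node */
        struct pagelist pc_zeroed;       /* pre-zeroed pages, if any */
        int             pc_node;         /* NUMA node this CPU sits on */
    };

    /* Prefer the local lists; cross the interconnect only as a last resort. */
    struct vm_page *percpu_alloc(struct percpu_pages *pc, int want_big);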
/* Ring Buffers */
A native mapping for ring buffers would be nice:
u_char *ringbuf = mmapringbuf(..., MAP_RINGBUF, ...);
would allocate a memory region from <base> to <base + 2 * size>, i.e.,
<base> and <base + size> would both be aliased to the same physical
pages. Voila! Simple, linear ringbuf where the MMU handles wraparound
at the region's end.
Rationale: It's just so much easier this way. :-)
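Until something like MAP_RINGBUF exists, the same trick can be
approximated in userland with plain POSIX calls: back the buffer with a
shared memory object and map it twice, back to back (sketch only; size
must be a multiple of the page size):

    #include <sys/mman.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <stddef.h>

    static void *
    ringbuf_alloc(size_t size)
    {
        char *base;
        int fd;

        fd = shm_open("/ringbuf-demo", O_RDWR | O_CREAT | O_EXCL, 0600);
        if (fd == -1)
            return NULL;
        shm_unlink("/ringbuf-demo");     /* keep the object anonymous */
        if (ftruncate(fd, (off_t)size) == -1) {
            close(fd);
            return NULL;
        }

        /* Reserve 2 * size of address space, then overlay the two halves. */
        base = mmap(NULL, 2 * size, PROT_NONE, MAP_PRIVATE | MAP_ANON, -1, 0);
        if (base == MAP_FAILED) {
            close(fd);
            return NULL;
        }
        if (mmap(base, size, PROT_READ | PROT_WRITE,
              MAP_SHARED | MAP_FIXED, fd, 0) == MAP_FAILED ||
            mmap(base + size, size, PROT_READ | PROT_WRITE,
              MAP_SHARED | MAP_FIXED, fd, 0) == MAP_FAILED) {
            munmap(base, 2 * size);
            close(fd);
            return NULL;
        }
        close(fd);
        return base;
    }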
/* mremap() */
Zero-copy allocation-size changes are convenient.
Rationale: Obvious.
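For reference, a Linux-style mremap() call does exactly this: grow (or
shrink) a mapping and let the kernel move the page tables instead of
copying the data (illustrative wrapper only):

    #define _GNU_SOURCE
    #include <sys/mman.h>
    #include <stddef.h>

    static void *
    grow_mapping(void *old, size_t oldsz, size_t newsz)
    {
        /* MREMAP_MAYMOVE: relocate the mapping if it can't grow in place. */
        void *p = mremap(old, oldsz, newsz, MREMAP_MAYMOVE);

        return p == MAP_FAILED ? NULL : p;
    }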
Eddy
--
Everquick Internet - http://www.everquick.net/
A division of Brotsman & Dreger, Inc. - http://www.brotsman.com/
Bandwidth, consulting, e-commerce, hosting, and network building
Phone: +1 785 865 5885 Lawrence and [inter]national
Phone: +1 316 794 8922 Wichita
________________________________________________________________________
DO NOT send mail to the following addresses:
***@brics.com -*- ***@intc.net -*- ***@everquick.net
Sending mail to spambait addresses is a great way to get blocked.
Ditto for broken OOO autoresponders and foolish AV software backscatter.