Discussion:
rbtree for vm_map
Noriyuki Soda
2003-10-29 15:52:29 UTC
Permalink
I measured the mmap benchmarks in http://bulk.fefe.de/scalability/
with the vm map rbtree patch ported from OpenBSD by yamt.

As you suppose, the rbtree patch made the results much better as follows:
the mmap benchmark results:
http://www.sra.co.jp/people/soda/scalability/mmap.gif
http://www.sra.co.jp/people/soda/scalability/mmap.zoom.gif
http://www.sra.co.jp/people/soda/scalability/mmaptouch.gif

mmap()ing many small files:
http://www.sra.co.jp/people/soda/scalability/open.gif
http://www.sra.co.jp/people/soda/scalability/manymap-mmap.gif
http://www.sra.co.jp/people/soda/scalability/manymap-mmap.zoom.gif
http://www.sra.co.jp/people/soda/scalability/manymap-touch.gif

Hardware:
Celeron 300MHz
Software:
NetBSD-splay : NetBSD-1.6ZE on Oct 26, 2003
NetBSD-splay+rbtree : NetBSD-1.6ZE on Oct 26, 2003 + rbtree patch
Linux-2.6.0-test8 : RedHat9 w/ linux-2.6.0-test8 kernel

But as shown in the following mail by yamt,
http://mail-index.netbsd.org/tech-kern/2003/08/26/0000.html
The rbtree one is worse than stock NetBSD in the /dev/zero mapping
benchmark as follows:
http://www.sra.co.jp/people/soda/scalability/devzero.gif
This is because vm map entries for /dev/zero are merged into one entry
in this case.
So, the result on stock NetBSD may become worse even with /dev/zero
mapping, if there are some reasons which make the entries unable to be
merged, e.g. if there are gaps or other type mappings between the
/dev/zero mappings.
The following benchmark shows the fact:
http://www.sra.co.jp/people/soda/scalability/devzeromod.gif
This is a result with a patch attached at the end of this mail.

I also tried the following case to be sure:
http://www.sra.co.jp/people/soda/scalability/anon.gif
... use MAP_ANON|MAP_PRIVATE mapping
http://www.sra.co.jp/people/soda/scalability/anonmod.gif
... use MAP_ANON|MAP_PRIVATE mapping
with similar patch at the end of this mail.
This shows same behavior with the /dev/zero benchmark.

The followings are the results of other benchmarks in the yamt's mail:
http://www.sra.co.jp/people/soda/scalability/file.gif
http://www.sra.co.jp/people/soda/scalability/walk.gif

For people who don't think that the mmap benchmark doesn't concern
with real applications, the following is a result of benchmark which
does many pthread_create() and pthreads_join():
http://www.sra.co.jp/people/soda/scalability/pthread.gif
As you see in this result, the rbtree patch helps a lot with
the case that a program creates more than 1,000 threads simultaneously.
Perhaps you think that's nonsense, but there are people who use
even 10,000 threads simultaneously. See the following web page,
for example:
http://www.kegel.com/c10k.html

BTW, the fork benchmark modified by Niels becomes slightly worse as
follows with the rbtree patch:
http://www.sra.co.jp/people/soda/scalability/forkbench-niels.gif
Because of the cost to maintain rbtree?
--
soda

Index: walk.c
===================================================================
--- walk.c Mon Aug 25 08:06:43 2003
+++ walk.c Wed Oct 29 09:48:12 2003
@@ -22,7 +22,6 @@
void *vp;
int i, j;
size_t mapsz = PAGE_SIZE * PPM;
- int prot = PROT_READ;
int nmap = NMAP;
int fd;
int nloop = NLOOP;
@@ -55,7 +54,9 @@
void *map[nmap];

for (i = 0; i < nmap; i++) {
- vp = mmap(NULL, mapsz, prot, MAP_FILE|MAP_SHARED, fd, 0);
+ vp = mmap(NULL, mapsz,
+ (i & 1) ? PROT_READ : PROT_READ|PROT_WRITE,
+ MAP_FILE|MAP_SHARED, fd, 0);
if (vp == MAP_FAILED)
err(1, "mmap");
map[i] = vp;
Bang Jun-Young
2003-10-30 04:09:02 UTC
Permalink
On Thu, Oct 30, 2003 at 12:52:29AM +0900, Noriyuki Soda wrote:
> I measured the mmap benchmarks in http://bulk.fefe.de/scalability/
> with the vm map rbtree patch ported from OpenBSD by yamt.
>
> As you suppose, the rbtree patch made the results much better as follows:
> the mmap benchmark results:
> http://www.sra.co.jp/people/soda/scalability/mmap.gif
> http://www.sra.co.jp/people/soda/scalability/mmap.zoom.gif
> http://www.sra.co.jp/people/soda/scalability/mmaptouch.gif
>
> mmap()ing many small files:
> http://www.sra.co.jp/people/soda/scalability/open.gif
> http://www.sra.co.jp/people/soda/scalability/manymap-mmap.gif
> http://www.sra.co.jp/people/soda/scalability/manymap-mmap.zoom.gif
> http://www.sra.co.jp/people/soda/scalability/manymap-touch.gif
>
> Hardware:
> Celeron 300MHz
> Software:
> NetBSD-splay : NetBSD-1.6ZE on Oct 26, 2003
> NetBSD-splay+rbtree : NetBSD-1.6ZE on Oct 26, 2003 + rbtree patch
> Linux-2.6.0-test8 : RedHat9 w/ linux-2.6.0-test8 kernel
>
> But as shown in the following mail by yamt,
> http://mail-index.netbsd.org/tech-kern/2003/08/26/0000.html
> The rbtree one is worse than stock NetBSD in the /dev/zero mapping
> benchmark as follows:
> http://www.sra.co.jp/people/soda/scalability/devzero.gif
> This is because vm map entries for /dev/zero are merged into one entry
> in this case.
> So, the result on stock NetBSD may become worse even with /dev/zero
> mapping, if there are some reasons which make the entries unable to be
> merged, e.g. if there are gaps or other type mappings between the
> /dev/zero mappings.
> The following benchmark shows the fact:
> http://www.sra.co.jp/people/soda/scalability/devzeromod.gif
> This is a result with a patch attached at the end of this mail.
>
> I also tried the following case to be sure:
> http://www.sra.co.jp/people/soda/scalability/anon.gif
> ... use MAP_ANON|MAP_PRIVATE mapping
> http://www.sra.co.jp/people/soda/scalability/anonmod.gif
> ... use MAP_ANON|MAP_PRIVATE mapping
> with similar patch at the end of this mail.
> This shows same behavior with the /dev/zero benchmark.
>
> The followings are the results of other benchmarks in the yamt's mail:
> http://www.sra.co.jp/people/soda/scalability/file.gif
> http://www.sra.co.jp/people/soda/scalability/walk.gif
>
> For people who don't think that the mmap benchmark doesn't concern
> with real applications, the following is a result of benchmark which
> does many pthread_create() and pthreads_join():
> http://www.sra.co.jp/people/soda/scalability/pthread.gif
> As you see in this result, the rbtree patch helps a lot with
> the case that a program creates more than 1,000 threads simultaneously.
> Perhaps you think that's nonsense, but there are people who use
> even 10,000 threads simultaneously. See the following web page,
> for example:
> http://www.kegel.com/c10k.html
>
> BTW, the fork benchmark modified by Niels becomes slightly worse as
> follows with the rbtree patch:
> http://www.sra.co.jp/people/soda/scalability/forkbench-niels.gif
> Because of the cost to maintain rbtree?

These results are really great. Let's get rbtree in the tree first
and fix /dev/zero problem later. :-)

Jun-Young

--
Bang Jun-Young <***@NetBSD.org>
Chris Gilbert
2003-10-31 10:10:34 UTC
Permalink
On Thu, 30 Oct 2003 00:52:29 +0900 (JST)
Noriyuki Soda <***@sra.co.jp> wrote:

> I measured the mmap benchmarks in http://bulk.fefe.de/scalability/
> with the vm map rbtree patch ported from OpenBSD by yamt.
>
> As you suppose, the rbtree patch made the results much better as
> follows:
> the mmap benchmark results:
> http://www.sra.co.jp/people/soda/scalability/mmap.gif
> http://www.sra.co.jp/people/soda/scalability/mmap.zoom.gif
> http://www.sra.co.jp/people/soda/scalability/mmaptouch.gif
>
> mmap()ing many small files:
> http://www.sra.co.jp/people/soda/scalability/open.gif

Have you had a chance to look into why the open results? It looks quite
bad, or are the very frequent spikes just single values? Would the
fdalloc help here?

Chris
Noriyuki Soda
2003-11-03 13:53:54 UTC
Permalink
>>>>> On Fri, 31 Oct 2003 10:10:34 +0000,
Chris Gilbert <***@dokein.co.uk> said:

>> I measured the mmap benchmarks in http://bulk.fefe.de/scalability/
>> with the vm map rbtree patch ported from OpenBSD by yamt.
:
>> mmap()ing many small files:
>> http://www.sra.co.jp/people/soda/scalability/open.gif

> Have you had a chance to look into why the open results? It looks quite
> bad,

Perhaps you are misunderstanding that the rbtree patch makes the
result bad? Acutally, the rbtree patch makes the result much better.

The open.gif graph is drawn by the following order:
Linux-2.6.6-test8, NetBSD-splay, NetBSD-splay+rbtree

The following one is the graph which is drawn by opposite order:
http://www.sra.co.jp/people/soda/scalability/open.2.gif

In other words, The reason why the NetBSD-splay+rbtree looks bad is
because it's drawn over the NetBSD-splay graph.
And as you see, the rbtree patch makes non-spike case much better.

Average height of the spike cases is followings:
NetBSD-splay: 3.4ms
NetBSD-splay+rbtree: 3.0ms

> or are the very frequent spikes just single values?
> Would the fdalloc help here?

The spikes happen exactly once per 100 openings on NetBSD with or
without the rbtree patch. Because the average height of the spike is
3 microseconds, I guess NetBSD actually accesses harddisk to read
metadata in the spike cases.
I don't know why Linux doesn't show the frequent spikes. Perhaps
because of larger namei cache?
--
soda
c***@dokein.co.uk
2003-11-03 15:19:31 UTC
Permalink
>>>>>> On Fri, 31 Oct 2003 10:10:34 +0000,
> Chris Gilbert <***@dokein.co.uk> said:
>
>>> I measured the mmap benchmarks in http://bulk.fefe.de/scalability/
>>> with the vm map rbtree patch ported from OpenBSD by yamt.
> :
>>> mmap()ing many small files:
>>> http://www.sra.co.jp/people/soda/scalability/open.gif
>
>> Have you had a chance to look into why the open results? It looks quite
>> bad,
>
> Perhaps you are misunderstanding that the rbtree patch makes the
> result bad? Acutally, the rbtree patch makes the result much better.
>
> The open.gif graph is drawn by the following order:
> Linux-2.6.6-test8, NetBSD-splay, NetBSD-splay+rbtree
>
> The following one is the graph which is drawn by opposite order:
> http://www.sra.co.jp/people/soda/scalability/open.2.gif
>
> In other words, The reason why the NetBSD-splay+rbtree looks bad is
> because it's drawn over the NetBSD-splay graph.
> And as you see, the rbtree patch makes non-spike case much better.

Ah, right, that makes more sense. Sorry, it just made it look much worse
being the last drawn. I didn't realise that they both suffered from the
same issue.

Thanks for clarifing this,
Chris
c***@dokein.co.uk
2003-11-03 15:19:53 UTC
Permalink
>>>>>> On Fri, 31 Oct 2003 10:10:34 +0000,
> Chris Gilbert <***@dokein.co.uk> said:
>
>>> I measured the mmap benchmarks in http://bulk.fefe.de/scalability/
>>> with the vm map rbtree patch ported from OpenBSD by yamt.
> :
>>> mmap()ing many small files:
>>> http://www.sra.co.jp/people/soda/scalability/open.gif
>
>> Have you had a chance to look into why the open results? It looks quite
>> bad,
>
> Perhaps you are misunderstanding that the rbtree patch makes the
> result bad? Acutally, the rbtree patch makes the result much better.
>
> The open.gif graph is drawn by the following order:
> Linux-2.6.6-test8, NetBSD-splay, NetBSD-splay+rbtree
>
> The following one is the graph which is drawn by opposite order:
> http://www.sra.co.jp/people/soda/scalability/open.2.gif
>
> In other words, The reason why the NetBSD-splay+rbtree looks bad is
> because it's drawn over the NetBSD-splay graph.
> And as you see, the rbtree patch makes non-spike case much better.

Ah, right, that makes more sense. Sorry, it just made it look much worse
being the last drawn. I didn't realise that they both suffered from the
same issue.

Thanks for clarifing this,
Chris
Loading...