Discussion:
NetBSD 2 vs the rest with MySQL
Joel Macklow
2005-02-09 19:38:36 UTC
Anyone seen this?

http://software.newsforge.com/article.pl?sid=04/12/27/1243207&from=rss

We do really well with the select-key test on a single CPU. Any
comments regarding the other tests from those who know? This is the kind
of info the advocacy group needs - as long as we look good in it ;)
Frank van der Linden
2005-02-09 19:48:16 UTC
Post by Joel Macklow
We do really well with the select-key test on a single CPU. Any
comments regarding the other tests from those who know? This is the kind
of info the advocacy group needs - as long as we look good in it ;)
Hm.. is this a threaded test? It looks like it. If so, the lack of
PTHREAD_CONCURRENCY > 1 explains the crappy 2-CPU results.

- Frank
Jason Thorpe
2005-02-09 21:20:38 UTC
Post by Frank van der Linden
Post by Joel Macklow
We do really well with the select-key test on a single CPU. Any
comments regarding the other tests from those who know? This is the kind
of info the advocacy group needs - as long as we look good in it ;)
Hm.. is this a threaded test? It looks like it. If so, the lack of
PTHREAD_CONCURRENCY > 1 explains the crappy 2-CPU results.
I would definitely suggest to the author of the article to increase
PTHREAD_CONCURRENCY for his tests and run them again.

-- thorpej
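
For anyone repeating the test, a minimal sketch of what raising the concurrency might look like. It assumes the NetBSD 2.0 libpthread reads the PTHREAD_CONCURRENCY environment variable when a process starts. The run_with_concurrency helper is a made-up name for illustration, and the value 2 simply matches the dual-CPU box in the article.

```shell
#!/bin/sh
# Run a command with PTHREAD_CONCURRENCY set only for that command.
# run_with_concurrency is a hypothetical helper, not part of NetBSD.
run_with_concurrency() {
    n="$1"; shift
    # env(1) places the variable in the child's environment only,
    # so the calling shell's environment is left untouched.
    env PTHREAD_CONCURRENCY="$n" "$@"
}

# Demonstration: the child process sees the value we asked for.
run_with_concurrency 2 sh -c 'echo "PTHREAD_CONCURRENCY=$PTHREAD_CONCURRENCY"'
```

The same wrapper could presumably be put in front of whatever script starts mysqld for the benchmark; which script that is depends on the installation.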
Frank van der Linden
2005-02-09 21:45:22 UTC
Post by Jason Thorpe
Post by Frank van der Linden
Hm.. is this a threaded test? It looks like it. If so, the lack of
PTHREAD_CONCURRENCY > 1 explains the crappy 2-CPU results.
I would definitely suggest to the author of the article to increase
PTHREAD_CONCURRENCY for his tests and run them again.
Yeah, I dropped him a note. I'm curious how well it will perform
(if stable). I'm glad to see that we kick ass in the single-CPU
case.

- Frank
Hubert Feyrer
2005-02-09 21:46:30 UTC
Post by Frank van der Linden
(if stable). I'm glad to see that we kick ass in the single-CPU
case.
Um, have you looked at the 10M Row test?
http://www.newsforge.com/blob.pl?id=82311a9e7896a841032c395f270d6a0f
Why do we not kick ass there at all?


- Hubert
--
NetBSD - Free AND Open! (And of course secure, portable, yadda yadda)
Frank van der Linden
2005-02-09 21:58:33 UTC
Post by Hubert Feyrer
Um, have you looked at the 10M Row test?
http://www.newsforge.com/blob.pl?id=82311a9e7896a841032c395f270d6a0f
Why do we not kick ass there at all?
I was talking about the first test, and the 1- to 2-CPU comparison,
which is interesting to look at if PTHREAD_CONCURRENCY=2 should work,
since we do so well in the 1-CPU case.

The I/O problem is a whole different matter, I don't know what's going
on there.

- Frank
Andy Ruhl
2005-02-09 22:16:15 UTC
Post by Frank van der Linden
The I/O problem is a whole different matter, I don't know what's going
on there.
Not sure if this is related...

I work on a database product and the forcedirectio option on Solaris
is very well known by us to improve performance.

The writeup sort of shows a parallel between Solaris 10 and NetBSD in
performance with the large row... That is until they set forcedirectio
on Solaris.

Maybe this is more specifically a filesystem problem, not a general I/O problem?

Might be interesting to see this done again with various different
parameters on the filesystem, or just a different filesystem.

Andy
Rui Paulo
2005-02-10 00:19:47 UTC
Post by Frank van der Linden
The I/O problem is a whole different matter, I don't know what's going
on there.
Maybe NEW_BUFQ_STRATEGY helps ?

"The FFS2 file system was used for all of the NetBSD 2.0 partitions,
with soft updates enabled. I built two separate kernels, one for
single-CPU and one for dual-CPU. They were based on GENERIC and
included the process size increases I mentioned above."

The author doesn't mention if it was enabled or not.
--
Rui Paulo <***@netbsd-pt.org> http://www.netbsd-pt.org/users/rpaulo/
Andy Ruhl
2005-02-10 05:18:54 UTC
On Thu, 10 Feb 2005 00:19:47 +0000 (UTC), Rui Paulo
Post by Rui Paulo
Post by Frank van der Linden
The I/O problem is a whole different matter, I don't know what's going
on there.
Maybe NEW_BUFQ_STRATEGY helps ?
"The FFS2 file system was used for all of the NetBSD 2.0 partitions,
with soft updates enabled. I built two separate kernels, one for
single-CPU and one for dual-CPU. They were based on GENERIC and
included the process size increases I mentioned above."
The author doesn't mention if it was enabled or not.
If I wanted to set up some simulation of the problem in a simple way,
is there any particular way I could do this? I want to test a few
things. If I can show that some small write is proportionally a lot
faster than a larger one, that would be all it would take.

Is it as easy as doing something like this:

dd if=/dev/zero of=somefile bs=1024k count=1

versus

dd if=/dev/zero of=somefile bs=1024k count=10

?

Not sure if block sizes have anything to do with this...

If someone gives me something to chew on, I can test this on 2.0 on
i386 or amd64, both on the same amd64 machine. It's only 1 processor
but I don't think that makes any difference here. Figured I'd try
various filesystem options, or even a different filesystem.

Andy
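
A minimal sketch of how the two dd runs above could be timed side by side. The time_write helper and the ddtest.tmp file name are made up for illustration. The sync call is an assumption about what is worth measuring: without it, dd may return before the data has reached the disk.

```shell
#!/bin/sh
# Time dd writes of different sizes and report the elapsed seconds.
# time_write is a hypothetical helper; the file name is arbitrary.
time_write() {
    count="$1"
    file="$2"
    start=$(date +%s)
    dd if=/dev/zero of="$file" bs=1024k count="$count" 2>/dev/null
    sync    # flush dirty buffers so the cache does not hide the cost
    end=$(date +%s)
    echo "$count MB: $((end - start)) s"
}

# The two sizes from the post; larger counts can be added to the list.
for c in 1 10; do
    time_write "$c" ./ddtest.tmp
done
rm -f ./ddtest.tmp
```

Note that date +%s only has one-second resolution, so on a fast disk both runs may report 0 s. Much larger counts would be needed before the ratio between the runs says anything.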
Jeff Rizzo
2005-02-09 22:10:42 UTC
Post by Hubert Feyrer
Post by Frank van der Linden
(if stable). I'm glad to see that we kick ass in the single-CPU
case.
Um, have you looked at the 10M Row test?
http://www.newsforge.com/blob.pl?id=82311a9e7896a841032c395f270d6a0f
Why do we not kick ass there at all?
- Hubert
Well, I'm not really sure, since this is pretty far outside my area of
expertise, but since he claims the Solaris numbers on the 10M row test
were also horrid until turning off some FS caching, I wonder if tuning
the VM system might help some. I just sent the author a link to Arto
Selonen's VM tuning page
(http://www.selonen.org/arto/netbsd/vm_tune.html) in hopes that he might
be able to squeeze some better numbers out of it...

Of course, someone who knows better than I should feel free to let me
know if I'm full of it...

+j
Pavel Cahyna
2005-02-09 22:53:56 UTC
Post by Jeff Rizzo
Post by Hubert Feyrer
Post by Frank van der Linden
(if stable). I'm glad to see that we kick ass in the single-CPU
case.
Um, have you looked at the 10M Row test?
http://www.newsforge.com/blob.pl?id=82311a9e7896a841032c395f270d6a0f
Why do we not kick ass there at all?
- Hubert
Well, I'm not really sure, since this is pretty far outside my area of
expertise, but since he claims the Solaris numbers on the 10M row test
were also horrid until turning off some FS caching, I wonder if tuning
the VM system might help some. I just sent the author a link to Arto
Selonen's VM tuning page
(http://www.selonen.org/arto/netbsd/vm_tune.html) in hopes that he might
be able to squeeze some better numbers out of it...
I don't think that's the reason. He explicitly says that NetBSD wasn't
swapping at all, so turning the caching off wouldn't help IMHO.

But, doesn't NetBSD need to read data to cache when they are rewritten,
since UBC was introduced? Could this explain that behaviour?

Bye Pavel
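
A rough sketch of one way to poke at the rewrite question. It assumes that overwriting an existing file in place (dd with conv=notrunc) is a fair stand-in for what the database does to its data files, which may well not hold. The rewrite_test helper and the file name are made up for illustration.

```shell
#!/bin/sh
# Time an in-place rewrite of a file that already exists on disk.
# rewrite_test is a hypothetical helper; sizes and names are arbitrary.
rewrite_test() {
    file="$1"
    mb="$2"
    # First pass: create the file and push it out to disk.
    dd if=/dev/zero of="$file" bs=1024k count="$mb" 2>/dev/null
    sync
    t0=$(date +%s)
    # Second pass: conv=notrunc stops dd from truncating the file, so
    # the existing data is overwritten rather than the file recreated.
    dd if=/dev/zero of="$file" bs=1024k count="$mb" conv=notrunc 2>/dev/null
    sync
    t1=$(date +%s)
    echo "rewrite of $mb MB took $((t1 - t0)) s"
}

rewrite_test ./rewrite.tmp 10
rm -f ./rewrite.tmp
```

For the comparison to mean anything the file would have to be larger than RAM, or the cache otherwise emptied between the two passes, so that the old contents are not still sitting in memory.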
SODA Noriyuki
2005-02-10 23:49:50 UTC
Post by Pavel Cahyna
Post by Hubert Feyrer
On Wed, 09 Feb 2005 23:53:56 +0100,
Um, have you looked at the 10M Row test?
http://www.newsforge.com/blob.pl?id=82311a9e7896a841032c395f270d6a0f
Why do we not kick ass there at all?
But, doesn't NetBSD need to read data to cache when they are rewritten,
since UBC was introduced? Could this explain that behaviour?
This sounds right, indeed.

Isn't anyone working to implement chuq's idea?

http://mail-index.netbsd.org/tech-kern/2003/05/27/0000.html
--
soda
Chuck Silvers
2005-02-15 17:34:56 UTC
Post by SODA Noriyuki
Post by Pavel Cahyna
Post by Hubert Feyrer
On Wed, 09 Feb 2005 23:53:56 +0100,
Um, have you looked at the 10M Row test?
http://www.newsforge.com/blob.pl?id=82311a9e7896a841032c395f270d6a0f
Why do we not kick ass there at all?
But, doesn't NetBSD need to read data to cache when they are rewritten,
since UBC was introduced? Could this explain that behaviour?
This sounds right, indeed.
Isn't anyone working to implement chuq's idea?
http://mail-index.netbsd.org/tech-kern/2003/05/27/0000.html
well, the reason I haven't done it myself yet is that it'll be a fair bit
of work to implement. but it occurs to me that there's an easier way to
achieve most of the same effect: use UVM loaning instead of uiomove()
to transfer the data in write(). this would be a bit more restricted
than what I described earlier, since it would require that the address
of the memory being written to the file be page aligned (whereas there's
no restriction on the memory address in the previous scheme), but I bet
that many applications that do multi-page write()s will happen to be using
page-aligned memory addresses as well. I verified that this is the case
for mysql, at least in the default configuration.

the past few days I've been finishing my code for loaning-for-read()
that I've been infrequently fiddling with for probably a couple years now,
I'll post about it on tech-kern shortly. loaning for write() shouldn't
be much harder than for read().

I suspect that in addition to loaning-for-write() we would also need to
improve the flush-behind and page reuse policies some more, but we'll
find out if we go down that path.

(of course, what databases that do their own caching really want is
unbuffered I/O, but mysql appears to rely on the OS file cache since
the mysqld process is only 60MB on this machine with 3GB of RAM.
maybe that's just how it's configured by default, I don't know much
about mysql. unbuffered I/O would also be a fair bit of work to
implement, but I'd like to see that in netbsd eventually.)

-Chuck
SODA Noriyuki
2005-02-15 19:18:41 UTC
Post by Chuck Silvers
but it occurs to me that there's an easier way to
achieve most of the same effect: use UVM loaning instead of uiomove()
to transfer the data in write().
but I bet
that many applications that do multi-page write()s will happen to be using
page-aligned memory addresses as well. I verified that this is the case
for mysql, at least in the default configuration.
the past few days I've been finishing my code for loaning-for-read()
that I've been infrequently fiddling with for probably a couple years now,
I'll post about it on tech-kern shortly. loaning for write() shouldn't
be much harder than for read().
Mmmm, great.
Post by Chuck Silvers
(of course, what databases that do their own caching really want is
unbuffered I/O, but mysql appears to rely on the OS file cache since
the mysqld process is only 60MB on this machine with 3GB of RAM.
maybe that's just how it's configured by default,
The MySQL configuration that the author of the report used is
described in the previous article:
http://software.newsforge.com/software/04/12/27/1238216.shtml?tid=72&tid=29

i.e.
innodb_buffer_pool_size=256M
innodb_log_file_size=128M
innodb_log_buffer_size=8M
innodb_flush_log_at_trx_commit=1

Since the machine only has 512MB RAM, more than half of the memory
(256MB+8MB) is allocated to the database cache.
Post by Chuck Silvers
unbuffered I/O would also be a fair bit of work to
implement, but I'd like to see that in netbsd eventually.)
Yeah, direct I/O must be better for database applications.
--
soda
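
The "more than half" figure can be checked with a line or two of shell arithmetic. The numbers are the ones quoted above; the variable names are made up for illustration.

```shell
#!/bin/sh
# Share of RAM taken by the InnoDB buffer pool plus the log buffer,
# using the figures quoted in the post (all values in MB).
ram=512
buffer_pool=256
log_buffer=8
db_cache=$((buffer_pool + log_buffer))
# Integer division, so the result is rounded down.
percent=$((db_cache * 100 / ram))
echo "$db_cache MB of $ram MB = $percent%"
```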
Chuck Silvers
2005-02-16 02:32:10 UTC
Post by SODA Noriyuki
Post by Chuck Silvers
(of course, what databases that do their own caching really want is
unbuffered I/O, but mysql appears to rely on the OS file cache since
the mysqld process is only 60MB on this machine with 3GB of RAM.
maybe that's just how it's configured by default,
The MySQL configuration that the author of the report used is
http://software.newsforge.com/software/04/12/27/1238216.shtml?tid=72&tid=29
i.e.
innodb_buffer_pool_size=256M
innodb_log_file_size=128M
innodb_log_buffer_size=8M
innodb_flush_log_at_trx_commit=1
Since the machine only has 512MB RAM, more than half of the memory
(256MB+8MB) is allocated to the database cache.
hmm, I had applied those settings without paying attention to what they were.
even with those, the mysqld process never got above 60-odd MB in size.
it's not using any sysv shared memory, so where is the cached data going?
I'm sure I'm just being dense here. :-)

-Chuck
Geert Hendrickx
2005-04-03 08:51:47 UTC
Post by Frank van der Linden
Post by Jason Thorpe
Post by Frank van der Linden
Hm.. is this a threaded test? It looks like it. If so, the lack of
PTHREAD_CONCURRENCY > 1 explains the crappy 2-CPU results.
I would definitely suggest to the author of the article to increase
PTHREAD_CONCURRENCY for his tests and run them again.
Yeah, I dropped him a note. I'm curious how well it will perform
(if stable). I'm glad to see that we kick ass in the single-CPU
case.
Have these benchmarks been re-run yet? The author promised to do so, and
post the results, but I haven't heard anything about it anymore. Which
is a pity.

GH
--
:wq
Chuck Silvers
2005-04-10 15:22:35 UTC
Post by Geert Hendrickx
Post by Frank van der Linden
Post by Jason Thorpe
Post by Frank van der Linden
Hm.. is this a threaded test? It looks like it. If so, the lack of
PTHREAD_CONCURRENCY > 1 explains the crappy 2-CPU results.
I would definitely suggest to the author of the article to increase
PTHREAD_CONCURRENCY for his tests and run them again.
Yeah, I dropped him a note. I'm curious how well it will perform
(if stable). I'm glad to see that we kick ass in the single-CPU
case.
Have these benchmarks been re-run yet? The author promised to do so, and
post the results, but I haven't heard anything about it anymore. Which
is a pity.
tony has posted an addendum to his benchmark articles on his blog:

http://vegan.net/tony/blog/index.php?Type=Article&ArticleID=2

-Chuck
James Chacon
2005-02-10 07:28:49 UTC
Post by Jason Thorpe
Post by Frank van der Linden
Post by Joel Macklow
We do really well with the select-key test on a single CPU. Any
comments regarding the other tests from those who know? This is the kind
of info the advocacy group needs - as long as we look good in it ;)
Hm.. is this a threaded test? It looks like it. If so, the lack of
PTHREAD_CONCURRENCY > 1 explains the crappy 2-CPU results.
I would definitely suggest to the author of the article to increase
PTHREAD_CONCURRENCY for his tests and run them again.
Is there any reason the current system doesn't auto adapt to # cpus for
default concurrency? Seems very non-intuitive to have to do this manually..

James
Mike M. Volokhov
2005-02-10 08:15:45 UTC
On Wed, 9 Feb 2005 20:48:16 +0100
Post by Frank van der Linden
Post by Joel Macklow
We do really well with the select-key test on a single CPU. Any
comments regarding the other tests from those who know? This is the kind
of info the advocacy group needs - as long as we look good in it ;)
Hm.. is this a threaded test? It looks like it. If so, the lack of
PTHREAD_CONCURRENCY > 1 explains the crappy 2-CPU results.
Just wondering, why not set PTHREAD_CONCURRENCY to number of CPUs by
default?

--
Mishka.
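
Until the default changes, something along these lines could be done by hand in a startup script. This is only a sketch: it assumes the hw.ncpu sysctl reports the CPU count (as it does on the BSDs), and falls back to getconf and then to 1 on systems where it does not.

```shell
#!/bin/sh
# Pick a concurrency value from the CPU count, falling back to 1.
# hw.ncpu is the BSD sysctl name; getconf is tried for other systems.
ncpu=$(sysctl -n hw.ncpu 2>/dev/null) ||
    ncpu=$(getconf _NPROCESSORS_ONLN 2>/dev/null) ||
    ncpu=1
# Guard against empty or non-numeric output from either command.
case "$ncpu" in
    ''|*[!0-9]*) ncpu=1 ;;
esac
PTHREAD_CONCURRENCY="$ncpu"
export PTHREAD_CONCURRENCY
echo "PTHREAD_CONCURRENCY=$PTHREAD_CONCURRENCY"
```

Whether a value above 1 is actually stable on 2.0 is exactly the open question in this thread, so this is for experimenting rather than for production.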
Hubert Feyrer
2005-02-10 12:21:59 UTC
Post by Mike M. Volokhov
Just wondering, why not set PTHREAD_CONCURRENCY to number of CPUs by
default?
IIRC there were (are?) problems with that.
But as they won't be found with the current default, that's useless, and
AFAIK that default was changed yesterday (in -current, no idea if it will
be pulled into 2.1).


- Hubert
--
NetBSD - Free AND Open! (And of course secure, portable, yadda yadda)
Jeff Rizzo
2005-02-09 20:21:02 UTC
Post by Joel Macklow
Anyone seen this?
http://software.newsforge.com/article.pl?sid=04/12/27/1243207&from=rss
We do really well with the select-key test on a single CPU. Any
comments regarding the other tests from those who know? This is the
kind of info the advocacy group needs - as long as we look good in it ;)
I wonder if the crappy performance on the 10M row test would be affected
by the tuning of vm.* sysctls, especially since Solaris problems on the
same test were mitigated by forcedirectio...

Hm.

+j
Dmitri Nikulin
2005-04-10 17:28:45 UTC
I think he invalidated himself when he said:
"The other kernel only included the i686 definition, making it specific
to Pentium III processors or higher."

I can only assume that this is the kind of thing Hubert was talking
about when his blog mentioned "Aaah, the joy of people who have no idea
of what they're writing about still being bold enough to write about
things they have no idea about."

NetBSD's performance is anomalous, since I have never found anywhere any
evidence that it is measurably slower than Linux: in fact on some of my
machines it runs circles around it. MySQL is a horrible benchmark
platform since it is highly Linux-centric. It's not even good software.
Postgres could have given a more balanced result.

I threw www/apache2 + lang/php5 + databases/mysql4-server +
databases/php-mysql onto a NetBSD 3.0-beta machine to run Invision Power
Boards (not my idea, just my server). The machine also does a thorough
pf and IPsec setup, and is an NFS server. It happens to be a P3 1GHz
laptop. Everyone agrees it's blazing fast. Whatever "poor performance"
NetBSD has in this kind of setup must be restricted to their tests. Out
of curiosity: did the investigation into increasing (quadrupling..)
performance for the (IIRC) 10M Rows benchmark actually include testing
Linux on the same machine and seeing if the performance difference was
just as the public benchmark showed? I have a lot of trouble believing
the findings without evidence from the other side. But I admit my time
spent browsing the mailing list archives isn't that extensive.

Worse shame is that benchmarks like this are often taken as gospel and
used to judge the entire merits of an operating system. People don't
seem to care about quality, security, stability, or indeed anything you
can't measure with a few lines of someone's language of choice. I have
heard people say "Why would I use BSD for a server? Linux 2.6 is O(1)
everywhere!", even though the kinds of loads they would deal with would
never emerge into the zone where Linux' O(1) becomes faster than an O(n)
with a lower starting point (which is often the case). So not only are
they losing micro performance (and macro: has anyone here actually tried
RUNNING a real server on Linux? Nightmare) but they are throwing away
quality assurance and security. Their loss, but it's hard to deny that
public opinion also sways donors /and/ developers, possibly deducting
from the foundation's available resources. A little bad PR is all it takes.

It's good, on the other hand, that NetBSD is gaining recognition as a
competitor for FreeBSD's vaunted performance titles, but the real
corporate- and government-recognition giant is Linux. You know what I
heard on Slashdot? That some people are condemning NetBSD for moving to
replace Linux, for instance, implementing PAM (because, you know, only
Linux has PAM, and there are no real uses for it. Right?), having a
corporate-friendly logo, new numbering scheme, and so on. I only hope
potential future developers don't see it the same way. Even if it is so,
good, it's high time Linux' most popular advantage (basically, that it's
well known) got some real competition too.

Wow, long rant.

-Dmitri Nikulin
Hubert Feyrer
2005-04-10 21:10:12 UTC
Post by Dmitri Nikulin
"The other kernel only included the i686 definition, making it specific
to Pentium III processors or higher."
I can only assume that this is the kind of thing Hubert was talking
about when his blog mentioned "Aaah, the joy of people who have no idea
of what they're writing about still being bold enough to write about
things they have no idea about."
I'd appreciate it if you would not cite me completely out of context to
support your flames. I recommend you go back and read the article about
which I wrote what you cite.

FWIW, I think Tony definitely knows a fair bit about what he's writing,
and maybe you should start thinking about how NetBSD could make things easier
to do what he really wanted (add cpu-specific compiler flags).
Post by Dmitri Nikulin
Wow, long rant.
Indeed. If you want to start changing things for the better
-> http://www.netbsd.org/contrib/howto.html (patches preferred :)


- Hubert
--
NetBSD - Free AND Open! (And of course secure, portable, yadda yadda)
Dmitri Nikulin
2005-04-11 04:16:49 UTC
Post by Hubert Feyrer
Post by Dmitri Nikulin
"The other kernel only included the i686 definition, making it specific
to Pentium III processors or higher."
I can only assume that this is the kind of thing Hubert was talking
about when his blog mentioned "Aaah, the joy of people who have no idea
of what they're writing about still being bold enough to write about
things they have no idea about."
I'd appreciate it if you would not cite me completely out of context
to support your flames. I recommend you go back and read the article
about which I wrote what you cite.
Well sorry, and I actually did re-read the article just in case, but
figured if you could say that about someone making a 'hearing error'
(only thing I can call it: converting 'cryptographic' to 'cryptic
graphic' isn't really a typo), me saying the same thing about someone
not knowing processor base instructions isn't a far stretch.
Post by Hubert Feyrer
FWIW, I think Tony definitely knows a fair bit about what he's
writing, and maybe you should start thinking about how NetBSD could make
things easier to do what he really wanted (add cpu-specific compiler
flags).
makeoptions COPTS is pretty easy, if he wants to do it just for a
specific kernel compile. But evidently that kind of micro-optimization
didn't make a big enough difference for him. The only x86 processor that
has impressed me with how much it benefits from optimisation is the
Pentium M, and I doubt that's what he was using. That's just my personal
experience so I'm not making a big sweeping claim...
Post by Hubert Feyrer
Indeed. If you want to start changing things for the better
-> http://www.netbsd.org/contrib/howto.html (patches preferred :)
I wouldn't know the first (useful) thing about kernel performance,
definitely nothing that hasn't been implemented before better than I
could. I'm predominantly a user-land programmer. The only kernel patch I
have ever submitted was a compile error fix for FreeBSD 5-current
shortly before 5.3. The best thing I could think of myself doing for
NetBSD right now is writing an alternative ftp-proxy built directly on
kqueue and independent of inetd, for those who want a maximally
efficient ftp-proxy for a dedicated gateway. Most would say the small
benefits aren't worth the cvs import, though, even if it's a feature
addition to the existing ftp-proxy. But if you think otherwise I could
take a look at it anyway.

But what can I say? Maybe it's time to start looking at how Linux does
things. It appears to be an industry leader in micro performance. It's
not like Linux hasn't copied the BSDs earlier in history. And if it can
be done without sacrificing cleanliness, correctness and stability (and
of course portability), it won't interfere with NetBSD's goals. But we
might not get far just poking through everything in the kernel and
sysctl until we find something that only might be part of the problem
(but then crashes when someone else tries it?).
Post by Hubert Feyrer
- Hubert
-Dmitri
