Discussion:
outsmarted - disk transfer rates
Tyler Retzlaff
2006-02-19 23:16:19 UTC
I've been trying to understand more about disk performance on
NetBSD, so recently I began running some naive tests using dd, but
the results have me confused; perhaps someone can explain.

The details of the hardware used are at the tail of this email.

My first test involved reading from one raw disk and writing to
another raw disk: a simple dd if=/dev/rwd0d of=/dev/rwd1d bs=? where
I tried block sizes ranging from 32k -> 512k. On average, every
bs >= 64k yielded about 46MB/s, reported both by iostat (for both
disks) and by dd's output on exit. I was later told that the >= 64k
threshold probably has to do with MAXPHYS, and this coincides with
iostat's output.
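
Roughly, the loop looked like this (just a sketch; the count here is
an illustrative limit to keep runs short):

  for bs in 32k 64k 128k 256k 512k; do
    echo "bs=$bs"
    # count is illustrative only; drop it to copy the whole disk
    dd if=/dev/rwd0d of=/dev/rwd1d bs=$bs count=16384
  done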

Prior to this I had also done some read tests on the same disk,
again a simple dd if=/dev/rwd0d of=/dev/null bs=?; for those I saw
rates of around 64MB/s. So I think I can say that my rwd0 can be
read from at that rate.

After doing these tests with just the raw devices I decided to do
some raw -> FFS file copies, so I created and newfs'd a partition
with a bsize of 32k (/dev/wd1a). I mounted this partition with no
options (so just mount /dev/wd1a /wd1a) at /wd1a and tried a
raw -> file dd, expecting it to be slower than my first test;
surprisingly it was not, hence my confusion. A dd if=/dev/rwd0d
of=/wd1a/dumpfile bs=?, where again I tried varying block sizes, got
me a rate of about 64MB/s reported by iostat (for both disks), which
is faster than the raw -> raw transfer.

So what is outsmarting my test? Is it that in the FFS case the
copied data isn't actually being sync()'d? Or am I being defeated
by caching somehow? It's nice that it appears to write as fast as I
can read, but if I do large amounts of sequential copying (i.e.
constantly running for days) I want to know what kind of bottlenecks
are going to develop.
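
One thing I could try to take the cache out of the measurement (just
a sketch with standard tools; the count is an illustrative size) is
to time the dd together with a final sync so the flush is included:

  # time the copy plus the flush to disk; count is illustrative only
  time sh -c 'dd if=/dev/rwd0d of=/wd1a/dumpfile bs=64k count=32768 && sync'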

Any gurus have some advice?


wd0 at atabus1 drive 0: <ST3250823AS>
wd0: drive supports 16-sector PIO transfers, LBA48 addressing
wd0: 232 GB, 484521 cyl, 16 head, 63 sec, 512 bytes/sect x 488397168
sectors
wd0: 32-bit data port
wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133)
wd0(piixide1:0:0): using PIO mode 4, Ultra-DMA mode 6 (Ultra/133)
(using DMA)

wd1 at atabus2 drive 0: <ST3250823AS>
wd1: drive supports 16-sector PIO transfers, LBA48 addressing
wd1: 232 GB, 484521 cyl, 16 head, 63 sec, 512 bytes/sect x 488397168
sectors
wd1: 32-bit data port
wd1: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133)
wd1(piixide1:1:0): using PIO mode 4, Ultra-DMA mode 6 (Ultra/133)
(using DMA)

atabus1 at piixide1 channel 0
atabus2 at piixide1 channel 1

piixide1 at pci0 dev 31 function 2
piixide1: Intel 82801EB Serial ATA Controller (rev. 0x02)
piixide1: bus-master DMA support present
piixide1: primary channel configured to native-PCI mode
piixide1: using ioapic0 pin 18 (irq 5) for native-PCI interrupt

Thanks

Tyler
Alfred Perlstein
2006-02-20 03:11:42 UTC
Post by Tyler Retzlaff
I've been trying to understand more about disk performance on
NetBSD, so recently I began running some naive tests using dd, but
the results have me confused; perhaps someone can explain.
...
Post by Tyler Retzlaff
So what is outsmarting my test? Is it that in the FFS case the
copied data isn't actually being sync()'d? Or am I being defeated
by caching somehow? It's nice that it appears to write as fast as I
can read, but if I do large amounts of sequential copying (i.e.
constantly running for days) I want to know what kind of bottlenecks
are going to develop.
Any gurus have some advice?
FFS will allow asynchronous writes for the most part, issuing
write-behind.

This will be faster than being stalled attempting to write to the
raw partition.
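
If you want to see the difference, one sketch (assuming the generic
"sync" mount option is available; mount point and file name taken
from your test) is to remount the file system synchronously and
repeat the dd:

  # force synchronous writes; expect this run to be much slower
  umount /wd1a
  mount -o sync /dev/wd1a /wd1a
  dd if=/dev/rwd0d of=/wd1a/dumpfile bs=64k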

--
- Alfred Perlstein
- CTO Okcupid.com / FreeBSD Hacker / All that jazz -
Jason Thorpe
2006-02-20 21:46:40 UTC
Post by Tyler Retzlaff
So what is outsmarting my test? Is it that in the FFS case the
copied data isn't actually being sync()'d? Or am I being defeated
by caching somehow? It's nice that it appears to write as fast as I
can read, but if I do large amounts of sequential copying (i.e.
constantly running for days) I want to know what kind of bottlenecks
are going to develop.
Well, one thing that puts the dd(1) test at a disadvantage is that
the I/O requests are issued in lock-step with the latency of
returning to userspace so that dd(1) can issue the next request. In
the kernel, I/O requests can be queued up and fired off
asynchronously, which allows for lower latency between I/O requests
(in the case of a "single-threaded" disk) or optimized completion of
the requests by the disk (as in the case of a disk with tagged
command queueing).
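
A crude way to get a little of that overlap from userland (a sketch,
not something measured here; the benefit is limited by the small
pipe buffer) is to split the copy into a reader and a writer
connected by a pipe, so the two dd processes run concurrently:

  # reader and writer overlap, decoupled by the pipe buffer
  dd if=/dev/rwd0d bs=64k | dd of=/dev/rwd1d bs=64k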

-- thorpej
