Discussion:
WAPL/RAIDframe performance problems
Edgar Fuß
2012-11-10 16:46:36 UTC
Permalink
So apart from the WAPL panic (which I'm currently unable to reproduce), I seem
to be facing two problems:

1. A certain svn update command is ridiculously slow on my to-be file server.
2. During the svn update, the machine partially locks up and fails to respond
to NFS requests.

There is little I feel I can do to analyze (2). I posted a bunch of crash(8)
traces, but that doesn't seem to have helped.

There seem to be two routes I can pursue in analyzing (1):
A. Given that the svn update is fast on a single-disc setup and slow on the file
server (which has a RAID, larger blocks and so on), find out what the significant
difference between the two setups is.
B. Given that the only unusual thing the svn update command does is create a
bunch of .lock files, issue a lot of stat()s and then unlink the .lock files,
find some simple commands that enable others to reproduce the problem.

Regarding (B), a simple command somewhat mimicking the troublesome svn update
seems to be (after mkdir'ing the 3000 dirs)
time sh -c 'for i in $(seq 1 3000); do touch $i/x; done; for i in $(seq 1 3000); do rm $i/x; done'
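
Spelled out completely, including the set-up, that is (the directory name is
just an example):

mkdir lockdirs && cd lockdirs
for i in $(seq 1 3000); do mkdir $i; done
time sh -c 'for i in $(seq 1 3000); do touch $i/x; done; for i in $(seq 1 3000); do rm $i/x; done'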

Regarding (A), there seems to be no single difference explaining the whole
performance degradation, so I tried to test intermediate steps.

We start with a single SATA disc on a recent 5.1 system.
With WAPL, the svn update takes 5s to 7s (depending on the FFS version and the
fsbsize) while the touch/rm dance takes 4s.
Disabling WAPL makes the svn update take 5s (i.e. better or no worse than with
WAPL enabled), while the touch/rm slows down to almost 14s.
Enabling soft updates, the svn update finishes in 1.25s, the touch/rm in 4s.
Write speed (dd) on the file system is 95MB/s (how these variants can be set up
and measured is sketched below the summary).
- So the initial data point is 5s for svn and 4s for the substitute for a
95MB/s file system write throughput.
- We also note that softdep outperforms WAPL by a factor of 4 for the svn
command and that plain FFS performs no worse than WAPL.
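
For reference, switching between these variants and measuring the throughput
boils down to the fstab options plus a plain dd; the device, mount point and
file name below are placeholders, not necessarily what I used:

# /etc/fstab variants, one at a time:
#   /dev/sd0e /test ffs rw          1 2   # plain FFS
#   /dev/sd0e /test ffs rw,log      1 2   # WAPBL
#   /dev/sd0e /test ffs rw,softdep  1 2   # soft updates (5.x only)
# rough sequential write throughput through the mounted file system:
dd if=/dev/zero of=/test/dd.tmp bs=1m count=1024 && rm /test/dd.tmp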

We now move to a plain mpt(4) 7200rpm SAS disc (HUS723030ALS640, if anyone
cares) on the 6.0 system.
Without WAPL, the svn update takes (on different FFS versions and fsbsizes)
5s to 7s. The touch/rm takes 9.5s to 19s.
With WAPL, svn takes 9s to 13s and touch/rm 8s to 9.5s.
No softdeps on 6.0 to try.
Write speed to fs is 85MB/s.
So we have:
- without WAPL, both "the real thing" and the substitute are roughly as fast
as on the SATA system (which has slightly higher fs write throughput).
- with WAPL, both commands are significantly slower than on the SATA box.

Now to a two-component Level 1 RAID on two of these discs. We chose an SpSU
value of 32 and a matching fsbsize of 16k (a configuration sketch for such a
set follows below).
The svn update takes 13s with WAPL and just under 6s without.
The touch..rm test takes 22s with WAPL and 19s without.
Write speed is at 18MB/s, read at 80MB/s
So on the RAID 1:
- Without WAPL, things are roughly as fast as on the plain disc.
- With WAPL, both svn and the test are slower than without (with the real thing
worse than the substitute)!
- Read speed is as expected, while writing is four times slower than I would
expect given the optimal (for writing) fsbsize equals stripe size relation.
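
For reference, a RAIDframe set-up along these lines (two components, 32 sectors,
i.e. 16k, per stripe unit, and a matching 16k fsbsize) would look roughly like
this; the component names, raid unit and serial number are placeholders, not my
actual ones:

# raid1.conf
START array
1 2 0
START disks
/dev/sd0e
/dev/sd1e
START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level
32 1 1 1
START queue
fifo 100

# first-time bring-up, then a file system whose block size matches the stripe:
raidctl -C raid1.conf raid0
raidctl -I 2012111001 raid0
raidctl -iv raid0
# disklabel raid0 as usual, then:
newfs -O 2 -b 16384 -f 2048 /dev/rraid0a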

Next a five-component Level 5 RAID. Again, an SpSU of 8 matches the fsbsize
of 16k.
Here, the update takes 56s with WAPL and just 31s without.
The touch..rm test takes 1:50 with WAPL and 0:56 without.
Write speed on the fs is 25MB/s, read speed 190MB/s
So on the RAID 5:
- Both the "real thing" and the substitute are significantly (about a factor of
five) slower than on the RAID 1, although the RAID's stripe size matches the
file system block size and we should have no RMW cycles (see the arithmetic
check below).
- OTOH, write and read speeds are faster than on RAID 1; still, writing is
much, much slower than reading (again, with an SpSU optimized for writing).
- Enabling WAPL _slows down_ things by a factor of two.
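
Just to double-check the no-RMW claim: with five components there are four data
stripe units per stripe, so

# data per full stripe: 4 data SUs * 8 sectors/SU * 512 bytes/sector
echo $((4 * 8 * 512))   # 16384, i.e. exactly one 16k file system block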

Simultaneously quadrupling both SpSU and fsbsize (to 32 and 64k) doesn't change
much about that.

But last, on a Level 5 RAID with 128SpSU and 64k fsbsize (i.e., one file system
block per stripe unit, not per stripe):
The svn update takes 36s without WAPL and just 1.7s with WAPL, but seven
seconds later, the discs are 100% busy for another 33s. So, in fact, it takes
38s until the operation really finishes.
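
The trailing activity is easy to watch from another terminal with something
like the following (the drive names are placeholders, and I'm quoting the
interval flag from memory):

iostat -w 1 sd0 sd1 sd2 sd3 sd4 raid0

or simply with systat iostat.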


Now, that's a lot of data (in fact, about one week of analysis).
Can anyone make sense out of it? Especially:
- Why is writing to the RAID so slow even with no RMWs?
- Why does WAPL slow down things?


If it weren't for the softdep-related panics I suffered on the old (active, 4.0)
file server, which no-one was able to fix, I would simply cry that I want my
softdeps back. As it is, I need help.
Brian Buhrow
2012-11-10 18:29:44 UTC
Permalink
Hello. Your tests and data have inspired me to think about and play
with different spsu sizes under raidframe because I too am experiencing
some possible performance issues with the mpt(4) driver. My testing has
led to some questions, and, possibly in your case Edgar, some avenues that
you can pursue. It may be that I'm behind you in this process and that
you've already thought of some of the issues I lay out below. I don't have
all the answers, but perhaps these ramblings will help move you forward.

1. I've discovered that the raid cards the mpt(4) driver controls are
complicated beasts with very large firmwares inside them which do a lot of
magic. I think it's possible, even if you have the raid card in jbod mode,
that you could be writing with sub-optimal block sizes to the raid card
itself, as opposed to raidframe at the OS level. For example, I have two
production boxes with large amounts of disk on raidframe. One attaches the
SATA disks to 4-port Promise cards controlled by the pdcsata(4) driver. The
other attaches the disks to 3Ware Escalade cards controlled by the twa(4)
driver. The Promise cards give roughly twice the throughput of the 3Ware cards
for the same work load and configuration. Is it
possible for you to set up a test with raid5 using a different sata or sas
interface card?
Also, I have a machine I'm qualifying right now using the mpt(4) with
the dual-port LSI Fusion 1030 card. I've found that for high work loads
this card just stops generating interrupts for no apparent reason. I
wonder if you're suffering from a similar, but not quite so fatal, kind of
issue with your raid controller? (just to reinforce how complex these cards
really are.)

2. As I was playing with different spsu sizes, it occurred to me that I
still might not be getting optimal performance through raidframe because
while the spsu size was right, the sector boundaries which defined the
stripe might not be the same as the blocks being read from and written to
on the filesystem itself. In other words, when a stripe size and
filesystem block size are calculated, I think it's important to count up
exactly where the active blocks that will be in use during the life of the
filesystem will land in your raid set and adjust the partition sizes
accordingly. For example, /usr/include/ufs/ffs/fs.h suggests that the
super block could be in one of 4 different places on your partition,
depending on what size your disk is, and what version of superblock you're
using. What I think this means is that if you want to arrange for
optimal performance of a filesystem on a raid5 set, you need to ensure that
the superblock for that filesystem starts on the beginning of a
stripe for the raid5 set. If you're using a BSD disklabel and an ffsV1
superblock, the superblock starts 8192 bytes into the partition. If it's
an ffsV2 superblock, then it could be at 65536 bytes from the beginning of
the partition. There is even a possibility the superblock could be at 256
kbytes from the beginning of a partition, though I'm not certain I know
when that's true. I want to reiterate that I'm no expert here, but unless
I'm not understanding things at all, I think this means that starting the
first data partition at sector 0 of the raid set is almost guaranteed to
get you rmw write cycles when writing to a filesystem placed in this manner
on a raid set, even if the spsu is correct (a quick way to check the offsets
is sketched below). This is why, I think, you're not seeing any appreciable
improvement in write speed on your raid 5 set and why raid5 is so much slower
than raid1. (raid5 will always be slower than raid1, but how much slower is,
I think, what's at issue here.)
I've not had a chance to play with all my ideas here in my testing,
so I can't say for sure if my theory makes a huge difference in practice, but
it's something I plan to try and it may be worth investigating in your
case, especially since it severely impacts your work efficiency. And, as I
said earlier, perhaps you've thought of all this and have made the necessary
adjustments to your partition tables and I'm preaching to the choir.
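
A rough way to check this is to look at where each partition starts relative
to the stripe. For your sets the data stripe is 32 sectors (16k) either way
(SpSU 32 on the RAID 1, 4 data columns times SpSU 8 on the RAID 5), so
something like the following (raid0 is just an example) shows whether the
offsets line up:

disklabel raid0 | awk '$1 ~ /^[a-p]:$/ { printf "%s offset %s (%% 32 = %d)\n", $1, $3, $3 % 32 }'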


I hope this helps you some, and gives you some paths forward. I know
how frustrating it can be to try and figure out why something isn't working
as expected.

-Brian
David Laight
2012-11-10 22:47:18 UTC
Permalink
Post by Brian Buhrow
For example, /usr/include/ufs/ffs/fs.h suggests that the
super block could be in one of 4 different places on your partition,
depending on what size your disk is, and what version of superblock you're
using.
From my memory of the ffs disk layout, fs block/sector numbers start from
the beginning of the partition and just avoid allocating the area
containing the superblock copies.
So the position of the superblocks (one exists in each cylinder group)
is rather irrelevant.

What is more likely to cause grief is 512 byte writes - since modern
disks have 4k physical sectors.
I think NetBSD tends to do single-sector writes for directory entries
and the journal - these will be somewhat suboptimal!

David
--
David Laight: ***@l8s.co.uk
Mouse
2012-11-10 23:33:44 UTC
Permalink
Post by Brian Buhrow
For example, /usr/include/ufs/ffs/fs.h suggests that the super block
could be in one of 4 different places on your partition, depending on
what size your disk is, and what version of superblock you're using.
Post by David Laight
From my memory of the ffs disk layout, fs block/sector numbers start
from the beginning of the partition and just avoid allocating the
area containing the superblock copies.
Yes.
Post by David Laight
So the position of the superblocks (one exists in each cylinder
group) is rather irrelevant.
Semi-. You want your blocks (and frags, if applicable) to be stripe-
and sector-aligned. However, the superblock is block-aligned, at least
in FFS versions I've worked with, so if it's aligned properly then all
blocks will be too.
Post by David Laight
What is more likely to cause grief is 512 byte writes - since modern
disks have 4k physical sectors.
Indeed. And some of them lie about it, too. :(

/~\ The ASCII Mouse
\ / Ribbon Campaign
X Against HTML ***@rodents-montreal.org
/ \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B
Edgar Fuß
2012-11-12 11:21:12 UTC
Permalink
Post by David Laight
What is more likely to cause grief is 512 byte writes - since modern
disks have 4k physical sectors.
My impression is that my discs do have 512 byte sectors. At least the SCSI
documentation says they can be formatted with 512, 520 or 528 byte sectors.

Even if they did, in fact, have 4k sectors, everything would be aligned in my
set-up, wouldn't it?
Post by David Laight
I think netbsd tends to do single sector writes for directory entries
and the journal - these will be somewhat suboptimal!
Oops?
I thought that anything going to a file system would be written in units of the
file system's block size, since the metadata cache is addressed in file system
blocks. Can someone please enlighten me?
Edgar Fuß
2012-11-12 11:10:57 UTC
Permalink
Post by Brian Buhrow
My testing has led to some questions
Could you post some figures? For instance, what's the file system throughput
to the raw disc versus a two-component Level 1 RAID (with stripe size equal to
file system block size)?
Post by Brian Buhrow
I think it's possible, even if you have the raid card in jbod mode,
I don't know of any other modes the card (3081E-R) has.
Post by Brian Buhrow
Is it possible for you to set up a test with raid5 using a different sata
or sas interface card?
Difficult. I don't have any other machine with SAS (except one identical to
the to-be file server minus the data discs). I could try a desktop machine
with two or three SATA drives attached to the mainboard.
Post by Brian Buhrow
I've found that for high work loads this card just stops generating
interrupts for no apparent reason.
I wonder if you're suffering from a similar, but not quite so fatal, kind of
issue with your raid controller?
How can I check that?
Post by Brian Buhrow
while the spsu size was right, the sector boundaries which defined the
stripe might not be the same as the blocks being read from and written to
on the filesystem itself.
I seriously hope that's not true.
I think if the partition is aligned, the file system blocks are aligned.
Can some file system expert please comment on this?
Post by Brian Buhrow
raid5 will always be slower than raid1
Why? If the stripe size equals the fs block size, I don't see why that should
be the case.
Post by Brian Buhrow
I know how frustrating it can be to try and figure out why something isn't
working as expected.
Yes. Of course, that's my job, but it gets frustrating after weeks without
clues or improvement.
Edgar Fuß
2012-11-13 20:23:15 UTC
Permalink
Post by Brian Buhrow
Is it possible for you to set up a test with raid5 using a different sata
or sas interface card?
I tried a Level 1 RAID on two SATA discs (on the same hardware that I did the
plain-SATA test on).

Throughput to the fs is 83MB/s write, 101MB/s read.
svn update takes 5.6s without WAPBL and 7s with WAPBL.
The touch..rm test takes 15.1s without WAPBL and 13.6s with WAPBL.
Compared to plain disc on the same hardware:
- FS throughput is roughly as on a plain disc, as expected.
- svn update is slowed down only with WAPBL enabled.
- touch..rm performs roughly equivalent (to plain disc) without WAPBL,
but takes three times as long as on a plain disc with WAPBL enabled.
Compared to the mpt(4) SAS Level 1 RAID:
- FS write throughput is much faster than on mpt
- svn update with WAPBL is twice as fast as on mpt (no difference to mpt
when WAPBL is disabled)

I still don't understand why writing to a RAID 1 in stripe-sized chunks
could be noticeably slower than writing to the component device (as it is on
my mpt(4) SAS drives).
Thor Lancelot Simon
2012-11-13 21:12:37 UTC
Permalink
Post by Edgar Fuß
- FS write throughput is much faster than on mpt
- svn update with WAPBL is twice as fast as on mpt (no difference to mpt
when WAPBL is disabled)
Is this one of the mpt cards with built-in RAID1, or are you using RAIDframe
for the RAID1?

Thor
Edgar Fuß
2012-11-13 21:18:12 UTC
Permalink
Post by Thor Lancelot Simon
Is this one of the mpt cards with built-in RAID1,
I don't know. Anyway, I'm not using it.
Post by Thor Lancelot Simon
or are you using RAIDframe for the RAID1?
Yes, I'm using RAIDframe.
Martin S. Weber
2012-11-11 16:25:24 UTC
Permalink
On Sat, Nov 10, 2012 at 05:46:36PM +0100, Edgar Fuß wrote:
(...)
Post by Edgar Fuß
2. During the svn update, the machine partially locks up and fails to respond
to NFS requests.
You need no RAID for that; two different partitions suffice. We used to have
this suck. Then we got alternative disk priority schedulers, which solved the
issue somewhat. Then came WAPL, and we're back to great performance on a single
partition, but sucky performance when writing to partition 1 while reading
from partition 2. Double the pain when the written-to partition is a DOS one.

wd0 at atabus1 drive 0
wd0: <ST750LX003-1AC154>
wd0: drive supports 16-sector PIO transfers, LBA48 addressing
wd0: 698 GB, 1453521 cyl, 16 head, 63 sec, 512 bytes/sect x 1465149168 sectors
wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133)
wd0(ahcisata0:0:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133) (using DMA)


/dev/wd0a on / type ffs (log, local)
/dev/wd0g on /var type ffs (log, local)
/dev/wd0f on /usr type ffs (log, local)
/dev/wd0h on /home type ffs (log, local)
/dev/wd0i on /home/exchange type msdos (local)
/dev/wd0j on /space type ffs (log, local)

but simply copying an audacity project (i.e., gigs worth of data in 1MB chunks)
from my /home to my /space renders firefox unusable (i.e., it will likely only
start reacting again once the copy and/or move operation is finished across the
partitions/mountpoints).

In my absolutely non-empirical, subjective perception, the best time
was when bufq_priocscan was no longer deemed experimental; performance
has decreased massively since then. The uname doesn't matter. Things got
worse gradually ever since I ran 5.1/i386 (when it came out) until now,
when I'm on netbsd-6/amd64.
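
If I remember the tooling right (treat the exact syntax as my assumption), the
per-disk queueing strategy can be inspected and switched at run time, which
makes comparing them easy:

dkctl wd0 strategy            # show the current and available strategies
dkctl wd0 strategy priocscan  # switch this disk to priocscan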

Regards,
-Martin
Edgar Fuß
2012-11-19 11:39:57 UTC
Permalink
Post by Edgar Fuß
Regarding (A), there seems to be no single difference explaining the whole
performance degradation, so I tried to test intermediate steps.
Was anyone able to confirm this or obtain better performance on a different
SAS controller?
It looks like my only option is trying an MPT-II controller. Any other
suggestions?
Edgar Fuß
2012-11-28 16:15:08 UTC
Permalink
Post by Edgar Fuß
1. A certain svn update command is ridiculously slow on my to-be file server.
2. During the svn update, the machine partially locks up and fails to respond
to NFS requests.
Thanks to very kind help by hannken@, I now at least know what the problem is.

Short form: WAPBL is currently completely unusable on RAIDframe (I always
suspected something like that), at least on non-Level 0 sets.

The problem turned out to be wapbl_flush() writing non-fsbsize chunks on non-
fsbsize boundaries. So RAIDframe is nearly sure to RMW.
That makes the log get written to disc at about 1MB/s while the write lock
on the log is held. So everything else on that fs tstiles on the log's
read lock.
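
The effect is easy to reproduce outside the journal code. On a scratch set
(careful: this overwrites the RAID's contents; raid0 and the sizes are
placeholders for a 16k-stripe Level 5), full-stripe writes go out in one piece
while sub-stripe writes each force a read-modify-write:

# full-stripe (16k) aligned writes, no RMW:
dd if=/dev/zero of=/dev/rraid0d bs=16k count=8192
# sub-stripe (2k) writes, each one needs an RMW cycle:
dd if=/dev/zero of=/dev/rraid0d bs=2k count=65536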

Anyone in a position to improve that? I could simply turn off logging, but then
any non-clean shutdown is sure to take ages.
Greg Troxel
2012-11-28 17:02:46 UTC
Permalink
Post by Edgar Fuß
Post by Edgar Fuß
1. A certain svn update command is ridiculously slow on my to-be file server.
2. During the svn update, the machine partially locks up and fails to respond
to NFS requests.
Short form: WAPBL is currently completely unusable on RAIDframe (I always
suspected something like that), at least on non-Level 0 sets.
The problem turned out to be wapbl_flush() writing non-fsbsize chunks on non-
fsbsize boundaries. So RAIDframe is nearly sure to RMW.
That makes the log get written to disc at about 1MB/s while the write lock
on the log is held. So everything else on that fs tstiles on the log's
read lock.
Do you see this on RAID-1 too?

I wonder if it's possible (easily) to make the log only use fsbsize
boundaries (maybe forcing it to be bigger as a side effect).
Edgar Fuß
2012-11-28 17:07:15 UTC
Permalink
Post by Greg Troxel
Do you see this on RAID-1 too?
Well, I see a performance degradation, albeit not as much as on Level 5.
Post by Greg Troxel
I wonder if it's possible (easily) to make the log only use fsbsize
boundaries (maybe forcing it to be bigger as a side effect).
Volunteers welcome.
J. Hannken-Illjes
2012-11-28 17:41:28 UTC
Permalink
Post by Greg Troxel
Post by Edgar Fuß
Post by Edgar Fuß
1. A certain svn update command is ridiculously slow on my to-be file server.
2. During the svn update, the machine partially locks up and fails to respond
to NFS requests.
Short form: WAPBL is currently completely unusable on RAIDframe (I always
suspected something like that), at least on non-Level 0 sets.
The problem turned out to be wapbl_flush() writing non-fsbsize chunks on non-
fsbsize boundaries. So RAIDframe is nearly sure to RMW.
That makes the log get written to disc at about 1MB/s while the write lock
on the log is held. So everything else on that fs tstiles on the log's
read lock.
Do you see this on RAID-1 too?
I wonder if it's possible (easily) to make the log only use fsbsize
boundaries (maybe forcing it to be bigger as a side effect).
Sure -- add an fsbsize-sized buffer to struct wapbl and teach wapbl_write()
to collect data until the buffer's start or end touches an fsbsize boundary.

As long as the writes don't cross the log's end, they already come ordered.

--
J. Hannken-Illjes - ***@eis.cs.tu-bs.de - TU Braunschweig (Germany)
Thor Lancelot Simon
2012-11-28 20:52:25 UTC
Permalink
Post by J. Hannken-Illjes
Post by Greg Troxel
Do you see this on RAID-1 too?
I wonder if it's possible (easily) to make the log only use fsbsize
boundaries (maybe forcing it to be bigger as a side effect).
Sure -- add an fsbsize-sized buffer to struct wapbl and teach wapbl_write()
to collect data until the buffer's start or end touches an fsbsize boundary.
It is worth looking at the extensive work they did on this in XFS.
Brian Buhrow
2012-11-28 20:20:59 UTC
Permalink
Hello. If running 5.1 or 5.2 is acceptable for you, you could run
ffs+softdep since it has all the namei fixes in it.
-Brian

J. Hannken-Illjes
2012-11-28 21:14:58 UTC
Permalink
Post by Brian Buhrow
Hello. If running 5.1 or 5.2 is acceptable for you, you could run
ffs+softdep since it has all the namei fixes in it.
I suppose running fsck on a 6 TByte file system will take hours and
softdep needs this after a crash.

--
J. Hannken-Illjes - ***@eis.cs.tu-bs.de - TU Braunschweig (Germany)
Mouse
2012-11-28 21:24:17 UTC
Permalink
Post by J. Hannken-Illjes
I suppose running fsck on a 6 TByte file system will take hours
Based on my own experience with a 7T filesystem, I would suggest you
try it rather than making assumptions.

Depending on your use case, you may be able to speed fsck up
dramatically by choosing the parameters for your filesystem suitably.
I find that fsck on a filesystem built with -f 8192 -b 65536 -n 1, for
example, is a great deal faster than on a filesystem built on the same
amount of disk space with the defaults. (I have a few filesystems for
which that combination of parameters is appropriate: a small number of
large files with little churn.)

/~\ The ASCII Mouse
\ / Ribbon Campaign
X Against HTML ***@rodents-montreal.org
/ \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B
Brian Buhrow
2012-11-28 21:29:32 UTC
Permalink
Hello. Well, to each his own, but for comparison, I have a system
running 5.1 with the latest namei changes and a 13TB filesystem which,
if fsck needs to run, takes less than an hour to complete. I've found 5.1
to be very stable, and so haven't had to worry about the penalty of running
fsck after a crash very often. I've found raidframe to be invaluable in my
installations, and to have WAPBL be broken in 6.x in conjunction with
raidframe seems like a pretty big deterrent to me.
Manuel Bouyer
2012-11-28 21:34:12 UTC
Permalink
Post by J. Hannken-Illjes
Post by Brian Buhrow
Hello. If running 5.1 or 5.2 is acceptable for you, you could run
ffs+softdep since it has all the namei fixes in it.
I suppose running fsck on a 6 TByte file system will take hours and
softdep needs this after a crash.
Well, the journal doesn't always avoid the fsck; it depends on the kind
of crash (if it's a panic in filesystem code I know I want to run
fsck anyway :)

Also, the fsck time depends a lot on the file system parameters.
A 9TB filesystem formatted with -O2 -b 32k -f 4k -i 1000000 can be checked
in less than one hour.
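
That is, something along the lines of the following (the device name is just an
example; few inodes and big blocks keep the amount of metadata fsck has to walk
small):

newfs -O 2 -b 32768 -f 4096 -i 1000000 /dev/rraid0a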
--
Manuel Bouyer <***@antioche.eu.org>
NetBSD: 26 years of experience will always make the difference