--- vinum.mm	2002/05/05 02:34:27	4.1
+++ vinum.mm	2003/06/24 05:37:32
@@ -1,90 +1,81 @@
 .\" This file is in -*- nroff-fill -*- mode
-.\" STATUS: draft 4th edition
-.\" $Id: vinum.mm,v 4.1 2002/05/05 02:34:27 grog Exp $
+.\" STATUS: 4th edition
+.\" $Id: vinum.mm,v 4.19 2003/04/09 19:56:42 grog Exp grog $
 .\"
 .Chapter \*[nchvinum] "The Vinum Volume Manager"
+.X "vinum"
+.X "volume manager"
 .Pn vinum
-No matter what disks you have, there will always be limitations:
+\fIVinum\fP\/ is a \fIVolume Manager\fP, a virtual disk driver that addresses
+these three issues:
 .Ls B
 .LI
-They can be too small.
+Disks can be too small.
 .LI
-They can be too slow.
+Disks can be too slow.
 .LI
-They can be too unreliable. 
+Disks can be too unreliable.
 .Le
-.X "Vinum"
-.X "Volume Manager"
-\fIVinum\fP\| is a so-called \fIVolume Manager\fP, a virtual disk driver that
-addresses these three problems.  Let's look at them in more detail.  Various
-solutions to these problems have been proposed and implemented:
-.H3 "Disks are too small"
-Disks are getting bigger, but so are data storage requirements.  Often you'll
-find you want a file system that is bigger than the disks you have available.
-Admittedly, this problem is not as acute as it was ten years ago, but it still
-exists.  Some systems have solved this by creating an abstract device which
-stores its data on a number of disks.
-.H3 "Access bottlenecks"
-Modern systems frequently need to access data in a highly concurrent manner.
-For example, large FTP or HTTP servers can maintain thousands of concurrent
-sessions and have multiple 100 Mbit/s connections to the outside world, well
-beyond the sustained transfer rate of most disks.
-.P
-Current disk drives can transfer data sequentially at up to 30 MB/s, but this
-value is of little importance in an environment where many independent processes
-access a drive, where they may achieve only a fraction of these values.  In such
-cases it's more interesting to view the problem from the viewpoint of the disk
-subsystem: the important parameter is the load that a transfer places on the
-subsystem, in other words the time for which a transfer occupies the drives
-involved in the transfer.
-.P
-In any disk transfer, the drive must first position the heads, wait for the
-first sector to pass under the read head, and then perform the transfer.  These
-actions can be considered to be atomic: it doesn't make any sense to interrupt
-them.
-.P
-.Pn latency
-Consider a typical transfer of about 10 kB: the current generation of
-high-performance disks can position the heads in an average of 6 ms.  The
-fastest drives spin at 10,000 rpm, so the average rotational latency (half a
-revolution) is 3 ms.  At 30 MB/s, the transfer itself takes about 350 s, almost
-nothing compared to the positioning time.  In such a case, the effective
-transfer rate drops to a little over 1 MB/s and is clearly highly dependent on
-the transfer size.
-.P
-The traditional and obvious solution to this bottleneck is ``more spindles'':
-rather than using one large disk, it uses several smaller disks with the same
-aggregate storage space.  Each disk is capable of positioning and transferring
-independently, so the effective throughput increases by a factor close to the
-number of disks used.
-.P
-The exact throughput improvement is, of course, smaller than the number of disks
-involved: although each drive is capable of transferring in parallel, there is
-no way to ensure that the requests are evenly distributed across the drives.
-Inevitably the load on one drive will be higher than on another.
-.P
-.X "concatenation, Vinum"
-.X "Vinum, concatenation"
-The evenness of the load on the disks is strongly dependent on the way the data
-is shared across the drives.  In the following discussion, it's convenient to
-think of the disk storage as a large number of data sectors which are
-addressable by number, rather like the pages in a book.  The most obvious method
-is to divide the virtual disk into groups of consecutive sectors the size of the
-individual physical disks and store them in this manner, rather like taking a
-large book and tearing it into smaller sections.  This method is called
-\fIconcatenation\fP\| and has the advantage that the disks do not need to have
-any specific size relationships.  It works well when the access to the virtual
-disk is spread evenly about its address space.  When access is concentrated on a
-smaller area, the improvement is less marked.  Figure \*[concat] illustrates the
+From a user viewpoint, Vinum looks almost exactly the same as a disk, but in
+addition to the disks there is a maintenance program.
+.H2 "Vinum objects"
+Vinum implements a four-level hierarchy of objects:
+.Ls B
+.LI
+.X "volume, vinum"
+.X "vinum, volume"
+The most visible object is the virtual disk, called a \fIvolume\fP.  Volumes
+have essentially the same properties as a UNIX disk drive, though there are some
+minor differences.  They have no size limitations.
+.LI
+.X "plex, vinum"
+.X "vinum, plex"
+Volumes are composed of \fIplexes\fP, each of which represents the total address
+space of a volume.  This level in the hierarchy thus provides redundancy.  Think
+of plexes as individual disks in a mirrored array, each containing the same
+data.
+.LI
+.X "drive, vinum"
+.X "vinum, drive"
+.X "subdisk, vinum"
+.X "vinum, subdisk"
+Vinum exists within the UNIX disk storage framework, so it would be possible
+to use UNIX partitions as the building block for multi-disk plexes, but in fact
+this turns out to be too inflexible: UNIX disks can have only a limited number
+of partitions.  Instead, Vinum subdivides a single UNIX partition (the
+\fIdrive\fP\/) into contiguous areas called \fIsubdisks\fP, which it uses as
+building blocks for plexes.
+.LI
+Subdisks reside on Vinum \fIdrives\fP, currently UNIX partitions.  Vinum drives
+can contain any number of subdisks.  With the exception of a small area at the
+beginning of the drive, which is used for storing configuration and state
+information, the entire drive is available for data storage.
+.Le
+Plexes can include multiple subdisks spread over all drives in the Vinum
+configuration, so the size of an individual drive does not limit the size of a
+plex, and thus of a volume.
+.H3 "Mapping disk space to plexes"
+.X "concatenation, vinum"
+.X "vinum, concatenation"
+.X "JBOD"
+The way the data is shared across the drives has a strong influence on
+performance.  It's convenient to think of the disk storage as a large number of
+data sectors that are addressable by number, rather like the pages in a book.
+The most obvious method is to divide the virtual disk into groups of consecutive
+sectors the size of the individual physical disks and store them in this manner,
+rather like the way a large encyclopaedia is divided into a number of volumes.
+This method is called \fIconcatenation\fP, and sometimes \fIJBOD\fP\/ (\fIJust a
+Bunch Of Disks\fP\/).  It works well when the access to the virtual disk is
+spread evenly about its address space.  When access is concentrated on a smaller
+area, the performance gain over a single disk is less marked.  Figure \*[concat] illustrates the
 sequence in which storage units are allocated in a concatenated organization.
 .PS
-h = .3i
+boxht = .2i
 dh = .02i
-dw = .8i
+boxwid = .8i
 down
 [
         [
-                boxht = h; boxwid = dw
 .\" 
 .\"     ORIG:   box invis "\f(CW0\fP"
 .\"             box invis "\f(CW1\fP"
@@ -93,33 +84,33 @@
 .\"             box invis "\f(CW4\fP"
 .\"             box invis "\f(CW5\fP"
 
-.\"       A:    box dotted at ORIG.e+(.4,0) ht h "\f(CW0\fP"
-move right 1i; down
-          A:    box dotted ht h "\f(CW0\fP"
-          B:    box dotted ht h "\f(CW1\fP"
-          C:    box dotted ht h "\f(CW2\fP"
-          D:    box dotted ht h "\f(CW3\fP"
-          E:    box dotted ht h "\f(CW4\fP"
-          F:    box dotted ht h "\f(CW5\fP"
-                box ht h * 6 at C.s
-
-          A1:   box dotted at A+(dw*1.6,0) ht h "\f(CW6\fP"
-          B1:    box dotted ht h "\f(CW7\fP"
-          C1:    box dotted ht h "\f(CW8\fP"
-          D1:    box dotted ht h "\f(CW9\fP"
-                box ht h * 4 at C1.n
-
-          A2:   box dotted at A1+(dw*1.6,0) "\f(CW10\fP"
-          F2:    box dotted ht h "\f(CW11\fP"
-                box ht h * 2 at A2.s
-
-          A3:   box dotted at A2+(dw*1.6,0) ht h "\f(CW12\fP"
-          B3:    box dotted ht h "\f(CW13\fP"
-          C3:    box dotted ht h "\f(CW14\fP"
-          D3:    box dotted ht h "\f(CW15\fP"
-          E3:    box dotted ht h "\f(CW16\fP"
-          F3:    box dotted ht h "\f(CW17\fP"
-                box ht h * 6 at C3.s
+.\"       A:    box dotted at ORIG.e+(.4,0) "\f(CW0\fP"
+move right .5i; down
+          A:    box dotted "\f(CW0\fP"
+          B:    box dotted "\f(CW1\fP"
+          C:    box dotted "\f(CW2\fP"
+          D:    box dotted "\f(CW3\fP"
+          E:    box dotted "\f(CW4\fP"
+          F:    box dotted "\f(CW5\fP"
+                box ht boxht * 6 at C.s
+
+          A1:   box dotted at A+(boxwid*1.6,0) "\f(CW6\fP"
+          B1:    box dotted "\f(CW7\fP"
+          C1:    box dotted "\f(CW8\fP"
+          D1:    box dotted "\f(CW9\fP"
+                box ht boxht * 4 at C1.n
+
+          A2:   box dotted at A1+(boxwid*1.6,0) "\f(CW10\fP"
+          F2:    box dotted "\f(CW11\fP"
+                box ht boxht * 2 at A2.s
+
+          A3:   box dotted at A2+(boxwid*1.6,0) "\f(CW12\fP"
+          B3:    box dotted "\f(CW13\fP"
+          C3:    box dotted "\f(CW14\fP"
+          D3:    box dotted "\f(CW15\fP"
+          E3:    box dotted "\f(CW16\fP"
+          F3:    box dotted "\f(CW17\fP"
+                box ht boxht * 6 at C3.s
 
 .\" "Offset" at ORIG.n+(0,.2i)
 "Disk 1" at A.n+(0,.2i)
@@ -156,19 +147,21 @@
 .PE
 .ce
 .Figure-heading "Concatenated organization"
-.Tn concat
+.Fn concat
 .P
-.X "striping, Vinum"
-.X "Vinum, striping"
+.ps \n(PS
+.X "striping, vinum"
+.X "vinum, striping"
 An alternative mapping is to divide the address space into smaller, equal-sized
-components and store them sequentially on different devices.  For example, the
-first 256 sectors may be stored on the first disk, the next 256 sectors on the
-next disk and so on.  After filling the last disk, the process repeats until the
-disks are full.  This mapping is called \fIstriping\fP or RAID-0,\*F
+components, called \fIstripes\fP, and store them sequentially on different
+devices.  For example, the first stripe of 292 kB may be stored on the first
+disk, the next stripe on the next disk and so on.  After filling the last disk,
+the process repeats until the disks are full.  This mapping is called
+\fIstriping\fP or RAID-0,\*F
 .FS
 .X "RAID"
 .X "Redundant Array of Inexpensive Disks"
-\fIRAID\fP\| stands for \fIRedundant Array of Inexpensive Disks\fP\| and offers
+\fIRAID\fP\/ stands for \fIRedundant Array of Inexpensive Disks\fP\/ and offers
 various forms of fault tolerance.
 .FE
 though the latter term is somewhat misleading: it provides no redundancy.
@@ -178,13 +171,12 @@
 illustrates the sequence in which storage units are allocated in a striped
 organization.
 .PS
-h = .3i
+boxht = .2i
 dh = .02i
-dw = .8i
+boxwid = .8i
 down
 [
         [
-                boxht = h; boxwid = dw
 
 .\"     ORIG:   box invis "\f(CW0\fP"
 .\"             box invis "\f(CW1\fP"
@@ -193,35 +185,35 @@
 .\"             box invis "\f(CW4\fP"
 .\"             box invis "\f(CW5\fP"
 .\" 
-.\"           A:    box at ORIG.e+(.4,0) ht h "\f(CW0\fP"
-move right 1i; down
-          A:    box ht h "\f(CW0\fP"
-          B:    box ht h "\f(CW4\fP"
-          C:    box ht h "\f(CW8\fP"
-          D:    box ht h "\f(CW12\fP"
-          E:    box ht h "\f(CW16\fP"
-          F:    box ht h "\f(CW20\fP"
-
-          A1:   box at A+(dw*1.6,0) ht h "\f(CW1\fP"
-          B1:    box ht h "\f(CW5\fP"
-          C1:    box ht h "\f(CW9\fP"
-          D1:    box ht h "\f(CW13\fP"
-          E1:    box ht h "\f(CW17\fP"
-          F1:    box ht h "\f(CW21\fP"
-
-          A2:   box at A1+(dw*1.6,0) ht h "\f(CW2\fP"
-          B2:    box ht h "\f(CW6\fP"
-          C2:    box ht h "\f(CW10\fP"
-          D2:    box ht h "\f(CW14\fP"
-          E2:    box ht h "\f(CW18\fP"
-          F2:    box ht h "\f(CW22\fP"
-
-          A3:   box at A2+(dw*1.6,0) ht h "\f(CW3\fP"
-          B3:    box ht h "\f(CW7\fP"
-          C3:    box ht h "\f(CW11\fP"
-          D3:    box ht h "\f(CW15\fP"
-          E3:    box ht h "\f(CW19\fP"
-          F3:    box ht h "\f(CW23\fP"
+.\"           A:    box at ORIG.e+(.4,0) "\f(CW0\fP"
+move right .5i; down
+          A:    box "\f(CW0\fP"
+          B:    box "\f(CW4\fP"
+          C:    box "\f(CW8\fP"
+          D:    box "\f(CW12\fP"
+          E:    box "\f(CW16\fP"
+          F:    box "\f(CW20\fP"
+
+          A1:   box at A+(boxwid*1.6,0) "\f(CW1\fP"
+          B1:    box "\f(CW5\fP"
+          C1:    box "\f(CW9\fP"
+          D1:    box "\f(CW13\fP"
+          E1:    box "\f(CW17\fP"
+          F1:    box "\f(CW21\fP"
+
+          A2:   box at A1+(boxwid*1.6,0) "\f(CW2\fP"
+          B2:    box "\f(CW6\fP"
+          C2:    box "\f(CW10\fP"
+          D2:    box "\f(CW14\fP"
+          E2:    box "\f(CW18\fP"
+          F2:    box "\f(CW22\fP"
+
+          A3:   box at A2+(boxwid*1.6,0) "\f(CW3\fP"
+          B3:    box "\f(CW7\fP"
+          C3:    box "\f(CW11\fP"
+          D3:    box "\f(CW15\fP"
+          E3:    box "\f(CW19\fP"
+          F3:    box "\f(CW23\fP"
 
 .\" "Offset" at ORIG.n+(0,.2i)
 "Disk 1" at A.n+(0,.2i)
@@ -261,54 +253,55 @@
 ]
 .PE
 .Figure-heading "Striped organization"
-.Tn striped
+.Fn striped
+.ps \n(PS
 .H3 "Data integrity"
-The final problem with current disks is that they are unreliable.  Although disk
-drive reliability has increased tremendously over the last few years, they are
-still the most likely core component of a server to fail.  When they do, the
-results can be catastrophic: replacing a failed disk drive and restoring data to
-it can take days.
-.P
-.X "mirroring, Vinum"
-.X "Vinum, mirroring"
-.X "RAID, level 1"
+.X "mirroring, vinum"
+.X "vinum, mirroring"
 .X "RAID-1"
-The traditional way to approach this problem has been \fImirroring\fP, keeping
-two copies of the data on different physical hardware.  Since the advent of the
-RAID levels, this technique has also been called \fIRAID level 1\fP\| or
-\fIRAID-1\fP.  Any write to the volume writes to both locations; a read can be
-satisfied from either, so if one drive fails, the data is still available on the
-other drive.
+Vinum offers two forms of redundant data storage aimed at surviving hardware
+failure: \fImirroring\fP, also known as RAID level 1, and \fIparity\fP, also
+known as RAID levels 2 to 5.
 .P
-Mirroring has two problems:
+Mirroring maintains two or more copies of the data on different physical
+hardware.  Any write to the volume writes to all copies; a read can be
+satisfied from any of them, so if one drive fails, the data is still available on
+another drive.  It has two problems:
 .Ls B
 .LI
 The price.  It requires twice as much disk storage as a non-redundant solution.
 .LI
 The performance impact.  Writes must be performed to both drives, so they take
 up twice the bandwidth of a non-mirrored volume.  Reads do not suffer from a
-performance penalty: it even looks as if they are faster.
+performance penalty: you only need to read from one of the disks, so in some
+cases, they can even be faster.
 .LE
-.ig
 .P
 .X "RAID-5"
-An alternative solution is \fIparity\fP, implemented in the RAID levels 2, 3, 4
-and 5.  Of these, RAID-5 is the most interesting.  As implemented in Vinum, it
-is a variant on a striped organization which dedicates one block of each stripe
-to parity of the other blocks: As implemented by Vinum, a \fIRAID-5\fP\| plex is
-similar to a striped plex, except that it implements RAID-5 by including a
-parity block in each stripe.  As required by RAID-5, the location of this parity
-block changes from one stripe to the next.  The numbers in the data blocks
-indicate the relative block numbers.
+.X "vinum, degraded mode"
+.X "degraded mode, vinum"
+The most interesting of the parity solutions is RAID level 5, usually called
+\fIRAID-5\fP.  The disk layout is similar to striped organization, except that
+one block in each stripe contains the parity of the remaining blocks.  The
+location of the parity block changes from one stripe to the next to balance the
+load on the drives.  If any one drive fails, the driver can reconstruct the data
+with the help of the parity information, and the array
+continues to operate in \fIdegraded\fP\/ mode: a read from one of the remaining
+accessible drives continues normally, but a read request from the failed drive
+is satisfied by recalculating the contents from all the remaining drives.
+Writes simply ignore the dead drive.  When the drive is replaced, Vinum
+recalculates the contents and writes them back to the new drive.
+.P
+In the following figure, the numbers in the data blocks indicate the relative
+block numbers.
+.br
 .PS
-h = .3i
+boxht = .2i
 dh = .02i
-dw = .8i
+boxwid = .8i
 down
 [
         [
-                boxht = h; boxwid = dw
-
 .\"     ORIG:   box invis "\f(CW0\fP"
 .\"             box invis "\f(CW1\fP"
 .\"             box invis "\f(CW2\fP"
@@ -317,34 +310,34 @@
 .\"             box invis "\f(CW5\fP"
 .\" 
 .\"
-move right 1i; down
-          A:    box ht h "\f(CW0\fP"
-          B:    box ht h "\f(CW3\fP"
-          C:    box ht h "\f(CW6\fP"
-          D:    box ht h "Parity" filled 0.2
-          E:    box ht h "\f(CW12\fP"
-          F:    box ht h "\f(CW15\fP"
-
-          A1:   box at A+(dw*1.6,0) ht h "\f(CW1\fP"
-          B1:    box ht h "\f(CW4\fP"
-          C1:    box ht h "Parity" filled 0.2
-          D1:    box ht h "\f(CW9\fP"
-          E1:    box ht h "\f(CW13\fP"
-          F1:    box ht h "\f(CW16\fP"
-
-          A2:   box at A1+(dw*1.6,0) ht h "\f(CW2\fP"
-          B2:    box ht h "Parity" filled 0.2
-          C2:    box ht h "\f(CW7\fP"
-          D2:    box ht h "\f(CW10\fP"
-          E2:    box ht h "\f(CW14\fP"
-          F2:    box ht h "Parity" filled 0.2
-
-          A3:   box at A2+(dw*1.6,0) ht h "Parity" filled 0.2
-          B3:    box ht h "\f(CW5\fP"
-          C3:    box ht h "\f(CW8\fP"
-          D3:    box ht h "\f(CW11\fP"
-          E3:    box ht h "Parity" filled 0.2
-          F3:    box ht h "\f(CW17\fP"
+move right .5i; down
+          A:    box "\f(CW0\fP"
+          B:    box "\f(CW3\fP"
+          C:    box "\f(CW6\fP"
+          D:    box "Parity" filled 0.2
+          E:    box "\f(CW12\fP"
+          F:    box "\f(CW15\fP"
+
+          A1:   box at A+(boxwid*1.6,0) "\f(CW1\fP"
+          B1:    box "\f(CW4\fP"
+          C1:    box "Parity" filled 0.2
+          D1:    box "\f(CW9\fP"
+          E1:    box "\f(CW13\fP"
+          F1:    box "\f(CW16\fP"
+
+          A2:   box at A1+(boxwid*1.6,0) "\f(CW2\fP"
+          B2:    box "Parity" filled 0.2
+          C2:    box "\f(CW7\fP"
+          D2:    box "\f(CW10\fP"
+          E2:    box "\f(CW14\fP"
+          F2:    box "Parity" filled 0.2
+
+          A3:   box at A2+(boxwid*1.6,0) "Parity" filled 0.2
+          B3:    box "\f(CW5\fP"
+          C3:    box "\f(CW8\fP"
+          D3:    box "\f(CW11\fP"
+          E3:    box "Parity" filled 0.2
+          F3:    box "\f(CW17\fP"
 
 .\" "Offset" at ORIG.n+(0,.2i)
 "Disk 1" at A.n+(0,.2i)
@@ -384,157 +377,62 @@
 .PE
 .Figure-heading "RAID-5 organization"
 .P
+.ps \n(PS
 Compared to mirroring, RAID-5 has the advantage of requiring significantly less
 storage space.  Read access is similar to that of striped organizations, but
 write access is significantly slower, approximately 25% of the read performance.
-If one drive fails, the array can continue to operate in degraded mode: a read
-from one of the remaining accessible drives continues normally, but a read from
-the failed drive is recalculated from the corresponding block from all the
-remaining drives.
-..
-.H2 "Vinum objects"
-In order to address these problems, Vinum implements a four-level hierarchy of
-objects:
-.Ls B
-.LI
-.X "volume, Vinum"
-.X "Vinum, volume"
-The most visible object is the virtual disk, called a \fIvolume\fP.  Volumes
-have essentially the same properties as a UNIX disk drive, though there are some
-minor differences.  They have no size limitations.
-.LI
-.X "plex, Vinum"
-.X "Vinum, plex"
-Volumes are composed of \fIplexes\fP, each of which represent the total address
-space of a volume.  This level in the hierarchy thus provides redundancy.  Think
-of plexes as individual disks in a mirrored array, each containing the same
-data.
-.LI
-.X "drive, Vinum"
-.X "Vinum, drive"
-.X "subdisk, Vinum"
-.X "Vinum, subdisk"
-Since Vinum exists within the UNIX disk storage framework, it would be possible
-to use UNIX partitions as the building block for multi-disk plexes, but in fact
-this turns out to be too inflexible: UNIX disks can have only a limited number
-of partitions.  Instead, Vinum subdivides a single UNIX partition (the
-\fIdrive\fP\|) into contiguous areas called \fIsubdisks\fP, which it uses as
-building blocks for plexes.
-.LI
-Subdisks reside on Vinum \fIdrives\fP, currently UNIX partitions.  Vinum drives
-can contain any number of subdisks.  With the exception of a small area at the
-beginning of the drive, which is used for storing configuration and state
-information, the entire drive is available for data storage.
-.Le
-The following sections describe the way these objects provide the functionality
-required of Vinum.
-.H3 "Volume size considerations"
-Plexes can include multiple subdisks spread over all drives in the Vinum
-configuration.  As a result, the size of an individual drive does not limit the
-size of a plex, and thus of a volume.
-.H3 "Redundant data storage"
-Vinum 
-..if raid5
-provides both mirroring and RAID-5.  It 
-..endif
-implements mirroring by attaching multiple plexes to a volume.  Each plex is a
-representation of the data in a volume.  A volume may contain between one and
-eight plexes.
-.P
-Although a plex represents the complete data of a volume, it is possible for
-parts of the representation to be physically missing, either by design (by not
-defining a subdisk for parts of the plex) or by accident (as a result of the
-failure of a drive).  As long as at least one plex can provide the data for the
-complete address range of the volume, the volume is fully functional.
-..if raid5
-.P
-From an implementation standpoint, it is not practical to represent a RAID-5
-organization as a collection of plexes.  This issue is discussed below.
-..endif
-.H3 "Performance issues"
-Vinum implements both concatenation and striping at the plex level:
-.Ls B
-.LI
-A \fIconcatenated plex\fP\| uses the address space of each subdisk in turn.
-.LI
-A \fIstriped plex\fP\| stripes the data across each subdisk.  The subdisks must
-all have the same size, and there must be at least two subdisks in order to
-distinguish it from a concatenated plex.
-..if raid5
-.LI
-Like a striped plex, a \fIRAID-5 plex\fP\| stripes the data across each subdisk.
-The subdisks must all have the same size, and there must be at least three
-subdisks, since otherwise mirroring would be more efficient.
-..endif
-.LE
-..if raid5
-.H3 "RAID-5"
-Conceptually, RAID-5 is used for redundancy, but in fact the implementation is a
-kind of striping.  This poses problems for the implementation of Vinum: should
-it be a kind of plex or a kind of volume?  It would have been possible to
-implement it either way, but it proved to be simpler to implement RAID-5 as a
-plex type.  This means that there are two different ways of ensuring data
-redundancy: either have more than one plex in a volume, or have a single RAID-5
-plex.  These methods can be combined.
-..endif
+.P
+.X "RAID-4"
+Vinum also offers \fIRAID-4\fP, a simpler variant of RAID-5 which stores all the
+parity blocks on one disk.  This makes the parity disk a bottleneck when
+writing.  RAID-4 offers no advantages over RAID-5, so it's effectively useless.
 .H3 "Which plex organization?"
-..if raid5
-Vinum implements only that subset of RAID organizations which make sense in the
-framework of the implementation.  It would have been possible to implement all
-RAID levels, but there was no reason to do so.  Each of the chosen organizations
-has unique advantages:
-..else
-The version of Vinum supplied with FreeBSD \*[Fver] implements two kinds of
-plex:
-..endif
+Each plex organization has its own advantages:
 .Ls B
 .LI
 Concatenated plexes are the most flexible: they can contain any number of
 subdisks, and the subdisks may be of different length.  The plex may be extended
-by adding additional subdisks.
-They require less CPU time than striped
-..if raid5
-or RAID-5
-..endif
-plexes, though the difference in CPU overhead
-..if raid5
-from striped plexes
-..endif
-is not measurable.  On the other hand, they are most susceptible to hot spots,
-where one disk is very active and others are idle.
+by adding additional subdisks.  They require less CPU time than striped or
+RAID-5 plexes, though the difference in CPU overhead from striped plexes is not
+measurable.  They are the only kind of plex that can be extended in size without
+loss of data.
 .LI
 The greatest advantage of striped (RAID-0) plexes is that they reduce hot spots:
-by choosing an optimum sized stripe (about 256 kB), you can even out the load on
-the component drives.  The disadvantages of this approach are (fractionally)
-more complex code and restrictions on subdisks: they must be all the same size,
-and extending a plex by adding new subdisks is so complicated that Vinum
-currently does not implement it.  Vinum imposes an additional, trivial
-restriction: a striped plex must have at least two subdisks, since otherwise it
-is indistinguishable from a concatenated plex.
-..if raid5
+by choosing an optimally sized stripe (between 256 and 512 kB), you can even out
+the load on the component drives.  The disadvantage of this approach is the
+restriction on subdisks, which must all be the same size.  Extending a striped
+plex by adding new subdisks is so complicated that Vinum currently does not
+implement it.  A striped plex must have at least two subdisks: otherwise it is
+indistinguishable from a concatenated plex.  In addition, there's an interaction
+between the geometry of UFS and Vinum that makes it advisable not to have a
+stripe size that is a power of 2, which is why the example above uses a stripe
+size of 292 kB.
 .LI
 RAID-5 plexes are effectively an extension of striped plexes.  Compared to
 striped plexes, they offer the advantage of fault tolerance, but the
-disadvantages of higher storage cost and significantly higher CPU overhead,
-particularly for writes.  The code is an order of magnitude more complex than
-for concatenated and striped plexes.  Like striped plexes, RAID-5 plexes must
-have equal-sized subdisks and cannot currently be extended.  Vinum enforces a
-minimum of three subdisks for a RAID-5 plex, since any smaller number would not
-make any sense.
-..endif
+disadvantages of somewhat higher storage cost and significantly worse write
+performance.  Like striped plexes, RAID-5 plexes must have equal-sized subdisks
+and cannot currently be extended.  Vinum enforces a minimum of three subdisks
+for a RAID-5 plex: any smaller number would not make any sense.
+.LI
+Vinum also offers RAID-4, although this organization has some disadvantages and
+no advantages when compared to RAID-5.  The only reason for including this
+feature was that it was a trivial addition: it required only two lines of code.
 .Le
-Table \*[comparison] summarizes the advantages and disadvantages of each plex
+.\" Table \*[comparison]
+The following table summarizes the advantages and disadvantages of each plex
 organization.
 .br
 .na
 .ne 1i
+.Table-heading "Vinum plex organizations"
 .TS
-box,center,tab(#) ;
-l | l | l | l | lw26 .
-#Minimum#Can#Must be
-Plex type#subdisks#add#equal#Application
-##subdisks#size
-=
+tab(#) ;
+lfR | lfR | lfR | lfR | lw30 .
+#\fBMinimum#\fBCan#\fBMust be
+\fBPlex type#\fBsubdisks#\fBadd#\fBequal#\fBApplication
+##\fBsubdisks#\fBsize
+_
 concatenated#1#yes#no#T{
 Large data storage with maximum placement flexibility and
 moderate performance.
@@ -543,32 +441,69 @@
 striped#2#no#yes#T{
 High performance in combination with highly concurrent access.
 T}
-..if raid5
 .sp .4v
 RAID-5#3#no#yes#T{
 Highly reliable storage, primarily read access.
 T}
-..endif
 .TE
-.ad
-.sp 1.5v
-.Figure-heading "Vinum plex organizations"
 .Tn comparison
-.H2 "Some examples"
-.X "configuration database, Vinum"
-.X "Vinum, configuration database"
-Vinum maintains a \fIconfiguration database\fP\| which describes the objects
-known to an individual system.  Initially, the user creates the configuration
-database from one or more configuration files with the aid of the
-\fIvinum(8)\fP\| utility program.  Vinum stores a copy of its configuration
-database on each disk slice (which Vinum calls a \fIdevice\fP\|) under its
-control.  This database is updated on each state change, so that a restart
+.ad
+.H2 "Creating Vinum drives"
+Before you can do anything with Vinum, you need to reserve disk space for it.
+Vinum drive objects are in fact a special kind of disk partition, of type
+\fIvinum\fP.  We've seen how to create disk partitions on page
+.Sref \*[disklabel] .
+If in that example we had wanted to create a Vinum drive instead of a UFS
+partition, we would have created it like this:
+.Dx
+8 partitions:
+#        size   offset    fstype   [fsize bsize bps/cpg]
+  c:  6295133        0    unused        0     0         # (Cyl.    0 - 10302)
+.ft CB
+  b:  1048576        0    swap          0     0         # (Cyl.    0 - 10302)
+  h:  5246557  1048576    vinum         0     0         # (Cyl.    0 - 10302)
+.ft
+.De
+.SPUP
+.H2 "Starting Vinum"
+.Pn vinumstart
+.X "kld"
+Vinum comes with the base system as a \fIkld\fP.  It gets loaded automatically
+when you run the
+.Command vinum
+command.  It's possible to build a special kernel that includes Vinum, but this
+is not recommended: in this case, you will not be able to stop Vinum.
+.P
+.ne 10v
+FreeBSD Release 5 includes a new method of starting Vinum.  Put the following
+lines in
+.File /boot/loader.conf \/:
+.Dx
+vinum_load="YES"
+vinum.autostart="YES"
+.De
+The first line instructs the loader to load the Vinum kld, and the second tells
+it to start Vinum during the device probes.  Vinum still supports the older
+method of setting the variable \f(CWstart_vinum\fP in
+.File /etc/rc.conf ,
+but this method may go away soon.
+.H2 "Configuring Vinum"
+.X "configuration database, vinum"
+.X "vinum, configuration database"
+Vinum maintains a \fIconfiguration database\fP\/ that describes the objects
+known to an individual system.  You create the configuration database from one
+or more configuration files with the aid of the
+.Command vinum
+utility program.  Vinum stores a copy of its configuration database on each
+Vinum drive.  This database is updated on each state change, so that a restart
 accurately restores the state of each Vinum object.
 .H3 "The configuration file"
-The configuration file describes individual Vinum objects.  The definition of a
-simple volume might be:
+The configuration file describes individual Vinum objects.  To define a simple
+volume, you might create a file called, say,
+.File config1 ,
+containing the following definitions:
 .Dx
-drive a device /dev/da3h
+drive a device /dev/da1s2h
 volume myvol
   plex org concat
     sd length 512m drive a
@@ -576,7 +511,7 @@
 This file describes four Vinum objects:
 .Ls B
 .LI
-The \f(CWdrive\fP line describes a disk partition (\fIdrive\fP\|) and its
+The \f(CWdrive\fP line describes a disk partition (\fIdrive\fP\/) and its
 location relative to the underlying hardware.  It is given the symbolic name
 \fIa\fP.  This separation of the symbolic names from the device names allows
 disks to be moved from one location to another without confusion.
@@ -587,46 +522,51 @@
 The \f(CWplex\fP line defines a plex.  The only required parameter is the
 organization, in this case \f(CWconcat\fP.  No name is necessary: the system
 automatically generates a name from the volume name by adding the suffix
-\&\f(CW.p\f(BIx\fR, where \f(BIx\fP\| is the number of the plex in the volume.
+\&\f(CW.p\f(BIx\fR, where \f(BIx\fP\/ is the number of the plex in the volume.
 Thus this plex will be called \fImyvol.p0\fP.
 .LI
 The \f(CWsd\fP line describes a subdisk.  The minimum specifications are the
 name of a drive on which to store it, and the length of the subdisk.  As with
 plexes, no name is necessary: the system automatically assigns names derived
-from the plex name by adding the suffix \f(CW.s\f(BIx\fR, where \f(BIx\fP\| is
+from the plex name by adding the suffix \f(CW.s\f(BIx\fR, where \f(BIx\fP\/ is
 the number of the subdisk in the plex.  Thus Vinum gives this subdisk the name
 \fImyvol.p0.s0\fP
 .Le
-After processing this file, \fIvinum(8)\fP\| produces the following output:
+.ne 5v
+After processing this file, \fIvinum(8)\fP\/ produces the following output:
 .Dx
 vinum -> \f(CBcreate config1\fP
-Configuration summary
-
-Drives:         1 (4 configured)
-Volumes:        1 (4 configured)
-Plexes:         1 (8 configured)
-Subdisks:       1 (16 configured)
-
-D a                     State: up       Device /dev/da3h        Avail: 2061/2573 MB (80%)
+1 drives:
+D a                     State: up       /dev/da1s2h     A: 3582/4094 MB (87%)
 
+1 volumes:
 V myvol                 State: up       Plexes:       1 Size:        512 MB
 
+1 plexes:
 P myvol.p0            C State: up       Subdisks:     1 Size:        512 MB
 
-S myvol.p0.s0           State: up       PO:        0  B Size:        512 MB
+1 subdisks:
+S myvol.p0.s0           State: up       D: a            Size:        512 MB
 .De
-This output shows the brief listing format of \fIvinum(8)\fP.  It is represented
+This output shows the brief listing format of
+.Command vinum .
+It is represented
 graphically in Figure \*[simple-vol].
 .br
 .DF
 .PS
+vht=2i
+vwid=3i
+bht=vht/2
+bpos=vwid/2
 move right 1i
-T: ellipse ht .3i wid 4i
-   line from T.e down 4i
-   line from T.w down 4i
-B: arc rad 10i to T.e+(0,-4)
+T: ellipse ht .3i wid vwid
+   line from T.e down vht
+   line from T.w down vht
+.\"   ellipse dashed .1i with .w at T.w+(0,-4) ht .3i wid vwid chop 3i
+ arc rad 10i from T.w+(0,-vht) to T.e+(0,-vht)
 
-P1: S1: box ht 3i wid 1i "Subdisk" above "\s-2\f(CWmyvol.p0.s0\fP\s0" below with .c at T.w+(2,-1.95)
+P1: S1: box ht bht wid 1i "Subdisk" above "\s-2\f(CWmyvol.p0.s0\fP\s0" below with .c at T.w+(bpos,-vht*.4)
         "Plex 1" at S1.s+(0,-.3)
         move down .17i; "\s-2\f(CWmyvol.p0\fP\s0"
 
@@ -635,237 +575,505 @@
 
 A: arrow from S1.ne+(.1,0) to S1.se+(.1,0)
 
- "\s-2volume" ljust at A.n+(.1,-.63)
- "address" ljust at A.n+(.1,-.8)
- "space\s0" ljust at A.n+(.1,-.97)
+ "\s-2volume" ljust at A.n+(.1,-.3)
+ "address" ljust at A.n+(.1,-.5)
+ "space\s0" ljust at A.n+(.1,-.7)
 .PE
 .Figure-heading "A simple Vinum volume"
-.Tn simple-vol
+.Fn simple-vol
 .DE
+.ps \n(PS
 .P
-This figure, and the ones which follow, represent a volume, which contains the
+This figure, and the ones that follow, represent a volume, which contains the
 plexes, which in turn contain the subdisks.  In this trivial example, the volume
 contains one plex, and the plex contains one subdisk.
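+.P
+The object names in the listing above follow the default naming rules described
+for the \f(CWplex\fP and \f(CWsd\fP lines.  The following Python sketch only
+illustrates those rules: the function names are invented for this example and
+are not part of Vinum.

```python
def plex_name(volume: str, plex_index: int) -> str:
    """Default plex name: the volume name plus the suffix .p<index>."""
    return f"{volume}.p{plex_index}"


def subdisk_name(plex: str, subdisk_index: int) -> str:
    """Default subdisk name: the plex name plus the suffix .s<index>."""
    return f"{plex}.s{subdisk_index}"


if __name__ == "__main__":
    plex = plex_name("myvol", 0)
    print(plex)                    # myvol.p0
    print(subdisk_name(plex, 0))   # myvol.p0.s0
```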
-.P
+.H3 "Creating a file system"
+You create a file system on this volume in the same way as you would for a
+conventional disk:
+.Dx
+# \f(CB newfs -U /dev/vinum/myvol \fP
+/dev/vinum/myvol: 512.0MB (1048576 sectors) block size 16384, fragment size 2048
+        using 4 cylinder groups of 128.02MB, 8193 blks, 16512 inodes.
+super-block backups (for fsck -b #) at:
+ 32, 262208, 524384, 786560
+.De
+.ne 4v
 This particular volume has no specific advantage over a conventional disk
 partition.  It contains a single plex, so it is not redundant.  The plex
 contains a single subdisk, so there is no difference in storage allocation from
 a conventional disk partition.  The following sections illustrate various more
 interesting configuration methods.
 .H3 "Increased resilience: mirroring"
-The resilience of a volume can be increased 
-..if raid5
-either by mirroring or by using
-RAID-5 plexes.
-..else
-by mirroring.
-..endif
-When laying out a mirrored volume, it is important to ensure that the subdisks
-of each plex are on different drives, so that a drive failure will not take down
-both plexes.  The following configuration mirrors a volume:
+The resilience of a volume can be increased either by mirroring or by using
+RAID-5 plexes.  When laying out a mirrored volume, it is important to ensure
+that the subdisks of each plex are on different drives, so that a drive failure
+will not take down both plexes.  The following configuration mirrors a volume:
 .Dx
-drive b device /dev/da4h
+drive b device /dev/da2s2h
 volume mirror
   plex org concat
     sd length 512m drive a
   plex org concat
     sd length 512m drive b
 .De
-In this example, it was not necessary to specify a definition of drive \fIa\fP\|
-again, since Vinum keeps track of all objects in its configuration database.
+In this example, it was not necessary to specify a definition of drive \fIa\fP\/
+again, because Vinum keeps track of all objects in its configuration database.
 After processing this definition, the configuration looks like:
 .Dx
-Drives:         2 (4 configured)
-Volumes:        2 (4 configured)
-Plexes:         3 (8 configured)
-Subdisks:       3 (16 configured)
-
-D a                     State: up       Device /dev/da3h        Avail: 1549/2573 MB (60%)
-D b                     State: up       Device /dev/da4h        Avail: 2061/2573 MB (80%)
+2 drives:
+D a                     State: up       /dev/da1s2h     A: 3070/4094 MB (74%)
+D b                     State: up       /dev/da2s2h     A: 3582/4094 MB (87%)
 
+2 volumes:
 V myvol                 State: up       Plexes:       1 Size:        512 MB
 V mirror                State: up       Plexes:       2 Size:        512 MB
 
+3 plexes:
 P myvol.p0            C State: up       Subdisks:     1 Size:        512 MB
 P mirror.p0           C State: up       Subdisks:     1 Size:        512 MB
 P mirror.p1           C State: initializing     Subdisks:     1 Size:        512 MB
 
-S myvol.p0.s0           State: up       PO:        0  B Size:        512 MB
-S mirror.p0.s0          State: up       PO:        0  B Size:        512 MB
-S mirror.p1.s0          State: empty    PO:        0  B Size:        512 MB
+3 subdisks:
+S myvol.p0.s0           State: up       D: a            Size:        512 MB
+S mirror.p0.s0          State: up       D: a            Size:        512 MB
+S mirror.p1.s0          State: empty    D: b            Size:        512 MB
 .De
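+.P
+The \f(CWA:\fP column shows the free space on each drive.  As a rough check of
+the figures in this listing, the following Python sketch subtracts the subdisk
+lengths from the drive size.  It assumes that the percentage is truncated rather
+than rounded, which matches the values shown here; the function name is invented
+for this example.

```python
def drive_available(total_mb: int, subdisk_sizes_mb: list[int]) -> tuple[int, int]:
    """Return (free MB, free percentage) for a drive holding the given subdisks.

    Assumption: the percentage is truncated to a whole number.
    """
    free_mb = total_mb - sum(subdisk_sizes_mb)
    return free_mb, free_mb * 100 // total_mb


if __name__ == "__main__":
    # Drive a holds myvol.p0.s0 and mirror.p0.s0, drive b holds mirror.p1.s0.
    print(drive_available(4094, [512, 512]))   # (3070, 74)
    print(drive_available(4094, [512]))        # (3582, 87)
```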
 Figure \*[mirrored-vol] shows the structure graphically.
 .br
 .DF
 .PS
+vht=2i
+vwid=3i
+bht=vht/2
+b1pos=vwid*.2
+b2pos=vwid*.8
 move right 1i
-T: ellipse ht .3i wid 4i
-   line from T.e down 4i
-   line from T.w down 4i
-B: arc rad 10i to T.e+(0,-4)
+T: ellipse ht .3i wid vwid
+   line from T.e down vht
+   line from T.w down vht
+.\"   ellipse dashed .1i with .w at T.w+(0,-4) ht .3i wid vwid chop 3i
+ arc rad 10i from T.w+(0,-vht) to T.e+(0,-vht)
 
-P1: S1: box ht 3i wid 1i "Subdisk 1" above "\s-2\f(CWmirror.p0.s0\fP\s0" below with .c at T.c+(-1,-2)
+P1: S1: box ht bht wid 1i "Subdisk 1" above "\s-2\f(CWmirror.p0.s0\fP\s0" below with .c at T.w+(b1pos,-vht*.4)
         "Plex 1" at S1.s+(0,-.3)
         move down .17i; "\s-2\f(CWmirror.p0\fP\s0"
-P2: S3: box ht 3i wid 1i "Subdisk 2" above "\s-2\f(CWmirror.p1.s0\fP\s0" below with .c at T.c+(1,-2)
-        "Plex 2" at S3.s+(0,-.3) 
+
+P2: S2: box ht bht wid 1i "Subdisk 2" above "\s-2\f(CWmirror.p1.s0\fP\s0" below with .c at T.w+(b2pos,-vht*.4)
+        "Plex 2" at S2.s+(0,-.3)
         move down .17i; "\s-2\f(CWmirror.p1\fP\s0"
 
 "\s-2\&0 MB\s0" at S1.ne+(.5,0)
 "\s-2\&512 MB\s0" at S1.se+(.5,0)
 
 A: arrow from S1.ne+(.1,0) to S1.se+(.1,0)
-   arrow from S1.se+(.1,0) to S1.ne+(.1,0)
 
- "\s-2volume" ljust at A.n+(.1,-.63)
- "address" ljust at A.n+(.1,-.8)
- "space\s0" ljust at A.n+(.1,-.97)
+ "\s-2volume" ljust at A.n+(.1,-.3)
+ "address" ljust at A.n+(.1,-.5)
+ "space\s0" ljust at A.n+(.1,-.7)
 .PE
 .Figure-heading "A mirrored Vinum volume"
-.Tn mirrored-vol
+.Fn mirrored-vol
 .DE
+.ps \n(PS
 .P
 In this example, each plex contains the full 512 MB of address space.  As in the
 previous example, each plex contains only a single subdisk.
+.P
+.X "reviving, vinum"
+.X "vinum, reviving"
+Note the state of \fImirror.p1\fP\/ and \fImirror.p1.s0\fP\/:
+\f(CWinitializing\fP and \f(CWempty\fP respectively.  There's a problem when you
+create two identical plexes: to ensure that they're identical, you need to copy
+the entire contents of one plex to the other.  This process is called
+\fIreviving\fP, and you perform it with the \fIstart\fP\/ command:
+.Dx
+vinum -> \f(CBstart mirror.p1\fP
+vinum[278]: reviving mirror.p1.s0
+Reviving mirror.p1.s0 in the background
+vinum -> vinum[278]: mirror.p1.s0 is up
+.De
+.ne 4v
+During the start process, you can look at the status to see how far the revive
+has progressed:
+.Dx
+vinum -> \f(CBlist mirror.p1.s0\fP
+S mirror.p1.s0          State: R 43%    D: b            Size:        512 MB
+.De
+Reviving a large volume can take a very long time.  When you first create a
+volume, the contents are not defined.  Does it really matter if the contents of
+each plex are different?  If you will only ever read what you have first
+written, you don't need to worry too much.  In this case, you can use the
+\f(CWsetupstate\fP keyword in the configuration file.  We'll see an example of
+this below.
+.H3 "Adding plexes to an existing volume"
+.Pn adding-plex
+At some time after creating a volume, you may decide to add more plexes.
+For example, you may want to add a plex to the volume \fImyvol\fP\/ we saw
+above, putting its subdisk on drive \fIb\fP.  The configuration file for this
+extension would look like:
+.Dx
+plex name myvol.p1 org concat volume myvol
+   sd size 1g drive b
+.De
+To see what has happened, use the recursive listing option \f(CW-r\fP for the
+\fIlist\fP\/ command:
+.Dx
+vinum -> \f(CBl -r myvol\fP
+V myvol                 State: up       Plexes:       2 Size:       1024 MB
+P myvol.p0            C State: up       Subdisks:     1 Size:        512 MB
+P myvol.p1            C State: initializing   Subdisks:     1 Size:       1024 MB
+S myvol.p0.s0           State: up       D: a            Size:        512 MB
+S myvol.p1.s0           State: empty    D: b            Size:       1024 MB
+.De
+.ne 4v
+The command \fIl\fP\/ is a synonym for \fIlist\fP, and the \f(CW-r\fP option
+means \fIrecursive\fP\/: it displays all subordinate objects.  In this example,
+plex \fImyvol.p1\fP\/ is 1 GB in size, although \fImyvol.p0\fP\/ is only 512 MB
+in size.  This discrepancy is allowed, though it isn't very useful by itself:
+only the first half of the volume is protected against failures.  As we'll see
+in the next section, though, this is a useful stepping stone to extending the
+size of a file system.
+.P
+Note that you can't use the \f(CWsetupstate\fP keyword here.  Vinum can't know
+whether the existing volume contains valid data or not, so you \fImust\fP\/ use
+the \fIstart\fP\/ command to synchronize the plexes.
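+.P
+The size arithmetic for plexes of different lengths can be sketched in a few
+lines of Python.  This is an illustration only: it assumes that every plex maps
+a contiguous range starting at the beginning of the volume, as the concatenated
+plexes in this example do, and the function names are invented.

```python
def volume_size_mb(plex_sizes_mb: list[int]) -> int:
    """The volume is as large as its largest plex."""
    return max(plex_sizes_mb)


def mirrored_size_mb(plex_sizes_mb: list[int]) -> int:
    """Size of the leading part of the volume that is held by at least two plexes.

    Assumption: each plex covers the range from 0 up to its own size.
    """
    if len(plex_sizes_mb) < 2:
        return 0
    return sorted(plex_sizes_mb, reverse=True)[1]


if __name__ == "__main__":
    sizes = [512, 1024]             # myvol.p0 and myvol.p1
    print(volume_size_mb(sizes))    # 1024
    print(mirrored_size_mb(sizes))  # 512: only the first half is protected
```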
+.H3 "Adding subdisks to existing plexes"
+After we added a second plex to \fImyvol\fP, the volume had one plex with 512 MB
+and another with 1024 MB.  It makes sense to have plexes of the same size, so the
+first thing we should do is add a second subdisk to the plex \fImyvol.p0\fP.
+.P
+If you add subdisks to striped, RAID-4 or RAID-5 plexes, you will change the
+mapping of the data to the disks, which effectively destroys the contents.  As a
+result, you must use the \f(CW-f\fP option.  When you add subdisks to
+concatenated plexes, the data in the existing subdisks remains unchanged.  In
+our case, the plex is concatenated, so we create and add the subdisk like this:
+.Dx
+sd name myvol.p0.s1 plex myvol.p0 size 512m drive c
+.De
+After adding this subdisk, the volume looks like this:
+.PS
+vht=2i
+vwid=3i
+bht=vht/2
+b1pos=vwid*.2
+b2pos=vwid*.8
+move right 1i
+T: ellipse ht .3i wid vwid
+   line from T.e down vht
+   line from T.w down vht
+.\"   ellipse dashed .1i with .w at T.w+(0,-4) ht .3i wid vwid chop 3i
+ arc rad 10i from T.w+(0,-vht) to T.e+(0,-vht)
+
+P1: S1: box ht bht/2 wid 1i "\s-2\f(CWmyvol.p0.s0\fP\s0" with .c at T.w+(b1pos,-vht*.275)
+    S1A: box filled .1 ht bht/2 wid 1i "\s-2\f(CWmyvol.p0.s1\fP\s0"
+        "Plex 1" at S1A.s+(0,-.3)
+        move down .17i; "\s-2\f(CWmyvol.p0\fP\s0"
+
+P2: S2: box filled .1 ht bht wid 1i  "\s-2\f(CWmyvol.p1.s0\fP\s0" with .c at T.w+(b2pos,-vht*.4)
+        "Plex 2" at S2.s+(0,-.3)
+        move down .17i; "\s-2\f(CWmyvol.p1\fP\s0"
+
+"\s-2\&0 MB\s0" at S1.ne+(.5,0)
+"\s-2\&1024 MB\s0" at S1A.se+(.5,0)
+
+A: arrow from S1.ne+(.1,0) to S1.se+(.1,0)
+
+ "\s-2volume" ljust at A.n+(.1,-.3)
+ "address" ljust at A.n+(.1,-.5)
+ "space\s0" ljust at A.n+(.1,-.7)
+.PE
+.Figure-heading "An extended Vinum volume"
+.Fn extended-vol
+.ps \n(PS
+.ne 5v
+It doesn't look too happy, however:
+.Dx
+vinum -> \f(CBl -r myvol\fP
+V myvol                 State: up       Plexes:       2 Size:       1024 MB
+P myvol.p0            C State: corrupt  Subdisks:     2 Size:       1024 MB
+P myvol.p1            C State: initializing   Subdisks:     1 Size:       1024 MB
+S myvol.p0.s0           State: up       D: a            Size:        512 MB
+S myvol.p0.s1           State: empty    D: c            Size:        512 MB
+S myvol.p1.s0           State: stale    D: b            Size:       1024 MB
+.De
+In fact, it's in as good a shape as it ever has been.  The first half of
+\fImyvol\fP\/ still contains the file system that we put on it, and it's as
+accessible as ever.  The trouble here is that there is \fInothing\fP\/ in the
+other two subdisks, which are shown shaded in the figure.  Vinum can't know that
+this is acceptable, but we do.  In this case, we use some maintenance commands
+to set the correct object states:
+.Dx
+vinum -> \f(CBsetstate up myvol.p0.s1 myvol.p0 \fP
+vinum -> \f(CBl -r myvol\fP
+V myvol                 State: up       Plexes:       2 Size:       1024 MB
+P myvol.p0            C State: up       Subdisks:     2 Size:       1024 MB
+P myvol.p1            C State: faulty   Subdisks:     1 Size:       1024 MB
+S myvol.p0.s0           State: up       D: a            Size:        512 MB
+S myvol.p0.s1           State: up       D: c            Size:        512 MB
+S myvol.p1.s0           State: stale    D: b            Size:       1024 MB
+vinum -> \f(CBsaveconfig\fP
+.De
+.X "setstate, vinum command"
+.X "vinum, setstate command"
+.X "saveconfig, vinum command"
+.X "vinum, saveconfig command"
+The command \fIsetstate\fP changes the state of individual objects without
+updating those of related objects.  For example, you can use it to change the
+state of a plex to \f(CWup\fP even if all the subdisks are \f(CWdown\fP.  If
+used incorrectly, it can cause severe data corruption.  Unlike normal
+commands, it doesn't save the configuration changes, so you use
+\fIsaveconfig\fP\/ for that, \fIafter\fP\/ you're sure you have the correct
+states.  Read the man page before using these commands for any other purpose.
+.P
+Next you start the second plex:
+.Dx
+vinum -> \f(CBstart myvol.p1\fP
+Reviving myvol.p1.s0 in the background
+vinum[446]: reviving myvol.p1.s0
+vinum -> vinum[446]: myvol.p1.s0 is up          \fIsome time later\fP\/
+\f(CBl\fP                                               \fIcommand for previous prompt\fP\/
+3 drives:
+D a                     State: up       /dev/da1s2h     A: 3582/4094 MB (87%)
+D b                     State: up       /dev/da2s2h     A: 3070/4094 MB (74%)
+D c                     State: up       /dev/da3s2h     A: 3582/4094 MB (87%)
+
+1 volumes:
+V myvol                 State: up       Plexes:       2 Size:       1024 MB
+
+2 plexes:
+P myvol.p0            C State: up       Subdisks:     2 Size:       1024 MB
+P myvol.p1            C State: up       Subdisks:     1 Size:       1024 MB
+
+3 subdisks:
+S myvol.p0.s0           State: up       D: a            Size:        512 MB
+S myvol.p1.s0           State: up       D: b            Size:       1024 MB
+S myvol.p0.s1           State: up       D: c            Size:        512 MB
+.De
+.ne 5v
+The message telling you that \fImyvol.p1.s0\fP\/ is up comes after the prompt,
+so the next command doesn't have a prompt.  At this point you have a fully
+mirrored, functional volume, 1 GB in size.  If you now look at the contents,
+though, you see:
+.Dx
+#  \f(CBdf /mnt\fP
+Filesystem       1048576-blocks Used Avail Capacity  Mounted on
+/dev/vinum/myvol            503    1   461     0%    /mnt
+.De
+The volume is now 1 GB in size, but the file system on the volume is still only
+512 MB.  To expand it, use
+.Command growfs \/:
+.Dx
+# \f(CBumount /mnt\fP
+# \f(CBgrowfs /dev/vinum/myvol \fP
+We strongly recommend you to make a backup before growing the Filesystem
+
+ Did you backup your data (Yes/No) ? \f(CBYes\fP
+new file systemsize is: 524288 frags
+Warning: 261920 sector(s) cannot be allocated.
+growfs: 896.1MB (1835232 sectors) block size 16384, fragment size 2048
+        using 7 cylinder groups of 128.02MB, 8193 blks, 16512 inodes.
+super-block backups (for fsck -b #) at:
+ 1048736, 1310912, 1573088
+# \f(CBmount /dev/vinum/myvol /mnt\fP
+# \f(CBdf /mnt\fP
+Filesystem       1048576-blocks Used Avail Capacity  Mounted on
+/dev/vinum/myvol            881    1   809     0%    /mnt
+.De
+.SPUP
 .H3 "Optimizing performance"
-The mirrored volume in the previous example is more resistant to failure than an
-unmirrored volume, but its performance is less: each write to the volume
+The mirrored volumes in the previous example are more resistant to failure than
+unmirrored volumes, but their performance is lower: each write to the volume
 requires a write to both drives, using up a greater proportion of the total disk
 bandwidth.  Performance considerations demand a different approach: instead of
 mirroring, the data is striped across as many disk drives as possible.  The
 following configuration shows a volume with a plex striped across four disk
 drives:
 .Dx
-drive c device /dev/da5h
-drive d device /dev/da6h
+drive c device /dev/da3s2h
+drive d device /dev/da4s2h
 volume stripe
-  plex org striped 512k
+  plex org striped 480k
     sd length 128m drive a
     sd length 128m drive b
     sd length 128m drive c
     sd length 128m drive d
 .De
+When creating striped plexes for the UFS file system, ensure that the stripe
+size is a multiple of the file system block size (normally 16 kB), but not a
+power of 2.  UFS frequently allocates cylinder groups with lengths that are a
+power of 2, and if you allocate stripes that are also a power of 2, you may end
+up with all inodes on the same drive, which would significantly impact
+performance under some circumstances.  Files are allocated in blocks, so having
+a stripe size that is not a multiple of the block size can cause significant
+fragmentation of I/O requests and consequent drop in performance.  See the man
+page for more details.
+.P
+.ne 5v
+Vinum requires that a striped plex have an integral number of stripes.  You
+don't have to calculate the size exactly, though: if the size of the plex is not
+a multiple of the stripe size, Vinum trims off the remaining partial stripe and
+prints a console message:
+.Dx
+vinum: removing 256 blocks of partial stripe at the end of stripe.p0
+.De
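+.P
+The stripe size rule and the trimming arithmetic can be checked with a short
+Python sketch.  It is not Vinum's code: it assumes 512 byte blocks and that each
+subdisk is trimmed to a whole number of stripes, and the function names are
+invented.  Under those assumptions it reproduces the 256 blocks reported for four
+128 MB subdisks with a 480 kB stripe.

```python
SECTOR_BYTES = 512   # assumed block size for the console message
KB = 1024


def is_power_of_two(n: int) -> bool:
    return n > 0 and n & (n - 1) == 0


def stripe_size_ok(stripe_kb: int, fs_block_kb: int = 16) -> bool:
    """True if the stripe is a multiple of the file system block size
    but not a power of 2."""
    return stripe_kb % fs_block_kb == 0 and not is_power_of_two(stripe_kb)


def trimmed_subdisk_kb(subdisk_kb: int, stripe_kb: int) -> int:
    """Subdisk length after removing the partial stripe at its end."""
    return subdisk_kb - subdisk_kb % stripe_kb


def trimmed_blocks(subdisk_kb: int, stripe_kb: int, subdisks: int) -> int:
    """Total number of blocks removed from a plex with equal-sized subdisks."""
    lost_kb = (subdisk_kb % stripe_kb) * subdisks
    return lost_kb * KB // SECTOR_BYTES


if __name__ == "__main__":
    print(stripe_size_ok(480))                    # True
    print(stripe_size_ok(512))                    # False: power of 2
    print(trimmed_blocks(128 * 1024, 480, 4))     # 256
    print(4 * trimmed_subdisk_kb(128 * 1024, 480) // 1024)   # 511 (MB)
```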
 .P
-As before, it is not necessary to define the drives which are already known to
+As before, it is not necessary to define the drives that are already known to
 Vinum.  After processing this definition, the configuration looks like:
 .Dx
-Drives:         4 (4 configured)
-Volumes:        3 (4 configured)
-Plexes:         4 (8 configured)
-Subdisks:       7 (16 configured)
-
-D a                     State: up       Device /dev/da3h        Avail: 1421/2573 MB (55%)
-D b                     State: up       Device /dev/da4h        Avail: 1933/2573 MB (75%)
-D c                     State: up       Device /dev/da5h        Avail: 2445/2573 MB (95%)
-D d                     State: up       Device /dev/da6h        Avail: 2445/2573 MB (95%)
+4 drives:
+D a                     State: up       /dev/da1s2h     A: 2942/4094 MB (71%)
+D b                     State: up       /dev/da2s2h     A: 2430/4094 MB (59%)
+D c                     State: up       /dev/da3s2h     A: 3966/4094 MB (96%)
+D d                     State: up       /dev/da4s2h     A: 3966/4094 MB (96%)
 
-V myvol                 State: up       Plexes:       1 Size:        512 MB
+3 volumes:
+V myvol                 State: up       Plexes:       2 Size:       1024 MB
 V mirror                State: up       Plexes:       2 Size:        512 MB
-V striped               State: up       Plexes:       1 Size:        512 MB
+V stripe                State: up       Plexes:       1 Size:        511 MB
 
+5 plexes:
 P myvol.p0            C State: up       Subdisks:     1 Size:        512 MB
 P mirror.p0           C State: up       Subdisks:     1 Size:        512 MB
 P mirror.p1           C State: initializing     Subdisks:     1 Size:        512 MB
-P striped.p1            State: up       Subdisks:     1 Size:        512 MB
+P myvol.p1            C State: up       Subdisks:     1 Size:       1024 MB
+P stripe.p0           S State: up       Subdisks:     4 Size:        511 MB
 
-S myvol.p0.s0           State: up       PO:        0  B Size:        512 MB
-S mirror.p0.s0          State: up       PO:        0  B Size:        512 MB
-S mirror.p1.s0          State: empty    PO:        0  B Size:        512 MB
-S striped.p0.s0         State: up       PO:        0  B Size:        128 MB
-S striped.p0.s1         State: up       PO:      512 kB Size:        128 MB
-S striped.p0.s2         State: up       PO:     1024 kB Size:        128 MB
-S striped.p0.s3         State: up       PO:     1536 kB Size:        128 MB
+9 subdisks:
+S myvol.p0.s0           State: up       D: a            Size:        512 MB
+S mirror.p0.s0          State: up       D: a            Size:        512 MB
+S mirror.p1.s0          State: empty    D: b            Size:        512 MB
+S myvol.p1.s0           State: up       D: b            Size:       1024 MB
+S myvol.p0.s1           State: up       D: c            Size:        512 MB
+S stripe.p0.s0          State: up       D: a            Size:        127 MB
+S stripe.p0.s1          State: up       D: b            Size:        127 MB
+S stripe.p0.s2          State: up       D: c            Size:        127 MB
+S stripe.p0.s3          State: up       D: d            Size:        127 MB
 .De
 .DF
 .PS
-move right 1i
-T: ellipse ht .3i wid 4i
-   line from T.e down 4i
-   line from T.w down 4i
-B: arc rad 10i to T.e+(0,-4)
-
-P1: S1: box ht .7i wid 1i with .c at T.w+(2,-0.825)
-    S2: box ht .7i wid 1i
-    S3: box ht .7i wid 1i
-    S4: box ht .7i wid 1i
-        "Plex 1" at S4.s+(0,-.3)
-        move down .17i; "\s-2\f(CWstriped.p0\fP\s0"
-
-"\s-2\&0 MB\s0" at S1.ne+(.5,0)
-"\s-2\&512 MB\s0" at S4.se+(.5,0)
+vht=2i
+vwid=3i
+bht=vht/2
+boxht=.05i
+stripeht=.35i
 
-A: arrow from S1.ne+(.1,0) to S4.se+(.1,0)
-
- "\s-2volume" ljust at A.n+(.1,-.63)
- "address" ljust at A.n+(.1,-.8)
- "space\s0" ljust at A.n+(.1,-.97)
- "\f(CW\s-2striped.p0.s0\fP\s0" at S1.w-(.2,0) rjust
- "\f(CW\s-2striped.p0.s1\fP\s0" at S2.w-(.2,0) rjust
- "\f(CW\s-2striped.p0.s2\fP\s0" at S3.w-(.2,0) rjust
- "\f(CW\s-2striped.p0.s3\fP\s0" at S4.w-(.2,0) rjust
+move right 1i
+T: ellipse ht .3i wid vwid
+   line from T.e down vht
+   line from T.w down vht
+B:  arc rad 10i from T.w+(0,-vht) to T.e+(0,-vht)
+
+P1: S1: box ht stripeht wid 1i with .c at T.w+(vwid*.55,-vht*.2)
+    S2: box ht stripeht wid 1i
+    S3: box ht stripeht wid 1i
+    S4: box ht stripeht wid 1i
+        "Plex 1" at S4.s+(0,-.2)
+        move down .17i; "\s-2\f(CWstripe.p0\fP\s0"
+
+ "\f(CW\s-2stripe.p0.s0\fP\s0" at S1.w-(.2,0) rjust
+ "\f(CW\s-2stripe.p0.s1\fP\s0" at S2.w-(.2,0) rjust
+ "\f(CW\s-2stripe.p0.s2\fP\s0" at S3.w-(.2,0) rjust
+ "\f(CW\s-2stripe.p0.s3\fP\s0" at S4.w-(.2,0) rjust
 
 .\" Stripes
-  box filled .1 wid 1i ht .1i at S1.n+(0, -.05)
-  box filled .2 wid 1i ht .1i at S1.n+(0, -.15)
-  box filled .3 wid 1i ht .1i at S1.n+(0, -.25)
-  box filled .4 wid 1i ht .1i at S1.n+(0, -.35)
-  box filled .5 wid 1i ht .1i at S1.n+(0, -.45)
-  box filled .6 wid 1i ht .1i at S1.n+(0, -.55)
-  box filled .7 wid 1i ht .1i at S1.n+(0, -.65)
-
-  box filled .1 wid 1i ht .1i at S2.n+(0, -.05)
-  box filled .2 wid 1i ht .1i at S2.n+(0, -.15)
-  box filled .3 wid 1i ht .1i at S2.n+(0, -.25)
-  box filled .4 wid 1i ht .1i at S2.n+(0, -.35)
-  box filled .5 wid 1i ht .1i at S2.n+(0, -.45)
-  box filled .6 wid 1i ht .1i at S2.n+(0, -.55)
-  box filled .7 wid 1i ht .1i at S2.n+(0, -.65)
-
-  box filled .1 wid 1i ht .1i at S3.n+(0, -.05)
-  box filled .2 wid 1i ht .1i at S3.n+(0, -.15)
-  box filled .3 wid 1i ht .1i at S3.n+(0, -.25)
-  box filled .4 wid 1i ht .1i at S3.n+(0, -.35)
-  box filled .5 wid 1i ht .1i at S3.n+(0, -.45)
-  box filled .6 wid 1i ht .1i at S3.n+(0, -.55)
-  box filled .7 wid 1i ht .1i at S3.n+(0, -.65)
-
-  box filled .1 wid 1i ht .1i at S4.n+(0, -.05)
-  box filled .2 wid 1i ht .1i at S4.n+(0, -.15)
-  box filled .3 wid 1i ht .1i at S4.n+(0, -.25)
-  box filled .4 wid 1i ht .1i at S4.n+(0, -.35)
-  box filled .5 wid 1i ht .1i at S4.n+(0, -.45)
-  box filled .6 wid 1i ht .1i at S4.n+(0, -.55)
-  box filled .7 wid 1i ht .1i at S4.n+(0, -.65)
-
+  box filled .1 wid 1i with .nw at S1.nw
+  box filled .2 wid 1i
+  box filled .3 wid 1i
+  box filled .4 wid 1i
+  box filled .5 wid 1i
+  box filled .6 wid 1i
+  box filled .7 wid 1i
+
+  box filled .1 wid 1i
+  box filled .2 wid 1i
+  box filled .3 wid 1i
+  box filled .4 wid 1i
+  box filled .5 wid 1i
+  box filled .6 wid 1i
+  box filled .7 wid 1i
+
+  box filled .1 wid 1i
+  box filled .2 wid 1i
+  box filled .3 wid 1i
+  box filled .4 wid 1i
+  box filled .5 wid 1i
+  box filled .6 wid 1i
+  box filled .7 wid 1i
+
+  box filled .1 wid 1i
+  box filled .2 wid 1i
+  box filled .3 wid 1i
+  box filled .4 wid 1i
+  box filled .5 wid 1i
+  box filled .6 wid 1i
+  box filled .7 wid 1i
 .PE
 .Figure-heading "A striped Vinum volume"
-.Tn striped-vol
+.Fn striped-vol
 .DE
+.ps \n(PS
 .P
 This volume is represented in Figure \*[striped-vol].  The darkness of the
 stripes indicates the position within the plex address space: the lightest
 stripes come first, the darkest last.
 .H3 "Resilience and performance"
 .Pn resilience
-With sufficient hardware, it is possible to build volumes which show both
+With sufficient hardware, it is possible to build volumes that show both
 increased resilience and increased performance compared to standard UNIX
-partitions.  
-..if raid5
-Mirrored disks will always give better performance than RAID-5, so
-a
-..else
-A
-..endif
-typical configuration file might be:
+partitions.  Mirrored disks will always give better performance than RAID-5, so
+a typical configuration file might be:
 .Dx
-volume raid10
-  plex org striped 512k
+drive e device /dev/da5s2h
+drive f device /dev/da6s2h
+drive g device /dev/da7s2h
+drive h device /dev/da8s2h
+drive i device /dev/da9s2h
+drive j device /dev/da10s2h
+volume raid10 setupstate
+  plex org striped 480k
     sd length 102480k drive a
     sd length 102480k drive b
     sd length 102480k drive c
     sd length 102480k drive d
     sd length 102480k drive e
-  plex org striped 512k
+  plex org striped 480k
+    sd length 102480k drive f
+    sd length 102480k drive g
+    sd length 102480k drive h
+    sd length 102480k drive i
+    sd length 102480k drive j
+.De
+In this example, we have added another five disks for the second plex, so the
+volume is spread over ten spindles.  We have also used the \f(CWsetupstate\fP
+keyword so that all components come up.  The volume looks like this:
+.Dx
+vinum -> \f(CBl -r raid10\fP
+V raid10                State: up       Plexes:       2 Size:        499 MB
+P raid10.p0           S State: up       Subdisks:     5 Size:        499 MB
+P raid10.p1           S State: up       Subdisks:     5 Size:        499 MB
+S raid10.p0.s0          State: up       D: a            Size:         99 MB
+S raid10.p0.s1          State: up       D: b            Size:         99 MB
+S raid10.p0.s2          State: up       D: c            Size:         99 MB
+S raid10.p0.s3          State: up       D: d            Size:         99 MB
+S raid10.p0.s4          State: up       D: e            Size:         99 MB
+S raid10.p1.s0          State: up       D: f            Size:         99 MB
+S raid10.p1.s1          State: up       D: g            Size:         99 MB
+S raid10.p1.s2          State: up       D: h            Size:         99 MB
+S raid10.p1.s3          State: up       D: i            Size:         99 MB
+S raid10.p1.s4          State: up       D: j            Size:         99 MB
+.De
+This assumes the availability of ten disks.  It's not essential to have all the
+components on different disks.  You could put the subdisks of the second plex on
+the same drives as the subdisks of the first plex.  If you do so, you should put
+corresponding subdisks on different drives:
+.Dx
+  plex org striped 480k
+    sd length 102480k drive a
+    sd length 102480k drive b
+    sd length 102480k drive c
+    sd length 102480k drive d
+    sd length 102480k drive e
+  plex org striped 480k
     sd length 102480k drive c
     sd length 102480k drive d
     sd length 102480k drive e
@@ -873,34 +1081,39 @@
     sd length 102480k drive b
 .De
 The subdisks of the second plex are offset by two drives from those of the first
-plex: this helps ensure that writes do not go to the same subdisks even if a
-transfer goes over two drives.
+plex: this helps ensure that the failure of a drive does not cause the same part
+of both plexes to become unreachable, which would destroy the file system.
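+.P
+You can check a layout like this mechanically.  The following Python sketch
+builds the drive order for the second plex by rotating the first plex's order by
+two positions, as described above, and reports any position where both plexes
+would use the same drive.  The function names are invented for this example.

```python
def rotate(drives: list[str], offset: int) -> list[str]:
    """Return the drive list rotated left by offset positions."""
    offset %= len(drives)
    return drives[offset:] + drives[:offset]


def shared_positions(plex_a: list[str], plex_b: list[str]) -> list[int]:
    """Subdisk positions at which both plexes are stored on the same drive."""
    return [i for i, (a, b) in enumerate(zip(plex_a, plex_b)) if a == b]


if __name__ == "__main__":
    first = ["a", "b", "c", "d", "e"]
    second = rotate(first, 2)
    print(shared_positions(first, second))   # []: no drive failure hits the
                                             # same part of both plexes
    print(shared_positions(first, first))    # [0, 1, 2, 3, 4]: no protection
```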
 .P
 Figure \*[raid10-vol] represents the structure of this volume.
-.br
-.DF
 .PS
-move right 1i
-T: ellipse ht .3i wid 4i
-   line from T.e down 4i
-   line from T.w down 4i
-B: arc rad 10i to T.e+(0,-4)
-
-P1: S1: box ht .6i wid 1i with .c at T.w+(1.3,-0.825)
-    S2: box ht .6i wid 1i
-    S3: box ht .6i wid 1i
-    S4: box ht .6i wid 1i
-    S5: box ht .6i wid 1i
-        "Plex 1" at S5.s+(0,-.3)
-        move down .17i; "\s-2\f(CWstriped.p0\fP\s0"
-
-P1: S11: box ht .6i wid 1i with .c at T.w+(3.3,-0.825)
-    S12: box ht .6i wid 1i
-    S13: box ht .6i wid 1i
-    S14: box ht .6i wid 1i
-    S15: box ht .6i wid 1i
-        "Plex 2" at S15.s+(0,-.3)
-        move down .17i; "\s-2\f(CWstriped.p1\fP\s0"
+vht=2.5i
+vwid=3.5i
+boxwid=.8i
+bht=vht/2
+boxht=.05i
+stripeht=.35i
+
+move right .75i
+T: ellipse ht .3i wid vwid
+   line from T.e down vht
+   line from T.w down vht
+B:  arc rad 10i from T.w+(0,-vht) to T.e+(0,-vht)
+
+P1: S1: box ht stripeht with .c at T.w+(vwid*.3,-vht*.2)
+    S2: box ht stripeht
+    S3: box ht stripeht
+    S4: box ht stripeht
+    S5: box ht stripeht
+        "Plex 1" at S5.s+(0,-.2)
+        move down .14i; "\s-2\f(CWraid10.p0\fP\s0"
+
+P2: S11: box ht stripeht with .c at T.w+(vwid*.85,-vht*.2)
+    S12: box ht stripeht
+    S13: box ht stripeht
+    S14: box ht stripeht
+    S15: box ht stripeht
+        "Plex 2" at S15.s+(0,-.2)
+        move down .14i; "\s-2\f(CWraid10.p1\fP\s0"
 
  "\f(CW\s-2.p0.s0\fP\s0" at S1.w-(.2,0) rjust
  "\f(CW\s-2.p0.s1\fP\s0" at S2.w-(.2,0) rjust
@@ -914,278 +1127,132 @@
  "\f(CW\s-2.p1.s3\fP\s0" at S14.w-(.2,0) rjust
  "\f(CW\s-2.p1.s4\fP\s0" at S15.w-(.2,0) rjust
 
-.\" Stripes
-  box filled .1 wid 1i ht .1i at S1.n+(0, -.05)
-  box filled .2 wid 1i ht .1i at S1.n+(0, -.15)
-  box filled .3 wid 1i ht .1i at S1.n+(0, -.25)
-  box filled .4 wid 1i ht .1i at S1.n+(0, -.35)
-  box filled .5 wid 1i ht .1i at S1.n+(0, -.45)
-  box filled .6 wid 1i ht .1i at S1.n+(0, -.55)
-
-  box filled .1 wid 1i ht .1i at S2.n+(0, -.05)
-  box filled .2 wid 1i ht .1i at S2.n+(0, -.15)
-  box filled .3 wid 1i ht .1i at S2.n+(0, -.25)
-  box filled .4 wid 1i ht .1i at S2.n+(0, -.35)
-  box filled .5 wid 1i ht .1i at S2.n+(0, -.45)
-  box filled .6 wid 1i ht .1i at S2.n+(0, -.55)
-
-  box filled .1 wid 1i ht .1i at S3.n+(0, -.05)
-  box filled .2 wid 1i ht .1i at S3.n+(0, -.15)
-  box filled .3 wid 1i ht .1i at S3.n+(0, -.25)
-  box filled .4 wid 1i ht .1i at S3.n+(0, -.35)
-  box filled .5 wid 1i ht .1i at S3.n+(0, -.45)
-  box filled .6 wid 1i ht .1i at S3.n+(0, -.55)
-
-  box filled .1 wid 1i ht .1i at S4.n+(0, -.05)
-  box filled .2 wid 1i ht .1i at S4.n+(0, -.15)
-  box filled .3 wid 1i ht .1i at S4.n+(0, -.25)
-  box filled .4 wid 1i ht .1i at S4.n+(0, -.35)
-  box filled .5 wid 1i ht .1i at S4.n+(0, -.45)
-  box filled .6 wid 1i ht .1i at S4.n+(0, -.55)
-
-  box filled .1 wid 1i ht .1i at S5.n+(0, -.05)
-  box filled .2 wid 1i ht .1i at S5.n+(0, -.15)
-  box filled .3 wid 1i ht .1i at S5.n+(0, -.25)
-  box filled .4 wid 1i ht .1i at S5.n+(0, -.35)
-  box filled .5 wid 1i ht .1i at S5.n+(0, -.45)
-  box filled .6 wid 1i ht .1i at S5.n+(0, -.55)
-
-  box filled .1 wid 1i ht .1i at S11.n+(0, -.05)
-  box filled .2 wid 1i ht .1i at S11.n+(0, -.15)
-  box filled .3 wid 1i ht .1i at S11.n+(0, -.25)
-  box filled .4 wid 1i ht .1i at S11.n+(0, -.35)
-  box filled .5 wid 1i ht .1i at S11.n+(0, -.45)
-  box filled .6 wid 1i ht .1i at S11.n+(0, -.55)
-
-  box filled .1 wid 1i ht .1i at S12.n+(0, -.05)
-  box filled .2 wid 1i ht .1i at S12.n+(0, -.15)
-  box filled .3 wid 1i ht .1i at S12.n+(0, -.25)
-  box filled .4 wid 1i ht .1i at S12.n+(0, -.35)
-  box filled .5 wid 1i ht .1i at S12.n+(0, -.45)
-  box filled .6 wid 1i ht .1i at S12.n+(0, -.55)
-
-  box filled .1 wid 1i ht .1i at S13.n+(0, -.05)
-  box filled .2 wid 1i ht .1i at S13.n+(0, -.15)
-  box filled .3 wid 1i ht .1i at S13.n+(0, -.25)
-  box filled .4 wid 1i ht .1i at S13.n+(0, -.35)
-  box filled .5 wid 1i ht .1i at S13.n+(0, -.45)
-  box filled .6 wid 1i ht .1i at S13.n+(0, -.55)
-
-  box filled .1 wid 1i ht .1i at S14.n+(0, -.05)
-  box filled .2 wid 1i ht .1i at S14.n+(0, -.15)
-  box filled .3 wid 1i ht .1i at S14.n+(0, -.25)
-  box filled .4 wid 1i ht .1i at S14.n+(0, -.35)
-  box filled .5 wid 1i ht .1i at S14.n+(0, -.45)
-  box filled .6 wid 1i ht .1i at S14.n+(0, -.55)
-
-  box filled .1 wid 1i ht .1i at S15.n+(0, -.05)
-  box filled .2 wid 1i ht .1i at S15.n+(0, -.15)
-  box filled .3 wid 1i ht .1i at S15.n+(0, -.25)
-  box filled .4 wid 1i ht .1i at S15.n+(0, -.35)
-  box filled .5 wid 1i ht .1i at S15.n+(0, -.45)
-  box filled .6 wid 1i ht .1i at S15.n+(0, -.55)
-
+.\" Stripes, plex 1
+  box filled .1 with .nw at S1.nw
+  box filled .2
+  box filled .3
+  box filled .4
+  box filled .5
+  box filled .6
+  box filled .7
+
+  box filled .1
+  box filled .2
+  box filled .3
+  box filled .4
+  box filled .5
+  box filled .6
+  box filled .7
+
+  box filled .1
+  box filled .2
+  box filled .3
+  box filled .4
+  box filled .5
+  box filled .6
+  box filled .7
+
+  box filled .1
+  box filled .2
+  box filled .3
+  box filled .4
+  box filled .5
+  box filled .6
+  box filled .7
+
+  box filled .1
+  box filled .2
+  box filled .3
+  box filled .4
+  box filled .5
+  box filled .6
+  box filled .7
+
+.\" Stripes, plex 2
+  box filled .1 with .nw at S11.nw
+  box filled .2
+  box filled .3
+  box filled .4
+  box filled .5
+  box filled .6
+  box filled .7
+
+  box filled .1
+  box filled .2
+  box filled .3
+  box filled .4
+  box filled .5
+  box filled .6
+  box filled .7
+
+  box filled .1
+  box filled .2
+  box filled .3
+  box filled .4
+  box filled .5
+  box filled .6
+  box filled .7
+
+  box filled .1
+  box filled .2
+  box filled .3
+  box filled .4
+  box filled .5
+  box filled .6
+  box filled .7
+
+  box filled .1
+  box filled .2
+  box filled .3
+  box filled .4
+  box filled .5
+  box filled .6
+  box filled .7
 .PE
 .Figure-heading "A mirrored, striped Vinum volume"
-.Tn raid10-vol
-.DE
-.H2 "Object naming"
-As described above, Vinum assigns default names to plexes and subdisks, although
-they may be overridden.  Overriding the default names is not recommended:
-experience with the VERITAS\(rg volume manager, which allows arbitary naming of
-objects, has shown that this flexibility does not bring a significant advantage,
-and it can cause confusion.
-.P
-Names may contain any non-blank character, but it is recommended to restrict
-them to letters, digits and the underscore characters.  The names of volumes,
-plexes and subdisks may be up to 64 characters long, and the names of drives may
-up to 32 characters long.
-.P
-.X "/dev/vinum"
-Vinum objects are assigned device nodes in the hierarchy \fI/dev/vinum\fP.  The
-configuration shown above would cause Vinum to create the following device
-nodes:
-.Ls B
-.LI
-.X "/dev/vinum/control"
-The control devices \fI/dev/vinum/control\fP\| and \fI/dev/vinum/controld\fP,
-which are used by \fIvinum(8)\fP\| and the Vinum dmon respectively.
-.LI
-Block and character device entries for each volume.  These are the main devices
-used by Vinum.  The block device names are the name of the volume, while the
-character device names follow the BSD tradition of prepending the letter
-\f(CWr\fP to the name.  Thus the configuration above would include the block
-devices \fI/dev/vinum/myvol\fP, \fI/dev/vinum/mirror\fP,
-\fI/dev/vinum/striped\fP, \fI/dev/vinum/raid5\fP\| and \fI/dev/vinum/raid10\fP,
-and the character devices \fI/dev/vinum/rmyvol\fP, \fI/dev/vinum/rmirror\fP,
-\fI/dev/vinum/rstriped\fP, \fI/dev/vinum/rraid5\fP\| and
-\fI/dev/vinum/rraid10\fP.  There is obviously a problem here: it is possible to
-have two volumes called \fIr\fP\| and \fIrr\fP, but there will be a conflict
-creating the device node \fI/dev/vinum/rr\fP\|: is it a character device for
-volume \fIr\fP\| or a block device for volume \fIrr\fP\|?  Currently Vinum does
-not address this conflict: the first-defined volume will get the name.
-.LI
-A directory \fI/dev/vinum/drive\fP\| with entries for each drive.  These entries
-are in fact symbolic links to the corresponding disk nodes.
-.LI
-A directory \fI/dev/vinum/volume\fP\| with entries for each volume.  It contains
-subdirectories for each plex, which in turn contain subdirectories for their
-component subdisks.
-.LI
-The directories \fI/dev/vinum/plex\fP\| and \fI/dev/vinum/sd\fP,
-\fI/dev/vinum/rsd\fP, which contain block device nodes for each plex and block
-and character device nodes respectively for subdisk.
-.Le
-For example, consider the following configuration file:
-.Dx
-drive drive1 device /dev/sd1h
-drive drive2 device /dev/sd2h
-drive drive3 device /dev/sd3h
-drive drive4 device /dev/sd4h
-volume s64 setupstate
- plex org striped 64k
-   sd length 100m drive drive1
-   sd length 100m drive drive2
-   sd length 100m drive drive3
-   sd length 100m drive drive4
-.De
-After processing this file, \fIvinum(8)\fP\| creates the following structure in
-\fI/dev/vinum\fP\|:
-.Dx
-brwx------  1 root  wheel   25, 0x40000001 Apr 13 16:46 Control
-brwx------  1 root  wheel   25, 0x40000002 Apr 13 16:46 control
-brwx------  1 root  wheel   25, 0x40000000 Apr 13 16:46 controld
-drwxr-xr-x  2 root  wheel       512 Apr 13 16:46 drive
-drwxr-xr-x  2 root  wheel       512 Apr 13 16:46 plex
-crwxr-xr--  1 root  wheel   91,   2 Apr 13 16:46 rs64
-drwxr-xr-x  2 root  wheel       512 Apr 13 16:46 rsd
-drwxr-xr-x  2 root  wheel       512 Apr 13 16:46 rvol
-brwxr-xr--  1 root  wheel   25,   2 Apr 13 16:46 s64
-drwxr-xr-x  2 root  wheel       512 Apr 13 16:46 sd
-drwxr-xr-x  3 root  wheel       512 Apr 13 16:46 vol
-
-/dev/vinum/drive:
-total 0
-lrwxr-xr-x  1 root  wheel  9 Apr 13 16:46 drive1 -> /dev/sd1h
-lrwxr-xr-x  1 root  wheel  9 Apr 13 16:46 drive2 -> /dev/sd2h
-lrwxr-xr-x  1 root  wheel  9 Apr 13 16:46 drive3 -> /dev/sd3h
-lrwxr-xr-x  1 root  wheel  9 Apr 13 16:46 drive4 -> /dev/sd4h
-
-/dev/vinum/plex:
-total 0
-brwxr-xr--  1 root  wheel   25, 0x10000002 Apr 13 16:46 s64.p0
-
-/dev/vinum/rsd:
-total 0
-crwxr-xr--  1 root  wheel   91, 0x20000002 Apr 13 16:46 s64.p0.s0
-crwxr-xr--  1 root  wheel   91, 0x20100002 Apr 13 16:46 s64.p0.s1
-crwxr-xr--  1 root  wheel   91, 0x20200002 Apr 13 16:46 s64.p0.s2
-crwxr-xr--  1 root  wheel   91, 0x20300002 Apr 13 16:46 s64.p0.s3
-
-/dev/vinum/rvol:
-total 0
-crwxr-xr--  1 root  wheel   91,   2 Apr 13 16:46 s64
-
-/dev/vinum/sd:
-total 0
-brwxr-xr--  1 root  wheel   25, 0x20000002 Apr 13 16:46 s64.p0.s0
-brwxr-xr--  1 root  wheel   25, 0x20100002 Apr 13 16:46 s64.p0.s1
-brwxr-xr--  1 root  wheel   25, 0x20200002 Apr 13 16:46 s64.p0.s2
-brwxr-xr--  1 root  wheel   25, 0x20300002 Apr 13 16:46 s64.p0.s3
-
-/dev/vinum/vol:
-total 1
-brwxr-xr--  1 root  wheel   25,   2 Apr 13 16:46 s64
-drwxr-xr-x  3 root  wheel       512 Apr 13 16:46 s64.plex
-
-/dev/vinum/vol/s64.plex:
-total 1
-brwxr-xr--  1 root  wheel   25, 0x10000002 Apr 13 16:46 s64.p0
-drwxr-xr-x  2 root  wheel       512 Apr 13 16:46 s64.p0.sd
-
-/dev/vinum/vol/s64.plex/s64.p0.sd:
-total 0
-brwxr-xr--  1 root  wheel   25, 0x20000002 Apr 13 16:46 s64.p0.s0
-brwxr-xr--  1 root  wheel   25, 0x20100002 Apr 13 16:46 s64.p0.s1
-brwxr-xr--  1 root  wheel   25, 0x20200002 Apr 13 16:46 s64.p0.s2
-brwxr-xr--  1 root  wheel   25, 0x20300002 Apr 13 16:46 s64.p0.s3
-.De
-.P
-Although it is recommended that plexes and subdisks should not be allocated
-specific names, Vinum drives must be named.  This makes it possible to move a
-drive to a different location and still recognize it automatically.  Drive names
-may be up to 32 characters long.
-.H3 "Creating file systems"
-.X "newfs"
-Volumes appear to the system to be identical to disks, with one exception.
-Unlike UNIX drives, Vinum does not partition volumes, which thus do not contain
-a partition table.  This has required modification to some disk utilities,
-notably \fInewfs\fP\|, which previously tried to interpret the last letter of a
-Vinum volume name as a partition identifier.  For example, a disk drive may have
-a name like \fI/dev/wd0a\fP\| or \fI/dev/da2h\fP.  These names represent the
-first partition (\f(CWa\fP) on the first (0) IDE disk (\f(CWwd\fP) and the eight
-partition (\f(CWh\fP) on the third (2) SCSI disk (\f(CWda\fP) respectively.  By
-contrast, a Vinum volume might be called \fI/dev/vinum/concat\fP, a name which
-has no relationship with a partition name.
-.P
-Normally, \fInewfs(8)\fP\| interprets the name of the disk and complains if it
-cannot understand it.  For example:
-.Dx
-# \f(CBnewfs /dev/vinum/concat\fP
-newfs: /dev/vinum/concat: can't figure out file system partition
-.De
-In order to create a file system on this volume, use the \f(CW-v\fP option to
-\fInewfs(8)\fP\|:
-.Dx
-# \f(CBnewfs -v /dev/vinum/concat\fP
-.De
-.sp -1v
-.H2 "Configuring Vinum"
-The \f(CWGENERIC\fP kernel does not contain Vinum.  It's possible to build a
-special kernel which includes Vinum, but this is not recommended.  The standard
-way to start Vinum is as a \fIkld\fP\| (see page \*[kld] for more details).  You
-don't even need to use \fIkldload\fP\| for Vinum: when you start \fIvinum(8)\fP,
-it checks whether the module has been loaded, and if it isn't, it loads it
-automatically.
-.H2 "Startup"
-Vinum stores configuration information on the disk slices in essentially the
-same form as in the configuration files.  When reading from the configuration
-database, Vinum recognizes a number of keywords which are not allowed in the
-configuration files.  For example, a disk configuration might contain the
-following text:
+.Fn raid10-vol
+.SPUP
+.H2 "Vinum configuration database"
+.X "dumpconfig, vinum command"
+.X "vinum, dumpconfig command"
+Vinum stores configuration information on each drive in essentially the same
+form as in the configuration files.  You can display it with the
+\fIdumpconfig\fP\/ command.  When reading from the configuration database, Vinum
+recognizes a number of keywords that are not allowed in the configuration files,
+because they would compromise data integrity.  For example, after adding the
+second plex to \fImyvol\fP, the disk configuration would contain the following
+text:
 .Dx
+vinum -> \f(CBdumpconfig\fP
+Drive a:        Device /dev/da1s2h
+                Created on bumble.example.org at Tue Nov 26 14:35:12 2002
+                Config last updated Tue Nov 26 16:12:35 2002
+                Size:       4293563904 bytes (4094 MB)
+volume myvol state up
+plex name myvol.p0 state up org concat vol myvol
+plex name myvol.p1 state up org concat vol myvol
+sd name myvol.p0.s0 drive a plex myvol.p0 len 1048576s driveoffset 265s state up ple
+xoffset 0s
+sd name myvol.p1.s0 drive b plex myvol.p1 len 2097152s driveoffset 265s state up ple
+xoffset 0s
+sd name myvol.p0.s1 drive c plex myvol.p0 len 1048576s driveoffset 265s state up ple
+xoffset 1048576s
+
+Drive /dev/da1s2h: 4094 MB (4293563904 bytes)
+
+Drive b:        Device /dev/da2s2h
+                Created on bumble.example.org at Tue Nov 26 14:35:27 2002
+                Config last updated Tue Nov 26 16:12:35 2002
+                Size:       4293563904 bytes (4094 MB)
 volume myvol state up
-volume bigraid state down
 plex name myvol.p0 state up org concat vol myvol
 plex name myvol.p1 state up org concat vol myvol
-plex name myvol.p2 state init org striped 512b vol myvol
-plex name bigraid.p0 state initializing org raid5 512b vol bigraid
-sd name myvol.p0.s0 drive a plex myvol.p0 state up len 1048576b driveoffset 265b plexo
-ffset 0b
-sd name myvol.p0.s1 drive b plex myvol.p0 state up len 1048576b driveoffset 265b plexo
-ffset 1048576b
-sd name myvol.p1.s0 drive c plex myvol.p1 state up len 1048576b driveoffset 265b plexo
-ffset 0b
-sd name myvol.p1.s1 drive d plex myvol.p1 state up len 1048576b driveoffset 265b plexo
-ffset 1048576b
-sd name myvol.p2.s0 drive a plex myvol.p2 state init len 524288b driveoffset 1048841b 
-plexoffset 0b
-sd name myvol.p2.s1 drive b plex myvol.p2 state init len 524288b driveoffset 1048841b 
-plexoffset 524288b
-sd name myvol.p2.s2 drive c plex myvol.p2 state init len 524288b driveoffset 1048841b 
-plexoffset 1048576b
-sd name myvol.p2.s3 drive d plex myvol.p2 state init len 524288b driveoffset 1048841b 
-plexoffset 1572864b
-sd name bigraid.p0.s0 drive a plex bigraid.p0 state initializing len 4194304b driveoff
-set 1573129b plexoffset 0b
-sd name bigraid.p0.s1 drive b plex bigraid.p0 state initializing len 4194304b driveoff
-set 1573129b plexoffset 4194304b
-sd name bigraid.p0.s2 drive c plex bigraid.p0 state initializing len 4194304b driveoff
-set 1573129b plexoffset 8388608b
-sd name bigraid.p0.s3 drive d plex bigraid.p0 state initializing len 4194304b driveoff
-set 1573129b plexoffset 12582912b
-sd name bigraid.p0.s4 drive e plex bigraid.p0 state initializing len 4194304b driveoff
-set 1573129b plexoffset 16777216b
+sd name myvol.p0.s0 drive a plex myvol.p0 len 1048576s driveoffset 265s state up ple
+xoffset 0s
+sd name myvol.p1.s0 drive b plex myvol.p1 len 2097152s driveoffset 265s state up ple
+xoffset 0s
+sd name myvol.p0.s1 drive c plex myvol.p0 len 1048576s driveoffset 265s state up ple
+xoffset 1048576s
 .De
 The obvious differences here are the presence of explicit location information
 and naming (both of which are also allowed, but discouraged, for use by the
@@ -1194,36 +1261,349 @@
 it finds the drives by scanning the configured disk drives for partitions with a
 Vinum label.  This enables Vinum to identify drives correctly even if they have
 been assigned different UNIX drive IDs.
-.H3 "Automatic startup"
-In order to start Vinum automatically when you boot the system, ensure that you
-have the following line in your \fI/etc/rc.conf\fP\|:
-.Dx
-start_vinum="YES"		# set to YES to start vinum
-.De
-If you don't have a file \fI/etc/rc.conf\fP, create one with this content.  This
-will cause the system to load the Vinum kld at startup, and to start any objects
-mentioned in the configuration.  This is done before mounting file systems, so
-it's possible to automatically \fIfsck\fP\| and mount file systems on Vinum
-volumes.
 .P
-When you start Vinum with the \f(CWvinum\ start\fP command, Vinum reads the
+When you start Vinum with the \fIvinum\ start\fP\/ command, Vinum reads the
 configuration database from one of the Vinum drives.  Under normal
 circumstances, each drive contains an identical copy of the configuration
 database, so it does not matter which drive is read.  After a crash, however,
 Vinum must determine which drive was updated most recently and read the
-configuration from this drive.  It then updates the configuration if necessary
+configuration from this drive.  It then updates the configuration, if necessary,
 from progressively older drives.
+.H2 "Installing FreeBSD on Vinum"
+Installing FreeBSD on Vinum is complicated by the fact that
+.Command sysinstall
+and the loader don't support Vinum, so it is not possible to install directly on
+a Vinum volume.  Instead, you need to install a conventional system and then
+convert it to Vinum.  That's not as difficult as it might sound.
 .P
-XXXXXXX
-bmah        2002/01/11 15:55:59 PST
+.ne 5v
+A typical disk installation lays out disk partitions in the following manner:
+.PS
+h = .2i
+dh = .02i
+dw = 1.7i
+move right .5i
+down
+[
+        boxht = h; boxwid = 1.6i
 
-  Modified files:
-    share/man/man7       tuning.7
-  Log:
-  newfs -U enables softupdates beginning with FreeBSD 4.5.
+R:      box ht .3i "\fIda0s3a\fP\/: \fI/\fP\/ file system"
+        box ht .2i "\fIda0s3b\fR: swap"
+        box ht .5i "\fIda0s3e\fR: \fI/usr\fP\/ file system"
+        box ht .5i "\fIda0s3f\fR: \fI/var\fP\/ file system"
 
-  PR:             33391
-  Submitted by:   Ceri <setantae@submonkey.net>
+        box ht 1.5i with .nw at R.ne "\fIda0s3c: entire disk\fP\/"
+        ]
+.PE
+.Figure-heading "Typical partition layout without Vinum"
+.ps \n(PS
+This layout shows three file system partitions and a swap partition, which is
+not the layout recommended on page
+.Sref "\*[partition-size]" .
+We'll look at the reasons for this below.
+.P
+Each partition corresponds logically to a Vinum subdisk.  You could enclose all
+these subdisks in a Vinum drive.  The only problem is that Vinum stores its
+configuration information at the beginning of the drive, and that's where the
+root file system is.  One way to solve this problem is to put the swap partition
+first and make it 265 sectors longer than needed.  You can do this from
+.Command sysinstall
+simply by creating the swap partition before any other partition.  Consider
+installing FreeBSD on a 4 GB drive.  Create, in sequence, a swap partition of
+256 MB, a root file system of 256 MB, a
+.Directory /usr
+file system of 2 GB, and a
+.Directory /var
+file system to take up the rest.  It's important to create the swap partition at
+the beginning of the disk, so you create that first.  After installation, the
+output of
+.Command bsdlabel
+looks like this:
+.Dx
+8 partitions:
+#        size   offset    fstype   [fsize bsize bps/cpg]
+  a:   524288   532480    4.2BSD     2048 16384    94  
+  b:   532215      265      swap                       
+  c:  8386733        0    unused        0     0         # "raw" part, don't edit
+  e:  4194304  1056768    4.2BSD     2048 16384    89  
+  f:  3135661  5251072    4.2BSD     2048 16384    89  
+.De
+To convert to Vinum, use
+.Command bsdlabel
+with the \f(CW-e\fP (edit label) option to create a partition of type
+\fIvinum\fP\/ that maps the \fIc\fP\/ partition:
+.Dx
+  h:  8386733        0     vinum
+.De
+.ne 10v
+After this, you have the following situation:
+.PS
+h = .2i
+dh = .02i
+dw = 1.7i
+move right .5i
+down
+[
+        boxht = h; boxwid = 1.6i
  
-  Revision  Changes    Path
-  1.43      +2 -2      src/share/man/man7/tuning.7
+S:      box ht .2i "\fIda0s3b\fR: swap"
+        box ht .3i "\fIda0s3a\fP\/: \fI/\fP\/ file system"
+        box ht .5i "\fIda0s3e\fR: \fI/usr\fP\/ file system"
+        box ht .5i "\fIda0s3f\fR: \fI/var\fP\/ file system"
+
+C:      box ht 1.5i with .nw at S.ne "\fIda0s3c: entire disk\fP\/"
+V:      box ht 1.5i with .nw at C.ne "\fIda0s3h: vinum drive\fP\/"
+        box ht .03 with .ne at V.ne fill
+        ]
+.PE
+.Figure-heading "Partition layout with Vinum"
+.ps \n(PS
+The shaded area at the top of the Vinum partition represents the configuration
+information, which cuts into the swap partition.  To fix that, we redefine the
+swap partition to start after the Vinum configuration information and to be 265
+sectors shorter.  The file systems are relatively trivial to recreate: take the
+size and offset values from the
+.Command bsdlabel
+output above and use them in a Vinum configuration file:
+.Dx
+drive rootdev device /dev/da0s2h
+volume swap
+  plex org concat
+#    b:    532215               265      swap
+    sd len 532215s  driveoffset 265s drive rootdev
+volume root
+  plex org concat
+#    a:    524288               532480    4.2BSD     2048 16384    94
+    sd len 524288s  driveoffset 532480s drive rootdev
+volume usr
+  plex org concat
+#    e:    4194304              1056768    4.2BSD     2048 16384    89
+    sd len 4194304s driveoffset 1056768s drive rootdev
+volume var
+  plex org concat
+#    f:    3135661              5251072    4.2BSD     2048 16384    89
+    sd len 3135661s driveoffset 5251072s drive rootdev
+.De
+.X "create, vinum command"
+.X "vinum, create command"
+The comments are the corresponding lines from the
+.Command bsdlabel
+output.  They show the corresponding values for size and offset.  Run \fIvinum
+create\fP\/ against this file, and confirm that you have the volumes
+\fIswap\fP,
+\fIroot\fP,
+\fIusr\fP\/ and
+\fIvar\fP.
+.P
+Next, ensure that you are set up to start Vinum with the new method.  You should
+have the following lines in
+.File /boot/loader.conf \/:
+.Dx
+vinum_load="YES"
+vinum.autostart="YES"
+.De
+Then reboot to single-user mode, start Vinum and run
+.Command fsck
+against the volumes, using the \f(CW-n\fP option to tell
+.Command fsck
+not to correct any errors it finds.  You should see something like this:
+.Dx
+# \f(CBfsck -n -t ufs /dev/vinum/usr \fP
+** /dev/vinum/usr (NO WRITE)
+** Last Mounted on /usr
+** Phase 1 - Check Blocks and Sizes
+** Phase 2 - Check Pathnames
+** Phase 3 - Check Connectivity
+** Phase 4 - Check Reference Counts
+** Phase 5 - Check Cyl groups
+35323 files, 314115 used, 718036 free (4132 frags, 89238 blocks, 0.4% fragmentation)
+.De
+If there are any errors, the most likely cause is a miscalculated
+size or offset.  You'll see something like this:
+.Dx
+# \f(CBfsck -n -t ufs /dev/vinum/usr  \fP
+** /dev/vinum/usr (NO WRITE)
+Cannot find file system superblock
+/dev/vinum/usr: CANNOT FIGURE OUT FILE SYSTEM PARTITION
+.De
+You need to do this in single-user mode because the volumes are shadowing file
+systems, and it's normal for open file systems to fail
+.Command fsck ,
+since some of the state is in the buffer cache.
+.P
+If all is well, remount the root file system read-write:
+.Dx
+# \f(CBmount -u /\fP
+.De
+Then edit
+.File /etc/fstab
+to point to the new devices.  For this example,
+.File /etc/fstab
+might initially contain:
+.Dx
+# $I\&d: fstab,v 1.3 2002/11/14 06:48:16 grog Exp $
+# Device                Mountpoint      FStype  Options         Dump    Pass#
+/dev/da0s4a             /               ufs     rw              1       1
+/dev/da0s4b             none            swap    sw              0       0
+/dev/da0s4e             /usr            ufs     rw              1       1
+/dev/da0s4f             /var            ufs     rw              1       1
+.De
+Change it to reflect the Vinum volumes:
+.Dx
+# $I\&d: fstab,v 1.3 2002/11/14 06:48:16 grog Exp $
+# Device                Mountpoint      FStype  Options         Dump    Pass#
+/dev/vinum/swap         none            swap    sw              0       0
+/dev/vinum/root         /               ufs     rw              1       1
+/dev/vinum/usr          /usr            ufs     rw              1       1
+/dev/vinum/var          /var            ufs     rw              1       1
+.De
+Then reboot again to mount the root file system from
+.Device vinum/root .
+You can optionally remove all the UFS partitions \fIexcept the root
+partition\fP.  The loader doesn't know about Vinum, so it must boot from the
+UFS partition.
+.P
+Once you have reached this stage, you can add further plexes to the volumes,
+or you can extend the plexes (and thus the size of the file system) by adding
+subdisks to the plexes, as discussed on page
+.Sref "\*[adding-plex]" .
+.H2 "Recovering from drive failures"
+One of the purposes of Vinum is to be able to recover from hardware problems.
+If you have chosen a redundant storage configuration, the failure of a single
+component will not stop the volume from working.  In many cases, you can replace
+the components without downtime.
+.P
+If a drive fails, perform the following steps:
+.Ls
+.LI
+Replace the physical drive.
+.LI
+Partition the new drive.  Some restrictions apply:
+.Ls B
+.LI
+If you have hot-plugged the drive, it must have the same ID, the Vinum drive
+must be on the same partition, and it must have the same size.
+.LI
+If you have had to stop the system to replace the drive, the old drive will not
+be associated with a device name, and you can put it anywhere.  Create a Vinum
+partition that is at least large enough to take all the subdisks \fIin their
+original positions on the drive\fP\/.  Vinum currently does not compact free
+space when replacing a drive.  An easy way to ensure this is to make the new
+drive at least as large as the old drive.
+.P
+If you want to have this freedom with a hot-pluggable drive, you must stop Vinum
+and restart it.
+.Le
+.SPUP
+.LI
+If you have restarted Vinum, create a new drive.  For example, if the
+replacement drive \fIdata3\fP\/ is on the physical partition
+.File -n /dev/da3s1h ,
+create a configuration file, say \fIconfigfile\fP\/, with the single line
+.Dx
+drive data3 device /dev/da3s1h
+.De
+Then enter:
+.Dx
+# \f(CBvinum create configfile\fP
+.De
+.SPUP
+.LI
+Start the plexes that were down.  For example, \fIvinum list\fP\/ might show:
+.Dx
+vinum -> \f(CBl -r test\fP
+V test                  State: up       Plexes:       2 Size:         30 MB
+P test.p0             C State: up       Subdisks:     1 Size:         30 MB
+P test.p1             C State: faulty   Subdisks:     1 Size:         30 MB
+S test.p0.s0            State: up       PO:        0  B Size:         30 MB
+S test.p1.s0            State: obsolete PO:        0  B Size:         30 MB
+
+vinum -> \f(CBstart test.p1.s0\fP
+Reviving test.p1.s0 in the background
+vinum -> vinum[295]: reviving test.p1.s0        \fIthis message appears after the prompt\fP\/
+\fI(some time later)\fP\/
+vinum[295]: test.p1.s0 is up
+.De
+.SPUP
+.Le
+.H3 "Failed boot disk"
+If you're running your root file system on a Vinum volume, you can survive the
+failure of the boot disk if the volume is mirrored with at least two concatenated
+plexes, each containing only one subdisk.  Under normal circumstances, you can
+carry on running as if nothing had happened, but obviously you will no longer be
+able to reboot from that disk.  Instead, boot from the other disk.
+.P
+The root file system also has individual UFS partitions, so you have a choice of
+what you mount.  For example, if your root file system has UFS partitions
+.Device -n da0s4a
+and
+.Device -n da1s4a ,
+you can mount either of these partitions or
+.Device -n vinum/root .
+Never mount more than one of them at a time: doing so can corrupt your data.
+.P
+An even more insidious way to corrupt the root file system is to mount
+.Device -n da0s4a
+or
+.Device -n da1s4a
+and modify it.  In this case, the two partitions are no longer the same, but
+there's no way for Vinum to know that.  If this happens, you \fImust\fP\/ mark
+the other subdisk as crashed with the \fIvinum stop\fP\/ command.
+.H2 "Migrating Vinum to a new machine"
+Sometimes you might want to move a set of Vinum disks to a different FreeBSD
+machine.  This is simple, as long as there are no name conflicts between the
+objects on the Vinum disks and any other Vinum objects you may already have on
+the system.  Simply connect the disks and start Vinum.  You don't need to put
+the disks in any particular location, and you don't need to run \fIvinum
+create\fP\/: Vinum stores the configuration on the drives themselves, and when
+it starts, it finds the configuration there.
+.H2 "Things you shouldn't do with Vinum"
+The
+.Command vinum
+command offers a large number of subcommands intended for specific purposes.
+It's easy to abuse them.  Here are some things you should not do:
+.Ls B
+.LI
+Do not use the
+.Command resetconfig
+command unless you genuinely don't want to see any of your configuration again.
+There are less drastic alternatives, such as
+.Command -n rm ,
+which removes individual objects or groups of objects.
+.LI
+Do not re-run the
+.Command -n create
+command for objects that already exist.  Vinum already knows about them, and
+the
+.Command -n start
+command should find them.
+.LI
+Do not name your drives after the disk device on which they are located.  The
+purpose of having drive names is to be device independent.  For example, if you
+have two drives
+.File -n a
+and
+.File -n b ,
+and they are located on devices
+.File -n /dev/da1s1h
+and
+.File -n /dev/da2s1h
+respectively, you can remove the drives, swap their locations and restart
+Vinum.  Vinum will still correctly locate the drives.  If you had called the
+drives
+.File -n da1
+and
+.File -n da2 ,
+you would then see something confusing like this:
+.Dx
+2 drives:
+D da2                   State: up       /dev/da1s1h     A: 3582/4094 MB (87%)
+D da1                   State: up       /dev/da2s1h     A: 3582/4094 MB (87%)
+.De
+This is clearly not helpful.
+.LI
+Don't put more than one drive on a physical disk.  Each drive contains two
+copies of the Vinum configuration, and both updating the configuration and
+starting Vinum slow down as a result.  If you want more than one file system to
+occupy space on a physical disk, create subdisks, not drives.
+.Le
