Tuning ZFS Scrub

ZFS scrub is a pain on one of our servers: it consumes all of the disk IO, and any interactive work on the box becomes annoying. Our pool is a mirror of two identical Samsung HD754JJ disks; we’re running FreeBSD 9.2-RELEASE on a machine with 8GB of RAM and default ZFS settings.

Here’s the IO load during scrub as shown by iostat(1):

% iostat -x ada0 ada1 1 5
                        extended device statistics  
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b  
ada0      11.3  31.0   428.6  1155.5   10  17.0  11 
ada1      11.3  31.1   429.5  1154.9    2  11.3   8 
                        extended device statistics  
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b  
ada0     120.0   0.0 12985.6     0.0   10 100.2 100 
ada1     103.0   0.0 12067.8     0.0   10  31.9  51 
                        extended device statistics  
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b  
ada0     187.8   0.0 21942.1     0.0   10  50.7  98 
ada1     192.8   0.0 21850.7     0.0    7  48.4  99 
                        extended device statistics  
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b  
ada0     148.9   0.0 15679.7     0.0   10  56.3 101 
ada1     149.8   0.0 15653.8     0.0    7  39.9  73 
                        extended device statistics  
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b  
ada0      86.9   1.0  7765.2     4.0    2 123.3  99 
ada1      65.9   1.0  5191.3     4.0    0  47.7  55

And here’s our current pool status (yes, we also seem to have a performance issue here; the scrub should be going much faster):

% zpool status
  pool: rpool
 state: ONLINE
  scan: scrub in progress since Wed Aug 13 04:59:44 2014
        247G scanned out of 569G at 10.1M/s, 9h5m to go
        0 repaired, 43.38% done
config:

        NAME                                            STATE     READ WRITE CKSUM
        rpool                                           ONLINE       0     0     0
          mirror-0                                      ONLINE       0     0     0
            gptid/6f4e7b58-cdb7-11df-b6d7-xxxxxxxxxxxx  ONLINE       0     0     0
            gptid/6ffa234a-cdb7-11df-b6d7-yyyyyyyyyyyy  ONLINE       0     0     0

errors: No known data errors

Direct tuning can be done by adjusting some sysctls; the relevant ones are listed below (with their default values shown).

vfs.zfs.no_scrub_prefetch: 0
vfs.zfs.scrub_delay: 4
vfs.zfs.scan_idle: 50
vfs.zfs.vdev.max_pending: 10
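
If you want to compare against your own box, all four can be read in one go with sysctl(8):

% sysctl vfs.zfs.no_scrub_prefetch vfs.zfs.scrub_delay vfs.zfs.scan_idle vfs.zfs.vdev.max_pending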

After some testing with different settings, we settled on the following configuration. Note that our goal here is a responsive server during the scrub; we don’t care if the scrub takes a long time to complete.

vfs.zfs.no_scrub_prefetch: 1
vfs.zfs.scrub_delay: 15
vfs.zfs.scan_idle: 1000
vfs.zfs.vdev.max_pending: 3
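
For reference, here’s how we’d apply these as root (a minimal sketch, assuming all four are runtime-writable sysctls, as they were on stock FreeBSD 9.x; adding the same name=value lines to /etc/sysctl.conf makes them persist across reboots):

# sysctl vfs.zfs.no_scrub_prefetch=1
# sysctl vfs.zfs.scrub_delay=15
# sysctl vfs.zfs.scan_idle=1000
# sysctl vfs.zfs.vdev.max_pending=3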

In summary, the above disables scrub prefetch; limits scrub to roughly 66 IOPS on each device (scrub_delay is the number of ticks to wait between scrub IOs, so 1000 / 15 ≈ 66 per second); tells ZFS to only consider the pool idle 1000 ticks (about one second) after the last non-scrub IO; and caps pending IO operations per device at 3.
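
Both scrub_delay and scan_idle are measured in ticks rather than milliseconds, so the arithmetic above assumes the default tick rate of 1000 (VMs often run with a lower value); worth a quick check on your own box:

% sysctl kern.hz
kern.hz: 1000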

You can read excellent descriptions of these (and other ZFS tunables) in this ZFS guide [1].

Now let’s see what iostat(1) looks like with these changes:

% iostat -x ada0 ada1 1 5
                        extended device statistics  
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b  
ada0      11.4  30.9   436.1  1155.3    0  17.0  11 
ada1      11.3  31.1   437.0  1154.7    0  11.3   8 
                        extended device statistics  
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b  
ada0      84.9   0.0  2751.2     0.0    1   6.4  22 
ada1      93.9   0.0  2922.0     0.0    0   3.7  19 
                        extended device statistics  
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b  
ada0      73.9  18.0  3377.6  2065.9    0   6.0  27 
ada1      71.9  18.0  3250.2  2065.9    0   6.1  23 
                        extended device statistics  
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b  
ada0      65.9  87.9   810.7   579.9    0   5.7  34 
ada1      61.9  88.9   800.2   579.9    0   3.1  19 
                        extended device statistics  
device     r/s   w/s    kr/s    kw/s qlen svc_t  %b  
ada0      93.9  84.9  3690.8  4320.2    3   8.3  55 
ada1     121.9  84.9  4299.7  4320.2    0   5.6  40

The scrub seems to be progressing just as fast (when the system isn’t doing any other IO):

% zpool status
  pool: rpool
 state: ONLINE
  scan: scrub in progress since Wed Aug 13 04:59:44 2014
        265G scanned out of 569G at 10.3M/s, 8h22m to go
        0 repaired, 46.63% done
config:

        NAME                                            STATE     READ WRITE CKSUM
        rpool                                           ONLINE       0     0     0
          mirror-0                                      ONLINE       0     0     0
            gptid/6f4e7b58-cdb7-11df-b6d7-xxxxxxxxxxxx  ONLINE       0     0     0
            gptid/6ffa234a-cdb7-11df-b6d7-yyyyyyyyyyyy  ONLINE       0     0     0

errors: No known data errors

And the server is much more responsive interactively, so that’s the objective complete.
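
Once the scrub finishes, the defaults can be restored the same way (using the values from the list at the top):

# sysctl vfs.zfs.no_scrub_prefetch=0
# sysctl vfs.zfs.scrub_delay=4
# sysctl vfs.zfs.scan_idle=50
# sysctl vfs.zfs.vdev.max_pending=10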
