harisn
User

14 posts

Posted on 2 April 2015 @ 02:37
I was reading more about ZFS and came across the following article:
http://blog.delphix.com/uday/2013/02/19/78/

Does anybody have any first-hand experience with a severely fragmented file system on ZFS?


CiPHER
Developer

1199 posts

Posted on 2 April 2015 @ 19:00
Generally, keep the pool from filling up more than 80%. As long as you keep enough free space, fragmentation won't be a major problem. And L2ARC cache will mitigate it at least partly.
harisn
User

14 posts

Posted on 3 April 2015 @ 01:23
What size of L2ARC might be needed for a 7x3TB RAID-Z2?

The OS is installed on a 120 GB SSD, so I created two 26 GB partitions from the unused space and assigned them to ZIL and L2ARC respectively. Is that the correct approach?

Is a non-RAID ZIL safe to use? The machine is going to be on a UPS, so it will always get enough time to write the data onto permanent storage.
CiPHER
Developer

1199 posts

Posted on 3 April 2015 @ 14:16
Even a UPS does not mean your unprotected SSD is safe. If you use an SSD without capacitors, you might as well run without a ZIL at all. ZFS will not die; it doesn't need a ZIL. But your applications might.

Intel 320/S3500 and Crucial MX100 and the like are suitable for ZIL/sLOG. Other SSDs, without capacitors, are not suitable, even if you mirror them 20 times -- that doesn't do you much good if all 20 SSDs lose the recent writes. That effectively means you run without a ZIL.

L2ARC is safe to use - you can use the crappiest SSD out there.

sLOG/ZIL can be 2 gigabytes or 4 gigabytes, and it is determined by the SPEED of the pool, NOT the capacity! So a 1000000000 terabyte pool @ 100MB/s needs only about 0.5 gigabytes of partition size, while a 2 gigabyte pool @ 200MB/s needs double that amount.
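To illustrate the speed-not-capacity rule with some back-of-the-envelope arithmetic (a rough sketch; the 5-second transaction-group interval is an assumption here -- it is a tunable, not a fixed ZFS constant):

```python
def slog_size_gb(pool_write_mb_s, txg_seconds=5):
    """Rough sLOG size in GB: the data ZFS can buffer in one txg interval."""
    return pool_write_mb_s * txg_seconds / 1000.0

# Pool capacity never appears in the formula -- only write speed does.
print(slog_size_gb(100))  # 0.5 -- a 100 MB/s pool needs ~0.5 GB of sLOG
print(slog_size_gb(200))  # 1.0 -- double the speed, double the sLOG
```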

L2ARC can be 20 - 50 gigabytes and can be striped across multiple SSDs. If any data is corrupted, ZFS knows this immediately and reads from the pool instead. So ZFS L2ARC is safe, unlike other caching technologies like Intel SRT on Windows, where if the data on the SSD is corrupt, that data will be fed to the application. SRT won't ever know about the corruption.

Generally I recommend: 25GB for OS, 4GB for sLOG/ZIL and 40GB for L2ARC. That is about 70GB. If your SSD is 120GB, you should leave the rest unused and unpartitioned, as overprovisioning. If you do not use overprovisioning, your SSD will wear about 10 - 20 times faster and become very slow over time. L2ARC means almost exclusively random writes; this is not casual usage. You NEED the overprovisioning.
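The arithmetic behind that layout, just as an illustration of how much of the 120GB SSD stays unpartitioned:

```python
# Partition plan from the recommendation above, on a 120 GB SSD.
os_gb, slog_gb, l2arc_gb, ssd_gb = 25, 4, 40, 120

used_gb = os_gb + slog_gb + l2arc_gb
spare_gb = ssd_gb - used_gb

print(used_gb)                    # 69 -- partitioned space
print(100 * spare_gb / ssd_gb)    # 42.5 -- percent left as overprovisioning
```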
harisn
User

14 posts

Posted on 3 April 2015 @ 15:41, edited 4 April 2015 @ 00:17
An SSD with capacitor(s) ensures that writes are fully committed to the flash storage without leaving the SSD in limbo. I was wondering if a mirror configuration is required at all, even with a capacitor-backed SSD.

> even if you mirror them 20 times -- that doesn't do you much good if all 20 SSDs lose the recent writes. That effectively means you run without a ZIL.

I read somewhere that the ZIL is for POSIX compliance of sync writes, and that data from the ZIL is read back only while recovering from a system crash. But that data loss will happen only if the machine loses power __and__ the ZIL SSD fails at the same time (while the ZIL still holds data that has not yet been written to permanent storage). All data in the ZIL is also in the ARC (RAM), and ZFS flushes that data to the appropriate permanent storage from the ARC itself, correct? So during normal operation, even if the ZIL dies, that won't result in data loss. Also, during normal operation, even if the system crashes or loses power, ZFS will subsequently recover the recent writes from the SSD.


CiPHER
Developer

1199 posts

Posted on 3 April 2015 @ 15:52
SSDs with capacitors do not always protect the whole write-back cache; the capacitors are primarily there to keep the SSD itself from being wrecked. Any SSD without such protection can die any time it loses power unexpectedly.

If you have an SSD, please look at the SMART attribute for Unexpected Power-Loss. Now ask yourself: did you have as many power failures as that high number suggests? Probably you have not had any power failure at all, yet the SSD still recorded dozens of unexpected power-loss events. That is because a power failure and an unexpected power-loss are two different things. Just pressing the power button during the BIOS phase counts as an unexpected power-loss.

In short: you do need more than just a UPS. A UPS protects against power failure, not against unexpected power-loss.

The ZIL is there to protect application data. Without it, ZFS can be in perfect shape while your data is (slightly) corrupted. This is especially true if you use iSCSI, meaning that other filesystems (NTFS, Ext4) live on top of ZFS. Without a ZIL, ZFS will be fine but the NTFS/Ext4 filesystem will not. You need sync writes for your applications, not for ZFS. ZFS can live without a ZIL. Your applications cannot.

If you do not use a sLOG (separate LOG device) then the ZIL will live on the HDD pool instead, gaining redundancy as well. By using sLOG you basically disable the ZIL on the pool itself, and use a separate dedicated device for the ZIL. That is also why it is faster: the harddrives can now concentrate on doing sequential I/O instead of having random writes in between.
harisn
User

14 posts

Posted on 3 April 2015 @ 17:10, edited 5 April 2015 @ 17:13
I still cannot get my head around how the ZIL and L2ARC work. Would it be safer not to use a ZIL if it is not mirrored (RAID-1)? How about L2ARC -- is it OK to use a single partition from the boot drive as the L2ARC cache?

I am seeing a different issue with the cache (L2ARC): after a reboot, my cache disk's state becomes "UNAVAIL". If I re-add it, it works again. Can that result in data loss or corruption?

I read some reports of users who lost all their data because of a ZIL failure. Those were from 2010, so I hope those issues have been fixed by now.

harisn
User

14 posts

Posted on 6 April 2015 @ 04:13
Another question: what would be a good recordsize for ZFS? I am currently using the default 128KiB.
CiPHER
Developer

1199 posts

Posted on 6 April 2015 @ 18:25
ZIL
If you do not use a separate log device (sLOG) your ZIL will live on the pool instead, thus on your harddrives. This is safer, as it means it gains the (parity) protection from the pool as well. But if you use sLOG in mirror, that is very safe too.

But understand that 'safe' means safe for the application, not for ZFS. ZFS will be fine even without a ZIL (starting from version 19). The whole purpose of the ZIL is to obey sync writes from applications so that their stored data is consistent. In simple English that means that if you use iSCSI and have NTFS on it, you need to have a ZIL otherwise ZFS might be ok but NTFS on the iSCSI image would not be consistent.

You have a ZIL automatically, but if you use the sLOG feature the ZIL is stored on your SSD instead and not on the HDD pool any longer. That is also why it is faster. The HDDs do less work. Thus losing the sLOG means you lost the ZIL. Then your application might be in danger.

But you can use snapshots so that you have a point in time you can return to.

L2ARC
L2ARC is always safe. If you lose your SSD, ZFS will read from the HDD pool instead. If your SSD has corruption or inconsistent data then ZFS will know this by the checksum and read from HDD pool instead.

You can use the crappiest SSD for L2ARC, but know that bad SSDs can wear out quickly with L2ARC due to the high number of random writes. Make sure you use overprovisioning by not using more than 25-50% of the SSD capacity. Enterprise SSDs have this done automatically, but you can do the same with consumer-grade SSDs. My favourite is the Crucial MX100 256GB and 512GB.

Mixing L2ARC/sLOG/boot on one SSD, multiple SSDs?
You can use one SSD for multiple tasks; ZFSguru lets you partition the SSD to do that. Other platforms, such as FreeNAS, may not allow this.

If you have two SSDs, you can mirror the boot partition with ZFS mirror, put the sLOG in a mirror as well, and the L2ARC goes in stripe meaning you add up the capacity of the two partitions. Thus 2x40GB = 80GB L2ARC which is quite a lot.

Remember: L2ARC costs you RAM as well!
The ratio is about 1:16, depending on various factors. So you need about one gigabyte of RAM per 16 gigabytes of L2ARC. Note, however, that you start with 0 extra RAM usage and only use more RAM as the L2ARC fills with data. If at some point you do not have enough RAM, no more data will be written to the L2ARC. So it only grows as far as it can.
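A quick sketch of that rule of thumb (the exact overhead depends on record sizes; 1:16 is the approximation used above):

```python
# L2ARC header overhead: roughly 1 GB of RAM per 16 GB of L2ARC.
def l2arc_ram_gb(l2arc_gb, ratio=16):
    """RAM consumed by headers for a fully populated L2ARC of the given size."""
    return l2arc_gb / ratio

print(l2arc_ram_gb(40))  # 2.5 -- a full 40 GB L2ARC costs ~2.5 GB of RAM
print(l2arc_ram_gb(80))  # 5.0 -- e.g. 2 x 40 GB striped across two SSDs
```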

Record size
Do not set the recordsize lower than 128KiB unless you know what you are doing (e.g. a dedicated database server). With a large number of disks in the RAID-Z1/2/3 family, a larger recordsize is better. But this requires the relatively new large_block ZFS feature, which allows recordsizes larger than 128KiB. This matters with many disks: with 10 disks in RAID-Z2, each data disk would do 1/8 of 128K, which is only 16K. A 1MiB recordsize would let them do 128K transfers, which can be much more effective for a ZFS NAS storing mainly very large files.
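The per-disk arithmetic above, as a small sketch (illustration only -- it ignores padding and metadata, which real RAID-Z allocation adds):

```python
# Per-disk transfer size in RAID-Z: recordsize / number of data disks,
# where data disks = total disks minus parity (Z1 -> 1, Z2 -> 2, Z3 -> 3).
def per_disk_kib(recordsize_kib, disks, parity):
    return recordsize_kib / (disks - parity)

print(per_disk_kib(128, 10, 2))   # 16.0  -- 10-disk RAID-Z2 with 128K records
print(per_disk_kib(1024, 10, 2))  # 128.0 -- same vdev with 1 MiB records
```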
hsn
User

15 posts

Posted on 6 April 2015 @ 21:36
Thanks for the detailed explanation! Most of this could go into a ZFS FAQ.

I will soon upgrade to 2 x 120GB Intel SSDs for OS, ZIL and L2ARC. I am getting a new motherboard, the ASRock E3C224D4I-14S (from your list on AnandTech), which has more SATA ports.

I am thinking of getting a G3420 for the CPU (Intel Pentium, 53W, 3.2GHz, 2 cores). Does it make sense to have more than two cores with ZFS?
CiPHER
Developer

1199 posts

Posted on 6 April 2015 @ 21:43
Yes, ZFS can use all cores. For compression and encryption it is very useful. FreeBSD can use AES-NI acceleration for GEOM_ELI encryption.

But ZFS is mainly RAM-limited; its performance scales with RAM. I use the 10W J1900 quad-core SoC CPU for ZFS extensively, and it runs just great. But I only use gigabit Ethernet on those machines; no 10 gigabit.
hsn
User

15 posts

Posted on 6 April 2015 @ 22:34
I was reading about the recordsize a second time. Are you saying that if I am using 8 drives in RAID-Z2 (with data on 6 drives), each disk would do 21.3KB for a total 128KB record? How will that work with the 4K block size of the disks? Should I make the recordsize 96KB (since 192KB is not yet possible)?
hsn
User

15 posts

Posted on 7 April 2015 @ 17:38, edited 17:41
Do you have experience with the ASRock E3C224D4I-14S? I am wondering what its power requirement would be compared to the board you are using. The LSI SAS2308 is known to be more power-hungry.

10 SATA ports should work just fine for me, but I need USB 3.0 for backup.
CiPHER
Developer

1199 posts

Posted on 7 April 2015 @ 21:19
You cannot use a 96KiB recordsize; it has to be a power of two: 4/8/16/32/64/128/etc.
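A small sketch of the power-of-two constraint, together with the 8-disk RAID-Z2 arithmetic from the question above (illustration only):

```python
# Recordsize must be a power of two (and at least 4 KiB); 96 KiB is not valid.
def is_valid_recordsize(kib):
    return kib >= 4 and (kib & (kib - 1)) == 0  # power-of-two bit trick

print(is_valid_recordsize(96))   # False
print(is_valid_recordsize(128))  # True

# The 8-disk RAID-Z2 case: 6 data disks share a 128K record, so each disk
# gets 128/6 ~= 21.3 KiB -- not a multiple of the 4K sector size, which is
# why disk count matters on 4K ("Advanced Format") drives.
print(round(128 / 6, 1))  # 21.3
```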

On the ZFSguru Pools->Create page it is explained which pool configurations are most optimal for ZFS with 4K harddrives.

Virtually every modern board has USB3. The board you linked to also has 4 ports (including the onboard headers).
hsn
User

15 posts

Posted on 7 April 2015 @ 21:48
I have already ordered the ASRock E3C224D4I-14S, and one reason for choosing it was its USB3 ports. Later I read that the LSI SAS2308 needs much more power than other similar controllers, which means the idle power usage might be higher because of the LSI controller and IPMI. My intention in going for a new motherboard was onboard HBA ports and lower power consumption. I am wondering if I made the wrong choice!