SaleB
User

6 posts

Posted on 27 March 2014 @ 14:52
Hello,

For a long time now I have intended to build a file server for my personal needs, but I never had enough money to build a good system, so I kept postponing the inevitable. Current situation: I am operating two PCs, one whose sole purpose is to carry a few hard drives and another that is the main play/work/home PC. Both run Windows 7 64-bit. There are now 12 independent drives with various data (multimedia, books, apps, mostly static data), approx. 20TB in total, without RAID, backup or any other safety precaution. Both PCs run 24/7 without a UPS and with consumer-grade components. The only thing I have lost is 30GB of re-downloadable multimedia on a 640GB drive five years ago, because a faulty cable corrupted a partition. You could say the systems are running on luck.

The intention is to move to ZFSguru with RAID-Z2, without a backup but maybe with a UPS. With a proper server I could lose the small PC and go for a much smaller case for the main PC.

The small PC is a Gigabyte B85M with 8GB RAM (max is 16GB), 4x SATA3 and 2x SATA2, and it could be a good solution for ZFSguru, or not?

My first question is, how important is ECC memory really? Should I go for an ECC motherboard or is it not so important? This is a very important question that I cannot answer for myself by reading the posts. A year ago I read somewhere on HardForum that no one should plan a file server without ECC memory; on the other hand, on my PCs I have had no problems with RAM (Kingston XMP 2x8GB on one machine and Kingston Genesis 8GB on the other; in an earlier build there were GeIL Ultra Plus 2x4GB and a 4GB Patriot, also no issues). For my project the most important question right now is: ECC or no ECC?

For the ECC option I have found a nice (cheap) ASUS board, the P8B WS (price approx. 170 EUR). It could run a cheaper processor like the G2020, with KVR1333D3E9S modules (max 32GB), so the machine would cost approx. 380 EUR (motherboard, 2x 8GB ECC, G2020) without an HBA. That could be manageable. The board has 4x SATA2 and 2x SATA3, so an HBA is needed.

Second question: is it advisable to use the internal SATA controller and an HBA together in a ZFS machine? If I take a second-hand 1068E-based 8i HBA (LSI original, IBM M1015 or similar), can I combine those 8 ports with the 6 on board, or is it better practice to use two 1068E cards?

I have been thinking about the pools. I would separate all data into two or three vdevs: one RAID-Z2 vdev for critical, personal, non-replaceable data (4x 3TB = 6TB usable would be more than enough for the next few years), and a second RAID-Z2 vdev for non-critical data (6x 4TB = 16TB usable), with the intention of adding a third at some point, because this data collection grows 500-800GB yearly. With the addition of another 4x 4TB vdev in a year or two it would populate two 8-port HBAs. Can I use, for example, 6 ports on an HBA for one vdev, 4 ports on the motherboard for the other, and two more on the motherboard for log/cache SSDs if needed? Or is that not advisable? Would it be advisable to separate those three vdevs into two separate machines?
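If I understand the zpool commands correctly, that plan would be roughly the following (the pool names and adaX device names are only placeholders, and I am not sure yet whether it should be one pool or two):

  # a small pool for critical data and a larger one for the bulk data
  zpool create critical raidz2 ada0 ada1 ada2 ada3           # 4x 3TB -> ~6TB usable
  zpool create bulk raidz2 ada4 ada5 ada6 ada7 ada8 ada9     # 6x 4TB -> ~16TB usable
  # later, grow the bulk pool with a third raidz2 vdev
  zpool add bulk raidz2 ada10 ada11 ada12 ada13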

Third question, I have been reading about cache ssd and log ssd, but I did not understand, is there one cache and one log drive for the whole zpool, or do all the vdevs have their own log/cache drives?

Now I will thank all of you who took time to read this huge post, and hope for a few good answers that would help me with my dilemmas.

Thankful in advance,
Sasa

CiPHER
Developer

1199 posts

Posted on 28 March 2014 @ 13:07
The small PC is a Gigabyte B85M with 8GB RAM (max is 16GB), 4x SATA3 and 2x SATA2, and it could be a good solution for ZFSguru, or not?

Should be good, yes.


My first question is, how important is ECC memory really? Should I go for an ECC motherboard or is it not so important? This is a very important question that I cannot answer for myself by reading the posts. A year ago I read somewhere on HardForum that no one should plan a file server without ECC memory.

Well, they are both right and wrong. The whole problem is that when people talk about ECC memory, they usually only talk about having it in the server, not in your desktop. However, since your desktop is probably the machine that actually puts all the data onto the server, having ECC on the desktop is also very important.

If you opt for both ECC on server and desktop, that is really great.

Using ECC only on the server grants you limited protection. And ZFS, of all things, is more resilient to RAM bitflips than anything else, because ZFS at least has two properties:
- it can detect RAM corruption (checksum errors)
- it can correct RAM corruption

Usually, when people see multiple checksum errors on multiple disks, this is because of a RAM module that has become defective. After running MemTest86+ and replacing the bad module, the ZFS pool can be scrubbed again and most of the corruption will be corrected; otherwise you get a message that file X is corrupt in the output of 'zpool status -v', which is also displayed on the ZFSguru web-interface on the Pools page.
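Roughly, that recovery looks like this (using 'tank' as an example pool name):

  # after replacing the faulty RAM module, re-check and repair the pool
  zpool scrub tank        # re-reads all data and repairs what it can from redundancy
  zpool status -v tank    # lists any files that could not be repaired
  zpool clear tank        # reset the error counters once everything is resolved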

So my argument is: you need ECC more on the desktop than on the server. The server at least has some protection against RAM bitflips, while your desktop without ECC memory has no protection at all. You will not even know there is corruption until you open a .JPG of your wedding pictures and find out some are corrupt. That is not ZFS' fault; it was given corrupt data to begin with, so all the protections that ZFS employs are for nothing.


Second question: is it advisable to use the internal SATA controller and an HBA together in a ZFS machine?

The chipset-provided SATA ports in AHCI mode are the best ports you could possibly get, ever. Always use them, in particular for SSDs.
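On FreeBSD/ZFSguru you can quickly check which disks are attached where; disks on AHCI controllers show up as adaX devices:

  # list all detected disks and their device names
  camcontrol devlist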


Third question: I have been reading about cache SSDs and log SSDs, but I did not understand, is there one cache and one log drive for the whole zpool, or does every vdev have its own log/cache drives?

Both separate log (sLOG) and cache (L2ARC) devices are per-pool. You can have multiple sLOG disks in a pool in a stripe/mirror configuration, or multiple cache disks in a stripe configuration. One SSD can be partitioned to act as both sLOG and L2ARC. But know that only some consumer-grade SSDs are suitable as sLOG: Intel 320, Intel S3500/S3700 and Crucial M500/M550.
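For example, attaching them to an existing pool looks like this ('tank' and the GPT labels are example names):

  # log and cache devices belong to the pool, not to individual vdevs
  zpool add tank log gpt/slog0
  zpool add tank cache gpt/l2arc0
  zpool status tank    # both show up listed under the pool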

Good luck :)
SaleB
User

6 posts

Posted on 28 March 2014 @ 14:33
First of all thank you for taking time to read and answer my questions.

I have spent many hours in last few days searching for hardware for the server that would be economical and suitable for my build.

Regarding your comments on the desktop, I have not really been thinking about ECC in the desktop, because there will always be a laptop, phone or tablet in use, and none of them have ECC. So my first concern is that data which gets to the server stays safe on the server.

It is decided: the server must have ECC.

On that matter, I have found a sweet little board, the ASRock C2750D4I, with a soldered-on Avoton eight-core 2.4GHz processor (Avoton is the next-generation Atom), ECC support up to 64GB, 12 on-board SATA ports, 2 Gbit NICs, and it is server-grade hardware. The price is approx. 330 EUR, which is less than any other solution (motherboard + processor + controller combo) that I have calculated so far. FreeBSD 9.1 is supported by the manufacturer. The only problem now is the very short memory support list, but I am working on finding the right memory for the job. A few people mentioned on various forums that it does not boot NAS software, but I have found a blog by a guy who owns such a system and it works fine:
http://vsurf.wordpress.com/2014/02/14/building-my-diy-home-nas-whit-the-asrock-c2550d4i/

Now, I have a few more questions for you.

First, taking into account that the above-mentioned processor has limited power compared to some Xeon or similar: how do the vdevs relate to the processing power needed? Is it better to have a few smaller vdevs or one bigger one, and how do the sLOG and L2ARC relate to that (more bluntly, what is the need for them)? To populate these 12 ports, is it better to have one RAID-Z2 with four drives, another RAID-Z2 with six drives, plus sLOG and L2ARC, or is it better to go for one RAID-Z2 vdev with 10 drives, with or without the two cache SSDs?

Regarding what you mentioned about SSDs, I had not been thinking about that yet, but I hoped I could use a Samsung 840 Evo/Pro, because they are by far the most available in my country. What are the criteria for the SSD? Should it be MLC, or are there only specific drives that perform well while others do not?

Can I lower the need for sLOG and L2ARC by using more than 1GB of RAM per TB of data? Because the board I mentioned above has a max of 64GB, it may be cheaper to add 32GB or 48GB of RAM rather than buying two SSDs and using up two SATA ports.

Again, thankful in advance for all your help

Regards,
Sasa
SaleB
User

6 posts

Posted on 28 March 2014 @ 17:07
Ok, I have found a few answers for some of my questions.

I have found out that the sLOG is a write-cache device. It can be small, though I did not find how small. The device should have protection capacitors, and that is probably the reason why you named only three devices. I have found an S3500 120GB for 150 EUR and an M500 240GB for 135 EUR, so if there is no reason to take the Intel, I will opt for the Crucial.

The L2ARC device is a RAM extension for reading purposes. It takes a toll on RAM, people say around 5GB, so I should opt for no less than 32GB of RAM for 20 or so TB of data space. A Samsung 840 should be fine as an L2ARC device and its size should be 10-20x bigger than RAM. Does that mean a 240GB device is too small?

Here comes another question. When calculating 1GB of RAM per 1TB of data, if we have a vdev containing 4x 3TB in RAID-Z2 formation, do we calculate it as 12TB (all raw space) or 6TB (space available for data only, excluding the parity drives)?

But on the matter of processing power needed for various vdev sizes I came up empty; this is still important and much-needed information for my project.

On the matter of the need for both cache devices, I have found out that they improve transfer speeds, but I could not find how much impact they actually have on those speeds.
CiPHER
Developer

1199 posts

Posted on 28 March 2014 @ 23:31 (edited 23:32)
ASrock C2750D4I motherboard
I know the ASRock C2750D4I and it's a great board for a NAS. The C2750 has the 8-core Avoton, as you know, while the C2550 has 4 cores. The ASRock board has two additional Marvell AHCI SATA controllers for an extra 6 ports and they should work with ZFSguru/BSD. It also has dual Intel NICs. I don't know if these work with BSD yet; in the worst case you will need an add-on NIC in the PCIe slot, like the Intel PRO/1000 CT, which is very cheap (22 euro).

But know that for 64GiB RAM you need 16GiB RAM modules, which are rare and more expensive. Most 16GiB RAM modules you find now are registered (RDIMM) and you need unbuffered (UDIMM). ECC UDIMM is less common.


Beware of incorrect information
I see you have been reading up on L2ARC and other ZFS issues. Please be aware that many things you read may not really be true, or may just be inaccurate. Or they may apply to server-like environments and not to anything near the way you will be using it. For example, assuming you will need more than 240GB of L2ARC for a home NAS is totally insane.


File caching
Every modern computer OS uses file caching. Windows NT has it, Linux, BSD, Mac all have it. But most systems use only a 'dumb' cache that just overwrites the cache regardless of what that data is. This means that important stuff may not be cached while it could have been. ZFS uses its own caching engine, called the Adaptive Replacement Cache (ARC).

The ARC distinguishes between MRU (Most Recently Used) and MFU (Most Frequently Used). The MFU is the killer thing; it keeps those separate so that these important caches will not get overwritten whenever you read a large bulky file. That bulky file will get cached in the MRU instead.


What does this mean?
To make things less technical: the intelligent caching makes sure that the server adapts to your 'usage pattern'. This makes programs you run or data you read respond quicker, because it does not have to be read from a mechanical disk, which has a latency that can be noticed by human beings. RAM is insanely fast, so it feels 'instant'.


How much RAM do i need?
Obviously, the more RAM the better. But RAM is not cheap; SSDs are cheaper per GB. You are very correct when you say that L2ARC is like an extension of RAM memory. Storing cached data on the L2ARC device (SSD) also costs some RAM memory, but only as the L2ARC device gets utilised more. If there is not enough physical RAM available, the L2ARC device simply cannot be utilised to its full capacity. So it's not like you NEED to have a lot of RAM to use L2ARC. But you should always try to have at least 8GiB and preferably 16GiB for a ZFS server. More is luxury.

The 1GB-RAM-per-1TB-data rule is also senseless. You do not automatically need more RAM if you have bigger disks, though you may need more RAM if your disk configuration grows with more vdevs. It boils down to this: below 8GiB it is tricky but possible with tuning; above 8GiB you are good to go out of the box, and tuning might help but is not required for decent baseline performance.
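If you want to see how much RAM the ARC actually uses on a FreeBSD-based system like ZFSguru, or cap it, it is roughly like this (the 12G cap is only an example value):

  # current ARC size in bytes
  sysctl kstat.zfs.misc.arcstats.size
  # optional cap, set in /boot/loader.conf and applied at the next boot:
  # vfs.zfs.arc_max="12G"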


L2ARC caching
L2ARC is only useful if you intend to leave your server running for many days straight, because currently the L2ARC cache is not persistent, meaning that the cache is gone/reset when you reboot or power cycle. There are plans to implement persistent L2ARC in the future, so maybe next year or so we may see such a feature. Then L2ARC caching might also be useful if you reboot regularly.

L2ARC is a very safe feature to use: if there is corruption on the SSD, it will get detected by ZFS and the data will be read from disk instead. L2ARC is a read cache; it only accelerates random reads. That does not mean that whole files will be cached on the SSD. Instead, ZFS caches only random reads. These can be small files or small reads within large files. Think of a game with a 16GB file that reads some spots within that large file. Because it will often read the same spots, ZFS will employ its caching techniques to respond quickly to those requests, meaning regular use of that game/app will mean quicker response times.

But personally, I think caching metadata is really important. The metadata is not your files themselves, but the structure (names, directories, properties, permissions). Caching a lot of metadata means fast file searches and quick directory access (it feels like an SSD).
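You can even steer this per dataset, roughly like this ('tank/media' is an example dataset name):

  # what gets cached in RAM (primarycache) and on SSD (secondarycache): all | metadata | none
  zfs get primarycache,secondarycache tank
  zfs set secondarycache=metadata tank/media   # only cache this dataset's metadata on the SSD
  # watch how the cache device is actually being used
  zpool iostat -v tank 5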


L2ARC for reads, sLOG for writes?
A common misconception about the separate log device feature (sLOG) is that it is a regular write buffer. Many people incorrectly think that all the data you write will be written to the sLOG device. This is not true: only synchronous writes are written to the sLOG. All other writes, called asynchronous writes, are written directly to disk. How much you benefit depends a lot on what kind of application you use. Some database users may see 100% sync writes, but regular home users see less than 1% sync writes.

Another misconception is that you need a bigger sLOG the bigger your pool gets. This is not true; you need a bigger sLOG the faster your pool gets. So a pool with ultra-fast 40GB disks in stripe will need a much larger sLOG than a slower but much larger 400TB pool. 1GiB will be enough for most pools, but to be on the safe side, 4GiB is recommended. Very little space indeed.
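For reference, this is roughly how it looks on the command line ('tank' and the gpt labels are example names):

  # check how sync writes are handled per dataset (standard | always | disabled)
  zfs get sync tank
  # a 4GiB log partition is plenty; a mirrored pair guards against a failing SSD
  zpool add tank log mirror gpt/slog0 gpt/slog1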


sLOG == ZIL ??
sLOG is sometimes called 'ZIL' but that is incorrect; the ZFS Intent Log (ZIL or 'log') resides on the HDD pool if there is no sLOG device. That is why it is called a separate log device (sLOG).


So if i buy SSDs, how do i configure this?
Create multiple partitions using the partition map editor in ZFSguru, on the Disks page. You can create one 'system' partition to install ZFSguru on and use as the boot device, then a 4GiB sLOG partition, and then an L2ARC partition out of the rest of the space, minus some space reserved for overprovisioning.

Overprovisioning your SSD means that you do not use all the capacity of the SSD. This does not work by simply having 'free space'; rather, you create partitions and leave some space at the end unused. This only works for brand-new SSDs; used SSDs will first need to be TRIM-erased. ZFSguru can do this if the disk is on an AHCI SATA controller (like the chipset SATA ports).

So for 240GB SSD:
partition 1: 20GiB 'system' partition
partition 2: 4GiB 'sLOG' partition
partition 3: 140GiB 'L2ARC' partition
unpartitioned space: ~80GB.
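Done by hand with gpart on FreeBSD, roughly the same layout would be (ada6 and the labels are example names; ZFSguru's Disks page does the equivalent via the web-interface):

  # GPT scheme plus three labelled partitions, leaving the tail of the SSD unused
  gpart create -s gpt ada6
  gpart add -t freebsd-zfs -s 20G -l ssd-sys ada6
  gpart add -t freebsd-zfs -s 4G -l ssd-slog ada6
  gpart add -t freebsd-zfs -s 140G -l ssd-l2arc ada6
  # whatever remains at the end stays unpartitioned, as overprovisioning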


What SSD to get?
As for the SSD itself: you need capacitors if you want the sLOG feature. If you only use L2ARC, you can get the crappiest SSD you want, but beware that it might wear quickly. You should use overprovisioning as well as choose an MLC-based SSD, not a TLC (Triple-Level Cell) based device. That means: no Samsung EVO or Samsung 840.

Samsung is also not suitable for the sLOG feature. If you use a Samsung as sLOG, it will be as if you had the ZIL totally disabled, which is not what you want. Samsung does have some 'enterprise' models with power capacitors, but if you want a cheap, affordable SSD then the Crucial M500 is the best buy. It is not perfect, however; it does not have powerful enough capacitors to write out all its buffers. The Intel 320 is still the best-proven SSD for this task in my opinion. But if my assumption is correct that the Crucial honours the FLUSH CACHE command, then the Crucial is also very suitable as sLOG. I do not know this with 100% certainty, though.

SaleB
User

6 posts

Posted on 29 March 2014 @ 05:20
Ok, thank you again.

You have cleared many of my misconceptions, but I have a few new questions.

In the meantime, some German distributors wrote back to me. The distributor for the board offers Kingston KVR16E11 CL9 8GB modules for 75 EUR per module, so I will take 16GB for the system and 8GB as a reserve (to be readily available in a drawer for some worst-case scenario). Another distributor has the 40GB Intel 320, so that will be bought too.

First question: in the scenario with two SSDs, does the system partition go on the L2ARC drive or on the sLOG drive?

According to the ASRock web page, they supply all the drivers for on-board devices for FreeBSD 9.1. So let's hope the NICs will work fine.

So that leaves only the question about the pool. There are 10 SATA ports left. Does the file system care how I populate them?

There are a few options:
1. one 10-drive RAID-Z2 vdev (8 data + 2 parity)
2. one 6-drive RAID-Z2 vdev (4+2) and one 4-drive RAID-Z2 vdev (2+2)

Are there any reasons why I should opt for the first option over the second, or the other way around? Are there any benefits of one solution over the other?

For me it would be more financially beneficial to create one 6-drive vdev during the build and later add another vdev (as sketched below); on the other hand, that gives two drives less of usable space.
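If I got the commands right, that growth path would be something like this (device names are placeholders):

  # build the pool with one 6-disk raidz2 vdev now
  zpool create tank raidz2 ada0 ada1 ada2 ada3 ada4 ada5
  # later, extend the same pool with a second raidz2 vdev (vdevs cannot be removed again)
  zpool add tank raidz2 ada6 ada7 ada8 ada9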

The drives should probably be WD Reds. I have read about many people who used Seagate DM001 2TB and 3TB drives, but I have had problems with them in the last two years. Two of my four drives showed concerning SMART data (reallocated sector count, raw read errors) after 15 months (less than 500 days) of use; in contrast, I have a few Samsung Spinpoint 204UI drives that have over 1200 days on the clock and are still in perfect condition.

The disks should all be one size, 3TB or 4TB, so I can keep only one "cold spare" in the drawer.

Last question (I hope): how do the vdev size choices affect the length of maintenance tasks (scrubs, resilvering, other tasks if they exist; I have not yet had the chance to read about maintenance needs)?

Thank you again for all the valuable help; I could not have grasped all the concepts so quickly without it.

Regards,
Sasa
SaleB
User

6 posts

Posted on 29 March 2014 @ 13:03
Another quick question came up, about vdev sizes. I have found conflicting information in a few different places. I have found this:

A RAIDZ configuration with N disks of size X with P parity disks can hold approximately (N-P)*X bytes and can withstand P device(s) failing before data integrity is compromised.
Start a single-parity RAIDZ (raidz) configuration at 3 disks (2+1)
Start a double-parity RAIDZ (raidz2) configuration at 6 disks (4+2)
Start a triple-parity RAIDZ (raidz3) configuration at 9 disks (6+3)
(N+P) with P = 1 (raidz), 2 (raidz2), or 3 (raidz3) and N equals 2, 4, or 6
The recommended number of disks per group is between 3 and 9. If you have more disks, use multiple groups.

Here: http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide
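If I plug the layouts I am considering into that (N-P)*X formula (with X being the disk size in TB), I get roughly:

  # usable capacity = (N - P) * X, here with 4TB disks as an example
  echo $(( (10-2)*4 ))   # 10-disk raidz2 -> 32 TB usable
  echo $(( (6-2)*4 ))    #  6-disk raidz2 -> 16 TB usable
  echo $(( (4-2)*4 ))    #  4-disk raidz2 ->  8 TB usable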

On the other side, people say that the "prime" numbers of data disks are powers of 2 (2, 4, 8), which can be accompanied by 1, 2 or 3 parity drives, making the acceptable sizes:

Z1: 2+1, 4+1 and 8+1
Z2: 2+2, 4+2 and 8+2
Z3: 2+3, 4+3 and 8+3

I feel that this second option is right, in contrast to the cited text from the ZFS Best Practices guide, but I am using the opportunity to ask.

Should the vdevs inside the same pool be the same size?