Thursday, March 21, 2013

ZFS: Read Me 1st

Things Nobody Told You About ZFS

Yes, it's back. You may also notice it is now hosted on my Blogger page - just don't have time to deal with self-hosting at the moment, but I've made sure the old URL redirects here.

So, without further ado...


I will be updating this article over time, so check back now and then.

Latest update 9/12/2013 - Hot Spare, 4K Sector and ARC/L2ARC sections edited, note on ZFS Destroy section, minor edit to Compression section.

There are a couple of things about ZFS itself that are often skipped over or missed by users and administrators. Many deploy home or business production systems without even being aware of these gotchas and architectural issues. Don't be one of those people!

I do not want you to read this and think "ugh, forget ZFS". Every other filesystem I'm aware of has as many issues as ZFS, and more - going another route because of perceived or actual issues with ZFS is like jumping into the hungry shark tank with a bleeding leg wound, instead of the goldfish tank, because the goldfish tank smelled a little fishy. Not a smart move.

ZFS is one of the most powerful, flexible, and robust filesystems (and I use that word loosely, as ZFS is much more than just a filesystem, incorporating many elements of what is traditionally called a volume manager as well) available today. On top of that it's open source and free (as in beer) in some cases, so there's a lot there to love.

However, like every other man-made creation ever dreamed up, it has its own share of caveats, gotchas, hidden "features" and so on - the sorts of things an administrator should be aware of before they lead to a 3 AM phone call! Due to its relative newness (compared to venerable filesystems like NTFS and ext2/3/4), and its very different architecture paired with very similar nomenclature, potential adopters of ZFS can ignore or assume certain things that lead to costly issues and lots of stress later.

I make various statements in here that might be difficult to understand or that you disagree with - and often without wholly explaining why I've directed the way I have. I will endeavor to produce articles explaining them and update this blog with links, as time allows. In the interim, please understand that I've been involved in literally thousands of large ZFS deployments in the last 2+ years, often called in when they were broken, and much of what I say is backed by quite a bit of experience. This article is also often used, cited, and reviewed by many of my fellow ZFS support personnel, so it gets around, and mistakes in it get back to me eventually. I can be wrong - but especially if you're new to ZFS, you're going to be better served not assuming I am. :)

1. Virtual Devices Determine IOPS

IOPS (I/O operations per second) are mostly a factor of the number of virtual devices (vdevs) in a zpool. They are not a factor of the raw number of disks in the zpool. This is probably the single most important thing to realize and understand about ZFS, and commonly it is not.

ZFS stripes writes across vdevs (not individual disks). A vdev is typically IOPS-bound by the slowest disk within it. So if you have one vdev of 100 disks, your zpool's raw IOPS potential is effectively that of a single disk, not 100. There are a couple of caveats here (such as the difference between write and read IOPS, etc), but if you adopt the rule of thumb that a zpool's raw IOPS potential is equivalent to the single slowest disk in each vdev in the zpool, you won't end up surprised or disappointed.
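To make the rule of thumb concrete, here's a minimal back-of-envelope sketch in Python. The ~150 IOPS per 7200 RPM disk is an illustrative assumption, not a spec, and `pool_iops_estimate` is my own made-up helper name:

```python
# Rough zpool IOPS estimate per the rule of thumb above: the pool's raw IOPS
# potential is roughly the slowest disk's IOPS per vdev, summed over all vdevs.
def pool_iops_estimate(slowest_disk_iops_per_vdev):
    """One entry per vdev: the IOPS of that vdev's slowest disk."""
    return sum(slowest_disk_iops_per_vdev)

# 100 disks (~150 IOPS each, a typical 7200 RPM figure) as one giant vdev:
print(pool_iops_estimate([150]))          # 150 -- one disk's worth
# The same 100 disks laid out as 50 two-way mirror vdevs:
print(pool_iops_estimate([150] * 50))     # 7500
```

Same disks, ~50x the raw IOPS potential, purely from vdev layout.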

2. Deduplication Is Not Free

Another common misunderstanding is that ZFS deduplication, since its inclusion, is a nice, free feature you can enable to hopefully gain space savings on your ZFS filesystems/zvols/zpools. Nothing could be further from the truth. Unlike a number of other deduplication implementations, ZFS deduplication happens on the fly as data is read and written. This created a number of architectural challenges that the ZFS team had to conquer, and the method by which this was achieved leads to a significant and sometimes unexpectedly high RAM requirement.

Every block of data in a dedup'ed filesystem can end up having an entry in a database known as the DDT (DeDupe Table). DDT entries need RAM. It is not uncommon for DDTs to grow larger than available RAM on zpools that aren't even that large (a couple of TBs). If hits against the DDT aren't serviced primarily from RAM or fast SSD, performance quickly drops to abysmal levels. Because enabling/disabling deduplication within ZFS doesn't actually do anything to data already on disk, do not enable deduplication without a full understanding of its requirements and architecture first. You will be hard-pressed to get rid of it later.
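As a rough sizing sketch: the ~320 bytes per in-core DDT entry used below is a commonly cited figure that varies by platform and version (`zdb -DD <pool>` reports actual DDT statistics where available), and the function name is my own:

```python
def ddt_ram_estimate_gib(unique_data_tib, avg_block_kib=64, bytes_per_entry=320):
    """Rough in-core DDT size for a given amount of unique (deduped) data."""
    blocks = unique_data_tib * 2**40 / (avg_block_kib * 2**10)
    return blocks * bytes_per_entry / 2**30

# 2 TiB of unique data at a 64K average block size:
print(round(ddt_ram_estimate_gib(2), 1))      # 10.0 (GiB)
# The same 2 TiB at an 8K average block size (e.g. zvols):
print(round(ddt_ram_estimate_gib(2, 8), 1))   # 80.0 (GiB)
```

Note how the average block size dominates: the same "couple of TBs" pool can need 10 GiB or 80 GiB of DDT depending on workload.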

3. Snapshots Are Not Backups

This is critically important to understand. ZFS has redundancy levels from mirrors and raidz. It has checksums and scrubs to help catch bit rot. It has snapshots to take lightweight point-in-time captures of data to let you roll back or grab older versions of files. It has all of these things to help protect your data. And one 'zfs destroy' by a disgruntled employee, one fire in your datacenter, one random chance of bad luck that causes a whole backplane, JBOD, or a number of disks to die at once, one faulty HBA, one hacker, one virus, etc, etc, etc -- and poof, your pool is gone. I've seen it. Lots of times. MAKE BACKUPS.

4. ZFS Destroy Can Be Painful

(9/12/2013) A few illumos-based OSes are now shipping ZFS with the "async destroy" feature. That significantly mitigates the text below: ZFS destroys, while they still have to do the work, do so in the background in a less performance- and stability-damaging manner. However, not all shipping OSes have this code yet (for instance, NexentaStor 3.x does not). If your ZFS has feature flag support, it might have async destroy; if it is still using the old 'zpool version' method, it probably doesn't.

Something often glossed over or not discussed about ZFS is how it presently handles destroy tasks. This is specific to the "zfs destroy" command, be it used on a zvol, filesystem, clone or snapshot. It does not apply to deleting files within a ZFS filesystem (unless that file is very large - for instance, if a single file is all that a whole filesystem contains) or on the filesystem formatted onto a zvol, etc. It also does not apply to "zpool destroy". ZFS destroy tasks are potential downtime causers when not properly understood and treated with the respect they deserve. Many a SAN has suffered impacted performance or a full service outage due to a "zfs destroy" in the middle of the day on just a couple of terabytes (no big deal, right?) of data. The truth is that a "zfs destroy" is going to touch many of the metadata blocks related to the object(s) being destroyed. Depending on the block size of the destroy target(s), the number of metadata blocks that have to be touched can quickly reach into the millions, even the hundreds of millions.

If a destroy needs to touch 100 million blocks, and the zpool's IOPS potential is 10,000, how long will that zfs destroy take? 100,000,000 ÷ 10,000 = 10,000 seconds - around 2 3/4 hours! And that's a good scenario - ask any long-time ZFS support person or administrator and they'll tell you horror stories about day-long, even week-long "zfs destroy" commands. There's eventual work that can be done to make this less painful (a major fix is in the works right now) and a few things that can mitigate it, but at the end of the day, always check the actual used disk size of something you're about to destroy, and potentially hold off on that destroy if it's significant. How big is too big? That is a factor of block size, pool IOPS potential, and extenuating circumstances (current I/O workload of the pool, deduplication on or off, a few other things).
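The arithmetic above generalizes to a quick sketch. This is purely illustrative (one I/O per metadata block touched); real destroys vary with caching, metadata overlap, and concurrent load:

```python
def destroy_hours(blocks_to_touch, pool_iops):
    """Crude estimate: assume one I/O per metadata block touched."""
    return blocks_to_touch / pool_iops / 3600

print(round(destroy_hours(100_000_000, 10_000), 1))   # 2.8  (hours)
print(round(destroy_hours(100_000_000, 1_000), 1))    # 27.8 (hours, slower pool)
```

A pool with one-tenth the IOPS potential turns the same destroy into a day-long event, which is exactly where the horror stories come from.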

5. RAID Cards vs HBA's

ZFS provides RAID, and does so with a number of improvements over most traditional hardware RAID card solutions. ZFS uses block-level logic for rebuilds; it handles disk loss & return far better, rebuilding only what was missed instead of the entire disk; it has access to more powerful processors than the RAID card, and far more RAM as well; it does checksumming and auto-correction based on it; etc. Many of these features are gone or useless if the disks provided to ZFS are, in fact, RAID LUNs from a RAID card, or even single-disk RAID0 entities offered up.

If your RAID card doesn't support a true "JBOD" (sometimes referred to as "passthrough") mode, don't use it if you can avoid it. Creating single-disk RAID0s (sometimes called "virtual drives") and then letting ZFS create a pool out of those is better than creating RAID sets on the RAID card itself and offering those to ZFS - but only about 50% better, and still 50% worse than JBOD mode or a real HBA. Use a real HBA - don't use RAID cards.

6. SATA vs SAS

This has been a long-standing argument in the ZFS world. The simple fact is, the majority of ZFS storage appliances, most of the consultants and experts you'll talk to, and the majority of enterprise installations of ZFS are using SAS disks. To be clear, "nearline" SAS (7200 RPM SAS) is fine. What will often get you in trouble is using SATA disks (including enterprise-grade ones) behind bad interposers (which is most of them) and SAS expanders (which almost every JBOD utilizes).

Plan to purchase SAS disks if you're deploying a 'production' ZFS box. In any decent-sized deployment, they won't have much of a price delta over equivalent SATA disks. The only exception to this rule is home and very small business use-cases - and for more on that, I'll try to wax on about it in a later post.

7. Compression Is Good (Even When It Isn't)

It is the very rare dataset or use-case these days where compression=on (lzjb) doesn't make sense. It is on by default on most ZFS appliances, and that is my recommendation: turn it on, and don't worry about it. Even if you discover that your compression ratio is nearly 0%, it still isn't hurting you enough to turn it off, generally speaking. Other compression algorithms such as gzip are another matter entirely, and in almost all cases should be strongly avoided. I do see environments using gzip for datasets where they truly do not care about performance (long-term archival, etc). In my experience, if that is the case, go with gzip-9, as the performance difference between gzip-1 and gzip-9 is minimal (when compared to lzjb or off). You're going to get the pain anyway, so you may as well go for the best compression ratio.

8. RAIDZ - Even/Odd Disk Counts

Try (though not very hard) to keep the number of data disks in a raidz vdev even. This means that for raidz1, the total number of disks in the vdev would be odd; for raidz2, even; and for raidz3, odd again. Breaking this rule has very little repercussion, however, so do break it if your pool layout would be nicer that way (to match things up on JBODs, etc).
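The even-data-disk guideline above reduces to one line of arithmetic (`even_data_disks` is my own illustrative helper name):

```python
def even_data_disks(total_disks, parity):
    """parity: 1 for raidz1, 2 for raidz2, 3 for raidz3.
    True if the vdev's data-disk count (total minus parity) is even."""
    return (total_disks - parity) % 2 == 0

print(even_data_disks(5, 1))   # True  -- raidz1 of 5 disks: 4 data disks
print(even_data_disks(8, 2))   # True  -- raidz2 of 8 disks: 6 data disks
print(even_data_disks(6, 1))   # False -- raidz1 of 6 disks: 5 data disks
```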

9. Pool Design Rules

I've got a variety of simple rules I tell people to follow when building zpools:
  • Do not use raidz1 for disks 1TB or greater in size.
  • For raidz1, use no fewer than 3 disks and no more than 7 in each vdev (and again, they should be under 1 TB in size, preferably under 750 GB) (5 is a typical average).
  • For raidz2, use no fewer than 6 disks and no more than 10 in each vdev (8 is a typical average).
  • For raidz3, use no fewer than 7 disks and no more than 15 in each vdev (13 & 15 are typical averages).
  • Mirrors trump raidz almost every time. A mirror pool has far higher IOPS potential than any raidz pool, given an equal number of drives. The only downside is redundancy - raidz2/3 are safer, but much slower. The only configuration that doesn't trade performance for safety is 3-way mirrors, but it sacrifices a ton of space (I have seen customers do this - if your environment demands it, the cost may be worth it).
  • For >= 3TB size disks, 3-way mirrors begin to become more and more compelling.
  • Never mix disk sizes (within a few %, of course) or speeds (RPM) within a single vdev.
  • Never mix disk sizes (within a few %, of course) or speeds (RPM) within a zpool, except for l2arc & zil devices.
  • Never mix redundancy types for data vdevs in a zpool (no raidz1 vdev and 2 raidz2 vdevs, for example)
  • Never mix disk counts on data vdevs within a zpool (if the first data vdev is 6 disks, all data vdevs should be 6 disks).
  • If you have multiple JBOD's, try to spread each vdev out so that the minimum number of disks are in each JBOD. If you do this with enough JBOD's for your chosen redundancy level, you can even end up with no SPOF (Single Point of Failure) in the form of JBOD, and if the JBOD's themselves are spread out amongst sufficient HBA's, you can even remove HBA's as a SPOF.
If you keep these in mind when building your pool, you shouldn't end up with something tragic.
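For illustration, the mixing and sizing rules above could be sketched as a quick sanity check. This is a simplification under my own naming (it doesn't cover JBOD/HBA spread, drive sizes, or RPM mixing):

```python
def check_layout(vdevs):
    """vdevs: list of (redundancy_type, disk_count) tuples for the data vdevs.
    Returns a list of rule violations (empty means the layout looks sane)."""
    problems = []
    types = {t for t, _ in vdevs}
    counts = {n for _, n in vdevs}
    if len(types) > 1:
        problems.append("mixed redundancy types")
    if len(counts) > 1:
        problems.append("mixed disk counts")
    # Min/max total disks per vdev, per the rules above.
    limits = {"raidz1": (3, 7), "raidz2": (6, 10), "raidz3": (7, 15)}
    for t, n in vdevs:
        lo, hi = limits.get(t, (2, None))
        if n < lo or (hi is not None and n > hi):
            problems.append(f"{t} vdev of {n} disks is outside {lo}-{hi}")
    return problems

print(check_layout([("raidz2", 8), ("raidz2", 8)]))   # []
print(check_layout([("raidz1", 3), ("raidz2", 8)]))   # mixed types and counts
print(check_layout([("raidz1", 9)]))                  # too wide for raidz1
```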

10. 4KB Sector Disks

(9/12/2013) The likelihood of this being an issue for you is presently very much up in the air, and very dependent on your OS choice. There are more 4K disks out there, including some SSDs, and still some that are lying and claiming 512. However, work is also being done to hard-code recognition of these disks into illumos and so on. My blog post here about my home BSD-based ZFS SAN has instructions on how to manually force recognition of 4K sector disks if they're not reporting properly on BSD, but it is not as easy on illumos derivatives, as they do not have 'geom'. All I can suggest at the moment is Googling about ZFS, "ashift", and your chosen OS and OS version - not only does the answer vary, but I myself am not spending any real time keeping track, so do your own homework right now. I also do not recommend mixing - if your pool started off with one sector size, keep it that way if you grow it or replace any drives. Do not mix/match.

There are a number of in-the-wild devices with a 4KB sector size instead of the old 512-byte sector size. ZFS handles this just fine if it knows the disk's sector size is 4K. The problem is that a number of these devices lie to the OS about their sector size, claiming it is 512 bytes (in order to be compatible with ancient operating systems like Windows 95); this will cause significant performance issues if not dealt with at zpool creation time.
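For reference, the `ashift` value people tune for this is just the base-2 log of the sector size (a minimal sketch; the helper name is mine):

```python
import math

def ashift_for(sector_bytes):
    """zpool's 'ashift' is log2 of the sector size in bytes."""
    return int(math.log2(sector_bytes))

print(ashift_for(512))    # 9  -- traditional 512-byte sectors
print(ashift_for(4096))   # 12 -- 4K "Advanced Format" disks
```

A lying 4K disk gets ashift=9 at pool creation, forcing read-modify-write cycles for every sub-4K-aligned write - hence the performance hit.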

11. ZFS Has No "Restripe"

If you're familiar with traditional RAID arrays, then the term "restripe" is probably in your vocabulary. Many people in this boat are surprised to hear that ZFS has no equivalent function at all. The way ZFS delivers data to the pool provides a long-term equivalent to this functionality, but there is no up-front way, nor a command that can be run, to kick off such a thing.

The most obvious task where this shows up is when you add a vdev to an existing zpool. You could be forgiven for expecting the existing data in the pool to slide over so all your vdevs end up roughly equal in used size (rebalancing is another term for this), since that's what a traditional RAID array would do. ZFS? It won't. That balancing comes only as an indirect result of rewrites; if you only ever read from your pool, it will never happen. Bear this in mind when designing your environment and making initial purchases. It is almost never a good idea, performance-wise, to start off with a handful of disks if within a year or two you expect to grow that pool to a significantly larger size by adding small numbers of disks every X weeks/months.
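A rough intuition for why new writes pile onto the new vdev: the allocator biases writes toward vdevs with more free space. This is a deliberate simplification of ZFS' actual metaslab allocation, shown only to illustrate the imbalance:

```python
def write_share(vdev_free_bytes):
    """Approximate share of new writes each vdev receives, by free space."""
    total = sum(vdev_free_bytes)
    return [round(free / total, 2) for free in vdev_free_bytes]

# Original vdev 90% full (1 TB free) plus a freshly added empty vdev (10 TB free):
print(write_share([1e12, 10e12]))   # [0.09, 0.91]
```

The new vdev absorbs ~90% of new writes, so the pool's performance converges toward that one vdev's IOPS rather than the sum of both.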

12. Hot Spares

Don't use them. Pretty much ever. Warm spares make sense in some environments. Hot spares almost never make sense. Very often it makes more sense to include the disks in the pool and increase redundancy level because of it, than it does to leave them out and have a lower redundancy level.

For a bit of clarification, the main reasoning behind this has to do with the present method by which hot spares are handled by ZFS & Solaris FMA and so on - the whole environment involved in identifying a failed drive and choosing to replace it is far too simplistic to be useful in many situations. For instance, if you create a pool designed to have no SPOF in terms of JBODs and HBAs, and even go so far as to put hot spares in each JBOD, the code presently in illumos (9/12/2013) has nothing in it to understand that you did this, and it's going to be sheer chance if a disk dies and it picks the hot spare in the same JBOD to resilver to. It is more likely to just pick the first hot spare in the spares list, which is probably in a different JBOD - and now your pool has a SPOF.

Further, it isn't intelligent enough to understand catastrophic loss - say you again have a pool where the HBAs and JBODs are set up for no SPOF, and you lose an HBA and the JBOD connected to it. You had 40 drives in mirrors, and now you are only seeing half of each mirror - but you also have a few hot spares in that JBOD, say 2. Now, obviously, picking 2 random mirrors and starting to resilver them from the still-visible hot spares is silly - you lost a whole JBOD, all your mirrors have gone to single drives, and the only logical solution is getting the other JBOD back online (or, if it somehow went nuts, attaching a whole new JBOD full of drives to the existing mirrors). Resilvering 2 of your 20 mirror vdevs to hot spares in the still-visible JBOD is a waste of time at best, dangerous at worst - and it's GOING to do it.

What I tend to tell customers when the hot spare discussion comes up actually starts with a question. The multi-part question is this: how many hours could possibly pass before your team is able to remotely log in to the SAN after receiving an alert that there's been a disk loss event, and how many hours could possibly pass before your team is able to physically arrive to replace a disk after receiving such an alert?

The idea, of course, is to determine if hot spares are seemingly required, or if warm spares would do, or if cold spares are acceptable. Here's the ruleset in my head that I use after they tell me the answers to that question (and obviously, this is just my opinion on the numbers to use):

  • Under 24 hours for remote access, but physical access or lack of disks could mean physical replacement takes longer
    • Warm spares
  • Under 24 hours for remote access, and physical access with replacement disks is available by that point as well
    • Pool is 2-way mirror or raidz1 vdevs
      • Warm spares
    • Pool is >2-way mirror or raidz2-3 vdevs
      • Cold spares
  • Over 24 hours for remote or physical access
    • Hot spares start to become a potential risk worth taking, but a serious discussion about best practices and risks has to be had. Often, if the timeline is 48-72 hours, warm or cold spares may still make sense depending on pool layout. Over 72 hours to replace is generally where hot spares become something of a requirement, to cover those situations where they help - but at that point, a discussion needs to be had about why the customer environment has a > 72 hour window where a replacement disk isn't available.
I'd have to make one huge bullet list to try to cover every possible contingency here - each customer is unique, but these are some general guidelines. Remember, it takes a significant amount of time to resilver a disk, so adding in X additional hours does not add a lot of risk, especially for 3-way or higher mirrors and raidz2-3 vdevs, which can already handle multiple failures.
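The decision tree above might be sketched like so - my own simplification of the bullets, under invented names; real deployments deserve the full discussion:

```python
def spare_strategy(remote_hours, physical_hours, multi_failure_tolerant):
    """remote_hours / physical_hours: worst-case hours to remote access and to
    physical disk replacement. multi_failure_tolerant: True for >2-way mirrors
    or raidz2/3 vdevs."""
    if remote_hours > 24 or physical_hours > 72:
        return "discuss hot spares"
    if remote_hours <= 24 and physical_hours <= 24:
        return "cold spares" if multi_failure_tolerant else "warm spares"
    return "warm spares"   # fast remote access, slower physical replacement

print(spare_strategy(4, 12, True))     # cold spares
print(spare_strategy(4, 48, False))    # warm spares
print(spare_strategy(30, 96, False))   # discuss hot spares
```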

13. ZFS Is Not A Clustered Filesystem

I don't know where this got started, but at some point, something must have been said that has led some people to believe ZFS is or has clustered filesystem features. It does not. ZFS lives on a single set of disks in a single system at a time, period. Various HA technologies have been developed to "seamlessly" move the pool from one machine to another in case of hardware issues, but they move the pool - they don't offer up the storage from multiple heads at once. There is no present (9/12/2013) method of "clustered ZFS" where the same pool is offering up datasets from multiple physical machines. I'm aware of no work to change this.

14. To ZIL, Or Not To ZIL

This is a common question - do I need a ZIL (ZFS Intent Log)? First of all, this is the wrong question. Almost every storage system you'll ever build utilizing ZFS will need, and will have, a ZIL. The thing to understand is that there is a difference between the ZIL and a ZIL device (referred to as a log or slog device). It is very common for people to call a log device a "ZIL" device, but this is wrong - there is a reason ZFS' own documentation always refers to the ZIL as the ZIL, and a log device as a log device. Not having a log device does not mean you do not have a ZIL!

So with that explained, the real question is: do you need to direct those writes to a separate device from the pool data disks or not? In general, you do if one or more of the intended use-cases of the storage server is very write-latency sensitive, or if the total combined IOPS requirement of the clients approaches, say, 30% of the raw IOPS potential of the zpool. In such scenarios, the addition of a log vdev can have an immediate and noticeable positive performance impact. If neither of those is true, you can likely skip a log device and be perfectly happy. Most home systems, for example, have no need of a log device and won't miss it. Many small office environments using ZFS as a simple file store will also not require one. Larger enterprises or latency-sensitive storage will generally require fast log devices.
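As a sketch, the rule of thumb reduces to the following (the 30% threshold is this article's suggestion, not a hard limit, and the function name is illustrative):

```python
def slog_recommended(write_latency_sensitive, client_iops, raw_pool_iops):
    """True if a dedicated log (slog) vdev is likely worth adding."""
    return write_latency_sensitive or client_iops >= 0.30 * raw_pool_iops

print(slog_recommended(False, 500, 10_000))    # False -- only 5% of pool IOPS
print(slog_recommended(False, 3_500, 10_000))  # True  -- 35% of pool IOPS
print(slog_recommended(True, 100, 10_000))     # True  -- latency-sensitive clients
```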

15. ARC and L2ARC

(9/12/2013) There are presently issues related to memory handling and the ARC that have me strongly suggesting you physically limit RAM in any ZFS-based SAN to 128 GB. Go above 128 GB at your own peril (it might work fine for you, or it might cause you some serious headaches). Once these are resolved, I will remove this note.

One of ZFS' strongest performance features is its intelligent caching. The primary cache, stored in RAM, is the ARC (Adaptive Replacement Cache). The secondary cache, typically stored on fast media like SSDs, is the L2ARC (second-level ARC). The basic rule of thumb in almost all scenarios is: don't worry about L2ARC; instead, just put as much RAM into the system as you can, within financial realities. ZFS loves RAM, and it will use it - there is a point of diminishing returns depending on the total working set size of your dataset(s), but in almost all cases, more RAM is good. If your use-case does lend itself to a situation where RAM will be insufficient and L2ARC will be necessary, there are rules about how much addressable L2ARC one can have based on how much ARC (RAM) one has.
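The reason L2ARC size is bounded by RAM: every record cached in L2ARC keeps a header in ARC. The ~180 bytes per header used below is an often-cited illumos-era figure (the exact size depends on platform and version), and the helper name is mine:

```python
def l2arc_header_ram_gib(l2arc_gib, avg_record_kib=8, bytes_per_header=180):
    """RAM (ARC) consumed by headers referencing L2ARC records.
    ~180 B/header is an often-cited illumos-era figure; verify on your platform."""
    records = l2arc_gib * 2**30 / (avg_record_kib * 2**10)
    return records * bytes_per_header / 2**30

# 500 GiB of L2ARC holding 8K records eats ~11 GiB of RAM just for headers:
print(round(l2arc_header_ram_gib(500), 1))   # 11.0
```

An oversized L2ARC on a small-RAM box can therefore cannibalize the very ARC it was meant to supplement.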

16. Just Because You Can, Doesn't Mean You Should

ZFS has very few limits - and what limits it has are typically measured in megazillions, and are thus unreachable with modern hardware. Does that mean you should create a single pool of 5,000 hard disks? In almost every scenario, the answer is no. The fact that ZFS is so flexible and has so few limits means, if anything, that proper design is more important than in legacy storage systems. In most environments that need lots of storage space, it is more efficient and architecturally sound to find a smaller-than-total break point, design systems to meet that size, and then build more than one of them to meet your total space requirements.

It is very rare for a company to need 1 PB of space in one filesystem, even if it does need 1 PB of total space. Find a logical separation and build to meet it, rather than going crazy trying to build a single 1 PB zpool. ZFS may let you, but various hardware constraints will inevitably doom the attempt, or create an environment that works but could have worked far better at the same or even lower cost.

Learn from Google, Facebook, Amazon, Yahoo and every other company with a huge server deployment -- they learned to scale out, with lots of smaller systems, because scaling up with giant systems not only becomes astronomically expensive, it quickly ends up being a negative ROI versus scaling out.

17. Crap In, Crap Out

ZFS is only as good as the hardware you put it on. Even ZFS can corrupt or lose your data if placed on inferior components. Examples of things to avoid if you want to keep your data intact: non-ECC RAM, non-enterprise disks, SATA disks behind SAS expanders, non-enterprise-class motherboards, RAID cards (especially ones without a battery), and putting the server in a poor environment for a server to be in, etc.


  1. This is great! When you have a slog, how do you decide pool spindle count to maximize the use of the slog? I have always used mirrors, but my math says that to take advantage of a high-performance slog, I would want lots of spindles.

    My slogs do 900MB/sec, so don't I want a pool that does 900MB/sec, which is 20+ vdevs?

  2. That answer is really pretty specific to the workload of the pool itself. Much of the time, slog devices are there to speed up the pool by offloading ZIL traffic - and, as an added benefit, reducing write latency from the client's perspective.

    I almost always look at slog devices from an IOPS perspective first and foremost, with throughput potential a distant or even non-existent second (it depends on the environment). Often a pool that can do 2.4 GB/s in a large-block sequential workload can't do anywhere near that at 4K random read/write request sizes (indeed, that would be some 620,000 IOPS) - and the client is doing exactly those, so suddenly all the interest is in IOPS and little time is spent worrying about throughput.

    In a pure throughput workload, things can and should be a bit different. And in ZFS, they are. For instance, ZFS has built-in mechanics for bypassing the normal ZIL workflow if the incoming data is a large-block streaming workload. It can opt to send the data straight to disk, bypassing any slog device (well, bypassing the ZIL entirely, really, and thus the slog device). There could be a whole post at some point on the varying conditions and how ZFS deals with each, I think. You've got 'logbias' on datasets (there's a good writeup on that). And even on latency, there's some code to deal with limits, I believe. The ZFS On Linux guys (dechamps, specifically) have a pretty good write-up on this as well.

  3. I liked the Oracle article the best, thanks for the feedback. My scenario is different than theirs, however. My specific workload is a VDI implementation with an 80/20 r/w bias. I cannot seem to get a disk pool to performance levels that match the hardware, I think. I have 22-spindle 10K mirrored pools with a RAM-based slog. The slog is rated at 90K IOPS and 900MB/sec.

    Wouldn't zpool iostat show, under ideal conditions, 22 * 50MB/s = 1100 MB/sec or near there? The best I can get is 300 MB/sec. I am just trying to explain the gap. Zpool iostat shows peaks of 42K IOPS, which is great, but never very high MB/sec. When the system is not busy, I would think that a file copy would at least reach the speed of the slog, or at least double the 300MB/sec readings. Nobody seems to use zpool iostat for performance data; iostat seems to be the tool of choice, but I don't have that data compiled over time like I do for zpool iostat.

    So if I took my 22-spindle 10K mirrored pool to a 44-spindle mirrored pool, which is a little bigger than what Oracle pushed here, I should see my numbers go up closer to the limits of my slog, right?

  4. So, 'rated at' and 'capable of' are always two different things. However, more importantly, 'capable of when used as a ZFS log device' is a whole new ballgame.

    Manufacturers tend to provide numbers that show them in the most favorable light -- and even third-party analysis websites focus on typical use-cases: database, file transfer, those sorts of things.

    ZIL log device traffic is something every device fears - synchronous single-threaded I/O. Your device may be capable of 90,000 IOPS @ 4K block size with 8, 16, 32, or more threads - and anywhere from 4 to 64 threads is likely what both they and third-party websites run tests at - but what can it do at 1 thread, at the average block size of your pool datasets? Because that's what a log device will be asked to do. :)

    As for SPECsfs - I tend to, well, ignore that benchmark entirely. What it tests isn't particularly real-world applicable, especially since vendors tend to game the system. For instance, you mention 44-spindle mirrors - no, in that test, the Oracle system had *280* drives, which they split up into 4 pools, each containing 4 filesystems, which were then tested in aggregate, I believe. I also believe the amount of data tested was significantly less than the pool size, and various other tunings were likely done as well. This picture gives some idea as to how big that system was:

    Even pretending you had the specific tunings, and ignoring for a moment that it's not particularly fair to just 'divide down' to get an idea of what a smaller system could do, doing so puts your 22-spindle 10K mirrored pool at about 14K IOPS on the same benchmark.

    I generally want to see both iostat and zpool iostat; they're very different in what they're reporting, as they're reporting on different layers. Sometimes the combination of both gives hints that one or the other would not alone provide.

    I suspect with a 'VDI' implementation you're probably running 4-32K block sizes, and at that, I'd be happy with a peak of 42K IOPS out of 22 10K disks - indeed, that's way past what you should realistically expect out of the drives; most of that 42K is coming out of ARC and an 80%+ read workload. Were I just going on gut feeling, I'd expect you to get much less at times.

    This sort of performance work is time-consuming and involves a ton of variables. However, it is important to note that the log device is not some sort of write cache - that's your RAM. The log device's job is to take the ZIL workload off the data pool. The performance benefit is purely that the pool devices get back all those I/Os they were spending on the ZIL. If there's any further benefit, it's just the luck of the draw that the incoming writes were 'redundant' (writing some % of the same blocks multiple times within a txg, allowing ZFS to effectively ignore all but the last write when it lays data out on the spinning media). The pain that spinning disks feel from ZIL traffic cannot be overstated. However, the streaming small-block performance of the spinning media, minus the serious pain of the random reads that get past the ARC, is, at the end of the day, the actual write performance the pool is capable of - not what the log device can do at all.

    In super streaming workloads, sometimes, the log devices end up being the bottleneck. However, in almost all VM/VDI deployments I've seen, the log device is not your bottleneck - your drives are. :)

  5. Therefore, is going from a 22-disk mirror to a 44-disk mirror bad? How many vdevs are too many vdevs? The SPEC test, which I get was tuned, used 280 disks and 2 controllers, leaving 140 disks per controller. 4 slogs mean 4 mirrored pools, so they used 35 spindles each. But you say that lots of spindles is bad.

    The slog I have is a STEC ZeusRAM. I discovered them in the Nexenta setup from VMworld 2011 (I have a diagram of it also), which is what I have been trying to replicate ever since. Since I have 100 of these 10K drives and JBODs to go with them, I am trying to figure out how to get the best out of them for a VDI deployment. So far I have only tried 22 spindles, and I was thinking 44 would be better.

    Lots of $$ in equipment and consultants plus gobs and gobs of wasted time still has me scratching my head.

  6. No no, more spindles is usually better up to a point. I don't start to worry about spindle counts until it is up into the 100's. However, remember the Oracle box got the 200K+ IOPS only from 280 spindles - at 44, you're at a small fraction of that.

    Your box will perform twice as well as your 22-disk mirror system does, assuming no part in the system hits a bottleneck (which is going to happen if you've insufficient RAM, CPU, network, etc), and it is properly tuned(!). I would not expect, on a properly tuned system, in an 8-32K average block size VDI-workload, for 44 drives in mirror pool to be able to outperform a single STEC ZeusRAM (eg: I wouldn't expect it to be your bottleneck, from an IOPS perspective).

    I would expect the ZeusRAM to bottleneck you on a throughput test - or if your average blocksize is 32K or greater (getting ever more likely up to 128K). Its IOPS potential is not 90,000 at 4K, nor at 8K, 32K, or 128K (each of which is worse than the previous), because ZIL traffic is single-threaded, unlike most benchmarks you'd cite when saying how fast a device is.

    I love the ZeusRAM, and I recommend them on every VM/VDI deployment I'm involved with, and commend you on their use; but while they are in fact the very best device you could possibly use, they are not of unlimited power and can still be a limit. Still, again, if you're at a 16K or under average block size, I'd suspect your pool (22 or 44 drives) to run out of IOPS first. What block size are you using? What protocol (iSCSI, NFS)?
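    To make the block-size point concrete, here's a back-of-envelope sketch; the throughput ceiling is an assumed round number for illustration, not a ZeusRAM specification:

```python
# Once sync writes are bound by a log device's throughput ceiling, its
# sustainable IOPS falls linearly as block size grows.
throughput_ceiling_mb_s = 600  # hypothetical streaming limit of a log device

for block_kb in (4, 8, 32, 128):
    iops = throughput_ceiling_mb_s * 1024 // block_kb
    print(f"{block_kb:>4}K blocks -> at most {iops:>7,} IOPS")
```

    Single-threaded ZIL traffic makes the real numbers lower still, which is why datasheet IOPS figures are a poor guide here.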

    Is this a NexentaStor-licensed system, or a home grown (and if so, what O/S & version)? That will matter in terms of where you can go for performance tuning assistance - because it needs some, unless you've already started down that path? I'm unaware of a single ZFS-capable O/S whose default tuneables for ZFS will well suit a high-IOPS VDI workload. The spinning disks are very likely underutilized.

  7. I am running Solaris 11 because the Nexenta resellers I reached out to never got back to me -- too busy, I guess. So I just started buying what made sense to me. If any of you out there are reading this: look at what you missed. Sorry I wasn't interesting enough! I have 100 SAS 10K spindles, 2 STECs, 2 DDRdrives, and 2 256GB servers with 10GbE. I tried a 60 disk pool, but someone told me it was too big, so now I have 22. Your nugget that vdevs are for I/O was worth the price of admission. I've learned this: you can never have enough RAM, ever. All in all it's been such a letdown, given the $$ spent versus the results achieved.

  8. Sorry they never got back to you. Doubly so since that precludes the option of contacting Nexenta to do a performance tuning engagement. :(

    Also sorry the performance has seemed underwhelming - this is one of the current problems with ZFS go-it-on-your-own, is that there's just such a dearth of good information out there on sizing, tuning, performance gotchya's, etc - and the out of box ZFS experience at scale is quite bad. What information does exist is often in mailing lists, hidden amongst a lot of other, bad advice. I'm hoping to try to fix that as best I can with blog entries on here, but time I have to spend on this is erratic, and some of these topics are nearly impossible to address fully in a few paragraphs on a blog post, I'm afraid.

    60 disks is most assuredly not 'too big'. The average Nexenta deployment these days, I'd say, is probably around 96 disks per pool, or somewhere thereabouts. If you don't mind people poking around on the box via SSH (and it is in a place where that's possible), email me ( to work out login details, and I can try to find some off time to take a peek at it.

  9. I dropped you a note last weekend, but maybe you're on spring break like I have been. I was thinking of just adding another JBOD of 24 disks to the existing pool, creating new ZFS datasets, then copying the existing data to them to spread it around the new disks -- going from 22 to 44 spindles. The whole operation should only take a few hours. Currently when I do zpool iostat I see maybe 1-2K ops/s with a high of 4K. What I don't like is the time to clone VMs; the max MB/s I get is around 500-550, and doubling the spindles would double that... correct?

    Also: how many minutes/seconds should an RSF-1 failover with 96 disks take? I am curious what I should expect.

  10. Oops, email lost in the clutter. I've responded.

    It would very likely double your IOPS, but not necessarily double your throughput, since there are more bottleneck concerns to consider there. I assume you're using NFS -- you might (and it IS beta, so bear that in mind) be interested in this: - we're in beta on the NFS VAAI plugin. I say that because you mentioned tasks like VM cloning, and NFS VAAI support can have a serious impact on certain VM image manipulation tasks in VMware when backed by NexentaStor. Possibly worth looking at (though again -- beta, probably not good for production yet).

    The goal of RSF-1 is to fail over in the shortest safe time possible. I've seen failovers take under 20 seconds. That said, I've also seen them take over 4 minutes (which isn't bad when you put it in context -- at my last job, my Sun 7410 took *15 minutes* to fail over). There's a number of factors involved. Number of disks is one, number of datasets (zvols & filesystems) is another. In general I recommend people expect 60-120 seconds, which is why I have the blog post up on VM Timeouts and suggest at least 180 second timeout values everywhere (personally I use higher than even that, as I see no reason to go read-only when I know the SAN will come back *some day*).

  11. What about Zpool fragmentation? That seems to be another issue with ZFS that you don't see much discussion about. As your pools get older, they tend to get slower and slower because of the fragmentation, and in the case of a root filesystem on a zpool, that can even mean that you can't create a new swap device or dump device because there is no contiguous space left. Zpools really need a defrag utility. Today the only solution is to create a new pool and migrate all your data to it.

    A related issue is that there are no tools to even easily check the pool fragmentation. Locally, we estimate the fragmentation based on the output of "zdb -mm", but even that falls down when you have zpools that are using an "alternate root" (for example in a zone in a cluster). "zpool list" sees those pools fine, but zdb does not.

    Are you aware of any work being done on solutions to those issues?

    1. BK:

      Fragmentation does remain a long-term problem of ZFS pools. The only real answer at the moment is to move the data around -- eg: zfs send|zfs recv it to another pool, then wipe out the original pool and recreate, then send back.

      The 'proper' fix for ZFS fragmentation is known -- it is generally referred to as 'block pointer rewrite', or BPR for short. I am not presently aware of anyone actively working on this functionality, I'm afraid.

      For most pools, especially ones kept under 50-60% utilization that are mostly-read, it could be years before fragmentation becomes a significant issue. Hopefully by then, a new version of ZFS will have come along with BPR in it.

  12. OK, I have a quick question, and I'll include specific info below the actual question in case you need it: I have a home "all-in-one" ESX/OpenIndiana ZFS machine. Right now I have an 8 disk RAIDZ2 array of 2TB drives: 2 Samsung, 2 WD, 2 Seagate and 2 Hitachi (it just worked out that way). The two Hitachi drives are 7200 RPM; the rest are 5400-5900 RPM. 6 of them are 4K, and I think the two Hitachis are "regular" 512-byte drives. I want to know if I'm making a terrible mistake mixing these drives? I don't mind "losing" the performance of those 7200 RPM drives over the 5400s; I just don't want data risk because of it. I could probably find someone who would happily trade those 7200 RPM drives for 5400s, but if I can leave it as is, I would prefer that.

    Second, I have pulled a spare 2TB 5400 RPM WD Green from an external case and was going to put it in as a hot spare. Would I be better off just rebuilding the array as a z3 instead (or tossing it into the z2 array for a "free" 2TB)? Or leaving it in a box and keeping it for when something dies? (BTW, this question *might* have a different answer after you read my specs below.)

    Specs: Supermicro dual CPU MB (2x L5520 Xeons) with 48GB of ECC registered samsung ram. 4 PCI-E 8x slots, 3 PCI-E 8x (4x electrical) slots. 1 LSI 3081E-R 3gb/s HBA, 1 M1015 6gb/s HBA, 2 (incoming, not installed yet) 3801E 3gb/s HBAs (for the drives soon to go into my DIY DAS), 1x Mellanox DDR Infiniband card. Drives: the 8 2TB drives previously mentioned in a z2 pool and 8 300GB raptor 10kRPM in a 8 disk RAID10 array for putting VMs on (overkill honestly, but fun). Right now its one pool on one card, one pool on the other. SOFTWARE: ESXi 4.1U3, Open Indiana VM with 24GB of ram handed over from ESXi.

    My future plans were to use 5 1TB drives I had lying around to create another pool. My pool ideas were raidz1 with all 5, raidz2 with all 5, or RAID10 with 6 (and locate/buy a 6th 1TB drive). Given that I was going to have this second pool, possibly set up with RAIDZ1, I was seriously considering using that 9th 2TB drive as a global hot spare so it could be pulled into either pool (even the pool with the 1TB drives, right?). An even more bizarre idea: could I also flag that 2TB drive to be used as a spare for the RAID10 Raptors? Obviously it would impact performance if it got "called to duty", but it would protect the array until I could source a new 10K drive, right? If that's a really stupid idea then I'll just order another 10K drive this weekend and toss it in as a spare for the mirror set, or, if you prefer, sit on it in a box and swap it in when something actually dies (which saves power, so I'm OK with it).

    Right now the data that is considered "vital" and business-important (my wife's small business) is sitting on the 8 disk Z2 array, in two different physical locations elsewhere in the house, *and* backed up on tape. Regardless of how we set up the ZFS pools, all important data will be residing on the ZFS box *AND* *TWO SEPARATE* other locations/machines/drives (i.e., it will always be on 3 separate machines on at least 3 separate drives). The array with the 1TB drives will be housing TV shows and movies that can be lost without many tears, so while I'd prefer not to lose it, it's not *vital* like the array with the 2TB disks. I would be open to changing the *vital* array/pool to a RAID10 or RAIDZ3 if you believe it's worthwhile considering my requirements.

    Thanks for any help you feel like giving! :) -Ben

    1. Let me see if I can get through all this!

      1) Mixing RPMs -- generally speaking, don't do it. You'll only go as fast as the slowest disk in the pool. If you don't mind that, as well as the occasional performance hiccup, maybe that's fine. I suspect it would perform worse than you might expect -- it isn't a situation I've spent any time investigating, and it isn't something ZFS has specifically set out to handle well.

      2) Mixing 4K & non-4K disks -- I've not spent any real time thinking this one out, but in general I'd also suggest not doing it if you can avoid it. It would also be pretty important that all the disks report the same sector size, even if some technically aren't the same. Of note, however -- it sounds like what you actually have are 4K disks that REPORT as 512-byte. There are a few of these running around, and they lead to really terrible performance. You can Google around (or maybe I'll write up an entry on this at some point) about zpool 'ashift'.

      3) I'd throw the extra disk into the existing z2, making it a z3, if its uptime is vital. You'll find z2/z3 slightly more failure-resistant than the same disks in a RAID10, so I don't recommend that, at least. You could in theory use it as a global spare (though I don't like hot spares, just warm or cold spares), but putting it into service paired with a 10K disk would indeed lead to very odd performance issues and isn't generally something I'd recommend (I tend to be conservative and cautious when it comes to storage, though).
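      On the ashift point: a quick sketch of the relationship between sector size and ashift (the vdev property you can inspect with zdb), assuming nothing beyond its power-of-two definition:

```python
import math

def ashift_for(sector_bytes: int) -> int:
    # ashift is log2 of the sector size ZFS assumes for a vdev.
    # A 4K-native disk that reports 512-byte sectors ends up with
    # ashift=9, and sub-4K writes then cost the drive an internal
    # read-modify-write cycle -- hence the terrible performance.
    return int(math.log2(sector_bytes))

print("512B sectors -> ashift", ashift_for(512))   # 9
print("4K sectors   -> ashift", ashift_for(4096))  # 12
```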

  13. Andrew,

    I'm curious about your following rules. Could you be so kind as to point me to some further information on these? Also shouldn't the number of data disks always be 2**n, meaning that raidz2 should start at 6 not 5?

    Do not use raidz1 for disks 1TB or greater in size.
    For raidz1, do not use less than 3 disks, nor more than 7 disks in each vdev (and again, they should be under 1 TB in size, preferably under 750 GB in size).
    For raidz2, do not use less than 5 disks, nor more than 10 disks in each vdev.
    For raidz3, do not use less than 7 disks, nor more than 15 disks in each vdev.


    1. You deserve a prize for finding a typo. You are correct; I've updated it to 6.

      The primary source for that information is internal experience (thousands of live zpools); it is not knowledge I picked up from websites or blog posts. The 'do not use less' rule is fairly obvious -- why would you use fewer than 7 disks in a raidz3? At just 5, you're left with more parity than data, and should probably have gone with raidz2 at 6 disks.

      The 'no more' rule logic is around the nightmare scenario of losing another disk while resilvering from an initial loss. You do not want this to happen to you, and keeping the number of disks per vdev low is how you mitigate it. I will actually bend these rules based on disk type, JBOD layout, known workload, etc. I'll be more conservative in environments where the number of JBODs is low, the workload is high (thus making resilvers take longer as they compete for I/O), the chosen disk type is very large (3+ TB) and thus takes forever to resilver, or the vendor or model of disk is one I'm less confident in, since I'll then expect more failures. I'll be less conservative, and even go outside my own ranges, if it's a very strong environment with no SPOF on the JBODs, good disks that are not more than 2 TB in size, and a workload that is light or confined to only certain periods of the day.

      When making this decision, it is also important to be cognizant of IOPS requirements - your environment may be one that would otherwise be OK to lean towards the high end of these ranges, but you have an IOPS requirement that precludes it, and requires you go with smaller vdevs to hit the IOPS needs.
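      These width ranges can be captured in a tiny helper, useful as a sanity check when planning vdevs. This is just the post's rule of thumb encoded in Python (using the corrected raidz2 minimum of 6), nothing official:

```python
# Rule-of-thumb raidz width ranges: parity level -> (min disks, max disks).
# Rules of thumb only -- the reply above describes when to bend them.
RANGES = {1: (3, 7), 2: (6, 10), 3: (7, 15)}

def vdev_width_ok(parity: int, disks: int) -> bool:
    lo, hi = RANGES[parity]
    return lo <= disks <= hi

print(vdev_width_ok(2, 8))    # True: a typical raidz2 vdev
print(vdev_width_ok(1, 12))   # False: far too wide for raidz1
```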

      Let me know if that didn't cover anything you were curious about.

  14. 'vdevs IOP potential is equal to that of the slowest disk in the set - a vdev of 100 disks will have the IO potential of a single disk.'

    is this how it works in traditional RAID6 (if we are talking raidz2) hardware arrays too?

    I'm a bit concerned, as my current volumes are comprised of 12 vdevs of 6 disks each (raidz2), which if I understand correctly means I'm really only seeing about 12 disks' worth of write IOPS. That would explain why it doesn't seem that fantastic until we put a pair of Averes in front of it.

    1. No, traditional RAID5/6 arrays tend to have IOPS potential roughly equivalent to some % of (the number of drives minus the parity drives). This is one of the largest performance differences between a traditional hardware RAID card and ZFS-based RAID -- when doing parity RAID on lots of disks, the traditional hardware RAID card has significantly higher raw IOPS potential.
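      The arithmetic behind "about 12 disks' worth" can be sketched as follows; the per-disk IOPS figure is an assumed generic value, for illustration only:

```python
# The commenter's layout: 12 vdevs of 6-disk raidz2 = 72 disks, but each
# raidz vdev contributes roughly one disk's worth of random IOPS.
per_disk_iops = 150  # assumed generic figure for a single spindle
vdevs, disks_per_vdev = 12, 6

pool_write_iops = vdevs * per_disk_iops
print(f"{vdevs * disks_per_vdev} disks, ~{pool_write_iops} random write IOPS "
      f"(about {vdevs} disks' worth)")
```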

      ZFS mirror versus hardware RAID10 is a reasonable comparison, performance wise, but ZFS will win no wars versus traditional RAID5/6/50/60. Then again, it also won't lose your data, and isn't subject to the raid write hole problem. :)

      I often have to remind people that ZFS wasn't designed for performance. It's fairly clear from the documentation, the initial communication from Sun team, and the source, that ZFS was designed with data integrity as the primary goal, followed I'd say by ease of administration and simplification of traditionally annoying storage stuff (like adding drives, etc) -- /performance/ was a distant second or third or even fourth priority. Things like ARC and the fact that ZFS is just newer than many really ancient filesystems gives people this mistaken impression that it's a speed demon -- it isn't. It never will be.

      If your use-case is mostly-read and reasonably cacheable, ARC/L2ARC utilization can make a ZFS filesystem outperform alternatives, but it's doing so by way of the caching layer, not because the underlying IOPS potential is higher (that's rarely the case). If your use-case isn't that, then the only reason you'd go ZFS is for the data integrity first and foremost, and also possibly for the features (snapshots, cloning, compression, gigantic potential namespace, etc); not because you couldn't find a better performing alternative.
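      A toy latency model shows why cache hit rate, not raw pool IOPS, dominates perceived ZFS read performance; all latency figures below are assumed round numbers, purely for illustration:

```python
# Average read latency under a cache hierarchy (microseconds).
ram_us, ssd_us, disk_us = 1, 100, 8000  # ARC hit, L2ARC hit, spinning-disk miss

def avg_latency_us(arc_hit: float, l2_hit: float) -> float:
    miss = 1.0 - arc_hit - l2_hit
    return arc_hit * ram_us + l2_hit * ssd_us + miss * disk_us

print(f"90% ARC / 5% L2ARC: {avg_latency_us(0.90, 0.05):.0f} us average read")
print(f"no cache:           {avg_latency_us(0.0, 0.0):.0f} us average read")
```

      A well-cached workload feels an order of magnitude faster even though the disks underneath are no quicker at all.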

  15. Hi Andrew,

    I am a "normal" home user with a "normal" home media server and after reading (and some testing with virtualbox) been considering moving to zfs (freeBSD or solaris) from windows 7. Most likely going to try Esxi (no experience on this either but I have no problems learning) to run one VM for file server (zfs) and another for Plex media server.

    Specs for my "server": Currently running windows 7, Intel motherboard (DP67BG if I remember correctly), i7 2600k and 16 GB of DDR3 (non ECC) ram, one 500 GB HD for the OS and 8 3TB (sata, non enterprise class) HDs for data (bought 4 then the other 4 later).

    The 8 data disks are on a raid 6 hardware array (adaptec 6805) with around 9 TB of used space. 90% of that space are movies in mkv format (I rip all my blurays with makemkv so I have 20-30 gb files) and 10% of random files (backup of family photos and stuff from main box, that I back up to another 2 different HDs).

    Main purpose of my "media server" is Plex, serving 2 HTPCs and some devices (iPads). I want to move from one windows 7 box to Esxi with VMs to have storage and Plex on different VMs and optionally a third VM for misc stuff (like encoding video/testing). Everytime I install/update something I have to reboot the windows box and if anyone is watching a movie has to wait for it to get back online.

    Apart from a learning experience, would ZFS (Solaris or FreeBSD) be better, or am I just fine and should I simply try ESXi with Windows VMs? Would a raidz2 be better than the hardware RAID6 array I currently have (for my use)?

    My plan is one VM for zfs (still don't know what to install here, solaris, freeBSD, nexenta, etc.), one for Plex media server (windows or linux) and one windows VM for misc stuff.

    Thanks a lot for any feedback.

    1. I can't wait until we're "normal" home users, Simon. Pretty sure we're not, at the moment. :)

      I could easily run over the 4,096 character limit trying to advise you here. The tl;dr version would be: No, not unless you've got a backup of the data or a place to put it while migrating, and preferably only if you're willing to change boxes to something with ECC RAM in the process (that in and of itself is not a deal breaker) and definitely to an HBA instead of a RAID card. So you're definitely migrating data. It's potentially a lot of work for some data integrity gains. If you're not planning to use any ZFS features (and your present list of requirements doesn't seem to indicate you would -- you mention nothing suggesting ZFS snapshots, clones, rollback and so on would /really/ improve your life), those data integrity gains may not be worth the move (definitely not if you're not also going to ECC RAM & an HBA).

      Moving off Windows to a *nix derivative for the storage portion is very sane. Separating the storage to its own box or VM is reasonably sane. The level of effort to get you there safely on ZFS would almost necessitate buying a whole new server.

      As for choice, if you do decide to go through with a migration to a new box and ZFS, in order of what I feel to be the best options at the moment (9/21/2013):

      If you prefer command line administration:
      1. OmniOS
      2. FreeBSD 9.1 (or, really, wait for 10!)

      If you prefer UI administration:
      1. OmniOS with free version of napp-it if over 18 TB of space required
      2. NexentaStor Community Edition if under 18 TB of space required
      3. FreeNAS

      I can't currently recommend you use anything sporting ZFS On Linux. Lack of fault management, lack of dtrace/mdb, few other niggling things keep it off my list for now.

  16. > I can't wait until we're "normal" home users, Simon.

    I just stumbled upon this looking for advice how to install a global hot spare on Solaris 11. There is conflicting information about this capability and after reading your post, I am thinking that one or two warm spares might be more appropriate.

    My setup is a Solaris 11 home server doing multiple duties for the family: a Sun Ray server, virtual machine host (so all five family members can have as many instances of Windows, Linux or whatever as they like) and media server. Thin clients are scattered around the house in many rooms. Each room has a 24 port switch with gigabit fiber backhauled to the rack.

    The server is HP DL585 G2 with 4x dual core Opterons and 60GB of RAM, two fibre HBAs each with two ports, connected to two Brocade switches in a way that any HBA, cable or switch can fail without losing a path to the disks. Disks are 500GB in four arrays of 11 disks, each with dual paths.

    Your notes on backup are spot on. Right now the main pool consists of 3 vdevs, each containing 8 disks in RAIDZ2 (6+2), allowing any two disk failures before the array becomes critical. The remaining 20 disks are a backup pool as a single RAIDZ3 vdev (17+3). Snapshots are synchronized to the backup pool every 8 hours using the zrep script.

    The disks are not terribly reliable with one failing every week or three. I have about 50 cold spares, so the loss of a spindle is not an issue, but I often cannot get time to make the replacement too quickly. I was thinking that it made sense to reduce the size of the backup pool and allocate two global hot spares, so that any failure would rebuild automatically and give me time to respond.

    Your post brought back scary memories of a single array going offline, causing ZFS to scramble to build hot spares and declaring the whole pool invalid. I think I will take your advice and simply allocate one or two warm spares.

    1. Yeah - I push hard against hot spares, and never feel quite as good about builds where the client ends up demanding them, claiming they're required or because they actually meet my criteria for using them (I can never feel very good about a SAN that nobody will even be able to remotely log in to for over 72 hours after notification of a failure).

      Glad I could remind you of a scary memory, I suppose. :)

      PS: Nice home setup!

  17. Andrew, here is another reason to be careful with ZFS. I have a less reliable spare array and many unused fast 73GB disks. I also wanted to investigate how L2ARC would impact performance, or even if ZFS would populate L2ARC storage. No problem, power up the spare array, put in some disks and add them as cache.

    bash-4.1$ sudo zpool add tank c0t20000011C692521Bd0
    vdev verification failed: use -f to override the following errors:
    /dev/dsk/c0t20000011C692521Bd0s0 is part of exported or potentially active ZFS pool slow. Please see zpool(1M).
    Unable to build pool from specified devices: device already in use

    Oh yeah, those were part of an old pool. No problem, override.

    bash-4.1$ sudo zpool add -f tank c0t20000011C692521Bd0

    Did you catch the error? My array now looks like this:

    capacity operations bandwidth
    pool alloc free read write read write
    ------------------------- ----- ----- ----- ----- ----- -----
    tank 4.77T 6.18T 0 0 63.9K 0
    raidz2 1.59T 2.04T 0 0 0 0
    c0t20000011C61A75FFd0 - - 0 0 0 0
    c0t20000011C619D560d0 - - 0 0 0 0
    c0t20000011C619A481d0 - - 0 0 0 0
    c0t20000011C619DBDCd0 - - 0 0 0 0
    c0t20000014C3D47348d0 - - 0 0 0 0
    c0t20000011C619D695d0 - - 0 0 0 0
    c0t20000011C619D742d0 - - 0 0 0 0
    c0t20000011C619A4ADd0 - - 0 0 0 0
    raidz2 1.59T 2.04T 0 0 63.9K 0
    c0t20000011C619D657d0 - - 0 0 10.7K 0
    c0t20000011C61A75A6d0 - - 0 0 10.7K 0
    c0t20000011C619D4ECd0 - - 0 0 10.7K 0
    c0t20000011C619A043d0 - - 0 0 10.5K 0
    c0t20000011C619D669d0 - - 0 0 10.5K 0
    c0t20000011C61A7F9Cd0 - - 0 0 0 0
    c0t20000011C619D6C5d0 - - 0 0 0 0
    c0t20000011C619D220d0 - - 0 0 10.7K 0
    raidz2 1.59T 2.04T 0 0 0 0
    c0t20000011C619DCD3d0 - - 0 0 0 0
    c0t20000011C619D7FCd0 - - 0 0 0 0
    c0t20000011C619D646d0 - - 0 0 0 0
    c0t20000011C619A41Fd0 - - 0 0 0 0
    c0t20000011C6199E5Ed0 - - 0 0 0 0
    c0t20000011C619D43Fd0 - - 0 0 0 0
    c0t20000011C61A7F82d0 - - 0 0 0 0
    c0t20000011C619D636d0 - - 0 0 0 0
    c0t20000011C692521Bd0 33.8M 68.0G 0 0 0 0
    cache - - - - - -
    c0t20000011C615FDBAd0 0 68.4G 0 0 0 0
    c0t20000011C6924F09d0 0 68.4G 0 0 0 0
    c0t20000011C6C2163Cd0 0 68.4G 0 0 0 0
    c0t20000011C6C2C468d0 0 68.4G 0 0 0 0
    c0t20000011C6C2C4B8d0 1.15M 68.4G 0 0 0 0
    ------------------------- ----- ----- ----- ----- ----- -----

    So now, my entire pool is critical due to a single vdev located on an unreliable array. My only hope is to mirror it (which I have done) and pray that the spare array stays alive until I can rebuild the ENTIRE POOL from a backup.

    That's rather lame.

    1. Ouch.

      Yes. This is much like forgetting what your current path is and rm -rf'ing. :(

      This is the sort of thing that generally prompts the creation of 'safe' tools to use in lieu of the underlying 'unsafe' tools. I say this somewhat tongue in cheek, since my employer makes one of those 'safe' tools and yet I'm fairly sure it would have still let you do this in the interface (though I'll make a point of bringing it up to our development staff to add some logic to keep you from doing so without a warning).

    2. Yes, not very fun. The underlying issue (besides me not realizing my mistake sooner) was that the -f override silenced the warning I had seen, and most critically, the warning I had not seen yet (and would never see).

  18. Andrew - could you expand on what tragedy might happen if you mixed disk sizes, speeds, redundancy types?

    I'm thinking of expanding a pool that so far has only one vdev.
    RAID_Z2[ 10 x 600G 10k ] + SLOG + Hot-Spare (before I found your blog)

    Proposed 2nd vdev = RAID_Z3[ 11 x 1TB 7.2k ] + SLOG

    Single 24-slot 2.5in JBOD chassis. SLOG devices are STEC s840z. NexentaStor with the RSF-1 H-A plugin.

    Thanks for any response

    1. Performance, mostly, including a few performance corner cases you'd be hard-pressed to actually hit in a homogeneous pool. Before I answer generically, let me state that as a NexentaStor user, if you have a license key with an active support contract, be aware that Nexenta Support does not support the use of heterogeneous pools. Contact them for more information.

      If you were to add an 11x 1-TB disk raidz3 vdev to an existing pool comprised of a single 10x 600-GB raidz2 vdev, you'd effectively be adding a larger, slower vdev that is also initially less utilized. First, this will make ZFS 'prefer' it, as it's emptier (which shouldn't be read as completely ignoring the other vdev for writes, but it will push a greater % of the writes to the new vdev). Second, it's larger, so it will prefer it even longer. Third, it's slower, and this 'preference' is not synonymous with 'all on the new vdev'. So at the end of the day, you've added another vdev which should have almost doubled your write performance, but it won't; it will probably only increase it by 20-50%, because not only is every write only as fast as the slowest vdev involved in the write (and now you've got a 7200 RPM vdev in there), but it's also going to write the larger share of new data onto that slower vdev for a while.

      Even if you rewrite data often enough that you eventually 'normalize', it will still end up only improving your pool's write IOPS by less than double the original speed, as the new vdev isn't as fast as the old one.

      I feel compelled to point out, though, that the part about normalizing and preferring the new vdev is going to happen regardless of similarity in the vdevs - that's one of the reasons I like to explain this early if I get the chance, so people know what to expect when it comes to 'expanding' a pool (it expands the space, but you can't expect it to expand the performance nearly as linearly, especially if you don't rewrite existing data that often).

      If all you're concerned about is more space, and you have no performance problems, you might be OK; but if you presently have a system that is nearing its maximum performance, adding this vdev is likely to end up tanking you in the end if adding capacity means you also add client demand at the same rate. The new space (and the old space) won't respond as quickly on a 'speed per GB' basis as it did pre-addition, so if you had 20 clients before and you add 30 more (as you're adding more than double the original space) for a total of 50 clients, there's every expectation the pool will fall over, performance-wise. Hopefully that makes sense.
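      The free-space 'preference' described above can be approximated with a simple proportion. This is a deliberate simplification of the real metaslab allocator, with assumed capacities for the two example vdevs:

```python
# Bias of new writes toward the emptier vdev, modeled (crudely) as
# free-space proportion. Free-space figures are assumptions.
vdevs = {
    "old 10x600G raidz2 (part full)": 1500,   # GB free, assumed
    "new 11x1T raidz3 (empty)":       7000,   # GB free, assumed
}

total_free = sum(vdevs.values())
for name, free_gb in vdevs.items():
    print(f"{name}: ~{free_gb / total_free:.0%} of incoming writes")
```

      With the slower vdev taking the lion's share of new writes, it's easy to see why the pool doesn't come close to doubling its write performance.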

  19. Can you tell me more about this please?

    15. ARC and L2ARC
    (9/12/2013) There are presently issues related to memory handling and the ARC that have me strongly suggesting you physically limit RAM in any ZFS-based SAN to 128 GB. Go to > 128 GB at your own peril (it might work fine for you, or might cause you some serious headaches). Once resolved, I will remove this note.

    We have 384GB of RAM, and on one system I notice that the disk pool goes to 100% over time (3 days), but then if I export and re-import it we are good for another couple of days. We are running Solaris x86 11.1, specifically SRU 0.5.11- Later SRUs exhibit the same problem.

    Any ideas much appreciated!


    1. In the production ZFS appliances from Oracle, you can purchase up to 1TB of RAM per controller, so they must not agree with not going over 128GB. The 7420s we use have 512GB per controller.

    2. The real issue with more RAM is ARC evictions when you delete large things. For the most part this can be alleviated by adjusting zfs:zfs_arc_shrink_shift.

      I have several ZFS systems with 256GB ram and have this setting in /etc/system:

      set zfs:zfs_arc_shrink_shift=12

      It does not fix the case of deleting a single large file. In my case that means any file over a few TB needs a bit of timing consideration and temporarily disabling RSF-1. In my environment that has happened only twice in the past 3 years; it has not been an issue otherwise.

      I will not hesitate to build with 1TB or more RAM if I find it necessary. The arc shrink shift will have to be adjusted accordingly.

      Oracle's latest offering has 1.5TB of RAM. They have latched on to how ZFS keeps getting faster with more RAM.

      For the most part I would say the rule about limiting to 128GB of RAM has been blown away. However, ZFS needs some enhancement to take better advantage of modern SSDs for L2ARC. The default tuning no longer makes any sense and writing needs to be made more parallel to take better advantage of multiple SSDs.
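      For anyone wondering what the tunable actually changes, here's a rough model: each ARC shrink pass releases about (ARC size >> shift), so a larger shift means smaller, gentler evictions. The default shift of 7 is an assumption for contrast; check your platform's actual value:

```python
# How much ARC one shrink pass releases, as a function of the shift.
arc_bytes = 256 * 1024**3  # a 256 GB ARC, matching the comment above

for shift in (7, 12):
    evict_mb = (arc_bytes >> shift) / 1024**2
    print(f"shift={shift:>2}: ~{evict_mb:g} MB released per shrink pass")
```

      Evicting ~2 GB in one pass can stall a busy box; ~64 MB passes are far less disruptive, which is the motivation for raising the shift on large-RAM systems.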



  20. Hello,

    I was wondering you could clarify this point for me:
    For raidz2, do not use less than 6 disks, nor more than 10 disks in each vdev (8 is a typical average).

    I am doing a home NAS for my media and was considering doing a raidz2 pool with 4x2.0TB WD Reds. Why would I want to use a minimum of six as opposed to four? It is a mini-ATX case so space is tight and I really wanted to add a second RAID0 group for a hot backup location. I would have to sacrifice this to get six drives in my raidz2 pool. Can you elaborate? Thank you for this guide as well.

    1. For home use, 4 disks is fine. For enterprise use, follow the recommendations in this guide.
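For anyone weighing vdev width, the parity-overhead arithmetic behind the 6-10 disk guidance can be sketched like this (the 2TB disk size is just the example from the question):

```python
# Sketch: usable capacity and parity overhead for a raidz2 vdev of N disks.
# raidz2 spends 2 disks' worth of space on parity, so wider vdevs waste
# proportionally less space -- one reason 6-10 disks is the usual guidance.

def raidz2_usable_tb(n_disks, disk_tb):
    """Raw usable capacity of a raidz2 vdev, before ZFS metadata overhead."""
    return (n_disks - 2) * disk_tb

for n in (4, 6, 8, 10):
    usable = raidz2_usable_tb(n, 2.0)
    overhead = 2 / n * 100
    print(f"{n} disks: {usable:.0f} TB usable, {overhead:.0f}% parity overhead")
```

At 4 disks, half the raw space goes to parity; at 8, only a quarter does, which is why wider vdevs are preferred when bays allow.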

  21. I'm considering building a 24-disk storage box for home use (insane hobby). Although not recommended for production use, would there be a significant downside to just going with 2x12-disk RAIDZ2 vdevs in one pool?


  22. If you go with 3x8, at the cost of two disks for parity, you'll increase your write IOPS by 50%. The recommendations Andrew gives are a balance between speed, space, and redundancy.
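The 50% figure follows from vdev counting, since a raidz vdev delivers roughly one disk's worth of random-write IOPS. A quick sketch (the per-disk IOPS number is an assumed 7200rpm-class ballpark, for illustration only):

```python
# Sketch: why 3x8 raidz2 writes ~50% faster than 2x12 in IOPS terms.
# A raidz vdev delivers roughly one disk's worth of random-write IOPS,
# so pool IOPS scale with the number of vdevs, not the number of disks.

DISK_IOPS = 150  # assumed random IOPS for a 7200rpm-class disk

def pool_write_iops(n_vdevs, per_disk_iops=DISK_IOPS):
    """Approximate random-write IOPS of a pool of raidz vdevs."""
    return n_vdevs * per_disk_iops

two_by_twelve = pool_write_iops(2)
three_by_eight = pool_write_iops(3)
print(three_by_eight / two_by_twelve)  # 3 vdevs vs 2 -> a 50% increase
```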

  23. I'm building out a 12-bay solution that will mostly be used to house VMware virtual machines. My plan right now is 64GB of ECC RAM, a 128GB SSD for the ZIL, 2x240GB SSDs in a mirrored pool for applications that need extra performance, and 8x4TB WD Reds set up as a mirrored pool for the main storage. Anything in particular I need to watch out for? The majority of the servers will connect to the storage via a 4Gb Fibre Channel switch, but there will also be connections via regular 1Gb Ethernet. If I understand the math correctly, my theoretical max throughput for the main storage would be 4x the throughput of a single WD Red disk, so appx. 600MB/s, right?
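Checking that estimate: 8 disks in 2-way mirrors gives 4 vdevs, and sequential writes stripe across vdevs. A sketch, assuming ~150MB/s per WD Red (a ballpark assumption, not a measured figure):

```python
# Sketch: theoretical sequential write throughput of a mirrored pool.
# Each 2-way mirror vdev writes at roughly one disk's speed (both sides
# must receive the data), and writes stripe across vdevs.
# 150 MB/s per disk is an assumed ballpark for a WD Red.

def mirror_pool_write_mbps(n_disks, mirror_width=2, per_disk_mbps=150):
    n_vdevs = n_disks // mirror_width
    return n_vdevs * per_disk_mbps

print(mirror_pool_write_mbps(8))  # 4 vdevs x 150 MB/s, matching the ~600MB/s estimate
```

Reads can do better than this, since ZFS can service reads from both sides of each mirror.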

    Hi,

    The article and the following comments have given me more insight into ZFS.

    I am trying to build storage of about 100TB usable with the following configuration. Please let me know if any precautions need to be taken in terms of performance. The requirement is NFS storage for a mailstore for around 100,000 mail users. The mailing solution will be based on Postfix + Dovecot.

    55 x 3TB NL-SAS or enterprise SATA HDDs, configured as raidz3. I will be using server-class hardware (SuperMicro or Intel Server System) with dual 6-core Xeon CPUs and around 128GB of RAM. Do you recommend more RAM, or do I need to invest in SSDs for ZIL or L2ARC?

    Kindly advise on any precautions I may need to take before procuring this infrastructure.

    Thanks in advance.
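One way to sanity-check a build like this is to run the capacity numbers for a concrete layout. Here is a sketch for a hypothetical arrangement of 5 raidz3 vdevs of 11 disks each (one option among several; the numbers ignore ZFS metadata overhead):

```python
# Sketch: raw usable capacity for a hypothetical 55-disk raidz3 layout
# (5 vdevs of 11 disks). raidz3 spends 3 disks per vdev on parity.
# Numbers ignore ZFS metadata overhead and assume the usual advice
# of keeping pools below ~80% full.

def raidz3_pool_usable_tb(n_vdevs, disks_per_vdev, disk_tb):
    data_disks = disks_per_vdev - 3  # raidz3: 3 parity disks per vdev
    return n_vdevs * data_disks * disk_tb

raw = raidz3_pool_usable_tb(5, 11, 3.0)
print(raw)        # raw-usable TB for this layout
print(raw * 0.8)  # effective TB at the 80% fill guideline
```

The 80% margin matters here: a layout that only barely clears 100TB raw will not deliver 100TB in practice.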

  25. I have 2 Linux ROCKS clusters, currently with hardware-RAID 24-bay SATA drive systems. I intend to replace the 3TB SATA drives with 4TB SAS and install JBOD HBAs to move from XFS to ZFS. I suspect that storage capacity and redundancy will be prized over outright performance. Can anyone suggest a starting point for the ZFS setup? I will likely start with 128GB RAM, and I need to investigate the particulars of the backplane in these Supermicro boxes. Will we likely take a big performance hit using what I presume is a 1x3 expander? Should I be looking at replacing the backplane and using three 8-channel HBAs?

  26. Hi,

    first of all, thanks to Andrew for this great article and to the people who commented on it. We have used Nexenta for a year now and this has given a great deal of valuable information.

    I've got one question about recordsize. For our first pools we used 128k recordsize because we were told it was the default value and most suitable for most cases.

    We trusted the techies from Nexenta who gave that advice, until we started to experience some bumps in the road with our production pools.

    One of the things I tested was different recordsizes.

    Our use case is Xen Community edition accessing Nexenta 3.x through NFS. In ZFS we store .img files, one for each vm.

    So, I did a lot of testing with iozone and my conclusion was that if you align NFS mountpoint and ZFS recordsize to either 4k or 8k, you get the best possible performance of all the possible combinations which go from 4k to 128k on both sides.

    I also used DTrace to get as much information as possible from the pool, and I saw that more than 90% of the requests to the ARC are either 4k or 8k blocks; no matter what blocksize on Linux or recordsize on ZFS you use, you always get the same kind of requests from Xen.

    I'm telling you this because I've seen many articles and forum posts about this which say the contrary: that you should use recordsizes of 32k or bigger, or even stick to the default 128k.

    I would like to know if anyone has ever done this kind of testing and what they got. Why are my results so different from the recommended values?

    I have no graphs or anything "nice-looking" to show you, just a text file with all the results, but if anyone is interested in my findings I am more than willing to publish it somewhere in a human-readable way.


    1. Jordi:
      I recently tested a number of block sizes using the free NexentaStor 4.0 release. The winner was 32k, but this was with a Linux box. Windows still uses a native 4k buffer, so depending on the mix of what you run (Linux and Windows) it may take a compromise (8k?). The buffer flushing mechanism used by Nexenta is set to 32k, so an engineer I spoke with there recommended 32k for everything.
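The read-amplification arithmetic behind these recordsize findings can be sketched as follows; it shows why 4k guest I/O against 128k records is so expensive:

```python
# Sketch: read amplification when guest I/O is smaller than recordsize.
# ZFS checksums whole records, so to return a 4k read from a 128k record
# it must read (and verify) the entire 128k record.

def read_amplification(recordsize_kb, io_kb):
    """Factor by which ZFS over-reads relative to the requested I/O size."""
    return max(recordsize_kb / io_kb, 1.0)

print(read_amplification(128, 4))  # 128k records, 4k guest reads: 32x
print(read_amplification(8, 4))    # 8k records: 2x
print(read_amplification(4, 4))    # aligned: no amplification
```

This matches the DTrace observation above: if Xen issues 4k/8k requests regardless of configuration, aligning recordsize to that I/O size minimizes wasted reads.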

  27. minor correction: "without further adieu" should be "without further ado"

    1. Thanks for pointing out that very important detail.


  29. Greetings! I enjoyed your article. Thank you.

    My FreeNAS servers are currently pretty anemic Dell R410s (only 8GB RAM, 2x E5506) with LSI SAS2308-based HBAs. I have 4 Norco ds24E SAS expander JBODs, 2 on each R410. One ds24E set is configured with 24 Hitachi Deskstar 7K1000 (HDS721075KLA330) drives and the other with 24 ST4000VN000s. Each box is a pool of 4 6-disk raidz2 vdevs. Because of the volume of data, the backup solution is periodic rsync across the R410s, where "periodic" ends up being whenever I add a movie/show to my library.

    In any case, the smaller ds24E houses the plexmediaserver jail and transcoded full-quality .mkvs, and the larger one has all of the 1:1 .isos. It's been a long, _LONG_, process to rip my entire library.

    Some questions:
    What is the authoritative way to tell what block size is, or _should_ be, being reported to the OS? Using zdb I can see ashift is set to 12 (4k) for _both_ of my pools. Using diskinfo, I see 512B sectors for all 48 disks. Clearly that's not right. However, when using smartctl, I find 'Sector Sizes: 512 bytes logical/physical' for the Hitachi drives and 'Sector Sizes: 512 bytes logical, 4096 bytes physical' for the Seagate drives. Is smartctl an/the authoritative source for this information?

    Are there issues with using a 4k ashift value on true 512B hard drives? I'm also wondering if FreeNAS is just pinning the setting to 12?
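For reference, ashift is simply the log2 of the sector size ZFS aligns I/O to, so the mapping can be sketched trivially:

```python
# Sketch: ashift is the log2 of the sector size a vdev aligns writes to.
# ashift=12 on true 512B drives wastes a little space on small blocks but
# is safe; the reverse (ashift=9 on 4k-physical drives) hurts performance,
# which is why pinning to 12 is the conservative choice.

def sector_size(ashift):
    return 2 ** ashift

print(sector_size(9))   # 512-byte sectors
print(sector_size(12))  # 4096-byte (4k) sectors
```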

    Because the Dells are too anemic with multi-stream transcodes occurring, and I just upgraded to a Haswell-E system, I had designs on making my Sandybridge-E system a replacement for one of the R410s. It is not anemic and has a decent amount of RAM (64GB), although it is not ECC. Other benefits include multiple PCI-E slots; the R410s only have the one slot.

    50GB .isos are slow going over GbE, so I picked up an XS712T and two Intel X540-T2 NICs. Right there I can't use the R410s outside of a 2GbE lagg. So it would seem that moving to the Sandybridge-E box has a lot going for it. I don't/wouldn't plan on duplicating it, so it would be a SPOF, but I would be fine with using an R410 in a pinch. Is this a reasonable plan?

    I also have just acquired an Areca ARC1883ix-16 and an 8028-24 to convert an additional ds24E to 12Gb. Yeah yeah, 6Gb SATA drives != 12Gb SAS; there should still be some perf benefit to be had. I was thinking I would try the hw RAID solution in a FreeNAS box, and then I was wondering how that was going to really work with ZFS. And _that's_ how I came across your article.

    Given that, what is the downfall if the hw RAID card were exporting 4 (RAID 6) devices and those were put into a raidz1 pool, beyond 6 extra disks' worth of unusable space? I understand you mentioned raidz1 <= 1TB and preferably 750GB. It may well be asinine, but I'm curious whether that was just guarding against additional, potentially imminent, failures occurring while attempting to resilver a spare?

    Thanks in advance.

  30. Just a quick note on the content here. It's a real breath of fresh air after digging through the - well, frankly - snotty tone at the site.

    The biggest issue for neophytes with ZFS systems is the number of ways you can lose all your data, not some files or a disk. That gets stated a lot in the online content about ZFS, but it's only mentioned as an explanation of why to do or not do something. I think that's inverted: ZFS systems need a big, red, blinking warning that there are subtle ways to lose your entire pool of data if you do it wrong, and then the list of what to do or not do needs to follow.

    As so well stated here, ZFS is fundamentally different from other file systems and has different pitfalls.

  31. I have two identical systems with 12 drives each. Dual L5640, 32GB, 10gb ethernet.

    Initially they were set up as an HA pair on FreeBSD 9.2, using HAST & CARP. They performed the initial synchronization in 8 hours at over 1000MB/s, which led me to believe that HAST works very well.

    Then a ZFS pool was created, using 2 vdevs of 6 drives in raidz2.
    Two ZFS volumes were created and used for iSCSI.

    The best write speed initially was less than 150MB/s, and it diminished substantially within a month.

    The HA pair was taken apart and each node was reconfigured as a standalone NAS, with memory increased to 64GB. They each write ~170MB/s and have been consistent for over a month.

    I decided to experiment with NAS-1 and set up ZFS using several layouts, none of which made much difference, until I changed the iSCSI backing from a ZFS volume to a raw file on ZFS. Doing so increased the write speed to ~275MB/s (+100MB/s).

    Do you know why using a raw file vs a ZFS volume would make such a difference?

    Is there a better way to deliver the disk space to a XenCenter group?

    (Footnote - XenCenter 6.2 apparently does not recommend NFS v4, and our distribution had bugs with NFS v3, so we opted for iSCSI.)

    Thank you,

  32. What, in zfs terms, is a "warm spare"?

  33. It means the drive is installed in the server, but not assigned to any of the disk pools.

  34. I'm building two NAS4FREE boxes, one with two new 2TB drives and the other just to play with whatever I've got lying around. I'm building the fun one first, for experience. What I have lying around are two 500GB drives, one 300GB drive, and one 200GB drive. Is it possible to first combine the 300 and the 200 into a 500GB vdev and then combine that and the other two 500GB drives into a RAID? Keep in mind this isn't for anything serious or critical; I'm just messing around with leftover parts. NAS4FREE itself will reside on yet another disk; these are just for data.

    1. Rick: Yes and no. Without getting fancy, you can't add the 200 and 300GB drives to create a single vdev of 500GB effective size. However, you can simply add each drive by itself as a basic vdev, and you do get all 500GB of combined space. Doing this however is effectively raid0 and if any of those drives fail, you lose all the data on all the drives.

      A fancy way to do what you have in mind is to create one pool with the 200 and 300 added as basic vdevs, then create a zvol from that, and then create a 2nd pool adding each 500GB drive and the zvol from the first pool. This would end up with a pool of 3 vdevs, each 500GB in size, but it is in no way functionally different from the 1st example of 4 basic vdevs in a single pool.
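The capacity arithmetic for those basic-vdev layouts is just addition, which also makes the failure-domain trade-off obvious (all the space, none of the redundancy):

```python
# Sketch: basic (non-redundant) vdevs simply add their sizes together,
# but the pool behaves like raid0 -- lose any one drive, lose the pool.

def striped_pool_gb(*drive_gb):
    """Combined capacity of a pool of basic vdevs."""
    return sum(drive_gb)

print(striped_pool_gb(200, 300))            # the two small drives together
print(striped_pool_gb(200, 300, 500, 500))  # all four drives, zero redundancy
```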

    2. As always I'm late to the party, but I too am setting up a NAS4Free box and have 4x 3TB drives (and a 320GB drive for boot & log data). It was recommended that I rebuild what I have using ZFS rather than GEOM RAID as I did initially. I'd like to have some measure of data security in the event of a failed drive and wondered if that is possible given my hardware constraints. Do I need more drives to get what I want?


  37. Nothing useful here, just wanted to say thanks for the article!


  39. Thanks for this article. It's very helpful. I do have a question, or, more precisely, I am looking for opinions. What effect does setting userquotas have on performance? I've done this on some of my ZFS pools and it doesn't seem to be a huge issue, but where I work we are getting ready to migrate our home directories to a ZFS server and we are thinking there has to be some sort of performance hit, even if it is very, very low. Thanks!

At the time of the creation of the World Health Organization (WHO), in 1948, Health was defined as being "a state of complete physical, mental, and social well-being and not merely the absence of disease or infirmity".But a 2015 thesis from Liverpool John Moores University tested this recommendation in healthy men, with the results indicating no adverse effects from eating prawns. The current research examined the impacts that dietary BCAAs and other essential amino acids such as tryptophan had on the health and body composition of mice. Researchers analyzed health care claims for more than 4 million people, with information coming from self-insured employers, two state all payer claims databases and records from health insurance plans that chose to participate. 1. Promoting a holistic view of health that includes both clinical and social determinants of health, well-being, disease and disability and the multidisciplinary and cross-sector interventions and policies required to address them, such as early childhood education, economic development and environmental protection. According to a new study published in the journal Hepatology, researchers focused on the effects coffee, alcohol, black tea, green tea and soft drinks have on mortality risks from cirrhosis.

  91. Thanks for a nice share you have given to us with such an large collection of information.
    Great work you have done by sharing them to all.
    Digital marketing service in sehore
    website designer in sehore

  92. Keto Ignite It is wise to buy health insurance by HDFC ERGO, as it promises to cover medical expenses without restrictions. While talking about medical issues online or with friends can provide emotional support, it is worth remembering when they are not qualified healthcare professionals and cannot always give reliable medical advice.|Samsung Health provides core features to keep up your body fit and healthy. We go beyond health insurance, caring for your body as well as mind. No i ludzi w 100% kierujacych sie zasadami religii jest tez moze promil - zawsze ludzie znajda jakies "ale nie to".|I needed health insurance policy and i got it from them swiftly and easily. Dobrze czuje sie tylko wiosna i latem chociaz nie w kazdy dzien, bo jak sa nagle zmiany pogodowe to powraca ten syf. The underlying point of MyHealthEData is to encourage healthcare organizations to pursue interoperability of health data as a way of allowing patients more access to their records.|Besides the serious mental illnesses like dementia, Alzheimer's disease, schizophrenia, hydrocephalus, and brain cancer, mental health can also be affected by anxiety , depression , and stress. We address the topics that matter most in the health care policy debate.|Oracle provides healthcare researchers the tools for both grants management and a comprehensive platform for translational research. I hope this short documentary can add to the realisation that a lot has to be done to improve the healthcare in developing countries and to stimulate aid aimed on doing exactly that.

  93. Garcinia Market Her mission is to help people live healthier lives by making smarter food choices and staying active. Because taking care of myself means eating every meal, and making sure I get enough sleep, and getting some exercise, as well as working through the mental side of things.Against this backdrop, the randomized, prospective phase 2 SUNSHINE trial recruited patients at 11 academic and community centers across the United States to test whether vitamin D supplementation can improve outcomes in patients with metastatic colorectal cancer.

  94. Agar aap apne husband se pareshan hai or uss se door rhana cahati hai toh aap Talaq lene ki dua ko kijiye aap pati aapko khud ba khud talaq de dega

  95. Thanks for sharing this blog. this top is very importent for us.We provide uber clone taxi app development service.

  96. Keto Diet Offers: The focus of public health interventions is to prevent and manage diseases, injuries and other health conditions through surveillance of cases and the promotion of healthy behavior , communities , and (in aspects relevant to human health) environments Its aim is to prevent health problems from happening or re-occurring by implementing educational programs , developing policies , administering services and conducting research 53 In many cases, treating a disease or controlling a pathogen can be vital to preventing it in others, such as during an outbreak Vaccination programs and distribution of condoms to prevent the spread of communicable diseases are examples of common preventive public health measures, as are educational campaigns to promote vaccination and the use of condoms (including overcoming resistance to such).

  97. Amaze your professors this term with an A+ grade by availing our accounting homework help online. We have top experts who can draft your homework prudently.

  98. If you can relate to any of these problems, write do my homework on our live chat portal. Let our brilliant team of writers handle the pressure for you.


  99. Thanks for sharing excellent information.If you Are looking Best smart autocad classes in india,
    provide best service for us.
    autocad in bhopal
    3ds max classes in bhopal
    CPCT Coaching in Bhopal
    java coaching in bhopal
    Autocad classes in bhopal
    Catia coaching in bhopal

    Thanks for this post, I really appriciate. I have read posts,
    all are in working condition. and I really like your writing style.
    autocad in bhopal
    3ds max classes in bhopal
    CPCT Coaching in Bhopal
    java coaching in bhopal
    Autocad classes in bhopal
    Catia coaching in bhopal

  100. Check Packers and Movers company customers Reviews, Feedback, and Complaints in Gile Shikwe and Phark Padta Hai websites. Choose the best packers and movers services for your relocation.

    read more

  101. This is the most supportive blog which I have ever observed. I might want to state, this post will help me a ton to support my positioning on the SERP. Much appreciated for sharing.

  102. GOod Evening & Very GOod Blog We are Regular Reader of Technology Topics & YOur Blog IS Very Meaning Full Thanks FOr SHaring Feel Free Then CHeck Out Our Services Also : Website Designing Company In South Delhi

  103. nice article thanks for sharing the post..!

  104. nice article thanks for sharing the post..!

  105. nice article thanks for sharing the post..!

  106. nice article thanks for sharing the post..!

  107. One of the world's premier academic and research institutions, the UV Gullas College of Medicine has driven new ways of thinking since our 1919 founding. We offer students high quality teaching and research in a safe and friendly setting for their studies, the perfect place to learn and grow."

  108. me project centers in chennai Real time projects centers provide for bulk best final year Cse, ece based IEEE me, mtech, be, BTech, MSC, mca, ms, MBA, BSC, BCA, mini, Ph.D., PHP, diploma project in Chennai for Engineering students in java, dot net, android, VLSI, Matlab, robotics, raspberry pi, python, embedded system, Iot, and Arduino . We are one of the leading IEEE project Center in Chennai.

  109. Hi,

    Your blog was so interesting and useful.Keep posting!

    Digital Marketing Training in Chennai

  110. Extraordinary Article! I truly acknowledge this.You are so wonderful! This issue has and still is so significant and you have tended to it so Informative.
    Contact us :-

  111. this post are enlightening in Classified Submission Site List India . An obligation of thankfulness is all together for sharing this outline, Actually I found on various domains and after that continued with this site so I discovered this is hugely improved and related.


  113. Nice blog, Visit for the best Truck Painting & Branding, Floor Marking Paint and School Bus Painting.
    School Bus Painting

  114. Do you Want to do MBBS in Philippines? Then make your decision with us.! Here no need any entrance examination.UV Gullas College Of Medicine is the World's priemier research institution since 1919 in philippines.
    visit :

  115. phd projects in chennai Real time projects centers provide for bulk best final year Cse, ece based IEEE me, mtech, be, BTech, MSC, mca, ms, MBA, BSC, BCA, mini, Ph.D., PHP, diploma project in Chennai for Engineering students in java, dot net, android, VLSI, Matlab, robotics, raspberry pi, python, embedded system, Iot, and Arduino . We are one of the leading IEEE project Center in Chennai.

  116. The best Indian Ethnic Wear Online shopping supplier in surat.we have a large collection of traditional and western textiles,kurti, lehenga choli.

  117. The best Indian Ethnic Wear Online shopping supplier in surat.we have a large collection of traditional and western textiles,kurti, lehenga choli.


  118. Appericated the efforts you put in the content of Artificial intelligence.The Content provided by you for Artificial intelligence is up to date and its explained in very detailed for Artificial intelligence like even beginers can able to catch.Requesting you to please keep updating the content on regular basis so the peoples who follwing this content for Artificial intelligencecan easily gets the updated data.
    Thanks and regards,
    Artificial intelligence training in chennai.
    Artificial intelligence course in chennai with placement.
    Artificial intelligence certification in Chennai.
    Artificial intelligence course in OMR.
    Top Artificial intelligence institute in Chennai.
    Best Artificial intelligence in Chennai.

  119. Very gossipy post! I'm learning a lot from your articles. Keep us updated by sharing more such posts.
    dot net training in chennai

  120. ipt training in chennai for your final year. DLK Career Development Center conduct national level inplant training programs.

  121. final year project centers provides best projects for BE,ME,BCA,MCA and all other streams.we are best in doing all kinds ofmca projects in chennai.we refer project centers students the great topic to select with and implement their ideas with new technologies.

  122. An ISO 9001:2008 recognized university with a rich legacy of 50 years in creating leaders & scholars.
    lyceum northwestern university

  123. I really like what you write in this blog, I also have some relevant Information about if you want more information. Thanks for sharing a piece of useful information.. we have learned so much information from your blog mtech project centers in chennai..... keep sharing

  124. Many websites have differenet information but in your blog you shared unique and useful information. Thanks
    for putting in much effort for this information

  125. Very interesting blog Awesome post. your article is really informative and helpful for me and other bloggers too

    Workday Online Training

  126. Many websites have differenet information but in your blog you shared unique and useful information. Thanks
    for putting in much effort for this information

  127. Just seen your Article, it amazed me and surpised me with god thoughts that eveyone will benefit from it. It is really a very informative post for all those budding entreprenuers planning to take advantage of post for business expansions. You always share such a wonderful articlewhich helps us to gain knowledge .Thanks for sharing such a wonderful article, It will be deinitely helpful and fruitful article.

  128. It is one of the best packers and movers in Delhi NCR. it provides packers and movers services in all over India. it provides packers and movers services very easily. if you want more information click here-
    packers services in delhi ncr
    packers services in noida

  129. This comment has been removed by the author.

  130. This is a nice article with some useful tips.This is good site and nice point of view.I learn lots of useful information. Thanks for this helpful information.
    anyone want to learn advance develops training Projects visit: Ece project centers in chennai

  131. If you're searching for distributors of wholesale clothing in India, madhusudan is a famous textile wholesale and online shopping on the surat wholesale market.Buy Wholesale Kurti Suit Saree in surat

  132. This comment has been removed by the author.