The FTT Setting And How It Affects VMware Cloud On AWS
Now that the floodgates have been opened and VMware Cloud is available on AWS, there are a few things you should know about the storage included with the service.
If you have dealt with VMware’s vSAN previously then you’re already ahead of the curve, as vSAN underpins VMware Cloud on AWS. vSAN uses policies you define to manage how your Virtual Machines are stored on the aggregated storage of all participating nodes. Policies can be replica based, with either 2 or 3 copies, or EC (erasure coding) based, with single or double parity (think RAID 5 or RAID 6). EC is only available on all-flash configurations. At the moment EC policies are not available on the VMware Cloud on AWS platform, but expect them to be available in the future.
FTT (Failures to Tolerate) & Storage Protection
FTT (Failures to Tolerate) plays a role in both styles of policies and is the determining factor in how much protection you actually have in your storage. An FTT of 1 gives you either a 2 copy replica policy or a RAID 5 style parity configuration. This allows a single host to be removed from the vSAN cluster, usually for maintenance purposes, without impacting storage integrity. An FTT of 2, on the other hand, gives you a 3 copy replica policy or a RAID 6 style parity configuration. This is a much safer config, as it allows up to 2 hosts to drop from the vSAN cluster. There is a configuration for FTT=3 as well, which gives you a 4 copy replica policy; vSAN has no parity (EC) equivalent at FTT=3. Considering that an FTT=3 replica policy consumes 4X the original size of the Virtual Machine, I don’t see much use for this configuration on VMC today.
The number one reason a host would need to be pulled from the vSAN cluster is maintenance. Having FTT=2 policies for your Virtual Machines allows for safe, continuous maintenance and development of your environment. Nobody wants to see an FTT=1 policy on VMs running in a cluster with a host down for maintenance: an actual failure of another host, or a slight misstep, could bring down the entire ball of wax.
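The maintenance argument comes down to simple subtraction. The sketch below is only an illustration: the function name `remaining_tolerance` is hypothetical, and it assumes the worst case in which every VM has a component on each unavailable host.

```python
def remaining_tolerance(ftt: int, hosts_unavailable: int) -> int:
    """Further host failures a VM can absorb, worst case.

    Assumes each unavailable host held one component of the VM.
    A negative result means the VM's data is no longer accessible.
    """
    return ftt - hosts_unavailable


# One host is in maintenance mode; how much protection is left?
for ftt in (1, 2):
    left = remaining_tolerance(ftt, hosts_unavailable=1)
    print(f"FTT={ftt}, 1 host in maintenance: {left} further failure(s) tolerated")
```

With FTT=1 the margin during maintenance is zero, which is exactly the situation described above.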
The Downside to FTT=2 on VMware Cloud on AWS
The downside is capacity overhead and compute overhead. Because EC policies are not yet supported on VMC, changing from the default FTT=1 to FTT=2 can have a drastic effect on the net capacity of the environment. For instance, if you have a VM on traditional local or shared storage that has a 25GB vmdk, on an FTT=1 policy on vSAN that same VM would consume 50GB (the original 25GB VM + an additional 25GB replica). Effectively, a default configuration doubles the size of each of your VMs or halves the overall storage capacity of your vSAN datastore. Either way of thinking can be correct; it just depends on whether you are a glass half full or a glass half empty kind of person. I would recommend, though, thinking of an FTT=1 replica policy as a double capacity VM, because that’s how it will be reported. You will see the gross capacity of the vSAN datastore on VMC and you will see double the used capacity of the VM. Extend this line of thinking to FTT=2 and that 25GB VM now consumes 75GB (three full copies), cutting your overall capacity by 2/3rds.
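The replica arithmetic can be written down in a few lines. This is a rough sketch, not a sizing tool: the function names are hypothetical, and it assumes pure replica (mirroring) policies where a VM is stored as FTT + 1 full copies, ignoring witness components, swap files, and any other overhead.

```python
def consumed_gb(vmdk_gb: float, ftt: int) -> float:
    """Raw vSAN capacity consumed by a VM under a replica policy.

    Assumption: FTT + 1 full copies, nothing else counted.
    """
    if ftt < 0:
        raise ValueError("ftt must be non-negative")
    return vmdk_gb * (ftt + 1)


def usable_fraction(ftt: int) -> float:
    """Share of gross datastore capacity left for unique data."""
    if ftt < 0:
        raise ValueError("ftt must be non-negative")
    return 1 / (ftt + 1)


# The 25GB vmdk from the example, at each replica-based FTT level.
for ftt in (1, 2, 3):
    print(
        f"FTT={ftt}: 25GB vmdk consumes {consumed_gb(25, ftt):.0f}GB, "
        f"usable share of gross capacity = {usable_fraction(ftt):.0%}"
    )
```

For the 25GB example this works out to 50GB at FTT=1, 75GB at FTT=2, and 100GB at FTT=3.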
Compute overhead comes into play when you need more storage capacity or want a higher FTT but already have enough compute. With VMC and/or vSAN, storage only comes with additional compute and there is a minimum buy-in in order to hit the FTT targets.
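The minimum buy-in follows from the vSAN rule that a replica policy needs at least 2 × FTT + 1 hosts contributing storage (the copies plus witnesses). A small sketch of that rule, with a hypothetical function name; it covers replica policies only, and any minimum cluster size the VMC service itself imposes is not modeled here.

```python
def min_hosts_for_mirroring(ftt: int) -> int:
    """Minimum vSAN hosts for a replica (mirroring) policy: 2 * FTT + 1."""
    if ftt < 0:
        raise ValueError("ftt must be non-negative")
    return 2 * ftt + 1


for ftt in (1, 2, 3):
    print(f"FTT={ftt}: at least {min_hosts_for_mirroring(ftt)} hosts")
```

Going from FTT=1 to FTT=2 therefore raises the floor from 3 hosts to 5, whether or not you need the extra compute.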
Buying more compute just to get the storage that comes with it is not always a convincing argument when trying to stay within budget constraints. In this storage administrator’s opinion though, the ability to concurrently run maintenance operations in your environment without fully losing redundancy is a must.
We will be watching intently as VMware Cloud on AWS rolls out and adds features (like EC policies, sparse swap files, and efficiencies), but in the meantime (and well into the future) Faction Cloud Control Volumes are a compelling alternative when higher capacity or greater protection policies are required.