Storage overload during backups can cause many unpleasant errors. One of them is the SCSI command abort, which occurs when the HBA is overloaded or when the command queue (queue depth, q-depth) is full. In both cases the effect is the same: the SCSI host subsystem aborts commands that are not completed within the allotted time. The result is a dramatic drop in performance and a sharp rise in latency, measured in milliseconds. The backup method used in virtual environments, where we read a VM from one place and write its data somewhere else, means that large amounts of data must be processed during the backup. An incorrect or careless configuration (e.g. of the backup schedule) can overload the entire SAN infrastructure in our environment.
Any monitoring system will show us datastore latency in milliseconds, both during normal operation and while backups are running. However, not every one will tell us whether our system is experiencing “Datastore Command aborts” (does anyone know how to get this information in vRealize Operations?). Veeam ONE is very helpful here: it sends us e-mail warnings when this happens.
During the backup window we can also watch in real time what is happening on each ESXi host. Run esxtop and, once it starts, press “u” to switch to the disk device view and then “f” to choose the displayed fields. Select “L” (error statistics), “G” (I/O command statistics) and “A” (device name); the rest can be deselected. Normal operation looks something like this.
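For a long backup window it may be easier to record esxtop in batch mode (for example “esxtop -b -d 5 -n 720 > esxtop.csv”, which takes 720 samples five seconds apart, i.e. one hour) and scan the CSV afterwards. The sketch below is a minimal example under two assumptions that should be checked against the real headers on your host: the first column holds the timestamp, and abort counters can be recognised by the substring “Abort” in their column names. The function name and the sample data are hypothetical.

```python
import csv
import io


def nonzero_aborts(csv_text, marker="Abort"):
    """Return (timestamp, column, value) for every sample where an abort counter is > 0.

    Assumptions (not verified against every esxtop version): the first column
    is the timestamp, and abort counters have `marker` in their column name.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return []
    header, samples = rows[0], rows[1:]
    # Pick out the columns that look like abort counters.
    abort_cols = [i for i, name in enumerate(header) if marker in name]
    hits = []
    for row in samples:
        for i in abort_cols:
            try:
                value = float(row[i])
            except (ValueError, IndexError):
                continue  # skip blank or malformed cells
            if value > 0:
                hits.append((row[0], header[i], value))
    return hits


# Hypothetical, simplified sample; real esxtop batch headers are longer.
SAMPLE = (
    '"Time","Disk naa.1 Aborts/sec","Disk naa.1 Reads/sec"\n'
    '"01:00:00","0.00","120.5"\n'
    '"01:00:05","1.00","98.0"\n'
)

if __name__ == "__main__":
    for ts, column, value in nonzero_aborts(SAMPLE):
        print(ts, column, value)
```

Any sample listed by this scan is a moment worth correlating with the backup job log.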
A non-zero ABRTS/s value (e.g. 1) appears when the VM's operating system reports that storage is not responding (e.g. when, in Windows, the disk does not complete any single I/O within 60 seconds). How can we defend against this? If we have many volumes, spread the virtual machines that are backed up at the same time evenly across them. Using historical data in Veeam ONE (Datastore -> Disk Issues -> Past week), we can identify which datastore generates the most errors and relieve it (the alerts also show which specific machines reported disk access errors).
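One simple way to spread the VMs of a group that is backed up at the same time evenly across datastores is a greedy “largest VM first, onto the least-loaded datastore” placement. This is a generic balancing heuristic, not a Veeam or vSphere feature; the function name, VM names and sizes are hypothetical, and the actual moves would still be done with Storage vMotion.

```python
import heapq


def spread_vms(vms, datastores):
    """Greedy placement: largest VM first onto the currently least-loaded datastore.

    vms: dict of VM name -> amount of data to back up (e.g. GB).
    datastores: list of datastore names.
    Returns a dict of datastore -> list of VM names.
    """
    # Min-heap of (current load, datastore name); ties are broken by name.
    heap = [(0, name) for name in datastores]
    heapq.heapify(heap)
    placement = {name: [] for name in datastores}
    for vm, size in sorted(vms.items(), key=lambda kv: kv[1], reverse=True):
        load, ds = heapq.heappop(heap)
        placement[ds].append(vm)
        heapq.heappush(heap, (load + size, ds))
    return placement


if __name__ == "__main__":
    group = {"sql01": 400, "web01": 100, "web02": 100, "file01": 300, "app01": 200}
    print(spread_vms(group, ["DS1", "DS2"]))
```

With this example group the two datastores end up with 600 GB and 500 GB to back up, instead of one datastore carrying most of the load.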
Let’s schedule backups so that they do not overlap. If we have Veeam Backup & Replication Enterprise, we can turn on “Enable storage latency control” in the options.
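Before changing schedules it helps to check which jobs actually overlap, given their start times and typical durations (taken, for example, from past job runs). A minimal sketch, assuming each job is described by a hypothetical (name, start, expected duration) tuple:

```python
from datetime import datetime, timedelta


def overlapping_jobs(jobs):
    """Return pairs of job names whose backup windows overlap.

    jobs: list of (name, start datetime, expected duration as timedelta).
    """
    windows = sorted((start, start + duration, name) for name, start, duration in jobs)
    overlaps = []
    for i, (start_a, end_a, name_a) in enumerate(windows):
        for start_b, _end_b, name_b in windows[i + 1:]:
            if start_b >= end_a:
                break  # sorted by start time, so no later job overlaps job A
            overlaps.append((name_a, name_b))
    return overlaps


if __name__ == "__main__":
    day = datetime(2024, 1, 1)
    jobs = [
        ("SQL", day.replace(hour=22), timedelta(hours=2)),
        ("Web", day.replace(hour=23), timedelta(hours=1)),
        ("File", day + timedelta(days=1, hours=1), timedelta(hours=2)),
    ]
    print(overlapping_jobs(jobs))
```

Here only “SQL” and “Web” collide, so moving “Web” after midnight would remove the overlap.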
The first parameter makes Veeam wait until the datastore is no longer overloaded before starting the next task (which is why you should know your average latency accurately). The second parameter applies to tasks that are already running: if datastore latency exceeds the threshold, the backup task is throttled (it will take a little longer, but it will not overload the datastore excessively). The right thresholds obviously depend on what you want to achieve. Another parameter we can adjust is “Max concurrent tasks” in the VMware Backup Proxy settings.
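To make the two thresholds concrete, here is a toy model of the decisions described above. It is not Veeam's implementation; the class name and the 20 ms / 30 ms values are hypothetical examples, to be replaced with values derived from your own average latency.

```python
from dataclasses import dataclass


@dataclass
class LatencyControl:
    """Toy model of the two latency thresholds (values in milliseconds).

    Mimics the behaviour described in the text; it is not Veeam's code.
    """
    stop_new_tasks_ms: float = 20.0  # hypothetical example value
    throttle_io_ms: float = 30.0     # hypothetical example value

    def may_start_new_task(self, latency_ms: float) -> bool:
        # New tasks wait while datastore latency is at or above the first threshold.
        return latency_ms < self.stop_new_tasks_ms

    def should_throttle(self, latency_ms: float) -> bool:
        # Running tasks are slowed down once latency exceeds the second threshold.
        return latency_ms > self.throttle_io_ms


if __name__ == "__main__":
    control = LatencyControl()
    for latency in (12.0, 25.0, 35.0):
        print(latency, control.may_start_new_task(latency), control.should_throttle(latency))
```

Between the two thresholds (25 ms in this example) running tasks continue at full speed, but no new task is started on that datastore.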
The default value is four; we can lower it to strike a balance between not loading the datastore with too many concurrent tasks and not lengthening the backup excessively. If that does not help, we can turn off “Enable parallel processing” entirely in the options, and backups will then run sequentially. Finally, make sure that no other tasks that put additional load on storage (e.g. nightly code compilation) run during the backup window. As you can see, there are many options, and it is easy to configure backups so that SCSI command aborts do not appear in the environment.
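The effect of a concurrency cap can be illustrated with a thread pool whose max_workers plays the role of “Max concurrent tasks”; a cap of 1 roughly corresponds to sequential processing. This is only a simulation under that assumption (the function name and the 50 ms sleep standing in for a backup are hypothetical), not how Veeam schedules tasks internally.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_backups(vm_names, max_concurrent_tasks):
    """Run simulated backup tasks under a concurrency cap.

    Returns (finished VM names in submission order, peak number of tasks running at once).
    """
    lock = threading.Lock()
    active = 0
    peak = 0

    def backup(vm):
        nonlocal active, peak
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.05)  # stand-in for reading the VM and writing its data
        with lock:
            active -= 1
        return vm

    with ThreadPoolExecutor(max_workers=max_concurrent_tasks) as pool:
        finished = list(pool.map(backup, vm_names))
    return finished, peak


if __name__ == "__main__":
    vms = [f"vm{i}" for i in range(10)]
    for cap in (4, 1):
        _, peak = run_backups(vms, cap)
        print(f"cap={cap}: at most {peak} tasks hit the datastore at once")
```

Lowering the cap reduces the number of simultaneous streams hitting the datastore at the cost of a longer total run, which is exactly the trade-off described above.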