Here at silverorange, we rely on a mix of our own hosting infrastructure and a variety of cloud-based services. In our on-premises server room, oVirt has become an increasingly important component of our infrastructure. We’ve been using oVirt for a number of years, and have since moved many of our physical servers to virtual machines. Along with using Ansible as the source of truth for configuration management, we perform file-level backups with Borg.
While we could technically rebuild any of our servers using these tools, it can be a time-consuming process, especially with many VMs involved. The last thing we want to be doing in a disaster situation is piecing back together dozens of VMs in the middle of the night. For that reason, it’s also important for us to perform full VM image backups for greater flexibility in disaster recovery scenarios.
What’s out there already?
One would think that backing up a VM on a modern virtualization platform such as oVirt would be a simple task. Unfortunately, that’s not the case. The oVirt API provides the means to perform backups, but there’s no official tool for backing up multiple running VMs automatically on a schedule. We came across a few open-source options, but they were abandoned, had overly complex dependencies, or in some cases required a VM as a proxy to attach cloned snapshots in order to back them up (which is resource intensive).
By far the best option we tried was vProtect by Storware. It did a great job handling oVirt backups and included robust features like incremental backup support, the ability to exclude certain VM disks from the backup, and scheduling policies. Storware graciously offers the product for free for up to 10 VMs. Unfortunately, once you hit the limit you have to move to their paid license. Being a software development company, we support paying for software and understand the amount of work and costs involved in shipping and maintaining a product. The issue is that many of these backup vendors target enterprise-level businesses and often offer nothing for personal or small-business use. For example, a vProtect license for just three oVirt hosts costs $3,000 USD per year. They also license per-VM, but even with just 10 VMs you’re looking at a cost of $1,200 USD, minimum. For a large business these costs may seem like a drop in the bucket, but many oVirt users do not fall within that category (if they did, they’d more than likely be paying for Red Hat Virtualization instead of oVirt).
Another drawback to vProtect is its lack of support for CentOS/RHEL 8.x, although we’ve been informed that it is on their development roadmap.
Our homemade solution
After researching and testing some of the available options for oVirt backups we wondered if we could roll our own no-frills solution that would easily allow anyone to perform full-image backups of running oVirt VMs. We originally started by looking at the oVirt REST API, but later discovered we could do what we needed with Ansible.
What we ended up with is a set of simple Ansible playbooks that authenticate to oVirt, export a given list of VMs in OVA format, then perform a basic cleanup task to keep backups from getting out of hand. The OVA format is essentially a single file with the entire VM configuration and all data included. It’s a portable file that can be easily imported back into oVirt or any other platform that supports the format.
We found that the “ovirt_vm” Ansible module could be used to start an OVA
export task on the oVirt hosted engine, which can export to a local or remote
path attached to a specified oVirt host. For example, you can export a VM to
/backup on oVirt host0, which could be a local partition/device or network
attached storage like an NFS mount. The main hurdle we faced when using the
ovirt_vm Ansible module is that Ansible is satisfied the task is complete once
the export task is successfully submitted. This means that if we fed it multiple
VMs, an export task would be started for all VMs at the same time, which is far
too resource intensive. We needed to find a way to loop over a list of VMs but
wait for each export to finish before moving on to the next. We ended up looping
over the VMs with an include_tasks task and adding a wait_for task within the
included task file. The wait simply waits for the filename to be
present in storage before moving on. The export process writes a temporary file
first and is only renamed once the entire export operation is completed. It
would be possible to implement a more sophisticated wait_for task that
doesn’t need to watch the storage location, but this was the easiest option
since we had local access to the backup mount.
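The loop-and-wait pattern described above can be sketched roughly as follows. This is a minimal illustration, not a copy of our playbooks: the file names (backup.yml, export_vm.yml), the VM names, the engine_* variables, and the dated OVA file name are all assumptions made for the example. It also assumes the machine running the playbook can see the backup path, since wait_for checks the file locally.

```yaml
# backup.yml (hypothetical name): log in, then export the VMs one at a time
- hosts: localhost
  connection: local
  gather_facts: false
  vars:
    vm_list: [vm01, vm02]     # hypothetical VM names
    export_host: host0        # oVirt host that has the backup path attached
    export_dir: /backup
    export_timeout: 7200      # seconds; wait_for gives up after 300 by default
  tasks:
    - name: Log in to the oVirt engine (stores the token in ovirt_auth)
      ovirt_auth:
        url: "{{ engine_url }}"            # assumed to be defined elsewhere
        username: "{{ engine_user }}"
        password: "{{ engine_password }}"

    - name: Export each VM in turn
      include_tasks: export_vm.yml
      loop: "{{ vm_list }}"
      loop_control:
        loop_var: vm_name
---
# export_vm.yml (hypothetical name): included once per VM
- name: Build a dated file name for this export
  set_fact:
    ova_name: "{{ vm_name }}_{{ lookup('pipe', 'date +%Y-%m-%d') }}.ova"

- name: Submit the OVA export (returns as soon as the engine accepts the job)
  ovirt_vm:
    auth: "{{ ovirt_auth }}"
    name: "{{ vm_name }}"
    state: exported
    export_ova:
      host: "{{ export_host }}"
      directory: "{{ export_dir }}"
      filename: "{{ ova_name }}"

- name: Block until the finished OVA appears, then let the loop continue
  wait_for:
    path: "{{ export_dir }}/{{ ova_name }}"
    state: present
    timeout: "{{ export_timeout }}"
```

Because the export writes to a temporary name and renames the file at the end, waiting for the final file name is enough to serialize the exports. On newer Ansible releases the oVirt modules live in the ovirt.ovirt collection, so the module names may need that prefix.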
Another hurdle we faced was timeouts, especially when dealing with large VMs
(we have some over 500 GB in size). The first timeout had to do with the
wait_for Ansible module. wait_for has a default timeout of 300 seconds, which
was easily overcome by specifying a higher timeout value in the task.
The second timeout we had to consider was the Ansible playbook timeouts on the
oVirt hosted engine itself. Under the hood, oVirt is actually running Ansible
plays to perform the OVA export (Ansible inception!). The oVirt engine has
timeout values set for playbook execution, which we found can be overridden by
defining a custom value for “ANSIBLE_PLAYBOOK_EXEC_DEFAULT_TIMEOUT”. We did
this by creating a new file,
/etc/ovirt-engine/engine.conf.d/99-ansible-playbook-timeout.conf, on the hosted
engine VM. Note that the hosted-engine service requires restarting after this
change has been made.
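For anyone who would rather manage that override with Ansible than create the file by hand, a sketch along these lines should work. Several things here are assumptions rather than facts from our setup: the inventory name "engine", the value 240 (check the oVirt documentation for the unit and a sensible value for your version), and the service name ovirt-engine, which is what we would expect the engine service to be called on the hosted engine VM.

```yaml
# Run against the hosted engine VM; "engine" is a hypothetical inventory name
- hosts: engine
  become: true
  tasks:
    - name: Override the engine's Ansible playbook execution timeout
      copy:
        dest: /etc/ovirt-engine/engine.conf.d/99-ansible-playbook-timeout.conf
        mode: "0644"
        content: |
          # Illustrative value only; confirm the unit for your oVirt version
          ANSIBLE_PLAYBOOK_EXEC_DEFAULT_TIMEOUT=240
      notify: Restart engine

  handlers:
    - name: Restart engine
      service:
        name: ovirt-engine    # assumed service name
        state: restarted
```

Using a handler means the engine is only restarted when the file actually changes.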
Once we had a working playbook that could export a list of VMs (including very
large VMs) one at a time successfully, the last thing we needed was a simple way
to control backup retention. We ended up adding a cleanup task that loops over
the VM list the same way the export task does. This cleanup task consists of
a couple of basic shell commands run with the Ansible shell module. The first task
identifies the most recent backup of the given VM and stores it in a variable,
while the second task finds all backups of the given VM older than the
retain_days var, excluding the most recent backup found by the first task, and
deletes them. This way we can retain X days of backups while also ensuring
that at least one copy of any given VM is always retained.
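As a rough sketch of those two cleanup tasks, assuming backups are named vm_name followed by an underscore and a date, and that export_dir (without a trailing slash), vm_name, and retain_days are already defined, the shell commands might look like this. The exact commands in our repository may differ.

```yaml
- name: Find the most recent backup of this VM
  # ls -t sorts newest first; head keeps only the newest path
  shell: ls -1t {{ export_dir }}/{{ vm_name }}_*.ova | head -n 1
  register: newest_backup
  changed_when: false

- name: Delete backups older than retain_days, always keeping the newest one
  shell: >
    find {{ export_dir }} -maxdepth 1 -type f
    -name '{{ vm_name }}_*.ova'
    -mtime +{{ retain_days }}
    ! -path '{{ newest_backup.stdout }}'
    -delete
  when: newest_backup.stdout | length > 0
```

Excluding the newest file by path is what guarantees one copy survives even if every backup is older than retain_days. Note that a pattern like this would also match another VM whose name starts with the same prefix, so a stricter naming convention may be needed.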
To automate backups we simply run the playbook with a crontab entry.
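A plain crontab line is all that is needed here. If you prefer to keep the schedule in Ansible as well, the cron module can install an equivalent entry. The schedule, playbook path, and log path below are hypothetical placeholders.

```yaml
- name: Schedule the VM backup playbook
  cron:
    name: ovirt-vm-backup
    user: root
    minute: "0"
    hour: "1"
    # Hypothetical paths; point these at your own checkout and log file
    job: "ansible-playbook /opt/ovirt_ansible_backup/backup.yml >> /var/log/ovirt_backup.log 2>&1"
```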
We’ve set up a GitHub repo containing our playbooks: https://github.com/silverorange/ovirt_ansible_backup
Ideas for improvement
Some ideas that could be implemented in the next revision:
- Instead of specifying a list of VMs, we could get a list of them using the “ovirt_vm_info” Ansible module. If this route were taken, it would likely be necessary to also include some logic to exclude certain VMs from backups, for example stopped VMs.
- Implement a better wait_for solution, one which does not require access to the storage system. We think that it may be possible to use the “ovirt_event_info” Ansible module to determine when the export completes instead of relying on a file check.
- Implement a more robust backup retention scheme. What we’ve implemented in the first revision does the job, but there is room for improvement.
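To give a feel for the first idea, here is an untested sketch of how the VM list could be discovered rather than hard-coded. It assumes an ovirt_auth token is already available, uses the oVirt search pattern status=up to skip stopped VMs, and introduces a hypothetical exclude_vms variable for anything else that should be left out.

```yaml
- name: Look up running VMs instead of hard-coding the list
  ovirt_vm_info:
    auth: "{{ ovirt_auth }}"
    pattern: status=up
  register: vm_info

- name: Build the backup list, dropping anything explicitly excluded
  set_fact:
    vm_list: >-
      {{ vm_info.ovirt_vms
         | map(attribute='name')
         | list
         | difference(exclude_vms | default([])) }}
```

The resulting vm_list could then be fed to the same export loop that a static list would use.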
We have now used the playbook to perform several full backups of our oVirt VMs, including VMs larger than 500 GB, with no issues to report. We plan to continue working on improving the process, and hope that this information will be useful to the oVirt community.