Backing up oVirt VMs with Ansible

Here at silverorange, we rely on a mix of our own hosting infrastructure and a variety of cloud-based services. In our on-premises server room, oVirt has become an increasingly important component of our infrastructure. We’ve been using oVirt for a number of years, and have since moved many of our physical servers to virtual machines. Along with using Ansible as the source of truth for configuration management, we perform file-level backups with Borg.

While we could technically rebuild any of our servers using these tools, it can be a time-consuming process, especially with many VMs involved. The last thing we want to be doing in a disaster situation is piecing back together dozens of VMs in the middle of the night. For that reason, it’s also important for us to perform full VM image backups, which give us greater flexibility in disaster recovery scenarios.

What’s out there already?

One would think that backing up a VM on a modern virtualization platform such as oVirt would be a simple task. Unfortunately, that’s not the case. The oVirt API provides the means to perform backups, but there’s no official tool for backing up multiple running VMs automatically on a schedule. We came across a few open-source options, but they were either abandoned, had overly complex dependencies, or in some cases required a VM as a proxy to attach cloned snapshots in order to back them up (which is resource intensive).

By far the best option we tried was vProtect by Storware. It did a great job handling oVirt backups and included robust features like incremental backup support, the ability to exclude certain VM disks from the backup, and scheduling policies. Storware graciously offers the product for free for up to 10 VMs. Unfortunately, once you hit the limit you have to move to their paid license. Being a software development company, we support paying for software and understand the amount of work and costs involved in shipping and maintaining a product. The issue is that many of these backup vendors target enterprise-level businesses and often offer nothing for personal or small-business use. For example, a vProtect license for just three oVirt hosts costs $3,000 USD per year. They also license per VM, but even with just 10 VMs you’re looking at a cost of $1,200 USD, minimum. For a large business these costs may seem like a drop in the bucket, but many oVirt users do not fall within that category (if they did, they’d more than likely be paying for Red Hat Virtualization instead of oVirt).

Another drawback to vProtect is its lack of support for CentOS/RHEL 8.x, although we’ve been informed that it is on their development roadmap.

Our homemade solution

After researching and testing some of the available options for oVirt backups, we wondered if we could roll our own no-frills solution that would easily allow anyone to perform full-image backups of running oVirt VMs. We originally started by looking at the oVirt REST API, but later discovered we could do what we needed with Ansible.

What we ended up with is a set of simple Ansible playbooks that authenticate to oVirt, export a given list of VMs in OVA format, then perform a basic cleanup task to keep backups from getting out of hand. The OVA format is essentially a single file with the entire VM configuration and all data included. It’s a portable file that can be easily imported back into oVirt or any other platform that supports the format.

We found that the “ovirt_vm” Ansible module could be used to start an OVA export task on the oVirt hosted engine, which can export to a local or remote path attached to a specified oVirt host. For example, you can export a VM to /backup on oVirt host0, which could be a local partition/device or network-attached storage like an NFS mount.

The main hurdle we faced when using the ovirt_vm Ansible module is that Ansible considers the task complete as soon as the export task is successfully submitted. This means that if we fed it multiple VMs, an export task would be started for all VMs at the same time, which is far too resource intensive. We needed to find a way to loop over a list of VMs but wait for each export to finish before moving on to the next. We ended up looping the VMs over an include_tasks and adding a wait within that included task using the wait_for module. The wait simply waits for the final filename to be present in storage before moving on. This works because the export process writes to a temporary file first, which is only renamed to the final filename once the entire export operation is complete. It would be possible to implement a more sophisticated wait_for task which doesn’t need to watch the storage location, but this was the easiest option since we had local access to the backup mount.
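To make the loop-and-wait pattern concrete, here is a minimal Python sketch of the same logic. It is not our playbook, just an illustration: the function names, the `<vm name>.ova` file naming, and the `start_export` callback (a stand-in for whatever submits the export job and returns immediately) are all assumptions made for the example.

```python
import os
import time


def wait_for_file(path, timeout=300, poll_interval=5):
    """Block until `path` exists; raise TimeoutError after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while not os.path.exists(path):
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{path} did not appear within {timeout} seconds")
        time.sleep(poll_interval)


def export_sequentially(vm_names, backup_dir, start_export,
                        timeout=300, poll_interval=5):
    """Export VMs one at a time, waiting for each OVA before starting the next.

    `start_export(name, target)` is a hypothetical callback that only
    submits the export job; it does not wait for the job to finish.
    """
    exported = []
    for name in vm_names:
        # Hypothetical naming: the final file is <backup_dir>/<name>.ova.
        target = os.path.join(backup_dir, f"{name}.ova")
        start_export(name, target)
        # The final filename only appears once the temporary file has been
        # renamed, so its presence signals that the export has completed.
        wait_for_file(target, timeout, poll_interval)
        exported.append(target)
    return exported
```

The important design point is that the check looks for the final filename rather than any file in the directory, so a half-written temporary file never counts as a finished export.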

Another hurdle we faced was timeouts, especially when dealing with large VMs (we have some over 500 GB in size). The first timeout had to do with the wait_for Ansible module. wait_for has a default timeout of 300 seconds, which was easily overcome by specifying a higher timeout value in the task.

The second timeout we had to consider was the Ansible playbook timeout on the oVirt hosted engine itself. Under the hood, oVirt is actually running Ansible plays to perform the OVA export (Ansible inception!). The oVirt engine has timeout values set for playbook execution, which we found can be overridden by defining a custom value for “ANSIBLE_PLAYBOOK_EXEC_DEFAULT_TIMEOUT”. We did this by creating a new file, /etc/ovirt-engine/engine.conf.d/99-ansible-playbook-timeout.conf, on the hosted engine VM that sets this value. Note that the hosted-engine service requires restarting after this change has been made.

Once we had a working playbook that could export a list of VMs (including very large VMs) one at a time successfully, the last thing we needed was a simple way to control backup retention. We ended up adding a cleanup task that loops over the VM list the same way the export task does. This cleanup task consists of a couple of basic shell commands run with the Ansible shell module. The first task identifies the most recent backup of the given VM and stores it in a variable, while the second task finds all backups of the given VM older than the retain_days var, excluding the most recent backup found by the first task, and deletes them. This way we can retain X days of backups while also ensuring that at least one copy of any given VM is always retained.
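The retention rule described above can be sketched in a few lines of Python. This is an illustration of the logic only, not the shell commands from our playbook; the function name `select_expired` and the input shape (a mapping of backup path to modification time for a single VM) are assumptions made for the example.

```python
from datetime import datetime, timedelta


def select_expired(backups, retain_days, now):
    """Return the backup paths of ONE VM that should be deleted.

    `backups` maps file path -> modification time (datetime).
    A backup is expired when it is older than `retain_days`,
    except that the most recent backup is always kept.
    """
    if not backups:
        return []
    # Step 1: identify the most recent backup so it can never be deleted.
    newest = max(backups, key=backups.get)
    # Step 2: collect everything older than the cutoff, minus the newest.
    cutoff = now - timedelta(days=retain_days)
    return sorted(
        path for path, mtime in backups.items()
        if mtime < cutoff and path != newest
    )
```

Excluding the newest file before applying the age cutoff is what guarantees one surviving copy: if backups stop running for longer than retain_days, the last good backup is still left in place.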

To automate backups we simply run the playbook with a crontab entry.

Resources

We’ve set up a GitHub repo containing our playbooks: https://github.com/silverorange/ovirt_ansible_backup

Ideas for improvement

Some ideas that could be implemented in the next revision:

  1. Instead of specifying a list of VMs, we could get a list of them using the ovirt_vm_info Ansible module. If this route were taken, it’d likely be necessary to also include some logic to exclude certain VMs from backups (stopped VMs, for example).
  2. Implement a better wait_for solution, one which does not require access to the storage system. We think that it may be possible to use the “ovirt_event_info” Ansible module to determine when the export completes instead of relying on a file check.
  3. Implement a more robust backup retention scheme. What we’ve implemented in the first revision does the job, but there is room for improvement.
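As a rough illustration of the first idea, the exclusion logic could be a simple filter over the discovered VMs. The sketch below is hypothetical: it assumes each VM is represented as a dictionary with "name" and "status" keys and that stopped VMs report a status of "down", and the function name and parameters are invented for the example.

```python
def select_vms_for_backup(vms, exclude_names=(), skip_statuses=("down",)):
    """Return the names of VMs that should be backed up.

    `vms` is assumed to be a list of dicts with "name" and "status" keys.
    VMs listed in `exclude_names` or in a skipped status are left out.
    """
    excluded = set(exclude_names)
    return [
        vm["name"]
        for vm in vms
        if vm["name"] not in excluded and vm.get("status") not in skip_statuses
    ]
```

The resulting list of names could then be fed to the same export loop in place of the hand-maintained VM list.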

We have now used the playbook to perform several full backups of our oVirt VMs, including VMs larger than 500 GB, with no issues to report. We plan to continue improving the process, and hope that this information will be useful to the oVirt community.