Creating a backup strategy

Before jumping into writing backup scripts and scheduling configurations, it's important to understand exactly what needs to be backed up, where to, and how often.

In this lesson we'll create a simple backup plan and strategy. We'll identify all the things that need to be backed up, the locations we'll be backing them up to, and cover some of the tooling that we're going to use.

What can go wrong

Your backup strategy does not have to be complex. However, a simple "back up the whole server" approach is also not great for many reasons, including time, disk, network and CPU utilization.

I like to approach backups from a "what if" angle. Ask yourself:

What if I make a configuration mistake?
What if a site editor deletes some content?
What if the site admin deletes the entire database?
What if a malicious theme/plugin destroys or infects some files?
What if an AI agent deletes all files on the server?
What if the server is fully compromised?
What if the server hard drive fails?
What if the server hardware fails beyond recovery?
What if the data center is out of power?
What if a data center technician decommissions our server?
What if my hosting account is suspended?
What if my hosting provider is under a heavy DDoS and my sites are offline?

Some of these are less likely than others, but listing out the most probable cases is a great start. Normally you wouldn't need full protection against all of them, but having a solid understanding of the required actions for each case will show you what must be in place for a successful recovery.

Examples

Here are some example action plans for some of the above cases.

Configuration mistake: not a problem at all. All configuration is in a Git repository, so reverting a configuration change should be straightforward. We do need to get into the habit of committing and pushing our changes to the repository.

Site editor/admin mistake: we'll need the ability to find deleted/changed content in a previous snapshot of the database, as well as a full restore to a previous state. Since admins can delete plugins, media and other files, we'll need the ability to restore individual files and directories from backups.

Hardware failure: if we lose one drive in a RAID mirrored setup, we need to get in touch with a data center technician to replace the faulty drive. This may involve some downtime and/or performance degradation until the new drive is in place and fully in sync. The same applies to memory, CPUs, power, networking and other equipment that can be replaced relatively quickly.

If the failure is beyond recovery, we need to provision a new server, get our software and configuration up and running from our Git repository, and restore all sites, files and databases from our remote off-site backup location. This often includes juggling new DNS entries, sometimes even bringing in new hosting vendors.

A full rebuild like this is a common "last resort" for many catastrophic scenarios.

Downtime & data loss

At this point it is also worth thinking about how much downtime should be expected for some of these scenarios, and how long it will take to recover with manual intervention.

If your data center is having an incident, it may be perfectly fine to accept an hour of downtime or service disruption to avoid hours of manual work to restore sites at a new location.

Data loss is also worth considering at this stage: decide how much of it you're willing to accept for any given scenario. This will determine how frequently you back up, and how many backups you keep on-site and off-site. Note that zero-loss (or near-zero-loss) strategies often require complex solutions, with off-site asynchronous database replication, binary log backups, file replication and more.
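To make the link between acceptable data loss and backup frequency concrete, here is a minimal Python sketch. It assumes backups start on a fixed interval, capture the state of the data at the moment they start, and only become usable once they finish. The function names and that simplified model are my own illustration, not part of any tool.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval: timedelta,
                         backup_duration: timedelta = timedelta(0)) -> timedelta:
    """Upper bound on lost data under the simplified model above.

    Worst case: the failure hits just before the next backup finishes,
    so the last usable backup captured the data one full interval
    plus one backup run ago.
    """
    return backup_interval + backup_duration


def max_backup_interval(acceptable_loss: timedelta,
                        backup_duration: timedelta = timedelta(0)) -> timedelta:
    """Longest interval between backups that still meets the loss target."""
    interval = acceptable_loss - backup_duration
    if interval <= timedelta(0):
        raise ValueError("backups take longer than the acceptable loss window")
    return interval


if __name__ == "__main__":
    # Daily backups that take about 30 minutes to complete.
    print(worst_case_data_loss(timedelta(hours=24), timedelta(minutes=30)))
    # Willing to lose at most 6 hours of data.
    print(max_backup_interval(timedelta(hours=6), timedelta(minutes=30)))
```

If the interval you get out of this is shorter than what full backups can realistically deliver, that is usually the signal to look at the more complex replication and log-based approaches.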

Retention vs storage

The longer you plan to keep your backups, the more storage you will require. This can quickly drive costs up, especially for remote backups to cloud storage providers such as Amazon S3.
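A rough storage estimate is easy to sketch in Python. The function below is a hypothetical helper that assumes full (not incremental) daily backups, no compression or deduplication, and linear data growth. The sizes in the usage example are made up for illustration.

```python
def estimate_storage_gb(full_backup_gb: float,
                        retention_days: int,
                        daily_growth_gb: float = 0.0) -> float:
    """Total size of all retained backups once the retention window is full.

    Assumes one full backup per day, where each day's backup is
    daily_growth_gb larger than the previous one.
    """
    if retention_days < 1:
        raise ValueError("retention_days must be at least 1")
    return sum(full_backup_gb + daily_growth_gb * day
               for day in range(retention_days))


if __name__ == "__main__":
    # A 10 GB site kept for 14 days, with and without growth.
    print(estimate_storage_gb(10, 14))
    print(estimate_storage_gb(10, 14, daily_growth_gb=0.5))
```

Multiplying the result by your storage provider's per-GB price gives a first approximation of the monthly cost, before request and transfer fees.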

If you do require lengthy retention periods for large databases and media archives, you should definitely explore tools for differential backups, copy-on-write snapshots, binary logging and other more advanced techniques.

I usually start with full daily backups with 14 days of retention on all of my projects. Then adjust based on how quickly the data is growing and how often it is changing.
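A 14-day retention rule boils down to deciding which backups to keep and which to delete. Here is a minimal sketch in Python, assuming each backup can be identified by the date it was taken. The function name and the "never delete the last remaining backup" safeguard are my own additions for illustration.

```python
from datetime import date, timedelta


def split_by_retention(backup_dates, today, retention_days=14):
    """Return (keep, prune) lists of backup dates, newest first.

    With daily backups and retention_days=14, this keeps today's backup
    and the 13 days before it. If every backup is older than the window
    (for example, the backup job silently stopped), the newest one is
    kept anyway so we never delete our only copy.
    """
    newest_first = sorted(set(backup_dates), reverse=True)
    oldest_kept = today - timedelta(days=retention_days - 1)
    keep = [d for d in newest_first if d >= oldest_kept]
    prune = [d for d in newest_first if d < oldest_kept]
    if not keep and prune:
        keep.append(prune.pop(0))  # prune[0] is the newest stale backup
    return keep, prune


if __name__ == "__main__":
    today = date(2024, 1, 31)
    backups = [today - timedelta(days=i) for i in range(21)]
    keep, prune = split_by_retention(backups, today)
    print("keep:", len(keep), "prune:", len(prune))
```

The function only decides what to delete. Actually removing files is left to the backup script, which makes the rule easy to test before it touches any real data.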

On-site vs off-site

On-site backups are ones that live very close to the original source. This could be on the server itself, on a separate disk, on a NAS service provided by the hosting vendor, or on a separate server in the same data center.

These are very fast to work with, and great for quick recovery options.

Off-site backups live in a different physical location, ideally connected to a different power grid, with a different hosting provider, often in a different country.

These are much slower to work with, as they require shipping data back and forth between the backup location and your server. But these are also your last line of defense for some of the more catastrophic events mentioned earlier. It is also quite a common practice to reduce the frequency and retention for off-site backups to save on storage costs.
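One simple way to reduce off-site frequency is to ship only one of the daily backups each week. The Python sketch below picks those out of a list of backup dates. The weekly schedule and the choice of Sunday are assumptions for illustration, not a recommendation.

```python
from datetime import date, timedelta

SUNDAY = 6  # date.weekday() numbers Monday as 0 and Sunday as 6


def off_site_candidates(backup_dates, ship_weekday=SUNDAY):
    """Return the backups that should also be copied off-site, oldest first.

    Every backup stays on-site; only the ones taken on ship_weekday
    are selected for the slower, more expensive off-site location.
    """
    return sorted(d for d in backup_dates if d.weekday() == ship_weekday)


if __name__ == "__main__":
    january = [date(2024, 1, 1) + timedelta(days=i) for i in range(31)]
    for backup_date in off_site_candidates(january):
        print(backup_date.isoformat())
```

The trade-off is that a catastrophic event may now cost you up to a week of data instead of a day, so this needs to line up with the amount of data loss you decided to accept earlier.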

DNS

Hunting for DNS provider credentials or going through account recovery during an incident, then panic-changing DNS records, can quickly make things worse.

I like to keep a DNS.md file around in my configuration repo, with a list of records for every domain associated with the server. This includes A/CNAME records pointing to the server itself, as well as other related records for email, search console and webmaster tools verification, etc.
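The file can be written by hand, but as an illustration of one possible layout, here is a small Python sketch that renders a list of records into Markdown. The table layout, the `Record` type and the function name are hypothetical, and the domain and IP address are reserved documentation values, not real records.

```python
from typing import NamedTuple


class Record(NamedTuple):
    name: str
    type: str
    value: str
    note: str


def render_dns_md(domain_records: dict[str, list[Record]]) -> str:
    """Render one Markdown section with a records table per domain."""
    lines = ["# DNS records", ""]
    for domain, records in domain_records.items():
        lines += [f"## {domain}", "",
                  "| Name | Type | Value | Note |",
                  "| --- | --- | --- | --- |"]
        lines += [f"| {r.name} | {r.type} | {r.value} | {r.note} |"
                  for r in records]
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder data: example.com and 192.0.2.10 are documentation-only.
    records = {
        "example.com": [
            Record("@", "A", "192.0.2.10", "this server"),
            Record("www", "CNAME", "example.com.", "site alias"),
            Record("@", "MX", "10 mail.example.com.", "email"),
        ],
    }
    print(render_dns_md(records))
```

Whichever way the file is produced, the point is that it lives in the configuration repo, so it is available even when the DNS provider's dashboard is not.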
