Binary Backups for PostgreSQL

hot-backups-for-postgresql

Creating a PostgreSQL backup is a simple task. You only need to run the below command, and save the result as your backup:

$ pg_dump --username=your-user-name --file=output-file dbname

The result is a file full of SQL commands that, when fed back to the server, will recreate the database in the same state as it was at the time of the dump. So to restore from that, you only need to run the below command:

$ psql -U your-user-name dbname < backup-file

I can't imagine a simpler process for the backup and restore of a database; However as the backup file contains many many SQL commands, creating and restoring backups for a large database may take a very long time and that could make your backups useless - especially if you need to restore in a failover scenario.

The good news is there are some other backup methods you can use to have a faster backup/restore process.

File system level backup

This is the fastest method you can use for taking a PostgreSQL backup. If you directly copy the files that PostgreSQL uses to store the data in, you will have a File system level backup which can be restored by copy back those files.

Although this method is faster than creating a dump file, you can't use it while the PostgreSQL server is up and running, because the files you are trying to copy may change during the process and the result file-set will not be consistent.

Essentially this means that you can not use this method to backup a production PostgreSQL server as you are not able to shutdown your database every hour to take a backup.

Binary Backups

PostgreSQL uses a mechanism called write ahead log (WAL) to records every change made to the database's data files.
It means if you take a file system backup and re-play the WAL commands generated during the backup process on the result file-set, then you will have a consistent backup which can be used to restore PostgreSQL to the exact state the database was when the backup finished. We call these backups Binary Backups and these are the steps you need to follow to have one.

1. Configure wal_level

There is an item in Postgresql configuration file called wal_level which determines how much information is written to the WAL.
The default value is minimal, which writes only the information needed to recover from a crash or immediate shutdown.
For Binary Backups you need to set this value to hot_standby.
Just open the PostgreSQL config file (postgresql.conf) and find the wal_level. Set hot_standby as its value and restart your PostgreSQL service.

2. Configure max_wal_senders

max_wal_senders is the maximum number of simultaneously running WAL sender processes.

PostgreSQL uses WAL to sync changes with replica servers in Streaming Replication method.So if you have Streaming Replication in place, you need one WAL sender process for each slave.
Also you need two WAL sender processes for creating a Standalone Hot Backups.
Based on the above information you need to calculate the number of WAL sender processes and update the config.
Open the PostgreSQL config file (postgresql.conf) and find the max_wal_senders. Set the calculated number as its value and restart your PostgreSQL service.

3. Configure authentication

To be able to create Binary Backups you need to use a superuser or a user which has REPLICATION permissions.
In PostgreSQL, the Client authentication is controlled by a configuration file called pg_hba.conf and is stored in the database cluster's data directory.
Open pg_hba.conf and add the below line to it (replace your-postgresal-username with your username) and reload the PostgreSQL service :

host replication your-postgresal-username 127.0.0.1/32 md5

4. Use pg_basebackup to create a backup

pg_basebackup is used to take base backups of a running PostgreSQL database cluster. This tool has an option ( -x ) which if you use will include WAL files to created result.

Run the below command to create the backup (replace your-username with your username):

$ pg_basebackup -U your-username -x -D /path/to/backup/folder

After pg_basebackup finishes its work, your Binary Backup is ready to use. You can use tar to create one archive file and save it in a safe place to restore when needed.

Restore Binary Backups

Now that you have the backup folder, you can use it to restore PostgreSQL to the state the backup is finished. These are the steps you need to go through to restore a Binary Backup:

Prepare the Folders

Stop the PostgreSQL service and then delete your PostgreSQL data directory (note: You may want to move it somewhere else instead of deleting it for redundancy).

Then create a new empty directory as your PostgreSQL data directory, and use rsync to move the content of your backup folder to the newly created folder (replace /path/to/backup/folder/ and /path/to/postgresql/data/folder with correct values):

$ sudo rsync -az /path/to/backup/folder/ /path/to/postgresql/data/folder

Fix permissions of the PostgreSQL data folder:

$ sudo chown -R postgres:postgres /path/to/postgresql/data/folder

And

$ sudo chmod -R 0700 /path/to/postgresql/data/folder

Lastly, Start the PostgreSQL service. Now you should have PostgreSQL server with restored data!

As you see the whole process of Backup/Restore for Binary Backups is more complicated that dealing with SQL dump files, but if you have a large database and restore time is important for you, this method could be a great replacement.