Crunchy Data, a Red Hat Technology Connect and OpenShift Primed Partner and active collaborator with OpenShift, discusses in this blog how PostgreSQL Point In Time Recovery (PITR) works on OpenShift.
PITR support is offered within the Crunchy Container Suite starting with Release 1.2.3 which is available from the Github repository https://github.com/crunchydata/crunchy-containers.
The Container Suite containers are pre-built and can be found on DockerHub at https://hub.docker.com/u/crunchydata/
PostgreSQL supports PITR (point in time recovery) as described in the PostgreSQL documentation. PITR lets a DBA restore a database at a given time in the database life. Some examples of PITRPpoints in time useare recorded in a uses with a PostgreSQL database by a DBA, for example, after or before a major change in the database contents or prior to a database upgrade.
To better support PostgreSQL administration, Crunchy developed logic within the crunchy-postgres container to support PITR, including:
- Ability to archive WAL (write ahead log) files to a mounted volume
- Ability to specify the frequency of when WAL files are written
- Ability to restore a database using a base backup and archived WAL files
- Ability to specify a point in time to recover the database
PITR Design
The crunchy-postgres container was modified to provide the container with a dedicated volume mount to be used to persist WAL files outside of the container. Before I go into the example below I will introduce some concepts related to PITR on OpenShift. WAL files will begin being written if the container’s ARCHIVE_MODE environment variable in the OpenShift JSON template file is set as follows:
}, {
"name": "ARCHIVE_MODE",
"value": "on"
},
WAL files are written to a volume mount path of /pgwal
. A separate volume allows the WAL files to live somewhere other than where the PostgreSQL data files are persisted to (e.g. /pgdata
).
Figure 1 -crunchy-postgres volumes
WAL files are written at a frequency specified by the ARCHIVE_TIMEOUT environment variable in the OpenShift JSON template as follows:
}, {
"name": "ARCHIVE_TIMEOUT",
"value": "60"
}],
Setting this value to 60 means that WAL files will be written every 60 seconds. You will want to adjust this time to meet your database recovery requirements. The smaller the timeout, the more WAL files will be written resulting in increased disk volume consumption. The larger the timeout value, the possibility of not recording transactions to WAL files increases, so there is a tradeoff here to consider in a production deployment.
As a database executes, the DBA can create a custom recovery name by issuing SQL commands which create a recovery target. Recovery targets can be either a user-defined name created by the DBA or a timestamp.
When a database is recovered by the crunchy-postgres container, the recovery target is specified by the RECOVERY_TARGET_NAME or RECOVERY_TARGET_TIME environment variable within the OpenShift JSON template as follows:
}, {
"name": "RECOVERY_TARGET_NAME",
"value": "beforechanges"
}, {
Lastly, when performing a database recovery, the crunchy-postgres container needs to be able to access the archived WAL files from the database to be recovered. For this, a volume mount path of /recover
is used (read-only) to reference the WAL files. It is the presence of the /recover volume that indicates to the container that we want to perform a PITR.
Figure 2 - PITR recover volume
Example PITR Workflow
An example of running a PITR is provided in the examples/openshift/pitr directory in the Github repository. For more details on how to run the example, refer to the docs/openshift.asciidoc file. The example makes use of NFS as a persistent volume type.
Figure 3 - PITR workflow
To start the example, you first need to create a database container. This is performed by running the following bash script which contains all the necessary OpenShift CLI commands:
run-master-pitr.sh
Executing the script above will create the following objects within OpenShift:
Figure 4 - OpenShift components used for the initial database
Next, PITR requires a full database backup. The database backup is performed using the PostgreSQL pg_basebackup utility. We create the database backup by executing the bash script example as follows which contains all the necessary OpenShift CLI commands:
run-master-pitr-backup.sh
The backup is executed and the backup files are stored to a persistent disk volume such as a NFS mount. Backups are stored in a time stamped directory to allow for any number of backups to be retained. The OpenShift objects created in the backup script are depicted below:
Figure 5 - OpenShift components used in the backup
Next, to demonstrate the PITR feature, we will create a recovery target name called beforechanges. Then we will insert some sample data, and finally create a recovery target name called afterchanges. Execute the following SQL commands using a PostgreSQL client such as sql
to create the recovery targets:
select pg_create_restore_point('beforechanges');
create table pitrtest (id int);
select pg_create_restore_point('afterchanges');
select pg_create_restore_point('nomorechanges');
checkpoint;
Finally we will create a recovered database using both the base backup and WAL files as input to the crunchy-postgres container. Together, these archived files will be used to recover to a known target name.
You can modify the RECOVERY_TARGET_NAME value in the example’s master-pitr-restore.json
file to recover to the point before inserting sample data or the point after inserting the data. If you remove the RECOVERY_TARGET_NAME from the JSON template, the database will be restored using all of the WAL files, meaning you will restore to the latest point in time. Execute the following script which contains the necessary OpenShift CLI commands to create the recovered database:
run-restore-pitr.sh
After running the restore script above, the resulting OpenShift objects created are depicted below:
Figure 6 - OpenShift components used when recovering a database
Conclusion
In this blog we described how the Crunchy Container Suite supports PostgreSQL PITR. One of the central design goals of the Crunchy Container Suite is to assist the DBA in managing containerized PostgreSQL. In an upcoming release of the Suite, even more advanced backup and restore capabilities are planned.
About Crunchy Data
Crunchy Data is a leading provider of trusted open source PostgreSQL and PostgreSQL support, technology and training. Crunchy Data is the provider of Crunchy Certified PostgreSQL, an open source PostgreSQL 9.5 distribution including popular extensions such as PostGIS and enhanced audit logging capability. Crunchy Certified PostgreSQL is currently the only open source database Common Criteria certified at the EAL2+ level.
When combined with Crunchy’s Secure Enterprise Support, Crunchy Certified PostgreSQL provides enterprises with an open source and trusted relational database management solution backed by enterprise support from leading experts in PostgreSQL technology. For enterprises requiring dedicated PostgreSQL support, Crunchy provides on-premise PostgreSQL professional services and PostgreSQL training. Learn more at www.crunchydata.com
Categories