Physical Database Backups: Archiving the Entire PostgreSQL Data Directory
When managing PostgreSQL in Docker, there are two fundamental ways to back up your data: Logical (exporting SQL text via pg_dump) and Physical (copying the actual raw data files on the disk).
While SQL exports are excellent for portability and schema updates, Physical Backups are significantly faster for large databases (100GB+) and ensure that your entire environment—including configuration files, indexes, and Write-Ahead Logs (WAL)—is preserved exactly as it is. In this technical guide, I’ll show you how to automate a compressed archive of a complete PostgreSQL data folder living on the host filesystem.
1. The Architecture
This workflow involves interacting directly with the Host File System and the Docker Engine daemon to ensure complete data consistency during the backup phase.
Component Breakdown:
- pg_data Folder: The physical directory mapped onto your Ubuntu instance where Postgres stores its internal files (tables, schemas, WAL logs).
- Docker Compose: Used to "Pause/Stop" the database. This is crucial because copying a "live" database folder can lead to permanently corrupted files if Postgres writes to the disk during the copy interval.
- Tar & Gzip: Native Linux tools used to bundle the entire directory into a single, highly compressed .tar.gz archive.
- Cron: The Linux scheduler that triggers this process hands-free.
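For reference, a minimal docker-compose.yml for this layout might look like the following sketch (the service name, image tag, and credentials are assumptions; adapt them to your stack). The key detail is the bind mount that places pg_data on the host filesystem, which is exactly the folder the backup script archives:

```yaml
services:
  postgres:
    image: postgres:16
    restart: unless-stopped
    environment:
      POSTGRES_PASSWORD: change_me   # placeholder credential
    volumes:
      # Bind mount: Postgres stores its raw data files in ./pg_data on the
      # host, so the backup script can tar the folder directly.
      - ./pg_data:/var/lib/postgresql/data
```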
2. The Automated Backup Script
This bash script orchestrates the process safely: it gracefully stops the database, archives the data directory, and brings the container back online immediately afterwards. By handling the exit status of `tar` explicitly, we guarantee the database is restarted even if the archive step fails (for example, because the disk runs out of space).
File: ~/scripts/backup_folder_all.sh
#!/usr/bin/env bash
# Exit immediately if any command fails
set -e
# --- CONFIGURATION ---
BACKUP_BASE_DIR="$HOME/backups/physical"
TODAY=$(date +"%Y-%m-%d")
BACKUP_DIR="$BACKUP_BASE_DIR/$TODAY"
RETENTION_DAYS=7
# Ensure daily backup directory exists
mkdir -p "$BACKUP_DIR"
backup_physical_folder() {
local env_name=$1
local compose_dir=$2
local data_folder_name="pg_data" # Change this if your mounted volume folder name differs
echo "--- Starting Physical Backup for: $env_name ---"
cd "$compose_dir"
# 1. STOP the database to prevent data corruption during copy operations
echo "-> Stopping container to ensure data consistency..."
docker compose stop postgres
# 2. Archive the volume folder
echo "-> Creating compressed archive of the data directory..."
if tar -czf "$BACKUP_DIR/${env_name}_full_dir.tar.gz" -C "$compose_dir" "$data_folder_name"; then
echo " [SUCCESS] Native archive created."
else
echo " [ERROR] Archiving failed (check disk space or inode exhaustion)."
# Rescue condition: Ensure DB starts even if backup pipeline fails
docker compose start postgres
exit 1
fi
# 3. RESTART the database
echo "-> Restarting container to resume network traffic..."
docker compose start postgres
echo "--- Finished $env_name ---"
}
# --- EXECUTION ---
# Syntax: backup_physical_folder "ProjectName" "PathToDockerComposeFolder"
backup_physical_folder "ProjectName" "PathToDockerComposeFolder"
# --- CLEANUP (Retention Policy) ---
# Prevent disk exhaustion by purging daily backup folders older than 7 days.
# -mindepth 1 protects $BACKUP_BASE_DIR itself from being deleted.
find "$BACKUP_BASE_DIR" -mindepth 1 -maxdepth 1 -type d -mtime +$RETENTION_DAYS -exec rm -rf {} +
echo "Backup pipeline finished at $(date)"
3. Automation via Crontab
To run this every Monday at 8:00 AM without human intervention, we schedule it with cron, using process-priority settings so the backup does not degrade the host's performance.
Step 1: Execute Permissions
Make the script executable:
chmod +x ~/scripts/backup_folder_all.sh
Step 2: Edit the Schedule
Open your crontab for editing:
crontab -e
Add the schedule entry with CPU and I/O priority adjustments:
00 08 * * 1 nice -n 19 ionice -c 3 /bin/bash /home/ubuntu/scripts/backup_folder_all.sh >> /home/ubuntu/backups/cron_folder.log 2>&1
Why nice and ionice? Compressing an entire data folder (which can easily exceed several gigabytes) consumes significant CPU cycles and disk IOPS:
- nice -n 19: Sets the highest niceness value, giving the backup process the lowest possible CPU priority relative to production traffic.
- ionice -c 3: Puts the process in the "idle" I/O scheduling class, so the kernel only services its disk writes when the disk is otherwise idle. This keeps the server responsive to HTTP requests while the compression runs.
4. Best Practices for Folder Backups
A. The "Consistency" Rule
Never copy a pg_data folder while the database is running. Postgres is constantly writing to its data files and Write-Ahead Logs (WAL). If you copy the folder mid-write with rsync or cp, the resulting backup will be internally inconsistent and will most likely refuse to start at all. Always wrap your copies in docker compose stop, or use filesystem-level snapshots (ZFS, LVM) if available.
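If you want the script itself to enforce this rule, a small guard can abort the archive step whenever the container still reports as running. This is a hedged sketch: the guard takes the container state as an argument so it stays decoupled from Docker, and in the real script you would feed it something like `$(docker compose ps --status running -q postgres)`:

```shell
#!/usr/bin/env bash
# Sketch: refuse to archive while Postgres still reports as running.
# The caller passes in the list of running container IDs (empty = stopped).
refuse_if_running() {
  local running_ids=$1
  if [ -n "$running_ids" ]; then
    echo "Refusing to archive: Postgres container is still running" >&2
    return 1
  fi
  echo "Container stopped; safe to archive"
}
```

Calling this right before `tar` turns a silent corruption risk into a loud, early failure.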
B. Version Matching
Physical backups are intrinsically "version-dependent". If you back up a raw pg_data folder created by PostgreSQL 15, you cannot simply restore it into a running PostgreSQL 16 container: the on-disk binary layout changes between major versions. Always document which version tag of Postgres you were running at the time of the physical backup.
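One lightweight way to honor this rule is to snapshot the version alongside each archive. Postgres writes its major version to a PG_VERSION file at the root of the data directory, so a small helper (a hypothetical addition to the backup script; the argument names are mine) can copy it next to the tarball:

```shell
#!/usr/bin/env bash
# Copy the Postgres major-version marker next to the archive, so a restore
# months later knows which image tag to use. PG_VERSION contains e.g. "16".
record_pg_version() {
  local data_dir=$1   # path to the pg_data folder
  local out_file=$2   # e.g. "$BACKUP_DIR/${env_name}_PG_VERSION"
  cp "$data_dir/PG_VERSION" "$out_file"
  echo "Recorded Postgres version: $(cat "$out_file")"
}
```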
C. Test Your Restores
A backup is only a "promise" until you validate it. To safely test a physical backup:
- Scp the .tar.gz archive to a staging machine.
- Decompress it: tar -xzf backup.tar.gz.
- Mount a fresh, isolated Docker container onto the extracted folder.
- Verify your indexes and critical tables are accessible from a psql shell.
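The decompress-and-check steps above can be sketched as a helper that extracts the archive and verifies it actually looks like a Postgres data directory (paths and the archive name are placeholders). The container-mount step is left as a comment, since it depends on your image tag:

```shell
#!/usr/bin/env bash
# Sketch: extract a physical backup and sanity-check it before mounting it
# into a test container. "pg_data" must match the folder name in the tarball.
extract_and_verify() {
  local archive=$1 dest=$2
  mkdir -p "$dest"
  tar -xzf "$archive" -C "$dest"
  # Every valid Postgres data directory has a PG_VERSION file at its root.
  if [ -f "$dest/pg_data/PG_VERSION" ]; then
    echo "Archive OK; Postgres major version: $(cat "$dest/pg_data/PG_VERSION")"
  else
    echo "Missing PG_VERSION -- archive looks incomplete" >&2
    return 1
  fi
}

# Then mount the extracted folder into a throwaway container, e.g.:
#   docker run --rm -v "$PWD/restore_test/pg_data:/var/lib/postgresql/data" postgres:16
```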
Conclusion
By capturing the entire pg_data directory, we have effectively constructed a "snapshot in time" of the database's complete state. This physical approach is the fastest path to recovery from a total server failure, because it restores the configuration, users, schemas, and indexes exactly as they were before the incident.
Always combine logical backups with physical off-site tarballs for true enterprise resilience!