Fix: ImpulseDB Postgres backups are failing
A customer's backups aren't refreshing, or the audit log shows repeated backup.failed entries for a service or host.
Last updated 20 days ago
Fix: ImpulseDB Postgres backups are failing
What you're seeing
- A customer's most recent backup timestamp under Servers > (server) > Backups is days old.
- The audit log shows repeated
backup.failedentries for a service or shared host. - A customer support ticket: "your backup card says my last backup is from a week ago."
- WAL-G base backups stopped landing in the S3 bucket.
Why it happens
ImpulseDB Postgres runs nightly logical dumps via pg_dump on every host, plus optional continuous WAL streaming via WAL-G to an S3-compatible target. A backup failure usually traces to one of:
- S3 credentials revoked or rotated and the new values weren't pasted into Settings > Backups.
- The destination bucket is out of quota or doesn't exist.
- WAL-G binary missing on the Postgres host (a partial cloud-init that didn't include it) or out of date.
- Network outage from the Postgres host to S3 (or to the ImpulseMinio admin proxy if using the one-click ImpulseMinio integration).
- Disk on the Postgres host filled up, so the local dump can't write before upload.
Fix
- Filter the audit log to the failing service or server. Open Tools > Audit Log, filter to
module=impulsedb-postgresandevent=backup.failed. The most recent entry has the verbatim error from WAL-G orpg_dumpβ it names the failing step. - For credential / S3 endpoint errors, open Settings > Backups and re-paste the access key, secret, endpoint, and bucket. If you're using the ImpulseMinio integration, confirm the region credentials in
mod_impulseminio_regionsare still valid by opening the ImpulseMinio admin tab and checking that the region row is green. Save in ImpulseDB Postgres and click Reconfigure backups on the affected server row β it re-runs the WAL-G config over SSH without rebooting Postgres. - For bucket quota errors, check the S3 provider console for the destination bucket's usage. Either raise the quota, prune old base backups (
wal-g delete retain FULL 7on the host keeps the 7 most recent), or rotate to a new bucket. - For "WAL-G not found" errors, SSH to the host:
If it's missing, the server was provisioned before WAL-G credentials were configured. Click Reconfigure backups on the server row to re-run the WAL-G install over SSH.which wal-g wal-g --version - For disk-full errors, SSH in and check
df -h /var/backups/postgres. The shipped retention prune runs daily; if backups are accumulating because the prune step itself errored, drop the oldest dumps by hand, then re-run the backup task. The nightly cron isimpulsepostgres_usage.php(runs at 02:00). The task is registered with ImpulseCore's runner, so you can also trigger it ad-hoc from Tools > Diagnostics. - For network errors, test S3 reachability from the Postgres host:
A connection-refused or timeout points at firewall / outbound egress on the host's network.curl -I https://<s3-endpoint>/<bucket>
How to confirm it worked
Trigger a one-off backup from the per-server Backup Now button (WAL-G base backup) or from the Tools > Diagnostics backup task. Within a few minutes:
- The audit log appends a
backup.completedevent for the service or host. - The Servers > (server) > Backups card updates with a fresh timestamp and size.
- For WAL-G,
wal-g backup-liston the host shows the new base backup.