backup
We will use pg_basebackup to take a complete backup of the PostgreSQL data directory. A separate directory will be created for storing backups, preferably on a separate mount point. The ownership of this directory will be assigned to the postgres user to ensure proper permissions.
| sudo mkdir -p /db_backup/base_backup
sudo chown -R postgres:postgres /db_backup/ |

Log in as the postgres user and navigate to the db_backup directory.
| sudo su - postgres
cd /db_backup |
Create a script for the backup that streams the WAL files and deletes backups older than 4 days.
| nano backup_script.sh |
| #!/bin/bash
# Backup directory
BACKUP_DIR="/db_backup/base_backup"
# Get the current date for the backup directory name
DATE=$(date +"%Y-%m-%d_%H-%M-%S")
# Run pg_basebackup as the postgres user
pg_basebackup -D "$BACKUP_DIR/$DATE" -Ft -z -P -X stream
# Check if the backup was successful
if [ $? -eq 0 ]; then
    echo "Backup successful: $BACKUP_DIR/$DATE"
else
    echo "Backup failed!"
    exit 1
fi
# Delete backups older than 4 days (only top-level backup directories)
find "$BACKUP_DIR" -mindepth 1 -maxdepth 1 -type d -mtime +4 -exec rm -rf {} \;
echo "Old backups deleted (older than 4 days)." |
Change the permissions to make the script executable.
| chmod +x backup_script.sh |
Create another file to store the backup logs, which will keep track of the backup status.
| touch pg_backup.log | 
Test the script by running it once.
| ./backup_script.sh |
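A successful run creates a timestamped directory under /db_backup/base_backup containing base.tar.gz and pg_wal.tar.gz. As a rough restore sketch only, assuming the server is stopped and /var/lib/postgresql/data is your data directory (adjust the path and timestamp to your environment), the archives would be extracted like this:
| # Restore sketch only - the paths and timestamp below are examples
tar -xzf /db_backup/base_backup/2025-01-01_02-00-00/base.tar.gz -C /var/lib/postgresql/data
tar -xzf /db_backup/base_backup/2025-01-01_02-00-00/pg_wal.tar.gz -C /var/lib/postgresql/data/pg_wal |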

Schedule the backup using crontab to run every day at 2:00 AM. pg_basebackup is an online backup method and has less impact on PostgreSQL operations compared to pg_dump. However, it is still recommended to perform the backup during off-peak hours for minimal disruption.
| crontab -e | 
Paste the following into crontab.
| 0 2 * * * /db_backup/backup_script.sh >> /db_backup/pg_backup.log 2>&1 | 
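After the first scheduled run, you can confirm that the job executed by checking the log file and listing the newest backup directories, for example:
| tail -n 20 /db_backup/pg_backup.log
ls -lt /db_backup/base_backup |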

schedule vacuum and vacuum analyze
vacuum analyze
VACUUM ANALYZE is a critical tool that helps update statistics to assist the query planner in generating better execution plans. In PostgreSQL, when rows are deleted, they are not immediately removed from disk; they are only marked as deleted but continue to occupy storage. It’s advisable to update statistics regularly to ensure the query planner can select the best execution plan.
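To see which tables would benefit most, you can inspect dead-tuple counts and the last analyze times from pg_stat_user_tables. A quick check (your_database is a placeholder; replace it with an actual database name):
| psql -U postgres -d your_database -c "SELECT relname, n_live_tup, n_dead_tup, last_analyze, last_autoanalyze FROM pg_stat_user_tables ORDER BY n_dead_tup DESC LIMIT 10;" |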
For this, we will schedule vacuumdb --analyze to update statistics and ensure it runs daily during off-peak hours.
Switch to the postgres user, create a script, and assign executable permissions.
| sudo su - postgres
cd /db_backup
nano vacuum_all_databases.sh |
Add the following content to the file.
| #!/bin/bash
# Run VACUUM ANALYZE on all databases
vacuumdb --all --analyze |
Change the permissions to make the file executable.
| chmod +x vacuum_all_databases.sh | 
Create another file to keep a log of the scheduled task.
| touch vacuumdb.log | 
Test the script once.
| ./vacuum_all_databases.sh |
Schedule it to run during off-peak hours.
| crontab -e | 
The following entry will schedule the script to run every day at 3:00 AM.
| 0 3 * * * /db_backup/vacuum_all_databases.sh >> /db_backup/vacuumdb.log 2>&1 |
Note: vacuumdb will not work if the server is part of a cluster managed by Patroni and is in standby mode. Additionally, it will not work if executed on a server that is a standby in streaming replication. Therefore, it is advisable to schedule vacuumdb on the primary server only. Keep in mind that running VACUUM ANALYZE on the primary server will apply to all other nodes, so there is no need to run vacuumdb on every node in the Patroni cluster—just on the primary server.
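A quick way to confirm that you are on the primary before adding the cron entry is to check pg_is_in_recovery(); it returns f on the primary and t on a standby:
| psql -U postgres -t -c "SELECT pg_is_in_recovery();" |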
vacuum full
PostgreSQL’s VACUUM command must regularly process each table for several important reasons:
- To reclaim or reuse disk space occupied by updated or deleted rows.
- To refresh data statistics used by the PostgreSQL query planner for efficient query execution.
- To update the visibility map, which enhances the performance of index-only scans.
- To prevent the loss of very old data due to transaction ID wraparound or multixact ID wraparound.
 
Each of these reasons requires running VACUUM with varying frequency and scope, as detailed in the following sections.
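For the wraparound point above, you can monitor how close each database is to transaction ID wraparound; autovacuum forces an aggressive vacuum once the age approaches autovacuum_freeze_max_age (200 million by default). For example:
| psql -U postgres -c "SELECT datname, age(datfrozenxid) AS xid_age FROM pg_database ORDER BY age(datfrozenxid) DESC;" |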
We will implement vacuumdb --full, which will remove rows marked for deletion and reclaim disk space, effectively reducing database bloat. Note that vacuumdb --full requires an exclusive lock on the tables, so it is best to run it during off-peak hours.
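To gauge how much space a VACUUM FULL pass actually reclaims, you can record database sizes before and after the run, for example:
| psql -U postgres -c "SELECT datname, pg_size_pretty(pg_database_size(datname)) AS size FROM pg_database WHERE datistemplate = false ORDER BY pg_database_size(datname) DESC;" |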
I will schedule the vacuumdb --full operation to run at midnight on the weekends. 
Switch to the postgres user and create a script file, placing the file in the db_backup directory we created earlier, as this directory is owned by the postgres user.
| sudo su - postgres
cd /db_backup/
nano vacuum_full.sh |
Put the following content in the file.
| #!/bin/bash
# Run VACUUM FULL on all databases
vacuumdb --all --full |
Create another file to store logs for the scheduled VACUUM FULL job.
| touch vacuum_full.log | 
Change the permissions to make it executable.
| chmod +x vacuum_full.sh | 
Test the script once, but make sure to avoid running it during working hours. It’s recommended to test it during off-peak hours to minimize any potential impact on the system.
| ./vacuum_full.sh | 

Open crontab and add a schedule for VACUUM FULL.
| crontab -e | 
| 0 0 * * 6,7 /db_backup/vacuum_full.sh >> /db_backup/vacuum_full.log 2>&1 |
REINDEX
REINDEX rebuilds an index using the data from the associated table, replacing the old index. There are several situations where using REINDEX is necessary:
- An index has become corrupted and no longer contains valid data. Although this is rare, it can happen due to software bugs or hardware failures. REINDEX offers a recovery solution.
- An index has become “bloated,” meaning it contains many empty or nearly-empty pages. This can occur in B-tree indexes under certain access patterns. REINDEX helps reduce space consumption by rebuilding the index without the dead pages.
- A storage parameter, such as fillfactor, has been altered for an index, and you want to ensure the change takes effect.
- If an index build with the CONCURRENTLY option fails, the index is marked as “invalid.” These indexes are unusable but can be rebuilt using REINDEX.

REINDEX requires a lock on the object to execute, so it’s best to schedule it during off-peak hours to minimize disruption.
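For the “invalid index” case above, you can list any indexes left invalid by a failed CONCURRENTLY build with a catalog query such as the following (your_database is a placeholder; replace it with an actual database name):
| psql -U postgres -d your_database -c "SELECT indexrelid::regclass AS index_name, indrelid::regclass AS table_name FROM pg_index WHERE NOT indisvalid;" |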
For my instance, I will schedule it to run at 5:00 AM on Saturday, after the weekend VACUUM FULL job has finished.
Switch to the postgres user and create a script file, placing the file in the db_backup directory we created earlier, as this directory is owned by the postgres user.
| sudo su - postgres
cd /db_backup/
nano reindex.sh |
Put the following content in the file.
| #!/bin/bash
# Loop through all databases and run REINDEX on each
for db in $(psql -U postgres -t -c "SELECT datname FROM pg_database WHERE datistemplate = false;")
do
    echo "Reindexing database: $db"
    psql -U postgres -d "$db" -c "REINDEX DATABASE \"$db\";"
done |
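As a simpler alternative to the loop above, PostgreSQL also ships a reindexdb client utility; a minimal sketch of the same script using it would be:
| #!/bin/bash
# Alternative: rebuild indexes in all databases with the reindexdb utility
reindexdb --all |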
Create another file to store logs for the scheduled REINDEX job.
| touch reindex_all.log | 
Change the permissions to make it executable.
| chmod +x reindex.sh | 
Test the script once, but make sure to avoid running it during working hours. It’s recommended to test it during off-peak hours to minimize any potential impact on the system.
| ./reindex.sh | 

Open crontab and add a schedule for REINDEX.
| crontab -e | 
| 0 5 * * 6 /db_backup/reindex.sh >> /db_backup/reindex_all.log 2>&1 |
