This tutorial will guide you through creating a fault-tolerant, scalable, and highly available PostgreSQL cluster from scratch using Ansible automation on Ubuntu 22.04. The guide also applies if you are running an older release such as Ubuntu 18.04 or Ubuntu 20.04 in your environment.
By the end of this guide, you'll have a robust PostgreSQL cluster ready for your production use.
The PostgreSQL HA cluster comprises the following components:
- Patroni is a cluster manager used to customize and automate deployment and maintenance of PostgreSQL HA (High Availability) clusters.
- etcd is a strongly consistent, distributed key-value store that provides a reliable way to store data that needs to be accessed by a distributed system or cluster of machines. We will use etcd to store the state of the PostgreSQL cluster.
- PgBouncer is an open-source, lightweight, single-binary connection pooler for PostgreSQL. PgBouncer maintains a pool of connections for each unique user/database pair. It is typically configured to hand out one of these connections to a new incoming client connection, and to return it to the pool when the client disconnects.
- HAProxy is free, open-source software that provides a high-availability load balancer and reverse proxy for TCP and HTTP-based applications, spreading requests across multiple servers.
- Keepalived implements a set of health checkers to dynamically and adaptively maintain and manage load balanced server pools according to their health. When designing load balanced topologies, it is important to account for the availability of the load balancer itself as well as the real servers behind it.
The purpose of this guide is to automate the entire setup process with Ansible and to make the PostgreSQL database highly available, avoiding any single point of failure.
PostgreSQL Cluster Hardware Requirements
Patroni/Postgres system requirements:
2 Core CPU
4 GB RAM
80 GB HDD
etcd system requirements:
2 Core CPU
4 GB RAM
80 GB SSD
PgBouncer system requirements:
2 Core CPU
4 GB RAM
20 GB HDD
PgBouncer is just a connection pooler for the Postgres database, and it doesn't require much disk space.
HAProxy system requirements:
2 Core CPU
4 GB RAM
80 GB HDD
The above HAProxy system requirements are given based on an average of 1000 connections per second. For a larger number of connections, memory and CPU have to be increased in a 1:2 ratio (memory in GB = 2 x number of CPU cores); for example, 4 cores call for 8 GB of RAM. The CPU cores should be as fast as possible for better HAProxy performance.
Number of nodes requirements
It is recommended to have an odd number of nodes in a Patroni cluster. An odd-size cluster tolerates the same number of failures as an even-size cluster with one extra node, but with fewer machines. The difference can be seen by comparing even- and odd-sized clusters:
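With quorum requiring a majority of members (floor(n/2) + 1), the fault tolerance works out as follows:
Cluster size | Majority | Failures tolerated
3            | 2        | 1
4            | 3        | 1
5            | 3        | 2
6            | 4        | 2
7            | 4        | 3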
The same odd-number-of-nodes recommendation applies to the etcd cluster as well. A 3-node cluster is a safe minimum, while a 5-node cluster offers even better fault tolerance.
Production environment
You can use one of the following recommended setups to build a highly available Patroni cluster for your production use.
Solution 1:
- Take 3 nodes for the Patroni cluster - install (postgres, patroni, pgbouncer, haproxy, keepalived) on these three machines.
- Take 3 nodes for the etcd cluster - install etcd on these three machines.
This results in a 6-node Patroni cluster, which you can scale up in the future by adding more nodes to your Patroni and etcd clusters to meet your production needs.
Solution 2:
- Take 3 nodes for the Patroni cluster - install (patroni+postgres) on these three machines.
- Take 3 nodes for the etcd cluster - install only (etcd) on these three machines.
- Take 2 nodes for HAProxy - install (haproxy+keepalived+pgbouncer) on these two machines.
This results in an 8-node Patroni cluster, which you can scale up in the future by adding more nodes to your Patroni and etcd clusters to meet your production needs.
Basic environment for PostgreSQL HA cluster with Patroni
For the sake of this guide, we will go with the minimum requirement and set up a 3-node Patroni cluster on three separate virtual machines.
We will install (PostgreSQL, Patroni, etcd, pgbouncer, haproxy, and keepalived) on each of these three nodes.
Prepare Your Ansible Control Node
First, add all cluster nodes to the /etc/hosts file on your Ansible control node:
sudo nano /etc/hosts
Add the following entries, replacing the sample IP addresses and hostnames with yours. Save and close the editor when you are finished.
127.0.0.1 localhost
192.168.0.10 pg_node1 pgnode1
192.168.0.11 pg_node2 pgnode2
192.168.0.12 pg_node3 pgnode3
192.168.0.200 shared_ip haproxy
Install Ansible
sudo apt update
sudo apt install -y ansible jq
Set Up SSH Keys
ssh-keygen
Follow the prompts to generate SSH keys. Then copy your public key to each node:
ssh-copy-id sshuser@pg_node1
ssh-copy-id sshuser@pg_node2
ssh-copy-id sshuser@pg_node3
Verify that you can log in to each node without a password, both by IP address and by hostname:
ssh 192.168.0.10
ssh 192.168.0.11
ssh 192.168.0.12
ssh pg_node1
ssh pg_node2
ssh pg_node3
Create Ansible Directory Structure
mkdir postgres_ansible
cd postgres_ansible
mkdir -p roles/{common,postgres,etcd,patroni,haproxy,pgbouncer,keepalived}
mkdir -p roles/common/tasks
mkdir -p roles/postgres/tasks
mkdir -p roles/etcd/tasks
mkdir -p roles/patroni/tasks
mkdir -p roles/haproxy/tasks
mkdir -p roles/pgbouncer/tasks
mkdir -p roles/pgbouncer/files
mkdir -p roles/keepalived/tasks
Create the ansible.cfg file:
nano ansible.cfg
Add the following content:
[defaults]
inventory = hosts
Save and close the editor when you are finished.
Populate the hosts file with the required variables like below. Make sure you replace the sample values with yours, and save and close the editor when you are finished:
nano hosts
[dbservers]
pg_node1 ansible_host=192.168.0.10
pg_node2 ansible_host=192.168.0.11
pg_node3 ansible_host=192.168.0.12
[dbservers:vars]
ansible_user=administrator
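With the inventory in place, run a quick connectivity check to confirm Ansible can reach all three nodes:
ansible dbservers -m ping
Each node should reply with "pong".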
Create Ansible Common task
nano roles/common/tasks/main.yml
Add the following set of instructions:
- name: Enable NOPASSWD for sudo group
  become_user: root
  lineinfile:
    path: /etc/sudoers
    regexp: '^%sudo\s+ALL=\(ALL:ALL\) ALL$'
    line: '%sudo ALL=(ALL) NOPASSWD: ALL'

- name: Ensure UFW is installed
  apt:
    name: ufw
    state: present

- name: Allow SSH connections
  ufw:
    rule: allow
    port: 22
    comment: "Allow SSH"

- name: Allow PostgreSQL connections
  ufw:
    rule: allow
    port: 5432
    comment: "Allow PostgreSQL"

- name: Allow pgbouncer connections
  ufw:
    rule: allow
    port: 6432
    comment: "Allow pgbouncer"

- name: Allow patroni RESTAPI
  ufw:
    rule: allow
    port: 8008
    comment: "Allow patroni RESTAPI"

- name: Allow etcd connections
  ufw:
    rule: allow
    port: 2379
    comment: "Allow etcd"

- name: Allow etcd peer urls connections
  ufw:
    rule: allow
    port: 2380
    comment: "Allow etcd peer urls"

- name: HAProxy listening port for master database node
  ufw:
    rule: allow
    port: 5000
    comment: "Allow HAProxy connectivity to master database node"

- name: HAProxy listening port for replica database nodes
  ufw:
    rule: allow
    port: 5001
    comment: "Allow HAProxy connectivity to replica database nodes"

- name: HAProxy HTTP
  ufw:
    rule: allow
    port: 7000
    comment: "Allow HAProxy HTTP to access stats dashboard"

- name: Keepalived VRRP multicast
  ufw:
    rule: allow
    src: 0.0.0.0/0
    dest: 224.0.0.18
    comment: "Allow Keepalived VRRP"

- name: Enable UFW
  ufw:
    state: enabled

- name: Set timezone
  community.general.timezone:
    name: Asia/Karachi

- name: Copy /etc/hosts file
  ansible.builtin.copy:
    src: /etc/hosts
    dest: /etc/hosts
    owner: root
    group: root
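Once the playbook has run (see Run Ansible Playbook below), you can spot-check the firewall rules on all nodes with a single ad-hoc command:
ansible dbservers -b -m shell -a "ufw status verbose"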
Create Ansible PostgreSQL task
nano roles/postgres/tasks/main.yml
Add the following set of instructions:
- name: Update cache
  apt:
    update_cache: yes

- name: Check if file exists
  stat:
    path: /etc/apt/trusted.gpg.d/ACCC4CF8.gpg
  register: result

- name: Import PostgreSQL GPG key
  ansible.builtin.shell:
    cmd: "curl -fsSL https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo gpg --dearmor -o /etc/apt/trusted.gpg.d/ACCC4CF8.gpg"
  args:
    executable: /bin/bash
  when: not result.stat.exists

- name: Add PostgreSQL repository
  ansible.builtin.shell:
    cmd: |
      echo "deb [arch=amd64 signed-by=/etc/apt/trusted.gpg.d/ACCC4CF8.gpg] https://apt.postgresql.org/pub/repos/apt {{ ansible_distribution_release }}-pgdg main" | sudo tee /etc/apt/sources.list.d/pgdg.list
  args:
    executable: /bin/bash
  when: not result.stat.exists

- name: Update cache
  ansible.builtin.apt:
    update_cache: yes

- name: Install prerequisites
  ansible.builtin.apt:
    name:
      - wget
      - ca-certificates
      - curl
      - lsb-release
      - jq
    state: present
    update_cache: yes

- name: Install PostgreSQL Version 14
  ansible.builtin.apt:
    name: postgresql-14
    state: present
    update_cache: yes

- name: Stop PostgreSQL service
  ansible.builtin.systemd:
    name: postgresql
    state: stopped
    enabled: no

- name: Check if source directory exists
  ansible.builtin.stat:
    path: "/usr/lib/postgresql/14/bin"
  register: src_directory

- name: Create symlinks to PostgreSQL binaries
  ansible.builtin.shell:
    cmd: "ln -s /usr/lib/postgresql/14/bin/* /usr/sbin/"
    executable: /bin/bash
  when: src_directory.stat.exists

- name: Remove everything from PostgreSQL data directory
  ansible.builtin.shell:
    cmd: "rm -rf /var/lib/postgresql/14/main/*"
    executable: /bin/bash

- name: Load watchdog soft module on boot
  copy:
    content: |
      #!/bin/sh -e
      modprobe softdog
      chown -R postgres:postgres /dev/watchdog
    dest: /etc/rc.local
    mode: '0755'

- name: Load watchdog soft module
  ansible.builtin.command:
    cmd: modprobe softdog

- name: Set ownership for /dev/watchdog
  ansible.builtin.file:
    path: /dev/watchdog
    owner: postgres
    group: postgres
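Once the playbook has run, you can confirm the software watchdog is loaded and owned by postgres on every node (Patroni's watchdog mode, configured later, depends on this):
ansible dbservers -b -m shell -a "lsmod | grep softdog && ls -l /dev/watchdog"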
Create Ansible etcd task
nano roles/etcd/tasks/main.yml
Add the following set of instructions:
- name: Install etcd
  ansible.builtin.apt:
    name: etcd
    state: present
    update_cache: yes

- name: Stop etcd service
  ansible.builtin.systemd:
    name: etcd
    state: stopped
    enabled: no

- name: Remove etcd data directory
  ansible.builtin.command:
    cmd: "rm -rf /var/lib/etcd/default"

- name: Get enp0s3 IPv4 address
  set_fact:
    enp0s3_ipv4_address: "{{ ansible_facts['enp0s3']['ipv4']['address'] }}"

- name: Create etcd configuration file
  ansible.builtin.blockinfile:
    path: /etc/default/etcd
    block: |
      ETCD_NAME="{{ inventory_hostname }}"
      ETCD_DATA_DIR="/var/lib/etcd/{{ inventory_hostname }}"
      ETCD_LISTEN_PEER_URLS="http://{{ enp0s3_ipv4_address }}:2380"
      ETCD_LISTEN_CLIENT_URLS="http://{{ enp0s3_ipv4_address }}:2379,http://127.0.0.1:2379"
      ETCD_INITIAL_ADVERTISE_PEER_URLS="http://{{ enp0s3_ipv4_address }}:2380"
      ETCD_INITIAL_CLUSTER="{% for host in groups['dbservers'] %}{{ hostvars[host]['inventory_hostname'] }}=http://{{ hostvars[host]['enp0s3_ipv4_address'] }}:2380{% if not loop.last %},{% endif %}{% endfor %}"
      ETCD_INITIAL_CLUSTER_STATE="new"
      ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"
      ETCD_ADVERTISE_CLIENT_URLS="http://{{ enp0s3_ipv4_address }}:2379,http://127.0.0.1:2379"
      ETCD_ENABLE_V2="true"

- name: Add etcd environment variables
  ansible.builtin.lineinfile:
    path: .profile
    line: "{{ item }}"
    insertafter: EOF
  loop:
    - 'export PGDATA="/var/lib/postgresql/14/main"'
    - 'export ETCDCTL_API="3"'
    - 'export PATRONI_ETCD_URL="http://127.0.0.1:2379"'
    - 'export PATRONI_SCOPE="pg_cluster"'
    - "ENDPOINTS={% for host in groups['dbservers'] %}{{ hostvars[host]['inventory_hostname'] }}:2379{% if not loop.last %},{% endif %}{% endfor %}"

- name: Start etcd
  ansible.builtin.systemd:
    name: etcd
    state: started
Note that this guide assumes each node has two network interfaces:
- enp0s3 for the intranet with a static IP address
- enp0s8 for the internet with a dynamic default IP address
If your interface is named differently, adjust the enp0s3 references in the tasks above accordingly.
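Once the playbook has run, you can verify etcd cluster health from any node. The ENDPOINTS variable is set in .profile by the task above, so it is available in a fresh login shell:
etcdctl --endpoints=$ENDPOINTS endpoint health
etcdctl --endpoints=$ENDPOINTS member list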
Create Ansible Patroni task
nano roles/patroni/tasks/main.yml
Add the following set of instructions. Make sure you replace the sample values (usernames, passwords) with yours, and save and close the editor when you are finished:
- name: Install patroni prerequisites
  ansible.builtin.apt:
    name:
      - python3
      - python3-pip
      - python3-dev
      - libpq-dev
    state: present

- name: Install python modules
  ansible.builtin.pip:
    name:
      - launchpadlib
      - setuptools
      - psycopg2
    extra_args: --upgrade

- name: Install patroni
  ansible.builtin.apt:
    name:
      - python3-etcd
      - patroni
    state: present

- name: Stop Patroni service
  ansible.builtin.systemd:
    name: patroni
    state: stopped
    enabled: no

- name: Create /etc/patroni directory
  ansible.builtin.file:
    path: /etc/patroni
    state: directory

- name: Check if /etc/patroni/config.yml exists
  ansible.builtin.stat:
    path: /etc/patroni/config.yml
  register: config_file

- name: Add patroni configuration in config.yml
  ansible.builtin.copy:
    content: |
      scope: pg_cluster
      namespace: /service/
      name: "{{ inventory_hostname }}"

      restapi:
        listen: "{{ inventory_hostname }}:8008"
        connect_address: "{{ inventory_hostname }}:8008"

      etcd:
        hosts: "{% for host in groups['dbservers'] %}{{ hostvars[host]['inventory_hostname'] }}:2379{% if not loop.last %},{% endif %}{% endfor %}"

      bootstrap:
        dcs:
          ttl: 30
          loop_wait: 10
          retry_timeout: 10
          maximum_lag_on_failover: 1048576
          postgresql:
            use_pg_rewind: true
            use_slots: true
            parameters:
        initdb:
          - encoding: UTF8
          - data-checksums
        pg_hba:
          - host replication replicator 127.0.0.1/32 md5
          - host replication replicator 0.0.0.0/0 md5
          - host all all 0.0.0.0/0 md5
        users:
          admin:
            password: admin
            options:
              - createrole
              - createdb

      postgresql:
        listen: "{{ inventory_hostname }}:5432"
        connect_address: "{{ inventory_hostname }}:5432"
        data_dir: /var/lib/postgresql/14/main
        bin_dir: /usr/lib/postgresql/14/bin
        pgpass: /tmp/pgpass
        authentication:
          replication:
            username: replicator
            password: replicator
          superuser:
            username: postgres
            password: postgres

      watchdog:
        mode: required
        device: /dev/watchdog
        safety_margin: 5

      tags:
        nofailover: false
        noloadbalance: false
        clonefrom: false
        nosync: false
    dest: /etc/patroni/config.yml
  when: not config_file.stat.exists

- name: Start Patroni
  systemd:
    name: patroni
    state: started

- name: Waiting for patroni leader node to take charge
  ansible.builtin.pause:
    seconds: 30
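Once the playbook has run, each node's Patroni REST API reports the node's state as JSON, which you can inspect with curl (jq was installed as a prerequisite):
curl -s http://pg_node1:8008/ | jq .
The /master and /replica endpoints of this API return HTTP 200 only on the node holding the matching role, which is exactly what HAProxy's health checks rely on.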
Create Ansible keepalived task
nano roles/keepalived/tasks/main.yml
Add the following set of instructions. Make sure you replace the sample values (interface and node names) with yours, and save and close the editor when you are finished:
- name: Install keepalived
  ansible.builtin.apt:
    name: keepalived
    state: present

- name: Stop keepalived
  ansible.builtin.systemd:
    name: keepalived
    state: stopped
    enabled: no

- name: Add lines to sysctl.conf
  ansible.builtin.lineinfile:
    path: /etc/sysctl.conf
    line: "{{ item }}"
    insertafter: EOF
  loop:
    - 'net.ipv4.ip_nonlocal_bind = 1'
    - 'net.ipv4.ip_forward = 1'

- name: Apply sysctl changes
  ansible.builtin.command:
    cmd: 'sysctl --system'

- name: Reload sysctl configuration
  ansible.builtin.command:
    cmd: 'sysctl -p'

- name: Create keepalived.conf file if it doesn't exist
  ansible.builtin.file:
    path: /etc/keepalived/keepalived.conf
    state: touch

- name: Configure keepalived
  ansible.builtin.blockinfile:
    path: /etc/keepalived/keepalived.conf
    block: |
      global_defs {
        router_id LVS_1
        script_user root
        enable_script_security
      }
      vrrp_script chk_haproxy {
        script "/usr/bin/killall -0 haproxy"
        interval 2
        weight 2
        fall 2
        rise 1
      }
      vrrp_script chk_lb {
        script "/usr/bin/killall -0 keepalived"
        interval 1
        weight 6
        fall 2
        rise 1
      }
      vrrp_instance vrrp_1 {
        interface enp0s3
      {% if inventory_hostname == 'pg_node3' %}
        state BACKUP
        priority 99
      {% elif inventory_hostname == 'pg_node2' %}
        state BACKUP
        priority 100
      {% else %}
        state MASTER
        priority 101
      {% endif %}
        virtual_router_id 51
        virtual_ipaddress {
          {{ shared_ip }}
        }
        track_script {
          chk_haproxy
          chk_lb
        }
      }

- name: Start keepalived
  ansible.builtin.systemd:
    name: keepalived
    state: started
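Once the playbook has run, the shared virtual IP should be active on the MASTER node (pg_node1, given the priorities above). You can confirm where it currently lives with:
ip addr show enp0s3
On the node holding the VIP, the output will list 192.168.0.200 as an additional address on the interface.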
Create Ansible HAProxy task
nano roles/haproxy/tasks/main.yml
Add the following set of instructions. Make sure you replace the sample values with yours, and save and close the editor when you are finished:
- name: Install HAProxy
  ansible.builtin.apt:
    name: haproxy
    state: present

- name: Stop haproxy service
  ansible.builtin.systemd:
    name: haproxy
    state: stopped
    enabled: no

- name: Backup haproxy.cfg
  ansible.builtin.command:
    cmd: "mv /etc/haproxy/haproxy.cfg /etc/haproxy/haproxy.cfg.orig"
  register: backup_result
  ignore_errors: true

- name: Get enp0s3 IPv4 address
  set_fact:
    enp0s3_ipv4_address: "{{ ansible_facts['enp0s3']['ipv4']['address'] }}"

- name: Add configuration in haproxy.cfg file
  ansible.builtin.copy:
    content: |
      global
          log 127.0.0.1 local2
          log /dev/log local0
          log /dev/log local1 notice
          chroot /var/lib/haproxy
          stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
          stats timeout 30s
          user haproxy
          group haproxy
          maxconn 4000
          daemon

      defaults
          mode tcp
          log global
          option tcplog
          retries 3
          timeout queue 1m
          timeout connect 10s
          timeout client 1m
          timeout server 1m
          timeout check 10s
          maxconn 3000

      listen stats
          mode http
          bind *:7000
          stats enable
          stats uri /

      listen primary
          bind {{ shared_ip }}:5000
          option httpchk OPTIONS /master
          http-check expect status 200
          default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
      {% for host in groups['dbservers'] %}
          server {{ host }} {{ hostvars[host]['enp0s3_ipv4_address'] }}:6432 maxconn 100 check port 8008
      {% endfor %}

      listen standby
          bind {{ shared_ip }}:5001
          balance roundrobin
          option httpchk OPTIONS /replica
          http-check expect status 200
          default-server inter 3s fall 3 rise 2 on-marked-down shutdown-sessions
      {% for host in groups['dbservers'] %}
          server {{ host }} {{ hostvars[host]['enp0s3_ipv4_address'] }}:6432 maxconn 100 check port 8008
      {% endfor %}
    dest: /etc/haproxy/haproxy.cfg

- name: Start haproxy
  ansible.builtin.systemd:
    name: haproxy
    state: started
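Once the playbook has run, you can ask HAProxy to validate the generated configuration before trusting it with traffic:
sudo haproxy -c -f /etc/haproxy/haproxy.cfg
It should report that the configuration is valid on every node.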
Create Ansible PgBouncer task
nano roles/pgbouncer/tasks/main.yml
Add the following set of instructions:
- name: Install pgbouncer
  ansible.builtin.apt:
    name: pgbouncer
    state: present

- name: Stop pgbouncer service
  ansible.builtin.systemd:
    name: pgbouncer
    state: stopped
    enabled: no

- name: Change permissions
  ansible.builtin.file:
    path: /etc/pgbouncer/
    mode: '0777'
    recurse: yes

- name: Check patroni master node
  ansible.builtin.shell: patronictl -d etcd://{{ inventory_hostname }}:2379 list pg_cluster --format json | jq -r '.[] | select(.State == "running") | select(.Role == "Leader") | .Host'
  register: master_node
  changed_when: false

- name: Execute SQL command
  ansible.builtin.shell: |
    echo "ALTER SYSTEM SET password_encryption = 'md5'; SELECT pg_reload_conf();" | psql -U postgres
  environment:
    PGPASSWORD: "{{ postgres_pass }}"
  when: inventory_hostname in master_node.stdout

- name: Copy pgbouncer.sql to master node
  ansible.builtin.copy:
    src: files/pgbouncer.sql
    dest: /tmp/pgbouncer.sql
  when: inventory_hostname in master_node.stdout

- name: Execute pgbouncer.sql on master node
  ansible.builtin.shell: psql -Atq -U postgres -f /tmp/pgbouncer.sql
  environment:
    PGPASSWORD: "{{ postgres_pass }}"
  when: inventory_hostname in master_node.stdout

- name: Copy userlist.sql to each node
  ansible.builtin.copy:
    src: files/userlist.sql
    dest: /tmp/userlist.sql

- name: Execute userlist.sql
  ansible.builtin.shell: psql -Atq -U postgres -f /tmp/userlist.sql
  environment:
    PGPASSWORD: "{{ postgres_pass }}"

- name: Add connection settings in pgbouncer.ini
  ansible.builtin.replace:
    path: /etc/pgbouncer/pgbouncer.ini
    regexp: '^;foodb =.*$'
    replace: '* = host={{ inventory_hostname }} port=5432 dbname=postgres'

- name: Update pgbouncer.ini
  ansible.builtin.replace:
    path: /etc/pgbouncer/pgbouncer.ini
    regexp: 'listen_addr = localhost'
    replace: 'listen_addr = *'

- name: Add auth_user and auth_query to pgbouncer.ini
  ansible.builtin.blockinfile:
    path: /etc/pgbouncer/pgbouncer.ini
    block: |
      auth_user = pgbouncer
      auth_query = SELECT p_user, p_password FROM public.lookup($1)
    insertafter: "auth_file = /etc/pgbouncer/userlist.txt"

- name: Change owner and group to postgres
  ansible.builtin.file:
    path: /etc/pgbouncer/
    owner: postgres
    group: postgres
    recurse: yes

- name: Change permissions to 0644
  ansible.builtin.file:
    path: /etc/pgbouncer/
    mode: '0644'
    recurse: yes

- name: Start pgbouncer
  ansible.builtin.shell: pgbouncer -d -u postgres /etc/pgbouncer/pgbouncer.ini
This process requires us to configure PgBouncer with a list of valid credentials. This can be done in a number of ways, such as providing a static list of usernames and passwords, or configuring PgBouncer to query a database for the current valid credentials.
If user credentials change frequently, constantly updating the user list can become tedious. In such cases, it’s more efficient to use an “authentication user” who can connect to the database and retrieve the password.
To ensure the database password isn’t visible to everyone, we restrict password access to this authentication user only.
The pg_shadow catalog is accessible solely to superusers. Therefore, we create a SECURITY DEFINER function to grant pgbouncer access to the passwords.
Create PgBouncer SECURITY DEFINER Function
nano roles/pgbouncer/files/pgbouncer.sql
Add the following content:
CREATE ROLE pgbouncer LOGIN ENCRYPTED PASSWORD 'pgbouncer' VALID UNTIL 'infinity';
CREATE FUNCTION public.lookup (
INOUT p_user name,
OUT p_password text
) RETURNS record
LANGUAGE sql SECURITY DEFINER SET search_path = pg_catalog AS
$$SELECT usename, passwd FROM pg_shadow WHERE usename = p_user$$;
Save and close the editor when you are finished.
Since the auth_query connection will be established with the target database, you need to add the function to each database you wish to access through pgbouncer.
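After the playbook has run, you can sanity-check the function from any node; it should return the given user name along with its MD5 password hash:
psql -h pg_node1 -p 5432 -U postgres -d postgres -c "SELECT * FROM public.lookup('postgres');"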
nano roles/pgbouncer/files/userlist.sql
Add the following content:
\o /etc/pgbouncer/userlist.txt
SELECT concat('"', usename, '" "', passwd, '"') FROM pg_shadow WHERE usename = 'pgbouncer';
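After the playbook has run, you can confirm that PgBouncer accepts connections and resolves passwords through the auth_query by connecting to port 6432 on any node:
psql -h pg_node1 -p 6432 -U postgres -d postgres -c "SELECT 1;"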
Encrypt Ansible Vault
To encrypt a variable with Ansible Vault, use the following command:
ansible-vault encrypt_string 'your_postgres_password' --name 'postgres_pass'
This prints an encrypted !vault block that you will paste into the playbook below.
Create an Ansible Playbook
nano setup.yml
---
- hosts: dbservers
  become: true
  vars:
    shared_ip: 192.168.0.200
    postgres_pass: !vault |
      $ANSIBLE_VAULT;1.1;AES256
      363837386436306430393639656130616635313765393265643665346139326234323763396132363933353938626663326137663430626566333964633331360a336338623538613133343836323636356238643961656363646266313234373239316363333332643165633137663135383161323363333266623962316634370a3838636536383230633332323866643765343738666362623130343962306136
  roles:
    - common
    - postgres
    - etcd
    - patroni
    - keepalived
    - haproxy
    - pgbouncer
Check Ansible Playbooks Syntax
It is recommended to check for syntax errors in Ansible playbooks so that you can fix issues beforehand:
ansible-playbook --syntax-check setup.yml
Run Ansible Playbook
ansible-playbook -K --ask-vault-pass -v setup.yml
Explanation:
- The become password (-K) is the sudo password of your SSH user on the dbservers, needed to execute tasks with root privileges.
- The next prompt will ask you to enter the Ansible Vault password so that setup.yml can decrypt the postgres_pass variable.
Rerun Ansible Playbook
If the Ansible playbook fails at any task for any reason, stop there, fix the issue, and rerun the playbook starting from the failed task:
ansible-playbook --start-at-task "Your failed task name" --ask-vault-pass -v setup.yml
Test PostgreSQL HA Cluster
Create a ~/.pgpass file so that psql can authenticate without prompting for a password:
nano ~/.pgpass
shared_ip:5001:postgres:TypePostgresPassHere:postgres
shared_ip:5000:postgres:TypePostgresPassHere:postgres
Save and close the editor when you are finished, then restrict the file's permissions (libpq ignores a world-readable .pgpass):
chmod 600 ~/.pgpass
Now connect to the cluster through HAProxy:
psql -h shared_ip -p 5000 -U postgres -t
Verify PostgreSQL HA Cluster Load Balancing
psql -h shared_ip -p 5001 -U postgres -t -c "select inet_server_addr()"
This should return the IP address of one of your replica nodes. Run the same command again:
psql -h shared_ip -p 5001 -U postgres -t -c "select inet_server_addr()"
This should return your other replica node's IP address, confirming that reads are balanced across the replicas. Now query the primary endpoint:
psql -h shared_ip -p 5000 -U postgres -t -c "select inet_server_addr()"
This should return your master database node's IP address.
HAProxy Stats Dashboard
You can access the HAProxy stats dashboard by typing http://shared_ip:7000/ in your browser's address bar. Please note that, in this particular scenario, it just so happens that the second node in the cluster was promoted to leader. This might not always be the case, and it is equally likely that the third node may be promoted to leader.
Verify PostgreSQL Database Replication
Create a test database to see if it is replicated to the other nodes in the cluster. For this guide, we will use psql to connect to the database via HAProxy like below:
psql -h shared_ip -p 5000 -U postgres -t
Create a test database like below:
create database testdb;
create user testuser with encrypted password 'mystrongpass';
grant all privileges on database testdb to testuser;
\q
Note that unquoted identifiers are folded to lowercase in PostgreSQL, so the database is created as testdb. Connect to testdb via shared_ip:5001 to see if it is replicated to your replica nodes:
psql -h shared_ip -p 5001 -U testuser -d testdb
This will connect you to testdb on a replica node, which means database replication is working as expected.
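You can also check replication health directly on the primary; every replica should show up with the state "streaming":
psql -h shared_ip -p 5000 -U postgres -c "SELECT client_addr, state, sync_state FROM pg_stat_replication;"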
Verify Patroni Failover
patronictl -d etcd://127.0.0.1:2379 list pg_cluster
patronictl -d etcd://127.0.0.1:2379 history pg_cluster
patronictl -d etcd://127.0.0.1:2379 failover pg_cluster
I have promoted pg_node1 as the leader node in the cluster.
Disable Patroni Auto Failover
In some cases it is necessary to perform maintenance tasks on a single node, such as applying patches or release updates. When you manually disable auto failover, Patroni won't change the state of the PostgreSQL HA cluster.
You can disable auto failover with below command:
patronictl -d etcd://127.0.0.1:2379 pause pg_cluster
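Once your maintenance is complete, re-enable auto failover so that Patroni resumes managing the cluster:
patronictl -d etcd://127.0.0.1:2379 resume pg_cluster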
Patroni Switchover
There are two ways to run a switchover: either in scheduled mode or immediately. At the given time, the switchover will take place, and you will see an entry of the switchover activity in the log file.
patronictl -d etcd://127.0.0.1:2379 switchover pg_cluster --master your_leader_node --candidate your_replica_node
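For example, to hand leadership from pg_node1 to pg_node2 at a quiet time rather than immediately, you can pass a timestamp via --scheduled:
patronictl -d etcd://127.0.0.1:2379 switchover pg_cluster --master pg_node1 --candidate pg_node2 --scheduled "2024-01-15T02:00"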
Simulate Patroni Cluster Failure Scenario
To simulate failure scenarios in a production-like environment, we will execute continuous reads and writes to the database using a simple Python script, since we are interested in observing the state of the cluster upon a server failure.
You should have a workstation (physical or virtual) with any of your favorite Linux distribution (Ubuntu or Debian) installed.
Type the following command to install the PostgreSQL client and the Python driver on your Linux workstation:
sudo apt -y install postgresql-client python3-psycopg2
cd ~
git clone https://github.com/manwerjalil/pgscripts.git
chmod +x ~/pgscripts/pgsqlhatest.py
Open the script and update the connection details:
nano ~/pgscripts/pgsqlhatest.py
# CONNECTION DETAILS
host = "shared_ip"
dbname = "postgres"
user = "postgres"
password = "Type_postgres_password_Here"
We will use a tmux terminal with multiple panes on a single screen to monitor the state of the Patroni cluster in real time. First, create the test table that the script writes to:
psql -h shared_ip -p 5000 -U postgres -t -c "CREATE TABLE PGSQLHATEST (TM TIMESTAMP);"
psql -h shared_ip -p 5000 -U postgres -t -c "CREATE UNIQUE INDEX idx_pgsqlhatest ON pgsqlhatest (tm desc);"
On the right side of the screen, we run the Python script from our workstation, sending writes through port 5000 and reads through port 5001:
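A minimal way to launch it, assuming the script needs no arguments beyond the connection details edited above (check the script itself for any options it supports):
python3 ~/pgscripts/pgsqlhatest.py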
To simulate a node failure, stop the Patroni service on the current leader node (pg_node1 in our case):
sudo systemctl stop patroni
Watch the cluster elect a new leader in the patronictl list output, then bring the node back:
sudo systemctl start patroni
PostgreSQL HA Cluster Environment Failure Scenarios
- Loss of network connectivity
- Power breakdown
When simulating these tests, you should continuously monitor how the patroni cluster re-adjusts itself and how it affects read and write traffic for each failure scenario.