Comparison Azure PostgreSQL (Single Server, Flexible Server, Hyperscale(Citus)

This post has been republished via RSS; it originally appeared at: Microsoft Tech Community - Latest Blogs - .

Pre-requisites

This article expects that the viewer is aware of various Azure Data Store options and is considering to leverage the Azure PostgreSQL as a PAAS service. If you are still in the deciding phase whether to deploy PostgreSQL in a VM or as a service, I'd recommend you read this article.

 

Overview

Azure PostgreSQL(PAAS) as a service provides various options for deploying/migrating your database. However the choice between Single/Flexible/Hyperscale is critical prior to execution and should be made wisely. It determines the compatibility of your application after deployment/migration and crucial to meet your condition of success. At a broad level it is governed by your use case and feature each of the options in PAAS mode that are available. Azure PostgreSQL PAAS currently offers the following modes of spinning PostgreSQL databases as a service.

This article is written with the intent to help you with this choice. However please note Azure is an ever evolving world and we highly recommend to reference any hyperlinks provided (especially the Important Check) to be updated with the latest additions or features

 

Single Server Flexible Server Hyperscale (Citus)
Implementation    
Supported versions: 10, and 11 Supported versions: 11, 12, 13 Supported versions: 11, preview: 12, 13 

The architecture separates compute and storage.

 

The architecture separates compute and storage.

 

Citus is an open source extension to Postgres that distributes data and queries across multiple nodes in a cluster. The architecture separates data and its underlying compute in a cluster-oriented approach. The receiving of queries from application is handled by one special node called the coordinator.

 

Compute resources are provided as vCores, which represent the logical CPU of the underlying hardware.

The storage size for provisioning refers to the capacity available to the coordinator and worker nodes in your Hyperscale (Citus) server group.

The database engine runs on a proprietary compute container(windows), while data files reside on Azure storage.

 

The storage maintains three locally redundant synchronous copies of the database files ensuring data durability.

The database engine runs on a container inside a Linux virtual machine, while data files reside on Azure storage.

 

The storage maintains three locally redundant synchronous copies of the database files ensuring data durability.

 

If zone redundant high availability is configured, the service provisions and maintains a warm standby server across availability zone within the same Azure region. The data changes on the source server is synchronously replicated to the standby server to ensure zero data loss. (Standby replica cannot be used for read queries)

Citus clusters(Single Node) are spun up on Linux machines running RedHat OS internally. This is transparent to the user as they only work with server groups

 

Commodity database servers (called nodes/workers) coordinate with one another in a “shared nothing” architecture.  The nodes form a cluster that allows PostgreSQL to hold more data and use more CPU cores than would be possible on a single computer. 

 

Each Hyperscale (Citus) node has its own locally redundant storage (LRS) with three synchronous replicas maintained by Azure Storage service.

Underlying storage on General SSD

Underlying storage on General SSD

Underlying storage on General SSD

The single server service uses the FIPS 140-2 validated cryptographic module for storage encryption of data at-rest.

Data is encrypted on disk, including backups and the temporary files created while queries are running.

For storage encryption, Azure Database for PostgreSQL uses the FIPS 140-2 validated cryptographic module for storage encryption of data at-rest.

Data is encrypted on disk, including backups and the temporary files created while queries are running.

The Hyperscale (Citus) service uses the FIPS 140-2 validated cryptographic module for storage encryption of data at-rest.

Data, including backups, are encrypted on disk, including the temporary files created while running queries. 

Migration from OS : Both Windows and Linux Migration from OS : Both Windows and Linux Migration from OS : Both Windows and Linux

Migration Methodologies

Migration Methodologies

Migrating an existing application to Citus sometimes requires adjusting the schema and queries for optimal performance. Citus cluster involves thinking about the data model, tooling, and choice of SQL features used.

Migration Methodologies

At least 99.99% of the time customers will have connectivity between their Microsoft Azure Database for PostgreSQL Server- Single Server and our Internet gateway. Subject to SLA

At least 99.9% of the time customers will have connectivity to their Microsoft Azure Database for PostgreSQL – Flexible server configured without Zone Redundant High Availability.

 

At least 99.99% of the time customers will have connectivity to their Microsoft Azure Database for PostgreSQL – Flexible server configured with Zone Redundant High Availability. Subject to SLA

At least 99.95% of the time customers will have connectivity to their Microsoft Azure Database for PostgreSQL – Hyperscale (Citus) High Availability Nodes

Subject to SLA

Connectivity    

The connection pooling can be configured on the application side if the app framework or database driver supports it.

 

Recommended option is to leverage a proxy connection pooler service like PgBouncer or Pgpool running outside the application and connecting to the database server.

Connection Port : 5432

The flexible server comes with a built-in PgBouncer, a connection pooler. You can optionally enable it and connect your applications to your database server via PgBouncer using the same host name and port 6432

Connection Port : 5432, 6432

Hyperscale (Citus) is now offering a managed instance of PgBouncer for server groups. Coordinator node's port is 5432 (and 6432 for connection pooling

Max user connections : 1982

Max user connections : 5000

Hyperscale (Citus) offers a managed pgBouncer connection pooler configured for up to 2,000 simultaneous client connections

Supports AAD authentication

Does not support AAD authentication

Does not support AAD authentication

The service encrypts data in-motion with transport layer security (SSL/TLS) enforced by default. The service supports TLS versions 1.2, 1.1 and 1.0 with an ability to enforce minimum TLS version.

 

You can choose to disable requiring TLS if your client application


The service encrypts data in-motion with transport layer security (SSL/TLS) enforced by default. The service enforces and supports TLS versions 1.2 only.

 

You have an option to disable TLS\SSL for connections to Azure Database for PostgreSQL - Flexible Server


The service encrypts data in-motion with transport layer security (SSL/TLS) enforced by default and can’t be disabled. The service supports TLS versions 1.3, 1.2, 1.1 and 1.0.The default is TLSv1.2


A unique name that identifies your Azure Database for PostgreSQL server.

The domain name postgres.database.azure.com is appended to the server name that you provide.

A unique name that identifies your Azure Database for PostgreSQL server.

The domain name postgres.database.azure.com is appended to the server name you provide

The server group name will determine the DNS name your applications use to connect, in the form server-group-name.postgres.database.azure.com

Server admin username cant 

start with pg_ 

 

The admin username not allowed : 

azure_superuser,  azure_pg_admin,  

admin,  administrator, 

root,  guest, or public

Server admin username cant 

start with pg_ 

 

The admin username not allowed : 

azure_superuser,  azure_pg_admin,  

admin,  administrator, 

root,  guest, or public

Server admin username has to be citus

 

User name in connection string must be <user_name>@server_name.

For example, pgadmusr@mypgServer

Just username is required in the connection string.

For example, pgadmusr

User has to be mandatorily citus in the connection string

Private DNS Zone can be configured

If you  create flexible servers with a virtual network, a new private DNS zone is automatically provisioned for each server in your subscription by using the server name that you provided.

 

Flexible service creates DNS records such as serername.postgres.database.azure.com in the selected private DNS zone for the server with a private endpoint

Private DNS Zone can be configured

A new private DNS zone is automatically provisioned for each private endpoint, unless you select one of the private DNS zones previously created by Hyperscale (Citus)

 

Hyperscale (Citus) service creates DNS records such as c.privatelink.mygroup01.postgres.database.azure.com in the selected private DNS zone for each node with a private endpoint. Private DNS Zone can be configured

Public access through Firewall can be configured

Public access through Firewall can be configured

Public access through Firewall can be configured

Private Link allows you to create private endpoints for Azure Database for PostgreSQL - Single server to bring it inside your Virtual Network (VNet).

 

The private endpoint exposes a private IP within a subnet that you can use to connect to your database server just like any other resource in the VNet.

With Private access (VNet Integration), you can deploy your flexible server into your own Virtual Network (VNet)

When you create an Azure Database for PostgreSQL - Flexible Server instance (a flexible server), you must choose one of the following networking options: Private access (VNet integration) or Public access (allowed IP addresses). You can't change your networking option after the server is created.

Private access allows resources in an Azure virtual network to connect securely and privately to nodes in a Hyperscale (Citus) server group. 

Server/Database Operations

 

 

During server creation An empty database, postgreSQL, is created. You can optionally also create more databases. You will also find the below database created by default

  • azure_maintenance - This database is used to separate the processes that provide the managed service from user actions. You do not have access to this database.
  • azure_sys - A database for the Query Store. This database does not accumulate data when Query Store is off; this is the default setting.

 

During server creation An empty database, postgreSQL, is created. You will also find the below database created by default

  • azure_maintenance - This database is used to separate the processes that provide the managed service from user actions. You do not have access to this database.

 

The Azure portal provides credentials to connect to exactly one database per Hyperscale (Citus) server group, the citus database.

Cross region replicas can be configured which are updated asynchronously to provide a read-only server

You can replicate from the primary server to up to five replicas

Flexible server does not support cross-region read replicas. Depending on the type of workload, you may choose to use logical replication feature for cross-region disaster recovery (DR) purpose.

 

Cross region replicas can be configured which are updated

updated asynchronously to provide a read-only server group.

You can replicate from the primary server to an unlimited number of replicas.

Cannot start and stop the server

 

You can manually stop the server. Stopped servers will automatically start again after seven days

 

Cannot start and stop the server groups

Server restarts can be performed from portal.

The time required to complete a restart depends on the PostgreSQL recovery process.

Server restarts can be performed from portal.When configured with high availability, both the primary and the standby servers are restarted at the same time.

 

The time required to complete a restart depends on the PostgreSQL recovery process. 

Hyperscale(Citus) server group can be restarted from the portal. Restarting the server group applies to all nodes; you can't selectively restart individual nodes. The restart applies to the PostgreSQL server processes in the nodes.

 

The time required to complete a restart depends on the PostgreSQL recovery process. 

Decreasing server storage size is currently not supported.

Once configured, storage size cannot be reduced.

Storage on coordinator and worker nodes can be scaled up (increased) but can't be scaled down (decreased).

Automated migration between major database engine versions is currently not supported.

 

If you would like to upgrade to the next major version, take a dump and restore it to a server that was created with the new engine version

Automated migration between major database engine versions is currently not supported.

 

If you would like to upgrade to the next major version, take a dump and restore it to a server that was created with the new engine version

Upgrading PostgreSQL causes more changes than you might imagine, because Hyperscale (Citus) will also upgrade the database extensions, including the Citus extension. You can upgrade servers groups from the portal

You cannot scale from Basic to GP(General Purpose) or MO(Memory Optimized). You can scale in GP and MO tiers

You can scale across all tiers

You can scale from Basic to Standard but not the other way around.

Max. Storage size : 1 TB (Basic), 4 TB or 16 TB (GP,MO). Note: Not all regions support 16 TB.

 

Max storage size : 16 TB

Max storage size could go upto 2TB per coordinator and worker node. You can go up to 20 worker nodes (20*2TB = 40TB's)

Backup/Restore

 

 

Azure Database for PostgreSQL takes backups of the data files and the transaction log. Azure Database for PostgreSQL automatically creates server backups and stores them in user configured locally redundant or geo-redundant storage. 

Depending on the supported maximum storage size, we either take full and differential backups (4-TB max storage servers) or snapshot backups (up to 16-TB max storage servers). 

 

Flexible Server takes snapshot backups of data files and and stores them securely in zone-redundant storage or locally redundant storage, depending on the region. (General Purpose and Memory Optimized tiers)

 

The server also backs up transaction logs when the write-ahead log (WAL) file is ready to be archived. 

Hyperscale takes snapshot backups of data files and the database transaction log. It automatically creates backups of each node and stores them in locally redundant storage. 

 

In Azure regions that support availability zones, backup snapshots are stored in three availability zones. As long as at least one availability zone is online, the Hyperscale (Citus) server group is restorable.

 

Servers with up to 4-TB storage

Full backups occur once every week. Differential backups occur twice a day. Transaction log backups occur every five minutes.

Servers with up to 16-TB storage

The first full snapshot backup is scheduled immediately after a server is created. Subsequent snapshot backups are differential backups only. Differential snapshot backups do not occur on a fixed schedule. In a day, three differential snapshot backups are performed. Transaction log backups occur every five minutes.

The first snapshot backup is scheduled immediately after a server is created. Snapshot backups are currently taken once daily.

 

Transaction log backups happen at varied frequencies, depending on the workload and when the WAL file is filled and ready to be archived. In general, the delay (recovery point objective, or RPO) can be up to 15 minutes.

The first snapshot backup is scheduled immediately after a server is created. Snapshot backups are currently taken once daily.

These backup files cannot be exported. The backups can only be used for restore operations in Azure Database for PostgreSQL. You can use pg_dump to copy a database.

These backup files cannot be exported. The backups can only be used for restore operations in Azure Database for PostgreSQL. You can use pg_dump to copy a database.

Backup files can't be exported. They may only be used for restore operations in Azure Database for PostgreSQL.

All backups are encrypted using AES 256-bit encryption.

All backups are encrypted using AES 256-bit encryption.

All backups are encrypted using AES 256-bit encryption.

You can select a retention period of 7 to 35 days. The default retention period is 7 days

You can select a retention period of 7 to 35 days. The default retention period is 7 days

The retention period is currently 35 days for all server groups.

Azure Database for PostgreSQL allows you to restore the server back to a point-in-time and into to a new copy of the server in the same region as your original server.

 

The new server created during a restore does not have the firewall rules or VNet service endpoints that existed on the original server.

The subscription, location and resource group cannot be changed. These fields are non-editable

 

Azure Database for PostgreSQL allows you to restore the server back to a point-in-time and into to a new copy of the server. It gives you the flexibility to choose the availability zone and region

 

The new server created during a restore does not have the firewall rules or VNet service endpoints that existed on the original server.The subscription, location and resource group cannot be changed. These fields are non-editable

 

Hyperscale (Citus) server groups can be restored to either the earliest backup or to a custom restore point within your retention period in the same region as your original server.

 

The new server group does not have the firewall rules or VNet service endpoints that existed on the original server group.The subscription, location and resource group cannot be changed. These fields are non-editable. The server group has the original's configuration: the same number of nodes, number of vCores, storage size, user roles, PostgreSQL version, and version of the Citus extension.

 

Geo restore are supported.

The new server created by geo restore has the same server admin login name and password that was valid for the existing server at the time the restore was initiated. The new server created during a restore does not have the firewall rules or VNet service endpoints that existed on the original server.

Geo-restores are currently supported under a preview offering. The new server created by geo restore has the same server admin login name and password that was valid for the existing server at the time the restore was initiated. The new server created during a restore does not have the firewall rules or VNet service endpoints that existed on the original server

The model currently does not support GRS backups. Hence the Geo-restores are currently not supported

Use Cases

 

 

Basic - 

Workloads that require light compute and I/O performance. Examples include servers used for development or testing or small-scale infrequently used applications.

 

Burstable -

Best for workloads that don’t need the full CPU continuously.

Citus is inappropriate for workloads that don’t need a powerful distributed database and can't incorporate features like sharding, a distributed SQL engine, reference tables, and distributed tables

Basic Tier -

It allows you to run Hyperscale (Citus) on a single node. With Basic tier, you are scale-out ready. Basic tier also gives you an easy way to try Hyperscale (Citus). Basic (2 to 64 vCores, up to 256 GiB memory), coordinator and worker node unified - Best for starting out, or dev/test

General Purpose-

Most business workloads that require balanced compute and memory with scalable I/O throughput. Examples include servers for hosting web and mobile apps and other enterprise applications.

General Purpose-

Most business workloads that require balanced compute and memory with scalable I/O throughput. Examples include servers for hosting web and mobile apps and other enterprise applications.

Standard Tier -

Standard tier for Hyperscale (Citus) gives you a distributed Postgres cluster (called a server group) with 1 coordinator and 2 or more worker nodes. You can easily add more worker nodes with zero downtime. Best suited for Multi-Tenant Database B2B applications that already have the notion of a tenant, customer, or account built into their data model. Good fit for Real-Time Analytics which heavily rely on large datasets

Memory Optimized -

High-performance database workloads that require in-memory performance for faster transaction processing and higher concurrency. Examples include servers for processing real-time data and high-performance transactional or analytical apps.

Memory Optimized -

High-performance database workloads that require in-memory performance for faster transaction processing and higher concurrency. Examples include servers for processing real-time data and high-performance transactional or analytical apps.

Citus currently supports all SQL clauses except:

  • Correlated subqueries
  • Recursive CTEs
  • Table sample
  • SELECT … FOR UPDATE
  • Grouping sets

From <https://docs.citusdata.com/en/v11.0-beta/faq/faq.html>

Important Checks

 

 

Limitations

Limitations

Limitations

Update Feed

Update Feed

Update Feed

Extensions

Extensions

Extensions

 

 Dont forget to share a like.jfifif this helps

 

Credit:

Thanks Nathan Widdup, Samrendra Panda, Miho Yamamoto for reviews and guidance

 

FastTrack for Azure:  Move to Azure efficiently with customized guidance from Azure engineering. FastTrack for Azure – Benefits and FAQ | Microsoft Azure 

Leave a Reply

Your email address will not be published. Required fields are marked *

*

This site uses Akismet to reduce spam. Learn how your comment data is processed.