Best practices for utilizing Azure SQL to host Mainframe Data Tier & Workloads


Introduction

Mainframes are purpose-built systems with specialized hardware and software designed for high throughput, reliability, and scalability. They excel at handling large, transaction-heavy workloads. Azure, on the other hand, offers flexibility, scalability, and resource sharing, and operates on commodity hardware and virtualized infrastructure. Azure provides best-in-class compute, storage, networking, and security services at a reasonable price, which makes it one of the primary options for rehosting mainframe applications. This migration can lead to performance challenges in scenarios where the workload was originally designed or optimized for a mainframe environment. Mainframe batch applications are usually chatty, making millions of calls from the application to the database (e.g., Db2, IMS), typically through cursor processing. On the mainframe, because the database and the application run on the same box, latency is usually negligible. When a mainframe application is migrated to Azure, the database server is, as a standard practice, kept on a different virtual machine than the application server for high availability. Every call from the application to the database therefore becomes a distributed call and may increase latency.

 

In this technical blog, we explain the best practices to follow to get the best performance on Azure when migrating a mainframe data tier to Azure SQL.

Note: This blog does not cover practices that should be followed by default in Db2 / Azure SQL, such as designing a well-structured schema with appropriate normalization, establishing proper indexing, enforcing data integrity, regularly maintaining and optimizing the database, routine backup and disaster recovery tests, and monitoring, logging, and auditing.

 

Mainframe Data Tier Modernization to Azure: Best Practices

 

1.    Keep Application VM and Database VM (for IaaS database) co-located

A proximity placement group is a logical grouping used to make sure that Azure compute resources are physically located close to each other. Proximity placement groups are useful for workloads where low latency is a requirement. The database server and application server can be placed in the same proximity placement group for low latency. See the proximity placement groups documentation for further information.

 

[Figure: Proximity placement group co-locating application and database VMs]

 

2.    Use Accelerated Networking for Virtual Machines

Accelerated Networking in Azure significantly improves virtual machine (VM) network performance by offloading network processing from the VM's virtualized environment to the underlying hardware. This approach leverages dedicated hardware-based network acceleration to reduce latency, increase throughput, and lower CPU utilization, resulting in enhanced network performance for your applications. Accelerated Networking offloads all network policies that the virtual switch applied and applies them in hardware. Because hardware applies the policies, the NIC can forward network traffic directly to the VM, bypassing the host and the virtual switch while maintaining all the policies that were applied in the host. For detailed information, see the Accelerated Networking documentation.

 

[Figure: Accelerated Networking architecture with the NIC bypassing the host and virtual switch]

 

3.   Caching frequently accessed static data on the Application Server

In batch or online programs, there may be scenarios where a given database table is referenced frequently, e.g., country code, currency code, or ZIP code tables, or any other relatively static table (a table that does not change during the current program execution). For a rehosted COBOL application, such a table can be copied from the database into a COBOL internal array at the start of program execution. All later references to this table during program execution can then use a binary search against this internal array. This reduces network calls to the database server and thereby improves program performance.

 

4.    Use of SQL Server Stored Procedures 

Wherever possible, groups of SQL statements should be put into a stored procedure rather than executed individually from the application program. Stored procedures are compiled once and stored in executable form, so procedure calls are quick and efficient. Executable code is automatically cached and shared among users, which lowers memory requirements and invocation overhead.

By grouping SQL statements, a stored procedure allows them to be executed with a single call. This minimizes the use of slow networks, reduces network traffic, and improves round-trip response time. OLTP applications, in particular, benefit because result set processing eliminates network bottlenecks. Additionally, stored procedures enable you to take advantage of the computing resources of the server. For example, you can move computation-bound procedures from client to server, where they will execute faster. Likewise, stored functions called from SQL statements enhance performance by executing application logic within the server.
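As a minimal sketch of this pattern (the table, column, and procedure names here are hypothetical), two related statements can be wrapped in a single procedure so the application makes one network round trip instead of two:

```sql
-- Hypothetical example: group related statements in one procedure
-- so the application makes a single network round trip.
CREATE PROCEDURE dbo.usp_PostPayment
    @AccountId INT,
    @Amount    DECIMAL(12, 2)
AS
BEGIN
    SET NOCOUNT ON;

    BEGIN TRANSACTION;

    UPDATE dbo.Account
    SET    Balance = Balance - @Amount
    WHERE  AccountId = @AccountId;

    INSERT INTO dbo.PaymentHistory (AccountId, Amount, PostedAt)
    VALUES (@AccountId, @Amount, SYSUTCDATETIME());

    COMMIT TRANSACTION;
END;
GO

-- One call from the application replaces two separate statements:
EXEC dbo.usp_PostPayment @AccountId = 42, @Amount = 100.00;
```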

 

5.    Effective usage of TempDB 

Many Azure VM sizes (e.g., the E*bds series) provide a temporary disk backed by local SSD storage. Using local SSD storage for database / application files improves performance because of its very low latency. SQL Server TempDB can be kept on the local ephemeral SSD (the default D:\ drive). Tables that do not need to be persisted can also be created in TempDB. DML operations on TempDB tables are very fast because, among other reasons, they are minimally logged and TempDB can be configured to use the local SSD disk for temporary storage.
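A minimal sketch of both ideas follows; the file paths and the staging query are illustrative, the logical file names tempdev and templog are the SQL Server defaults, and moving TempDB files requires a service restart:

```sql
-- Move TempDB data and log files to the local ephemeral SSD (D:\).
-- The target folder must exist and be writable by the SQL Server
-- service account; the change takes effect after a service restart.
ALTER DATABASE tempdb
MODIFY FILE (NAME = tempdev, FILENAME = 'D:\TempDB\tempdb.mdf');

ALTER DATABASE tempdb
MODIFY FILE (NAME = templog, FILENAME = 'D:\TempDB\templog.ldf');

-- Non-persistent working data can live in TempDB as a temp table:
SELECT CustomerId, SUM(Amount) AS TotalAmount
INTO   #DailyTotals          -- created in TempDB, minimally logged
FROM   dbo.Transactions      -- hypothetical source table
GROUP BY CustomerId;
```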

 

6.    Use of Virtual Machines with a high memory-to-core ratio for Database workloads

Memory optimized virtual machines are well suited to database workloads, including relational database servers, caches, and in-memory analytics. Azure also offers constrained-vCPU VM sizes that reduce software licensing costs while maintaining the same memory, storage, and I/O bandwidth. The Ebdsv5 series provides the highest I/O throughput-to-vCore ratio in Azure along with a memory-to-vCore ratio of 8, offering the best price-performance for SQL Server workloads on Azure VMs. Consider this series first for most SQL Server workloads.

 

7.    Use of Managed Disk with high IOPS and throughput 

Azure provides various storage options for application / database files. The most preferable options are Premium SSD v2 and Ultra Disk. The choice between them depends on your specific workload requirements, performance needs, budget considerations, and overall goals. Perform benchmarking and testing to ensure your chosen disk type meets the performance demands of your application. The table below provides a quick comparison of Premium SSD v2 and Ultra Disk. A detailed comparison of the various disk types is available in the Azure managed disks documentation.

 

| Sr. No. | Use Case                 | Premium SSD v2              | Ultra Disk                        |
|---------|--------------------------|-----------------------------|-----------------------------------|
| 1       | Performance Requirements | High IOPS, Low Latency      | Highest IOPS, Lowest Latency      |
| 2       | Workloads                | Production, Databases, Apps | Mission-Critical, Large Databases |
| 3       | Throughput               | High                        | Highest                           |
| 4       | IOPS                     | High                        | Highest                           |
| 5       | Latency Tolerance        | Low                         | Lowest                            |
| 6       | Cost Considerations      | Higher Cost                 | Expensive                         |
| 7       | Scaling Up               | Suitable                    | Highly Suitable                   |


8.    Use of SQL Server Clustered Columnstore Index to get better database compression

On the mainframe, some database tables grow very large and hold huge amounts of data. Queries against these tables often compute aggregate functions over a given column (MIN / MAX / AVG, frequently with GROUP BY), which is very time consuming when data is stored in traditional row format. Columnstore indexes are the standard for storing and querying large data warehousing fact tables. They use column-based data storage and query processing to achieve gains of up to 10x in data warehouse query performance over traditional row-oriented storage, as well as up to 10x data compression over the uncompressed data size. Columnstore indexes also enable operational analytics, i.e., the ability to run performant real-time analytics on a transactional workload.
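A minimal sketch, assuming a large history table named dbo.TransactionHistory (a hypothetical name) with no constraints that block conversion:

```sql
-- Convert the table's storage to clustered columnstore.
CREATE CLUSTERED COLUMNSTORE INDEX CCI_TransactionHistory
ON dbo.TransactionHistory;

-- Aggregate queries like this benefit from batch-mode execution
-- and columnstore compression / segment elimination:
SELECT AccountId,
       MIN(Amount) AS MinAmount,
       MAX(Amount) AS MaxAmount,
       AVG(Amount) AS AvgAmount
FROM   dbo.TransactionHistory
GROUP BY AccountId;
```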

 


 

9.    Use of Memory optimized tables

Using memory-optimized tables in Azure SQL can help improve performance by reducing or eliminating disk I/O, TempDB contention, and locking overhead. Memory-optimized tables store data in memory and use different data structures and algorithms than traditional disk-based tables. This allows faster access and modification of data, as well as greater concurrency and scalability. However, memory-optimized tables also have some limitations and requirements, such as the need for at least one index, support for only a subset of Transact-SQL features, and a dependency on the database compatibility level. Further information can be found in the memory-optimized tables documentation.
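A minimal sketch follows. The filegroup steps apply to SQL Server on an Azure VM (Azure SQL Database provisions In-Memory OLTP storage automatically on supported tiers), and the database, path, and table names are hypothetical:

```sql
-- SQL Server on an Azure VM: add a memory-optimized filegroup first.
ALTER DATABASE CurrentDb
ADD FILEGROUP imoltp_fg CONTAINS MEMORY_OPTIMIZED_DATA;

ALTER DATABASE CurrentDb
ADD FILE (NAME = 'imoltp_data', FILENAME = 'F:\Data\imoltp_data')
TO FILEGROUP imoltp_fg;

-- At least one index is required; a hash index suits point lookups.
CREATE TABLE dbo.SessionState
(
    SessionId  INT NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1000000),
    Payload    VARBINARY(4000) NULL,
    ModifiedAt DATETIME2       NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);
```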

 

10.  Use of Table partitioning for Large Tables

Using partitioning in Azure SQL can help improve performance by dividing large tables and indexes into smaller, more manageable units. This can reduce the I/O and memory required for queries, improve concurrency and parallelism, and enable faster maintenance operations. Partitioning can also help with data archiving, loading, and deleting by using partition switching. Further information can be found in the table partitioning documentation.
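A minimal sketch of monthly range partitioning (the table, column, and boundary values are hypothetical):

```sql
-- Partition function and scheme with monthly boundaries.
CREATE PARTITION FUNCTION pf_OrderDate (DATE)
AS RANGE RIGHT FOR VALUES ('2024-01-01', '2024-02-01', '2024-03-01');

CREATE PARTITION SCHEME ps_OrderDate
AS PARTITION pf_OrderDate ALL TO ([PRIMARY]);

-- The partitioning column must be part of any unique key.
CREATE TABLE dbo.Orders
(
    OrderId   BIGINT        NOT NULL,
    OrderDate DATE          NOT NULL,
    Amount    DECIMAL(12,2) NOT NULL,
    CONSTRAINT PK_Orders PRIMARY KEY (OrderDate, OrderId)
)
ON ps_OrderDate (OrderDate);

-- Old months can then be archived quickly with partition switching,
-- e.g. ALTER TABLE ... SWITCH PARTITION ... to a staging table.
```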

The figure below shows horizontal partitioning, or sharding. In this example, product inventory data is divided into shards based on the product key. Each shard holds the data for a contiguous range of shard keys (A-G and H-Z), organized alphabetically. Sharding spreads the load over more computers, which reduces contention and improves performance.

 

[Figure: Horizontal partitioning (sharding) of product inventory data by product key]

 

11.  Use of Filtered Index

Using a filtered index in Azure SQL can help improve performance by reducing the size and maintenance cost of the index, as well as providing more accurate statistics for the query optimizer. A filtered index is a non-clustered index that includes only a subset of rows from a table, based on a filter predicate. This means the index occupies less space, requires less I/O, and is updated less frequently than a full-table index. Additionally, a filtered index has statistics that reflect the distribution of values in the filtered subset, which can help the query optimizer choose a more efficient execution plan for queries that match the filter predicate. For more information, refer to the filtered indexes documentation.
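A minimal sketch, assuming a hypothetical work-queue table where only a small fraction of rows are still open:

```sql
-- Index only the open rows; the index stays small and its statistics
-- describe just the filtered subset.
CREATE NONCLUSTERED INDEX IX_WorkQueue_Open
ON dbo.WorkQueue (CustomerId, CreatedAt)
WHERE Status = 'OPEN';
```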

 

12.  Methods for Large Data Inserts into Azure SQL

To insert a large amount of data extracted from Db2 on a mainframe into Azure SQL, you can use the following methods:

Azure Data Factory: Azure Data Factory (ADF) can be used to efficiently insert massive amounts of data into Azure SQL by orchestrating data pipelines that handle data extraction, transformation, and loading (ETL) tasks, enabling seamless and scalable data transfer.

BCP Utility: The Bulk Copy Program (BCP) utility is another command-line option to bulk insert data into Azure SQL. It's scriptable and can be used for automation.

BULK INSERT: For a straightforward, high-speed data transfer, you can use the SQL Server BULK INSERT command. This is particularly efficient for large, flat files like CSV or TSV; a minimal sketch is shown below.
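The sketch below loads a pipe-delimited mainframe extract; the file path, table name, and options are illustrative (and on Azure SQL Database the source file must be read from Azure Blob Storage via an external data source rather than a local path):

```sql
BULK INSERT dbo.CustomerStaging
FROM 'C:\extracts\customer.dat'
WITH (
    FIELDTERMINATOR = '|',     -- column delimiter in the extract
    ROWTERMINATOR   = '\n',
    FIRSTROW        = 2,       -- skip the header row
    TABLOCK,                   -- enables bulk-load optimizations
    BATCHSIZE       = 100000   -- commit in batches of 100k rows
);
```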

 

13.  Enhancing Database Performance with Indexed Views

Indexed views, also known as materialized views, can significantly improve query performance by precomputing and storing the results of complex queries. When you query the indexed view, SQL Server can retrieve the data from the view instead of recomputing it every time. Indexed views are particularly useful for aggregating and summarizing data. They can be used to create summary reports or to provide fast access to aggregated data, such as totals, averages, or counts, without expensive calculations during query execution.
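A minimal sketch (the view and table names are hypothetical; an indexed view requires SCHEMABINDING, two-part names, and COUNT_BIG(*) alongside aggregates, and the summed column is assumed NOT NULL):

```sql
CREATE VIEW dbo.vSalesByProduct
WITH SCHEMABINDING
AS
SELECT ProductId,
       SUM(Amount)  AS TotalAmount,
       COUNT_BIG(*) AS RowCnt      -- required with GROUP BY
FROM   dbo.Sales
GROUP BY ProductId;
GO

-- The unique clustered index materializes the view's result set.
CREATE UNIQUE CLUSTERED INDEX IX_vSalesByProduct
ON dbo.vSalesByProduct (ProductId);
```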

 

14.  Migrating Db2 large objects to Azure SQL

In Db2, the BLOB (Binary Large Object) data type is used to store binary data, typically large and unstructured. BLOB columns can contain a wide variety of binary data, including:

a) Images such as JPEG, PNG, or GIF files.

b) Documents like PDFs, Word documents, Excel spreadsheets, or text documents.

c) Media files such as MP3, WAV, MP4, or AVI files.

d) Serialized objects, such as Java or .NET objects.

e) Any other binary data, such as executables, firmware, or binary output generated by applications.

When transferring BLOB / CLOB data from mainframe Db2 to Azure SQL, we do not recommend storing this data directly in Azure SQL tables. Storing binary data in a relational database presents challenges in terms of database size, as it can lead to increased storage requirements and slower backup and query performance. Ensuring data integrity, managing concurrency, and handling the complexity of securing and backing up large binary files are also significant concerns.

One option is to transfer the LOB data from the Db2 table to ADLS Gen2 storage and record the blob metadata in Azure SQL. This approach stores the LOB data in an Azure storage account and references it from the Azure SQL database via blob object metadata. A detailed solution for this approach is described in a separate technical blog.
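As a sketch of what the Azure SQL side of this pattern might look like (the schema below is hypothetical, not taken from the referenced blog):

```sql
-- The LOB itself lives in ADLS Gen2 / Blob Storage; Azure SQL keeps
-- only a reference plus descriptive attributes.
CREATE TABLE dbo.DocumentMetadata
(
    DocumentId  BIGINT IDENTITY(1,1) PRIMARY KEY,
    CustomerId  INT            NOT NULL,
    BlobUri     NVARCHAR(1000) NOT NULL,  -- e.g. https://<account>.blob.core.windows.net/docs/...
    ContentType NVARCHAR(100)  NOT NULL,  -- e.g. application/pdf
    SizeBytes   BIGINT         NOT NULL,
    Md5Checksum CHAR(32)       NULL,      -- integrity check after transfer
    UploadedAt  DATETIME2      NOT NULL DEFAULT SYSUTCDATETIME()
);
```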

 

Summary

Migrating a mainframe application from a single-box system to distributed systems on Azure requires a slight shift in thinking. The suggestions highlighted in this blog encompass various strategies to enhance the performance of migrated applications on Azure. First, proximity placement groups are recommended to co-locate application and database virtual machines (VMs) for low latency. Caching frequently accessed static data within COBOL programs is proposed to reduce network calls, and Azure SQL stored procedures are recommended to cut down on repeated network round trips. Using merge queries in Azure SQL simplifies update and insert operations, while transitioning to set-based operations improves performance and reduces network traffic. Placing TempDB on local SSD storage speeds up operations, and memory optimized VMs are suggested for database workloads to balance performance and cost-effective licensing. Lastly, SQL Server clustered columnstore indexes, filtered indexes, memory-optimized tables, table partitioning, and indexed views are advised to improve query speed and data compression for large tables that serve analytics.

 

Feedback and suggestions

If you have feedback or suggestions for improving this data migration asset, please send an email to the Databases SQL CSE Engineering Team.

 

 
