Packed/Batched Transact-SQL (TSQL) RPC Invocation

This post has been republished via RSS; it originally appeared at: New blog articles in Microsoft Community Hub.

The SQL Server TDS protocol provides two main paths for query execution (Language and RPC events.)  You can trace these events using the Batch::Starting/Completed (Language) and RPC:Starting/Completed (RPC) XEvents.

 

Language events are text streams requiring full parsing and may be susceptible to injection attacks.  Language events also require educated guesses.  For example, should select 1 return a smallint, bigint?

 

Example: SQLExecDirect(“sp_who 50”)

SQL Server receives

Type=Language
Query=”sp_who 50”

SQL Server must parse the query, make an educated guess for the data type of 50 and then execute the command.

 

Injection attack example: An application prompts the user for the session id value and fails to validate the input.  Instead, the user enters “50;drop database production” The query submitted to SQL Server is “sp_who 50;drop database production” which is clearly not expected.  In fact, malicious users can hide payload using the string null termination hidden character (0x0)  “50;0x0drop database production”  The SQL language standard allows null termination and other extended characters, but most editors treat them as string terminators.  Looking at such a pattern may only show you 50; hiding the malicious payload that SQL Server sees, parses, and executes.

 

RPCs are precisely bundled commands.  The term Remote Procedure Call is commonly associated with COM objects or other client server communication protocols.  At the high level an RPC provides an Id and marshaled parameters.  In SQL Server the Id is a stored procedure name the SQL Server maps to an object id and the parameters are natively streamed.  For example:

 

int ID = 50;
{call sp_who(?)}  ß ID is bound to ‘?’ as an int

SQL Server receives

Type = RPC
ID = sp_who
Parameter Count = 1
Parameter[0].Type = int
Parameter[0].Value = 50 


The RPC Advantages

  • The parameters are sent to the SQL Server in native format removing any guessing. 
  • Variable length parameters are sent to SQL Server with a length specifier, read in from the input stream directly, avoiding parsing. 
  • The Id requires lookup to match and object id, making it difficult for an attacker to perform TSQL injection.

The RPC is faster because all the parsing and guessing is avoided.  SQL Server can look up the definition of the procedure to be executed, use the natively bundled parameters, and execute.

 

What is a Packed/Batch RPC

The TDS protocol allows multiple RPC requests to be executed in a batch.

 

Packed/Batched RPC Execution

foreach insert in OneMillionList
            command += “{call sp_prepexec …. {Insert} }

SQLExecDirect(command)
Process results 

The client and SQL Server send and read the entire command stream into memory and then process the RPCs.  SQL Server does not read 1st RPC and process it and then read 2nd RPC and process it, etc. 

 

The TDS protocol does not accommodate the read, execute, repeat loop because the client sent ## RPCs in the same batch and is waiting for a response that the entire command was received.  Once the entire command is acknowledged by SQL Server the client can process results.  The client does not have logic (perhaps outside MARS connections or TSQL cursors) to interleave the initial command stream and results processing.  Thus, all million requests need to be read on the server and stored in memory before execution can begin.

 

Non-Packed RPC Execution

foreach insert in OneMillionList
{
   SQLExecDirect(“{call sp_prepexec …. {Insert} }   -- Each of these is a command submit and requires result processing
   Process results (SqlFetch, SQLNextResult, etc…)

The packed/batched RPCs require the client to stream everything in a single request, using more memory but reducing the network roundtrips used by Non-Packed (single request) submission loops.  The packed/batched RPCs require more memory on the SQL Server than the single submission loops but are faster because SQL can optimize processing the same RPC Id from the packed/batched mode.

 

Not A Bulk Copy Replacement

Many documents have confused the packed/batch RPC mode with bulk copy activities.  While faster than Non-Packed RPCs, Packed/Batched RPC are not a BULK COPY replacement.

Bulk copy sets up in insert stream and inserts the rows as they are streamed to the SQL Server.  The rows go onto the data pages and are fully hashed, checkpoint/lazy writer aware where-as the packed RPC must place the parameters in memory and then transfer to the data pages.  The memory is not released until the entire packed/batched RPC is complete.

 

sqlcmd Example

{call sp_who(1)}
{call sp_who(1)}
{call sp_who(1)}
{call sp_who(1)}
go

SQL Server receives all 4 RPC execution requests in a single stream from the client.

 

pyodbc Example

Any SQL Server client can submit a packed/batched RPC request.  Here is an example in pyodbc.

 

'''
use tempdb
go
drop table if exists t
go
drop procedure if exists spTest
go
create table t (id int,  id2 int)
go
create procedure spTest @id int,  @id2 int
as
begin
       insert into t values (@id, @id2)
end
go

Packed/Batched RPC: fast_executemany = True

    event_sequence  name   [TextData]   session_id   timestamp
    223      rpc_starting declare @p1 int  set @p1=NULL  exec sp_prepare @p1 output,N'@P1 int,@P2 int',N'insert into t(id, id2) values (@P1, @P2)',1  select @p1  68     2023-10-23 08:18:21.9143952
    224      rpc_starting exec sp_execute 1,0,0     68     2023-10-23 08:18:21.9165193
    225      rpc_starting exec sp_execute 1,1,1     68     2023-10-23 08:18:21.9217097

    event_sequence  name   [TextData]   session_id   timestamp
    230      rpc_starting exec sp_describe_undeclared_parameters N' EXEC spTest @P1,@P2  '  68       2023-10-23 08:18:21.9351613
    231      rpc_starting exec spTest @id=0,@id2=0  68     2023-10-23 08:18:21.9408572
    232      rpc_starting exec spTest @id=1,@id2=1  68     2023-10-23 08:18:21.9754044 Singleton RPC: fast_executemany = False

    event_sequence  name   [TextData]   session_id   timestamp
    21 rpc_starting declare @p1 int  set @p1=NULL  exec sp_prepexec @p1 output,N'@P1 int,@P2 int',N'insert into t(id, id2) values (@P1, @P2)',0,0  select @p1  66     2023-10-23 08:20:30.0116869
    22 rpc_starting declare @p1 int  set @p1=1  exec sp_prepexec @p1 output,N'@P1 int,@P2 int',N'insert into t(id, id2) values (@P1, @P2)',1,1  select @p1  66     2023-10-23 08:20:30.0178314
    23 rpc_starting declare @p1 int  set @p1=2  exec sp_prepexec @p1 output,N'@P1 int,@P2 int',N'insert into t(id, id2) values (@P1, @P2)',2,2  select @p1  66     2023-10-23 08:20:30.0211486     event_sequence  name   [TextData]   session_id   timestamp
    27 rpc_starting declare @p1 int  set @p1=NULL  exec sp_prepexecrpc @p1 output,N'spTest',0,0  select @p1   66     2023-10-23 08:20:30.0718299
    28 rpc_starting declare @p1 int  set @p1=6  exec sp_prepexecrpc @p1 output,N'spTest',1,1  select @p1   66     2023-10-23 08:20:30.0791241
    29 rpc_starting declare @p1 int  set @p1=7  exec sp_prepexecrpc @p1 output,N'spTest',2,2  select @p1   66     2023-10-23 08:20:30.0829108 ''' 

import pyodbc 
connection_string = ' DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;DATABASE=tempdb;UID=sa;PWD=xxxxxxxxxxxxx;' 
conn = pyodbc.connect(connection_string)
conn.autocommit = False
conn.execute("set nocount on")

'''
cursor = conn.cursor()
params = [(i,i) for i in range(5)]
cursor.fast_executemany = False
cursor.executemany("insert into t(id, id2) values (?, ?)", params)
cursor.commit()
cursor.close()
''' 

cursor = conn.cursor()
params = [(i,i) for i in range(2000000)]
cursor.fast_executemany = True
cursor.executemany("{call spTest(?, ?)}", params)
cursor.commit()
cursor.close()

 

USERSTORE_SXC

The memory for handling packed/batched RPCs is associated with the USERSTORE_SXC.   

select * from sys.dm_os_memory_clerks

A packed/batched RPC allocates memory to memory objects (usually MEMOBJ_PROCESSRPC) associated with the USERSTORE_SXC.

select * from sys.dm_os_memory_objects

Cache and User Stores are monitored by the SQL Server Resource Monitor(s) and when a cache or store grows large or there is memory pressure the cache or store is issued shrink commands.

 

The reported problem when using packed/batched RPCs is that even a dbcc freesystemcache(‘ALL’) execution is unable to shrink the USERSTORE_SXC.  This is because the memory is in use by MEMOBJ_PROCESSPMO’s and cannot be released until the packed/batched RPC has fully completed.

 

If your SQL Server experiences spikes in the USERSTORE_SXC and MEMOBJ_PROCESSRPC objects locate the packed/batched RPC requests and ask the client to honor a reasonable batch size limit Instead of submitting 1 million requests in a single packed/batched RPC submitting 100,000 reduces the memory usage by a factor of 10x.

 

Better yet, if these are insert requests ask the client application to use the Bulk Copy interfaces which reduce memory and increase performance over the packed/batched RPC.

 

SQL Internals - As of Oct 2022

The packed/batched RPCs are consumed by SQL Server in a routine named GetCommandInput. The GetCommandInput loops calling GetNextRPC, consuming the entire request stream.

 

As the individual RPC requests are read from the stream, RpcInfo objects are created using a MEMOBJ_PROCESSRPC memory object.  The RpcInfo holds information about the RPC execution such as, the Id, number of parameters, parameter types and values, is the parameter an output parameter, etc.  The RpcInfo objects are chained together and passed to the SQL Server execution engine for processing.

 

Note: While the GetCommandInput loop is processing the session_id does NOT appear in sys.dm_exec_requests because the request is not executing yet.  This is an aspect that is being looked at for improved visibility.

 

Note: Network performance has a critical impact on the memory consumption timespan on the SQL Server.  For example, I executed the pyodbc example and after it consumed significant USERSTORE_SXC I attached the debugger to the python process, stalling the client.  SQL Server continued to hold the memory until I resumed the python client.

 

Note: Packet size can be a key factor in network performance.

 

Here is the stack of s stalled client attempting to stream the packed/batched RPCs while SQL Server is trying to load them into memory from the incoming TDS stream.

 

msodbcsql17!AddRpcSprocParameters
msodbcsql17!BuildTDSRPCs
msodbcsql17!BuildExecSproc
msodbcsql17!ExportImp::SQLExecDirectW
msodbcsql17!ExportImp::SQLExecute
ODBC32!SQLExecute

The RpcInfo list is then processed by the SQL Server execution engine and at this point the request appears in sys.dm_exec_requests.

 

Note: While GetCommandInput is reading in the stream the client’s command timeout may not be honored.  A TDS client streams the command and then waits for the acknowledgement that the command was fully received.  The timeout applies to the wait for the acknowledgement not the streaming of the command.  This means a large packed/batched RPC fully streams and allocates memory and then honors the cancel request.  (However, a TSQL KILL statement is honored, cancelling the GetCommandInput streaming activity.)

 

Once all the RpcInfo list has been processed the packed/batched RPC sends the final done (SRV_DONEFINAL) to the client and cleans up the RpcInfo list.  The cleanup is where the memory from the execution is released, the USERSTORE_SXC memory drops and the MEMOBJ_PROCESSRPC objects are destroyed.

 

Pattern of Packed/Batched RPCs

MEMOBJ_PROCESSRPC are commonly limited to 10 memory pages in SQL Server (8192 * 10).  GetCommandInput places as many of the RPC requests onto 10-page sized PMOs as possible.  Each time a 10-page sized PMO is filled a new MEMOBJ_PROCESSRPC PMO is created and added to the RpcInfo chain.  The number of MEMOBJ_PROCESSRPC entries of size (8192 * 10) in sys.dm_os_memory_objects is a sign of ongoing packed/batched RPC activities.

 

Recommendation

The client application should limit the size of the packed/batched RPC request to a size that provides limited memory consumption while allowing for the performance advantages.

- Bob Dorr

 

Leave a Reply

Your email address will not be published. Required fields are marked *

*

This site uses Akismet to reduce spam. Learn how your comment data is processed.