There are a large number of best practices around SQL Server performance tuning – I could easily write a whole book on the topic, especially when you consider the number of different database settings, SQL Server settings, coding practices, SQL wait types, and so on that can affect performance. For this post I decided to step back a bit from a list of specifics (such as my series on various latch classes over on the SQLPerformance.com blog) and list some general recommendations for how to approach SQL Server performance tuning to maximize effort and minimize distractions.
1. Don’t Assume the Symptom of Poor SQL Server Performance Is the Root Cause
Many database administrators (DBAs) and developers tend towards what I call “knee-jerk performance troubleshooting” with SQL performance tuning, where a minimal amount of analysis and investigation is performed, and the assumption is made that the most prevalent symptom of poor SQL performance must be the root cause. When this happens, an effort is made to try to address the supposed root cause of SQL performance issues, and it can lead to a lot of wasted time and frustration when the mitigation efforts don’t help the situation.
My favorite example of this is when average disk latency is high. The classic knee-jerk reaction is it must be an issue with the I/O subsystem, so the company spends money on a better I/O subsystem and the performance problem goes away for a while and then comes back again, because the problem isn’t the hardware itself, but something happening within SQL Server.
For a case like this, it’s generally better to take a mental step back and ask why SQL Server is overloading the I/O subsystem or, more precisely, why SQL Server is doing so many physical reads. There are many reasons this could be happening, such as (but not limited to):
- An inefficient SQL query plan doing a large, parallel table scan instead of using a nonclustered index because of something like a missing index, or implicit conversion, or out-of-date statistics
- Memory pressure on the buffer pool (meaning there isn’t enough space to hold the usual “working set” of database pages) from the OS
It always pays to do some investigation instead of jumping to a quick conclusion on the root cause when performance tuning in SQL Server.
2. Determine the Scope of the SQL Performance Problem
It’s important to figure out what the scope of the problem is in SQL Server, as it determines how you’ll go about investigating the problem, what metrics to gather, and what scripts and performance tuning tools to use. For instance, being asked to investigate stored procedure XYZ which takes twice as long to run as it usually does is different from being asked to tune all long-running stored procedures.
Stored procedure metrics can be obtained by running the query in SQL Server Management Studio or Azure Data Studio, and noting duration, CPU, and IO statistics. The performance information can also be obtained from the plan cache, and you can also leverage the plan cache when you need to find the longest-running stored procedures. The following SQL query, adapted from the popular set of DMV scripts here, lists the slowest 25 procedures, based on average duration:
SELECT TOP (25) [p].[name] AS [SP Name], [eps].[min_elapsed_time], [eps].[total_elapsed_time] / [eps].[execution_count] AS [avg_elapsed_time], [eps].[max_elapsed_time], [eps].[last_elapsed_time], [eps].[total_elapsed_time], [eps].[execution_count], ISNULL ([eps].[execution_count] / DATEDIFF (MINUTE, [eps].[cached_time], GETDATE ()), 0) AS [Executions/Minute], FORMAT ([eps].[last_execution_time], 'yyyy-MM-dd HH:mm:ss', 'en-US') AS [Last Execution Time], FORMAT ([eps].[cached_time], 'yyyy-MM-dd HH:mm:ss', 'en-US') AS [Plan Cached Time] -- ,[qp].[query_plan] AS [Query Plan] -- Uncomment if you want the Query Plan FROM sys.procedures AS [p] WITH (NOLOCK) INNER JOIN sys.dm_exec_procedure_stats AS [eps] WITH (NOLOCK) ON [p].[object_id] = [eps].[object_id] CROSS APPLY sys.dm_exec_query_plan ([eps]. [plan_handle]) AS [qp] WHERE [eps].[database_id] = DB_ID () AND DATEDIFF (MINUTE, [eps].[cached_time], GETDATE()) > 0 ORDER BY [avg_elapsed_time] DESC OPTION (RECOMPILE)
There are also tools like the Top SQL functionality in SolarWinds® SQL Sentry designed to help identify highest impact and highest resource using queries.
3. Define the Goal of Successful SQL Server Performance Tuning
Once you have the scope of the problem, the next step is to determine the goal of the SQL Server performance tuning effort, so you know when you have achieved success and can move on to another task. Don’t allow the goal to be something undefined and open-ended like “stored procedure XYZ needs to be faster,” it needs to be well-defined such as “stored procedure XYZ needs to run at the speed it did before, i.e., at 50% of the current elapsed time.”
Sometimes the investigation will be a bit more involved if the scope is wider, requiring capturing SQL Server metrics and information over time before any analysis and mitigation can start. For instance, one of the first consulting clients I worked with had a somewhat open-ended goal for me which was, paraphrasing, “tempdb runs out of space once a week, and we need it not to do that” without any idea why. The investigation involved me setting up two SQL Agent jobs; one every 10 seconds to look for large uses of tempdb and log information to a table, and another once an hour to email me any results from the previous hour. The general code I wrote to find space-hogs in tempdb is below:
-- InternalMB/Pages: worktables (cursor, spool) , workfiles (hash joins), sort -- UserMB/Pages: everything else -- SELECT GETDATE () AS [Date], [tsu].[session_id] AS [SessionID], [tsu].[exec_context_id] AS [ExecContextID], -- anything > 0 means parallelism ([tsu].[user_objects_alloc_page_count] - [tsu].[user_objects_dealloc_page_count]) AS [UserPages], ROUND (CONVERT (FLOAT, ([tsu].[user_objects_alloc_page_count] - [tsu].[user_objects_dealloc_page_count]) * 8) / 1024.0, 2) AS [UserMB], ([tsu].[internal_objects_alloc_page_count] - [tsu].[internal_objects_dealloc_page_count]) AS [InternalPages], ROUND (CONVERT (FLOAT, ([tsu].[internal_objects_alloc_page_count] - [tsu].[internal_objects_dealloc_page_count]) * 8) / 1024.0, 2) AS [InternalMB], [er].[plan_handle] AS [Plan], [est].[text] AS [Text] FROM sys.dm_db_task_space_usage [tsu] JOIN sys.dm_exec_requests [er] ON [er].[session_id] = [tsu].[session_id] CROSS APPLY sys.dm_exec_sql_text ([er].[sql_handle]) [est] /* WHERE -- Optionally, filter by a size limit -- E.g., the 16384 is 128MB in 8KB pages (([user_objects_alloc_page_count] - [user_objects_dealloc_page_count]) + ([internal_objects_alloc_page_count] - [internal_objects_dealloc_page_count])) >= 16384 */ ORDER BY (([user_objects_alloc_page_count] - [user_objects_dealloc_page_count]) + ([internal_objects_alloc_page_count] - [internal_objects_dealloc_page_count])) DESC;
4. Understand the Limitations
Before you start proposing or making SQL performance tuning changes, it’s important to know if there are things you cannot do. Here are some examples:
- If the application is written by a vendor, you can’t make code changes to improve SQL performance
- If the application is written by a vendor, you might not even be able to add or change SQL indexes without voiding the vendor’s support agreement
- You might not be able to change a setting like MAXDOP or parameter sniffing for the whole SQL Server, which may mean using an ALTER DATABASE SCOPED CONFIGURATION option for a single database
Even if you can change code, there may be a lengthy testing process which prevents a change from being immediately implemented, so you may need to pursue alternative SQL performance tuning solutions (potentially short-term) to quickly fix the problem.
5. Tune One Thing at a Time
One of the most confusing things to do when SQL performance tuning is to make multiple changes at the same time, as then you won’t know which change had an effect, or whether multiple changes cancelled each other out. Always change one thing at a time and keep a note of what you changed and what effect it had, if any. Also, if a change doesn’t have any effect, then revert the change so it doesn’t become a complication if the workload evolves later.
6. Do Not Test SQL Tuning Changes in Production
One of the worst things to do when performance tuning in SQL Server is to make changes directly in production, as this can lead to dire consequences for the workload and business if a change creates a huge negative effect. This means you need a separate test/QA environment that can be used to evaluate changes under production workload conditions, or as close to it as possible. And that leads nicely into the next point.
7. Understand How Test Compares to Production
If your test system doesn’t compare to production, then you may not see the same change in SQL Server performance in production as you do in test. Classic examples of this include:
- A production system with a certain CPUs (e.g., four eight-core processors) and a lower powered test system to save money (e.g., four quad-core processors)
- Along the same lines, test having a lot less memory than production, or a different NUMA configuration, or a lower-rated storage subsystem
- Test only having a subset of the production data to test with
- Test not being able to simulate the production workload
These things can result in the test system producing different SQL query plans, or the workload in test having different characteristics than in production. This means you’ll be performance tuning for a different workload and SQL environment and the efficacy of the changes may not translate to the production environment.
8. Understand the Implications of SQL Tuning
After you’ve determined what the necessary change is, you need to consider what wider effect, if any, making that change will have. For example, if you need to change MAXDOP or the cost threshold for parallelism, that will flush the SQL Server query plan cache, and you might run the risk of parameter-sensitive queries recompiling with sub-optimal plans.
Other SQL performance tuning changes might be more environmental, like offloading parts of a query workload to a readable secondary in an availability group. That can lead to index fragmentation issues on the primary database, which can be performance problems of their own.
You don’t want to solve one SQL performance problem and end up with an unexpected different problem to solve.
9. Create a Rollback Plan
It’s important you have a complete log of what’s been changed and have the ability to revert the SQL tuning changes if something goes wrong. This means preserving original copies of all code and schema and ideally having a script you can run to quickly roll back the changes.
If this would be hard to do and would entail restoring the database from backups, one thing to consider is creating a database snapshot of the database and keeping it around for a few days. A database snapshot automatically keeps a pre-change copy of all changed data file pages since the time the database snapshot was created and allows you to effectively put the database back to that time with a one-line T-SQL command (internally SQL Server does this by pushing the pre-change pages back into the real database – called “reverting the database to the database snapshot”).
10. Remove Diagnostic Elements From Production
Once you’ve finished the investigation and reached your SQL performance tuning goal, make sure you remove all of the diagnostics you implemented to help with the investigation, as they could cause performance problems themselves if left in place, especially SQL Server Extended Event sessions as they can become “silent killers” that use up a lot of CPU resources with no other clue they’re the problem.
You can see which SQL Server Extended Event sessions are running using the following code:
SELECT [ses].[name] AS [Session Name], CASE WHEN [xs].[address] IS NOT NULL THEN 'Running' ELSE 'Stopped' END AS [State], [xs].[create_time] AS [Start Time] FROM sys.server_event_sessions AS [ses] left outer JOIN sys.dm_xe_sessions AS [xs] ON [ses].[name] = [xs].[name] ORDER BY [State], [Start Time];
You should always take a step-by-step approach to SQL Server performance tuning rather than jumping right in and changing things haphazardly in production, and I hope this post has provided you with a simple framework you can put into practice. There’s a lot of code out there to help you with various performance investigations, plus free tools like SolarWinds Plan Explorer (I can’t recommend this enough!) and more heavy-duty performance monitoring solutions for your entire SQL Server estate like SolarWinds SQL Sentry. Happy tuning!