SQL and Database Performance Tuning Guide and Checklist: Expert Tips

Posted on May 23, 2017
 

A Database Administrator (DBA) is like a racecar technician. Building and optimizing a high-performance vehicle requires meticulous fine-tuning and attention to detail. A tweak here and an adjustment there could shave just a fraction of a second from your lap time. Of course, that fraction of a second could determine whether you win or lose the race.

As a DBA, you are responsible for fine-tuning your SQL queries so that they have extra horsepower. Even a small improvement in speed could put you ahead of your competition and make you a winner in your client’s (or your boss’s) eyes.

While SQL performance tuning isn’t something you become an expert at overnight, there are a few guidelines to follow as you’re getting started as a cloud DBA. This guide will give you a basic framework to build on as you grow your skill set and knowledge base. You will learn a few vital tricks of the trade, like how to think in buckets and how to group correlated metrics.

Also included is a general checklist of database maintenance tasks and a free demo of SQL tuning software.

1. What is SQL Performance Tuning?

It may be helpful to begin with a clear definition of what SQL performance tuning is and an understanding of what you are trying to accomplish.

SQL performance tuning is the process of improving the queries executing within your database with the intention of returning the desired results as efficiently and quickly as possible. There may be dozens of ways to return the same results, and SQL performance tuning is all about finding the fastest/least expensive one.

There is no silver bullet in SQL tuning. What is a best practice in one instance may not work in another situation. Databases are not all created equal in terms of purpose (think OLTP vs. OLAP/data warehouse) or size (think gigabytes vs. terabytes). Tuning is a continual process of testing, recording the results, then testing again.

When conducting performance tuning tasks, it’s important to:

  1. Think in buckets
  2. Focus on correlated metrics
  3. Measure results over time
  4. Conduct regular database maintenance

2. Think in Buckets

When presented with the task of database performance tuning, it can be hard to decide where to start. You may not yet have an idea of what aspect of your database requires attention. When administering a database in the cloud, you may also have to consider things you didn’t have to worry about as much when your database was on-premises.

The best way to formulate your plan of action is to think in buckets. Look at your database like a mechanic analyzing a car that just came into the shop. It’s not very helpful to look at everything at once, so the mechanic looks at issues based on their function and location. Is it an issue with suspension? The engine? Or does it require body work? You could call these the “buckets” of automotive repair.

When it comes to database performance tuning, separating potential issues into buckets means organizing everything based on commonality. For databases and tuning SQL, the buckets can be thought of as:

  • Resources (physical or virtual)
  • Indexing/Data model
  • Query structure
  • Concurrency (blocking - multiple threads requesting locks on the same database resource)

From there, you can drill down into other “buckets” to identify the source of the problem. Take resources, for example, since they may be the area you have to focus on a bit more if your database lives in the cloud. Below are some of the resource categories to consider when working to make SQL statements run faster:

  • Memory
  • Disk I/O (volume and latency)
  • CPU
  • Network

The same further breakdown can be used for the other “buckets,” but we won’t dive deeper into those as they don’t significantly change when thinking about on-premises vs cloud.

3. Focus on Correlated Metrics

When you test your queries and fine-tune your database, you’ll be tracking specific metrics in order to chart improvement over time. These metrics could include:

  • Wait states (where the database is spending time executing the query for the end-user)
  • CPU utilization and signal waits
  • Commit response time
  • SQL statement compilations
  • Transaction throughput
  • Memory pressure (latch contention and data flushed out of memory quickly)
  • Disk I/O latencies
  • Network bandwidth and usage (this is especially important for supporting databases in the cloud)

Use wait states as a guide for which metrics to correlate. If your database is spending all its time doing disk reads, then look at memory pressure (why does it keep having to go back to disk?) and disk performance (read latency) as correlating metrics to see whether those specific resources are causing performance issues. Another good example is blocking. If your database is spending all of its time waiting for locks to be released, then CPU, memory, disk, and network pressure are unlikely to correlate with the slowdown; the contention is happening at the locking level.
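
On SQL Server, for example, the wait-state picture is exposed through the sys.dm_os_wait_stats DMV. The query below is a minimal sketch; the handful of benign wait types it filters out is intentionally abbreviated, and in practice you would exclude a longer list.

  -- Top waits on the instance, with the share of wait time that is signal wait
  -- (time spent waiting for CPU after the resource became available).
  -- A high signal percentage points toward CPU pressure.
  SELECT TOP (10)
         wait_type,
         waiting_tasks_count,
         wait_time_ms,
         signal_wait_time_ms,
         CAST(100.0 * signal_wait_time_ms / NULLIF(wait_time_ms, 0) AS DECIMAL(5, 2)) AS signal_pct
  FROM sys.dm_os_wait_stats
  WHERE wait_type NOT IN (N'LAZYWRITER_SLEEP', N'SLEEP_TASK',
                          N'SQLTRACE_BUFFER_FLUSH', N'WAITFOR')  -- abbreviated benign-wait list
  ORDER BY wait_time_ms DESC;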

When it comes to performance metrics, database analysis tools are your best friend. The dashboard function in Database Performance Analyzer provides a valuable window into how metrics are correlated. Multi-dimensional performance analysis gives you the right context, so you can look at your data from multiple angles and triangulate the root cause of performance issues. The multi-dimensional view provides you with the who, what, when, where, and why answers you need.

4. Measure Results Over Time

To understand how your SQL queries are performing, you need to track your metrics over time. Collecting data over the course of a few weeks or even months will give you a big-picture view of your database’s overall performance. This helps you see the aggregate as opposed to a snapshot of a single event, and it helps you answer the question, “Is my performance worse now than it has been historically?” Charting performance over time gives you a baseline for determining whether performance in any given time frame is outside the norm. An example might be comparing end-of-month activity to the end of last month instead of just to last week.

5. Database Maintenance Tasks

Database maintenance is essential to SQL tuning because this task enables you to find potential issues and solve problems you may have never known existed.

Maintenance tasks may vary by environment, but there are some common or general tasks that are recommended.

Before going into the specific maintenance tasks, here are a few acronyms that every DBA should be familiar with:

  • SLA - Service Level Agreement (what you and the business end-user agree upon)
  • RPO - Recovery Point Objective
  • RTO - Recovery Time Objective
  • MTTI - Mean Time To Innocence (the amount of time it takes to prove that the issue doesn’t rest with the database)

Keeping these acronyms in mind, here are the main areas where you need to perform routine maintenance as a DBA:

1. Backups – This needs to be done in a way that meets RPO/RTO/SLA objectives, but the specifics will vary by system. One possible approach would be to conduct full weekly backups, daily differential/incremental backups, and hourly transaction log backups (for SQL Server®) during business hours. This will give you a decent amount of protection out of the gate, but it is not appropriate for all systems.

For example, a 30PB database is not likely to need weekly full backups. Also, it’s very important that you know how long it will take to recover to yesterday, last week, last month, or last year. You need to build a recovery plan and make your backups fit that plan.

Remember, you may not get a raise if you can’t tune your database environment, but you WILL NOT HAVE A JOB if you can’t guarantee you can restore.
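
For SQL Server, a minimal sketch of the weekly full / daily differential / hourly log schedule described above might look like the following. The database name and backup paths are placeholders, and in practice you would schedule these through SQL Server Agent or your backup tooling.

  -- Weekly full backup
  BACKUP DATABASE SalesDB
      TO DISK = N'D:\Backups\SalesDB_full.bak'
      WITH CHECKSUM, COMPRESSION, INIT;

  -- Daily differential backup
  BACKUP DATABASE SalesDB
      TO DISK = N'D:\Backups\SalesDB_diff.bak'
      WITH DIFFERENTIAL, CHECKSUM, COMPRESSION, INIT;

  -- Hourly transaction log backup (requires the FULL recovery model)
  BACKUP LOG SalesDB
      TO DISK = N'D:\Backups\SalesDB_log.trn'
      WITH CHECKSUM, COMPRESSION, INIT;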

2. Index maintenance – Your indexes need to be checked frequently, most likely daily. However, you may not want to rebuild indexes every night. Again, it depends on the system and the length of time needed to rebuild. Depending on the level of fragmentation in the indexes, updating statistics or reorganizing the index may be a better option than rebuilding it.
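
On SQL Server, one hedged way to make that call is to check fragmentation first and only rebuild when it is warranted. The thresholds below (roughly 5-30 percent for a reorganize, above 30 percent for a rebuild) are commonly cited starting points rather than hard rules, and the table and index names are placeholders.

  -- Check fragmentation for indexes in the current database
  SELECT OBJECT_NAME(ips.object_id) AS table_name,
         i.name                     AS index_name,
         ips.avg_fragmentation_in_percent,
         ips.page_count
  FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
  JOIN sys.indexes AS i
    ON i.object_id = ips.object_id
   AND i.index_id  = ips.index_id
  WHERE ips.page_count > 1000   -- ignore tiny indexes
  ORDER BY ips.avg_fragmentation_in_percent DESC;

  -- Then, per index:
  ALTER INDEX IX_Orders_CustomerID ON dbo.Orders REORGANIZE;  -- moderate fragmentation
  ALTER INDEX IX_Orders_CustomerID ON dbo.Orders REBUILD;     -- heavy fragmentation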

3. Update statistics – Statistics ideally need to be checked daily. How often you actually update them depends on the system, the length of time it takes to update stats, the amount of churn within the database objects (tables and indexes), and the queries that will be forced to recompile the next time they run.
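
A minimal SQL Server sketch, with a placeholder table name: update a single hot table with a full scan, or sweep the whole database with the built-in procedure and let it skip objects that haven't changed.

  UPDATE STATISTICS dbo.Orders WITH FULLSCAN;  -- heavier, but most accurate
  EXEC sp_updatestats;                         -- database-wide sweep, sampled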

4. Corruption checks – You should make checking for corruption a top priority, at least once a month.
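
On SQL Server that usually means DBCC CHECKDB. The database name below is a placeholder; on very large databases, a lighter PHYSICAL_ONLY pass between full checks is a common compromise.

  DBCC CHECKDB (N'SalesDB') WITH NO_INFOMSGS, ALL_ERRORMSGS;
  -- Lighter alternative between full checks:
  -- DBCC CHECKDB (N'SalesDB') WITH NO_INFOMSGS, PHYSICAL_ONLY;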

5. Capture configuration details – For both the database and the server, you want to grab details of the configuration settings daily, and track when changes are made. You could extend this to capture other things like logins/users created or deleted, permissions for those users, etc. The level of detail is up to the diligence of each individual DBA, and their desire to keep MTTI low.
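
One hedged sketch for the SQL Server side is to snapshot sys.configurations into a history table each day so changes can be diffed later. The table name here is a placeholder, and a fuller version would also capture server- and database-level principals and permissions.

  IF OBJECT_ID(N'dbo.ConfigHistory') IS NULL
      CREATE TABLE dbo.ConfigHistory (
          captured_at  DATETIME2    NOT NULL DEFAULT SYSDATETIME(),
          name         NVARCHAR(70) NOT NULL,
          value_in_use SQL_VARIANT  NULL
      );

  INSERT INTO dbo.ConfigHistory (name, value_in_use)
  SELECT name, value_in_use
  FROM sys.configurations;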

6. Helpful Resources

In addition to the Maintenance Task Checklist, there are plenty of helpful resources available to aid in your development as a Database Administrator.

Conclusion

As you can see, being a cloud DBA encompasses a lot of the same concerns and activities as a traditional DBA. We can use a lot of the same tools and approaches for troubleshooting performance issues as we have in the past. DBAs of the cloud and future will likely be required to have a broader understanding of technologies. So, dust off those network skills and get your head in the cloud.

Download this simplified, single-page SQL database maintenance checklist and tape it to your office wall as a reminder of these maintenance tasks.

If you’ve found this article to be informative and valuable, please take a moment to share it on LinkedIn® and your other social media profiles.

The Evolution of the Web and Digital Experience Monitoring

Posted on May 23, 2017
 

Digital Experience Monitoring (DEM) is so business critical that it's featured in a Gartner magic quadrant. At the same time, it is so new that even Wikipedia does not know what it is.

(Related read: What is Digital Experience Monitoring (and Why Should You Care)?)

Why is that? Monitoring as a practice is as old as dirt, and digital experiences have been around since at least the birth of the web in the 1990s. The next most logical question is: How did developers and operations engineers (pre-DevOps) fix application performance anomalies in the past?

The answer reveals many fascinating stories, many of which come with lessons on how to improve digital experiences in the future.

Applications Are Built on Hope

The distance between a development sandbox and a production environment can be tiny or even non-existent. Sometimes they are virtual machines on the same box. Yet the experience of using an app in the sandbox is normally vastly different from using an app in the wild. In the early days of app development, the best you could do was test locally and hope for the best.

It was clear that there had to be a better way to test if a fix addressed the problem in production and if that fix triggered other problems elsewhere in the code. That was the beginning of regression testing.

Performance variation for web-based apps often turned out to be related to the user's OS and choice of browser, which required manual testing. In response, developers put their heads together and co-created automated testing platforms like Selenium.

Selenium generates test scripts in the most common languages, such as C#, Java, PHP, Python or Ruby. These tests then produce reports on how web apps will run on virtually any browser and OS. Selenium became popular by being open-sourced and free to use under the Apache 2.0 license.

A World Lit Only by Browsers

Before the mobile revolution, which kicked off with the first iPhone at Macworld 2007, Internet Explorer (IE) still ruled the browser market. In 2004, IE6 held onto around 95 percent of the browser market because MS Windows was still the predominant OS for businesses.

At the time, new browsers like Firefox were coming online with the most advanced features built in. These included tabbed browsing, spell check, bookmarks, download managers, geolocators and address bar search.

Although Firefox saw 60 million downloads within nine months, IE6 was still the standard. All developers had to ensure their web apps worked on IE6 and couldn’t afford to take advantage of the latest advances in browser technology. This is a good example of how the digital experience tends to be limited by the delivery channel rather than what is possible in terms of development. IE would soon lose relevance, however, as a result of the mobile revolution.

How Apps Broke Away from the Web

There were smartphones long before the iPhone, and there were apps long before the app store. What shook the world in 2007 was that the iPhone took 13 percent of the market within a year, eclipsing the Blackberry and pushing Google into releasing the Android OS for free. Microsoft’s Windows Mobile would be out of commission within months and the app ecosphere soon exploded. By 2013, there were 2 million apps, a million in Apple’s App Store and another million on Google Play.

Native mobile app development very quickly caught up to web app development, yet both delivery pathways continued to be problematic from the standpoint of application performance monitoring (APM). While web apps continued to face challenges from disruption in the browser market, native apps introduced a new problem.

By minimizing network and connectivity issues, apps could run blazingly fast on the tiny processors inside phones. Users became accustomed to, and then psychologically dependent on, that kind of performance speed. Once you have driven a Lamborghini, it is frustrating to drive a Kia.

While there are various metrics on the ratio of loading times to revenue loss, Kissmetrics estimated that a one-second page delay is worth about $2.5 million in lost sales to an average e-commerce site. No matter how many customers you serve or what industry you are in, microsecond delays now mean the difference between a user staying in your app or abandoning it for the competition.

The most careful development teams have built ways to find and eliminate problems before they drive away users. What they created was synthetic monitoring, also known as proactive monitoring: using scripts to simulate what a real user would do with your app. You can think of it as the pre-production version of DEM.

The Constancy of Change

The performance of web-based and native apps is only the beginning. IoT is now at the forefront of development, as data collection and processing move into practically every device you touch throughout the day.

Cars, kitchen appliances, digital assistants and even clothing are becoming aware. All of these channels and more must be optimized for the best digital experience and operate flawlessly. Users have become accustomed to instantly accessing what they need, and now have very little tolerance for poor performance. New competitors arise from every corner of the globe as barriers to entry tumble. They do not need to wait, so abandoning a poor process is just good thinking.

Any company that has a digital presence, which is essentially every company, can no longer afford bad performance. That translates into user churn and lost revenue. For now and the foreseeable future, business continuity is a matter of DEM.

Policy Compliance or Network Configuration Management?

Posted on August 08, 2016
 

Is there a silver bullet for making policy and risk compliance easier? While there is no single solution which will manage compliance end-to-end, the answer may surprise you. But first a little background.

Introduction

Risk Management is fundamental to maintaining a successful enterprise. The purpose of Risk Management is to identify uncertainties, hazards, exposures, liabilities, and other risks which may cause harm to the organization, forecast the potential impact, and then implement measures designed to reduce these effects. Considering that IT operations are vital to business continuity, it should be no surprise that Risk Management is a central tenet of IT governance.

One of the tools used to manage risk is the Risk Policy. In IT, the risk policy is a top-level document which specifies what standards the organization will observe in order to safeguard the confidentiality, integrity and availability of its IT systems and data. The Risk Policy will often include additional sections addressing security and any obligations to industry and regulatory mandates like Payment Card Industry Data Security Standard (PCI DSS), Health Insurance Portability and Accountability Act (HIPAA) and similar.

It's important to observe that policy standards define what safeguards will be required but not how a standard will be implemented. For example, the PCI DSS policy has a standard which states that the organization will “install and maintain a firewall configuration to protect cardholder data.” However, it does not specify what type of firewall to use. It does not specify what firewall rules to deploy. It does not even specify what it means to maintain the firewall. Which leads us to our discussion on controls.

Policies are operationalized by implementing controls which support policy standards. There are two primary types of controls: technical and procedural. Procedural controls are “managerial” or “operational” in nature. They define how people are to perform a task or job function. For example, an IT manager will retain a full system backup for seven years. On the other hand, technical controls are directly implemented on IT systems using supported configuration options. For example, a server shall be configured to require a logon password. Once technical controls are in place, the challenge becomes keeping them in place. This leads us to the topic of compliance.

Policy compliance seeks to verify that all controls, as defined by policy, are: 1) implemented and 2) remain operational as implemented. As such, policy compliance can be thought of as a “continuous process” – one of implementation, monitoring, and verification. While it's common to think that compliance is something an auditor does, in reality IT operations is responsible. An auditor only provides independent verification that policy objectives are being met.

As mentioned earlier, technical controls are implemented as configuration options. And more specifically, network controls are implemented as configuration options in network routers, switches and similar devices. So the big reveal here is that perhaps the best risk and policy management tool in your toolbox is your Network Configuration and Change Management (NCCM) software. By definition, NCCM software manages configuration changes and protects configurations and devices from unwanted changes.

Since the network forms the core foundation for IT services, and the quality of network service is defined by the configuration of its constituent routers, switches, controllers, access control devices, and more, it makes sense to carefully manage and monitor these configurations. This is why NCCM plays such a critical role in IT operations and risk management.

So just how does NCCM accomplish all of this? Here is a quick overview.

Configuration Management

Device access – Perhaps the first place to start is by removing ad hoc access to devices. An NCCM will help you eliminate ad hoc (unauthorized) device access and require configuration changes to be made using the NCCM management console.

Change control – By requiring all configuration changes to be made using the NCCM console, you are able to assign administrative privileges and implement a formal change review and approval process. This eliminates unplanned and unauthorized actions and maintains a history of all changes made.

Configuration templates – An NCCM will allow you to create a standardized script or change template for making recurring changes. This ensures that changes will be made uniformly, as approved. Configuration change templates are device- and vendor-neutral. They provide an automated way to mass-deploy new services, or to quickly remediate a policy violation or security vulnerability across the network.

Job scheduling – Want to control when changes are made? An NCCM provides job scheduling to execute changes during maintenance windows.

Backup and recovery – Hardware failure and human error can break your network. Recover from these disasters quickly: an NCCM will schedule configuration backups and let you find and restore device configurations fast.

Configuration Monitoring

Change detection and analysis – You've spent time and effort getting your configs to a baseline. How do you know when something changes? What if a change is made to the running config but not saved to the startup? An NCCM will monitor device configurations and notify you when any change is made. You can even compare two configs side-by-side and see what statements were added or removed.

Configuration Auditing

Audit policies – Want to help ensure your configs do contain (or don't contain) specific configuration statements? An NCCM policy can be used to identify what is expected (or forbidden) in a configuration. Policies are useful to verify compliance with internal policies as well as PCI DSS, HIPAA, SOX, DISA STIG, and other industry-mandated policies.

Remediation – An NCCM will allow you to correct violations fast with remediation scripts (defined as part of the policy).

Summary

Change management, monitoring, and auditing are three ways an NCCM helps you manage and protect your network configurations as well as manage IT risk.

In March 2016, SC Magazine awarded SolarWinds Network Configuration Manager (NCM) the “Best Policy/Risk Management Solution” award for the fourth straight year. SolarWinds NCM won't manage every aspect of your risk policy. However, using a single tool, you can not only manage your network configurations, but also manage many of the procedural and technical controls for your network.

To learn about the SC Magazine 2016 awards and other category winners, read this article. To learn more about Network Configuration Manager, visit the NCM product page. To learn how to write an NCM policy, read this thwack® blog post.

Top 5 Developments at Cisco Live

Posted on February 15, 2017
 

Cisco Live is always a great place to check in on the latest developments in the networking world, and this year was no exception. There were 27,000 attendees and no shortage of booths, demonstrations, talks, and presentations from which to learn. If you were unable to attend, or just didn’t get to everything, here are our top five Cisco Live favorites.

1. The creation of NetPath
We found this development over at the SolarWinds booth. The team there has found a way to map your entire path, hop-by-hop, even to outside sites or remote locations. It’s called NetPath, and it’s a new feature in the latest release of Network Performance Monitor (NPM 12). With more environments moving to hybrid IT (more on that in a minute), we see this as a game changer for troubleshooting issues. Now network admins can see exactly where along the path the failure is occurring, and with historical data available, the mystery of “Why was X site down yesterday?” can finally be solved.

2. The prevalence of hybrid IT
(See, told you there would be more in a minute.) Mentions of hybrid IT and hybrid IT support were everywhere. This has been slowly growing for a while, but this is the first year it really seemed to take hold. Companies are moving to a blended (hybrid) model of providing IT services, with some elements still contained on-premises, and others moving to the cloud. With this shift come unique challenges where network admins are responsible for ensuring that services are accessible, even when those services live outside of the environment the admins control. With this in mind, now you see why we think that NetPath is such a game changer.

3. The emphasis on DevNet
The area devoted to DevNet—the network engineer-flavored DevOps—has grown substantially in recent years. It has gone from a booth or two a year ago, to a whole section of the floor in Berlin 2016, to an entire conference bay of non-vendor floor space at Cisco Live US 2016. Interactive demonstrations draw the crowds. This year, two dozen workstations were arranged around a model train, and attendees could try to find ways to change the speed and direction of the train by coding the environment. Fun!

4. The presence of IoT
You would think that a show devoted to making networks cleaner, faster, stronger, and more reliable would hold the Internet of Things as anathema, but at Cisco Live it was just the opposite. IoT was embraced in multiple areas, with sessions and even hands-on demonstrations being given in how to manage and even leverage those mini-sensors that are going into everything these days. Possibly one of the coolest applications, which many were talking and tweeting about, was the use of tiny sensor “backpacks” to investigate the health and collapse of honey bee colonies. (https://communities.cisco.com/community/developer/blog/2016/06/27/video-analytics-iot-sensors-honey-bee-health-at-cisco-live)

5. Cisco Live itself
Like asking a goldfish to describe water, sometimes the most amazing thing is all around you and therefore goes unnoticed. The Cisco Live infrastructure moved nearly 40 terabytes of traffic in four days across dual 10Gb links, required over 300 access points to support 18,000 wireless devices, and covered 250,000 square feet. And all of the infrastructure took just five days to set up onsite (although it required a full 11 months of planning)! Now that’s some serious networking.

6. #SocksOfCLUS
(A bonus item!) One of the best parts of Cisco Live is the human networking, and seeing how technology can help make personal connections with groups of like-minded folks. And what would geek-based IT culture be if all of that social interaction didn't include some completely off-the-wall goofiness? From #KiltedMonday to #SocksOfCLUS, it was fun to see online trends manifested in the real world wardrobe choices of attendees. Check out the Twitter links for pictures!

SQL Server Performance Tuning Tips

Posted on August 08, 2016
 

Query tuning is often the fastest way to accelerate SQL Server performance. System-level improvements (adding memory, processors, and so on) are often ineffective and expensive by comparison. Expert developers believe most performance issues can be traced to poorly written queries and inefficient indexing, not hardware constraints. In fact, some performance issues can only be resolved through query tuning.

But when it comes to SQL Server query tuning, DBAs often grapple with the question of where to start. How do you assess a query? How can you discover flaws in the way a query was written? How can you uncover hidden opportunities for improvement? Query tuning is half science and half art: there are no right or wrong answers, only the most appropriate solution for a given situation.

SQL Server query tuning can be categorized into three broad steps:

  1. Basic query analysis
  2. Advanced query analysis
  3. Facilitate tuning by using a database performance monitoring tool

Here are 12 quick tips that can help a DBA improve query performance in a measurable way and, at the same time, confirm that a specific change has actually improved query speed.

1. Basic query analysis

DBAs need visibility into all layers and information on expensive queries in order to isolate the root cause. Effective tuning requires knowing top SQL statements, top wait types, SQL plans, blocked queries, resource contention, and the effect of missing indexes. Start with the basics—knowing exactly what you’re dealing with before you dive in can help.

Tip 1: Know your tables and row counts
First, make sure you are actually operating on a table, not a view or table-valued function. Table-valued functions have their own performance implications. You can hover over query elements in SSMS to examine these details. Check the row count by querying the DMVs.
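A minimal sketch for SQL Server, with a placeholder table name: the partition-stats DMV returns an approximate row count without scanning the table.

  SELECT SUM(ps.row_count) AS approximate_row_count
  FROM sys.dm_db_partition_stats AS ps
  WHERE ps.object_id = OBJECT_ID(N'dbo.Orders')
    AND ps.index_id IN (0, 1);   -- heap or clustered index only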
Tip 2: Examine the query filters (WHERE and JOIN clauses) and note the filtered row count
If there are no filters and the majority of the table is returned, consider whether all that data is really needed. A query with no filters at all is a red flag that warrants further investigation, because returning everything can really slow a query down.
Tip 3: Know the selectivity of your tables
Based upon the tables and the filters in the previous two tips, know how many rows you’ll be working with, or the size of the actual, logical set. We recommend SQL diagramming as a powerful tool for assessing queries and query selectivity.
Tip 4: Analyze the additional query columns
Look closely at SELECT * and scalar functions to determine whether extra columns are involved. The more columns you bring back, the less optimal it may become for an execution plan to use certain index operations, and this can, in turn, degrade performance.
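As a small illustration (the table, columns, and parameter are placeholders), trimming the column list is often enough to let a narrower index cover the query instead of forcing a lookup or scan for every row.

  DECLARE @CustomerID INT = 42;   -- placeholder parameter

  -- Before: every column comes back, whether or not the caller needs it
  SELECT * FROM dbo.Orders WHERE CustomerID = @CustomerID;

  -- After: only the needed columns, which a narrower index can cover
  SELECT OrderID, OrderDate, TotalDue
  FROM dbo.Orders
  WHERE CustomerID = @CustomerID;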

2. Advanced query analysis

Tip 5: Knowing and using constraints can help
Knowing and using constraints can be helpful as you start to tune. Review the existing keys, constraints, and indexes to avoid duplicating effort or creating indexes that overlap ones that already exist.

To get information about your indexes, run the sp_helpindex stored procedure:
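  -- Table name is a placeholder; substitute one of your own tables
  EXEC sp_helpindex N'dbo.Orders';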
Tip 6: Examine the actual execution plan (not the estimated plan)
Estimated plans use estimated statistics to determine the estimated rows; actual plans use actual statistics at runtime. If the actual and estimated plans are different, you may need to investigate further.
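In SSMS you can toggle “Include Actual Execution Plan” before running the query, or request the actual plan from T-SQL as in this small sketch (the query and table are placeholders):

  SET STATISTICS XML ON;   -- returns the actual plan alongside the results

  SELECT OrderID, OrderDate
  FROM dbo.Orders
  WHERE CustomerID = 42;

  SET STATISTICS XML OFF;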
Tip 7: Record your results, focusing on the number of logical I/Os
If you don’t record the results, you won’t be able to determine the true impact of your changes.
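A minimal way to capture those numbers on SQL Server (the query is a placeholder) is to turn on the I/O and time statistics before the run and note the logical reads reported in the Messages tab:

  SET STATISTICS IO ON;
  SET STATISTICS TIME ON;

  SELECT OrderID, OrderDate
  FROM dbo.Orders
  WHERE CustomerID = 42;

  SET STATISTICS IO OFF;
  SET STATISTICS TIME OFF;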
Tip 8: Adjust the query based on your findings and make small, single changes at a time
Making too many changes at one time can be ineffective as they can cancel each other out! Begin by looking for the most expensive operations first. There is no right or wrong answer, but only what is optimal for the given situation.
Tip 9: Re-run the query and record results from the change you made
If you see an improvement in logical I/Os but the improvement isn’t enough, return to tip 8 to examine other factors that may need adjusting. Keep making one change at a time, rerunning the query, and comparing results until you are satisfied that you have addressed all the expensive operations you can.
Tip 10: If you still need more improvement, consider adjusting the indexes to reduce logical I/O
Adding or adjusting indexes isn’t always the best thing to do, but if you can’t alter the code, it may be the only thing you can do. Consider the existing indexes, a covering index, or a filtered index for improvements.
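For example (all names are placeholders), a covering index for the earlier customer query and a filtered index for a query that only ever reads open orders might look like this; whether they pay for their write and storage overhead is something to verify by re-measuring.

  -- Covering index: key on the filter column, remaining columns INCLUDEd
  CREATE NONCLUSTERED INDEX IX_Orders_CustomerID_Covering
      ON dbo.Orders (CustomerID)
      INCLUDE (OrderID, OrderDate, TotalDue);

  -- Filtered index: only rows matching the predicate are indexed
  CREATE NONCLUSTERED INDEX IX_Orders_OpenOnly
      ON dbo.Orders (OrderDate)
      WHERE Status = N'Open';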
Tip 11: Rerun the query and record results
If you have made adjustments, rerun the query and record those results again.
Tip 12: Engineer out the stupid
Look out for frequently encountered performance inhibitors such as code-first generators, wildcard abuse, scalar functions, nested views, cursors, and row-by-row processing.

3. Use a database performance monitoring tool to facilitate query tuning

Traditional database monitoring tools focus on health metrics. Current application performance management tools provide hints, but do not help find the root cause.

Tip:
You can make query tuning significantly easier by using a continuous database performance monitoring solution such as SolarWinds Database Performance Analyzer (DPA) to consolidate performance information in a single place. With DPA you can:
  • Identify the specific query that got delayed
  • Identify the specific bottleneck (wait event) that causes a delay
  • Show the time impact of the identified bottleneck


Learn more about SolarWinds Database Performance Analyzer (DPA).