Thursday, July 24, 2008

SQL Best Practices

This post will be quite lengthy and it will detail my findings on best practices with SQL, specifically Microsoft SQL Server.

I started with an article written by Vyas Kondreddi in 2001: SQL Server TSQL Coding Conventions, Best Practices, and Programming Guidelines. In 2001 people were not microblogging!

Well, to summarize the article and bring it up to date a little, here are some of the most important points (in my view):
  • Decide upon a database naming convention, standardize it across your organization, and be consistent in following it. It helps make your code more readable and understandable.
  • Write comments in your stored procedures, triggers and SQL batches generously, whenever something is not very obvious.
  • Try to avoid server side cursors as much as possible.
    As Vyas Kondreddi himself says: "I have personally tested and concluded that a WHILE loop is always faster than a cursor"
  • Avoid the creation of temporary tables while processing data as much as possible, as creating a temporary table means more disk I/O. Consider using advanced SQL, views, SQL Server 2000 table variable, or derived tables, instead of temporary tables.
    This is interesting, because I usually use a lot of temporary tables in my stored procedures to make the code more orderly. I guess that in the case of SQL Server 2005 and later one can always use Common Table Expressions to make the code more readable. For SQL 2000 and such I found two interesting articles about not using temporary tables and replacing them with either derived tables (selects in selects) or with table variables, although they do have some limitations, thoroughly explained in the latter post. Here are the links: Eliminate the Use of Temporary Tables For HUGE Performance Gains and Should I use a #temp table or a @table variable?
  • Try to avoid wildcard characters at the beginning of a word while searching using the LIKE keyword, as that results in an index scan, which defeats the purpose of an index.
    For a short analysis of index scans go to SQL SERVER - Index Seek Vs. Index Scan (Table Scan).
  • Use the graphical execution plan in Query Analyzer or SHOWPLAN_TEXT or SHOWPLAN_ALL commands to analyze your queries.
  • Use SET NOCOUNT ON at the beginning of your SQL batches, stored procedures and triggers in production environments, as this suppresses messages like '(1 row(s) affected)' after executing INSERT, UPDATE, DELETE and SELECT statements. This improves the performance of stored procedures by reducing network traffic.
  • Use the more readable ANSI-Standard Join clauses instead of the old style joins.
  • Incorporate your frequently required, complicated joins and calculations into a view so that you don't have to repeat those joins/calculations in all your queries.
  • Use User Defined Datatypes if a particular column repeats in a lot of your tables, so that the datatype of that column is consistent across all your tables.
    Here is a great article about Sql UDTs (not the new .NET CLR types): What's the Point of [SQL Server] User-Defined Types?. Never used them, myself, but then again I am not an SQL guy. For me it seems easier to control data from .Net code
  • Do not let your front-end applications query/manipulate the data directly using SELECT or INSERT/UPDATE/DELETE statements. Instead, create stored procedures, and let your applications access these stored procedures.
    I am afraid I also fail at this point. I don't use stored procedures for simple actions like selecting a specific item or deleting a row. Many time I have to build search pages with lots of parameters and I find it really difficult to add a variable number of parameters to a stored procedure. For example a string that I have to split by spaces and search for all found words. Would it be worth to use a stored procedure in such a situation?
  • Avoid dynamic SQL statements as much as possible. Dynamic SQL tends to be slower than static SQL, as SQL Server must generate an execution plan every time at runtime.
    Personally, I never use dynamic SQL. If I need to create an SQL string I do it from .Net code, not from SQL.
  • Consider the following drawbacks before using the IDENTITY property for generating primary keys. IDENTITY is very much SQL Server specific, and you will have problems porting your database application to some other RDBMS. IDENTITY columns have other inherent problems. For example, IDENTITY columns can run out of numbers at some point, depending on the data type selected; numbers can't be reused automatically, after deleting rows; and replication and IDENTITY columns don't always get along well.
    So, come up with an algorithm to generate a primary key in the front-end or from within the inserting stored procedure. There still could be issues with generating your own primary keys too, like concurrency while generating the key, or running out of values. So, consider both options and go with the one that suits you best.
    This is interesting because I always use identity columns for primary keys. I don't think a data export or a database engine change justify creating a custom identity system. However I do have to agree that in the case that data is somehow corrupted a GUID or some other identifier would be more useful. I am sticking with my IDENTITY columns for now.
  • Use Unicode datatypes, like NCHAR, NVARCHAR, or NTEXT.
  • Perform all your referential integrity checks and data validations using constraints (foreign key and check constraints) instead of triggers, as they are faster.
  • Always access tables in the same order in all your stored procedures and triggers consistently. This helps in avoiding deadlocks. Other things to keep in mind to avoid deadlocks are: Keep your transactions as short as possible. Touch as few data as possible during a transaction. Never, ever wait for user input in the middle of a transaction. Do not use higher level locking hints or restrictive isolation levels unless they are absolutely needed. Make your front-end applications deadlock-intelligent, that is, these applications should be able to resubmit the transaction incase the previous transaction fails with error 1205. In your applications, process all the results returned by SQL Server immediately so that the locks on the processed rows are released, hence no blocking.
    I don't have much experience with transactions. Even if I would need transactions in some complex scenarios, I would probably use the .Net transaction system.
  • Offload tasks, like string manipulations, concatenations, row numbering, case conversions, type conversions etc., to the front-end applications.
    Totally agree, except the row numbering, where SQL 2005 added all those nice Getting the index or rank of rows in SQL Server 2005 aggregate ranking options
  • Always add a @Debug parameter to your stored procedures. This can be of BIT data type. When a 1 is passed for this parameter, print all the intermediate results, variable contents using SELECT or PRINT statements and when 0 is passed do not print anything. This helps in quick debugging stored procedures, as you don't have to add and remove these PRINT/SELECT statements before and after troubleshooting problems.
    Interesting, I may investigate this further, although the SQL debugging methods have improved significantly since the article was written.
  • Make sure your stored procedures always return a value indicating their status. Standardize on the return values of stored procedures for success and failures. The RETURN statement is meant for returning the execution status only, but not data. If you need to return data, use OUTPUT parameters
  • If your stored procedure always returns a single row resultset, consider returning the resultset using OUTPUT parameters instead of a SELECT statement, as ADO handles output parameters faster than resultsets returned by SELECT statements.
  • Though T-SQL has no concept of constants (like the ones in the C language), variables can serve the same purpose. Using variables instead of constant values within your queries improves readability and maintainability of your code.

The next stop was SQL Server Best Practices from Microsoft.

Here are the articles I found most important, covering stuff from testing the I/O system of the system you want to install SQL server to up to Database backup, mirroring and maintainance:
Predeployment I/O Best Practices
SQL Server 2005 Deployment Guidance for Web Hosting Environments
SQL Server 2005 Security Best Practices - Operational and Administrative Tasks
Comparing Tables Organized with Clustered Indexes versus Heaps
Troubleshooting Performance Problems in SQL Server 2005
Implementing Application Failover with Database Mirroring
SQL Server 2005 Waits and Queues
TEMPDB Capacity Planning and Concurrency Considerations for Index Create and Rebuild
The Impact of Changing Collations and of Changing Data Types from Non-Unicode to Unicode
XML Best Practices for Microsoft SQL Server 2005
Performance Optimizations for the XML Data Type in SQL Server 2005
Top 10 Hidden Gems in SQL Server 2005

Lastly some links that I will not go in depth on:

SQL Server 2000 Best Practices
SQL SERVER - 2005 Best Practices Analyzer Tutorial - Sample Example describes the Microsoft Best Practices Analyser application. I tried it myself, it's not much. It touches mainly on the maintainance and security issues that I don't really concern myself with.
Top 10 Best Practices for Building a Large Scale Relational Data Warehouse. I don't think I will need it soon, but it is a short and interesting read.
SQL Server Pre-Code Review Tips. This Pinal Dave guy is pretty cool. He seems like a good resource for SQL related issues.
CMS Database Administration SQL Server Standards, a set of SQL coding standards for a medical government agency.

I am sure there are a lot of interesting resources on the net. I will update this post with new information once I get to it.