In today’s computing world, it usually doesn’t take long before you hear the word “parallelism”, certainly if you’re a SQL Server DBA. As servers support more and more processors, the modern database application needs to take advantage of parallel processing for its queries. This two-part post should give you a good foundation on what parallelism is and how SQL Server makes use of it.
What is Parallelism?
To really understand what parallelism is, first we need to talk a bit about processor architecture. The CPU on any machine is a one-man band. It’s constantly working on all the different tasks the computer sends to it. However, that little machine has a big limitation: It can only do one thing at a time. Granted, it can do that one thing very quickly and very well, but the CPU must finish each individual task before it can move on to the next. This is where we get the processor queue. The CPU takes each task that is sent to it and puts it into a list. As each task is completed, it will go to this list to get the next job it needs to do.
Obviously, this can be a pretty big bottleneck on your system. So manufacturers started adding additional processors to servers to spread the workload out. With 2 processors in a machine, you had 2 workers that could get tasks done; with 4 processors, 4 workers; and so on. Of course, you could quickly run out of physical space on a motherboard for all these processors, so then came the introduction of multi-core processors, expanding this even further and providing even more execution units for our workload.
Parallelism is born out of taking advantage of these multiple workers. Normally, an entire SQL query and the work associated with it would be assigned to a single processing core. On a machine with one or two cores this might not be a huge impact, but if your server has 16 available cores, you’ll have a lot of lost work time as cores sit idle. What SQL Server does is take your query, break it up into smaller workloads, distribute those workloads across the available cores, and then consolidate the results as each workload, or thread, completes. So parallelism is, simply put, doing a task in parallel.
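As a quick example, here’s a minimal sketch of the kind of query that often gets this treatment (the table name dbo.BigSales is just a placeholder, not a real table): a large aggregation whose estimated cost is high enough for the optimizer to consider a parallel plan.

-- A hypothetical aggregation over a large table; if the estimated cost exceeds
-- the server's "cost threshold for parallelism", the optimizer may split this
-- scan and aggregation across multiple worker threads.
SELECT ProductID, SUM(LineTotal) AS TotalSales
FROM dbo.BigSales
GROUP BY ProductID;

If you look at the actual execution plan for a query like this in Management Studio, a parallel plan will show Parallelism operators (Repartition Streams, Gather Streams) at the points where SQL Server splits the work apart and brings the threads back together.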
There’s a lot more involved here; for the full detail of CPU architecture and how SQL Server takes advantage of it, you should check out Chapter 5 in the excellent Professional SQL Server 2008 Internals and Troubleshooting.
Seeing Parallelism In Action
So how can we find parallel queries on our server? If you want to use the GUI, you can just pull up the Activity Monitor, open up processes, and look for any process ID that has multiple rows, as each row represents a different thread of work. However, the quickest way is to query the DMVs:
SELECT er.session_id, er.status, er.command, er.blocking_session_id, er.wait_type,
       ot.exec_context_id, ot.task_state, st.text
FROM sys.dm_exec_requests er
JOIN sys.dm_os_tasks ot ON (er.session_id = ot.session_id)
CROSS APPLY sys.dm_exec_sql_text(er.sql_handle) st
WHERE er.session_id IN (SELECT session_id
                        FROM sys.dm_os_tasks
                        GROUP BY session_id
                        HAVING COUNT(exec_context_id) > 1)
sys.dm_os_tasks is the key view here, as it will show you each individual thread that SQL Server is making use of. I then join to sys.dm_exec_requests and sys.dm_exec_sql_text for additional information on the processes.
And I’m still waiting!
One thing you will probably see a lot of when your server has a lot of parallelism is the infamous CXPACKET wait. When SQL Server breaks up a query for parallel processing, the optimizer tries to balance the workload as best as possible, but you will almost always have a case where some threads will finish ahead of the rest of the job. This is where CXPACKET shows up, because the query is waiting on the remaining threads to complete before it can consolidate the results and resolve the entire process.
The tricky thing about CXPACKET is that it’s not always indicative of a problem. In a lot of ways, it’s very much business as usual, as it will show up whenever you have parallelism. The key is identifying when it’s happening too much, a sign that you’re over-parallelizing your workload.
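If you want a rough, server-wide sense of how much of your wait time CXPACKET accounts for, you can look at sys.dm_os_wait_stats. This is just a sketch, not a definitive diagnostic; the numbers are cumulative since the last restart (or since the wait stats were last cleared):

-- Rank wait types by total wait time and show each as a percentage of all waits.
SELECT wait_type,
       waiting_tasks_count,
       wait_time_ms,
       CAST(100.0 * wait_time_ms / SUM(wait_time_ms) OVER () AS DECIMAL(5, 2)) AS pct_of_total_waits
FROM sys.dm_os_wait_stats
ORDER BY wait_time_ms DESC;

If CXPACKET dominates that list and users are actually feeling query slowness, that’s your cue to dig deeper. If it’s just one wait among many, it’s likely the normal cost of doing work in parallel.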
We’ll go over this in part two, as we look at how we can control parallelism in SQL Server and how different workloads make use of parallelism. Hope to see you then!