March 2017 – Michellea David, Manager DBAs, AVP

Filtered Index can’t use like or functions in the where

Posted on March 29, 2017 by Michellea David

Today I needed to create a filtered index to help a large update go faster. I wanted to use a like statement, but found out I could not use like in the where clause. Thanks to google; I found another way to make my update statement faster.

Filtered index I wanted:

CREATE NONCLUSTERED INDEX [INDX_Filter_IR_URI] ON [dbo].[IR_RecMedia]

([MediaURI] ASC,[RecordingDate] ASC)

INCLUDE ( [ExpirationDate])

Where MediaURI like ‘HTTP%’;

Received ERROR Message:

Msg 156, Level 15, State 1, Line 16

Incorrect syntax near the keyword ‘like’.

Decided to use a function to create the filtered index, but this also received an error.

CREATE NONCLUSTERED INDEX [NonClusteredIndex-20170329-200714] ON [dbo].[IR_RecMedia]

([MediaURI] ASC,[RecordingDate] ASC)

INCLUDE ( [ExpirationDate])

Where right(MediaURI, 4) = ‘HTTP’;

Received the Error:

Msg 10735, Level 15, State 1, Line 8

Incorrect WHERE clause for filtered index ‘NonClusteredIndex-20170329-200714’ on table ‘dbo.IR_RecMedia’.

How did I fix this issue:

I created an extra column on the table to store 1 (for true) or 0 (for false) based on if the MediaURI matched like ‘HTTP%’. Then I created an index on that column and RecordingDate. I had to change my update statement to use where HTTPbit = 1, instead of where MediaURI like ‘HTTP%’.

Free Webinars hosted by GroupBy

Posted on March 21, 2017March 21, 2017 by Michellea David

Developers, Query writers, and DBAs – Register for Free Webinars hosted by GroupBy. You do not need to attend all the sessions.

https://groupby.org/

Sessions:

Friday, April 21:

8AM Eastern – How to Use Parameters Like a Pro and Boost Performance by Guy Glantser
10AM – Introducing the SQL Server 2016 Query Store by Enrico van de Laar
Noon – Worst Practices & Less Known Limitations for Columnstore Indexes by Niko Neugebauer
2PM – Gems to Help You Troubleshoot Query Performance by Pedro Lopes
4PM – Crash Course on Better SQL Development by Vladimir Oselsky

Friday, April 28:

8AM Eastern – SQL Server 2016 Features for Performance Tuning Lovers by Matan Yungman
10AM – SAN Primer for the DBA by Doug Bernhardt
Noon – Bringing DevOps to the Database by Steve Jones
2PM – Azure SQL Databases: A Guided Tour by Mike Walsh
4PM – Hacking SQL Server by André Melancia

What to do When Failover Cluster fails

Posted on March 20, 2017 by Michellea David

Featured Image and great Blog with an Introduction to Windows Clusters for SQL DBAs from Brent Ozar’s site: Intro to SQL Server Clusters (Love Kendra Little’s Art)

WFC (Windows Failover Clustering or FCM Failover Cluster Manager) is used to configure the failover cluster for a clustered SQL Server database or SQL Server Always On use. If you need High availability with SQL Server then you will need a cluster. If you are a DBA, hopefully the cluster is created for you by the Sys Admin. If you need to create the cluster there are many blogs on how to create the cluster. Here is a good starting link on how to create a cluster:Microsoft page on Creating Clusters

What do you do when the cluster fails

What do you do when the cluster fails and you (the DBA) need to troubleshoot the issue.

Open Windows Failover Cluster and get the node names and server names.
Go to PowerShell and create the cluster log on each node (Server). This log needs to be manually created. The command is “Get-ClusterLog -Node NodeName”
Review the created log (C:\Windows\Cluster\Reports\Cluster.log) for information on the time of failover and the reason why the cluster failed over. The log will be on the actual server for the node you ran the log for.
Go to Failover Cluster Manager and review the Cluster Events – Same on all nodes
Go to Administrative Tools\Event Viewer for each node and review the events for the time period of the cluster failover in these files:
- Windows Logs\Application
- Windows Logs\System
- Applications and Services Logs\Microsoft\Hyper-V-High-Availability\Admin
- Applications and Services Logs\Microsoft\FailoverClustering-Manager
- Applications and Services Logs\Microsoft\FailoverClustering\Operational
optional: Create an excel spreadsheet with each meaningful event from the log files. This helps me to see the events and order them. I also like to send this to management as documentation.
- Cluster Node
- Cluster Server
- date
- time
- Where found (example: Event Viewer\Hyper-V-High-Availablity\Admin)
- level (Error,Information, Warning, or Critical)
- Source
- event source
- eventID
- Task Category
- Desc
Go to (C:\Program Files (x86)\Microsoft SQL Server\110\Tools\Policies\DatabaseEngine\1033\Windows Event Log Cluster Disk Resource Corruption Error.XML) on each node and review.

Gather all your data, review, and send an email out with the information on why the cluster failed. If you find the error is Always-On or SQL Server related, more research is needed into the SQL Server logs.

Windows Cluster LOG ERR [RCM] [GIM] ResType Virtual Machine has no resources

Posted on March 20, 2017March 28, 2017 by Michellea David

While reviewing my cluster log on Node A, I found the error: ERR [RCM] [GIM] ResType Virtual Machine has no resources, not collecting local utilization info

Short Answer: If there is no Hyper-V this error can be ignored.

Long Answer:

I found the answer on Windows 2012 board

“I realize this is old, but wanted to provide some direction for those who may be seeing this error and don’t know why. At a high level, the cluster logging process uses a set of cluster resources types that it queries for health/status info. You can see these by running the PowerShell command Get-ClusterResourceType.

The problem is if you do not have a resource that matches the type, it will return an error when you have debug logging enabled. Meaning, if you are running a SQL Failover Cluster, your cluster resource type would be “SQL Server”. The error you are seeing is because there are no Virtual Machine resource types, which would come in to play in a Hyper-V cluster.

You are running SQL, not Hyper-V, so the error can be ignored, or if you know you will not be running Hyper-V at any time on the cluster, you can simply remove that cluster resource type by running the following in and admin PowerShell prompt.”

Remove-ClusterResourceType “Virtual Machine”

Size does matter: SSIS DefaultBuffer

Posted on March 20, 2017March 20, 2017 by Michellea David

While researching how to improve performance for SSIS code, I found two properties that should always be changed on every SSIS package data task. Unfortunately I could not find anywhere on the web information on where to make the change, just that I should.

There are many blogs on how to calculate the values for the DefaultBufferMaxRows and DefaultBufferSize, but none show you where to make the change. Go to the bottom of this blog page for the links on How and Why to change the DefaultBuffer properties for SSIS Data Tasks.

Where to change the DefaultBuffer properties

The change can only be made on Data Flow Tasks. Make sure the properties window is showing and then click on each data task. Calculate the values for DefaultBufferMaxRows and DefaultBufferSize and then input in the values in the properties window.

Blogs on Why and How to Calculate the DefaultBuffers:

Intelligent SQL – Tuning Max Buffer Rows

Desert Isle – Size does Matter

Sam Vanga – Simple Tips To Speed Up Your SSIS Data Flow

SQL Solace: Tuning Buffer Size

Try_Convert and Try_Cast

Posted on March 10, 2017 by Michellea David

Today I had a job fail, that has been running without errors for months. The error was “Error converting data type varchar to bigint. [SQLSTATE 42000]”.

I went into the procedure that failed and ran it in pieces. I found the error was coming from a merge (update/insert) statement. The data for this one column came in as char(18), but the column to insert or update was a bigint.

I needed to find which rows were causing the issues. I couldn’t use Try/Catch, because it would fail but not show me the data. Then I found TRY_CAST!!

Muhammad Imran “SQL SERVER 2012 -Conversion Function -TRY_CONVERT”

Muhammad Imran “SQL SERVER 2012 -Conversion Function -TRY_CAST”

casterror

With the next query I was able to find all the data that would fail in a cast.

try_castFail

How to Troubleshoot a failed Job

Posted on March 10, 2017 by Michellea David

How To Debug a job that failed in SQL Server 2016 ( This process of troubleshooting failed jobs has been the same at least since SS2005)

Make sure SQL Agent is running on the server in question.
Go to the job and right click. Select “View History”
The Log File Viewer window will show. On that window select the latest failed job time, click the + symbol to open up the steps, and then click on the failed step. View the error in the lower grey area. (I didn’t pass coloring in the lines in Kinder)
Go to the job step properties and find out what procedure, query, SSIS, etc. was running at the time of the failure.
1. right click on the job name and select properties
2. In the Job Properties Window, select steps.
3. In the job step list, select the step that failed.
4. click on the Edit button on the bottom of the window
5. The Job Step Properties window will open. Note the Type of task, the database, and the command.

From here troubleshooting will depend on the type of task. For my issue, the task was a T-SQL procedure. So I will walk through what to do if the step is a procedure.

If there is a development environment, go to that environment for troubleshooting.
Find the database, click the plus symbol next to the database to expand the directory.
Click the plus symbol next to Programmability
Click on the plus symbol next to Stored Procedures
select the procedure and right click, then select Modify
From here I take each select statement and run individually to see if the error is in the selects.
Then I make a copy of the table that will be inserted into or updated, or deleted from and change the procedure code to point to the test table. Then I run the procedure to see if the error is in the update, insert, or delete.
Usually you will find the error and be able to fix from here.

Paul Randal Online SQL Waits Class

Posted on March 7, 2017 by Michellea David

Paul Randall had a great online class on SQL waits today. If you did not get to see it today, you can get the slides, scripts, and recording (give them a day to put online) at http://fundamentals.pass.org/MeetingArchive.aspx Or just the video at the youtube channel https://www.youtube.com/c/DBAFundamentals .

Paul’s online blog is https://www.sqlskills.com/sql-server-resources/

FREE Training with Paul Randal

Posted on March 6, 2017 by Michellea David

FREE Training with SQL PASS DBA Fundamentals

Next Meeting -Tomorrow

Tue, Mar 07 2017 11:00 (UTC-06:00) Central Time (US & Canada)

Performance Troubleshooting Using Wait Statistics

RSVP: https://attendee.gotowebinar.com/register/1960571447408183809

Performance Troubleshooting Using Wait Statistics

One of the first things you should check when investigating performance issues are wait statistics – as these can often point you in the direction for further analysis. Unfortunately many people misinterpret what SQL Server is telling them and jump to conclusions about how to solve the problem – what is often called ‘knee-jerk performance tuning’. In this session, you will learn what waits are, how to analyze them, and potential solutions to common problem patterns.

Paul’s an author, consultant, and top-rated speaker having spoken at PASS, TechEd, SQLintersection, and numerous other events around the world. He spent 9 years working on the SQL Server team, writing DBCC CHECKDB, and ultimately responsible for the entire Storage Engine. In 2007 Paul left Microsoft to co-own and run SQLskills.com, and has been a SQL Server MVP ever since. When he’s not tweeting, blogging, or helping someone recover from a disaster, he’s likely to be underwater somewhere in the world with his wife, Kimberly L. Tripp. Full bio at http://www.sqlskills.com/about/paul-s-randal/

IO issues: Is it the SAN or SQL Server?

Posted on March 4, 2017March 4, 2017 by Michellea David

At my company, we have IO issues because of the SAN. I know this because the SAN administrator and I have been talking about the SQL Server IO latency.

networklatency032017

But what if I didn’t know the SAN had an issue? Just saying SAN is slow is like saying your database is slow. OK, why do you think that? Where should I look and in what order? What proof do I need to have?

Steps to verify there is an issue:

“Emergency Slow”

Query the SQL Server Error Log from SQL Server Management Studio

EXEC sys.xp_readerrorlog 0,1, N’taking longer than 15 seconds’ ioissues1

If you find an entry in the SQL Server error log for the days and times in question, run/don’t walk to your SAN administrator for help.

Problem is you need more data to give the SAN admin.

To find out more on how to run the sys.xp_readerrorlog command read: Reading the SQL Server log files using TSQL by Greg Robidoux
https://www.mssqltips.com/sqlservertip/1476/reading-the-sql-server-log-files-using-tsql/

“Semi-slow”

So there were no messages stating “I/O requests taking longer than 15 seconds. What is the next step?

Data to gather and analyze

Sys.dm_io_virtual_file_stats –This view contains all the read and writes since the SQL Server was started. Take a sample, write to a temp table, then in 5 min calculate the difference between two samples, what was the latency? Analyze it.

The query below will calculate the current latency within the 5 minutes you run the script. This script will run 5 minutes.

SELECT
GETDATE() AS collection_time,
d.name AS [Database],
f.physical_name AS [File],
(fs.num_of_bytes_read / 1024.0 / 1024.0) [Total MB Read],
(fs.num_of_bytes_written / 1024.0 / 1024.0) AS [Total MB Written],
(fs.num_of_reads + fs.num_of_writes) AS [Total I/O Count],
fs.io_stall AS [Total I/O Wait Time (ms)],
fs.size_on_disk_bytes / 1024 / 1024 AS [Size (MB)]
INTO #past_coll_time
FROM sys.dm_io_virtual_file_stats(default, default) AS fs
INNER JOIN sys.master_files f 
ON fs.database_id = f.database_id 
AND fs.file_id = f.file_id
INNER JOIN sys.databases d 
ON d.database_id = fs.database_id;

WAITFOR DELAY '00:05:00';

SELECT
GETDATE() AS collection_time,
d.name AS [Database],
f.physical_name AS [File],
(fs.num_of_bytes_read / 1024.0 / 1024.0) [Total MB Read],
(fs.num_of_bytes_written / 1024.0 / 1024.0) AS [Total MB Written],
(fs.num_of_reads + fs.num_of_writes) AS [Total I/O Count],
fs.io_stall AS [Total I/O Wait Time (ms)],
fs.size_on_disk_bytes / 1024 / 1024 AS [Size (MB)]
INTO #curr_coll_time
FROM sys.dm_io_virtual_file_stats(default, default) AS fs
INNER JOIN sys.master_files f 
ON fs.database_id = f.database_id 
AND fs.file_id = f.file_id
INNER JOIN sys.databases d 
ON d.database_id = fs.database_id;

SELECT
cur.[Database],
cur.[File] AS [File Name],
CONVERT (numeric(28,1), (cur.[Total MB Read] - prev.[Total MB Read]) 
* 1000 /DATEDIFF (millisecond, prev.collection_time, cur.collection_time)) 
AS [MB/sec Read],
CONVERT (numeric(28,1), (cur.[Total MB Written] - prev.[Total MB Written]) 
* 1000 / DATEDIFF (millisecond, prev.collection_time, cur.collection_time)) 
AS [MB/sec Written],
CONVERT (numeric(28,1), (cur.[Total I/O Wait Time (ms)] 
- prev.[Total I/O Wait Time (ms)]) ) AS [Total IO Stall (ms)],

-- protect from div-by-zero

CASE
WHEN (cur.[Total I/O Count] - prev.[Total I/O Count]) = 0 THEN 0
ELSE
(cur.[Total I/O Wait Time (ms)] - prev.[Total I/O Wait Time (ms)])
/ (cur.[Total I/O Count] - prev.[Total I/O Count])
END AS [Response Time (ms per transaction)]
INTO #response_time_cal
FROM #curr_coll_time AS cur
INNER JOIN #past_coll_time AS prev 
ON prev.[Database] = cur.[Database] 
AND prev.[File] = cur.[File];

SELECT *
FROM #response_time_cal
WHERE [Response Time (ms per transaction)] > 0;

DROP table #response_time_cal;

DROP TABLE #curr_coll_time;

DROP TABLE #past_coll_time;

Is the latency above acceptable thresholds? MS published Storage latency numbers, some latency is acceptable. From Microsoft’s “Analyzing I/O Characteristics and Sizing Storage Systems for SQL Server Database Applications” back from 2008 MS Advisory team.

Data file latency: <= 20 ms (OLTP) <=30 ms (DW)

Log file files: <= 5 ms

There are a bunch of free scripts to sample this, if you don’t feel like writing your own. Here are two:

Paul Randal’s script from SQLSkills

sp_AskBrent from Brent Ozar Unlimited

What analysis should you do before going to the SAN group?

https://blog.sqlauthority.com/2015/05/15/sql-server-warning-sql-server-has-encountered-n-occurrences-of-io-requests-taking-longer-than-15-seconds/

Pinal recommends, “Here are the thing which a DBA should do before going to hardware team.

Note the pattern of the message. If they are logged at any specific time? Check if there is any scheduled maintenance job is running at the same time? Rebuild index and other IO intensive tasks would perform a lot of IO and disk may report slowness at that time.
Check Event log (system and application) and check if there are any disk related error or warning messages.
Keep monitoring sysprocesses and check for waits which are happening for SPIDs. If the disk is slow all the times then you should see waits of PAGEIOLATCH, WRITELOG or LOGBUFFER with high wait time. Refer my earlier blogs.
Capture disk related data using performance monitor (start > run > perfmon) for extended amount of time and have a look at counters. You should look at the performance counters related to storage performance to evaluate whether the numbers you are seeing are expected and within reasonable bounds. (See Table 3 I/O related performance counters and their meanings below for counters to review)
Make sure latest Windows patch has been applied to avoid any known issues with windows itself.
Once basics have been verified and we have confirmed that we are seeing symptoms of slow response from disk, hardware team need to be engaged for further investigation.

Gather disk latency counters from Windows Performance monitor

Very important to show the SAN or Sys Admin to prove your point.

From https://technet.microsoft.com/library/Cc966412

“Monitoring I/O Performance Using System Monitor

The Windows System Monitor tool can be used to get detailed information about I/O performance on a system. Table 3 contains detailed information specific to the I/O counters you should consider monitoring and how to interpret them. These counters can be captured by monitoring either Physical or Logical Disk counters. The choice of which object to use should be based on the specific storage configuration that is being tested.

Table 3 I/O related performance counters and their meanings

Performance Monitor Counter	Description
Disk Reads/sec & Disk Writes/sec	Number of I/Os per second (IOPs) being issued against a particular disk or volume. This number varies based on the size of I/Os issued. Consult the hardware vendor for an estimation of the number of I/Os per second support per spindle on their particular hardware. Perfmon captures the total number of individual disk IO requests completed over a period of one second
Average Disk/sec Read & Average Disk/sec Write	Measure of disk latency. Lower values are better but this can vary and is dependent on the size and nature of the I/Os being issued. Numbers also vary across different storage configurations (cache size/utilization can impact this greatly). On well-tuned I/O subsystems, ideal values would be: · 1–5 ms for Log (ideally 1 ms on arrays with cache) · 4–20 ms for Data on OLTP systems (ideally 10 ms or less) · 30 ms or less on DSS (decision support system) type. Latencies here can vary significantly depending on the number of simultaneous queries being issued against the system. Sustained values of more than this when the total throughput is less than expected should be investigated. Consider these in combination with what is normal for your particular system. Make sure to monitor disk latencies for trend analysis. The number of I/Os and latency specific to SQL Server data files can be found by using the sys.dm_io_virtual_file_stats dynamic management view in SQL Server 2005.
Average Disk Bytes/Read & Average Disk Bytes/Write	Size of I/Os being issued. This value may impact disk latency and larger I/Os may result in slightly higher latency. When used to monitor SQL Server, this will tell you the average size of the I/Os SQL Server is issuing. Displays the average size of the individual disk requests (IO size) in bytes, for the capture interval. Example: If the system had ninety nine IO requests of 8K and one IO request of 2048K, the average will be 28.4K. Calculation = (8k99) + (12048k) / 100.
Average Disk Queue Length	Average number of outstanding I/O requests. The general rule of thumb is <=2 per spindle but this may be hard to measure due to storage virtualization, differences in RAID level between configurations, and so on. Focus on higher than average disk queue length in combination with higher than average disk latencies. This combination could indicate that the storage array cache is being over utilized or spindle sharing with other applications is impacting performance.
Disk Read Bytes/sec & Disk Write Bytes/sec	Measure of the total bandwidth for a particular disk or LUN. Perfmon captures the total number of bytes sent to the disk (write) and retrieved from the disk (read) over a period of one second.

It is important to note that while System Monitor provides a very accurate picture of your I/O subsystem’s current performance, it may not provide enough information to successfully resolve all I/O performance issues in SAN environments. System Monitor can be used to diagnose I/O performance problems; however, resolution may lie in the storage array or in the HBA driver level, which can be more challenging to diagnose.”

Verify/Prove the issue is not at the network level

\Network Interface(*)\Bytes Total/sec – The percentage of network utilization is calculated by multiplying Bytes Total/sec by 8 to convert it to bits, multiply the result by 100, then divide by the network adapter’s current bandwidth. Use the following thresholds to evaluate network bandwidth utilization:
1. Less than 40% of the interface consumed = Healthy
2. 41%-64% of the interface consumed = Monitor or Caution
3. 65-100% of the interface consumed = Critical, performance will be adversely affected
4. 40,522,960*8= 324,183,680*100 = 32,418,368,000/1,000,000,000=32%
\Network Interface(*)\Output Queue Length – The output queue length measures the number of threads waiting on the network adapter. If there are more than 2 threads waiting on the network adapter, then the network may be a bottleneck. Common causes of this are poor network latency and/or high collision rates on the network. Use the following thresholds to evaluate output queue length:
1. 0 = Healthy
2. 1-2 = Monitor or Caution
3. Greater than 2 = Critical, performance will be adversely affected.

Network Fixes:

Ensure that the network adapters for all computers (physical and virtual) in the solution are configured to use the same value for maximum transmission unit (MTU).

For more information about configuring the MTU value see “Appendix A: TCP/IP Configuration Parameters” at http://go.microsoft.com/fwlink/?LinkId=113716.

If an output queue length of 2 or more is measured, consider adding one or more physical network adapters to the physical computer that hosts the virtual machines and bind the network adapters used by the guest operating systems to these physical network adapters.

How to engage the SAN Admin?

Disk latency/storage latency. How much am I waiting on storage? Get metrics to show evidence about where the disk latency is causing database issues. How much is latency hurting performance.
Here is the impact/slow down on work load we are seeing
What are your options? Present a range of options on how to fix. Do you really need faster storage?
Per Kendra Little, “Make all the data available to the storage admins, but only give the storage admin the summary.”

Possible Reasons:

Could it be a cable not connected?
It could indicate a poor disk subsystem, a misconfiguration, or a problem with the host bus adapter (HBA) driver or hardware connecting the server and SAN.
It could also indicate fragmentation, excessive I/O requests not being handled properly by the disk, or data files not optimally placed on the disk.
Was there maintenance?
Faulty component?
Check if the file moved to a new tier of storage?
Etc…

Options to fix:

Are we suffering read latency against data files? If high, how much memory have we given the SQL Server? Is it possible to add more memory? Much cheaper option than paying for san disk.
Adding memory frequently reduces read latency from storage for data files.
Can you improve the workload by adding an index? Check the indexes in the database. Have we created indexes for the most frequent queries? Optimize and reduce the amount of data SQL Server needs to read. Asking a lot less from storage and faster. Better and cheaper then speeding up storage.
Heavy write latency to transaction logs can usually only be sped up either by speeding up storage or (sometimes) changing application patterns if you’ve got a lot of tiny small commits.
- If there is a write latency greater than 5ms, look at the processes, jobs, and queries that are running in SQL Server. Sp_whoisactive

References:

Kendra Little – https://www.littlekendra.com/2016/06/16/outside-the-big-san-box-analyzing-storage-and-san-latency-in-sql-server-dear-sql-dba/

Pinal Dave- https://blog.sqlauthority.com/2015/05/15/sql-server-warning-sql-server-has-encountered-n-occurrences-of-io-requests-taking-longer-than-15-seconds/

Greg Robidoux – https://www.mssqltips.com/sqlservertip/1476/reading-the-sql-server-log-files-using-tsql/

Paul Randal’s script from SQLSkills

sp_AskBrent from Brent Ozar Unlimited

Microsoft – https://technet.microsoft.com/library/Cc966412

Microsoft – “Analyzing I/O Characteristics and Sizing Storage Systems for SQL Server Database Applications”

Microsoft – http://go.microsoft.com/fwlink/?LinkId=113716

Month: March 2017

Share this:

Share this:

What do you do when the cluster fails

Share this:

Share this:

Where to change the DefaultBuffer properties

Blogs on Why and How to Calculate the DefaultBuffers:

Share this:

Share this:

Share this:

Share this:

Share this:

Steps to verify there is an issue:

“Emergency Slow”

“Semi-slow”

Data to gather and analyze

What analysis should you do before going to the SAN group?

Gather disk latency counters from Windows Performance monitor

“Monitoring I/O Performance Using System Monitor

Verify/Prove the issue is not at the network level

Network Fixes:

How to engage the SAN Admin?

Possible Reasons:

Options to fix:

References:

Share this: