SQL: Select Nth rank of something. Three approaches.

Common Question on Forums, how to get the 3rd highest salary or similar. Also a common job interview question perhaps?
I am using the AdventureWorksDW2012 database. Tests were done with SQL Server 2012 SP2

I run each query twice. Once with the base table and once with a covering Index.
Here is the covering Index:

CREATE NONCLUSTERED INDEX _E_DimEmployee_BaseRate
ON [dbo].[DimEmployee] ([BaseRate])
INCLUDE ([FirstName],[LastName])

Correlated subquery:

SELECT FirstName, LastName, BaseRate
FROM DimEmployee e
WHERE (SELECT COUNT(DISTINCT BaseRate)
    FROM DimEmployee p WHERE e.BaseRate <= p.BaseRate) = 4

Why is this a good answer? It’s not really but this will work on any SQL implementation.
It’s fairly slow, it will do a lot of look ups.  The subquery is evaluated every time a row is processed by the outer query. This query uses dense ranking and can return multiple rows.

Here are the IO and time stats.
Without Index:
Table ‘Worktable’. Scan count 564, logical reads 1337, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘DimEmployee’. Scan count 54, logical reads 2646, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
elapsed time = 20 ms.

With covering Index:
Table ‘Worktable’. Scan count 349, logical reads 907, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘DimEmployee’. Scan count 54, logical reads 233, physical reads 0, read-ahead reads 4, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
elapsed time = 13 ms.

Double Order By with TOP statement:

SELECT TOP 1 FirstName, LastName, BaseRate
FROM ( SELECT TOP 4 FirstName, LastName, BaseRate
    FROM DimEmployee ORDER BY BaseRate DESC) AS MyTable
ORDER BY BaseRate ASC;


Why is this a good answer? Because it is an easy syntax to remember.
Let’s look at the subquery, which returns the N highest salaries in the DimEmployee table in descending order. Then, the outer query will re-order those values in ascending (default) order, this means the Nth highest salary will now be the topmost salary. Keep in mind that the TOP statement is MS SQL server specific. MySQL would use LIMIT 1 for instance. In addition this solution cannot do DENSE ranking and only returns one row even if two employees share the same BaseRate.

Edit June 2015: The addition of LIMIT/OFFSET on SQL Server 2012 made answer obsolete. The syntax for LIMIT/OFFSET has been added on the bottom of this post.

Here are the IO and time stats.
Without Index:
Table ‘DimEmployee’. Scan count 1, logical reads 49, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
elapsed time = 5 ms.

With covering Index:
Table ‘DimEmployee’. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
elapsed time = 0 ms.

Use Windowing function:

SELECT FirstName, LastName, BaseRate
FROM (SELECT FirstName, LastName, BaseRate, DENSE_RANK() OVER (ORDER BY BaseRate DESC) Ranking
FROM DimEmployee) AS MyTable
WHERE Ranking = 4

Why is this a good answer? Because it performs the best – performance is king. The Syntax is also ANSI SQL however of the “Big 3” only Oracle and MS are using it. In addition you can chose to use ROW_NUMBER, DENSE_RANK or regular RANK.

Here are the IO and time stats.
Without Index:
Table ‘DimEmployee’. Scan count 1, logical reads 49, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
elapsed time = 2 ms.

With covering Index:
Table ‘DimEmployee’. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
elapsed time = 0 ms.

Edit: June 2015
Use OFFSET (SQL Server 2012):

SELECT FirstName, LastName, BaseRate
FROM DimEmployee e
ORDER BY BaseRate DESC
OFFSET 3 ROWS
FETCH NEXT 1 ROWS ONLY

Why is this a good answer? Because similar Syntax exists on the other platforms.
Performance wise it runs as well as the windowing solution.

Advertisements

Screw it, I am going to search all tables, all columns!

Ever had the need to find out where data is stored in your MS SQL Database?
I sometimes do and I find it often faster to simply search every table and every column within these table than to try and hunt down the schema definitions. I get a coffee while the script crunches away.  The @SearchStrColumnName option is there mostly for when you search integers, as you may get too many false positives.

Edit: GitHub Gist works awesome, will have to do that for the previous posts too! Just paste the URL, and wordpress will do the rest, in this case display a fully highlighted T-SQL code.
Edit 4/8/2013 – @SearchStrColumnName wasn’t properly escaped before and this parameter didn’t really work.
Edit 4/9/2013 – Added parameter @FullRowResult, this will cause it to return the full row for each hit. This is usefull when you need to lookup or find related info to the search term.
Edit 4/12/2013 – Added parameter @SearchStrTableName, to limit in what tables we are going to search.
Edit 5/2/2013 – Now also searching in type uniqueidentifier (GUID)
Edit 5/22/2013 – Additional numeric data types will be searched.
Edit 1/7/2014 – Can now also search timestamp value (known as rowversion)
Edit 4/18/2014 – Added a Top parameter that works together with @FullRowResult. This is to help limit the return if a search string is found in a table too often

Cannot create an instance of ole db provider “oraoledb.oracle” for linked server

You know you’re dealing with something annoying when Google suggestions cannot wait to auto complete your error message!
Yet this is how I spent a good three hours of my life on earth, just chasing forums & blog posts, even translating a Russian one into English.
I needed that Oracle data, dammit. Sufficient to say that none of the suggestions worked, I kept receiving “Cannot create an instance of OLE DB provider “OraOLEDB.Oracle” for linked server”

Ok, lets go back to the start and list the environment.
Using my local SQL Server 2008 Express R2 on my Win7 x64 workstation, I had to query a Oracle g10 Database on one of my VMWare box.
Yes, you could use the MSDAORA provider by Microsoft for Oracle DBs, but this provider is deprecating and does not exist in x64 flavor.

It all started when i downloaded the ODAC pack for Oracle g10 (ODAC10203x64.zip). This fails to install: Java(TM) 2 Platform Standard Edition binary has stopped working
At least there are a few confirmations that it’s save to use the OLEDB Oracle g11 driver, which uses an “Xcopy” deployment. So I download ODAC112030Xcopy_x64.zip and extract it. Grab an oracle login here via bugmenot. As of Sep 2012, this is still the latest version.
Fair enough, it contains clear instructions and installs easily. I get the provider in the list of providers on my SQL server management studio – Awesome, i thought I was done!

Not so, not by a longshot, now the pain was just starting! First I went into the properties of the provider set the AllowInProcess to enabled.
Now attempting to link the server failed with the error message that I have decided to best be the title of this post. Now this post is not going to list all the possible causes there are, hell I never really want to touch this subject again after today, but I want to “broadcast” the thing that made it ultimately work for me.

Let me just say that I’ve attempted each suggestion that I found sensible, those that I think everyone else that has this issue should repeat are as below:
1.) Set NTFS permission to the install directly to Authenticated Users with Read/Execute.
2.) Enable AllowInProcess for the Provider as outlined above.
3.) Go to HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\MSDTC\MTxOCI and remove the three values there (create a backup export if you are so inclined). This step I am the least certain about, however why would you want to reference these 3 completely outdated dll’s in the registry?)
4.) Create the Sub folder NETWORK\ADMIN in your install directory and copy a working tnsnames.ora into it(Maybe from another machine that has the full feature Oracle SQL Developer installed)

Nothing helped – so what does one do at such a point? Unleash the furry of Procmon.exe!
This tool from Sysinternals/Mark Russinovich is the best, and my only regret that day was not launching it earlier instead of scouring Google and going insane.
I’ve limited Procmon to just sqlservr.exe, as it’s the SQL Service itself that loads/handles the providers and not the ssms.exe. Also of note is that the sqlservr.exe is a 64bit process while the management studio is still just 32bit. As the server service is loading the provider, and the service process is 64bit, the provider must also be available in 64 bit format.

The ODAC112021Xcopy_x64.zip was installed to C:\Oracle.
What Procmon showed me however is that sqlservr is attempting to find the oci.dll in any folder but his! (It iterates through the %Path% sysvariable). When it finally gives up on find the dll, the SQL Service is in a unstable shape and the only way to stop the service was to kill it via taskmgr/procexp.
Clearly I can see that the “xcopy” deployment – while not giving me any error messages – it also did not set the PATH variable!
And this is what this post is really about… adding C:\Oracle and C:\Oracle\Bin to the Path variable or maybe it’s about
employing investigative tools earlier in the process instead of relying on your search engine skills.

sqlservr.exe can now find the relevant DLL’s. The OCI.DLL in the root and the OraOLEDB11.DLL in the Bin subfolder.
At this point I could query the database! If you did my steps as above and you still get the same error, I strongly suggest using Procmon.exe as I have instead of jumping to the next search result.

I’ve left out the part where it initially could not find the tnsnames.ora file and it attempted to connect via the default oracle port instead of the custom one, and as an additional note if sqlservr cannot locate the tnsnames.ora in the first attempt, it’s not going to retry again until you restart the SQL service. I was actually surprised to even see the port connection attempt in procmon, it seems this tool just keeps getting better.

So this is what I’ve done on my “Day off” from work…. I hope you were reading this while at work yourself and can get back to the real work & fun stuff, writing queries!