SQL Interview Questions

Complementary to the blog post on Qindi (Cindy) Zhang

SQL interview question bank:

On the candidate’s basic understanding of data structures and SQL fundamentals:

  1. Can you interchange the WHERE and HAVING clauses?
  2. What is a correlated sub-query?
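
As a rough illustration of what a good answer to these two questions might cover, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in database, and the `sales` table and its values are made up for the example.

```python
import sqlite3

# In-memory database with a tiny, made-up sales table (illustrative only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount INTEGER);
    INSERT INTO sales VALUES ('East', 50), ('East', 200), ('West', 30), ('West', 40);
""")

# WHERE filters individual rows before they are grouped.
where_rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales "
    "WHERE amount > 35 GROUP BY region ORDER BY region"
).fetchall()

# HAVING filters whole groups after aggregation, so the two are not interchangeable.
having_rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales "
    "GROUP BY region HAVING SUM(amount) > 100 ORDER BY region"
).fetchall()

# Correlated subquery: the inner query refers to the current outer row (s.region),
# so it is logically re-evaluated for each outer row.
correlated_rows = conn.execute(
    "SELECT region, amount FROM sales s "
    "WHERE amount > (SELECT AVG(amount) FROM sales x WHERE x.region = s.region) "
    "ORDER BY region"
).fetchall()

print(where_rows)
print(having_rows)
print(correlated_rows)
```

A candidate who can explain why the first two queries return different rows usually understands the order in which filtering and grouping happen.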

On the candidate’s actual hands-on SQL experience:

  1. How do you perform an unpivot in SQL?
  2. How large is the dataset you typically query against? What are some of the things you do to improve query efficiency?
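
One portable way to answer the unpivot question is UNION ALL, which should work on engines without a dedicated UNPIVOT operator. The sketch below assumes a hypothetical `quarterly` table with one column per quarter and runs on Python's built-in sqlite3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical wide table: one column per quarter.
    CREATE TABLE quarterly (product TEXT, q1 INTEGER, q2 INTEGER);
    INSERT INTO quarterly VALUES ('A', 10, 20), ('B', 30, 40);
""")

# Unpivot: emit one (product, quarter, amount) row per original column.
unpivoted = conn.execute("""
    SELECT product, 'q1' AS quarter, q1 AS amount FROM quarterly
    UNION ALL
    SELECT product, 'q2', q2 FROM quarterly
    ORDER BY product, quarter
""").fetchall()

for row in unpivoted:
    print(row)
```

On SQL Server the same result can be written with the UNPIVOT operator; the UNION ALL form is shown here only because it is the most portable.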

On the candidate’s business exposure and understanding:

  1. What is a business case where you need to use a full join?
  2. How do you handle missing days in a transaction report if dashboard users want this information captured and reflected?
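
For the missing-days question, one common approach is to build a calendar of all dates and LEFT JOIN the transactions onto it, so that days without transactions still show up with a zero. The sketch below is only one possible answer. The `txn` table, the date range, and the use of a recursive CTE on SQLite are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Made-up transactions; note that 2015-06-02 and 2015-06-04 have none.
    CREATE TABLE txn (day TEXT, amount INTEGER);
    INSERT INTO txn VALUES ('2015-06-01', 100), ('2015-06-01', 50), ('2015-06-03', 70);
""")

# Generate every day in the reporting window, then left join the transactions
# so that days with no rows still appear, with a total of 0.
daily_totals = conn.execute("""
    WITH RECURSIVE calendar(day) AS (
        SELECT '2015-06-01'
        UNION ALL
        SELECT date(day, '+1 day') FROM calendar WHERE day < '2015-06-04'
    )
    SELECT c.day, COALESCE(SUM(t.amount), 0) AS total
    FROM calendar c
    LEFT JOIN txn t ON t.day = c.day
    GROUP BY c.day
    ORDER BY c.day
""").fetchall()

for row in daily_totals:
    print(row)
```

In a real warehouse the calendar would more likely come from an existing date dimension table than from a recursive CTE.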


For the technical interview, it’s best to combine some actual code writing with conceptual questions such as the ones above. That way you, as the interviewer, can evaluate how familiar the candidate is with the basics, how comfortable she is writing code, and how much of her knowledge comes from books versus practice. You can also see how well the analyst grasps the underlying business requirement and delivers an analytics product that satisfies the stakeholders’ needs.
Interviewers, I hope this gives you an easy start!

SQL: Select Nth rank of something. Three approaches.

A common question on forums is how to get the 3rd highest salary or similar. It is perhaps also a common job interview question.
I am using the AdventureWorksDW2012 database. Tests were done with SQL Server 2012 SP2.

I ran each query twice: once against the base table and once with a covering index.
Here is the covering index:

CREATE NONCLUSTERED INDEX _E_DimEmployee_BaseRate
ON [dbo].[DimEmployee] ([BaseRate])
INCLUDE ([FirstName],[LastName])

Correlated subquery:

SELECT FirstName, LastName, BaseRate
FROM DimEmployee e
WHERE (SELECT COUNT(DISTINCT BaseRate)
    FROM DimEmployee p WHERE e.BaseRate <= p.BaseRate) = 4

Why is this a good answer? It’s not really, but it will work on any SQL implementation.
It’s fairly slow because it does a lot of lookups: the subquery is evaluated every time a row is processed by the outer query. This query uses dense ranking and can return multiple rows.
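
To see the dense ranking and the multiple-row behavior on a small scale, here is a sketch that runs the same query shape on Python's built-in sqlite3. The `emp` table and its rates are made up for the example and are not the AdventureWorks data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Made-up data with ties: distinct rates are 50, 40, 30, 20, 10.
    CREATE TABLE emp (name TEXT, rate INTEGER);
    INSERT INTO emp VALUES
        ('A', 50), ('B', 40), ('C', 40), ('D', 30), ('E', 20), ('F', 20), ('G', 10);
""")

# For each outer row, count the distinct rates that are >= its own rate.
# A count of 4 means the row holds the 4th highest distinct rate.
fourth_dense = conn.execute("""
    SELECT name, rate
    FROM emp e
    WHERE (SELECT COUNT(DISTINCT rate)
           FROM emp p WHERE e.rate <= p.rate) = 4
    ORDER BY name
""").fetchall()

print(fourth_dense)
```

The tie at 40 does not push the 4th rank down to 30, and both employees sharing the 4th highest rate are returned.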

Here are the IO and time stats.
Without Index:
Table ‘Worktable’. Scan count 564, logical reads 1337, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘DimEmployee’. Scan count 54, logical reads 2646, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
elapsed time = 20 ms.

With covering Index:
Table ‘Worktable’. Scan count 349, logical reads 907, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table ‘DimEmployee’. Scan count 54, logical reads 233, physical reads 0, read-ahead reads 4, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
elapsed time = 13 ms.

Double Order By with TOP statement:

SELECT TOP 1 FirstName, LastName, BaseRate
FROM ( SELECT TOP 4 FirstName, LastName, BaseRate
    FROM DimEmployee ORDER BY BaseRate DESC) AS MyTable
ORDER BY BaseRate ASC;


Why is this a good answer? Because it is an easy syntax to remember.
Let’s look at the subquery, which returns the N highest salaries in the DimEmployee table in descending order. The outer query then re-orders those values in ascending (default) order, which means the Nth highest salary is now the topmost one. Keep in mind that the TOP statement is MS SQL Server specific; MySQL, for instance, would use LIMIT 1. In addition, this solution cannot do dense ranking and returns only one row even if two employees share the same BaseRate.
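
Here is a minimal sketch of the same double ORDER BY idea using LIMIT instead of TOP, run on Python's built-in sqlite3. The `emp` table and its values are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Made-up data; B and C tie at 40.
    CREATE TABLE emp (name TEXT, rate INTEGER);
    INSERT INTO emp VALUES
        ('A', 50), ('B', 40), ('C', 40), ('D', 30), ('E', 20);
""")

# Inner query: the 4 highest rows by rate. Outer query: flip the order and keep
# the first row, which is the 4th row overall (row-based, not dense, ranking).
fourth_row = conn.execute("""
    SELECT name, rate
    FROM (SELECT name, rate FROM emp ORDER BY rate DESC LIMIT 4) AS my_table
    ORDER BY rate ASC
    LIMIT 1
""").fetchall()

print(fourth_row)
```

Because the tie at 40 occupies two of the four rows, this returns the rate 30 rather than 20, which is what a dense ranking would give as the 4th highest.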

Edit June 2015: The addition of OFFSET/FETCH in SQL Server 2012 made this answer obsolete. The OFFSET/FETCH syntax has been added at the bottom of this post.

Here are the IO and time stats.
Without Index:
Table ‘DimEmployee’. Scan count 1, logical reads 49, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
elapsed time = 5 ms.

With covering Index:
Table ‘DimEmployee’. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
elapsed time = 0 ms.

Use Windowing function:

SELECT FirstName, LastName, BaseRate
FROM (SELECT FirstName, LastName, BaseRate, DENSE_RANK() OVER (ORDER BY BaseRate DESC) Ranking
FROM DimEmployee) AS MyTable
WHERE Ranking = 4

Why is this a good answer? Because it performs the best, and performance is king. The syntax is also ANSI SQL; however, of the “Big 3” only Oracle and MS were using it at the time of writing. In addition, you can choose to use ROW_NUMBER, DENSE_RANK or regular RANK.
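
To make the difference between the three ranking functions concrete, the sketch below puts them side by side on a small, made-up `emp` table. It runs on Python's built-in sqlite3 and assumes the bundled SQLite is version 3.25 or newer, which is when window functions were added.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Made-up data; B and C tie at 40.
    CREATE TABLE emp (name TEXT, rate INTEGER);
    INSERT INTO emp VALUES ('A', 50), ('B', 40), ('C', 40), ('D', 30);
""")

# ROW_NUMBER: unique sequence, ties broken here by name.
# RANK: ties share a rank and leave a gap afterwards.
# DENSE_RANK: ties share a rank and no gap follows.
rankings = conn.execute("""
    SELECT name, rate,
           ROW_NUMBER() OVER (ORDER BY rate DESC, name) AS row_num,
           RANK()       OVER (ORDER BY rate DESC)       AS rnk,
           DENSE_RANK() OVER (ORDER BY rate DESC)       AS dense_rnk
    FROM emp
    ORDER BY rate DESC, name
""").fetchall()

for row in rankings:
    print(row)
```

Which function to pick depends on the question being asked: "the 4th row" suggests ROW_NUMBER, while "the 4th highest distinct rate" suggests DENSE_RANK.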

Here are the IO and time stats.
Without Index:
Table ‘DimEmployee’. Scan count 1, logical reads 49, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
elapsed time = 2 ms.

With covering Index:
Table ‘DimEmployee’. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
elapsed time = 0 ms.

Edit: June 2015
Use OFFSET (SQL Server 2012):

SELECT FirstName, LastName, BaseRate
FROM DimEmployee e
ORDER BY BaseRate DESC
OFFSET 3 ROWS
FETCH NEXT 1 ROWS ONLY

Why is this a good answer? Because similar syntax exists on the other platforms.
Performance-wise, it runs as well as the windowing solution.
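
As an example of the similar syntax elsewhere, here is a sketch of the LIMIT/OFFSET form, run on Python's built-in sqlite3. The `emp` table is made up, and the extra sort on name is an assumption added only to make the result deterministic when rates tie.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Made-up data; B and C tie at 40.
    CREATE TABLE emp (name TEXT, rate INTEGER);
    INSERT INTO emp VALUES
        ('A', 50), ('B', 40), ('C', 40), ('D', 30), ('E', 20);
""")

# Skip the first 3 rows and return the next one, i.e. the 4th row by rate.
fourth_by_offset = conn.execute("""
    SELECT name, rate
    FROM emp
    ORDER BY rate DESC, name
    LIMIT 1 OFFSET 3
""").fetchall()

print(fourth_by_offset)
```

Like the TOP approach, this is row-based: ties count as separate rows, so it does not give a dense ranking.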

Microsoft Access Database Engine: External table is not in the expected format. (when opening an excel file that is read locked)

Another quick and dirty post, because it took me over an hour to get to the bottom of the issue.

VBS Error: Microsoft Access Database Engine: External table is not in the expected format.
This error occurs upon opening the ADODB.Connection

It turns out the issue was that I had previously read some data from the Excel file via OpenTextFile but forgot to close it.
So the error occurred because the file was read-locked, not because the “External table is not in the expected format.”
It shows just how much time you could save with the “right” error message.

Note1: This is most likely just one possible cause of this generic error message.
Note2: if the file actually doesn’t exist the error message is much better:
Microsoft Access Database Engine: The Microsoft Access database engine could not find the object 'Scope Information$A1:B65535'. Make sure the object exists and that you spell its name and the path name correctly. If 'Scope Information$A1:B65535' is not a local object, check your network connection or contact the server administrator.

Getting the Language pack lp.cab out of the downloadable exe files.

You can get them out by running the EXE, as it extracts/creates the cab file. In my case that was the German package windows6.1-kb2483139-x86-de-de_acb9b88b96d432749ab63bd93423af054d23bf81.exe.
The issue I found, however, is that the EXE quits very fast and deletes the cab file right away.
I considered running PsSuspend against it, but that would still be a gamble on timing: the cab file gets built slowly but deleted very quickly if the OS refuses the install.
So the next best solution is NTFS permissions: simply create a temporary DACL for EVERYONE with DENY for “Delete Folders and Files”.
This will leave your lp.cab ready for the grabbin :)

Screw it, I am going to search all tables, all columns!

Ever had the need to find out where data is stored in your MS SQL database?
I sometimes do, and I often find it faster to simply search every table and every column within those tables than to try to hunt down the schema definitions. I get a coffee while the script crunches away. The @SearchStrColumnName option is there mostly for when you search for integers, as you may otherwise get too many false positives.
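
The general idea can be sketched in a few lines. This is not the T-SQL script from the Gist; it is a simplified illustration against SQLite using Python's built-in sqlite3. The function names `quote_ident` and `search_all_columns` and the demo tables are hypothetical.

```python
import sqlite3

def quote_ident(name):
    """Quote a table or column name so it can be embedded safely in SQL text."""
    return '"' + name.replace('"', '""') + '"'

def search_all_columns(conn, needle):
    """Return (table, column, hit_count) for every column containing needle."""
    hits = []
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    for table in tables:
        # PRAGMA table_info returns one row per column; index 1 is the column name.
        columns = [r[1] for r in conn.execute(
            f"PRAGMA table_info({quote_ident(table)})")]
        for column in columns:
            # Cast to text so numeric columns are searched as well.
            sql = (f"SELECT COUNT(*) FROM {quote_ident(table)} "
                   f"WHERE CAST({quote_ident(column)} AS TEXT) LIKE ?")
            count = conn.execute(sql, (f"%{needle}%",)).fetchone()[0]
            if count:
                hits.append((table, column, count))
    return hits

# Demo with made-up tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT, city TEXT);
    CREATE TABLE orders (id INTEGER, note TEXT);
    INSERT INTO customers VALUES (1, 'Alice', 'Berlin'), (2, 'Bob', 'Bern');
    INSERT INTO orders VALUES (1, 'ship to Berlin'), (2, 'rush');
""")
found = search_all_columns(conn, "Berlin")
print(found)
```

Searching for a short number such as 1 with this sketch would also match the id columns, which is the kind of false positive a column-name filter helps with.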

Edit: GitHub Gist works awesome, will have to do that for the previous posts too! Just paste the URL and WordPress will do the rest, in this case displaying fully highlighted T-SQL code.
Edit 4/8/2013 – @SearchStrColumnName wasn’t properly escaped before and this parameter didn’t really work.
Edit 4/9/2013 – Added parameter @FullRowResult, which causes the script to return the full row for each hit. This is useful when you need to look up or find related info for the search term.
Edit 4/12/2013 – Added parameter @SearchStrTableName, to limit in what tables we are going to search.
Edit 5/2/2013 – Now also searching in type uniqueidentifier (GUID)
Edit 5/22/2013 – Additional numeric data types will be searched.
Edit 1/7/2014 – Can now also search timestamp value (known as rowversion)
Edit 4/18/2014 – Added a Top parameter that works together with @FullRowResult. This helps limit the output if a search string is found in a table too often.