Showing posts with label foreign keys. Show all posts
Showing posts with label foreign keys. Show all posts

Wednesday, October 25, 2017

Foreign Keys don't always need a primary key

In the post Your lack of constraints is disturbing we touched a little upon foreign key constraints but today we are going to take a closer look at foreign keys. The two things that we are going to cover are the fact that you don't need a primary key in order to define a foreign key relationship, SQL Server by default will not index foreign keys

You don't need a primary key in order to have a foreign key

Most people will define a foreign key relationship between the foreign key and a primary key. You don't have to have a primary key in order to have a foreign key, if you have a unique index or a unique constraint then those can be used as well.

Let's take a look at what that looks like with some code examples


A foreign key with a unique constraint instead of a primary key


First create a table to which we will add a unique constraint after creation

CREATE TABLE TestUniqueConstraint(id int)
GO
Add a unique constraint to the table

ALTER TABLE TestUniqueConstraint ADD CONSTRAINT ix_unique UNIQUE (id)
GO
Insert a value of 1, this should succeed

INSERT  TestUniqueConstraint VALUES(1)
GO

Insert a value of 1 again, this should fail

INSERT  TestUniqueConstraint VALUES(1)
GO

Msg 2627, Level 14, State 1, Line 2
Violation of UNIQUE KEY constraint 'ix_unique'. Cannot insert duplicate key in object 'dbo.TestUniqueConstraint'. The duplicate key value is (1).
The statement has been terminated.



Now that we verified that we can't have duplicates, it is time to create the table that will have the foreign key



CREATE TABLE TestForeignConstraint(id int)
GO
Add the foreign key to the table

ALTER TABLE dbo.TestForeignConstraint ADD CONSTRAINT
 FK_TestForeignConstraint_TestUniqueConstraint FOREIGN KEY 
(id) REFERENCES dbo.TestUniqueConstraint(id) 



Insert a value that exist in the table that is referenced by the foreign key constraint


INSERT TestForeignConstraint  VALUES(1)
INSERT TestForeignConstraint  VALUES(1)

Insert a value that does not exist in the table that is referenced by the foreign key constraint


INSERT TestForeignConstraint  VALUES(2)

Msg 547, Level 16, State 0, Line 1
The INSERT statement conflicted with the FOREIGN KEY constraint "FK_TestForeignConstraint_TestUniqueConstraint". The conflict occurred in database "tempdb", table "dbo.TestUniqueConstraint", column 'id'.

The statement has been terminated.


As you can see, you can't insert the value 2 since it doesn't exist in the TestUniqueConstraint table



A foreign key with a unique index instead of a primary key

This section will be similar to the previous section, the difference is that we will use a unique index instead of a unique constraint
First create a table to which we will add a unique index after creation

CREATE TABLE TestUniqueIndex(id int)
GO

Add the unique index



CREATE UNIQUE NONCLUSTERED INDEX ix_unique ON TestUniqueIndex(id)
GO
Insert a value of 1, this should succeed


INSERT  TestUniqueIndex VALUES(1)
GO
Insert a value of 1 again , this should now fail


INSERT  TestUniqueIndex VALUES(1)
GO

Msg 2601, Level 14, State 1, Line 2
Cannot insert duplicate key row in object 'dbo.TestUniqueIndex' with unique index 'ix_unique'. The duplicate key value is (1).
The statement has been terminated.


Now that we verified that we can't have duplicates, it is time to create the table that will have the foreign key


CREATE TABLE TestForeignIndex(id int)
GO

Add the foreign key constraint

ALTER TABLE dbo.TestForeignIndex ADD CONSTRAINT
 FK_TestForeignIndex_TestUniqueIndex FOREIGN KEY 
 (id) REFERENCES dbo.TestUniqueIndex(id)  




Insert a value that exist in the table that is referenced by the foreign key constraint

INSERT TestForeignIndex  VALUES(1)
INSERT TestForeignIndex  VALUES(1)

Insert a value that does not exist in the table that is referenced by the foreign key constraint


INSERT TestForeignIndex  VALUES(2)


Msg 547, Level 16, State 0, Line 1
The INSERT statement conflicted with the FOREIGN KEY constraint "FK_TestForeignIndex_TestUniqueIndex". The conflict occurred in database "tempdb", table "dbo.TestUniqueIndex", column 'id'.
The statement has been terminated.


That failed because you can't insert the value 2 since it doesn't exist in the TestUniqueIndex table

As you have seen with the code example, you can have a foreign key constraint that will reference a unique index or a unique constraint. The foreign key does not always need to reference a primary key

Foreign keys are not indexed by default

When you create a primary key, SQL Server will by default make that a clustered index. When you create a foreign key, there is no index created

Scroll up to where we added the unique constraint to the TestUniqueConstraint table, you will see this code

ALTER TABLE TestUniqueConstraint ADD CONSTRAINT ix_unique UNIQUE (id)

All we did was add the constraint, SQL Server added the index behind the scenes for us in order to help enforce uniqueness more efficiently

Now run this query below


SELECT OBJECT_NAME(object_id) as TableName,
name as IndexName, 
type_desc as StorageType
FROM sys.indexes
WHERE OBJECT_NAME(object_id) IN('TestUniqueIndex','TestUniqueConstraint')
AND name IS NOT NULL

You will get these results

TableName         IndexName StorageType
---------------------   -----------     --------------
TestUniqueConstraint ix_unique NONCLUSTERED
TestUniqueIndex         ix_unique NONCLUSTERED

As you can see both tables have an index

Now let's look at what the case is for the foreign key tables. Run the query below


SELECT OBJECT_NAME(object_id) as TableName,
name as IndexName, 
type_desc as StorageType
FROM sys.indexes
WHERE OBJECT_NAME(object_id) IN('TestForeignIndex','TestForeignConstraint')

Here are the results for that query

TableName       IndexName StorageType
--------------------- --------- -------------
TestForeignConstraint NULL HEAP
TestForeignIndex NULL HEAP




As you can see no indexes have been added to the tables. Should you add indexes? In order to answer that let's see what would happen if you did add indexes. Joins would perform faster since it can traverse the index instead of the whole table to find the matching join conditions. Updates and deletes will be faster as well since the index can be used to find the foreign keys rows to update or delete (remember this depends if you specified CASCADE or NO ACTION when you create the foreign key constraint)

I wrote about deletes being very slow because the columns were not indexed here: Are your foreign keys indexed? If not, you might have problems
So to answer the question, yes, I think you should index the foreign key columns



Friday, November 04, 2016

Are your foreign keys indexed? If not, you might have problems



When you add a primary key constraint to a table in SQL Server, an index will be created automatically. When you add a foreign key constraint no index will be created. This might cause issues if you don't know that this is the behavior in SQL Server. Maybe there should be an option to automatically index the foreign keys in SQL Server, what do you think?

The other day some deletes on a newer table in the test environment became really slow. We had a primary table with a couple of hundred rows, we loaded up between 200 and 300 million rows into the child table. Then we deleted the child rows, this was fast. After this, we deleted one row from the primary table and this took several seconds.

When I looked at this I noticed something interesting, the most time during the delete was spent doing a lookup at the child table. Then I noticed that the foreign key was not indexed. After we added the index the delete became thousands of times faster

Let's try to replicate that behavior here

First create these primary table, we will add 2048 rows to this table


--Table that will have 2048 rows
CREATE TABLE Store(StoreID int not null,
  DateCreated datetime not null,
  StoreName varchar(500),
  constraint pk_store primary key   (StoreID))
GO

--insert 2048 rows
INSERT Store
SELECT ROW_NUMBER() OVER(ORDER BY t1.number) AS StoreID,DATEADD(dd,t1.number,'20161101') 
AS datecreated, NEWID()
FROM master..spt_values t1
WHERE t1.type =  'p'

Now create the child table and add 500K rows, this might take up to 1 minute to run since the rows are pretty wide.


-- table that will also have 500000 rows, fk will be indexed  
CREATE TABLE GoodsSold (TransactionID int not null,
   StoreID int not null,
   DateCreated datetime not null,
   SomeValue char(5000),
   constraint pk_transaction primary key   (TransactionID))

INSERT GoodsSold
SELECT top 500000 ROW_NUMBER() OVER(ORDER BY t1.number) AS TransactionID, t2.StoreID, 
DATEADD(dd,t1.number,'20161101') AS datecreated, REPLICATE('A',5000)
FROM master..spt_values t1
CROSS JOIN  Store t2
WHERE t1.type =  'p'


Now it is time to add the foreign key constraint and index this foreign key constraint

-- adding the foreign key
ALTER TABLE GoodsSold  WITH CHECK ADD  CONSTRAINT FK_StoreID FOREIGN KEY(StoreID)
REFERENCES Store(StoreID)
GO

ALTER TABLE GoodsSold CHECK CONSTRAINT FK_StoreID
GO

-- index the foreign key
CREATE index ix_StoreID on GoodsSold(StoreID)
GO


We will create another set of tables, let's start with the primary table, we will just insert into this table all the rows from the primary table we created earlier

-- create another primary table
CREATE TABLE StoreFK(StoreID int not null,
  DateCreated datetime not null,
  StoreName varchar(500),
  constraint pk_storefk primary key   (StoreID))
GO


-- add the same 2048 rows from the primary table with indexed FK
INSERT StoreFK
SELECT * FROM Store
GO


For the child table, it is the same deal, we will add all the rows from the child table we created earlier into this table

-- Add another FK table
CREATE TABLE GoodsSoldFKNoIndex (TransactionID int not null,
   StoreID int not null,
   DateCreated datetime not null,
   SomeValue char(5000),
   constraint pk_transactionfk primary key   (TransactionID))

-- add same 500K rows from table with FK index
INSERT GoodsSoldFKNoIndex
SELECT * FROM GoodsSold

Let's add the foreign key constraint, but this time we are not indexing the foreign key constraint

-- add the FK but do not index this
ALTER TABLE GoodsSoldFKNoIndex  WITH CHECK ADD  CONSTRAINT FK_StoreID_FK 
FOREIGN KEY(StoreID)
REFERENCES StoreFK(StoreID)
GO

ALTER TABLE GoodsSoldFKNoIndex CHECK CONSTRAINT FK_StoreID_FK
GO

Let make sure that the tables have the same number of rows

-- check that the tables have the same rows
exec sp_spaceused 'GoodsSold'
exec sp_spaceused 'GoodsSoldFKNoIndex'



namerowsreserveddataindex_sizeunused
GoodsSold5000004024976 KB4000000 KB22520 KB2456 KB
GoodsSoldFKNoIndex5000004015440 KB4000000 KB14936 KB504 KB



Now that we are setup, let's wipe out all the rows from the child table for a specific StoreID, the SELECT statements should return 0 rows

DELETE GoodsSoldFKNoIndex
WHERE StoreID = 507


DELETE GoodsSold
WHERE StoreID = 507


SELECT * FROM GoodsSoldFKNoIndex
WHERE StoreID = 507

SELECT * FROM  GoodsSold
WHERE StoreID = 507


Now we are getting to the interesting part, turn on Include Actual Execution Plan, run statistics IO or run this in Plan Explorer


DELETE Store
WHERE StoreID = 507


DELETE StoreFK
WHERE StoreID = 507

You will see something like this

So 75% compared to 25%, not good but doesn't look catastrophic, if you have statistics time on, you will see the following

 SQL Server Execution Times:
   CPU time = 16 ms,  elapsed time = 16 ms.


 SQL Server Execution Times:
   CPU time = 561 ms,  elapsed time = 575 ms.


Now it looks much worse

What about statistics io?

Table 'GoodsSold'. Scan count 1, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Store'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'GoodsSoldFKNoIndex'. Scan count 1, logical reads 501373, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'StoreFK'. Scan count 0, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

That is terrible....

Here is also the view from Plan Explorer, look at Est. CPU Cost and Reads




There you have it, not indexing foreign keys can have a big impact even though the child table might not have any data at all


See also When did SQL Server stop putting indexes on Foreign Key columns? by Kimberly Tripp