Sunday, February 12, 2012

Adding an identity column solely for the benefit of the clustering

- Instead having indexes:
ParentID, SomeValue (non-clustered)
ParentID, SomeOtherValue (non-clustered)
SomeValue, ParentID (non-clustered)
SomeOtherValue, ParentID (non-clustered)
Create just three indexes:
ParentID
SomeValue
SomeOtherValue
SQL Server is smart enough to use them when necessary.

> This table could end up containing 20+ million records.
> Less so on offline clients, which would only contain a subset
> of the master database, but which would be unable to spread
> the tables and indexes across multiple disk systems.
Is there any other column that you can use for a clustered index (remember,
a good candidate will be one that you use in range queries)?
If so, then you do not need to add the identity column.
AMB
"Joergen Bech @. post1.tele.dk>" wrote:

> Let's say I have a table containing string values:
> ValueTable:
> --
> ValueID guid
> ParentID guid
> SomeValue nvarchar(1000)
> SomeOtherValue nvarchar(1000)
> with indexes on
> ValueID (clustered)
> ParentID, SomeValue (non-clustered)
> ParentID, SomeOtherValue (non-clustered)
> SomeValue, ParentID (non-clustered)
> SomeOtherValue, ParentID (non-clustered)
> Note: GUIDs are required for this application. The table does
> not actually look like this, but is a simplification for the sake
> of the example.
> This table could end up containing 20+ million records.
> Less so on offline clients, which would only contain a subset
> of the master database, but which would be unable to spread
> the tables and indexes across multiple disk systems.
> Now the question is: Would it make sense to add an
> identity column (8-byte long), make this column the clustered
> index, and change the ValueID index to non-clustered?
> The idea is to reduce the size of the non-clustered indexes,
> seeing that the clustered column would only be half the size
> of the original - and force insertion of new records to the end
> of the table, rather than all over the place.
> But besides some savings in space, would I actually gain anything
> in terms of performance, seeing that each insertion would require
> several non-clustered GUID indexes to be updated?
> Pros and cons of adding an identity column in the above scenario?
> TIA,
> Joergen Bech
>
>On Fri, 25 Mar 2005 05:55:03 -0800, "Alejandro Mesa"
<AlejandroMesa@.discussions.microsoft.com> wrote:

>- Instead having indexes:
>ParentID, SomeValue (non-clustered)
>ParentID, SomeOtherValue (non-clustered)
>SomeValue, ParentID (non-clustered)
>SomeOtherValue, ParentID (non-clustered)
>Create just three indexes:
>ParentID
>SomeValue
>SomeOtherValue
>SQL Server is smart enough to use them when necessary.
Sorry. That would require bookmark lookups. By having "Value, ID"
and "ID, Value", I basically have covering indexes for all IDs
satisfying a specific value, as well as all values (typically 20-50)
for a specific ID. I tried the single-column index approach, but
this - though a great space-saver - requires a bit more work for
the server.

>Is there any other column that you can use for a clustered index (remember,
>a good candidate will be one that you use in range queries)?
>If so, then you do not need to add the identity column.
Not really. As I mentioned in the first post, the purpose of adding
the identity column was
1) to have a narrow clustered index, thereby saving space in the non-
clustered indexes.
2) to force insertion of new records to take place at the end of the
table, rather than causing splits all over the place, which would
happen when basing it on a GUID.
Then again: Even though the clustered index (the table itself) never
needs defragging - being identity-based and all - is probably not of
any use at all performance-wise, if it is not used for anything but
saving NC-space (i.e. it is not even used for joins of any kind).
/JB
>
>AMB
>
>"Joergen Bech @. post1.tele.dk>" wrote:
>|||
>Then again: Even though the clustered index (the table itself) never
>needs defragging - being identity-based and all - is probably not of
>any use at all performance-wise, if it is not used for anything but
>saving NC-space (i.e. it is not even used for joins of any kind).
To correct myself: They are, of course, used for bookmark lookups,
in which case an always-defragged clustering index is nice to have.
Though - if most high-performance queries are served by covering
indexes, bookmark lookups won't be needed anyway.
Oh well. As the identity index won't be referenced by any T-SQL
code, I can always do all the tweaking and testing I like later.
/JB

No comments:

Post a Comment