dcsimg
Results 1 to 14 of 14

Thread: [RESOLVED] Looking for advice on configuring indexing on Azure SQL Db.

  1. #1

    Thread Starter
    Fanatic Member cory_jackson's Avatar
    Join Date
    Dec 2011
    Location
    Fallbrook, California
    Posts
    858

    Resolved [RESOLVED] Looking for advice on configuring indexing on Azure SQL Db.

    I'm setting up an Azure SQL Db and I need some advice on indexing. There's one table with 36 columns of varchar, DateTime, Integer, and Boolean. There are typically between 2-5 million rows. The rows are imported in large batches, seeding just a couple columns. When the main process runs, it retrieves just these few columns, does something to each row, then spits data back to the table updating those rows by ID. It's a rate of about 2 per second. Then later a few queries are run on the data.
    I had a free trial with S0 service tier and it was dreadfully slow. At times a query would take over an hour. When I looked at the plan of execution, I found it was generating indexes. Now I had indexed the columns, non-clustered, for the columns used as criteria in the "Where" and "And". So I don't understand why it would want to index columns that were not a part of any condition. So question 1 is, why does it do this?
    Next I am trying to learn about indexes. All the rows have a primary key 'is identity' and auto generating integer increment by one. I never realized but when one just does the normal table design, it creates this clustered index of just the ID. I imagine for fast lookup by ID. Should I change that to have it index other columns? And if so, should I index all the columns or just the ones used in WHERE conditions? See question 1. Or should I leave the clustered index of the PK and then add a non-clustered index. Now in that one, should it be all the columns, or just ones used in WHERE conditions?
    I did upgrade to S2 which with be 50 DTU instead of 10. That should help but I think my main problem is being incompetent on setting up indices.

    Thanks for taking the time to read my post.

  2. #2
    Fanatic Member PlausiblyDamp's Avatar
    Join Date
    Dec 2016
    Location
    Newport, UK
    Posts
    1,006

    Re: Looking for advice on configuring indexing on Azure SQL Db.

    [QUOTE=cory_jackson;5369539]I had a free trial with S0 service tier and it was dreadfully slow. At times a query would take over an hour. When I looked at the plan of execution, I found it was generating indexes. Now I had indexed the columns, non-clustered, for the columns used as criteria in the "Where" and "And". So I don't understand why it would want to index columns that were not a part of any condition. So question 1 is, why does it do this?
    [quote]
    When you say it was generating indexes do you mean Azure was adding additional indexes to your table or that it was updating the contents of the columns you had already indexed?

    An S0 instance has a pretty low DTU count of 10 - as a very rough guideline that is good for about 40 - 50 transactions per second, 2,000,000 rows at a rough rate of 50 per second would certainly take an hour or more... Without knowing more about the process that is inserting and updating the rows it is hard to say where any other potential bottlenecks could be.

    Quote Originally Posted by cory_jackson View Post
    Next I am trying to learn about indexes. All the rows have a primary key 'is identity' and auto generating integer increment by one. I never realized but when one just does the normal table design, it creates this clustered index of just the ID. I imagine for fast lookup by ID. Should I change that to have it index other columns? And if so, should I index all the columns or just the ones used in WHERE conditions? See question 1. Or should I leave the clustered index of the PK and then add a non-clustered index. Now in that one, should it be all the columns, or just ones used in WHERE conditions?
    I did upgrade to S2 which with be 50 DTU instead of 10. That should help but I think my main problem is being incompetent on setting up indices.

    Thanks for taking the time to read my post.
    Indexing strategies can be a very complicated thing, I am sure there are people on here with a lot more knowledge in that area than me, however as a very basic rule of thumb - index columns you search on and don't index columns you don't. It is rarely worth a single index containing all the columns as then the index is pretty much a copy of the table. If you have multiple where conditions then you might find indexing the columns individually can be enough or a composite index of all the columns might be better. Without doing some profiling and performance monitoring it is hard to say which way would be best.

  3. #3
    Super Moderator FunkyDexter's Avatar
    Join Date
    Apr 2005
    Location
    An obscure body in the SK system. The inhabitants call it Earth
    Posts
    7,197

    Re: Looking for advice on configuring indexing on Azure SQL Db.

    Next I am trying to learn about indexes. All the rows have a primary key 'is identity' and auto generating integer increment by one. I never realized but when one just does the normal table design, it creates this clustered index of just the ID. I imagine for fast lookup by ID. Should I change that to have it index other columns? And if so, should I index all the columns or just the ones used in WHERE conditions? See question 1. Or should I leave the clustered index of the PK and then add a non-clustered index. Now in that one, should it be all the columns, or just ones used in WHERE conditions?
    I did upgrade to S2 which with be 50 DTU instead of 10. That should help but I think my main problem is being incompetent on setting up indices.
    There's no hard and fast rules but here are a few general tips:-
    Your clustered index is the fastest for seeking records (ie filters, joins etc) but slowest to insert into the middle of (not the end where it's fine) and you can only have one.
    Your non-clustered indexes are still pretty fast for seeks but better at coping with inserts into the middle and you can have multiples.
    There is no rule that says the clustered index has to be the primary key (though it often is - see below).

    So, your clustered index should normally be on the column that's most frequently used for seeks. This will normally be the primary key but not necessarily. Date columns often turn up in where clauses, for example, so can often be good candidates. What you should not do is put your clustered index on a column where you're likely to insert into the middle of the values. You can't really know what the best candidate is until you put a DB into production and observer the access pattern but since you're using identity fields for the primary key you will typically appending to the end of that key, meaning it is a very strong default candidate for your clustered index. I would probably choose that but be prepared to move it later if the access pattern indicated that it would be better elsewhere.

    In theory there's nothing wrong with adding extra columns to the clustered index as long as they're added after the values you're most interested in. That's because columns further to the right of the index don't affect the ordering of values in the columns to their left. That said, if you're putting the clustered index on your primary key, I think adding extra columns is likely to be confusing for whoever comes across it later. I personally would set up non-clustered indexes before I'd add further to the clustered. That said, by making that decision I would be sacrificing space. Indexes aren't free so there's a trade off.

    As for all the things you should create indexes for, indexes support the following operations: Wheres, Joins, Orderings, Groupings and Partitions. (I think that's the lot but I've got an uneasy feeling I'm missing some). So typically you would set up indexes to support those operations.

    Also, there are covering indexes. These are indexes designed to support a particular select and include all the columns in the select. The idea of these is that the select can take everything it needs directly off the index without needing to do a lookup against the table. They produce really performant queries but are very resource hungry so use them sparingly (I actually don't think I've ever used one).

    Above all remember that indexes are not free. They cost space and impact write performance so you don't just create them for the hell of it. You want to aim to support the greatest number of hungry operations with the fewest and narrowest indexes possible. It's a bit of a black art but, for most systems, the sweet spot is fairly wide so you should be able to get to a workable strategy without too many problems.
    Last edited by FunkyDexter; Mar 15th, 2019 at 10:52 AM.
    You can depend upon the Americans to do the right thing. But only after they have exhausted every other possibility - Winston Churchill

    Hadoop actually sounds more like the way they greet each other in Yorkshire - Inferrd

  4. #4

    Thread Starter
    Fanatic Member cory_jackson's Avatar
    Join Date
    Dec 2011
    Location
    Fallbrook, California
    Posts
    858

    Re: Looking for advice on configuring indexing on Azure SQL Db.

    When you say it was generating indexes do you mean Azure was adding additional indexes to your table or that it was updating the contents of the columns you had already indexed?
    I opted to show the actual execution plan. In other things it has made sense. If I had an index for columns 1, 2, and 3 then performed a query with a condition on 3, there would appear no steps in the execution plan to build a query. This made sense because as I understand it, each time I add a new row it's updating the index. The results were instantaneous. If I added a WHERE condition to column 4, then ran the query, I would see an additional step in the execution plan used that created a non-clustered index of 4. This index is temporary. It does not appear in my list of indices later. It seems strange to me, but I guess this is how it works.
    But in one of my queries I was selecting columns 1, 2, 3, and 4, but my condition was only on column 2. But it was indexing 4 also even though it wasn't needed in a filter criteria. And creating a temporary index of 5 million rows was costing 99.9%.
    But I must have been doing something weird because FunkyDexter confirmed what I thought and that was that we only need to index those columns used in the filter conditions. I'll do some more testing later. For now I'm going with my original notion that it need not index columns not used as filters.
    Thanks for confirming my suspicions on the performance. The whole DTU thing is confusing to me.

  5. #5
    #28 for the Yanks coming GaryMazzone's Avatar
    Join Date
    Aug 2005
    Location
    Dover,NH
    Posts
    7,295

    Re: Looking for advice on configuring indexing on Azure SQL Db.

    And of course SQL Azure DB comes with auto index tuning.... it will do things you are never expecting also
    Sometimes the Programmer
    Sometimes the DBA

    Mazz1

  6. #6

    Thread Starter
    Fanatic Member cory_jackson's Avatar
    Join Date
    Dec 2011
    Location
    Fallbrook, California
    Posts
    858

    Re: Looking for advice on configuring indexing on Azure SQL Db.

    Thanks FunkyDexter for that information. Most of that is what I was thinking but it's so nice to know someone as knowledgeable as you can confirm that I'm on the right track. And you taught me several new things as well.
    After considering all you said, I think I need one clustered index. When the main process fires, it gets the ID column (auto-incremented integer identity) (call it #1) and a short VarChar (Call it #3). But it uses an integer column (Call it #2), and a Boolean column (Call it #4) in the WHERE/AND. At other times other users will create queries on several other columns, but only a few are used in their WHERE/AND conditions (Call them #13 and #22). We often update columns but it's generally by #1 or sometimes #4. So I'm thinking I need to create one clustered index of 1, 2, 3, 4, 13, and 22.
    I decided on clustered because I believe we never insert into the middle. As I understand it, a clustered index means the data is physically arranged in the order, I assume of the ID column. We typically add rows in large imports and are all at the end. We often delete too. But let's say I delete all the rows less than 100. I would never insert rows less than 100. They would all be at the end as new auto-ID-generated rows.
    Does my plan make sense? If not, what am I missing?

  7. #7

    Thread Starter
    Fanatic Member cory_jackson's Avatar
    Join Date
    Dec 2011
    Location
    Fallbrook, California
    Posts
    858

    Re: Looking for advice on configuring indexing on Azure SQL Db.

    Quote Originally Posted by GaryMazzone View Post
    And of course SQL Azure DB comes with auto index tuning.... it will do things you are never expecting also
    Really? Dang. I'll need to look into that. It's all so complicated. Thanks for helping me understand.

  8. #8

    Thread Starter
    Fanatic Member cory_jackson's Avatar
    Join Date
    Dec 2011
    Location
    Fallbrook, California
    Posts
    858

    Re: Looking for advice on configuring indexing on Azure SQL Db.

    Let's say I just created a big index, is there a way to see when it's done building the index? Is there someplace to see the status of the indices?

  9. #9
    PowerPoster techgnome's Avatar
    Join Date
    May 2002
    Posts
    32,236

    Re: Looking for advice on configuring indexing on Azure SQL Db.

    Quote Originally Posted by cory_jackson View Post
    Thanks FunkyDexter for that information. Most of that is what I was thinking but it's so nice to know someone as knowledgeable as you can confirm that I'm on the right track. And you taught me several new things as well.
    After considering all you said, I think I need one clustered index. When the main process fires, it gets the ID column (auto-incremented integer identity) (call it #1) and a short VarChar (Call it #3). But it uses an integer column (Call it #2), and a Boolean column (Call it #4) in the WHERE/AND. At other times other users will create queries on several other columns, but only a few are used in their WHERE/AND conditions (Call them #13 and #22). We often update columns but it's generally by #1 or sometimes #4. So I'm thinking I need to create one clustered index of 1, 2, 3, 4, 13, and 22.
    I decided on clustered because I believe we never insert into the middle. As I understand it, a clustered index means the data is physically arranged in the order, I assume of the ID column. We typically add rows in large imports and are all at the end. We often delete too. But let's say I delete all the rows less than 100. I would never insert rows less than 100. They would all be at the end as new auto-ID-generated rows.
    Does my plan make sense? If not, what am I missing?
    I'd never create a clustered index on multiple columns... or if I did, it would be on two cols - MAX. One thing you could do is create individual indexes on your columns, including your clustered index, then create covering indexes that include multiple columns that have been indexed. This works really well in situations where the WHERE has varying cases - ie, in one query it uses Col 1, and in another it uses Cols 1 & 3 ... you index Col 1, add a second index for Col 3, and then add a third index for Cols 1 & 3.

    Based on what you've written, I would NOT "create one clustered index of 1, 2, 3, 4, 13, and 22. " ... I would create the clustered index just on the ID (presumably field #1) since that's where most of the updates happen on. I'd then individually index fields 2 & 4, then create a covering index with both in it.

    VarChar fields tend to be poor indexers given their fluid and voluminous nature. So unless #3 is really short, I'm not sure I'd index it.

    Not enough info about #13 & 22 and their relationship to 2 & 4 were given for me to really render any judgement... but same applies... if they are primitive enough, index them individually and include them in covering indexes as appropriate.

    -tg
    * I don't respond to private (PM) requests for help. It's not conducive to the general learning of others.*
    * I also don't respond to friend requests. Save a few bits and don't bother. I'll just end up rejecting anyways.*
    * How to get EFFECTIVE help: The Hitchhiker's Guide to Getting Help at VBF - Removing eels from your hovercraft *
    * How to Use Parameters * Create Disconnected ADO Recordset Clones * Set your VB6 ActiveX Compatibility * Get rid of those pesky VB Line Numbers * I swear I saved my data, where'd it run off to??? *

  10. #10

    Thread Starter
    Fanatic Member cory_jackson's Avatar
    Join Date
    Dec 2011
    Location
    Fallbrook, California
    Posts
    858

    Re: Looking for advice on configuring indexing on Azure SQL Db.

    Not enough info about #13 & 22 and their relationship to 2 & 4 were given for me to really render any judgement
    These are typically dollar amounts in decimal columns or small text fields.

  11. #11

    Thread Starter
    Fanatic Member cory_jackson's Avatar
    Join Date
    Dec 2011
    Location
    Fallbrook, California
    Posts
    858

    Re: Looking for advice on configuring indexing on Azure SQL Db.

    Techgnome I'm starting to understand a little. I'll start with indexing only the ID (identity primary key) column as you suggest. But I don't understand how multiple indices and multiple columns in an index work. Should I make one index for each column that is used as a condition ever? Or should I make an index for each of the common combinations of columns used as criteria? I simply don't understand how they work together so I don't know how to design them. Let's look at the main one that runs in my program. It uses 2 & 4 to retrieve 1 & 3. It then updates columns using only 1, the PK identity integer column. So for this I think, based on your advice, I should do a clustered PK index only. Then I should add an index for 2 (integer) & 4 (Boolean). Given the way I access and update the date, no inserts in the middle, should I make the second index clustered or not?

  12. #12
    Super Moderator FunkyDexter's Avatar
    Join Date
    Apr 2005
    Location
    An obscure body in the SK system. The inhabitants call it Earth
    Posts
    7,197

    Re: Looking for advice on configuring indexing on Azure SQL Db.

    As I understand it, a clustered index means the data is physically arranged in the order
    Basically, yes. Technically they're stored on the BTree in that order and the BTree can be spread out across the hardware in all sorts of interesting ways so, at the database level, you have almost no control over how the data is actually arranged on disk. But if you imagine the data as being physically ordered by the clustered index you won't go to far wrong.

    let's say I delete all the rows less than 100. I would never insert rows less than 100
    This will have some impact but very minor. When you insert in the middle the engine is forced to rearrange the index right there and then - it has no other choice. That re-arranging can often be a very expensive operation. On a deletion it just leaves a gap. Depending on your settings the engine might try to fill that gap or it might just leave it for you to do manually - either way though, the operation doesn't have to be immediate and can be left until a better time.
    You can depend upon the Americans to do the right thing. But only after they have exhausted every other possibility - Winston Churchill

    Hadoop actually sounds more like the way they greet each other in Yorkshire - Inferrd

  13. #13
    PowerPoster techgnome's Avatar
    Join Date
    May 2002
    Posts
    32,236

    Re: Looking for advice on configuring indexing on Azure SQL Db.

    It uses 2 & 4 to retrieve 1 & 3. It then updates columns using only 1, the PK identity integer column. So for this I think, based on your advice, I should do a clustered PK index only. Then I should add an index for 2 (integer) & 4 (Boolean). Given the way I access and update the date, no inserts in the middle, should I make the second index clustered or not?
    OK, I'm going to break that down a bit...

    if it uses 2 & 4 to get 1 & 3, then I would create an index (non-clustered) on 2. A second index (non-clustered) on 4. A third index (non-clustered) on 2 & 4.
    Since your inserts aren't in the middle, and you use 1 for updates, then create one clustered index on 1. Actually, do that one FIRST, then create the other three above.
    So in the end, just based on that info, there would be four indexes:

    So here's the indexes:
    one clustered index on 1
    one non-clustered index on 2
    one non-clustered index on 4
    one non-clustered covering index on 2 & 4

    You can't make a second clustered index on a table. Tables can only have ONE clustered index at a time.

    Actually, let me ask this... which do you want fastest? Your lookups? or your updates? Because the answer to that will the dictate which index you should be clustering. If you want the lookups to be more efficient then the it changes the indexes to this:
    one non-clustered index on 2
    one non-clustered index on 4
    one clustered covering index on 2 & 4
    one non-clustered index on 1


    If you want the updates faster, then go with the original list.

    -tg
    * I don't respond to private (PM) requests for help. It's not conducive to the general learning of others.*
    * I also don't respond to friend requests. Save a few bits and don't bother. I'll just end up rejecting anyways.*
    * How to get EFFECTIVE help: The Hitchhiker's Guide to Getting Help at VBF - Removing eels from your hovercraft *
    * How to Use Parameters * Create Disconnected ADO Recordset Clones * Set your VB6 ActiveX Compatibility * Get rid of those pesky VB Line Numbers * I swear I saved my data, where'd it run off to??? *

  14. #14
    Super Moderator FunkyDexter's Avatar
    Join Date
    Apr 2005
    Location
    An obscure body in the SK system. The inhabitants call it Earth
    Posts
    7,197

    Re: Looking for advice on configuring indexing on Azure SQL Db.

    TG's advice is good but I'm going to temper it ever so slightly. If you have an index on 2 and 4 you don't need an index just on 2. The reason is that the first index already covers 2 as it's first field. Forgoing the index just on 2 will save you a little bit of space.


    Should I make one index for each column that is used as a condition ever? Or should I make an index for each of the common combinations of columns used as criteria?
    You want combination rather than individual fields (although a single field can be a combination of 1). So if you always access a table by columns 1 and 2 then you don't need to create an index on 1 and an index on 2. Just create one on 1 and 2. If, however, you commonly access it 1, 2 or 1 and 2, then you might consider creating index for each combination.

    Note, indexes do not combine. So if you have a query that filters by columns 1 and 2, and an index on each but no index on both, the engine will not be able to use both indexes and combine them. Instead it will pick the more selective of the two indexes and then scan to resolve the second criteria.

    But this applies to operations, not, whole queries. So a query might use one index to resolve the where clause, another to join and another to sort. Note, though, that the engine doesn't "cleanly" separate these operations in the way that you or I might, instead it combines them where it can. So if it's going to join on column 1 and column 1 is in the where clause and there's an index on it, it will probably start by sorting by column 1(actually, querying from the index) so that the records are already in the right order for the join. You probably don't need to worry about that level of detail yet but it's worth having in the back of your mind.
    You can depend upon the Americans to do the right thing. But only after they have exhausted every other possibility - Winston Churchill

    Hadoop actually sounds more like the way they greet each other in Yorkshire - Inferrd

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  



Featured


Click Here to Expand Forum to Full Width