dcsimg
Results 1 to 6 of 6

Thread: [RESOLVED] Does Columnstore eliminate the need for indexes?

  1. #1

    Thread Starter
    Fanatic Member cory_jackson's Avatar
    Join Date
    Dec 2011
    Location
    Fallbrook, California
    Posts
    896

    Resolved [RESOLVED] Does Columnstore eliminate the need for indexes?

    Columnstore is available on Azure SQL S3 and above and my client is considering the upgrade. I have read some about Columnstore and it sounds great the way is stores data in columns instead of rows, uses better compression, and maintains metadata on the columns to speed queries. In my reading, it sounds like if one converts a table to Columnstore, there is no need for indexes. Is this correct?

  2. #2
    PowerPoster techgnome's Avatar
    Join Date
    May 2002
    Posts
    32,381

    Re: Does Columnstore eliminate the need for indexes?

    Kind of... it is an index itself of sorts... But based on what I'm reading, it's not a silver bullet and it is not designed for all cases. It has its roots in the noSQL realm... and seems to be best suited for reporting databases such as data warehouses, not live transaction OLTP databases.

    Depending on the reasons for using it, I'd tread carefully. If you're thinking about using it for a live production OLTP database environment, I'd consider slapping you with a trout.

    -tg
    * I don't respond to private (PM) requests for help. It's not conducive to the general learning of others.*
    * I also don't respond to friend requests. Save a few bits and don't bother. I'll just end up rejecting anyways.*
    * How to get EFFECTIVE help: The Hitchhiker's Guide to Getting Help at VBF - Removing eels from your hovercraft *
    * How to Use Parameters * Create Disconnected ADO Recordset Clones * Set your VB6 ActiveX Compatibility * Get rid of those pesky VB Line Numbers * I swear I saved my data, where'd it run off to??? *

  3. #3

    Thread Starter
    Fanatic Member cory_jackson's Avatar
    Join Date
    Dec 2011
    Location
    Fallbrook, California
    Posts
    896

    Re: Does Columnstore eliminate the need for indexes?

    Holy mackerel.
    Yeah, that makes sense. Thanks.

  4. #4
    Super Moderator FunkyDexter's Avatar
    Join Date
    Apr 2005
    Location
    An obscure body in the SK system. The inhabitants call it Earth
    Posts
    7,300

    Re: Does Columnstore eliminate the need for indexes?

    Here's how to think about it. Be aware I'm simplifying though.

    In a standard BTree table, data is stored in rows. A bit like this:-
    Record1Field1, Record1Field2, Record1Field3
    Record2Field1, Record2Field2, Record2Field3
    Record3Field1, Record3Field2, Record3Field3

    That format is ideal for OLTP systems because you're typically interested in a row at a time and this makes it easier to get a whole row. But it is tricky to find the row you want which is why we use indexes. They help us find the row.

    In a Column Store, the data is stored by column. A bit like this:-
    Record1Field1
    Record2Field1
    Record3Field1

    Record1Field2
    Record2Field2
    Record3Field2

    Record1Field3
    Record2Field3
    Record3Field3

    This makes it really easy to find all the values for a certain column. This is particularly useful when you're doing big aggregation calculations in, for example, a data warehouse but no so good when getting a whole record because you'd have to jump around all over the place.

    I'm woolly about how this is "naturally indexed" but my understanding is that the structure of a column store is sort of like a set of "mini" indexes, one for each field. It's easy for the engine to jump to the start of the Field 3 block, for example, and grab all the data it needs. I'm not sure on that though so read further if you want to be certain.

    Anyway, the upshot is that, if you're building a data warehouse, which typically requires fast reads across the same column for a lot of records, columns stores give better performance. If you're building an OLTP system that requires fast read and writes against several fields on a single record, then a standard BTree structure is better.

    Here's a good article that explains it.


    As I've read further I realise I gave you a bum steer the other day when I said column store was the default for Azure. That's not quite right. It's the default for DataMarts in Azure which is where I've used it. Sorry for the bad info.
    You can depend upon the Americans to do the right thing. But only after they have exhausted every other possibility - Winston Churchill

    Hadoop actually sounds more like the way they greet each other in Yorkshire - Inferrd

  5. #5
    PowerPoster techgnome's Avatar
    Join Date
    May 2002
    Posts
    32,381

    Re: Does Columnstore eliminate the need for indexes?

    Nice breakdown, that's pretty much what I got out of my research too. And it's why this format isn't suitable for OLTP databases. The shape of the data is physically changed. With DWs, time isn't necessarily a priority, accuracy and historical value is. With OLTP, you need to be able to get that data in and out as fast as you can.

    I'd also add that by doing that, it can also normalize the data to a certain degree, which is where the space savings comes in. Let's say that the data belongs to a client of the customer, and so Field2 is their name.... In all three records it's the same name, so

    Record1Field2
    Record2Field2
    Record3Field2

    Actually ends up becoming just

    RecordField2

    There's a lot of repeated data in data warehouses, causing a lot of bloat. That's why columnstores are starting to be used to re-shape the date and "normalize" it... DWs can get massively huge, so anytime you can do something to reduce that, the IT department is going to want to be all over it.


    -tg
    * I don't respond to private (PM) requests for help. It's not conducive to the general learning of others.*
    * I also don't respond to friend requests. Save a few bits and don't bother. I'll just end up rejecting anyways.*
    * How to get EFFECTIVE help: The Hitchhiker's Guide to Getting Help at VBF - Removing eels from your hovercraft *
    * How to Use Parameters * Create Disconnected ADO Recordset Clones * Set your VB6 ActiveX Compatibility * Get rid of those pesky VB Line Numbers * I swear I saved my data, where'd it run off to??? *

  6. #6

    Thread Starter
    Fanatic Member cory_jackson's Avatar
    Join Date
    Dec 2011
    Location
    Fallbrook, California
    Posts
    896

    Re: Does Columnstore eliminate the need for indexes?

    In my case the point is moot. I got to learn all about index fragmentation and once de-fragged (2 hours) the performance returned. Another thing is this table I'm working on is both. Entire rows are being updated many times a second and often bulk inserts and deletions. But it's also millions or rows are being queries at times. But this is fascinating about Columnstore.

    I've continued to read more about it and it's not just about storing in columns. It's things like the compaction Techgnome mentioned. It sounds more to me like there are essentially a collection of single column tables, one for each table column as we see it, and not tables of data and additional tables we call indexes. The data is naturally stored in a 'column' but it's more like the columns are actually like clustered indexes. Almost sound like a bitmap index. I was reading when one 'adds a column index' about how the table is best to be offline as the system will create an entire new table. I.E. it's not an index, it's replacing the table completely. And this means columns naturally compressed as there are essentially no duplication or space taken up by nulls. Also I read that they add a something a bit like a lightweight B-tree to the side that specifies ranges. Something like "Begins with letter A starts here, letter B starts here, letter C starts here..." to improve performance.

    Now as to my question, I read some articles where people were trying to add indexes to Columnstore and could not. In early cases they could create it, but later the system generated errors forcing it's removal. I believe the answer to my question is that with Columnstore on can not have and do not need conventional indexes.

    But this has been educational and fun to learn about. As with most things, it's far more complex that I initially thought. Thanks everyone for chiming in.

Tags for this Thread

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  



Featured


Click Here to Expand Forum to Full Width