[RESOLVED] Poor performance converting IEnumerable to 2D Array-VBForums
Results 1 to 12 of 12

Thread: [RESOLVED] Poor performance converting IEnumerable to 2D Array

  1. #1

    Thread Starter
    New Member
    Join Date
    Feb 2018
    Posts
    4

    Resolved [RESOLVED] Poor performance converting IEnumerable to 2D Array

    About a year ago, I developed a standalone app that imports data into an Excel spreadsheet. It has been very successful and to further improve upon it, I'm working on a VSTO add-in. The standalone app imported CSV data from a file and was pretty quick. However, for this implementation, the data already exists in memory without reading a file since it is returned from the SSH.NET library. It also needs be available for subsequent programmatic filtering.

    I spent a lot of time searching the internet for the best method for writing data to an Excel spreadsheet and the clear winner was to assign the value of a range to a two dimension array. My testing confirms this but, converting the data from its current form as a IEnumerable (LINQ query result) to a 2D array is painfully slow. The table below shows the elapsed time for several numbers of records

    Code:
    Records		Time (ms)
    100		105
    500		2444
    1000		9249
    1500		20416
    2000		35300
    3000		82578
    So... I'm either doing the conversion incorrectly or I'm simply using the wrong concept of storing the in-memory data. I feel that the final data structure needs to be a 2D array since other methods for writing data to a spreadsheet are slow. I am open to using different structures to store the data but they must be compatible with LINQ since some querying is necessary and calculated fields are generated. Below is the loop that I'm using to go from the IEnumerable to the 2D array.

    Code:
    'query is the return object of a LINQ operation
    Dim rowcount As Long = traps.LongCount
    Dim colcount As Integer = 9
    Dim data(rowcount, colcount) As String
    Dim _trap
    For row = 0 To rowcount - 1
    	_trap = query(row)
    	data(row, 0) = _trap.X
    	data(row, 1) = _trap.EventDate
    	data(row, 2) = _trap.ObjectName
    	data(row, 3) = _trap.LevelText
    	data(row, 4) = _trap.Message
    	data(row, 5) = _trap.ObjectClass
    	data(row, 6) = _trap.RegionName
    	data(row, 7) = _trap.SiteName
    	data(row, 8) = _trap.ArrivalID
    Next
    I see two options: 1) abandon the additional functionality of programmatic filtering and write the data to a file. I can then import the data using the same method used in the standalone app or 2) use a different concept for storing the data in memory that converts to a 2D array easier than the current method.

    I'm interested in reading some thoughts and ideas.

  2. #2
    You don't want to know.
    Join Date
    Aug 2010
    Posts
    4,459

    Re: Poor performance converting IEnumerable to 2D Array

    This is expected. IEnumerables are not arrays. The .ToArray() method has to allocate an array, then copy each item one by one into that array. What you wrote is more or less how .ToArray() works.

    If that is producing the bottleneck (and it certainly will take a big chunk of time as n increases), then your best bet is to do your filtering and searching without LINQ. It isn't necessarily doing anything super hard to write, and in general LINQ costs you a tiny bit of performance anyway. You should start with an array, then manipulate the results to get the thing you want.

    Keep in mind resizing arrays is also a very expensive operation.
    This answer is wrong. You should be using TableAdapter and Dictionaries instead.

  3. #3
    Frenzied Member
    Join Date
    Jul 2011
    Location
    UK
    Posts
    1,133

    Re: Poor performance converting IEnumerable to 2D Array

    No.

    Your loop:
    Code:
    For row = 0 To rowcount - 1
    	_trap = query(row)
            ' etc
    Next
    re-evaluates the query every time that _trap = query(row) is executed.

    What you need to do is run the query just the one time by calling .ToArray on it, and then work against that returned array:
    Code:
    Dim rowcount As Long = numRows
    Dim colcount As Integer = 9
    Dim data(rowcount, colcount) As String
    Dim _trap
    
    Dim queryResults = query.ToArray
    
    For row = 0 To rowcount - 1
        _trap = queryResults(row)
        data(row, 0) = _trap.X
        data(row, 1) = _trap.EventDate
        data(row, 2) = _trap.ObjectName
        data(row, 3) = _trap.LevelText
        data(row, 4) = _trap.Message
        data(row, 5) = _trap.ObjectClass
        data(row, 6) = _trap.RegionName
        data(row, 7) = _trap.SiteName
        data(row, 8) = _trap.ArrivalID
    Next

  4. #4

    Thread Starter
    New Member
    Join Date
    Feb 2018
    Posts
    4

    Re: Poor performance converting IEnumerable to 2D Array

    Quote Originally Posted by Inferrd View Post
    No.
    ...
    What you need to do is run the query just the one time by calling .ToArray on it, and then work against that returned array:
    ...
    Woah! What a difference!

    Code:
    Records		Time(ms)
    1000		54
    1500		69
    2000		101
    3000		149
    Just so I can learn something new, what does "re-evaluate" actually mean? I thought my call was simply retrieving a single element from an existing object.

  5. #5
    Super Moderator Shaggy Hiker's Avatar
    Join Date
    Aug 2002
    Location
    Idaho
    Posts
    31,573

    Re: Poor performance converting IEnumerable to 2D Array

    You were figuring that query was essentially some kind of object, perhaps an array, or some such. It isn't. I don't know the correct terminology, but you could say that it is a potential object. One of the funny things about LINQ is that it doesn't run the moment the line is executed, it can actually run later on, when the return value is evaluated. There are loads of rules about that, so it can be hard to keep track whether or not your LINQ runs. The ToArray call certainly forces it to run, because that line is forcing it to evaluate the results to turn them into an array.

    By the way, you may be able to improve the performance even further, though you won't see the dramatic improvement you just saw for any other change. One thing I note is that you are turning everything into a string. If the items in _trap are not already strings, this means that you are doing a bunch of implicit conversions. Those all cost a bit, and the cost will add up over 3000 runs. Additionally, this means that they are all strings, and working with strings is normally slow. In this case, it may not matter, because you said you were just putting the data from the array into Excel, so maybe it doesn't make any difference.
    My usual boring signature: Nothing

  6. #6
    You don't want to know.
    Join Date
    Aug 2010
    Posts
    4,459

    Re: Poor performance converting IEnumerable to 2D Array

    I'm curious, though. The line of code is:
    Code:
    _trap = query(row)
    IEnumerable is not indexable. If 'query' is an IEnumerable (and thus probably a LINQ query that executes on each access), this wouldn't work, right? It would've already have to have been converted to a form like List<T> or Array.

    I hate it when we get a code snippet but it's not the whole thing. It's almost always the 10 lines that don't matter and "I didn't post the line that does because I think it's not important."
    This answer is wrong. You should be using TableAdapter and Dictionaries instead.

  7. #7
    Super Moderator Shaggy Hiker's Avatar
    Join Date
    Aug 2002
    Location
    Idaho
    Posts
    31,573

    Re: Poor performance converting IEnumerable to 2D Array

    Yeah, there's some guessing going on, since query is only described in a comment, but based on the results, I think it can be said that someone inferred the right answer.
    My usual boring signature: Nothing

  8. #8
    Frenzied Member
    Join Date
    Jul 2011
    Location
    UK
    Posts
    1,133

    Re: Poor performance converting IEnumerable to 2D Array

    I love the smell of pun in the morning

    Quote Originally Posted by Sitten Spynne View Post
    I'm curious, though. The line of code is:
    Code:
    _trap = query(row)
    IEnumerable is not indexable. If 'query' is an IEnumerable (and thus probably a LINQ query that executes on each access), this wouldn't work, right? It would've already have to have been converted to a form like List<T> or Array.
    Given that the code is being compiled with Option Strict turned off, I thought the compiler was implicitly adding the .ToArray ((_trap = query.ToArray(row))).

    However, it seems it is being compiled as
    Code:
    _trap = query.ElementAtOrDefault(row)

  9. #9
    You don't want to know.
    Join Date
    Aug 2010
    Posts
    4,459

    Re: Poor performance converting IEnumerable to 2D Array

    Ooooooh, OK, yes, that would cause at least a partial enumeration and, like you said, a requery.

    what does "re-evaluate" actually mean?
    Now I feel OK answering this question. This is a kind of weird thing about LINQ.

    LINQ doesn't work with arrays, or things that really look like arrays. It works with "enumerables". "Enumerables" can be arrays, but can also be just about anything else. What they guarantee is they have an "enumerator" that is more or less how the For Each loop works. An "enumerator" is kind of like a bit of code that looks at every piece of the enumerable. Think of it like a machine.

    Because of that, you can't really get to a specific item in the enumerable in one go (I call this "indexing".) If you ask for the third item in an enumerable, VB has to get the enumerator, start at the beginning, then move to the next item 3 times.

    If this is something like a List or an Array, that's not super expensive. But if it's a query result, there might be a lot of stuff involved. For example, say you have:

    Code:
    Dim query = From customer in customers
                Where customer.Name.StartsWith("B")
                OrderBy customer.Balance
    To get the enumerator for this query, VB has to look at the entire customers enumerable so it can find just the ones who have a Name that starts with "B". Then it has to look at THAT entire enumerable so it can sort them by Balance. This might take a little bit of time.

    So when you ask for query(3), it has to do all of that, THEN move to the third item in the results. And when you ask for query(4), it starts over! This may seem dumb, but the idea is "you always get the most up-to-date information".

    This doesn't happen if you use a For Each loop, they're designed to iterate over a query one by one without going through all that rigmarole. But you're doing something fairly complex and trying to create an array row out of every item, so you want indexing. Nothing's stopping us from keeping track of a row index ourselves, though:
    Code:
    Dim rowIndex = 0
    For Each item In query
        data(rowIndex, 0) = item.X
        data(rowIndex, 1) = item.EventDate
        ...
    
        rowIndex += 1
    Next
    For Each loops are designed to go over each item in an enumerable without re-executing the entire query.

    The alternative is to execute the query and make sure we store the results in something that is NOT a query. That's what Inferrd demonstrated.
    This answer is wrong. You should be using TableAdapter and Dictionaries instead.

  10. #10

    Thread Starter
    New Member
    Join Date
    Feb 2018
    Posts
    4

    Re: Poor performance converting IEnumerable to 2D Array

    Thank you all for the details. I never miss an opportunity to learn something new. I'm marking the thread as resolved and I'm very pleased with the new performance of the end-to-end operation. It was easy to work with all strings since I am serializing data to XML but it is likely that I'll clean everything up and define variables as the correct data types before deployment. To clear up some guessing, the entire revised procedure is below.

    Code:
    Private Function QueryCurrentTraps() As Object
    	Dim traps As List(Of clsTrap) = mCurrentTraps
    	Dim svr As clsRnmServer = mCurrentServer
    
    	Dim query = From t As clsTrap In traps
    			Group Join l In svr.TrapSeverityLabels
    				On t.SeverityEnum Equals l.SeverityEnum Into list1 = Group
    			Group Join r In svr.Regions
    				On t.RegionNumber Equals r.Number Into list2 = Group
    			Group Join s In svr.Sites
    				On t.SiteNumber Equals s.Number Into list3 = Group
    			From l In list1.DefaultIfEmpty
    			From r In list2.DefaultIfEmpty
    			From s In list3.DefaultIfEmpty
    			Select X = If(IsFiltered(t), "X", ""),
    				 t.EventDate, t.ObjectName, t.SeverityEnum, t.Message, t.RegionNumber, t.SiteNumber, t.ObjectClass, t.ArrivalID,
    				LevelText = If(l Is Nothing, t.SeverityEnum, l.LevelText),
    				RegionName = If(r Is Nothing, t.RegionNumber, r.Name),
    				SiteName = If(s Is Nothing, t.SiteNumber, s.Name)
    
    	Dim rowcount As Long = traps.LongCount
    	Dim colcount As Integer = 9
    	Dim data(rowcount, colcount) As String
    	Dim _trap
    	Dim queryresults = query.ToArray
    
    	For row = 0 To rowcount - 1
    		_trap = queryresults(row)
    		data(row, 0) = _trap.X
    		data(row, 1) = _trap.EventDate
    		data(row, 2) = _trap.ObjectName
    		data(row, 3) = _trap.LevelText
    		data(row, 4) = _trap.Message
    		data(row, 5) = _trap.ObjectClass
    		data(row, 6) = _trap.RegionName
    		data(row, 7) = _trap.SiteName
    		data(row, 8) = _trap.ArrivalID
    	Next
    
    	Return data
    
    End Function
    Thanks again!

  11. #11
    You don't want to know.
    Join Date
    Aug 2010
    Posts
    4,459

    Re: Poor performance converting IEnumerable to 2D Array

    You'd be even faster if you used a For Each. .ToArray() is an allocation plus iteration, For Each is just iteration.
    This answer is wrong. You should be using TableAdapter and Dictionaries instead.

  12. #12

    Thread Starter
    New Member
    Join Date
    Feb 2018
    Posts
    4

    Re: Poor performance converting IEnumerable to 2D Array

    Quote Originally Posted by Sitten Spynne View Post
    You'd be even faster if you used a For Each. .ToArray() is an allocation plus iteration, For Each is just iteration.
    You are quite right. I tweaked the code using your For/Each suggestion and there was another noticeable performance increase. I tested up to 12K items, which is well over the anticipated maximum number at any given time, and was very happy with the results.

    Looks like I came to the right place to ask questions.

    Thanks!

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  



Featured


Click Here to Expand Forum to Full Width