
Thread: [RESOLVED] Paging large amounts of data to disk and back

  1. #1

    Thread Starter
    Pro Grammar chris128
    Join Date
    Jun 2007
    Location
    England
    Posts
    7,604

    [RESOLVED] Paging large amounts of data to disk and back

    I have some applications that produce reports about various things on networks, such as folders or Active Directory objects. In some cases, people working at huge companies need to run these programs against literally millions of folders or AD objects, so the amount of data returned can be massive. The application is only 32-bit (due to some calls to native Windows APIs that I haven't got working correctly on x64 systems yet), so it can address at most 4 GB of memory (only 2 GB unless it is marked as large-address-aware). That means that even on an x64 system with tons of RAM, some people are hitting this limit and the program throws an OutOfMemoryException.

    My only workaround for this issue so far has been to add options to write the results directly to a CSV file, so that I can use a FileStream to write the results out as they are found rather than adding them to a list stored in memory. (Normally the items are added to an in-memory list so that they can be displayed to the user, who can then choose to export them to a CSV, Excel or HTML file.) But this FileStream method has its drawbacks. For example, the results can't be sorted at all: they are written out to the file as soon as they are created, so there is never a point where each item can be compared against all the other items. I also don't like the fact that the user can't see the results in the GUI first to decide whether they even want to export them.
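    A minimal sketch of that streaming workaround, in Python purely for illustration (the real application is .NET and uses a FileStream). `find_results` is a hypothetical stand-in for the actual folder or AD scan, and the column names are made up. Each row is handed to the file as soon as it exists and is never kept in a list, so memory stays flat; it also makes the sorting drawback concrete, since no step ever sees more than one row.

```python
import csv
import os
import tempfile


def find_results(count):
    """Hypothetical stand-in for the real scan: yields one result at a time
    instead of returning a list, so memory use stays flat."""
    for n in range(count):
        yield (f"folder{n}", n * 1024)


def export_as_found(path, results):
    """Write each result to CSV as soon as it is produced; return the row count."""
    written = 0
    # newline="" is what the csv module expects for files it writes to.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Path", "SizeBytes"])
        for row in results:
            writer.writerow(row)  # goes to the (buffered) file, never into a list
            written += 1
    return written


report_path = os.path.join(tempfile.mkdtemp(), "report.csv")
row_count = export_as_found(report_path, find_results(100_000))
print(row_count)  # 100000
```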

    I can't help thinking there must be a way to make this work, with all the data visible in the GUI without all of it being in memory at once, by paging the data to and from disk as needed. I know of other programs that must do this, but I don't know of any .NET programs that do. I know paging already happens in the background, with virtual memory being paged to and from the Windows page file as required, but that doesn't get around the 4 GB limit on the virtual address space available to a 32-bit program.

    I know I could easily use a FileStream to write the data to my own file, writing each entry to disk as the program finds it, and then read in small chunks at a time. My problem is how to make that work with a GUI. Let's say my in-memory list only has half of the results loaded at the moment, and the user scrolls down my TreeView (or ListView, if they're using the alternate view option in the program) or chooses to re-sort a column: how do I make that work on ALL of the data when I can't load it all into memory at once? How would the scroll bar even behave in a vaguely "normal" way, considering the TreeView/ListView will never actually contain all of the data? I'm using WPF for the GUI, in case anyone was going to suggest TreeView/ListView events that are specific to WinForms.
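    The storage half of that idea (write each entry as it is found, then read small chunks back) can be sketched as an append-only record file plus an in-memory list of byte offsets, so any item can be fetched by index without loading the others. This is a minimal Python sketch assuming each result serializes to one line of JSON; `RecordStore`, `append` and `read_range` are hypothetical names, and in .NET the same pattern would sit on a FileStream with Seek.

```python
import json
import os
import tempfile


class RecordStore:
    """Append-only record file plus an in-memory list of byte offsets.

    Only one integer per record stays in memory; the records themselves
    live on disk and are read back in small ranges on demand.
    """

    def __init__(self, path):
        self.offsets = []              # byte offset where each record starts
        self.file = open(path, "w+b")  # create/truncate, allow read and write

    def append(self, record):
        self.file.seek(0, os.SEEK_END)  # always write at the end of the file
        self.offsets.append(self.file.tell())
        self.file.write(json.dumps(record).encode("utf-8") + b"\n")

    def __len__(self):
        return len(self.offsets)

    def read_range(self, start, count):
        """Return records [start, start + count) without touching the others."""
        self.file.flush()
        end = min(start + count, len(self.offsets))
        records = []
        for i in range(start, end):
            self.file.seek(self.offsets[i])
            records.append(json.loads(self.file.readline()))
        return records

    def close(self):
        self.file.close()


store = RecordStore(os.path.join(tempfile.mkdtemp(), "results.jsonl"))
for n in range(10_000):
    store.append({"id": n, "name": f"folder{n}"})

visible = store.read_range(5_000, 3)  # e.g. the rows currently on screen
tail = store.read_range(9_998, 10)    # clipped at the end of the data
total = len(store)
store.close()
```

    This covers storage only; it says nothing yet about how the GUI decides which range to ask for, which is the part the replies address. The offset index is still one number per record, though that is usually far smaller than the records themselves.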

    Is this something that is too low-level for .NET? If not, perhaps someone knows a way to do it that will result in a decent user experience.



    Oh, and inb4 dunfiddlin tells me I don't actually want to do this.
    Last edited by chris128; Aug 24th, 2013 at 07:50 PM.
    My free .NET Windows API library (Version 2.2 Released 12/06/2011)

    Blog: cjwdev.wordpress.com
    Web: www.cjwdev.co.uk


  2. #2
    .NUT jmcilhinney
    Join Date
    May 2005
    Location
    Sydney, Australia
    Posts
    102,400

    Re: Paging large amounts of data to disk and back

    I'm not sure about WPF but, in WinForms, the ListView and the DataGridView both support virtual mode (the VirtualMode property), so you can load and unload pages of data from a source as required. I doubt that a TreeView supports that. Your big issue would still be sorting, because sorting data without having access to all of it at the same time would be cumbersome.
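    Sorting without holding everything in memory is cumbersome but well understood. The standard technique, external merge sort (not something either poster proposes), sorts fixed-size chunks in memory, spills each sorted chunk ("run") to a temporary file, then streams a merge of the runs. A minimal Python sketch, assuming each record is one line of text and the caller supplies the sort key; `external_sort` and `_spill_run` are hypothetical names.

```python
import heapq
import itertools
import os
import tempfile


def _spill_run(sorted_lines, directory):
    """Write one sorted run to its own temp file and return the file's path."""
    fd, path = tempfile.mkstemp(dir=directory, suffix=".run")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.writelines(sorted_lines)
    return path


def external_sort(lines, key, run_size, directory):
    """Yield `lines` (each ending in a newline) ordered by `key`.

    Phase 1 holds at most `run_size` lines in memory at a time; phase 2
    keeps roughly one pending line (plus a read buffer) per run.
    """
    run_paths = []
    source = iter(lines)
    while True:
        chunk = list(itertools.islice(source, run_size))
        if not chunk:
            break
        chunk.sort(key=key)
        run_paths.append(_spill_run(chunk, directory))

    runs = [open(p, encoding="utf-8") for p in run_paths]
    try:
        yield from heapq.merge(*runs, key=key)  # lazy k-way merge of sorted files
    finally:
        for f in runs:
            f.close()
        for p in run_paths:
            os.remove(p)


workdir = tempfile.mkdtemp()
data = [f"{n:05d},folder{n}\n" for n in (42, 7, 99, 3, 15, 8, 61)]
ordered = list(external_sort(data, key=lambda s: s.split(",")[0], run_size=3, directory=workdir))
print([line.split(",")[0] for line in ordered])
# ['00003', '00007', '00008', '00015', '00042', '00061', '00099']
```

    Re-sorting on a different column means running the whole process again, which is one reason the local-database suggestion below is attractive: the database already does this work.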

    OK, I just checked and it appears that there is no virtual mode for the WPF ListView. I read this:
    No, there's nothing native to WPF that would do this for you. You would have to create a custom virtualized collection and set it as the data source of your ItemsControl (ListBox/View/whatever). As the ItemsControl interacts with the collection class, you would have to detect that it's requesting data that you have not yet loaded and load it in. At the same time, you would probably want to unload data at the "front" of the list as you progress through it, so as not to end up storing all the records in memory eventually. Of course, you probably don't want to do any of this synchronously, because it would lock up the UI.

    Also, you would probably want to combine this with the new deferred scrolling feature, ScrollViewer.IsDeferredScrollingEnabled, which ensures that a large scroll does not simply move through the entire list of items sequentially, which would force you to load all the items anyway.

    That said, in today's day and age I think this is usually a bad UI design. Rather than a huge list that people have to scroll through to find things you might want to give some kind of searching/filtering capabilities or introduce an explicit paging mechanism instead.
    I imagine that you could find help on creating such a collection without too much issue.
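    A minimal sketch of the kind of custom virtualized collection the quote describes, in Python purely for illustration (in WPF this logic would typically sit behind an IList implementation bound to the ItemsControl). It reports the full item count up front, so a scroll bar can be sized for every row; it fetches a page only when an index inside it is requested; and it evicts the least recently used page so at most `max_pages` pages stay in memory. `VirtualList`, the `fetch_page` callback and the parameters are hypothetical, and fetching is synchronous here for brevity, whereas the quote rightly recommends loading asynchronously.

```python
from collections import OrderedDict


class VirtualList:
    """Sequence-like view over `count` items that keeps at most `max_pages`
    pages of `page_size` items in memory at any one time."""

    def __init__(self, count, fetch_page, page_size=100, max_pages=5):
        self.count = count            # total items, known up front
        self.fetch_page = fetch_page  # fetch_page(start, size) -> list of items
        self.page_size = page_size
        self.max_pages = max_pages
        self.pages = OrderedDict()    # page number -> items, oldest use first
        self.fetches = 0              # how often the backing store was hit

    def __len__(self):
        # A bound control sizes its scroll bar from this, even though
        # almost none of the items are loaded.
        return self.count

    def __getitem__(self, index):
        if index < 0:
            index += self.count
        if not 0 <= index < self.count:
            raise IndexError(index)
        page_no, offset = divmod(index, self.page_size)
        page = self.pages.get(page_no)
        if page is None:
            start = page_no * self.page_size
            page = self.fetch_page(start, min(self.page_size, self.count - start))
            self.fetches += 1
            self.pages[page_no] = page
            if len(self.pages) > self.max_pages:
                self.pages.popitem(last=False)  # evict least recently used page
        else:
            self.pages.move_to_end(page_no)     # mark page as recently used
        return page[offset]


def fake_fetch(start, size):
    """Hypothetical backing store: pretends to read `size` rows from disk."""
    return [f"item {i}" for i in range(start, start + size)]


items = VirtualList(1_000_000, fake_fetch, page_size=100, max_pages=3)
a = items[123_456]  # loads page 1234 only
b = items[123_499]  # same page, served from the cache
print(len(items), a, b, items.fetches)  # 1000000 item 123456 item 123499 1
```

    The fetch function is the seam where a file or database read would go; everything above it only deals in indexes, which is what the ItemsControl asks for.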

    With regards to the sorting, I'd actually be tempted to use a local database. You can store large amounts of data in a database without running into memory problems, and the database is already optimised to do things like sorting. You could store all your data in a database and let it do the sorting, then query the database each time you needed a page of data to display. If you were to use a data access technology that supported LINQ, e.g. LINQ to SQL or Entity Framework, getting a page would be as simple as calling Skip and Take.
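    The same idea in plain SQL: ORDER BY does the sorting, and LIMIT/OFFSET play the roles of Take and Skip. A minimal sketch using Python's built-in sqlite3 module with an on-disk database file; SQLite is a swapped-in example of a local database rather than anything named in the thread, and the table layout and `get_page` helper are hypothetical.

```python
import os
import sqlite3
import tempfile

# An on-disk database file, so the data lives outside the process's address space.
conn = sqlite3.connect(os.path.join(tempfile.mkdtemp(), "results.db"))
conn.execute("CREATE TABLE results (id INTEGER PRIMARY KEY, name TEXT, size INTEGER)")

# executemany accepts a generator, so rows are inserted as they are produced
# and the full result set never has to be built in memory first.
conn.executemany(
    "INSERT INTO results (name, size) VALUES (?, ?)",
    ((f"folder{n}", (n * 7919) % 10_007) for n in range(50_000)),
)
conn.execute("CREATE INDEX ix_results_size ON results (size)")  # can help ORDER BY size
conn.commit()

# Column names can't be bound as parameters, so only whitelisted ones are allowed.
SORTABLE = {"name", "size"}


def get_page(sort_column, page_number, page_size, descending=False):
    """Return one 0-based page of (name, size) rows: the SQL form of Skip/Take."""
    if sort_column not in SORTABLE:
        raise ValueError(f"cannot sort by {sort_column!r}")
    direction = "DESC" if descending else "ASC"
    # LIMIT ~ Take(page_size), OFFSET ~ Skip(page_number * page_size);
    # "id" breaks ties so paging is stable.
    sql = (f"SELECT name, size FROM results ORDER BY {sort_column} {direction}, id "
           "LIMIT ? OFFSET ?")
    return conn.execute(sql, (page_size, page_number * page_size)).fetchall()


total_rows = conn.execute("SELECT COUNT(*) FROM results").fetchone()[0]
page_count = -(-total_rows // 20)     # ceiling division: pages of 20 rows
first_page = get_page("size", 0, 20)  # smallest sizes first
last_by_name = get_page("name", page_count - 1, 20, descending=True)
```

    OFFSET still makes the database step over every skipped row, so very deep pages get slower; remembering the last sort key shown and filtering on it with WHERE (often called keyset paging) is the usual alternative when that matters. The COUNT(*) result is what a virtualized list would report so its scroll bar covers all rows.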

  3. #3

    Thread Starter
    Pro Grammar chris128
    Join Date
    Jun 2007
    Location
    England
    Posts
    7,604

    Re: Paging large amounts of data to disk and back

    Thanks, JMC. How/where did you find that? I've done plenty of Google searches on this subject and not come across anything that resembles my scenario, so perhaps I'm searching with the wrong terminology. Most of the collection-based controls in WPF do implement virtualization by default, but as far as I can tell that only applies to the creation/rendering of the UI items rather than to the actual retrieval of the data. I thought this was the same in WinForms? I guess I'll look into creating a custom virtualized collection and see if there's any way to detect when more data is being requested.

    EDIT: Ah yes, it seems I was searching for the wrong words previously; the key term to look for here is "data virtualization". I had more or less ruled out virtualization as a solution to this problem because, as I said, the built-in virtualization support in WPF only affects the UI elements rather than the actual data. I've now found 2 potential solutions for a ListView, both discussed here: http://www.zagstudio.com/blog/498#.Uho5O0CHd1E and this one for TreeView: http://www.zagstudio.com/blog/477#.Uho6pkCHd1E
    I'm sure I'll be able to make one of them work for my scenario, or just create a hybrid of the two that meets my specific requirements. Thanks for pointing me in the right direction, JMC.
    Last edited by chris128; Aug 25th, 2013 at 12:13 PM.

