
Thread: Processing lots of data

  1. #1

    Thread Starter
    Lively Member
    Join Date
    Mar 2013
    Posts
    92

    Processing lots of data

    Hi,
    I need to process a significant amount of data that is contained within flat files on the filesystem. I have built a basic application that provides a list within a listbox with a command button that will be pressed to start the processing. The processing could well take a few days so I'm wondering what the best approach would be in terms of running as a background process, or perhaps thinking about multi-threading. I'm not experienced in this area so any help would be appreciated.

    In summary, my question is: what is the best way to process a large number of flat files without making the PC unresponsive?

    Any help would be appreciated.

    Thanks,
    N

  2. #2
    MS SQL Powerposter szlamany's Avatar
    Join Date
    Mar 2004
    Location
    Connecticut
    Posts
    18,263

    Re: Processing lots of data

    "Take a few days"???

    You are going to have to elaborate on what is actually happening during "days" of processing in order for us to suggest "methods" for doing such.

    *** Read the sticky in the DB forum about how to get your question answered quickly!! ***

    Please remember to rate posts! Rate any post you find helpful - even in old threads! Use the link to the left - "Rate this Post".

    Some Informative Links:
    [ SQL Rules to Live By ] [ Reserved SQL keywords ] [ When to use INDEX HINTS! ] [ Passing Multi-item Parameters to STORED PROCEDURES ]
    [ Solution to non-domain Windows Authentication ] [ Crazy things we do to shrink log files ] [ SQL 2005 Features ] [ Loading Pictures from DB ]

    MS MVP 2006, 2007, 2008

  3. #3

    Thread Starter
    Lively Member
    Join Date
    Mar 2013
    Posts
    92

    Re: Processing lots of data

    OK, sorry. Essentially I have hundreds of flat files (pretty much CSV files) that I am reading using FileIO.FileSystem.FileReader, looping through each line and extracting the information I need. During this process I also look up some information (based on what is in the flat file) in a MySQL database. All of this information then gets written into another database in the structure and data types I need.

    When I run my program now, it sits there hanging whilst processing. I wondered if there was something I could do to allow the user to effectively halt the program. I'm also unsure how best to manage the process whilst it is running. Wondered if there were any tips.

    Thanks,
    N

  4. #4
    Karen Payne MVP kareninstructor's Avatar
    Join Date
    Jun 2008
    Location
    Oregon
    Posts
    6,713

    Re: Processing lots of data

    If a process must run this long, it would be prudent to run the application on a non-user machine rather than considering secondary threads. Whichever way you go, it would also be prudent to review your code to see if time can be knocked off.

    Our agency (Oregon Department of Revenue) processes huge amounts of data (ranging from several hundred megabytes to 1 terabyte) in both flat-file and XML formats during peak season (from the IRS). Nightly tasks accept data any time after 7 PM and must be finished before 5 AM the following morning. Having done this for years, the first point is that all data manipulation is done on a dedicated file server (data is received via a secure web service). The application responsible for loading incoming data into enterprise-level databases then splits the processing between the application, programs, and stored procedures. That allows for an efficient way of consuming the data, since some processes run solely on the file server while others are executed on the database server. All code that runs on the file server has been streamlined and runs uninhibited, without any need to compensate for user interaction with the computer/server.

    I am not saying you should go this route, and you may not have the resources to do so. What I am suggesting is to first review your code to see if there are places that can be streamlined and, if possible, to run the processing on a computer that no end users work on.

  5. #5
    PowerPoster
    Join Date
    Feb 2012
    Location
    West Virginia
    Posts
    14,206

    Re: Processing lots of data

    I can't get past the few days part. How big are these files anyway? I would think that even if there were 1000 of them and they were 1 GB each, the processing time would still be much less than a few days.

    I would think the code could be changed to speed this up dramatically. There may also be some issue with a virus scanner slowing things down.

    That said, multithreading would allow the processing to run on a different thread from the UI, so the app would stay responsive while the data is processed.

    Speaking of long processing times, the company I used to work for had a utility that processed and sorted data files, and in one case it took a few days to process a single file. One day I had some free time and decided to look at the code (written by someone else) to see why it was taking so long. I ended up moving one line of code to a different function, creating a few public variables and changing one of the parameters from ByVal to ByRef, and the processing of that same file completed in 16 seconds.
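To make the multithreading suggestion concrete, here is a minimal sketch of the pattern, written in Python rather than VB.NET purely for brevity. The names `process_one_file`, `process_files` and `start_in_background` are hypothetical and not from the original poster's code. The idea is that the work runs on a worker thread, a shared flag lets the user request a stop, and a callback reports progress.

```python
import threading

def process_one_file(path):
    # Hypothetical placeholder for the real per-file work
    # (parse the lines, look up values, write to the target database).
    return len(path)

def process_files(paths, stop_event, on_progress):
    """Work through paths in order; stop early if stop_event has been set."""
    done = 0
    for path in paths:
        # The flag is only checked between files, so a stop never
        # interrupts a file halfway through.
        if stop_event.is_set():
            break
        process_one_file(path)
        done += 1
        on_progress(done, len(paths))
    return done

def start_in_background(paths, on_progress):
    """Start processing on a worker thread and return (thread, stop_event)."""
    stop_event = threading.Event()
    worker = threading.Thread(
        target=process_files, args=(paths, stop_event, on_progress)
    )
    worker.start()
    return worker, stop_event
```

A "Stop" button would call `stop_event.set()` and the UI thread stays free in the meantime. Note that `on_progress` runs on the worker thread, so a real UI would need to hand the update back to the UI thread. In VB.NET the `BackgroundWorker` component (with `ReportProgress` and `CancelAsync`) is one way to get the same shape, though whether it suits this particular app is a judgment call.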

  6. #6

    Thread Starter
    Lively Member
    Join Date
    Mar 2013
    Posts
    92

    Re: Processing lots of data

    The data is historical from many many years and so there is a lot of it. I'm only basing a few days on the fact that one of the smaller files takes about 2 minutes.

    Thanks for the comments. DataMiser, it was interesting to hear about cutting the time down to 16 seconds. I'm pretty sure my code isn't optimised, but I will look through it again.

    Ideally I'd like to close down my main form (on the click of the "process" button) and hand it over to another form in my application (that I've called logger) so that I can use this to log output whilst it is processing. I'm struggling to do this at the moment though so any advice would be welcome.
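One common way to feed a separate "logger" window while work is in progress is to have the worker push messages onto a thread-safe queue and let the window drain that queue on a timer. The sketch below is in Python for brevity and is only an illustration under that assumption. `worker`, `drain` and the message texts are hypothetical names, not part of the application described above.

```python
import queue
import threading

def worker(paths, log_queue):
    """Process each path and report what is happening via log_queue."""
    for path in paths:
        log_queue.put(f"started {path}")
        # ... hypothetical per-file processing would go here ...
        log_queue.put(f"finished {path}")
    log_queue.put(None)  # sentinel: no more messages will follow

def drain(log_queue, sink):
    """Move every message currently queued into sink without blocking.

    Returns False once the sentinel has been seen, True otherwise.
    """
    while True:
        try:
            message = log_queue.get_nowait()
        except queue.Empty:
            return True
        if message is None:
            return False
        sink.append(message)
```

In a real form, `sink` would be the list box or text box on the logger window, and `drain` would be called from a UI timer every few hundred milliseconds, so that only the UI thread ever touches the controls.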

  7. #7
    MS SQL Powerposter szlamany's Avatar
    Join Date
    Mar 2004
    Location
    Connecticut
    Posts
    18,263

    Re: Processing lots of data

    I do something just like this - but I have created a SERVICE to run the processing.

    The SERVICE uses an HTTPLISTENER to wait for requests from UI apps on the network.

    When a request arrives a thread is started in the SERVICE that processes that data.

    UI waits for nothing. It can even shut down and the SERVICE still runs.
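As a rough illustration of the shape described here (a listener that accepts a request, starts a thread for the work and replies straight away), here is a small sketch using Python's standard `http.server` module. This is not szlamany's code and does not use the .NET HttpListener. `JobHandler`, `process_job`, `make_server` and the `completed` list are hypothetical names chosen for the example.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

completed = []  # stand-in for real output; worker threads append to it

def process_job(name):
    # Hypothetical placeholder for the long-running file processing.
    completed.append(name)

class JobHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # The request body names the job to run (an assumption of this sketch).
        length = int(self.headers.get("Content-Length", 0))
        job_name = self.rfile.read(length).decode("utf-8")
        # Start the work on its own thread and reply immediately,
        # so the caller never waits for the processing to finish.
        threading.Thread(target=process_job, args=(job_name,)).start()
        self.send_response(202)  # 202 Accepted: queued, not yet done
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the sketch quiet; a real service would log requests

def make_server(port=0):
    """Create the listener on localhost; port 0 lets the OS pick a free port."""
    return ThreadingHTTPServer(("127.0.0.1", port), JobHandler)
```

Calling `make_server(8080).serve_forever()` would run the listener until the process is stopped. Installing it as a real Windows service, securing the endpoint and reporting status back to the UI are all outside what this sketch covers.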

    btw - VB is notoriously poor at handling string data. Make sure to do things like use STRINGBUILDERS and watch the areas where you cut up and search for data. I've been writing C++ functions for a year now that actually do the data processing work in my SERVICE...

  8. #8

    Thread Starter
    Lively Member
    Join Date
    Mar 2013
    Posts
    92

    Re: Processing lots of data

    Thanks, szlamany. I might try doing this. Do you have any code examples of how to do it, or even a link to existing threads?

  9. #9
    MS SQL Powerposter szlamany's Avatar
    Join Date
    Mar 2004
    Location
    Connecticut
    Posts
    18,263

    Re: Processing lots of data

    Unfortunately that is a bit too much to ask.

    Just getting a SERVICE app to run is a bit of a task - you will find questions I asked here to overcome some of those issues.

    You certainly can use any method you want to move the data from the UI to the backend SERVICE - file i/o, etc...

  10. #10
    Super Moderator FunkyDexter's Avatar
    Join Date
    Apr 2005
    Location
    An obscure body in the SK system. The inhabitants call it Earth
    Posts
    7,957

    Re: Processing lots of data

    If this was me I think I'd deliberately avoid handing it off to a separate, silent thread. This should really be running on a separate dedicated machine, in which case who cares whether a form remains visible or not. Failing that it's going to be running on a user's machine in which case I'd quite like the constant reminder of an entry in the task bar to help prevent the user closing the machine down at the end of the day and bringing it to a crashing halt.

    Don't forget, a running app won't make the machine unresponsive. It will only be the app that's unresponsive. It might hog some resources, particularly the disk head, but that's going to be the case whether it's running silent or not. And there's nothing to stop you just minimising it.

    SzLamany's suggestion of running it as a service is good but then you DEFINITELY want to run it on a dedicated machine if you don't want user stupidity to interrupt it part way through.
    The best argument against democracy is a five minute conversation with the average voter - Winston Churchill

    Hadoop actually sounds more like the way they greet each other in Yorkshire - Inferrd

  11. #11
    PowerPoster stanav's Avatar
    Join Date
    Jul 2006
    Location
    Providence, RI - USA
    Posts
    9,290

    Re: Processing lots of data

    Quote Originally Posted by neopheus View Post
    The data is historical from many many years and so there is a lot of it. I'm only basing a few days on the fact that one of the smaller files takes about 2 minutes.

    Thanks for the comments, interesting to hear the comment DataMiser about the cutting down to 16secs. I'm pretty sure my code isn't optimised but will look through again.

    Ideally I'd like to close down my main form (on the click of the "process" button) and hand it over to another form in my application (that I've called logger) so that I can use this to log output whilst it is processing. I'm struggling to do this at the moment though so any advice would be welcome.

    So it seems this is a one-time job. If it is in fact a one-time job, then I would just run it on a dedicated machine and let it go, even if it takes a few days to complete. Rewriting an application to save some processing time when the app will be used only once doesn't make much sense to me. Anyway, you can fix the unresponsiveness by putting the task on a background thread.
    Let us have faith that right makes might, and in that faith, let us, to the end, dare to do our duty as we understand it.
    - Abraham Lincoln -
