Re: Processing lots of data
"Take a few days"???
You are going to have to elaborate on what is actually happening during "days" of processing in order for us to suggest "methods" for doing such.
Re: Processing lots of data
OK, sorry. Essentially I have hundreds of flat files (pretty much CSV files) that I am reading using FileIO.FileSystem.FileReader, looping through each line and extracting the information I need. During this process I also look up some information (based on what is in the flat file) in a MySQL database. All of this information then gets written into another database in the structure / data types I need.
When I run my program now, it sits there hanging whilst processing. I wondered if there was something I could do to allow the user to effectively halt the program. I'm also unsure how best to manage the process whilst it is running. Wondered if there were any tips.
Thanks,
N
Re: Processing lots of data
If a process must run this long, it would be prudent to run the application on a non-user machine rather than considering secondary threads. Whichever way you go, it would also be prudent to review your code to see if time can be knocked off.
Our agency (Oregon Department of Revenue) processes huge amounts of data from the IRS during peak season (ranging from several hundred megabytes to 1 terabyte), in both flat-file and XML form. Nightly tasks are responsible for accepting data any time after 7 PM and must be finished before 5 AM the following morning. Having done this for years, the first thing to say is that all data manipulation is done on a dedicated file server (data is received via a secure web service). The application responsible for loading incoming data into enterprise-level databases then splits the work between the application itself, helper programs and stored procedures. That makes for a very efficient way of consuming the data, since some processes run solely on the file server while others are executed on the database server. All code that runs on the file server has been streamlined and runs uninhibited, without any need to compensate for user interaction with the computer/server. I am not saying you should go this route, and you may not have the resources to do so. What I am suggesting is that you first review your code to see if there are places that can be streamlined and, if possible, run the processes on, at the very least, a computer that no end users will work on.
Re: Processing lots of data
I can't get past the few days part... How big are these files anyway? I would think that even if there were 1000 of them and they were 1 gig each, processing time would still be much less than a few days.
I would think the code could be changed to speed this up dramatically. There may also be some issue with a virus scanner slowing things down.
That said, multithreading would allow the processing to run on a different thread than the UI, so the app would still be responsive while processing the data.
Speaking of long data processing times, the company I used to work for had a utility that processed and sorted data files, and in one case it took a few days to process one particular file. One day I had some free time and decided to look at the code [written by someone else] to see why it was taking so long. I ended up moving one line of code to a different function, creating a few public variables and changing one of the parameters from ByVal to ByRef, and the processing of that same file completed in 16 seconds.
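For the kind of loop described earlier in the thread (read a line, look something up in the database, write a row), two changes that often help are caching repeated lookups and writing rows in batches instead of one at a time. Below is a minimal sketch in Python, not the original poster's code. Everything named in it is an assumption made for illustration: sqlite3 stands in for MySQL, the tables `lookup` and `results` are invented, and each line is assumed to hold a lookup code followed by a number.

```python
import csv
import sqlite3


def process_file(lines, lookup_conn, out_conn, cache, batch_size=1000):
    """Read CSV rows, resolve each code through a cached lookup, write in batches.

    `lines` is any iterable of text lines (an open file works).
    `cache` is a plain dict shared across files so repeated codes
    only hit the lookup database once.
    """
    batch = []
    written = 0
    for row in csv.reader(lines):
        code, amount = row[0], row[1]
        if code not in cache:
            # Only query the lookup database the first time a code is seen.
            found = lookup_conn.execute(
                "SELECT name FROM lookup WHERE code = ?", (code,)
            ).fetchone()
            cache[code] = found[0] if found else None
        batch.append((cache[code], float(amount)))
        if len(batch) >= batch_size:
            # One round trip for many rows instead of one per row.
            out_conn.executemany("INSERT INTO results VALUES (?, ?)", batch)
            written += len(batch)
            batch.clear()
    if batch:
        # Flush whatever is left over at the end of the file.
        out_conn.executemany("INSERT INTO results VALUES (?, ?)", batch)
        written += len(batch)
    out_conn.commit()
    return written
```

How much this saves depends on how often the same code repeats across lines and on how expensive each database round trip is, so it is worth timing one of the smaller files before and after.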
Re: Processing lots of data
The data is historical from many many years and so there is a lot of it. I'm only basing a few days on the fact that one of the smaller files takes about 2 minutes.
Thanks for the comments. DataMiser, interesting to hear about cutting the time down to 16 seconds. I'm pretty sure my code isn't optimised, but I will look through it again.
Ideally I'd like to close down my main form (on the click of the "process" button) and hand it over to another form in my application (that I've called logger) so that I can use this to log output whilst it is processing. I'm struggling to do this at the moment though so any advice would be welcome.
Re: Processing lots of data
I do something just like this - but I have created a SERVICE to run the processing.
The SERVICE uses an HTTPLISTENER to wait for requests from UI apps on the network.
When a request arrives a thread is started in the SERVICE that processes that data.
UI waits for nothing. It can even shut down and the SERVICE still runs.
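A rough sketch of that shape, written in Python rather than as a real Windows service: a small HTTP listener accepts a request, starts a thread for the job and answers straight away. This is only an illustration of the idea, not the code described in this post. The names `run_job`, `JobHandler`, `start_service` and the in-memory `completed` list are all hypothetical, and the request body is assumed to be nothing more than a job name.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

completed = []  # names of finished jobs (illustrative in-memory record only)


def run_job(name):
    # Placeholder for the real long-running file processing.
    completed.append(name)


class JobHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # The request body is assumed to carry just the job name.
        length = int(self.headers.get("Content-Length", 0))
        name = self.rfile.read(length).decode("utf-8")

        # Start the work on its own thread and reply without waiting for it.
        job = threading.Thread(target=run_job, args=(name,))
        job.start()
        self.server.jobs.append(job)

        self.send_response(202)  # 202 Accepted: work started, not finished
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence the default per-request console logging


def start_service(port=0):
    """Start the listener on a background thread; port 0 picks a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), JobHandler)
    server.jobs = []
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

A client would POST the job name to the listener and carry on; because the job thread belongs to the service process, the client can exit while the work continues.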
btw - VB is notoriously poor at handling string data. Make sure to do things like use STRINGBUILDERS and watch the areas where you cut up and search for data. I've been writing C++ functions for a year now that actually do the data processing work in my SERVICE...
Re: Processing lots of data
Thanks szlamany. I might try doing this. Do you have any code examples of how to do this, or even a link to existing threads?
Re: Processing lots of data
Unfortunately that is a bit too much to ask.
Just getting a SERVICE app to run is a bit of a task - you will find questions I asked here to overcome some of those issues.
You certainly can use any method you want to move the data from the UI to the backend SERVICE - file i/o, etc...
Re: Processing lots of data
If this was me I think I'd deliberately avoid handing it off to a separate, silent thread. This should really be running on a separate dedicated machine, in which case who cares whether a form remains visible or not. Failing that it's going to be running on a user's machine in which case I'd quite like the constant reminder of an entry in the task bar to help prevent the user closing the machine down at the end of the day and bringing it to a crashing halt.
Don't forget, a running app won't make the machine unresponsive. It will only be the app that's unresponsive. It might hog some resources, particularly the disk head, but that's going to be the case whether it's running silent or not. And there's nothing to stop you just minimising it.
szlamany's suggestion of running it as a service is good, but then you DEFINITELY want to run it on a dedicated machine if you don't want user stupidity to interrupt it part way through.
Re: Processing lots of data
Quote: Originally Posted by neopheus
The data is historical from many many years and so there is a lot of it. I'm only basing a few days on the fact that one of the smaller files takes about 2 minutes.
Thanks for the comments. DataMiser, interesting to hear about cutting the time down to 16 seconds. I'm pretty sure my code isn't optimised, but I will look through it again.
Ideally I'd like to close down my main form (on the click of the "process" button) and hand it over to another form in my application (that I've called logger) so that I can use this to log output whilst it is processing. I'm struggling to do this at the moment though so any advice would be welcome.
So it seems this is a one-time job... If it is in fact a one-time job, then I would just run it on a dedicated machine and let it go, even if it takes a few days to complete. Rewriting an application to save some processing time when the app will only be used once doesn't make much sense to me... Anyway, you can fix the unresponsive issue by putting the task in a background thread.
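To illustrate the background-thread idea together with the earlier requests for a way to halt the job and log its progress, here is a minimal sketch in Python. It is an assumption-laden outline, not anyone's actual code from this thread: `process_files`, `start_worker` and `handle_file` are hypothetical names, and the log is just a queue that a UI could poll. In VB.NET the nearest built-in equivalent is the BackgroundWorker component, with WorkerSupportsCancellation and ReportProgress.

```python
import queue
import threading


def process_files(paths, handle_file, cancel, log):
    """Process files one by one, checking for a stop request between files."""
    done = 0
    for path in paths:
        if cancel.is_set():
            # The user asked to stop: finish cleanly at a file boundary.
            log.put(f"stopped after {done} of {len(paths)} files")
            return done
        handle_file(path)
        done += 1
        log.put(f"finished {path}")
    log.put("all files processed")
    return done


def start_worker(paths, handle_file):
    """Run process_files on a background thread so the UI thread stays free."""
    cancel = threading.Event()  # the UI calls cancel.set() to request a stop
    log = queue.Queue()         # the UI polls this to display progress
    worker = threading.Thread(
        target=process_files,
        args=(paths, handle_file, cancel, log),
        daemon=True,  # the thread ends if the program itself exits
    )
    worker.start()
    return worker, cancel, log
```

Checking the cancel flag only between files keeps each file's work atomic; a stop request takes effect once the current file is done, which for two-minute files is probably an acceptable delay.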