-
May 4th, 2011, 07:01 AM
#1
Thread Starter
Hyperactive Member
How to handle very huge text/log files
Hi,
I have very large server log files: most are 500 to 600 MB, and some are over 2 GB. The files have been kept for a few years and will be maintained as is. Each log file has between 1 million and 20 million lines.
I am looking to create an application that can find a line in a text file using a regex and remove duplicate entries.
Please let me know how these huge files can be handled so that it works quickly.
Thanks in advance.
-
May 4th, 2011, 08:46 AM
#2
Re: How to handle very huge text/log files
Have you considered monitoring the log file's physical size (or its age) and archiving old entries, which would make the log information easier to open and read? The archived parts of these log files could be placed in an archive folder with a naming convention that lets anyone go back in time to view information. Of course, going this route would initially take some developer effort to run through the current log files and create many archives. I would look at StreamReader and StreamWriter.
http://msdn.microsoft.com/en-us/libr...(v=vs.71).aspx
http://msdn.microsoft.com/en-us/libr...eamwriter.aspx
-
May 4th, 2011, 09:17 AM
#3
Re: How to handle very huge text/log files
With such large log files, you certainly don't want to read the whole file into memory. You could, however, read it in chunks. I'd probably use a StreamReader and StreamWriter (as suggested by Kevin) in a loop: read x number of lines, then start a new thread and pass those lines to it for further processing...
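A minimal sketch of the chunked read, written in Python rather than with StreamReader only to keep it short. The function names, the default chunk size, and the UTF-8 encoding are assumptions for illustration, not something from this thread. It streams the file, so memory use depends on the chunk size and not on the file size.

```python
import re
from itertools import islice
from typing import Iterable, Iterator, List


def read_in_chunks(lines: Iterable[str], chunk_size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most chunk_size lines."""
    it = iter(lines)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk


def find_matches(path: str, pattern: str, chunk_size: int = 100_000) -> Iterator[str]:
    """Stream the file and yield every line that matches the regex."""
    regex = re.compile(pattern)
    # errors="replace" keeps a stray bad byte from aborting the whole scan
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        for chunk in read_in_chunks(handle, chunk_size):
            # a chunk is the unit that could be handed to a worker instead
            for line in chunk:
                if regex.search(line):
                    yield line.rstrip("\n")
```

Something like `for hit in find_matches("server.log", r"^ERROR"): print(hit)` would then walk a file of any size; the file name and pattern here are placeholders.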
-
May 7th, 2011, 06:03 AM
#4
Thread Starter
Hyperactive Member
Re: How to handle very huge text/log files
I can do that going forward, but I have tons of files that were created over the past few years, so I am looking for a solution to handle those files.
-
May 7th, 2011, 08:50 AM
#5
Re: How to handle very huge text/log files
 Originally Posted by csKanna
I can do that going forward, but I have tons of files that were created over the past few years, so I am looking for a solution to handle those files.
Put simply, you need to treat the large existing files no differently from the files that will be logged to in the future. Decide on the maximum file size at which a split will occur, and when a file reaches that size, split it up. If removing duplicates is a major concern, that will be the difficult part, whether it is done before or after the split operation. This is what we do as developers and there is no way around it.
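One way the split and the duplicate removal could be combined in a single pass is sketched below in Python. The function name, the `.partNNNN` suffix, and the choice of a BLAKE2 digest are all assumptions made for illustration. Only a 16-byte digest per unique line is kept in memory, but that set still grows with the number of unique lines, and a digest collision (very unlikely) would drop a line that is not really a duplicate.

```python
import hashlib
import os
from typing import List


def split_and_dedupe(source: str, out_dir: str, max_bytes: int) -> List[str]:
    """Copy unique lines from source into numbered parts of about max_bytes each."""
    seen = set()              # digests of lines already written
    parts: List[str] = []     # paths of the part files created so far
    out = None
    written = 0
    base = os.path.basename(source)
    with open(source, "rb") as src:
        for line in src:
            key = hashlib.blake2b(line.rstrip(b"\r\n"), digest_size=16).digest()
            if key in seen:
                continue      # duplicate line, skip it
            seen.add(key)
            # start a new part when this line would push the current one past the limit
            if out is None or (written and written + len(line) > max_bytes):
                if out is not None:
                    out.close()
                part_path = os.path.join(out_dir, f"{base}.part{len(parts) + 1:04d}")
                parts.append(part_path)
                out = open(part_path, "wb")
                written = 0
            out.write(line)
            written += len(line)
    if out is not None:
        out.close()
    return parts
```

A single line longer than `max_bytes` still gets a part of its own, so the limit is a target and not a hard cap. Duplicates are detected across the whole source file, not just within one part.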
To get started I would lay out the processes on paper or in software such as Visio, then work from the design. Within the design, there should be an algorithm to name new files: a method that gets the last file name used and increments it to create the next one. If you need to search within the files, then depending on what the results are for, you could write a search utility or use a third-party search tool to search one or more files. The split utility could be triggered by some type of task manager, or it could be a manual process triggered by an event in a shared calendar.
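The naming step might look something like the following Python sketch. The `server_0001.log` pattern, the function name, and the four-digit padding are hypothetical; any convention that sorts in order would do.

```python
import os
import re


def next_archive_name(folder: str, prefix: str = "server", ext: str = ".log") -> str:
    """Return the next name in the sequence prefix_0001.log, prefix_0002.log, ..."""
    pattern = re.compile(rf"^{re.escape(prefix)}_(\d+){re.escape(ext)}$")
    last = 0
    # find the highest number already used in the archive folder
    for name in os.listdir(folder):
        match = pattern.match(name)
        if match:
            last = max(last, int(match.group(1)))
    return f"{prefix}_{last + 1:04d}{ext}"
```

Scanning the folder each time avoids having to store the last name anywhere, at the cost of a directory listing per new file.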
-
May 8th, 2011, 01:10 AM
#6
Re: How to handle very huge text/log files
 Originally Posted by csKanna
I am looking to create an application that can find a line in a text file using a regex and remove duplicate entries.
Just so it's clear to me: do you need to search the files in order to remove duplicate entries, or do you just need to search the files?