Thread: How to handle very huge text/log files

  1. #1

    Thread Starter
    Hyperactive Member csKanna's Avatar
    Join Date
    Dec 2005
    Location
    Tech-Tips-Now.com
    Posts
    339

    How to handle very huge text/log files

    Hi,

    I have very large server log files. Most are 500 to 600 MB, and some are over 2 GB. They have been kept for a few years and will be kept as they are. Each log file has between 1 million and 20 million lines.

    I am looking to create an application that can find a line in a text file using regex and remove duplicate entries.

    Please let me know how these huge files can be handled so that this works quickly.

    Thanks in advance.
    Kanna

  2. #2
    Karen Payne MVP kareninstructor's Avatar
    Join Date
    Jun 2008
    Location
    Oregon
    Posts
    6,714

    Re: How to handle very huge text/log files

    Have you considered monitoring the log file’s physical size (or its age) and archiving old entries, which would make the logs easier to open and read? The archived parts could go in an archive folder with a naming convention that lets anyone go back in time to view the information. Of course, going this route at the start would take some developer effort to run through the current log files and create the archives. I would look at StreamReader and StreamWriter.

    http://msdn.microsoft.com/en-us/libr...(v=vs.71).aspx
    http://msdn.microsoft.com/en-us/libr...eamwriter.aspx

  3. #3
    PowerPoster stanav's Avatar
    Join Date
    Jul 2006
    Location
    Providence, RI - USA
    Posts
    9,290

    Re: How to handle very huge text/log files

    With such large log files, you certainly don't want to read the whole file into memory. You could, however, read it in chunks. I'd probably use a StreamReader and StreamWriter (as suggested by kareninstructor) in a loop: read x lines, start a new thread, and pass those lines to it for further processing...
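A minimal sketch of the chunked-read idea, written in Python rather than .NET purely for illustration. The function names `read_in_batches` and `search_file` and the default batch size are assumptions, not anything from the thread. Each batch is processed inline here; handing batches to worker threads, as suggested above, is left out.

```python
import re
from itertools import islice
from typing import Iterator, List


def read_in_batches(path: str, batch_size: int = 100_000) -> Iterator[List[str]]:
    """Yield lists of at most batch_size lines without loading the whole file."""
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        while True:
            # islice pulls the next batch_size lines from the open file.
            batch = list(islice(handle, batch_size))
            if not batch:
                return
            yield batch


def search_file(path: str, pattern: str, batch_size: int = 100_000) -> List[str]:
    """Return every line (without its newline) that matches the regex."""
    regex = re.compile(pattern)  # compile once, reuse for every line
    hits: List[str] = []
    for batch in read_in_batches(path, batch_size):
        hits.extend(line.rstrip("\n") for line in batch if regex.search(line))
    return hits
```

Only one batch is held in memory at a time, so the batch size is the knob that trades memory for fewer read calls. If the number of matches is itself huge, writing hits to an output file instead of collecting them in a list would be the safer variant.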

  4. #4

    Thread Starter
    Hyperactive Member csKanna's Avatar
    Join Date
    Dec 2005
    Location
    Tech-Tips-Now.com
    Posts
    339

    Re: How to handle very huge text/log files

    I can do that going forward, but I have tons of files that were created over the past few years. So I am looking for a solution to handle those files.
    Kanna

  5. #5
    Karen Payne MVP kareninstructor's Avatar
    Join Date
    Jun 2008
    Location
    Oregon
    Posts
    6,714

    Re: How to handle very huge text/log files

    Quote Originally Posted by csKanna View Post
    I can do that going forward, but I have tons of files that were created over the past few years. So I am looking for a solution to handle those files.
    Simply put, you need to treat the large existing files no differently from the files that are still being logged to. Decide on the maximum file size at which a split should occur, and split the file when that size is reached. If removing duplicates is a major concern, the difficult part will be removing duplicate lines, whether that is done before or after the split. This is what we do as developers, and there is no way around it.
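One possible way to approach the duplicate removal, sketched in Python for illustration only. It streams the file once and remembers a short hash of every line already written, keeping the first occurrence. The name `remove_duplicate_lines`, the choice of BLAKE2 with a 16-byte digest, and the optional shared `seen` set are all assumptions. It also assumes the set of digests fits in memory, which seems plausible for tens of millions of lines but is worth measuring.

```python
import hashlib
from typing import Optional, Set


def remove_duplicate_lines(source_path: str, target_path: str,
                           seen: Optional[Set[bytes]] = None) -> int:
    """Copy source to target, keeping only the first copy of each line.

    Pass the same `seen` set to several calls to drop duplicates across
    files, for example across the parts of a split log.
    Returns the number of lines written.
    """
    if seen is None:
        seen = set()
    kept = 0
    # Binary mode avoids decoding errors and keeps line endings untouched.
    with open(source_path, "rb") as src, open(target_path, "wb") as dst:
        for line in src:
            # Hash the line without its line ending so "x\n" and "x\r\n" match.
            key = hashlib.blake2b(line.rstrip(b"\r\n"), digest_size=16).digest()
            if key in seen:
                continue
            seen.add(key)
            dst.write(line)
            kept += 1
    return kept
```

Storing digests instead of the lines themselves keeps memory roughly proportional to the number of distinct lines rather than their total length. The original file is never modified; the result goes to a new file.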

    To get started, I would lay out the processes on paper or in software such as Visio, then work from the design. The design should include an algorithm to name new files and a method that gets the last file name used and increments it to create the next file name. If you need to search within the files, then depending on what the results are for, you could write a search utility or use a third-party search tool to search one or more files. The split utility could be triggered by some type of task manager, or run as a manual process prompted by an event in a shared calendar.
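The split-and-name step could look roughly like the Python sketch below. The helper names `next_part_path` and `split_log` and the `.partNNNN.log` naming scheme are made up for illustration. As a simplification, it numbers parts with a counter inside one run instead of looking up the last file name on disk.

```python
import os
from typing import List


def next_part_path(out_dir: str, base_name: str, index: int) -> str:
    """Build a sortable name such as server.part0001.log (hypothetical scheme)."""
    return os.path.join(out_dir, f"{base_name}.part{index:04d}.log")


def split_log(source_path: str, out_dir: str, max_bytes: int) -> List[str]:
    """Split a log on line boundaries into parts of at most max_bytes each.

    A single line longer than max_bytes still gets a part of its own.
    Returns the paths of the parts in order.
    """
    os.makedirs(out_dir, exist_ok=True)
    base_name = os.path.splitext(os.path.basename(source_path))[0]
    parts: List[str] = []
    out = None
    written = 0
    try:
        with open(source_path, "rb") as src:
            for line in src:
                # Open a new part at the start, or when this line would overflow.
                if out is None or (written > 0 and written + len(line) > max_bytes):
                    if out is not None:
                        out.close()
                    path = next_part_path(out_dir, base_name, len(parts) + 1)
                    out = open(path, "wb")
                    parts.append(path)
                    written = 0
                out.write(line)
                written += len(line)
    finally:
        if out is not None:
            out.close()
    return parts
```

Splitting on line boundaries means no log entry is cut in half, and the zero-padded index keeps the parts in order when a folder is sorted by name. A scheduled job could call `split_log` whenever a live log passes the chosen size.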

  6. #6
    Frenzied Member ntg's Avatar
    Join Date
    Sep 2004
    Posts
    1,449

    Re: How to handle very huge text/log files

    Quote Originally Posted by csKanna View Post
    I am looking to create an application that can find a line in a text file using regex and remove duplicate entries.
    Just so it's clear to me: do you need to search the files in order to remove duplicate entries, or do you just need to search in the files?
