Thread: [3.0/LINQ] How would you improve this

  1. #1

    Thread Starter
    Fanatic Member Crash893's Avatar
    Join Date
    Dec 2005
    Posts
    930

    Hey all,

    The objective is to read a massive text file (200+ MB) in which each line is an "entry", then output the total for each distinct entry. For example, given this input:

    rob
    rob
    rhaps
    tim

    the output should be:

    rob = 2
    rhaps = 1
    tim = 1

    The code I came up with is here:

    c# Code:
    System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();
    Dictionary<String,int> Count = new Dictionary<string,int>();
    string path = @"C:\Documents and Settings\Rob\Desktop\test.txt";
    int num = 0;
    sw.Reset();
    sw.Start();
    using (StreamReader sr = new StreamReader(path))
    {
        string line;
        while ((line = sr.ReadLine()) != null)
        {
            if (Count.ContainsKey(line))
            {
                Count[line]++;
            }
            else
            {
                Count.Add(line, 0);
            }
            num++;
        }
    }

    foreach (KeyValuePair<string, int> kvp in Count)
    {
        Console.WriteLine(kvp.Key+": "+kvp.Value.ToString());
    }
    sw.Stop();
    long time = sw.ElapsedMilliseconds;
    Console.WriteLine("Time = " + time + " milliseconds.");
    Console.WriteLine("Total Rows: " + num.ToString());
    Console.WriteLine("lines per ms: "+ (num / time).ToString());



    I was thinking of some sort of parallel while loop, but I'm not sure how to do that, or whether it even exists, and if it does exist, whether it would work with the StreamReader.


    The other idea was to load the file into a database and then run a SQL command to get the counts, but I'm not sure whether that would be faster.


    Any ideas are greatly appreciated.
    Thanks

  2. #2

    Thread Starter
    Fanatic Member Crash893's Avatar
    Join Date
    Dec 2005
    Posts
    930

    Well, I caught one error already:

    Count.Add(line, 0) should be Count.Add(line, 1), because it's a total, not zero-based.
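
With that fix applied, the ContainsKey/Add pair can also be collapsed into a single lookup per line. This is only a minimal sketch: the EntryCounter class and CountLines method are hypothetical names, not from the original post, and the method takes any TextReader so it can be tried on a small in-memory sample before pointing it at the real file.

```csharp
using System.Collections.Generic;
using System.IO;

static class EntryCounter
{
    // Hypothetical helper: counts how many times each distinct line occurs.
    public static Dictionary<string, int> CountLines(TextReader reader)
    {
        Dictionary<string, int> counts = new Dictionary<string, int>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            int current;
            // TryGetValue leaves current at 0 when the key is missing,
            // so the first occurrence is stored as 1 (the fix noted above).
            counts.TryGetValue(line, out current);
            counts[line] = current + 1;
        }
        return counts;
    }
}
```

For the sample input, EntryCounter.CountLines(new StringReader("rob\nrob\nrhaps\ntim")) yields rob = 2, rhaps = 1, tim = 1. For the real file, pass a StreamReader opened on the path instead.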

  3. #3
    Fanatic Member
    Join Date
    Aug 2006
    Location
    Chicago, IL
    Posts
    514

    If it is a 200 MB file, it'll take some time.

    You can use a BackgroundWorker (or a thread) to run the process in the background. You could then invoke delegate methods to update the UI to let the user know what is going on, and give them an option to cancel (obviously you'd need to switch to a Windows app instead).

    A count through SQL would very likely be faster than opening the file, reading through the entire thing and then telling someone what the count is. The only drawback is the time it would take SQL to import the file.
    Warren Ayen
    Senior C# Developer
    DLS Software Studios (http://www.dlssoftwarestudios.com/)

    I use Microsoft Visual Studio 2005, 2008, working with Visual Basic and Visual C#
    Hey! If you like my post, or I solve your issue, please Rate Me!
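
The SQL-style count mentioned above (a GROUP BY with COUNT) has an in-memory equivalent in LINQ to Objects, which fits the thread's [3.0/LINQ] tag. A rough sketch, assuming hypothetical names LinqEntryCounter, ReadLines and CountWithLinq; whether it beats the plain dictionary loop has not been measured here.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class LinqEntryCounter
{
    // Hypothetical helper: yields lines one at a time instead of
    // loading the whole file into a single array first.
    public static IEnumerable<string> ReadLines(TextReader reader)
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            yield return line;
        }
    }

    // Same idea as: SELECT entry, COUNT(*) FROM ... GROUP BY entry
    public static Dictionary<string, int> CountWithLinq(TextReader reader)
    {
        return ReadLines(reader)
            .GroupBy(line => line)
            .ToDictionary(group => group.Key, group => group.Count());
    }
}
```

One caveat on the design: GroupBy holds every line in its groups until the grouping is complete, so for a 200 MB file this is likely to use far more memory than incrementing counters in a dictionary.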

  4. #4

    Thread Starter
    Fanatic Member Crash893's Avatar
    Join Date
    Dec 2005
    Posts
    930

    I'm just doing a console application.

    It's pretty fast now: 200 MB in about 20 seconds.

    I was just thinking that if I could spread the work over two CPUs it might be a bit faster, but I'm not sure how to coordinate the threading.

    I was hoping that there was something built in to do it for me.

  5. #5
    Fanatic Member
    Join Date
    Aug 2006
    Location
    Chicago, IL
    Posts
    514

    There is: the System.Threading namespace. I'm not an expert, but I'm not sure you can "divide" the work between the processors. Although you can create new threads and run different methods in different ones, I think you don't get a lot of control over which processor actually does the work.

    If I'm wrong though, someone please correct me.

  6. #6
    Frenzied Member Lightning's Avatar
    Join Date
    Oct 2002
    Location
    Eygelshoven
    Posts
    1,611

    I think the bottleneck in this situation is the drive, not the processor, so dividing the workload between different cores/processors won't speed it up very much. But MS has a parallel extension (sort of beta) that you could try.

  7. #7
    I'm about to be a PowerPoster! mendhak's Avatar
    Join Date
    Feb 2002
    Location
    Ulaan Baator GooGoo: Frog
    Posts
    38,170

    I'd recommend the Parallel FX framework too. It should help you take advantage of the multi-core processors.

    Have the threads run through the file and then update the Dictionary. Lock the dictionary when updating the corresponding int.
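
A sketch of that idea, with two loudly stated assumptions. First, it uses Parallel.ForEach from System.Threading.Tasks as it shipped in .NET 4; the preview of Parallel FX available when this thread was written may have differed. Second, instead of locking the shared dictionary on every line, each worker counts into its own private dictionary and takes the lock only once, when merging its totals; that is a variation on the suggestion above, not the same thing. ParallelEntryCounter and CountInParallel are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

static class ParallelEntryCounter
{
    // Hypothetical helper: counts distinct lines using several worker threads.
    public static Dictionary<string, int> CountInParallel(IEnumerable<string> lines)
    {
        Dictionary<string, int> totals = new Dictionary<string, int>();
        object gate = new object();

        Parallel.ForEach(
            lines,
            // Each worker starts with its own private dictionary.
            () => new Dictionary<string, int>(),
            // Count into the private dictionary; no lock is needed here.
            (line, loopState, local) =>
            {
                int current;
                local.TryGetValue(line, out current);
                local[line] = current + 1;
                return local;
            },
            // When a worker finishes, merge its counts under the lock.
            local =>
            {
                lock (gate)
                {
                    foreach (KeyValuePair<string, int> pair in local)
                    {
                        int current;
                        totals.TryGetValue(pair.Key, out current);
                        totals[pair.Key] = current + pair.Value;
                    }
                }
            });

        return totals;
    }
}
```

The lines can come from any IEnumerable<string>, for example an iterator that wraps StreamReader.ReadLine. If the drive really is the bottleneck, as suggested earlier in the thread, extra threads may not help much, so this is worth timing against the single-threaded loop.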

  8. #8
    Frenzied Member
    Join Date
    Sep 2005
    Posts
    1,547

    The problem isn't that you are using one thread. The problem is how you are using StreamReader. StreamReader uses a FileStream and reads 1 KB at a time by default, so you are accessing the disk 204,800 times while parsing that 200 MB file.

    Try changing
    using (StreamReader sr = new StreamReader(path))
    to
    using (StreamReader sr = new StreamReader(path, Encoding.UTF8, true, 0x100000/*1mb*/))

    That should increase the speed.
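
A slightly fuller sketch of the same idea, assuming a hypothetical helper named OpenSequential. It sets the buffer size on the underlying FileStream as well and passes FileOptions.SequentialScan, which hints to the operating system that the file will be read front to back. Both are real .NET APIs, but how much they help on a given machine is an assumption to measure, not a given.

```csharp
using System.IO;
using System.Text;

static class BufferedReaderFactory
{
    // Hypothetical helper: opens a text file for sequential reading
    // with a caller-chosen buffer size (e.g. 0x100000 for 1 MB).
    public static StreamReader OpenSequential(string path, int bufferSize)
    {
        FileStream stream = new FileStream(
            path,
            FileMode.Open,
            FileAccess.Read,
            FileShare.Read,
            bufferSize,
            FileOptions.SequentialScan);

        // true = detect the encoding from byte order marks,
        // matching the constructor call suggested above.
        return new StreamReader(stream, Encoding.UTF8, true, bufferSize);
    }
}
```

Usage mirrors the original loop: using (StreamReader sr = BufferedReaderFactory.OpenSequential(path, 0x100000)) { ... }. Disposing the reader also closes the FileStream. Note that FileStream's own buffer and the OS file cache already sit between the reader and the disk, so the gain from a larger buffer may be modest.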

  9. #9

    Thread Starter
    Fanatic Member Crash893's Avatar
    Join Date
    Dec 2005
    Posts
    930

    High6,

    I gave that a try and it gave some improvement (generally around 100 extra lines per ms).

    The thing I thought I saw, and I could be completely off my rocker, was a parallel foreach. Basically, it would look at the work and determine how many threads were necessary, then split the foreach up between them: thread 1 reads the first fifth, thread 2 reads the next fifth, and so on, and it knew how to take care of all this.

    I could be thinking of Perl, maybe.

  10. #10

    Thread Starter
    Fanatic Member Crash893's Avatar
    Join Date
    Dec 2005
    Posts
    930

    I should also point out that this is not any sort of work-related project; it's just for the hell of it.

    And by "hell of it" I mean I'm trying to show up my Perl-scripting coworker again.


  11. #11
    Frenzied Member
    Join Date
    Jul 2008
    Location
    Rep of Ireland
    Posts
    1,380

    Tell me, do you know the names beforehand?

  12. #12
    Frenzied Member
    Join Date
    Jul 2008
    Location
    Rep of Ireland
    Posts
    1,380

    What about this: get the file, make a copy, load the copy into memory, then go through the list. When we hit a unique name, create a regex for that name and use it to (1) count the occurrences and (2) replace each matching line with a null, so every time your if() hits one of those lines it just skips over it. When the regex has finished, move to the next unique line and repeat. Could anyone verify whether this would give a speed improvement, since you are only parsing unique lines?

  13. #13
    I'm about to be a PowerPoster! mendhak's Avatar
    Join Date
    Feb 2002
    Location
    Ulaan Baator GooGoo: Frog
    Posts
    38,170

    Unfortunately, regex won't help with your speed here; it is slow even with RegexOptions.Compiled.

  14. #14
    Frenzied Member
    Join Date
    Jul 2008
    Location
    Rep of Ireland
    Posts
    1,380

    bugger!
