Need advice on removing duplicate lines in text file [Resolved]
Hey Guys,
I have been thinking about this problem and have not been able to find any relevant information about it on Google. I need to remove duplicate lines from a text file. Does anyone out there have any advice on how to accomplish this?
Re: Need advice on removing duplicate lines in text file
It depends on how many lines you have. One way is to store the hashcode of each line in a collection. Keep adding them until the collection complains of a duplicate key; when that happens, simply discard that line of the file.
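A minimal sketch of the hashcode idea, assuming Python (the thread never names a language) and swapping the unspecified "hashcode" for a SHA-1 digest from the standard `hashlib` module. Only a fixed-size digest per distinct line is held in memory, and the lines are streamed from one file to another. The names `unique_lines_by_digest` and `dedupe_file` are hypothetical. One caveat: two different lines that produce the same hash would be treated as duplicates, so one would be wrongly dropped. With a 32-bit hashcode that is a real possibility; with SHA-1 it is vanishingly unlikely for ordinary text.

```python
import hashlib


def unique_lines_by_digest(lines):
    """Yield each line the first time it is seen, remembering only a digest of it."""
    seen = set()  # digests of lines already written out
    for line in lines:
        # Assumption: a SHA-1 digest (20 bytes) stands in for the "hashcode".
        # The trailing newline is ignored so a final line without "\n" still matches.
        digest = hashlib.sha1(line.rstrip("\n").encode("utf-8")).digest()
        if digest in seen:
            continue  # same digest as an earlier line: treat as a duplicate and discard
        seen.add(digest)
        yield line


def dedupe_file(src, dst):
    """Hypothetical helper: copy src to dst, dropping repeated lines."""
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        fout.writelines(unique_lines_by_digest(fin))
```

Because the generator streams line by line, memory use grows with the number of distinct lines times 20 bytes, not with the length of the lines themselves.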
Re: Need advice on removing duplicate lines in text file
The number of lines can change, but I doubt it will ever be over 500. So what you're saying is I could create a hashtable and use each line as a key, and if the key already exists, just go on to the next line?
If so, that's some great thinking and should be easy to build. There is probably a better, more efficient way, but this should work great for now.
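The approach described above (use each line as a key and skip it if the key already exists) might look like this in Python, which is an assumption since the thread names no language. A built-in `set` plays the role of the hashtable's keys; the function name `unique_lines` is hypothetical. The first occurrence of each line is kept and the original order is preserved.

```python
def unique_lines(lines):
    """Keep the first occurrence of each line, in original order."""
    seen = set()  # acts as the hashtable's keys
    kept = []
    for line in lines:
        if line in seen:
            continue  # key already exists: go on to the next line
        seen.add(line)
        kept.append(line)
    return kept
```

In Python 3.7 and later, `list(dict.fromkeys(lines))` gives the same result in one line, because dictionaries preserve insertion order.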
Re: Need advice on removing duplicate lines in text file
Actually, there may not be a better way. The linear approach would be to load the lines into an array one at a time and, as each is loaded, search all existing items for a duplicate. That means comparing strings MANY times (roughly n²/2 comparisons for n lines when most lines are distinct), so it would not be very efficient.
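For comparison, the linear approach above can be sketched in Python (an assumed language; the function name is hypothetical). The `in` test on a list compares the new line against every line kept so far, which is where the quadratic cost comes from: with 500 distinct lines that is 0 + 1 + ... + 499 = 124,750 string comparisons, still perfectly manageable at that size.

```python
def unique_lines_linear(lines):
    """Keep the first occurrence of each line by scanning all lines kept so far."""
    kept = []
    for line in lines:
        # 'line not in kept' walks the whole list: one string comparison per kept line
        if line not in kept:
            kept.append(line)
    return kept
```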
Another option would be to move the lines into the array and sort it, so that all duplicates end up next to each other. You would then just need to go through the array once, cleaning up. However, the sort itself would be somewhat slow, because it has to compare the lines' text against each other many times (on the order of n log n string comparisons).
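The sort-then-clean-up idea above can be sketched in Python (an assumed language; the function name is hypothetical). After sorting, a line is a duplicate exactly when it equals the line kept just before it, so one pass suffices. Note a side effect the approach implies: the output comes back in sorted order, not the file's original order.

```python
def unique_lines_sorted(lines):
    """Sort so duplicates are adjacent, then keep lines that differ from the previous one.

    The result is in sorted order, not the original order of the file.
    """
    kept = []
    for line in sorted(lines):
        if not kept or line != kept[-1]:  # differs from its predecessor: not a duplicate
            kept.append(line)
    return kept
```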
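My understanding of hash tables is that they would be faster than this, since each lookup takes roughly constant time on average instead of a comparison against every other line.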
Re: Need advice on removing duplicate lines in text file
That method worked perfectly. It may not be the best way to go about it, but hey, it works :)
Re: Need advice on removing duplicate lines in text file
The reason I suggested using a hashcode was simply to save RAM. At the time I wasn't thinking about hashtables specifically, but I guess that would be at least as good as, if not better than, a collection. :thumb: