Actually, there may not be a better way. The linear approach would be to load all the lines into an array and, as each one is loaded, search the existing items for a duplicate. Since that means comparing each new string against every string already stored, the number of comparisons grows roughly with the square of the line count, so it would not be very efficient.
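To make that concrete, here is a rough sketch of the load-and-search approach in Python. The function name `dedupe_linear` is just something I made up for illustration, and I am assuming the lines are already in memory as a list of strings.

```python
def dedupe_linear(lines):
    """Keep the first occurrence of each line, scanning the kept items every time."""
    unique = []
    for line in lines:
        # "in" on a list walks the list and compares strings one by one,
        # which is where all the repeated comparisons come from.
        if line not in unique:
            unique.append(line)
    return unique


if __name__ == "__main__":
    print(dedupe_linear(["apple", "pear", "apple", "plum", "pear"]))
```

It works and it keeps the original order, but every new line is checked against everything kept so far.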

Another option would be to move the lines into the array and sort it. All duplicates would then be next to each other, so you would only need to go through the array one time, cleaning up. However, the sort itself would be somewhat slow, because the items are text and every step of the sort requires a string comparison.
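A minimal sketch of the sort-then-clean-up idea, again in Python and again assuming a plain list of strings. `dedupe_sorted` is a made-up name. Note that the result comes back in sorted order, not in the order the lines were read.

```python
def dedupe_sorted(lines):
    """Sort a copy of the lines, then drop any line equal to the one before it."""
    ordered = sorted(lines)  # duplicates end up adjacent after sorting
    unique = []
    for line in ordered:
        # Only the most recently kept line needs to be checked.
        if not unique or line != unique[-1]:
            unique.append(line)
    return unique


if __name__ == "__main__":
    print(dedupe_sorted(["pear", "apple", "pear", "plum", "apple"]))
```

The clean-up pass is a single walk through the array; the cost is almost entirely in the sort.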

My understanding of hash tables is that they would be faster than this.
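For comparison, a hash-based version might look something like this. Python's built-in `set` is a hash table, so I am using it as a stand-in here; `dedupe_hashed` is a made-up name, and this is only a sketch under the same assumption that the lines fit in memory.

```python
def dedupe_hashed(lines):
    """Keep the first occurrence of each line, using a hash set to spot repeats."""
    seen = set()   # hash table of lines already encountered
    unique = []
    for line in lines:
        # Membership in a set is a hash lookup rather than a scan of every kept line.
        if line not in seen:
            seen.add(line)
            unique.append(line)
    return unique


if __name__ == "__main__":
    print(dedupe_hashed(["apple", "pear", "apple", "plum", "pear"]))
```

This keeps the original order and does one lookup per line on average, at the cost of the extra memory for the set.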