I am writing a program.
Part of the program computes the "differences" between two large text files.
The files aren't really that large: about 10 million lines.
I store the lines in a hash table, and I only store a 64-bit (Long) hash of each line.
The computer, the operating system, and the program are all 64-bit.

Code:
Private Sub createMD5HashFromFileTheLoop(ByVal fileToWrite As String, ByVal fileToRead As String, ByRef ExisingHashSet As Generic.HashSet(Of Int64), appending As Boolean, ByVal actuallyWrite As Boolean)
    Using sr As New System.IO.StreamReader(fileToRead)
        Using sw As New System.IO.StreamWriter(fileToWrite, appending, defaultEncoding)
            While Not sr.EndOfStream
                'Dim offset = sr.BaseStream.Position
                'Dim nextOffset = sr.BaseStream.Position
                Dim l As String = sr.ReadLine()
                Dim hashmd5long = compute64bitHash(l)
                Do ' a trick to put the for each in a "nest" that we can exit from
                    If ExisingHashSet.Contains(hashmd5long) Then
                        Exit Do
                    Else
                        ExisingHashSet.Add(hashmd5long)
                    End If
                    'hashSet.Item(hashmd5long).Add(offset)
                    If actuallyWrite Then
                        sw.WriteLine(l)
                    End If
                Loop While False
                'sr.BaseStream.Position = nextOffset
                System.Windows.Forms.Application.DoEvents()
            End While
        End Using
    End Using
End Sub

The size of ExisingHashSet is 2893249 when the error occurs. Each entry is a Long (8 bytes), so the raw data is about 23 MB, and probably a few times that once the HashSet's own bookkeeping is counted. That is still far less than the 8 GB of RAM plus virtual memory.
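
To sanity-check that estimate, here is a rough sketch that computes the expected size and then measures a HashSet of the same size. The module name, the EstimateBytes helper, and the per-entry figures are my own assumptions (a 4-byte cached hash code and a 4-byte "next" index per slot, plus a 4-byte bucket entry), not documented guarantees. The measured number will also vary with the runtime and with how much spare capacity the set has allocated.

```vbnet
Imports System
Imports System.Collections.Generic

Module HashSetMemoryEstimate
    ' Hypothetical helper: rough size of a HashSet(Of Int64) with entryCount items.
    ' Assumed layout (not guaranteed): 8-byte value + 4-byte cached hash code
    ' + 4-byte "next" index per slot, plus one 4-byte bucket entry per slot.
    Function EstimateBytes(ByVal entryCount As Long) As Long
        Const valueBytes As Long = 8
        Const slotOverheadBytes As Long = 8
        Const bucketBytes As Long = 4
        Return entryCount * (valueBytes + slotOverheadBytes + bucketBytes)
    End Function

    Sub Main()
        Dim entries As Long = 2893249 ' size of ExisingHashSet when the error occurs

        Console.WriteLine("Raw Int64 data:    {0:N1} MB", entries * 8 / 1000000.0)
        Console.WriteLine("Estimated HashSet: {0:N1} MB", EstimateBytes(entries) / 1000000.0)

        ' Measure a real set of the same size filled with distinct values.
        Dim before As Long = GC.GetTotalMemory(True)
        Dim hs As New HashSet(Of Int64)()
        For i As Long = 0 To entries - 1
            hs.Add(i * 2654435761L) ' distinct, spread-out values; no overflow at this size
        Next
        Dim after As Long = GC.GetTotalMemory(True)

        Console.WriteLine("Measured HashSet:  {0:N1} MB", (after - before) / 1000000.0)
        GC.KeepAlive(hs) ' keep the set alive until after the measurement
    End Sub
End Module
```

Even with that overhead the set should stay in the tens of megabytes, so the hash set alone does not seem to explain running out of memory on an 8 GB machine.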
Task Manager does not work, and another window pops up telling me I have run out of memory. I also run MongoDB on the same computer.
However, I never get out-of-memory problems when this program isn't running, and this program used to run fine. I just wonder how that hash table can take so much memory.



