[RESOLVED] Better Option Than List (Of...) For Disk Load Time
I have a football simulation game I've been working on. I have just created a design to store players' career stats. I chose using List Of collections for every stat (there are over 60) with each index representing the year the player played. I needed to create a simple array for each stat as well to simulate preseason, regular season, and post season stats (3). So a sample stat storage line in a player's constructor looks like this:
Code:
' Upper bound 2 gives three slots: preseason, regular season, postseason
Private m_intPassAttempts(2) As List(Of Int16)
Then I initialize the array like this
Code:
For x As Integer = 0 To 2
    m_intPassAttempts(x) = New List(Of Int16)
Next
This works great until I save a game file to disk and then reload it; it takes too damn long! It's not awful, but I'm looking to speed up the load time a bit.
My question is simple: is there a better collection class than List(Of...) for disk performance? Something with less overhead? Is ArrayList or Hashtable less memory intensive? I've changed all the smaller-number stats like Games Played and INTS to Byte instead of Int16, which helped a bit, but I'm looking for another option as well.
Thanks,
Eric
Re: Better Option Than List (Of...) For Disk Load Time
How you store the values in memory will generally make virtually no difference to disk save/load speeds. If you are hard-coding a certain number of elements though, a standard array would be the way to go (e.g. Private m_intAttempts(2) As Int16, which holds three values because VB array bounds are inclusive).
The important thing for disk speed is the way you are saving and loading the data. If that is being done via serialising each instance of the class then in-memory-storage might well have an effect, and switching to standard arrays if apt should help a bit (as there is slightly less data to save).
To get high speeds for load and save you should be saving the data to your own custom file format, which (to some degree at least) is optimised for size.
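As a rough illustration of what a custom, size-optimised format could look like, here is a minimal sketch using BinaryWriter and BinaryReader. The module name, method names, and file layout are assumptions made for this example, not anything from the original project: it stores one stat as a season count followed by three Int16 values per season.

```vbnet
Imports System.Collections.Generic
Imports System.IO

' Hypothetical helper: names and file layout are assumptions for illustration.
Public Module StatFileSketch

    ' Layout: [season count As Int32], then per season three Int16 values
    ' (index 0 = preseason, 1 = regular season, 2 = postseason).
    Public Sub SaveStat(filePath As String, seasons As List(Of Short()))
        Using writer As New BinaryWriter(File.Open(filePath, FileMode.Create))
            writer.Write(seasons.Count)
            For Each season As Short() In seasons
                For seasonType As Integer = 0 To 2
                    writer.Write(season(seasonType))
                Next
            Next
        End Using
    End Sub

    ' Reads the file back in exactly the order it was written.
    Public Function LoadStat(filePath As String) As List(Of Short())
        Dim seasons As New List(Of Short())()
        Using reader As New BinaryReader(File.Open(filePath, FileMode.Open))
            Dim count As Integer = reader.ReadInt32()
            For i As Integer = 0 To count - 1
                Dim season As Short() = New Short(2) {}
                For seasonType As Integer = 0 To 2
                    season(seasonType) = reader.ReadInt16()
                Next
                seasons.Add(season)
            Next
        End Using
        Return seasons
    End Function

End Module
```

With this layout each season costs 6 bytes per stat plus a 4-byte count, with no type metadata in the file. The trade-off is that the reader and writer must be kept in step by hand whenever the format changes.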
Re: Better Option Than List (Of...) For Disk Load Time
OK, so I'm attacking the problem from the wrong angle (mostly).
It's been a while since I created my loading code. I use this (where m_League is the save game object). Is this optimized as you suggested?:
Code:
' Serialize the whole league object graph to the chosen file
Using fs As Stream = New FileStream(diaSaveFile.FileName, FileMode.Create)
    Dim bf As New Runtime.Serialization.Formatters.Binary.BinaryFormatter()
    bf.Serialize(fs, m_League)
End Using
Thank you!
Eric
Re: Better Option Than List (Of...) For Disk Load Time
For comparison, a 30-40 MB database in JSON format with a quite complex structure (multiple objects nested inside other objects, etc.) can be serialized to and deserialized from a disk file in a second.
So if you don't show how you read from and write to disk, it is impossible to see what you did.
Re: Better Option Than List (Of...) For Disk Load Time
Ahh, I provided only half of the information, sorry. This is how I load:
Code:
' Deserialize the whole league object graph from the chosen file
Using fs As Stream = New FileStream(diaOpenFile.FileName, FileMode.Open)
    Dim bf As New Runtime.Serialization.Formatters.Binary.BinaryFormatter()
    m_League = CType(bf.Deserialize(fs), ClLeague)
End Using
Re: Better Option Than List (Of...) For Disk Load Time
First of all, you should avoid BinaryFormatter for multiple reasons. If you are interested in why, here is what Marc Gravell (author of many libraries used live on the Stack Overflow site) wrote: Why do I rag on BinaryFormatter?
If you want to test with JSON.NET (the Newtonsoft JSON library), you don't even need to use file streams. All you need to do is add Newtonsoft.Json from NuGet (right-click the project in Solution Explorer, then choose Manage NuGet Packages).
How to serialize and write to file:
VB.NET Code:
Imports Newtonsoft.Json
Imports System.IO
...
File.WriteAllText("c:\test\save1.json", JsonConvert.SerializeObject(m_League))
Read from file and deserialize the data:
VB.NET Code:
Imports Newtonsoft.Json
Imports System.IO
...
Dim json = File.ReadAllText("c:\test\save1.json")
Dim m_League = JsonConvert.DeserializeObject(Of YourDataType)(json)
You have to replace YourDataType with the type you actually use, e.g. List(Of List(Of Int16)), but I recommend creating a separate class with proper field names. m_League should be declared as YourDataType.
Re: Better Option Than List (Of...) For Disk Load Time
I looked up JsonConvert articles and tried your code using my m_League object.
m_League doesn't just hold the data from my original post; it holds all the information about the user's saved game and has multiple classes created by me with many levels. Only the top-level attributes of my m_League object were deserialized. For example, m_strLeagueName came through just fine, but objects of my own class CLTeam (which contains objects from CLPlayer, which contains objects from CLInjury, etc.) came up null.
Does JsonConvert deeply serialize and deserialize? Or is it meant for simple objects?
I also read the article imploring programmers to not use BinaryFormatter and was sufficiently convinced to try something else. Xml_Converter seems to have the same problem as I described above: no deep serialization.
I may just eliminate the List(Of...) for my stats and use conventional two-dimensional arrays, ReDim-ing them every new season.
Re: Better Option Than List (Of...) For Disk Load Time
Quote:
Originally Posted by neef
...
Does JsonConvert deeply serialize and deserialize? Or is it meant for simple objects?
JsonConvert can perform deep (recursive) serializing of objects.
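To illustrate, here is a minimal sketch of a nested object graph round-tripping through Json.NET. The three classes are hypothetical stand-ins, not the actual classes from the game. One possible reason nested objects come back as Nothing is that, by default, Json.NET only serializes public properties and public fields; data held only in private fields is skipped unless the member is marked with the JsonProperty attribute. Whether that is the cause here is a guess, since the real class definitions were not posted.

```vbnet
Imports System.Collections.Generic
Imports Newtonsoft.Json

' Hypothetical stand-ins for the league/team/player classes in this thread.
Public Class PlayerSketch
    Public Property Name As String
    Public Property PassAttempts As New List(Of Short)
End Class

Public Class TeamSketch
    Public Property City As String
    Public Property Players As New List(Of PlayerSketch)
End Class

Public Class LeagueSketch
    Public Property LeagueName As String
    Public Property Teams As New List(Of TeamSketch)
End Class

Public Module JsonRoundTripSketch
    ' Serializes the whole graph to a JSON string and rebuilds it again.
    ' Every level is written because all data is exposed as public properties.
    Public Function RoundTrip(league As LeagueSketch) As LeagueSketch
        Dim json As String = JsonConvert.SerializeObject(league)
        Return JsonConvert.DeserializeObject(Of LeagueSketch)(json)
    End Function
End Module
```

If the copy returned by RoundTrip has its teams and players filled in, the library is recursing as expected, and the place to look is how the real classes expose their data.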
Quote:
Originally Posted by neef
I also read the article imploring programmers to not use BinaryFormatter and was sufficiently convinced to try something else. Xml_Converter seems to have the same problem as I described above: no deep serialization.
You can read more about ProtoBuf (there is a library, protobuf-net, written by the same Marc Gravell from Stack Overflow), which makes serialized data much smaller than JSON.
Quote:
Originally Posted by neef
I may just eliminate the List(Of...) for my stats and use conventional two-dimensional arrays, ReDim-ing them every new season.
List(Of ...) is a good choice when you are scanning objects linearly. If you have only 3 items inside, there is no difference in performance compared to arrays. For quick lookups (not related to the topic of loading data from disk), Dictionary(Of TKey, TValue) is really good. It all depends on how you structured your data and your objects.
Still, I can say that saving and reading data need not be so slow. You should check your m_League object, which (as you wrote) is not what you described in the first post. List(Of SomeKindOfInteger) is about the simplest structure there is, and now you say your objects are completely different?
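A small sketch of the lookup case, assuming a hypothetical PlayerRecord class with an integer Id (neither is from the original code): the dictionary is built once, and each lookup is then a keyed access instead of a linear scan.

```vbnet
Imports System.Collections.Generic

' Hypothetical record type used only for this illustration.
Public Class PlayerRecord
    Public Property Id As Integer
    Public Property Name As String
End Class

Public Module PlayerLookupSketch
    ' Builds an index keyed by player Id; a later entry with the same Id replaces the earlier one.
    Public Function BuildIndex(players As IEnumerable(Of PlayerRecord)) As Dictionary(Of Integer, PlayerRecord)
        Dim index As New Dictionary(Of Integer, PlayerRecord)()
        For Each p As PlayerRecord In players
            index(p.Id) = p
        Next
        Return index
    End Function

    ' Returns the player's name, or Nothing when the Id is not in the index.
    Public Function FindName(index As Dictionary(Of Integer, PlayerRecord), id As Integer) As String
        Dim found As PlayerRecord = Nothing
        If index.TryGetValue(id, found) Then Return found.Name
        Return Nothing
    End Function
End Module
```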
Re: Better Option Than List (Of...) For Disk Load Time
The mini-project I worked on today was incorporating multiple seasons' worth of data (not just one, which was already in place) and setting up the backbone to make that work. I added the stats in the form of the List(Of...)s, which contain literally thousands of items of data, to a class called CLPlayer, which is in turn owned by CLTeam, which is then owned by CLeague (m_League is the object created from that class). That caused the file load time to increase. I didn't explain that level of detail in the first post. Sorry for the confusion.
Just to give you an idea of the actual numbers I'm working with (and the numbers involved in a real football season): I have 60 stat categories per player. Multiply that by 3 for each season type (pre, regular, and post), then by 53 players on each of 32 teams, and you get over 300,000 stats (60 x 3 x 53 x 32 = 305,280), and that's not counting free agents not on teams. The file size went from 48 MB to 75 MB after adding the List(Of...) career stats.
Thanks for your help. I'll check out deep serialization with Json and ProtoBuf, and I forgot all about Dictionary, which is a great option.
Re: Better Option Than List (Of...) For Disk Load Time
Serializing in-memory data, which (usually) is best designed for easy and fast access by the application, may not give you good file size.
I don't want to dig into details, but if you try SQLite, you may get a better data structure (if you model the data in the database correctly), get faster processing, and end up with something that can grow in the future into something bigger like MSSQL. SQLite is still a file on your computer, but you can do much more via SQL queries.
Re: Better Option Than List (Of...) For Disk Load Time
Yeah, I don't know a lot about Json, but it doesn't seem like the right tool for the job. A true relational database management system (like SQLite, SQL Server, MySQL, MS Access) is what I'd use. They make it very easy and fast to read and write data. Most of the time you only need a small amount of the data for any given process; reading and writing all the data is a waste of time.
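As a sketch of "only read what you need", the following assumes the Microsoft.Data.Sqlite NuGet package and an invented table layout (player_season_stats and its columns are hypothetical, as are the module and method names). It stores one row per player, year, and season type, and reads back a single value instead of loading the whole save file.

```vbnet
Imports System
Imports Microsoft.Data.Sqlite

' Hypothetical schema and helper names, for illustration only.
Public Module SqliteStatsSketch

    ' Creates the stats table if it does not exist yet.
    Public Sub CreateSchema(conn As SqliteConnection)
        Using cmd As SqliteCommand = conn.CreateCommand()
            cmd.CommandText =
                "CREATE TABLE IF NOT EXISTS player_season_stats (" &
                "player_id INTEGER NOT NULL, " &
                "season_year INTEGER NOT NULL, " &
                "season_type INTEGER NOT NULL, " &
                "pass_attempts INTEGER NOT NULL, " &
                "PRIMARY KEY (player_id, season_year, season_type))"
            cmd.ExecuteNonQuery()
        End Using
    End Sub

    ' Reads one stat for one player, year, and season type (0 = pre, 1 = regular, 2 = post).
    ' Returns 0 when no matching row exists.
    Public Function GetPassAttempts(conn As SqliteConnection, playerId As Integer,
                                    seasonYear As Integer, seasonType As Integer) As Integer
        Using cmd As SqliteCommand = conn.CreateCommand()
            cmd.CommandText =
                "SELECT pass_attempts FROM player_season_stats " &
                "WHERE player_id = @player AND season_year = @year AND season_type = @type"
            cmd.Parameters.AddWithValue("@player", playerId)
            cmd.Parameters.AddWithValue("@year", seasonYear)
            cmd.Parameters.AddWithValue("@type", seasonType)
            Dim result As Object = cmd.ExecuteScalar()
            If result Is Nothing OrElse result Is DBNull.Value Then Return 0
            Return Convert.ToInt32(result)
        End Using
    End Function

End Module
```

The caller would open a connection once (for example New SqliteConnection("Data Source=league.db") followed by Open()) and reuse it. The composite primary key lets the lookup use an index rather than scanning every row.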
Re: Better Option Than List (Of...) For Disk Load Time
Storing to a database should provide a noticeable improvement to file size (perhaps reduce it from 75 MB to about 40 MB or less) and therefore noticeably improve the speed of loading/saving - and the save speed will be improved even more if you keep DataTable(s) open while your program is running (as then only data that actually changed in some way needs to be saved). Depending on how much data tends to change each time, the save speed could be improved very dramatically.
Unfortunately setting up the database and code to save and load the data will take quite a bit of effort, so it isn't a 5 minute fix.
Re: Better Option Than List (Of...) For Disk Load Time
Quote:
Originally Posted by si_the_geek
Storing to a database should provide a noticeable improvement to file size (perhaps reduce it from 75 MB to about 40 MB or less) and therefore noticeably improve the speed of loading/saving - and the save speed will be improved even more if you keep DataTable(s) open while your program is running (as then only data that actually changed in some way needs to be saved). Depending on how much data tends to change each time, the save speed could be improved very dramatically.
Unfortunately setting up the database and code to save and load the data will take quite a bit of effort, so it isn't a 5 minute fix.
Having a look at Entity Framework and LINQ can help with this. If the structures aren't overly complicated, letting EF create the DB schema and handle the querying might be a simple approach.
Re: Better Option Than List (Of...) For Disk Load Time
Bad data(base) design won't help with size or performance. Everything started from a few lists of integers, and now it has grown into complex objects with a nested structure.
So I am really curious why so much data is serialized. Is it really necessary?
Usually data in memory is kept denormalized for speed. In relational databases it is kept mostly normalized (for several reasons). Similar rules are applied for objects that are serialized.