[RESOLVED] Multiple backing-source data

**Merrion** · Feb 19th, 2014, 02:24 PM

So I'm thinking of having a mixture of database and flat file data in my latest application - any pointers to how this can be done while still retaining the unit of work happiness of something like Entity Framework?

**Niya** · Feb 20th, 2014, 02:23 AM

I usually do this when I have settings or other application-related data to store. All the customer and product related stuff goes in the database, and all the app-related stuff can go into an XML file or My.Settings. Is this the type of thing you're talking about?

**Merrion** · Feb 20th, 2014, 02:44 AM

Yes - pretty much. One thing we have is massive numbers of stock prices, which at the moment are held in one truly huge table. The idea is that they instead get written to one blob per stock / date, and an "end of day" price is the only thing written to the database. Then, if you only want an end of day price I go to the database, but if you want intra-day prices I find the right file and stream it. This is to massively increase the parallelism of the system.
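A minimal sketch of the split read path being described, assuming a hypothetical `eod_prices` table (SQLite stands in for the real database) and an ISO-formatted date in the file name. `intraday_path`, `end_of_day_price` and `open_intraday` are illustrative names, not an existing API:

```python
import sqlite3
from datetime import date
from pathlib import Path
from typing import BinaryIO, Optional


def intraday_path(root: Path, stock_id: int, day: date) -> Path:
    # "[stock id].[date].dat" naming; the ISO date format is an assumption.
    return root / f"{stock_id}.{day.isoformat()}.dat"


def end_of_day_price(conn: sqlite3.Connection, stock_id: int, day: date) -> Optional[float]:
    # Frequent, cheap lookup: a single row from the database.
    row = conn.execute(
        "SELECT close FROM eod_prices WHERE stock_id = ? AND day = ?",
        (stock_id, day.isoformat()),
    ).fetchone()
    return None if row is None else row[0]


def open_intraday(root: Path, stock_id: int, day: date) -> BinaryIO:
    # Occasional, expensive read: hand back a stream over that day's file
    # so the caller can read it incrementally.
    return intraday_path(root, stock_id, day).open("rb")
```

Note that nothing here ties the two stores together; keeping them consistent is left entirely to the code that writes them.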

**Niya** · Feb 20th, 2014, 02:54 AM

Hmmm... I dunno. To me the whole thing can stay in the database. The data is all related, and separating it defeats the purpose of having a database in the first place. Assuming you're using SQL Server, Access or Oracle, you have to realize that these are called relational databases. They're meant to store data that is related. The end of day prices and intra-day prices are related to the stock. These pieces of data belong together. Why are you considering separating the data?

**FunkyDexter** · Feb 20th, 2014, 04:59 AM

> These pieces of data belong together

I agree! If you really want to separate the data you can do so by having the intra day and end of day prices in separate tables but I see no good argument for separating one off into files. You're certainly not going to find it quicker to retrieve a price from a file than a table, quite the opposite in fact. And the problems you're going to introduce around transaction management (I assume that's what you mean by "Unit of Work") are going to be horrendous.
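To make the transaction-management point concrete, here is a hedged sketch of what a hand-rolled "unit of work" over one file plus one database row might look like. The table, file layout and the `save_day` helper are all hypothetical, and SQLite stands in for the real database. Even with a temp file and an atomic rename, there is still a window where the two stores disagree:

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def save_day(conn: sqlite3.Connection, root: Path, stock_id: int, day: str,
             close: float, intraday: bytes) -> Path:
    """Hypothetical helper: write the intra-day file and the end-of-day row
    so that, in most failure cases, either both appear or neither does."""
    final = root / f"{stock_id}.{day}.dat"
    # 1. Write the file under a temporary name in the same directory.
    fd, tmp_name = tempfile.mkstemp(dir=str(root), suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(intraday)
            f.flush()
            os.fsync(f.fileno())
        # 2. Insert the row; `with conn` commits on success, rolls back on error.
        with conn:
            conn.execute(
                "INSERT INTO eod_prices (stock_id, day, close) VALUES (?, ?, ?)",
                (stock_id, day, close),
            )
        # 3. Publish the file only after the commit succeeded. If the process
        # dies between the commit and this rename, the database row exists
        # but the final file does not: exactly the gap a real transaction
        # manager would close for you.
        os.replace(tmp_name, final)
    except BaseException:
        # Undo the file half of the "transaction" and re-raise.
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise
    return final
```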

Is the goal to give dedicated hardware resources to an occasional expensive operation (reading intra-day prices) so it doesn't impact a frequent cheap operation (reading an end of day price)? If so, you can always configure your database to put the intra-day table in a file on one disk and the end of day table in a file on another disk. Or, if you wanted to keep a unified view of all prices while still splitting the workload, you could use horizontal partitioning.
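The "unified view over split storage" idea can be sketched as two member tables behind a `UNION ALL` view. This is only the shape of it: in SQL Server each member table could sit on its own filegroup (and so its own disk), and partitioned views can use the `CHECK` constraints to skip a member table. SQLite is used here purely so the sketch runs stand-alone, and all table names are hypothetical:

```python
import sqlite3

SCHEMA = """
-- One member table per workload.
CREATE TABLE eod_prices (
    stock_id INTEGER NOT NULL,
    ts       TEXT    NOT NULL,
    price    REAL    NOT NULL,
    kind     TEXT    NOT NULL CHECK (kind = 'eod')
);
CREATE TABLE intraday_prices (
    stock_id INTEGER NOT NULL,
    ts       TEXT    NOT NULL,
    price    REAL    NOT NULL,
    kind     TEXT    NOT NULL CHECK (kind = 'intraday')
);
-- Readers who want "all prices" query the view and never see the split.
CREATE VIEW all_prices AS
    SELECT stock_id, ts, price, kind FROM eod_prices
    UNION ALL
    SELECT stock_id, ts, price, kind FROM intraday_prices;
"""


def make_partitioned_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn
```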

**Merrion** · Feb 20th, 2014, 05:12 AM

The problem with a "Prices" table is that in fact the things in it are largely uncorrelated - for example, the price of wheat and the price of General Motors at 12:15 would be consecutive records, but they are in no way the same.

Fortunately, the thing about prices is that the records only go on the end of the table - once a price is recorded it cannot be changed, so any intra-day prices prior to today are read-only. So once I have moved these into their own files, they can be distributed over the content delivery network so the file is available near the person who wants it. Each file name will be [stock id].[date].dat, and because I know the price frequency I don't need to store the time in the file - it is effectively an array index.
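The "time is an array index" idea can be sketched like this, assuming (hypothetically) a fixed interval in seconds from a known market open, and one little-endian 64-bit float per slot. The record layout and helper names are illustrative, not from the thread:

```python
import struct
from datetime import datetime, time
from typing import BinaryIO, List

# Assumption: each slot is one little-endian float64 price.
RECORD = struct.Struct("<d")


def slot_index(at: datetime, market_open: time, interval_s: int) -> int:
    # With a fixed sampling interval, the timestamp maps to a slot number,
    # so no time needs to be stored in the file.
    opened = datetime.combine(at.date(), market_open)
    seconds = (at - opened).total_seconds()
    if seconds < 0:
        raise ValueError("timestamp is before market open")
    return int(seconds // interval_s)


def encode_day(prices: List[float]) -> bytes:
    # Slot i lives at byte offset i * RECORD.size.
    return b"".join(RECORD.pack(p) for p in prices)


def read_price(f: BinaryIO, index: int) -> float:
    # Seek straight to the slot instead of scanning the file.
    f.seek(index * RECORD.size)
    data = f.read(RECORD.size)
    if len(data) < RECORD.size:
        raise IndexError(f"no price recorded for slot {index}")
    return RECORD.unpack(data)[0]
```

For example, with a 09:30 open and a 300-second interval, a 09:45 price is slot 3, at byte offset 24.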

**FunkyDexter** · Feb 20th, 2014, 05:51 AM

> The problem with a "Prices" table is that in fact the things in it are largely uncorrelated

An index on Product and Time would sort that out for you. That's an important point: records in a database aren't "consecutive", because they're not ordered except within the context of an index. You can have as many indexes as you like, so "consecutive" becomes a concept you have complete control over.
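A small demonstration of that, using SQLite as a stand-in and hypothetical table and index names: wheat and GM rows are inserted interleaved, yet an index on (stock, time) turns "one stock's prices in time order" into a range scan, and the query plan reports that index being used:

```python
import sqlite3

SCHEMA = """
CREATE TABLE prices (
    stock_id INTEGER NOT NULL,
    ts       TEXT    NOT NULL,
    price    REAL    NOT NULL
);
-- Physical insertion order no longer matters for this access pattern.
CREATE INDEX idx_prices_stock_time ON prices (stock_id, ts);
"""

QUERY = ("SELECT ts, price FROM prices "
         "WHERE stock_id = ? AND ts >= ? AND ts < ? ORDER BY ts")


def prices_between(conn: sqlite3.Connection, stock_id: int, start: str, end: str):
    return conn.execute(QUERY, (stock_id, start, end)).fetchall()


def query_plan(conn: sqlite3.Connection) -> str:
    # The last column of each EXPLAIN QUERY PLAN row is the readable detail.
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (0, "", "")).fetchall()
    return " | ".join(str(r[-1]) for r in rows)
```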

If you want to get data near to the person who wants it, use local databases replicated from a central source. In a situation like yours, where the data is read-only, you'd use a simple publisher-subscriber model, which is actually very easy to set up. And you could presumably set the refresh interval to a day at a time, so the network traffic would be absolutely minimal. Your "Unit of Work" concern is also neatly addressed in a replicated environment, because the system will simply use a two-phase commit where appropriate.
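This is not SQL Server replication. It is only a hand-written sketch of why append-only, read-only data makes publisher-subscriber copying cheap: the subscriber just pulls rows above its own high-water mark. The schema and the `sync` function are hypothetical, with SQLite standing in for both databases:

```python
import sqlite3

SCHEMA = ("CREATE TABLE IF NOT EXISTS prices ("
          "id INTEGER PRIMARY KEY, stock_id INTEGER NOT NULL, "
          "ts TEXT NOT NULL, price REAL NOT NULL)")


def sync(central: sqlite3.Connection, local: sqlite3.Connection) -> int:
    """Copy rows the local copy hasn't seen yet; returns how many were copied.
    This only works because published rows are never updated or deleted."""
    local.execute(SCHEMA)
    (mark,) = local.execute("SELECT COALESCE(MAX(id), 0) FROM prices").fetchone()
    rows = central.execute(
        "SELECT id, stock_id, ts, price FROM prices WHERE id > ? ORDER BY id",
        (mark,),
    ).fetchall()
    with local:  # apply the whole batch atomically on the subscriber
        local.executemany(
            "INSERT INTO prices (id, stock_id, ts, price) VALUES (?, ?, ?, ?)", rows)
    return len(rows)
```

Run once a day, each call transfers only that day's new rows.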

About the only good argument I've ever come across for splitting data across files and a DB was where a company needed razor-fast writes into the system, because multiple clients would all flush their data in at around the same time of day. Writes were actually made into a text file (which is much quicker than a DB insert), effectively allowing the DB insert to be deferred to a quieter time. For finding and reading data, as you seem to be doing, having that data in a DB rather than a file is going to be orders of magnitude quicker - particularly for the "finding" part.
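A hedged sketch of that write-now, load-later pattern, assuming a hypothetical CSV log and `prices` table, with SQLite standing in for the real database. The busy-time path is a plain file append. A quiet-time job moves the log aside and bulk-inserts it in one transaction:

```python
import csv
import sqlite3
from pathlib import Path


def append_prices(log: Path, rows) -> None:
    # Busy-time path: a plain append, no index maintenance or row locking.
    with log.open("a", newline="") as f:
        csv.writer(f).writerows(rows)


def load_deferred(conn: sqlite3.Connection, log: Path) -> int:
    # Quiet-time path: move the log aside, then load it in one transaction.
    if not log.exists():
        return 0
    batch = log.with_name(log.name + ".loading")
    log.replace(batch)  # appends made after this point go to a fresh log file
    with batch.open(newline="") as f:
        rows = [(int(s), ts, float(p)) for s, ts, p in csv.reader(f)]
    with conn:
        conn.executemany(
            "INSERT INTO prices (stock_id, ts, price) VALUES (?, ?, ?)", rows)
    # A crash between the commit above and this delete would reload the batch
    # on the next run; a real system would record which batches were loaded.
    batch.unlink()
    return len(rows)
```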

**Merrion** · Feb 20th, 2014, 06:05 AM

Aha - lightbulb moment.
If I partition the table by stock type I effectively have one file per stock type anyway but can still use SQL and EF. Hurrah - much easier...thanks all.
