nosql database question


Sep 8th, 2014, 10:00 PM
codesearcher

nosql database question

I have worked with RDBMSs, so I know about table schemas, fields, data types, primary keys, foreign keys and relationships.

I have read that NoSQL is different from that. I have not tried any of the NoSQL models yet because, as I also read, NoSQL databases are classified into four types. So I would like to ask our experienced members if they can explain this better: what each class of NoSQL db is and when to use it.
Sep 9th, 2014, 04:57 AM
NeedSomeAnswers

Re: nosql database question

NoSQL databases like MongoDB or RavenDB, for instance, could also be considered object db's. In a document db like these you do not create tables or relationships; you store objects of some type.

The only one I have used so far is RavenDB (which is a document NoSQL db) and we used it in a web app to store Form objects. So in our app a user can complete an online form, they can save or submit it, and the form can then be accepted or rejected (well, that simplifies things a bit but you get the idea).

We needed to store a document which had fields with values recorded against them, and also a status (Saved | Submitted | Accepted | Rejected | Complete).

We used RavenDB to store these as document objects. When we loaded an object we also used it to build the page dynamically, since the object describes both the data and the fields. As our data had a relatively simple structure, a NoSQL db fit perfectly.
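
To give a rough idea of what such a form document might look like, here is a minimal sketch in plain Python. It is not RavenDB's API; the field names, the id and the render_form helper are all made up for illustration.

```python
import json

# Hypothetical form document: field definitions, recorded values and a status
# all live together in one object.
form_doc = {
    "id": "forms/1",
    "status": "Saved",  # one of Saved | Submitted | Accepted | Rejected | Complete
    "fields": [
        {"name": "full_name", "label": "Full name", "value": "Ann Example"},
        {"name": "email", "label": "Email", "value": ""},
    ],
}


def render_form(doc):
    """Build a plain-text view of the form straight from the document."""
    lines = [f"Status: {doc['status']}"]
    for field in doc["fields"]:
        lines.append(f"{field['label']}: {field['value'] or '(blank)'}")
    return "\n".join(lines)


# A document db stores something close to this serialized form.
stored = json.dumps(form_doc)
print(render_form(json.loads(stored)))
```

Because the document carries its own field list, the page can be built from whatever comes back from the store without a separate schema lookup.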

Document db's are fast for this kind of work because when you get your object you get everything you need: no joins to other tables and no subqueries, as everything is stored in your object. NoSQL databases are increasingly being used for web apps because of this.

As a NoSQL db is schema-less, if you need to start capturing some more data in your object at a later date, you just add it to your object and store your object in the db. There are NO database changes required, and your new objects do not have to be exactly the same as your old ones even if they are of the same type. This makes it easy to start capturing new data.
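
As a small illustration of the schema-less point, here is a sketch using a plain Python dict as a stand-in for a document collection. The store and phone_of helpers and the sample documents are assumptions for the example, not any real database API.

```python
# A toy in-memory "collection": document id -> document (a plain dict).
collection = {}


def store(doc_id, doc):
    collection[doc_id] = doc


# An older document, saved before we decided to capture a phone number.
store("forms/1", {"type": "Form", "status": "Submitted", "name": "Ann"})

# A newer document of the same type with an extra field. No schema change needed.
store("forms/2", {"type": "Form", "status": "Saved", "name": "Bob", "phone": "555-0100"})


def phone_of(doc_id):
    # Readers have to cope with older documents that lack the new field.
    return collection[doc_id].get("phone", "unknown")


print(phone_of("forms/1"))
print(phone_of("forms/2"))
```

The flip side is visible in phone_of: the database will not fill in or enforce the new field, so the code that reads documents has to handle both shapes.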

The other types of NoSQL db are:

Column - you store data in columns (but not tables), each of which consists of (Unique Name, Value, Timestamp)
Key Value - stores data in a dictionary data structure
Graph - uses graph structures with nodes, edges, and properties to represent and store data

I have not used any of these, as they need to fit the type of data you're collecting to be useful.
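
For a feel of how the three shapes differ, here is a rough sketch in plain Python. These are just in-memory structures with made-up keys and names, not the API of any particular product.

```python
import time

# Key-value: an opaque value looked up by its key.
kv_store = {"session:42": b"serialized-session-bytes"}

# Column: each row key maps to columns of name -> (value, timestamp).
now = time.time()
column_store = {
    "user:1": {"name": ("Ann", now), "city": ("Leeds", now)},
    "user:2": {"name": ("Bob", now)},  # rows need not share the same columns
}

# Graph: nodes with properties, plus edges that carry a relationship type.
nodes = {
    "ann": {"kind": "person"},
    "bob": {"kind": "person"},
    "acme": {"kind": "company"},
}
edges = [
    ("ann", "KNOWS", "bob"),
    ("ann", "WORKS_AT", "acme"),
    ("bob", "WORKS_AT", "acme"),
]


def neighbours(node, relation):
    """Follow outgoing edges of one relationship type from a node."""
    return [dst for src, rel, dst in edges if src == node and rel == relation]


print(neighbours("ann", "KNOWS"))
```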
Sep 9th, 2014, 08:03 AM
FunkyDexter

Re: nosql database question

I used MongoDB for a while and, like RavenDB, it's a document database. I used it because I wanted its schemaless capabilities. My experience was that it was great as long as storing documents is all you want to do. If you want any kind of relationality (don't know if that's a real word but it'll do) between the entities then you have to maintain it yourself, which can be hell from a consistency point of view. (It's worth mentioning, in this context, that a document isn't like a Word document. It's just a collection of arbitrary attributes and values.)
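
A minimal sketch of what "maintain it yourself" can mean, using plain Python dicts rather than MongoDB's API. The customers/orders collections and both helpers are hypothetical.

```python
customers = {"customers/1": {"name": "Ann"}}
orders = {
    "orders/1": {"customer_id": "customers/1", "total": 20},
    "orders/2": {"customer_id": "customers/1", "total": 35},
}


def dangling_orders():
    """Orders whose customer_id points at a customer that no longer exists."""
    return sorted(oid for oid, o in orders.items() if o["customer_id"] not in customers)


def delete_customer(customer_id):
    """Application-level cascade: the store will not do this for us."""
    for oid in [oid for oid, o in orders.items() if o["customer_id"] == customer_id]:
        del orders[oid]
    del customers[customer_id]


# Deleting only the customer leaves orphaned orders behind...
snapshot = dict(customers)
del customers["customers/1"]
print(dangling_orders())

# ...so we restore the customer and delete through the helper instead.
customers.update(snapshot)
delete_customer("customers/1")
print(dangling_orders())
```

In an RDBMS a foreign key constraint would either block the first delete or cascade it. Here every code path that touches customers has to remember to go through the helper.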

We're also making heavy use of Hadoop with a HAWQ front end where I am now. Strictly speaking Hadoop is a distributed file store and processing framework rather than a document database, and it has really good distributed processing abilities (I'll come to that in a second). HAWQ is a SQL front end for it (built on PostgreSQL). To give you some idea of the pain involved in maintaining consistency that I was talking about, it's worth considering the limitations of HAWQ. You cannot update a record and you cannot delete records based on criteria. All you can do is append records to a table, read records from a table or drop an entire table. It was impossible for the HAWQ developers to offer you the ability to update or delete while maintaining consistency.

So a good question would be: why would I want to use Hadoop and HAWQ then? The answer is speed of access. It's capable of serving me up billions of records in seconds, which a conventional single-server RDBMS would struggle to do. We do deep analysis on big data so this is our number one concern. HAWQ actually does this by hooking straight into Hadoop's file store (I won't pretend I understand this properly) but the truth is that it's really Hadoop that's achieving it, because a Hadoop data set is actually distributed across a whole bunch of servers and they all process their requests in parallel. That's the distributed bit.

And you can leverage that distributed nature directly as well. You create what are called Map-Reduce jobs. These are jobs which get distributed out to all the relevant nodes in a Hadoop cluster. This means the processing happens next to the data, and the results are passed back to the caller and collated (this is a gross oversimplification) into a single final result. This differs from the RDBMS distribution approach, which would be to pull all the relevant data back to a single caller and do all the processing there. I'm still only just getting my head around this stuff so don't quote it verbatim, but I can tell you that the outcome is powerfully fast data access.
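
The map-then-collate idea can be sketched in a few lines of plain Python. This is only a single-process imitation, not Hadoop's API: the partitions list stands in for data already spread across nodes, and the function names are made up.

```python
from collections import Counter
from functools import reduce

# Hypothetical data, already split across three "nodes".
partitions = [
    ["error", "ok", "ok"],
    ["ok", "error", "error"],
    ["ok"],
]


def map_partition(records):
    """Runs where the data lives: count the records held by one node."""
    return Counter(records)


def reduce_counts(left, right):
    """Merge two partial results into one."""
    return left + right


# In a real cluster each map call would run on a different machine in parallel;
# here they simply run one after another.
partials = [map_partition(p) for p in partitions]
totals = reduce(reduce_counts, partials, Counter())

print(dict(totals))
```

Only the small per-node counts travel back to the caller, not the raw records, which is where the benefit comes from when the partitions are large.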