Skip to main content

What exactly is NoSQL ?

Standard for NoSQL Classification
Reading and writing are the basic building blocks of a platform, whether it be a storage system or a service development system including a PC. Think about a football match. You must have two teams - and when one on the offense, the other must be on defense. Let's take a few examples.
The theoretical basis of all computers with built-in programs, in the John von Neumann structure, is a mathematical model called Turing machine. The model consists of the following components:
·      A tape that is infinitely long and has an infinite number of cells, each of which contains a symbol.
·      A header that can read a symbol from and write it to a cell, and can traverse the cells to the left or right.
·      A state transition table
(The current state, the symbol read in the current cell) ? (next state, the symbol to write in a cell, left/right)
Suppose a state transition table program is a service. Then what a platform does is reading and writing. Thus, read/write is the fundamental feature of any system of any size, from the micro architecture of a CPU to cloud services.

·      Intel core2 micro architecture: In its hardware configuration, everything except the pipelining, out-of-order execution, and ALU is the configuration for reading and writing operations.
·      MS Windows Azure: A storage that reads and writes is also the core part of a cloud system such as Azure.
A program repeats the process of reading-calculating-writing; it reads written data, processes it, and writes the intermediate result data. The same rule applies to Internet services. The reason for this lengthy explanation of the importance of reading and writing is to remind The Platform Magazine readers - who already know such basic knowledge - of the core features of a storage system such as NoSQL.
NoSQL System Standards
The following is a summary of standards that will be used to classify the NoSQL systems, which are the topic of our discussion.
Data model and query model
Although a Turing machine performs read/write in a single cell on a tape, on a practical system much more  diverse and complex read/write operations are required. I will define a storage model that is the target of read/write operations as a data model, and the command for such operations as a query model.
query model can be regarded as an API, but to emphasize that it is a reading/writing operation for a storage model, I will use the term query, which is often used in RDBMS. A data model and a query model are closely related to each other, like the two sides of a coin.  In a sense, the relationship between a storage model and a query is similar to that of an instance and a method that is defined in a class in an object-oriented language.
Data distribution
Sometimes, the processing power of a single computer is not sufficient to process a query requested by a service. This happens because the resources that a computer can use to process the query, such as the CPU, memory, disk, network capacity, and throughput, are limited. There is a solution to this problem, called thescale-up approach, but it is costly and often ineffective when the scale of data to be processed is huge, due to the nature of its architecture. For this reason, the scale-out approach, an alternative, is used to distribute the resources required to process queries between two or more computers.
The basic principle of distribution in a large-scale service is that the greater the separation within the required level of service, the more throughput for the query is required. It would be great if there was a method to scale-out to infinity; however, as it is not just realistic to assume that the scale-out approach method will always satisfy all the scale-out requirements, we will limit it with a condition, "within the required level of service."
Replication to increase availability
Data stored in a system may not be read or written when a failure occurs in the program or the resource action that are necessary to process queries. To ensure high availability of a service, resources for data and query processing must be distributed to and stored in two or more locations. This is what we call 'replication.'
REPLICATION MODELS: THERE ARE THREE MODELS IN REPLICATION, WHICH ARE DESCRIBED BELOW :
·      Transactional replication: A model that replicates at the transaction-level by using the Two Phase Commit (2PC) protocol (CUBRID RDBMS uses the transactional replication in its HA feature).
·      State machine replication: A model that replicates the events that occur from the source system to the target system by using atomic broadcast (transfers total ordering messages). In this model, the target system is considered a Finite State Machine.
·      Virtual Synchrony: A model that is often used in group communication and the like. It ensures that events that concurrently occur in a process group are delivered in the same order in which they occur. (This is for in-memory data).
Data consistency model for operations
RDBMS operates on a concept of transaction that can be summed up by the ACID (Atomicity, Consistency, Isolation, Durability) attribute. A transaction logically holds one or more queries; when such a transaction is applied to a DBMS, the system protects a certain attribute from being changed to maintain data consistency. The NoSQL system is no exception. We will explore how it protects data and data consistency from user operations.
RDBMS vs. NoSQL
Before studying the features of a variety of NoSQL systems, it would be useful to learn the background history of the NoSQL storage system.
NoSQL as antithesis
Here is a little mnemonic device. If NoSQL is No SQL, it must be an anti-RDBMS system. The logical flow explaining why NoSQL is the opposite of RDBMS is as follows:
·      RDBMS defines the SQL language by using a data manipulation method that can model relational data (relation and table), and supports transactions with the ACID attribute.
·      The scale-out approach which distributes data before processing it, is in high demand due to the high data-throughput demand of Internet-scale services.
·      The scale-out approach of existing RDBMSs is intended to maintain operational integrity of the relational model, which is the core of RDMBS, and transaction operations.
It is difficult to scale-out while maintaining integrity. The same problem arises when distributing or replicating data. So we make the ACID-based transaction attributes that are ensured by DBMS or the replication integrity model less strict before we use them to scale-out.
Scale-out of RDBMS
Scaling-out is difficult in RDBMS. If scaling-out to several thousand RDMBSs were easy, prominent database developers such as Oracle would have released a product or two for that a long time ago.
Now let us assume that the tables of RDBMS are distributed to several computers, and each piece of data is replicated before it is stored for high availability. First, executing distribution transaction while satisfying ACID is difficult in scale-out.
·      To satisfy the atomicity attribute of ACID, the distribution transaction protocol such as the 2PC protocolmust be used in all systems that are related to a specific transaction.
·      To match the isolation level among ACID attributes, data must be locked in general. The units of Lockingcan be a record, a table, or an index.
·      Therefore, to satisfy the Atomic and Isolated attributes in a distributed environment, all related locks must be applied to each system while the distribution transaction protocol is being processed; the higher the service load of the system, the heavier the lock competition becomes. This is what makes scaling-out difficult.
Another problem is that there is a limitation to scale-out by replicating and distributing data.
·      The transactional replication method using the 2PC method has a problem in which a transaction fails and becomes unavailable when one of the systems related to the replication process fails. In addition, the performance degrades when several systems are involved in the replication.
·      As an alternative, it is possible to pass the Write Ahead Logging (as known as WAL) data of a DBMS to the replication system and have it apply the data. If we consider the system in which replication occurs as a master (or primary), and the system to which the changes are applied as a slave (or backup), they are configured either as master-slave or multi-master.
·      When configuring master-slave: This is the most commonly used replication method. The speed of the process is in inverse proportion to the number of systems involved in replication in this method.
·      When configuring multi-master: It is difficult to solve the collision between data write processes or prevent it from happening when there are several masters. In The Danagers of Replication and a Solution  Jim Gray conducted a study on this issue (Received Turing Award in 1998 for his contribution to the related database and transaction processing) .
Sharding by developers
Generally speaking, it is extremely difficult to scale-out while satisfying the ACID attribute in the DBMS data model. For this reason, to scale-out based on DBMS, one must simplify the data model itself, partition the data by the number of N, and then execute the query within a separate piece of data.
The unit of partitioned data is called a 'shard'. Distribute and service N number of shards to M number of DBMSs. DBMS does not manage shards. This is the responsibility of the service developer.
The sharding method is focused on developers and has the following difficulties:
·      First, a shard must be defined.
·      Shard mapping:
The basic storage unit of a DBMS is the table. Because a table can contain one or more shards, there is a need to know which shard is mapped to which instance of a database. The locations of shard tables must be known to the application.
·      Shard distribution/redistribution:
As each shard is different, so are the throughput requirements and data size. As a result, a developer must add a new instance to or delete one from the database, and redistribute the shards manually. This is a painstaking and labor-intensive process.
The mapping information that has been modified by the distribution/redistribution process must be applied to the application.
Management, such as configuring for the replication, is necessary when modifying data.
Approach of NoSQL
The Internet-scale data storage system aims for the scale-out method through data distribution, and replicates data to ensure its high availability. We have already examined the fact that scaling-out is difficult to carry out under the process that supports a transaction of ACID attribute and the relation model of RDBMS. Now, we will look into the territory of NoSQL to see how it handles scaling-out.
Simplify data model for sharding
The data closure of a query operation will be the default unit of distribution, and the data model will be simplified so as to create an appropriate level of shard, which is a collection of a certain number of data closures. The default unit of distribution is key.
·      The simplest model reads and writes the whole data by recognizing it as an immutable object. Of the key-value storages, Dynamo and Membase are examples of this model, both of which support blob type data models.
·      Chunk, a component unit of a file in the Google file system, has a 64-bit identifier that is globally unique. Operations based on byte range (read/write/append) can be done with chunks. While the storage abstraction is provided to users as a file, the operation of this file is actually performed in chunk at the API level.
·      Some systems provide slightly more complex data models than that of the blob-type immutable object.
Key-Value DB: A database that provides a query containing information on what operations can be done on a specific structure, by modeling data in the Abstract Data Structure (as known as ADS) such as list and set. Redis and Tokyo Cabinet are most popular.
Document-oriented: A database that stores random and irregular objects by using the object representing notation, and provides queries based on the object of a property, such as CouchDB orMongoDB.
·      A system that supports the multi-dimensional map type data model in the row-column family-column or row-column-timestamp structure, such as Bigtable or Cassandra.
The default query supports reading and writing at this level of distribution. However, it may also support a separate query method, such as map-reduce in order to perform aggregation operations in multiple distribution units. In addition, document-oriented storage systems such as MongoDB allow users to apply an index other than document ID for a specific property of a document, thereby enabling users to specify additional quick access paths.
Distribution
·      Hash-based:
The key has no meaning in this method. It distributes data natural to throughput demand or data size.
To minimize mapping and redistribution problems resulting from the addition or deletion of nodes, it often distributes data based on consistent hashing.
·      Index-based:
It ensures a range query by maintaining the key-based order-preserving.
·      Metadata server-based:
A method that is used to manage the location information of a distribution unit from a separate metadata server.
Makes the ACID attribute of a query less strict.
For the consistency of query operation, it uses a less strict attribute than the ACID attribute of RDBMS. The relaxation phenomenon of query attribute is noticeable if you consider replications as well.  While there are systems that maintain the ACID attribute like CouchDB, in many cases, it makes Consistence or Isolated less strict for high availability.
We will now examine how some of the systems make attributes less strict in detail.
·      In GFS, multiple clients may write or execute record append operations to a file simultaneously. Although concurrent writings of multiple clients may show the same value for all replicas, the write of each client is undefined (in the dissertation term). The record append operation ensures consistency in the file area of each replica that is used at least once for atomicity.
·      Dynamo ensures view consistency for the replications and reads in the sloppy quorum method, as requested by a service that it to be 'always writeable.' The Quorum protocol determines the maximum number of systems to be used for R (read) , W (write), and each operation in N number of systems that are involved in the replication process, in order to prevent inconsistency of information from happening even when the network is partitioned (R + W > N, W > N/2). To cope with temporary system failure, Dynamo allows other systems to execute a key query that is not directly involved with them, by using the Hinted-handoff method (sloppy quorum). It also allows each system to manage data versions for possible data inconsistency and to determine the casual order of data modification among the systems that are involved in the replication by using the vector clock technique, so that each of them can automatically recognize which data is new (reconciled), and allows a client to determine such data when it is impossible to determine them, by transmitting versions of data to it.
·      Cassandra adjusts the level of consistency by allowing users to determine how many reads and writes must be successful in a node with N number of replication relations. It provides specifiable options when reading (Zero, Any, One Quorum, and All) and when writing (One, Quorum, and All). It also provides consistency (view consistency) when reading, by read-repairing through the timestamp-based ordering, similar to what Dynamo does.
We will now classify the NoSQL systems.
NoSQL Classification
The following is a classification of commonly used NoSQL storage systems:
System
Data Model
Distribution Unit
Distribution Method
Replication
GFS
GFS
chunk
Metadata Server
master-slave
Big table
multi dimensional map
row
index (B+ tree like)
Dependon GFS
Cassandra
multi dimensional map
row
hash/index
optional
MongoDB
Document (JSON)
document
index
master-slave
CouchDB
Document (BSON)
document
index
master-master
Dynamo
Blob
blob
hash
sloppy quorum
A method in which a master node that is unique in the whole system lends write authority to a chunk to one of the chunk servers (replicas) that are related to the replication is used. The replica that borrowed the authority from a master node is called primary. To reduce the burden of a master node, the primary continues to have the write authority as long as it is alive (piggybacking to a heartbeat message).
Basically, it executes consistent hashing. However, users can allow it to decide which Partitioner it will use through configuration  (random, order preserving, and collating order preserving).
Able to create a user-defined index
Index for ID and Sequence Number. Sequence Number increases per update
What Kind of Storage System Do I Need?
Now we will explore what needs to be considered when answering  a question like "What kind of storage system should I use for the service I am developing?"
RDBMS is still the standard
It must be considered before all else. RDBMS is even more important when read/write operations require ACID-type consistency. In most cases, implementing an RDBMS and additional storage systems will solve the problem of distributing the service load.
·      Read distribution through replication
·      The RDBMS scale-out technology in OLTP domain, such as MySQL Cluster, can also be considered.
·      RDBMS + Cache + Storage system
If the performance of RDBMS is not terribly good, you can use a caching system such as Arcus for time-consuming query items or frequently used items. Of course, you must carefully select a consistency model that is appropriate for the data that is being cached.
If a large data storage is required, you can also use a separate storage system like OwFS.
·      Finally, there is a sharding technique implemented by developers. That is, if you're willing to deal with replicating, troubleshooting, and restoring processes as well as the distributing and redistributing of shards.
NoSQL
Do you think it is difficult to implement a service on an RDBMS platform? Then use NoSQL. NoSQL is a system that solves the difficulties associated with replication, troubleshooting, restoration, and shard distribution/redistribution, regardless of the type of a service. It can be an ideal choice if the data model and consistency of the service are supported by the NoSQL system.
What exactly is NoSQL?
A NoSQL database provides a mechanism for storage and retrieval of data that uses looser consistency models than traditional relational databases. Motivations for this approach include simplicity of design, horizontal scaling and finer control over availability. NoSQL databases are often highly optimized key–value stores intended for simple retrieval and appending operations, with the goal being significant performance benefits in terms of latency and throughput. NoSQL databases are finding significant and growing industry use in big data and real-time web applications. NoSQL systems are also referred to as "Not only SQL" to emphasize that they do in fact allow SQL-like query languages to be used. wiki 

Why are developers considering NoSQL?
More flexible data model
The relational model takes data and separates it into many interrelated tables that contain rows and columns. Tables reference each other through foreign keys that are stored in columns as well.  When looking up data, the desired information needs to be collected from many tables (often hundreds in today’s enterprise applications) and combined before it can be provided to the application.


As you can see, there are three primary concerns you must balance when choosing a data management system: consistency, availability, and partition tolerance.
Consistency means that each client always has the same view of the data.
Availability means that all clients can always read and write.
Partition tolerance means that the system works well across physical network partitions.

Scalability and performance advantages
To deal with the increase in concurrent users (Big Users) and the amount of data (Big Data), applications and their underlying databases need to scale using one of two choices: scale up or scale out. Scaling up implies a centralized approach that relies on bigger and bigger servers. 

LIST OF NOSQL DATABASES

Wide Column Store / Column Families

·         Hadoop / HBase API: Java / any writer, Protocol: any write call, Query Method: MapReduce Java / any exec, Replication: HDFS Replication, Written in: Java,

·         Cassandra massively scalable, partitioned row store, masterless architecture, linear scale performance, no single points of failure, read/write support across multiple data centers & cloud availability zones. API / Query Method: CQL and Thrift, replication: peer-to-peer, written in: Java, Concurrency: tunable consistency, Misc: built-in data compression, MapReduce support, primary/secondary indexes, security features. 

·         Hypertable API: Thrift (Java, PHP, Perl, Python, Ruby, etc.), Protocol: Thrift, Query Method: HQL, native Thrift API, Replication: HDFS Replication, Concurrency: MVCC, Consistency Model: Fully consistent Misc: High performance C++ implementation of Google's Bigtable. 

Document Store
·         MongoDB API: BSON, Protocol: C, Query Method: dynamic object-based language & MapReduce, Replication: Master Slave & Auto-Sharding, Written in: C++,Concurrency: Update in Place. 

·         Elasticsearch API: REST and many languages, Protocol: REST, Query Method: via JSON, Replication + Sharding: automatic and configurable, written in: Java, 

·         Couchbase Server API: Memcached API+protocol (binary and ASCII) , most languages, Protocol: Memcached REST interface for cluster conf + management, Written in: C/C++ + Erlang (clustering), Replication: Peer to Peer, fully consistent, 

ADVANTAGES AND DISADVANTAGES
Let's look into the advantages and weaknesses of NoSQL by data model.
·      Key-Value (blob)
·      Simple and fast.
·      Often supports only atomic write/read at key level. In this case, there is no way to serialize the values of several keys as they are processed.
·      Get/put is very fast when the memory serves as the storage (Ideal when the size of data fits in the memory).
·      While the operation for a single Key is fast, an operation involving multiple keys may be slow as the network transmission is frequently delayed.  (There is a huge difference in speed when selecting data with 100,000 rows in an RDMBS table and when reading a key-value 100,000 times. The latter is significantly slower. This difference can be perceived when there are only 100, or even 10 items to be processed).
·      Key-Value (Structure)
·      A model that provides ADT such as List or Set; able to hold multiple values for a single key. Unlike the Key-BLOB model in which a single value is assigned to a single key, using this allows users to store multiple sets of data with a single key.
·      Processing takes a little longer than a model that performs a simple get/set  (However, the difference is barely noticeable. The performance differences inherent to a data model can be overcome by effective implementation of system.)
·      Document oriented
·      A data model that can add a random property without a schema.
·      As the name suggests, it is an optimal structure for storing document data such as JSon or XML.
·      It often maintains the order by document id or the value of a property.  (This enables efficient operation for the range of the corresponding key value, and  thus provides queries.)
·      The process overhead of this model may be larger than that of a key-value model or a key-structure model, as it must parse the data and compute it in the memory when processing a query.  (Performance will degrade when handling a large document.)
·      Multi-dimensional map
·      For Bigtable, data is mapped by row-column-timestamp. The data is binary.
·      For Cassandra, data is mapped in the form of row-column families-column. The data itself is binary.
·      Both models allow data to be modeled by structuralizing the grouping and access methods for data (column). The detailed process of modeling is beyond the scope of this document. For more information on modeling, refer to the Cassandra homepage and.
In Conclusion
If it is the mission of a scholar to find the unmovable truth from the uncertain, it is the mission of a developer to find a way to change what seems to be unchangeable.
For Not only SQL




Comments

Popular posts from this blog

Php Interview Questions

1. What Is PHP ? 2. How can I disable the output of error messages inside the HTML page? 3. Can I return other file formats (like Word, Excel, etc) using PHP? 4. Is there any way to force PHP to do garbage collection before the end of the request? 5. Why does require($file_name) in a loop just include the first file repeatedly? 6. Can you include and call C libraries in PHP scripts? How? 7. What's the best way to start writing a PHP program? 8. Passing variables with REQUIRE function (part II) 9. I use a /cgi-bin/ad.pl for displaying rotating banners at the top of my html files. How can I insert the output of this cgi in an "included" file ? 10. please tell me how to let a html document read content from a .txt file, spit this out in a table, and how to update the specific file with a form 11. How can I add authentication to my site with PHP? I have authentication working only with one page. 12. Where can I get documentation for the Zend API? 13....

5 Significant Reasons For Having A Good Website For Your Business

In this era, where everything is now available online, and people are becoming more and more Internet dependant, any company needs to have a well-built website. People tend to check any company online before getting any service from them. A well-built website will help to make you a good impression on your consumers, and they will gain reliability on you. A well-maintained site is the essence of a good company, and without a website, you will fail to make that vital first impression on your customers. It would be best if you also used   website promotion   to promote your website. The certain benefits of having a website are mentioned below. Accessible: A website makes your business available to people 24*7. People can search your website anytime they want and visiting your website will provide them with relevant information that they are looking for. If you hire a  responsive   web design company ,  then you can be sure of your website to make an excell...

Laravel CRUD With MongoDB

Laravel is the most popular framework of php. laravel better than another PHP framework because it handles the command base. so let us see about laravel 8 MongoDB CRUD tutorial example. it was released on Sept 03, 2019. Now, we follow the below step for creating the laravel 8 MongoDB CRUD operation(Laravel 8 CRUD example). Overview Step 1: Install Laravel 8 Step 2: Configure MongoDB database Step 3: Install laravel-mongodb Package Step 4: Add Route Step 5: Create Model and Controller Step 6: Create Blade Files Step 1: Install Laravel 8 We are going to install laravel 8, so first open the command prompt or terminal and go to xampp htdocs folder directory using the command prompt. after then run the below command. 1 composer create-project --prefer-dist laravel/laravel laravel8_crud_mongodb Step 2: Configure MongoDB database After the complete installation of laravel. we have to database configuration. now we will open the .env file and add the MONGO_DB_HOST, MONGO_DB_PORT, MONGO_DB_DATA...