Wednesday, September 07, 2011

Why is NoSQL in fashion?

A couple of years ago NoSQL was a relatively unheard-of term. Now it has become the new buzzword in town, catching people's imagination. Almost all the big daddies of the Internet (Google, Facebook, Yahoo, Twitter, Amazon, LinkedIn etc.) use NoSQL at massive scale, and many of them have built their own NoSQL products. Every month a new NoSQL product is launched, catering to some specific use case or requirement. So what are the reasons behind NoSQL's rise to popularity?

Background

RDBMS and SQL have ruled the world for the last three decades. They were (and are) the de facto standard for storing business data. In fact, there was practically no alternative to the RDBMS and SQL combination. When RDBMSs were developed, disk space was scarce, so a lot of thought went into saving precious disk space. Advances in hardware have since made practically unlimited storage available at low prices. The evolution of the Internet over the last decade has paved the way for altogether different kinds of applications and use cases that were never thought of earlier. Web 2.0 applications (say, social networking) have entirely different requirements from typical enterprise applications. Beyond that, the Internet has changed the way application developers think about data and its usage and storage.

Requirement

Nowadays an enormous amount of data is produced every day, and the rate at which it is produced keeps increasing. The data is so large that organizations spread it across data centers, because it does not fit in a single rack or even a single data center. Data availability is an equally important reason for using multiple data centers. Applications have to be available 24x7, no matter what. Data has to survive the failure of a whole data center, never mind the failure of a disk. And instead of archiving data, Facebook-type applications keep it online forever.

Social networking sites roll out new features quite frequently, and sometimes roll them back. Often these features could not have been anticipated when the application was first designed. These sites don't want to be confined by the schema constraints of an RDBMS, where the schema is quite difficult (though not impossible) to change. Support for an evolving schema has become a necessity, and there NoSQL solutions are a better fit than an RDBMS.
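
To make the evolving-schema point concrete, here is a minimal sketch in plain Python. It does not use any real NoSQL product: the list `profiles` merely stands in for a schema-less collection, and the field name `relationship_status` and both helper functions are made up for illustration.

```python
# A plain list of dicts stands in for a schema-less document collection
# (an assumption for illustration; no real NoSQL client is involved).
profiles = []

# Documents written before the new feature existed.
profiles.append({"user": "alice", "email": "alice@example.com"})

# After the feature launch, new documents simply carry an extra field.
# No ALTER TABLE and no migration of the older documents is needed.
profiles.append({
    "user": "bob",
    "email": "bob@example.com",
    "relationship_status": "single",  # hypothetical new field
})

def relationship_status(doc):
    """Read the new field, tolerating older documents that lack it."""
    return doc.get("relationship_status", "not specified")

def roll_back_feature(docs, field):
    """Rolling the feature back just means dropping the field where present."""
    for doc in docs:
        doc.pop(field, None)
```

Old and new documents live side by side, and the reading code decides how to treat a missing field. In an RDBMS the same change would typically mean altering the table definition before any row could carry the new column.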

Data size, flexible schema and availability are the three key areas where a traditional RDBMS does not perform well beyond a certain limit.

Thought Process

Web 2.0 applications have changed how people think about application architecture. It is an altogether different thought process than it was a few years back.

While designing an enterprise application we used to think about "how to store the data", but today the primary thought is "how will we use the data". This is a paradigm shift. Data storage is designed around how the data is going to be used, which gives optimized performance for a given requirement, at times at the cost of flexibility.

"Referencing" entities was a wonderful feature in RDBMS, but for large-scale Web 2.0 applications it does not matter at all. For them "embedding", not referencing, is the killer feature. Storing related data (say, personal details and address) together makes sense when the pieces have no meaning independently. This is similar to denormalization in RDBMS.

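The difference between referencing and embedding can be sketched with plain Python dictionaries. This is only an illustration under stated assumptions: the dicts stand in for tables and for a document store, and all names and sample values (`persons`, `addresses`, `person_docs`, the two loader functions) are hypothetical.

```python
# Referencing (RDBMS style): person and address live in separate "tables"
# and are linked by a key. The dicts and their contents are illustrative.
persons = {1: {"name": "Asha", "address_id": 10}}
addresses = {10: {"city": "Pune", "zip": "411001"}}

def load_person_referenced(person_id):
    """Two lookups are needed: the person row, then the referenced address."""
    person = dict(persons[person_id])          # copy so the "table" is untouched
    address_id = person.pop("address_id")
    person["address"] = dict(addresses[address_id])  # the "join"
    return person

# Embedding (document style): the address is stored inside the person
# document, so a single read returns everything that is used together.
person_docs = {1: {"name": "Asha", "address": {"city": "Pune", "zip": "411001"}}}

def load_person_embedded(person_id):
    """One lookup returns the whole self-contained document."""
    return person_docs[person_id]
```

Both loaders return the same shape of data. The embedded form gets there in one read, at the cost of duplicating the address if several people were ever to share it.
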
For many features of Web 2.0 applications, ACID transactions are also not a primary requirement. Say a tweet does not get published: many users won't mind typing it again and hitting the Enter button, as long as that does not happen frequently. If avoiding ACID transactions increases the throughput many fold for not-so-critical features, what else do you want :-). It is similar to old MySQL (which, with its MyISAM tables, offered no transactions), though with a much larger data set than MySQL can support.

Conclusion

There are several use cases where NoSQL performs much better, and there it makes perfect sense to use NoSQL instead of trying to scale an RDBMS by expensive methods. RDBMSs are really good at what they do, which is storing flat, relational, tabular data in a consistent manner and getting wonderful reports out of that data. They remain the best solution for storing relational data. NoSQL databases, on the other hand, are good at performance, availability, schema-less persistence etc., which have become basic requirements for today's applications. Support for these features is the main reason for NoSQL's meteoric rise to fame.