Analytics on MongoDB is a completely different beast from typical business intelligence setups that you’re familiar with. Don’t expect to just fire up your existing BI tool, click at MongoDB, and let it run. It’s far more complicated than that. This article explains why MongoDB has essentially a different approach, which is necessary to understand before exploring your options.
Firstly, a general overview of a relational and non-relational database!
MongoDB’s approach is focused on integrating the important capabilities of relational databases with the innovations of non-relational databases like noSQL . Their key concept is to combine capabilities of relational database tools like Oracle, MySQL and others, with requirements of modern applications.
A relational database serves the purpose of structuring data, such as transactions, in tabular form, and offers features that remain critical today for businesses.
So what are key differences between relational vs non-relational databases?
Relational databases offer features that allow usto query, analyze, dissect and display structured charts of data found in these SQL databases. Relational databases enable you to:
- Utilise Structured Query Language (SQL) & Database Indexes - Users are able to access and maneuver their data in mature ways to assist both operational and analytical applications. The concept of a relational database enables you to connect information from different tables through the use of database indexes, which is essential for applications that are heavy into data analysis.
- In a relational database, data is stored in the form of tables and rows. Each row contains information about specific entries and each column contains a very specific type of data.
- A well-designed schema decreases data repetition and prevents tables from becoming messy. This is an important feature for many businesses, especially those who store large amounts of financial data.
However, with MongoDB, the value of a NoSQL database is mainly its simplified design, horizontal scaling and control capabilities over the availability of data.
- Adaptable Data Mode - A NoSQL system supports any form of data structure. Either document, graph, key-value or wide-column, all of them offer a flexible data model, which makes it easy to store and incorporate data of any structure and allow vital adjustments of the schema without impacting performance.
- MongoDB data is stored in various hosts of databases. MongoDB data is stored in a collection of JSON documents. The Mongo import feature can import JSON, CSV and TSV file formats.
- Scalability and performance - This important feature is an important differentiator between the two types of databases. With a NoSQL database; it has been built to scale, they all include sharding - a method for distributing data across multiple datasets, and partitioning - breaking down data into chunks. This allows the database to scale, having theoretically unlimited growth with the maximum rate of production and lower inactivity than a relational database.
- Speed - As MongoDB is a document-oriented database,it is easy to access your documents by indexing. Hence, this provides a faster query response. Some research has shown that the speed of MongoDB could be 100x faster than a relational database.
- Ad-hoc querying - MongoDB is very flexible and has advanced ad hoc query features.
- Easy setup - MongoDB is said to be easier to set-up than a relational database management system (RDBMS).
So what differentiates relational vs non-relational databases for analytic purposes?
MongoDB wasn’t initially created to support advanced analytics. But as time has progressed and the requests for real-time data from the database grows, this has driven MongoDB to build more analytic capabilities directly in the NoSQL database.
However, when enforcing Operational BI solutions, many users usually think of duplicating the operational data to an operational data store (ODS), either a data warehouse or data mart and analysing it there. This will immediately cause problems of how to update the information stored, fast enough to support the requirements of the users. With true real-time data streaming approaches, traditional batch extract, transform and load (ETL) tools will not be suitable. Furthermore, the ability for typical data warehouses to provide real-time updates and modifications of existing data is questionable.
Nevertheless, there are always advantages and disadvantages. Avoiding the need to store and maintain an alternative copy of a large body of data is generally good. If the analysis does not require joining data from another source, using the original source can also be helpful.
There will always be questions about performance impacts on the operational sources, and often security concerns as well. However, when comparing the types of queries possible for a NoSQL store or a document-oriented database, it is commonly acknowledged that normalizing data in a relational database provides a more query-friendly structure, which allows a wider variety of queries to be controlled. In the case of using an operational BI tool, queries are usually much simpler.
MongoDB supports several analytic capabilities.
It supports Apache Spark, the popular data science framework that is familiar with data scientists, engineers and data analyst. MongoDB is currently working on improving large-scale analytics features that will help users to perform analyses within the platform and converting data to charts, as well as a parallelized query execution engine and column-store format, which will speed up analytics by storing the data in a more efficiently.
MongoDB also offers a SQL-based BI Connector under their enterprise plans, that lets users explore their MongoDB data through different business intelligence tools like Holistics, Looker, Microsoft Power BI and more.
Common issues with MongoDB
Every company has a different set of data requirements and environments. An effective data infrastructure is built to meet business challenges, and to ease the process of data reporting. Depending on your organization’s needs, MongoDB may or may not be the best solution over a relational database.
Here are few things to consider when you are weighing your options:
- Joins are not supported - MongoDB does not support joins like a relational database, though workarounds exist, such as using Java to code joins manually. However this is a lengthier process and it may slow down your performance in terms of execution.
- Different skill-sets and tools required - There are few question that you need to answer when you are considering using MongoDB. How invested are you in your current relational database? Is your team familiar with SQL? If you are a smaller company with less resources to invest, MongoDB is one of the easier NoSQL databases to set up and manage.
- Data location - When working with MongoDB, you’ll need to be aware of a few things. Firstly, where is your data coming from, or will you be pulling data from certain locations? MongoDB’s powerful query application supports this system well.
- Data Size - Your BSON data or documentation size is limited to 16MB.
- Memory usage - MongoDB has high memory usage, it stores key names for each value pairs. Due to no functionality of joins, there is data repetition. This will result in unnecessary usage of memory.
Yes, these points above may demotivate you to consider moving into a MongoDB database. Hence, I have covered both the positive and negative points of the system. We have to understand the overall usability of the system.
The ability to perform joins is a key difference between the two systems. Not being able to do joins plays well for large volumes of data. If you have to fragment your data between numerous servers, joins may be quite expensive. MongoDB, like other NoSQL tools, plays well in the Web space with organisations requiring fact processing of large volumes of data with emergent processing needs. However, there comes many challenges with it as well. The use of XML and JSON are complicated and does not fit well into a relational system as there are several different ways of joining document collections, mainly when there are embedded lists and objects. There is nothing that compares with the ease of SQL, for example, to request a search through a collection. This would be a hectic and lengthy process using NoSQL; to look through a bundle of objects within each document, to see if you get a match, and to return the items that matches the list.
What do you think?
Do share with us your thoughts and comments on MongoDB for analytics in the comments section below.