MongoDB

MongoDB Introduction

NoSQL databases are non-relational databases used to store and retrieve data. MongoDB is a popular NoSQL database that stores data as documents.

| Feature | SQL (Relational Databases) | NoSQL (Non-Relational Databases) |
|---|---|---|
| Schema | Fixed schema | Dynamic schema for unstructured data |
| Scalability | Vertical scalability (scale up by adding a more powerful CPU, RAM, SSD) | Horizontal scalability (scale out by adding more servers) |
| Complexity | Tables with rows and columns, complex queries with JOINs | Document, key-value, wide-column, or graph formats, simpler queries |
| Transactions | ACID properties (Atomicity, Consistency, Isolation, Durability) for reliable transactions | BASE properties (Basically Available, Soft state, Eventual consistency), less strict than ACID |
| Development Model | Mature, with established standards | More flexible and evolving rapidly |
| Use Cases | Well suited to complex queries and transactions, e.g., banking systems | Well suited to hierarchical data storage, big data solutions, and real-time web applications |
| Data Integrity | High, due to ACID compliance | Variable, depending on the specific NoSQL system and its configuration |
| Query Language | Structured Query Language (SQL) is standardized | No standard; queries depend on the specific NoSQL database system (e.g., MongoDB queries are JSON-like documents, and BSON is its binary storage format) |
| Relationship Handling | Efficient handling of relationships between entities | Relationships can be handled, but often less efficiently than in SQL databases; denormalization is commonly practiced |

Installation

There are two ways to work with a MongoDB database: run it locally on your machine, or use a cloud service such as MongoDB Atlas. This tutorial uses MongoDB with mongosh and Compass installed. ...
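To make the comparison's point about denormalization concrete, here is a minimal sketch in plain Python (no database driver) that reshapes two relational-style tables into one nested document of the kind a document store holds. The `users` and `orders` data and the `to_document` helper are hypothetical names made up for illustration; they do not come from the post.

```python
import json

# Relational shape: two flat "tables" linked by a foreign key (hypothetical data).
users = [{"id": 1, "name": "Asha"}]
orders = [
    {"id": 10, "user_id": 1, "item": "keyboard"},
    {"id": 11, "user_id": 1, "item": "mouse"},
]


def to_document(user, all_orders):
    """Denormalize: embed a user's orders inside a single nested document."""
    return {
        "_id": user["id"],
        "name": user["name"],
        "orders": [
            {"item": o["item"]} for o in all_orders if o["user_id"] == user["id"]
        ],
    }


doc = to_document(users[0], orders)
print(json.dumps(doc, indent=2))
```

Reading the user together with the orders now takes one lookup instead of a JOIN, at the cost of duplicating data if an order must also appear elsewhere.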

June 27, 2024 · 4 min · 709 words · Aum Pauskar

Big data and Hadoop ecosystem

Big data

Just data

- Structured data: data that has a defined length and format for each record. It's stored in a fixed format such as a relational database or spreadsheet. It's easy to search and analyze. It's used for transactional data.
- Unstructured data: data that has no predefined length or format. It's stored in a free format such as a text file. It's difficult to search and analyze. It's used for non-transactional data.
- Semi-structured data: data that has some organizing structure (such as tags or keys) but doesn't conform to the structure of a relational database. It's stored in a semi-structured format such as XML or JSON. It's relatively easy to search and analyze. It's used for non-transactional data.

Types of data analysis

- descriptive: what happened?
- diagnostic: why did it happen?
- predictive: what will happen?
- prescriptive: how can we make it happen?

Data management software

Hadoop

Hadoop is a framework for distributed storage and processing of large data sets using the MapReduce programming model. It consists of a distributed file system (HDFS) and a distributed processing framework (MapReduce). It's written in Java and is open source. It's designed to scale from a single server to thousands of machines, each offering local computation and storage. Rather than relying on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, thereby delivering a highly available service on top of a cluster of computers, each of which may be prone to failure. Its use cases include data lakes, data warehouses, data hubs, data science, and data engineering. It's used by Facebook, Yahoo, LinkedIn, eBay, and Twitter. Its core components are HDFS, YARN, and MapReduce. ...
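The MapReduce programming model named above can be sketched in a few lines of plain Python. This is a single-process illustration of the map, shuffle, and reduce steps using a word count, not Hadoop's actual Java API; the function names and sample input are assumptions chosen for the example.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all values by key, as happens between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the list of counts for one word into a total."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["big data big cluster", "data node"])
print(counts)
```

In a real cluster, each map call runs on the machine that holds its block of input and the shuffle moves data across the network; here everything runs in one process so the data flow is easy to follow.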

December 5, 2023 · 20 min · 4093 words · Aum Pauskar