Trip to the Data Jungle

Everything You Need to Know about Data

  • Introduction

Are you a Data Analyst, Data Engineer, Data Scientist, or even a data consumer who just uses Excel sheets to store some data?

  • Are you lost in front of a laptop and a bunch of mysterious data files?

YES!!

But fear not, intrepid explorer! I'm about to be your data Sherpa, guiding you through the treacherous terrain of information overload🔍.
I will help you on your data literacy journey, because every data consumer should be data literate!!

  • What is Data Literacy

Data literacy is the superpower 🪄of understanding, analyzing, and using data wisely. It's like having a data compass to uncover hidden insights, make informed decisions, and save the world from bad spreadsheets📃. It involves skills such as data analysis, data visualization, statistical reasoning, and critical thinking🧠.

So, let's dive in and unleash your inner data-literate explorer!


  • Do you know the structure of your data set?

Dear Data Explorer 🕵️‍♀️ we have 3 structures for Data.

  1. Structured Data

  2. Unstructured Data

  3. Semi-structured Data

Structured data
🔵 Follows a schema
🔵 Defined data types & relationships
🔵 e.g. SQL, tables in a relational database

Unstructured data
🔵 Schemaless
🔵 Makes up most of the data in the world
🔵 e.g. photos, chat logs, MP3 files

Semi-structured data
🔵 Does not follow a larger schema
🔵 Self-describing structure
🔵 e.g. NoSQL, XML, JSON
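To make the three structures a bit more concrete, here is a tiny Python sketch. The sample names, fields, and the `describe` helper are all made up for illustration; the point is only how each kind of data looks when you hold it in your hands.

```python
import json

# Structured: every row follows the same schema (name, age), like a table.
structured_rows = [
    ("Amina", 31),
    ("Bilal", 27),
]

# Semi-structured: each record describes itself with keys, and keys may differ.
semi_structured = json.loads(
    '[{"name": "Amina", "age": 31}, {"name": "Bilal", "hobbies": ["chess"]}]'
)

# Unstructured: just bytes (pretend this is a photo); there are no fields
# to ask for without extra processing.
unstructured = b"\x89PNG...pretend this is a photo..."


def describe(records):
    """Return the sorted field names seen across self-describing records."""
    fields = set()
    for record in records:
        fields.update(record.keys())
    return sorted(fields)


print(describe(semi_structured))  # ['age', 'hobbies', 'name']
```

Notice that the structured rows only make sense because we agreed on the column order in advance, while each semi-structured record carries its own field names with it.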

Data with these different structures is stored and processed in different ways, depending on its structure (structured, unstructured, or semi-structured), its amount, and the purpose it is designed for.

Let's take a tour and learn more about how we deal with this data.

  • Do you know how this Data is processed?

Now you need to know which type of data you are dealing with, and how this Data is processed.

  • OLTP vs OLAP

OLAP (Online Analytical Processing) and OLTP (Online Transaction Processing) are two distinct approaches to managing and processing data in information systems.

OLTP:

  • Purpose: OLTP is designed for real-time transaction processing on a normalized relational data structure. It handles day-to-day operational activities, such as inserting, updating, and deleting individual records, while ensuring data integrity.

  • Usage: OLTP is commonly used in transactional systems like e-commerce, banking, and inventory management, where high concurrency and data consistency are critical.

OLAP:

  • Purpose: OLAP is designed for complex analysis and decision-making. It supports multidimensional data models and enables users to perform advanced analytics, such as data mining, forecasting, and trend analysis.

  • Usage: OLAP is commonly used in business intelligence and reporting applications where users need to analyze large volumes of historical data to gain insights and make strategic decisions.
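Here is a small sketch of how the two workloads differ in shape, using Python's built-in `sqlite3` module. SQLite is only a stand-in here (real OLTP and OLAP systems are separate, specialized engines), and the `orders` table and its rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, customer TEXT, amount REAL, status TEXT)"
)

# OLTP-style work: small transactions that insert or update individual records.
with conn:  # commits on success, rolls back if an error is raised
    conn.executemany(
        "INSERT INTO orders (customer, amount, status) VALUES (?, ?, ?)",
        [("amina", 20.0, "new"), ("bilal", 35.0, "new"), ("amina", 15.0, "new")],
    )
with conn:
    conn.execute("UPDATE orders SET status = 'shipped' WHERE id = ?", (1,))

# OLAP-style work: one read-only query that scans many rows to summarize them.
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)  # [('amina', 35.0), ('bilal', 35.0)]
```

The OLTP statements each touch one or a few rows and must stay consistent, while the OLAP query reads across the whole table to answer a question about it.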


  • How is this data stored beyond traditional databases?

1-Big Data

Big data is like having a huge collection of different kinds of information (numbers, words, pictures) that comes in fast. This information is so big and fast that regular computer systems can't handle it well.

Imagine you have a giant box of toys - some are big, some small, and they keep coming in super quickly. That's like big data! It's a lot of different stuff, and it's coming at you fast.

People use special tools and tricks to organize and understand all this information. They want to find useful things in the big pile of data, like figuring out what people like on the internet or making better decisions for a business.

It's characterized by its:

  1. Volume.

  2. Variety.

  3. Velocity.


2-Data Warehouse

Think of a data warehouse as a big, organized storage room for information. It's like having a place where you keep all your important data in a neat and structured way.

Businesses use data warehouses to store and manage large amounts of data from different sources, making it easier to analyze and make informed decisions. In other words, a data warehouse is organized for OLAP.

This image shows in depth how a data warehouse is organized for large-scale analysis of structured data, while an operational database is organized for real-time data (OLTP), where high concurrency and data consistency are critical.


3-Data Mart

A data mart is a subset of a data warehouse that is dedicated to a specific topic.
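One simple way to picture the "subset" idea is a view over a warehouse table. This is only a sketch using Python's built-in `sqlite3`; the `warehouse_sales` table, the `marketing_mart` view, and the rows are hypothetical, and real data marts are often built as separate tables or databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The "warehouse": data from every department in one place.
conn.execute(
    "CREATE TABLE warehouse_sales (department TEXT, product TEXT, revenue REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?)",
    [
        ("marketing", "ads", 100.0),
        ("finance", "audit", 250.0),
        ("marketing", "events", 40.0),
    ],
)

# The "data mart": a subset of the warehouse dedicated to one topic.
conn.execute(
    "CREATE VIEW marketing_mart AS "
    "SELECT product, revenue FROM warehouse_sales WHERE department = 'marketing'"
)

mart_rows = conn.execute(
    "SELECT product, revenue FROM marketing_mart ORDER BY product"
).fetchall()
print(mart_rows)  # [('ads', 100.0), ('events', 40.0)]
```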


4-Data lake

Imagine an open lake where you can throw all kinds of information, like documents, images, and videos. A data lake is similar: it's a storage system that allows you to store vast amounts of raw, unprocessed data of any structure from various sources. It's like a big pool of data where you can dive in later to find insights and extract valuable information when needed.


  • Let's imagine:

I will take you with me on a magical trip to compare big data, data warehouses, and data lakes, SOOO be ready🚀!!

Big Data: Imagine you have a super-duper treasure chest that magically grows bigger and bigger every second. It's filled with all sorts of information, like funny cat videos, weather reports, and even secrets about your favorite superheroes. Big data is like that ever-expanding treasure chest of information that is so huge and complex, it needs special tools to unlock its secrets and discover the coolest insights.

Data Warehouse: Picture a giant, high-tech storage room that looks like a secret spy headquarters. It has shelves and drawers neatly organized with all kinds of information, from top-secret documents to secret recipes for the world's best ice cream. A data warehouse is like that super-secret storage room where businesses keep all their important data in one place, making it easy to find, analyze, and make super-smart decisions🤓.

Data Lake: Think of a magical lake where you can dive in and explore all sorts of hidden treasures, like pirate ships, ancient artifacts, and even friendly sea monsters. A data lake is just as exciting—it's like a vast and mysterious pool of raw and unprocessed data, where businesses can throw in all kinds of information, from tweets to photos to customer feedback. Later on, they can take a thrilling dive and discover amazing insights and hidden gems that can help them make better decisions.


  • Do you have any idea how you will handle all this data?!

NOO!!

That's why you need to know more about ETL and ELT tools👨‍🍳.

ETL (Extract, Transform, Load) and ELT (Extract, Load, Transform) are different ways of handling data in information systems.

  • ETL vs ELT

ETL: ETL usually comes up when talking about data warehouses. It extracts data from various sources, transforms the data to fit a specific format or structure, and then loads it into a target system. It's like preparing and packaging the data before storing it.
Imagine you have a machine that takes different ingredients, like fruits and sugar, processes them, and then puts the finished product in jars.

ELT: ELT usually comes up when talking about data lakes. It extracts data from different sources and loads it directly into a target system without much transformation. Later on, when you need to use the data, you transform or process it as required. It's like keeping the ingredients as they are and doing the preparation when you're ready to cook.
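The difference is really just the order of the steps, which a few lines of Python can show. This is a toy sketch, not how production pipelines are built: the raw events, the `transform` helper, and the table names `etl_sales` and `elt_raw` are all assumptions made up for the example, with `sqlite3` standing in for the target system.

```python
import json
import sqlite3

# Extract: pretend these raw JSON strings came from some source system.
raw_events = [
    '{"user": " Amina ", "amount": "20.5"}',
    '{"user": "bilal", "amount": "10"}',
]


def transform(raw):
    """Clean one raw JSON string into a (user, amount) row."""
    record = json.loads(raw)
    return record["user"].strip().lower(), float(record["amount"])


conn = sqlite3.connect(":memory:")

# ETL: transform first, then load only the clean rows into the target.
conn.execute("CREATE TABLE etl_sales (user TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO etl_sales VALUES (?, ?)", [transform(r) for r in raw_events]
)

# ELT: load the raw payloads as they are; transform later, when they are read.
conn.execute("CREATE TABLE elt_raw (payload TEXT)")
conn.executemany("INSERT INTO elt_raw VALUES (?)", [(r,) for r in raw_events])

etl_rows = conn.execute("SELECT user, amount FROM etl_sales ORDER BY user").fetchall()
elt_rows = sorted(
    transform(payload) for (payload,) in conn.execute("SELECT payload FROM elt_raw")
)

print(etl_rows)              # [('amina', 20.5), ('bilal', 10.0)]
print(etl_rows == elt_rows)  # True
```

Both routes end with the same clean rows. With ETL the target only ever sees prepared data, while with ELT the raw ingredients stay available in case you want to cook them differently later.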

TLDR!!

  • Introduction to data literacy and why every data consumer should be data literate.

  • Approaches to processing data:
    1-OLTP
    2-OLAP

  • Data designs:
    1-Big Data
    2-Data Warehouse
    3-Data Mart
    4-Data Lake

  • Magic trip to compare the data designs

  • How you can handle data in two different ways:
    1-ETL
    2-ELT

Conclusion

Thank you for reading my blog! I hope you found it informative and that it helped you start your data journey ♥️.

Let's share knowledge and support each other's growth! ❤️