Data Lake Architecture Visualisation

Explore how different file types are ingested, processed, and analysed in a data lake environment. This walkthrough follows a JSON file through the complete data flow.


External Data Source

Raw JSON files from external systems are prepared for ingestion into the data lake.

The pipeline moves data through six stages:

1. External Data Source
2. API Endpoint
3. Data Ingestion
4. Data Lake Storage
5. Data Processing
6. Business Intelligence

External Source: raw JSON file from the external system
    ↓
API Endpoint: receives and validates the JSON data
    ↓
Data Ingestion: validates and prepares the data for storage
    ↓
Data Lake Storage: stores the raw data in its original format
    ↓
Data Processing: transforms and enriches the data for analysis
    ↓
BI Dashboard: visualises insights from the processed data
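The flow above can be sketched as a chain of stage functions. This is a minimal illustration in plain Python, not a real data lake API: the function names and the in-memory `lake_storage` dictionary are assumptions standing in for actual endpoint, ingestion, and storage services.

```python
import json

# In-memory stand-in for the data lake's raw zone (illustrative only).
lake_storage = {}

def api_endpoint(raw: str) -> dict:
    """Receive and validate the incoming JSON payload."""
    return json.loads(raw)  # raises ValueError on malformed JSON

def ingest(payload: dict) -> dict:
    """Validate and prepare the data for storage."""
    if "sales" not in payload:
        raise ValueError("expected a 'sales' array")
    return payload

def store_raw(payload: dict) -> None:
    """Store the data in its original (raw) form."""
    lake_storage["raw/sales.json"] = json.dumps(payload)

def process() -> list:
    """Transform and enrich the raw data for analysis."""
    raw = json.loads(lake_storage["raw/sales.json"])
    return [
        {**row, "revenue": row["quantity"] * row["price"]}
        for row in raw["sales"]
    ]

raw_file = '{"sales": [{"id": 1001, "product": "Laptop", "quantity": 5, "price": 1200.00, "date": "2023-04-15"}]}'
store_raw(ingest(api_endpoint(raw_file)))
enriched = process()
print(enriched[0]["revenue"])  # 6000.0
```

Note that the raw payload is stored exactly as received; the derived `revenue` field appears only in the processing stage's output, which is the core idea of keeping storage and processing separate.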

Data representation: raw JSON. The external source provides JSON data that needs to be processed.

Raw Data Stage

At this stage, the JSON data is in its original format as received from the source system. No transformations have been applied yet.

Original JSON format, as received from the external source:
{
  "sales": [
    {
      "id": 1001,
      "product": "Laptop",
      "quantity": 5,
      "price": 1200.00,
      "date": "2023-04-15"
    },
    {
      "id": 1002,
      "product": "Smartphone",
      "quantity": 10,
      "price": 800.00,
      "date": "2023-04-16"
    },
    {
      "id": 1003,
      "product": "Headphones",
      "quantity": 20,
      "price": 150.00,
      "date": "2023-04-17"
    }
  ]
}
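A quick way to sanity-check a raw file like this before it lands in the lake is to parse it without altering it. A minimal sketch; the record count and unit total computed here are just validation checks, not a transformation step:

```python
import json

# The raw file exactly as shown above.
raw_json = """
{
  "sales": [
    {"id": 1001, "product": "Laptop", "quantity": 5, "price": 1200.00, "date": "2023-04-15"},
    {"id": 1002, "product": "Smartphone", "quantity": 10, "price": 800.00, "date": "2023-04-16"},
    {"id": 1003, "product": "Headphones", "quantity": 20, "price": 150.00, "date": "2023-04-17"}
  ]
}
"""

data = json.loads(raw_json)     # parse only to validate; the stored bytes stay untouched
assert len(data["sales"]) == 3  # expected record count
total_units = sum(item["quantity"] for item in data["sales"])
print(total_units)  # 35
```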

Understanding Data Lakes

New to...

A data lake is like a large reservoir that stores all types of data in its original format. Unlike traditional databases that organise data in tables, a data lake keeps data in its raw form until needed. This makes it flexible for storing different file types like JSON, XML, and CSV.

Considering...

Data lakes provide schema-on-read functionality, meaning data structure is applied only when the data is accessed. This allows for greater flexibility in data analysis and enables organisations to store vast amounts of structured and unstructured data that can be processed using various analytics tools and frameworks.
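Schema-on-read can be illustrated in a few lines of plain Python. This is a sketch of the concept, not a real framework: the `read_schema` mapping and `read_with_schema` helper are assumptions showing structure being applied at access time rather than at write time.

```python
import json

# Raw records land as-is; no schema is enforced at write time.
raw_records = [
    '{"id": 1001, "product": "Laptop", "price": 1200.0}',
    '{"id": 1002, "product": "Smartphone"}',  # a missing field is fine at write time
]

# Schema-on-read: structure is applied only when the data is accessed.
read_schema = {"id": int, "product": str, "price": float}

def read_with_schema(line: str) -> dict:
    """Parse a raw record and coerce it to the reader's schema."""
    rec = json.loads(line)
    return {
        field: ftype(rec[field]) if field in rec else None
        for field, ftype in read_schema.items()
    }

rows = [read_with_schema(r) for r in raw_records]
print(rows[1]["price"])  # None: the gap surfaces at read time, not at ingestion
```

In practice engines such as Apache Spark do the same thing at scale, inferring or applying a schema when a raw JSON file is read rather than when it is written.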

Working with...

Modern data lake architectures often implement a medallion architecture (bronze, silver, gold layers) for data quality management. They leverage technologies like Apache Spark for distributed processing, Delta Lake for ACID transactions, and integrate with ML pipelines for advanced analytics while maintaining data governance and lineage tracking.
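The bronze/silver/gold layering can be sketched without any framework. In production this would typically run on Spark over Delta Lake tables; the plain-Python lists below, and the sample rows in them, are illustrative assumptions showing only the layer responsibilities:

```python
# Bronze: raw data as landed (may contain duplicates and bad rows).
bronze = [
    {"id": 1001, "product": "Laptop", "quantity": 5, "price": 1200.0},
    {"id": 1001, "product": "Laptop", "quantity": 5, "price": 1200.0},   # duplicate
    {"id": 1003, "product": "Headphones", "quantity": -1, "price": 150.0},  # invalid quantity
]

# Silver: cleaned, validated, and de-duplicated.
seen = set()
silver = []
for row in bronze:
    if row["quantity"] > 0 and row["id"] not in seen:
        seen.add(row["id"])
        silver.append(row)

# Gold: business-level aggregates ready for BI dashboards.
gold = {"total_revenue": sum(r["quantity"] * r["price"] for r in silver)}
print(gold)  # {'total_revenue': 6000.0}
```

Each layer only ever reads from the one before it, which is what makes lineage tracking tractable: a gold figure can always be traced back through silver rules to the original bronze records.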