# Introduction
This workshop aims to introduce participants to the latest open, modern data analytics technologies and to show how to design the right architecture for a given use case.
More specifically, the workshop distinguishes between two typical scenarios:

- **Batch data processing**: analyzing historical data in batches
- **Real-time data streaming**: processing data as it arrives
## Workshop Objectives
### High-Level Goals
- **Master Modern Data Stack Components**: Learn to work with DuckDB, ClickHouse, Kafka, and Metabase
- **Understand Data Architecture Patterns**: Explore both batch and real-time processing paradigms
- **Build End-to-End Analytics Solutions**: From data ingestion to visualization
- **Practice Real-World Scenarios**: Work with actual restaurant and coffee shop data from Prishtina
### Learning Outcomes
By the end of this workshop, you will be able to:
- Set up and configure modern data analytics tools
- Process batch data using DuckDB and PostgreSQL
- Implement real-time streaming with ClickHouse and Kafka
- Create data lakehouses for scalable analytics
- Build interactive dashboards with Metabase
- Apply geospatial analytics to location-based data (see the sketch after this list)
- Design data architectures for different use cases
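
As a taste of the geospatial outcome, the sketch below uses DuckDB's `spatial` extension to rank places by distance from a reference point. The `places` table and its `lon`/`lat` columns are hypothetical stand-ins for the workshop dataset:

```sql
-- Rank places by distance from a point in central Prishtina.
-- The `places` table and its lon/lat columns are hypothetical.
INSTALL spatial;
LOAD spatial;

SELECT
    name,
    -- Planar distance in degrees; reproject first if metres are needed.
    ST_Distance(
        ST_Point(lon, lat),
        ST_Point(21.1655, 42.6629)  -- approximate centre of Prishtina
    ) AS dist_deg
FROM places
ORDER BY dist_deg
LIMIT 5;
```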
## Workshop Structure
### Task 1: Batch Analytics with DuckDB

- **Focus**: Historical data processing and analysis
- **Technologies**: DuckDB, PostgreSQL, R2 Object Storage
- **Data**: Namecheap premium domain names, Prishtina restaurants and coffee shops, and historical revenue data
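
To give a flavour of Task 1, the sketch below reads a CSV straight out of S3-compatible object storage (such as R2) with DuckDB's `httpfs` extension. The endpoint, credentials, bucket, and file name are placeholders, not the workshop's actual values:

```sql
-- Query a CSV sitting in R2 (S3-compatible) directly from DuckDB.
-- Endpoint, credentials, bucket, and file name are placeholders.
INSTALL httpfs;
LOAD httpfs;

SET s3_endpoint = '<account-id>.r2.cloudflarestorage.com';
SET s3_access_key_id = '<key-id>';
SET s3_secret_access_key = '<secret>';

SELECT count(*) AS n_rows
FROM read_csv_auto('s3://<bucket>/domains.csv');
```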
### Task 2: Real-Time Analytics with ClickHouse

- **Focus**: Real-time streaming and live analytics
- **Technologies**: ClickHouse, Kafka, Metabase
- **Data**: A real-time stream of transactions from restaurants and coffee shops
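
The usual ClickHouse pattern for this task is a Kafka engine table drained into a MergeTree table by a materialized view. A minimal sketch, with a hypothetical transaction schema:

```sql
-- Kafka -> ClickHouse pipeline; the column names are hypothetical.

-- 1. A Kafka engine table that consumes the topic.
CREATE TABLE transactions_queue
(
    place_id UInt32,
    amount   Float64,
    ts       DateTime
)
ENGINE = Kafka
SETTINGS kafka_broker_list = 'kafka:9092',
         kafka_topic_list  = 'transactions',
         kafka_group_name  = 'clickhouse-workshop',
         kafka_format      = 'JSONEachRow';

-- 2. A MergeTree table for storage and ad-hoc queries.
CREATE TABLE transactions
(
    place_id UInt32,
    amount   Float64,
    ts       DateTime
)
ENGINE = MergeTree
ORDER BY (place_id, ts);

-- 3. A materialized view that moves rows from the queue into storage.
CREATE MATERIALIZED VIEW transactions_mv TO transactions AS
SELECT place_id, amount, ts
FROM transactions_queue;
```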
## Prerequisites
- Basic understanding of SQL
- Familiarity with command-line tools
- Knowledge of CSV, JSON, and tabular database formats
- Understanding of basic data concepts
## Data Overview
### Namecheap Premium Domain Names

- **Format**: CSV
- **Content**: Premium domain names
- **Data Fields**: domain, price, extensions_taken
- **Source**: Namecheap marketplace
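
A quick way to explore this file is a direct DuckDB query over the CSV; the file name below is a placeholder, and the columns are the fields listed above:

```sql
-- Peek at the priciest premium domains; the file name is a placeholder.
SELECT domain, price, extensions_taken
FROM read_csv_auto('namecheap_domains.csv')
ORDER BY price DESC
LIMIT 10;
```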
### Prishtina Places Dataset

- **Format**: JSON
- **Content**: Restaurants and coffee shops
- **Data Fields**: name, location, rating, reviews, coordinates
- **Source**: Scraped from Google Places using SerpAPI
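
DuckDB reads this JSON directly as well. A sketch assuming the fields listed above and a placeholder file name:

```sql
-- Find highly rated, well-reviewed places; the file name is a placeholder.
SELECT name, rating, reviews
FROM read_json_auto('prishtina_places.json')
WHERE rating >= 4.5
ORDER BY reviews DESC
LIMIT 10;
```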
### Historical Transaction Data

- **Format**: PostgreSQL database
- **Content**: Synthetic transaction data for each establishment
- **Time Range**: Historical synthetic data from 2025, for trend analysis
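
DuckDB's `postgres` extension can attach this database and query it in place. The connection string and the `transactions` table with its columns are placeholders for illustration:

```sql
-- Attach the workshop's PostgreSQL database from DuckDB and aggregate.
-- Connection string, table, and column names are placeholders.
INSTALL postgres;
LOAD postgres;

ATTACH 'host=localhost dbname=workshop user=postgres' AS pg (TYPE postgres);

-- Monthly revenue per establishment, for trend analysis.
SELECT place_id,
       date_trunc('month', ts) AS month,
       sum(amount)             AS revenue
FROM pg.transactions
GROUP BY place_id, month
ORDER BY month;
```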
### Real-Time Transaction Data

- **Format**: Kafka
- **Content**: Real-time transaction data from restaurants and coffee shops
- **Time Range**: Real-time synthetic data for immediate analysis and decision-making
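
Once these events are landing in ClickHouse (see the Task 2 sketch above), live decision-making is plain SQL over the target table. A hypothetical per-minute revenue query:

```sql
-- Revenue per minute over the last hour; the schema is hypothetical.
SELECT toStartOfMinute(ts) AS minute,
       sum(amount)         AS revenue
FROM transactions
WHERE ts >= now() - INTERVAL 1 HOUR
GROUP BY minute
ORDER BY minute DESC;
```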
## Getting Started
1. **Open the Workshop**: Click the “Open in GitHub Codespaces” button above
2. **Follow Tasks Sequentially**: Complete each task (e.g. 1.1) before moving on to the next (e.g. 1.2)
3. **Use Hints**: Each task includes helpful hints and answers in accordion sections
4. **Experiment**: Don’t be afraid to try different approaches
## Next Steps
Ready to begin? Let’s start with Task 1 to learn about ad-hoc and batch analytics with DuckDB.