
Introduction

The objective of this workshop is to introduce the latest open, modern data analytics technologies and show how to design the right architecture for a given workload. More specifically, it distinguishes between two typical scenarios:
  • Batch data processing - Analyzing historical data in batches
  • Real-time data streaming - Processing data as it arrives

Workshop Objectives

High-Level Goals

  1. Master Modern Data Stack Components: Learn to work with DuckDB, ClickHouse, Kafka, and Metabase
  2. Understand Data Architecture Patterns: Explore both batch and real-time processing paradigms
  3. Build End-to-End Analytics Solutions: From data ingestion to visualization
  4. Practice Real-World Scenarios: Work with actual restaurant and coffee shop data from Prishtina

Learning Outcomes

By the end of this workshop, you will be able to:
  • Set up and configure modern data analytics tools
  • Process batch data using DuckDB and PostgreSQL
  • Implement real-time streaming with ClickHouse and Kafka
  • Create data lakehouses for scalable analytics
  • Build interactive dashboards with Metabase
  • Apply geospatial analytics to location-based data
  • Design data architectures for different use cases

Workshop Structure

Task 1: Batch Analytics with DuckDB

  • Focus: Historical data processing and analysis
  • Technologies: DuckDB, PostgreSQL, R2 Object Storage
  • Data: Namecheap premium domain names; Prishtina restaurants and coffee shops; historical revenue data

Task 2: Real-Time Analytics with ClickHouse

  • Focus: Real-time streaming and live analytics
  • Technologies: ClickHouse, Kafka, Metabase
  • Data: Live transaction streams from restaurants and coffee shops

Prerequisites

  • Basic understanding of SQL
  • Familiarity with command line tools
  • Knowledge of CSV, JSON, and tabular database formats
  • Understanding of basic data concepts

Data Overview

Namecheap Premium Domain Names

  • Format: CSV
  • Content: Premium domain names
  • Data Fields: domain, price, extensions_taken
  • Source: Namecheap marketplace
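
Before loading a CSV like this into an analytics engine, it can be useful to sanity-check the rows with plain Python. The sketch below uses the field names from the dataset description above, but the sample values and the parsing details are hypothetical, not taken from the actual workshop files.

```python
import csv
import io

# Hypothetical sample rows matching the documented fields:
# domain, price, extensions_taken
sample = io.StringIO(
    "domain,price,extensions_taken\n"
    "coffee.com,12500.00,3\n"
    "prishtina.io,899.00,1\n"
)

# Parse each row into a dict keyed by the header, casting price to float
reader = csv.DictReader(sample)
rows = [{**r, "price": float(r["price"])} for r in reader]

# Example question: which domain in the sample is most expensive?
top = max(rows, key=lambda r: r["price"])
print(top["domain"])  # coffee.com
```

In the workshop itself, DuckDB can ingest the CSV directly; this snippet only illustrates the shape of the data.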

Prishtina Places Dataset

  • Format: JSON
  • Content: Restaurants and coffee shops
  • Data Fields: name, location, rating, reviews, coordinates
  • Source: Scraped using SerpAPI from Google Places
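
Since the learning outcomes include geospatial analytics on location data, a minimal example of the kind of computation involved is the haversine (great-circle) distance between two places. The record shape below mirrors the documented fields; the names and coordinate values are made up for illustration.

```python
import json
import math

# Two hypothetical records shaped like the documented fields
places_json = """[
  {"name": "Cafe A", "rating": 4.5, "reviews": 120,
   "coordinates": {"lat": 42.6629, "lng": 21.1655}},
  {"name": "Cafe B", "rating": 4.2, "reviews": 80,
   "coordinates": {"lat": 42.6700, "lng": 21.1500}}
]"""

def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance between two (lat, lng) points in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = math.radians(lat2 - lat1)
    dlng = math.radians(lng2 - lng1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlng / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

a, b = json.loads(places_json)
d = haversine_km(a["coordinates"]["lat"], a["coordinates"]["lng"],
                 b["coordinates"]["lat"], b["coordinates"]["lng"])
print(f"{d:.2f} km")  # roughly 1.5 km for these sample points
```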

Historical Transaction Data

  • Format: PostgreSQL database
  • Content: Synthetic transaction data for each establishment
  • Time Range: Historical synthetic data from 2025 for trend analysis

Real-time Transaction Data

  • Format: Kafka stream
  • Content: Real-time transaction data from restaurants and coffee shops
  • Time Range: Real-time synthetic data for immediate analysis and decision-making
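
To make the streaming scenario concrete, here is a minimal stand-in for a Kafka consumer: a generator yields JSON-encoded transaction events, and a running revenue total is kept per establishment. The event fields and values are hypothetical; in the workshop, events arrive on a Kafka topic and are aggregated in ClickHouse rather than in Python.

```python
import json
from collections import defaultdict

def event_stream():
    """Stand-in for a Kafka consumer: yields JSON-encoded transaction events."""
    events = [
        {"place": "Cafe A", "amount": 3.50},
        {"place": "Cafe B", "amount": 7.20},
        {"place": "Cafe A", "amount": 2.80},
    ]
    for e in events:
        yield json.dumps(e)

# Running revenue per establishment, updated as each event arrives
totals = defaultdict(float)
for message in event_stream():
    event = json.loads(message)
    totals[event["place"]] += event["amount"]

print({k: round(v, 2) for k, v in totals.items()})
# {'Cafe A': 6.3, 'Cafe B': 7.2}
```

The same update-on-arrival pattern is what the real-time task implements at scale: each incoming message immediately refreshes the aggregate a dashboard reads.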

Getting Started

  1. Open the Workshop: Click the “Open in GitHub Codespaces” button above
  2. Follow Tasks Sequentially: Complete each task (e.g. 1.1) before moving to the next (e.g. 1.2)
  3. Use Hints: Each task includes helpful hints and answers in accordion sections
  4. Experiment: Don’t be afraid to try different approaches

Next Steps

Ready to begin? Let’s start with Task 1 to learn about ad-hoc and batch analytics with DuckDB.