Data Overview

Understanding the Dataset

The workshop uses two main data sources: Prishtina Places Dataset:
  • Restaurants: Individual JSON files (r1.json, r2.json, etc.)
  • Coffee Shops: Individual JSON files (c1.json, c2.json, etc.)
  • Data Fields: Name, location, rating, reviews, coordinates, address
  • Source: Scraped using SerpAPI from Google Places
Revenue Data:
  • Format: PostgreSQL database
  • Content: Synthetic daily revenue data for each establishment
  • Time Range: Historical data for trend analysis

Data Structure

Each JSON file contains a raw SERPApi response with a JSON object with metadata and local results of place objects with the following structure:
{
    "search_metadata": {
        "id": "6867f58ea289710dcf3ea48f",
        "status": "Success",
        "json_endpoint": "https://serpapi.com/searches/e3f0d0cce2c8410a/6867f58ea289710dcf3ea48f.json",
        "created_at": "2025-07-04 15:38:54 UTC",
        "processed_at": "2025-07-04 15:38:54 UTC",
        "google_local_url": "https://www.google.com/search?q=Coffee&uule=w+CAIQICIiUHJpc2h0aW5hLFByaXN0aW5hIERpc3RyaWN0LEtvc292bw&hl=en&gl=de&tbm=lcl",
        "raw_html_file": "https://serpapi.com/searches/e3f0d0cce2c8410a/6867f58ea289710dcf3ea48f.html",
        "total_time_taken": 3.92
    },
    "search_parameters": {
        "engine": "google_local",
        "q": "Coffee",
        "location_requested": "Prishtina, Pristina District, Kosovo",
        "location_used": "Prishtina,Pristina District,Kosovo",
        "google_domain": "google.com",
        "hl": "en",
        "gl": "de",
        "device": "desktop"
    },
    "local_results": [
        {
            "position": 1,
            "rating": 4.9,
            "reviews": 158,
            "reviews_original": "(158)",
            "lsig": "AB86z5U-pSAyfo30lztdUc25lx9e",
            "thumbnail": "https://serpapi.com/searches/6867f58ea289710dcf3ea48f/images/fcea8ffca5da861b96c24707d286def2eb8a768ad27776066a661b1623ff9519.jpeg",
            "place_id": "4333347444629616077",
            "place_id_search": "https://serpapi.com/search.json?device=desktop&engine=google_local&gl=de&google_domain=google.com&hl=en&location=Prishtina%2C+Pristina+District%2C+Kosovo&ludocid=4333347444629616077&q=Coffee",
            "gps_coordinates": {
                "latitude": 42.65882,
                "longitude": 21.159815
            },
            "title": "Newborn Brew",
            "type": "Coffee shop",
            "address": "M555+GW, 1000 Johan V. Hahn"
        }
    ],
    "pagination": {
        "current": 1,
        "next": "https://www.google.com/search?q=Coffee&uule=w+CAIQICIiUHJpc2h0aW5hLFByaXN0aW5hIERpc3RyaWN0LEtvc292bw&hl=en&gl=de&start=20&tbm=lcl",
        "other_pages": {
            "2": "https://www.google.com/search?q=Coffee&uule=w+CAIQICIiUHJpc2h0aW5hLFByaXN0aW5hIERpc3RyaWN0LEtvc292bw&hl=en&gl=de&start=20&tbm=lcl",
            "3": "https://www.google.com/search?q=Coffee&uule=w+CAIQICIiUHJpc2h0aW5hLFByaXN0aW5hIERpc3RyaWN0LEtvc292bw&hl=en&gl=de&start=40&tbm=lcl",
            "4": "https://www.google.com/search?q=Coffee&uule=w+CAIQICIiUHJpc2h0aW5hLFByaXN0aW5hIERpc3RyaWN0LEtvc292bw&hl=en&gl=de&start=60&tbm=lcl",
            "5": "https://www.google.com/search?q=Coffee&uule=w+CAIQICIiUHJpc2h0aW5hLFByaXN0aW5hIERpc3RyaWN0LEtvc292bw&hl=en&gl=de&start=80&tbm=lcl"
        }
    },
    "serpapi_pagination": {
        "current": 1,
        "next_link": "https://serpapi.com/search.json?device=desktop&engine=google_local&gl=de&google_domain=google.com&hl=en&location=Prishtina%2C+Pristina+District%2C+Kosovo&q=Coffee&start=20&uule=w+CAIQICIiUHJpc2h0aW5hLFByaXN0aW5hIERpc3RyaWN0LEtvc292bw",
        "next": "https://serpapi.com/search.json?device=desktop&engine=google_local&gl=de&google_domain=google.com&hl=en&location=Prishtina%2C+Pristina+District%2C+Kosovo&q=Coffee&start=20&uule=w+CAIQICIiUHJpc2h0aW5hLFByaXN0aW5hIERpc3RyaWN0LEtvc292bw",
        "other_pages": {
            "2": "https://serpapi.com/search.json?device=desktop&engine=google_local&gl=de&google_domain=google.com&hl=en&location=Prishtina%2C+Pristina+District%2C+Kosovo&q=Coffee&start=20&uule=w+CAIQICIiUHJpc2h0aW5hLFByaXN0aW5hIERpc3RyaWN0LEtvc292bw",
            "3": "https://serpapi.com/search.json?device=desktop&engine=google_local&gl=de&google_domain=google.com&hl=en&location=Prishtina%2C+Pristina+District%2C+Kosovo&q=Coffee&start=40&uule=w+CAIQICIiUHJpc2h0aW5hLFByaXN0aW5hIERpc3RyaWN0LEtvc292bw",
            "4": "https://serpapi.com/search.json?device=desktop&engine=google_local&gl=de&google_domain=google.com&hl=en&location=Prishtina%2C+Pristina+District%2C+Kosovo&q=Coffee&start=60&uule=w+CAIQICIiUHJpc2h0aW5hLFByaXN0aW5hIERpc3RyaWN0LEtvc292bw",
            "5": "https://serpapi.com/search.json?device=desktop&engine=google_local&gl=de&google_domain=google.com&hl=en&location=Prishtina%2C+Pristina+District%2C+Kosovo&q=Coffee&start=80&uule=w+CAIQICIiUHJpc2h0aW5hLFByaXN0aW5hIERpc3RyaWN0LEtvc292bw"
        }
    }
}

JSON Data Exploration and Processing

Explore the JSON Data

Start by exploring the structure of your JSON files:

Clean the JSON

The JSON is somewhat unstructured because it contains a lot of unncessary metadata. Find a way to parse out the individual restaurant data.

Create a Table named ‘places’from the JSON

Query aggregations

Query the count of places

Query the average rating of places

Analyze the data further on your own

Query the data on your own with other types of interesting aggregations / analytics. Write down some interesting queries and insights.

Find Top 10 Rated Places in Prishtina

Analyze the highest-rated and lowest-rated places with at least 50 reviews.

Geospatial Analytics

Install and Explore DuckDB Geospatial Extension

Enable geospatial capabilities:
-- Install spatial extension
INSTALL spatial;
LOAD spatial;

-- Create geometry points from coordinates
SELECT 
    title,
    ST_Point(longitude, latitude) as location,
    rating,
    reviews
FROM places
LIMIT 5;

Calculate Distances Between Places

Find the nearest place to ICK (Innovation Centre Kosovo)

Task Summary

Congratulations! You’ve completed the batch analytics portion of the workshop. You’ve learned to: ✅ Explore and clean JSON data
✅ Analyze places data from Google ✅ Implement geospatial analytics
Next Step: Ready for real-time analytics? Move on to Task 2: Real-Time Analytics to learn about ClickHouse and streaming data processing.