Task 1 Overview: Ad-hoc and Batch Analytics with DuckDB

In this task, you’ll learn how to process and analyze historical data using DuckDB, a modern analytical database designed for OLAP workloads. You’ll work with a mix of real and synthetic data.

Learning Objectives

  • Install and configure DuckDB for batch processing
  • Query and explore CSV data files
  • Query and explore JSON data files
  • Perform geospatial analytics
  • Create a data lakehouse with DuckLake, a new open lakehouse format that keeps table data on object storage and metadata in a SQL database such as PostgreSQL

DuckDB Installation and Setup

Install DuckDB

First, we need to install DuckDB in our Codespace. DuckDB ships as a single self-contained binary with no external dependencies, so it runs almost anywhere and is easy to install.
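
As a minimal sketch of this step, assuming a Linux Codespace with curl available, the snippet below uses DuckDB's install script hosted at install.duckdb.org. The helper name install_duckdb is hypothetical and exists only for this example; a package manager or a direct binary download would work just as well.

```shell
# install_duckdb is a hypothetical helper name used only in this sketch.
install_duckdb() {
  # Skip the download when a duckdb binary is already on PATH.
  if command -v duckdb >/dev/null 2>&1; then
    echo "duckdb already installed at $(command -v duckdb)"
    return 0
  fi
  # Download and run DuckDB's install script (requires network access).
  curl -fsSL https://install.duckdb.org | sh
}
```

Call install_duckdb once in the Codespace terminal. If the installer prints instructions for adding the binary to your PATH, follow them before moving on.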

Verify Installation

Test that DuckDB is working correctly:
duckdb --version
You should see the DuckDB version information.
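
If you want the check to be scriptable, for example in a Codespace setup script, a small wrapper along these lines may help. This is only a sketch: check_duckdb is a hypothetical helper name, and the only DuckDB feature it relies on is the --version flag shown above.

```shell
# check_duckdb is a hypothetical helper name used only in this sketch.
check_duckdb() {
  if command -v duckdb >/dev/null 2>&1; then
    # Report the version string printed by the binary itself.
    echo "DuckDB found: $(duckdb --version)"
  else
    echo "DuckDB not found on PATH"
    return 1
  fi
}

# Run the check; a non-zero status means the install step needs attention.
check_duckdb || echo "Install DuckDB, then rerun the check."
```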

Adjust Max Columns

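DuckDB has a default limit of 1000 columns. Let’s increase it to 10000. Dot commands like the one below are entered at the DuckDB CLI prompt (start the CLI by running duckdb), not in the regular shell.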
.maxcolumns 10000