Skip to content

Home / Blog

Why DuckDB for cricket analytics

·5 min read ·by Dipankar Sarkar

In short: DuckDB is an embedded, columnar OLAP engine — it runs in-process with no server, stores the whole warehouse in one file, and executes analytical GROUP BY queries over millions of delivery rows fast. That combination fits cricket ball-by-ball analytics far better than pandas, SQLite or a paid API.

The workload: aggregations over millions of balls

Cricket analytics is an OLAP workload. A single IPL season is tens of thousands of deliveries; the full Cricsheet archive is millions of rows. Almost every question — strike rates, economy, partnerships, fantasy points — is a GROUP BY that scans a lot of rows but touches only a few columns. That is precisely what a columnar engine is built for.

Why not just pandas?

pandas is excellent for interactive, in-memory work on a few matches. It struggles as a warehouse:

  • Everything is rebuilt in RAM on every run — no persistence.
  • You hand-write and maintain each aggregation.
  • Large archives bump into memory limits.

CricketLogic still plays nicely with pandas — db.query() returns rows you can load into a DataFrame — but the storage, parsing and correct aggregation live in DuckDB.

Why not SQLite?

SQLite is row-oriented and tuned for transactional (OLTP) access — many small reads and writes. Analytical scans that aggregate a whole column are slower there. DuckDB is the OLAP counterpart: same “just a file, no server” simplicity, but a vectorized columnar engine underneath.

DuckDBSQLitepandas
StorageColumnar, on diskRow-based, on diskIn-memory
Best atAnalytical scans / GROUP BYTransactional reads/writesInteractive transforms
Server neededNoNoNo
Persistent warehouseYesYesNo

Why not a hosted API?

A paid cricket API means per-call costs, rate limits, gated history and usually only summary-level data. DuckDB keeps the raw ball-by-ball rows in a file you own and can query without limits, offline, forever. See the full comparison.

The practical wins

  • Zero setuppip install, no database server to run.
  • One portable fileipl.duckdb is your entire warehouse; copy it, commit it, share it for reproducible analysis.
  • Real SQL — window functions, JSON handling (Cricsheet wickets are stored as JSON), CTEs.
  • Fast — vectorized execution over columnar storage handles the whole archive on a laptop.

Bottom line

DuckDB gives CricketLogic a warehouse that is simultaneously simple (a file), fast (columnar OLAP) and powerful (full SQL). For historical, delivery-level cricket analytics, that is the right engine. Start with the quickstart to see it in action.

Last updated Jul 2, 2026 · Code verified against CricketLogic MIT · Source: https://github.com/cricketlogic/cricketlogic