Why DuckDB for cricket analytics

Jun 24, 2026 ·5 min read ·by Dipankar Sarkar

In short: DuckDB is an embedded, columnar OLAP engine — it runs in-process with no server, stores the whole warehouse in one file, and executes analytical GROUP BY queries over millions of delivery rows fast. That combination fits cricket ball-by-ball analytics far better than pandas, SQLite or a paid API.

The workload: aggregations over millions of balls

Cricket analytics is an OLAP workload. A single IPL season is tens of thousands of deliveries; the full Cricsheet archive is millions of rows. Almost every question — strike rates, economy, partnerships, fantasy points — is a GROUP BY that scans a lot of rows but touches only a few columns. That is precisely what a columnar engine is built for.

Why not just pandas?

pandas is excellent for interactive, in-memory work on a few matches. It struggles as a warehouse:

Everything is rebuilt in RAM on every run — no persistence.
You hand-write and maintain each aggregation.
Large archives bump into memory limits.

CricketLogic still plays nicely with pandas — db.query() returns rows you can load into a DataFrame — but the storage, parsing and correct aggregation live in DuckDB.

Why not SQLite?

SQLite is row-oriented and tuned for transactional (OLTP) access — many small reads and writes. Analytical scans that aggregate a whole column are slower there. DuckDB is the OLAP counterpart: same “just a file, no server” simplicity, but a vectorized columnar engine underneath.

	DuckDB	SQLite	pandas
Storage	Columnar, on disk	Row-based, on disk	In-memory
Best at	Analytical scans / GROUP BY	Transactional reads/writes	Interactive transforms
Server needed	No	No	No
Persistent warehouse	Yes	Yes	No

Why not a hosted API?

A paid cricket API means per-call costs, rate limits, gated history and usually only summary-level data. DuckDB keeps the raw ball-by-ball rows in a file you own and can query without limits, offline, forever. See the full comparison.

The practical wins

Zero setup — pip install, no database server to run.
One portable file — ipl.duckdb is your entire warehouse; copy it, commit it, share it for reproducible analysis.
Real SQL — window functions, JSON handling (Cricsheet wickets are stored as JSON), CTEs.
Fast — vectorized execution over columnar storage handles the whole archive on a laptop.

Bottom line

DuckDB gives CricketLogic a warehouse that is simultaneously simple (a file), fast (columnar OLAP) and powerful (full SQL). For historical, delivery-level cricket analytics, that is the right engine. Start with the quickstart to see it in action.

Last updated Jul 2, 2026 · Code verified against CricketLogic MIT · Source: https://github.com/cricketlogic/cricketlogic

Try the quickstart ← All posts