Why DuckDB for cricket analytics
In short: DuckDB is an embedded, columnar OLAP engine — it runs in-process with no server, stores the whole warehouse in one file, and executes analytical GROUP BY queries over millions of delivery rows fast. That combination fits cricket ball-by-ball analytics far better than pandas, SQLite or a paid API.
The workload: aggregations over millions of balls
Cricket analytics is an OLAP workload. A single IPL season is tens of thousands of deliveries; the full Cricsheet archive is millions of rows. Almost every question — strike rates, economy, partnerships, fantasy points — is a GROUP BY that scans a lot of rows but touches only a few columns. That is precisely what a columnar engine is built for.
Why not just pandas?
pandas is excellent for interactive, in-memory work on a few matches. It struggles as a warehouse:
- Everything is rebuilt in RAM on every run — no persistence.
- You hand-write and maintain each aggregation.
- Large archives bump into memory limits.
CricketLogic still plays nicely with pandas — db.query() returns rows you can load into a DataFrame — but the storage, parsing and correct aggregation live in DuckDB.
Why not SQLite?
SQLite is row-oriented and tuned for transactional (OLTP) access — many small reads and writes. Analytical scans that aggregate a whole column are slower there. DuckDB is the OLAP counterpart: same “just a file, no server” simplicity, but a vectorized columnar engine underneath.
| DuckDB | SQLite | pandas | |
|---|---|---|---|
| Storage | Columnar, on disk | Row-based, on disk | In-memory |
| Best at | Analytical scans / GROUP BY | Transactional reads/writes | Interactive transforms |
| Server needed | No | No | No |
| Persistent warehouse | Yes | Yes | No |
Why not a hosted API?
A paid cricket API means per-call costs, rate limits, gated history and usually only summary-level data. DuckDB keeps the raw ball-by-ball rows in a file you own and can query without limits, offline, forever. See the full comparison.
The practical wins
- Zero setup —
pip install, no database server to run. - One portable file —
ipl.duckdbis your entire warehouse; copy it, commit it, share it for reproducible analysis. - Real SQL — window functions, JSON handling (Cricsheet wickets are stored as JSON), CTEs.
- Fast — vectorized execution over columnar storage handles the whole archive on a laptop.
Bottom line
DuckDB gives CricketLogic a warehouse that is simultaneously simple (a file), fast (columnar OLAP) and powerful (full SQL). For historical, delivery-level cricket analytics, that is the right engine. Start with the quickstart to see it in action.
Last updated Jul 2, 2026 · Code verified against CricketLogic MIT · Source: https://github.com/cricketlogic/cricketlogic