CricketLogic vs a pandas-only workflow
Short answer: Use CricketLogic when you want a persistent, SQL-queryable warehouse of Cricsheet matches with correct ball-by-ball aggregation baked in; use pandas when you need one-off, in-memory transforms on a handful of matches.
Side by side
| | CricketLogic | pandas only |
|---|---|---|
| Data model | Persistent DuckDB warehouse (SQL, on disk) | In-memory DataFrames, rebuilt each run |
| Cricsheet parsing | Built in — ingest_directory() | You write the YAML parser yourself |
| Ball-by-ball aggregation | Predefined, correct views | Hand-written groupby per metric |
| Duplicate handling | Automatic (by filename) | Manual dedupe logic |
| Scale | Columnar engine, 20k+ matches easily | Bounded by RAM |
| Interoperability | Query from SQL, CLI, HTTP, or pandas | pandas only |
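To make the "hand-written groupby per metric" row concrete, here is a minimal pandas sketch of one metric, batting strike rate. The DataFrame and its column names (`batter`, `runs_batter`, `is_wide`) are hypothetical and are not Cricsheet's or CricketLogic's actual schema. It shows the kind of rule you have to encode yourself: a wide does not count as a ball faced.

```python
import pandas as pd

# Hypothetical ball-by-ball rows; the column names are illustrative only.
deliveries = pd.DataFrame(
    {
        "batter": ["A", "A", "A", "B", "B", "A"],
        "runs_batter": [4, 0, 1, 0, 6, 0],
        "is_wide": [False, True, False, False, False, False],
    }
)

# A wide is not a ball faced, so exclude wides before counting deliveries.
legal = deliveries[~deliveries["is_wide"]]

# One groupby per metric: runs off the bat and balls faced for each batter.
batting = (
    legal.groupby("batter")
    .agg(runs=("runs_batter", "sum"), balls=("runs_batter", "count"))
    .reset_index()
)
batting["strike_rate"] = (100 * batting["runs"] / batting["balls"]).round(2)

print(batting)
```

Every further metric (economy rate, dot-ball percentage, and so on) needs its own filter and groupby along these lines, which is the work the predefined views are meant to replace.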
Choose CricketLogic when
- ✓ You need a reusable, queryable store of many matches
- ✓ You want correct metrics without writing the aggregation yourself
- ✓ You have multiple consumers (site, API, notebook)
Choose pandas only when
- → You need one-off transforms on a few matches
- → You already have a bespoke pandas pipeline
- → You need a DataFrame in the middle of a notebook
The honest take
CricketLogic and pandas are not mutually exclusive: db.query() returns rows you can load straight into a DataFrame. CricketLogic handles the parsing, storage, and aggregation for you; pandas can take over from there.
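As a rough sketch of that handoff, the snippet below loads query-style rows into a DataFrame. The exact return type of db.query() is not shown on this page, so the `rows` list of tuples and the column names are stand-in assumptions, not CricketLogic's real output.

```python
import pandas as pd

# Stand-in for the result of db.query(...). The row shape (a list of tuples)
# and the column names are assumptions made for illustration.
rows = [
    ("Batter A", 82, 53),
    ("Batter B", 4, 7),
]
columns = ["batter", "runs", "balls"]

# Load the rows into a DataFrame and continue with ordinary pandas.
df = pd.DataFrame(rows, columns=columns)
df["strike_rate"] = (100 * df["runs"] / df["balls"]).round(2)

print(df.sort_values("strike_rate", ascending=False))
```

If the rows come back as dictionaries instead, `pd.DataFrame(rows)` works without the explicit `columns` argument.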
Last updated 2026-07-02 · CricketLogic capabilities verified against README + core.py · Source: https://github.com/cricketlogic/cricketlogic