Features

CricketLogic is a full ingestion-to-query pipeline for ball-by-ball cricket data: it loads Cricsheet YAML into DuckDB, ships predefined analytical views for the most common metrics, and lets you query with SQL or natural language.

How does ingestion work?

CricketDB.ingest_directory() reads every Cricsheet YAML file in a folder and parses it into normalized DuckDB tables: matches, innings, deliveries, players and teams. Matches are keyed by filename, so re-running ingestion skips files that are already loaded.

from cricketlogic import CricketDB

db = CricketDB("cricket.duckdb")
db.ingest_directory("cricsheet/ipl")   # matches, innings, deliveries, players, teams
info = db.get_warehouse_info()          # schema for agents / exploration
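The skip-on-rerun behaviour can be illustrated with a minimal sketch. This is not CricketLogic's implementation: the function below is a hypothetical stand-in, a plain dict plays the role of the matches table, and using the filename stem as the match key is an assumption made for the example.

```python
from pathlib import Path
import tempfile

def ingest_directory(folder, seen):
    """Hypothetical stand-in for CricketDB.ingest_directory(), not the library's code.

    `seen` plays the role of the matches table, keyed by filename stem.
    Returns the match ids ingested by this call.
    """
    new_ids = []
    for path in sorted(Path(folder).glob("*.yaml")):
        match_id = path.stem              # assumption: match key = filename without extension
        if match_id in seen:
            continue                      # already ingested on an earlier run, so skip
        seen[match_id] = path.read_text() # real code would parse the YAML into tables here
        new_ids.append(match_id)
    return new_ids

# Demo: two placeholder files, ingested twice
with tempfile.TemporaryDirectory() as tmp:
    for name in ("1001.yaml", "1002.yaml"):
        Path(tmp, name).write_text("info: {}\n")
    seen = {}
    first_run = ingest_directory(tmp, seen)   # both files are new
    second_run = ingest_directory(tmp, seen)  # nothing new, nothing re-ingested
```

Because the key comes from the filename rather than the file contents, a renamed copy of the same match would be treated as a new match in this sketch.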

What analytical views ship out of the box?

Five predefined SQL views encode the correct ball-by-ball aggregation for the most common cricket metrics, each grouped by team, match_type, competition and gender.

view	key columns
batting_performance	batsman, runs_scored, balls_faced, batting_average, strike_rate, fours, sixes
bowling_performance	bowler, wickets_taken, runs_conceded, economy_rate, bowling_average, bowling_strike_rate
partnerships	batsman1, batsman2, partnership_runs, balls_faced, partnership_wickets
match_summary	match_id, team1, team2, venue, match_date, toss_winner, match_outcome
fantasy_points	player, batting_points, four_points, six_points, total_fantasy_points
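To make the ball-by-ball aggregation concrete, here is a minimal Python sketch of how batting_performance-style columns could be derived from delivery rows. The row layout, the toy data and the rounding are assumptions for illustration; the shipped views are SQL and may handle extras and dismissals differently. The sketch uses the conventional definitions: strike rate is runs per 100 balls faced, batting average is runs per dismissal, and wides do not count as balls faced.

```python
from collections import defaultdict

# Hypothetical delivery rows: (batsman, runs_off_bat, is_wide, dismissed_batsman)
deliveries = [
    ("Player A", 4, False, None),
    ("Player A", 0, True, None),         # wide: not counted as a ball faced
    ("Player A", 6, False, None),
    ("Player A", 1, False, None),
    ("Player A", 0, False, "Player A"),  # dismissal
    ("Player B", 2, False, None),
    ("Player B", 0, False, None),
]

def batting_performance(rows):
    """Aggregate per-batsman totals, then derive strike rate and average."""
    stats = defaultdict(lambda: {"runs_scored": 0, "balls_faced": 0,
                                 "fours": 0, "sixes": 0, "dismissals": 0})
    for batsman, runs, is_wide, dismissed in rows:
        s = stats[batsman]
        s["runs_scored"] += runs
        if not is_wide:
            s["balls_faced"] += 1
        if runs == 4:
            s["fours"] += 1
        elif runs == 6:
            s["sixes"] += 1
        if dismissed is not None:
            stats[dismissed]["dismissals"] += 1
    for s in stats.values():
        # Guard against division by zero: no balls faced, or never dismissed
        s["strike_rate"] = (round(100 * s["runs_scored"] / s["balls_faced"], 2)
                            if s["balls_faced"] else None)
        s["batting_average"] = (round(s["runs_scored"] / s["dismissals"], 2)
                                if s["dismissals"] else None)
    return dict(stats)

result = batting_performance(deliveries)
```

In this toy data Player A scores 11 runs from 4 legal balls, and Player B is never dismissed, so the batting average is left undefined rather than reported as zero.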

How do I query the data?

Everything is plain SQL over DuckDB. Use db.query(sql) from Python, the cricketlogic query CLI, or POST /query on the web service. You can also register your own views with create_custom_view().

-- Best economy bowlers in T20 (min 300 balls bowled)
SELECT bowler, team, wickets_taken, economy_rate
FROM bowling_performance
WHERE match_type = 'T20' AND balls_bowled >= 300
ORDER BY economy_rate ASC
LIMIT 10;
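The effect of the minimum-balls filter is easiest to see on a toy table. The sketch below runs the same query with Python's standard-library sqlite3 module, used here only so the example is self-contained; CricketLogic itself runs on DuckDB. The table layout and rows are hypothetical, and economy rate is computed as runs conceded per six-ball over.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical, simplified stand-in for the bowling_performance view
con.execute("""CREATE TABLE bowling_performance (
    bowler TEXT, team TEXT, match_type TEXT,
    balls_bowled INTEGER, runs_conceded INTEGER,
    wickets_taken INTEGER, economy_rate REAL)""")

rows = [
    ("Bowler A", "Team X", "T20", 600, 650, 30),
    ("Bowler B", "Team Y", "T20", 120, 100, 8),   # below the 300-ball minimum
    ("Bowler C", "Team X", "T20", 360, 480, 15),
    ("Bowler D", "Team Y", "ODI", 900, 700, 25),  # different match type
]
for bowler, team, match_type, balls, runs, wickets in rows:
    economy = round(runs / (balls / 6), 2)        # runs per over
    con.execute("INSERT INTO bowling_performance VALUES (?, ?, ?, ?, ?, ?, ?)",
                (bowler, team, match_type, balls, runs, wickets, economy))

top = con.execute("""
    SELECT bowler, team, wickets_taken, economy_rate
    FROM bowling_performance
    WHERE match_type = 'T20' AND balls_bowled >= 300
    ORDER BY economy_rate ASC
    LIMIT 10
""").fetchall()
```

Bowler B has the lowest economy rate in the table but is excluded for bowling too few balls, which is the point of the threshold: small samples would otherwise dominate the ranking.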

Can I ask questions in natural language?

Yes: two optional DSPy agents turn English into SQL. The query agent answers questions ("top batsmen by runs in ODIs"); the view agent creates a reusable view from a description. Configure DSPy with a Claude (Anthropic) or OpenAI API key; without one, the agents fall back to heuristic matching.

# Natural-language query (DSPy-backed)
cricketlogic ask "Who are the top batsmen by runs in the IPL?"

# Create a reusable view from a description
cricketlogic views create-nl "Batsmen with strike rate above 150 in T20"

Where does the data come from?

A built-in downloader fetches match files directly from cricsheet.org in YAML or JSON. You can select them by recency, match type (Test/ODI/T20) or competition (IPL, BBL, …), so you never hand-manage archives.

cricketlogic download by-competition ipl     # one competition
cricketlogic download by-type t20            # all T20s
cricketlogic download auto-update --days 7   # recent matches + registry
cricketlogic download all                    # 20,000+ matches (~100MB)

How are players identified across matches?

CricketLogic integrates the Cricsheet player registry: 17,550+ players, 8,766 name variations and external keys for 12 sources (ESPNcricinfo, CricketArchive, BCCI and more). Match participants link to registry identifiers so "V Kohli" and "Virat Kohli" resolve to one player.

SELECT p.name, pr.unique_name, pr.key_cricinfo, COUNT(*) AS matches
FROM players p
LEFT JOIN player_registry pr ON p.registry_identifier = pr.identifier
GROUP BY p.name, pr.unique_name, pr.key_cricinfo;
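The idea behind the registry link can be sketched in a few lines of Python: every known spelling of a name maps to one identifier, and lookups go through that index. The identifier value and the dict layout below are placeholders invented for the example, not the registry's real schema or keys.

```python
# Hypothetical registry rows: identifier -> canonical name plus known variations
registry = {
    "id-001": {"unique_name": "Virat Kohli", "variations": ["V Kohli"]},
}

def build_name_index(registry):
    """Map every known spelling (case-insensitive) to its registry identifier."""
    index = {}
    for identifier, entry in registry.items():
        for name in [entry["unique_name"], *entry["variations"]]:
            index[name.casefold()] = identifier
    return index

def resolve(name, index):
    """Return the registry identifier for a name, or None if it is unknown."""
    return index.get(name.strip().casefold())

index = build_name_index(registry)
short_form = resolve("V Kohli", index)
full_form = resolve("Virat Kohli", index)
unknown = resolve("Not In Registry", index)
```

Both spellings resolve to the same identifier, while an unregistered name returns None, which corresponds to the NULL registry columns a LEFT JOIN produces. A real registry also has to cope with different players who share a name, which this sketch ignores.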

Last updated 2026-07-02 · Feature set verified against cricketlogic/core.py and README · Source: https://github.com/cricketlogic/cricketlogic