Predicting the likelihood of a penalty during a match is more art than simple arithmetic. Models that estimate penalty probability draw on subtle event patterns — the way a team moves the ball, how often attackers tangle with defenders, and which moments invite contact. This article digs into three practical indicators — touches, crosses, and dribbling — and explains how they function inside a robust penalty probability model.
What a penalty probability model tries to capture
A penalty probability model estimates the chance that a given phase of play or specific action will end with the referee awarding a penalty. Unlike shot models that focus on expected goals, a penalty model emphasizes contact, proximity, and referee context: how crowded the area is, whether the attacker is beating a defender, and whether the play involves the kind of physical engagement that historically produces penalties.
These models are probabilistic rather than deterministic. They don’t claim a foul will definitely occur; instead they quantify risk — a continuous probability that can be compared across players, teams, and situations. That probability becomes actionable for coaches, analysts, and broadcasters when tied to the right indicators and reliable data.
Why touches, crosses, and dribbling matter
Touches inside or near the penalty area measure where attackers are interacting with defenders and the ball. A higher concentration of touches often means more defensive pressure, tighter spaces, and more contact opportunities, all conditions that elevate penalty risk. Tracking touches by zone helps separate harmless possession from tense, contact-prone sequences.
Crosses and dribbling are dynamic actions that change how defenders must respond. A driven cross into a crowded box compresses space and raises the chance of inadvertent handball or a tug during aerial contests. Dribblers force defenders onto their heels, increasing one-on-one contact that frequently leads to fouls. Both actions are therefore natural flags for a model seeking penalty signals.
Touches: quantity, location, and buildup context
Not all touches are equal. A touch near the corner flag has little to do with penalties, while a touch inside the six-yard box matters a great deal. A useful feature set separates touches by pitch zone: inside the six-yard box, inside the 18-yard box, central corridor, and wide areas. Weighting touches closer to goal more heavily reflects the empirical relationship between proximity and fouls.
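The zone split above can be sketched in a few lines. This is a minimal illustration, assuming a 105 x 68 m pitch with the attack running toward x = 105; the box dimensions follow the standard pitch markings, while the zone names and `ZONE_WEIGHTS` values are hypothetical placeholders that a real model would estimate from data.

```python
from collections import Counter

# Assumed coordinate system: 105 x 68 m pitch, attacking toward x = 105.
PITCH_LENGTH = 105.0
PITCH_WIDTH = 68.0

# Hypothetical weights for illustration only; fit these from data in practice.
ZONE_WEIGHTS = {
    "six_yard": 3.0,
    "penalty_area": 2.0,
    "central_corridor": 1.0,
    "wide": 0.5,
}


def touch_zone(x, y):
    """Classify a touch into one of four zones relative to the attacked goal."""
    depth = PITCH_LENGTH - x           # distance from the attacked goal line
    offset = abs(y - PITCH_WIDTH / 2)  # lateral distance from the centre line
    if depth <= 5.5 and offset <= 9.16:    # goal area: 5.5 m deep, 18.32 m wide
        return "six_yard"
    if depth <= 16.5 and offset <= 20.16:  # penalty area: 16.5 m deep, 40.32 m wide
        return "penalty_area"
    if offset <= 20.16:                    # same width as the box, but outside it
        return "central_corridor"
    return "wide"


def weighted_touch_score(touches):
    """Sum zone weights over a list of (x, y) touch locations."""
    counts = Counter(touch_zone(x, y) for x, y in touches)
    return sum(ZONE_WEIGHTS[zone] * n for zone, n in counts.items())
```

Keeping the per-zone counts as separate features, rather than collapsing them into one score, lets the model learn the weights itself; the single score is mainly useful for quick exploration.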
Beyond counts, temporal context matters. A sudden cluster of touches in the box over a few seconds — a “danger phase” — raises penalty probability more than solitary touches spread across long possession. Models that account for touch density and the sequence that led to the touch better capture the physical contest that produces penalties.
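One way to express touch density is a trailing-window count over the timestamps of touches in the box. The sketch below assumes timestamps in seconds; the 6-second window and the three-touch threshold for a "danger phase" are illustrative choices, not established cut-offs.

```python
def touch_density(timestamps, window=6.0):
    """For each touch at time t, count touches in the trailing window (t - window, t]."""
    times = sorted(timestamps)
    counts = []
    start = 0  # index of the earliest touch still inside the window
    for i, t in enumerate(times):
        while times[start] <= t - window:
            start += 1
        counts.append(i - start + 1)
    return counts


def has_danger_phase(timestamps, window=6.0, min_touches=3):
    """Flag a possession if any trailing window holds at least min_touches touches."""
    return any(c >= min_touches for c in touch_density(timestamps, window))
```

The two-pointer loop keeps the computation linear after sorting, which matters when the feature is recomputed for every possession in a season of event data.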
Crosses: vector, target, and aerial contests
Crosses differ by trajectory and target. A lofted cross aimed at a penalty-area cluster invites aerial challenges; driven low crosses create rapid 50/50s near the goalkeeper. Feature engineering should record cross type, intended target zone, delivery speed, and the number of defenders and attackers present in the immediate area. These variables help separate routine, low-risk crosses from those that historically lead to penalties.
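Counting bodies around the delivery point is straightforward when player positions are available. The sketch below assumes tracking-style (x, y) coordinates in metres at the moment of the cross; the function name and the 5 m radius are hypothetical.

```python
import math


def cross_crowding(target, attackers, defenders, radius=5.0):
    """Count attackers and defenders within `radius` metres of the cross target."""

    def near(player):
        return math.hypot(player[0] - target[0], player[1] - target[1]) <= radius

    n_att = sum(1 for p in attackers if near(p))
    n_def = sum(1 for p in defenders if near(p))
    return {
        "attackers_near_target": n_att,
        "defenders_near_target": n_def,
        "total_near_target": n_att + n_def,
    }
```

With event data alone, these counts are usually unavailable, so a coarser proxy such as the number of recorded receivers in the box may have to stand in.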
Important interactions exist between crosses and touches. A cross followed by multiple touches inside the box — especially when defenders are forced to make last-ditch clearances — correlates with higher penalty incidence. Modeling these interactions as short event sequences (cross → touch(s) → shot or scramble) improves predictive power compared with treating crosses as isolated events.
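A simple version of this sequence feature counts box touches that follow each cross within a short horizon. The event schema here (dicts with `t`, `type`, and `in_box` keys) and the 5-second horizon are assumptions for illustration; real provider feeds will use different field names.

```python
def box_touches_after_cross(events, horizon=5.0):
    """For each cross, count in-box touches that follow within `horizon` seconds."""
    ordered = sorted(events, key=lambda e: e["t"])
    features = []
    for i, ev in enumerate(ordered):
        if ev["type"] != "cross":
            continue
        n_touches = 0
        for later in ordered[i + 1:]:
            if later["t"] - ev["t"] > horizon:
                break  # outside the time horizon
            if later["type"] == "cross":
                break  # a new delivery starts its own sequence
            if later["type"] == "touch" and later.get("in_box", False):
                n_touches += 1
        features.append({"cross_t": ev["t"], "box_touches_after": n_touches})
    return features
```

The resulting count can be joined back onto the cross record so that the model sees the cross and its aftermath as one unit.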
Dribbling: beat-success, contact likelihood, and defender displacement
Successful dribbles that beat a defender often leave the attacker unbalanced or shielded by momentum, creating situations ripe for contact and trips. Useful dribbling features include whether the dribble beat a defender, the direction relative to goal, entry speed, and whether the action pulls a defender inside the box. Dribbles originating close to the penalty area and directed inward carry elevated penalty probabilities.
Context remains crucial: a skilled winger cutting inside past an isolated fullback is a different scenario from a crowded central dribble against three defenders. Incorporating defender proximity, angle of attack, and whether the attacker’s body position shields the ball helps the model judge how likely contact is to be judged a foul.
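Several of these dribbling features reduce to basic geometry. The sketch below assumes a 105 x 68 m pitch attacked toward x = 105, start and end points in metres, and defender positions taken at the end of the dribble; the function and key names are hypothetical.

```python
import math

GOAL_CENTRE = (105.0, 34.0)  # assumed centre of the attacked goal


def dribble_features(start, end, duration_s, defenders):
    """Derive speed, direction relative to goal, box entry, and defender proximity."""
    dx, dy = end[0] - start[0], end[1] - start[1]
    distance = math.hypot(dx, dy)
    gx, gy = GOAL_CENTRE[0] - start[0], GOAL_CENTRE[1] - start[1]
    goal_dist = math.hypot(gx, gy)

    # Angle between the dribble direction and the line to goal (0 = straight at goal).
    if distance > 0 and goal_dist > 0:
        cos_angle = (dx * gx + dy * gy) / (distance * goal_dist)
        angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
    else:
        angle = None  # direction is undefined for a zero-length dribble

    ends_in_box = (105.0 - end[0]) <= 16.5 and abs(end[1] - 34.0) <= 20.16
    nearest = min(
        (math.hypot(end[0] - d[0], end[1] - d[1]) for d in defenders),
        default=None,  # no defender positions supplied
    )
    return {
        "speed_mps": distance / duration_s if duration_s > 0 else None,
        "angle_to_goal_deg": angle,
        "ends_in_box": ends_in_box,
        "nearest_defender_m": nearest,
    }
```

Body orientation and shielding are not recoverable from coordinates like these and would need video tagging or pose data.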
Feature table: practical indicators to engineer
| Indicator | Example feature | Why it matters |
|---|---|---|
| Touches | Touches inside 6-yard/18-yard box; touch density in last 6 seconds | Proximity and clustering increase contact probability |
| Crosses | Cross type (low-driven vs lofted); receivers in box; cross speed | Crosses create aerial/close-range contests with high foul potential |
| Dribbling | Successful take-ons near penalty area; defender distance post-dribble | Dribbles create one-on-one contact and disrupt defensive shape |
Modeling approaches and practical choices
Start with interpretable models—logistic regression, regularized GLMs, or simple tree ensembles—to establish baseline effects and avoid black-box surprises. These methods make it easy to inspect coefficients for touches, crosses, and dribbling features and to verify that model behavior matches domain intuition. Once the baseline is stable, consider gradient-boosted trees or neural sequence models to capture complex interactions.
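To make the baseline concrete, here is a dependency-free sketch of an L2-regularized logistic regression fitted by batch gradient descent. In practice a library implementation would be the sensible choice; this version only shows the mechanics. The feature order (box touches, crosses, dribbles into the box), the hyperparameters, and the six toy rows are all invented for illustration and carry no empirical meaning.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(w, b, x):
    """Penalty probability for one feature vector x."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def fit_logistic(X, y, l2=0.1, lr=0.1, epochs=2000):
    """Fit weights w and intercept b by batch gradient descent (intercept unpenalized)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * (gw / n + l2 * wj) for wj, gw in zip(w, grad_w)]
    return w, b


# Synthetic rows for illustration only: [box_touches, crosses, dribbles_into_box]
X_toy = [[0, 0, 0], [1, 0, 0], [3, 1, 1], [4, 2, 1], [0, 1, 0], [5, 1, 2]]
y_toy = [0, 0, 1, 1, 0, 1]
w_toy, b_toy = fit_logistic(X_toy, y_toy)
```

Reading `w_toy` feature by feature is the inspection step described above: a coefficient with an unexpected sign is a prompt to revisit the feature definition before moving to a more flexible model.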
Sequence modeling is especially valuable for penalties because many relevant signals unfold over several events. Recurrent networks or Markov-style features that encode recent event sequences (e.g., cross→touch→dribble→shot) allow the model to learn typical paths that culminate in a foul. Combining spatial embeddings with temporal sequences tends to give a richer representation of those contested phases than either alone.
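Markov-style features can be built without any neural machinery. The sketch below counts event n-grams and estimates first-order transition probabilities from sequences of event-type labels; the label strings are placeholders, and any sequences passed in are assumed to be already segmented by possession.

```python
from collections import Counter, defaultdict


def event_ngrams(event_types, n=3):
    """Count consecutive runs of n event types within one possession."""
    return Counter(
        tuple(event_types[i:i + n]) for i in range(len(event_types) - n + 1)
    )


def transition_probs(sequences):
    """Estimate P(next event | current event) from a list of event-type sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    return {
        current: {nxt: c / sum(following.values()) for nxt, c in following.items()}
        for current, following in counts.items()
    }
```

N-gram counts can be fed to a tabular model as sparse features, while the transition table gives a quick read on which event types most often precede a penalty award in the training data.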
Evaluation: metrics that reflect the right goals
Because penalties are rare, evaluation must guard against class imbalance. Use precision-recall curves alongside ROC to understand how well the model identifies high-risk situations. Calibration metrics — Brier score and calibration plots — are essential because the model’s output should be interpreted as a probability, not just a ranking.
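The metrics named above are simple enough to compute by hand, which helps when checking a library's output. This sketch assumes binary labels (1 = penalty awarded) and predicted probabilities in [0, 1]; the equal-width binning in `calibration_table` is one common choice among several.

```python
def brier_score(probs, labels):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(labels)


def precision_recall(probs, labels, threshold):
    """Precision and recall when flagging every prediction >= threshold."""
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    fn = sum(1 for p, y in zip(probs, labels) if p < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def calibration_table(probs, labels, n_bins=5):
    """Compare mean predicted probability with observed rate in equal-width bins."""
    bins = {}
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins.setdefault(idx, []).append((p, y))
    table = []
    for idx in sorted(bins):
        pairs = bins[idx]
        table.append({
            "bin": idx,
            "mean_predicted": sum(p for p, _ in pairs) / len(pairs),
            "observed_rate": sum(y for _, y in pairs) / len(pairs),
            "count": len(pairs),
        })
    return table
```

Sweeping `threshold` over a grid and plotting the resulting pairs gives the precision-recall curve; with very rare positives, most calibration bins at the high end will be sparsely populated, so the `count` column deserves as much attention as the rates.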
Cross-validation that respects temporal structure (time-based folds) prevents leakage between training and test sets when matches are correlated. Also track subgroup performance: are predictions equally well-calibrated across types of plays (e.g., crosses vs. dribbles) and across referees or competitions? These checks reveal bias and guide feature refinements.
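A time-respecting split can be sketched as an expanding window over matches. The function below is a hypothetical helper, assuming one sortable date per match (ISO strings work); it splits by match rather than by event so that a single match never straddles training and test sets.

```python
def expanding_window_folds(match_dates, n_folds=3):
    """Yield (train_idx, test_idx) pairs where training matches never postdate test matches."""
    order = sorted(range(len(match_dates)), key=lambda i: match_dates[i])
    fold_size = len(order) // (n_folds + 1)
    if fold_size == 0:
        raise ValueError("not enough matches for the requested number of folds")
    for k in range(1, n_folds + 1):
        train = order[: k * fold_size]
        # The final fold absorbs any leftover matches.
        end = (k + 1) * fold_size if k < n_folds else len(order)
        test = order[k * fold_size: end]
        yield train, test
```

Matches played on the same date can land on either side of a boundary, so where that matters the split points should be moved to date boundaries.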
Data challenges, bias, and referee context
Event data systems vary in tagging consistency: what one provider marks as a “dribble” another might call a “take-on.” Aligning definitions across seasons or suppliers is a nontrivial preprocessing step. Spatial noise also matters — positioning inaccuracies of a few meters can change whether a touch is labeled inside the box.
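Both problems can be handled explicitly in preprocessing. The sketch below maps raw labels onto a canonical vocabulary and marks touches near the box edge as uncertain instead of forcing an inside/outside call. The label strings, the canonical names, and the 2 m tolerance are hypothetical and do not describe any particular provider's schema; the pitch is assumed to be 105 x 68 m, attacked toward x = 105.

```python
# Hypothetical mapping from normalized raw labels to canonical event types.
CANONICAL_EVENT = {
    "dribble": "take_on",
    "take on": "take_on",
    "cross": "cross",
    "touch": "touch",
}


def canonical_event(raw_label):
    """Normalize case and separators, then look up the canonical event type."""
    key = raw_label.strip().lower().replace("-", " ").replace("_", " ")
    if key not in CANONICAL_EVENT:
        raise ValueError(f"unmapped event label: {raw_label!r}")
    return CANONICAL_EVENT[key]


def box_membership(x, y, tol=2.0):
    """Return 'inside', 'outside', or 'uncertain' for the attacked penalty area.

    The margin is the distance to the nearest of the box's front and side
    edges, positive inside. For points outside, it is minus the larger
    per-axis overshoot, an approximation of the true distance near the corners.
    """
    margin = min(x - 88.5, y - 13.84, 54.16 - y)
    if margin > tol:
        return "inside"
    if margin < -tol:
        return "outside"
    return "uncertain"
```

Raising on unmapped labels is deliberate: silently passing unknown event types through tends to hide exactly the cross-provider mismatches this step is meant to catch.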
Referee behavior is a confounding factor. Certain referees award more penalties or call fouls differently depending on game state. Incorporating referee identity, historical strictness, and home/away tendencies reduces unexplained variance. Models that ignore officiating context risk attributing referee-driven patterns to player actions instead.
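Historical strictness can be encoded as a per-referee penalty rate, but raw rates are noisy for officials with few matches. The sketch below shrinks each referee's rate toward the overall rate using additive smoothing; this is one simple option, and the `prior_matches` strength is a hypothetical tuning parameter.

```python
def referee_penalty_rates(records, prior_matches=20.0):
    """Smoothed penalties-per-match rate for each referee.

    `records` is an iterable of (referee_id, penalties_awarded_in_match).
    Each rate is pulled toward the overall rate as if the referee had
    officiated `prior_matches` extra matches at that overall rate.
    """
    totals = {}
    for referee, penalties in records:
        matches, awarded = totals.get(referee, (0, 0))
        totals[referee] = (matches + 1, awarded + penalties)
    if not totals:
        return {}

    all_matches = sum(m for m, _ in totals.values())
    overall_rate = sum(p for _, p in totals.values()) / all_matches
    return {
        referee: (awarded + prior_matches * overall_rate) / (matches + prior_matches)
        for referee, (matches, awarded) in totals.items()
    }
```

To avoid leakage, the rates used for a given match should be computed only from matches played before it.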
Applications and an analyst’s perspective
Teams can use penalty probability models to shape attacking strategy: target areas and plays that historically raise the likelihood of a penalty, or identify players who draw more fouls and thus deserve more touches in high-leverage zones. Broadcasters and bettors find value too, translating live probabilities into narratives and micro-markets.
From my experience building event-level classifiers, the most impactful step is iterative feature design informed by video review. I’ve seen minor changes — distinguishing low-driven from lofted crosses or separating dribbles that draw two defenders — produce measurable gains. Always verify model signals with real clips; the statistics must map to observable moments on the pitch.
Sources and experts
For readers who want to dive deeper into xG, event data, and football analytics, these organizations and analysts provide foundational material and ongoing research.
- StatsBomb — in-depth articles and open data discussions on event modeling and shot quality.
- Opta / StatsPerform — major provider of event and tracking data used in professional modeling.
- FiveThirtyEight soccer coverage — accessible explanations of probabilistic models in football.
- Michael Caley — independent analyst with practical pieces on expected goals and event analytics.


