Typical mistakes in football analytics

Football analytics has moved from hobbyist spreadsheets to season-defining decisions, but the field is still littered with predictable errors. Teams and analysts who chase shiny metrics or ignore context can draw confident, but wrong, conclusions. This article walks through the most common traps I see in the industry and offers practical ways to avoid them.

Ignoring context and tactical nuance

Numbers are blunt instruments when taken out of the game’s context. A full-back’s low number of progressive carries might look like poor attacking output, but if the coach asks that player to conserve width and maintain a specific defensive shape, the raw metric tells the wrong story.

Context includes opposition quality, match state, tactical role, and even weather or pitch conditions. Good analysis layers quantitative measures onto qualitative scouting and video review; omitting tactical nuance is a leading cause of misinterpretation.

Overreliance on single metrics

Relying on one stat as a performance proxy creates blind spots. Expected goals (xG), for example, is a powerful tool but not a catch-all: on its own it doesn’t capture defensive positioning, off-ball movement, or finishing quality.

Instead of treating a single number as gospel, combine complementary metrics and craft composite indicators that reflect the question at hand. I once saw a scouting report that ranked prospects purely by xG — a striker with poor spatial intelligence and a high xG total accumulated in garbage time finished near the top until video review exposed the flaw.
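As an illustration of what a composite indicator can look like, the sketch below standardizes each metric to z-scores and blends them with explicit weights. The player values, metric names, and weights are all hypothetical, chosen only to show the mechanics — a real composite would be built with end users, as discussed later.

```python
from statistics import mean, stdev

def z_scores(values):
    """Standardize a list of raw values to mean 0, stdev 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def composite(metrics, weights):
    """Weighted sum of z-scored metrics.

    metrics: dict of metric name -> list of per-player values
    weights: dict of metric name -> weight (should sum to 1)
    """
    standardized = {k: z_scores(v) for k, v in metrics.items()}
    n = len(next(iter(metrics.values())))
    return [sum(weights[k] * standardized[k][i] for k in metrics)
            for i in range(n)]

# Hypothetical per-90 numbers for three strikers (illustrative only)
metrics = {
    "xG": [0.55, 0.40, 0.30],
    "pressures": [10.0, 18.0, 22.0],
    "key_passes": [1.2, 2.0, 0.8],
}
weights = {"xG": 0.5, "pressures": 0.25, "key_passes": 0.25}
scores = composite(metrics, weights)
```

Because each metric is standardized first, no single stat dominates simply by having larger raw units; the weights make the trade-offs explicit and debatable.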

Poor data quality and selection bias

Bad inputs produce bad outputs. Incomplete event data, inconsistent tagging across providers, or manually entered datasets with hidden errors all skew models and dashboards. Analysts too often assume their dataset is clean without performing basic validation checks.

Selection bias also creeps in when analysts sample only certain competitions, minutes, or players. For instance, evaluating players solely on highlight-reel matches will overestimate typical performance. Always audit data, document gaps, and report confidence intervals where the sample is thin.
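One lightweight way to report uncertainty on a thin sample is a Wilson score interval for a rate such as shot conversion. The sketch below uses only the standard library; the 4-goals-from-11-shots figures are invented for illustration.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """95% Wilson score interval for a proportion; behaves well for small n."""
    if n == 0:
        return (0.0, 1.0)
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = (z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))) / denom
    return (centre - half, centre + half)

# 4 goals from 11 shots: the point estimate looks elite, the interval does not
lo, hi = wilson_interval(4, 11)
```

The interval spans from well below average to world-class, which is exactly the honest message a dashboard should carry when the sample is eleven shots.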

Misusing statistical significance and small samples

Football is noisy and season-to-season variance is high; small-sample comparisons are especially misleading. Declaring a player “significantly better” after ten matches misunderstands both variance and the concept of statistical significance.

Analysts should report effect sizes and uncertainty, not binary statements of significance. Bayesian approaches and shrinkage estimators (which pull extreme estimates toward the mean) are practical tools to reduce overfitting to short runs.
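A minimal sketch of the shrinkage idea uses the posterior mean of a Beta-binomial model: the raw rate is pulled toward an assumed league average, with a prior strength that acts like pseudo-attempts. The league rate and prior weight below are illustrative assumptions, not fitted values.

```python
def shrink(successes, attempts, prior_rate, prior_strength):
    """Shrink a raw rate toward a prior (Beta-binomial posterior mean).

    prior_strength acts like pseudo-attempts: small samples get pulled
    hard toward prior_rate, large samples barely move.
    """
    return (successes + prior_rate * prior_strength) / (attempts + prior_strength)

league_rate = 0.10   # assumed league-average conversion (illustrative)
strength = 50        # assumed prior weight in pseudo-shots (illustrative)

hot_streak = shrink(5, 12, league_rate, strength)   # raw 5/12 ≈ 0.417
veteran = shrink(60, 500, league_rate, strength)    # raw 60/500 = 0.120
```

The hot streak collapses most of the way back to the league rate, while the 500-shot veteran's estimate barely moves — which is the behaviour you want when comparing a ten-match run against a full career.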

Cherry-picking and confirmation bias

It’s human to look for evidence that supports a favored hypothesis, and analytics is no exception. Cherry-picking games, clips, or time ranges to prove a point leads to recommendations that fall apart when applied broadly.

Instituting pre-registered analysis plans, blind reviews, or peer checks helps mitigate bias. In a past role I introduced a rule: every recruitment claim had to be accompanied by a counterexample where the metric failed — that single practice cut down on overconfident hires.

Ignoring multicollinearity and model assumptions

Many football variables correlate heavily; passing volume, touches, and possession often move together, which can inflate model coefficients and misattribute importance. Multicollinearity makes it hard to identify causal drivers from collinear predictors.

Check correlation matrices, use dimensionality reduction like PCA sparingly, or prefer regularized regression techniques (e.g., Lasso, Ridge) that penalize complexity. Also make assumptions explicit: linear models assume relationships that real-world player interactions may violate.
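To make the point concrete, the sketch below runs a correlation check and a closed-form ridge fit for two already-centred predictors. The toy data are constructed so the predictors are almost collinear: with no penalty the coefficients split the shared importance erratically, while a modest ridge penalty stabilizes them.

```python
def pearson(x, y):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def ridge_2feat(x1, x2, y, lam):
    """Closed-form ridge for two centred predictors:
    beta = (X'X + lam * I)^-1 X'y, solved by hand for the 2x2 case."""
    a = sum(v * v for v in x1) + lam
    b = sum(u * v for u, v in zip(x1, x2))
    d = sum(v * v for v in x2) + lam
    g1 = sum(u * t for u, t in zip(x1, y))
    g2 = sum(v * t for v, t in zip(x2, y))
    det = a * d - b * b
    return ((d * g1 - b * g2) / det, (a * g2 - b * g1) / det)

# Toy centred data: "passing volume" and "touches" are nearly collinear
x1 = [-2.0, -1.0, 0.0, 1.0, 2.0]
x2 = [-2.1, -0.9, 0.1, 0.9, 2.0]
y  = [-2.2, -0.8, 0.0, 1.1, 1.9]   # hypothetical outcome

r = pearson(x1, x2)                   # near-perfect collinearity
ols = ridge_2feat(x1, x2, y, 0.0)     # unpenalized: erratic split of credit
ridge = ridge_2feat(x1, x2, y, 1.0)   # penalized: stable, similar coefficients
```

The unpenalized fit attributes very different importance to two almost-identical predictors; the ridge fit spreads credit evenly, which is a more honest answer when the data cannot distinguish them.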

Overfitting and excessive model complexity

Complex models can fit past matches perfectly and still fail miserably on new data. Overfitting happens when a model captures noise instead of signal — common when teams tune dozens of hyperparameters on limited match data.

Cross-validation, holdout seasons, and simplicity as a design principle reduce this risk. When building predictive models I favor fewer features with clear interpretability over opaque ensembles unless the performance gains justify the complexity.
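A holdout-by-season loop can be sketched in a few lines. The model here is deliberately trivial (a mean baseline scored by mean absolute error) and the season data are made up, since the point is the validation structure, not the model.

```python
def season_cv(matches, fit, score):
    """Leave-one-season-out validation.

    matches: list of (season, features, target) tuples
    fit:     callable(train_matches) -> model
    score:   callable(model, test_matches) -> error
    """
    seasons = sorted({s for s, *_ in matches})
    results = {}
    for held_out in seasons:
        train = [m for m in matches if m[0] != held_out]
        test = [m for m in matches if m[0] == held_out]
        results[held_out] = score(fit(train), test)
    return results

# Invented per-match goal totals; features omitted for brevity
data = [("2021", None, 1.4), ("2021", None, 1.6),
        ("2022", None, 1.1), ("2022", None, 1.3),
        ("2023", None, 1.8), ("2023", None, 2.0)]

fit = lambda train: sum(t for *_, t in train) / len(train)           # mean baseline
mae = lambda model, test: sum(abs(t - model) for *_, t in test) / len(test)
out = season_cv(data, fit, mae)   # out-of-sample error per held-out season
```

Grouping the folds by season (rather than shuffling individual matches) is what keeps the test honest: a model never gets to peek at other matches from the season it is asked to predict.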

Confusing correlation with causation

Finding that a team concedes more after 70 minutes does not prove a particular training drill causes fatigue-related breakdowns. Without experimental design or causal inference techniques, correlations can mislead strategic decisions.

Use natural experiments, instrumental variables, or randomized controlled trials where feasible. Even small randomized variations in training load or tactical instructions can yield insight that pure observational analysis cannot.
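A small simulation shows how a hidden confounder manufactures a correlation with zero causal effect. Here an unobserved "fixture congestion" variable drives both the use of a (causally inert) training drill and the probability of conceding; all numbers are invented for illustration.

```python
import random

random.seed(42)

n = 5000
with_drill, without_drill = [], []
for _ in range(n):
    congestion = random.random()            # unobserved confounder
    uses_drill = congestion > 0.5           # tired squads run the drill more
    p_concede = 0.2 + 0.4 * congestion      # congestion alone raises risk;
    conceded = random.random() < p_concede  # the drill has NO causal effect
    (with_drill if uses_drill else without_drill).append(conceded)

rate_with = sum(with_drill) / len(with_drill)
rate_without = sum(without_drill) / len(without_drill)
# A naive comparison blames the drill: rate_with sits well above rate_without
```

By construction the drill does nothing, yet the observational comparison would flag it as harmful — exactly the trap that randomizing who runs the drill, or conditioning on congestion, would expose.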

Poor communication and misleading visualizations

Even sound analysis falls flat if it’s presented badly. Dense tables, unlabelled axes, or color choices that distort perception turn insights into confusion rather than action. Analysts often forget that their audience ranges from technical scouts to non-technical coaches.

Design visuals for clarity: use intuitive color scales, annotate key takeaways, and prepare short written summaries that answer the strategic question. I’ve found that a simple annotated shot map paired with a one-paragraph implication drives decisions faster than ten pages of numbers.

Failing to validate in the real world

Models and metrics are tools to inform decisions, not replace them. Implementing changes without piloting them in training or friendly matches risks tactical collapse in competitive fixtures. Analytics should feed experiments that test interventions at low cost.

Run A/B style pilots where possible. For scouting, test signings with short-term loans or structured observation periods. When I recommended a pressing system change, the team trialed it in pre-season and adjusted the thresholds based on match feedback — that reduced in-season risk.

Neglecting human factors and integration

Data does not operate in a vacuum: players, coaches, medical staff, and managers all interpret and act on information differently. Failure to integrate analytics into existing workflows leads to resistance and poor uptake.

Build relationships, educate stakeholders at the right technical level, and co-create metrics with end users so outputs are actionable. Analytics that align with coaching philosophies and deployment realities will actually influence matches.

Practical checklist to avoid these pitfalls

Start with a clear question and pick metrics that directly address it rather than metrics you happen to have. Document data provenance, report uncertainty, and prefer simpler models that generalize well.

Validate with cross-season holdouts and real-world pilots, include qualitative checks like video review, and present results in concise, coach-friendly formats. Encourage peer review and require counterexamples before making irreversible decisions.

Finally, treat analytics as an iterative partnership with the club. The best teams use numbers to inform conversation, not to end it; they test, learn, and adapt continuously.

Quick do / don’t checklist

The following short list is intended as a practical reminder when you start a new project:

  • Do verify data sources and tag consistency; don’t accept datasets without basic QA.
  • Do quantify uncertainty and sample size limits; don’t make definitive claims from small samples.
  • Do combine metrics with video and scouting; don’t let a single stat dictate decisions.
  • Do pilot changes in controlled settings; don’t deploy untested model-driven strategies in competitive matches.

Analytics in football has matured, but the human tendency to overinterpret, oversimplify, or overfit remains. Avoiding these common mistakes requires discipline, transparency, and a willingness to test ideas in the real world. When done thoughtfully, analytics amplifies insight rather than replacing judgment — and that balance is where the best teams find value.
