NBA Roster Network Analysis • Shipped 2025

Predicting team resilience through salary-weighted player connections

Role

Data Scientist & Network Engineer

Timeline

Jul-Oct 2025

Team

Luke Blommesteyn

Yuvraj Sharma

Skills

Network Science

Machine Learning

Data Visualization

Python

Overview

What if we could predict which NBA rosters would collapse under pressure—before injuries even happen?

Traditional salary analysis treats players as isolated assets on a spreadsheet. But rosters aren't collections of individuals—they're networks of on-court partnerships. This project reframes roster construction as a network design problem to reveal hidden structural patterns.

Solution

A network-based framework that quantifies roster resilience through topology and stress-testing simulations.

Intuitive Network Visualization: See how salary flows through on-court partnerships—not just how much each player makes.

Roster Resilience Score (RRS): Simulate injuries to stars, role players, and connectors. Quantify expected performance drop.

Predictive Modeling Pipeline: Machine learning models that identify which roster structures survive disruptions.

Outcomes

Presented at Carnegie Mellon Sports Analytics Conference (CMSAC) 2025

Network features improved playoff prediction by about 2.4 percentage points of Macro-F1 over a net-rating-only baseline.

Framework identifies structural vulnerabilities invisible to traditional analysis.

Reproducible pipeline processing 240k+ possession windows across 5 NBA seasons.

Initial Observations

Traditional roster analysis misses how players actually connect on the court.

Looking at NBA salary data, something stood out: two teams could spend identically but have completely different resilience to injuries. The missing piece wasn't how much they spent—it was how those dollars connected on the court.

The Problem: Salary cap sheets show individual contracts but not the network of who plays with whom. When a star gets injured, some teams collapse while others adapt. Why?

Key Insight: The structure of on-court partnerships matters as much as individual talent. Network topology could quantify this.

The Data Challenge

Combining messy real-world sports data into a unified network representation.

Three complex data sources needed to align: play-by-play tracking (Cleaning the Glass, Sportradar), salary databases (Basketball-Reference, with name normalization), and performance metrics (four-factor stats, lineup net ratings).

The challenge: players change teams, names are inconsistent, two-way contracts complicate salary calculations, and garbage-time possessions skew co-presence metrics.

The result: salary-weighted graphs where node size = salary share and edge weight = co-presence intensity.
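The name-matching step is the kind of detail that is easy to underestimate. As a minimal sketch, assuming records are joined on a normalized string key, the helper below strips accents, punctuation, and generational suffixes. The function name `normalize_name` and the suffix list are illustrative assumptions, not the project's actual code.

```python
import re
import unicodedata

# Generational suffixes that sources include inconsistently (an assumed list).
_SUFFIXES = {"jr", "sr", "ii", "iii", "iv"}


def normalize_name(name: str) -> str:
    """Return a lowercase ASCII key for joining player records across sources."""
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Treat hyphens as spaces; remove periods, apostrophes, and other punctuation.
    cleaned = re.sub(r"[^a-z\s]", "", ascii_only.lower().replace("-", " "))
    # Drop suffix tokens so "Jaren Jackson Jr." and "Jaren Jackson" match.
    tokens = [t for t in cleaned.split() if t not in _SUFFIXES]
    return " ".join(tokens)
```

Dropping suffixes can merge distinct players who share a name, so a real pipeline would likely pair a key like this with a small table of manual overrides.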

Design Process

A network framework with stress-testing capabilities.

The concept of modeling rosters as networks was compelling, but led to a key question: How do we quantify "resilience" in a way that's mathematically rigorous but interpretable?

I started by mapping basketball concepts onto network metrics: Star players → High degree centrality, Role players → Moderate betweenness, Bench depth → Modularity and community structure.
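One way to compute these three metric families with networkx is sketched below. It assumes a graph whose edges carry a positive `weight` attribute (co-presence intensity); the function name `roster_metrics`, the added `distance` attribute, and the choice of greedy modularity maximization for community detection are assumptions made for illustration.

```python
import networkx as nx
from networkx.algorithms import community


def roster_metrics(G: nx.Graph) -> dict:
    """Compute degree, betweenness, and modularity on a co-presence graph."""
    # Degree centrality: fraction of other players each player shares the court with.
    degree = nx.degree_centrality(G)

    # networkx treats the betweenness weight as a distance, so invert intensity:
    # pairs that play together often should count as close. This adds a
    # "distance" attribute to each edge of G.
    for _, _, data in G.edges(data=True):
        data["distance"] = 1.0 / data["weight"]
    betweenness = nx.betweenness_centrality(G, weight="distance")

    # Community structure: greedy modularity maximization, then score the partition.
    groups = community.greedy_modularity_communities(G, weight="weight")
    modularity = community.modularity(G, groups, weight="weight")

    return {
        "degree": degree,
        "betweenness": betweenness,
        "communities": [set(g) for g in groups],
        "modularity": modularity,
    }
```

On a roster with two tight lineup groups joined by a single shared player, that player should come out with the highest betweenness even if their degree is modest.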

Two Roster Archetypes Emerged: Mesh Networks (Utah Jazz example) distribute salary across connectors and retained ~92% of win equity after dual shocks. Hub Networks (New York Knicks example) are star-heavy builds that lost ~18% of win equity (retaining ~82%) under similar scenarios.

Final Implementation

Prioritized a reproducible, modular pipeline with clear visual narratives.

The main deliverable is a data processing → network construction → modeling → visualization workflow. Clean interfaces between stages meant rapid iteration on different network configurations.

Key Features: Force-directed layouts reveal natural roster structure, color-coded salary bands make resource allocation visible, interactive filtering by team/season/position, and automated stress-testing simulates injury scenarios.

Technical Stack:

# Data Pipeline
pandas, numpy           # Data wrangling
networkx, igraph        # Network construction & metrics

# Modeling
scikit-learn            # Ridge regression, multinomial logit
scipy                   # Statistical tests

# Visualization
matplotlib, seaborn     # Publication-quality static figures
plotly, pyvis           # Interactive web-based exploration

Key Findings

Salary Mixing Predicts Resilience: Teams with lower salary assortativity, where high-paid players frequently share court time with mid-tier players, tended to have higher RRS. The correlation is statistically significant but weak (Pearson r = -0.114, p = 0.036).
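The exact assortativity definition is not spelled out here, so the following is a sketch under one plausible assumption: salary assortativity as the edge-weighted Pearson correlation between the salaries at the two ends of each edge. The function name and input layout are hypothetical.

```python
def salary_assortativity(edges, salary):
    """Edge-weighted Pearson correlation of endpoint salaries.

    edges:  {(i, j): co-presence weight}
    salary: {player: salary}
    Each undirected edge is counted in both directions so the result
    does not depend on how the endpoints are ordered.
    """
    xs, ys, ws = [], [], []
    for (i, j), w in edges.items():
        xs += [salary[i], salary[j]]
        ys += [salary[j], salary[i]]
        ws += [w, w]

    total = sum(ws)
    # By symmetry the weighted mean and variance are the same for both ends.
    mean = sum(w * x for w, x in zip(ws, xs)) / total
    var = sum(w * (x - mean) ** 2 for w, x in zip(ws, xs)) / total
    cov = sum(w * (x - mean) * (y - mean) for w, x, y in zip(ws, xs, ys)) / total
    return cov / var if var > 0 else 0.0
```

Values near +1 mean high earners mostly play with other high earners; values near -1 mean salary tiers are mixed on the court. networkx also offers `numeric_assortativity_coefficient` for a comparable per-attribute measure.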

Connectors Are Structural Anchors: Players with high betweenness centrality (bridging different lineup groups) were critical. Removing them caused larger performance drops than removing similarly-paid stars.

Network Topology Adds Predictive Power: Leave-one-season-out cross-validation on playoff advancement showed baseline (net rating only) at 29.0% Macro-F1, while adding network topology achieved 31.3% Macro-F1.
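For readers unfamiliar with the evaluation setup, here is a dependency-free sketch of the two pieces: Macro-F1 (the unweighted mean of per-class F1 scores) and a leave-one-season-out splitter. The function names are illustrative; scikit-learn's `f1_score(average="macro")` and `LeaveOneGroupOut` provide equivalent functionality.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def leave_one_season_out(seasons):
    """Yield (held_out, train_idx, test_idx), holding out one season at a time."""
    for held_out in sorted(set(seasons)):
        train = [i for i, s in enumerate(seasons) if s != held_out]
        test = [i for i, s in enumerate(seasons) if s == held_out]
        yield held_out, train, test
```

Because Macro-F1 weights every playoff-round class equally, rare outcomes such as a Finals appearance count as much as common ones, which helps explain why absolute scores are low for both models.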

Technical Deep Dive

Network Construction Algorithm

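Edge weights use a bounded co-presence intensity, which stays in [0, 1] because a pair cannot share more possessions than either player logged: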

# Bounded co-presence intensity
w_ij = shared_possessions_ij / max(possessions_i, possessions_j)

# Filter low-minute players
V = {players with ≥ 300 possessions}

# Build weighted graph
G = (V, E, w) where E = {(i,j) : w_ij > threshold}
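The definition above translates fairly directly into Python. This is a minimal sketch using plain dictionaries; the function name, the input layout, and the default `threshold=0.0` are assumptions, since no threshold value is stated.

```python
from itertools import combinations


def build_copresence_edges(possessions, shared, min_possessions=300, threshold=0.0):
    """Return (nodes, edges) for the co-presence graph.

    possessions: {player: possessions played}
    shared:      {frozenset({i, j}): possessions played together}
    threshold:   minimum weight for an edge (placeholder value, an assumption)
    """
    # V: keep only players above the possession cutoff.
    nodes = {p for p, n in possessions.items() if n >= min_possessions}

    edges = {}
    for i, j in combinations(sorted(nodes), 2):
        together = shared.get(frozenset((i, j)), 0)
        # Bounded co-presence intensity w_ij.
        w = together / max(possessions[i], possessions[j])
        if w > threshold:
            edges[(i, j)] = w
    return nodes, edges
```

The result can be loaded into networkx with `G.add_weighted_edges_from((i, j, w) for (i, j), w in edges.items())`. Dividing by the larger possession count keeps a low-minute player from showing an inflated tie to a starter.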

Stress-Testing Pipeline:

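from statistics import mean

def compute_rrs(graph, model):
    """Roster Resilience Score: 1 minus the mean relative drop across shocks.

    Assumes model.predict returns a positive scalar score, and that
    extract_features and the remove_* helpers are defined elsewhere.
    """
    intact_score = model.predict(extract_features(graph))

    # One shocked graph per injury scenario
    scenarios = [
        remove_highest_degree(graph),      # Star injury
        remove_highest_betweenness(graph), # Connector loss
        remove_mid_salary(graph)           # Role player absence
    ]

    # Relative performance drop for each scenario
    drops = [
        (intact_score - model.predict(extract_features(g))) / intact_score
        for g in scenarios
    ]

    return 1 - mean(drops)  # Higher RRS = more resilient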
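The three removal helpers are not shown above. One possible reading of them with networkx is sketched here, assuming each node carries a `salary` attribute, each edge a `weight` attribute, "highest degree" means weighted degree, and "mid salary" means the median-salary player. All of these are assumptions for illustration.

```python
import networkx as nx


def remove_highest_degree(graph: nx.Graph) -> nx.Graph:
    """Star injury: drop the player with the largest weighted degree."""
    g = graph.copy()  # leave the intact roster untouched
    star = max(g.nodes, key=lambda n: g.degree(n, weight="weight"))
    g.remove_node(star)
    return g


def remove_highest_betweenness(graph: nx.Graph) -> nx.Graph:
    """Connector loss: drop the player bridging the most shortest paths."""
    g = graph.copy()
    scores = nx.betweenness_centrality(g)
    g.remove_node(max(scores, key=scores.get))
    return g


def remove_mid_salary(graph: nx.Graph) -> nx.Graph:
    """Role player absence: drop the player with the median salary."""
    g = graph.copy()
    ranked = sorted(g.nodes, key=lambda n: g.nodes[n]["salary"])
    g.remove_node(ranked[len(ranked) // 2])
    return g
```

Each helper returns a modified copy, so the same intact graph can be reused across scenarios when computing the score.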

Reflection

What I Learned

Data pipeline foundations matter: I spent significant time on name-matching, salary normalization, and possession thresholds. Getting this right enabled rapid iteration on modeling approaches.

Visualization drives insight: Creating side-by-side network comparisons (mesh vs hub) communicated patterns instantly that would take paragraphs to explain.

Reproducibility = extensibility: Pinning package versions, documenting schemas, and writing modular scripts meant the framework could be extended to other sports without rewriting the entire pipeline.

Start simple, validate, iterate: Initial versions used unweighted edges and basic centrality metrics. Only after validating the core concept did I add salary weighting, community detection, and multi-scenario stress tests.

Impact & Applications

This framework bridges network science and practical sports analytics with applications in trade simulation, draft strategy, contract allocation, and injury risk planning.

View the full reproducible pipeline on GitHub: github.com/lblommesteyn/NBA_Salary_Network_Analysis_CMSAC

Presented at Carnegie Mellon Sports Analytics Conference 2025 in collaboration with Luke Blommesteyn and Yuvraj Sharma (University of Western Ontario).

LUCIAN (LUKA) LAVRIC

SOFTWARE ENGINEER
