Roster Resiliency • Shipped 2025

NBA roster network analysis

Role

Data Scientist & Network Engineer

Timeline

Jul-Oct 2025

Team

Luke Blommesteyn

Yuvraj Sharma

Me

Skills

Network Science

Machine Learning

Data Visualization

Python

Statistical Analysis

Abstract

Linking salary topology to lineup connectivity

We test whether the network geometry of NBA rosters—how salary resources are distributed across players who actually share the court—predicts playoff stability. Prior work links payroll level to outcomes or analyzes in-game pass networks, but rarely ties salary allocation to lineup connectivity or evaluates resilience to disruptions. We model each team-season as a salary-weighted, shared-possession network: nodes are players with size proportional to salary share, and edges capture co-presence intensity. Using public data from Basketball-Reference and Cleaning the Glass for 2020–21 through 2024–25 (149 team-seasons), we show that topology improves prediction of ordinal playoff advancement beyond a strength control—raising Macro-F1 from 29.0% to 31.3%.

Research Questions

RQ1. Do salary-network topology features add anything beyond a traditional team-strength control when we predict playoff advancement?

RQ2. Which specific patterns—negative assortativity, decentralisation, concentrated edges—show up in the rosters that stay resilient?

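RQ3. Does weighting co-presence by high-leverage possessions improve explanatory power compared with a simple shared-possession count? (Not tested in this release: edges use plain possession counts, and the leverage-weighted variant is on the roadmap.)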

Why This Matters

Practical implications for different audiences

For analytics teams: The features here bolt onto models you already run. Because they're built from public salary and lineup data, you can recreate them in a notebook, plug them into playoff simulators, and quantify how much lift topology adds over adjusted net rating.

For coaches and executives: The punch line is intuitive—it's not enough to sign expensive talent, you have to spread that talent across lineups that actually play together. Rosters that avoid a single high-salary hub and instead keep multiple "bridges" between player groups stay upright when injuries, foul trouble, or matchup tweaks hit.

Bottom line: Resilience is about arranging salaries so that every unit has creation, defense, and connective glue.

Data Pipeline

149 team-seasons from 2020–21 through 2024–25

Lineups and possessions: Lineup and four-factor tables from Cleaning the Glass. We compute co-presence counts for each pair of teammates and aggregate lineup Off/Def points per possession to a team-season strength proxy.

Salaries: Player and team salary tables from Basketball-Reference, normalized to within-team salary shares.
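A minimal sketch of the two building blocks described above, assuming lineup rows arrive as (players, possessions) pairs and salaries as a player-to-dollars mapping. The function names and the toy three-player units are hypothetical, chosen only to keep the example short; the real tables come from Cleaning the Glass and Basketball-Reference.

```python
from collections import defaultdict
from itertools import combinations

def copresence_counts(lineups):
    """Sum shared possessions per teammate pair and total possessions per player."""
    shared = defaultdict(float)   # (i, j) with i < j -> possessions played together
    totals = defaultdict(float)   # player -> possessions on court
    for players, possessions in lineups:
        unit = sorted(set(players))           # guard against duplicate names in a row
        for p in unit:
            totals[p] += possessions
        for i, j in combinations(unit, 2):
            shared[(i, j)] += possessions
    return dict(shared), dict(totals)

def salary_shares(salaries):
    """Normalize raw salaries to within-team shares that sum to 1."""
    total = sum(salaries.values())
    return {p: s / total for p, s in salaries.items()}

# Toy data (hypothetical): two on-court units with their possession counts.
lineups = [(("A", "B", "C"), 100), (("A", "B", "D"), 50)]
shared, totals = copresence_counts(lineups)
shares = salary_shares({"A": 30.0, "B": 10.0, "C": 5.0, "D": 5.0})
```

Storing each pair once with sorted names keeps the graph undirected and avoids double counting when the same two players show up in several lineups.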

Pipeline steps:

  1. Parse lineups into possession-weighted on-court units
  2. Merge salaries and normalize to team share per player
  3. Compute co-presence counts and player possessions
  4. Build graph G, dropping low-minutes nodes (players below the 300-possession threshold)
  5. Compute topology features: salary dispersion, salary assortativity, community structure, centralization, edge concentration
  6. Aggregate lineup Off/Def points per possession (PPP) to team net rating (NR) and attach playoff labels
  7. Export leave-one-season-out splits with frozen seeds
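Step 7 can be sketched as follows, assuming each team-season row carries a season label. The helper name is hypothetical. The split itself is deterministic; the frozen seeds would matter for whatever model is fit on each fold, which this sketch leaves out.

```python
def leave_one_season_out(seasons):
    """Yield (held_out_season, train_indices, test_indices), one fold per season."""
    for held_out in sorted(set(seasons)):
        train_idx = [k for k, s in enumerate(seasons) if s != held_out]
        test_idx = [k for k, s in enumerate(seasons) if s == held_out]
        yield held_out, train_idx, test_idx

# Toy labels (hypothetical): one entry per team-season row.
seasons = ["2020-21", "2020-21", "2021-22", "2022-23", "2022-23"]
splits = list(leave_one_season_out(seasons))
```

Holding out a whole season at a time keeps teams from the same year out of both train and test, which is the point of splitting by season rather than by row.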

Methods

Network construction and topology features

Network construction: Let V be players with ≥300 possessions. For players i and j, define bounded co-presence intensity: w_ij = shared_possessions_ij / max(possessions_i, possessions_j). Node size equals the player's salary share.
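Here is a small sketch of that edge definition in plain Python. It assumes shared-possession counts keyed by player pair and per-player possession totals are already available. The function name and the default edge threshold of 0.0 are assumptions, since the write-up does not state the threshold value.

```python
MIN_POSSESSIONS = 300  # node filter from the text

def build_edges(shared, totals, min_poss=MIN_POSSESSIONS, threshold=0.0):
    """Return (kept players, {(i, j): w_ij}) using the bounded co-presence weight."""
    keep = {p for p, n in totals.items() if n >= min_poss}
    edges = {}
    for (i, j), together in shared.items():
        if i in keep and j in keep:
            # Shared possessions cannot exceed either player's total,
            # so dividing by the larger total keeps w_ij in [0, 1].
            w = together / max(totals[i], totals[j])
            if w > threshold:
                edges[(i, j)] = w
    return keep, edges

# Toy inputs (hypothetical): C falls below the possession cutoff.
totals = {"A": 1000, "B": 800, "C": 250}
shared = {("A", "B"): 600, ("A", "C"): 200}
nodes, edges = build_edges(shared, totals)
```

Dividing by the larger of the two totals means a bench player who only ever appears next to the star still gets a modest weight, because the star's much larger possession count sits in the denominator.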

Topology features:

  • Salary dispersion: Gini and top-k share
  • Salary assortativity: weighted Pearson correlation of salary shares across edges
  • Community structure: modularity Q and coefficient of variation of community sizes
  • Centralization: Freeman degree centralization with edge weights
  • Edge concentration: fraction of total weight captured by top 5 and top 10 edges
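To make three of these features concrete, here is a stdlib-only sketch of salary dispersion, edge concentration, and salary assortativity. It is an illustration under assumptions, not the project's code: the function names are hypothetical, the assortativity counts each undirected edge in both directions, and modularity and centralization are left out because they depend on choices the write-up does not spell out.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 = perfectly equal)."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    ranked = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * ranked / (n * total) - (n + 1) / n

def top_k_share(values, k):
    """Fraction of the total held by the k largest values.
    Works for salary shares and, applied to edge weights, for edge concentration."""
    total = sum(values)
    return sum(sorted(values, reverse=True)[:k]) / total if total else 0.0

def weighted_assortativity(edges, share):
    """Edge-weighted Pearson correlation of salary shares at the two ends of each edge."""
    xs, ys, ws = [], [], []
    for (i, j), w in edges.items():
        xs += [share[i], share[j]]   # count each undirected edge both ways
        ys += [share[j], share[i]]
        ws += [w, w]
    total_w = sum(ws)
    mean = sum(w * x for w, x in zip(ws, xs)) / total_w  # same mean for xs and ys
    cov = sum(w * (x - mean) * (y - mean) for w, x, y in zip(ws, xs, ys)) / total_w
    var = sum(w * (x - mean) ** 2 for w, x in zip(ws, xs)) / total_w
    return cov / var if var > 0 else 0.0  # undefined case reported as 0.0

# Toy star-shaped roster (hypothetical): one expensive hub tied to three cheaper players.
share = {"A": 0.4, "B": 0.2, "C": 0.2, "D": 0.2}
edges = {("A", "B"): 0.6, ("A", "C"): 0.3, ("A", "D"): 0.1}

salary_gini = gini(share.values())
top2_salary = top_k_share(list(share.values()), 2)
top1_edge = top_k_share(list(edges.values()), 1)
assort = weighted_assortativity(edges, share)
```

In this toy roster every edge joins the high-salary hub to a low-salary teammate, so the assortativity comes out at its most negative value, while the single heaviest edge carries most of the total weight.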

Roster Resilience Score (RRS): We remove, in turn, the highest-degree node (star), a mid-salary node (role player), and the highest-betweenness node (connector). After each removal we recompute features and score with a ridge regression proxy. RRS = 1 - mean(relative drops).
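The arithmetic behind RRS is easy to check by hand. The sketch below takes an intact score and the three post-removal scores as plain numbers; the function name and the toy scores are made up for illustration, and it assumes the intact score is positive so that a relative drop is meaningful.

```python
from statistics import mean

def roster_resilience_score(intact_score, shocked_scores):
    """RRS = 1 - mean of relative drops from the intact score."""
    if intact_score <= 0:
        raise ValueError("relative drops need a positive intact score")
    drops = [(intact_score - s) / intact_score for s in shocked_scores]
    return 1 - mean(drops)

# Hypothetical scores after removing the star, the role player, and the connector.
rrs = roster_resilience_score(0.50, [0.40, 0.45, 0.48])
```

Here the relative drops are 20%, 10%, and 4%, so the mean drop is about 11.3% and the RRS is about 0.887.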

Results

Topology adds signal beyond strength controls

Incremental predictive value (Leave-one-season-out CV):

Model                    Macro-F1   Accuracy   MAE
A: Controls only         29.0%      57.6%      0.656
B: + Salary dispersion   26.5%      53.1%      0.651
C: + Connectivity        31.9%      54.7%      0.665
D: + Full topology       31.3%      54.3%      0.663

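Notice how Macro-F1 peaks once connectivity enters. Salary spread alone actually hurts, but mixing in the graph view recovers lost ground and then some. The gain is specific to Macro-F1: accuracy is highest for the controls-only model (57.6%), and MAE barely moves (0.651 to 0.665 across models).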
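For readers who want to see how the three columns differ, here is a stdlib-only sketch of the metrics. It assumes playoff advancement is coded as ordered integers (for example 0 for missing the playoffs, rising by one per round reached); that coding and the toy labels are assumptions, not the project's actual encoding.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in truth or prediction."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

def accuracy(y_true, y_pred):
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

def mean_absolute_error(y_true, y_pred):
    """Average distance in rounds between predicted and actual advancement."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy labels (hypothetical): six team-seasons, four ordinal outcome levels.
y_true = [0, 0, 1, 1, 2, 3]
y_pred = [0, 1, 1, 1, 2, 2]
f1 = macro_f1(y_true, y_pred)
acc = accuracy(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)
```

Macro-F1 weights every outcome level equally, so rare deep-playoff classes count as much as the common "missed the playoffs" class. That is one reason a model can gain Macro-F1 while losing raw accuracy.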

Key findings:

  1. Connectivity beats a strength-only control on Macro-F1 (31.9% vs 29.0%)
  2. Negative salary assortativity (mixing salaries across connected lineups) correlates with resilience
  3. Lower edge concentration = fewer single points of failure

Case Studies

Two roster archetypes with different resilience profiles

Star-centered topology: High edge concentration, most possessions flow through one or two hubs. Looks pretty until the hub sits—predicted to lose ~18% win equity under dual shocks.

Distributed topology: Balanced communities with multiple playmaking bridges. Salary distributed across connectors, retaining ~92% win equity after similar scenarios.

We validated with permutation tests: shuffling salaries on the same lineup network shows that real teams with stronger mixing consistently beat random baselines and advance further in playoffs.
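A permutation test of this kind can be sketched as follows. The choice of salary assortativity as the test statistic, the one-sided comparison, the function names, and the toy roster are all assumptions made for illustration; the write-up does not specify which statistic was permuted.

```python
import random

def weighted_assortativity(edges, share):
    """Edge-weighted Pearson correlation of salary shares across edge endpoints."""
    xs, ys, ws = [], [], []
    for (i, j), w in edges.items():
        xs += [share[i], share[j]]   # each undirected edge counted both ways
        ys += [share[j], share[i]]
        ws += [w, w]
    total_w = sum(ws)
    mean = sum(w * x for w, x in zip(ws, xs)) / total_w
    cov = sum(w * (x - mean) * (y - mean) for w, x, y in zip(ws, xs, ys)) / total_w
    var = sum(w * (x - mean) ** 2 for w, x in zip(ws, xs)) / total_w
    return cov / var if var > 0 else 0.0

def permutation_test(edges, share, n_perm=1000, seed=0):
    """Shuffle salary shares over the fixed lineup network.
    Returns the observed assortativity and the fraction of shuffles that are
    at least as negative (with the usual +1 correction)."""
    rng = random.Random(seed)        # fixed seed so the run is repeatable
    players = sorted(share)
    values = [share[p] for p in players]
    observed = weighted_assortativity(edges, share)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(values)
        shuffled = dict(zip(players, values))
        if weighted_assortativity(edges, shuffled) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Toy roster (hypothetical): the high-salary player is the hub of the network.
share = {"A": 0.4, "B": 0.2, "C": 0.2, "D": 0.2}
edges = {("A", "B"): 1.0, ("A", "C"): 1.0, ("A", "D"): 1.0}
observed, p_value = permutation_test(edges, share)
```

With only four players, a random shuffle puts the top salary back on the hub about a quarter of the time, so the p-value stays large. Real rosters have more players and more distinct salaries, so the null distribution is much less coarse.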

Technical Implementation

Reproducible pipeline with clean interfaces

    # Bounded co-presence intensity
    w_ij = shared_possessions_ij / max(possessions_i, possessions_j)

    # Filter low-minute players
    V = {players with >= 300 possessions}

    # Build weighted graph
    G = (V, E, w) where E = {(i,j) : w_ij > threshold}

    import numpy as np

    # extract_features and the remove_* helpers are pipeline functions not shown here.
    def compute_rrs(graph, model):
        # Score the intact roster once, then rescore after each single removal.
        intact_score = model.predict(extract_features(graph))
        scenarios = [
            remove_highest_degree(graph),       # Star injury
            remove_highest_betweenness(graph),  # Connector loss
            remove_mid_salary(graph),           # Role player absence
        ]
        # Relative drop per scenario; assumes intact_score is non-zero.
        drops = [(intact_score - model.predict(extract_features(g))) / intact_score
                 for g in scenarios]
        return 1 - np.mean(drops)  # Higher RRS = more resilient

Stack: pandas, numpy, networkx, igraph, scikit-learn, scipy, matplotlib, seaborn, plotly, pyvis

Limitations

Salary data: Name resolution can miss two-way or ten-day players. We report match rates and rerun with strict/lenient filters.

Leverage: Edges currently reflect possession counts, not clutch weighting. A close/late variant is on the roadmap.

Confounding: Minutes, role, and salary correlate tightly. We include NR strength as a control and publish ablations.

Generalisation: Five seasons and 149 team-seasons limit scope. We favour regularisation and season-wise cross-validation to keep claims modest.

Conclusion

Stagger your stars, keep trusted combos across the rotation

We link salary-weighted roster topology to lineup connectivity and show that connectivity adds information beyond team strength. Topology features, especially salary assortativity and edge concentration, improve ordinal playoff prediction over a strength-only control on Macro-F1 (29.0% to 31.3% for the full feature set).

For practice: Front offices can stagger high salaries across lineups and reduce edge concentration. In our analysis these patterns go with higher robustness, and neither requires raising total payroll.

Bottom line for hoop heads: Stagger your stars, keep trusted combos across the rotation, and treat payroll like a web, not a totem pole. That's what the math is screaming.