Roster Resiliency • Shipped 2025
NBA roster network analysis
Role
Data Scientist & Network Engineer
Timeline
Jul-Oct 2025
Team
Luke Blommesteyn
Yuvraj Sharma
Me
Skills
Network Science
Machine Learning
Data Visualization
Python
Statistical Analysis
Abstract
Linking salary topology to lineup connectivity
We test whether the network geometry of NBA rosters—how salary resources are distributed across players who actually share the court—predicts playoff stability. Prior work links payroll level to outcomes or analyzes in-game pass networks, but rarely ties salary allocation to lineup connectivity or evaluates resilience to disruptions. We model each team-season as a salary-weighted, shared-possession network: nodes are players with size proportional to salary share, and edges capture co-presence intensity. Using public data from Basketball-Reference and Cleaning the Glass for 2020–21 through 2024–25 (149 team-seasons), we show that topology improves prediction of ordinal playoff advancement beyond a strength control—raising Macro-F1 from 29.0% to 31.3%.
Research Questions
RQ1. Do salary-network topology features add anything beyond a traditional team-strength control when we predict playoff advancement?
RQ2. Which specific patterns—negative assortativity, decentralisation, concentrated edges—show up in the rosters that stay resilient?
RQ3. Does weighting co-presence by high-leverage possessions improve explanatory power compared with a simple shared-possession count?
Why This Matters
Practical implications for different audiences
For analytics teams: The features here bolt onto models you already run. Because they're built from public salary and lineup data, you can recreate them in a notebook, plug them into playoff simulators, and quantify how much topology improves lift over adjusted net rating.
For coaches and executives: The punch line is intuitive—it's not enough to sign expensive talent, you have to spread that talent across lineups that actually play together. Rosters that avoid a single high-salary hub and instead keep multiple "bridges" between player groups stay upright when injuries, foul trouble, or matchup tweaks hit.
Bottom line: Resilience is about arranging salaries so that every unit has creation, defense, and connective glue.
Data Pipeline
149 team-seasons from 2020–21 through 2024–25
Lineups and possessions: Lineup and four-factor tables from Cleaning the Glass. We compute co-presence counts for each pair of teammates and aggregate lineup Off/Def points per possession to a team-season strength proxy.
Salaries: Player and team salary tables from Basketball-Reference, normalized to within-team salary shares.
Pipeline steps:
- Parse lineups into possession-weighted on-court units
- Merge salaries and normalize to team share per player
- Compute co-presence counts and player possessions
- Build graph G, dropping low-minutes players (fewer than 300 possessions) as nodes
- Compute topology features: salary dispersion, salary assortativity, community structure, centralization, edge concentration
- Aggregate lineup offensive/defensive points per possession (PPP) to a team net rating (NR) and attach playoff labels
- Export leave-one-season-out splits with frozen seeds
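The counting and normalization steps above can be sketched in a few lines of pandas. This is a minimal illustration, not the project's actual code: the column names (`players`, `possessions`, `player`, `salary`) and the toy three-player lineups are assumptions made for brevity.

```python
from collections import Counter
from itertools import combinations

import pandas as pd


def copresence_counts(lineups: pd.DataFrame) -> Counter:
    """Sum shared possessions for every pair of teammates."""
    shared = Counter()
    for players, poss in zip(lineups["players"], lineups["possessions"]):
        # Sort so that (A, B) and (B, A) land on the same key
        for pair in combinations(sorted(players), 2):
            shared[pair] += poss
    return shared


def player_possessions(lineups: pd.DataFrame) -> Counter:
    """Total possessions each player spent on the court."""
    totals = Counter()
    for players, poss in zip(lineups["players"], lineups["possessions"]):
        for player in players:
            totals[player] += poss
    return totals


def salary_shares(salaries: pd.DataFrame) -> pd.Series:
    """Normalize each player's salary to a share of the team total."""
    return salaries.set_index("player")["salary"] / salaries["salary"].sum()


# Toy data (hypothetical): three on-court units for one team-season
lineups = pd.DataFrame({
    "players": [("A", "B", "C"), ("A", "B", "D"), ("C", "D", "E")],
    "possessions": [100, 50, 30],
})
salaries = pd.DataFrame({"player": list("ABCDE"), "salary": [40, 25, 20, 10, 5]})

shared = copresence_counts(lineups)
totals = player_possessions(lineups)
shares = salary_shares(salaries)
```

Here `shared[("A", "B")]` accumulates the possessions of both lineups that contain A and B, and `shares` sums to 1 within the team.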
Methods
Network construction and topology features
Network construction: Let V be players with ≥300 possessions. For players i and j, define bounded co-presence intensity: w_ij = shared_possessions_ij / max(possessions_i, possessions_j). Node size equals the player's salary share.
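As a rough sketch of this construction in networkx: the function name `build_roster_graph`, the `min_weight` edge cutoff, and the toy numbers are assumptions for illustration, while the 300-possession node threshold and the weight formula come from the definition above.

```python
import networkx as nx

MIN_POSSESSIONS = 300  # node threshold stated in the text


def build_roster_graph(possessions, shared, salary_share, min_weight=0.0):
    """Build the salary-weighted, shared-possession graph for one team-season."""
    keep = {p for p, n in possessions.items() if n >= MIN_POSSESSIONS}
    G = nx.Graph()
    for p in keep:
        G.add_node(p, salary_share=salary_share.get(p, 0.0), possessions=possessions[p])
    for (i, j), s_ij in shared.items():
        if i in keep and j in keep:
            # Bounded co-presence intensity: at most 1 because two players
            # cannot share more possessions than either one played
            w = s_ij / max(possessions[i], possessions[j])
            if w > min_weight:
                G.add_edge(i, j, weight=w)
    return G


# Toy inputs (hypothetical): D falls below the possession threshold
possessions = {"A": 1000, "B": 800, "C": 400, "D": 120}
shared = {("A", "B"): 600, ("A", "C"): 200, ("B", "C"): 100, ("A", "D"): 90}
salary_share = {"A": 0.5, "B": 0.3, "C": 0.15, "D": 0.05}

G = build_roster_graph(possessions, shared, salary_share)
```

Dividing by the larger of the two possession totals keeps a low-minutes player from showing an artificially strong tie to a starter just because most of his few possessions overlapped with that starter.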
Topology features:
- Salary dispersion: Gini and top-k share
- Salary assortativity: weighted Pearson correlation of salary shares across edges
- Community structure: modularity Q and coefficient of variation of community sizes
- Centralization: Freeman degree centralization with edge weights
- Edge concentration: fraction of total weight captured by top 5 and top 10 edges
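Three of these features are simple enough to sketch directly. The code below is one plausible reading of the definitions, not the project's implementation: the function names, the `salary_share` node attribute, and the toy star roster are assumptions, and modularity and centralization are left out.

```python
import networkx as nx
import numpy as np


def gini(x):
    """Gini coefficient of a non-negative vector (0 = perfectly equal)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    if n == 0 or x.sum() == 0:
        return 0.0
    ranks = np.arange(1, n + 1)
    return float((2 * ranks - n - 1) @ x / (n * x.sum()))


def top_k_share(x, k):
    """Fraction of the total held by the k largest entries."""
    x = np.sort(np.asarray(x, dtype=float))[::-1]
    return float(x[:k].sum() / x.sum())


def salary_assortativity(G, attr="salary_share", weight="weight"):
    """Edge-weighted Pearson correlation of the attribute across edge endpoints."""
    a, b, w = [], [], []
    for u, v, d in G.edges(data=True):
        # Count each undirected edge in both directions so the measure is symmetric
        a += [G.nodes[u][attr], G.nodes[v][attr]]
        b += [G.nodes[v][attr], G.nodes[u][attr]]
        w += [d.get(weight, 1.0)] * 2
    if not w:
        return float("nan")
    a, b, w = map(np.asarray, (a, b, w))
    ma, mb = np.average(a, weights=w), np.average(b, weights=w)
    cov = np.average((a - ma) * (b - mb), weights=w)
    var_a = np.average((a - ma) ** 2, weights=w)
    var_b = np.average((b - mb) ** 2, weights=w)
    if var_a == 0 or var_b == 0:
        return float("nan")
    return float(cov / np.sqrt(var_a * var_b))


def edge_concentration(G, k, weight="weight"):
    """Fraction of total edge weight captured by the k heaviest edges."""
    w = np.sort([d.get(weight, 1.0) for _, _, d in G.edges(data=True)])[::-1]
    return float(w[:k].sum() / w.sum())


# Toy star roster (hypothetical): one expensive hub tied to three cheap players
G = nx.Graph()
G.add_nodes_from([
    ("hub", {"salary_share": 0.7}),
    ("r1", {"salary_share": 0.1}),
    ("r2", {"salary_share": 0.1}),
    ("r3", {"salary_share": 0.1}),
])
G.add_weighted_edges_from([("hub", "r1", 0.6), ("hub", "r2", 0.3), ("hub", "r3", 0.1)])

shares = [d["salary_share"] for _, d in G.nodes(data=True)]
features = {
    "gini": gini(shares),
    "top2_share": top_k_share(shares, 2),
    "salary_assortativity": salary_assortativity(G),
    "top1_edge_share": edge_concentration(G, 1),
}
```

In the toy star every edge joins the expensive hub to a cheap player, so salary assortativity sits at its minimum of -1 while the single heaviest edge carries 60% of the total weight.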
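Roster Resilience Score (RRS): We remove, in turn, the highest-degree node (star), a mid-salary node (role player), and the highest-betweenness node (connector). After each removal we recompute the topology features and rescore the roster with a ridge regression proxy. RRS = 1 - mean(relative drops), where each relative drop is measured against the intact roster's score, so a higher RRS means a more resilient roster.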
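Written out, the score can be read as follows. The notation is an assumption introduced here for illustration: \hat{y} is the ridge proxy's score, G is the intact roster graph, and G \setminus s is the graph after removal scenario s. The numbers in the worked line are hypothetical, not results.

```latex
\mathrm{RRS}(G) \;=\; 1 - \frac{1}{|S|} \sum_{s \in S}
  \frac{\hat{y}(G) - \hat{y}(G \setminus s)}{\hat{y}(G)},
\qquad S = \{\text{star},\ \text{role player},\ \text{connector}\}

% Worked example with hypothetical scores:
% \hat{y}(G) = 0.50 and post-removal scores 0.40, 0.48, 0.45
% give relative drops 0.20, 0.04, 0.10, so
\mathrm{RRS} = 1 - \frac{0.20 + 0.04 + 0.10}{3} \approx 0.887
```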
Results
Topology adds signal beyond strength controls
Incremental predictive value (Leave-one-season-out CV):
| Model | Macro-F1 | Accuracy | MAE |
|---|---|---|---|
| A: Controls only | 29.0% | 57.6% | 0.656 |
| B: + Salary dispersion | 26.5% | 53.1% | 0.651 |
| C: + Connectivity | 31.9% | 54.7% | 0.665 |
| D: + Full topology | 31.3% | 54.3% | 0.663 |
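Notice how Macro-F1 peaks at 31.9% once connectivity enters (Model C). Salary dispersion alone actually hurts, dropping Macro-F1 to 26.5%, but mixing in the graph view recovers the lost ground and then some. Accuracy tells a different story: it is highest for the controls-only model (57.6%), so the gain shows up in balanced, per-class performance rather than in the overall hit rate.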
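For readers who want to reproduce this kind of comparison, here is a minimal leave-one-season-out sketch with scikit-learn. The classifier (a scaled logistic regression), the pooling of out-of-fold predictions, and the synthetic data are all assumptions for illustration; the source does not specify the model behind the table.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, mean_absolute_error
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def loso_scores(X, y, seasons):
    """Hold out one season at a time and score the pooled out-of-fold predictions."""
    preds = np.empty_like(y)
    for train, test in LeaveOneGroupOut().split(X, y, groups=seasons):
        # Stand-in classifier; swap in whatever ordinal model is being evaluated
        model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
        model.fit(X[train], y[train])
        preds[test] = model.predict(X[test])
    return {
        "macro_f1": f1_score(y, preds, average="macro"),
        "accuracy": accuracy_score(y, preds),
        # MAE treats the playoff rounds as an ordinal 0, 1, 2, ... scale
        "mae": mean_absolute_error(y, preds),
    }


# Synthetic stand-in data: 5 seasons x 30 teams, 4 features, 4 ordinal outcomes
rng = np.random.default_rng(0)
n = 150
seasons = np.repeat(np.arange(5), 30)
X = rng.normal(size=(n, 4))
y = np.clip(np.round(1.5 + X[:, 0] + 0.5 * rng.normal(size=n)), 0, 3).astype(int)

scores = loso_scores(X, y, seasons)
```

Running the same function on the control features and then on controls plus topology features gives one row of the table per feature set. Grouping by season keeps teams from the same year out of both the training and test folds.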
Key findings:
- Connectivity beats a strength-only control on Macro-F1 (31.9% vs. 29.0%), though not on accuracy or MAE
- Negative salary assortativity (mixing salaries across connected lineups) correlates with resilience
- Lower edge concentration = fewer single points of failure
Case Studies
Two roster archetypes with different resilience profiles
Star-centered topology: High edge concentration, with most possessions flowing through one or two hubs. Looks pretty until the hub sits: predicted to lose ~18% win equity under dual shocks.
Distributed topology: Balanced communities with multiple playmaking bridges. Salary is distributed across connectors, and the roster retains ~92% win equity (a loss of ~8%) under similar scenarios.
We validated with permutation tests that shuffle salaries across players while holding the lineup network fixed. Real teams with stronger salary mixing consistently beat these random baselines and advance further in the playoffs.
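A permutation test of this kind might look like the sketch below. It assumes the test statistic is the edge-weighted salary assortativity and that the p-value is one-sided (how often a shuffled roster mixes salaries at least as strongly as the real one); both choices, the function names, and the toy roster are illustrative assumptions rather than the project's exact procedure.

```python
import networkx as nx
import numpy as np


def weighted_salary_assortativity(G, salary):
    """Edge-weighted Pearson correlation of salary shares across edge endpoints."""
    a, b, w = [], [], []
    for u, v, d in G.edges(data=True):
        # Both directions, so the two endpoint series share one mean and variance
        a += [salary[u], salary[v]]
        b += [salary[v], salary[u]]
        w += [d["weight"]] * 2
    a, b, w = map(np.asarray, (a, b, w))
    mean = np.average(a, weights=w)
    var = np.average((a - mean) ** 2, weights=w)
    if var == 0:
        return float("nan")
    return float(np.average((a - mean) * (b - mean), weights=w) / var)


def salary_permutation_test(G, salary, n_perm=2000, seed=0):
    """Shuffle salaries over a fixed lineup network and compare to the real roster."""
    rng = np.random.default_rng(seed)
    nodes = list(G.nodes)
    values = np.array([salary[n] for n in nodes])
    observed = weighted_salary_assortativity(G, salary)
    null = np.empty(n_perm)
    for k in range(n_perm):
        shuffled = dict(zip(nodes, rng.permutation(values)))
        null[k] = weighted_salary_assortativity(G, shuffled)
    # One-sided p-value: shuffles at least as disassortative as the real roster
    p_value = (1 + np.sum(null <= observed)) / (1 + n_perm)
    return observed, float(p_value)


# Toy roster (hypothetical): two stars who each share the floor with four role players
G = nx.Graph()
for star in ("S1", "S2"):
    for role in ("R1", "R2", "R3", "R4"):
        G.add_edge(star, role, weight=0.5)
salary = {"S1": 0.3, "S2": 0.3, "R1": 0.1, "R2": 0.1, "R3": 0.1, "R4": 0.1}

observed, p_value = salary_permutation_test(G, salary)
```

Because the network is held fixed, any difference between the observed statistic and the shuffled distribution comes from who is paid what, not from who plays with whom.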
Technical Implementation
Reproducible pipeline with clean interfaces
    # Bounded co-presence intensity
    w_ij = shared_possessions_ij / max(possessions_i, possessions_j)

    # Filter low-minute players
    V = {players with >= 300 possessions}

    # Build weighted graph
    G = (V, E, w) where E = {(i, j) : w_ij > threshold}

    import numpy as np

    def compute_rrs(graph, model):
        """Roster Resilience Score: 1 minus the mean relative drop across removals."""
        # extract_features and the remove_* helpers are assumed to be defined elsewhere
        intact_score = model.predict(extract_features(graph))
        scenarios = [
            remove_highest_degree(graph),       # Star injury
            remove_highest_betweenness(graph),  # Connector loss
            remove_mid_salary(graph),           # Role player absence
        ]
        drops = [
            (intact_score - model.predict(extract_features(g))) / intact_score
            for g in scenarios
        ]
        return 1 - np.mean(drops)  # Higher RRS = more resilient
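The three removal helpers called in `compute_rrs` are not shown on this page. A minimal sketch of what they might look like with networkx follows, under some loudly stated assumptions: "highest degree" is read as highest weighted degree, betweenness uses the inverse of co-presence intensity as distance, "mid-salary" is taken as the median-salary node, and nodes carry a `salary_share` attribute.

```python
import networkx as nx


def remove_node(G, node):
    """Return a copy of G without the given node (the original is untouched)."""
    H = G.copy()
    H.remove_node(node)
    return H


def remove_highest_degree(G):
    # Assumption: "degree" means weighted degree (total co-presence intensity)
    strength = dict(G.degree(weight="weight"))
    return remove_node(G, max(strength, key=strength.get))


def remove_highest_betweenness(G):
    # Shortest paths treat the weight as a cost, so strong ties get short distances
    H = G.copy()
    for _, _, d in H.edges(data=True):
        d["distance"] = 1.0 / d["weight"]
    centrality = nx.betweenness_centrality(H, weight="distance")
    return remove_node(G, max(centrality, key=centrality.get))


def remove_mid_salary(G):
    # Assumption: the "mid-salary" role player is the median-salary node
    ranked = sorted(G.nodes, key=lambda n: G.nodes[n]["salary_share"])
    return remove_node(G, ranked[len(ranked) // 2])


# Toy roster (hypothetical): A is the star, C bridges two groups, D sits mid-payroll
G = nx.Graph()
for name, share in {"A": 0.4, "B": 0.25, "C": 0.05, "D": 0.2, "E": 0.1}.items():
    G.add_node(name, salary_share=share)
G.add_weighted_edges_from([
    ("A", "B", 0.9), ("A", "C", 0.8), ("B", "C", 0.3), ("C", "D", 0.4), ("D", "E", 0.4),
])

without_star = remove_highest_degree(G)
without_connector = remove_highest_betweenness(G)
without_role_player = remove_mid_salary(G)
```

Each helper returns a new graph, so the three scenarios are scored independently against the same intact roster.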
Stack: pandas, numpy, networkx, igraph, scikit-learn, scipy, matplotlib, seaborn, plotly, pyvis
Limitations
Salary data: Name resolution can miss two-way or ten-day players. We report match rates and rerun with strict/lenient filters.
Leverage: Edges currently reflect possession counts, not clutch weighting. A close/late variant is on the roadmap.
Confounding: Minutes, role, and salary correlate tightly. We include NR strength as a control and publish ablations.
Generalisation: Five seasons and 149 team-seasons limit scope. We favour regularisation and season-wise cross-validation to keep claims modest.
Conclusion
Stagger your stars, keep trusted combos across the rotation
We link salary-weighted roster topology to lineup connectivity and show that connectivity adds information beyond team strength. Topology features—especially salary assortativity and edge concentration—improve ordinal playoff prediction over a strength-only control.
For practice: Front offices can stagger high salaries across lineups and reduce edge concentration. In our model, these steps are associated with greater robustness without raising total payroll.
Bottom line for hoop heads: Stagger your stars, keep trusted combos across the rotation, and treat payroll like a web, not a totem pole. That's what the math is screaming.
View the full reproducible pipeline: github.com/lblommesteyn/NBA_Salary_Network_Analysis_CMSAC