← Back

How I'm Using Claude Code to Build a Basketball Biomechanics Pipeline

Last week I sat down with three years of FIBA footage and a problem: I played for Moldova U18 at the European Championship, and I wanted to know whether the gait patterns in warmups predict anything meaningful — workload, fatigue risk, player readiness. Not a startup idea. A methodology paper. The kind of thing that ends up as a CMSAC poster if it works, and a useful negative result if it doesn't.

The first thing I discovered is that NBA League Pass doesn't include warmups. So I pulled personal FIBA broadcast footage instead — starting with a 10-minute warmup clip from the Luxembourg game — and ran a manual pilot. Three layups, two free throws, counted by hand. The sample size problem was immediately obvious: you can't do statistics on five observations. So I pivoted from shot-counting to pose estimation: joint angles, bilateral symmetry, knee flex, vertical displacement. Things you can extract from a single player frame-by-frame.

The tech stack answer was MediaPipe Pose on top of OpenCV. But before I scaled to 21 games across three years of footage, I needed to know if MediaPipe could actually detect landmarks on broadcast quality video — the kind with motion blur, referee occlusion, and aggressive zoom cuts. A 2025 Nature paper on broadcast basketball pose estimation had already documented exactly these failure modes. So I defined a kill criterion: if MediaPipe can't detect a pose with avg landmark visibility ≥ 0.5 on a single frame, the whole approach fails and I need a different foundation.

That's where Claude Code entered the picture.


I've been using Claude Code for a few months across different projects, mostly as a fast implementation partner. For this project I wanted to be more deliberate — the analysis has to be reproducible enough to cite. So I went and read how actual researchers use it before writing a line of code.

The research surfaced a few patterns I hadn't been doing.

CLAUDE.md as a contract, not a README. The highest-leverage move isn't better prompts — it's encoding your constraints in a CLAUDE.md at the project root. MediaPipe version, visibility threshold, coordinate system convention, output CSV schema, kill criterion threshold. Researchers report Claude respects these constraints at nearly 100% vs ~70% for in-prompt instructions. I'd been writing long system prompts. I should have been writing short files.

Phase-isolated sessions. Env setup is one session. The kill criterion run is another. Feature engineering is another. Don't mix concerns in a single long context or you get compounding hallucinations about intermediate state. I'd been letting sessions run long; the pattern that works is shorter, focused bursts with disk-based checkpoints between them.

Kill criterion as code, not prose. Writing "stop if visibility < 0.5" in a plan document isn't the same as writing if avg_visibility < KILL_THRESHOLD: raise EarlyStopError(...) in the script itself. The second one actually enforces the criterion. This felt obvious after reading it, but I hadn't been doing it.

Cheap diagnostic plots at every stage. Visibility histograms, trajectory continuity checks, landmark delta outliers between frames. These catch broadcast-specific artifacts — a FIBA zoom cut mid-drill will produce a frame-to-frame delta spike that looks like a feature signal but isn't. Generate these before you touch feature engineering.

One feature per module. Each extracted feature (joint angle, symmetry ratio, velocity) lives in src/features/<name>.py with a unit test. This makes each feature independently droppable. If knee flex turns out to be noise, I remove one file, not a tangle of shared extraction logic.


The anti-pattern that resonated most: autonomous drift. Letting Claude run without interruption produces "negative productivity" — rabbit holes, accumulated cruft, wasted context window. The neuroai.science piece on this is worth reading in full. The framing is that Claude accelerates iteration in the direction you're pointing, which is a problem if you haven't verified you're pointing the right direction. Interrupt frequently. Check outputs against domain expectations before continuing.

For a biomechanics pipeline this matters a lot. A confident-sounding extraction of "knee flexion angle" that's actually measuring the wrong joint pair — because MediaPipe indices got flipped during a refactor — will pass unit tests and produce a plausible-looking CSV. The only defense is cheap visualizations and a human who knows what knees look like.


Right now I'm at Task 0 of the kill criterion: verify MediaPipe can find a player on a single frame of the Luxembourg footage before anything else. If that passes, Iter 1 is one game, one drill type, one feature (mean knee flex angle), one CSV row. Iter 2 adds player tracking and identity. Iter 3 is the full corpus.

The plan is written. The clip is downloaded. The next step is running uv venv --python 3.12 and seeing if MediaPipe agrees with my hypothesis.

I'll post the result.