HCP Analytics
A K-means clustering engine trained on synthetic prescriber data to segment healthcare professionals (HCPs) in the TGCT (Tenosynovial Giant Cell Tumor) market. This proof of concept demonstrates how a pre-commercial biotech can build actionable physician segmentation from publicly available signals (claims data, publication databases, trial registries, and conference attendance) when entering a rare oncology market with no proprietary prescribing data and an established incumbent (Turalio).
Data Sources: CMS Open Payments (General Payments), NPPES NPI Registry, PubMed E-utilities for publication matching. Method: K-means clustering with K-means++ initialization and silhouette validation.
Lloyd's algorithm with K-means++ initialization for stable centroid seeding. Convergence typically within 15-25 iterations. Silhouette analysis validates cluster separation quality across k=2-8.
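The clustering loop described above can be sketched with the Python standard library alone. This is a minimal illustration, not the engine's actual code: the function names, the three synthetic Gaussian blobs, the five restarts per k, and the fixed random seed are all assumptions made for the example.

```python
import math
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_pp_init(points, k, rng):
    # K-means++ seeding: first centroid uniform at random, each later one drawn
    # with probability proportional to squared distance from the nearest centroid.
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        d2 = [min(sq_dist(p, c) for c in centroids) for p in points]
        centroids.append(rng.choices(points, weights=d2, k=1)[0])
    return centroids

def lloyd(points, k, rng, max_iter=100):
    # Lloyd's algorithm: alternate assignment and centroid update
    # until the assignments stop changing.
    centroids = kmeans_pp_init(points, k, rng)
    labels = None
    for iteration in range(1, max_iter + 1):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties out
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, inertia, iteration

def silhouette(points, labels):
    # Mean of (b - a) / max(a, b), where a is the mean distance to the point's
    # own cluster and b is the mean distance to the nearest other cluster.
    k = max(labels) + 1
    scores = []
    for i, p in enumerate(points):
        sums, counts = [0.0] * k, [0] * k
        for j, q in enumerate(points):
            if i != j:
                sums[labels[j]] += math.dist(p, q)
                counts[labels[j]] += 1
        own = labels[i]
        others = [sums[c] / counts[c] for c in range(k) if c != own and counts[c]]
        if counts[own] == 0 or not others:
            scores.append(0.0)  # singleton cluster: score taken as 0
            continue
        a, b = sums[own] / counts[own], min(others)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical stand-in for the synthetic HCP data: three blobs in 8 dimensions.
rng = random.Random(0)
centers = [[0.0] * 8, [8.0] * 4 + [0.0] * 4, [0.0] * 4 + [8.0] * 4]
points = [[rng.gauss(mu, 1.0) for mu in c] for c in centers for _ in range(40)]

scores = {}
for k in range(2, 9):
    # Assumption: a few restarts per k, keeping the lowest-inertia run.
    runs = [lloyd(points, k, rng) for _ in range(5)]
    labels, inertia, n_iter = min(runs, key=lambda r: r[1])
    scores[k] = silhouette(points, labels)

best_k = max(scores, key=scores.get)
print(best_k, {k: round(s, 3) for k, s in scores.items()})
```

Because the toy data is built from three well-separated blobs, the sweep should favor k = 3. On real or noisier data the silhouette curve is usually flatter, and the chosen k is best read alongside how interpretable the resulting segments are.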
8-dimensional feature space constructed from 4 pre-launch data sources. Z-score standardization puts every feature on a zero-mean, unit-variance scale, so no single feature dominates the distance calculation. PCA projection to 2D enables visual validation of cluster separation.
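The preprocessing and projection steps can be illustrated as follows. This is a sketch under stated assumptions: the 8 features here are generated from two hypothetical latent factors with arbitrary unit scales, and the principal components are found by power iteration with deflation, a swapped-in technique chosen to keep the example dependency-free. It is not necessarily how the project computes PCA.

```python
import math
import random

def zscore(rows):
    # Standardize each column to zero mean and unit (population) variance.
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

def covariance(rows):
    # Covariance of already-centered rows (z-scored data is centered).
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / n for j in range(d)]
            for i in range(d)]

def top_eigenvector(matrix, rng, iters=500):
    # Power iteration: repeated multiplication converges to the dominant eigenvector.
    d = len(matrix)
    v = [rng.gauss(0, 1) for _ in range(d)]
    for _ in range(iters):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
    eigenvalue = sum(a * b for a, b in zip(v, mv))
    return eigenvalue, v

def pca_2d(rows, rng):
    cov = covariance(rows)
    d = len(cov)
    components = []
    for _ in range(2):
        lam, v = top_eigenvector(cov, rng)
        components.append(v)
        # Deflation: remove the found component so the next one can surface.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    projected = [[sum(x * c for x, c in zip(row, comp)) for comp in components]
                 for row in rows]
    return projected, components

# Hypothetical data: 5 features driven by one latent factor, 3 by another,
# each multiplied by an arbitrary scale to mimic features in different units.
rng = random.Random(0)
scales = [1, 10, 100, 1, 10, 100, 1, 10]
rows = []
for _ in range(200):
    f1, f2 = rng.gauss(0, 1), rng.gauss(0, 1)
    raw = ([f1 + rng.gauss(0, 0.5) for _ in range(5)]
           + [f2 + rng.gauss(0, 0.5) for _ in range(3)])
    rows.append([x * s for x, s in zip(raw, scales)])

standardized = zscore(rows)
projected, components = pca_2d(standardized, rng)
print(projected[0])
```

The `projected` list holds one (PC1, PC2) pair per row and can be handed to any scatter-plot tool, colored by cluster label, to check separation visually. Without the z-score step, the features multiplied by 100 would dominate both the covariance matrix and the K-means distances.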
Segment membership drives omnichannel budget allocation across 6 engagement channels, mapping unsupervised clustering output to actionable media planning and connecting data science to commercial execution.