HCP Analytics

ML-Based KOL/HCP Segmentation

A K-means clustering engine trained on synthetic prescriber data to segment healthcare professionals (HCPs) in the tenosynovial giant cell tumor (TGCT) market. This proof of concept shows how a pre-commercial biotech can build actionable physician segmentation from publicly available signals (claims data, publication databases, trial registries, and conference attendance) when entering a rare oncology market with no proprietary prescribing data and an established incumbent, Turalio (pexidartinib).

K-Means Clustering · PCA Visualization · Silhouette Analysis · Synthetic Claims Data · Channel Allocation · Pre-Launch Strategy · Feature Engineering · Interactive Profiler

Data Sources: CMS Open Payments (General Payments), NPPES NPI Registry, PubMed E-utilities for publication matching. K-means++ clustering with silhouette validation.

Technical Architecture

K-Means Clustering

Lloyd's algorithm with K-means++ initialization, which spreads the initial centroids apart and makes results less sensitive to the starting seed. Runs typically converge within 15-25 iterations. Silhouette analysis compares cluster separation quality across k = 2 to 8.
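
The steps named above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's actual engine: the three 2-D blobs are toy data, and the function names (`kmeans_pp_init`, `lloyd`, `kmeans`, `silhouette`) and the `n_init` restart count are hypothetical. It seeds with K-means++, runs Lloyd's assign/update loop, and scores each k from 2 to 8 by mean silhouette.

```python
import math
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, k, rng):
    # K-means++: draw each new seed with probability proportional to its
    # squared distance from the nearest seed chosen so far.
    centroids = [list(rng.choice(points))]
    while len(centroids) < k:
        d2 = [min(sq_dist(p, c) for c in centroids) for p in points]
        centroids.append(list(rng.choices(points, weights=d2, k=1)[0]))
    return centroids


def lloyd(points, k, rng, max_iter=100):
    centroids = kmeans_pp_init(points, k, rng)
    labels = None
    for iteration in range(1, max_iter + 1):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an emptied cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, inertia, iteration


def kmeans(points, k, n_init=5, seed=0):
    # Keep the restart with the lowest within-cluster sum of squares.
    rng = random.Random(seed)
    return min((lloyd(points, k, rng) for _ in range(n_init)),
               key=lambda run: run[2])


def silhouette(points, labels):
    # Mean silhouette coefficient; O(n^2), so suitable for toy sizes only.
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    scores = []
    for i, p in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        a = sum(math.dist(p, points[j]) for j in own if j != i) / (len(own) - 1)
        b = min(sum(math.dist(p, points[j]) for j in idx) / len(idx)
                for lab, idx in clusters.items() if lab != labels[i])
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Toy data (assumption): three well-separated 2-D blobs of 30 points each.
data_rng = random.Random(42)
centers = [(0.0, 0.0), (8.0, 8.0), (0.0, 8.0)]
points = [[data_rng.gauss(cx, 0.5), data_rng.gauss(cy, 0.5)]
          for cx, cy in centers for _ in range(30)]

scores = {k: silhouette(points, kmeans(points, k)[0]) for k in range(2, 9)}
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```

The restarts are one common way to reduce sensitivity to an unlucky seeding; whether the original engine uses them is not stated in the source.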

Feature Engineering

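An 8-dimensional feature space is constructed from 4 pre-launch data sources. Z-score standardization puts every feature on a common scale (mean 0, standard deviation 1), so no single feature dominates the Euclidean distances that K-means relies on. A PCA projection to 2D allows visual validation of cluster separation.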
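
As a rough sketch of the standardize-then-project step, the code below z-scores 8 synthetic columns and projects them to 2D with PCA. Everything specific here is an assumption for illustration: the columns are unnamed toy features driven by two latent factors, and PCA is computed by power iteration with deflation only to keep the example dependency-free. The source does not say how the project computes PCA.

```python
import math
import random


def zscore(rows):
    # Column-wise (x - mean) / std, using the population standard deviation.
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def matvec(mat, v):
    return [sum(a * b for a, b in zip(row, v)) for row in mat]


def top_eigenpair(mat, rng, iters=500):
    # Power iteration: repeated multiplication by a symmetric matrix
    # converges to its dominant eigenvector.
    v = [rng.random() + 0.1 for _ in mat]
    for _ in range(iters):
        w = matvec(mat, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigval = sum(a * b for a, b in zip(v, matvec(mat, v)))
    return eigval, v


def pca_2d(rows, seed=0):
    # Rows are assumed to be centered already (zscore output qualifies).
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    cov = [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
           for i in range(d)]
    l1, v1 = top_eigenpair(cov, rng)
    # Deflation: remove the first component so the next run finds the second.
    deflated = [[cov[i][j] - l1 * v1[i] * v1[j] for j in range(d)]
                for i in range(d)]
    l2, v2 = top_eigenpair(deflated, rng)
    total_var = sum(cov[i][i] for i in range(d))
    coords = [[sum(a * b for a, b in zip(r, v1)),
               sum(a * b for a, b in zip(r, v2))] for r in rows]
    return coords, (v1, v2), (l1 / total_var, l2 / total_var)


# Toy data (assumption): 8 columns on very different scales, five driven by
# latent factor a and three by latent factor b, plus noise.
data_rng = random.Random(7)
scales = [1, 10, 100, 1000, 0.1, 5, 50, 500]
raw = []
for _ in range(200):
    a, b = data_rng.gauss(0, 1), data_rng.gauss(0, 1)
    latent = [a] * 5 + [b] * 3
    raw.append([s * (z + data_rng.gauss(0, 0.3)) for s, z in zip(scales, latent)])

X = zscore(raw)
coords, (pc1, pc2), explained = pca_2d(X)
print([round(r, 3) for r in explained])
```

Without the z-score step, the column scaled by 1000 would dominate both the distance computation and the first principal component.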

Channel Optimization

Segment membership drives omnichannel budget allocation across 6 engagement channels. This maps the unsupervised clustering output to actionable media planning, connecting data science to commercial execution.
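
One way such a mapping could look is sketched below. The source states only that there are 6 engagement channels, so the channel names, segment labels, mix weights, priority multiplier, and budget figure are all hypothetical placeholders. Each segment's share of the budget is proportional to its HCP count times a priority weight, and is then split across channels by a per-segment mix.

```python
import math
from collections import Counter

# Hypothetical channel names; the source gives only the count (6).
CHANNELS = ["field_rep", "msl", "email", "congress", "peer_program", "paid_digital"]

# Hypothetical per-segment channel mix; each row sums to 1.
SEGMENT_MIX = {
    "research_kol":      [0.10, 0.40, 0.05, 0.25, 0.15, 0.05],
    "community_treater": [0.40, 0.10, 0.20, 0.05, 0.10, 0.15],
    "digital_first":     [0.10, 0.05, 0.35, 0.05, 0.10, 0.35],
}


def allocate(budget, segment_sizes, priority, mix=SEGMENT_MIX):
    # Segment share of budget is proportional to (HCP count x priority weight).
    weight = {seg: n * priority.get(seg, 1.0) for seg, n in segment_sizes.items()}
    total = sum(weight.values())
    plan = {}
    for seg, w in weight.items():
        seg_budget = budget * w / total
        plan[seg] = {ch: seg_budget * share
                     for ch, share in zip(CHANNELS, mix[seg])}
    return plan


def channel_totals(plan):
    # Sum each channel's spend across all segments.
    return {ch: sum(seg_plan[ch] for seg_plan in plan.values()) for ch in CHANNELS}


# Stand-in for clustering output: one cluster id per HCP (assumption).
cluster_labels = [0] * 20 + [1] * 60 + [2] * 40
segment_names = {0: "research_kol", 1: "community_treater", 2: "digital_first"}
sizes = Counter(segment_names[c] for c in cluster_labels)

plan = allocate(1_000_000, sizes, priority={"research_kol": 3.0})
totals = channel_totals(plan)
for channel, amount in totals.items():
    print(f"{channel:>13}: {amount:>10,.0f}")
```

In practice the mix weights would come from commercial judgment or response data rather than fixed constants; the table here exists only to show the shape of the mapping.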

References

  • Lloyd SP. Least squares quantization in PCM. IEEE Trans Inform Theory. 1982;28(2):129-137.
  • Arthur D, Vassilvitskii S. K-means++: The advantages of careful seeding. Proc SODA. 2007:1027-1035.
  • Rousseeuw PJ. Silhouettes: A graphical aid to interpretation of cluster analysis. J Comput Appl Math. 1987;20:53-65.
  • Hotelling H. Analysis of a complex of statistical variables into principal components. J Educ Psychol. 1933;24(6):417-441.
  • Tap WD, et al. Pexidartinib versus placebo for advanced tenosynovial giant cell tumour (ENLIVEN). Lancet. 2019;394(10197):478-487.
  • Campbell JD, et al. HCP segmentation for rare disease commercialization. J Med Mark. 2021;21(2):89-101.

Daniel Tran, PharmD

UC San Diego — Skaggs School of Pharmacy

Source code MIT. Content © 2026 Daniel Tran (CC BY-NC-SA 4.0).