scikit-survival: machine learning for time-to-event analysis
scikit-survival is a Python module for survival analysis built on top of scikit-learn. It lets you perform survival analysis while using the rest of the scikit-learn ecosystem, e.g., for pre-processing or cross-validation.
About Survival Analysis
The objective in survival analysis (also referred to as reliability analysis in engineering) is to establish a connection between covariates and the time of an event. What distinguishes survival analysis from traditional machine learning is that part of the training data can only be partially observed – it is censored.
For instance, in a clinical study, patients are often monitored for a particular time period, and events occurring in this period are recorded. If a patient experiences an event, the exact time of the event can be recorded – the patient’s record is uncensored. In contrast, right-censored records refer to patients who remained event-free during the study period, so it is unknown whether an event occurred after the study ended. Consequently, survival analysis demands models that take this unique characteristic of such a dataset into account.
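To make the distinction concrete, here is a minimal sketch in plain Python of what a study would record for each patient. The names SurvivalRecord and observe, the event times, and the 12-month study length are all hypothetical and chosen only for illustration; the sketch also assumes every patient enters the study at time zero.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SurvivalRecord:
    event: bool   # True if the event was observed (uncensored record)
    time: float   # event time if observed, otherwise time of last follow-up


def observe(true_event_time: Optional[float], study_end: float) -> SurvivalRecord:
    """Return what a study ending at `study_end` would record for one patient.

    `true_event_time` is the (normally unknowable) time of the event,
    or None if the patient never experiences it.
    """
    if true_event_time is not None and true_event_time <= study_end:
        # Event happened during the study: exact time is known.
        return SurvivalRecord(event=True, time=true_event_time)
    # Patient was event-free at the end of the study: right-censored.
    return SurvivalRecord(event=False, time=study_end)


# Hypothetical event times in months (assumption, not real data).
true_times = [3.0, 14.5, None, 8.0, 30.0]
records = [observe(t, study_end=12.0) for t in true_times]

for record in records:
    print(record)
```

Note that the patients with true event times of 14.5 and 30.0 months and the patient who never has the event all produce the same censored record. The data alone cannot tell them apart, which is why a model must treat a censored time as a lower bound rather than as an event time. scikit-survival represents outcomes in a comparable form, as a boolean event indicator paired with a time.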
Posts
scikit-survival 0.23.0 released
I am pleased to announce the release of scikit-survival 0.23.0.
This release adds support for scikit-learn 1.4 and 1.5, which includes missing value support for RandomSurvivalForest. For more details on missing value support, see the section in the release announcement for 0.23.0.
Moreover, this release fixes critical bugs. When fitting SurvivalTree, the sample_weight is now correctly considered when computing the log-rank statistic for each split. This change also affects RandomSurvivalForest and ExtraSurvivalTrees, which pass sample_weight to the individual trees in the ensemble. Therefore, the outputs produced by SurvivalTree, RandomSurvivalForest, and ExtraSurvivalTrees will differ from previous releases.