Predictive-analysis-of-in-hospital-mortality-in-patients-with-ovarian-cancer

Authors

María Elisa Joannon Ovalle

Amaru Simón Agüero Jiménez

Published

October 24, 2025

1 Abstract

1.1 Introduction

Ovarian cancer remains one of the most lethal gynecological malignancies worldwide, with five-year survival rates persistently below 50% despite advances in surgical techniques and systemic therapies. The heterogeneous clinical trajectory of ovarian cancer patients, particularly during hospitalization episodes, presents significant challenges for risk stratification and resource allocation. While traditional prognostic models have focused primarily on tumor-specific characteristics and staging parameters, the complex interplay between cancer-related complications and pre-existing or treatment-emergent comorbidities remains inadequately characterized in predictive frameworks for acute care outcomes.

1.2 Objectives

This investigation aimed to develop and validate a comprehensive predictive modeling framework for in-hospital mortality among ovarian cancer patients within the Chilean public health system. The primary objective encompassed the development of hierarchical clustering methodologies to identify clinically coherent comorbidity patterns and their integration into supervised machine learning models for mortality prediction. Secondary objectives included the temporal characterization of disease burden evolution, quantification of cluster-specific mortality associations, and optimization of predictive performance through advanced class-imbalance handling techniques.

1.3 Methods

A retrospective cohort study was conducted utilizing administrative discharge data from the Chilean public health system spanning 2019-2024. Patients with ICD-10 code C56 (malignant neoplasm of ovary) in any diagnostic position were identified, with secondary diagnoses (positions 2-35) analyzed for comorbidity characterization. Unsupervised hierarchical clustering was performed using Jaccard similarity metrics and Ward’s linkage algorithm on a co-occurrence matrix of comorbidity codes. Multiple validation indices, including silhouette coefficient, Calinski-Harabasz index, and Davies-Bouldin index, were employed to determine optimal cluster solutions. Supervised predictive models were developed using the 2023-2024 cohort, incorporating demographic, clinical, and cluster-derived features. Class imbalance was addressed through multiple resampling strategies including SMOTE, ADASYN, and hybrid approaches. Model performance was evaluated using stratified cross-validation with comprehensive metrics including AUC-ROC, precision-recall curves, and threshold-specific performance indicators.

1.4 Results

The analysis encompassed 11,244 hospitalizations meeting inclusion criteria, with an overall in-hospital mortality rate of 12.3%. Hierarchical clustering identified twelve distinct comorbidity patterns, ranging from chemotherapy-related toxicities to organ-specific dysfunction syndromes. Patients with multiple concurrent clusters demonstrated monotonically increasing mortality risk, with relative risks exceeding 3.5 for those presenting four or more cluster memberships. Temporal analysis revealed progressive comorbidity accumulation, with mean cluster burden increasing from 2.1±1.3 in 2019 to 3.4±1.8 in 2024. Supervised modeling achieved optimal performance using random forest algorithms with SMOTE-Tomek hybrid balancing, yielding an AUC-ROC of 0.842 (95% CI: 0.821-0.863) and a balanced accuracy of 0.783. Feature importance analysis identified age, chemotherapy-associated cluster membership, and acute organ failure indicators as dominant predictors. Threshold optimization analysis demonstrated substantial performance variability, with precision-recall trade-offs suggesting optimal operating points between 0.65-0.70 probability cutoffs for clinical deployment.

1.5 Conclusions

The integration of hierarchical comorbidity clustering with modern machine learning approaches provides robust predictive capabilities for in-hospital mortality among ovarian cancer patients. The identified comorbidity patterns reflect both disease-specific complications and treatment-related sequelae, offering actionable insights for risk stratification and care intensification decisions.

1.6 Relevance/Implications

These findings have immediate implications for clinical decision support system development, enabling proactive identification of high-risk patients upon admission. The hierarchical clustering framework offers a reproducible methodology for comorbidity pattern recognition applicable to other oncological populations. Furthermore, the demonstrated importance of treatment-related comorbidity clusters underscores the need for integrated supportive care protocols and systematic toxicity monitoring in ovarian cancer management.

2 Introduction

The landscape of ovarian cancer care has undergone substantial transformation over the past decade, with advances in surgical cytoreduction techniques, targeted therapies, and maintenance treatment strategies (Lheureux et al., 2019). Despite these therapeutic innovations, ovarian cancer continues to exhibit the highest mortality-to-incidence ratio among gynecological malignancies, with population-level survival improvements remaining modest (Torre et al., 2018). This persistent mortality burden is particularly pronounced during acute hospitalization episodes, where the convergence of disease progression, treatment-related toxicities, and underlying comorbidities creates complex clinical scenarios that challenge conventional prognostic frameworks (Tetsche et al., 2008).

The conceptualization of comorbidity in oncology has evolved from simple enumeration of concurrent conditions to sophisticated network-based approaches that recognize the interdependence and synergistic effects of multiple disease processes (Valderas et al., 2009). In ovarian cancer populations, the comorbidity landscape is uniquely shaped by several factors: the typically advanced age at diagnosis, the aggressive nature of standard treatment regimens, and the propensity for disease-related complications affecting multiple organ systems (AlHilli et al., 2018). Traditional comorbidity indices, such as the Charlson Comorbidity Index or Elixhauser classification, while valuable for general risk adjustment, may inadequately capture the specific constellation of conditions that influence acute outcomes in this population (Quan et al., 2011).

Recent developments in unsupervised machine learning have enabled data-driven discovery of comorbidity patterns that transcend traditional anatomical or pathophysiological classifications (Hidalgo et al., 2009a). These approaches, particularly hierarchical clustering methodologies, offer the potential to identify clinically meaningful disease combinations that might otherwise remain obscured within conventional taxonomies (Roque et al., 2011). The application of such techniques to ovarian cancer populations represents an unexplored opportunity to enhance understanding of mortality determinants and improve predictive modeling capabilities.

The Chilean healthcare context provides a unique setting for this investigation, characterized by universal health coverage through a dual public-private system and comprehensive administrative data capture (Goic, 2015). The availability of population-level discharge data with detailed diagnostic coding enables robust epidemiological analyses while maintaining external validity for similar middle-income healthcare environments. Furthermore, the temporal span encompassing both pre-pandemic and post-pandemic periods offers insights into the evolution of care patterns and outcomes under varying healthcare system stressors.

3 Objectives

3.1 General Objective

To develop and validate a comprehensive predictive modeling framework for in-hospital mortality among adult patients hospitalized with ovarian cancer (ICD-10: C56) in the Chilean public health system, integrating hierarchical clustering of comorbidities with supervised machine learning approaches to enhance risk stratification capabilities.

3.2 Specific Objectives

  1. To construct and validate a hierarchical clustering taxonomy of comorbidity patterns specific to ovarian cancer hospitalizations, employing unsupervised learning techniques on population-level administrative data to identify clinically coherent disease groupings beyond traditional classification systems.

  2. To characterize the temporal evolution of the ovarian cancer hospitalization cohort from 2019 to 2024, analyzing trends in patient volume, demographic composition, service utilization patterns, disease severity indicators, and crude mortality rates to contextualize the predictive modeling framework.

  3. To quantify the independent and synergistic associations between identified comorbidity clusters and in-hospital mortality, estimating cluster-specific relative risks and examining dose-response relationships between cumulative cluster burden and mortality outcomes.

  4. To develop, optimize, and validate supervised machine learning models for in-hospital mortality prediction using contemporary data (2023-2024), systematically evaluating multiple algorithms, class balancing strategies, and decision threshold configurations to maximize predictive performance while maintaining clinical interpretability.

4 Hypotheses

4.1 Primary Hypothesis

Patients with ovarian cancer presenting multiple concurrent comorbidity clusters, particularly those encompassing chemotherapy-related toxicities and acute organ dysfunction syndromes, will demonstrate significantly elevated in-hospital mortality risk compared to patients with isolated or minimal comorbidity burden, with effect sizes exceeding those observed for traditional prognostic indicators.

4.2 Secondary Hypotheses

  1. Treatment-emergent comorbidity clusters, specifically those related to chemotherapy-induced toxicities and post-procedural complications, will exhibit stronger predictive associations with in-hospital mortality than clusters representing chronic pre-existing conditions, reflecting the acute vulnerability imposed by cancer-directed therapies.

  2. Ovarian cancer-specific complications, including malignant ascites, intestinal obstruction, and venous thromboembolism, will maintain independent predictive value for mortality after comprehensive adjustment for age, general comorbidity burden, and treatment intensity markers.

  3. Temporal trend analysis will reveal progressive increases in comorbidity complexity and cluster co-occurrence over the study period, reflecting both enhanced diagnostic capture and intensification of treatment approaches, with corresponding implications for mortality risk stratification.

5 Methods

5.1 Study Design and Data Source

A retrospective cohort of hospital discharges from the Chilean public health system (2019–2024) was assembled. Cases were identified by the presence of ICD-10 code C56 (ovarian cancer) in any diagnostic position; fields DIAG1–DIAG35 were analyzed, with DIAG2–DIAG35 capturing comorbidities. Discharges of patients aged ≥18 years with a defined discharge status (alive/deceased) were included. Exclusions comprised births, administrative discharges, records with missing sex or age, and administrative duplicates. The primary outcome was in-hospital mortality.

5.2 Unsupervised Analysis of Comorbidities (DIAG2–DIAG35)

5.2.1 Co-occurrence Matrix Construction

A binary patient-diagnosis matrix \(\mathbf{B}\) was constructed, where:

\[ \mathbf{B}_{ij} = \begin{cases} 1 & \text{if patient } i \text{ presents diagnosis } j \\ 0 & \text{otherwise} \end{cases} \]

The dimensions of \(\mathbf{B}\) were \(n \times m\), with \(n\) the total number of ovarian cancer patients and \(m\) the number of frequent ICD-10 codes meeting inclusion criteria. The co-occurrence matrix was calculated as:

\[ \mathbf{C} = \mathbf{B}^\top \mathbf{B} \]

Thus, each element \(C_{jk}\) represents the number of patients with both conditions \(j\) and \(k\). This co-occurrence approach is standard in numerical ecology and community analysis (Legendre & Legendre, 2012) and has been applied to multimorbidity networks and disease trajectories in population data (Hidalgo et al., 2009b).
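As a minimal numpy sketch of this construction, using a toy binary matrix rather than the study cohort:

```python
import numpy as np

# Toy binary patient-diagnosis matrix B (4 patients x 3 diagnoses);
# rows are patients, columns are ICD-10 codes.
B = np.array([
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 1],
    [1, 1, 0],
])

# Co-occurrence matrix C = B^T B: C[j, k] counts patients presenting both
# diagnoses j and k; the diagonal C[j, j] is the marginal frequency of j.
C = B.T @ B
print(C)
```

Note that the diagonal recovers each code's total frequency, which the Jaccard formula below reuses.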

5.2.2 Similarity Measurement

The Jaccard similarity coefficient was employed to quantify the relationship between comorbidity pairs, appropriate for presence/absence data and asymmetric co-occurrences (Jaccard, 1901; Real & Vargas, 1996). For two conditions \(j\) and \(k\):

\[ J(j,k) \;=\; \frac{|P_j \cap P_k|}{|P_j \cup P_k|} \;=\; \frac{C_{jk}}{C_{jj} + C_{kk} - C_{jk}} \]

where \(P_j\) and \(P_k\) are the sets of patients with conditions \(j\) and \(k\), respectively. \(J \in [0,1]\) (0: no shared patients; 1: perfect match). For numerical stability, a small term was added in the denominator (\(\epsilon = 10^{-10}\)). The resulting similarity matrix is symmetric with diagonal fixed at \(1\) (perfect self-similarity).
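The same kind of toy counts illustrate the Jaccard computation, including the stabilizing \(\epsilon\) and the fixed diagonal:

```python
import numpy as np

# Toy co-occurrence counts for three diagnoses (diagonal = marginal frequency)
C = np.array([[3, 2, 1],
              [2, 3, 1],
              [1, 1, 2]], dtype=float)

eps = 1e-10  # small stabilizing term in the denominator, as in the text
diag = np.diag(C)

# J(j, k) = C_jk / (C_jj + C_kk - C_jk)
J = C / (diag[:, None] + diag[None, :] - C + eps)
np.fill_diagonal(J, 1.0)  # perfect self-similarity
```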

5.2.3 Hierarchical Clustering Analysis

Similarity was transformed into a distance matrix suitable for clustering:

\[ D_{jk} = 1 - J(j,k) \]

Jaccard distance \(D\) is a valid metric for binary vectors and is widely used in dissimilarity analysis (Gower & Legendre, 1986; Legendre & Legendre, 2012). Agglomerative hierarchical clustering with Ward’s minimum variance criterion was applied to \(D\), minimizing intra-cluster heterogeneity at each merge (Ward, 1963). Ward’s linkage is expressed as:

\[ d(u,v) \;=\; \sqrt{\frac{2\,|u|\,|v|}{|u|+|v|}} \; \lVert c_u - c_v \rVert_2 \]

where \(|u|\) and \(|v|\) are cluster sizes and \(c_u\), \(c_v\) their centroids. The Ward.D2 variant was used, conforming to the modern Ward algorithm formulation and Lance-Williams update (Murtagh & Legendre, 2014).

5.2.4 Optimal Number of Clusters Determination

Partitions with \(k \in [5,20]\) were evaluated using multiple validation metrics:

  1. Inertia (within-cluster sum of squares)
    For a solution with \(k\) clusters: \[ W_k = \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - \mu_i \rVert^2 \] where \(\mu_i\) is the centroid of cluster \(C_i\).

  2. Silhouette coefficient (Rousseeuw, 1987)
    \[ s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}} \] with \(a(i)\) the mean distance from \(i\) to its cluster and \(b(i)\) the minimum mean distance to other clusters.

  3. Calinski-Harabasz index (variance ratio) (Caliński & Harabasz, 1974)
    \[ CH = \frac{B_k/(k-1)}{W_k/(n-k)} \]

  4. Davies-Bouldin index (Davies & Bouldin, 1979)
    \[ DB = \frac{1}{k} \sum_{i=1}^{k} \max_{j \neq i} \left(\frac{\sigma_i + \sigma_j}{d(c_i, c_j)}\right) \]

  5. Elbow method via fusion distances
    The rate of change of dendrogram fusion distances was analyzed to detect natural breaks in the hierarchy.

5.2.5 Composite Score Calculation

To synthesize evidence, each metric was normalized to \([0,1]\):

\[ \text{Norm}(x) = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \]

For metrics where lower values are better (Inertia, Davies-Bouldin), the complement was used:

\[ \text{Norm}_{\text{inv}}(x) = 1 - \text{Norm}(x) \]

The composite score was defined as the arithmetic mean of the four normalized metrics:

\[ \text{Score}_{\text{comp}} = \frac{1}{4} \sum_{m=1}^{4} \text{Norm}_m \]
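A small pandas sketch of this normalization and averaging, with illustrative metric values only (not the study's results):

```python
import numpy as np
import pandas as pd

# Hypothetical metric values over four candidate k values (illustrative only)
df = pd.DataFrame({
    'inertia':           [40.2, 37.9, 35.6, 34.1],  # lower is better
    'silhouette':        [0.09, 0.07, 0.09, 0.07],  # higher is better
    'calinski_harabasz': [1.27, 1.13, 1.10, 1.05],  # higher is better
    'davies_bouldin':    [4.02, 3.85, 3.74, 3.61],  # lower is better
})

# Min-max normalization of each metric to [0, 1]
norm = (df - df.min()) / (df.max() - df.min())

# Complement for "lower is better" metrics
norm[['inertia', 'davies_bouldin']] = 1 - norm[['inertia', 'davies_bouldin']]

# Composite score: arithmetic mean of the four normalized metrics
composite = norm.mean(axis=1)
```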

5.2.6 Cluster Characterization

For each selected solution:

  1. Size distribution (number of ICD-10 codes per cluster)
  2. Total and mean frequency of patients per cluster
  3. Most frequent diagnoses (top-10) within each cluster
  4. Membership stability when varying cut height

Dendrogram visualization used logarithmic rescaling to improve readability of low-level merges:

\[ y_{\text{display}} = \log\!\bigl(1 + y_{\text{original}}\bigr) \]

5.3 Supervised Modeling of In-Hospital Mortality (2023-2024 Cohort)

Predictive analysis was performed on the most recent data (2023-2024) to ensure model relevance. Let \(Y\in\{0,1\}\) denote in-hospital mortality. The design matrix included continuous variables (age, length of stay), cluster-membership indicators \(\{Z_g\}\), and categorical variables encoded as one-hot (sex, cancer type, service, severity, and DRG risk; the first category of each was dropped as reference). A stratified 70/30 train/test partition was performed with seed 123. For optimization stability and scale homogenization, both continuous and binary variables were standardized using the z-score:

\[ \tilde{x}=\frac{x-\mu_{\text{train}}}{\sigma_{\text{train}}}, \]

with parameters learned on the training set applied to the validation set to avoid information leakage. Class imbalance was addressed only within the training set, using several schemes: random oversampling, SMOTE (Chawla et al., 2002), ADASYN (He et al., 2008), random undersampling, NearMiss, and SMOTE+Tomek (Tomek-link cleaning) (Batista et al., 2004; Tomek, 1976). In SMOTE, for a minority point \(\mathbf{x}_i\) and a minority-class neighbor \(\mathbf{x}_i^{(k)}\):

\[ \tilde{\mathbf{x}}=\mathbf{x}_i+\delta\big(\mathbf{x}_i^{(k)}-\mathbf{x}_i\big),\quad \delta\sim\mathcal{U}(0,1). \]
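The interpolation step can be sketched in a few lines of numpy. This is only the formula above, not imbalanced-learn's full SMOTE, which additionally performs a k-nearest-neighbor search over the minority class:

```python
import numpy as np

rng = np.random.default_rng(123)

def smote_point(x_i, x_neighbor, rng):
    """One synthetic minority sample on the segment between x_i and a
    minority-class neighbor: x~ = x_i + delta * (x_neighbor - x_i)."""
    delta = rng.uniform(0.0, 1.0)
    return x_i + delta * (x_neighbor - x_i)

x_i = np.array([1.0, 2.0])   # toy minority point
x_k = np.array([3.0, 4.0])   # toy minority neighbor
x_new = smote_point(x_i, x_k, rng)
# x_new lies on the segment joining the two minority points
```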

Two model families were fitted with seed 123: \(\ell_2\)-penalized logistic regression (regularized maximum likelihood solution) and random forest (n_estimators=100, max_depth=10). In logistic regression:

\[ \Pr(Y=1\mid \mathbf{x})=\sigma(\eta),\quad \eta=\beta_0+\mathbf{x}^\top\boldsymbol{\beta},\quad \sigma(t)=\frac{1}{1+e^{-t}}, \]

minimizing cross-entropy loss with penalty:

\[ \mathcal{L}(\boldsymbol{\beta})=-\sum_{i=1}^{N}\big[y_i\log p_i+(1-y_i)\log(1-p_i)\big]+\lambda\lVert\boldsymbol{\beta}\rVert_2^2,\quad p_i=\sigma(\eta_i),\ \lambda=1/C, \]

where \(\exp\{\beta_j\}\) is interpreted as the odds ratio per one-unit increase in \(\tilde{x}_j\) (Hosmer et al., 2013). The random forest used the Gini impurity criterion:

\[ G=1-\sum_{c\in\{0,1\}}p_c^2, \]

selecting splits that maximize \(\Delta G\) and averaging the votes of 100 trees (Breiman, 2001). Variable importance was reported as coefficient magnitude for logistic regression and as permutation importance for the forest, the latter being less biased than Gini-based importance (Strobl et al., 2007).
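A minimal numpy sketch of the Gini criterion and the split gain \(\Delta G\); `gini` and `gini_gain` are hypothetical helpers, not scikit-learn internals:

```python
import numpy as np

def gini(y):
    """Gini impurity G = 1 - sum_c p_c^2 for binary labels."""
    if len(y) == 0:
        return 0.0
    p1 = np.mean(y)
    return 1.0 - (p1 ** 2 + (1 - p1) ** 2)

def gini_gain(y_parent, y_left, y_right):
    """Impurity decrease for a candidate split, weighting children by size."""
    n = len(y_parent)
    return gini(y_parent) - (len(y_left) / n) * gini(y_left) \
                          - (len(y_right) / n) * gini(y_right)

y = np.array([0, 0, 0, 1, 1, 1])
# A perfect split drives both child impurities to zero
gain = gini_gain(y, y[:3], y[3:])
```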

Validation used stratified 5-fold cross-validation on the training set, computing accuracy, precision, recall, F1, and AUC-ROC. After cross-validation, each model was retrained on the full training set (with the corresponding balancing scheme) and evaluated on the validation partition. The ROC curve was obtained by varying the threshold \(\tau\) to plot \(\text{TPR}(\tau)\) against \(\text{FPR}(\tau)\), with the area under the curve (AUC) computed by the trapezoidal rule (Fawcett, 2006). Given the class imbalance, the precision-recall curve was considered a more informative complementary metric (Saito & Rehmsmeier, 2015). The operational threshold was explored on the grid \(\tau\in\{0.10,0.15,\dots,0.85\}\), retaining the value maximizing F1; when differential misclassification costs are defined, the Bayes-optimal threshold satisfies \(\tau=c_{\mathrm{FP}}/(c_{\mathrm{FP}}+c_{\mathrm{FN}})\).

5.4 Ethical Considerations

Secondary pseudonymized data from the FONASA open-data portal (Fondo Nacional de Salud (FONASA), 2021) were used, in compliance with MINSAL regulations and the principles of data minimization and confidentiality protection.

6 Results

6.1 Initial Exploratory Analysis

The combined 2019-2024 schema inspection shows a heterogeneous completeness pattern. Key demographic and administrative variables (e.g., COD_HOSPITAL, SEXO, IR_29301_*, admission and discharge dates) are practically complete, while several secondary diagnostic and procedural fields present high null percentages. In particular, DIAGNOSTICO10-DIAGNOSTICO35 exhibit more than 75% empty cells, with several fields exceeding 99%, justifying a treatment of comorbidities based on the most frequent non-oncological diagnoses and clustering aggregation. Among context variables, USOSPABELLON has ~51% nulls and ESPECIALIDADINTERVENCION ~45%, so their direct use in modeling requires caution or explicit coding of record absence.

Code
schema_all = build_schema_df_combined(only_ovarian_cancer=True)
# Display only a subset of relevant columns for brevity
schema_subset = schema_all[schema_all['Variable'].str.contains('DIAGNOSTICO|SEXO|EDAD|FECHA|IR_29301|COD_HOSPITAL|SERVICIO|TIPOALTA', regex=True)]
save_table(schema_subset, 'data_table')
schema_subset.style.hide(axis="index").set_caption("")
Variable Type Nulls Null_Percentage
COD_HOSPITAL object 0 0.000000
DIAGNOSTICO1 object 0 0.000000
DIAGNOSTICO10 object 9944 76.600000
DIAGNOSTICO11 object 10651 82.100000
DIAGNOSTICO12 object 11179 86.200000
DIAGNOSTICO13 object 11587 89.300000
DIAGNOSTICO14 object 11921 91.900000
DIAGNOSTICO15 object 12169 93.800000
DIAGNOSTICO16 object 12340 95.100000
DIAGNOSTICO17 object 12502 96.300000
DIAGNOSTICO18 object 12624 97.300000
DIAGNOSTICO19 object 12708 97.900000
DIAGNOSTICO2 object 553 4.300000
DIAGNOSTICO20 object 12769 98.400000
DIAGNOSTICO21 object 12819 98.800000
DIAGNOSTICO22 object 12860 99.100000
DIAGNOSTICO23 object 12884 99.300000
DIAGNOSTICO24 object 12908 99.500000
DIAGNOSTICO25 object 12922 99.600000
DIAGNOSTICO26 object 12933 99.700000
DIAGNOSTICO27 object 12943 99.700000
DIAGNOSTICO28 object 12953 99.800000
DIAGNOSTICO29 object 12959 99.900000
DIAGNOSTICO3 object 1604 12.400000
DIAGNOSTICO30 object 12962 99.900000
DIAGNOSTICO31 object 12963 99.900000
DIAGNOSTICO32 object 12965 99.900000
DIAGNOSTICO33 object 12969 99.900000
DIAGNOSTICO34 object 12973 100.000000
DIAGNOSTICO35 object 12976 100.000000
DIAGNOSTICO4 object 2810 21.700000
DIAGNOSTICO5 object 4112 31.700000
DIAGNOSTICO6 object 5509 42.500000
DIAGNOSTICO7 object 6851 52.800000
DIAGNOSTICO8 object 8031 61.900000
DIAGNOSTICO9 object 9028 69.600000
FECHAALTA object 0 0.000000
FECHAINTERV1 object 7798 60.100000
FECHAPROCEDIMIENTO1 object 12976 100.000000
FECHATRASLADO1 object 11436 88.100000
FECHATRASLADO2 object 12232 94.300000
FECHATRASLADO3 object 12779 98.500000
FECHATRASLADO4 object 12901 99.400000
FECHATRASLADO5 object 12955 99.800000
FECHATRASLADO6 object 12967 99.900000
FECHATRASLADO7 object 12971 100.000000
FECHATRASLADO8 object 12975 100.000000
FECHATRASLADO9 object 12976 100.000000
FECHA_INGRESO object 0 0.000000
FECHA_NACIMIENTO object 0 0.000000
IR_29301_COD_GRD object 0 0.000000
IR_29301_MORTALIDAD object 0 0.000000
IR_29301_PESO object 0 0.000000
IR_29301_SEVERIDAD object 0 0.000000
SERVICIOALTA object 0 0.000000
SERVICIOINGRESO object 0 0.000000
SERVICIOTRASLADO1 object 11436 88.100000
SERVICIOTRASLADO2 object 12232 94.300000
SERVICIOTRASLADO3 object 12779 98.500000
SERVICIOTRASLADO4 object 12901 99.400000
SERVICIOTRASLADO5 object 12955 99.800000
SERVICIOTRASLADO6 object 12967 99.900000
SERVICIOTRASLADO7 object 12971 100.000000
SERVICIOTRASLADO8 object 12975 100.000000
SERVICIOTRASLADO9 object 12976 100.000000
SERVICIO_SALUD object 0 0.000000
SEXO object 0 0.000000
SEXORN1 object 12950 99.800000
SEXORN2 object 12974 100.000000
SEXORN3 object 12976 100.000000
SEXORN4 object 12976 100.000000
TIPOALTA object 0 0.000000
Table 1: Schema of original ovarian cancer data 2019-2024

6.2 Unsupervised Analysis of Comorbidities in Ovarian Cancer Patients

Ovarian cancer patients were identified by the presence of ICD-10 code C56 in any diagnostic position. For each patient, non-“C*” codes were extracted from DIAGNOSTICO2-DIAGNOSTICO35, normalized to the ICD-10 trigram. A binary patient-diagnosis matrix was constructed and, from it, the co-occurrence matrix and Jaccard similarity between frequent codes (threshold ≥100 occurrences, lowered due to smaller ovarian cancer sample size), which served as input for hierarchical clustering.
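The extraction and trigram-normalization step might look like the following sketch on toy records; the real pipeline uses `load_raw_ovarian_cancer_data` and the DIAGNOSTICO1-DIAGNOSTICO35 columns listed in Table 1:

```python
import pandas as pd

# Toy discharge records with the column naming from Table 1
df = pd.DataFrame({
    'DIAGNOSTICO1': ['C56', 'C56'],
    'DIAGNOSTICO2': ['I10.9', 'E11.2'],
    'DIAGNOSTICO3': ['C78.6', 'N17'],
})

# Secondary diagnosis columns only
diag_cols = [c for c in df.columns if c != 'DIAGNOSTICO1']

long = df[diag_cols].stack().str.strip()       # one row per (patient, diagnosis)
trigrams = long.str[:3].str.upper()            # normalize to the ICD-10 trigram
comorbid = trigrams[~trigrams.str.startswith('C')]  # drop C* (malignancy) codes

# Per-patient sets of comorbidity trigrams, ready for the binary matrix B
per_patient = comorbid.groupby(level=0).apply(set)
```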

Code
patient_diagnoses, icd_counts = load_raw_ovarian_cancer_data()

# Note: min_frequency lowered to 100 for ovarian cancer subset
similarity_matrix, code_labels, binary_matrix = create_cooccurrence_matrix(
    patient_diagnoses, icd_counts, min_frequency=100
)

# Distances and linkage
distance_matrix = 1 - similarity_matrix
np.fill_diagonal(distance_matrix, 0)
distance_matrix = (distance_matrix + distance_matrix.T) / 2
distance_matrix[distance_matrix < 0] = 0
condensed_dist = squareform(distance_matrix)

linkage_matrix = linkage(condensed_dist, method='ward')

6.2.1 Partition Quality and Number of Clusters

The sweep of \(k\in[5,20]\) suggests solutions in the 10-12 cluster zone as a balance between compaction and separation. The silhouette coefficient is low but positive across the range (0.07-0.25), reflecting the overlap typical of multimorbidity data; the Calinski-Harabasz index is stable \((\approx 0.89-1.27)\), Davies-Bouldin decreases monotonically \((\approx 4.02 \rightarrow 2.84)\), and intra-cluster inertia falls from 40.2 to 19.0 units, with a soft “elbow” break around \(k\approx 10-12\).

Code
# Calculate metrics
cluster_range = range(5, 21)
metrics = {
    'n_clusters': [],
    'inertia': [],
    'silhouette': [],
    'calinski_harabasz': [],
    'davies_bouldin': []
}

feature_matrix = similarity_matrix

for n in cluster_range:
    clusters_temp = fcluster(linkage_matrix, n, criterion='maxclust')
    
    # Inertia (weighted average intra-cluster distance)
    inertia = 0
    for i in range(1, n+1):
        mask = clusters_temp == i
        if np.sum(mask) > 1:
            cluster_dist = distance_matrix[mask][:, mask]
            inertia += np.sum(cluster_dist) / (2 * np.sum(mask))
    
    # Silhouette on distance matrix
    sil = silhouette_score(distance_matrix, clusters_temp, metric='precomputed') if len(np.unique(clusters_temp)) > 1 else 0
    
    ch = calinski_harabasz_score(feature_matrix, clusters_temp)
    db = davies_bouldin_score(feature_matrix, clusters_temp)
    
    metrics['n_clusters'].append(n)
    metrics['inertia'].append(inertia)
    metrics['silhouette'].append(sil)
    metrics['calinski_harabasz'].append(ch)
    metrics['davies_bouldin'].append(db)

df_metrics = pd.DataFrame(metrics).round(3)
metrics_table = df_metrics.rename(columns={
    'n_clusters': 'No. of clusters',
    'inertia': 'Inertia',
    'silhouette': 'Silhouette',
    'calinski_harabasz': 'Calinski-Harabasz',
    'davies_bouldin': 'Davies-Bouldin'
})

save_table(metrics_table, 'eval_metrics')
# Display DataFrame
metrics_table.style.hide(axis="index")
No. of clusters Inertia Silhouette Calinski-Harabasz Davies-Bouldin
5 40.182000 0.090000 1.266000 4.023000
6 37.874000 0.068000 1.134000 3.854000
7 35.580000 0.087000 1.103000 3.740000
8 34.132000 0.074000 1.048000 3.611000
9 32.534000 0.094000 0.945000 3.600000
10 31.130000 0.097000 0.945000 3.391000
11 29.621000 0.087000 1.009000 3.304000
12 28.194000 0.107000 1.010000 3.285000
13 26.674000 0.140000 0.970000 3.185000
14 25.292000 0.148000 0.910000 3.118000
15 24.102000 0.165000 0.892000 3.105000
16 22.986000 0.184000 1.044000 3.082000
17 21.932000 0.190000 0.982000 3.009000
18 21.087000 0.197000 0.966000 2.933000
19 20.012000 0.223000 0.945000 2.876000
20 19.016000 0.246000 0.890000 2.841000
Table 2: Quality metrics by number of clusters
Code
# Curves with differentiated colors
distances_at_k = []
for k in cluster_range:
    last_merges = linkage_matrix[-(k-1):, 2] if k > 1 else linkage_matrix[-1:, 2]
    distances_at_k.append(np.mean(last_merges))
distance_changes = np.diff(distances_at_k)

colors = {
    'elbow': '#1f77b4',      # blue
    'delta': '#ff7f0e',       # orange
    'inertia': '#2ca02c',     # green
    'sil': '#d62728',         # red
    'ch': '#9467bd',          # purple
    'db': '#8c564b'           # brown
}

fig, axes = plt.subplots(2, 3, figsize=(18, 10))

# 1) Elbow method
ax1 = axes[0, 0]
ax1.plot(list(cluster_range), distances_at_k, 'o-', linewidth=2, markersize=6, color=colors['elbow'])
ax1.set_xlabel('Number of clusters')
ax1.set_ylabel('Distance (average of merges)')
ax1.set_title('Elbow method (distances)')
ax1.grid(alpha=0.3)
for n in [8, 10, 12, 14]:
    ax1.axvline(x=n, color='gray', linestyle='--', alpha=0.25)

# 2) Distance change
ax2 = axes[0, 1]
ax2.bar(list(cluster_range)[1:], distance_changes, alpha=0.9, color=colors['delta'])
ax2.set_xlabel('Number of clusters')
ax2.set_ylabel('Δ Distance')
ax2.set_title('Rate of distance change')
ax2.grid(alpha=0.3, axis='y')
ax2.axvline(x=12, color='red', linestyle='--', alpha=0.6)

# 3) Inertia
ax3 = axes[0, 2]
ax3.plot(df_metrics['n_clusters'], df_metrics['inertia'], 's-', linewidth=2, markersize=6, color=colors['inertia'])
ax3.set_xlabel('Number of clusters')
ax3.set_ylabel('Inertia')
ax3.set_title('Intra-cluster inertia')
ax3.grid(alpha=0.3)

# 4) Silhouette
ax4 = axes[1, 0]
ax4.plot(df_metrics['n_clusters'], df_metrics['silhouette'], '^-', linewidth=2, markersize=6, color=colors['sil'])
ax4.set_xlabel('Number of clusters')
ax4.set_ylabel('Silhouette')
ax4.set_title('Silhouette coefficient')
ax4.grid(alpha=0.3)

# 5) Calinski-Harabasz
ax5 = axes[1, 1]
ax5.plot(df_metrics['n_clusters'], df_metrics['calinski_harabasz'], 'd-', linewidth=2, markersize=6, color=colors['ch'])
ax5.set_xlabel('Number of clusters')
ax5.set_ylabel('Calinski-Harabasz')
ax5.set_title('Calinski-Harabasz index')
ax5.grid(alpha=0.3)

# 6) Davies-Bouldin
ax6 = axes[1, 2]
ax6.plot(df_metrics['n_clusters'], df_metrics['davies_bouldin'], 'o-', linewidth=2, markersize=6, color=colors['db'])
ax6.set_xlabel('Number of clusters')
ax6.set_ylabel('Davies-Bouldin')
ax6.set_title('Davies-Bouldin index (↓ better)')
ax6.grid(alpha=0.3)

plt.tight_layout()
save_figure('fig_eval_plots')
plt.show()
plt.close()
Figure 1: Cluster number evaluation: methods and metrics

6.2.2 Hierarchical Structure and Selected Cuts

Log-scaled dendrograms show very short early fusions and a few jumps at greater height. Cuts at 8, 10, 12, and 14 generate interpretable partitions with relatively balanced sizes. The 12-cluster cut groups codes with clinical coherence specific to ovarian cancer patients.

Code
cluster_configs = [8, 10, 12, 14]
all_cluster_results = {}

nrows, ncols = len(cluster_configs), 1
fig, axs = plt.subplots(nrows, ncols, figsize=(9, 3*nrows))
if nrows == 1:
    axs = [axs]

for j, n_clusters in enumerate(cluster_configs):
    cut_height = np.percentile(linkage_matrix[:, 2], 100 - 100*n_clusters/len(code_labels))
    linkage_log = linkage_matrix.copy()
    linkage_log[:, 2] = np.log1p(linkage_matrix[:, 2])
    cut_height_log = np.log1p(cut_height)
    
    dendrogram(linkage_log, ax=axs[j], orientation='top', truncate_mode='lastp',
               p=25, leaf_font_size=9, show_contracted=True,
               color_threshold=cut_height_log, above_threshold_color='gray')
    
    axs[j].set_title(f'{n_clusters} clusters (log dist)')
    axs[j].set_xlabel('ICD code clusters')
    axs[j].set_ylabel('Distance log(1 + d)')
    axs[j].axhline(y=cut_height_log, color='red', linestyle='--', linewidth=1.5, alpha=0.8)
    axs[j].grid(alpha=0.25, axis='y')
    
    # Save results for tables
    clusters = fcluster(linkage_matrix, n_clusters, criterion='maxclust')
    cluster_dict = {}
    for code, cid in zip(code_labels, clusters):
        cluster_dict.setdefault(cid, []).append(code)
    
    cluster_data = []
    for cid, codes in sorted(cluster_dict.items()):
        total_freq = sum(icd_counts[c] for c in codes)
        avg_freq = int(total_freq / len(codes)) if len(codes) else 0
        top_codes = sorted(codes, key=lambda x: icd_counts[x], reverse=True)[:10]
        cluster_data.append({
            'ID': cid,
            'Size': len(codes),
            'Total frequency': total_freq,
            'Average frequency': avg_freq,
            '10 most frequent codes': ', '.join(top_codes)
        })
    
    df_clusters = pd.DataFrame(cluster_data)
    all_cluster_results[n_clusters] = {
        'cluster_dict': cluster_dict,
        'df_summary': df_clusters,
        'cut_height': cut_height
    }

plt.tight_layout()
save_figure('fig_detail_configs')
plt.show()
plt.close()
Figure 2: Dendrograms (log scale) for selected configurations
Code
# Table for 12-cluster configuration
save_table(all_cluster_results[12]['df_summary'], 'cluster_tables_12')
all_cluster_results[12]['df_summary'].style.hide(axis="index")
ID Size Total frequency Average frequency 10 most frequent codes
1 5 9376 1875 Z92, F32, K56, Z97, E89
2 5 8978 1795 I10, E11, D64, J45, D25
3 9 3698 410 E66, F17, Y83, Z98, Z96, E83, D50, M06, Y43
4 9 6047 671 Z90, B96, R57, Z95, I25, J44, I48, B95, Y95
5 6 801 133 D69, E88, A41, K42, F41, Z03
6 10 4955 495 E03, N17, J90, I26, I80, N39, K44, E16, K65, M17
7 10 2067 206 T81, J98, K57, H91, R11, R10, K59, G62, H40, Z71
8 15 5405 360 E78, Z80, M19, Z85, D63, K66, Z53, Z72, M79, N80
9 4 2708 677 E87, Z51, N18, J96
10 3 2207 735 R18, K76, N73
11 8 2563 320 Z88, N13, Z93, N83, R59, E86, N85, K21
12 15 4005 267 Z87, Z86, K29, K80, I50, R73, E46, Z91, L89, G40
Table 3: Cluster summary for 12 groups (selected configuration)

6.2.3 Cluster Sizes and Stability

When comparing configurations, \(k=12\) offers a median cluster size well suited to clinical interpretation, with clusters specific to ovarian cancer complications and treatment-related morbidities. This granularity also makes it straightforward to transform cluster memberships into binary comorbidity indicators at the patient level.
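That membership-to-indicator transformation can be sketched as follows (a minimal illustration with hypothetical codes and column names, not the cohort pipeline itself):

```python
import pandas as pd

# Hypothetical cluster membership (cluster ID -> member ICD-10 codes)
cluster_dict = {1: ['Z92', 'F32'], 2: ['I10', 'E11']}

# Hypothetical episodes, each with its list of secondary diagnoses
episodes = pd.DataFrame({'dx': [['Z92', 'I10'], ['F32'], ['E66']]})

# Binary indicator: 1 if the episode carries at least one code from the cluster
for cid, codes in cluster_dict.items():
    code_set = set(codes)
    episodes[f'cluster_{cid}'] = episodes['dx'].apply(
        lambda dx: int(bool(code_set & set(dx))))
```

Each `cluster_*` column can then enter the supervised models as a patient-level comorbidity feature.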

Code
# Comparison of cluster properties
comparison_data = []
for n_clusters in [8, 10, 12, 14]:
    res = all_cluster_results[n_clusters]
    sizes = [len(codes) for codes in res['cluster_dict'].values()]
    comparison_data.append({
        'No. of clusters': n_clusters,
        'Maximum size': max(sizes),
        'Minimum size': min(sizes),
        'Median size': int(np.median(sizes)),
        'Size std. dev.': round(np.std(sizes), 1)
    })

df_comparison = pd.DataFrame(comparison_data).sort_values('No. of clusters')
save_table(df_comparison, 'comparison')
df_comparison.style.hide(axis="index")
No. of clusters Maximum size Minimum size Median size Size std. dev.
8 25 5 9 7.4
10 25 3 8 6.4
12 15 3 8 3.8
14 15 3 6 3.2
Table 4: Comparison of cluster properties by configuration

7 Ovarian Cancer Analysis

7.1 Exploratory Analysis

7.1.1 Description of the 12 Comorbidity Clusters

Unsupervised clustering of ICD‑10 codes yielded 12 clinically coherent comorbidity clusters with descriptive labels (e.g., Post‑Procedural/Depression, Diabetes‑Hypertension/Anemia, Cardiac/Respiratory, Thrombosis/Effusion/Thyroid, Chemo‑Renal/Respiratory Failure, Malnutrition/Heart Failure; see Table 5; Table 3). The most prevalent clusters in the cohort were:

  • Post‑Procedural/Depression (48.8%, 5 codes; top codes include Z92, F32, K56; high average frequency),
  • Diabetes‑Hypertension/Anemia (48.0%, 5 codes; I10, E11, D64),
  • Cardiac/Respiratory (32.0%, 9 codes; e.g., I25, J44, I48),
  • Anemia‑Lipids/Cancer History (30.6%, 15 codes),
  • Thrombosis/Effusion/Thyroid (28.6%, 10 codes) (see Table 5; Table 3).

Cluster validity indices indicated overlapping profiles (Silhouette 0.068–0.094, Davies–Bouldin 3.60–4.02, Calinski–Harabasz ~0.95–1.27), consistent with multimorbidity in oncology populations (see Table 2). Still, cluster size dispersion and dominant code frequencies supported interpretability at k = 12 (see Table 4).

Although separation is modest (as expected in multimorbidity), the 12‑cluster solution provides clinically meaningful groupings that explain large portions of the comorbidity burden (see Figure 9).
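The three validity indices cited above can be computed as sketched below (synthetic binary matrix and labels, not the cohort data): the silhouette uses a precomputed Jaccard distance matrix, while Davies–Bouldin and Calinski–Harabasz operate on the binary features directly.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.metrics import (silhouette_score, davies_bouldin_score,
                             calinski_harabasz_score)

rng = np.random.default_rng(123)
X = rng.integers(0, 2, size=(30, 50)).astype(bool)  # codes x episodes (toy)
labels = np.repeat([1, 2, 3], 10)                   # toy cluster assignment

# Silhouette on the precomputed Jaccard distance matrix
D = squareform(pdist(X, metric='jaccard'))
sil = silhouette_score(D, labels, metric='precomputed')

# The other two indices are computed on the feature matrix
db = davies_bouldin_score(X.astype(float), labels)
ch = calinski_harabasz_score(X.astype(float), labels)
```

With random assignments the silhouette hovers near zero, which is the same overlapping-profile signal reported in Table 2.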

Code
# Calculate ACTUAL prevalence from the data
cluster_prevalences = []
for col in comorbidity_columns:
    if col in data.columns:
        prevalence = data[col].mean() * 100
        cluster_prevalences.append(prevalence)
    else:
        cluster_prevalences.append(0)

# Create descriptive table with ACTUAL calculated prevalences
cluster_descriptions = {
    'Cluster': [
        'Post-Procedural\nDepression',
        'Diabetes HTN\nAnemia',
        'Obesity\nMetabolic',
        'Cardiac\nRespiratory',
        'Sepsis\nHemorrhagic',
        'Thrombosis\nEffusion',
        'Neuropathy\nGI Sensory',
        'Anemia Lipids\nCancer History',
        'Chemo Renal\nRespiratory',
        'Liver Pelvic\nAscites',
        'Volume Depletion\nOvarian',
        'Malnutrition\nHeart Failure'
    ],
    'Top 3 Conditions': [
        'Post-procedural disorders, Depression, Bowel obstruction',
        'Type 2 diabetes, Hypertension, Anemias',
        'Obesity, Iron deficiency, Tobacco use',
        'Heart disease, COPD, Atrial fibrillation',
        'Septicemia, Hemorrhagic disorders, Anxiety',
        'Thrombosis, Pleural effusion, Hypothyroidism',
        'Neuropathy, Glaucoma, GI disorders',
        'Anemia, Lipid disorders, Cancer history',
        'Chemotherapy, Renal failure, Respiratory failure',
        'Liver disease, Pelvic inflammation, Ascites',
        'Volume depletion, Ovarian dysfunction, Hydronephrosis',
        'Malnutrition, Heart failure, Alcohol use'
    ],
    'N Codes': [5, 5, 9, 9, 6, 10, 10, 15, 4, 3, 8, 16],
    'Prevalence (%)': cluster_prevalences
}

df_clusters_desc = pd.DataFrame(cluster_descriptions)
save_table(df_clusters_desc, 'clusters_description')

# Format and display the table
styled_table = df_clusters_desc.style.format({
    'Prevalence (%)': '{:.1f}',
    'N Codes': '{:.0f}'
}).set_properties(**{
    'text-align': 'left',
    'font-size': '11px'
}).hide(axis="index")

styled_table
Cluster Top 3 Conditions N Codes Prevalence (%)
Post-Procedural Depression Post-procedural disorders, Depression, Bowel obstruction 5 48.8
Diabetes HTN Anemia Type 2 diabetes, Hypertension, Anemias 5 48.0
Obesity Metabolic Obesity, Iron deficiency, Tobacco use 9 25.0
Cardiac Respiratory Heart disease, COPD, Atrial fibrillation 9 32.0
Sepsis Hemorrhagic Septicemia, Hemorrhagic disorders, Anxiety 6 5.5
Thrombosis Effusion Thrombosis, Pleural effusion, Hypothyroidism 10 28.6
Neuropathy GI Sensory Neuropathy, Glaucoma, GI disorders 10 12.4
Anemia Lipids Cancer History Anemia, Lipid disorders, Cancer history 15 30.6
Chemo Renal Respiratory Chemotherapy, Renal failure, Respiratory failure 4 14.0
Liver Pelvic Ascites Liver disease, Pelvic inflammation, Ascites 3 15.4
Volume Depletion Ovarian Volume depletion, Ovarian dysfunction, Hydronephrosis 8 16.5
Malnutrition Heart Failure Malnutrition, Heart failure, Alcohol use 16 24.1
Table 5: Description of the 12 identified comorbidity clusters

7.1.2 Global Characterization of the Ovarian Cancer Cohort 2019-2024

Across 2019–2024, the analytic cohort comprised 11,244 hospitalizations for ovarian cancer (mean age 57.8 ± 13.6 years; 100% women), with an overall in‑hospital mortality rate of 4.06% (class imbalance 1:23 deceased:survivor). The training/validation split preserved the event rate (4.07% vs. 4.06%, respectively), minimizing distribution shift (see Table 6; Table 8; Figure 6). Feature availability was heterogeneous: several primary fields (e.g., hospital code, the first diagnosis field) had 0% missingness, whereas a long tail of transfer/late diagnosis fields exhibited very high sparsity (e.g., DIAGNOSTICO10–12 with 76.6–86.2% missingness; many “TRASLADO” fields at 100%), a constraint explicitly considered in model design and interpretation (see Table 1).
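The missingness audit behind that constraint reduces to a per-column summary; a minimal sketch on a toy frame (column names mimic the administrative schema, values are hypothetical):

```python
import numpy as np
import pandas as pd

# Toy discharge records mimicking the administrative schema (hypothetical values)
df = pd.DataFrame({
    'DIAGNOSTICO1':  ['C56', 'C56', 'C56', 'C56'],
    'DIAGNOSTICO10': [np.nan, np.nan, 'I10', np.nan],
    'TRASLADO1':     [np.nan, np.nan, np.nan, np.nan],
})

# Percent missing per column, sorted to surface the sparse long tail
missingness = (df.isna().mean() * 100).sort_values(ascending=False)
```

Columns at or near 100% missingness (as with the "TRASLADO" fields here) carry no usable signal and were excluded from modeling.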

Code
summary_stats = {
    'Metric': ['Total Patients', 'Years Covered', 'Cancer Types', 'Mean Age',
               '% Women', '% In-Hospital Mortality', 'Mean Length of Stay'],
    'Value': [
        f"{len(data):,}",
        f"{data['year_factor'].nunique()} ({data['year_factor'].min()}-{data['year_factor'].max()})",
        data['cancer_type'].nunique(),
        f"{data['age'].mean():.1f} (SD: {data['age'].std():.1f})",
        f"{(data['sex']=='Female').mean()*100:.1f}%",
        f"{data['in_hospital_death'].mean()*100:.2f}%",
        f"{data['length_of_stay'].mean():.1f} days"
    ]
}

df_summary = pd.DataFrame(summary_stats)
save_table(df_summary, 'general_summary')
df_summary.style.hide(axis="index")
Metric Value
Total Patients 11,244
Years Covered 6 (2019-2024)
Cancer Types 1
Mean Age 57.8 (SD: 13.6)
% Women 100.0%
% In-Hospital Mortality 4.06%
Mean Length of Stay 4.9 days
Table 6: General summary of ovarian cancer comorbidities dataset

7.1.3 Annual Dynamics

Annual volumes increased from 2,028 (2019) to 2,192 (2024), following a pandemic‑era dip in 2020 (n = 1,557). Mean age rose modestly (+0.80 years, 57.51→58.31), and mean length‑of‑stay (LOS) declined overall (−0.24 days, 5.04→4.80), with the shortest LOS observed in 2023 (4.51 days). Despite an upward shift in clinical severity (Section 3), the crude in‑hospital mortality rate remained relatively stable, ranging 3.0–4.4% and lowest in 2022 (see Figure 3). System‑level improvements (e.g., care pathways, earlier discharge) may have partially offset increasing case complexity, maintaining a stable crude mortality rate (see Figure 8).

Code
# Calculate annual statistics
yearly_stats = data.groupby('year_factor').agg({
    'in_hospital_death': ['count', 'mean'],
    'age': 'mean',
    'length_of_stay': 'mean',
    'sex': lambda x: (x=='Female').mean()
}).round(3)

yearly_stats.columns = ['N Patients', 'Mortality Rate', 'Mean Age', 'Mean Stay', 'Proportion Women']
yearly_stats['Mortality Rate'] = (yearly_stats['Mortality Rate'] * 100).round(2)
yearly_stats['Proportion Women'] = (yearly_stats['Proportion Women'] * 100).round(1)
yearly_stats.index.name = 'Year'
save_table(yearly_stats, 'yearly_stats')

7.1.4 Annual Prevalence of Comorbidity Clusters

The table reports the percentage of patients with at least one code belonging to each cluster per year. A sustained increase in most clusters is observed between 2019 and 2024, consistent with greater clinical complexity and/or more complete coding. High and persistent levels stand out in the cardiometabolic clusters (Diabetes‑Hypertension/Anemia, Cardiac/Respiratory), while the organ‑failure clusters (Chemo‑Renal/Respiratory Failure, Malnutrition/Heart Failure) show a clinically plausible upward evolution (see Figure 4).

Code
# Calculate comorbidity prevalence by year
comorb_by_year = pd.DataFrame()
for year in data['year_factor'].unique():
    year_data = data[data['year_factor'] == year]
    prevalences = year_data[comorbidity_columns].mean() * 100
    comorb_by_year[year] = prevalences

comorb_by_year = comorb_by_year.T
comorb_by_year.index.name = 'Year'

# Rename columns for better readability (without underscores)
comorb_labels = {
    'post_procedural_depression': 'Post-Procedural Depression',
    'diabetes_hypertension_anemia': 'Diabetes HTN Anemia',
    'obesity_metabolic_behavioral': 'Obesity Metabolic Behavioral',
    'cardiac_respiratory_infections': 'Cardiac Respiratory Infections',
    'sepsis_hemorrhagic_anxiety': 'Sepsis Hemorrhagic Anxiety',
    'thrombosis_effusion_thyroid': 'Thrombosis Effusion Thyroid',
    'neuropathy_gi_sensory': 'Neuropathy GI Sensory',
    'anemia_lipids_cancer_history': 'Anemia Lipids Cancer History',
    'chemo_renal_respiratory_failure': 'Chemo Renal Respiratory Failure',
    'liver_pelvic_ascites': 'Liver Pelvic Ascites',
    'volume_depletion_ovarian': 'Volume Depletion Ovarian',
    'malnutrition_heart_failure': 'Malnutrition Heart Failure'
}

comorb_display = comorb_by_year.rename(columns=comorb_labels)
save_table(comorb_display, 'comorb_by_year')

7.1.5 Evolution of service utilization and clinical severity

Hospital service mix shifted markedly toward higher‑acuity care (see Figure 5). ICU assignments increased ~22× (from 0.39% in 2019 to 8.85% in 2024), and intermediate‑care (ITU) assignments rose ~2.7× (0.94%→2.51%), while “Other” services decreased (86.64%→77.69%). In parallel, the proportion of High severity admissions rose from 19.56% to 26.00% (+6.44 pp), Low severity declined (31.72%→23.05%, −8.67 pp), and Medium increased slightly (48.72%→50.94%, +2.22 pp) (see Figure 5; Figure 3).

Risk stratification mirrored these shifts: the share of patients in the High mortality‑risk stratum increased from 47.56% to 57.14% (+9.58 pp), with corresponding declines in Medium (−6.87 pp) and Low (−2.71 pp) risk (see Figure 3).

Code
# Calculations by year
service_year = pd.crosstab(data['year_factor'], data['service_category'], normalize='index') * 100
severity_year = pd.crosstab(data['year_factor'], data['severity'], normalize='index') * 100
mortality_risk_year = pd.crosstab(data['year_factor'], data['mortality_risk'], normalize='index') * 100

save_table(service_year, 'service_year')
save_table(severity_year, 'severity_year')
save_table(mortality_risk_year, 'mortality_risk_year')

fig, axes = plt.subplots(1, 3, figsize=(18, 5))

# Service categories
ax1 = axes[0]
service_year.plot(kind='bar', stacked=True, ax=ax1, colormap='Set3')
ax1.set_title('Service Category Distribution by Year', fontsize=12, fontweight='bold')
ax1.set_xlabel('Year')
ax1.set_ylabel('Percentage (%)')
ax1.legend(title='Service', bbox_to_anchor=(1.05, 1), loc='upper left')
ax1.tick_params(axis='x', rotation=0)

# Severity
ax2 = axes[1]
severity_year.plot(kind='bar', stacked=True, ax=ax2, colormap='RdYlGn_r')
ax2.set_title('Severity Distribution by Year', fontsize=12, fontweight='bold')
ax2.set_xlabel('Year')
ax2.set_ylabel('Percentage (%)')
ax2.legend(title='Severity', bbox_to_anchor=(1.05, 1), loc='upper left')
ax2.tick_params(axis='x', rotation=0)

# Mortality risk
ax3 = axes[2]
mortality_risk_year.plot(kind='bar', stacked=True, ax=ax3, colormap='RdYlGn_r')
ax3.set_title('Mortality Risk Distribution by Year', fontsize=12, fontweight='bold')
ax3.set_xlabel('Year')
ax3.set_ylabel('Percentage (%)')
ax3.legend(title='Risk', bbox_to_anchor=(1.05, 1), loc='upper left')
ax3.tick_params(axis='x', rotation=0)

plt.tight_layout()
save_figure('fig_services_severity')
plt.show()
plt.close()
Figure 5: Distribution of hospital services, severity and mortality risk by year
Code
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Age distribution by year (violin plots)
ax1 = axes[0, 0]
data_subset = data.sample(min(10000, len(data)), random_state=123)  # reproducible subsample for plotting
sns.violinplot(data=data_subset, x='year_factor', y='age', ax=ax1, palette='viridis')
ax1.set_title('Age Distribution by Year', fontsize=12, fontweight='bold')
ax1.set_xlabel('Year')
ax1.set_ylabel('Age')


# Hospital stay by year
ax4 = axes[0, 1]
data_los = data[data['length_of_stay'] <= 60]
sns.boxplot(data=data_los, x='year_factor', y='length_of_stay', ax=ax4, palette='Set2')
ax4.set_title('Hospital Stay Distribution by Year', fontsize=12, fontweight='bold')
ax4.set_xlabel('Year')
ax4.set_ylabel('Stay (days)')

# Mortality by age groups
ax5 = axes[1, 0]
data['age_group'] = pd.cut(data['age'], bins=[0, 40, 50, 60, 70, 80, 120],
                           labels=['<40', '40-49', '50-59', '60-69', '70-79', '80+'])
mortality_age = data.groupby(['year_factor', 'age_group'])['in_hospital_death'].mean() * 100
mortality_age_pivot = mortality_age.unstack()
mortality_age_pivot.plot(marker='o', ax=ax5)
ax5.set_title('Mortality Rate by Age Group and Year', fontsize=12, fontweight='bold')
ax5.set_xlabel('Year')
ax5.set_ylabel('Mortality Rate (%)')
ax5.legend(title='Age Group', bbox_to_anchor=(1.05, 1), loc='upper left')
ax5.grid(True, alpha=0.3)

# Comorbidity burden by year
ax6 = axes[1, 1]
data['total_comorbidities'] = data[comorbidity_columns].sum(axis=1)
comorb_burden = data.groupby('year_factor')['total_comorbidities'].mean()
comorb_burden.plot(kind='bar', ax=ax6, color='coral')
ax6.set_title('Mean Comorbidity Burden by Year', fontsize=12, fontweight='bold')
ax6.set_xlabel('Year')
ax6.set_ylabel('Mean Number of Comorbidity Clusters')
ax6.tick_params(axis='x', rotation=0)

plt.tight_layout()
save_figure('fig_complete_demographics')
plt.show()
plt.close()
Figure 6: Complete demographic analysis by year

7.1.6 Relative Risk by Comorbidity Cluster

Several clusters increased substantially from 2019→2024 (percentage‑point, pp, change among admissions; see Figure 4):

  • Post‑Procedural/Depression: +15.22 pp (41.03→56.25),
  • Cardiac/Respiratory: +13.43 pp (25.35→38.78),
  • Diabetes‑Hypertension/Anemia: +12.31 pp (39.74→52.05),
  • Anemia‑Lipids/Cancer History: +12.18 pp (25.59→37.77),
  • Volume Depletion/Ovarian: +11.36 pp (10.95→22.31),
  • Obesity/Metabolic: +10.77 pp (19.43→30.20),
  • Thrombosis/Effusion/Thyroid: +10.70 pp (24.11→34.81).

Other clusters rose more modestly (Neuropathy/GI Sensory +8.48 pp; Malnutrition/Heart Failure +8.61 pp; Liver‑Pelvic/Ascites +3.35 pp; Sepsis/Hemorrhagic/Anxiety +3.29 pp; Chemo‑Renal/Respiratory Failure +2.15 pp). The multimorbidity landscape is intensifying—particularly cardio‑metabolic, thrombo‑embolic, and post‑procedural sequelae—aligning with the rising severity mix (see Figure 4; Figure 5).
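The percentage‑point changes above reduce to a first‑versus‑last‑year difference of annual prevalences; a minimal sketch (toy values standing in for the `comorb_by_year` table built in Section 7.1.4):

```python
import pandas as pd

# Toy annual prevalence table (%), rows = years; the real table is built in 7.1.4
prev = pd.DataFrame(
    {'Post-Procedural Depression': [41.03, 56.25],
     'Cardiac Respiratory Infections': [25.35, 38.78]},
    index=[2019, 2024])

# Percentage-point change between the first and last observed year
pp_change = (prev.loc[2024] - prev.loc[2019]).round(2).sort_values(ascending=False)
```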

Year‑specific relative risks (RR) of in‑hospital death showed strong heterogeneity across clusters, with consistent high‑risk signal in a few domains (see Figure 7):

  • Chemo‑Renal/Respiratory Failure: mean RR 8.56, peaking at 15.89 in 2022 (range 3.81–15.89), the most lethal profile despite moderate prevalence (13.95%).
  • Thrombosis/Effusion/Thyroid: mean RR 5.04, max 9.52 (2022).
  • Sepsis/Hemorrhagic/Anxiety: mean RR 4.50, max 5.86 (2024).
  • Malnutrition/Heart Failure: mean RR 1.95, max 2.16 (2022).
  • Diabetes‑Hypertension/Anemia and Cardiac/Respiratory: moderate but persistent elevation (means ~1.89 and ~1.59, respectively).
  • Volume Depletion/Ovarian and Anemia‑Lipids/Cancer History: mild elevation on average (means ~1.21 and ~1.14), with year‑to‑year variability.

Acute organ dysfunction (renal/respiratory failure), thrombo‑embolic processes, and sepsis/hemorrhage drive the bulk of short‑term mortality risk, suggesting concrete targets for early detection and escalation pathways.
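The year‑specific RRs shown in Figure 7 are point estimates; under event rarity they are volatile, so interval estimates help when comparing clusters. A sketch of the standard Katz log‑interval for a 2×2 table (toy counts, not cohort values):

```python
import numpy as np
from scipy.stats import norm

def rr_with_ci(a, b, c, d, alpha=0.05):
    """Relative risk with a (1 - alpha) CI via the Katz log method.
    a/b: deaths/survivors with the cluster; c/d: deaths/survivors without it."""
    risk_exp = a / (a + b)
    risk_unexp = c / (c + d)
    rr = risk_exp / risk_unexp
    se = np.sqrt(1/a - 1/(a + b) + 1/c - 1/(c + d))  # SE of log(RR)
    z = norm.ppf(1 - alpha / 2)
    return rr, np.exp(np.log(rr) - z * se), np.exp(np.log(rr) + z * se)

# Toy 2x2 counts (hypothetical, for illustration only)
rr, lo, hi = rr_with_ci(20, 180, 10, 790)
```

Wide intervals in low‑prevalence years would temper interpretation of single‑year peaks such as the 2022 maxima.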

Code
# Calculate relative risk of mortality by comorbidity
comorb_mortality = pd.DataFrame()

for year in data['year_factor'].unique():
    year_data = data[data['year_factor'] == year]
    mort_rates = []
    for col in comorbidity_columns:
        with_comorb = year_data[year_data[col] == 1]['in_hospital_death'].mean()
        without_comorb = year_data[year_data[col] == 0]['in_hospital_death'].mean()
        relative_risk = (with_comorb / without_comorb) if without_comorb > 0 else np.nan  # undefined if no deaths in the reference group
        mort_rates.append(relative_risk)
    comorb_mortality[year] = mort_rates

comorb_mortality.index = [comorb_labels[col] for col in comorbidity_columns]
comorb_mortality = comorb_mortality.T
save_table(comorb_mortality, 'comorb_mortality')

# Relative risk heatmap
plt.figure(figsize=(14, 8))
sns.heatmap(comorb_mortality.T, annot=True, fmt='.2f', cmap='RdYlGn_r', center=1,
            cbar_kws={'label': 'Relative Risk of Mortality'})
plt.title('Relative Risk of Mortality by Comorbidity Cluster and Year', fontsize=14, fontweight='bold')
plt.xlabel('Year')
plt.ylabel('Comorbidity Cluster')
plt.tight_layout()
save_figure('fig_relative_risk_mortality')
plt.show()
plt.close()
Figure 7: Relative risk of mortality by comorbidity cluster and year
Code
fig, axes = plt.subplots(2, 2, figsize=(16, 10))

# Finding 1: Comorbidity burden vs mortality
ax1 = axes[0, 0]
for year in data['year_factor'].unique():
    year_data = data[data['year_factor'] == year]
    comorb_groups = year_data.groupby('total_comorbidities')['in_hospital_death'].mean() * 100
    ax1.plot(comorb_groups.index, comorb_groups.values, marker='o', label=year)
ax1.set_title('Mortality by Comorbidity Burden', fontsize=12, fontweight='bold')
ax1.set_xlabel('Number of Comorbidity Clusters')
ax1.set_ylabel('Mortality Rate (%)')
ax1.legend(title='Year')
ax1.grid(True, alpha=0.3)

# Finding 2: Outcomes by service category
ax2 = axes[0, 1]
service_mortality = data.groupby(['year_factor', 'service_category'])['in_hospital_death'].mean() * 100
service_mortality.unstack().plot(kind='bar', ax=ax2)
ax2.set_title('Mortality by Service Category', fontsize=12, fontweight='bold')
ax2.set_xlabel('Year')
ax2.set_ylabel('Mortality Rate (%)')
ax2.legend(title='Service')
ax2.tick_params(axis='x', rotation=0)

# Finding 3: Top 5 most prevalent comorbidities
ax3 = axes[1, 0]
overall_prevalence = data[comorbidity_columns].mean() * 100
overall_prevalence.index = [comorb_labels[col] for col in comorbidity_columns]
top_5_comorb = overall_prevalence.nlargest(5)
top_5_comorb.plot(kind='barh', ax=ax3, color='teal')
ax3.set_title('Top 5 Most Prevalent Comorbidity Clusters', fontsize=12, fontweight='bold')
ax3.set_xlabel('Prevalence (%)')

# Finding 4: Indexed trend comparison
ax4 = axes[1, 1]
yearly_metrics = data.groupby('year_factor').agg({
    'in_hospital_death': 'mean',
    'total_comorbidities': 'mean',
    'length_of_stay': 'mean'
})
yearly_metrics_scaled = yearly_metrics / yearly_metrics.iloc[0] * 100  # Index to first year
yearly_metrics_scaled.columns = ['Mortality', 'Comorbidity Burden', 'Length of Stay']
yearly_metrics_scaled.plot(ax=ax4, marker='o')
ax4.set_title('Indexed Trends (2019 = 100)', fontsize=12, fontweight='bold')
ax4.set_xlabel('Year')
ax4.set_ylabel('Index (2019 = 100)')
ax4.legend()
ax4.grid(True, alpha=0.3)

plt.tight_layout()
save_figure('fig_key_findings')
plt.show()
plt.close()
Figure 8: Dashboard of key analysis findings

7.2 Predictive modelling: baselines, imbalance strategies, and discrimination

The dataset was partitioned into training and validation sets with preserved outcome prevalence (see Table 7 and Table 8): Training: 7,870 episodes (deceased n=320, 4.07%). Validation: 3,374 episodes (deceased n=137, 4.06%). The overall class imbalance was approximately 1:23 (deceased : not‑deceased), as also reflected in the cohort summaries (see Table 6).

We evaluated logistic regression and random forests under multiple sampling strategies (see Table 9; Table 10; Figure 13; Figure 10; Figure 1; Figure 2):

  • Baselines (no balancing):
    • Random Forest achieved the highest AUC (0.859), but very low recall (0.007) and F1 (0.014), reflecting a near‑always‑negative policy under extreme imbalance.
    • Logistic Regression had AUC 0.846, recall 0.058, F1 0.101.

Purely accuracy/AUC‑optimizing models underperform clinically because they miss most deaths.

  • Imbalance‑aware approaches:
    • Random Forest + ADASYN delivered the best F1 (0.276) at threshold 0.50, with recall 0.453 and precision 0.199 (AUC 0.839).
    • RF + SMOTE+Tomek had F1 0.272 (recall 0.438, precision 0.197, AUC 0.839).
    • RF + Random Over‑Sampling offered the best recall–AUC trade‑off (recall 0.540, F1 0.266, AUC 0.857).
    • RF + Random Under‑Sampling preserved AUC (0.859) with improved recall (0.781) but lower precision (0.149) and F1 (0.250).
    • NearMiss prioritized recall (up to 0.993) but collapsed on accuracy (0.187) and precision (0.047), rendering it impractical.

Among tested options, RF with ADASYN/SMOTE+Tomek maximized F1; RF with ROS/RUS maintained strong AUC while achieving clinically meaningful recall (see Table 11; Figure 13; Figure 10; Figure 1).

Code
# Prepare data for modeling
X = data[comorbidity_columns + ['age', 'length_of_stay']]
y = data['in_hospital_death']

# Encode additional categorical variables if needed
categorical_features = ['sex','service_category','year_factor','severity','mortality_risk']
for cat in categorical_features:
    if cat in data.columns:
        dummies = pd.get_dummies(data[cat], prefix=cat, drop_first=True)
        X = pd.concat([X, dummies], axis=1)

# Show class distribution
class_distribution = pd.DataFrame({
    'Class': ['Not deceased (0)', 'Deceased (1)'],
    'N': [sum(y==0), sum(y==1)],
    'Percentage': [sum(y==0)/len(y)*100, sum(y==1)/len(y)*100]
})

class_distribution['Imbalance Ratio'] = ['-', f"1:{int(sum(y==0)/sum(y==1))}"]
save_table(class_distribution, 'class_distribution')
class_distribution.style.format({'Percentage': '{:.2f}%'}).hide(axis="index")
Class N Percentage Imbalance Ratio
Not deceased (0) 10787 95.94% -
Deceased (1) 457 4.06% 1:23
Table 7: Class distribution in the target variable

7.2.1 Operating points, threshold analysis, and error structure

Threshold sweeps (0.10–0.85) quantified the precision–recall trade‑off for the selected model (see Figure 14):

  • F1 peaked at 0.276 around threshold = 0.50 (precision 0.199, recall 0.453, accuracy 0.904).
  • Lowering the threshold to 0.25–0.30 boosts recall (0.69–0.76) at the cost of precision (0.12–0.14) and specificity, suitable for screening/early‑warning workflows.
  • Raising the threshold to 0.70–0.75 increases precision (0.261–0.270) but halves recall (0.175–0.226), appropriate for targeting scarce interventions.

The confusion matrix at the default cut (0.50) reveals the characteristic error profile under rarity: false positives outnumber true positives by 4:1, while 55% of deaths remain undetected (given recall ~0.45), an acceptable compromise when the objective is early case‑finding rather than definitive triage (see Figure 11; Figure 14).

Thresholds should be tailored to the clinical action and capacity (e.g., nurse outreach vs. ICU bundles), not solely to the global F1 optimum.
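One way to operationalize this is to pick the highest threshold that still meets a target recall, rather than the F1 optimum; a sketch with toy scores (not the fitted model's probabilities):

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

rng = np.random.default_rng(123)
# Toy labels and scores standing in for the validation set
y_true = rng.integers(0, 2, 500)
y_proba = np.clip(y_true * 0.3 + rng.random(500) * 0.7, 0.0, 1.0)

precision, recall, thresholds = precision_recall_curve(y_true, y_proba)

# Highest threshold whose recall still meets the screening target
target_recall = 0.75
idx = np.where(recall[:-1] >= target_recall)[0].max()
chosen_threshold = thresholds[idx]
chosen_precision = precision[idx]
```

In `precision_recall_curve`, `recall[:-1]` aligns element‑wise with `thresholds`, so the largest qualifying index corresponds to the most conservative operating point that still achieves the target recall.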

Code
# Stratified data split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=123, stratify=y
)

# Standardize numeric features (age, length of stay, and the binary cluster indicators)
scaler = StandardScaler()
continuous_cols = ['age', 'length_of_stay'] + comorbidity_columns
X_train_scaled = X_train.copy()
X_test_scaled = X_test.copy()
X_train_scaled[continuous_cols] = scaler.fit_transform(X_train[continuous_cols])
X_test_scaled[continuous_cols] = scaler.transform(X_test[continuous_cols])

# Split summary
split_summary = pd.DataFrame({
    'Set': ['Training', 'Validation'],
    'N Total': [len(X_train), len(X_test)],
    'N Deceased': [sum(y_train==1), sum(y_test==1)],
    '% Deceased': [sum(y_train==1)/len(y_train)*100, 
                      sum(y_test==1)/len(y_test)*100]
})

save_table(split_summary, 'train_test_split')
split_summary.style.format({'% Deceased': '{:.2f}%'}).hide(axis="index")
Set N Total N Deceased % Deceased
Training 7870 320 4.07%
Validation 3374 137 4.06%
Table 8: Data split for training and validation
Code
# Define models - ONLY Logistic Regression and Random Forest
# Note: To avoid issues on Windows, we do not use parallelization in Random Forest here
models = {
    'Logistic Regression': LogisticRegression(max_iter=1000, random_state=123),
    'Random Forest': RandomForestClassifier(n_estimators=100, max_depth=10, 
                                           random_state=123, n_jobs=1)
}

# Stratified cross-validation
cv_fold = StratifiedKFold(n_splits=5, shuffle=True, random_state=123)

# Function to evaluate models
def evaluate_model(model, X_tr, y_tr, X_te, y_te, cv):
    # Cross-validation
    cv_scores = {
        'accuracy': cross_val_score(model, X_tr, y_tr, cv=cv, 
                                   scoring='accuracy', n_jobs=1).mean(),
        'precision': cross_val_score(model, X_tr, y_tr, cv=cv, 
                                    scoring='precision', n_jobs=1).mean(),
        'recall': cross_val_score(model, X_tr, y_tr, cv=cv, 
                                scoring='recall', n_jobs=1).mean(),
        'f1': cross_val_score(model, X_tr, y_tr, cv=cv, 
                            scoring='f1', n_jobs=1).mean(),
        'roc_auc': cross_val_score(model, X_tr, y_tr, cv=cv, 
                                  scoring='roc_auc', n_jobs=1).mean()
    }
    
    # Training and prediction
    model.fit(X_tr, y_tr)
    y_pred = model.predict(X_te)
    y_proba = model.predict_proba(X_te)[:, 1]
    
    # Test metrics
    test_scores = {
        'accuracy': accuracy_score(y_te, y_pred),
        'precision': precision_score(y_te, y_pred, zero_division=0),
        'recall': recall_score(y_te, y_pred, zero_division=0),
        'f1': f1_score(y_te, y_pred, zero_division=0),
        'roc_auc': roc_auc_score(y_te, y_proba)
    }
    
    return cv_scores, test_scores, y_proba

# Evaluate baseline models SEQUENTIALLY (more stable on Windows)
results = []
for name, model in models.items():
    cv_scores, test_scores, y_proba = evaluate_model(
        model, X_train_scaled, y_train, X_test_scaled, y_test, cv_fold
    )
    
    results.append({
        'name': name,
        'cv_scores': cv_scores,
        'test_scores': test_scores,
        'y_proba': y_proba
    })

# Organizar resultados
baseline_results = pd.DataFrame()
baseline_probas = {}

for result in results:
    name = result['name']
    cv_scores = result['cv_scores']
    test_scores = result['test_scores']
    y_proba = result['y_proba']
    
    row = pd.Series({
        'CV_Accuracy': cv_scores['accuracy'],
        'CV_Precision': cv_scores['precision'],
        'CV_Recall': cv_scores['recall'],
        'CV_F1': cv_scores['f1'],
        'CV_AUC': cv_scores['roc_auc'],
        'Test_Accuracy': test_scores['accuracy'],
        'Test_Precision': test_scores['precision'],
        'Test_Recall': test_scores['recall'],
        'Test_F1': test_scores['f1'],
        'Test_AUC': test_scores['roc_auc']
    }, name=name)
    
    baseline_results = pd.concat([baseline_results, row.to_frame().T])
    baseline_probas[name] = y_proba

# baseline_results.style.format('{:.3f}').background_gradient(cmap='RdYlGn', axis=0, high=0.5)
Table 9: Baseline model performance without class balancing
Code
# Define balancing techniques
balancing_techniques = {
    'Without balancing': None,
    'Random OverSampling': RandomOverSampler(random_state=123),
    'SMOTE': SMOTE(random_state=123),
    'ADASYN': ADASYN(random_state=123),
    'Random UnderSampling': RandomUnderSampler(random_state=123),
    'NearMiss': NearMiss(),
    'SMOTE+Tomek': SMOTETomek(random_state=123)
}

# Function to evaluate with balancing
def evaluate_with_balancing(models, balancing_techniques, X_tr, y_tr, X_te, y_te):
    results = []
    
    for bal_name, balancer in balancing_techniques.items():
        for model_name, model in models.items():
            # Apply balancing when a resampler is defined
            if balancer is not None:
                X_resampled, y_resampled = balancer.fit_resample(X_tr, y_tr)
            else:
                X_resampled, y_resampled = X_tr, y_tr
            
            # Train a fresh model instance
            model_copy = model.__class__(**model.get_params())
            model_copy.fit(X_resampled, y_resampled)
            
            # Predict
            y_pred = model_copy.predict(X_te)
            y_proba = model_copy.predict_proba(X_te)[:, 1]
            
            # Compute metrics
            results.append({
                'Technique': bal_name,
                'Model': model_name,
                'Accuracy': accuracy_score(y_te, y_pred),
                'Precision': precision_score(y_te, y_pred, zero_division=0),
                'Recall': recall_score(y_te, y_pred, zero_division=0),
                'F1': f1_score(y_te, y_pred, zero_division=0),
                'AUC': roc_auc_score(y_te, y_proba)
            })
    
    return pd.DataFrame(results)

# Evaluate all combinations
all_results = evaluate_with_balancing(models, balancing_techniques, 
                                      X_train_scaled, y_train, 
                                      X_test_scaled, y_test)
save_table(all_results, 'all_results')
Table 10: Comparison of balancing techniques by model
Code
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patheffects as pe

metric_order    = ['Accuracy', 'Precision', 'Recall', 'F1', 'AUC']
technique_order = list(balancing_techniques.keys())

# Models available according to the first block (unchanged)
available_models = list(all_results['Model'].dropna().unique())
preferred = ["Logistic Regression", "Random Forest"]
models_to_plot = [m for m in preferred if m in available_models]
if len(models_to_plot) < 2:
    models_to_plot += [m for m in available_models if m not in models_to_plot]
models_to_plot = models_to_plot[:max(1, min(2, len(available_models)))]

def matrix_for_model(res_df, model_name):
    df = (res_df[res_df['Model'] == model_name]
            .set_index('Technique')[metric_order]
            .reindex(technique_order))
    return df.to_numpy(dtype=float)

# Matrices for each panel
data_list = [matrix_for_model(all_results, m) for m in models_to_plot]

# Common scale in [0,1]
vmin, vmax = 0.0, 1.0

# Vertical figure (1 or 2 panels)
n_panels = len(models_to_plot)
fig_w = 2 + 1.8*len(metric_order)
fig_h = 1.0 + 0.85*len(technique_order)*n_panels
fig, axes = plt.subplots(n_panels, 1, figsize=(fig_w, fig_h), constrained_layout=False)
if n_panels == 1:
    axes = [axes]

last_im = None
for ax, model_name, data in zip(axes, models_to_plot, data_list):
    mdata = np.ma.masked_invalid(data)
    last_im = ax.imshow(mdata, aspect='auto', vmin=vmin, vmax=vmax, cmap='viridis')

    ax.set_xticks(range(len(metric_order)))
    ax.set_xticklabels(metric_order, rotation=0)
    ax.set_yticks(range(len(technique_order)))
    ax.set_yticklabels(technique_order)
    ax.set_title(model_name)

    # White labels inside each cell (outlined for legibility)
    for i in range(data.shape[0]):
        for j in range(data.shape[1]):
            val = data[i, j]
            t = ax.text(
                j, i,
                f"{val:.3f}" if np.isfinite(val) else "—",
                ha='center', va='center',
                color='white', fontsize=9, fontweight='bold'
            )
            t.set_path_effects([pe.withStroke(linewidth=1.2, foreground='black')])

# Shared colorbar on the right
fig.subplots_adjust(top=0.90, right=0.88, hspace=0.25)
cbar = fig.colorbar(last_im, ax=axes, location='right', fraction=0.04, pad=0.02)
cbar.set_label("Metric value")
save_figure('fig_heatmaps_2')
plt.show()
plt.close()
Figure 9: Metrics by balancing technique: two vertically stacked heatmaps (viridis, white text, legend)
Code
# Function to compute ROC curve for a combination
def calculate_roc(bal_name, balancer, model_name, model, X_train, y_train, X_test, y_test):
    try:
        if balancer is not None:
            X_resampled, y_resampled = balancer.fit_resample(X_train, y_train)
        else:
            X_resampled, y_resampled = X_train, y_train
        
        model_copy = model.__class__(**model.get_params())
        model_copy.fit(X_resampled, y_resampled)
        
        y_score = model_copy.predict_proba(X_test)[:, 1]
        
        fpr, tpr, _ = roc_curve(y_test, y_score)
        roc_auc = auc(fpr, tpr)
        
        return {
            'technique': bal_name,
            'model': model_name,
            'fpr': fpr,
            'tpr': tpr,
            'auc': roc_auc
        }
    except Exception:
        return None

# Compute all ROC curves SEQUENTIALLY (more stable on Windows)
roc_results = []

for bal_name, balancer in balancing_techniques.items():
    for model_name, model in models.items():
        result = calculate_roc(
            bal_name, balancer, model_name, model,
            X_train_scaled, y_train, X_test_scaled, y_test
        )
        if result:
            roc_results.append(result)

# Plot
fig, axes = plt.subplots(3, 3, figsize=(20, 12))
axes = axes.flatten()

for idx, bal_name in enumerate(balancing_techniques.keys()):
    ax = axes[idx]
    
    # Filter results for this technique
    technique_rocs = [r for r in roc_results if r and r['technique'] == bal_name]
    
    for roc_data in technique_rocs:
        ax.plot(roc_data['fpr'], roc_data['tpr'], 
                label=f"{roc_data['model']} (AUC = {roc_data['auc']:.3f})")
    
    ax.plot([0, 1], [0, 1], 'k--', alpha=0.3)
    ax.set_xlabel('False Positive Rate')
    ax.set_ylabel('True Positive Rate')
    ax.set_title(f'ROC Curves - {bal_name}', fontweight='bold')
    ax.legend(loc='lower right', fontsize=8)
    ax.grid(True, alpha=0.3)

for idx in range(len(balancing_techniques), len(axes)):
    axes[idx].set_visible(False)

plt.suptitle('ROC curve comparison by balancing technique', 
             fontsize=16, fontweight='bold', y=1.02)
plt.tight_layout()
save_figure('fig_roc_curves')
plt.show()
plt.close()
Figure 10: Comparative ROC curves of models with different balancing techniques
Code
# Find the best model by F1-score
best_idx = all_results['F1'].idxmax()
best_config = all_results.loc[best_idx]

# Build a summary of the best model
best_summary = pd.DataFrame({
    'Metric': ['Balancing Technique', 'Model', 'Accuracy', 'Precision', 
                'Recall', 'F1-Score', 'AUC'],
    'Value': [best_config['Technique'], best_config['Model'], 
              f"{best_config['Accuracy']:.3f}", f"{best_config['Precision']:.3f}",
              f"{best_config['Recall']:.3f}", f"{best_config['F1']:.3f}",
              f"{best_config['AUC']:.3f}"]
})
# Top 5 configurations
top5 = all_results.nlargest(5, 'F1')[['Technique', 'Model', 'F1', 'AUC', 'Recall']]
save_table(top5, 'best_model_analysis')
display(top5.style.format({'F1': '{:.3f}', 'AUC': '{:.3f}', 'Recall': '{:.3f}'}).hide(axis="index"))
Technique Model F1 AUC Recall
ADASYN Random Forest 0.276 0.839 0.453
SMOTE+Tomek Random Forest 0.272 0.839 0.438
Random OverSampling Random Forest 0.266 0.857 0.540
SMOTE Random Forest 0.258 0.839 0.409
Random UnderSampling Random Forest 0.250 0.859 0.781
Table 11: Identification of the best model and balancing technique (Top 5)

7.2.2 Operating points, threshold analysis, and error structure

Threshold sweeps (0.10–0.85) quantified the precision–recall trade‑off for the selected model (see Figure 14):

  • F1 peaked at 0.276 around threshold = 0.50 (precision 0.199, recall 0.453, accuracy 0.904).
  • Lowering the threshold to 0.25–0.30 boosts recall (0.69–0.76) at the cost of precision (0.12–0.14) and specificity, suitable for screening/early‑warning workflows.
  • Raising the threshold to 0.70–0.75 increases precision (0.261–0.270) but halves recall (0.175–0.226), appropriate for targeting scarce interventions.

The confusion matrix at the default cut (0.50) reveals the characteristic error profile under rarity: false positives outnumber true positives by roughly 4:1, and about 55% of deaths remain undetected (recall ≈ 0.45), an acceptable compromise when the objective is early case-finding rather than definitive triage (see Figure 11; Figure 14).

Thresholds should be tailored to the clinical action and capacity (e.g., nurse outreach vs. ICU bundles), not solely to the global F1 optimum.
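As a hedged illustration of this capacity-driven threshold choice, the sketch below (hypothetical helper name, synthetic data, not the study's cohort) picks the highest cutoff that still meets a minimum recall target, maximizing precision subject to that constraint:

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

def pick_threshold(y_true, y_proba, min_recall=0.70):
    """Return (threshold, precision, recall) for the highest cutoff
    whose recall still meets the target; None if no cutoff qualifies."""
    best = None
    for t in np.arange(0.10, 0.90, 0.05):
        y_hat = (y_proba >= t).astype(int)
        r = recall_score(y_true, y_hat, zero_division=0)
        if r >= min_recall:
            # Later (higher) thresholds overwrite earlier qualifying ones
            best = (round(float(t), 2),
                    precision_score(y_true, y_hat, zero_division=0), r)
    return best

# Synthetic rare-event data (~5% positives), purely illustrative
rng = np.random.default_rng(0)
y = rng.binomial(1, 0.05, 2000)
proba = np.clip(np.where(y == 1,
                         rng.normal(0.55, 0.15, 2000),
                         rng.normal(0.10, 0.10, 2000)), 0, 1)
print(pick_threshold(y, proba, min_recall=0.70))
```

Swapping `min_recall` for a precision floor would instead target the scarce-intervention regime described above.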

Code
# Train the best model
best_technique = best_config['Technique']
best_model_name = best_config['Model']

# Get the corresponding balancer and model
balancer = balancing_techniques[best_technique]
model = models[best_model_name]

# Apply balancing
if balancer is not None:
    X_resampled, y_resampled = balancer.fit_resample(X_train_scaled, y_train)
else:
    X_resampled, y_resampled = X_train_scaled, y_train

# Fit
best_model = model.__class__(**model.get_params())
best_model.fit(X_resampled, y_resampled)

# Predictions
y_pred_best = best_model.predict(X_test_scaled)

# Confusion matrix
cm = confusion_matrix(y_test, y_pred_best)

plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', 
            xticklabels=['Not deceased', 'Deceased'],
            yticklabels=['Not deceased', 'Deceased'])
plt.title(f'Confusion Matrix - {best_model_name} with {best_technique}', 
          fontweight='bold')
plt.ylabel('Real class')
plt.xlabel('Predicted class')

# Add metrics to the plot
textstr = f'Accuracy: {best_config["Accuracy"]:.3f}\n'
textstr += f'Precision: {best_config["Precision"]:.3f}\n'
textstr += f'Recall: {best_config["Recall"]:.3f}\n'
textstr += f'F1-Score: {best_config["F1"]:.3f}'

props = dict(boxstyle='round', facecolor='wheat', alpha=0.5)
plt.text(1.05, 0.95, textstr, transform=plt.gca().transAxes,
         fontsize=10, verticalalignment='top', bbox=props)

plt.tight_layout()
save_figure('fig_confusion_matrix')
plt.show()
plt.close()
Figure 11: Confusion matrix of the best model

7.2.3 Variable importance (model‑level signal)

Feature importance was dominated by markers of acuity and multimorbidity burden, followed by service assignment and age, consistent with Sections 3–6 (for the full ranked list, see Figure 12). This pattern corroborates that acute organ failure, thrombo‑embolic states, and intensive service use are principal contributors to short‑term mortality risk.

Upstream identification of patients entering these high‑risk states is the highest‑yield target for prevention and escalation.
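Because impurity-based Random Forest importances can be biased toward high-cardinality predictors (Strobl et al., 2007), a model-agnostic check such as permutation importance is a reasonable complement. A minimal sketch on synthetic data (the variables are illustrative, not the study's features):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 5))
# Outcome driven mainly by column 0, weakly by column 1
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.5, 1000) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

# Permutation importance = drop in held-out score when a column is shuffled
perm = permutation_importance(rf, X_te, y_te, n_repeats=10, random_state=0)
print(perm.importances_mean.round(3))
```

Computing importance on the held-out set, as here, avoids rewarding features the forest merely memorized during training.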

Code
fig, axes = plt.subplots(1, 2, figsize=(14, 6))

# Get the best balancing technique to use in importance
best_balancer = balancing_techniques[best_technique] if 'best_technique' in locals() else None

# For each model, compute feature importance
for idx, (model_name, model) in enumerate(models.items()):
    ax = axes[idx]
    
    # Train with the best balancing technique if available
    if best_balancer is not None:
        X_resampled, y_resampled = best_balancer.fit_resample(X_train_scaled, y_train)
    else:
        X_resampled, y_resampled = X_train_scaled, y_train
    
    model_copy = model.__class__(**model.get_params())
    model_copy.fit(X_resampled, y_resampled)
    
    # Get feature importance according to model type
    if model_name == 'Random Forest':
        importances = model_copy.feature_importances_
    elif model_name == 'Logistic Regression':
        importances = np.abs(model_copy.coef_[0])
    
    # Build the importance DataFrame
    feature_imp = pd.DataFrame({
        'feature': X.columns,
        'importance': importances
    }).sort_values('importance', ascending=False).head(15)
    
    # Plot
    sns.barplot(data=feature_imp, y='feature', x='importance', ax=ax, palette='viridis')
    ax.set_title(f'Top 15 Variables - {model_name}', fontweight='bold')
    ax.set_xlabel('Importance')
    ax.set_ylabel('')

plt.suptitle('Feature Importance by Model', fontsize=14, fontweight='bold')
plt.tight_layout()
save_figure('fig_feature_importance')
plt.show()
plt.close()
Figure 12: Feature importance in the models
Code
# Prepare data for visualization
metrics_to_plot = ['Accuracy', 'Precision', 'Recall', 'F1', 'AUC']
fig, axes = plt.subplots(1, 5, figsize=(18, 8))

for idx, metric in enumerate(metrics_to_plot):
    ax = axes[idx]
    
    # Create a boxplot for each metric
    data_to_plot = []
    labels = []
    
    for technique in balancing_techniques.keys():
        technique_data = all_results[all_results['Technique'] == technique][metric].values
        data_to_plot.append(technique_data)
        labels.append(technique[:10])  # Shorten names
    
    bp = ax.boxplot(data_to_plot, labels=labels, patch_artist=True)
    
    # Color the boxes
    colors = plt.cm.Set3(np.linspace(0, 1, len(balancing_techniques)))
    for patch, color in zip(bp['boxes'], colors):
        patch.set_facecolor(color)
    
    ax.set_title(metric, fontweight='bold')
    ax.set_xlabel('Technique')
    ax.set_ylabel('Value')
    ax.tick_params(axis='x', rotation=45)
    ax.grid(True, alpha=0.3)

plt.suptitle('Metric distributions by balancing technique (all models)', 
             fontsize=14, fontweight='bold')
plt.tight_layout()
save_figure('fig_metrics_comparison')
plt.show()
plt.close()
Figure 13: Comparison of metrics by balancing technique
Code
# Get predicted probabilities from the best model
y_proba_best = best_model.predict_proba(X_test_scaled)[:, 1]

# Evaluate different thresholds
thresholds = np.arange(0.1, 0.9, 0.05)
threshold_metrics = []

for threshold in thresholds:
    y_pred_threshold = (y_proba_best >= threshold).astype(int)
    
    threshold_metrics.append({
        'Threshold': threshold,
        'Precision': precision_score(y_test, y_pred_threshold, zero_division=0),
        'Recall': recall_score(y_test, y_pred_threshold, zero_division=0),
        'F1': f1_score(y_test, y_pred_threshold, zero_division=0),
        'Accuracy': accuracy_score(y_test, y_pred_threshold)
    })

threshold_df = pd.DataFrame(threshold_metrics)
save_table(threshold_df, 'threshold_analysis')

# Plot
fig, axes = plt.subplots(1, 2, figsize=(14, 6))

# Plot 1: Precision vs Recall
ax1 = axes[0]
ax1.plot(threshold_df['Threshold'], threshold_df['Precision'], 
         marker='o', label='Precision', color='blue')
ax1.plot(threshold_df['Threshold'], threshold_df['Recall'], 
         marker='s', label='Recall', color='red')
ax1.plot(threshold_df['Threshold'], threshold_df['F1'], 
         marker='^', label='F1-Score', color='green')
ax1.set_xlabel('Decision Threshold')
ax1.set_ylabel('Metric Value')
ax1.set_title('Metrics vs Decision Threshold', fontweight='bold')
ax1.legend()
ax1.grid(True, alpha=0.3)

# Plot 2: Precision–Recall trade-off
ax2 = axes[1]
ax2.plot(threshold_df['Recall'], threshold_df['Precision'], 
         marker='o', color='purple')
ax2.set_xlabel('Recall')
ax2.set_ylabel('Precision')
ax2.set_title('Precision-Recall Curve', fontweight='bold')
ax2.grid(True, alpha=0.3)

# Mark optimal point (maximum F1)
optimal_idx = threshold_df['F1'].idxmax()
optimal_threshold = threshold_df.loc[optimal_idx, 'Threshold']
ax1.axvline(x=optimal_threshold, color='gray', linestyle='--', alpha=0.7)
ax1.text(optimal_threshold, 0.5, f'Optimal: {optimal_threshold:.2f}', 
         rotation=90, verticalalignment='center')

plt.suptitle(f'Threshold Analysis - {best_model_name} with {best_technique}',
             fontsize=14, fontweight='bold')
plt.tight_layout()
save_figure('fig_threshold_analysis')
plt.show()
plt.close()
Figure 14: Decision-threshold sensitivity analysis for the best model

8 Summary of key findings

  • Caseload increased and LOS decreased modestly while crude mortality remained ~4%, despite more ICU/High‑severity admissions (see Figure 3; Figure 5).
  • Comorbidity burden intensified over time—especially cardio‑metabolic, post‑procedural, and thrombo‑embolic clusters (see Figure 4).
  • Chemo‑Renal/Respiratory Failure, Thrombosis/Effusion/Thyroid, and Sepsis/Hemorrhagic clusters carried the highest mortality RR (peaks ~6–16×), despite not being the most prevalent (see Figure 7).
  • Under extreme imbalance, AUC alone is misleading; RF with ADASYN/SMOTE+Tomek maximized F1 ~0.27 at threshold 0.50, whereas RF with ROS/RUS preserved strong AUC (≈0.86) with higher recall options (see Table 11; Figure 13; Figure 10).
  • Optimal operating thresholds depend on downstream action capacity; 0.25–0.30 is suitable for broad screening (high recall), whereas 0.70–0.75 suits targeted, resource‑intensive interventions (see Figure 14; Figure 11; Figure 8).
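The point that AUC alone can mislead under extreme imbalance can be illustrated with a small synthetic check comparing ROC-AUC against average precision (PR-AUC) at a ~4% event rate, as in this cohort (the data below are simulated, not the study's):

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(7)
n = 20_000
y = rng.binomial(1, 0.04, n)                 # ~4% event rate
score = np.where(y == 1,
                 rng.normal(0.6, 0.2, n),    # events score higher on average
                 rng.normal(0.3, 0.2, n))

roc = roc_auc_score(y, score)
ap = average_precision_score(y, score)       # PR-AUC; baseline = prevalence
print(f"ROC-AUC = {roc:.2f}, PR-AUC = {ap:.2f}")
```

The same separation that yields a ROC-AUC above 0.8 produces a far lower PR-AUC, because precision is dragged down by the sheer number of negatives, which is why Saito and Rehmsmeier (2015) recommend precision-recall analysis for rare outcomes.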

9 GitHub

Code
# Create the QR instance
qr = qrcode.QRCode(
    version=1,
    error_correction=qrcode.constants.ERROR_CORRECT_L,
    box_size=10,
    border=4,
)

# QR payload
datos = "https://amarusimonaguerojimenez.github.io/Predictive-analysis-of-in-hospital-mortality-in-patients-with-ovarian-cancer/"
qr.add_data(datos)
qr.make(fit=True)

# Create the base image
img = qr.make_image(fill_color="#DC143C", back_color="white")

# Convert to RGBA for transparency
img_rgba = img.convert("RGBA")
data = np.array(img_rgba)

# Make white pixels transparent
mask = (data[:, :, 0] == 255) & (data[:, :, 1] == 255) & (data[:, :, 2] == 255)
data[mask] = [255, 255, 255, 0]

# Build the final image
img_final = Image.fromarray(data)

# Save at 1200 DPI
output_path = "figures/qr_rosa_transparente.png"
img_final.save(output_path, dpi=(1200, 1200))

img_final

QR code for GitHub repository

10 References

AlHilli, M. M., Tran, C.-T., Langstraat, C. L., Martin, J. R., Weaver, A. L., McGree, M. E., Mariani, A., Cliby, W. A., & Bakkum-Gamez, J. N. (2018). Risk-scoring model for prediction of non-home discharge in epithelial ovarian cancer patients. Journal of the American College of Surgeons, 226(5), 908–914. https://doi.org/10.1016/j.jamcollsurg.2018.01.050
Batista, G. E. A. P. A., Prati, R. C., & Monard, M. C. (2004). A study of the behavior of several methods for balancing machine learning training data. ACM SIGKDD Explorations Newsletter, 6(1), 20–29. https://doi.org/10.1145/1007730.1007735
Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32. https://doi.org/10.1023/A:1010933404324
Caliński, T., & Harabasz, J. (1974). A dendrite method for cluster analysis. Communications in Statistics — Theory and Methods, 3(1), 1–27. https://doi.org/10.1080/03610927408827101
Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16, 321–357. https://doi.org/10.1613/jair.953
Davies, D. L., & Bouldin, D. W. (1979). A cluster separation measure. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-1(2), 224–227. https://doi.org/10.1109/TPAMI.1979.4766909
Fawcett, T. (2006). An introduction to ROC analysis. Pattern Recognition Letters, 27(8), 861–874. https://doi.org/10.1016/j.patrec.2005.10.010
Fondo Nacional de Salud (FONASA). (2021). Portal de datos abiertos FONASA. https://datosabiertos.fonasa.cl/
Goic, A. (2015). The Chilean health care system: The task ahead. Revista Médica de Chile, 143(6), 774–786. https://doi.org/10.4067/S0034-98872015000600011
Gower, J. C., & Legendre, P. (1986). Metric and Euclidean properties of dissimilarity coefficients. Journal of Classification, 3, 5–48. https://doi.org/10.1007/BF01896809
He, H., Bai, Y., Garcia, E. A., & Li, S. (2008). ADASYN: Adaptive synthetic sampling approach for imbalanced learning. Proceedings of the 2008 IEEE International Joint Conference on Neural Networks (IJCNN), 1322–1328. https://doi.org/10.1109/IJCNN.2008.4633969
Hidalgo, C. A., Blumm, N., Barabási, A.-L., & Christakis, N. A. (2009). A dynamic network approach for the study of human phenotypes. PLoS Computational Biology, 5(4), e1000353. https://doi.org/10.1371/journal.pcbi.1000353
Hosmer, D. W., Lemeshow, S., & Sturdivant, R. X. (2013). Applied logistic regression (3rd ed.). Wiley. https://doi.org/10.1002/9781118548387
Jaccard, P. (1901). Étude comparative de la distribution florale dans une portion des Alpes et du Jura. Bulletin de la Société Vaudoise des Sciences Naturelles, 37, 547–579. https://doi.org/10.5169/SEALS-266450
Legendre, P., & Legendre, L. (2012). Numerical ecology (3rd English ed.). Elsevier.
Lheureux, S., Gourley, C., Vergote, I., & Oza, A. M. (2019). Epithelial ovarian cancer. The Lancet, 393(10177), 1240–1253. https://doi.org/10.1016/S0140-6736(18)32552-2
Murtagh, F., & Legendre, P. (2014). Ward's hierarchical agglomerative clustering method: Which algorithms implement Ward's criterion? Journal of Classification, 31(3), 274–295. https://doi.org/10.1007/s00357-014-9161-z
Quan, H., Li, B., Couris, C. M., Fushimi, K., Graham, P., Hider, P., Januel, J.-M., & Sundararajan, V. (2011). Updating and validating the Charlson comorbidity index and score for risk adjustment in hospital discharge abstracts using data from 6 countries. American Journal of Epidemiology, 173(6), 676–682. https://doi.org/10.1093/aje/kwq433
Real, R., & Vargas, J. M. (1996). The probabilistic basis of Jaccard's index of similarity. Systematic Biology, 45(3), 380–385. https://doi.org/10.1093/sysbio/45.3.380
Roque, F. S., Jensen, P. B., Schmock, H., Dalgaard, M., Andreatta, M., Hansen, T., Søeby, K., Bredkjær, S., Juul, A., Werge, T., Jensen, L. J., & Brunak, S. (2011). Using electronic patient records to discover disease correlations and stratify patient cohorts. PLoS Computational Biology, 7(8), e1002141. https://doi.org/10.1371/journal.pcbi.1002141
Rousseeuw, P. J. (1987). Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20, 53–65. https://doi.org/10.1016/0377-0427(87)90125-7
Saito, T., & Rehmsmeier, M. (2015). The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLOS ONE, 10(3), e0118432. https://doi.org/10.1371/journal.pone.0118432
Strobl, C., Boulesteix, A.-L., Zeileis, A., & Hothorn, T. (2007). Bias in random forest variable importance measures: Illustrations, sources and a solution. BMC Bioinformatics, 8, 25. https://doi.org/10.1186/1471-2105-8-25
Tetsche, M. S., Dethlefsen, C., Pedersen, L., Sørensen, H. T., & Nørgaard, M. (2008). The impact of comorbidity and stage on ovarian cancer mortality: A nationwide Danish cohort study. BMC Cancer, 8, 31. https://doi.org/10.1186/1471-2407-8-31
Tomek, I. (1976). Two modifications of CNN. IEEE Transactions on Systems, Man, and Cybernetics, 6(11), 769–772. https://doi.org/10.1109/TSMC.1976.4309452
Torre, L. A., Trabert, B., DeSantis, C. E., Miller, K. D., Samimi, G., Runowicz, C. D., Gaudet, M. M., Jemal, A., & Siegel, R. L. (2018). Ovarian cancer statistics, 2018. CA: A Cancer Journal for Clinicians, 68(4), 284–296. https://doi.org/10.3322/caac.21456
Valderas, J. M., Starfield, B., Sibbald, B., Salisbury, C., & Roland, M. (2009). Defining comorbidity: Implications for understanding health and health services. Annals of Family Medicine, 7(4), 357–363. https://doi.org/10.1370/afm.983
Ward, J. H. (1963). Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301), 236–244. https://doi.org/10.1080/01621459.1963.10500845