Survival Analysis of Young Triple-Negative Breast Cancer Patients
Abstract: Breast cancer prognosis is crucial for effective treatment. The disease is more common in women over 40 and rare in women under 40, who account for less than 5 percent of cases in the U.S. Studies indicate a worse prognosis in younger women, and this varies by ethnicity. Breast cancers are classified by receptor status: estrogen, progesterone, and HER2. Triple-negative breast cancer (TNBC), which lacks all three receptors, accounts for about 15 percent of cases and is more prevalent in younger patients, often with poorer outcomes. Nevertheless, the impact of age on TNBC prognosis remains unclear. Factors such as age, race, tumor grade, tumor size, and lymph node status have been studied for their role in TNBC clinical outcomes, but current research is inconclusive about age-related differences. This study uses the SEER dataset to examine the influence of younger age on survivability in TNBC patients, aiming to determine whether age is a significant prognostic factor. Our experimental results on the SEER dataset confirm existing reports that TNBC patients have a worse prognosis than non-TNBC patients based on age. Our main goal was to investigate whether younger age has any significance for the survivability of TNBC patients. The experimental results do not show that younger age has any significance for the prognosis and survival rate of TNBC patients.
Knowledge Gaps
Knowledge gaps, limitations, and open questions
Below is a concise list of unresolved issues, methodological limitations, and concrete open questions that this paper leaves for future research:
- Endpoint definition ambiguity: The survival endpoint (overall survival vs breast-cancer–specific survival) is not clearly defined, and tables indicate “N of Events=survived, Censored=died,” which reverses standard survival-analysis conventions. Clarify event/censor definitions and use cause-specific or competing-risks methods where appropriate.
- Inappropriate use of odds ratios for time-to-event data: The study relies on odds ratios at fixed time points (e.g., <2 years, 5 years) rather than hazard ratios from Cox models or restricted mean survival time. Re-analyze using time-to-event methods that account for censoring and varying follow-up.
- Lack of multivariable adjustment: No Cox proportional hazards or other multivariable models were used to adjust for confounders (tumor size, grade, nodal status, stage, treatment, comorbidities, year of diagnosis). Perform adjusted analyses and assess effect modification.
- No power or sample-size justification: Many strata (e.g., age <30, marital status categories, ethnicity/race subgroups) are underpowered. Provide formal power calculations and ensure adequate sample sizes for subgroup and interaction analyses.
- Arbitrary and underpowered age cutpoint: The dichotomy at <30 vs ≥30 produces very small “young” TNBC cohorts. Explore age as a continuous variable (e.g., splines), alternative cutpoints (<35, <40), and multi-bin strata (20s, 30s, 40s), with sensitivity analyses.
- SEER TNBC classification validity: SEER lacks HER2 receptor data before ~2010, risking misclassification of triple-negative status if earlier years were included. Specify diagnosis years, handling of missing receptor fields, and validate TNBC coding.
- Missing treatment detail: SEER contains limited systemic therapy information; the analysis does not incorporate chemotherapy, radiation, surgery type, neoadjuvant therapy, or pathologic response—key determinants of TNBC outcomes. Integrate treatment variables or link to datasets with treatment detail.
- Unaddressed temporal trends: Year of diagnosis and secular improvements in TNBC management are not modeled. Include calendar period as a covariate and assess cohort effects.
- Stage- and grade-specific age effects: The paper does not test whether age interacts with disease stage, nodal status, tumor size, or histological grade. Conduct stratified and interaction analyses (age × stage, age × nodal status, etc.).
- NPI applicability and computation transparency: The Nottingham Prognostic Index is applied without detailing how components (size, nodes, grade) were derived from SEER or whether NPI is validated for TNBC. Provide exact computation steps, data completeness checks, and TNBC-specific validation.
- Potential multiple-comparisons inflation: Numerous subgroup tests (race/ethnicity, marital status, NPI strata, survival time cutoffs) were performed without correction. Implement multiple-testing control (e.g., FDR) and pre-specify hypotheses.
- Lack of biomarker and molecular heterogeneity: TNBC subtypes (e.g., basal-like, LAR) and biomarkers (Ki-67, p53, BRCA1/2 status) are not included. Incorporate molecular subtyping and genetic data to assess age-related biological differences.
- Socioeconomic and access-to-care confounding: Analyses omit socioeconomic variables (insurance status, area deprivation, urban/rural, education) and access metrics that can mediate age and race effects. Add SDOH covariates and mediation analyses.
- Cause-specific vs all-cause outcomes and competing risks: Older patients have higher non-cancer mortality; age comparisons using OS may be confounded. Use breast-cancer–specific survival and competing-risks models (Fine–Gray) with cause-of-death coding.
- Follow-up duration reporting: Median follow-up time, maximum follow-up, and number of events per stratum are not reported. Provide detailed follow-up metrics to assess maturity and reliability of survival estimates.
- Censoring assumptions and data quality: The analysis does not examine non-informative censoring assumptions, missing data patterns, or imputation strategies. Audit missingness, specify inclusion/exclusion criteria, and apply appropriate imputation or sensitivity analyses.
- Race/ethnicity definitions and small-cell issues: Categories (e.g., “HWhite,” “NHAPI”) are unclear, with small counts leading to unstable estimates. Standardize race/ethnicity definitions, collapse sparse categories appropriately, and confirm coding fidelity.
- Endpoint limitations in SEER (no recurrence data): Disease-free survival and relapse-free survival cannot be assessed in SEER. Link to datasets capturing recurrence (e.g., NCDB, institutional registries) or use alternative endpoints (time to second cancer event) if available.
- Proportional hazards assumption: The study does not test PH assumptions. Evaluate PH and use flexible parametric or AFT models if hazards are non-proportional, especially across age groups.
- Age × race interaction: Preliminary null findings may be underpowered; assess formal interaction terms and perform stratified Cox models with adequate sample sizes or pooled registries.
- Menopausal status and reproductive factors: Young age effects may be mediated by menopausal status, parity, age at first birth, and breastfeeding—variables not analyzed here. Include these factors to disentangle biological and social mechanisms.
- Geographic heterogeneity: SEER registry differences (regional demographics, care patterns) are not considered. Model registry-level random effects or cluster-robust variance and assess geographic variation.
- Reporting clarity and reproducibility: Tables/figures are inconsistent; the analysis lacks a transparent statistical plan, code, and data-versioning details (SEER release, variable definitions). Provide a reproducible workflow with documented scripts and variable mappings.
- Long-term outcomes: Only 2- and 5-year survival cutoffs are examined; long-term (10-year) survival differences by age remain unexplored. Extend follow-up horizons and assess late mortality patterns.
- Clinical actionability of null age effect: The conclusion that “younger age has no significance” may reflect insufficient statistical power or confounding. Define clinically meaningful effect sizes, conduct sensitivity analyses, and quantify the probability of type II error to guide interpretation.
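The multiple-testing control suggested in the list above can be illustrated with a minimal sketch of the Benjamini-Hochberg step-up procedure for false discovery rate (FDR) control. The function name `benjamini_hochberg` and the p-values are hypothetical and chosen only for illustration; they are not taken from the paper or from SEER.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per hypothesis: True if rejected at FDR level alpha."""
    m = len(p_values)
    # Indices of the p-values in ascending order of p
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k such that p_(k) <= (k / m) * alpha
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank

    # Reject every hypothesis whose rank is at most max_k
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected


# Hypothetical p-values from five subgroup comparisons (illustrative only)
p_values = [0.001, 0.045, 0.035, 0.20, 0.008]
flags = benjamini_hochberg(p_values, alpha=0.05)
print(flags)
```

With these illustrative inputs, only the two smallest p-values survive correction, even though four of the five fall below an uncorrected 0.05 threshold. That is the kind of inflation the item on multiple comparisons warns about.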
These gaps suggest concrete next steps: re-analyze with multivariable time-to-event models (including treatment, stage, grade, year, SDOH), validate TNBC classification in SEER, use continuous/spline age modeling, adopt cause-specific/competing-risks endpoints, incorporate molecular and genetic data, ensure adequate sample sizes with power justification, and provide reproducible, transparent reporting.
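As a starting point for the time-to-event re-analysis recommended above, the following is a minimal sketch of a Kaplan-Meier estimator using the conventional coding, where an event is an observed death and a censored record is a patient alive at last contact. The function name `kaplan_meier` and the toy data are assumptions for illustration, not the paper's method or SEER values. A real analysis would more likely use an established survival-analysis library.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time where a death occurred.

    times  : follow-up time per patient (e.g. months)
    events : 1 if death was observed at that time,
             0 if censored (alive at last contact or lost to follow-up)
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    exits = Counter(times)  # every patient leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d > 0:
            # Multiply by the conditional probability of surviving past t
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Remove both deaths and censored patients from the risk set
        at_risk -= exits[t]
    return curve


# Hypothetical toy cohort (illustrative only): follow-up in months
times = [6, 12, 12, 20, 30]
events = [1, 1, 0, 1, 0]  # 1 = died, 0 = censored
curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t={t:>2}  S(t)={s:.2f}")
```

In this toy cohort the estimated survival steps down to about 0.80, 0.60, and 0.30 at months 6, 12, and 20. Censored patients contribute to the risk set until they exit but never trigger a step, which is why swapping the event and censor labels changes the curve entirely.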