PROJECTS

A selection of independent, academic, and applied machine-learning projects demonstrating my expertise across data science, HPC automation, and particle-physics analytics.

Time-Series Analysis of Topics

Topic Evolution in High-Energy Physics

End-to-end NLP pipeline analyzing 90 years of HEP literature using BERTopic, ARIMA, and XGBoost for trend discovery.

Python, BERTopic, ARIMA Forecasting, XGBoost, NLP, Text Mining, Topic Modelling, Time-Series Analysis, Data Visualization

Project Story

Over the past six years of research in Higgs physics, I have learnt how the field and the questions it asks have evolved. This understanding has come from conversations with colleagues and mentors, as well as from the vast body of literature I have explored myself. With my training in Data Science, Machine Learning (ML) and Natural Language Processing (NLP), I began to wonder:

Can the collective knowledge of our field be systematically learned and visualized using modern machine-learning methods?

To explore this, I applied NLP techniques to hundreds of high-energy-physics abstracts and titles, building a pipeline that uncovers latent topic structures and quantifies their evolution through time. BERTopic identifies semantically meaningful clusters, while ARIMA models forecast how the prominence of each cluster changes over time, revealing the rise and decline of subfields within particle physics.
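As a rough illustration of the "evolution through time" step, the sketch below turns per-abstract topic labels into a yearly share for each topic. It is not the project's code: the function name `yearly_topic_shares`, the toy records, and the topic labels are all hypothetical, and it assumes each abstract has already been assigned a topic (for example by a topic model such as BERTopic).

```python
from collections import Counter, defaultdict


def yearly_topic_shares(records):
    """Return {year: {topic: share}} from (year, topic) pairs.

    The share is the fraction of that year's abstracts assigned to a topic.
    """
    counts = defaultdict(Counter)
    for year, topic in records:
        counts[year][topic] += 1

    shares = {}
    for year in sorted(counts):
        total = sum(counts[year].values())
        shares[year] = {t: n / total for t, n in counts[year].items()}
    return shares


# Hypothetical toy input: one (publication year, topic label) pair per abstract.
records = [
    (2011, "higgs"), (2011, "neutrinos"),
    (2012, "higgs"), (2012, "higgs"), (2012, "neutrinos"),
    (2013, "higgs"), (2013, "higgs"), (2013, "higgs"), (2013, "neutrinos"),
]

shares = yearly_topic_shares(records)

# One time series per topic: the share of abstracts in each year, in year order.
higgs_series = [shares[year].get("higgs", 0.0) for year in sorted(shares)]
```

A per-topic series like `higgs_series` is the kind of input a forecasting model such as ARIMA could then be fitted to. Using shares rather than raw counts is one possible design choice, since it avoids confusing overall growth in publication volume with growth of a single topic.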

The results provide a data-driven view of the sociology of modern physics, illustrating how theoretical and experimental communities have co-evolved over the decades and how research priorities continue to shift with discovery and technology.

View on GitHub
Applied Machine Learning & Analytics

University of Cambridge Data Science, ML & AI Career Accelerator

Five end-to-end applied projects from the University of Cambridge Data Science, ML & AI Career Accelerator, demonstrating practical skills in anomaly detection, segmentation, predictive modelling, NLP, and time-series forecasting.

Python, TensorFlow, Keras, XGBoost, Scikit-learn, Pandas, NumPy, Matplotlib, Seaborn, Plotly, BERTopic, Gensim, spaCy, Sentence Transformers, Falcon-7B, ARIMA, SARIMA, Hybrid ARIMA–XGBoost, Neural Networks, Deep Learning, SVM, Random Forest, Regression, PCA, t-SNE, Clustering, Time-Series Forecasting, Feature Engineering, WordCloud, Statsmodels

Project Story

This collection of projects was part of the University of Cambridge Data Science, Machine Learning & AI Career Accelerator. To keep the experience fair for future learners, the full project materials aren't publicly shared, but the outcomes and insights attached reflect my own work and learning journey.

1) Anomaly Detection

2) Customer Segmentation

3) Predictive Modelling

4) NLP Topic Modelling

5) Time-Series Forecasting

Each project deepened my ability to turn complex data into insight, showing how strong analytical foundations and modern ML tools can make data both meaningful and actionable.


Academic Computing / HPC Projects

High-performance computing workflows and automation frameworks developed as a physicist, supporting large-scale Monte Carlo simulations and amplitude calculations relevant to high-energy colliders such as the LHC. My work involved integrating symbolic algebra (FORM, Mathematica) with numerical Fortran modules and HPC job-submission scripts for efficient multi-parameter scans. Code samples and implementation details are available upon request.

In Search of the Higgs Boson – Xavier Cortada with CMS

