Rohan Pattnaik

About

I started my career in computer science, drawn to hard problems. Then I discovered astrophysics — a field with extraordinary data and almost no off-the-shelf solutions.

For 8 years, I've built ML systems for data that most engineers never encounter: galaxy spectra, X-ray binaries, 21cm cosmological signals, transient astronomical events. Problems where you can't Google the solution and the training data might be labeled by 5,000 volunteers.

That unusual trajectory gave me something rare: the ability to drop into a new domain, understand it deeply, and build AI that actually works — not just prototypes, but systems that get published, deployed, and adopted by other researchers.

I'm currently an Assistant Research Scientist at Johns Hopkins University, scaling foundation models for spectroscopy and exploring cross-domain transfer to mass spectrometry for planetary biosignature detection.

I'm actively looking for industry roles in ML/AI where rigorous thinking and unconventional data are features, not bugs.

SELECTED WORK

What I've Built

SpecPT — Transformer Foundation Model for Spectroscopy

Designed a transformer autoencoder for self-supervised spectral representation learning, trained on 13 million galaxy spectra. The model denoises high-dimensional sequential inputs and predicts continuous targets with R² = 0.99, compressing a months-long manual analysis pipeline to seconds per sample. It also enables zero-shot transfer to new instruments.

Most remarkably: a University of Maryland team adopted SpecPT as the backbone for a mass spectrometry classifier targeting biosignature detection on planetary rovers — with minimal fine-tuning. The representations learned from galaxies transferred to biology.

Transformer · Self-Supervised · Foundation Model · Transfer Learning · PyTorch

→ Paper

Published · The Astrophysical Journal

Redshift Wrangler — Human-in-the-Loop ML at Scale

Built a complete citizen science data pipeline: converted raw FITS astronomical spectral files into visual formats suitable for non-expert annotation, launched the platform on Zooniverse, and recruited 5,000+ active volunteers who contributed 190,000 classifications. The resulting labeled dataset supplies downstream ML pipelines with trustworthy training data at a fraction of the cost of expert labeling.

Data Pipeline · Human-in-the-Loop · Crowdsourcing · FITS · Python

→ Zooniverse

5,000+ volunteers · 190K classifications

Cross-Validation Audit Framework for Regulatory Deposit Forecasting

At JPMorgan Chase, designed an audit framework for X-13 ARIMA time-series models used in regulatory deposit forecasting. Identified systematic overfitting in complex model configurations that masked true accuracy. Recommended and validated simpler, interpretable alternatives — improving out-of-sample forecast accuracy by 15% while reducing model complexity and regulatory compliance overhead.

Time-Series · Model Risk · Statistical Validation · X-13 ARIMA · R

+15% out-of-sample accuracy

Expert-Labeling Web App + Fine-tuned ResNet (Schlumberger-Doll Research)

Built a containerized expert-labeling web application (Docker + Flask) to replace heuristic-generated labels for rock particle classification. Fine-tuned a ResNet on the new expert labels, achieving 85%+ classification accuracy versus the 60% average from heuristics. Replaced a manual, inconsistent process with a reproducible MLOps pipeline.

Computer Vision · ResNet · Docker · Flask · MLOps · Fine-tuning

85%+ accuracy vs 60% heuristic baseline

Real-Time Astronomical Transient Classifier

Built a CNN classifier for real-time detection of transient astronomical events (gamma-ray bursts, supernovae, flare stars) in the Deeper, Wider, Faster survey. Reduced manual inspection by 95% and achieved a 21× speedup in event-detection pipeline throughput.

CNN · Anomaly Detection · Real-Time · Python · Astronomy

95% reduction in manual inspection

The Toolkit

ML & Deep Learning

Transformers · CNNs · RNNs
Autoencoders · VAEs
Bayesian Inference
Transfer Learning
Distributed Training
Multi-task Learning

Infrastructure & Tools

Docker · Flask · Git
Linux/Unix · Jupyter
HuggingFace Accelerate
SLURM / HPC clusters
LaTeX

Languages

Python (primary)
R · SQL · C++ · Julia
Bash · HTML/CSS

Data & Analysis

PyTorch · scikit-learn
pandas · NumPy · SciPy
OpenCV · Keras
Astropy · FITS data

Selected Talks

Type	Talk / Venue	Date
Invited	Infrared Spectroscopy from Space · IPAC Caltech	Oct 2025
Invited	Modern Statistics of Galaxies Seminar · LMU Munich	Jul 2025
Invited	Zwicky Transient Facility ML Meeting	Feb 2025
Invited	Johns Hopkins University	Feb 2025
Contributed	AI/ML Applications in Astronomy & Astrophysics	Jan 2025
Invited	Physics Colloquium · SUNY Geneseo	Jan 2024
Invited	AI in Astronomy Meeting · Universidade de São Paulo, Brazil	Oct 2022
Talk	ML in X-Ray Astronomy: Classifying Black Holes & Neutron Stars · PyData London	Apr 2018