About Me
I currently work as a Senior Applied Scientist on Amazon's Artificial General Intelligence (AGI) - Customizations team. In this role, I lead the development and launch of Amazon Nova customization features, with a focus on advanced fine-tuning capabilities that let customers tailor foundation models to their specific needs. My expertise spans reinforcement learning techniques including Direct Preference Optimization (DPO), Proximal Policy Optimization (PPO), and Group Relative Policy Optimization (GRPO). I have architected and deployed scalable reinforcement learning from human feedback (RLHF) pipelines that improve model alignment and performance for enterprise applications. My work contributes directly to Amazon Nova's customization infrastructure, enabling customers across industries to fine-tune large language models for their own domains and use cases. Through close collaboration with cross-functional teams, I integrate state-of-the-art machine learning techniques into production systems that serve AWS's global customer base.
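To make the preference-optimization work above concrete, here is a minimal sketch of the DPO objective in PyTorch. This is illustrative only, not Amazon Nova's implementation; the function name, tensor shapes, and the default beta are my own assumptions.

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """DPO loss computed from per-sequence log-probabilities.

    Each argument is a 1-D tensor of shape (batch,): the summed log-prob a
    model assigns to the chosen or rejected completion of each prompt.
    beta controls how strongly the policy is kept close to the reference.
    """
    # Implicit rewards: policy-to-reference log-ratios, scaled by beta.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximize the margin between chosen and rejected implicit rewards.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```

In practice the log-probabilities come from a forward pass of the policy and a frozen reference model over the same preference pairs; only the policy receives gradients.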
Research Background
My graduate research at UMass Amherst with Prof. Andrew McCallum centered on data- and compute-efficient NLP via self-supervision, meta-learning, and domain-specific pre-training. I helped design diverse distributions of self-supervised tasks for meta-learning, introducing frequency- and cluster-aware sampling, sentence-level task construction, and an easy-to-hard curriculum, which improved few-shot transfer (e.g., up to +4.2 average points over prior unsupervised baselines and competitive 5-shot results on FewRel 2.0). In parallel, I developed an unsupervised pre-training objective for biomedical QA that denoises corrupted entity mentions to teach span-level reasoning from unlabeled text, significantly boosting BioBERT and outperforming the previous best system on BioASQ 7b Phase B. Collectively, this work advanced curriculum-driven task design and domain-aware pre-training, and it shaped my later focus on alignment and preference-optimization methods (e.g., RLHF, DPO, PPO, GRPO) for efficient LLM adaptation under limited supervision.
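The entity-denoising idea can be pictured with a toy example. The sketch below illustrates the general recipe under my own simplifying assumptions (function name, data layout, and the example sentence are hypothetical, not the paper's exact corruption procedure): an entity mention in an unlabeled sentence is swapped for a distractor, and the model is trained to recover the original span from context.

```python
import random

def make_denoising_example(tokens, mention_span, distractor_mentions, rng=random):
    """Toy illustration: corrupt one entity mention and keep the original
    text as the span a QA-style model must recover from context.

    tokens: list[str]                     -- tokenized sentence from unlabeled text
    mention_span: (start, end)            -- token indices of an entity mention
    distractor_mentions: list[list[str]]  -- mentions mined elsewhere in the corpus
    """
    start, end = mention_span
    answer = " ".join(tokens[start:end])          # gold span to predict
    distractor = rng.choice(distractor_mentions)  # corruption: swap in another entity
    corrupted = tokens[:start] + list(distractor) + tokens[end:]
    return {"context": " ".join(corrupted), "answer": answer}

# Example: "aspirin" is corrupted; the model must recover it from context.
example = make_denoising_example(
    ["aspirin", "inhibits", "platelet", "aggregation"],
    (0, 1),
    [["warfarin"], ["ibuprofen"]],
)
```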
Interests
- Reinforcement Learning from Human Feedback (RLHF)
- Direct Preference Optimization (DPO) and Policy Optimization
- Proximal Policy Optimization (PPO) for Model Alignment
- Group Relative Policy Optimization (GRPO) (see the sketch after this list)
- Foundation Model Customization and Fine-tuning
- Large Language Model Optimization
- Natural Language Processing and Deep Learning
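For the GRPO item above, a minimal sketch of the group-relative advantage that GRPO substitutes for a learned value baseline, before it enters a PPO-style clipped update. The function name and tensor layout are my own illustrative choices.

```python
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Standardize rewards within each group of completions sampled from the
    same prompt; the group statistics replace a learned value baseline.

    rewards: shape (num_prompts, group_size), one scalar reward per completion.
    """
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)

# Example: 2 prompts, 4 sampled completions each.
adv = group_relative_advantages(torch.tensor([[1.0, 0.0, 0.5, 0.2],
                                              [0.9, 0.9, 0.1, 0.3]]))
```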