Looking for Full-time Applied Research Roles from July 2026

Manas Jain

MS Data Science @ UC San Diego

Ex-Microsoft AI · IIT Bombay

Master's student in Data Science at UC San Diego with 4+ years of industry experience building AI and machine learning systems at scale. Most recently at Microsoft AI, where I shipped multilingual generative suggestion models powering Bing Autosuggest to 200M+ daily active users. Currently conducting research on multimodal LLM agents for scientific discovery with Prof. Rose Yu. IIT Bombay graduate with double minors in Computer Science and Machine Intelligence & Data Science.

Publications

ICLR '26 Main Track

Zephyrus: An Agentic Framework for Weather Science

S. Varambally, M. Fisher, ..., M. Jain, ..., T. Berg-Kirkpatrick, D. Watson-Parris, Y. Ma, R. Yu

CIKM '25 Applied Research Track

EnhanceMyPrompt: Rewriting Chat Queries for Effective Response Generation from LLMs

M. Jain*, T. Abhishek*, S. Hardia, S. Suriyanarayanan, S. Anil, R. Gandhi, M. Gupta (* equal contribution)

ICON '24 Main Track

Natural Answer Generation: From Factoid Answer to Full-length Answer using Grammar Correction

M. Jain, S. Saha, P. Bhattacharyya, G. Chinnadurai, M. Vatsa

Experience

Mar 2024 — Sept 2025

Data & Applied Scientist 2

Microsoft AI — Bing Autosuggest

Bangalore, India

  • Designed & deployed T5/mT5-based multilingual generative query suggestions in production, achieving +200K DAU, improving coverage (+2 SBS) and reducing latency (-6 ms)
  • Developed generative suggestions for mid-query reformulation with correction-aware modeling, leading to +20K DAU & winning Best Hack Award among 30+ submissions
  • Fine-tuned SLMs (Phi-3-mini, Llama-3.2-3B) with SFT/LoRA for the Enhance Your Prompt project
  • Optimized inference with TensorRT-LLM, pruning, and vocab trimming to handle 100K QPS within a 30 ms SLA
  • Built a DeepSeek-R1-Distill-Qwen-32B-based offline evaluation pipeline with vLLM for large-scale benchmarking

Dec 2022 — Mar 2024

Associate Machine Learning Scientist

Wadhwani Institute of AI — Agriculture NLP Team

Bangalore, India

  • Led development of multilingual NLP pipelines & RAG-based LLM-powered chatbots for news monitoring (Krishi 24/7) and farmer support
  • Implemented an AI/ML-based surveillance system for the Indian Ministry of Agriculture to track pest & disease incidences

Jul 2021 — Dec 2022

Research Data Scientist

HiLabs Inc. — Biomedical NLP Team

Pune, India

  • Built and deployed ML pipelines for ICD10 medical code prediction and NER from provider contracts using BioBERT, SciSpacy, and OCR
  • Developed a pipeline to find evidence for medical codes within physician notes at scale using PySpark on AWS EMR

May — Jun 2020

R&D AI Intern

Daikin Industries — DigiNavi, ICT Group

Osaka, Japan

  • Built an end-to-end NLP pipeline for video tagging, captioning, and summarization using TF-IDF, BERT, LDA, and S-BERT with GCP Speech-to-Text API

Research

Feb 2026 — Present (Ongoing)

RL-based Post-Training for Accelerated Video Generation

Prof. Hao Zhang, Hao AI Lab, HDSI — UC San Diego

  • Contributing to FastVideo, an open-source unified post-training and inference framework for accelerated video generation (3K+ GitHub stars)
  • Focusing on RL-based post-training recipes for video generation models

May 2025 — Present (Ongoing)

Multimodal LLM Agents for Scientific & Weather Discovery

Prof. Rose Yu, Spatiotemporal Lab, CSE — UC San Diego

  • Pioneered a Bayesian Optimization evaluation task integrated into ZephyrusBench using a Neural GCM simulator
  • Enhancing the Zephyrus agent's ability to solve multi-step scientific problems via subgoal decomposition and RL
  • Co-authored Zephyrus, accepted as a main track paper at ICLR 2026

Fall 2025

Post-Training Alignment Against Prompt Injection

Prof. Yu-Xiang Wang, HDSI — UC San Diego

  • Implemented a sequential alignment pipeline (SFT followed by DPO) to harden Llama-3.2-3B and Qwen-3-4B against prompt injection using Tinker
  • Reduced Attack Success Rate (ASR) by 70% (9.0% → 2.7%) on Qwen-3-4B while improving benign helpfulness scores by 34%
  • Engineered an automated "LLM-as-a-Judge" evaluation harness to benchmark model safety against adversarial datasets (WildJailbreak, AdvBench, JailbreakBench)

Education

University of California San Diego

Master of Science in Data Science

Sept 2025 — June 2026 (Expected)

Coursework: Probability & Statistics, Statistical Models, Recommender Systems, ML Systems, Safety in GenAI, LLM System Optimization, Efficient AI for LLMs, Algorithms for Data Science

GPA: 4.0 / 4.0

Graduate Student Researcher at Rose Spatiotemporal ML Lab

Indian Institute of Technology Bombay

B.Tech in Civil Engineering

Double Minor in Computer Science and in Machine Intelligence & Data Science (CMInDS)

July 2017 — June 2021

GPA: 8.7 / 10 (Major) · 8.75 / 10 (Minor)

Bachelor Thesis with Prof. Pushpak Bhattacharyya in collaboration with LG Soft India

Teaching Assistant — MA 106 Linear Algebra

Technical Skills

Languages

Python C/C++ Go Java SQL R Julia

ML & AI

PyTorch JAX OpenCV TensorRT-LLM vLLM LangChain LangGraph MCP Tinker

Tools & Platforms

AWS Git React Docker PySpark

Hobbies & Interests

Racket Sports

Avid table tennis, lawn tennis, badminton, and pickleball player. Always up for a match!

Travel & Exploration

Exploring new cities, cultures, and cuisines around the world.

Team Sports

Enjoy playing cricket, volleyball, and basketball with friends.

Photography

Capturing moments from travels, sunsets, and everyday life.

Life Beyond Code

Follow along for travel snaps, sports moments, and more.