About

Graduate Student and Research Assistant

I am currently pursuing an MS at Iowa State University, where my research focuses on offline reinforcement learning methods. I develop generative models for autonomous robotic racing cars and work on bridging the sim-to-real gap with an emphasis on safety. Before starting my MS, I worked as an R&D engineer specializing in robot navigation and computer vision. I earned my undergraduate degree in Mechanical Engineering from the Institute of Engineering at Tribhuvan University.

Graduate Research

Completed Manuscripts:

 

Koirala, Prajwal, Zhanhong Jiang, Soumik Sarkar, and Cody Fleming. "Latent Safety-Constrained Policy Approach for Safe Offline Reinforcement Learning." (2024).

Status: Paper accepted at ICLR 2025. Link: https://arxiv.org/abs/2412.08794

Abstract: In safe offline reinforcement learning, the objective is to develop a policy that maximizes cumulative rewards while strictly adhering to safety constraints, utilizing only offline data. Traditional methods often face difficulties in balancing these constraints, leading to either diminished performance or increased safety risks. We address these issues with a novel approach that begins by learning a conservatively safe policy through the use of Conditional Variational Autoencoders, which model the latent safety constraints. Subsequently, we frame this as a Constrained Reward-Return Maximization problem, wherein the policy aims to optimize rewards while complying with the inferred latent safety constraints. This is achieved by training an encoder with a reward-Advantage Weighted Regression objective within the latent constraint space. Our methodology is supported by theoretical analysis, including bounds on policy performance and sample complexity. Extensive empirical evaluation on benchmark datasets, including challenging autonomous driving scenarios, demonstrates that our approach not only maintains safety compliance but also excels in cumulative reward optimization, surpassing existing methods. Additional visualizations provide further insights into the effectiveness and underlying mechanisms of our approach.
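
For illustration, a minimal and purely hypothetical sketch of the two-stage idea described in the abstract is given below: first fit a conditional VAE on the low-cost portion of an offline dataset so that its latent space captures a conservatively safe action manifold, then train a state-to-latent policy with an advantage-weighted regression objective that decodes actions only through the frozen CVAE decoder. The dimensions, thresholds, and random stand-in dataset are assumptions made for the sketch, not the paper's implementation.

import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, LATENT_DIM = 8, 2, 4

class CVAE(nn.Module):
    def __init__(self):
        super().__init__()
        # Encoder outputs latent mean and log-variance; decoder maps (state, latent) -> action.
        self.enc = nn.Sequential(nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, 2 * LATENT_DIM))
        self.dec = nn.Sequential(nn.Linear(STATE_DIM + LATENT_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, ACTION_DIM))

    def forward(self, s, a):
        mu, logvar = self.enc(torch.cat([s, a], -1)).chunk(2, -1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterization trick
        return self.dec(torch.cat([s, z], -1)), mu, logvar

    def decode(self, s, z):
        return self.dec(torch.cat([s, z], -1))

# Random stand-ins for an offline dataset of states, actions, safety costs, and reward advantages.
states  = torch.randn(1024, STATE_DIM)
actions = torch.randn(1024, ACTION_DIM)
costs   = torch.rand(1024)
advs    = torch.randn(1024)
safe    = costs < 0.2          # keep only (nearly) constraint-satisfying transitions

# Stage 1: fit the CVAE on the safe subset so its latents model the safety-constrained action manifold.
cvae = CVAE()
opt = torch.optim.Adam(cvae.parameters(), lr=1e-3)
for _ in range(200):
    recon, mu, logvar = cvae(states[safe], actions[safe])
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
    loss = ((recon - actions[safe]) ** 2).sum(-1).mean() + 0.5 * kl
    opt.zero_grad(); loss.backward(); opt.step()

# Stage 2: advantage-weighted regression for a state-to-latent policy, decoding actions
# only through the frozen CVAE decoder so they stay near the safe manifold.
for p in cvae.parameters():
    p.requires_grad_(False)
policy_enc = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, LATENT_DIM))
popt = torch.optim.Adam(policy_enc.parameters(), lr=1e-3)
weights = torch.exp(advs / 1.0).clamp(max=20.0)   # AWR-style exponential advantage weights
for _ in range(200):
    pred_a = cvae.decode(states, policy_enc(states))
    loss = (weights * ((pred_a - actions) ** 2).sum(-1)).mean()
    popt.zero_grad(); loss.backward(); popt.step()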

 

Koirala, Prajwal, and Cody Fleming. "Solving Offline Reinforcement Learning with Decision Tree Regression." (2024).

Status: Paper accepted at CoRL 2024. Link: https://openreview.net/pdf?id=eTRncsYYdv

Abstract: This study presents a novel approach to addressing offline reinforcement learning (RL) problems by reframing them as regression tasks that can be effectively solved using Decision Trees. Specifically, we introduce two distinct frameworks: return-conditioned and return-weighted decision tree policies (RCDTP and RWDTP), both of which achieve notable speed in agent training as well as inference, with training typically lasting less than a few minutes. Despite the simplification inherent in this reformulated approach to offline RL, our agents demonstrate performance that is at least on par with the established methods. We evaluate our methods on D4RL datasets for locomotion and manipulation, as well as other robotic tasks involving wheeled and flying robots. Additionally, we assess performance in delayed/sparse reward scenarios and highlight the explainability of these policies through action distribution and feature importance.
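
As a rough illustration of the return-conditioned variant (RCDTP), a minimal sketch on synthetic data follows. The features, tree implementation (scikit-learn gradient-boosted trees here), and hyperparameters are assumptions chosen for illustration, not the paper's exact setup.

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(0)
N, EP_LEN, STATE_DIM, ACTION_DIM = 5000, 100, 8, 2

# Synthetic offline dataset: fixed-length episodes of states, actions, and rewards.
states  = rng.normal(size=(N, STATE_DIM))
actions = rng.normal(size=(N, ACTION_DIM))
rewards = rng.normal(size=N)

# Return-to-go: undiscounted sum of future rewards within each episode.
rtg = np.zeros(N)
for t in range(N - 1, -1, -1):
    last_step = (t + 1) % EP_LEN == 0
    rtg[t] = rewards[t] + (0.0 if last_step else rtg[t + 1])

# Return-conditioned policy: regress actions on (state, return-to-go) features with decision trees.
X = np.concatenate([states, rtg[:, None]], axis=1)
policy = MultiOutputRegressor(GradientBoostingRegressor(n_estimators=100, max_depth=3))
policy.fit(X, actions)

# Acting: condition on an ambitious target return, then decrement it by observed rewards online.
target_return = np.quantile(rtg, 0.9)
obs = rng.normal(size=STATE_DIM)
action = policy.predict(np.concatenate([obs, [target_return]])[None, :])[0]

# Tree policies also expose feature importances, which supports the explainability analysis mentioned above.
print(policy.estimators_[0].feature_importances_)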

 

Koirala, Prajwal, and Cody Fleming. "F1tenth Autonomous Racing With Offline Reinforcement Learning Methods." (2024).

Status: Paper accepted at IEEE ITSC 2024. Link: https://arxiv.org/abs/2408.04198

Abstract: Autonomous racing serves as a critical platform for evaluating automated driving systems and enhancing vehicle mobility intelligence. This work investigates offline reinforcement learning methods to train agents within the dynamic F1tenth racing environment. The study begins by exploring the challenges of online training in the Austria race track environment, where agents consistently fail to complete laps. Consequently, this research pivots towards an offline strategy, leveraging an 'expert' demonstration dataset to facilitate agent training. A waypoint-based suboptimal controller is developed to gather data with successful lap episodes. This data is then employed to train offline learning-based algorithms, with a subsequent analysis of the agents' cross-track performance, evaluating their zero-shot transferability from seen to unseen scenarios and their capacity to adapt to changes in environment dynamics. Beyond mere algorithm benchmarking in autonomous racing scenarios, this study also introduces and describes the machinery of our return-conditioned decision tree-based policy, comparing its performance with methods that employ fully connected neural networks, Transformers, and Diffusion Policies and highlighting some insights into method selection for training autonomous agents in driving interactions.
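
For context, a waypoint-based demonstration controller of the kind mentioned in the abstract could look roughly like the sketch below. A pure-pursuit steering law and illustrative parameters are assumed here; the paper's actual controller, gains, and F1tenth simulator interface are not reproduced.

import numpy as np

WHEELBASE = 0.33   # F1tenth-scale wheelbase in meters (illustrative value)
LOOKAHEAD = 1.0    # lookahead distance in meters (illustrative value)

def pure_pursuit_action(pose, waypoints, speed=2.0):
    """pose = (x, y, yaw); waypoints = (M, 2) array of centerline points ordered along the track."""
    x, y, yaw = pose
    # Pick the first waypoint at least LOOKAHEAD away from the car.
    dists = np.linalg.norm(waypoints - np.array([x, y]), axis=1)
    ahead = np.where(dists >= LOOKAHEAD)[0]
    target = waypoints[ahead[0]] if len(ahead) else waypoints[-1]
    # Angle to the target in the vehicle frame, then the pure-pursuit steering law.
    alpha = np.arctan2(target[1] - y, target[0] - x) - yaw
    steering = np.arctan2(2.0 * WHEELBASE * np.sin(alpha), LOOKAHEAD)
    return np.array([steering, speed])

# Rolling such a controller in the racing simulator would produce the (observation, action,
# reward) tuples of an offline dataset; here we only sanity-check one call on a straight centerline.
centerline = np.stack([np.linspace(0.0, 10.0, 50), np.zeros(50)], axis=1)
print(pure_pursuit_action((0.0, -0.5, 0.0), centerline))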

 

Ongoing Research

"Collision Avoidance in Multiagent Navigation with Deep Reinforcement Learning."

Status: Work in progress.