Project Details
Description
Existing reinforcement learning (RL) approaches usually assume that a learned policy will be deployed in the same environment in which it was trained. This assumption is often violated in practice due to, e.g., adversarial perturbations, modeling errors between simulators and real-world applications, non-stationary environments, and limited amounts of training data. The discrepancy between the training and test environments gives rise to a model mismatch, which leads to a notable decline in performance and restricts the suitability of RL in crucial domains, e.g., healthcare, critical infrastructure, transportation systems, and smart cities. To address this challenge, there have been noteworthy efforts to develop distributionally robust RL approaches. This CAREER project aims to advance the fundamental algorithmic and theoretical limits of distributionally robust RL. The research outcomes of this project hold the promise to push the algorithmic and theoretical boundaries of robust RL, and will deliver provably convergent, efficient, and minimax optimal robust RL algorithms. The project will have a significant impact on the theory and practice of sequential decision making in various domains, e.g., special education, intelligent transportation systems, wireless communication networks, power systems, and drone networks. The activities in this project will provide concrete principles and design guidelines to achieve robustness in the face of model uncertainty. The integration of research work into education and outreach will target K-12 educators and graduate, undergraduate, and underrepresented students through (i) an Artificial Intelligence (AI) summer camp for K-12 educators; (ii) a Buffalo Day workshop; (iii) curriculum development; and (iv) student supervision.
The research efforts are organized around three complementary thrusts: (i) Thrust A focuses on developing theoretical and algorithmic foundations for distributionally robust RL under the long-term average-reward criterion; (ii) Thrust B focuses on developing a unified framework of distributional robustness for learning (robust) policies from offline datasets, without active data acquisition or exploration, and on uncovering the fundamental limits of such learning; and (iii) Thrust C focuses on constructive approaches to, and fundamental limits of, robust RL under constraints, i.e., optimizing reward while simultaneously guaranteeing constraint satisfaction under model uncertainty. This project will develop a fundamental understanding of robust RL, minimax optimal robust RL algorithms, and novel convergence and complexity analyses. The research outcomes will significantly improve the robustness of RL algorithms and will be of interest to a broad range of communities, e.g., machine learning, statistics, information theory, networking, communication, power, and education. The proposed work will also foster new interdisciplinary research directions across these communities.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
| Status | Active |
|---|---|
| Effective start/end date | 09/01/24 → 08/31/28 |
Funding
- National Science Foundation: $417,239.00