Comparison of Meta-Learning Paradigms for Few-Shot Preference-Based Reinforcement Learning
Implemented and refined meta-learning algorithms (MAML, iterated MAML, and Reptile) for few-shot preference-based reinforcement learning, with the goal of using human feedback more efficiently on Meta-World tasks. Developed a generalized reward function that adapts to new tasks with few human preference queries, reducing training time by ~90%. The codebase for this project is hosted on GitHub.
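The summary does not describe the reward model or the meta-update in detail, so the following is only a minimal, illustrative sketch of one of the listed paradigms (Reptile) applied to preference-based reward learning. Several assumptions are not from the project itself. It assumes a linear reward over hypothetical per-step features and a Bradley-Terry preference model, where the probability that segment 1 is preferred is the sigmoid of the difference in summed rewards. Synthetic tasks stand in for Meta-World. The function names `bt_loss_and_grad`, `adapt`, `reptile`, and `make_task` are hypothetical.

```python
import numpy as np


def bt_loss_and_grad(w, phi0, phi1, y):
    """Bradley-Terry preference loss for a (hypothetical) linear reward r(s) = w @ f(s).

    phi0, phi1: (N, d) per-segment sums of step features for each query pair.
    y: (N,) labels, 1.0 if segment 1 was preferred, else 0.0.
    """
    logits = (phi1 - phi0) @ w                 # R(sigma1) - R(sigma0)
    # Binary cross-entropy with logits, written stably via logaddexp.
    loss = np.mean(np.logaddexp(0.0, logits) - y * logits)
    p = 0.5 * (1.0 + np.tanh(0.5 * logits))    # overflow-free sigmoid
    grad = (phi1 - phi0).T @ (p - y) / len(y)  # d(loss)/dw
    return loss, grad


def adapt(w, task, steps=5, lr=0.1):
    """Inner loop: a few gradient steps on one task's preference labels."""
    w = w.copy()
    for _ in range(steps):
        _, g = bt_loss_and_grad(w, *task)
        w -= lr * g
    return w


def reptile(tasks, d, meta_iters=200, meta_lr=0.1, rng=None):
    """Reptile outer loop: move the shared init toward each task's adapted weights."""
    rng = rng if rng is not None else np.random.default_rng()
    w = np.zeros(d)
    for _ in range(meta_iters):
        task = tasks[rng.integers(len(tasks))]
        w_task = adapt(w, task)
        w += meta_lr * (w_task - w)            # first-order meta-update
    return w


def make_task(rng, w_true, n_pairs=32, seg_len=5):
    """Synthetic stand-in for a task: random segment pairs labeled by a hidden reward."""
    d = len(w_true)
    phi0 = rng.normal(size=(n_pairs, seg_len, d)).sum(axis=1)
    phi1 = rng.normal(size=(n_pairs, seg_len, d)).sum(axis=1)
    y = ((phi1 - phi0) @ w_true > 0).astype(float)
    return phi0, phi1, y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    shared = np.array([1.0, -1.0, 0.5, 0.0])   # tasks share a common reward direction
    train_tasks = [make_task(rng, shared + 0.3 * rng.normal(size=4)) for _ in range(10)]
    w_init = reptile(train_tasks, d=4, rng=rng)
    new_task = make_task(rng, shared + 0.3 * rng.normal(size=4), n_pairs=8)
    w_new = adapt(w_init, new_task)            # few-shot: only 8 preference queries
    print("adapted loss:", bt_loss_and_grad(w_new, *new_task)[0])
```

In this sketch, Reptile only needs first-order gradients from the inner loop. MAML, by contrast, differentiates through the inner-loop updates, which is one of the trade-offs a comparison like this would measure.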