Reinforcement learning

General data

Course ID:	1000-2M20UZW
Erasmus code / ISCED:	11.3 / (0612) Database and network design and administration
Course title:	Reinforcement learning
Name in Polish:	Uczenie ze wzmocnieniem (wspólnie z 1000-318bRL)
Organizational unit:	Faculty of Mathematics, Informatics, and Mechanics
Course groups:	Elective courses for second-cycle studies in Bioinformatics; Elective courses for Computer Science and Machine Learning
ECTS credit allocation (and other scores):	(not available)
Language:	English
Type of course:	elective monographs
Short description:	The classes present contemporary techniques and algorithms of reinforcement learning.
Full description:
1. Model-free methods
   a) Reinforcement Learning formalism: Markov Decision Processes (MDPs) & Dynamic programming (DP)
   b) Value methods
      * SARSA and TD(1)
      * Bias-variance trade-off and TD(lambda)
      * Function approximators and corresponding challenges
   c) Policy gradient methods
      * Vanilla policy gradients
      * Generalized Advantage Estimator (GAE)
      * Problems with policy gradient methods
   d) Actor-critic methods
      * Trust Region Policy Optimization (TRPO)
      * Proximal Policy Optimization (PPO)
      * Soft Actor-Critic (SAC)
2. Model-based methods
   a) Model estimation
   b) Planning
      * Continuous and discrete control problems
      * Monte-Carlo Tree Search
      * AlphaZero
3. Exploration
   a) Multi-armed bandits model
   b) Uncertainty-related exploration strategies
4. Research topics
5. Talks by practitioners
Bibliography:
* R. Sutton, G. Barto, Reinforcement Learning: An Introduction.
* V. François-Lavet, P. Henderson, R. Islam, M. G. Bellemare, J. Pineau, An Introduction to Deep Reinforcement Learning.
* C. Szepesvári, Algorithms for Reinforcement Learning.
Learning outcomes:
Knowledge
* Knows the mathematical formalism of reinforcement learning, which makes it possible to develop efficient RL algorithms and analyse existing ones.
* Understands the basic components of RL algorithms and how they interact.
* Knows when to apply and how to implement the most important RL algorithms from the policy gradient, value-based, and actor-critic classes.
* Has a basic knowledge of popular RL libraries.
Skills
* Can develop efficient algorithms and test them.
* Can distinguish types of RL problems and estimate their difficulty.
* Can appropriately apply methods to develop an algorithm, or apply already known methods in own research projects.
* Can implement own algorithms and use existing RL libraries.
* Can test implemented and developed algorithms.
* Can find and use the information contained in research papers.
Competences
* Knows the limits of own RL knowledge and realizes the need for continuous learning.
* Understands the need for systematic work and meeting deadlines.
* Understands and appreciates the importance of intellectual honesty in the use of someone else's software. Behaves ethically during the implementation of algorithmic projects.
* Is able to independently find and use various types of information about algorithms, including in foreign languages.
Assessment methods and assessment criteria:	Attendance and project.

This course is not currently offered.

Course descriptions are protected by copyright.
Copyright by University of Warsaw.