Reinforcement learning

General data

Course ID: 1000-2M20UZW
Erasmus code / ISCED: 11.3 / (0612) Database and network design and administration
Course title: Reinforcement learning
Name in Polish: Uczenie ze wzmocnieniem (wspólnie z 1000-318bRL)
Organizational unit: Faculty of Mathematics, Informatics, and Mechanics
Course groups: Elective courses for second-cycle studies in Bioinformatics
Elective courses for Computer Science and Machine Learning
ECTS credit allocation (and other scores): (not available)
Language: English
Type of course:

elective monographs

Short description:

The course presents contemporary reinforcement learning techniques and algorithms.

Full description:

1. Model-free methods

a) Reinforcement Learning formalism: Markov Decision Processes (MDPs) & Dynamic programming (DP)

b) Value methods

* SARSA and TD(1)

* Bias-variance trade-off and TD(lambda)

* Function approximators and corresponding challenges

c) Policy gradient methods

* Vanilla policy gradients

* Generalized Advantage Estimator (GAE)

* Problems with policy gradient methods

d) Actor-critic methods

* Trust Region Policy Optimization (TRPO)

* Proximal Policy Optimization (PPO)

* Soft Actor-Critic (SAC)

2. Model-based methods

a) Model estimation

b) Planning

* Continuous and discrete control problems

* Monte Carlo Tree Search

* AlphaZero

3. Exploration

a) Multi-armed bandit model

b) Uncertainty-related exploration strategies

4. Research topics

5. Talks by practitioners

Bibliography:

R. Sutton, G. Barto, Reinforcement Learning: An Introduction

V. François-Lavet, P. Henderson, R. Islam, M. G. Bellemare, J. Pineau, An Introduction to Deep Reinforcement Learning

C. Szepesvári, Algorithms for Reinforcement Learning

Learning outcomes:

Knowledge

* Knows the mathematical formalism of reinforcement learning, which makes it possible to develop efficient RL algorithms and analyse existing ones.

* Understands the basic components of RL algorithms and how they interact.

* Knows when to apply and how to implement the most important RL algorithms from the policy gradient, value-based, and actor-critic classes.

* Has basic knowledge of popular RL libraries.

Skills

* Can develop efficient algorithms and test them.

* Can distinguish types of RL problems and estimate their difficulty.

* Can select appropriate methods to develop a new algorithm, or apply known methods in their own research projects.

* Can implement their own algorithms and use existing RL libraries.

* Can test implemented and developed algorithms.

* Can find and use the information contained in research papers.

Competences

* Knows the limits of their own RL knowledge and recognizes the need for continuous learning.

* Understands the need for systematic work and meeting deadlines.

* Understands and appreciates the importance of intellectual honesty in the use of someone else's software. Behaves ethically during the implementation of algorithmic projects.

* Can independently find and use various types of information about algorithms, including in foreign languages.

Assessment methods and assessment criteria:

Attendance and project.

This course is not currently offered.
Course descriptions are protected by copyright.
Copyright by University of Warsaw.
ul. Banacha 2
02-097 Warszawa
tel: +48 22 55 44 214 https://www.mimuw.edu.pl/