Large-scale machine learning
General data
Course ID: | 1000-319bBML |
Erasmus code / ISCED: | 11.3 |
Course title: | Large-scale machine learning |
Name in Polish: | Uczenie maszynowe w dużej skali |
Organizational unit: | Faculty of Mathematics, Informatics, and Mechanics |
Course groups: |
Elective courses for Master's studies in Computer Science, specialization: Information Systems; Elective courses for second-cycle studies in Bioinformatics; Elective courses for Computer Science and Machine Learning; Obligatory courses for 2nd year Machine Learning |
ECTS credit allocation (and other scores): | 6.00 |
Language: | English |
Type of course: | elective monographs |
Requirements: | Deep neural networks 1000-317bDNN |
Prerequisites (description): | parallel programming, computer networks, algorithms and data structures |
Short description: |
The goal of this course is to build the theoretical foundation and practical skills necessary to use machine learning algorithms and techniques at a large scale. We will discuss the architecture of modern large-scale computing infrastructure (cloud datacenters, and AI and HPC supercomputers). We will present methods for distributing computations across these clusters and the fundamental algorithmic models used to estimate performance. Using examples of typical ML algorithms (decision trees, neural network training), we will demonstrate the theoretical and practical challenges of running them at the scale of a few to several hundred machines. Next, we will cover the challenges of training and using large language models (LLMs). The course will conclude by presenting the primary problems of using ML models in large-scale production environments. |
Full description: |
- Hardware: from a GPU to a datacenter, and why the architecture matters at scale
- Parallel and distributed optimization: how to parallelize algorithms and how to reason about their performance
- Parallelizing classic ML algorithms
- LLMs introduction: motivation, transformers and scaling laws
- Parallelizing LLM training: parallelization types, bottlenecks, common memory optimizations
- Datasets and benchmarking LLMs
- Handling data: an introduction to data engineering
- ML in production: risks, rewards, common problems
- Case study: ML in computational infrastructure |
Bibliography: |
- Scientific papers used during lectures
- “The Datacenter as a Computer: Designing Warehouse-Scale Machines”, Luiz André Barroso, Jimmy Clidaras, and Urs Hölzle
- “Fundamentals of Data Engineering”, Joe Reis and Matt Housley |
Learning outcomes: |
Knowledge: the student knows and understands
- large-scale data processing techniques used in the context of machine learning [K_W04]
- methods for distributing and parallelizing computations [K_W06]
Skills: the student is able to
- use modern systems for distributing and parallelizing computations [K_U20]
- process large datasets [K_U21]
Social competences: the student is ready to
- critically assess their own knowledge and the content they receive [K_K01]
- recognize the importance of knowledge in solving cognitive and practical problems, and seek expert opinion when a problem is difficult to solve independently [K_K02] |
Assessment methods and assessment criteria: |
The final score is based on programming assignments, points for participation in laboratories, and a written exam. |
Classes in period "Winter semester 2024/25" (past)
Time span: | 2024-10-01 - 2025-01-26 |
Type of class: | Lab, 30 hours; Lecture, 30 hours |
Coordinators: | Marek Cygan, Krzysztof Rządca | |
Group instructors: | Marek Cygan, Tomasz Kanas, Jakub Krajewski, Michał Krutul, Adrian Naruszko, Krzysztof Rządca | |
Credit: | Examination |
Classes in period "Winter semester 2025/26" (future)
Time span: | 2025-10-01 - 2026-01-25 |
Type of class: | Lab, 30 hours; Lecture, 30 hours |
Coordinators: | Krzysztof Rządca | |
Group instructors: | Jakub Krajewski, Michał Krutul, Adrian Naruszko, Krzysztof Rządca | |
Credit: | Course - Examination; Lecture - Examination |
Copyright by University of Warsaw.