Units
The lectures are broken into 10 units, as shown below. These pages are also reachable from the calendar.
- Unit 1: Welcome back (Classes 1–4)
  Course overview, Data science pipeline, Python review
- Unit 2: OOP in Python (Classes 5–10)
  Operator overloading, Inheritance
- Unit 3: Command line (Classes 11–14)
  Files and directories, bash commands, Piping and redirection
- Unit 4: Regular expressions (Classes 15–17)
  Regex syntax, Python re, Command-line tools
- Unit 5: Error handling (Classes 18–19)
  try/except, return codes, bash if statement
- Unit 6: Data cleaning (Classes 20–21)
  Missing data, Outliers, Preprocessing, Merging dataframes
- Unit 7: Concurrency (Classes 22–26)
  Multithreading, Python GIL, Multiprocessing, bash job control
- Unit 8: Versions and packaging (Classes 27–30)
  git, github, pip, pixi
- Unit 9: Hardware and OS (Classes 31–33)
  CPU, Memory hierarchy, Filesystems, Compiled vs Interpreted code, Role of the operating system
- Unit 10: Data Ethics (Classes 34–36)
  Principles, Case studies