Office hours Wednesday 1-2pm
- ECSS 3.228
With very large amounts of data and heavy processing needs, a single machine cannot provide enough capacity
- Instead, we chain dozens of machines together into a cluster, where each machine has its own CPU, disks, and RAM
Three exams: October, November, December. Not comprehensive
40% exams, 55% from assignments, 5% from quizzes

Topics

Hadoop
- Key value pairs
- MapReduce
- Good for fixed-size (batch) data, not streaming data
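
To make the key-value and MapReduce idea concrete, here is a minimal single-machine sketch of a word count in plain Python. It is not Hadoop code: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample lines are made up for illustration, and a real cluster would run the map and reduce steps on different machines.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) key-value pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all values by key, the step a framework performs between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single (key, total) pair."""
    return key, sum(values)


def word_count(lines):
    # Map: every line independently produces key-value pairs.
    pairs = [pair for line in lines for pair in map_phase(line)]
    # Shuffle + reduce: one reduce call per distinct key.
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical input: a fixed (non-stream) set of lines.
lines = ["big data big clusters", "data on clusters"]
counts = word_count(lines)
print(counts)
```

Because each line is mapped independently and each key is reduced independently, both steps could in principle be spread across the machines of a cluster.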
Spark
- Developed at UC Berkeley and now more widely used
- can handle stream and non-stream data.
Unsupervised Learning
Clustering
Recommendation Systems
- Content based recommendation
- Collaborative Filtering
- Latent Factor Model
Large Language Models, if time in the semester permits
Types of Data
- Structured
  - csv, sql database
- Unstructured
  - newspaper text
- Semi-Structured
  - html / xml where things are placed in blocks
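
The difference between the three data types shows up in how much a program can rely on the layout. The sketch below uses only the Python standard library; the sample strings and field names (`name`, `age`) are invented for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured: every row has the same columns, so fields can be read by name.
structured = "name,age\nAda,36\nLinus,54\n"
rows = list(csv.DictReader(io.StringIO(structured)))

# Semi-structured: content sits in tagged blocks, but a block may omit a field.
semi = (
    "<people>"
    "<person><name>Ada</name><age>36</age></person>"
    "<person><name>Linus</name></person>"
    "</people>"
)
root = ET.fromstring(semi)
names = [p.findtext("name") for p in root.iter("person")]
ages = [p.findtext("age") for p in root.iter("person")]  # None where <age> is missing

# Unstructured: free text with no schema; splitting into words is only a first step.
unstructured = "Ada, 36, spoke at the conference on Tuesday."
words = unstructured.split()

print(rows[0]["age"], names, ages, len(words))
```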
