Week 10: In-Class Exercise — Designing Star Schemas¶
Time: ~50 minutes in-class + team homework
Format: Individual (Parts A-B), then team (Part C)
References: Kimball Cheat Sheet, Lecture Notes, Chapter 9 Slides
Part A: Warm-Up — Identify the Components (10 min)¶
For each scenario, identify the fact table grain, dimensions, and measures. Don't draw a schema yet — just list them.
Scenario 1: University Registrar¶
A university wants to analyze student enrollment patterns. The OLTP system tracks students registering for course sections each semester.
| Component | Your Answer |
|---|---|
| Business process | |
| Grain (one row per ___) | |
| Dimensions (who/what/when/where) | |
| Measures (what numbers to aggregate) |
Scenario 2: Online Streaming Service¶
A streaming platform (think Netflix) wants to analyze viewing patterns. Every time a user watches content, the system logs the event.
| Component | Your Answer |
|---|---|
| Business process | |
| Grain | |
| Dimensions | |
| Measures |
Scenario 3: Hospital Emergency Room¶
A hospital wants to analyze ER visit patterns to optimize staffing and reduce wait times.
| Component | Your Answer |
|---|---|
| Business process | |
| Grain | |
| Dimensions | |
| Measures |
💡 Click for suggested answers
**Scenario 1 — University:** - Process: Student enrollment - Grain: One row per student per course section per semester - Dimensions: dim_student (name, major, standing), dim_course (title, dept, credits), dim_instructor (name, title, dept), dim_date/period (semester, year) - Measures: grade_value, credit_hours, enrollment_count (1 per row) **Scenario 2 — Streaming:** - Process: Content viewing - Grain: One row per user per content item per viewing session - Dimensions: dim_user (age_group, subscription_tier, country), dim_content (title, genre, rating, duration), dim_date (date, day_of_week, month), dim_device (type, OS, screen_size) - Measures: watch_duration_min, completion_pct, pause_count **Scenario 3 — ER:** - Process: ER patient visits - Grain: One row per ER visit - Dimensions: dim_patient (age_group, gender, insurance), dim_provider (name, specialty), dim_date (date, hour, shift), dim_diagnosis (category, severity), dim_department - Measures: wait_time_min, treatment_duration_min, total_charges, num_proceduresPart B: Millennium College Star Schema (20 min)¶
(Adapted from Chapter 9 textbook exercises)
Millennium College wants to analyze course grades over time. There are four dimension tables:
| Dimension | Attributes | Size |
|---|---|---|
| CourseSection | CourseID, SectionNumber, CourseName, Units, RoomID, RoomCapacity | ~500 sections/semester |
| Professor | ProfID, ProfName, Title, DepartmentID, DepartmentName | ~200 professors |
| Student | StudentID, StudentName, Major | ~40 students/section, 5 courses/student |
| Period | SemesterID, Year | 30 periods (10 years) |
The facts to record: Course Grade and CompletionDate.
Questions¶
Q1. Draw the star schema. Show the fact table in the center with FKs to each dimension, and list the measures.
Q2. Estimate the number of rows in the fact table.
Hint: 500 sections × 40 students/section × 30 periods = ?
Q3. Estimate the total size of the fact table in bytes, assuming each field averages 15 bytes.
Hint: rows × number of fields per row × 15 bytes
Q4. If you didn't have to use a strict star schema, how would you change the design? Why?
Think about: Does Professor need to connect directly to the fact table? Could it connect through CourseSection instead? What about normalizing Course info out of CourseSection?
Q5. Various characteristics of sections, professors, and students change over time (e.g., a professor changes departments, a room assignment changes). How would you handle this?
Think about: Which SCD type makes sense here?
💡 Click for answers
**Q2:** 500 × 40 × 30 = **600,000 rows** **Q3:** 600,000 × 6 fields (4 FKs + 2 facts) × 15 bytes = **54,000,000 bytes (~54 MB)** *(Or if counting 5 fields: 600,000 × 5 × 15 = 45,000,000 bytes)* **Q4:** Possible changes: - **Snowflake Professor from CourseSection** — professors teach sections, so Professor could connect through CourseSection instead of directly to the fact table. This removes ProfID FK from the fact table. - **Normalize Course from Section** — separate CourseName and Units into a dim_course table, linked from CourseSection. - **Normalize Department from Professor** — DepartmentID and DepartmentName could be a separate table. **Q5:** Use **dimensional segmentation** (or SCD Type 2). Split each dimension into stable attributes and changing attributes. For example: - Student: StudentID and StudentName are stable; Major can change → SCD Type 2 with effective dates - Professor: ProfName is stable; Department and Title can change → SCD Type 2 - CourseSection: RoomID can change → SCD Type 2 This prevents excessive row growth in dimension tables when only some attributes change frequently.Part C: Design Your Project's Star Schema (20 min + homework)¶
Now apply the 4-step process to your team's project domain (the one from Phase 1).
Step 1: Select the Business Process¶
What is the central measurable activity in your domain?
Your answer: ___
Step 2: Declare the Grain¶
What does one row in your fact table represent?
"One row per ___"
Step 3: Identify the Dimensions¶
| Dimension | Key Attributes | Source OLTP Table(s) |
|---|---|---|
| dim_date | date_key, full_date, day_of_week, month, quarter, year, is_weekend | (pre-populated) |
Step 4: Identify the Facts/Measures¶
| Measure | Additive? | Example Analytical Question |
|---|---|---|
Draw Your Star Schema¶
┌──────────────────┐
│ dim_????? │
│──────────────────│
│ │
└────────┬─────────┘
│
┌──────────────────┐ ┌──────────┴───────────────┐ ┌──────────────────┐
│ dim_????? │ │ fact_????? │ │ dim_????? │
│──────────────────│ │──────────────────────────│ │──────────────────│
│ ├──┤ FKs: ├──┤ │
│ │ │ │ │ │
└──────────────────┘ │ Measures: │ └──────────────────┘
│ │
└──────────┬───────────────┘
│
┌────────┴─────────┐
│ dim_????? │
│──────────────────│
│ │
└──────────────────┘
Estimate Your Fact Table Size¶
- Unique values per dimension: ___ × ___ × ___ × ___ = ___
- Fill rate estimate: ___%
- Estimated rows: ___
- Estimated size: ___ rows × ___ columns × 15 bytes = ___ bytes
Write 3 Analytical Questions¶
Your star schema should be able to answer questions like these:
- "___?"
- "___?"
- "___?"
Homework (Due Next Week)¶
- Finalize your star schema with your team
- Write the CREATE TABLE DDL for your fact + dimension tables
- Read: Kimball Ch 2 (pp. 37-68) — Techniques Overview, and Ch 3 (pp. 69-100) — Retail Sales case study
- Think about: What 5 analytical questions would a manager in your domain want answered? (You'll need 5+ analytical queries for Phase 2)
- Download and review: Kimball Cheat Sheet
IS480 — Advanced Database Management | Week 10 Exercise | CSULB Spring 2026 | Prof. Jose Pineda