Course Number:
CSCI 5040
Approved Starting Semester:
Fall 2022
Course Title:
Big Data Modeling and Management
Course Description (Bulletin Description):
Using examples of real-world big data problems, this course introduces the platforms and technologies used for scalable big data analysis, including the features and value of core architectural components, resource and job management systems, file systems, and programming models.
Prerequisite:
CSCI 5010 or CSCI 5015
Co-requisite:
None
Pre/Co-requisite:
None
Dual-Listed:
None
Course Objectives (Course-level Student Learning Outcomes):
1. Learn the three V's of Big Data: velocity, variety, and volume
2. Be able to perform basic aggregations on large data sets
3. Be able to work with data in a variety of formats (CSV, JSON, SQL, Text)
4. Be able to identify which join is appropriate when joining data sets
5. Learn SQL for processing large data sets
6. Perform parallel in-core processing on small data
7. Perform parallel out-of-core processing on Big Data
8. Perform stream processing
Topics Covered (In Outline/Calendar):
• Introductions. What is Big Data? What is Spark?
• How is data organized for parallel out-of-core processing?
• What qualifies as Big Data?
• Working with data types.
• Aggregations and Joins.
• Data Sources.
• SQL in Big Data.
• Low-level Distributed Programming.
• How to run a cluster.
• Developing Spark Applications.
• Classification and Machine Learning.
• Regression Analysis.
Student Learning Outcomes:
Not applicable for this course
Course Coordinator:
Dr. Kriti Chauhan
Instructor-in-charge:
Dr. Kriti Chauhan
Previous Professors:
Dr. James Church, Dr. Kriti Chauhan
Technologies / Skills:
Data modeling
Textbook(s):
Fall 2025
Title: Learning Spark: Lightning-Fast Data Analytics
Edition: 2
Author: Jules Damji, Brooke Wenig, Tathagata Das, Denny Lee
Publisher: O'Reilly Media
ISBN: 9781492050049