11-637: Foundations of Computational Data Science
Fall 2022
11-637 Foundations of Computational Data Science (FCDS) is a fully online course offered by the Master of Computational Data Science (MCDS) program in the School of Computer Science at Carnegie Mellon University (CMU). The course is offered every semester (Spring, Summer, and Fall) and is open to MCDS and non-MCDS students from all CMU programs and campuses, as well as to non-CMU students.
Course Description
This course introduces foundational concepts, learning materials, and projects covering the three core areas of Data Science: Computing Systems, Analytics, and Human-Centered Data Science. Students completing this class will be prepared for further graduate education in Data Science and/or Artificial Intelligence. Students acquire skills in solution design (e.g., architecture, framework APIs, cloud computing), analytic algorithms (e.g., classification, clustering, ranking, prediction), interactive analysis (e.g., Jupyter Notebooks), applications to data science domains (e.g., Natural Language Processing, Computer Vision), and visualization techniques for data analysis, solution optimization, and performance measurement on real-world tasks.
Course Goals
This course will equip students with foundational knowledge of computational data science. Students will learn the Data Science Process by completing projects that cover problem identification, data gathering, exploratory data analysis, supervised and unsupervised learning techniques, model evaluation, and the visualization and interpretation of results to inform decision-making. Our goal is for students to develop the skills needed to work as practitioners or to carry out research projects in computational data science. Specifically, students are exposed to real-world data and scenarios to learn how to:
- Understand the Data Science Process: From problem identification to ethical data solutions, students will learn to define analytic requirements and frame appropriate questions to guide the solution design process.
- Design and Implement Data Solutions: Develop a robust data-gathering plan that incorporates data governance principles, ensures data integrity, and safeguards data security.
- Analyze and Interpret Data: Apply statistical inference, hypothesis testing, and various data analysis techniques to identify trends, patterns, and outliers in large datasets.
- Develop Data Models and Algorithms: Build and evaluate predictive models using supervised and unsupervised learning techniques, understanding the bias-variance tradeoff and conceptual complexity in model building.
- Process and Analyze Text Data: Gain proficiency in Natural Language Processing (NLP) by exploring language representation and language modeling, and by implementing NLP tasks and applications.
- Implement and Evaluate Advanced Models: Learn to choose appropriate models, interpret their results, and deploy them effectively, with attention to the trade-offs involved in model selection and interpretability.
- Harness Deep Learning Techniques: Understand the differences between CPU and GPU computation and apply deep learning methods in areas such as computer vision.
- Explore Advanced NLP Techniques: Delve into advanced NLP methods such as transformers and BERT, gaining insights from cutting-edge research in the field.
Through this process, we aim for our students to become independent and resilient problem solvers who can overcome challenges and continue learning on their own.