Resources
Learning Outcomes
In this project-based course, we have project and conceptual learning objectives.
Project Learning Objectives
The project learning objectives (LOs) are designated below. Students will be able to:
Item | Description |
---|---|
Computer Systems and Data Structures | - Implement a variety of basic data structures and algorithms in pure Python - Consider the differences between different data structures, and decide on the best one given performance limitations. - Optimize Python code to hit performance benchmarks. |
Problem Representation | - Read data and perform basic table operations in Pandas. - Use NumPy operations and sparse matrices to perform efficient computations on large datasets. - Acquire a basic understanding of recommender systems in general and collaborative filtering in particular. |
Domain Analysis and Exploration | - Formulate functional and non-functional requirements for an envisioned data-driven solution to a business/research problem. |
Domain Data Preparation | - Use HTTP requests, web scrapers, and pdfminer to retrieve data from a variety of sources, leveraging structured, semi-structured, and unstructured data to build holistic views of user experience and deliver targeted analytic solutions. - Perform data cleaning and preprocessing using appropriate APIs to organize the data. |
Machine Learning and Model Performance | - Build and deploy a machine learning model using the appropriate analytic algorithms (such as linear regression, logistic regression, and SVM) to gain an understanding from data, make predictions to solve business problems, and inform decision making. - Experiment with different corpus models to perform multi-class classification on datasets. - Interpret domain problems as instances of data science task patterns, including classification, regression, ranking, and clustering. |
Model Deployment and Comparison | - Compare the performance (of training or deployment) for a subset of solutions on CPUs vs. GPUs. - Use model evaluation metrics to assess the goodness of fit between a model and data, and cross-validation frameworks to evaluate predictive models. - Select appropriate visualization techniques to facilitate understanding of model performance and support error analysis. - Gain familiarity with machine learning on Microsoft Azure. |
Optimization of Model Performance | - Develop a QA system using the SQuAD dataset. - Use various techniques to develop closed-domain QA systems ranging from unsupervised learning methods of Jaccard overlap, tf-idf vectors, and leveraging the syntactic information in the sentences through abstract syntax trees to supervised learning methods, including simple linear models like logistic regression and state of the art language models like BERT. - Gain familiarity with machine learning on Microsoft Azure. |
Conceptual Learning Objectives
The conceptual learning objectives are below. Students will be able to:
- Describe the phases of the data science project lifecycle and articulate the interaction between them to inform the development of analytic solutions.
- Examine and describe the formal characteristics of the data collection process. Apply the fundamental concepts in the data collection process to a dataset for analysis by machine learning methods.
- Compare and contrast different techniques of representing the domain as feature vectors; explore how different featurizations of a dataset influence trained model performance and task utility.
- Explain different analysis techniques using the same linear/logistic regression model at its core (“gatekeeper” task) and interpret domain problems as instances of data science task patterns, including classification, regression, ranking, and clustering.
- Explain basic principles of how the bias-variance tradeoff affects model choice and configuration, and then describe how information available for domain problems can become a dataset on which models can be trained.
- Identify ways to improve the representation, model, and/or experiment towards the analytic objective and explain quantitative evaluation metrics for various data science task patterns.
Getting help
Piazza
The best communication portal for coursework-related questions is Piazza (see the Overview for the Piazza link). For urgent communication with the teaching staff, post on Piazza first and then send an email to get a timely response.
Office Hours (OH)
The teaching staff holds office hours weekly to assist students with any course-related matters. Students can find the office hours schedule in the Google Calendar provided in the Overview. Before joining the Zoom meeting rooms, students must join the OH Queue, as only those on the queue list will be invited to the meetings. Students attending office hours should join the OH queue, regardless of the queue’s current status. Doing so allows TAs to better prepare for students’ questions, helps maintain an orderly queue, and ensures students can track their place in line without concern.
Designated TA
Students will be assigned a designated TA who will be their primary contact for course-related matters. It is important to note that students are not limited to attending only their designated TA’s office hours; they are encouraged to interact with all course staff as needed.
Assessment
Canvas
Practice Quiz
Practice quizzes are ungraded retrieval practice that assesses your understanding of the reading materials in the pre-class work session. Completing the Learn by Doing activities gives you a good signal of how prepared you are for the weekly summative (graded) in-class quizzes.
We strongly recommend that you complete the Learn by Doing activities before the synchronous in-class quiz. If you miss many of the questions in the quiz, we suggest that you review the material again.
Weekly In-Class Quiz
Each week, during the synchronous class meeting, students will spend 10 minutes completing a graded quiz. The quiz assesses the Canvas reading content from the preceding week. When it is time to take the quiz, students log into Canvas to start. A timer will activate once the quiz starts and cannot be paused. The quiz will automatically stop when the time allowance expires. The student’s work will automatically be submitted and graded by the Canvas system.
Students will have only a single attempt to complete the weekly quiz on Canvas.
Sail()
This course includes seven individual project themes. Each project theme consists of several project modules. Each project module must be completed by its deadline on the Sail() Platform. The write-up required to complete each project module is available on the Sail() Platform. Each module has its own submission process. It is the student’s responsibility to ensure all project work is completed and the project module is submitted before the deadline. Students typically have multiple attempts to submit a project module on the Sail() Platform.
Online Programming Exercise (OPE)
On the Sail() Platform, you will also see two OPE (Online Programming Exercise) activities: a practice OPE and an OPE after Project 3. These are synchronous team-based exercises to expand your knowledge on the topics of the preceding projects. Each OPE is associated with a project and accounts for 5% of that project’s grade. The first OPE, “OPE: Practice,” is ungraded.
Manual-graded Components
In Project 3 and the final exam, some components are manually graded questions. Students must submit a PDF of their work on Canvas for manual grading.
Final Exam
The final exam consists of three parts: the coding component, the write-up, and the conceptual quiz. Each of these is designed to assess different skills and knowledge acquired throughout the course.
Coding Component on Sail() (30 points):
- Similar to the format of the seven prior projects, students will work on the coding component using Sail().
- There is no limit on the number of submissions, but the final grade for this component is based on the most recent submission, not the highest score.
- The autograder on Sail() will not run your notebook code; it will only grade the model deployment part by sending POST requests to your endpoint, with a runtime limit of 70 seconds.
- Students can use Azure ML Studio exclusively for model deployment, while other steps can be completed in any chosen environment.
- After submission, students can check their model's accuracy and their score on the "My Submissions" tab in Sail().
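Because the autograder only sends POST requests to the deployed endpoint, it can help to time a request yourself before submitting. The following is a minimal sketch using only the Python standard library. The endpoint URL and the JSON payload shape are hypothetical placeholders (the real input format depends on your own scoring script), and the 70-second value is simply the runtime limit stated above; this is not the autograder's actual code.

```python
import json
import time
import urllib.request

# Runtime limit stated in the final exam description above.
RUNTIME_LIMIT_SECONDS = 70


def build_request(url, payload):
    """Build a JSON POST request for a deployed scoring endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def timed_post(url, payload, timeout=RUNTIME_LIMIT_SECONDS,
               opener=urllib.request.urlopen):
    """Send the POST and return (status, response_text, elapsed_seconds).

    `opener` is injectable so the function can be exercised without a
    live endpoint; by default it is urllib.request.urlopen.
    """
    request = build_request(url, payload)
    start = time.monotonic()
    with opener(request, timeout=timeout) as response:
        text = response.read().decode("utf-8")
        status = response.status
    elapsed = time.monotonic() - start
    return status, text, elapsed
```

For example, `timed_post("https://<your-endpoint>/score", {"data": [[0.1, 0.2]]})` (both arguments are placeholders) returns the HTTP status, the response body, and the elapsed time, which you can compare against `RUNTIME_LIMIT_SECONDS`. If your endpoint requires an authorization key, you would need to add the corresponding header in `build_request`.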
Write-up Component on Canvas (40 points):
- A set of write-up questions is included at the end of the coding component notebook.
- Students should type their responses directly in the notebook and submit a PDF of their work to Canvas for manual grading.
- Submissions can be made any number of times before the final exam deadline on Canvas, with the final grade based on the latest submission.
- A grace period of 12 hours is available on Canvas, but submissions in this period will incur a 20% grade penalty. Submissions after this grace period will not be accepted.
- It's important to note that the Sail() submissions will not be checked for the write-up; only submissions to Canvas will be considered for grading.
Conceptual Component of Canvas Reading Content (30 points):
- This comprehensive quiz covers all course modules and must be completed during class time as per the schedule found in the course calendar.
- The quiz comprises 15 questions with a 30-minute time limit and is independent of the coding and write-up components.
- All components of the final exam must be completed within the designated final exam period, and no extensions will be granted. This structure ensures a comprehensive evaluation of the skills and knowledge acquired in the course.
End-of-course Survey
An end-of-course survey will be distributed around the final exam week. The survey aims to gather constructive feedback from students for further improvement and development of the course. To thank students for their time, 1% of the overall course grade will be awarded for survey completion.
Accommodations
For Students with Disabilities
If you have a disability and have an accommodations letter from the Disability Resources office, I encourage you to discuss your accommodations and needs with me as early in the semester as possible. I will work with you to ensure that accommodations are provided as appropriate. If you suspect that you may have a disability and would benefit from accommodations but are not yet registered with the Office of Disability Resources, I encourage you to contact them at access@andrew.cmu.edu.
Medical Accommodations:
If you require accommodations for medical reasons, please contact Carnegie Mellon University’s Disability Resources to initiate the process. It is important to engage with Disability Resources directly for all accommodation requests, as they are equipped to assess and provide the necessary support in line with university procedures. Please refrain from sending any medical documentation directly to course instructors. As instructors, we are committed to implementing the accommodations approved by Disability Resources to ensure equitable access to our course.
Take care of yourself
Do your best to maintain a healthy lifestyle this semester by eating well, exercising, avoiding drugs and alcohol, getting enough sleep, and taking some time to relax. This will help you achieve your goals and cope with stress.
All of us benefit from support during times of struggle. You are not alone. There are many helpful resources available on campus and an important part of the college experience is learning how to ask for help. Asking for support sooner rather than later is often helpful.
If you or anyone you know experiences any academic stress, difficult life events, or feelings like anxiety or depression, we strongly encourage you to seek support. Counseling and Psychological Services (CaPS) is here to help: call 412-268-2922 and visit their website at http://www.cmu.edu/counseling/. Consider reaching out to a friend, faculty, or family member you trust for help getting connected to the support that can help.
If you or someone you know is feeling suicidal or in danger of self-harm, call someone immediately, day or night:
CaPS: 412-268-2922
Resolve Crisis Network: 888-796-8226
If the situation is life-threatening, call the police:
- On-campus: CMU Police: 412-268-2323
- Off-campus: 911
Please let us know if you have questions about this or your coursework.
Office Hours
Each TA holds weekly office hours (OH) virtually via Zoom. The office hours schedule can be found in the Google Calendar. All students who come to OH should join the OH queue at https://www.eberly.cmu.edu/ohq/#/courses. Please do so even if the Zoom waiting room is empty or no one is in the OH queue. Joining the queue helps in the following ways:
- Because you state your question when joining the OH queue, the TA can better prepare for it.
- It helps the TA keep track of the order of students who join the OH queue and the Zoom waiting room.
- It helps students track their place in line without worrying that the TA is not being attentive to them.