Resources

Learning Outcomes

This project-based course has both project learning objectives and conceptual learning objectives.

Project Learning Objectives

The project learning objectives (LOs) are designated below. Students will be able to:

Computer Systems and Data Structures
- Implement a variety of basic data structures and algorithms in pure Python.
- Consider the differences between data structures and decide on the best one given performance limitations.
- Optimize Python code to hit performance benchmarks.

Problem Representation
- Read data and perform basic table operations in Pandas.
- Use NumPy operations and sparse matrices to perform efficient computations on large datasets.
- Acquire a basic understanding of recommender systems in general and collaborative filtering in particular.

Domain Analysis and Exploration
- Formulate functional and non-functional requirements for an envisioned data-driven solution to a business/research problem.

Domain Data Preparation
- Use HTTP requests, web scrapers, and pdfminer to retrieve data from a variety of sources, leveraging structured, semi-structured, and unstructured data to build holistic views of user experience and deliver targeted analytic solutions.
- Perform data cleaning and preprocessing using the appropriate APIs to organize the data.

Machine Learning and Model Performance
- Build and deploy a machine learning model using the appropriate analytic algorithms (such as linear regression, logistic regression, and SVM) to gain an understanding from data, make predictions to solve business problems, and inform decision making.
- Experiment with different corpus models to perform multi-class classification on datasets.
- Interpret domain problems as instances of data science task patterns, including classification, regression, ranking, and clustering.

Model Deployment and Comparison
- Compare the performance (training or deployment) of a subset of solutions on CPUs vs. GPUs.
- Use model evaluation metrics to assess the goodness of fit between a model and data, and use cross-validation frameworks to evaluate predictive models.
- Select appropriate visualization techniques to facilitate understanding of model performance and support error analysis.
- Gain familiarity with machine learning on Microsoft Azure.

Optimization of Model Performance
- Develop a QA system using the SQuAD dataset.
- Use various techniques to develop closed-domain QA systems. These range from unsupervised learning methods (Jaccard overlap, tf-idf vectors, and leveraging the syntactic information in sentences through abstract syntax trees) to supervised learning methods, including simple linear models like logistic regression and state-of-the-art language models like BERT.
- Gain familiarity with machine learning on Microsoft Azure.

Conceptual Learning Objectives

The conceptual learning objectives are below. Students will be able to:

  1. Describe the phases of the data science project lifecycle and articulate the interaction between them to inform the development of analytic solutions.

  2. Examine and describe the formal characteristics of the data collection process. Apply the fundamental concepts in the data collection process to a dataset for analysis by machine learning methods.

  3. Compare and contrast different techniques of representing the domain as feature vectors; explore how different featurizations of a dataset influence trained model performance and task utility.

  4. Explain different analysis techniques using the same linear/logistic regression model at its core (“gatekeeper” task) and interpret domain problems as instances of data science task patterns, including classification, regression, ranking, and clustering.

  5. Explain basic principles of how the bias-variance tradeoff affects model choice and configuration, and then describe how information available for domain problems can become a dataset on which models can be trained.

  6. Identify ways to improve the representation, model, and/or experiment towards the analytic objective and explain quantitative evaluation metrics for various data science task patterns.

Getting help

Piazza

The best communication portal for coursework-related matters is Piazza (see the Overview for the Piazza link). For urgent communication with the teaching staff, post on Piazza and then send an email to get a timely response.

Office Hours (OH)

The teaching staff holds weekly office hours to assist students with any course-related matters. Students can find the office hours schedule in the Google Calendar provided in the Overview. Before joining a Zoom meeting room, students must join the OH queue, regardless of the queue’s current status, as only those on the queue list will be invited to the meetings. Doing so allows TAs to better prepare for students’ questions, helps maintain an orderly queue, and lets students track their place in line without concern.

Designated TA / Extra-help OH

Students will be assigned a designated TA who will be their primary contact for course-related matters. It is important to note that students are not limited to attending only their designated TA’s office hours; they are encouraged to interact with all course staff as needed.

Designated TAs also provide students with special extra-help OH if additional assistance is required. These extra-help sessions are separate from the regular office hours and follow a separate schedule. This offers students a valuable opportunity to receive extended support whenever needed.

Assessment

OLI Torus

Practice Quiz (Did I Get This?)

Practice quizzes (Did I Get This?) are ungraded retrieval-practice activities integrated into each module page on OLI Torus. These quizzes assess your understanding immediately after you engage with the reading materials. Your results on the practice quizzes are a good indicator of how prepared you are for the weekly summative (graded) in-class quizzes.

We strongly recommend completing the practice quiz before the synchronous in-class quiz. If you miss many of the questions, review the material again.

Weekly In-Class Quiz

Each week, during the synchronous class meeting, students will spend 10 minutes of class time completing the quiz. The quiz is graded and assesses the reading content on OLI Torus from the preceding week. When it is time to take the quiz, students log into OLI Torus to start. A timer activates once the quiz starts and cannot be paused. The quiz stops automatically when the time allowance expires, and the student’s work is then automatically submitted and graded by the OLI Torus system.

Students will have only a single attempt to complete the weekly quiz on OLI Torus.

Sail()

This course includes seven individual project themes. Each project theme consists of several project modules. Each project module must be completed by its deadline on the Sail() Platform, where the write-up required to complete the module is also available. Each module has its own submission process. It is the student’s responsibility to ensure all project work is completed and the project module is submitted before the deadline. Students typically have multiple attempts to submit a project module on the Sail() Platform.

Online Programming Exercise (OPE)

On the Sail() Platform, you will also see two OPE (Online Programming Exercise) activities: a practice OPE and an OPE after Project 3. These are synchronous team-based exercises that expand your knowledge of the topics in the preceding projects. Each OPE is associated with a project, and a graded OPE accounts for 5% of that project’s grade. The first OPE, “OPE: Practice,” is ungraded.

Manual-graded Components

In Project 3 and the final exam, some components are manually graded. Students must submit a PDF or images of their work on Gradescope for manual grading. Students must follow the submission guidelines on Gradescope carefully, ensuring that each image or page of their write-up is associated with the relevant question. An unlinked or incorrectly linked submission will be treated as an invalid response and will automatically receive a grade of zero.

Final Exam

The final exam consists of three parts: the coding component, the write-up, and the conceptual quiz. Each of these is designed to assess different skills and knowledge acquired throughout the course.

Coding Component on Sail() (30 points):

- Similar to the format of the seven prior projects, students will work on the coding component using Sail().
- There is no limit on the number of submissions, but the final grade for this component is based on the most recent submission, not the highest score.
- The autograder on Sail() will not run your notebook code; it will only grade the model deployment part by sending POST requests to your endpoint, with a runtime limit of 70 seconds.
- Students may use Azure ML Studio for model deployment only; the other steps can be completed in any chosen environment.
- After submission, students can check their model's accuracy and their score on the "My Submissions" tab in Sail().
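
Because the autograder only sends POST requests to the deployed endpoint, it can help to send a request of your own before submitting. The following is a minimal sketch of such a self-check using only the Python standard library. The endpoint URL, the payload shape, and the bearer-key authentication are assumptions for illustration; the actual request format is defined in the exam write-up and by your deployment. Only the 70-second limit comes from the description above.

```python
import json
import urllib.request


def build_scoring_request(url, payload, api_key=None):
    """Build a JSON POST request for a deployed scoring endpoint."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Assumption: the endpoint uses bearer-key authentication.
        headers["Authorization"] = f"Bearer {api_key}"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def score(url, payload, api_key=None, timeout_s=70):
    """Send the request and return the decoded JSON response.

    timeout_s applies to each blocking network operation, so it only
    approximates the 70-second runtime limit mentioned above.
    """
    request = build_scoring_request(url, payload, api_key)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call (hypothetical URL, payload, and key; replace with your own):
# result = score("https://<your-endpoint>/score", {"data": [[0.1, 0.2]]}, api_key="<key>")
```

If the call raises a timeout or an HTTP error, the autograder's requests would likely fail in the same way, so it is worth resolving before relying on your latest submission.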

Write-up Component on Gradescope (40 points):

- A set of write-up questions is included at the end of the coding component notebook.
- Students should type their responses directly in the notebook and submit a PDF or images of their work to Gradescope for manual grading.
- Students must adhere strictly to the Gradescope instructions to guarantee that their submitted images or pages are correctly linked to the designated question in their write-up. Failure to properly link work to the corresponding question will result in a zero grade, as our system will categorize it as an invalid response.
- Submissions can be made any number of times before the final exam deadline on Gradescope, with the final grade based on the latest submission.
- A grace period of 12 hours is available on Gradescope, but submissions in this period will incur a 20% grade penalty. Submissions after this grace period will not be accepted.
- Sail() submissions will not be checked for the write-up; only submissions to Gradescope will be considered for grading.

Conceptual Component on OLI Torus (30 points):

- This comprehensive quiz covers all course modules and must be completed during class time as per the schedule found in the course calendar.
- The quiz comprises 15 questions with a 30-minute time limit and is independent of the coding and write-up components.
- All components of the final exam must be completed within the designated final exam period, and no extensions will be granted. This structure ensures a comprehensive evaluation of the skills and knowledge acquired in the course.

End-of-course Survey

An end-of-course survey will be distributed around the final exam week. The survey aims to gather constructive feedback from students for further improvement and development of the course. To thank students for their time, 2% of the overall course grade will be awarded for survey completion.

Accommodations

For Students with Disabilities

If you have a disability and have an accommodations letter from the Disability Resources office, I encourage you to discuss your accommodations and needs with me as early in the semester as possible. I will work with you to ensure that accommodations are provided as appropriate. If you suspect that you may have a disability and would benefit from accommodations but are not yet registered with the Office of Disability Resources, I encourage you to contact them at access@andrew.cmu.edu.

Medical Accommodations:

If you require accommodations for medical reasons, please contact Carnegie Mellon University’s Disability Resources to initiate the process. It is important to engage with Disability Resources directly for all accommodation requests, as they are equipped to assess and provide the necessary support in line with university procedures. Please refrain from sending any medical documentation directly to course instructors. As instructors, we are committed to implementing the accommodations approved by Disability Resources to ensure equitable access to our course.

Take care of yourself

Do your best to maintain a healthy lifestyle this semester by eating well, exercising, avoiding drugs and alcohol, getting enough sleep, and taking some time to relax. This will help you achieve your goals and cope with stress.

All of us benefit from support during times of struggle. You are not alone. There are many helpful resources available on campus and an important part of the college experience is learning how to ask for help. Asking for support sooner rather than later is often helpful.

If you or anyone you know experiences any academic stress, difficult life events, or feelings like anxiety or depression, we strongly encourage you to seek support. Counseling and Psychological Services (CaPS) is here to help: call 412-268-2922 and visit their website at http://www.cmu.edu/counseling/. Consider reaching out to a friend, faculty, or family member you trust for help getting connected to the support that can help.

If you or someone you know is feeling suicidal or in danger of self-harm, call someone immediately, day or night:
CaPS: 412-268-2922
Resolve Crisis Network: 888-796-8226

If the situation is life-threatening, call the police:

  • On-campus: CMU Police: 412-268-2323
  • Off-campus: 911

Please let us know if you have questions about this or your coursework.

Office Hours

Each TA holds weekly office hours (OH) virtually via Zoom. The office hours schedule can be found in the Google Calendar. All students who come to OH should join the OH queue at https://www.eberly.cmu.edu/ohq/#/courses. Please do so even if the Zoom waiting room is empty or no one is in the OH queue. Doing so is beneficial in the following ways:

  • Because you state your question when joining the OH queue, the TA can prepare for it in advance.
  • It helps the TA keep track of the order of students who join the OH queue and the Zoom waiting room.
  • It helps students track their place in line without worrying that the TA is not being attentive to them.