
11-631: Data Science Seminar - Syllabus

Course Learning Outcomes

The main learning objectives of the course are for students to (a) demonstrate a basic understanding of the Data Science literature (via sample application areas, associated publication venues, and writing styles), (b) apply this understanding to specific publications (by writing and justifying academic evaluations of the work), (c) report on a Data Science publication in a comprehensive, collaborative presentation of a given publication and its related works, (d) defend and criticize, via relevant statements, questions and form-based evaluation, reports, and presentations on Data Science publications while participating in constructive discussion about such presentations, and (e) critically analyze and synthesize Data Science literature, both individually through the lens of a specialist role and collaboratively in a group.

All of these outcomes are essential preparation for the subsequent MCDS capstone course sequence (11-634 Capstone Planning Seminar, 11-632 Data Science Capstone, and 11-635 Data Science Capstone Research).

Time & Location

Section A: TR 8:35 am - 9:55 am, POS 153

Section B: TR 11:50 am - 1:10 pm, POS 153

Course Format

In-Person. The course opens with an initial overview of the Data Science literature and tutorials on how to analyze and critique Data Science publications. The course also provides tutorials on preparing and presenting reviews of Data Science publications and related literature.

Course Organization

The bulk of the course will consist of paper discussion sessions as well as associated presentations and reviews of related works. The deliverables expected from each student are:

  1. An assigned role in a student team that presents an analysis and critique of an assigned paper.
  2. Written summaries of the assigned paper when not presenting.
  3. Prepared questions and commentaries for the assigned paper to facilitate classroom discussion.
  4. Comparative analyses of base papers and surprise papers.
  5. A written literature survey on related work.
  6. A constructive review of a capstone project.

The course is divided into two parts, Part I and Part II. Part I consists of group presentations and paper discussion sessions in which students are divided into groups to present the assigned reading for that session while other students submit a paper summary for it. Part II consists of three surprise paper sessions, a literature survey assignment, and a capstone project review.

Part I

In the first half of the course, each class session involves reading, presenting, critiquing, and discussing one assigned paper. Before each session, all students are required to read the paper. Two teams of students are responsible for presenting the paper. Each student in the presenting team is assigned a specialist role (details below) which guides the critiquing approach to the paper. All other students are required to submit a summary of the assigned paper and a discussion question for the paper and/or the presenting team.

The class session begins with a presentation, after which the class breaks into smaller discussion groups. Each presenter is responsible for generating a discussion question, and the class is divided into groups based on these questions for more targeted discussions.

Grace day policy: Each student is granted one grace day during Part I of the course. A grace day is a day for which you notify us in advance that you will not be present in class. On that day, you will not be selected to present in a team or chosen as one of the reviewers of the presentations. Only one grace day is allowed. Any additional absence during Part I of the course will result in a zero grade if you are randomly selected (for example, to present) on the day of that absence.

Specialist roles

Each student in a team is assigned a role to play in critiquing a paper. Each role is of equal importance. Suggested preparations for each role are detailed below. Students are encouraged to read through each role and request clarification if needed. To accommodate individuality and diverse student backgrounds, students may state their preferences for each role. However, the final decision on role assignment belongs to the instructor, and students may not be assigned their desired roles. If this is the case, students are encouraged to take this opportunity to step outside of their comfort zone and discover their potential. Not all roles will be assigned for every paper; roles are assigned based on their suitability to the context and content of the paper. Depending on the number of students in a course section (Section A or B), one student might present multiple times. If so, the student will be assigned a different paper, team, and role each time they present.

For more details on the descriptions of each specialist role, please see Specialist Role Descriptions.

Paper summaries

When a student is not in a presenting group for a given class session, this student must submit a summary of the assigned paper and a discussion question for the presenters regarding the paper, presentation, or individual role. We provide a guided questionnaire for students to complete the paper summary. Our goal is to have deep and collaborative discussions on the week’s topics. To do so, it is important that all students are well-prepared for each class. Group presentations are followed by a discussion session moderated by the instructor. To ensure that all students are prepared for the discussions, the instructor will call on students at random to ask their prepared question, comment on the paper, or offer commentary to the presenters. Although we recognize that this approach may induce some level of stress in students, it is our instructional philosophy that it is alright to offer incorrect answers, to be uncomfortable with random chance, and to be afraid of asking silly questions. Only by doing so do you grow. In short, it’s okay not to know; it’s not okay not to have tried.

Part II

The second half of the course consists of surprise paper sessions, guest speaker presentations, a literature survey, and a capstone project review.

Surprise paper sessions and guest speaker presentations

The second half of the course starts with three surprise paper sessions in which students are required to read a base paper before the Tuesday class meeting. During the Tuesday class meeting, the instructor will distribute an additional paper to be read. There will also be a presentation from a guest speaker during the class meeting on Tuesday or Thursday. For each of the first two surprise paper sessions, students must submit a comparative analysis, due on Tuesday, that compares and contrasts the base and surprise papers.

For the last surprise paper session, each student chooses one additional paper and writes a related work survey covering the base, surprise, and additional papers. Students are encouraged to take this opportunity to practice their literature survey skills in preparation for the comprehensive literature survey assignment due at the end of the course.

Literature survey

Each student chooses any paper covered in the course for which they will write and submit a detailed literature survey. It is recommended that students start exploring their topic of interest earlier in the semester, through the presentation and/or surprise paper sessions, to get a head start on their literature survey. Each student is required to have the survey document proofread and revised in collaboration with the Global Communications Center before submitting it for grading.

Capstone project review

Finally, during the last two weeks of class, students will attend at least one final second-year capstone presentation and review one draft capstone report written by a second-year MCDS student team.

Attendance Policy

This course will be held in person. You are responsible for completing the work assigned and seeking clarification as needed. Late work is generally not accepted without prior arrangement or proper justification.

Assessment

The course grade will be based on the following:

  • Paper summary (complete when not presenting in a group): 25%
  • Presentation discussion participation (provide a question to the presenting group and be present in class if selected to lead the discussion): 5%
  • Group presentation (individual grade of fulfilling the assigned role in the group): 25%
  • Surprise paper comparative analysis (two comparative analyses of surprise paper sessions I and II): 10%
  • Learning group presentation (learning groups get together to answer questions posted by the guest speaker): 5%
  • Practice literature review (a related work review of surprise paper session III): 5%
  • Literature review (choose a topic covered in the course and write a literature review of the chosen topic): 20%
  • Capstone report review (provide constructive feedback to an 11-632 team capstone report): 2.5%
  • Capstone final presentation review (provide constructive feedback to an 11-632 team capstone final presentation): 2.5%
  • Reproducibility challenge (bonus 5% for participating in the reproducibility challenge): 5%
  • End-of-course survey (bonus 2% for completing the course survey for feedback and improvement): 2%

| Assessment Type | Grade Percentage |
| --- | --- |
| Weekly Paper Summary | 25 |
| Presentation Discussion Participation | 5 |
| Group Presentation (Individual Grade) | 25 |
| Surprise Paper Comparative Analysis | 10 |
| Learning Group Presentation (Group Grade) | 5 |
| Practice Literature Review | 5 |
| Literature Review | 20 |
| Capstone Report Review | 2.5 |
| Capstone Final Presentation Review | 2.5 |
| Reproducibility Challenge | 5 |
| End-of-course Survey | 2 |
| TOTAL | 107 |
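
As a rough illustration of how the percentages above add up, the following Python sketch computes a course total from per-component scores. It is only a sketch under stated assumptions: the syllabus does not specify how individual scores are scaled or combined, so the plain weighted sum, the `course_grade` function, and the convention that each score is a fraction between 0 and 1 are all hypothetical. Only the weights themselves come from the assessment list.

```python
# Weights in percent, copied from the assessment list above.
REGULAR_WEIGHTS = {
    "Weekly Paper Summary": 25.0,
    "Presentation Discussion Participation": 5.0,
    "Group Presentation (Individual Grade)": 25.0,
    "Surprise Paper Comparative Analysis": 10.0,
    "Learning Group Presentation (Group Grade)": 5.0,
    "Practice Literature Review": 5.0,
    "Literature Review": 20.0,
    "Capstone Report Review": 2.5,
    "Capstone Final Presentation Review": 2.5,
}

# Bonus components, also from the assessment list.
BONUS_WEIGHTS = {
    "Reproducibility Challenge": 5.0,
    "End-of-course Survey": 2.0,
}


def course_grade(scores):
    """Return a course total in percent.

    ASSUMPTION (not stated in the syllabus): `scores` maps a component
    name to the fraction earned, between 0.0 and 1.0, and the total is a
    plain weighted sum. Components missing from `scores` count as 0.
    """
    all_weights = {**REGULAR_WEIGHTS, **BONUS_WEIGHTS}
    return sum(weight * scores.get(name, 0.0)
               for name, weight in all_weights.items())


if __name__ == "__main__":
    # Full marks on every regular component, no bonus work.
    full_regular = {name: 1.0 for name in REGULAR_WEIGHTS}
    print(course_grade(full_regular))
```

The regular components sum to 100 and the two bonus components add 7, which is why the listed total is 107.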

AIV Policy

For preparing each presentation, you share work with your assigned teammates and no other students. In particular, when your paper is also being presented by a different team(s) in the same or different section of this course, you may not collaborate or share work with students in this other team(s). Similarly, all other deliverables in the course are individual assignments. You are required to synthesize, research literature, and produce the document by yourself without working with your classmates. This course is intended to give you experience in autonomous research, so trying to delegate or shortcut preparation is a wasted learning opportunity. Acting against this rule will be considered an academic integrity violation and lead to reprimands, including possible dismissal from the program (see the MCDS Handbook).

For your paper summaries and comparative analyses, you must produce your own work. You may discuss the papers with classmates, but the submissions must be your own work. Do not use the internet or other sources to find prior analyses to complete your assignments.

The presentation and related work survey emphasize a literature search and compare/contrast to other material. All material you find and use in any of the course deliverables must be explicitly and correctly referenced/cited.

Reading List

For more details on the required readings, examples of prior work, and follow-up work, please see Reading List.

Tentative Schedule

Part I: Weekly Paper Presentation and Discussion

| Week | Date | Content | Activities and Assignments |
| --- | --- | --- | --- |
| 1: Introduction to the Course | Aug 30 | Course Introduction | Read syllabus |
| | Sep 1 | Tutorials: How to read a technical/research paper; Roleplaying in reading papers | Role selection ranking due midnight; Week 2 presenters announced Friday (Sep 2) |
| 2 | Sep 6 | Required Reading: Frankle, J., & Carbin, M. (2018). The lottery ticket hypothesis: Finding sparse, trainable neural networks. arXiv preprint arXiv:1803.03635. | Presentation slides due midnight Monday (for presenting team); Paper summary + a discussion question for the team due midnight Monday (for non-presenting students); Group presentation and discussion |
| | Sep 8 | | Group presentation and discussion |
| 3 | Sep 13 | Required Reading: Zhao, Jieyu, Tianlu Wang, Mark Yatskar, Vicente Ordonez, and Kai-Wei Chang. (2017). "Men also like shopping: Reducing gender bias amplification using corpus-level constraints." arXiv preprint arXiv:1707.09457. | Presentation slides due midnight Monday (for presenting team); Paper summary + a discussion question for the team due midnight Monday (for non-presenting students); Group presentation and discussion |
| | Sep 15 | | Group presentation and discussion |
| 4 | Sep 20 | Required Reading: Aakanksha Naik*, Abhilasha Ravichander, Norman Sadeh, Carolyn Rose, Graham Neubig. "Stress Test Evaluation for Natural Language Inference." 27th International Conference on Computational Linguistics (COLING-2018). | Presentation slides due midnight Monday (for presenting team); Paper summary + a discussion question for the team due midnight Monday (for non-presenting students); Group presentation and discussion |
| | Sep 22 | | Group presentation and discussion |
| 5 | Sep 27 | Required Reading: Ken Holstein, Jennifer Wortman Vaughan, Hal Daumé III, Miro Dudík, Hanna Wallach, Improving fairness in machine learning systems: What do industry practitioners need?, in Proceedings of 2019 ACM CHI Conference on Human Factors in Computing Systems. | Presentation slides due midnight Monday (for presenting team); Paper summary + a discussion question for the team due midnight Monday (for non-presenting students); Group presentation and discussion |
| | Sep 29 | | Group presentation and discussion |
| 6 | Oct 4 | Required Reading: Peter Henderson, Jieru Hu, Joshua Romoff, Emma Brunskill, Dan Jurafsky, Joelle Pineau. Towards the Systematic Reporting of the Energy and Carbon Footprints of Machine Learning, JMLR 2020. | Presentation slides due midnight Monday (for presenting team); Paper summary + a discussion question for the team due midnight Monday (for non-presenting students); Group presentation and discussion |
| | Oct 6 | | Group presentation and discussion |
| 7 | Oct 11 | Required Reading: Emily M. Bender and Alexander Koller, Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data, Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 5185–5198. | Presentation slides due midnight Monday (for presenting team); Paper summary + a discussion question for the team due midnight Monday (for non-presenting students); Group presentation and discussion |
| | Oct 13 | | Group presentation and discussion |
| 8 | Oct 18 | Fall Break (No Class) | |
| | Oct 20 | | |
| 9 | Oct 25 | Required Reading: Kuchnik, M., Klimovic, A., Simsa, J., Smith, V., & Amvrosiadis, G. (2022). Plumber: Diagnosing and Removing Performance Bottlenecks in Machine Learning Data Pipelines. Proceedings of Machine Learning and Systems, 4, 33-51. | Presentation slides due midnight Monday (for presenting team); Paper summary + a discussion question for the team due midnight Monday (for non-presenting students); Group presentation and discussion |
| | Oct 27 | | Group presentation and discussion |

Part II: Surprise Paper Session, Literature Review, and Capstone Review

| Week | Date | Content | Activities and Assignments |
| --- | --- | --- | --- |
| 10: Surprise Paper Session I | Nov 1 | Base paper: [not listed]; Surprise paper: [not listed] | Comparative Analysis Due |
| | Nov 3 | | Learning Group Presentation |
| 11: Surprise Paper Session II | Nov 8 | Base paper: [not listed]; Surprise paper: [not listed] | Comparative Analysis Due |
| | Nov 10 | | Learning Group Presentation |
| 12: Surprise Paper Session III | Nov 15 | Base paper: [not listed]; Surprise paper: [not listed] | Practice Literature Survey Due |
| | Nov 17 | | Learning Group Presentation |
| 13 | Nov 22 | No Class - Happy Long Thanksgiving Break | |
| | Nov 23 - Nov 25 | Thanksgiving Break | |
| 14 | Nov 29 | Guest Speaker | |
| | Dec 1 | Guest Speaker | |
| 15 | Dec 6 | Attend 11-632 Capstone Presentation (No Class) | Capstone Review Due |
| | Dec 8 | | Reproducibility Challenge Due |
| 16 | Dec 13 | Final exam week (No Class) | Literature Review Due |
| | Dec 15 | | Capstone Final Presentation Review Due |
| | Dec 21 | | Grades Due |