
11-631: Data Science Seminar - Syllabus

Course Learning Outcomes

The main learning objectives of the course are for students to (a) demonstrate a basic understanding of the Data Science literature (via sample application areas, associated publication venues, and writing styles), (b) apply this understanding to specific publications (by writing and justifying academic evaluations of the work), (c) report on a Data Science publication in a comprehensive, collaborative presentation of a given publication and its related works, (d) defend and criticize reports and presentations on Data Science publications, via relevant statements, questions, and form-based evaluation, while participating in constructive discussion about such presentations, and (e) critically analyze and synthesize Data Science literature individually from the lens of a specialist role and collaboratively in a group.

All of these outcomes are essential preparation for the subsequent MCDS capstone course sequence (11-634 Capstone Planning Seminar, 11-632 Data Science Capstone, and 11-635 Data Science Capstone Research).

Time & Location

Section A: TR 09:30AM - 10:50AM, BH A36

Section B: TR 11:00AM - 12:20PM, GHC 4307

Course Format

In-Person. The course opens with an initial overview of the Data Science literature and tutorials on how to analyze and critique Data Science publications. The course also provides tutorials on preparing and presenting reviews of Data Science publications and related literature.

Course Organization

The bulk of the course will consist of paper discussion sessions as well as associated presentations and reviews of related works. The deliverables expected from each student are:

  1. Play a role in a student team to present an analysis and critique of an assigned paper.
  2. Written summaries of the assigned paper when not presenting.
  3. Prepared questions and commentaries for the assigned paper to facilitate classroom discussion.
  4. Comparative analyses of base papers and surprise papers.
  5. A written literature survey on related work.
  6. A constructive review of a capstone project.

The course is divided into two parts, Part I and Part II. Part I of the course consists of group presentations and paper discussion sessions in which students are divided into groups to present the assigned reading for that session while other students submit a paper summary for it. Part II of the course consists of three surprise paper sessions, a literature survey assignment, and a capstone project review.

Part I

Presentations

In the first half of the course, each class session involves reading, presenting, critiquing, and discussing one assigned paper. Before each session, all students are required to read the paper. One team of students is responsible for presenting the paper. Each student in the presenting team is assigned a specialist role (details below), which guides the critiquing approach to the paper. All other students are required to submit a summary of one of the two assigned papers for that week.

The class session begins with a presentation, after which the class breaks into smaller discussion groups. Each presenter is responsible for generating a discussion question, and the class is divided into groups based on these questions for more targeted discussions.

Discussions:

Discussion questions come from the people who presented that day. While each role comes up with their own discussion question, please make sure that the questions are different enough.

Note: please consolidate all the discussion questions (one per role) into one slide that you will keep up at the end of the presentation.

We will break up into groups, one group per discussion question, for about 15 minutes / 50% of the time remaining before the end of lecture.

One member of each discussion group will have to take notes.

Each member will fill out the Audience question form available on Canvas.

Note: people who presented that day do not have to fill out the form.

Then, for the remaining 15 minutes, we will have a full-class discussion, where one member from each group (note-taker or someone else) will summarize the main talking points that were brought up during the group discussion.

Grace day policy: Each student is granted one grace day throughout Part I of the course. A grace day is a day for which you notify us in advance that you will not be present for class. On this day, you will not be selected to present in a team or be chosen as one of the reviewers of the presentations. Only one grace day is allowed. Any additional absence during Part I of the course will result in a zero grade should you be randomly selected to present and/or review on the day of that absence.

Specialist roles

Each student in a team is assigned a role to play in critiquing a paper. Each role is of equal importance. Details and suggested preparations for each role are given below. Students are encouraged to read through each role and request clarification if needed. To foster individuality and diverse student backgrounds, students may submit their preferences for each role. However, the ultimate decision on assigning roles belongs to the instructor. It is possible that students will not be assigned their desired roles. If this is the case, students are encouraged to take this opportunity to step outside of their comfort zone and discover their potential. Note that not all roles will be assigned for every paper. Roles are assigned to a paper based on their suitability to the context and content of the paper. Note also that, depending on the number of students in a course section (Sections A and B), one student might present multiple times. If this is the case, the student will be assigned to a different paper, team, and role each time they present.

For more details on the descriptions of each specialist role, please see Specialist Role Descriptions.

Paper summaries

When a student is not in a presenting group for a given class session, this student must submit a summary of the assigned paper and a discussion question for the presenters regarding the paper, presentation, or individual role. We provide a guided questionnaire for students to complete the paper summary. Our goal is to have deep and collaborative discussions on the week’s topics. To do so, it is important that all students are well-prepared for each class. Group presentations are followed by a discussion session moderated by the instructor. To ensure that all students are prepared for the discussions, the instructor will call on students at random to ask the prepared question, comment on the paper, or offer commentary to the presenters. Although we recognize that this approach may induce some level of stress in students, it is our instructional philosophy that it is alright to offer incorrect answers, to be uncomfortable with random chance, and to be afraid of asking silly questions. Only by doing so do you grow. In short, it’s okay not to know; it’s not okay not to have tried.

Extra credit: ChatGPT red-teaming

ChatGPT red-teaming question:

Extra credit: Try asking ChatGPT/other AI platforms some questions related to the paper or its general topic, and assess the output’s correctness. Your goal is two-fold: (1) find an input question / prompt that will lead ChatGPT to produce something incorrect, and (2) explain what about the output is incorrect, and hypothesize why ChatGPT might have gotten it wrong.

You will get more points the more creative your input prompt/question is, and the better your explanation is for why it got it wrong.

Grading rubric:

0: if the system got the answer right

Prompt points:

1: low-hanging fruit input prompt (e.g., who is Author?, explain [paper title], copy-pasted questions from our own assignments)

2: good input prompt (summarize this paper), and good explanation (this likely will be the most common grade?)

3: input prompt is creative and interesting

Explanation points:

0: Explanation is missing or wrong

1: Explanation is correct

2: Explanation is correct and shows insights into limitations of ChatGPT

Part II

The second half of the course consists of surprise paper sessions, guest speaker presentations, a literature survey, and a capstone project review.

Surprise paper sessions and guest speaker presentations

The second half of the course starts with three surprise paper sessions in which students are required to read a base paper before the Tuesday class meeting. During the Tuesday class meeting, the instructor will distribute an additional paper to be read. There will also be a presentation from a guest speaker during the class meeting on Tuesday or Thursday. For each of the first two surprise paper sessions, students must submit a comparative analysis, due on Tuesday, that compares and contrasts the base and surprise papers.

For the last surprise paper session, each student chooses one additional paper and writes a related work survey covering the base, surprise, and additional papers. Students are encouraged to take this opportunity to practice their literature survey skills in preparation for the comprehensive literature survey assignment due at the end of the course.

Literature survey

Each student chooses any paper covered in the course for which they will write and submit a detailed literature survey. It is recommended that students start exploring their topic of interest earlier in the semester through the presentation and/or surprise paper sessions to get a head start on their literature survey. Each student is required to have the survey document proofread and revised in collaboration with the Global Communications Center before submitting it for grading.

Capstone project review

Finally, during the last two weeks of class, students will attend at least one final second-year capstone presentation and review one draft capstone report written by a second-year MCDS student team.

Attendance Policy

This course will be held in person. You are responsible for completing the work assigned and seeking clarification as needed. Late work is generally not accepted without prior arrangement or proper justification.

Assessment

The course grade will be based on the following:

  • Paper summary (complete when not presenting in a group): 25%
  • Presentation discussion participation (provide a question to the presenting group and be present in class if selected to lead the discussion): 5%
  • Group presentation (individual grade of fulfilling the assigned role in the group): 25%
  • Surprise paper comparative analysis (two comparative analyses of surprise paper sessions I and II): 15%
  • Practice literature review (a related work review of surprise paper session III): 5%
  • Literature review (choose a topic covered in the course and write a literature review of the chosen topic): 20%
  • Capstone report review (provide constructive feedback to an 11-632 team capstone report): 2.5%
  • Capstone final presentation review (provide constructive feedback to an 11-632 team capstone final presentation): 2.5%
  • Reproducibility challenge (bonus 5% for participating in the reproducibility challenge): 5%
  • End-of-course survey (bonus 2% for completing the course survey for feedback and improvement): 2%

| Assessment Type | Grade Percentage |
| --- | --- |
| Weekly Paper Summary | 25 |
| Presentation Discussion Participation | 5 |
| Group Presentation (Individual Grade) | 25 |
| Surprise Paper Comparative Analysis | 15 |
| Practice Literature Survey | 5 |
| Literature Survey | 20 |
| Capstone Report Review | 2.5 |
| Capstone Final Presentation Review | 2.5 |
| Extra credit: Red-teaming ChatGPT | 3 |
| Extra credit: End-of-course Survey | 2 |
| TOTAL | 105 |
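
As a rough illustration of how the percentages in the table combine, the following Python sketch computes a course total as a weighted sum. The weights are copied from the table. The function name `course_grade`, the 0.0 to 1.0 score scale, and the weighted-sum rule itself are assumptions made for illustration, not an official grading procedure. The sketch follows the table for the extra-credit items, which differ from the bulleted list above.

```python
# Weights (percent of the course grade), copied from the assessment table.
WEIGHTS = {
    "Weekly Paper Summary": 25.0,
    "Presentation Discussion Participation": 5.0,
    "Group Presentation (Individual Grade)": 25.0,
    "Surprise Paper Comparative Analysis": 15.0,
    "Practice Literature Survey": 5.0,
    "Literature Survey": 20.0,
    "Capstone Report Review": 2.5,
    "Capstone Final Presentation Review": 2.5,
    "Extra credit: Red-teaming ChatGPT": 3.0,
    "Extra credit: End-of-course Survey": 2.0,
}


def course_grade(scores):
    """Return the weighted course percentage.

    `scores` maps a component name to the fraction earned (0.0 to 1.0).
    Components missing from `scores` count as zero.
    Assumption: components combine as a simple weighted sum; the
    syllabus does not spell out the exact aggregation rule.
    """
    unknown = set(scores) - set(WEIGHTS)
    if unknown:
        raise KeyError(f"unknown components: {sorted(unknown)}")
    for name, fraction in scores.items():
        if not 0.0 <= fraction <= 1.0:
            raise ValueError(f"{name}: fraction must be within [0, 1]")
    return sum(WEIGHTS[name] * scores.get(name, 0.0) for name in WEIGHTS)


if __name__ == "__main__":
    # Full marks on every component, including both extra-credit items.
    print(course_grade({name: 1.0 for name in WEIGHTS}))  # 105.0
```

With full marks on every component the sketch returns 105.0, matching the TOTAL row; the eight regular components alone sum to 100.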

AIV Policy

When preparing each presentation, you may share work with your assigned teammates and no other students. In particular, when your paper is also being presented by a different team in the same or a different section of this course, you may not collaborate or share work with students in that other team. Similarly, all other deliverables in the course are individual assignments. You are required to research the literature, synthesize it, and produce the document by yourself without working with your classmates. This course is intended to give you experience in autonomous research, so trying to delegate or shortcut preparation is a wasted learning opportunity. Acting against this rule will be considered an academic integrity violation and lead to reprimands, including possible dismissal from the program (see the MCDS Handbook).

For your paper summaries and comparative analyses, you must produce your own work. You may discuss the papers with classmates, but the submissions must be your own work. Do not use the internet or other sources to find prior analyses to complete your assignments.

The presentation and related work survey emphasize literature search and comparing and contrasting with other material. All material you find and use in any of the course deliverables must be explicitly and correctly referenced/cited.

Tentative Schedule

Week Date Content Activities and Assignments
Part I: Weekly Paper Presentation and Discussion
1
Introduction to the Course
Aug 29 Course Introduction Read syllabus
Aug 31 Tutorials:
How to read a technical/research paper
Roleplaying in reading papers
Role selection ranking due midnight
Week 2 presenters announced Friday (Sep 1)
2 Sep 5 Required Reading: Frankle, J., & Carbin, M. (2018). The lottery ticket hypothesis: Finding sparse, trainable neural networks. arXiv preprint arXiv:1803.03635. Presentation slides due midnight Monday (for presenting teams)
Paper summary due midnight Monday (for non-presenting students)
Group presentation and discussion
Sep 7 Required Reading: Kuchnik, M., Klimovic, A., Simsa, J., Smith, V., & Amvrosiadis, G. (2022). Plumber: Diagnosing and Removing Performance Bottlenecks in Machine Learning Data Pipelines. Proceedings of Machine Learning and Systems, 4, 33-51. Group presentation and discussion
3 Sep 12 Required Reading: Zhao, Jieyu, Tianlu Wang, Mark Yatskar, Vicente Ordonez, and Kai-Wei Chang. (2017). "Men also like shopping: Reducing gender bias amplification using corpus-level constraints." arXiv preprint arXiv:1707.09457. Presentation slides due midnight Monday (for presenting teams)
Paper summary due midnight Monday (for non-presenting students)
Group presentation and discussion
Sep 14 Required Reading: S. Rajbhandari, J. Rasley, O. Ruwase and Y. He, "ZeRO: Memory optimizations Toward Training Trillion Parameter Models," SC20: International Conference for High Performance Computing, Networking, Storage and Analysis, Atlanta, GA, USA, 2020, pp. 1-16. Group presentation and discussion
4 Sep 19 Required Reading: Peter Henderson, Jieru Hu, Joshua Romoff, Emma Brunskill, Dan Jurafsky, Joelle Pineau. Towards the Systematic Reporting of the Energy and Carbon Footprints of Machine Learning, JMLR 2020. Presentation slides due midnight Monday (for presenting teams)
Paper summary due midnight Monday (for non-presenting students)
Group presentation and discussion
Sep 21 Required Reading: Zhu, Shien, Luan HK Duong, and Weichen Liu. "XOR-Net: An efficient computation pipeline for binary neural network inference on edge devices." 2020 IEEE 26th international conference on parallel and distributed systems (ICPADS). IEEE, 2020. Group presentation and discussion
5 Sep 26 Required Reading: Qiao, Aurick, et al. "Pollux: Co-adaptive cluster scheduling for goodput-optimized deep learning." 15th USENIX Symposium on Operating Systems Design and Implementation (OSDI 21). 2021. Presentation slides due midnight Monday (for presenting teams)
Paper summary due midnight Monday (for non-presenting students)
Group presentation and discussion
Sep 28 Group presentation and discussion
6 Oct 3 Required Reading: Ken Holstein, Jennifer Wortman Vaughan, Hal Daumé III, Miro Dudík, Hanna Wallach, Improving fairness in machine learning systems: What do industry practitioners need?, in Proceedings of 2019 ACM CHI Conference on Human Factors in Computing Systems. Presentation slides due midnight Monday (for presenting teams)
Paper summary due midnight Monday (for non-presenting students)
Group presentation and discussion
Oct 5 Required Reading: (FlexFlow) Jia, Zhihao, Matei Zaharia, and Alex Aiken. "Beyond Data and Model Parallelism for Deep Neural Networks." Proceedings of Machine Learning and Systems 1 (2019): 1-13. Group presentation and discussion
7 Oct 10 Required Reading: Presentation slides due midnight Monday (for presenting teams)
Paper summary due midnight Monday (for non-presenting students)
Group presentation and discussion
Oct 12 Group presentation and discussion
8 Oct 17 Fall Break (No Class)
Oct 19
Part II: Surprise Paper Session, Literature Review, and Capstone Review
9
Surprise Paper Session I
Oct 24 Base paper:
Surprise paper:
Comparative Analysis Due
Oct 26 Learning Group Presentation
10
Surprise Paper Session II
Oct 31 Base paper:
Surprise paper:
Comparative Analysis Due
Nov 9 Learning Group Presentation
11
Surprise Paper Session III
Nov 14 Base paper:
Surprise paper:
Comparative Analysis Due
Nov 16 Learning Group Presentation
13 Nov 21 No Class - Happy Long Thanksgiving Break
Nov 22 - Nov 24 Thanksgiving Break
14 Nov 28 Guest Speaker
Nov 30 Guest Speaker
15 Dec 5 Attend 11-632 Capstone Presentation (No Class) Capstone Review Due
Dec 7 Reproducibility Challenge Due
16 Dec 12 Final exam week (No Class) Literature Review Due
Dec 14 Capstone Final Presentation Review Due
Dec 20 Grades Due