
11-631: Data Science Seminar - Syllabus

Course Learning Outcomes

This course introduces students to the breadth of data science—covering human-centered, analytic, and systems approaches—through exposure to a wide variety of research topics and literature. Emphasis is placed on developing core academic skills: reading, writing, presenting, critiquing, discussing, and researching in data science. Students will collaborate to analyze publications, synthesize ideas from diverse subfields, and effectively communicate their insights both individually and in groups.

  • Gain exposure to the breadth of data science literature, including human-centered, analytic, and systems-oriented research, as well as relevant application areas, venues, and writing styles.
  • Learn how to critically read, evaluate, and discuss data science publications, justifying academic assessments of specific works.
  • Develop skills for writing academic papers and reviews, by synthesizing research content from multiple perspectives.
  • Present research papers in a clear, comprehensive, and collaborative manner, connecting a given publication to related works and broader themes in the field.

Time & Location

Section A: TR 8:35 am - 9:55 am, POS 153

Section B: TR 11:50 am - 1:10 pm, POS 153

Course Format

In-Person. The course opens with an initial overview of the Data Science literature and tutorials on how to analyze and critique Data Science publications. The course also provides tutorials on preparing and presenting reviews of Data Science publications and related literature.

Course Organization

The bulk of the course will consist of paper discussion sessions as well as associated presentations and reviews of related works. The deliverables expected from each student are:

  1. Play a role in a student team to present an analysis and critique of an assigned paper.
  2. Written summaries of the assigned paper when not presenting.
  3. Prepared questions and commentaries for the assigned paper to facilitate classroom discussion.
  4. Comparative analyses of base papers and surprise papers.
  5. A written literature survey on related work.
  6. A constructive review of a capstone project.

The course is divided into two parts, Part I and Part II. Part I consists of group presentations and paper discussion sessions in which students are divided into groups to present the assigned reading for that session while the other students submit a paper summary for it. Part II consists of three surprise paper sessions, a literature survey assignment, and a capstone project review.

Part I

In the first half of the course, each class session involves reading, presenting, critiquing, and discussing one assigned paper. Before each session, all students are required to read the paper. Two teams of students are responsible for presenting the paper. Each student in the presenting team is assigned a specialist role (details below) which guides the critiquing approach to the paper. All other students are required to submit a summary of the assigned paper and a discussion question for the paper and/or the presenting team.

The class session begins with a presentation, after which the class breaks into smaller discussion groups. Each presenter is responsible for generating a discussion question, and the class is divided into groups based on these questions for more targeted discussions.

Grace day policy: Each student is granted one grace day during Part I of the course. A grace day is a day for which you notify us in advance that you will not be present in class. On that day, you will not be selected to present in a team or chosen as one of the reviewers of the presentations. Only one grace day is allowed. Any additional absence during Part I will result in a zero grade if you are randomly selected to present and/or review on the day of that absence.

Specialist roles

Each student in a team is assigned a role to play in critiquing a paper. Each role is of equal importance. Details and suggested preparations for each role are given below. Students are encouraged to read through each role and request clarification if needed. To foster individuality and reflect diverse student backgrounds, students are invited to rank their preferences for the roles. However, the final decision on role assignments rests with the instructor. Students may not be assigned their desired roles; if so, they are encouraged to take the opportunity to step outside their comfort zone and discover their potential. Not all roles will be assigned for every paper: roles are assigned to a paper based on their suitability to its context and content. Depending on the number of students in a course section (Section A or B), a student might present multiple times. If so, the student will be assigned a different paper, team, and role each time they present.

For more details on the descriptions of each specialist role, please see Specialist Role Descriptions.

Paper summaries

When a student is not in a presenting group for a given class session, this student must submit a summary of the assigned paper and a discussion question for the presenters regarding the paper, presentation, or individual role. We provide a guided questionnaire for students to complete the paper summary. Our goal is to have deep and collaborative discussions on the week’s topics. To do so, it is important that all students are well-prepared for each class. Group presentations are followed by a discussion session moderated by the instructor. To ensure that all students are prepared for the discussions, the instructor will call on students at random to ask their prepared question, comment on the paper, or offer commentary to the presenters. Although we recognize that this approach may induce some level of stress in students, it is our instructional philosophy that it is alright to offer incorrect answers, to be uncomfortable with being called on at random, and to be afraid of asking silly questions. Only by doing so do you grow. In short, it’s okay not to know; it’s not okay not to have tried.

Part II

The second half of the course consists of surprise paper sessions, guest speaker presentations, a literature survey, and a capstone project review.

Surprise paper sessions and guest speaker presentations

The second half of the course starts with three surprise paper sessions in which students are required to read a base paper before the Tuesday class meeting. During the Tuesday class meeting, the instructor will distribute an additional paper to be read. There will also be a presentation from a guest speaker during the class meeting on Tuesday or Thursday. For each of the first two surprise paper sessions, students must submit a comparative analysis that compares and contrasts the base and surprise papers, due on Tuesday.

For the last surprise paper session, each student chooses one additional paper and writes a related work survey covering the base, surprise, and additional papers. Students are encouraged to take this opportunity to practice their literature survey skills in preparation for the comprehensive literature survey assignment due at the end of the course.

Literature survey

Each student chooses any paper covered in the course for which they will write and submit a detailed literature survey. It is recommended that students start exploring their topic of interest earlier in the semester through the presentation and/or surprise paper sessions to get a head start on their literature survey. Each student is required to have the survey document proofread and revised in collaboration with the Global Communications Center before submitting it for grading.

Capstone project review

Finally, during the last two weeks of class, students will attend at least one final second-year capstone presentation and review one draft capstone report written by a second-year MCDS student team.

Attendance Policy

This course will be held in person. You are responsible for completing the work assigned and seeking clarification as needed. Late work is generally not accepted without prior arrangement or proper justification.

Assessment

The course grade will be based on the following:

  • Paper summary (complete when not presenting in a group): 25%
  • Presentation discussion participation (provide a question to the presenting group and be present in class if selected to lead the discussion): 5%
  • Group presentation (individual grade of fulfilling the assigned role in the group): 25%
  • Surprise paper comparative analysis (two comparative analyses of surprise paper sessions I and II): 10%
  • Learning group presentation (learning groups get together to answer questions posted by the guest speaker): 5%
  • Practice literature review (a related work review of surprise paper session III): 5%
  • Literature review (choose a topic covered in the course and write a literature review of the chosen topic): 20%
  • Capstone report review (provide constructive feedback to an 11-632 team capstone report): 2.5%
  • Capstone final presentation review (provide constructive feedback to an 11-632 team capstone final presentation): 2.5%
  • Reproducibility challenge (bonus 5% for participating in the reproducibility challenge): 5%
  • End-of-course survey (bonus 2% for completing the course survey for feedback and improvement): 2%

Assessment Type | Grade Percentage
Weekly Paper Summary | 25
Presentation Discussion Participation | 5
Group Presentation (Individual Grade) | 25
Surprise Paper Comparative Analysis | 10
Learning Group Presentation (Group Grade) | 5
Practice Literature Review | 5
Literature Review | 20
Capstone Report Review | 2.5
Capstone Final Presentation Review | 2.5
Reproducibility Challenge | 5
End-of-course Survey | 2
TOTAL | 107
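
The total of 107 can be checked against the weights listed above: the nine core components sum to 100, and the two bonus items (reproducibility challenge and end-of-course survey) add 7. The short Python sketch below reproduces that arithmetic. The syllabus does not state how component scores are combined, so the simple weighted sum in `course_grade` is an assumption for illustration, and the names `WEIGHTS`, `BONUS`, and `course_grade` are hypothetical.

```python
# Weights (percent of the course grade) copied from the assessment list.
WEIGHTS = {
    "Weekly Paper Summary": 25.0,
    "Presentation Discussion Participation": 5.0,
    "Group Presentation (Individual Grade)": 25.0,
    "Surprise Paper Comparative Analysis": 10.0,
    "Learning Group Presentation (Group Grade)": 5.0,
    "Practice Literature Review": 5.0,
    "Literature Review": 20.0,
    "Capstone Report Review": 2.5,
    "Capstone Final Presentation Review": 2.5,
    "Reproducibility Challenge": 5.0,
    "End-of-course Survey": 2.0,
}

# The two items the syllabus labels as bonus.
BONUS = {"Reproducibility Challenge", "End-of-course Survey"}


def course_grade(scores):
    """Assumed weighted sum: scores maps a component name to the fraction
    earned (0.0 to 1.0); components missing from scores count as zero."""
    return sum(weight * scores.get(name, 0.0) for name, weight in WEIGHTS.items())


core_total = sum(w for name, w in WEIGHTS.items() if name not in BONUS)
bonus_total = sum(w for name, w in WEIGHTS.items() if name in BONUS)
print(core_total, bonus_total, core_total + bonus_total)
```

Under this assumption, a student who earns full marks on every core component and skips both bonus items would finish at 100, and the bonus items can raise the total to at most 107.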

AIV Policy

Collaboration policy: For preparing each presentation and literature survey, you must only share work with your assigned teammates and no other students. Paper summaries, reviews, and capstone reviews are individual assignments. This course is intended to give you experience in autonomous research, so trying to delegate or shortcut preparation is a wasted learning opportunity. Acting against this rule will be considered an academic integrity violation and lead to reprimands, including possible dismissal from the program (see the MCDS Handbook).

Plagiarism and AIV policy: The presentation and related work survey emphasize a literature search and compare/contrast to other material. All material you find and use in any of the course deliverables must be explicitly and correctly referenced/cited. Notes:

  • Directly copying text from the paper being summarized, and/or from author websites or other sources, without using “quotation marks” around everything that is a direct quote, followed by a reference to the source being quoted, is plagiarism.
  • Text and/or slides copied directly from other sources without attribution in presentations is also considered plagiarism.

Here are some resources for learning what is and isn’t plagiarism:

GenAI use policy: In this course, you are expected to do all the work that is required to satisfy the learning objectives. Use of generative AI (e.g., ChatGPT, Perplexity, etc.) to automatically do your assignments for you is not allowed: any assignment you turn in must be your own writing and content (i.e., no direct copy-pasting output from GenAI models, and no copy-pasting + manual or automatic paraphrasing).

You may use GenAI to enhance your understanding of papers and subjects (e.g., by asking questions about papers), to help find papers to include in your literature review (but beware that it may invent non-existent papers), and to ask for feedback on your writing flow (e.g., “can you indicate whether the ordering of paragraphs makes sense”). For grammar and writing improvement, we suggest using Grammarly instead of GenAI.

It is very easy to tell when a student is not actually familiar with the material or has not actually understood it. If we suspect or confirm that you turned in something AI generated or relied too heavily on AI for your assignments, we reserve the right to ask you to justify your turned-in assignment, waive all your grades in that homework category, or even report an academic integrity violation (AIV).

Reading List

For more details on the required readings, examples of prior work, and follow-up work, please see Reading List.

Tentative Schedule

Week | Date | Content | Activities and Assignments

Part I: Weekly Paper Presentation and Discussion
1: Introduction to the Course | Aug 30 | Course Introduction | Read syllabus
1 | Sep 1 | Tutorials: How to read a technical/research paper; Roleplaying in reading papers | Role selection ranking due midnight; Week 2 presenters announced Friday (Sep 2)
2 | Sep 6 | Required Reading: Frankle, J., & Carbin, M. (2018). The lottery ticket hypothesis: Finding sparse, trainable neural networks. arXiv preprint arXiv:1803.03635. | Presentation slides due midnight Monday (for presenting team); Paper summary + a discussion question for the team due midnight Monday (for non-presenting students); Group presentation and discussion
2 | Sep 8 | | Group presentation and discussion
3 | Sep 13 | Required Reading: Zhao, Jieyu, Tianlu Wang, Mark Yatskar, Vicente Ordonez, and Kai-Wei Chang. (2017). "Men also like shopping: Reducing gender bias amplification using corpus-level constraints." arXiv preprint arXiv:1707.09457. | Presentation slides due midnight Monday (for presenting team); Paper summary + a discussion question for the team due midnight Monday (for non-presenting students); Group presentation and discussion
3 | Sep 15 | | Group presentation and discussion
4 | Sep 20 | Required Reading: Aakanksha Naik, Abhilasha Ravichander, Norman Sadeh, Carolyn Rose, Graham Neubig. "Stress Test Evaluation for Natural Language Inference." 27th International Conference on Computational Linguistics (COLING-2018). | Presentation slides due midnight Monday (for presenting team); Paper summary + a discussion question for the team due midnight Monday (for non-presenting students); Group presentation and discussion
4 | Sep 22 | | Group presentation and discussion
5 | Sep 27 | Required Reading: Ken Holstein, Jennifer Wortman Vaughan, Hal Daumé III, Miro Dudík, Hanna Wallach, Improving fairness in machine learning systems: What do industry practitioners need?, in Proceedings of 2019 ACM CHI Conference on Human Factors in Computing Systems. | Presentation slides due midnight Monday (for presenting team); Paper summary + a discussion question for the team due midnight Monday (for non-presenting students); Group presentation and discussion
5 | Sep 29 | | Group presentation and discussion
6 | Oct 4 | Required Reading: Peter Henderson, Jieru Hu, Joshua Romoff, Emma Brunskill, Dan Jurafsky, Joelle Pineau. Towards the Systematic Reporting of the Energy and Carbon Footprints of Machine Learning, JMLR 2020. | Presentation slides due midnight Monday (for presenting team); Paper summary + a discussion question for the team due midnight Monday (for non-presenting students); Group presentation and discussion
6 | Oct 6 | | Group presentation and discussion
7 | Oct 11 | Required Reading: Emily M. Bender and Alexander Koller, Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data, Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 5185–5198. | Presentation slides due midnight Monday (for presenting team); Paper summary + a discussion question for the team due midnight Monday (for non-presenting students); Group presentation and discussion
7 | Oct 13 | | Group presentation and discussion
8 | Oct 18 | Fall Break (No Class) |
8 | Oct 20 | |
9 | Oct 25 | Required Reading: Kuchnik, M., Klimovic, A., Simsa, J., Smith, V., & Amvrosiadis, G. (2022). Plumber: Diagnosing and Removing Performance Bottlenecks in Machine Learning Data Pipelines. Proceedings of Machine Learning and Systems, 4, 33-51. | Presentation slides due midnight Monday (for presenting team); Paper summary + a discussion question for the team due midnight Monday (for non-presenting students); Group presentation and discussion
9 | Oct 27 | | Group presentation and discussion

Part II: Surprise Paper Session, Literature Review, and Capstone Review
10: Surprise Paper Session I | Nov 1 | Base paper: ; Surprise paper: | Comparative Analysis Due
10 | Nov 3 | | Learning Group Presentation
11: Surprise Paper Session II | Nov 8 | Base paper: ; Surprise paper: | Comparative Analysis Due
11 | Nov 10 | | Learning Group Presentation
12: Surprise Paper Session III | Nov 15 | Base paper: ; Surprise paper: | Practice Literature Survey Due
12 | Nov 17 | | Learning Group Presentation
13 | Nov 22 | No Class - Happy Long Thanksgiving Break |
13 | Nov 23 - Nov 25 | Thanksgiving Break |
14 | Nov 29 | Guest Speaker |
14 | Dec 1 | Guest Speaker |
15 | Dec 6 | Attend 11-632 Capstone Presentation (No Class) | Capstone Review Due
15 | Dec 8 | | Reproducibility Challenge Due
16 | Dec 13 | Final exam week (No Class) | Literature Review Due
16 | Dec 15 | | Capstone Final Presentation Review Due
16 | Dec 21 | | Grades Due