Homework

Literature Review

Finding teammates and selecting papers for review & literature survey

To prepare for the review and literature surveys, you must find 4 teammates, select a topic for your literature survey, and select 4 papers that are related. Write a short justification for why you chose this topic, and why you chose each paper. Also describe each of the teammates’ expertise.

Using the Google or Canvas form, give short explanations:

What is the topic of your literature survey? Choose broadly.
Write 4-5 sentences of introduction for this topic, including motivation (why it’s important as a subject of study).
For each paper, write a 2-3 sentence description of its relevance to the topic.
For each team member, write a 2-3 sentence background on their expertise and interest with respect to the topic of study.

We will randomly assign each team member a paper from the list.

Note, you are required to select recent (2020 or newer) and published (not arxiv-only) papers. Additionally, you cannot select survey papers or position papers. If you have doubts about the papers you select, please post on Piazza.

Paper review

The review should consist of (adapted from the ARR reviewing guide):

A short summary of the paper (5-7 sentences), written as a neutral, dispassionate summary of the research question and findings/contributions. Make sure you acknowledge all the contributions that you believe the paper is making: experimental evidence, replication, framing of a new question, artifacts that can be used in future work (models, resources, code), literature review, establishing new cross-disciplinary connections, conceptual developments, theoretical arguments. A paper may make several contributions, and not all of them need to be equally strong. You should state in your own words what you see as contributions of the paper, rather than copy/paste it from the abstract.
Strengths of the paper: even if you fundamentally disagree with a paper, it is important to accurately state all the best aspects of it. Once again, the strengths may come in many different forms: an engineering solution, framing an important issue, a literature review, a useful artifact (a model or a resource), a conceptual development, a reproduction. Performance improvements or complex math are by themselves neither necessary nor sufficient. It should be clear in what way the study advances the field: what did we learn from it that we did not know before? What can we do that we could not do before?
Weaknesses of the paper: here, list the aspects of the paper that could be improved, of which there could be many.
- There may be claims that are not actually supported by the evidence or by the arguments, but that are presented as conclusions rather than as hypotheses/discussion. The framing may be misleading. There may be obvious methodological flaws (e.g., only the best run results are reported), errors in the proofs, in the implementation, or in the analysis. There may be insufficient detail to understand what was done or how to reproduce the method and the results. There may be a lack of clarity about what the research question is (even if it is “Does system A work better than system B”?), what was done, why, and what was the conclusion. The paper should also make it clear in what way the findings and/or the released artifacts advance the field.
- A common reviewer mistake is confusing “must-haves” (weaknesses) with “nice-to-haves” (often, possible follow up or alternative experiments). Any project has limited time and pages, and it is always possible to think of more follow-up experiments. As long as enough work was done to prove the claims that the authors are making, any extra experiments are in the “nice-to-have” category, and not a weakness as such.
- Note, try to focus on the conceptual, technical, and methodological weaknesses. Do not overly focus on how the paper is written, terminology, or clarity issues, unless they really make the paper harder to read (this is somewhat less likely since the papers all got into conferences, so presumably they should be somewhat understandable). Clarity issues often fall in the “nice-to-have” category (i.e., you think the paper would be nicer if it were written differently, but that doesn’t make the structure a huge weakness).
Future directions and remaining open questions: to get you started thinking about your capstone, we want you to think about remaining open questions with respect to the broader goal of the paper, as well as any future directions you can think of. These could include methodological changes or improvements to the method, adaptations to new domains, follow up experiments to run, etc. Make sure to mention follow up experiments that would shed important light onto the paper’s main research goals, and explain why the follow up directions would be required (e.g., avoid simply saying “they should try it on another dataset”; instead, motivate why the paper’s main research question would benefit from another dataset).

You must turn in your review using LaTeX with BibTeX, using the ARR style format (LaTeX templates, also available as an Overleaf template).

Practice literature review

The goal of this assignment is to develop your skills in reviewing and synthesizing academic literature in your subfield of data science. You will select a task and topic area, choose 4 related recently published research papers, and summarize and compare their methodologies, findings, and contributions. Additionally, you will identify and discuss any gaps or open questions that remain in the research. The review should be comprehensive yet concise, spanning 2 to 3 pages.

Read and Analyze Your Chosen Papers:
- Thoroughly read each paper, taking notes on key points such as:
  - Research objectives and questions.
  - Methodologies and techniques used.
  - Key findings and results.
  - Contributions to the field.
  - Limitations and future research directions.
Summarize the Papers:
- Write a brief summary for each of the four papers. Each summary should include:
  - A concise overview of the research problem and objectives.
  - Description of the methods and techniques employed.
  - Summary of the main findings and conclusions.
  - Discussion of the paper’s contributions to the field.
Compare and Contrast the Papers Along Various Dimensions:
- Analyze the similarities and differences among the papers in terms of the following dimensions:
  - Research questions and objectives.
  - Methodological approaches.
  - Dataset, domain, scope.
  - Key findings and results.
  - Contributions and impact on the field.
- Identify common themes, patterns, and trends that emerge from the comparison.
Discuss Open Questions and Future Directions:
- Highlight any gaps or open questions that the papers leave unanswered.
- Discuss potential areas for future research based on the identified gaps.
- Reflect on how addressing these questions could advance your subfield.
Write the Structured Review:
- Organize your review into a coherent and logical structure, according to this structure:
  - Introduction: Introduce the topic area and the importance of the selected task within data science. Provide a brief overview of the four papers you will review.
  - Summaries: Provide individual summaries of the four papers, highlighting the details of each paper’s approach that you will focus on in the comparison part. Suggestion: avoid writing more than 1/4 page (1/2 column) for each summary, and use the \paragraph command for each new paper.
  - Comparison: Compare and contrast the methodologies, findings, domains, and contributions of the papers. Suggestion: have one subsection for each of the dimensions.
  - Open Questions: Discuss the gaps and open questions that remain.
  - Conclusion: Summarize the main points of your review and suggest directions for future research.
Format and Submission:
- The review should be between 2 and 3 pages in length, not including references.
- Use LaTeX with BibTeX, using the ARR style format (LaTeX templates, also available as an Overleaf template).
- Proofread your review to ensure clarity, coherence, and correctness.
- Submit your assignment by the specified deadline.

Some notes:

You can and are encouraged to cite more than just the required papers, especially in the introduction (e.g., to motivate the existence of the research area) and future directions sections (e.g., to give ideas of how to address open gaps).
You can include at most one figure and one table in your write-up. They must bring in useful information that isn’t better written in text.
Example published literature reviews (these are useful for learning how to frame reviews; we do not expect as much work as went into these published reviews):
- Human-centered/NLP: https://aclanthology.org/P19-1159.pdf
- Analytics/NLP: https://aclanthology.org/2020.coling-main.247.pdf
- Systems: https://ietresearch.onlinelibrary.wiley.com/doi/full/10.1049/cdt2.12016
- Data science/ML: https://arxiv.org/abs/2402.16827
- LLMs/Systems: https://arxiv.org/abs/2312.03863

Rubric:

2 points - Formatting & turning in assignment: did the student turn in the assignment on time? Does it use LaTeX and fit within the page limit?
3 points - Coherent choice of papers: are the papers related to the same overall topic?
5 points - Introduction
- Does the review introduce the topic area well (with examples)? Does it motivate why the task is meaningful or important? Does it give an overview of the literature review?
12 points - Summaries
- For each paper: Does the review summarize the papers’ main task and overall topic? Does it summarize papers’ methodology, approach? Does it summarize the papers’ main results, findings, and takeaways?
20 points - Comparison. For each dimension:
- Does the review appropriately compare the similarities of the papers? Does the review contrast the papers appropriately? If some dimensions are not applicable, does the review mention it? Does the review go beyond simply restating what papers did, and instead tie the papers together thematically and relationally?
- Dimensions: Research questions and objectives; Methodological approaches; Dataset, domain, scope; Key findings and results.
10 points - Open Questions. Does the review appropriately outline the open questions that remain? Does the review mention future directions for next research projects towards tackling the broader task? Does the review provide possible methods for tackling these future directions? Does the review mention more than 2 remaining open questions?
3 points - Conclusion. Does the review appropriately summarize the task at hand and the papers examined? Does it briefly mention open questions or future work? Is the conclusion an appropriate length (~1/4-1/2 column)?

Total: 55 points

Literature review

The goal of this assignment is to further fine-tune your skills in reviewing and synthesizing academic literature in your subfield of data science. You will build on top of the existing practice literature review, adding 4 more papers and incorporating the comments from TAs. The final review should be 5-6 pages.

You should find 4 related papers, preferably ones that are cited in your other 4 papers, or that cite one/some of the 4 papers. Same as for the review, you are required to select recent (2020 or newer) and published (not arxiv-only) papers. Additionally, you cannot select survey papers or position papers. If you have doubts about the papers you select, please post on Piazza.

Rubric:

2 points - Formatting & turning in assignment: did the student turn in the assignment on time? Does it use LaTeX and fit within the page limit?
3 points - Coherent choice of papers: are the papers related to the same overall topic?
5 points - Introduction
- Does the review introduce the topic area well (with examples)? Does it motivate why the task is meaningful or important? Does it give an overview of the literature review?
24 points - Summaries
- For each paper: Does the review summarize the papers’ main task and overall topic? Does it summarize papers’ methodology, approach? Does it summarize the papers’ main results, findings, and takeaways?
40 points - Comparison. For each dimension:
- Does the review appropriately compare the similarities of the papers? Does the review contrast the papers appropriately? If some dimensions are not applicable, does the review mention it? Does the review go beyond simply restating what papers did, and instead tie the papers together thematically and relationally?
- Dimensions: Research questions and objectives; Methodological approaches; Dataset, domain, scope; Key findings and results.
20 points - Open Questions. Does the review appropriately outline the open questions that remain? Does the review mention future directions for next research projects towards tackling the broader task? Does the review provide possible methods for tackling these future directions? Does the review mention more than 2 remaining open questions?
6 points - Conclusion. Does the review appropriately summarize the task at hand and the papers examined? Does it briefly mention open questions or future work? Is the conclusion an appropriate length (~1/4-1/2 column)?

Total: 100 points

Presentation Tips

Here are some general tips for presenting a research paper, though some of these may not apply to your paper.

Introduction

Assume an adversarial crowd - assume they don’t care about your project
Motivate your project by explaining: why it matters, what real-world problem it might solve, etc.
Reel people in using a captivating example! Simplify your task and walk through an example so people really understand. High level data descriptions don’t give the listener a concrete idea of the data we’re looking at. Caveat: make sure the example is short; people shouldn’t be reading for more than 10-15 seconds.

Data description

Include interesting examples if it’s data that people are unfamiliar with or if the data has interesting properties
Include data statistics (with numbers and/or graphs)

Methodology

Diagrams, flowcharts, and drawings are much better than text! It often takes a while to come up with a good visualization for your model, but it can create a much more lasting impression than 5 equations.
Keep equations to a minimum (and don’t put more than 1 or 2 equations on one slide).
Also, using animation as you explain your models or algorithmic procedures can help people follow along as you are talking.
Use intuitive labels or icons.
- If it’s a vector, draw a narrow vertical rectangle
- Use logos or cliparts for articles, stories, people, etc.

Experimental Set-Up

Include what the train/dev/test split is.
Include what objective function you are optimizing and what metrics you will be evaluating

Results

Make visuals (tables and graphs) easy to follow with a clear takeaway message. Audiences should be able to look at tables and draw conclusions without having to interpret them on their own. You can also use bold font to make the best performing models more clear in tables.
Replace all tables with graphs/other types of visualizations (you will very likely lose points if you screenshot a table from the paper). Check out these tips for data viz: http://guides.library.duke.edu/datavis/topten.
Make sure to title and label tables/axes/legends correctly.
Limit significant figures! p = 0.35 is much more legible than p = 0.346749362.
Include short takeaways from results (plots/tables)
Tell the audience what an ideal plot would look like to help understand the plot
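
The tip above about limiting significant figures can be applied automatically when preparing numbers for slides. The following is a minimal Python sketch, not a required tool; the helper names `round_sig` and `p_label` are hypothetical and introduced here only for illustration.

```python
def round_sig(value: float, sig: int = 2) -> str:
    """Format a number with at most `sig` significant figures.

    Uses Python's general ('g') format, which drops trailing zeros and
    switches to scientific notation for very large or very small values.
    """
    return f"{value:.{sig}g}"


def p_label(p: float, sig: int = 2) -> str:
    """Build a short, slide-friendly label for a p-value (hypothetical helper)."""
    return f"p = {round_sig(p, sig)}"


print(p_label(0.346749362))  # p = 0.35
```

Two significant figures is only an assumed default here; choose whatever precision your audience actually needs to draw the intended conclusion.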

General tips

Limit the number of words per slide as much as you can. Try writing out what you want to say first, then replace words with graphs/images/icons.
Rehearse your talk fully at least once; it helps debug structural and technical issues and helps you figure out how you’re doing on the time limit. This is especially helpful if you’re co-presenting.
Make sure you look up at the audience and not your slides (especially if those are behind you). Using speaker notes is fine but make sure you’re not reading them out loud.
If you’re pointing at something in the slide, try to highlight it either using a laser pointer or using animations (fade, red circles, etc).
Content warning: if you’re tackling a problem space that contains sensitive topics, offensive language, etc., please use a content warning, and refrain from using actual examples (e.g., use emojis or blurred-out text instead). Refrain from using slurs out loud or in your slides. Remember to keep your audience’s well-being in mind.

FAQ from practice presentations

Q: The [assigned] paper for the practice presentation is in total 42 pages long. Do we have to study the entire 42 pages for the presentation?
A: Thank you for asking. Although the article is 42 pages long, the main text is only around 10 pages. The remainder consists of references and an appendix, which you should only consult if necessary.

Q: Wanted to confirm if the 15 minutes of practice presentation includes the 5 minutes for questions? (because I see in the Excel file for the paper assignments that there is no separate 5 minutes slot for questions)
A: The practice presentation must be 13-15 minutes with no questions from the audience, so that we can get through all slots!

Q: I was wondering - what should the title slide look like? What information should it include in addition to the name of the paper?
A: The first slide can include details such as the title of the paper, your name, the date of the presentation, and other relevant information, if any. It is also generally good practice to add the authors’ names to the title slide to give due credit. If there are too many authors, just the first author’s surname followed by “et al.” is also okay. If there is space left, you can also mention the venue, but it’s not necessary. Further, affiliation logos are a good idea. Since you are from LTI, placing the LTI and CMU logos in one corner is often considered the norm.

Q: I know that we were told to recreate the graphs and not use the graphs directly from the papers, but some of the graphs are hard to recreate and quite important to include in the slides. Can it be an exception if we cite it properly?
A: Yes, you can use such plots, as long as it is explicitly mentioned in the slide and duly cited.

Q: Wanted to check if we are allowed to use Microsoft Designer Tool (will it be considered AIV)? I believe the tool might be AI powered. It will not generate content, but will help create slide designs.
A: If it is based on your content, and the image/art generated is reasonably general (for example, a timeline image based on your text that included a timeline), it should be okay. However, if you ask it to generate a very specific image (say, something similar to what DallE or other diffusion models would generate), then I would refrain from doing so. It is important to evaluate how much novelty the designer is really adding to the image before using it. Furthermore, you should consider the need for the designer altogether, as it often creates very unnecessarily flashy and marketing-y looking designs which may actually distract from the content rather than enhance it.

Q: Do I need to cite the formulas from the paper when I use them? Is it okay to take a screenshot of the formula and include it in my slide like the one below? For some reason, it is clearer than the ones I typed out using latex myself.
A: For equations, you do not have to cite the paper if you’re screenshotting the equation. However, equations can be quite dry and boring and overwhelming to look at, so it’s hard to make them interesting in a slide… if you just copy-paste the equation but do nothing with it, it could fall under “wall of text” issues (esp. cause the variables need to be defined)… but if they have good animations, colors for different variables, etc., then that’s fine. So for equations it’s more about “is the equation easily understandable”, and often making them easily understandable requires some level of animation.

Q: I was wondering if there was a need to cite any icons we might get from free websites.
A: That depends on the licence under which the icons are made public. If they are free for use, then no. Else, it depends on the licence requirements and those should be followed properly.

Q: I was wondering if it is required to cite images not from academic sources. For example, I include an apple picture I find from Google, do I include the source?
A: Yes, you should ideally cite that image. Even better would be to check the licence under which it is available. If it can be freely used without a citation, then it’s okay not to cite it. If not, then a citation would be necessary. At the very least, include the link to the image at the bottom of the slide in a small font text box.

Q: How should we go about citing data that we use to recreate the paper’s figures?
A: If you use numbers from somewhere, just put a citation at the bottom of the slide. If instead it is an image that you use as a sub-part of the figure you are creating, then put a link to the image if it was taken from the internet (unless it is available for free use), or a citation of the source it was taken from.

Q: I am using Canva to draw a flowchart that I personally think will be beneficial for explaining. If I use the template and mention it in my slides, will it be considered AIV?
A: That won’t be AIV. Just be very explicit!