how to do research in 10 "easy" steps

Cutting to the chase…

💡Define/find a problem definition.
- Ideally your advisors / collaborators will provide this along with reference paper(s) to get started with.
👓 Do a literature review and establish baselines in terms of existing methods and benchmark datasets.
- Do 1-page/slide summaries of papers you read (Ref: “How to Read a Paper”). If there is a latest survey paper available, great! Start with that and save yourself some time.
- If it is an existing problem definition, there will be pre-existing datasets in the literature. If it is related-but-not-exactly-the-same problem definition, there will be related datasets (duh!). If it is custom/proprietary data, your advisors/collaborators should have provided you with it in step 1.
💭 Brainstorm contributions that can be made by you towards “solving” the problem definition to improve upon the baseline methods and/or benchmark datasets and/or evaluation/performance metrics to measure.
- Methods contribution could be a novel method or an improvement of an existing method for scale/speed/robustness/explainability/etc.
- Datasets contribution could be a new dataset that would be valuable to current or future methods contributions for the problem definition or improving existing benchmarks in some meaningful way.
- Evaluation/performance metrics contribution could be new (or modification of existing) measures of how well the methods are doing towards solving the problem.
💁 Present the literature review and contributions in context of the problem definition.
- Ideally listeners will tell you about the papers you might have missed and poke holes in your understanding of some literature (if they have read the papers too or are familiar with the topic) as well as your proposed contributions.
😩 Establish baseline methods performance on benchmark datasets.
- Yes, you have to find implementations of baseline methods or implement them from scratch yourself and actually run the experiments to reproduce the reported performance numbers of baselines. DO NOT simply copy results tables from literature. Steps 1-5 can result in a survey paper.
😀 Realize contribution(s).
- Define a new method and/or apply a previously unapplied set of existing methods to realize the proposed methods contributions.
- Apply a suite of established methods to the new datasets to realize the proposed datasets contributions.
- Report performance on the new evaluation/performance metrics to realize the proposed evaluation/performance metrics contributions.
💪 Verify correctness and compare against baselines.
- Best to start with synthetic datasets to crash-test ideas and verify correctness of implementation(s).
- Usually a massive grid search is warranted at this stage to compare-and-contrast.
👓 Review Results.
- Hopefully, the performance of the proposed solution is better and the proposed contributions are realized 😌. If not, go back to step 6 and improve the method 😧 or go back to step 3 and modify the contributions 😮 or, last resort, go back to step 1 to modify the problem definition and start over 💀.
- Silver Lining: If your literature review wasn’t too narrow it will still be useful if you have to start over with a modified problem definition. Worst case, you’ll at least knock out an interesting survey paper if you did a good lit. review and if the problem definition is a currently hot topic.
💁 Present results and get feedback from colleagues and collaborators.
- Ideally listeners will poke holes in your experimental setup, sanity check your claims. They may suggest more experiments to make the claims stronger.
🐌 Write up a paper. Then go smoke a joint and rinse and repeat.
- You should already have an Overleaf repo in Step 1 in a generic conference format with the typical sections of intro, background and motivation, proposed contributions and methodology, results, conclusion and future directions. This document will initially serve as a place to store your daily/weekly notes/drafts and over time it evolves into the final paper. Trust me, it can be quite overwhelming to start from a blank page at the end.

(Disclaimer: The above is in the context of methods/applied research. I don’t know anything about theory research. I guess those folks begin by sacrificing a goat on a full moon, then do peyote and wait for some novel theory about some deep problem to pop into their mind… or more realistically go borrow some ideas from math and physics from 50 years ago. But then again, if you could read those math/physics papers and effectively translate their ideas into our domain, you wouldn’t be here reading this right now, would you?👀)