What I learned during my 14-Day Fraud Detection Challenge

The problem: Do you have many topics you would like to study or areas of coding you want to improve, but you can’t decide which one to focus on, so you end up studying a little bit of everything and never make noticeable progress? Or do you end up so overwhelmed that you just procrastinate by watching YouTube?

If not, then you’re one of the lucky ones, because I definitely experience this a lot.

The classic: Overwhelmed to procrastination

Let’s see what I did to overcome this and what I did wrong in the process, so you can learn from my mistakes and do it more efficiently.


My solution: setting myself a challenge

At the start of February, I decided to overcome this decision paralysis by setting myself a challenge with a fixed time frame: go through all stages of a simple Kaggle data science project in 14 days. The advantage of a fixed time frame is that the perceived risk of “What if I’m focusing on the wrong thing?” is small, since you are only committing to it for 14 days.

The “rules” I chose for myself:

  • work on it every single day
  • send out an email update about my progress to hold myself accountable and also share a realistic portrayal of daily Data Science work
  • not look at previous Kaggle solutions for this dataset
  • go through all stages of a Data Science project as much as the time allows:
    • Data Understanding
    • Data Exploration
    • Feature Transformation
    • Modelling
    • Evaluation

Side note: I also worked a normal 9-5 job during that time. And it was deadline week…

What ended up happening:

  • I had to extend the challenge by almost a week, since deadline week killed my free time and energy. I had the choice between ending the data science project without any sort of working model or extending the time, and I chose the latter.
  • I sent out 6 instead of 14 emails
  • I did finish the project with a decent model that was at least partly optimized
  • I learned a LOT

If you want to see the full code and all updates, click here.

Organizational hurdles and what I would do differently in the next challenge

Not defining clear hours I would work on the project

“Work will always expand to fill the time allotted for its completion” – One of the many wordings of Parkinson’s Law

I recognized that when I set myself the 14 days as a deadline. What I did not think through, however, was when exactly I would work on the project alongside my job and other commitments. If I had sat down and written down:

“I will work on this project 1 hour per day (plus the time for writing email updates)”

then I would have likely realized that

  1. I needed to set aside an hour every day and that this might be unrealistic on some days
  2. just 14 hours for a full project was not enough

But I was too excited about the idea of this project, skipped this crucial planning step and it almost derailed the project completely.

Too much overhead: Writing an email update every single day

In theory, the email update was a nice idea: Document for myself and others what I achieved that day. In practice, I underestimated the time it would take to write such an email. Since I included code examples and visualizations of the data, it took about an hour on average just to write this email.

Now that I know this, I would plan fewer written updates. An alternative would be shorter, more frequent updates on social media. I think a good ratio would be 3:1: 3 hours of work to 1 hour of documenting and sharing. Documenting forces me to reflect on my progress and helps me make better decisions for future work.
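To make that ratio concrete, here is a minimal sketch of the planning step I skipped. The function name and its parameters are made up for illustration and are not part of the project code; the inputs are the numbers from above (14 days, 1 hour per day, about 1 hour per email update).

```python
def plan_budget(days, hours_per_day, work_to_docs_ratio=3.0, hours_per_update=1.0):
    """Split a fixed time budget into work hours and documentation hours.

    All names here are hypothetical; the 3:1 ratio and the 1 hour per
    update are the rough estimates from this post, not measured values.
    """
    total = days * hours_per_day
    # With a 3:1 ratio, documentation gets 1 part out of 4.
    docs = total / (work_to_docs_ratio + 1)
    work = total - docs
    # How many full written updates fit into the documentation hours.
    updates = int(docs // hours_per_update)
    return {"total": total, "work": work, "docs": docs, "updates": updates}


budget = plan_budget(days=14, hours_per_day=1)
print(budget)
```

With these inputs, only 3.5 of the 14 hours go to documentation, which is enough for about 3 full email updates rather than 14.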

Not setting measurable goals

Maybe the work was too much for 2 weeks, but a clear obstacle to progress was not defining when each of the stages of data science would be “done”. When you’re a perfectionist like me, there is always more that could be done.

So in the future, I would probably choose a smaller scope from the beginning, for example focusing only on optimizing the model for one week, and I would carefully define what has to be done for each stage to count as completed.

Examples: “Create 3 types of visualizations for data overview”, “Train 2 different models (no matter how good they are) for the modeling phase”

What I would do again in a future challenge

To end all of this on a positive note, here are 3 quick things that went well and that I would do again:

  • share my progress: It was scary, but I would absolutely recommend sharing your work. Not just making your GitHub public but also talking about what works well and what doesn’t. I got some great tips (like the imbalanced-learn library) from comments on my status updates.
  • set a time limit: Even though I had some issues around time management, I learned so much about how long things take and frankly, I would have never started or finished this without knowing that it had a defined “end” point. I would 100% recommend it as a way to get from “one day” thinking to actually doing.
  • finish it: I’m super proud I finished, even though I had to make some adjustments along the way. This was a challenge, and this means the outcome will likely not be perfect. But that’s okay, I can always improve my skills and results in the next challenge. And there’s nothing worse than having unfinished projects lying around.
