What do I want to learn this month?
I liked doing this last month, so let’s keep the tradition of sharing my study goals going.
Quick context: I already have university degrees and I started a Data Science job in January that I enjoy, so my goals are largely internally motivated. The number one goal is to pick things I’m genuinely excited about, to maximize the likelihood that I actually make time for this in my free time.
Table of Contents
- Goal 1: This month’s Kaggle Playground Challenge
- Goal 2: Random Forest – Applying the algorithm to a dataset
- Bonus-Goal: Read boosting theory to get an overview of the general idea
- Update: largely successful
Goal 1: This month’s Kaggle Playground Challenge
After the very positive experience with last month’s July challenge, I’m putting this month’s challenge on the list again.
Here are a few reasons why I want to participate again:
- It has a built-in deadline that makes sure I actually wrap things up.
- Code from other Kagglers gives me a reference point for what to try if I’m clueless (which is likely).
- My colleagues expressed interest in my experiments and discoveries of new techniques last time, so hopefully I can share some knowledge again this month.
- Some colleagues even said they would try the challenge this month too!
The goals are the same as last time:
- Submitting at least two different solutions
- Reading at least two notebooks from other Kagglers and considering what they did to improve their solutions
Goal 2: Random Forest – Applying the algorithm to a dataset
I’m still on a quest to understand the field of tree-based algorithms. Last month I covered basic decision trees; this time I want to break into Random Forests.
Read this blog post on why I’m interested in tree-based algorithms.
The goal here is to use the random forest class from scikit-learn on a dataset. This might actually tie in nicely with the August Kaggle Playground challenge above.
In the process I will learn:
- how to pre-process the data to work with the algorithm
- side note: see the post on One-Hot Encoding that I wrote while learning Decision Trees
- how to evaluate the model
- how to tweak the most important hyperparameters
I’ll leave open how much theory I’ll cover, so I don’t overwhelm myself. But since I’m always curious how things actually work, I’ll probably end up reading some theory along the way anyway.
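To make this a bit more concrete, here’s a minimal sketch of what such an experiment could look like, assuming a small tabular classification problem. The toy dataset and column names are completely made up for illustration; in practice I’d plug in the Kaggle challenge data (and its metric) instead.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Made-up toy data: one numeric and one categorical feature plus a binary target.
df = pd.DataFrame({
    "age": [22, 35, 58, 41, 29, 63, 37, 50],
    "contract": ["monthly", "yearly", "monthly", "yearly",
                 "monthly", "yearly", "monthly", "yearly"],
    "churned": [1, 0, 1, 0, 1, 0, 0, 0],
})

# Pre-processing: random forests in scikit-learn need numeric inputs,
# so the categorical column is one-hot encoded first.
X = pd.get_dummies(df.drop(columns="churned"), columns=["contract"])
y = df["churned"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Two of the most commonly tuned hyperparameters: the number of trees
# and how deep each tree may grow.
model = RandomForestClassifier(n_estimators=200, max_depth=3, random_state=42)
model.fit(X_train, y_train)

# Evaluation: accuracy on the held-out split (a Kaggle challenge would
# prescribe its own metric instead).
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))
```

The two hyperparameters shown, n_estimators and max_depth, are just obvious starting points; figuring out which ones actually matter is part of the goal.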
Bonus-Goal: Read boosting theory to get an overview of the general idea
The next step on my tree journey, after random forests, is boosting. I’m not sure exactly how to progress here, so my first step is to do some reading. This will include some book chapters (like those in The Elements of Statistical Learning) and potentially a paper about it.
After that I will have more knowledge about the topic and can decide how to tackle experimenting with or applying it. That includes choosing between the many implementations of boosted trees, like LightGBM, XGBoost, etc.
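I haven’t done the reading yet, so take this with a grain of salt, but my rough understanding of the general idea is: build an ensemble sequentially, where each new weak learner is trained to correct the errors of the ensemble so far. A hand-rolled sketch of that idea for squared-error regression (not how LightGBM or XGBoost actually implement it) could look like this:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy regression data: a noisy sine curve.
rng = np.random.default_rng(0)
X = np.linspace(0, 6, 100).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=100)

# Boosting-style loop with squared error: each shallow tree is fit to the
# residuals (errors) of the ensemble built so far, then added with a small
# learning rate.
learning_rate = 0.1
prediction = np.full_like(y, y.mean())
trees = []
for _ in range(50):
    residuals = y - prediction
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)

print("mean squared error:", np.mean((y - prediction) ** 2))
```

Whether that picture survives contact with the actual theory is exactly what the reading is for.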
Update: largely successful
I did indeed participate in the Kaggle challenge, and afterwards I even got to talk with a colleague who also participated! While talking about it, we inspired another coworker to want to try it next time as well.
You can read more details about my results and what I learned in this blog post: Kaggle Playground Series, August 2022: What I learned
My results weren’t super great (read: they were bad), but I did learn quite a bit about the process that will benefit me in future problems.
I also used a random forest in my attempt, so that’s 2/2 on goals. I did not read about boosting, however.
Let me know what you’re studying this month or if you are also participating in this month’s Kaggle challenge in the comments!