In this monthly post, I tell you what I plan to study or improve on in the area of machine learning (including an update at the end of the month).
Last month was a bit … non-optimal. So this month I’m learning from my experience and going back to practical goals despite loving theory. Let’s jump right in.
Oh, and if you want to receive weekly updates on how I’m doing with these goals, consider subscribing to my newsletter 😉
Table of Contents
- Goal 1: Anomaly Detection with Neural Networks
- Goal 2: Kaggle Playground Challenge for November
- Previous month’s study goals & results
Goal 1: Anomaly Detection with Neural Networks
At work I've spent the last few months learning about anomaly detection methods. Even though the task is related to basic methods like clustering or classification, the rarity of anomalies in the dataset gives this problem an interesting spin that I find fascinating.
The first methods I used for this problem were fairly basic algorithms, but many of them, like the Local Outlier Factor, run into trouble as the data grows and more dimensions are added, because distances and densities become deceptive in very high-dimensional space.
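For context, here's roughly what using one of those basic methods, the Local Outlier Factor from scikit-learn, looks like. This is just an illustrative sketch on synthetic data, not the setup from work:

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Synthetic data: mostly "normal" points plus a few injected outliers
rng = np.random.default_rng(42)
normal = rng.normal(loc=0.0, scale=1.0, size=(1000, 5))
outliers = rng.uniform(low=-8.0, high=8.0, size=(10, 5))
X = np.vstack([normal, outliers])

# LOF compares each point's local density to that of its neighbors;
# points sitting in much sparser regions than their neighbors get flagged
lof = LocalOutlierFactor(n_neighbors=20, contamination=0.01)
labels = lof.fit_predict(X)             # 1 = inlier, -1 = outlier
scores = -lof.negative_outlier_factor_  # larger score = more anomalous

print(f"Flagged {np.sum(labels == -1)} of {len(X)} points as outliers")
```

This works nicely on small, low-dimensional data, but it relies on neighborhood distances, which is exactly what breaks down as the dataset and the number of features grow.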
Since large datasets are typically a good fit for neural networks, not least because they can process the data iteratively in batches instead of all at once, I want to do some research and find out which strategies exist for anomaly detection within deep learning.
Questions I want to answer:
- Which models or methods exist for anomaly detection with neural networks?
- How easy is it to interpret the results? Interpretability is a major concern in anomaly detection use cases like fraud detection.
Maybe I'll even get to play around with a dataset.
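One strategy I expect to come across in this research is training an autoencoder on (mostly) normal data and using the reconstruction error as an anomaly score. Here's a rough, self-contained sketch of that idea in PyTorch; the tiny architecture and the random data are purely illustrative, not a recommendation:

```python
import torch
from torch import nn

# A tiny autoencoder: it learns to compress and reconstruct "normal" data,
# so unusual points tend to produce a high reconstruction error (the anomaly score).
class AutoEncoder(nn.Module):
    def __init__(self, n_features: int, bottleneck: int = 2):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_features, 16), nn.ReLU(),
            nn.Linear(16, bottleneck),
        )
        self.decoder = nn.Sequential(
            nn.Linear(bottleneck, 16), nn.ReLU(),
            nn.Linear(16, n_features),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

# Mostly normal training data (standard normal), for illustration only
X_train = torch.randn(1000, 5)

model = AutoEncoder(n_features=5)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Training in mini-batches is what lets this approach scale to large data
for epoch in range(20):
    for batch in torch.split(X_train, 64):
        optimizer.zero_grad()
        loss = loss_fn(model(batch), batch)
        loss.backward()
        optimizer.step()

# Anomaly score = per-sample reconstruction error on new data
X_new = torch.cat([torch.randn(5, 5), torch.full((2, 5), 8.0)])  # 2 obvious outliers
with torch.no_grad():
    errors = ((model(X_new) - X_new) ** 2).mean(dim=1)
print(errors)  # the two outliers should show much larger errors
```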
Goal 2: Kaggle Playground Challenge for November
If you've read some of my previous goal posts, you know that I'm a massive fan of the monthly Tabular Playground Series challenge on Kaggle, which is aimed at beginners in the world of Kaggle challenges and typically features a synthetic dataset built around a specific ML topic. I skipped the last few challenges because they didn't align with the areas I wanted to study, but this month I'm back.
This one is about ensemble models, a technique that combines the results of multiple classifiers to produce more robust predictions. A well-known example of an ensemble method is the Random Forest. I'm excited to dive a bit deeper into the topic (there's a small sketch of the idea after the list below) and
- read 2 notebooks of other participants to learn new ideas
- submit 2 solutions myself, no matter how bad
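To make the ensemble idea concrete, here's a minimal sketch with scikit-learn: a soft-voting ensemble that averages the predicted probabilities of a Random Forest and two other classifiers. The generated dataset and the chosen models are placeholders, not the actual challenge data or my planned approach:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Placeholder data standing in for the challenge's tabular dataset
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# A Random Forest is itself an ensemble (many decision trees, results averaged);
# a VotingClassifier goes one step further and combines different model types.
ensemble = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
        ("lr", LogisticRegression(max_iter=1000)),
        ("knn", KNeighborsClassifier(n_neighbors=15)),
    ],
    voting="soft",  # average predicted probabilities instead of hard votes
)

score = cross_val_score(ensemble, X, y, cv=5).mean()
print(f"Mean CV accuracy of the ensemble: {score:.3f}")
```

The appeal of this kind of combination is that the individual models make different mistakes, so averaging their outputs tends to smooth out the errors of any single one.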