Kaggle Playground Series, August 2022: What I learned

I participated in this month’s beginner’s challenge, which Kaggle releases every month on a simulated dataset. The August 2022 challenge provides simulated data from a fictional product test series. Given the measured data, the task is to predict whether the product will fail in each case. In this post……

How is Data Science different from Machine Learning?

I have been working in a “Data Science” consulting team for 8 months now. Before that I got a Master’s degree in Computer Science with a machine learning specialization. So I could argue that I have seen both sides of the coin here and I have noticed some differences. Disclaimer: You will have trouble finding……

First project as a Data Science Consultant: tasks, tools, meetings

Consulting can be a frustrating business to enter as a newcomer, because every time you ask a consultant a question about their typical work, they tend to answer with “It depends.” That is because a lot of our day-to-day depends not only on the specific data science consulting project but also on the client. So after recently completing……

How to build a Decision Tree for Classification with Python

As promised in my July 2022 Machine Learning Study Plans, here is content on decision trees. Specifically, let’s talk about how you can build a trained decision tree for a classification problem with the Python library Scikit-Learn. I will also address what steps you need to take before using the example dataset in terms of……

One Hot Encoding – How to deal with categorical data in Machine Learning

Many models in machine learning don’t work with categorical data. So what do we do in that case? Of course you can always just remove the categorical features, but you would lose a lot of valuable information. So in this post, I share how you can use one hot encoding to make that information usable. I stumbled……
