It would be great if our data were always ready to be fed into a machine learning model. However, that is not the case in most projects. Among the many possible issues, a likely one is that your data has missing values. Possible reasons for missing data are that perhaps the data was… Continue reading How to impute missing values in Python DataFrames
I have been working in a “Data Science” consulting team for 8 months now. Before that, I earned a Master’s degree in Computer Science with a machine learning specialization. So I could argue that I have seen both sides of the coin here, and I have noticed some differences. Disclaimer: You will have trouble finding… Continue reading How is Data Science different from Machine Learning?
Consulting can be a frustrating business to enter as a newcomer, because every time you ask consultants about their typical work, they tend to answer with “It depends.” That is because a lot of our day-to-day depends on the specific data science consulting project, but also on the client. So after recently completing… Continue reading First project as a Data Science Consultant: tasks, tools, meetings
What do I want to learn this month? I liked doing this last month, so let’s continue this tradition of sharing my study goals for now. Quick context: I already have university degrees, and in January I started a job in Data Science that I enjoy, so my goals are largely internally motivated. The number… Continue reading Machine Learning Study Goals – August 2022
Many models in machine learning don’t work with categorical data. So what do we do in that case? Of course, you can always just remove the categorical columns, but you would lose a lot of valuable information. So in this post, I share how you can use one hot encoding to make that information usable. I stumbled… Continue reading One Hot Encoding – How to deal with categorical data in Machine Learning
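To make the idea behind that teaser concrete, here is a minimal sketch of one hot encoding in plain Python. It is not taken from the post itself; the function name `one_hot_encode` and the color example are made up for illustration. Each distinct category becomes its own 0/1 column, so no information is thrown away.

```python
def one_hot_encode(values):
    """Return the sorted categories and one 0/1 row per input value.

    Hypothetical helper for illustration only, not code from the post.
    """
    # One output column per distinct category, in a stable (sorted) order.
    categories = sorted(set(values))
    # Each row has a 1 in the column of its own category and 0 elsewhere.
    rows = [
        [1 if value == category else 0 for category in categories]
        for value in values
    ]
    return categories, rows


categories, rows = one_hot_encode(["red", "green", "red", "blue"])
print(categories)  # ['blue', 'green', 'red']
for row in rows:
    print(row)
```

In a real project you would more likely reach for a library routine than write this by hand, but the resulting columns follow the same principle.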
I recently stumbled upon a research paper about decision trees that made me interested in decision trees for the first time in my life. Let me tell you why. My fascination with Neural Networks: During my master’s degree (2020-2021) in Computer Science, I developed a fascination for neural networks. But not because of their… Continue reading Are Decision Trees the Siblings of Neural Networks? – Interesting research
You have probably heard of the train-test-split in the context of machine learning, which is fairly intuitive. Show some examples to your model, let it learn, and then test it on other examples. But there is one more data split in common use: the train-validation-test split, where the validation part is sometimes achieved by using cross-validation instead.… Continue reading What is validation data used for? – Machine Learning Basics
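As a rough illustration of the three-way split mentioned in that teaser, here is a minimal sketch in plain Python. It is not code from the post; the function name `train_val_test_split`, the 60/20/20 default proportions, and the fixed seed are assumptions chosen for the example.

```python
import random


def train_val_test_split(examples, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle the examples and cut them into train, validation, and test lists.

    Hypothetical helper for illustration only; the fractions are assumptions.
    """
    if val_fraction < 0 or test_fraction < 0 or val_fraction + test_fraction >= 1:
        raise ValueError("fractions must be non-negative and sum to less than 1")
    shuffled = list(examples)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


train, val, test = train_val_test_split(range(10))
print(len(train), len(val), len(test))  # 6 2 2
```

The model learns from `train`, you compare model variants and settings on `val`, and `test` is kept aside for a final check on examples the model has not seen.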
Summary of Chapter 1, called “The importance of context”, of the book Storytelling with Data. Book website. Book cover of the original book I’m talking about here; I would recommend it. I don’t have affiliate links, so please find it yourself in your favorite bookstore. Presenting data is a scientific and fact-based endeavor, right? “Storytelling with Data”… Continue reading How to present data in context – Chapter 1 Summary Storytelling with Data
There has been a lot of talk about “the sexiest job” of… I think it was back in 2012?! I’m talking about the role of Data Scientist. So a few people know what that is about. But what then is a Data Science Consultant?! Since I was recently hired into such a role, let… Continue reading I’m a Data Science Consultant – What even is that?