I recently stumbled upon a research paper about decision trees that, for the first time in my life, made me genuinely interested in them. Let me tell you why.
Table of Contents
- My fascination with Neural Networks
- Decision Trees are more popular than Neural Networks
- Similarities between Decision Trees and Neural Networks
- The research paper: Adversarial examples and robustness
My fascination with Neural Networks
During my master’s degree in Computer Science (2020-2021), I developed a fascination with neural networks. Not because of the current hype or the tasks they can solve now, but because, despite all the recognition and attention they get these days, people still did not understand how these models learn anything at all! And so my descent into theoretical research papers began, culminating in my master’s thesis on exactly this topic. Descending, that is. Because my thesis was about Stochastic Gradient Descent. Get it? Anyways…
Decision Trees are more popular than Neural Networks
Now I’m in my first year of an industry job away from research, and to my dismay, neural networks are not very popular for common machine learning and data science use cases, at least outside of purely AI-focused companies. I work at a consulting company that offers IT services to mid-sized companies. These companies often simply don’t have the huge amounts of data needed to make use of neural networks, especially not image or sequential data, where neural networks shine. Instead, they have small amounts of tabular data (often in Excel sheets or old database designs, but that’s another topic).
Another problem is the black-box nature of neural networks. Companies want to understand the decisions of the models they employ and often need to justify these decisions to their managers or stakeholders. “Because the AI said so” is obviously not a great argument.
Decision trees, on the other hand, have the huge advantage of being very interpretable. Take the trained tree below, which predicts whether a patient is at high risk of heart disease. It is immediately apparent that the first feature the tree looks at, and thus arguably the most important one, is the ST slope (a medical term for the slope of the ST segment of an electrocardiogram).
Unfortunately, decision trees also seem fairly boring, possibly because they are so darn interpretable. It seems super obvious how they work, with no mystery whatsoever. Good for industry decisions, bad for curious students and researchers. Or is there more to them? Let’s first look at their surface-level connections to neural networks:
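To make the interpretability claim a bit more concrete, here is a minimal sketch of how such a tree is grown and then read back as plain if/else rules. It is a toy CART-style learner using Gini impurity, written from scratch for illustration; it is not the library or the data behind the figure. The feature names st_slope_up (1 if the ST slope is upsloping) and age, as well as the eight "patients", are made up.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a non-empty list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def partition(X, y, f, t):
    """Split samples into those with X[i][f] <= t (left) and the rest (right)."""
    left = [i for i, row in enumerate(X) if row[f] <= t]
    right = [i for i, row in enumerate(X) if row[f] > t]
    return ([X[i] for i in left], [y[i] for i in left]), ([X[i] for i in right], [y[i] for i in right])


def best_split(X, y):
    """Return the (feature, threshold) with the lowest weighted Gini impurity, or None if no split helps."""
    best, best_score = None, gini(y)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            (_, y_left), (_, y_right) = partition(X, y, f, t)
            if not y_left or not y_right:
                continue
            score = (len(y_left) * gini(y_left) + len(y_right) * gini(y_right)) / len(y)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(X, y, depth=0, max_depth=2):
    """Grow a tree of nested dicts; each leaf stores the majority class of its samples."""
    split = best_split(X, y) if depth < max_depth else None
    if split is None:
        return {"leaf": Counter(y).most_common(1)[0][0]}
    f, t = split
    (X_left, y_left), (X_right, y_right) = partition(X, y, f, t)
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree(X_left, y_left, depth + 1, max_depth),
        "right": build_tree(X_right, y_right, depth + 1, max_depth),
    }


def predict(node, row):
    """Follow the splits from the root down to a leaf."""
    while "leaf" not in node:
        node = node["left"] if row[node["feature"]] <= node["threshold"] else node["right"]
    return node["leaf"]


def to_rules(node, names, indent=""):
    """Render the tree as nested, human-readable if/else rules."""
    if "leaf" in node:
        return f"{indent}predict {node['leaf']}\n"
    name, t = names[node["feature"]], node["threshold"]
    deeper = indent + "    "
    return (f"{indent}if {name} <= {t}:\n" + to_rules(node["left"], names, deeper)
            + f"{indent}else:\n" + to_rules(node["right"], names, deeper))


# Made-up toy data: [st_slope_up (1 = upsloping, 0 = otherwise), age] -> risk label.
X = [[1, 40], [1, 50], [1, 60], [1, 70], [0, 45], [0, 55], [0, 65], [0, 48]]
y = ["low", "low", "low", "high", "high", "high", "high", "high"]

TREE = build_tree(X, y)
RULES = to_rules(TREE, ["st_slope_up", "age"])
print(RULES)
# Prints:
# if st_slope_up <= 0:
#     predict high
# else:
#     if age <= 60:
#         predict low
#     else:
#         predict high
```

The whole "model" is just these few nested rules, which is exactly why a stakeholder can follow it.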
Similarities between Decision Trees and Neural Networks
First off, both have variants for regression as well as classification. That’s probably where the similarities end for basic decision trees. But wait, there’s more!
Decision trees are just the beginning. In most use cases, especially with larger amounts of data or complex problems, we instead use ensembles of many trees, built with techniques such as bagging (random forests being the best-known example) or boosting. There is even XGBoost (short for “Extreme Gradient Boosting”), a very competitive gradient boosting library that, along with neural networks, wins many of today’s machine learning competitions.
Gradient boosting, in turn, can be viewed as functional gradient descent, which opens up another parallel to neural networks, since those are typically trained with a variant of stochastic gradient descent. So here we come full circle to my area of interest.
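Bagging is simple enough to sketch in a few lines: train many weak models on bootstrap samples (drawn with replacement) of the training data and let them vote. The sketch below assumes decision stumps (trees with a single split) as base models and a made-up one-dimensional dataset; all function names are my own. A random forest would additionally pick a random subset of features at each split, which is omitted here.

```python
import random
from collections import Counter


def fit_stump(X, y):
    """Fit a one-split classifier (decision stump) by minimizing training errors."""
    majority = Counter(y).most_common(1)[0][0]
    # Fallback: no split at all, always predict the majority class.
    best = (sum(label != majority for label in y), None, None, majority, majority)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0]
            errors = sum(lab != left_label for lab in left) + sum(lab != right_label for lab in right)
            if errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    return best[1:]  # (feature, threshold, left_label, right_label)


def stump_predict(stump, row):
    f, t, left_label, right_label = stump
    if f is None:  # constant stump
        return left_label
    return left_label if row[f] <= t else right_label


def bagging(X, y, n_models=25, seed=0):
    """Train n_models stumps, each on a bootstrap sample of the same size as the data."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample row indices with replacement
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models


def ensemble_predict(models, row):
    """Majority vote over all bagged stumps."""
    votes = Counter(stump_predict(m, row) for m in models)
    return votes.most_common(1)[0][0]


# Made-up 1-D data: class 0 for x < 5, class 1 otherwise.
X = [[x] for x in range(10)]
y = [0] * 5 + [1] * 5
MODELS = bagging(X, y)
print([ensemble_predict(MODELS, [x]) for x in range(10)])
```

Each stump sees a slightly different resample, so the individual thresholds jitter, and the vote smooths that variance out.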
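Here is a minimal sketch of that view for squared loss, assuming one-dimensional inputs and regression stumps as weak learners (the function names and the toy target y = x² are my own choices for illustration). For the loss ½(y − F(x))², the negative gradient with respect to the model output F(x) is just the residual y − F(x), so each stage fits a stump to the current residuals and takes a small step in that direction, much like a gradient descent step with a learning rate. XGBoost refines this recipe (for example with second-order information and regularization), none of which is shown here.

```python
def fit_regression_stump(xs, targets):
    """Least-squares stump on 1-D inputs: predict the mean target on each side of a threshold."""
    best = None
    for t in sorted(set(xs))[:-1]:  # the largest value would leave the right side empty
        left = [r for x, r in zip(xs, targets) if x <= t]
        right = [r for x, r in zip(xs, targets) if x > t]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left) + sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    _, threshold, left_value, right_value = best

    def stump(x):
        return left_value if x <= threshold else right_value

    return stump


def gradient_boost(xs, ys, n_stages=50, learning_rate=0.1):
    """Gradient boosting for squared loss 0.5 * (y - F(x))**2 with regression stumps."""
    f0 = sum(ys) / len(ys)  # best constant model under squared loss
    preds = [f0] * len(xs)
    stumps, losses = [], []
    for _ in range(n_stages):
        # Negative gradient of the loss w.r.t. F(x_i) is the residual y_i - F(x_i).
        residuals = [y - p for y, p in zip(ys, preds)]
        h = fit_regression_stump(xs, residuals)
        stumps.append(h)
        # Functional gradient step: F <- F + learning_rate * h.
        preds = [p + learning_rate * h(x) for p, x in zip(preds, xs)]
        # Track the training mean squared error after each stage.
        losses.append(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))

    def model(x):
        return f0 + learning_rate * sum(h(x) for h in stumps)

    return model, losses


# Made-up toy regression problem: y = x^2 on 20 evenly spaced points.
XS = [i / 10 for i in range(20)]
YS = [x * x for x in XS]
MODEL, LOSSES = gradient_boost(XS, YS)
print(f"training MSE after stage 1: {LOSSES[0]:.4f}, after stage {len(LOSSES)}: {LOSSES[-1]:.4f}")
```

Because each stump is a least-squares fit to the residuals, every step with a learning rate between 0 and 2 can only lower the training loss, which is the functional analogue of taking a sufficiently small gradient step.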
The research paper: Adversarial examples and robustness
A common problem with neural networks is that they are very fickle. After training, they might make correct predictions on some examples, yet even slightly altering such an example (like changing a few pixels in an image, a difference too small to detect with the naked eye) can lead to a completely different prediction. These slightly altered inputs are called adversarial examples, and they can be used in malicious attacks on the model.
It seems that decision trees share this problem, as discussed in a 2019 research paper called “Robust Decision Trees Against Adversarial Examples” by Chen et al. Below I have included the beginning of its abstract, and it is easy to see how the authors piqued my interest with the very first sentence:
Although adversarial examples and model robustness have been extensively studied in the context of linear models and neural networks, research on this issue in tree-based models and how to make tree-based models robust against adversarial examples is still limited. In this paper, we show that tree based models are also vulnerable to adversarial examples and develop a novel algorithm to learn robust trees.
(Beginning of the abstract of “Robust Decision Trees Against Adversarial Examples”, Chen et al., 2019)
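To see why trees can be fooled too, it helps to remember that every leaf of a single tree covers an axis-aligned box of the input space, so the smallest L∞ perturbation that changes the prediction can be found by brute force: clamp the input into every box with a different label and keep the closest result. The sketch below is my own illustration of this idea on a hypothetical hand-written tree; it is not the method from the paper (which is about training robust trees). For ensembles of many trees the number of leaf combinations explodes, so this brute force does not scale.

```python
import math

# Hypothetical hand-written tree over two features x[0] and x[1]; nested dicts, leaves hold labels.
TREE = {
    "feature": 0, "threshold": 0.5,
    "left": {"leaf": "low"},
    "right": {
        "feature": 1, "threshold": 60.0,
        "left": {"leaf": "low"},
        "right": {"leaf": "high"},
    },
}


def predict(node, x):
    while "leaf" not in node:
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node["leaf"]


def leaf_boxes(node, lo, hi, out):
    """Collect (label, lo, hi) per leaf; the leaf covers lo[f] < x[f] <= hi[f] for every feature f."""
    if "leaf" in node:
        out.append((node["leaf"], lo, hi))
        return out
    f, t = node["feature"], node["threshold"]
    leaf_boxes(node["left"], lo, hi[:f] + [min(hi[f], t)] + hi[f + 1:], out)   # x[f] <= t
    leaf_boxes(node["right"], lo[:f] + [max(lo[f], t)] + lo[f + 1:], hi, out)  # x[f] > t
    return out


def min_linf_attack(node, x, eps=1e-6):
    """Smallest L-infinity perturbation (up to eps) that changes a single tree's prediction."""
    label = predict(node, x)
    best = None
    inf = [math.inf] * len(x)
    for leaf_label, lo, hi in leaf_boxes(node, [-v for v in inf], inf, []):
        if leaf_label == label or any(lb + eps > ub for lb, ub in zip(lo, hi)):
            continue  # same label, or a box too thin to step into
        # Clamp each coordinate into the box; lower bounds are strict, hence the eps step.
        x_adv = [min(max(xi, lb + eps), ub) for xi, lb, ub in zip(x, lo, hi)]
        dist = max(abs(a - b) for a, b in zip(x_adv, x))
        if best is None or dist < best[0]:
            best = (dist, x_adv)
    return best


x = [0.51, 70.0]
dist, x_adv = min_linf_attack(TREE, x)
print(predict(TREE, x), "->", predict(TREE, x_adv), "with an L-inf change of", round(dist, 4))
```

Here the input sits just 0.01 above the first threshold, so nudging a single feature by 0.01 flips the prediction from "high" to "low", the tree counterpart of changing a few pixels.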
Sadly, I quickly realized that my background knowledge of decision trees, especially their boosted variants, is too limited to follow this paper beyond the introduction, but if you know more, you might appreciate it. Like most machine learning research, it is of course free to read (see the reference at the end of this post) 🙂
Now, are decision trees the long-lost sibling of neural networks? It might be a stretch to claim this, but my research interests have been tickled enough to continue down this rabbit hole. I have decided to study up on decision trees to fill this gap in my knowledge and to find out how similar the two model classes really are. And maybe decision tree research can be my bridge between my research interests and my growing industry experience, and cure my homesickness.
Proper credit to the mentioned paper
Chen, H., Zhang, H., Boning, D. & Hsieh, C. (2019). Robust Decision Trees Against Adversarial Examples. Proceedings of the 36th International Conference on Machine Learning, in Proceedings of Machine Learning Research 97:1122-1131. Available from https://proceedings.mlr.press/v97/chen19m.html.