Consulting can be a frustrating business to enter as a newcomer, because every time you ask a consultant about their typical work, they tend to answer with “It depends.” And it’s true: a lot of our day-to-day depends on the specific data science consulting project, and also on the client.
So, having recently completed my first data science consulting project after 7-ish months on the job, I’ll share my experience as one example — one point of reference. Hopefully you can find other points of reference that together give you a rough idea of the job.
If this is your first time hearing of this job, I posted a quick introduction to it here, and I described how I became a data science consultant here.
Table of Contents
- The project pitch
- Project parameters: how many people, how long, how expensive?
- Awaiting data access: the eternal struggle as a data scientist
- My day-to-day work life
- Technologies we used during this project
- Technologies we did NOT use and why
The project pitch
You probably have a phone service contract. And possibly a Netflix account. And maybe a gym membership. What do all of these companies have in common? They need to collect monthly payments from all of their customers. This gets particularly gnarly when payments can’t be collected: they are delayed, late fees accrue, and at some point the money either has to be brought in by other means or written off.
As it turns out, there are third-party companies that provide this as a service. They keep track of the payments, send warning letters when payments are late, and so on. This is exactly the sort of business our client runs.
Here is where Data Science or Analytics comes in: Our client has been keeping track of all of these payments and updates. Now, that’s their job. But this also means that they have been collecting data. Lots of it. And in the age of Data Science and Artificial Intelligence it would be a tragedy to let this data gold go to waste.
Our client wanted to provide an additional service to their customers in the form of an analytical dashboard that gives insight into how payments are going, which customers are the most reliable, and how the process might be improved.
Project parameters: how many people, how long, how expensive?
Okay, I actually can’t disclose how much that project cost; I would probably be fired and sued. Sorry for the clickbait headline.
But I can tell you that two “official” data scientists were brought onto the team from our company. That’s me and one colleague. We doubled as data analysts, though, as there were no other data employees on the team.
Additionally, a senior colleague from our Analytics department was on the team part-time, while also supervising another project.
After some adjustments to the timeline, the project ran a little over three months. Given the team size, this is a very short project, so the (realistic) goal was not to get the dashboard live into production, but rather a proof of concept plus an initial analysis of the data.
Awaiting data access: the eternal struggle as a data scientist
Not to be dramatic, but I’ve heard from others in the field that my experience on this data science consulting project is sadly not uncommon. We had to wait a while until we had access to the first data tables, and even after that we frequently realized that we needed additional tables or columns from the production data for our analysis.
Why am I telling you this? The goal here is not to blame anyone, but to raise awareness of the fact that a significant portion of our time as data consultants is spent on tasks other than SQL or Python coding, or dealing with data at all — simply because we don’t have the data yet.
What did we do instead? First off, we did a “gap analysis”: we compared the current data report with the client’s requirements and figured out what still needed to be done to meet them. This work was mostly done in Excel tables and on digital whiteboards. We also got introductions to the subject matter of the invoices and warning cycles in collaboration with the client.
My day-to-day work life
Each morning we had a short meeting at 10am. Fun fact: here these meetings are called a “Daily”, while I think in the US this would be called a “Stand-Up”. In theory, this is a meeting to quickly distribute tasks for the day or discuss blockers on ongoing tasks. In practice, those meetings often ran 45 minutes and were spent on general catching up and discussions of current events, sometimes even preparations for the next client meeting.
Other than that, we had no fixed meetings on a regular basis. After we got access to the data, we started progress meetings every one to two weeks, where we would present our prototype visualizations to the client to collect feedback.
The other data scientist on the team and I also frequently worked together on video calls, working through an example case in the data to understand the nuances — basically pair programming, except with data digging. This is to say that even when we were digging through data and coding, the work didn’t have to be lonely. If you work better alone, that’s fine, but if you sometimes feel a bit stuck, you could also arrange a video call to talk it through.
Technologies we used during this project
Now this is one of those classic questions where a lot of it depends on the project and the client.
- Python: Probably not a surprise, but we used Python for our data analysis scripts. The client knew Python, and both of us data scientists knew Python.
- Jupyter Notebooks: Now this is a bit more controversial. We used Jupyter Lab in the browser as our “IDE”, for two reasons: 1) this was only a prototype that would later be reimplemented in a dashboarding software anyway, and 2) we were working remotely on the client’s server, and Jupyter Lab is quick and easy to set up for such a short project.
- Bitbucket & Git: Notebooks are a bit finicky to share over Git, but we did upload them at specific intervals to the client’s Bitbucket server. Bitbucket is another host similar to GitHub and works with largely the same Git commands.
- SQL on Oracle databases: To access the data, we used SQL Developer (Oracle software) and the Python library SQLAlchemy to query the database directly from the Jupyter notebook.
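As a rough sketch of that last point: the pattern is to open a SQLAlchemy engine in the notebook and pull query results straight into a pandas DataFrame. The table and column names below are made-up placeholders, and an in-memory SQLite database stands in for the client’s Oracle server so the snippet runs anywhere (on the project, the connection URL would instead use an Oracle dialect such as `oracle+oracledb://...`).

```python
import pandas as pd
from sqlalchemy import create_engine, text

# In-memory SQLite stands in for the real database; on the project this
# would be an Oracle URL like "oracle+oracledb://user:pass@host:1521/..."
# (all placeholders).
engine = create_engine("sqlite://")

# Toy "payments" table standing in for the client's production data.
with engine.begin() as conn:
    conn.execute(text("CREATE TABLE payments (id INTEGER, status TEXT)"))
    conn.execute(text("INSERT INTO payments VALUES (1, 'PAID'), (2, 'LATE')"))

# The actual workflow: run SQL from the notebook and land the result
# in a DataFrame for further analysis with pandas.
late = pd.read_sql(text("SELECT * FROM payments WHERE status = 'LATE'"), engine)
```

The nice part of this setup is that the heavy filtering happens in SQL on the database side, while the exploratory analysis and plotting happen in pandas in the notebook.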
Technologies we did NOT use and why
- Jira or Confluence: We did use the client’s Jira system, but only for very high-level management, and it was only updated a few times throughout the project. This is a quirk of small consulting projects: either the client needs access to your resources or, more commonly, they need to give all external consultants access to their Jira, which in this case was just too much hassle for such a short and small project.
If you have any further questions, feel free to leave them in the comments and I’ll do my best to answer them 🙂