From silos to collaboration within the Data Science lifecycle

How to bridge the gap?
 

To get data science models to work in practice, organisations need engineering (DevOps) capacity alongside their Data Scientists. Companies therefore rely on a mix of Data Scientists and DevOps engineers, and need to let them work together.

We speak different languages

Data Scientists are the problem solvers of this world. They experiment on vast amounts of data to understand it, find patterns and build intelligent models.

DevOps engineers are the saviours of this world. Their task is to tie everything together and bring models into operation, even though those models rely on different datasets, languages, libraries and software packages.

Both teams are needed in an organisation, but they speak different languages. Data Scientists experiment and prototype to get the best model, whereas DevOps engineers focus on reproducibility and continuity.

How to bridge this gap?

Integrating data science operations into existing IT processes and teams is a hard challenge for managers. Data Scientists are new to the table and need best practices and tools to collaborate with IT. They generally lack the expertise and authority to bring models to production, while DevOps engineers typically lack the skills to develop the best models. No one can do it alone.

It is therefore important to understand the journey of a Data Scientist and of a DevOps engineer. Where and how does each contribute within the machine learning lifecycle?

As the figure above shows, the journey of a Data Scientist starts with preparing the data and stops when a model (proof of concept) is ready. Their task is to build the best model from the data. The DevOps engineer picks the model up once it is ready, rolls it out and manages it during the rest of its lifecycle. When a new or better model is needed, the Data Scientist comes into play again and creates an improved version. The DevOps engineer then takes over and rolls it out, and so on. This is what we call the data science lifecycle. Continuous iterations of improvement and optimisation are needed to get to the best result. Such a process often produces many versions of the same model and many handovers between the Data Scientist and the DevOps engineer.

To work together effectively, Data Scientists and DevOps engineers need to know who created what, when, where and with which data.

Many questions arise between Data Scientists and DevOps engineers once models become operational in an online environment:
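As a minimal sketch of what recording "who, what, when, where and with which data" could look like, the Python snippet below stores those facts alongside each model version. The class `ModelRecord`, the helper `fingerprint`, and all field names and example values are hypothetical and only serve as illustration; they are not part of any specific platform.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
import hashlib


@dataclass(frozen=True)
class ModelRecord:
    """Hypothetical provenance record for one model version."""
    name: str        # what was created
    version: int     # which iteration of the model this is
    author: str      # who created it
    location: str    # where the model artefact is stored
    data_hash: str   # which data it was trained on
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )  # when it was created (UTC)


def fingerprint(data: bytes) -> str:
    """Return a SHA-256 hex digest that identifies the training data."""
    return hashlib.sha256(data).hexdigest()


# Example values are made up for illustration.
record = ModelRecord(
    name="churn-model",
    version=1,
    author="data.scientist@example.com",
    location="models/churn-model/1",
    data_hash=fingerprint(b"customer_id,churned\n1,0\n2,1\n"),
)
print(record)
```

Hashing the training data rather than storing its name means that two handovers can be compared even when a dataset was silently changed in place.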

  • What is the latest version of the model (version control)?
  • Does it function well in production (metrics & logging)?
  • Do we know when and what will go wrong in our production process (metrics & logging)?
  • Does it perform better than the previous model (A/B testing)?
  • Who was responsible for building the latest version, and does this person have the right authorisation to make a change (users & permissions)?
  • How can we store our credentials so that the team can use them safely (credential management)?
  • If we want to go back to a previous version, is that easy to do (rollbacks)?
  • If we only want to change one component in our pipeline and leave the rest intact, is that possible (pipelines)?
  • Can we easily create many copies of the same pipeline for different end-users (autoscaling)?
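Three of these questions (latest version, permissions and rollbacks) can be illustrated with a small sketch. The `ModelRegistry` class below is hypothetical and keeps everything in memory; it assumes a fixed set of users who are allowed to publish, and a real system would persist its state and authenticate users properly.

```python
class ModelRegistry:
    """Hypothetical in-memory registry; a real one would persist its state."""

    def __init__(self, publishers):
        self._publishers = set(publishers)  # users allowed to publish
        self._versions = []                 # published artefacts, oldest first
        self._active = None                 # index of the version in production

    def publish(self, user, artefact):
        """Store a new version, make it active and return its version number."""
        if user not in self._publishers:
            raise PermissionError(f"{user} may not publish models")
        self._versions.append(artefact)
        self._active = len(self._versions) - 1
        return self._active + 1

    def active_version(self):
        """Return the 1-based version number in production, or None."""
        return None if self._active is None else self._active + 1

    def rollback(self):
        """Reactivate the previous version and return its version number."""
        if self._active is None or self._active == 0:
            raise RuntimeError("no previous version to roll back to")
        self._active -= 1
        return self._active + 1


# Usage with made-up user and file names.
registry = ModelRegistry(publishers={"alice"})
registry.publish("alice", "model-v1.bin")
registry.publish("alice", "model-v2.bin")
registry.rollback()
print(registry.active_version())
```

Keeping every published version and moving only a pointer is what makes the rollback cheap: nothing is deleted, so going back is a single state change.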


Sascha

Xenia Expert

 

