Paper Summary: Software Engineering for Machine Learning: A Case Study

Please note: this post is mainly intended for my personal use. It is not peer-reviewed work and should not be taken as such.


The article summarizes best practices and lessons learned from applying software engineering to AI/ML/DS-related projects at Microsoft.

The authors report findings on:

  • Where ML-based software projects differ from regular projects;

  • How to adapt Agile principles to ML-based systems.

They also propose a maturity model to help gauge where a company or team stands in its ML-engineering practice (similar to CMMI).


The authors surveyed around 550 software engineers working on AI-enabled products at Microsoft, asking about their work and their level of expertise.


3 Fundamental Differences

The authors argue there are 3 fundamental differences between ML-based and regular software projects:

  • 1) ML systems need data management and versioning, and data is harder to version than code.

  • 2) ML knowledge (in addition to software engineering knowledge) is needed to build ML systems.

  • 3) It's harder to keep modules/components decoupled in ML systems.

Issues depend on who you ask

  • Some issues are important to all engineers working on ML-based systems, such as data availability, processing and management.

  • Other issues vary depending on who you ask:

    • To senior engineers, AI tooling, scaling and deployment are the more pressing issues
    • To junior engineers, training in ML concepts is of utmost importance

Data versioning is hard

Though there are good and mature tools for versioning code, tools for versioning data are still few and new.

One example is DVC.
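
To make the idea concrete, here is a minimal sketch of content-addressed data versioning: the dataset gets a version id derived from its contents, and that id is recorded next to the code commit that uses it. This is not DVC's API. The function names (`dataset_version`, `build_manifest`), the manifest layout and the commit id are all made up for illustration.

```python
import hashlib
import json


def dataset_version(rows):
    """Return a short, deterministic version id for a list of records."""
    # Serialize with sorted keys so logically equal records hash identically.
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:12]


def build_manifest(rows, code_commit):
    """Tie a data version to the code commit that consumes it."""
    return {"data_version": dataset_version(rows), "code_commit": code_commit}


rows_v1 = [{"user": 1, "movie": "A", "rating": 5}]
rows_v2 = rows_v1 + [{"user": 2, "movie": "B", "rating": 3}]

# "abc123" is a hypothetical commit id, used only for illustration.
manifest_v1 = build_manifest(rows_v1, code_commit="abc123")
manifest_v2 = build_manifest(rows_v2, code_commit="abc123")
print(manifest_v1)
print(manifest_v2)
```

The code commit is the same in both manifests, but the data version changes as soon as one record is added. That is the part plain code versioning does not capture.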

ML-systems get coupled in nonobvious ways

In addition to the usual coupling between software modules, ML-based software systems have at least two other constraints that make the problem worse:

  • It is hard to reuse and extend models the same way you would a generic module

  • Models interact with each other even when they do not share anything

    • The output of one model (e.g. movie recommendation) may affect how data is produced for other models (e.g. more data will be generated for movies recommended by the other model)
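
The movie example above can be sketched as a toy simulation. It is not taken from the paper: the movie names, the starting counts and the `boost`/`baseline` parameters are all assumptions chosen to make the effect visible. A recommender promotes whichever movie has the most logged interactions, and the promoted movie then collects more ratings, so any other model trained on those ratings sees increasingly skewed data.

```python
def simulate_feedback_loop(rounds=5, boost=10, baseline=2):
    """Toy simulation: a recommender's output skews the data another model sees."""
    # Rating counts that a second, separate model would later train on.
    ratings_logged = {"movie_a": 5, "movie_b": 4}
    history = []
    for _ in range(rounds):
        # Model 1 (recommender): promote the movie with the most interactions.
        recommended = max(ratings_logged, key=ratings_logged.get)
        for movie in ratings_logged:
            # Every movie gets some organic traffic; the promoted one gets extra.
            extra = boost if movie == recommended else 0
            ratings_logged[movie] += baseline + extra
        history.append(dict(ratings_logged))
    return history


history = simulate_feedback_loop()
final = history[-1]
share_a = final["movie_a"] / sum(final.values())
print(final, round(share_a, 2))
```

The two models share no code or parameters, yet a small initial lead for one movie grows every round, purely through the data the first model causes to be generated.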


Quotes

  • "While feedback loops are typical in Agile software processes, the peculiarity of the machine learning workflow is related to the amount of experimentation needed to converge to a good model for the problem."

  • "... integration of machine learning components is happening all over [Microsoft], not just on teams historically known for it."

MY 2¢

  • I didn't see a lot of value in the Maturity Model; it seems to be just a thin layer over CMMI levels.

  • There is not much mention of two topics I find very relevant to SE for ML products:

    • Monitoring
    • Train/serve skew

