Data-related Job Descriptions: Making of a Data Team

Last updated: 19 Mar 2017

Table of Contents

CDO (Chief Data Officer)
Data Lead
Data Scientist
Data Engineer
Database Engineer
Data Analyst
Machine Learning Engineer
Other roles
Suggestions of Team Setups

WIP Alert: This is a work in progress. Current information is correct, but more content may be added in the future.

The actual position names may, of course, vary somewhat depending on your location or industry. The ones used in this article were chosen based on months of observing job descriptions and news articles and, most importantly, on day-to-day conversations with people involved in data work.

CDO (Chief Data Officer)


Who you should be	Executive profile, ideally with previous data-related experience
What you should do	• Oversee all data operations in your organization at a high level • Prioritize tasks according to business impact • Define the company's data strategy • Connect data initiatives with the organization's overall business objectives
What you should not do
What you may do
Sample tools	• Spreadsheets (e.g. MS Excel) • Project Management Software (e.g. MS Project) • Other Generic Tools (e.g. Airtable)

Data Lead


Who you should be	• Senior employee with extensive experience in data work and deep domain expertise • Relevant advanced degrees are expected
What you should do	• Oversee data work done throughout the company, at a more technical level than the CDO • Use domain knowledge to steer data scientists towards solving actual business problems • Mentor other employees (e.g. by conducting workshops or technical training) • Make technical decisions and/or settle differences of opinion within data teams
What you should not do
What you may do
Sample tools

A data lead acts as the intellectual leader for all employees who work with data in the organization. They make technical decisions when necessary and help turn the company's data strategy (if any) into actual systems and projects.

They should have a strong track record of data work themselves and possibly some managerial experience. In addition, they should have deep domain expertise and offer guidance and mentoring to other members of the organization.

The role of data lead may, of course, be held by someone who also performs other work in the organization. The position commonly emerges from within the organization when the need arises; it is often filled by the most senior data scientist or engineer, or by the person whose opinion is perceived to carry the most weight in discussions.

Data Scientist


Who you should be	• A graduate from a highly quantitative field, or a CS graduate with a strong quantitative/statistical bent • Knowledgeable in the domain you are working in (e.g. finance, retail, marketing) • Relevant advanced degrees are nice to have, but not required
What you should do	• Explore data and create useful visualizations • Train predictive and descriptive statistical models on data • Excel at your ML toolkit of choice (e.g. in R, Python or Java) • Suggest ways in which value can be extracted from data (actionable insights)
What you should not do
What you may do	Learn about the business in order to become a domain expert, so that you can help shape the company strategy with data-driven knowledge.
Sample tools	• Jupyter Notebook • RStudio • SAS

Data Engineer


Who you should be	Ideally a CS graduate with previous experience in distributed systems, system administration and software development
What you should do	• Write data pipelines to move data around the infrastructure • Write streaming and batch jobs
What you should not do
What you may do
Sample tools	• Workflow/Pipeline Orchestrators (e.g. Airflow) • Distributed Processing Frameworks (e.g. Spark, Hadoop MapReduce) • Message Queues (e.g. ActiveMQ) • Data Streams (e.g. Kafka) • Caching Tools (e.g. Redis) • Non-relational Data Stores (e.g. Elasticsearch, MongoDB, S3) • RDBMSs (e.g. PostgreSQL)
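As a rough illustration of what "write batch jobs" can mean in practice, here is a minimal extract-transform-load sketch using only the Python standard library. The CSV layout, the `orders` table and `run_batch_job` are hypothetical; in a real setup a step like this would typically be one task scheduled by an orchestrator such as Airflow, reading from and writing to real data stores rather than an in-memory SQLite database.

```python
import csv
import io
import sqlite3

# Hypothetical raw export: one order per line; the last row is malformed.
RAW_CSV = """order_id,day,amount
1,2017-03-01,10.50
2,2017-03-01,4.25
3,2017-03-02,not-a-number
"""


def run_batch_job(raw_csv, conn):
    """Extract rows from CSV text, drop invalid ones, load the rest into SQLite.

    Returns the number of rows loaded.
    """
    # Extract: parse the CSV text into dicts keyed by the header names.
    reader = csv.DictReader(io.StringIO(raw_csv))

    # Transform: convert types and skip rows that fail validation.
    clean_rows = []
    for row in reader:
        try:
            clean_rows.append((int(row["order_id"]), row["day"], float(row["amount"])))
        except ValueError:
            continue  # a real job would log or quarantine bad records

    # Load: write the cleaned rows in a single transaction.
    # INSERT OR REPLACE keeps the job idempotent if it is re-run on the same input.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, day TEXT, amount REAL)"
    )
    with conn:
        conn.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", clean_rows)
    return len(clean_rows)


conn = sqlite3.connect(":memory:")
print(run_batch_job(RAW_CSV, conn))  # 2
```

Making the load step idempotent matters for batch jobs because orchestrators commonly retry failed tasks; re-running this job on the same input leaves the table unchanged.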

Database Engineer


Who you should be	Ideally a CS graduate, with a database bent
What you should do	• Optimize data stores for their access patterns • Debug slow queries • Choose the best database for a given task • Keep databases running smoothly
What you should not do
What you may do
Sample tools	• RDBMSs • Document Stores (e.g. Elasticsearch, MongoDB) • Wide-column Data Stores (e.g. Cassandra) • Data Warehousing Tools (e.g. Redshift)
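Debugging a slow query usually starts with the database's query plan. The minimal sketch below uses SQLite (bundled with Python) to show how indexing the column a query filters on changes the plan from a full table scan to an index search. The `events` table and the `idx_events_user` index are hypothetical, and other engines expose the same idea through their own commands (e.g. `EXPLAIN` in PostgreSQL) with different output formats.

```python
import sqlite3


def query_plan(conn, sql):
    """Return the human-readable 'detail' text of SQLite's EXPLAIN QUERY PLAN."""
    # In SQLite, the detail text is the last column of each plan row.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, user_id INTEGER, kind TEXT)")
conn.executemany(
    "INSERT INTO events (user_id, kind) VALUES (?, ?)",
    [(i % 100, "click") for i in range(10_000)],
)

slow_sql = "SELECT COUNT(*) FROM events WHERE user_id = 42"

# Without an index on user_id, SQLite has to scan every row of the table.
before = query_plan(conn, slow_sql)

# The access pattern is "look up events by user", so index that column.
conn.execute("CREATE INDEX idx_events_user ON events (user_id)")
after = query_plan(conn, slow_sql)

print(before)
print(after)
```

The first plan reports a scan of `events`, while the second reports a search using `idx_events_user`; the exact wording varies between SQLite versions.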

Data Analyst


Who you should be	Ideally a graduate from a quantitative field
What you should do	• Communicate with and obtain data from external sources • Query databases • Deliver high-level analyses of data, such as means, sums, counts, counts per day and outliers
What you should not do
What you may do	• Prepare executive reports • Clean datasets (merge datasets, remove bad data, etc.) • Build datasets from multiple sources
Sample tools	• Spreadsheets • Data Analysis Tools (e.g. Tableau) • PostgreSQL
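The high-level analyses listed for data analysts are simple aggregations. The following minimal Python sketch computes them with the standard library over a hypothetical list of `(day, amount)` records; in practice an analyst would more often run the equivalent SQL `GROUP BY` query or build a spreadsheet pivot table. The outlier rule used here (more than two population standard deviations from the mean) is just one common heuristic, chosen for illustration.

```python
from collections import Counter
from statistics import mean, pstdev

# Hypothetical sales records: (day, amount).
records = [
    ("2017-03-01", 10.0),
    ("2017-03-01", 12.0),
    ("2017-03-02", 11.0),
    ("2017-03-02", 9.0),
    ("2017-03-02", 13.0),
    ("2017-03-03", 95.0),
]


def summarize(rows):
    """Return count, sum, mean, counts per day and outliers for (day, amount) rows."""
    amounts = [amount for _, amount in rows]
    avg = mean(amounts)
    sd = pstdev(amounts)  # population standard deviation
    return {
        "count": len(amounts),
        "sum": sum(amounts),
        "mean": avg,
        # Number of records per day.
        "counts_per_day": dict(Counter(day for day, _ in rows)),
        # Heuristic: flag values more than 2 standard deviations from the mean.
        "outliers": [a for a in amounts if abs(a - avg) > 2 * sd],
    }


print(summarize(records))
```

On this sample, the mean is 25.0 and only the 95.0 record is flagged as an outlier.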

Machine Learning Engineer

This is sometimes referred to as "Software Engineer - Machine Learning".


Who you should be	Ideally, an experienced software engineer with good knowledge of machine learning
What you should do	• Engineer the systems that serve models to clients • Know how to use out-of-the-box machine learning toolkits
What you should not do	Try to develop your own models from scratch
What you may do
Sample tools	• Black-box ML Solutions (e.g. Prediction.io) • ML Toolkits (e.g. scikit-learn) • Web Frameworks (e.g. Flask, Ruby on Rails)

This is probably the most versatile of the roles described so far.

Machine learning engineers will build working systems that deliver machine learning solutions and connect them to other systems.

They split their time more or less evenly between software development and building statistical models for data.

They know how to build general-purpose software systems (along with all associated tasks such as testing, versioning, deploying and applying software engineering best practices), and they know enough about machine learning algorithms and tools to use them to add value to the organization.

They probably do not, however, have the same depth of expertise that specialists such as Data Engineers or Data Scientists have in their respective areas. For that reason, larger teams should have at least one Data Engineer and one Data Scientist to focus on and optimize those areas.
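To make "serving models to clients" more concrete, here is a minimal Python sketch that exposes a model behind an HTTP endpoint. It deliberately swaps Flask for the standard library's `http.server` so that it runs without extra dependencies, and `ThresholdModel` is a hypothetical stand-in for a model trained with an out-of-the-box toolkit such as scikit-learn; the serving code only relies on its `predict` method (a list of feature rows in, a list of labels out). The JSON request format is also an assumption.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class ThresholdModel:
    """Hypothetical stand-in for a model trained with an off-the-shelf toolkit."""

    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, rows):
        # Label a row 1 if the sum of its features exceeds the threshold, else 0.
        return [int(sum(row) > self.threshold) for row in rows]


MODEL = ThresholdModel(threshold=1.0)


def handle_prediction(body, model=MODEL):
    """Parse a JSON request like {"rows": [[0.2, 0.5]]} and return JSON predictions."""
    payload = json.loads(body)
    predictions = model.predict(payload["rows"])
    return json.dumps({"predictions": predictions}).encode("utf-8")


class PredictionHandler(BaseHTTPRequestHandler):
    """Thin HTTP layer: read the request body, delegate, write the JSON response."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        response = handle_prediction(self.rfile.read(length))
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(response)))
        self.end_headers()
        self.wfile.write(response)


def serve(port=8000):
    """Start a blocking prediction server on localhost (hypothetical default port)."""
    HTTPServer(("127.0.0.1", port), PredictionHandler).serve_forever()
```

Keeping request parsing in `handle_prediction`, separate from the HTTP handler, lets the serving logic be unit-tested without starting a server; in a Flask version, the same function could be called from a route.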

Other roles

Business Analyst

TODO

Suggestions of Team Setups

1) 1 Data Scientist + 1 Data Engineer + 2 Data Analysts

Company Size	Company Data Maturity
Small-Medium	Medium-High

2) 1 Machine Learning Engineer + 1 Data Engineer

Company Size	Company Data Maturity
Small	Low

3) 1 Data Scientist + 1 Data Engineer + 1 Database Engineer

Company Size	Company Data Maturity
Small-Medium	Medium-High


Felipe · 19 Mar 2017 · data-science