Data-related Job Descriptions: Making of a Data Team
Last updated:
- CDO (Chief Data Officer)
- Data Lead
- Data Scientist
- Data Engineer
- Database Engineer
- Data Analyst
- Machine Learning Engineer
- Other roles
- Suggestions of Team Setups
WIP Alert: This is a work in progress. The current information is correct, but more content may be added in the future.
The actual position names may, of course, vary to some extent depending on your location or industry. The ones used in this article were chosen based on months of observing job descriptions, news articles and, most importantly, day-to-day conversations with people involved in data work.
CDO (Chief Data Officer)
Who you should be | Executive profile, ideally with previous data-related work |
What you should do | • Oversee all data operations in your organization, at a high level • Prioritize tasks according to business impact • Define the company's data strategy • Connect data initiatives with the organization's overall business objectives |
What you should not do | |
What you may do | |
Sample tools | • Spreadsheets (e.g. MS Excel) • Project Management Software (e.g. MS Project) • Other Generic tools (e.g. AirTable) |
Data Lead
Who you should be | • Senior employee, with extensive experience in data work and deep domain expertise • Relevant Advanced Degrees are expected |
What you should do | • Oversee data work done throughout the company, at a more technical level than the CDO • Use domain knowledge to steer data scientists towards solving actual business problems • Mentor other employees (for example, by conducting workshops or technical training) • Make technical decisions and/or settle differences of opinion within data teams |
What you should not do | |
What you may do | |
Sample tools | |
A data lead will work as an intellectual leader for all employees working with data in the organization. They will make technical decisions if necessary, and help turn the company's data strategy (if any) into actual systems and projects.
They should have a strong track record of data work themselves and possibly some managerial experience. In addition, they should have deep domain expertise and offer guidance and mentoring to other members of the organization.
The role of a data lead may, of course, be held by someone who also performs other work in the organization. This position commonly emerges from within the organization when it becomes necessary; it will often be held by the most senior data scientist or engineer, or by the person whose opinion is perceived to carry the most weight in discussions.
Data Scientist
Who you should be | • A graduate from a highly quantitative field; or a CS graduate with a strong quantitative/statistical bent • Knowledgeable in the domain area (e.g. finance, retail, marketing) you are working in • Relevant Advanced Degrees are nice, but not required |
What you should do | • Explore data and create useful visualizations of it • Train predictive and descriptive statistical models based on data • Excel at your ML toolkit of choice (R, Python, Java, etc.) • Suggest ways in which value can be extracted from data (actionable insights) |
What you should not do | |
What you may do | Learn about the business in order to become a domain expert, so that you can help shape the company strategy with data-driven knowledge. |
Sample tools | • Jupyter Notebook • RStudio • SAS |
Data Engineer
Who you should be | Ideally a CS graduate, with previous experience with Distributed Systems, System Administration and Software Development |
What you should do | • Write data pipelines to move data around the infrastructure • Write streaming and batch jobs |
What you should not do | |
What you may do | |
Sample tools | • Workflow/Pipeline Orchestrators (e.g. Airflow) • Distributed Software Frameworks (e.g. Spark/Hadoop MR) • Message Queues (e.g. ActiveMQ) • Data Streams (e.g. Kafka) • Caching tools (e.g. Redis) • Non-relational Data Stores (e.g. Elasticsearch, MongoDB, S3) • RDBMSs (e.g. PostgreSQL) |
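To make "writing data pipelines" a little more concrete, here is a minimal sketch of an extract/transform/load job in plain Python. Everything in it is an assumption made for illustration: the `user_id,amount` line format, the function names and the in-memory `warehouse` list standing in for a real data store. A production pipeline would normally run under an orchestrator and write to an actual database or object store.

```python
from typing import Iterable, Iterator


def extract(lines: Iterable[str]) -> Iterator[dict]:
    """Parse raw 'user_id,amount' lines into records, skipping malformed ones."""
    for line in lines:
        parts = line.strip().split(",")
        if len(parts) != 2:
            continue  # wrong number of fields
        user_id, raw_amount = parts
        try:
            amount = float(raw_amount)
        except ValueError:
            continue  # amount is not a number
        yield {"user_id": user_id, "amount": amount}


def transform(records: Iterable[dict]) -> Iterator[dict]:
    """Keep positive amounts only and add a derived column in cents."""
    for record in records:
        if record["amount"] > 0:
            yield {**record, "amount_cents": round(record["amount"] * 100)}


def load(records: Iterable[dict], sink: list) -> int:
    """Append records to the sink (hypothetical stand-in for a data store)."""
    loaded = 0
    for record in records:
        sink.append(record)
        loaded += 1
    return loaded


# Hypothetical raw input containing some bad rows.
raw_lines = ["u1,10.50", "u2,-3.00", "garbage", "u3,abc", "u4,2.25"]
warehouse: list = []
loaded_count = load(transform(extract(raw_lines)), warehouse)
print(loaded_count, warehouse)
```

Because each stage is a generator that handles one record at a time, the same functions can be fed a finite list (a batch job) or any other iterator that keeps producing lines (a simple streaming job).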
Database Engineer
Who you should be | Ideally a CS graduate, with a database bent |
What you should do | • Optimize data stores depending upon access patterns • Debug slow queries • Choose the best database for a given task • Keep databases running smoothly |
What you should not do | |
What you may do | |
Sample tools | • RDBMSs • Document Stores (e.g. Elasticsearch, MongoDB) • Wide-column Data Stores (e.g. Cassandra) • Data Warehousing Tools (e.g. Redshift) |
Data Analyst
Who you should be | Ideally a graduate from a quantitative field |
What you should do | • Communicate with and obtain data from external sources • Query databases • Deliver high-level analyses of data, such as means, sums, counts, counts per day, outliers, etc. |
What you should not do | |
What you may do | • Prepare executive reports • Clean datasets (merge datasets, take out bad data, etc) • Build datasets from multiple sources |
Sample tools | • Spreadsheets • Data analysis tools (e.g. Tableau) • PostgreSQL |
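As an illustration of the high-level analyses listed above (means, sums, counts, counts per day, outliers), here is a minimal sketch using only the Python standard library. The `orders` records are made up for the example, and the outlier rule (anything above five times the median) is just one simple rule of thumb chosen for illustration, not a recommended standard.

```python
from collections import Counter
from datetime import date
from statistics import mean, median

# Hypothetical order records: (order date, order amount).
orders = [
    (date(2024, 1, 1), 20.0),
    (date(2024, 1, 1), 22.0),
    (date(2024, 1, 2), 19.0),
    (date(2024, 1, 2), 21.0),
    (date(2024, 1, 3), 18.0),
    (date(2024, 1, 3), 500.0),  # suspiciously large order
]

amounts = [amount for _, amount in orders]

total = sum(amounts)        # sum of all order amounts
average = mean(amounts)     # mean order amount
order_count = len(amounts)  # number of orders
counts_per_day = Counter(day for day, _ in orders)  # orders per day

# Assumed rule of thumb: flag amounts above 5x the median as outliers.
typical = median(amounts)
outliers = [amount for amount in amounts if amount > 5 * typical]

print(total, average, order_count, dict(counts_per_day), outliers)
```

In practice the same figures would often come from a SQL query or a spreadsheet; the point is only that these summaries are simple aggregations over the raw records.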
Machine Learning Engineer
This is sometimes referred to as "Software Engineer - Machine Learning".
Who you should be | Ideally, an experienced software engineer with good knowledge of machine learning |
What you should do | • Engineer the system to serve the models to clients • Know how to use out-of-the-box machine learning toolkits |
What you should not do | Try to develop your own models from scratch |
What you may do | |
Sample tools | • Black-box ML Solutions (e.g. Prediction.io) • ML Toolkits (e.g. Scikit-learn) • Web Frameworks (e.g. Flask, Ruby on Rails) |
This is probably the most versatile role so far.
Machine learning engineers will build working systems that deliver machine learning solutions and connect them to other systems.
They split their time more or less evenly between software development and building statistical models for data.
They know how to build general-purpose software systems (including associated tasks such as testing, versioning, deploying and applying software engineering best practices) and they know enough about machine learning algorithms and tools to use them to add value to the organization.
They probably do not, however, have the same level of expertise in either area as specialized roles such as Data Engineers or Data Scientists. For that reason, larger teams should have at least one Data Engineer and one Data Scientist to focus on and optimize their respective areas.
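To give a rough idea of what "serving a model to clients" involves, here is a minimal sketch of a request handler that validates JSON input, calls a model and returns a JSON response. The `predict` function is a hypothetical stand-in with made-up weights rather than a trained model, and `handle_request` is an illustrative name; in a real system this logic would typically sit behind a web framework route and call a model loaded from an ML toolkit.

```python
from __future__ import annotations

import json


def predict(features: list[float]) -> float:
    """Hypothetical stand-in for a trained model: a fixed linear scorer."""
    weights = [0.5, 0.25]  # made-up weights, for illustration only
    bias = 1.0
    return bias + sum(w * x for w, x in zip(weights, features))


def is_number(value: object) -> bool:
    """True for ints and floats, but not for booleans."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def handle_request(body: str) -> tuple[int, str]:
    """Validate a JSON request body and return (status code, JSON response)."""
    try:
        features = json.loads(body)["features"]
    except (json.JSONDecodeError, KeyError, TypeError):
        return 400, json.dumps({"error": "expected a JSON object with 'features'"})

    if (
        not isinstance(features, list)
        or len(features) != 2
        or not all(is_number(x) for x in features)
    ):
        return 400, json.dumps({"error": "'features' must be a list of 2 numbers"})

    return 200, json.dumps({"prediction": predict(features)})


status, response = handle_request('{"features": [2, 4]}')
print(status, response)
```

Most of the code deals with input validation and error responses rather than with the model itself, which reflects how the software engineering side of this role often outweighs the modelling side.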
Other roles
Business Analyst
TODO
Suggestions of Team Setups
1) 1 Data Scientist + 1 Data Engineer + 2 Data Analysts
Company Size | Company Data Maturity |
---|---|
Small - Medium | Medium - High |
2) 1 Machine Learning Engineer + 1 Data Engineer
Company Size | Company Data Maturity |
---|---|
Small | Low |
3) 1 Data Scientist + 1 Data Engineer + 1 Database Engineer
Company Size | Company Data Maturity |
---|---|
Small - Medium | Medium - High |