Data Engineering vs Data Science

Mark Subra
Nov 21, 2020

The terms data science and data engineering get thrown around a lot, but what is the difference? What are the similarities? Both deal with vast amounts of data, but where do they diverge, and where do they overlap?

Data Engineering

Data engineering is about preparing the data infrastructure for analysis, which usually involves extract, transform, load (ETL) operations. Engineers focus on building and maintaining data pipeline systems and on concerns such as formatting, scaling, and security.
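To make the ETL idea more concrete, here is a minimal sketch in Python using only the standard library. The raw CSV, the `sales` table, and the function names `extract`, `transform`, `load`, and `run_pipeline` are all hypothetical, chosen for illustration. A production pipeline would typically add scheduling, a real data warehouse, logging, and much more validation.

```python
import csv
import io
import sqlite3

# Hypothetical raw export: inconsistent casing, stray whitespace, and one bad row.
RAW_CSV = """region,amount
 north ,120.50
SOUTH,80
north,not_a_number
South , 19.5
"""


def extract(text):
    """Extract: read raw CSV rows into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalize formatting and drop rows that fail validation."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip malformed amounts instead of failing the whole load
        clean.append((row["region"].strip().lower(), amount))
    return clean


def load(rows, conn):
    """Load: write the cleaned rows into a SQL table that analysts can query."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()


def run_pipeline(text, conn):
    """Run the three stages in order: extract -> transform -> load."""
    load(transform(extract(text)), conn)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # in-memory database, just for the demo
    run_pipeline(RAW_CSV, conn)
    for region, total in conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    ):
        print(region, total)
```

Keeping the three stages as separate functions mirrors the E, T, and L steps, so each one can be tested or replaced on its own (for example, loading into a different database) without touching the others.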

Data engineers are more likely to have a software engineering or other engineering background. They may be proficient in languages other than Python, such as Java or Scala. They are also tasked with developing and maintaining systems for handling large volumes of data. Their main purpose is to help data scientists use the data to derive valuable insights.

Common skills include:

  • Linux and the command line
  • Experience programming in at least one of Python, Scala, or Java
  • SQL
  • An understanding of distributed systems in general and how they differ from traditional storage and processing systems
  • Knowing how to access and process data

Data Science

Data science is often described as an advanced form of data analysis driven by machine learning or artificial intelligence. Before dedicated data engineering roles existed, data scientists built the pipelines and cleaned the data themselves; however, as data volumes grew, engineers became necessary to build and maintain the infrastructure.

Data scientists now focus more on analyzing and interpreting data to find insights. Data scientists and engineers work together and complement each other.

While data engineers have an engineering or software background, data scientists usually have a background in mathematics, statistics, or another physical or natural science. Some even come from economics or finance. Data scientists also interact more with the business side: they analyze data to benefit the business and present their findings in an understandable way.

Common skills include:

  • Python and R
  • Data visualization and interpretation to provide valuable and actionable business insights
  • Advanced statistical and mathematical analysis
  • Machine learning and artificial intelligence
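As a rough illustration of the "advanced statistical and mathematical analysis" in this list, here is a minimal sketch of a data scientist fitting an ordinary least-squares trend line and checking how well it explains the data. The ad-spend and sales figures and the helper names `fit_line` and `r_squared` are hypothetical; in practice a data scientist would more likely reach for a library such as NumPy, scikit-learn, or R's `lm` rather than hand-rolling the formulas.

```python
from statistics import mean

# Hypothetical data: monthly ad spend vs. sales, both in thousands of dollars.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.1, 4.9, 7.2, 8.8, 11.0]


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    var = sum((x - x_bar) ** 2 for x in xs)
    slope = cov / var
    return slope, y_bar - slope * x_bar


def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line (R^2)."""
    y_bar = mean(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    slope, intercept = fit_line(ad_spend, sales)
    r2 = r_squared(ad_spend, sales, slope, intercept)
    # prints slope=1.97 intercept=1.09 r2=0.998
    print(f"slope={slope:.2f} intercept={intercept:.2f} r2={r2:.3f}")
```

The statistics are only half the job: the interpretation step (here, roughly "each extra thousand dollars of ad spend is associated with about two thousand dollars of sales in this toy data, which is not proof of causation") is what gets presented to the business side.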

Overlap

Data science and data engineering overlap in a few ways. Both require analytical skills, programming, and experience with big data. A data scientist will typically have more advanced analytical skills, while an engineer will have stronger software and programming skills.

A data scientist can create a data pipeline, but an engineer will be far more proficient at it. A data scientist uses programming as a tool to support their analysis and generate business insights, while a data engineer uses programming as their primary tool.

The roles are ultimately complementary, with each compensating for the other's gaps. Both involve the use of big data: data engineers use their programming skills to build big data pipelines, while data scientists use their more limited programming skills to create data products on top of those pipelines.


I am a data scientist who recently graduated from the Flatiron School Immersive Data Science Bootcamp.