Python vs R for Data Science

Mark Subra
3 min read · Oct 31, 2020

Python and R both have strengths and weaknesses when it comes to data science. Neither language is strictly better than the other; the right choice depends on the application and the questions you’re trying to answer. Data scientists should know both to some degree, as they are among the most widely used languages for data analysis and statistics. A basic distinction is that Python is a general-purpose language, while R was developed specifically for statistics.

Python

Python is a general-purpose language that supports object-oriented programming (OOP), with features such as inheritance, polymorphism, encapsulation, and abstraction. Python is generally better suited for production use by software engineers, while R code would likely need to be converted into another language before deployment.
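The four OOP properties named above can be sketched in a few lines of Python. This is only an illustration using the standard library; the class names (`Model`, `MeanModel`, `MedianModel`) are hypothetical and not taken from any real package.

```python
from abc import ABC, abstractmethod
from statistics import mean, median


class Model(ABC):
    """Abstraction: callers only need fit() and predict()."""

    def __init__(self):
        # Encapsulation: internal state, private by convention (leading underscore).
        self._value = None

    def fit(self, data):
        self._value = self._summarize(data)
        return self

    def predict(self):
        if self._value is None:
            raise ValueError("call fit() first")
        return self._value

    @abstractmethod
    def _summarize(self, data):
        """Each subclass decides how to summarize the data."""


class MeanModel(Model):  # Inheritance: reuses fit() and predict() from Model.
    def _summarize(self, data):
        return mean(data)


class MedianModel(Model):
    def _summarize(self, data):
        return median(data)


data = [1, 2, 3, 10]
# Polymorphism: the same calls behave differently depending on the class.
predictions = {
    type(m).__name__: m.fit(data).predict()
    for m in (MeanModel(), MedianModel())
}
print(predictions)
```

Both models are used through the same two methods, which is the kind of structure software engineers tend to expect in production code.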

People with a software engineering background will likely find Python easier to use than R because they are already familiar with OOP, which makes coding and debugging easier. Python is also considered one of the easier languages to learn thanks to its simple syntax and its resemblance to English.

Python is also great for building data science pipelines and machine learning products, and it integrates easily with web frameworks. The Python Package Index (PyPI) is the main repository of third-party Python software, and libraries from it can be installed with one line. R has a similar system: its packages can also be installed with one line.
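As a rough idea of what a data pipeline looks like, here is a minimal sketch that chains two steps using only the standard library. The step names (`drop_missing`, `add_centered`) and the sample records are made up for illustration; a real project would more likely rely on a library such as pandas or scikit-learn.

```python
from functools import reduce
from statistics import mean


def drop_missing(rows):
    """Remove records whose value is None."""
    return [r for r in rows if r["value"] is not None]


def add_centered(rows):
    """Add a field holding each value minus the overall mean."""
    avg = mean(r["value"] for r in rows)
    return [{**r, "centered": r["value"] - avg} for r in rows]


def run_pipeline(rows, steps):
    """Feed the output of each step into the next one."""
    return reduce(lambda acc, step: step(acc), steps, rows)


# Hypothetical input: one record has a missing value.
raw = [
    {"id": 1, "value": 2.0},
    {"id": 2, "value": None},
    {"id": 3, "value": 4.0},
]
clean = run_pipeline(raw, [drop_missing, add_centered])
print(clean)
```

Because each step is an ordinary function, steps can be added, removed, or reordered without touching the rest of the pipeline.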

Some disadvantages are that Python doesn’t have as many data-science-specific libraries as R, and its visualization tools are often considered not up to par with R’s.

R

R is used primarily in academia and research, and it was developed with statistical applications in mind. R is also used by statisticians, engineers, and scientists without a programming or computer science background.

R is easier to learn for those without coding experience, and statistical models can be written in a few lines. Data analysis may also be easier because users can chain the steps of a workflow together. R’s add-ons are called packages, which are collections of R functions; Python’s equivalents are usually called libraries, although the term package is used there too. R packages can be installed with one line of code.

R is easier to learn when starting out; however, its advanced functions are more complicated to learn. Experienced programmers will be able to pick them up more easily.

R is often considered the best tool for visualizations, while Python’s data visualization tools are seen as weaker. R is excellent for data and statistical analyses and is generally regarded as superior to Python in that regard.

One disadvantage of R is that it is not as effective as Python for deep learning or natural language processing (NLP), since it is geared toward statistical analysis.

Which is Better?

There is no single answer. Each language is best suited to certain applications. Python skills also carry over to other fields, such as software engineering.

Learning both Python and R is a good idea, though Python seems to be more popular. As far as the job market goes, employers are looking for data scientists who know both. Personally, I learned Python first because there were more resources for learning it. My bootcamp did not cover R, but several materials on it were provided afterward.

Mark Subra

I am a data scientist and a recent graduate of the Flatiron School Immersive Data Science Bootcamp.