R vs. Python for Machine Learning
So you're thinking of building a machine learning project, and it's time to choose a programming language. Although R and Python offer similar capabilities, they differ in syntax, libraries and community support. Let's take a closer look at the two.

Description
What is R?
R is a language and environment for statistical computing and graphics. It was created by statisticians for statistics, specifically for working with data. It is used by companies like Deloitte, Facebook, Instagram and Google.
What is Python?
Python is a general-purpose programming language. It was designed with an emphasis on code readability. It is used by companies like Dropbox, YouTube, Instagram and Google.
Libraries

Data Collection
R can import Excel, CSV and text files, as well as files created in Minitab or SPSS format.
Python also handles a wide range of data formats, from CSV files to JSON sourced from the web and SQL tables (R can read JSON and SQL too, through packages such as jsonlite and DBI). Python's requests library makes it easy to grab data from the web, although modern R packages like rvest can be used for basic web scraping.
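To make the data-collection point concrete, here is a minimal sketch of loading CSV, JSON and SQL data into pandas DataFrames. It assumes pandas is installed. To keep it self-contained, the CSV and JSON come from in-memory strings and the SQL table lives in a throwaway in-memory SQLite database. The player names and the `pts` column are hypothetical sample data.

```python
import io
import json
import sqlite3

import pandas as pd

# CSV: read from an in-memory string (a file path or URL works the same way)
csv_text = "player,pts\nAlice,20\nBob,15\n"
csv_df = pd.read_csv(io.StringIO(csv_text))

# JSON: in a real project this text might come from requests.get(url).json()
json_text = json.dumps([{"player": "Carol", "pts": 25}, {"player": "Dan", "pts": 9}])
json_df = pd.DataFrame(json.loads(json_text))

# SQL: an in-memory SQLite database stands in for a real database server
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stats (player TEXT, pts INTEGER)")
conn.executemany("INSERT INTO stats VALUES (?, ?)", [("Eve", 31), ("Finn", 12)])
conn.commit()
sql_df = pd.read_sql_query("SELECT player, pts FROM stats", conn)
conn.close()

# Stack the three sources into a single DataFrame
combined = pd.concat([csv_df, json_df, sql_df], ignore_index=True)
print(combined)
```

All three sources produce the same kind of object, a DataFrame, so the rest of the analysis does not need to care where the data came from.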
Data Wrangling/Exploration
R allows you to build probability distributions, apply different statistical tests, and use standard ML and data mining techniques. It is designed for statistical analysis, and it offers a number of different options for exploring data.
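For comparison, Python's standard library covers only the basics here. The following is a minimal sketch that builds a normal distribution with `statistics.NormalDist` and runs a simple two-sided z-test on top of it. The `z_test` helper and the sample data are hypothetical, and the test assumes a known population standard deviation. In practice R's built-in statistical tests, or a dedicated Python library, would be the usual choice.

```python
from statistics import NormalDist, fmean

# A standard normal distribution (mean 0, standard deviation 1)
std_normal = NormalDist(mu=0.0, sigma=1.0)

print(std_normal.cdf(0.0))        # → 0.5
print(std_normal.inv_cdf(0.975))  # critical value for a two-sided 95% test


def z_test(sample, mu0, sigma):
    """Hypothetical helper: two-sided one-sample z-test with known sigma."""
    n = len(sample)
    z = (fmean(sample) - mu0) / (sigma / n ** 0.5)
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    return z, p_value


# Hypothetical data: points scored in five games, tested against a mean of 20
z, p = z_test([22, 25, 19, 28, 24], mu0=20, sigma=4)
print(f"z = {z:.3f}, p = {p:.4f}")
```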
Python, in comparison, lets you explore data with the data analysis library Pandas. With this library you can filter, sort and display data in just a few lines of code.
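Here is a short sketch of filtering, sorting and displaying data with Pandas. It uses a small hypothetical DataFrame. With a real file you would load the data with `pd.read_csv(...)` first, and the column names would depend on that file.

```python
import pandas as pd

# Hypothetical sample data (column names are illustrative only)
players = pd.DataFrame({
    "player": ["Alice", "Bob", "Carol", "Dan"],
    "pos": ["G", "F", "G", "C"],
    "pts": [20, 15, 25, 9],
})

# Filter: keep only guards, using a boolean mask
guards = players[players["pos"] == "G"]

# Sort: highest scorers first
by_points = players.sort_values("pts", ascending=False)

# Display: the top three rows of the sorted table, then the filtered rows
print(by_points.head(3))
print(guards)
```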
Data Visualization
Since R was built to present the results of statistical analysis, you can easily create basic charts and plots with its base graphics. With the ggplot2 library, you can create more advanced plots, such as complex scatter plots with regression lines.
Python, in comparison, is generally considered less convenient for data visualization. The Matplotlib library is used for generating basic graphs and charts, and the Seaborn library lets you draw more attractive and informative statistical graphics.
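The ggplot2 example above (a scatter plot with a regression line) takes a few more steps in Matplotlib. The following minimal sketch assumes Matplotlib and NumPy are installed. It uses hypothetical data, fits the line with `numpy.polyfit`, and renders to an in-memory PNG so that no display is needed.

```python
import io

import matplotlib
matplotlib.use("Agg")  # off-screen backend: no window or display required
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data: minutes played vs. points scored
minutes = np.array([10, 15, 20, 25, 30, 35])
points = np.array([4, 7, 9, 12, 16, 18])

# Least-squares straight line (degree-1 polynomial): returns [slope, intercept]
slope, intercept = np.polyfit(minutes, points, 1)

fig, ax = plt.subplots()
ax.scatter(minutes, points, label="games")
ax.plot(minutes, slope * minutes + intercept, color="red", label="linear fit")
ax.set_xlabel("minutes")
ax.set_ylabel("points")
ax.legend()

# Save to an in-memory buffer instead of a file on disk
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
png_bytes = buffer.getvalue()
```

Seaborn's `regplot` function wraps this scatter-plus-fit pattern in a single call, which is one reason it is often used on top of Matplotlib.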
Code/Syntax
Because Python was created with an emphasis on code readability, it is regarded as easier to pick up than R. Let's compare the code for importing a CSV file and computing the mean of each numeric column.
R Code
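# Load the packages: readr for reading files, dplyr and purrr for data manipulation
library(readr)
library(dplyr)
library(purrr)

# Read the CSV file into a data frame (tibble)
nba_data <- read_csv("nba_2013.csv")

# Keep only numeric columns, then compute each column's mean, ignoring NAs
nba_data %>%
  select_if(is.numeric) %>%
  map_dbl(mean, na.rm = TRUE)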
Python Code
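# Load pandas, the data analysis library
import pandas as pd

# Read the CSV file into a DataFrame
nba_data = pd.read_csv("nba_2013.csv")

# Mean of each numeric column; recent pandas versions raise an error on text columns unless numeric_only=True
nba_data.mean(numeric_only=True)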
Comparing the two snippets, you can see why Python is often regarded as easier to read and pick up than R.
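The R pipeline above explicitly keeps only numeric columns with `select_if(is.numeric)` and skips missing values with `na.rm = TRUE`. A closer pandas mirror of those two steps is sketched below. It uses a small hypothetical stand-in for `nba_2013.csv` so that it runs without the file, and the column names are illustrative only.

```python
import pandas as pd

# Hypothetical stand-in for nba_2013.csv: one text column, two numeric columns
nba_data = pd.DataFrame({
    "player": ["Alice", "Bob", "Carol"],
    "pts": [20, 15, 25],
    "fg_pct": [0.45, None, 0.50],  # None becomes NaN (a missing value)
})

# Equivalent of select_if(is.numeric): keep only numeric columns
numeric_cols = nba_data.select_dtypes(include="number")

# mean() skips missing values by default (skipna=True), like na.rm = TRUE in R
column_means = numeric_cols.mean()
print(column_means)
```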
Advantages & Disadvantages

R
Advantages
- Open source.
- Strong for statistical analysis.
- Thousands of packages/libraries devoted to analytics, many of them well established.
- Easy to build visualizations.
Disadvantages
- Steeper learning curve than Python.
- Requires knowledge of a large number of packages.
- Can run slowly due to how R stores data in memory.
Python
Advantages
- Open source.
- As a general-purpose language, it is often regarded as a better choice than R if your project demands more than just statistics.
- Easy to read and learn, so programming skills can be developed faster, which tends to make developers more productive.
- Often regarded as integrating more easily than R with lower-level languages such as C and C++.
- Growing number of libraries for data analysis.
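The C-integration point in the list above can be illustrated with `ctypes`, the foreign-function module in Python's standard library. This minimal sketch assumes a POSIX system (Linux or macOS) where the C standard library can be loaded. It declares the signature of the C function `strlen` and calls it directly from Python. Larger bindings are usually written with tools such as Cython or pybind11, which this sketch does not cover.

```python
import ctypes
import ctypes.util

# Locate the C standard library. If find_library returns None, CDLL(None)
# on POSIX falls back to symbols already loaded into the process (incl. libc).
libc_path = ctypes.util.find_library("c")
libc = ctypes.CDLL(libc_path)

# Declare the C signature: size_t strlen(const char *s)
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

# Pass a Python bytes object as a C string
length = libc.strlen(b"machine learning")
print(length)  # → 16
```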
Disadvantages
- Pure Python code can be slow to execute.
- Can use a large amount of memory.
- Has fewer packages for statistical modeling than R.
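One reason plain Python can be memory-hungry is that a list of numbers holds pointers to separate, individually allocated integer objects. This rough sketch compares that with the standard library's packed `array` type, which stores raw 64-bit values contiguously. It assumes CPython. Exact byte counts vary by version and platform, so only the relative sizes are meaningful. NumPy arrays rely on the same packed-storage idea.

```python
import sys
from array import array

values = list(range(1_000, 2_000))  # 1,000 distinct integers

# List: the list object (an array of pointers) plus one heap object per int
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

# array('q'): raw signed 64-bit integers stored contiguously, no per-item objects
packed = array("q", values)
array_bytes = sys.getsizeof(packed)

print(list_bytes, array_bytes)
```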
Conclusion
So which is better, Python or R? The honest answer is that it depends on your ML project.
If your project is heavily statistics-based, R is likely the most suitable choice, whereas if you are looking to build larger-scale, production-ready ML projects, Python is generally the better match.
One Pager Cheat Sheet
- R and Python both offer strong capabilities for machine learning projects, but they differ in syntax, libraries, and community support.
- R is a language and environment for statistical computing and graphics, while Python is a general-purpose programming language.
- R and Python both provide a variety of libraries for data collection, data wrangling/exploration, and data visualization.
- R offers ggplot2 for more complex graphical representations, whereas Python relies on Matplotlib for basic graphs and Seaborn for more attractive statistical graphics.
- R is generally considered less readable and harder to learn than Python, largely because of its less familiar syntax and the large number of packages you need to know.
- Python is regarded as easier to read and pick up than R because of its emphasis on code readability.
- Python and R are both open source. Python is generally considered easier to read and learn and has a growing number of libraries for data analysis, while R has more packages devoted to analytics but can run slowly due to how it stores data.
- The source code of open source programming languages like R and Python can be freely used, modified, and distributed under the terms of their licenses (the GNU GPL for R, the PSF License for Python), which encourages collaboration and enables commercial and research applications.
- The right choice really depends on your ML project, but generally speaking, R is best for heavily statistics-based projects, while Python is better for larger-scale, production-ready projects.