I use Python as my daily tool, and I got a chance to work with R during my Spring 2017 semester at Columbia University. I appreciate R for its statistical tools and its excellent visualization libraries like ggplot2. For the final project, I wanted to use Python again, but I also wanted R's visualization tools.
My online search led me to a Python package called rpy2. In this post I will show how to use it to integrate R into Jupyter Notebooks via cell magics.
My preferred way to set up Python environments
In the past I liked using virtualenv and virtualenvwrapper, but I have recently shifted to Conda and now use it every day in my workflow. A general walkthrough follows:
Create the virtual environment
conda create -n py_R python=3.6
Install the prerequisite packages
conda install rpy2
conda install jupyter
conda install pandas
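Once the packages are installed, a quick check from Python can confirm that the active environment actually sees them. This is only a minimal sketch: the helper name `missing_packages` is made up for illustration, and it assumes the three packages are importable under the same names used in the conda commands above.

```python
from importlib.util import find_spec

# Packages installed in the walkthrough above (import names assumed
# to match the conda package names).
REQUIRED = ["rpy2", "jupyter", "pandas"]


def missing_packages(names):
    """Return the names that cannot be found in the active environment.

    Hypothetical helper for illustration; it only looks the modules up
    and does not import them.
    """
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All packages found.")
```

If anything is reported as missing, the usual cause is that the notebook is running in a different environment from the one where the packages were installed.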
Imports for the code
## Let's try out some code
import pandas as pd # do all processing in pandas, and convert Pandas DataFrame to R DataFrame
## imports required from rpy2
from rpy2.robjects import pandas2ri
%reload_ext rpy2.ipython
## HTML changes to align the images to the center of the screen
from IPython.display import HTML
HTML("""
<style>
.output_png {
    display: table-cell;
    text-align: center;
    vertical-align: middle;
}
</style>
""")
Pandas DataFrame: just initialize a data frame
df = pd.DataFrame({'Letter': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'],
                   'X': [4, 3, 5, 2, 1, 7, 7, 5, 9],
                   'Y': [0, 4, 3, 6, 7, 10, 11, 9, 13],
                   'Z': [1, 2, 3, 1, 2, 3, 1, 2, 3]})
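The import comment earlier says to do all processing in pandas before converting to an R data frame. As a small sketch of what that could look like, the block below computes per-letter means of X and Y from the same data. The `summary` name is only an illustration and is not part of the original notebook.

```python
import pandas as pd

# Same data as the DataFrame initialized in this post.
df = pd.DataFrame({'Letter': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'],
                   'X': [4, 3, 5, 2, 1, 7, 7, 5, 9],
                   'Y': [0, 4, 3, 6, 7, 10, 11, 9, 13],
                   'Z': [1, 2, 3, 1, 2, 3, 1, 2, 3]})

# Mean of X and Y for each letter group, kept as a regular DataFrame
# (Letter stays a column rather than becoming the index).
summary = df.groupby('Letter', as_index=False)[['X', 'Y']].mean()
print(summary)
```

A processed frame like this can be handed to an R cell the same way as `df`, by naming it after the `-i` argument.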
Using rpy2 is as simple as using the %%R cell magic
You can pass a pandas DataFrame into an R cell by naming it after the -i argument of %%R. The plot width and height are set with the -w and -h arguments, and the units with -u.
%%R -i df -w 900 -h 480 -u px
## Everything in here is ** R ** (magic)
print(df) # Tada, df is now an R dataframe
library("ggplot2") # If this line does not work, make sure R and the ggplot2 package are installed on your machine
ggplot(data = df) + geom_point(aes(x = X, y = Y, color = Letter, size = Z))