Linking R with Intel’s math kernel libraries

March 13, 2015:  I purchased a new computer, so I needed to re-link R with MKL.  I’ve modified the following post with a few corrections.

Matrix operations are usually the most computationally intensive task in a statistical program.  Fortunately, R and most modern programming languages rely on highly optimized matrix multiplication algorithms through an external FORTRAN library such as BLAS or LINPACK, which have been continuously refined since the 1970s.

R ships with its own internal BLAS implementation, which the installation and administration manual describes as “well-tested and will be adequate for most uses of R.”  However, it does not take advantage of recent advances used by external libraries such as OpenBLAS, ATLAS, and MKL, which offer multithreading support and highly optimized cache utilization.

Using the default BLAS libraries, R is still substantially slower than its proprietary counterpart Matlab.  A major reason for this discrepancy is Matlab’s use of Intel’s Math Kernel Library (MKL) for BLAS and LAPACK operations.  By offering parallel support for matrix operations and Intel architecture-specific optimizations, MKL lets Matlab dominate R in most benchmark tests (see, for example, this performance comparison).

However, Intel provides both its compilers (for C and C++) and its MKL free for students (provided that they are not compensated for software development), so students can build R from source using Intel’s compilers and link it to Intel’s MKL.  Alternatively, Revolution R offers a free academic license and links to an older version of MKL, but it appears to only have a Linux binary for Red Hat/Fedora.

I found building R from source and linking it with MKL to be a somewhat onerous process, with existing documentation being outdated, confusing, and often misleading, so I will outline my procedure in as much detail as possible, in hopes that it may aid someone in the future.

I’m working from a fresh install of Linux Mint 17, but I am assuming that these instructions should work for any Linux-based OS.  It’s also recommended that you have an Intel chipset in order to experience the most gains from the MKL.  The computer that I used for this tutorial is a Samsung Series 9 with an Intel i5 and 4GB of RAM.

Step 1: Download and install the Intel® C++ Studio XE for Linux

The Intel C++ Studio comes with MKL, C and C++ compilers, and a few other useful tools such as VTune for profiling.  (VTune will not work for profiling C++ code within R; I will write a post on profiling in the future.)

You will also need to make sure that gcc, g++, and their libraries are installed for the Intel tools to function properly.  You will get a warning if they are not available, though you can safely ignore the warning for the missing 32-bit libraries if you are on a 64-bit machine.  Most of the required files can be installed via the command:

sudo apt-get install build-essential

Step 2: Pre-requisites for building R from source

Several libraries and other dependencies are needed to build R from source.  First, install a Fortran compiler.  I didn’t have a license for Intel’s Fortran compiler “ifort,” so I installed GNU’s gfortran.  Next, I needed to install the X11 window system and the readline library, which supposedly helps improve the aesthetics of R’s terminal display.  Finally, Java is needed for the external “rJava” package, which is built in the default install.  With the proper configure flags, you can probably get around installing some of these libraries, but I didn’t feel like experimenting.  The commands to install these packages are:

sudo apt-get install gfortran
sudo apt-get install libx11-dev
sudo apt-get install xorg-dev
sudo apt-get install libreadline-dev
sudo apt-get install default-jdk
sudo apt-get install libcairo2-dev

If you find that you need another library when compiling something, a good rule of thumb is to type “sudo apt-get install [dependency-name]-dev”, which will generally provide the needed dependencies for compilation.
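Before moving on, it may help to confirm that the build tools are actually on your PATH.  The following is a minimal sketch rather than part of the original procedure; the function name missing_tools is made up for illustration.

```shell
# Print the name of every tool in the argument list that is not on PATH.
# (missing_tools is a hypothetical helper name, not a standard command.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Tools this post relies on; no output means everything was found.
missing_tools gcc g++ gfortran make
```

Anything it prints is a candidate for another apt-get install before you run configure.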

Step 3: Download R’s source and edit the configuration files

R’s source code is available here.  After downloading and extracting it, open the file config.site and set the following variables:

CC='icc -std=c99'
CFLAGS='-O3 -ipo -xavx -openmp'
CXX='icpc'
CXXFLAGS='-O3 -ipo -xavx -openmp'

This tells R to use Intel’s compilers with the appropriate flags (-O3 turns on several optimization flags, -ipo enables inter-procedural optimization, -xavx enables Intel-specific AVX optimizations, and -openmp enables OpenMP support).
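Since -xavx targets processors with AVX, it seems worth checking that your CPU reports that feature before committing to the flag.  Here is a rough sketch, assuming a Linux system that exposes CPU features in /proc/cpuinfo; has_cpu_flag is a hypothetical helper name.

```shell
# Succeed if the CPU advertises the given feature flag.
# $1: flag name (e.g. avx); $2: cpuinfo file, defaulting to /proc/cpuinfo.
has_cpu_flag() {
    grep -m1 '^flags' "${2:-/proc/cpuinfo}" 2>/dev/null | grep -w "$1" >/dev/null
}

if has_cpu_flag avx; then
    echo "AVX is reported; -xavx should be usable"
else
    echo "AVX is not reported; consider dropping -xavx"
fi
```

The -w option matches whole words only, so a CPU that lists avx2 but not avx would not be counted as a match for avx.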

Step 4: Create environment variables

Next, you need to tell R where MKL and Intel’s compilers are located.  Your installation directory contains several shell scripts that you need to source every time you want to use MKL or Intel’s compilers.  For convenience, you can add the following lines to your ~/.bashrc file so that they are sourced automatically in each bash session.

source /opt/intel/composer_xe_2013_sp1.1.106/bin/compilervars.sh intel64
source /opt/intel/composer_xe_2013_sp1.1.106/mkl/bin/mklvars.sh intel64
source /opt/intel/composer_xe_2013_sp1.1.106/bin/iccvars.sh intel64
export AR="xiar"
export LD="xild"

Note that your path may differ depending on the versions of MKL and Intel Composer that you install, but it should be along these lines.  The MKL directory also includes an MKL link tool, which can help you construct the link line if you run into problems.
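After opening a new terminal, a quick check along the following lines can tell you whether the scripts took effect.  This is only a sketch: check_intel_env is a made-up helper name, and it assumes that the scripts put icc on your PATH and that mklvars.sh defines MKLROOT.

```shell
# Report whether the Intel environment appears to be loaded in this shell.
# (check_intel_env is a hypothetical helper name.)
check_intel_env() {
    rc=0
    command -v icc >/dev/null 2>&1 || { echo "icc is not on PATH"; rc=1; }
    [ -n "${MKLROOT:-}" ] || { echo "MKLROOT is not set"; rc=1; }
    return "$rc"
}

check_intel_env || echo "Intel environment not loaded; re-check the source lines"
```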

Finally, switch to root and run the following commands:

source /opt/intel/composer_xe_2013_sp1.1.106/bin/compilervars.sh intel64
source /opt/intel/composer_xe_2013_sp1.1.106/mkl/bin/mklvars.sh intel64
source /opt/intel/composer_xe_2013_sp1.1.106/bin/iccvars.sh intel64
export AR="xiar"
export LD="xild"

export MAIN_LDFLAGS='-openmp'
OMP_LIB_PATH=/opt/intel/composer_xe_2013_sp1.1.106/compiler/lib
MKL_LIB_PATH=/opt/intel/composer_xe_2013_sp1.1.106/mkl/lib/intel64
MKL=" -L${MKL_LIB_PATH} -L${OMP_LIB_PATH} \
-Wl,--start-group \
-lmkl_gf_lp64 \
-lmkl_intel_thread \
-lmkl_core \
-lm \
-Wl,--end-group \
-liomp5 -lpthread"
./configure --with-lapack --with-cairo --with-blas="$MKL" --build="x86_64-linux-gnu" --host="x86_64-linux-gnu" --enable-R-shlib

Previously, I did not include the flags ‘--enable-R-shlib’ and ‘--with-cairo,’ which respectively enable shared libraries (required by external applications such as RStudio) and better graphics.  To enable shared libraries, I referenced the following Stack Overflow post.  Note that some tutorials use the -lmkl_intel_lp64 flag instead of -lmkl_gf_lp64.  According to Ying H at Intel:

libmkl_gf_lp64  is for LP64 interface library for the GNU Fortran compilers and libmkl_intel_lp64 is for intel compilers.

So if, like me, you don’t have ifort, be sure to use the GNU fortran flag.

I re-ran all of the source commands as root because I had only defined them in my personal ~/.bashrc; they are not defined for root or sudo.  I could have edited the system-wide file /etc/bash.bashrc instead, but I did not know that at the time.

If it worked correctly, you should see something along the lines of

R is now configured for x86_64-unknown-linux-gnu

Source directory: .
Installation directory: /usr/local

C compiler: icc -std=c99 -O3 -xavx -ipo -openmp
Fortran 77 compiler: gfortran -g -O2

C++ compiler: icpc -O3 -ipo -xavx -openmp
C++ 11 compiler: icpc -std=c++11 -O3 -ipo -xavx -openmp
Fortran 90/95 compiler: gfortran -g -O2
Obj-C compiler:

Interfaces supported: X11
External libraries: readline, BLAS(generic), LAPACK(in blas)
Additional capabilities: PNG, NLS
Options enabled: R profiling

Recommended packages: yes

If you don’t see this, check your config.log.  The BLAS checks there should look like the following lines:

conftest.c:210: warning: implicit declaration of function 'dgemm_'
configure:29096: $? = 0
configure:29103: result: yes
configure:29620: checking whether double complex BLAS can be used
configure:29691: result: yes

If you see an error in these lines instead, something went wrong, and R will ignore your MKL link and use its internal BLAS libraries.
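Rather than scrolling through the log by hand, you could pull out the result of the double complex BLAS check with something like the sketch below.  It assumes the log lines follow the format shown above; blas_check_result is a hypothetical helper name.

```shell
# Print the result ("yes" or "no") of configure's double complex BLAS check.
# $1: path to the log, defaulting to config.log in the current directory.
blas_check_result() {
    awk '/checking whether double complex BLAS can be used/ { found = 1; next }
         found && /result:/ { print $NF; exit }' "${1:-config.log}"
}

# Only meaningful after ./configure has been run in this directory.
if [ -f config.log ]; then
    echo "double complex BLAS usable: $(blas_check_result)"
fi
```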

I adapted most of the configuration step from Intel’s tutorial, except that I had to add the -lm flag to $MKL; otherwise, it wouldn’t recognize the math symbols in MKL.  Several others and I (you’ll find them with a Google search) had problems with the double complex BLAS check, but for me, they seemed to go away when I ran the configure step as root.  I have no explanation as to why.

Next, run make and make install while still root.  It should take about 10-15 minutes.  Once it finishes, run the command:


R CMD ldd ./bin/exec/R

And you should see:

linux-vdso.so.1 => (0x00007fff723d4000)
libmkl_gf_lp64.so => /opt/intel/composer_xe_2013_sp1.1.106/mkl/lib/intel64/libmkl_gf_lp64.so (0x00007f65cfa50000)
libmkl_intel_thread.so => /opt/intel/composer_xe_2013_sp1.1.106/mkl/lib/intel64/libmkl_intel_thread.so (0x00007f65cea92000)
libmkl_core.so => /opt/intel/composer_xe_2013_sp1.1.106/mkl/lib/intel64/libmkl_core.so (0x00007f65cd3d4000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f65cd0ab000)
libiomp5.so => /opt/intel/composer_xe_2013_sp1.1.106/compiler/lib/intel64/libiomp5.so (0x00007f65ccd90000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f65ccb72000)
libgfortran.so.3 => /usr/lib/x86_64-linux-gnu/libgfortran.so.3 (0x00007f65cc858000)
libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x00007f65cc61c000)
libreadline.so.6 => /lib/x86_64-linux-gnu/libreadline.so.6 (0x00007f65cc3d6000)
librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f65cc1cd000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f65cbfc9000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f65cbdb3000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f65cb9ec000)
/lib64/ld-linux-x86-64.so.2 (0x00007f65d0196000)
libtinfo.so.5 => /lib/x86_64-linux-gnu/libtinfo.so.5 (0x00007f65cb7c3000)
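If you would rather not compare this listing by eye, a small filter can confirm that the three MKL libraries from the link line were resolved.  This is a sketch under the assumption that ldd marks unresolved libraries with “not found”; links_mkl is a made-up helper name.

```shell
# Read ldd output on stdin; succeed only if each MKL layer named in the
# link line appears and is resolved (i.e. not marked "not found").
links_mkl() {
    ldd_out=$(cat)
    for lib in libmkl_gf_lp64 libmkl_intel_thread libmkl_core; do
        printf '%s\n' "$ldd_out" | grep "$lib" | grep -v 'not found' >/dev/null || return 1
    done
}

# Usage, from the top of the R build directory:
#   R CMD ldd ./bin/exec/R | links_mkl && echo "MKL is linked"
```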

Complications

I noticed a few annoyances after building R from source.  I typically run R from Emacs via Emacs Speaks Statistics, and when I first tried to open R from Emacs, it would crash.  This is because the MKL and icc environment variables are set only in bash sessions, not when I launch a program from the desktop.  This means that if I want to use Emacs Speaks Statistics, I need to start Emacs from the terminal.
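One possible workaround is a small wrapper that loads the Intel environment and then starts the program.  The sketch below assumes bash and the compilervars.sh path used earlier in this post; with_intel_env and INTEL_VARS are hypothetical names, and I have not tested this with every desktop launcher.

```shell
# Path to Intel's environment script; adjust it to your installation.
INTEL_VARS=${INTEL_VARS:-/opt/intel/composer_xe_2013_sp1.1.106/bin/compilervars.sh}

# Run a command with the Intel environment loaded.  The parentheses make
# the function body a subshell, so nothing leaks into the calling shell.
with_intel_env() (
    if [ ! -f "$INTEL_VARS" ]; then
        echo "cannot find $INTEL_VARS" >&2
        exit 1
    fi
    . "$INTEL_VARS" intel64 >/dev/null
    exec "$@"
)
```

With this defined, with_intel_env emacs would start Emacs with the variables set.  The same idea could be saved as a script and used as the launcher command.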

Additionally, when I tried to install the “Rcpp” package, I encountered the error “catastrophic error: cannot open source file "bits/c++config.h"”; following the advice of this Stack Overflow post, I was able to resolve it.

Perhaps the most annoying issue was the poor quality of R’s default graphics, which produced plots like this:

[Figure: Rplot]

I posted a question on Stack Overflow, but didn’t get much help, so I did some research.  It turns out that I needed to install libcairo2-dev and run ./configure with the option --with-cairo.  Then, edit the .Rprofile in your home directory to include:

setHook(packageEvent("grDevices", "onLoad"),
        function(...) grDevices::X11.options(type = "cairo"))
options(device = "x11")

After doing so, your graphics should appear as they normally would after installing R from the package repository.

There may be more complications in the future when installing packages that require compiled code, but my initial performance gains appear quite substantial.  I routinely see R running at 400% CPU utilization during some of my computationally intensive procedures.  I will run some benchmarks in the future, but I’ve noticed performance gains roughly similar to those reported on the Revolution R blog, with even greater gains in my parallel Rcpp code, which is automatically linked to MKL.
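In the meantime, a rough way to see the effect of multithreading yourself is to time a large matrix product with different thread counts.  The following is only a sketch, not a proper benchmark: bench_matmul is a hypothetical helper, the default matrix size is arbitrary, and it relies on MKL honoring its MKL_NUM_THREADS environment variable.

```shell
# Time an n-by-n matrix product in R with a given number of MKL threads.
# $1: matrix dimension (default 2000); $2: thread count (default 1).
# (bench_matmul is a hypothetical helper name.)
bench_matmul() {
    n=${1:-2000}
    threads=${2:-1}
    MKL_NUM_THREADS="$threads" Rscript -e \
        "n <- $n; a <- matrix(rnorm(n * n), n); print(system.time(a %*% a))"
}
```

Comparing the elapsed times from bench_matmul 2000 1 and bench_matmul 2000 4 should give a first impression of how much the threaded BLAS helps on your machine.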
