Gentoo Forums
Installing numpy with multiprocessing
Gentoo Forums Forum Index » Portage & Programming
gary_uk
n00b
Joined: 18 Nov 2014
Posts: 10

PostPosted: Tue Nov 18, 2014 1:42 pm    Post subject: Installing numpy with multiprocessing

Hardware: Xeon E5-2640 v3 ×2
Software: Gentoo AMD64

I was hoping to install Numpy and its LAPACK/BLAS dependencies in a way that makes the most of the many CPU cores. According to Scipy's website ( http://wiki.scipy.org/ParallelProgramming ), this can be done by compiling with OpenMP turned on. I must admit I'm a little concerned about adding `-fopenmp' to the compiler flags in /etc/portage/make.conf and applying a general update using emerge. Since I'm new to Gentoo, I was wondering about the following:

1. Is there a recommended way in Gentoo to install Numpy with OpenMP?
2. If not, is there a way I can tell Portage to apply specific compiler flags only to Numpy and its dependencies?
3. If not, is my best solution compiling Numpy and its dependencies manually without Portage?

Thanks.
khayyam
Watchman
Joined: 07 Jun 2012
Posts: 6227
Location: Room 101

PostPosted: Thu Dec 04, 2014 5:17 pm

gary ...

I'm not sure about this specific use case but you might try the following:

Create an 'openmp.conf' in /etc/portage/env with whatever environment changes you need to apply (i.e., CFLAGS changes):

/etc/portage/env/openmp.conf
Code:
CFLAGS="${CFLAGS} -fopenmp"

In /etc/portage/package.env/ create a file specifying which packages should use this env ...

/etc/portage/package.env/openmp.env
Code:
dev-python/numpy openmp.conf
<category>/<package> openmp.conf

This env should be applied to these packages on re-merge.
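To trigger that re-merge explicitly, something like the following should do (a sketch; `--oneshot' assumed so numpy isn't added to the world file, and any other packages you listed in openmp.env would be rebuilt the same way):

```shell
# Rebuild numpy so the per-package CFLAGS from openmp.conf take effect.
emerge --ask --oneshot dev-python/numpy
```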

HTH & best ... khay
gary_uk
n00b
Joined: 18 Nov 2014
Posts: 10

PostPosted: Thu Jan 29, 2015 11:18 am

khayyam wrote:

I'm not sure about this specific use case but you might try the following:


Thanks for the advice, but my best solution so far has turned out to be a little more long-winded. In order to make optimal use of all the cores, I needed a few things:

1. OpenMP (a la UNIX's pthreads)
2. Intel's MKL.
3. Intel's icc.

... I couldn't get these to install into Gentoo (apparently) because Gentoo isn't LSB-compliant. I therefore installed an LSB package on an RPM-based distro, installed the Intel binaries there, and copied the /opt/intel/ directory into Gentoo's /opt/. To make sure icc was available in Gentoo's environment, I created a shell script /etc/profile.d/intel along the lines of...

Code:

source /opt/intel/.../compilervars.sh intel64
export LD_LIBRARY_PATH=/opt/intel/.../mkl/lib/intel64:/opt/intel/.../lib/intel64:$LD_LIBRARY_PATH


...so I could compile with icc as any user. I then downloaded the latest stable numpy tar.gz source and made the following changes to the configuration files:

intelccompiler.py:

Code:

         #self.cc_exe = 'icc -m64 -fPIC'
         self.cc_exe = 'icc -O3 -g -fPIC -fp-model strict -fomit-frame-pointer -openmp -xhost'


intel.py:

Code:

        #return ['-i8 -xhost -openmp -fp-model strict']     
        return ['-xhost -openmp -fp-model strict -fPIC']


site.cfg:

Code:

[mkl]
library_dirs = /opt/intel/.../mkl/lib/intel64
include_dirs = /opt/intel/.../mkl/include
mkl_libs = mkl_rt
lapack_libs = mkl_lapack95_lp64


Finally, I installed numpy using the following line:

Code:

sh-4.2# python setup.py config --compiler=intelem build_clib --compiler=intelem build_ext --compiler=intelem install


(... actually I used `python2...' because most of my code is still written for Python 2.7.X).

I agree the process is a lot more long-winded than `emerge -qan numpy'. I'm not familiar enough with Gentoo to know whether Portage could have performed an installation with this kind of customisation. It's also quite irritating that, when I attempt to install numpy-dependent packages, Portage complains that numpy is not installed, forcing me to resort to distutils every time rather than using emerge. However, a comparison of the performance differences when running the following program is instructive:

Code:

import numpy as np
import time

n = 5          # timing repetitions per matrix size
N = 6000
M = 10000

k_list = [64, 80, 96, 104, 112, 120, 128, 144, 160, 176, 192, 200, 208, 224, 240, 256, 384]

def get_gflops(M, N, K):
    return M*N*(2.0*K-1.0) / 1024**3

#np.show_config()

for K in k_list:
    a = np.array(np.random.random((M, N)), dtype=np.double, order='C', copy=False)
    b = np.array(np.random.random((N, K)), dtype=np.double, order='C', copy=False)
    A = np.matrix(a, dtype=np.double, copy=False)
    B = np.matrix(b, dtype=np.double, copy=False)

    start = time.time()

    for i in range(n):
        C = np.dot(A, B)

    end = time.time()

    tm = (end - start) / float(n)

    print('{0:4}, {1:9.7}, {2:9.7}'.format(K, tm, get_gflops(M, N, K) / tm))


Running the program under the open-source BLAS/LAPACK (columns: K, time per product in seconds, GFLOPS):

Code:

  64,  7.206665,  1.057355
  80,  9.500084,  1.004202
  96,  10.81931,  1.059217
 104,  12.05694,  1.030112
 112,   12.7766,  1.047227
 120,  13.97688,   1.02598
 128,  14.33727,  1.067149
 144,  16.51003,  1.043002
 160,  18.15292,  1.054376
 176,  20.34829,  1.034977
 192,  21.69134,  1.059409
 200,  23.98138, 0.9982743
 208,   24.3797,  1.021341
 224,  25.53424,  1.050354
 240,  28.10184,  1.022709


Running the program under Intel MKL/pthreads (columns: K, time per product in seconds, GFLOPS):

Code:

  64, 0.05577588,  136.6182
  80, 0.09417605,  101.2996
  96, 0.08080792,  141.8178
 104, 0.06589293,  188.4876
 112,  0.145041,  92.24978
 120, 0.1274409,  112.5227
 128, 0.1084719,  141.0504
 144, 0.1280391,  134.4901
 160,  0.124629,  153.5758
 176, 0.1102171,  191.0774
 192,  0.105535,  217.7476
 200, 0.2072258,  115.5262
 208, 0.2185948,  113.9094
 224, 0.2123821,  126.2818
 240, 0.1290472,  222.7093


Of course the results speak for themselves, with a speed increase somewhere between 100X and 200X. Bear in mind that matrix multiplication is easily parallelised, whereas simple element-by-element arithmetic would not see such speed increases. Presently I'm looking to overcome the limitations imposed by the GIL using `Joblib', which provides convenient pipelining. I've experimented with it and have seen improvements in the 5-10X range, but I'm sure I've only scratched the surface.

In the meantime, I hope this feedback is useful!