Tuesday, 17 September 2013

First version of CUDA Enabled SAC Code: SMAUG

The first version of the CUDA-enabled SAC code, called SMAUG, is available for download and testing. This post provides details for downloading the current distribution.

The Sheffield Advanced Code (SAC) is a novel fully non-linear MHD code, designed for simulations of linear and non-linear wave propagation in
gravitationally strongly stratified magnetised plasma.
Ref.  http://adsabs.harvard.edu/abs/2008A%26A...486..655S


The GPU version of the code may be downloaded from
https://code.google.com/p/smaug/downloads/list

Getting-started documentation for the code is available at https://code.google.com/p/smaug/

SMAUG requires the following hardware and software:

CUDA-Enabled Tesla GPU Computing Product with at least compute capability 1.3.
See 
  http://developer.nvidia.com/cuda-gpus

CUDA toolkit
http://developer.nvidia.com/cuda-toolkit-40

The toolkit must be correctly installed, with its compiler (nvcc) on the user's path.
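A quick way to confirm that the compiler is on the path is to ask the shell where it finds nvcc. The following is only a minimal sketch: the function name check_nvcc is made up for this post and is not part of the SMAUG distribution.

```shell
# check_nvcc: hypothetical helper (not part of SMAUG).
# Reports whether the CUDA compiler driver nvcc can be found on the PATH.
check_nvcc() {
    if nvcc_path=$(command -v nvcc 2>/dev/null); then
        echo "nvcc found at $nvcc_path"
    else
        echo "nvcc not found on PATH"
        return 1
    fi
}
```

After pasting the function into a shell, run check_nvcc. If it reports that nvcc is missing, the bin directory of the CUDA toolkit installation probably needs to be added to PATH.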

Benchmarking SAC on Intel and AMD Processors

This blog entry presents tables of timings for running the SAC code on systems with different Intel and AMD CPUs. The test code was provided by Viktor Fedun and is a SAC model of a flux tube. The work was undertaken in January 2011. The test problem was run for the first 20 iterations and the total time recorded. The problem was also run on different numbers of processing cores.
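Since each run covers 20 iterations, the total time should be roughly 20 times the time per step. The sketch below cross-checks that for rows of the form "procs time_per_step total_time". The helper name check_steps and the 1% tolerance are assumptions made for illustration, and the three sample rows are taken from the SWAT table.

```shell
# check_steps: hypothetical helper for cross-checking timing rows.
# Usage: check_steps [steps] [tolerance] < rows
# Prints: procs, time_per_step * steps, and "ok" or "mismatch".
check_steps() {
    awk -v steps="${1:-20}" -v tol="${2:-0.01}" 'NF >= 3 {
        expected = $2 * steps
        diff = expected - $3
        if (diff < 0) diff = -diff
        status = (diff <= tol * $3) ? "ok" : "mismatch"
        printf "%s %.2f %s\n", $1, expected, status
    }'
}

check_steps 20 <<'EOF'
2 12.3 246.09
4 9.76 195.28
8 10.98 175.65
EOF
```

Rows reported as "mismatch" are worth re-checking against the original run logs before the per-step figure is used for comparisons.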

SWAT server Intel Nehalem E5530 @2.4GHz
OpenMPI compiled with gigabit ethernet
Num. Procs   Time/step (sec)   Total time (sec)
2            12.3              246.09
4            9.76              195.28
6            9.36              187.21
8            10.98             175.65
10           19.758            197.58
12           15.53             186.45
14           17.7              248.1
16           13.5              216.3
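To compare how well the code scales, the total times can be turned into speedup and parallel efficiency figures. The sketch below is one way to do that with awk. The helper name speedup_table is an assumption, and because this table has no single-process run, the first row read (here the 2-process run) is used as the baseline.

```shell
# speedup_table: hypothetical helper.
# Reads "procs time_per_step total_time" rows on stdin and prints
# procs, speedup and efficiency, both relative to the first row read.
speedup_table() {
    awk 'NF >= 3 {
        if (!seen) { p0 = $1; t0 = $3; seen = 1 }
        s = t0 / $3
        printf "%d %.2f %.2f\n", $1, s, s * p0 / $1
    }'
}

speedup_table <<'EOF'
2 12.3 246.09
4 9.76 195.28
8 10.98 175.65
EOF
```

For these three rows the efficiency relative to the 2-process run falls from 1.00 to 0.63 at 4 processes and 0.35 at 8 processes.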





ICEBERG AMD Opteron Barcelona 2376 2.3GHz
OpenMPI Using Infiniband
Num. Procs   Time/step (sec)   Total time (sec)
16           6.39              127.71
14           6.49              129.78
12           8.13              162.5
10           7.73              154.69
8            8.51              170.16
6            10.13             202.64
4            12.66             253.1
2            22.63             452.7




ICEBERG AMD Opteron Barcelona 2376 2.3GHz

OpenMPI gigabit ethernet

Num. Procs   Time/step (sec)   Total time (sec)
2            23.0              460.0
4            13.5              269.6
6            11.3              225.9
8            11.0              220.6
10           8.0               160.2
12           10.3              206.5
14           7.6               152.7
16           6.5               130.0




ICEBERG DELL C60100 testnode Intel Westmere EP (Gulftown 6c) Xeon X5650 2.67GHz
OpenMPI gigabit ethernet
Num. Procs   Time/step (sec)   Total time (sec)
2            11.2              224.6
4            7.9               157.8
6            7.3               146.1
8            6.6               131.2
10           5.9               118.0
12           6.1               121.0
14           5.7               113.6
16           5.0               100.3
18           4.7               93.9
20           4.6               92.8
24           4.1               81.3







ICEBERG AMD Opteron Barcelona 2347 1.9GHz
OpenMPI Using Infiniband
Num. Procs   Time/step (sec)   Total time (sec)
16           7.18              114.88
14
12
10
8            9.77              195.43
6
4
2            23.23             464.67




vac3d test on AMD Opteron 6176 (12 cores, 2300MHz) using the gateway test cluster, MPICH, QLogic Infiniband

Num. Procs  Time/step (sec)  Total time (sec)
1  543.78
2  353.19
4  265.03
6  234.16
8  108.74
9  128.57
10  81.01
12  118.49
14  66.84
16  77.76
18  68.91
20  101.33
22  144.56
24  86.56


vac3d test on AMD Opteron 6140 (8 cores, 2600MHz) using the gateway test cluster, MPICH, QLogic Infiniband

Num. Procs  Time for one step (sec)  Total time (sec)
1  504.56
2  330.94
4  153.82
6  146.58
8  103.2
9  123.67
10  79.34
12  121.40
14  70.36
16  75.58





vac3d ICEBERG DELL C60100 newworkers Intel Westmere EP (Gulftown 6c) Xeon X5650 2.67GHz using /fastdata
Num. Procs   Time for one step (sec)   Total time (sec)
1            15.30                     305.936
2            10.40                     207.956
4            5.50                      109.953
6            3.76                      75.28
8            4.56                      91.23
9            3.69                      73.72
10           3.22                      64.301
12           2.74                      54.80
14           2.49                      49.72
16           2.31                      46.17
32           1.51                      30.20
64           1.13                      22.57
125          1.26                      25.21
216          1.06                      21.25
384          0.94                      18.77
512

idl6.2 gives Segmentation fault under ubuntu

Executing routines such as tvframe with IDL 6.2 under Ubuntu gives a segmentation fault.

A tip from Fanning Software Consulting for resolving this can be found here
http://www.idlcoyote.com/misc_tips/segfault.html

One resolution is to download and unpack the libx11-6_1.0.3-6_i386.deb file.

This file can be downloaded from
http://hera.ph1.uni-koeln.de/~ossk/ftpspace/debian/etch-packages/libx11-6_1.0.3-6_i386.deb

From the directory from which you will run IDL, extract the .deb file
using the command below (the final argument is the target directory, here the current one)

dpkg-deb -x libx11-6_1.0.3-6_i386.deb .

Next add the following line to the .bashrc file

export LD_PRELOAD_PATH=/my/private/lib

For tcsh, add the following line to .tcshrc instead

setenv LD_PRELOAD_PATH /my/private/lib

The following quote from another forum makes the point that /my/private/lib must be replaced with your own path

"this my/lib.... has to changed with the path where you put old libX11.so file. For example I am using fedora7 with bash and I put old libX11 file under /home/orek then I add export LD_PRELOAD_PATH=/home/orek line in my .bashrc file.
After modifying .bashrc open new terminal window and run seadas. The trick here is run seadas under the same path, like if you put the libX11 under /home/orek you should run seadas from /home/orek then you may change the working from seadas easily if you like. If you do not have old linX11.so file I can send you via email it is 1 mb file. For editing .bashrc file you may use gedit or other text editors present in Linux.."
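Before starting IDL it can help to confirm that the directory you have named really contains the old library. The following is a minimal sketch. The helper name has_libx11 is made up for this post, and it assumes the library file name begins with libX11.so, as in the quote above.

```shell
# has_libx11: hypothetical helper.
# Succeeds and prints the first match if the given directory
# (default: $LD_PRELOAD_PATH) contains a file named libX11.so*.
has_libx11() {
    x11_dir=${1:-$LD_PRELOAD_PATH}
    for x11_file in "$x11_dir"/libX11.so*; do
        if [ -e "$x11_file" ]; then
            echo "$x11_file"
            return 0
        fi
    done
    echo "no libX11.so* found in $x11_dir" >&2
    return 1
}
```

For example, has_libx11 /my/private/lib prints the path of the library if it is there and returns a non-zero status otherwise.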