===============================
 GRAPE Software Package
 grapepkg version 1.2.0
===============================

GRAPE Software Package (grapepkg) is a collection of user libraries,
utilities, documents, and sample programs for GRAPEs. It supports not
only KFCR's GRAPE-DR and GRAPE-7, but also GRAPE-6A, GRAPE-6BX, and
Phantom-GRAPE-5. An optional package, CUDA G5/G6, also provides the G5
and G6 libraries for NVIDIA's CUDA devices (i.e., GPUs).

The package includes:

  ./00readme         -- brief instructions for the package.
  ./00readme-j       -- 00readme in Japanese.
  ./doc/             -- user's guide, reference manual, and other documents.
  ./script/          -- install & backup scripts.
  ./include/         -- header files.
  ./lib/             -- libraries.
  ./driver/          -- a device driver for GRAPE-DR and GRAPE-7.
  ./hibutil/         -- the Host Interface Bridge (HIB) for GRAPE-DR and GRAPE-7.
  ./gdr/             -- software for GRAPE-DR.
  ./g7/              -- software for GRAPE-7.
  ./g6a/             -- software for GRAPE-6A.
  ./g6bx/            -- software for GRAPE-6BX.
  ./pg5/             -- software for Phantom-GRAPE-5.
  ./cuda/            -- software for CUDA devices.
                        [included in the optional package CUDA G5/G6]
  ./sample/          -- sample programs.
  ./sample/direct/   -- a sample program (direct-summation algorithm,
                        equal-timestep, written in C).
  ./sample/directf/  -- a sample program (direct-summation algorithm,
                        equal-timestep, written in Fortran).
  ./sample/vtc/      -- a sample program (Barnes-Hut tree algorithm,
                        equal-timestep, written in C).
  ./sample/s9/       -- a sample program (direct-summation algorithm,
                        equal-timestep, written in C).
  ./sample/s8/       -- a sample program (direct-summation algorithm,
                        individual-timestep, written in C).
  ./sample/s8f/      -- a sample program (direct-summation algorithm,
                        individual-timestep, written in Fortran).
  ./sample/pairwise/ -- a sample program to check the accuracy of the
                        pairwise force, written in C.
  ./init/            -- snapshots of particle distributions, used for
                        functionality tests of the hardware.
  ./tmp/             -- used by test programs and other utilities.
  ./ttf/             -- bitstreams (.ttf files) for GRAPE-7 model300 and
                        model600. This directory is optional and may not
                        be included in your package. You can download it
                        from our Web site (http://www.kfcr.jp/grape7.html).

Contents
========

  1. Installation and Preparation
    1.1 Installation of the Package
    1.2 Preparation of GRAPE-DR at Boot Time
    1.3 Preparation of GRAPE-7 at Boot Time
    1.4 Preparation of GRAPE-6A at Boot Time
    1.5 Preparation of GRAPE-6BX at Boot Time
    1.6 Preparation of Phantom-GRAPE-5 at Boot Time
    1.7 Preparation of CUDA G5/G6 at Boot Time
  2. Compilation and Linkage
  3. Environment Variables
    3.1 GDEVICE : Assignment of Calculation Resources
    3.2 GWARNLEVEL : Warning Message Control
  4. Sample Programs
  5. Compatibility
  6. Additional API for CUDA G5
    6.1 Support for Multiple Walks Method
  7. Additional API for CUDA G6
    7.1 Optimization for J-Particle Transfer
  8. Tested Platforms
  9. References
 10. License and Copyright
 11. Acknowledgement
 12. Modification History
 13. Contact

1. Installation and Preparation
===============================

1.1 Installation of the Package
-------------------------------

In order to install the package, type

  $pkgroot/script/install

and follow its instructions. Here, $pkgroot denotes the topmost
directory of the package.

1.2 Preparation of GRAPE-DR at Boot Time
----------------------------------------

You need to follow the procedure (1)--(4) every time you restart the
host computer.

(1) [Root Permission Required] Load the Device Driver

  Change directory to $pkgroot/driver/, and then type

    make installmodule

  This will plug the HIB driver for GRAPE-DR into the Linux kernel.
  If the driver is successfully loaded, you should see the word
  "hibdrv" in the output of /sbin/lsmod.

(2) [Root Permission Required] Set up MTRR

  Change directory to $pkgroot/driver/, and then type

    ./setmtrr

  This will set the MTRR (memory type range register) of the host
  computer to "write-combining" mode, which improves the speed of
  Programmed I/O Write (PIOW) data transfer. This setting affects the
  performance of sending data from the host computer to GRAPE-DR.

  Note 1: In some cases, MTRR cannot be set to "write-combining" mode
          (for example, when all 8 existing MTRRs are already assigned
          to other PCI devices or main memory regions, or when the
          total size of the main memory exceeds 4GB and the chipset of
          the motherboard does not support I/O mapping to main memory
          addresses higher than 4GB). In such cases, you can continue
          with the rest of the installation procedure described below
          without the MTRR setup. All functions of GRAPE-DR should work
          without problems, except that the speed of data transfer from
          the host computer to GRAPE-DR would be reduced by 50% or more.

  Note 2: MTRR setup is not necessary if the Linux kernel version is
          2.6.26 or higher and PAT (page attribute table) support is
          enabled. To find out whether your kernel supports PAT, check
          the Linux header files (e.g.
          /usr/src/linux/include/linux/autoconf.h) to see whether
          CONFIG_X86_PAT is defined.

(3) Initialize GRAPE-DR

  The following command initializes GRAPE-DR and performs some basic
  tests.

    $pkgroot/script/config

  [For Users of model450 and model1800]
  For the initialization of model450 and model1800, use
  $pkgroot/gdr/test/config450 and $pkgroot/gdr/test/config1800
  instead of $pkgroot/script/config.

(4) Check the Functionality of GRAPE-DR

  The following command performs some many-body simulations and
  compares the results with precalculated ones.

    $pkgroot/script/check

  [For Users of model450 and model1800]
  For the functionality check of model450 and model1800, use
  $pkgroot/gdr/test/check450 and $pkgroot/gdr/test/check1800
  instead of $pkgroot/script/check.

1.3 Preparation of GRAPE-7 at Boot Time
---------------------------------------

You need to follow the procedure (1)--(3) every time you restart the
host computer.

(1) Follow the procedure 1.2-(1) and 1.2-(2) in section "Preparation
    of GRAPE-DR at Boot Time" to load the device driver and set up MTRR.

(2) [For model300 and model600 Only] Configure the FPGAs

  Configure the FPGAs with G5PIPE backend logic (pipelines for
  gravitational force calculation). This procedure is not necessary
  for model 100 and model 800.

  Change directory to $pkgroot/g7/config/, and then type

    ./config [devid]

  This will reconfigure the FPGA(s) on the devid-th card. If the
  argument devid is not given, device ID 0 is assumed.

  If you have only one GRAPE-7 card in the system, its device ID
  should always be 0, and thus you can omit the argument. If you have
  multiple cards, you need to confirm the device ID of the card to be
  configured. Use the command $pkgroot/script/lsgrape to identify the
  device ID.

(3) Follow the procedure 1.2-(4) in section "Preparation of GRAPE-DR
    at Boot Time" to check the functionality of the card(s).

More details on the installation and usage of GRAPE-7 can be found in
"GRAPE-7 Installation Guide" ($pkgroot/doc/g7install.pdf). For the
usage of G5PIPE, refer to "G5PIPE User's Guide"
($pkgroot/doc/g5user.pdf).

1.4 Preparation of GRAPE-6A at Boot Time
----------------------------------------

(1) [Root Permission Required] Load the Device Driver

  Change directory to $pkgroot/g6a/pcimem, and then type

    make installmodule

  This will plug the driver for GRAPE-6A into the Linux kernel. If
  the driver is successfully loaded, you should see the word "pcimem"
  in the output of /sbin/lsmod.

(2) Initialize GRAPE-6A

  Change directory to $pkgroot/g6a/lib, and then type

    g6aconfig

  to initialize the card.

(3) Check the Functionality of GRAPE-6A

  Change directory to $pkgroot/g6a/s8, and then type

    make s8
    s8

  After the command completes, compare the result with
  $pkgroot/g6a/s8/sample.1k.

1.5 Preparation of GRAPE-6BX at Boot Time
-----------------------------------------

(1) [Root Permission Required] Load the Device Driver

  Change directory to $pkgroot/g6bx/pcixmem, and then type

    make installmodule

  This will plug the driver for GRAPE-6BX into the Linux kernel. If
  the driver is successfully loaded, you should see the word "pcixmem"
  in the output of /sbin/lsmod.

(2) Check the Functionality of GRAPE-6BX

  Change directory to $pkgroot/g6bx/s8, and then type

    make s8
    s8

  After the command completes, compare the result with
  $pkgroot/g6bx/s8/sample.1k.

1.6 Preparation of Phantom-GRAPE-5 at Boot Time
-----------------------------------------------

No initialization procedure is necessary for Phantom-GRAPE-5 at boot
time.

1.7 Preparation of CUDA G5/G6 at Boot Time
------------------------------------------

Note : By default, CUDA devices are not supported by the GRAPE
       Software Package. You need the optional package CUDA G5/G6 in
       order to use them.

(1) Set up CUDA Environment and Install the Package

  Before you run the installation script $pkgroot/script/install, you
  need to set up the CUDA development environment provided by NVIDIA,
  including the device driver, the Toolkit, and the SDK.

(2) Check the Functionality of CUDA G5/G6

  After the installation is completed, you can run the following
  command to perform some many-body simulations and compare the
  results with precalculated ones.

    $pkgroot/script/check_cuda

(3) Configuration at Boot Time

  Once the installation is completed, no initialization procedure is
  necessary at boot time.

2. Compilation and Linkage
==========================

In order to use GRAPE hardware from your own application programs,
include the header file of the control library in your program, and
then link the library. The control library provides two different
APIs, namely, the GRAPE-5 compatible G5 API and the GRAPE-6 compatible
G6 API (the G6 API does not support GRAPE-7 and Phantom-GRAPE-5).

In addition to the G5 API, the G5nb API is provided for GRAPE-7. The
G5nb API is an extension of the G5 API, which has additional functions
for neighbor particle detection. For the details of the G5 and G5nb
APIs, see "G5PIPE User's Guide" ($pkgroot/doc/g5user.pdf) and
"G5nbPIPE User's Guide" ($pkgroot/doc/g5nbuser.pdf), respectively.

The table below shows the required header files and libraries for each
GRAPE hardware:

-----------------------------------------------------------------------------------------
GRAPE             API     header        library
-----------------------------------------------------------------------------------------
GRAPE-DR          G6      g6util.h      libgdr6.a    libhib.a             libm.a
                  G5      g5util.h      libgdr5.a    libhib.a             libm.a
GRAPE-7           G5      g5util.h      libg75.a     libhib.a             libm.a
                  G5nb    g5nbutil.h    libg75nb.a   libhib.a             libm.a
GRAPE-6A          G6      g6util.h      libg6a6.a    libm.a
                  G5      g5util.h      libg6a5.a    libm.a
GRAPE-6BX         G6      g6util.h      libg6bx6.a   libg6bxhib.a         libm.a
                  G5      g5util.h      libg6bx5.a   libg6bxhib.a         libm.a
Phantom-GRAPE-5   G5      g5util.h      libpg55.a    libm.a
CUDA G5/G6        G5      g5util.h      libcuda5.a   libcudart.so(note1)  libm.a
                                        libcuda5s.a(note2)
                  G6      g6util.h      libcuda6.a   libcudart.so         libm.a
-----------------------------------------------------------------------------------------

Note1 : libcudart.so should be found at $cudapath/lib, where $cudapath
        is the path where you installed NVIDIA's CUDA Toolkit.

Note2 : libcuda5s.a uses single precision (32-bit floating-point)
        format for its numerical representation. libcuda5.a uses
        pseudo-double precision (64-bit addition, 32-bit
        multiplication).

Examples of Option Switches Passed to the Compiler

  An application program foo.c that uses GRAPE-DR via the G6 API:

    cc -o foo foo.c -L$pkgroot/lib -I$pkgroot/include -lgdr6 -lhib -lm

  An application program foo.c that uses GRAPE-7 via the G5 API
  (libhib.a is linked as listed in the table above):

    cc -o foo foo.c -L$pkgroot/lib -I$pkgroot/include -lg75 -lhib -lm

  An application program foo.c that uses NVIDIA's CUDA device via the
  G5 API (libcudart.so is linked as listed in the table above):

    cc -o foo foo.c -L$pkgroot/lib -L$cudapath/lib -I$pkgroot/include -lcuda5 -lcudart -lm

3. Environment Variables
========================

For GRAPE-DR, GRAPE-7, Phantom-GRAPE-5 and CUDA G5/G6, the behavior of
the libraries can be controlled by the following environment variables.
These variables are not valid for GRAPE-6A or GRAPE-6BX.

3.1 GDEVICE : Assignment of Calculation Resources
-------------------------------------------------

If you have a system with multiple cards installed, the GRAPE control
library functions use all of them by default. To modify this behavior,
set the environment variable GDEVICE to a list of device IDs. If the
list is set, the GRAPE control library functions use only the listed
cards. For example,

  csh> setenv GDEVICE "0 2 3"
  sh>  export GDEVICE="0 2 3"

indicates that the cards with device IDs 0, 2, and 3 should be used.
This environment variable is useful when you share your system with
other users.

In the case of GRAPE-DR model1800/2000/4000, GRAPE-7 model 800 and
some CUDA devices (e.g. GeForce GTX 295), multiple LSI chips on a
single card have device IDs different from each other, and these chips
can be assigned to different simulations. For example, on a system
with one GRAPE-DR model 1800 installed, you can set

  csh> setenv GDEVICE "0 2"
  sh>  export GDEVICE="0 2"

in order to run a simulation on two GRAPE-DR chips with device IDs 0
and 2. You can run another simulation simultaneously, using the chips
with device IDs 1 and 3.

3.2 GWARNLEVEL : Warning Message Control
----------------------------------------

Controls the warning message output. The variable can be set to 0, 1,
2, or 3. A larger number produces more verbose output. The value 1 or
2 is recommended for normal operation, and 3 is useful for debugging.
The default value is 2. The value 0 suppresses all but fatal error
messages.

Do not set the variable to 0 when you run the functionality check
script ($pkgroot/script/check); otherwise the check would fail.

4. Sample Programs
==================

You can find sample programs in the $pkgroot/sample/ directory. These
programs can use different GRAPE hardware by linking different
libraries. Note, however, that not all programs support all hardware.

For example, a sample program for many-body simulation with the
direct-summation algorithm is stored in the $pkgroot/sample/direct/
directory. When you run the installation script, executables for
various types of GRAPEs are generated: direct_gdr, direct_g7,
direct_g6bx, direct_pg5 and direct_cuda (which are for GRAPE-DR,
GRAPE-7, GRAPE-6BX, Phantom-GRAPE-5 and CUDA G5, respectively). No
executable is generated for GRAPE-6A.

In order to build the sample programs yourself, you can:

  - run the script '00recompile' located in the directory of each
    sample program.

  or,

  - use the 'make' command. By default, the Makefile is set up for
    GRAPE-DR. For other architectures, you need to edit it by hand.
    Brief instructions can be found at the top of the Makefile.

5. Compatibility
================

As a general rule, the G5 API and the G6 API provide functions
compatible with GRAPE-5 and GRAPE-6, respectively. For some GRAPE
models, however, the APIs are not fully compatible. Some functions are
not supported, and some are restricted.
Such functions are summarized below:

G5 API Compatibility
-----------------------------------------------------------------------------------------------------------------------
GRAPE Models      Precision    Potential    Cutoff                 Neighbor Particle  Number of   Size of
                  Equivalent   Calculation  Calculation            List Creation      Pipelines   Particle Memory
                  to GRAPE-5   Function     Function               Function           per Device  per Device
-----------------------------------------------------------------------------------------------------------------------
Original GRAPE-5  -            Yes          Variable               Yes                96          131071
GRAPE-DR          Yes          Yes          Fixed for P3M Method   No                 256         4194304
GRAPE-7           Yes          No           Fixed for P3M Method   Yes                20-120      4095-24570
GRAPE-6A          Yes          Yes          Fixed for P3M Method   Yes                48          65536
GRAPE-6BX         Yes          Yes          Fixed for P3M Method   Yes                48          262144
Phantom-GRAPE-5   No           No           No                     No                 4           65536
CUDA G5           Yes          Yes          Fixed for P3M Method   No                 8192        1048576
-----------------------------------------------------------------------------------------------------------------------

G6 API Compatibility
------------------------------------------------------------------------------------------
GRAPE Models      Neighbor Particle  Nearest-Neighbor  Number of   Size of
                  List Creation      Particle Search   Pipelines   Particle Memory
                  Function           Function          per Device  per Device
------------------------------------------------------------------------------------------
GRAPE-DR          No                 Yes               256         1048576
GRAPE-6A          Yes                Yes               48          65536
GRAPE-6BX         Yes                Yes               48          262144
CUDA G6           No                 Yes               8192        1048576
------------------------------------------------------------------------------------------

6. Additional API for CUDA G5
=============================

6.1 Support for Multiple Walks Method
-------------------------------------

CUDA G5 supports the Multiple Walks Method, an algorithm to improve
the performance of the Barnes-Hut Treecode on GPUs.

In the conventional (but modified for GRAPE) Treecode, forces from one
group of j-particles to one group of i-particles are calculated by the
GRAPE.
The results are then sent back to the host computer. This procedure is
repeated until all forces for all i-particles are obtained. On the
other hand, in a Treecode that adopts the Multiple Walks Method,
multiple combinations of j-particle groups and i-particle groups are
posted to the GPU and handled simultaneously. This lets us take full
advantage of the arithmetic units on the GPU, and also improves the
efficiency of data transfer between the host computer and the GPU (see
[1] for details).

The Multiple Walks Method [1] is an algorithm named and integrated
into the Barnes-Hut Treecode on GPUs by Dr. Hamada (Nagasaki
University) and Dr. Nitadori (RIKEN). A many-body simulation performed
using the algorithm won the Gordon Bell Prize in 2009.

CUDA G5 provides the new API functions g5_flush_runs() and
g5_flush_runsMC() to support the Multiple Walks Method. The following
shows a typical procedure to perform force calculation using these
functions:

  (1) Set the environment variable G5_MULTIWALK to 1.

  (2) Perform the force calculation loop Nwalk times (i.e., set
      j-particles, set i-particles, start the run, and get the
      results), using the conventional G5 API. Here, Nwalk is the
      number of pairs of i-particle groups and j-particle groups that
      are posted to the GPUs at once.

  (3) Call g5_flush_runs() (or g5_flush_runsMC()).

Example:

    // (a) force calculation for nwalk pairs of i-particle groups and
    //     j-particle groups.
    for (w = 0; w < nwalk; w++) {
        g5_set_jp(0, n[w], mj[w], xj[w]);
        g5_set_eps2_to_all(eps*eps);
        g5_set_n(n[w]);
        for (off = 0; off < n[w]; off += npipe) {
            // the last chunk may hold fewer than npipe i-particles.
            if (off + npipe > n[w]) {
                ni = n[w] - off;
            }
            else {
                ni = npipe;
            }
            g5_set_xi(ni, (double (*)[3])xj[w] + off);
            g5_run();
            g5_get_force(ni, (double (*)[3])(a[w] + off), p[w] + off);
        }
    }

    // (b) nwalk pairs are posted to the GPU.
    g5_flush_runs();

  (a) When the environment variable G5_MULTIWALK is set to 1,
      g5_set_jp(), g5_set_xi() and g5_get_force() do not perform
      actual data transfer.
      They push transfer requests to an execution queue prepared in
      the CUDA G5 library. Similarly, g5_run() does not perform force
      calculation. It pushes a calculation request to the queue.

  (b) At the point g5_flush_runs() is invoked, the transfer and
      calculation requests stored in the queue are processed one by
      one, and then the results are retrieved from the GPU.

A sample program $pkgroot/sample/direct/multiwalktest.c shows the
usage of the new API. It performs multiple many-body simulations
simultaneously, using the direct-summation algorithm with the Multiple
Walks Method.

Example: Run the program as follows:

    multiwalktest pl1k pl2k pl4k

  This will perform three different simulations for the particle
  distributions pl1k, pl2k and pl4k.

A sample program using the Barnes-Hut Treecode with the Multiple Walks
Method is in preparation.

7. Additional API for CUDA G6
=============================

7.1 Optimization for J-Particle Transfer
----------------------------------------

CUDA G6 provides a mechanism to improve the performance of the
individual-timestep code on GPUs.

In the conventional individual-timestep code on a GRAPE, j-particles
are sent one by one from the host computer to non-contiguous addresses
of the particle memory on the GRAPE.

This method is not optimal for GPUs. By following the procedure below,
you can send all j-particles to a contiguous region of the particle
memory in a single transfer. This reduces the overhead of the transfer
and improves the performance.

  (1) Set the environment variable G6_JPSORTED to 1.

  (2) Send all j-particles to contiguous addresses starting from the
      top of the particle memory, in ascending order of their time,
      i.e., invoke g6_set_j_particle() Nupdate times, starting from
      the oldest particle with address 0 and incrementing the address
      one by one until it reaches (Nupdate-1). Here, Nupdate is the
      number of j-particles whose positions were updated in the
      previous timestep.
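
The ordering required by step (2) can be sketched on the host side as
follows. This is only a minimal illustration, not the method used in
the package's own sample code: the names jp_key, cmp_jp_key and
order_jp_by_time are hypothetical, and the sketch assumes that the
update times of the Nupdate particles are available in a plain array
tj[]. The actual call to g6_set_j_particle() appears only in a comment,
because its argument list is not described in this document.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper type: pairs a particle's time with its host index. */
typedef struct {
    double tj;     /* time of the particle */
    int    index;  /* position of the particle in the host arrays */
} jp_key;

/* qsort comparator: ascending time; ties are broken by host index
 * so that the resulting order is deterministic. */
int cmp_jp_key(const void *pa, const void *pb)
{
    const jp_key *a = (const jp_key *)pa;
    const jp_key *b = (const jp_key *)pb;
    if (a->tj < b->tj) return -1;
    if (a->tj > b->tj) return 1;
    return (a->index > b->index) - (a->index < b->index);
}

/* Fill order[0..nupdate-1] with host indices sorted by ascending time.
 * Returns 0 on success, -1 if the temporary buffer cannot be allocated. */
int order_jp_by_time(const double *tj, int nupdate, int *order)
{
    jp_key *keys;
    int i;

    if (nupdate <= 0) {
        return 0;
    }
    assert(tj != NULL && order != NULL);

    keys = malloc((size_t)nupdate * sizeof(*keys));
    if (keys == NULL) {
        return -1;
    }
    for (i = 0; i < nupdate; i++) {
        keys[i].tj = tj[i];
        keys[i].index = i;
    }
    qsort(keys, (size_t)nupdate, sizeof(*keys), cmp_jp_key);
    for (i = 0; i < nupdate; i++) {
        order[i] = keys[i].index;
    }
    free(keys);

    /* The caller would then loop over addr = 0 .. nupdate-1 and invoke
     * g6_set_j_particle() for host particle order[addr] with particle
     * memory address addr, as described in step (2). */
    return 0;
}
```

With tj = {0.5, 0.125, 0.25, 0.125}, the function stores the indices
1, 3, 2, 0 in order[], so the oldest particle goes to address 0.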

In order to send j-particles following the procedure (2), particles on
the host computer must be sorted in ascending order of their time. The
sample program $pkgroot/sample/s8/sticky8.c shows how to do this. Its
performance should improve when you set the environment variable
G6_JPSORTED to 1.

Note : If the environment variable G6_JPSORTED is set to 1 and
       j-particles are not sent following the procedure (2), the
       calculation results would be incorrect.

8. Tested Platforms
===================

  Fedora Core 5, 10, 11  x86_64
  CentOS 5.4             x86_64

9. References
=============

[1] T. Hamada, R. Yokota, K. Nitadori, T. Narumi, K. Yasuoka,
    M. Taiji, "42 TFlops Hierarchical N-body Simulations on GPUs with
    Applications in both Astrophysics and Turbulence", SC09 (ACM/IEEE)
    2009.

10. License and Copyright
=========================

The MIT software license (see below) is applied to the GRAPE Software
Package (hereafter "the Software"), unless otherwise mentioned. The
exceptions are the files in the $pkgroot/driver/ directory and the
$pkgroot/g6a/pcimem directory, to which the GNU General Public License
(hereafter GPL) is applied. Redistribution of the files in the
$pkgroot/cuda directory is prohibited.

The copyright of the Software belongs to K&F Computing Research Co.
(hereafter KFCR), except for the following files:

  The copyright of Phantom-GRAPE-5, that is, the files under the
  $pkgroot/pg5/ directory except for $pkgroot/pg5/phantom_g5mc.c,
  belongs to Keigo Nitadori (RIKEN). The copyright of
  $pkgroot/pg5/phantom_g5mc.c belongs to KFCR.

  The copyright of the files under the $pkgroot/g6a/ directory and the
  $pkgroot/g6bx/ directory belongs to Toshiyuki Fukushige (KFCR),
  except for the files under the $pkgroot/g6a/pcimem/ directory and
  the $pkgroot/g6bx/pcimem/ directory, whose copyright belongs to
  Atsushi Kawai (KFCR). The copyright of the files under the
  $pkgroot/driver/ directory also belongs to Atsushi Kawai (KFCR).

-------------------------------------------------------------------------------
The MIT Software License:

Copyright (c) 2009-, K&F Computing Research Co.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
-------------------------------------------------------------------------------

11. Acknowledgement
===================

K&F Computing Research Co. would like to thank the following people
for help in the development of the GRAPE Software Package:

  Keigo Nitadori (RIKEN).

12. Modification History
========================

--------------------------------------------------------------------------------------------------
version  date         author         note
--------------------------------------------------------------------------------------------------
1.2.0    22-Jun-2010  AK             An optional package CUDA G5/G6 added.
1.1.4    12-Mar-2010  TF             Fixed a bug on G6 library for GRAPE-DR (libgdr6.a).
1.1.3    23-Dec-2009  TF             Support for new control logics of GRAPE-DR model2000, model460.
                                     Cutoff function added to the G5 API of GRAPE-DR.
                                     G6 API of GRAPE-DR modified to maintain backward compatibility.
1.1.2    28-Sep-2009  TF             Package management utility improved.
1.1.1    19-Sep-2009  AK             English documents added.
1.1.0    17-Sep-2009  AK, TF         Support for GRAPE-DR model2000, model460.
1.0      17-Jul-2009  A. Kawai,      Document created. The package is built based on gdrpkg0.32,
                      T. Fukushige   g7pkg2.2.1, g6apkg1.1, g6bx, and phantom_limited_accuracy_080110.
--------------------------------------------------------------------------------------------------

13. Contact
===========

Contact address for questions and bug reports:
K&F Computing Research (support@kfcr.jp)