Goose: Domain-Specific Compiler
Key Features
- Goose is a software development environment that integrates
compilers and other utilities. It helps port a program written in a
programming language such as C from a PC to a hardware accelerator
that works in SIMD fashion. It is designed to minimize the amount of
modification to the original source code.
- Goose is a domain-specific development environment: it is not
designed to support the whole grammar and specification of the target
programming language. It handles only the descriptions that are
suitable for the hardware accelerator. Other descriptions are passed
on to a conventional compiler to generate an executable for the PC,
which serves as the host computer of the accelerator. This approach
minimizes the amount of modification needed to the source code.
- Goose encapsulates the APIs and architectures of the accelerators.
The user can therefore develop application programs without knowledge
of vendor-specific APIs or the internal structure of processor/memory
units. The numerical format can be switched among 'single',
'double-single', and 'double' just by adding a simple directive
('double-double' will also be supported soon).
- We offer a 30-day free evaluation of the software.
Please contact us to request one.
Supported Languages & Accelerators
Goose currently supports C as its programming language and, as
hardware accelerators, our GRAPE-DR and GPUs from both AMD and
NVIDIA. Items marked "(in preparation)" in the lists below will be
supported soon.
- Programming Languages
- C
- (in preparation): Fortran
- Hardware Accelerator
- GRAPE-DR
- GPU (AMD)
- GPU (NVIDIA)
- (in preparation): OpenCL, Intel SSE Technology, GRAPE-7
- Numerical Format
Precision | GRAPE-DR | AMD | NVIDIA |
single | - | ** | ** |
double-single (add/sub: double, others: single) | ** | * | ** |
double | ** | ** | ** |
double-double | * | * | - |
**: Supported. *: Will be supported soon.
Contents of the Package:
- Goose C Compiler goosecc (including the source code)
- A summary of the package, User's Guide, and other documents.
- Sample programs (examples of application programs which can be
compiled with the Goose C Compiler).
Required Environment
Runs on 64-bit Linux (x86_64). Functions for GRAPE-DR are not yet
fully tested in a 32-bit environment, although functions for the
other accelerators should work there.
Goose internally uses the following software, which must be installed beforehand.
Required Software:
Software | GRAPE-DR | AMD | NVIDIA |
ruby | * | * | * |
gcc | * | * | * |
ATI Stream SDK | | * | |
CUDA | | | * |
grapepkg | * | | |
LSUMP | * | * | |
VSM | * | | |
*: Required for that accelerator.
- C compiler
(gcc version 4.1.0 or higher recommended)
- Ruby
(version 1.8.5 or higher recommended)
- ATI Stream SDK
(version 1.3 or higher recommended).
- CUDA
(version 2.1 or higher recommended).
- GRAPE Software Package
(version 1.1.0 or higher recommended)
- LSUMP (a compiler for the Goose intermediate representation,
developed by Naohito Nakasato, University of Aizu)
- Installation: Unpack the LSUMP Binary Package in any directory, and
set the environment variable LSUMPPATH to that path. See the Goose
Software Package User's Guide for details.
- The LSUMP Binary Package does not include the source code.
However, owners of the package may request a free-of-charge
copy of the source. Please contact support@kfcr.jp for details.
- For the copyright and license of the LSUMP Binary Package, see
"00license" included in the package.
- VSM
(An assembler for GRAPE-DR. Developed by Junichiro Makino, National Astronomical Observatory of Japan)
- Installation: Download the tarball from the hyperlink above, and
unpack it in any directory. Set the environment variable VSMPATH to
that path. See the Goose Software Package User's Guide for details.
- For the copyright and license of VSM, see "COPYRIGHT" included in
the tarball.
License
Permission to use the Goose Software Package (hereafter the
"Software") is granted only to owners of a copy of the Software.
The Software may not be redistributed.
The copyright of the Software belongs to K&F Computing Research Co.
Sample Programs
The following are examples of application programs that can be
compiled with the Goose C Compiler.
- gravity:
Calculate gravitational interactions.
- gravity_cutoff:
Calculate gravitational interactions with P3M cutoff, under
a periodic boundary condition.
- hermite:
Calculate gravitational interactions and their time derivatives.
- gravperf:
For performance measurement on calculation of gravitational interactions
with various numbers of particles.
- pairwise:
For accuracy measurement on calculation of gravitational interactions
between pairwise particles.
- tree:
Calculate gravitational interactions using the Barnes-Hut Tree algorithm.
- s9:
Performs a gravitational many-body simulation (leapfrog integrator, shared timestep).
- s8:
Performs a gravitational many-body simulation (4th-order Hermite integrator, individual timestep).
- vdw:
Calculate van der Waals interactions (the Lennard-Jones potential).
- sph:
Calculate the accelerations for SPH particles (Spline kernel, no artificial viscosity).
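As an illustration of the kind of pairwise kernel these samples contain, here is a minimal plain-C sketch of a Lennard-Jones force and potential loop, written in the same double-loop style as the gravity example. It is not the shipped vdw sample. The function name `lj_forces`, its argument list, and the skipping of the i == j term are assumptions, and no Goose directive is attached because it is not stated here which constructs the compiler accepts.

```c
#include <assert.h>

/* Hypothetical Lennard-Jones kernel:
 *   U(r) = 4 * epsilon * ((sigma/r)^12 - (sigma/r)^6)
 * f[i] receives the force on particle i; pot[i] receives the sum of U
 * over all partners j, so the total energy is half the sum of pot[]. */
void lj_forces(int n, double x[][3], double epsilon, double sigma,
               double f[][3], double pot[])
{
    int i, j, k;
    double dx[3], r2, s2, s6, s12, coef;
    double sigma2 = sigma * sigma;

    assert(sigma > 0.0);
    for (i = 0; i < n; i++) {
        for (k = 0; k < 3; k++) {
            f[i][k] = 0.0;
        }
        pot[i] = 0.0;
        for (j = 0; j < n; j++) {
            if (i == j) {
                continue;               /* no self-interaction (r = 0) */
            }
            for (k = 0; k < 3; k++) {
                dx[k] = x[j][k] - x[i][k];
            }
            r2 = dx[0] * dx[0] + dx[1] * dx[1] + dx[2] * dx[2];
            s2 = sigma2 / r2;           /* (sigma/r)^2 */
            s6 = s2 * s2 * s2;          /* (sigma/r)^6 */
            s12 = s6 * s6;              /* (sigma/r)^12 */
            coef = 24.0 * epsilon * (2.0 * s12 - s6) / r2;
            for (k = 0; k < 3; k++) {
                f[i][k] -= coef * dx[k];    /* repulsive part pushes i away from j */
            }
            pot[i] += 4.0 * epsilon * (s12 - s6);
        }
    }
}
```

Working with r2 directly avoids a square root, since the Lennard-Jones terms only need even powers of 1/r.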
gravitational interactions (double precision)
ni | nj | GRAPE-DR | AMD | NVIDIA | CPU 1core | CPU 8cores |
1k | 1k | 0.79 (30.1) | 0.40 (15.4) | 1.05 (39.9) | 0.07 (2.5) | 0.45 (17.0) |
1k | 8k | 1.13 (42.8) | 0.73 (27.8) | 1.49 (56.5) | 0.07 (2.5) | 0.45 (17.1) |
1k | 64k | 1.18 (44.9) | 0.81 (30.8) | 1.55 (58.8) | 0.07 (2.5) | 0.45 (17.1) |
8k | 8k | 1.63 (62.0) | 2.65 (100.7) | 1.63 (62.0) | 0.07 (2.5) | 0.46 (17.5) |
64k | 64k | 1.88 (71.4) | 3.44 (130.6) | 1.66 (62.9) | 0.06 (2.2) | 0.46 (17.5) |
gravitational interactions (double precision for addition & subtraction, single precision for other operations)
ni | nj | GRAPE-DR* | AMD | NVIDIA | | |
1k | 1k | 0.85 (32.2) | - | 3.28 (124.5) | - | - |
1k | 8k | 1.24 (47.1) | - | 5.54 (210.4) | - | - |
1k | 64k | 1.31 (49.9) | - | 6.15 (233.8) | - | - |
8k | 8k | 1.88 (71.6) | - | 7.81 (296.8) | - | - |
64k | 64k | 2.22 (84.5) | - | 8.40 (319.2) | - | - |
gravitational interactions (single precision)
ni | nj | | AMD | NVIDIA | | |
1k | 1k | - | 0.56 (21.5) | 5.39 (204.9) | - | - |
1k | 8k | - | 1.50 (57.1) | 10.82 (411.3) | - | - |
1k | 64k | - | 1.87 (71.1) | 12.16 (461.9) | - | - |
8k | 8k | - | 10.15 (385.8) | 16.33 (620.6) | - | - |
64k | 64k | - | 19.78 (753.1) | 18.17 (690.6) | - | - |
- The wall-clock time spent calculating the interactions from nj
particles on ni particles is measured for various combinations of
(ni, nj) and hardware accelerators. The figures are the numbers of
pairwise gravitational interactions calculated per second
(giga-interactions per second). The figures in parentheses are
Gflops values (38 floating-point operations are counted per
pairwise interaction).
- GRAPE-DR, AMD, NVIDIA, CPU 1core, and CPU 8cores denote measurements
on a KFCR GRAPE-DR model450, an AMD Radeon HD 4850, an NVIDIA GTX285,
and an Intel Xeon (E5430/2.66GHz) using a single core and 8 cores,
respectively.
- The executables used for the measurements are all generated from
the same source code. For compilation, goosecc is used for the
hardware accelerators and gcc for the CPUs. The 8-core runs are
parallelized using OpenMP.
- *cf. Performance of the GRAPE-5-compatible user library (hand-tuned assembly code):
ni | nj | GRAPE-DR (model2000, 1 proc.) |
1k | 1k | 5.05 (192) |
1k | 8k | 8.58 (326) |
1k | 64k | 9.39 (357) |
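The relation between the two figures in each table cell can be written out as a small C sketch. Only the count of 38 operations per interaction comes from the notes above. The function names are hypothetical, and the particle counts are passed in explicitly because the notes do not say whether "1k" means 1000 or 1024.

```c
#include <assert.h>

/* Operation count assigned to one pairwise interaction (from the notes above). */
#define FLOPS_PER_INTERACTION 38.0

/* Giga-interactions per second for ni x nj interactions
 * completed in `seconds` of wall-clock time. */
double giga_interactions_per_sec(double ni, double nj, double seconds)
{
    assert(seconds > 0.0);
    return ni * nj / seconds * 1.0e-9;
}

/* Gflops value corresponding to a rate in giga-interactions per second. */
double gflops_from_rate(double giga_rate)
{
    return giga_rate * FLOPS_PER_INTERACTION;
}
```

For instance, the rounded rate 18.17 gives 18.17 x 38 = 690.46 Gflops, close to the 690.6 listed for NVIDIA at ni = nj = 64k in the single-precision table. The small difference presumably arises because the tabulated Gflops values were computed from unrounded rates.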
Measured Accuracy
- Gravitational interactions between pairwise particles are
calculated on a hardware accelerator and compared with those
calculated on the host computer. The standard deviation (left panel)
and average (right panel) of the error are plotted against the
distance r between the pairwise particles, normalized by the maximum
value of the spatial coordinate, rmax.
- 'Single', 'double-single' and 'double' are arguments passed to
a Goose directive that specifies the numerical format. They denote
single precision, double precision for addition & subtraction (single
for others), and double precision, respectively.
- Measured with an NVIDIA GTX280 and the pairwise sample program.
Examples of Intermediate Descriptions
C-language description passed on to Goose:
#pragma goose parallel for loopcounter(i, j) \
    precision("double")
for (i = 0; i < n; i++) {
    for (j = 0; j < n; j++) {
        for (k = 0; k < 3; k++) {
            dx[k] = x[j][k] - x[i][k];
        }
        r2 = dx[0] * dx[0] + dx[1] * dx[1] +
             dx[2] * dx[2] + eps2;
        rinv = rsqrt(r2);
        mrinv = m[j] * rinv;
        mr3inv = mrinv * rinv * rinv;
        for (k = 0; k < 3; k++) {
            a[i][k] += mr3inv * dx[k];
        }
        pot[i] -= mrinv;
    }
}
Description generated by Goose
For GRAPE-DR:
Kernel & API calls (in C language)
The kernel (in VSM language)
loop body
vlen 4
bm ccc_reg0 $t
upassa $ti $lr0v $lm120v
....
fadd $lr16v $lm104v $t
upassa $ti $ti $lm104v
nop
nop
fadd $lr24v $lm112v $t
upassa $ti $ti $lm112v
nop
nop
For GPU (AMD)
Kernel & API calls (in C language)
The kernel (in ATI-IL language)
whileloop
sample_resource(7)_sampler(7) r300, r200.xy
sample_resource(8)_sampler(8) r301, r200.xy
sample_resource(9)_sampler(9) r302, r200.xy
....
;; force
func 0
ixor r999, r100, l8
dadd r747.xy, r300.xy, r999.xy
dadd r747.zw, r300.zw, r999.zw
ixor r999, r101, l8
....
For GPU (NVIDIA)
Kernel & API calls (in CUDA C language)
The kernel (in CUDA C language)
extern __shared__ char smembuf_[];
int kbdim_ = blockDim.x;
f00_jp_t *f00_jp_smem_ = (f00_jp_t *) smembuf_;
....
for (int j_ = jstart_; j_ < jsup_; j_ += jstride_) {
    dx_0_ = f00_jp_smem_[j_].x_j_0_ - f00_ip_[isrc_].x_i_0_;
    dx_1_ = f00_jp_smem_[j_].x_j_1_ - f00_ip_[isrc_].x_i_1_;
    dx_2_ = f00_jp_smem_[j_].x_j_2_ - f00_ip_[isrc_].x_i_2_;
    r2 = dx_0_ * dx_0_ + dx_1_ * dx_1_ + dx_2_ * dx_2_ + eps2;
    rinv = rsqrt(r2);
....
Product Lineup
- Goose Personal Edition (International Price: 420,000 JPY)
A product for up to 10 users. There is no limit on the number of
local copies installed on different PCs.
- Goose Institutional Edition (International Price: 997,500 JPY)
A product for more than 10 users. There is no limit on the number of
local copies installed on different PCs, and no limit on the number
of users as long as each user belongs to the user group.
Even if a user belongs to a group of more than 10 members (e.g. a
university or an enterprise), the user can purchase the Personal
Edition as long as the number of actual users does not exceed 10.
All products include one year of free version upgrades and one year
of technical support. Please contact info@kfcr.jp to purchase. We
accept the following payment methods: PayPal, IPMO, and wire
transfer.