April 2010
Finally got motivated to do some additional programming.
July 20, 2009 The source code to my .net SPH program, written to run in parallel on any CUDA-compatible NVIDIA device, is below.
The user can load default files I have created, but ultimately the user will need to understand how SPH works in general. The program has a GUI, so there is no need to create text files or juggle multiple programs and setup utilities; everything can be done through the GUI. With a little fiddling it's fairly straightforward to get a simulation working. Molecular simulations with mutual gravity will explode unless Grav factor is set to a very small value: this is simply because I chose arbitrary G/M values to simplify things, so molecules and planets aren't actually pegged to real numeric masses (in kg, say). To use mutual gravitation (which is not a factor at the atomic level anyway), choose the Star default options. For fluids and similar small-scale systems, select the molecular option, and mutual gravity will be turned off. If you select Earth grav, the simulation takes place in a gravity field, which adds a -y acceleration to each particle in accordance with Grav factor. The rocky type setting also defaults to mutual gravity, with default tuning of various variables in the particle options to make the bodies stable.
Source Code to the Project
The main project is in the _MainSPHProject folder. The CUDA project (separate, compiles a .dll) is in the _CUDA2005_20 directory, although you should not need to do anything with this.
Requirements
· CUDA 2.0 SDK
· DirectX 9.0 SDK
· Visual Studio 2008
· Microsoft Parallel Extensions for .net (for CPU parallel processing)
· CUDA compatible NVIDIA video card
If you're able to get it to run on your machine, you can simply click on the left sandbox window and a sphere will appear. Click the 'Launch Simulation' button and a winform will appear with the simulation running. The source is compiled in this, so you can run without building the project in debug mode in VS.
The source has a lot of other stuff in it: several projects, including a polytrope creator which basically uses the Lane-Emden equation to compute polytropes.
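For readers new to polytropes, here is a minimal sketch in C of what a Lane-Emden integration can look like. It is not the polytrope creator's source: the function names, the RK4 stepping, the cutoff at xi = 50, and the restriction to integer index n (to avoid needing pow) are all assumptions made for illustration. Only the idea of stepping the ODE outward with a small delta (0.01 is the usual value mentioned on this page) comes from the project.

```c
#include <assert.h>

/* theta^n for a non-negative integer n (hypothetical helper, avoids pow). */
static double ipow(double base, int n)
{
    double r = 1.0;
    while (n-- > 0) r *= base;
    return r;
}

/* Lane-Emden as a first-order system: theta' = phi,
   phi' = -theta^n - 2*phi/xi. This returns phi'. */
static double dphi(double xi, double theta, double phi, int n)
{
    return -ipow(theta, n) - 2.0 * phi / xi;
}

/* Integrate outward from the centre with RK4 and return the first zero of
   theta (the dimensionless stellar radius), or -1.0 if none is found
   before xi = 50. */
double lane_emden_first_zero(int n, double step)
{
    assert(n >= 0 && step > 0.0);
    /* Start one step off-centre using the series theta ~ 1 - xi^2/6,
       because the 2/xi term is singular at xi = 0. */
    double xi = step;
    double theta = 1.0 - xi * xi / 6.0;
    double phi = -xi / 3.0;

    while (xi < 50.0) {
        double k1t = phi;
        double k1p = dphi(xi, theta, phi, n);
        double k2t = phi + 0.5 * step * k1p;
        double k2p = dphi(xi + 0.5 * step, theta + 0.5 * step * k1t, k2t, n);
        double k3t = phi + 0.5 * step * k2p;
        double k3p = dphi(xi + 0.5 * step, theta + 0.5 * step * k2t, k3t, n);
        double k4t = phi + step * k3p;
        double k4p = dphi(xi + step, theta + step * k3t, k4t, n);

        double theta_new = theta + step * (k1t + 2.0 * k2t + 2.0 * k3t + k4t) / 6.0;
        double phi_new = phi + step * (k1p + 2.0 * k2p + 2.0 * k3p + k4p) / 6.0;

        if (theta_new <= 0.0) {
            /* Linear interpolation between the last two samples. */
            return xi + step * theta / (theta - theta_new);
        }
        xi += step;
        theta = theta_new;
        phi = phi_new;
    }
    return -1.0;
}
```

Two cases with known closed forms make a handy sanity check: n = 0 has its first zero at sqrt(6) and n = 1 at pi. The theta values collected along the way are what a shell-by-shell point generator would turn into a density profile.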
Colored Shear Cavity (showing rho)
SPH
Source Code (August 2009)
Parallel SPH using CUDA: up and running [see small vids below] [Serial source code below that]
Current progress is toward creating terrestrial type collisions for moon formation. I have tweaked the fluid and gas SPH computations to simulate rocky type objects. I had actually created this behavior earlier by accident, through an error in the energy equation (I left out a term). After I finally found the error, I was able to get accurate fluid and gas interactions, but I also lost what appeared to be more realistic handling of terrestrial bodies. I haven't yet completely restored this code as an option (rocky), but I have the code handy and a target.
Moon Formation/Collisions May 2009
In any case, what I do have are some new videos of attempted moon formation, which are shown below. The bodies are still fluid (they tend to bounce inward and outward as gravity and pressure counteract one another), and the only way I've discovered of fixing this problem is to create a body, run it for a while, then save that body to a separate file. I then go through and load various body options, let them settle in for a minute or so until they are stable, and create a library. To create a new system, I have an 'add body' option that will add the contents of a file to the interface. From there, you can highlight and move the bodies around in the XY plane. By highlighting a body, you can modify its vx, vy, and vz parameters (as well as other properties, although I have to re-enable the latter option with a check box).
New 'rocky' type experimental interactions
Collision
Collision
Close Orbit (matter exchange)



Fluid type interactions (using the fluid options + mutual gravitation in 3 space)
Fluid Dynamics 2/25/2009
Small scale fluid dynamic scenarios with local (Earth) gravity in 2 and 3 dimensions are working well. I have verified the Monaghan Shear Cavity simulation, and the results of the version run in series versus the one run in parallel are virtually equal. The big difference of course is that the parallel version runs a lot faster. Going from serial CPU code to CUDA parallel code is basically a complete rewrite and rethink. Counters are impossible to maintain in parallel code because you have N processors running the same code path on different data. For example, you pass in an array of values that you want to process in parallel, the program splits them into threads, and each element in the array is processed by the same code on a different thread simultaneously. The original FORTRAN code (and my serial VB.net code) computes all of the kernels and their derivatives up front, deciding which pairs of particles interact based on whether they fall in each other's support domain; it stores these in arrays and reuses them during the different force computations (gravity, viscosity, energy, pressure), computing only interacting pairs. Parallel code computes the kernel every time it needs it: a huge redundancy in the mind of the human programming it, but a necessity for the individual threads, because they are unaware of one another and can't share values efficiently. In parallel code, the processing time for all of the particles is the time it takes one particle (whichever takes the longest) to process through all of its support domain.
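To make the redundancy concrete, here is a small sketch in plain C (host code, one dimension, hypothetical function names; not the project's actual kernels). The first function mimics the serial approach of finding each interacting pair once and storing it for reuse. The second is the work a single thread would do for "its" particle when nothing can be shared.

```c
#include <assert.h>

/* Serial style: find each interacting pair once (i < j) and store it so
   every later force computation can reuse the list. Returns the number of
   pairs; pair_i/pair_j must have room for n*(n-1)/2 entries. */
int build_pair_list(const float *x, int n, float support,
                    int *pair_i, int *pair_j)
{
    assert(support > 0.0f);
    int npairs = 0;
    for (int i = 0; i < n - 1; ++i) {
        for (int j = i + 1; j < n; ++j) {
            float dx = x[i] - x[j];
            if (dx * dx <= support * support) {
                pair_i[npairs] = i;
                pair_j[npairs] = j;
                ++npairs;
            }
        }
    }
    return npairs;
}

/* Parallel style: the work of ONE thread, responsible for particle i only.
   It scans every other particle and would evaluate the kernel on the spot
   for each neighbour; here we just count those evaluations. */
int thread_kernel_evals(const float *x, int n, float support, int i)
{
    int evals = 0;
    for (int j = 0; j < n; ++j) {
        if (j == i) continue;
        float dx = x[i] - x[j];
        if (dx * dx <= support * support) ++evals;
    }
    return evals;
}
```

For four particles spaced one unit apart with a support of 1.5, the pair list holds 3 entries, while the four threads together perform 6 kernel evaluations (each pair is seen from both ends). The wall-clock cost, though, is set by the busiest thread, which here handles only 2 neighbours.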
Below the picture of the current interface, I have videos of the Shear simulation running in series and in parallel. The series version also has the option to select a linked-list search algorithm, which usually speeds things up a bit over the O(N^2) time of direct-find. However, even the serial linked-list is blown away by direct-find running in parallel on my NVIDIA C1060 parallel processing card.
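As a rough illustration of the two search strategies, the sketch below counts interacting pairs in 2D both ways. It assumes the usual cell-based form of linked-list search (bin particles into square cells at least one support radius wide, then look only at the 3x3 block of cells around each particle). The function names, the square domain [0, size) and the pair-counting interface are assumptions for the example, not the project's code.

```c
#include <assert.h>
#include <stdlib.h>

/* Direct find: test every pair once, O(N^2). */
int direct_find(const float *x, const float *y, int n, float radius)
{
    int pairs = 0;
    float r2max = radius * radius;
    for (int i = 0; i < n - 1; ++i) {
        for (int j = i + 1; j < n; ++j) {
            float dx = x[i] - x[j], dy = y[i] - y[j];
            if (dx * dx + dy * dy <= r2max) ++pairs;
        }
    }
    return pairs;
}

/* Map a coordinate to a cell index, clamped to the grid. */
static int cell_index(float v, float cell, int ncell)
{
    int c = (int)(v / cell);
    if (c < 0) c = 0;
    if (c >= ncell) c = ncell - 1;
    return c;
}

/* Linked-list search over the square domain [0, size) x [0, size).
   head[c] is the first particle in cell c, next[i] the particle after i
   in the same cell, -1 ends a list. Returns -1 on allocation failure. */
int linked_list_find(const float *x, const float *y, int n,
                     float radius, float size)
{
    assert(radius > 0.0f && size > 0.0f);
    int ncell = (int)(size / radius);
    if (ncell < 1) ncell = 1;
    float cell = size / (float)ncell;          /* cell side >= radius */
    int *head = malloc(sizeof(int) * (size_t)(ncell * ncell));
    int *next = malloc(sizeof(int) * (size_t)(n > 0 ? n : 1));
    if (!head || !next) { free(head); free(next); return -1; }

    for (int c = 0; c < ncell * ncell; ++c) head[c] = -1;
    for (int i = 0; i < n; ++i) {              /* thread particles onto cells */
        int c = cell_index(y[i], cell, ncell) * ncell
              + cell_index(x[i], cell, ncell);
        next[i] = head[c];
        head[c] = i;
    }

    int pairs = 0;
    float r2max = radius * radius;
    for (int i = 0; i < n; ++i) {
        int cx = cell_index(x[i], cell, ncell);
        int cy = cell_index(y[i], cell, ncell);
        for (int ny = cy - 1; ny <= cy + 1; ++ny) {
            for (int nx = cx - 1; nx <= cx + 1; ++nx) {
                if (nx < 0 || ny < 0 || nx >= ncell || ny >= ncell) continue;
                for (int j = head[ny * ncell + nx]; j != -1; j = next[j]) {
                    if (j <= i) continue;      /* count each pair once */
                    float dx = x[i] - x[j], dy = y[i] - y[j];
                    if (dx * dx + dy * dy <= r2max) ++pairs;
                }
            }
        }
    }
    free(head);
    free(next);
    return pairs;
}
```

Both functions should report the same pairs; the cell version simply skips distance tests against particles that are too far away to matter.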
Demo Video
Shear Cavity
baseline simulation
series versus parallel
The left side of the box runs at a constant velocity, dragging the nearest particles along with it in the direction of the velocity. The virtual particles (type -2) on the left hand side interact in all ways with other particles, except that 1) they do not move and 2) they apply an anti-penetrative force to any particle that collides with them.
Running in series (12 fps)
Running in parallel (58+ fps). I don't get faster than 58 to 60 fps regardless, so the limiting factor isn't even the computational speed of the card, but some other process: loading the vertex buffer, assigning colors for each point based on properties, etc.
Waves
I also created a different type of virtual particle, an Impulse particle (-3) that does move, but each iteration its velocity fades by a factor (<1). It works to apply impulse forces to a scenario, for example, in creating a wave.
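The per-type behaviour described above can be sketched as a single update routine. This is only an illustration in plain C: the struct, the function name, the 2D layout, the choice to fade the velocity before moving, and the assumption that impulse particles ignore the computed acceleration are all guesses. The type codes -2 and -3 are the only details taken from the page.

```c
#include <assert.h>

enum { TYPE_VIRTUAL = -2, TYPE_IMPULSE = -3 };   /* type codes from the text */

/* Hypothetical 2D particle used only for this sketch. */
struct particle2d { float x, y, vx, vy; int itype; };

/* Advance one particle by dt. ax/ay is the acceleration already computed
   for it; fade is the per-iteration velocity factor (< 1) for impulse
   particles. */
void advance(struct particle2d *p, float ax, float ay, float dt, float fade)
{
    assert(fade > 0.0f && fade < 1.0f);
    switch (p->itype) {
    case TYPE_VIRTUAL:            /* boundary particle: never moves */
        return;
    case TYPE_IMPULSE:            /* moves, but its velocity decays */
        p->vx *= fade;
        p->vy *= fade;
        break;
    default:                      /* ordinary fluid or gas particle */
        p->vx += ax * dt;
        p->vy += ay * dt;
        break;
    }
    p->x += p->vx * dt;
    p->y += p->vy * dt;
}
```

Giving a row of impulse particles an initial sideways velocity then acts like a paddle that pushes the fluid and gradually comes to rest.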
~5,600 individual particles (including virtual particle boundaries), running in parallel on the C1060.
I am still working on creating different scenarios, but I ought to be able to add new features more quickly.
More explanations and other updates are below
KB 2009
Kurt Bingham 2/2009
The sandbox for both loading and creating scenarios is modifiable, pegged to a small volume around the origin. The program can build ~0.005 m length scenarios akin to those in L&L's SPH small scale systems, with the option to switch the system over to 0.5 AU for astrophysical interactions, where everything will generally be in terms of Suns: objects created in terms of Sun radii, masses given in Sun masses and so on, in order to make the computations simpler and to keep the numbers in a range comfortable for human minds, so debugging and analysis are easier.
The physical quantities & calculi handled by the current parallel code are
· Dynamic particle density [Summation Density and Continuous Density] using various kernels, including exponential (Monaghan/Gingold '77) and cubic spline, among others. With CUDA 2.0's access to exponents, I can opt for exponentiation, which led to very smooth results when I tested it serially, although it gobbles up GPU cycles like crazy and really slams the frame rate.
· Average Velocity –attempts to stabilize particle velocities. The Nbody CUDA template program packaged with Nvidia’s SDK uses a damping (<1) variable to smooth velocities; average velocity is obviously a more subtle if more complex method of doing so (toggle).
· Mutual Gravity for astrophysical simulations. Uses a special quintic kernel (toggle)
· Earth Gravity (simulation occurs near Earth's surface and everything is under acceleration at 9.8 m/s^2)
· Artificial Viscosity (to prevent particles from penetrating one another. This is a simplified viscosity, and is given in the Liu and Liu text)
· Pressure
· Internal Energy
· Virtual Particle boundaries. These are particles that interact, but don’t move, creating a boundary.
· Each Particle has its own smoothing length, currently the average of each polytrope’s particle spacing, which is used to generate the polytrope.
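For reference, the cubic spline kernel named in the list can be written in a few lines. The sketch below is plain C using the standard 3D normalization 1/(pi h^3) with a support radius of 2h. The function name and the choice of the 3D form are assumptions; the project's own kernel code may differ in dimension or scaling.

```c
#include <assert.h>

#define PI_F 3.14159265f

/* Cubic spline smoothing kernel W(r, h) in 3D.
   q = r/h; W is non-zero only for q < 2, i.e. the support radius is 2h. */
float cubic_spline_w(float r, float h)
{
    assert(h > 0.0f && r >= 0.0f);
    float q = r / h;
    float sigma = 1.0f / (PI_F * h * h * h);   /* 3D normalization */
    if (q < 1.0f) {
        return sigma * (1.0f - 1.5f * q * q + 0.75f * q * q * q);
    }
    if (q < 2.0f) {
        float t = 2.0f - q;
        return sigma * 0.25f * t * t * t;
    }
    return 0.0f;
}
```

Summation density then amounts to adding m_j * W(r_ij, h) over every particle j inside the support of particle i, with the particle's own contribution at r = 0 included.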
Some options are currently not implemented however, due to the memory constraints of the GPU card
· Individual particle smoothing lengths (HSML). I have implemented a __constant__ hsml set at run time when running device code. A global HSML (smoothing length) fails for obvious reasons when very different fluids are loaded, as was specifically warned by L&L. Each particle must have a suitable smoothing length, even if the program doesn't upgrade or modify it (a few different algorithms are given in the L&L book). What I have done is give each particle in each star a smoothing length proportional (1/k) to the distance between points loaded from the Polytrope file; this prevents implode/explode instability. When the points in the star don't have a smoothing length near the actual particle spacing, two things can happen. If the smoothing length is too small, pressure causes the star to explode outright or greatly expand (like a red giant, no doubt) and then contract to varying degrees. If the smoothing length is too large, gravity pulls the star inward until pressure builds to the point of explosion. In each case the accumulated velocities of the particles cause excessive expanding and contracting, similar to what happens when stars die in reality. It could be mitigated by repeatedly reducing dt, but ultimately the stars end up in unrealistic stable states. And besides, if dt has to be too small to make a fun-to-watch real time video, then I'm not interested anymore.
· Particle specific smoothing lengths work in keeping two different stars of different size, density, and mass stable. When the stars coalesce or collide, then the HSML for each interacting pair becomes their average.
· HSML upgrade options –To be added? These can fine tune the smoothing lengths of each particle based on a quick computation. If it’s fast, I can add it to be run in parallel prior to the main execution of code for the array.
· Monaghan viscosity (just using artificial visc)
· Normalized Density
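The pair-averaged smoothing length mentioned in the list is simple enough to sketch directly. The function names and the kappa parameter (support radius in units of h, for example 2 for a cubic spline) are hypothetical and only illustrate the idea in plain C.

```c
#include <assert.h>

/* Smoothing length used for an interacting pair: the mean of the two
   particles' own lengths, so both particles see the same value. */
float pair_hsml(float h_i, float h_j)
{
    assert(h_i > 0.0f && h_j > 0.0f);
    return 0.5f * (h_i + h_j);
}

/* Returns 1 if two particles separated by squared distance r2 fall inside
   the pair's support radius kappa * h_ij, else 0. */
int pair_interacts(float r2, float h_i, float h_j, float kappa)
{
    float reach = kappa * pair_hsml(h_i, h_j);
    return r2 <= reach * reach;
}
```

Using one symmetric value per pair keeps the interaction the same whichever particle it is computed from, which matters once stars with different particle spacings start to mix.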
Small Scale Tests from 1/09
I have already done a little experimentation colliding different small scale fluids and in running other tests to make sure the simulations are consistent with the serial code. So far, so good, and with a lot more speed. Rather than the ability to run 100-200 or so particles at once in real time, I can run thousands (2048, 4096, N =8192, N>8192?) on my NVIDIA GTX 295 card, and it flies at 58-64 fps [the new card did break my power supply after a few days of use, and I splurged for a ‘gamer’ power supply capable of handling the additional power needs of my system]. (I am considering an Nvidia Tesla C1060 parallel processing card, but it’s $1,600 currently, and that’s a little steep at this point, when I don’t know how much of an improvement it would make, but I may get one if I find a good deal).
The smoothing length for the systems (shown below) varies, but is usually around 0.000035m, with a dt of around 0.004 s. Increasing dt is the fastest way to increase the aesthetics of a nice real-time program, but it creates instability, especially when encountering virtual particles, and when fluids and gases interact. There are several other parameters which can be modified to deal with these instabilities, but finding the right combination is obviously very difficult.
Older versions of the program being tested
Density profile. Two fluids collide, one of them less massive than the other by a factor of 15 or so. As the simulation runs, the density of each changes. The lighter the color, the denser the volume. It's important to note that the particles do not gravitate mutually in these sims, nor in small scale systems generally, because it wouldn't have much effect. The sims below however have acceleration due to Earth's gravity, while the one on the left has no external forces applied and no particle-particle gravity.
Star Collisions phase 1) I am still in the process of creating new scenarios and tweaking variables, but things are rolling along. Density profiles shown
The above are just a few simple examples. I am currently working on the builder to make creating scenarios easier, which is a lot of work. I’ve built a lot of user interfaces in the past, and they are one of those things, unlike a problem solving DLL, that are just never done, never have all the features you want, and always need fiddling, which takes forever.
The SPH CUDA code
The SPH CUDA code is wrapped in a DLL which accepts an array of fluid particles and several parameters. It performs a computation on the set in parallel, and returns the array of data updated with new values. My final internal struct name is flpsm ("fluid particle small"). It was originally flparticle, which contained too many variables to fit well into CUDA's shared memory given the number of threads I wanted to run per block, so I sliced it down to the bare minimum of x,y,z,vx,vy,vz,m,rho and then added new items back, testing for memory problems as I went. The struct I finally ended up with contains x,y,z,vx,vy,vz,m,rho,p,u,c,itype,hsml = position(x,y,z), velocity(x,y,z), mass, density, pressure, internal energy, temperature, particle type [gas,fluid,virtual], and smoothing length.
Since I’m doing the interface and DirectX display in VB.net, and the DLL is unmanaged, I also need a C# project for interfacing with the hardware through pointers using unsafe blocks (described elsewhere on this site).
struct flpsm{float x;float y;float z;float vx;float vy;float vz;float m;float rho;float p;float u;float c;int itype;float hsml;};
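To see why trimming the struct matters, here is a back-of-the-envelope check in plain C. The struct repeats the one above; the helper function and the 16 KB figure (the shared memory per multiprocessor on compute capability 1.x cards such as the 8800 GT, GTX 295 and C1060) are used only for illustration, and the real usable budget is somewhat smaller because kernel arguments also occupy shared memory on those devices.

```c
#include <stddef.h>

/* Same layout as the struct shown above: 12 floats and 1 int. */
struct flpsm {
    float x, y, z;        /* position */
    float vx, vy, vz;     /* velocity */
    float m, rho, p, u, c;
    int   itype;          /* particle type */
    float hsml;           /* smoothing length */
};

/* Hypothetical helper: how many whole particles fit in a shared-memory
   budget of `bytes`. */
size_t particles_per_tile(size_t bytes)
{
    return bytes / sizeof(struct flpsm);
}
```

With 4-byte floats and ints the struct is 52 bytes, so a 16 KB budget holds at most 315 particles per block. Every field added back to the struct lowers that ceiling, which is the trade-off described in the paragraph above.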
· The serial code is available for download below, and it is very different from the parallel code of course. You will have to come to understand what SPH is all about to get a handle on what's going on there, and that will require a good working knowledge of practical engineering physics, or, if you know math, you can just read the book and apply the equations. All of the parallel code is based on that, but for the serial code, the data is stored in a larger struct since memory and bandwidth aren't a problem on a CPU.
· I can run either on my interface, and this allows me to test whether the CUDA version is working. There are several small items that have been modified to get the ported code to fit into the CUDA space.
-Kurt
Kurt Bingham Jan 2009
Serial SPH code in VB.net (Jan 2009)
Below is the SPH source code, which replicates the Shock Tube and Shear Cavity experiments from Smoothed Particle Hydrodynamics, and is true to their original FORTRAN, wrapped in classes and with DirectX rendering and other things to make it a complete solution.
It contains: Polytrope generation program | DirectX renderer | SPH code | Shock Tube | Shear Cavity | CUDA and will run the Shear Cavity and Shock Tube tests without any tweaking. It is up to the programmer to put the SPH serial code to use in other setups. You will need VS 2005 to use this, and you'll need to tweak the source to make changes. I don't know how hard it will be to follow what I've done here, but I'm making it available anyway, and I have spent a lot of time debugging and tweaking the code myself. It's offered as is, and I'm sure there are plenty of optimizations that could be introduced. I have taken short cuts and long cuts in order to make progress with the actual simulations, but they ought to be able to be made faster. I mix and match public properties and public variables depending on how urgently I want to add a feature, so you'll find a mix of both in the classes. There aren't any star collisions therein however, as that's still an ongoing project. Of course, you can use the polygui to create point sets and run them, but running in series on the CPU it will tank, and gravity between particles isn't included here since it wasn't in the FORTRAN. I have included it in my new project, however, and it will be ready before long, I think (see below).
CUDA parallel polytrope (and any SPH for that matter) program in the works and making headway
Jan 09 - Parallel SPH code in CUDA should be available soon, and hopefully with an interface that will allow both water splashes and star collisions in a nice .net, object oriented program. Previously, I have had only SPH simulations working while running in .net in series on the CPU. This is obviously quite slow, as you can see from the movies given on the SPH page. However, I recently went at it again for a few days and managed to parallelize portions enough to make a fairly realistic star collision. This is not full blown SPH however like the water splashes on the SPH page, and it takes shortcuts and tweaking that I'm working to solve as time and patience permits.
Currently using an NVIDIA 8800 GT, and the videos are in real time.
Expecting a new NVIDIA card from Newegg.com shortly [a 295 series] which will hopefully allow more impressive performance. I'm impatient, so I need simulations to play in real time, since the whole purpose of this entire enterprise was to satisfy my childlike need to see stars collide, to simulate the Earth Moon formation, and a whole host of other things.
This is a quasi-realistic SPH collision of two stars. The Polygui program generates the basic polytrope as a solution of an ODE using a small delta (usually 0.01), and it outputs a file. The SPH program then reads that file, generates a set of fluid particles, and begins to simulate them. Naturally, there is a DirectX renderer as well, using shaders. The Polygui program generates many points, and they are put into successive shells. They can't all be incorporated into the (currently) 2048 points for each star, so the stars are not completely stable at the outset, particularly if they are very different from one another inside. This currently depends on a lot of tweaking of the smoothing lengths for gravity and viscosity. After the simulation begins, the stars 'collapse' into a stable configuration, which takes several seconds as they travel toward one another. Once again, the simulation incorporates (running in parallel using CUDA)
· Dynamic particle density (Summation Density using cubic spline kernel)
· Gravity (quintic kernel I got from someone's published PhD thesis, and I need to post a link to it)
· Artificial Viscosity (as per Liu & Liu)
-Needs- Pressure and Internal Energy, which I am working on.
See:
· Smoothed Particle Hydrodynamics (Liu and Liu)
Nov 08- I felt inspired to create a particle sandbox editor and simulator in order to do my testing faster. It's been so much fun, that I have expanded it quite a bit over the last several days.
Here's a picture of the interface, which is still in its infancy. Of course, putting lit spheres (rather than mono colored pixel clusters) for each particle would greatly increase the overhead, although I may attempt it just to see how much of an impact it has on computation, since I'm not doing any of this in parallel at the moment.
The SPH sandbox can be found here
Oct 08- I'm currently working on my MS, so I don't have time to make headway parallelizing SPH in CUDA as mentioned below (Liu and Liu, Smoothed Particle Hydrodynamics). So, this is shelved for the time being. In the process, I was able to parallelize the Direct Find function, which checks every pair of points and decides based on a dynamic parameter whether the particles will interact, and it wasn't particularly fast in comparison with doing it serially in my CPU; probably because of the number of calls being made from the Host to the Device. It would likely be a lot faster if the entire thing were running on the card, but in any case, I wasn't bouncing off the walls about the results. Converting many of the other functions (which they originally wrote in Fortran) quickly exceeds the Nvidia card memory. In any case, Larrabee will be out in a year or so, and I will probably do some work with their SDK at the time.
Recent updates (June 2008)
· N-body simulations using CUDA [Compute Unified Device Architecture] parallel programming on 8 series NVIDIA graphics cards. Using NVIDIA's SDK and one of their graphics cards, you have vastly more computing power than you have with your CPU. I'm still just getting my feet wet with this, but the main point here is that I'm calling the CUDA C code from managed C# using unsafe blocks and ultimately managing Visual Basic.net arrays (by calling the C# code), so this allows VB.net code to access GPGPU parallel computing power. I did it this way because my simple shader driven point plotter is in VB.net DirectX and the N-body setup code is also in VB.net, so I figured it would be a good exercise to add a C# library that calls an unmanaged DLL and processes managed arrays via the parallel processing on my card.
· MSFT's Framework 3.5 Parallel programming extension (available June 08)
· Computed layer data of a Polytrope in ASP
· Computed layer data of a Homogeneous Star Model in ASP
July 08
· Confirmed results of one dimensional shock-tube program (Monaghan et al '83) See Liu and Liu
Kurt Bingham 2008
kurtbingham.net
kbingham.net