Pat


« on: March 12, 2008, 01:51:08 AM » 

The first modules of our Matrix Library for computational applications, based on Felix Friedrich's compiler enhancements, have been uploaded to the projects page of this platform; many more will follow soon. I am interested in your performance measurements using our Oberon Linpack implementation (there is some unaccounted overhead in my simple Oberon timings; if measured as stringently as the Intel reference does, the Oberon MFLOPs will be somewhat higher than reported). Patrick
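On the timing question: the Linpack MFLOP figure is conventionally derived from the nominal operation count of an n*n LU solve, 2/3·n³ + 2·n² flops, divided by the wall-clock time of the solve alone. A small Python sketch of that bookkeeping (the harness and names are mine, not part of the library):

```python
import time

def linpack_mflops(n, solve):
    """Time one dense n x n solve and convert the elapsed time to
    MFLOPs using the standard Linpack operation count
    (2/3 * n^3 + 2 * n^2 floating-point operations)."""
    t0 = time.perf_counter()
    solve()
    elapsed = time.perf_counter() - t0
    flops = 2.0 * n ** 3 / 3.0 + 2.0 * n ** 2
    return flops / elapsed / 1.0e6
```

Any extra work inside the timed region lowers the reported figure, which is why a stricter measurement (timing only the solve, as the Intel reference does) yields higher MFLOPs than a loose one.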


« Last Edit: March 12, 2008, 11:57:24 AM by Pat »





Pat


« Reply #1 on: March 12, 2008, 04:49:53 PM » 

Added module MatrixNorms.Mod to the matrix algebra library. Pat
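For reference, the matrix norms such a module typically provides are easy to state; an illustrative plain-Python version (function names are mine, not the module's exports):

```python
import math

def frobenius(A):
    # square root of the sum of squared entries
    return math.sqrt(sum(x * x for row in A for x in row))

def norm_1(A):
    # maximum absolute column sum
    return max(sum(abs(row[j]) for row in A) for j in range(len(A[0])))

def norm_inf(A):
    # maximum absolute row sum
    return max(sum(abs(x) for x in row) for row in A)
```

For A = [[1, -2], [3, 4]] the 1-norm is 6 (largest column sum) and the infinity-norm is 7 (largest row sum).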







Pat


« Reply #2 on: March 12, 2008, 06:02:33 PM » 

Added MatrixAlgebraicMultigrid.Mod, a multigrid solver for dense systems of equations. Pat
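The module's actual algorithm is not shown here, but the basic multigrid idea — pre-smooth, restrict the residual, solve a coarse error equation, prolongate the correction, post-smooth — can be sketched compactly. The following Python two-grid cycle (names, the damped-Jacobi smoother, and the Galerkin coarse operator are my illustrative choices, not the module's API) is tested on a small 1D Poisson system:

```python
def residual(A, x, b):
    n = len(b)
    return [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]

def jacobi(A, b, x, sweeps, w=2.0 / 3.0):
    # damped-Jacobi smoother: x <- x + w * D^{-1} (b - A x)
    for _ in range(sweeps):
        r = residual(A, x, b)
        x = [x[i] + w * r[i] / A[i][i] for i in range(len(b))]
    return x

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def solve_small(M, v):
    # Gaussian elimination with partial pivoting for the small coarse system
    n = len(v)
    M = [row[:] for row in M]
    v = v[:]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        v[k], v[p] = v[p], v[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n):
                M[i][j] -= f * M[k][j]
            v[i] -= f * v[k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (v[i] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def two_grid_cycle(A, b, x, P, R):
    x = jacobi(A, b, x, 2)                # pre-smooth
    rc = matvec(R, residual(A, x, b))     # restrict the fine residual
    Ac = matmul(matmul(R, A), P)          # Galerkin coarse operator R A P
    ec = solve_small(Ac, rc)              # solve the coarse error equation
    e = matvec(P, ec)                     # prolongate the correction
    x = [x[i] + e[i] for i in range(len(x))]
    return jacobi(A, b, x, 2)             # post-smooth
```

In a full multigrid solver the coarse solve is itself another two-grid cycle, recursively, down to a trivially small system.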







Pat


« Reply #3 on: March 12, 2008, 06:26:24 PM » 

Added module MatrixSVD.Mod for singular value decomposition and solving of dense matrices. Pat
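As background on what the module computes: the singular values of A are the square roots of the eigenvalues of AᵀA, which for the 2x2 case can be written in closed form. An illustrative Python snippet (in no way the module's algorithm, which handles dense matrices of any size):

```python
import math

def singular_values_2x2(A):
    """Singular values of a real 2x2 matrix, from the closed-form
    eigenvalues of the symmetric 2x2 matrix A^T A."""
    (a, b), (c, d) = A
    # entries of A^T A: [[p, q], [q, r]]
    p = a * a + c * c
    q = a * b + c * d
    r = b * b + d * d
    tr, det = p + r, p * r - q * q
    disc = math.sqrt(max(tr * tr - 4.0 * det, 0.0))
    hi, lo = (tr + disc) / 2.0, (tr - disc) / 2.0
    return math.sqrt(hi), math.sqrt(max(lo, 0.0))
```

For the diagonal matrix [[3, 0], [0, 2]] this returns the singular values 3 and 2, and for any rotation matrix it returns 1 and 1.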







Pat


« Reply #4 on: March 19, 2008, 08:46:00 AM » 

Added MatrixIterativeSolvers (Gauss-Seidel, SOR, Jacobi, Conjugate Gradient). Removed the dependency on Oberon; the library now runs in Oberon-free Bluebottle. Some speedups and cleanups. Pat
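Of the methods listed, conjugate gradient is the usual choice for symmetric positive definite systems. As a reminder of the algorithm, here is a textbook version in plain Python (names and defaults are mine, not the module's interface):

```python
def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A by the
    conjugate gradient method, starting from x = 0."""
    n = len(b)
    x = [0.0] * n
    r = b[:]                  # residual b - A*x with x = 0
    p = r[:]                  # first search direction
    rs = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        alpha = rs / sum(p[i] * Ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        rs_new = sum(ri * ri for ri in r)
        if rs_new < tol * tol:
            break
        p = [r[i] + (rs_new / rs) * p[i] for i in range(n)]
        rs = rs_new
    return x
```

In exact arithmetic CG converges in at most n steps; for A = [[4, 1], [1, 3]] and b = [1, 2] it reaches the exact solution (1/11, 7/11) after two iterations.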







Pat


« Reply #5 on: March 19, 2008, 07:12:10 PM » 

Added module MatrixLeastSquares for the computation of least-squares solutions of systems of linear equations. Made minor cleanups and a fix for non-square matrices in MatrixStandardSolvers.QR. Please run performance measurements using LinpackBenchmark.Test on your hardware (see variables/template on http://www.ocp.inf.ethz.ch/wiki/MatrixProject/Front) so that we can identify and eliminate specific weaknesses. More functionality to come. Modules available at http://www.computational.ch/MatrixLibrary.html . Pat
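For background, the least-squares solution minimizes ||Ax − b||₂ and satisfies the normal equations AᵀA x = Aᵀb. The module may well use a QR route (numerically preferable for ill-conditioned problems); the following Python sketch only illustrates the defining math:

```python
def least_squares(A, b):
    """Least-squares x for an overdetermined m x n system, via the
    normal equations A^T A x = A^T b solved by Gaussian elimination
    with partial pivoting. Fine for small well-conditioned problems."""
    m, n = len(A), len(A[0])
    M = [[sum(A[k][i] * A[k][j] for k in range(m)) for j in range(n)]
         for i in range(n)]                                  # A^T A
    v = [sum(A[k][i] * b[k] for k in range(m)) for i in range(n)]  # A^T b
    for kk in range(n):
        p = max(range(kk, n), key=lambda i: abs(M[i][kk]))
        M[kk], M[p] = M[p], M[kk]
        v[kk], v[p] = v[p], v[kk]
        for i in range(kk + 1, n):
            f = M[i][kk] / M[kk][kk]
            for j in range(kk, n):
                M[i][j] -= f * M[kk][j]
            v[i] -= f * v[kk]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (v[i] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x
```

Fitting a line through the exactly collinear points (0,1), (1,2), (2,3) recovers intercept 1 and slope 1.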







Pat


« Reply #6 on: March 22, 2008, 03:51:34 PM » 

Added module MatrixBlockSolvers.Mod to the library. This module features a first version of a blockwise LU solver. Because the workload of the solver is partially shifted from matrix-vector multiplies (which suffer from bus clock constraints) to matrix-matrix multiplies, which are less dependent on bus speed, a speedup over the MatrixStandardSolvers.LU implementation is possible. The blocked version will probably be fused with the latter at some point. On my notebook I already see a significant speedup, approaching the Intel reference solver, although the parametrisation of the module (the optimal block size) is not yet tuned.

In addition, the use of matrix multiplication and blockwise processing opens the door to heavy use of multiprocessors, although active objects are not yet used in the high-level implementation (they are used in the compiler's matrix multiply). I am therefore eager to see multicore performance data.

Also, MatrixStandardSolvers.LU has been given a facility for multiple right-hand sides and for matrix inversion by LU, and MatrixUtilities has been freed from its link to Oberon.
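To make the shift in workload concrete, here is a plain-Python sketch of a right-looking blocked LU (illustrative only, without pivoting; this is not the module's code): a narrow panel is factored with the classical algorithm, and the trailing submatrix is then updated with one large matrix-matrix product, which is exactly the operation the compiler accelerates best.

```python
def lu_blocked(A, nb=2):
    """In-place right-looking blocked LU without pivoting.
    Most flops land in step 3, the trailing update
    A22 -= L21 * U12, which is a matrix-matrix product."""
    n = len(A)
    for k in range(0, n, nb):
        kb = min(nb, n - k)
        # 1. unblocked LU of the tall panel A[k:n, k:k+kb]
        for j in range(k, k + kb):
            for i in range(j + 1, n):
                A[i][j] /= A[j][j]
                for c in range(j + 1, k + kb):
                    A[i][c] -= A[i][j] * A[j][c]
        # 2. triangular solve: U12 = L11^{-1} * A12 (L11 has unit diagonal)
        for j in range(k + kb, n):
            for i in range(k + 1, k + kb):
                for c in range(k, i):
                    A[i][j] -= A[i][c] * A[c][j]
        # 3. trailing update: A22 -= L21 * U12 (the big matmul)
        for i in range(k + kb, n):
            for j in range(k + kb, n):
                s = 0.0
                for c in range(k, k + kb):
                    s += A[i][c] * A[c][j]
                A[i][j] -= s

def lu_solve(A, b):
    """Solve L U x = b using the factors stored in A."""
    n = len(b)
    y = list(b)
    for i in range(n):                  # forward substitution, unit L
        for j in range(i):
            y[i] -= A[i][j] * y[j]
    for i in range(n - 1, -1, -1):      # back substitution with U
        for j in range(i + 1, n):
            y[i] -= A[i][j] * y[j]
        y[i] /= A[i][i]
    return y
```

The block size nb is the tuning knob mentioned above: it should be chosen so that the panel and the blocks touched by the trailing update fit in cache.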







staubesv


« Reply #7 on: March 22, 2008, 05:14:54 PM » 

Where can we download the module MatrixBlockSolvers.Mod?







Pat


« Reply #8 on: March 22, 2008, 11:46:45 PM » 

MatrixBlockSolvers.Mod is at http://www.computational.ch/MatrixLibrary.html . I have worked a little on optimizing the blocks and have now seen a major breakthrough in speed: on my Pentium M notebook (1 CPU) I see a large increase in computational performance with the new version; I am looking forward to seeing results on multicores.







staubesv


« Reply #9 on: March 23, 2008, 12:36:14 AM » 

I observed a massive performance gain (up to 3.5 times faster) on my dual-core machine for matrices > 600 (BlockLU), but the three BlockLU results differ by up to 100%. Which result should I post on the wiki (the best one, the average)? Why do the results differ? Different block sizes?







Pat


« Reply #10 on: March 23, 2008, 06:53:28 PM » 

The block sizes for the three Block LU results are 128, 256, and 512. I propose to report the fastest version and give its block size. Particularly on very large matrices, markedly slower performance for one of the three Block LU results may indicate that the machine has started swapping to virtual memory; in my observations, the three performances do not differ much when this does not happen.
In the non-block version of LU, the computational workload is about 45% matrix-vector products and about 45% matrix-matrix products, so not much more than the performance of the slower of the two, which is the matrix-vector product, can be expected on a given piece of hardware.
The block version of LU consists mainly of matrix-matrix multiplies, plus the inversion of a submatrix. Thus, the best that could ever be achieved with this approach is the performance of matrix-matrix multiplication (though naturally solving a linear equation involves somewhat more computation than a simple matmul). I am happy that we are on the way to approaching that performance. This also means that for further optimisations (e.g. handling memory more economically), there is only moderate room for additional speedup with the current compiler approach to matrix multiplication, which is very competitive indeed (thanks, Felix!). Further speedup for solving linear equations might come from combining this block algorithm with a fast matrix multiplication algorithm, such as Strassen's.
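The arithmetic behind "not much more than the slower of the two": when fixed fractions of the flops run at different rates, the per-part times add up, so the combined rate is a weighted harmonic mean, dominated by the slow kernel. A quick Python check (the rates are invented for illustration):

```python
def mixed_rate(parts):
    """Combined MFLOP rate when given fractions of the total flops
    run at given rates: total time is the sum of per-part times,
    so the result is the weighted harmonic mean of the rates."""
    return 1.0 / sum(fraction / rate for fraction, rate in parts)
```

For example, half the work at 100 MFLOPs and half at 300 MFLOPs combine to 150 MFLOPs, not the arithmetic mean of 200; shifting work from the slow kernel to the fast one is therefore worth more than speeding up the fast kernel further.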


« Last Edit: March 24, 2008, 02:44:12 PM by Pat »





schorsch


« Reply #11 on: March 26, 2008, 11:26:54 PM » 

What compiler should one use to test the matrix modules? Does one necessarily need AOS, or would Native Oberon also work?







Pat


« Reply #12 on: March 27, 2008, 07:48:06 AM » 

Currently, the PC compiler in WinAos and in Bluebottle is able to compile enhanced arrays, which are the basis for the matrices. I do not know whether Linux Aos also features these compiler extensions. Just try this in the VAR section of a procedure:
VAR A: ARRAY [*,*] OF LONGREAL;
If it compiles on your installation, the library will compile.







staubesv


« Reply #13 on: March 27, 2008, 12:44:14 PM » 

When I tried to run the benchmark downloaded today, I ran into problems with its memory consumption. It seems that the BlockLU for 4096 needs more than 2 GB of RAM. This won't work for (at least) two reasons:
- The 32-bit versions of Windows only support 2 GB of memory per user process by default (this can be changed to 3 GB).
- WinAos supports at most 2 GB of heap (since it uses a 32-bit signed type to represent memory addresses).
The first problem results in an ASSERT (107) failing in Win32.Heaps.NewHeapBlock, which freezes the system. Could you please reduce the memory consumption a bit?







Pat


« Reply #14 on: March 27, 2008, 03:11:15 PM » 

- LinpackBenchmark: commented out the 4096*4096 matrix for the moment to avoid problems with RAM.
- MatrixFastMatrixMultiply: new module with an implementation of Strassen's algorithm for fast matrix multiplication. On my Pentium M it is fast for matrix sizes > 512*512. I wonder about multicore performance. A recursive implementation (not yet done) may be even faster.
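For readers unfamiliar with it: Strassen's method multiplies two half-size block partitions with 7 recursive products instead of 8, giving an O(n^2.81) algorithm. A compact recursive Python sketch for square matrices whose size is a power of two (illustrative; not the module's implementation):

```python
def strassen(A, B, cutoff=2):
    """Strassen multiplication of two n x n matrices, n a power of
    two; falls back to the classical triple loop below the cutoff."""
    n = len(A)
    if n <= cutoff:
        return [[sum(A[i][k] * B[k][j] for k in range(n))
                 for j in range(n)] for i in range(n)]
    h = n // 2
    def sub(M, r, c):
        return [row[c:c + h] for row in M[r:r + h]]
    def add(X, Y):
        return [[X[i][j] + Y[i][j] for j in range(h)] for i in range(h)]
    def subm(X, Y):
        return [[X[i][j] - Y[i][j] for j in range(h)] for i in range(h)]
    A11, A12, A21, A22 = sub(A, 0, 0), sub(A, 0, h), sub(A, h, 0), sub(A, h, h)
    B11, B12, B21, B22 = sub(B, 0, 0), sub(B, 0, h), sub(B, h, 0), sub(B, h, h)
    # the seven Strassen products
    M1 = strassen(add(A11, A22), add(B11, B22), cutoff)
    M2 = strassen(add(A21, A22), B11, cutoff)
    M3 = strassen(A11, subm(B12, B22), cutoff)
    M4 = strassen(A22, subm(B21, B11), cutoff)
    M5 = strassen(add(A11, A12), B22, cutoff)
    M6 = strassen(subm(A21, A11), add(B11, B12), cutoff)
    M7 = strassen(subm(A12, A22), add(B21, B22), cutoff)
    C11 = add(subm(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(subm(add(M1, M3), M2), M6)
    top = [C11[i] + C12[i] for i in range(h)]
    bot = [C21[i] + C22[i] for i in range(h)]
    return top + bot
```

In practice the cutoff is set near the size where Strassen's extra additions stop paying off, which matches the observation above that the module becomes fast only for matrices larger than about 512*512.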


« Last Edit: March 27, 2008, 03:44:06 PM by Pat »





