Oberon Community Platform Forum
Author Topic: Matrix Library  (Read 26510 times)
Pat
Moderator
Jr. Member
*****
Posts: 69


« Reply #15 on: March 30, 2008, 10:35:08 PM »

We have observed that for solving linear equations, Block LU (MatrixBlockSolvers.Mod) scales nicely up to 4 CPUs and is fast on such machines, but does not profit from more (eight) CPUs; on an eight-CPU machine, Windows does not indicate full CPU load.
I have now added an extra layer of active-object-based parallelization ('Agents') to Block LU and wonder whether this improves the scale-up of performance on multiple CPUs. Please report your performance measurements, obtained with LinpackBenchmark.Test() and including the system parameters listed in our Performance table, either in that table at http://www.ocp.inf.ethz.ch/wiki/MatrixProject/Front or in a message here. Does the performance meter on your machine indicate full load for all CPUs?
(With these modifications I observe a performance penalty of about 10% on my 1-CPU machine compared to a more serial approach, though.)
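
For readers unfamiliar with the idea, a blocked LU factorization can be sketched as follows. This is a minimal pure-Python illustration and not the code in MatrixBlockSolvers.Mod: it omits pivoting (so it assumes a matrix that can be factored safely without row exchanges, e.g. a diagonally dominant one), and the names block_lu, sub_product and reconstruct are made up for this example. The point is that most of the arithmetic ends up in step 3, a plain matrix multiplication on the trailing block, which is the natural part to distribute over several processors.

```python
def sub_product(A, r0, r1, c0, c1, k0, k1):
    """In place: A[r0:r1, c0:c1] -= A[r0:r1, k0:k1] * A[k0:k1, c0:c1]."""
    for i in range(r0, r1):
        for j in range(c0, c1):
            s = 0.0
            for k in range(k0, k1):
                s += A[i][k] * A[k][j]
            A[i][j] -= s


def block_lu(A, bs):
    """In-place blocked LU without pivoting (assumption: no pivoting needed).

    Afterwards the strict lower triangle of A holds L (unit diagonal implied)
    and the upper triangle holds U. bs is the block size.
    """
    n = len(A)
    for k0 in range(0, n, bs):
        k1 = min(k0 + bs, n)
        # Step 1: unblocked LU of the tall panel A[k0:n, k0:k1].
        for k in range(k0, k1):
            for i in range(k + 1, n):
                A[i][k] /= A[k][k]
                for j in range(k + 1, k1):
                    A[i][j] -= A[i][k] * A[k][j]
        # Step 2: forward substitution with the unit lower block L11
        # to turn A[k0:k1, k1:n] into the U12 block.
        for k in range(k0, k1):
            for i in range(k + 1, k1):
                for j in range(k1, n):
                    A[i][j] -= A[i][k] * A[k][j]
        # Step 3: trailing update A22 -= L21 * U12 (a matrix multiplication).
        sub_product(A, k1, n, k1, n, k0, k1)
    return A


def reconstruct(LU):
    """Multiply the packed factors back together, for checking the result."""
    n = len(LU)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0.0
            for k in range(min(i, j) + 1):
                l_ik = 1.0 if k == i else LU[i][k]
                s += l_ik * LU[k][j]
            out[i][j] = s
    return out
```

Calling block_lu on a copy of a matrix and comparing reconstruct(...) with the original is a quick sanity check; the block size only changes the order of operations, not the factors (up to rounding).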
« Last Edit: March 30, 2008, 10:40:25 PM by Pat » Logged
fnecati
Jr. Member
**
Posts: 60


« Reply #16 on: March 31, 2008, 09:12:00 AM »

Hi Pat,

I tested Block LU on a Core2Quad machine and see a slowdown of about 0.5. I have attached the output files for WinAos and AOS.
Process load is about 25% for WinAos, measured with the Windows process monitor. The processors do not all work in parallel, but sequentially.

The AOS version traps at size 2048; the trap is appended to its output file.

- Necati.



* fnecat4.aos.out.txt (14.17 KB - downloaded 732 times.)
* fnecat4.winaos.out.txt (12.35 KB - downloaded 558 times.)
Logged
Pat
Moderator
Jr. Member
*****
Posts: 69


« Reply #17 on: April 01, 2008, 10:34:20 PM »

MatrixBlockSolvers.Mod:

Improved performance and robustness of the linear equation solver (LU):
A) eliminated the cause of the (only occasionally occurring) trap, which was an unprotected AWAIT() statement (not inside an EXCLUSIVE section - thanks Sven).
B) distributed the many matrix multiplications inside the algorithm over several active objects to allow parallel computation on multicores.
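
As an analogy for both points, here is a small Python sketch (Python, not Active Oberon; all names are invented for the example). A condition wait in Python must happen while holding the condition's lock, much like AWAIT must sit inside an EXCLUSIVE section; calling wait() on a threading.Condition without holding its lock raises RuntimeError. The sketch also hands several independent block multiplications to separate threads and waits until all are done. Note that in CPython the global interpreter lock prevents real speed-up for pure-Python arithmetic, so this only illustrates the structure, not the performance.

```python
import threading


class PendingJobs:
    """Counter of outstanding jobs, protected like a monitor."""

    def __init__(self, count):
        self._cond = threading.Condition()
        self._count = count

    def done(self):
        with self._cond:  # analogue of an EXCLUSIVE section
            self._count -= 1
            self._cond.notify_all()

    def wait_all(self):
        with self._cond:  # the wait happens while holding the lock
            self._cond.wait_for(lambda: self._count == 0)


def multiply_block(a, b):
    """Plain matrix product of two small blocks given as lists of rows."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def parallel_products(pairs):
    """Compute a*b for every (a, b) in pairs, one thread per product."""
    results = [None] * len(pairs)
    pending = PendingJobs(len(pairs))

    def work(idx, a, b):
        results[idx] = multiply_block(a, b)
        pending.done()

    threads = [threading.Thread(target=work, args=(i, a, b))
               for i, (a, b) in enumerate(pairs)]
    for t in threads:
        t.start()
    pending.wait_all()  # returns once every worker has reported completion
    for t in threads:
        t.join()
    return results
```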

How does this perform on your multicore workstation when using LinpackBenchmark.Test:
 - which LU version with which block size performs best, and how fast is it?
 - how heavily are the individual processors loaded (e.g., when looking at the Windows Task Manager)?
« Last Edit: April 01, 2008, 11:02:52 PM by Pat » Logged
fnecati
Jr. Member
**
Posts: 60


« Reply #18 on: April 04, 2008, 01:34:55 PM »

Core2Quad results:
On WinAos:
size 2048;
LU: 877.5 MFLOPs
Block LU (Block 256) : 5391.2 MFLOPs
Block LU  Agents  (Block 256): 3561.7 MFLOPs

On AOS:
size 2048;
LU: 855.5 MFLOPs
Block LU (Block 256): 6617.6 MFLOPs
Block LU  Agents  (Block 256): 7733.9 MFLOPs

On Windows, the processors are loaded around 30-70%, whereas on AOS it is 90-99%.
If required, I can attach the output files here.
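
For anyone comparing such figures: the MFLOPs number of a Linpack-style run is conventionally derived from the matrix size n and the measured time, using the operation count 2/3*n^3 + 2*n^2. The sketch below assumes that LinpackBenchmark follows this convention (an assumption, not verified against its source); the function names are invented for the example.

```python
def linpack_flops(n):
    """Conventional Linpack operation count for solving an n x n system."""
    return (2.0 / 3.0) * n ** 3 + 2.0 * n ** 2


def mflops(n, seconds):
    """Rate in MFLOPs for a solve of size n that took the given time."""
    return linpack_flops(n) / seconds / 1e6


def seconds_for(n, rate_mflops):
    """Inverse of mflops(): run time implied by a reported rate."""
    return linpack_flops(n) / (rate_mflops * 1e6)


def ratio(rate_a, rate_b):
    """Relative performance of variant A versus variant B."""
    return rate_a / rate_b


if __name__ == "__main__":
    # Rates as posted above for size 2048 on AOS (taken at face value).
    print("implied time, plain LU [s]:", seconds_for(2048, 855.5))
    print("Agents vs. Block LU on AOS:", ratio(7733.9, 6617.6))
```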
Logged
staubesv
Administrator
Sr. Member
*****
Posts: 387



« Reply #19 on: April 04, 2008, 07:37:47 PM »

If you find some time it would be very interesting to see the linpack benchmark results on your Core2Quad running AOS with MaxProcs=1 (and maybe MaxProcs=2).
Logged
fnecati
Jr. Member
**
Posts: 60


« Reply #20 on: April 05, 2008, 12:30:07 PM »


In my previous message I think I made a small mistake while extracting numbers from the files and summarizing the results, sorry for that.

Here I have attached five output files for WinAos and AOS (the latter with MaxProcs=1,2,3,4): "fnecat4.winaosTest.out.txt" and "fnecat[1234].aosTest.out.txt", in the zipped file fnecatiTests.zip.



* fnecatiTests.zip (10.97 KB - downloaded 535 times.)
Logged
Pat
Moderator
Jr. Member
*****
Posts: 69


« Reply #21 on: April 07, 2008, 06:33:40 PM »

http://www.computational.ch/MatrixLibrary.html
Logged
Pat
Moderator
Jr. Member
*****
Posts: 69


« Reply #22 on: April 16, 2008, 01:08:41 PM »

The LU Block solver variant which is based on active objects should now scale better to multiprocessor machines with > 4 CPUs (MatrixBlockSolvers.BLU).
Scheduling is now based on active object priorities Low/Medium/High, with the aim to
- make sure that most of the CPUs have work most of the time
- avoid having too many processors compete for cache
- retain a program structure which is very readable and identical for any number of processors.
Could some of you check and report performance on multiprocessor machines using LinpackBenchmark.TestLUA ~ ?
How are the processors loaded on these machines, according to the Aos Performance Monitor or the Windows Task Manager?
Thanks
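
To illustrate the general idea of priority-ordered work (only a rough model in Python, not the Aos scheduler and not the code in MatrixBlockSolvers.BLU), here is a sketch in which a fixed number of worker threads always pick the highest-priority task that is still waiting. The priority names mirror Low/Medium/High; run_prioritized and the task tuple layout are assumptions made for the example.

```python
import itertools
import queue
import threading

HIGH, MEDIUM, LOW = 0, 1, 2  # smaller number = served first


def run_prioritized(tasks, workers):
    """Run tasks given as (priority, name, callable) on `workers` threads.

    Returns the task names in the order in which they were picked up.
    """
    q = queue.PriorityQueue()
    seq = itertools.count()  # tie-breaker: keeps insertion order per priority
    for prio, name, fn in tasks:
        q.put((prio, next(seq), name, fn))

    order = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                _, _, name, fn = q.get_nowait()
            except queue.Empty:
                return  # nothing left to do
            with lock:
                order.append(name)
            fn()

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return order
```

With a single worker the pick-up order is strictly by priority; with several workers the recorded order may interleave slightly, but no worker stays idle while tasks are waiting. The code is the same for any number of workers, which is the property aimed at in the last bullet above.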

Logged
fnecati
Jr. Member
**
Posts: 60


« Reply #23 on: April 17, 2008, 04:53:34 PM »


My browser could not download the Matrix modules; it says " .. files not found on the server ..". Is that due to attackers?
Could you compress them into a single package file and attach it here for easy downloading?



 
Logged
Pat
Moderator
Jr. Member
*****
Posts: 69


« Reply #24 on: April 17, 2008, 06:25:36 PM »

It was not downloadable because I erroneously put a wrong HTML file there at the last upload; it is reachable again.
I agree that file-by-file download is not very practical, but as the repair of Edgar's once-working Oberon WebDAV server plugin seems imminent, I hope to improve that very soon by using WebDAV instead of HTTP download.
Logged
schorsch
Newbie
*
Posts: 10


« Reply #25 on: July 30, 2008, 10:47:05 PM »

Both the Matrix Project site and the host http://www.computational.ch/ have been offline for at least a couple of days. Does anyone know what is going on?
Logged