Oberon Community Platform

Oberon Matrix Library - Performance evaluation over time and on various hardware

Performance measurements of the LONGREAL LU solve in LinpackBenchmark.Test, at the optimal matrix size for Block LU (add your data!)
Multiple measurements in one cell are listed in their original order, separated by " / "; "-" marks a missing entry. Values in braces are normalized as stated in the column header.

| CPU | Cores | Clock (GHz) | RAM | Oberon | Small MatMul, ARRAY [4,4] OF REAL, MFLOPS, 1 core | Best matrix size for LU | Large MatMul, LONGREAL MFLOPS {/core/cycle} | LU LONGREAL MFLOPS (single threaded, non-block) {per cycle, 1 core} | BlockLU LONGREAL MFLOPS {/core/cycle} | MFLOPS LU Intel Reference | Module version (MatrixStandardSolvers.Mod, MatrixBlockSolvers.Mod) | Output | Tester |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Pentium M | 1 | 1.86 | 1 G | WinAos3.05 / WinAos R1084 | | 600*600 / 2048*2048 | 1317{0.65} / 1429{0.7} | 662 / 610 | - / 1112{0.7} | 720 / 720 | 13.3.2008 / 22.3.2008 | | Pat |
| Core Duo T2500 | 2 | 2.00 | 2 G | WinAos Rev. 1083 | | 2048*2048 | 2666 / 2936 | 545 / 526 | n/a / 1864 | 1044 | 19.03.2008 / 27.03.2008 | Output / Output | staubesv |
| 2x Xeon E5345 | 8 | 2.33 | 2.5 G | WinAos Rev. 1083 / WinAos R1120 / WinAos R1365 / Aos R1365 / A2 R~1450 | | 600*600 / 2048*2048 / 2048*2048 / 512*512 / 512*512 / 2048*2048 | 13935 / 18315 / 17179 / 16777 / 17895 / 18315{1.0} | 1031 / 720 / 696 / 965 / 871 / 635 | n/a / 6215 / 5559 / 2695 / 4167 / 9599{0.5} | 1438 | 20.03.2008 / 27.03.2008 / 02.04.2008 / 18.07.2008 / 18.07.2008 / 12.08.2008 | Output / Output / Output / Output / Output / Output | staubesv |
| Athlon 64 3700+ | 1 | 2.20 | 2 G | WinAos Rev. 1084 / Aos R1084 | | 2048*2048 | 2563 / 2690 | 527 / 565 | | 1020 | 20.03.2008 | Output / Output | staubesv |
| Pentium 4 | 1 | 2.00 | 256 MB | WinAos Rev. 1084 / Aos R1084 | | 300*300 | 1741 / 1862 | 318 / 369 | | 733 | 20.03.2008 | Output / Output | staubesv |
| Core2Quad Q6600 | 4 | 2.4 | 3.2G | WinAos Rev. 1082 / Aos Rev. 1082 | | 600*600 | 9191 / 16000 | 1031 / 1085 | | 1484 | 20.03.2008 | Output / Output | fnecati |
| Core2Quad Q6600 | 4 | 2.4 | 3.2G | WinAos Rev. 1082 / Aos Rev. 1082 | | 2048*2048, block size 256 | 15477 / 16408{1.9} | 875 / 853 | 5640 / 6632{0.75} | 1484 | 20.03.2008 | | fnecati |
| Intel Atom N270 | 1 | 1.6 | 1G | WinAos Rev. 1979 / WinAos R6458 (FoxSSE) | 50 / 615{0.4} | 2048*2048, block size 256 | 544{0.33} / 578{0.35} | 184 / 271 | 306{0.18} / 470{0.3} | 194 / 194 | 11.02.2009 / 9.9.2015 | | Pat |
| Core i3-2330 | 2 | 2.2 | 8G DDR3 | WinAos R6458 (FoxSSE) | | 2048*2048, block size 256 | 9570{2.2} | 1625{0.7} | 7058{1.85} | ? | 9.9.2015 | | ShulgaDim |
| Core i7-3770 | 4 | 3.4 | 16G DDR3 | WinAos R6458 (FoxSSE) | | 2048*2048, block size 256 | 32800{2.5} | 2740{0.7} | 24000{1.85} | | 10.9.2015 | | MorozovA |
| Intel Haswell G3220 | 2 | 3.0 | 8G DDR3 | WinAos R6458 (FoxSSE) | 4200{1.4} | 2048*2048, block size 256 | 15050{2.5} | 2260{0.75} | 10200{1.9} | | 10.9.2015 | | HunzikerP |
| your data here.. | | | | | | | | | | | | | |
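
For anyone adding a row, the braced {/core/cycle} figures can be approximated from the raw MFLOPS, the core count and the clock rate. The following is a minimal Python sketch of that arithmetic, not code from the Oberon library. It assumes the clock column is in GHz, and the function names are hypothetical. The helper `linpack_mflops` additionally assumes the conventional LINPACK operation count of 2n³/3 + 2n² for solving an n×n system; the page does not state which count LinpackBenchmark.Test actually uses.

```python
def flops_per_core_cycle(mflops, cores, clock_ghz):
    """Normalize a MFLOPS figure to floating-point operations per core per cycle.

    Assumes clock_ghz is the nominal clock in GHz (so cycles/s = clock_ghz * 1e9).
    """
    mega_cycles_per_second = clock_ghz * 1000.0  # MHz
    return mflops / (cores * mega_cycles_per_second)


def linpack_mflops(n, seconds):
    """MFLOPS for an n x n LU solve that took `seconds` of wall-clock time.

    Assumption: the conventional LINPACK count 2n^3/3 + 2n^2 is used.
    """
    operations = 2.0 * n**3 / 3.0 + 2.0 * n**2
    return operations / seconds / 1.0e6


if __name__ == "__main__":
    # Large MatMul on the Haswell G3220 row: 15050 MFLOPS, 2 cores, 3.0 GHz.
    print(round(flops_per_core_cycle(15050, 2, 3.0), 2))
    # Hypothetical timing: a 2048*2048 solve taking exactly one second.
    print(round(linpack_mflops(2048, 1.0), 1))
```

For the Haswell G3220 row this gives roughly 2.5, which matches the braced value listed there. Other rows agree only approximately, so the published figures appear to be rounded.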

The Intel reference implementation was available at http://www.ocp.inf.ethz.ch/wiki/Development/Repository. It uses SSE2 and loop unrolling to optimize speed, whereas our Oberon implementation, based on Felix' compiler additions for math arrays, uses only high-level language features! Since ~2014, the Fox compiler has performed implicit SSE1..3 optimizations, resulting in a significant speedup without any need for low-level optimizations in the high-level implementation.
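
The size of that speedup can be estimated from the Intel Atom N270 row, the only one measured with both an older build and the FoxSSE build. The Python sketch below simply divides the paired MFLOPS figures. It assumes the first entry in each cell belongs to WinAos Rev. 1979 (11.02.2009) and the second to WinAos R6458 (FoxSSE) (9.9.2015). The two builds are years apart, so other changes besides the implicit SSE code generation may contribute to the difference.

```python
# Atom N270 measurements copied from the table (MFLOPS), as (older build, FoxSSE build).
# Assumption: entries within a cell are in the same order as the listed Oberon versions.
ATOM_N270 = {
    "Small MatMul": (50, 615),
    "Large MatMul": (544, 578),
    "LU (non-block)": (184, 271),
    "BlockLU": (306, 470),
}


def speedups(rows):
    """Return the after/before ratio for each benchmark, rounded to two decimals."""
    return {name: round(after / before, 2) for name, (before, after) in rows.items()}


if __name__ == "__main__":
    for name, factor in speedups(ATOM_N270).items():
        print(f"{name}: {factor}x")
```

On these figures the gain is largest for the small fixed-size matrix multiplication (about a factor of 12) and much smaller for the large matrix multiplication, with the LU variants in between at roughly 1.5x.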

Copyright © 2007 ETH Zürich
Page last modified on September 10, 2015, at 10:59 PM