Oberon Community Platform Forum
Author Topic: High Performance Computing for Oberon: Scheduling of Active Objects  (Read 9165 times)
Pat
Moderator
Jr. Member
*****
Posts: 69


« on: April 05, 2008, 12:06:53 PM »

The project "Matrix" found on this platform has the goal of developing tools for high-performance computing in Oberon, making the best of SIMD, multi-core CPUs, and clusters.

Insights from implementing a parallel blockwise LU decomposition with active objects in Aos:

For optimal performance in numerical high-performance computing, all CPUs should be kept busy (implying a number of active objects at least as large as the number of available CPUs), but the individual pieces of work should be as large as possible, because large arrays are handled more efficiently.
If there are more active objects (nObj) than processors (nProc), only the nProc objects with the highest priority should run at any given time and the others should be preempted; otherwise cache fragmentation occurs, which is very bad for performance. At the same time, some tasks need to be prioritized because their results are needed for later processing, in order to avoid interruptions in the pipeline.
Implementing this at the high (algorithmic) level can lead to quite intricate scheduling if an optimal solution for an arbitrary number of CPUs is sought.

As seen in the recently reported runtime difference between Bluebottle and WinAos for the same implementation on the same hardware (thanks fnecati), the system's handling of active objects is paramount for performance in such computational applications.

Thus, it would be helpful to understand the scheduling strategy in Aos in more detail - is there some "whitepaper"?
I believe that it should be possible to fine-tune specific "compute objects" with fine-grained priorities set by software.
Can active object priorities be set or changed at runtime?

Logged
Pat
Moderator
Jr. Member
*****
Posts: 69


« Reply #1 on: April 07, 2008, 05:46:10 PM »

Found the "whitepaper" about scheduling of active objects: Pieter Muller's doctoral thesis.

Currently, 3 user priority levels for non-realtime active objects can be defined:
- either at compile time, at the BEGIN {ACTIVE} code location, or
- at runtime, using the Objects.SetPriority() procedure for the running object itself.
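
For illustration, a minimal sketch of both variants (module, object, and procedure names are made up; I assume the three user levels are called Objects.Low, Objects.Normal and Objects.High - please check Objects.Mod):

Code:
MODULE ComputeSketch;	(* sketch only, not part of the Matrix project *)

IMPORT Objects;

TYPE
	ComputeObj* = OBJECT

		PROCEDURE Compute;
		BEGIN
			(* ... numerical work on one chunk ... *)
		END Compute;

	BEGIN {ACTIVE, PRIORITY(Objects.Low)}	(* priority fixed at the BEGIN {ACTIVE} location *)
		Compute;
		Objects.SetPriority(Objects.High);	(* changed at runtime by the running object itself *)
		Compute
	END ComputeObj;

END ComputeSketch.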

For high-performance computing in Oberon, this built-in prioritization would yield a nice implicit mechanism for avoiding an overcrowding of the system with too many concurrent tasks (which results in cache fragmentation and performance degradation); at the same time, for large problems it would still be possible to create enough compute objects to keep all CPUs supplied with work.
Doing this in an elegant manner, however, would require a significantly larger number of user priority levels.
We will try this...
Logged
staubesv
Administrator
Sr. Member
*****
Posts: 387



« Reply #2 on: April 07, 2008, 08:18:59 PM »

The implementation of a thread pool that uses a constant number of worker threads to execute a possibly large number of jobs taken from a job queue could help. The jobs could also have priorities. Besides the fact that such a thread pool is more efficient because no threads have to be created per job, it allows an easy implementation of scheduling strategies at the application level.
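
A rough sketch of what I mean, with active objects playing the role of the worker threads (module and type names are just placeholders, not an existing library; priorities could be added by keeping one queue per priority level):

Code:
MODULE PoolSketch;	(* placeholder name, illustration only *)

TYPE
	Job* = OBJECT
		VAR next: Job;

		PROCEDURE Run*;	(* override with the actual work *)
		END Run;

	END Job;

	Queue* = OBJECT
		VAR head, tail: Job;

		PROCEDURE Put*(j: Job);
		BEGIN {EXCLUSIVE}
			j.next := NIL;
			IF tail = NIL THEN head := j ELSE tail.next := j END;
			tail := j
		END Put;

		PROCEDURE Get*(): Job;
		VAR j: Job;
		BEGIN {EXCLUSIVE}
			AWAIT(head # NIL);	(* worker blocks here until a job is available *)
			j := head; head := j.next;
			IF head = NIL THEN tail := NIL END;
			RETURN j
		END Get;

	END Queue;

	Worker* = OBJECT
		VAR queue: Queue; job: Job;

		PROCEDURE &Init*(q: Queue);
		BEGIN queue := q
		END Init;

	BEGIN {ACTIVE}
		LOOP
			job := queue.Get();
			job.Run
		END
	END Worker;

END PoolSketch.

The fixed set of workers (e.g. one per CPU) is created once with NEW(worker, queue); afterwards only Job objects are allocated, which is the efficiency argument.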
Logged
Pat
Moderator
Jr. Member
*****
Posts: 69


« Reply #3 on: April 07, 2008, 09:46:38 PM »

Thread pools would work.
However, I believe that this is a typical example where the developer replicates, at the application level, functionality - namely the management of individual processes - that the system already provides.
In the same way that the invention of garbage collectors relieves the programmer from explicitly managing and freeing the memory occupied by objects, I believe that the invention of active objects should relieve the programmer from the task of building thread pools and the like.
The individual tasks I am talking about are, for example, the multiplication of 256*256 chunks of LONGREALs (out of a matrix of e.g. 2048*2048 elements), amounting to 2*2^24 operations (256^3 multiply-add pairs) per chunk. That means that the overhead of object creation, for which active objects should be ideal, is probably negligible.
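To make the granularity concrete, one such chunk task could itself be an active object along these lines (just a sketch with invented names; the actual Matrix code may organize this differently):

Code:
MODULE BlockMulSketch;	(* invented name, for illustration of the task size *)

CONST N = 256;	(* block edge length from the example above *)

TYPE
	Block* = POINTER TO ARRAY N, N OF LONGREAL;

	BlockMultiplier* = OBJECT
		VAR a, b, c: Block; done: BOOLEAN;

		PROCEDURE &Init*(a, b, c: Block);
		BEGIN SELF.a := a; SELF.b := b; SELF.c := c; done := FALSE
		END Init;

		PROCEDURE Multiply;	(* c := c + a*b: 256^3 multiply-add pairs = 2*2^24 operations *)
		VAR i, j, k: LONGINT; s: LONGREAL;
		BEGIN
			FOR i := 0 TO N-1 DO
				FOR j := 0 TO N-1 DO
					s := 0;
					FOR k := 0 TO N-1 DO s := s + a[i, k] * b[k, j] END;
					c[i, j] := c[i, j] + s
				END
			END
		END Multiply;

		PROCEDURE Join*;	(* lets the caller wait for this block to be finished *)
		BEGIN {EXCLUSIVE} AWAIT(done) END Join;

	BEGIN {ACTIVE}
		Multiply;
		BEGIN {EXCLUSIVE} done := TRUE END
	END BlockMultiplier;

END BlockMulSketch.

With roughly 3.4*10^7 floating point operations per object, the cost of creating and scheduling one such object should indeed be negligible in comparison.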
In addition, replicating the management of active objects at the application level reduces, in my opinion, the readability of the code.
But maybe we should do a case study with our LU (active object) algorithm example and compare the two versions in terms of source code elegance, computational speed, etc.?
:-)  Greetings Patrick
Logged
Pat
Moderator
Jr. Member
*****
Posts: 69


« Reply #4 on: April 12, 2008, 05:28:26 PM »

Performance for parallel linear algebra also depends on avoiding redundant data transfers as far as possible.

Timeslicing interrupts a running active compute object and potentially starts another, service-type active object on this CPU, while the compute object is resumed on a different CPU a little later, which makes it necessary to move its data into the new CPU's cache.

Does the Aos system provide any internal strategy to keep an active object relatively sticky to the CPU on which it last executed?

What is the time window for time slicing (i.e., how often, in the worst case, can such a system-induced cache miss be produced)?
Logged
staubesv
Administrator
Sr. Member
*****
Posts: 387



« Reply #5 on: April 12, 2008, 09:21:20 PM »

a) No, Aos does not provide such a mechanism.
b) The timeslice duration is 1 millisecond. A currently running thread is only preempted if there is at least one other thread with the same or a higher priority. Threads that are suspended because of Objects.Yield or because they are waiting on locks or conditions may also continue their execution on another CPU.
« Last Edit: April 12, 2008, 09:25:13 PM by staubesv » Logged
Pat
Moderator
Jr. Member
*****
Posts: 69


« Reply #6 on: April 13, 2008, 07:49:40 AM »

It would be interesting to have a PerformanceMonitor tool that shows how frequently a given process is preempted and how processes hop around between the CPUs.
Logged