Stefano Zaghi
@szaghi
@jacobwilliams Thanks!
Stefano Zaghi
@szaghi

Dear @/all

Can anyone suggest a good reference for using modern Fortran libraries from Python?

For example, assume you have a modern Fortran library exploiting OOP (a lot of modules, abstract types, polymorphism...), but, in the end, the library can provide a simple API to the end-user: a single concrete derived type exposing a procedure with plain real-array dummy arguments. Assume you want to use this derived type from Python. Is it possible?

F2py seems not to be useful in this scenario, but I may be wrong; I have no experience with f2py. If it is not possible, I can, with a lot of work, extract the derived-type procedure as a standalone procedure, but I cannot totally refactor the library to trim out all the internal OOP.

Thank you in advance for any suggestions.

Cheers

Jacob Williams
@jacobwilliams
Take a look at ctypes. That might be what you need. The documentation doesn't mention Fortran, but if you use F/C interoperability it works the same.
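A minimal sketch of what that looks like on the Fortran side (a toy example, not WenOOF-specific):

subroutine add_one(n, x) bind(c, name='add_one')
  use, intrinsic :: iso_c_binding, only : c_int, c_double
  implicit none
  integer(c_int), intent(in), value :: n    ! passed by value from ctypes
  real(c_double), intent(inout)     :: x(n) ! passed as a pointer from ctypes

  x = x + 1.0_c_double
endsubroutine add_one

Compiled into a shared library (gfortran -shared -fPIC), it is callable from ctypes exactly as if it were C.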
Stefano Zaghi
@szaghi
Hi @jacobwilliams Thank you for your help. I found the great resource of @certik here. Indeed, Ctypes looks very promising for my needs! Thank you both!
Tom Dunn
@tomedunn
@szaghi, I haven't used it much myself but there is https://github.com/jameskermode/f90wrap which looks to extend F2py to work with derived types.
Stefano Zaghi
@szaghi
@tomedunn Tom, thank you for the highlight. I considered it, but it looked somewhat cumbersome to use and I was not sure if the underlying OOP of the library (that is, WenOOF) would be handled correctly. If my tests with ctypes fail, I'll consider it again.
Tom Dunn
@tomedunn
@szaghi, I've also done a bit of what you're looking to do with F2py (I would have tried f90wrap, but I didn't have a lot of control over the system I was doing the work on at the time). I was never able to pass derived types between Python and Fortran, but I got around this in a limited sense by creating a Fortran module that contained the different top-level derived types I wanted to use as module variables, and then exporting functions to Python for adding/initializing these derived types. I'd love to replace it with something else someday, but it works for the limited scope I needed it for.
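A minimal, untested sketch of that module-variable pattern (every name here is hypothetical):

module solver_store
   ! Top-level derived types live as module variables; only plain procedures
   ! with intrinsic-type arguments cross the F2py boundary.
   implicit none

   type :: solver_t
      real, allocatable :: coeffs(:)
   endtype solver_t

   type(solver_t) :: solvers(10)  ! fixed pool of top-level objects, addressed by index

   contains
      subroutine init_solver(id, n, coeffs)
      ! F2py-friendly: only an integer id and a plain real array are passed.
      integer, intent(in) :: id, n
      real,    intent(in) :: coeffs(n)

      solvers(id)%coeffs = coeffs
      endsubroutine init_solver

      subroutine apply_solver(id, n, x, y)
      ! Work is done on the stored object selected by id.
      integer, intent(in)  :: id, n
      real,    intent(in)  :: x(n)
      real,    intent(out) :: y(n)

      y = solvers(id)%coeffs(1) * x  ! placeholder computation
      endsubroutine apply_solver
endmodule solver_store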
Stefano Zaghi
@szaghi
@tomedunn Tom, indeed, your pattern seems to be similar to what I am going to do... if you are not too bothered by my spam, I'll probably ask for your help :smile:
Tom Dunn
@tomedunn
That works for me, I'll do my best to answer any questions I can. From what I remember, the hardest part for me was getting the compiler flags and linking working for F2py to do its thing. So if you can get a simple example to compile then you've done well.
Stefano Zaghi
@szaghi

@tomedunn and @/all interested (in particular @certik could be helpful...),
I have successfully wrapped some WenOOF facilities by means of ctypes; here is the python-wrapper test. This is already enough, but moving toward wrapping the actual interpolator object would be better. However, there are some things that I do not understand, in particular concerning ctypes.

The wrapped Fortran procedures now look like:

  subroutine wenoof_interpolate_c_wrap(S, stencil, interpolation) bind(c, name='wenoof_interpolate_c_wrap')
  !< Interpolate over stencils values by means of WenOOF interpolator.
  integer(C_INT32_T), intent(in), value  :: S                 !< Stencils number/dimension.
  real(C_DOUBLE),     intent(in)         :: stencil(1-S:-1+S) !< Stencil of the interpolation.
  real(C_DOUBLE),     intent(out)        :: interpolation     !< Result of the interpolation.

  call interpolator_c_wrap%interpolate(stencil=stencil, interpolation=interpolation)
  endsubroutine wenoof_interpolate_c_wrap

The current (naive) Python wrapper looks like:

from ctypes import CDLL, c_double, c_int, POINTER
import numpy as np

wenoof = CDLL('./shared/libwenoof.so')
wenoof.wenoof_interpolate_c_wrap.argtypes = [c_int, POINTER(c_double), POINTER(c_double)]

stencils = 2  # S, the stencils number/dimension (example value)
cells_number = 50
x_cell, dx = np.linspace(start=-np.pi, stop=np.pi, num=cells_number + 2 * stencils, endpoint=True, retstep=True)
y_cell = np.sin(x_cell)
y_weno = np.empty(cells_number + 2 * stencils, dtype="double")
interpolation = np.empty(1, dtype="double")
for i, x in enumerate(x_cell):
  if i >= stencils and i < cells_number + stencils:
    wenoof.wenoof_interpolate_c_wrap(stencils,
                                     y_cell[i+1-stencils:].ctypes.data_as(POINTER(c_double)),
                                     interpolation.ctypes.data_as(POINTER(c_double)))
    y_weno[i] = interpolation[0]

The oddities, to my limited vision, are:

  • I was not able to use the y_weno numpy array directly in the wenoof_interpolate_c_wrap call: I have to pass through the temporary 1-element array interpolation; surely this is due to my ignorance about ctypes, but Google does not offer me much insight...
  • I do not understand in detail the rationale behind data_as(POINTER...), byref and similar ctypes constructs, and the corresponding Fortran exploitation of the value attribute; the guide of @certik provides a very clear example, but it does not go into detail.

How would you simplify this workflow? Are the Fortran dummy arguments well defined? How can the Python wrapper be simplified/improved? Finally, how does one wrap a class(integrator), allocatable object, or at least a type(integrator_js) concrete instance? I definitely need more documentation about Python-ctypes-Fortran mixing...
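For wrapping the interpolator object itself, the pattern I have in mind is an opaque handle: the polymorphic object stays on the Fortran side and Python sees only a C pointer. A rough, untested sketch (hypothetical names):

module wenoof_handle_wrap
  use, intrinsic :: iso_c_binding, only : c_ptr, c_loc, c_f_pointer
  implicit none

  type :: interpolator_box
    !< Non-polymorphic box holding the polymorphic interpolator.
    class(*), allocatable :: interpolator !< Stand-in for class(interpolator), allocatable.
  endtype interpolator_box

  contains
    function wenoof_create_c_wrap() result(handle) bind(c, name='wenoof_create_c_wrap')
    type(c_ptr)                     :: handle !< Opaque handle returned to Python (c_void_p).
    type(interpolator_box), pointer :: box

    allocate(box)
    ! ... allocate/initialise box%interpolator here ...
    handle = c_loc(box)
    endfunction wenoof_create_c_wrap

    subroutine wenoof_destroy_c_wrap(handle) bind(c, name='wenoof_destroy_c_wrap')
    type(c_ptr), intent(in), value  :: handle
    type(interpolator_box), pointer :: box

    call c_f_pointer(handle, box)
    deallocate(box)
    endsubroutine wenoof_destroy_c_wrap
endmodule wenoof_handle_wrap

Each wrapped method would then take the handle as its first (value) argument and c_f_pointer it back to the box before calling the type-bound procedure.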

Thank you in advance for any hints.

Cheers

Stefano Zaghi
@szaghi

Dear @/all ,
I am facing a very big issue with GNU gfortran and polymorphic functions (for your interest, see this) related to serious memory-leak generation. If the bug is confirmed, I think the fix will be difficult and will require a lot of time.

I am wondering if there are techniques or approaches (maybe developed when f2003/2008 features were not yet widely available) to mimic polymorphism. I would like to save the work done in the last months and reuse as much as possible of the polymorphic libraries. I found a paper describing how to achieve multiple inheritance in f2003; maybe there are similar papers on how to achieve a sort of polymorphism without class(...), allocatable...

Thank you in advance for any hints!

Cheers

Chris MacMackin
@cmacmackin
There is already a report of this bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60913. My way of dealing with it (which doesn't stop all memory leaks but is enough to keep them to a manageable level) is that described on pages 236-244 of Rouson, Xia, and Xu's Scientific Software Design: The Object-Oriented Way. The main disadvantage of this approach is it means your routines can no longer be pure. You can see an example of this approach in my factual library. Let me know if you want me to explain some of the details of it.
Stefano Zaghi
@szaghi
Hi Chris! @cmacmackin Thank you very much for the hints. I searched for similar issues, but I missed that one yesterday. Surprisingly, now that you point it out, I am quite sure I have already seen it (probably, again, from one of your hints). This makes me very sad: that report is dated 2014! I am also surprised by the comments of Damian in that report (@rouson). I am also surprised that in his book there is a workaround; I'll go re-study it right now. Frankly speaking, this is a nightmare (see also the comments of Ian Harvey): I am seriously thinking of leaving Fortran...
Stefano Zaghi
@szaghi
@cmacmackin @rouson I just re-read the note Warning: Calculus May Cause Memory Loss, Bloating and Insomnia on page 238... arghhhhh tell me this is a nightmare!!!!
Stefano Zaghi
@szaghi

@cmacmackin Chris, I need your help...

I re-read the @rouson book, but that part looks obscure in its description of the leak-free hermetic field (this is probably why I missed it in my first reading...). I have then studied your factual, but I have difficulty understanding the rationale.

I am focused on how you implement the operators, say this multiplication, which I report here:

  function array_scalar_sf_m_sf(this,rhs) result(res)
    class(array_scalar_field), intent(in) :: this
    class(scalar_field), intent(in) :: rhs
    class(scalar_field), allocatable :: res !! The result of this operation
    class(array_scalar_field), allocatable :: local

    call this%guard_temp(); call rhs%guard_temp()
    allocate(local, mold=this)
    call local%assign_meta_data(this)
    select type(rhs)
    class is(array_scalar_field)
      local%field_data%array = this%field_data%array * rhs%field_data%array
    class is(uniform_scalar_field)
      local%field_data%array = this%field_data%array * rhs%get_value()
    end select
    call move_alloc(local,res)
    call res%set_temp()
    call this%clean_temp(); call rhs%clean_temp()
  end function array_scalar_sf_m_sf

I see that the guard_temp method increases the temporary level if the objects are already set as temporary, so this and rhs are effectively no longer temporary... why is this necessary? My first guess is that this and rhs should not be a problem, but I think I may be underestimating the scope of recursive function execution.

Why is local necessary? Is there something implied in move_alloc?

Why is res set as temporary? I guess because it can then be properly finalized, but I cannot figure out who is then able to finalize res.

Chris MacMackin
@cmacmackin
To see why this is necessary, consider the following code.
program example
  use factual_mod
  implicit none
  type(cheb1d_scalar_field) :: a, b, c

  a = b * c
  a = do_thing(b, c)
  a = do_thing(b*b, c)

contains

  function do_thing(thing1, thing2)
    class(scalar_field), intent(in) :: thing1, thing2
    class(scalar_field), allocatable :: do_thing
    call thing1%guard_temp(); call thing2%guard_temp()
    call thing1%allocate_scalar_field(do_thing)
    do_thing = thing1*thing2
    call thing1%clean_temp(); call thing2%clean_temp()
    call do_thing%set_temp()
  end function do_thing

end program example
When a = b*c is executed, b and c are both non-temporary variables. As such, they should not be finalised. So far, so obvious. However, the actual arguments to the function do_thing may or may not be temporary. If they are not temporary (e.g. when executing a = do_thing(b, c)), then this function will not finalise them and nor will the multiplication routine.
Chris MacMackin
@cmacmackin
Memory leaks occur when the variable returned from a function is not finalised after assignment or use as an argument. (Note that this issue can sometimes be avoided by returning pointer variables and using pointer assignment, at the cost of lots of manual memory management.) As such, we need to have the clean_temp method, which is called at the end of all procedures using fields as arguments. However, we don't want to finalise any fields which have been assigned to variables, so we must distinguish between those and temporary ones which are just function results.
Chris MacMackin
@cmacmackin

Function results may or may not be assigned to a variable, so we must assume that they are temporary, hence why set_temp is called for the function results. In the defined assignment operator, the LHS argument will be set not to be temporary so that it will not be finalised prematurely.

Where things get complicated is when a function receives a temporary field as an argument and must pass it to another procedure. This happens in do_thing when it is called in the a = do_thing(b*b, c) statement. As b*b is temporary, it will be guarded at the start of do_thing. When we pass it to the multiplication method, we don't want it to be finalised yet (as we might want to do something else with it in do_thing). As such, the call to set_temp in the multiplication routine must increment the temporary-count of this argument. That way, it will know not to finalise it when clean_temp is called in the multiplication routine. Instead, clean_temp will just decrement the temporary-count again. When the clean_temp call is made in do_thing, the count will be back down to its initial value and clean_temp will know that it can safely finalise the argument.
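In sketch form, the counting logic is something like this (a minimal illustration, not factual's actual code):

module temp_count_sketch
  implicit none

  type :: refcounted
    integer :: temp_count = 0        ! 0 means "not a temporary"
    real, allocatable :: payload(:)  ! stand-in for the large field data
  contains
    procedure :: set_temp
    procedure :: guard_temp
    procedure :: clean_temp
  end type refcounted

contains

  subroutine set_temp(this)
    ! Called on function results: mark them as temporary.
    class(refcounted), intent(inout) :: this
    if (this%temp_count == 0) this%temp_count = 1
  end subroutine set_temp

  subroutine guard_temp(this)
    ! Called on procedure entry: protect a temporary argument from finalisation.
    class(refcounted), intent(inout) :: this
    if (this%temp_count > 0) this%temp_count = this%temp_count + 1
  end subroutine guard_temp

  subroutine clean_temp(this)
    ! Called on procedure exit: release the guard; free only at the outermost level.
    class(refcounted), intent(inout) :: this
    if (this%temp_count > 1) this%temp_count = this%temp_count - 1
    if (this%temp_count == 1) then
      ! No outer guard remains: safe to finalise the temporary.
      if (allocated(this%payload)) deallocate(this%payload)
      this%temp_count = 0
    end if
  end subroutine clean_temp
end module temp_count_sketch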

I hope that clarifies things.
Stefano Zaghi
@szaghi

@cmacmackin Chris, everything is clearer now, but, indeed, I think this does not work around my bug. Now I understand the care you paid to finalization, but the point is that in my test case the finalization is totally useless... my type prototype (your fields) is something like


type :: field
   real :: i_am_static(length)
   contains
     procedure...
endtype field

Now, if I add a finalizer to field, it will have no effect on the i_am_static member of the type. My leaks originate from the fact that gfortran is not able to free the static memory of class(...), allocatable function results. If I make the static member allocatable, the leaks seem to vanish (but not if the allocatables are other types...); likewise, if I trim out the polymorphism, defining the result as type(...), allocatable, the finalization is done right automatically, with both static and dynamic members. So, your workaround could be very useful to fix leaks related to dynamic members that are other classes/types, but it has no effect on the static members. Tomorrow I'll try your workaround in more detail, but now I am going to sleep very sadly...

Anyhow, thank you very much, you are great!

Chris MacMackin
@cmacmackin
You are correct. What I do is make all of my large type components dynamic, which happened to be the case for my field types anyway. Static components cannot be finalised, hence why I said that this approach doesn't stop all memory leaks. Actually, it gets a little bit more complicated than this, because you cannot deallocate components of an intent(in) argument. However, in a non-pure procedure, pointer components of intent(in) objects are a bit odd. It is only required that you don't change the pointer; there is no issue with changing the thing that it points to.
What I did was define an additional, transparent, derived type called array_1d:
```fortran
  type, public :: array_1d
    real(r8), dimension(:), allocatable, public :: array
  end type array_1d
```
I have similar types for higher-dimensional arrays. That way I can just deallocate the component array. You can see this in action in the source code.
Stefano Zaghi
@szaghi
@cmacmackin Chris, thank you very much for your help! Indeed, my fields also have big components defined as allocatable, but they have many static components too: an integrand field could be a big block with not only the fluid-dynamics fields, but also grid dimensions, species concentrations, boundary-condition types... alone these could be a few static bytes for each integrand, but then you have to consider that each integrand is integrated (namely added, multiplied, divided...) many times for each time step, and for time-accurate simulations you perform millions/billions of time steps... the few bytes leaked quickly become gigabytes (e.g. a 4-byte leak per operator call, at ten operator calls per step over 10^9 steps, is already ~40 GB). Put all of this into an HPC view... it is not acceptable. Anyhow, thank you again!
Izaak "Zaak" Beekman
@zbeekman
Hi all, @DmitryLyakh pointed out his cool looking project (hosted on GitLab): Generic Fortran Containers I just thought I would pass it along!
Stefano Zaghi
@szaghi
@zbeekman @DmitryLyakh GFC is very interesting! Thank you both! Why not also GitHub? Are there other sources of documentation?
Chris MacMackin
@cmacmackin

@szaghi I've come up with an idea which should work better than the "forced finalisation" approach I'm using currently. I'll still use my guard_/clean_temp methods, but I'll couple them with an object pool. That way, when a temporary object is ready to be "cleaned", I can simply release it back into the object pool for later reuse and no memory is leaked. A pool of 100 or so should be more than enough for most applications and would have a reasonably manageable memory footprint.

This approach still would not be amenable to pure procedures, so you likely won't want to take it. However, I thought it might be worth mentioning on here in case anyone else is interested. Note that I have not actually tested it yet, or done more than sketch out the basic details.
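In rough, untested sketch form, the pool would be something like this (hypothetical names):

module field_pool_sketch
  implicit none

  integer, parameter :: pool_size = 100

  type :: pooled_field
    logical :: in_use = .false.
    real, allocatable :: data(:)  ! stand-in for the field's large components
  end type pooled_field

  type(pooled_field), target, save :: pool(pool_size)

contains

  function acquire() result(f)
    ! Hand out the first free slot instead of allocating a fresh temporary.
    type(pooled_field), pointer :: f
    integer :: i
    f => null()
    do i = 1, pool_size
      if (.not. pool(i)%in_use) then
        pool(i)%in_use = .true.
        f => pool(i)
        return
      end if
    end do
  end function acquire

  subroutine release(f)
    ! "Cleaning" a temporary just returns its slot; nothing is leaked and the
    ! allocation is kept for reuse.
    type(pooled_field), pointer, intent(inout) :: f
    if (associated(f)) f%in_use = .false.
    f => null()
  end subroutine release
end module field_pool_sketch

The module-level pool is, of course, exactly what rules out PURE.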

Stefano Zaghi
@szaghi
@cmacmackin Chris, thank you very much, this is interesting. Please keep me informed about it, in particular if you test it on your factual. Currently, I think I have found my nirvana coupling abstracts with pure math operators (which puts a restriction on abstraction, but gives a great performance boost), but your object-pool approach could come in handy for totally abstract non-pure operators. Thank you very much for your help!
Chris MacMackin
@cmacmackin
@szaghi How much of a performance boost is there from using pure procedures? Do you know what sorts of optimisations are used? This is something I've wondered about.
Stefano Zaghi
@szaghi
@cmacmackin Chris, this is just a feeling, I have not yet done accurate comparisons (just run some small tests, the largest being 1D tests); the non-pure allocatable polymorphic version was not really usable in production due to the memory leaks. Today I can try a more rigorous analysis (with gcc 7.1), but in the 1D tests I did, the performance improvement seems visible. However, this could be not (only) related to purity: now my math operators really work on plain real arrays, each operator (+, -, *, /, **) returns a real array, and polymorphic classes are totally out of this kind of operator (polymorphism returns into play, without allocatable, in assignment); thus I think that Fortran's intrinsically optimal handling of arrays can play a role (or I had a wrong feeling and the performance boost is not there :cry: ).
Chris MacMackin
@cmacmackin
@szaghi I just wonder because I was under the impression that PURE was mostly used for handling things like parallelisation. I'd have thought that most of the opportunities for parallelisation would occur within the type-bound operators and not in making parallel calls to the operators themselves. The big advantage I can see to your new approach, though, is that it would make it much easier to use abstract calculus with coarrays, since function results are not allowed to contain coarray components. In Scientific Software Design it was proposed that you would essentially have two versions of your types: one with coarray components and one without, where the non-coarray version would be used for function results. This greatly increases the amount of code needed, whereas just using arrays would be much simpler. The disadvantage is that it becomes harder to use new defined operators, such as .div., .grad., .curl., on function results, because they wouldn't have the necessary information about grid layout.
Chris MacMackin
@cmacmackin

On a different note, you say you're using gcc 7.1. I compiled that today using the OpenCoarrays script. I wanted to see if it got rid of the memory leaks in my project. However, when I tried running my test suite, I found that it produced the error

Fortran runtime error: Recursive call to nonrecursive procedure 'cheb1d_scalar_grid_spacing'

When I examined the backtrace and the code, it seemed that a call to a totally different type-bound procedure got mixed up with the one called grid_spacing. This happened twice, which is what ended up producing the "recursion". I have no idea what could be wrong with the compiler to produce this. Is it working properly for you?

Stefano Zaghi
@szaghi
@cmacmackin I ran a simple test with 7.1. If you can wait a few minutes, I can try a more serious test (memory leaks seem to be still there with 7.1...)
Chris MacMackin
@cmacmackin
Good to know.
Stefano Zaghi
@szaghi

@cmacmackin Chris, I have just run a more complex test with this

╼ stefano@zaghi(02:32 PM Thu May 04) on feature/add-riemann-2D-tests [!?] desk {gcc-7.1.0 - gcc 7.1.0 environment}
├───╼ ~/fortran/FORESEER 15 files, 840Kb
└──────╼ gfortran --version
GNU Fortran (GCC) 7.1.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

It seems to work exactly as in gcc 6.3

the test is in FORESEER, this one, and it uses a lot of OOP
Chris MacMackin
@cmacmackin
Okay, something must have gone wrong with how I compiled it. If it doesn't solve the memory leaks then I won't bother pursuing it any further.
Stefano Zaghi
@szaghi
Let me check the memory-leak issues with the dedicated tests, a few minutes again :smile:
@cmacmackin Chris, we are not very fortunate... the leaks seem to be still there
╼ stefano@zaghi(02:43 PM Thu May 04) on master desk {gcc-7.1.0 - gcc 7.1.0 environment}
├───╼ ~/fortran/leaks_hunter 3 files, 88Kb
└──────╼ scripts/compile.sh src/leaks_raiser_static_intrinsic.f90 

┌╼ stefano@zaghi(02:43 PM Thu May 04) on master [?] desk {gcc-7.1.0 - gcc 7.1.0 environment}
├───╼ ~/fortran/leaks_hunter 4 files, 100Kb
└──────╼ scripts/run_valgrind.sh 
==59798== Memcheck, a memory error detector
==59798== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==59798== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
...
==59798== HEAP SUMMARY:
==59798==     in use at exit: 4 bytes in 1 blocks
==59798==   total heap usage: 20 allocs, 19 frees, 12,012 bytes allocated
==59798==
==59798== Searching for pointers to 1 not-freed blocks
==59798== Checked 101,856 bytes
==59798==
==59798== 4 bytes in 1 blocks are definitely lost in loss record 1 of 1
==59798==    at 0x4C2AF1F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==59798==    by 0x40075C: __static_intrinsic_type_m_MOD_add_static_intrinsic_type (leaks_raiser_static_intrinsic.f90:24)
==59798==    by 0x40084D: MAIN__ (leaks_raiser_static_intrinsic.f90:37)
==59798==    by 0x40089F: main (leaks_raiser_static_intrinsic.f90:30)
==59798==
==59798== LEAK SUMMARY:
==59798==    definitely lost: 4 bytes in 1 blocks
==59798==    indirectly lost: 0 bytes in 0 blocks
==59798==      possibly lost: 0 bytes in 0 blocks
==59798==    still reachable: 0 bytes in 0 blocks
==59798==         suppressed: 0 bytes in 0 blocks
==59798==
==59798== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
==59798== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
Chris MacMackin
@cmacmackin
Was it only 4 bytes lost before? I'd almost worry that was just some issue with initialisation or something.
Stefano Zaghi
@szaghi
@cmacmackin Chris, this is a synthetic test designed to trigger the GNU memory leaks; you can check it on leaks_hunter
The test is very simple; it must report 0 bytes lost
In a few hours I should be able to compare the performance of the polymorphic operators against the plain real ones
Damian Rouson
@rouson
I haven't followed this discussion in detail. As you guys know, I'll respond a lot more in calls than text of any form. I just can't keep up with all the text flying by me every day. Maybe it's a sign of my age. What caught my eye was @cmacmackin mentioning guard and clean_temp. Whoa... that's old-school. Presumably you picked this up from the 2003 paper by G.W. Stewart in ACM Fortran Forum (not sure if Markdown syntax works here). Say it isn't so! If you're doing such things under the hood and are really confident that you have a scheme to get it right and that users of your code will never need it, then it's ok as a last resort. Otherwise, whatever led you down this path has to get fixed in the compiler or it's sure to come back to haunt users down the road. I did similar things in a paper from 2006, but in a limited setting (expression evaluation) with a clearly articulated (published) strategy that I'd like to think covered all the cases we cared about. We at least made an attempt to automate such matters in the papers that I've sent @szaghi, so I'd recommend that route, if it's applicable, over guard and clean_temp. I really hope the compiler situation hasn't set us back 14 years. That would be a travesty.
Damian Rouson
@rouson
Also, I don't think the way to think about PURE is in terms of whether the attribute in and of itself speeds up code. That's pretty unlikely. I think of PURE and DO CONCURRENT in terms of the discipline they impart on the programmer to do things that can aid in optimization. If you write functions that conform to the restrictions that PURE requires but don't mark them PURE, a sufficiently smart compiler can do all the same related optimizations anyway. In fact, the gfortran compiler tries to detect whether your procedure could have been marked PURE even if it wasn't, and marks such procedures as implicitly pure internally. Likewise, if you do the things that DO CONCURRENT requires but write a regular DO loop, a sufficiently smart compiler will be able to optimize the code in the ways that DO CONCURRENT affords (and you will also find it easier to use other speedup strategies such as multithreading with OpenMP). Then the question becomes the reverse: if I violate the requirements of PURE and DO CONCURRENT, what compiler optimizations am I preventing? Framing it this way also shows how difficult the question is to answer, because one then has to ask, well... how badly am I going to violate the restrictions? The sky's the limit, and one can slow code down quite considerably that way if one wants to. For example, PURE doesn't allow for I/O. You can slow a code down as much as you want, in direct proportion to the size of the file you read or write. It's kind of like when someone goes to the doctor and says, "It hurts when I do this," and the doctor responds simply, "Then don't do that."
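As a minimal toy illustration of that discipline (not from any of the libraries discussed here):

program discipline_demo
  ! The function obeys PURE's restrictions and the loop obeys DO CONCURRENT's
  ! (independent iterations, no I/O): that is what gives the compiler room to
  ! optimize, whether or not the keywords are actually written.
  implicit none
  real :: x(1000), y(1000), z(1000)
  integer :: i

  x = 1.0
  y = 2.0
  do concurrent (i = 1:size(z))
    z(i) = axpy(2.0, x(i), y(i))
  end do
  print *, z(1), z(size(z))

contains

  pure function axpy(a, xi, yi) result(zi)
    ! No side effects: no I/O, no access to mutable global state.
    real, intent(in) :: a, xi, yi
    real :: zi
    zi = a*xi + yi
  end function axpy
end program discipline_demo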
Stefano Zaghi
@szaghi

@rouson , Damian,
thank you for your reply.

I haven't followed this discussion in detail. As you guys know, I'll respond a lot more in calls than text of any form. I just can't keep up with all the text flying by me every day. Maybe it's a sign of my age.

:smile: On the contrary, my bad spoken English prevents me from calling you almost all the time...

What caught my eye was the mention of @cmacmackin mentioning guard and clean_tmp.

Good to know, I'll use that word when I really need to catch your attention :smile:

I don't think the way to think about PURE is in terms of whether the attribute in and of itself speeds up code ... if I violate the requirements of PURE and DO CONCURRENT, what compiler optimizations am I preventing.

This is exactly my point: what I would like to say to Chris is that the polymorphic allocatable version violates the pure conditions, thus it likely prevents optimizations, whereas the non-polymorphic operators version is pure (in its contents and with the explicit attribute), thus it is likely easier to optimize. I was not concerned about the declaration so much as the actual contents. Moreover, I pointed out to Chris that the performance I gained is more likely due to the fact that now the math operators act on plain real arrays, so the compiler optimizer could be even more favored.

The performance comparison for Chris has not yet started: my cluster and my workstation are crunching numbers this weekend, so it will come next week. However, I did a more synthetic test to evaluate the defined-operator overhead in different circumstances. I compared:

  • user-defined operators acting on/returning real arrays defined as automatic arrays, which are likely allocated on the stack;
  • user-defined operators acting on/returning real arrays defined as allocatable arrays, which are likely allocated on the heap;
  • user-defined operators acting on/returning polymorphic allocatable classes, which are likely allocated on the heap;
  • plain intrinsic array operators acting on real arrays, for reference.

My results were dramatic: all user-defined operators have at least 50% overhead with respect to plain intrinsic operators, with, in general, the polymorphic version the worst, followed by the automatic-arrays one, with the allocatable-array version the best. I would really like to know your opinions. My test results can be found online here and the test is this. For the sake of clarity, I report the Fortran code below. I hope I have made some design mistakes in the test, because the overhead is really not negligible. Are these results expected for you?

User defined operators overhead hunter

! A DEFY (DEmystyfy Fortran mYths) test.
! Author: Stefano Zaghi
! Date: 2017-05-05
!
! License: this file is licensed under the Creative Commons Attribution 4.0 license,
! see http://creativecommons.org/licenses/by/4.0/ .

module arrays
   use, intrinsic :: iso_fortran_env, only : real64

   implicit none

   type :: array_automatic
      integer                   :: n
      real(real64), allocatable :: x(:)
      contains
         procedure, pass(lhs) :: add_automatic
         generic :: operator(+) => add_automatic
         procedure, pass(lhs) :: assign_automatic
         generic :: assignment(=) => assign_automatic
   endtype array_automatic

   type :: array_allocatable
      integer                   :: n
      real(real64), allocatable :: x(:)
      contains
         procedure, pass(lhs) :: add_allocatable
         generic :: operator(+) => add_allocatable
         procedure, pass(lhs) :: assign_allocatable
         generic :: assignment(=) => assign_allocatable
   endtype array_allocatable

   type, abstract :: array_polymorphic_abstract
      contains
         procedure(add_interface), pass(lhs), deferred :: add_polymorphic
         generic :: operator(+) => add_polymorphic
         procedure(assign_interface),      pass(lhs), deferred :: assign_polymorphic
         procedure(assign_real_interface), pass(lhs), deferred :: assign_polymorphic_real
         generic :: assignment(=) => assign_polymorphic, assign_polymorphic_real
   endtype array_polymorphic_abstract

   type, extends(array_polymorphic_abstract) :: array_polymorphic
      integer                   :: n
      real(real64), allocatable :: x(:)
      contains
         procedure, pass(lhs) :: add_polymorphic
         procedure, pass(lhs) :: assign_polymorphic
         procedure, pass(lhs) :: assign_polymorphic_real
   endtype array_polymorphic

   abstract interface
      pure function add_interface(lhs, rhs) result(opr)
      import :: array_polymorphic_abstract
      class(array_polymorphic_abstract), intent(in)  :: lhs
      class(array_polymorphic_abstract), intent(in)  :: rhs
      class(array_polymorphic_abstract), allocatable :: opr
      endfunction add_interface

      pure subroutine assign_interface(lhs, rhs)
      import :: array_polymorphic_abstract
      class(array_polymorphic_abstract), intent(inout) :: lhs
      class(array_polymorphic_abstract), intent(in)    :: rhs
      endsubroutine assign_interface

      pure subroutine assign_real_interface(lhs, rhs)
      import :: array_polymorphic_abstract, real64
      class(array_polymorphic_abstract), intent(inout) :: lhs
      real(real64),                      intent(in)    :: rhs(1:)
      endsubroutine assign_real_interface
   endinterface

   contains
      pure function add_automatic(lhs, rhs) result(opr)
      class(array_automatic), intent(in) :: lhs
      type(array_automatic),  intent(in) :: rhs
      real(real64)                       :: opr(1:lhs%n)

      opr = lhs%x + rhs%x
      endfunction add_automatic

      pure subroutine assign_automatic(lhs, rhs)
      class(array_automatic), intent(inout) :: lhs
      real(real64),           intent(in)    :: rhs(1:)

      lhs%n = size(rhs, dim=1)
      lhs%x = rhs
      endsubroutine assign_automatic

      pure function add_allocatable(lhs, rhs) result(opr)
      class(array_allocatable), intent(in) :: lhs
      type(array_allocatable),  intent(in) :: rhs
      real(real64), allocatable            :: opr(:)

      opr = lhs%x + rhs%x
      endfunction add_allocatable

      pure subroutine assign_allocatable(lhs, rhs)
      class(array_allocatable), intent(inout) :: lhs
      real(real64),             intent(in)    :: rhs(1:)

      lhs%n = size(rhs, dim=1)
      lhs%x = rhs
      endsubroutine assign_allocatable

      pure function add_polymorphic(lhs, rhs) result(opr)
      class(array_polymorphic),          intent(in)  :: lhs
      class(array_polymorphic_abstract), intent(in)  :: rhs
      class(array_polymorphic_abstract), allocatable :: opr

      allocate(array_polymorphic :: opr)
      select type(opr)
      class is(array_polymorphic)
         select type(rhs)
         class is(array_polymorphic)
            opr%n = lhs%n  ! keep size metadata consistent (otherwise n is undefined)
            opr%x = lhs%x + rhs%x
         endselect
      endselect
      endfunction add_polymorphic

      pure subroutine assign_polymorphic(lhs, rhs)
      class(array_polymorphic),          intent(inout) :: lhs
      class(array_polymorphic_abstract), intent(in)    :: rhs

      select type(rhs)
      class is(array_polymorphic)
         lhs%n = rhs%n
         lhs%x = rhs%x
      endselect
      endsubroutine assign_polymorphic

      pure subroutine assign_polymorphic_real(lhs, rhs)
      class(array_polymorphic), intent(inout) :: lhs
      real(real64),             intent(in)    :: rhs(1:)

      lhs%n = size(rhs, dim=1)
      lhs%x = rhs
      endsubroutine assign_polymorphic_real
endmodule arrays
program defy
   use, intrinsic :: iso_fortran_env, only : int64, real64
   use arrays, only : array_automatic, array_allocatable, array_polymorphic
   implicit none
   real(real64), allocatable :: a_intrinsic(:)
   real(real64), allocatable :: b_intrinsic(:)
   real(real64), allocatable :: c_intrinsic(:)
   type(array_automatic)     :: a_automatic
   type(array_automatic)     :: b_automatic
   type(array_automatic)     :: c_automatic
   type(array_allocatable)   :: a_allocatable
   type(array_allocatable)   :: b_allocatable
   type(array_allocatable)   :: c_allocatable
   type(array_polymorphic)   :: a_polymorphic
   type(array_polymorphic)   :: b_polymorphic
   type(array_polymorphic)   :: c_polymorphic
   integer(int64)            :: tic_toc(1:2)
   integer(int64)            :: count_rate
   real(real64)              :: intrinsic_time
   real(real64)              :: time
   integer                   :: N
   integer                   :: Nn
   integer                   :: i

   N = 100000
   Nn = N/100
   a_intrinsic   = [(real(i, kind=real64), i=1,N)]
   b_intrinsic   = [(real(i, kind=real64), i=1,N)]
   a_automatic   = [(real(i, kind=real64), i=1,N)]
   b_automatic   = [(real(i, kind=real64), i=1,N)]
   a_allocatable = [(real(i, kind=real64), i=1,N)]
   b_allocatable = [(real(i, kind=real64), i=1,N)]
   a_polymorphic = [(real(i, kind=real64), i=1,N)]
   b_polymorphic = [(real(i, kind=real64), i=1,N)]

   call system_clock(tic_toc(1), count_rate)
   do i=1, Nn
     c_intrinsic = a_intrinsic + b_intrinsic
   enddo
   call system_clock(tic_toc(2), count_rate)
   intrinsic_time = (tic_toc(2) - tic_toc(1)) / real(count_rate, kind=real64)
   print*, 'intrinsic: ', intrinsic_time

   call system_clock(tic_toc(1), count_rate)
   do i=1, Nn
     c_automatic = a_automatic + b_automatic
   enddo
   call system_clock(tic_toc(2), count_rate)
   time = (tic_toc(2) - tic_toc(1)) / real(count_rate, kind=real64)
   print*, 'automatic: ', time, ' + %(intrinsic): ', 100._real64 - intrinsic_time / time * 100

   call system_clock(tic_toc(1), count_rate)
   do i=1, Nn
     c_allocatable = a_allocatable + b_allocatable
   enddo