Stefano Zaghi
@szaghi

@cmacmackin Chris, it is all much clearer now, but indeed, I think this does not fix my bug. Now I understand the care you put into finalization, but the point is that in my test case the finalization is totally useless... my type prototype (your fields) is something like


```fortran
type :: field
   real :: i_am_static(length)
   contains
      procedure...
endtype field
```

Now, if I add a finalizer to field, it has no effect on the i_am_static member of the type. My leaks originate from the fact that gfortran is not able to free the static memory of class(...), allocatable function results. If I make the static member allocatable, the leaks seem to vanish (but not if the allocatables are other derived types...); likewise, if I trim out the polymorphism and define the result as type(...), allocatable, the finalization is automatically done right for both static and dynamic members. So, your workaround could be very useful for leaks related to dynamic members that are other classes/types, but it has no effect on the static member. Tomorrow I'll try your workaround in more detail, but now I am going to sleep very sadly...
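A minimal self-contained sketch of the leaking pattern I mean (names are illustrative and the length parameter is assumed to be a module constant):

```fortran
module field_m
   implicit none
   integer, parameter :: length = 100   ! assumed module constant

   type :: field
      real :: i_am_static(length)       ! static component: finalization cannot free it
   contains
      procedure, pass(lhs) :: add
      generic :: operator(+) => add
   end type field
contains
   function add(lhs, rhs) result(opr)
      class(field), intent(in)  :: lhs
      class(field), intent(in)  :: rhs
      class(field), allocatable :: opr  ! the temporary that gfortran fails to free
      allocate(field :: opr)
      opr%i_am_static = lhs%i_am_static + rhs%i_am_static
   end function add
end module field_m
```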

Anyhow, thank you very much, you are great!

Chris MacMackin
@cmacmackin
You are correct. What I do is make all of my large type components dynamic, which happened to be the case for my field types anyway. Static components cannot be finalised, which is why I said that this approach doesn't stop all memory leaks. Actually, it gets a little bit more complicated than this, because you cannot deallocate components of an intent(in) argument. However, in a non-pure procedure, pointer components of intent(in) objects are a bit odd. It is only required that you don't change the pointer--there is no issue with changing the thing that it points to.
What I did was define an additional, transparent, derived type called array_1d:
```fortran
  type, public :: array_1d
    real(r8), dimension(:), allocatable, public :: array
  end type array_1d
```
I have similar types for higher-dimensional arrays. That way I can just deallocate the component array. You can see this in action in the source code.
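A hedged sketch of that intent(in) pointer loophole, with illustrative names rather than the actual types from Chris's code:

```fortran
module temp_cleanup_m
   use, intrinsic :: iso_fortran_env, only : real64
   implicit none

   type :: array_1d
      real(real64), dimension(:), allocatable :: array
   end type array_1d

   type :: scalar_field
      type(array_1d), pointer :: data => null()      ! payload lives behind a pointer
      logical,        pointer :: temporary => null() ! marks intermediate results
   end type scalar_field
contains
   subroutine clean_temp(this)
      ! Non-pure on purpose: intent(in) forbids re-associating THIS%DATA,
      ! but the target it points to may still be modified, including
      ! deallocating its allocatable component.
      type(scalar_field), intent(in) :: this
      if (associated(this%temporary) .and. associated(this%data)) then
         if (this%temporary) then
            if (allocated(this%data%array)) deallocate(this%data%array)
         end if
      end if
   end subroutine clean_temp
end module temp_cleanup_m
```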
Stefano Zaghi
@szaghi
@cmacmackin Chris, thank you very much for your help! Indeed, my fields also have big components defined as allocatable, but they have many static components too: an integrand field could be a big block with not only the fluid-dynamic fields, but also grid dimensions, species concentrations, boundary-condition types... alone these could be a few static bytes for each integrand, but then consider that each integrand is integrated (namely added, multiplied, divided...) many times per time step, and for time-accurate simulations you perform millions/billions of time steps... the few bytes leaked quickly become gigabytes. Put all of this in an HPC perspective... it is not acceptable. Anyhow, thank you again!
Izaak "Zaak" Beekman
@zbeekman
Hi all, @DmitryLyakh pointed out his cool looking project (hosted on GitLab): Generic Fortran Containers I just thought I would pass it along!
Stefano Zaghi
@szaghi
@zbeekman @DmitryLyakh GFC is very interesting! Thank you both! Why not GitHub too? Are there other sources of documentation?
Chris MacMackin
@cmacmackin

@szaghi I've come up with an idea which should work better than the "forced finalisation" approach which I'm using currently. I'll still use my guard_/clean_temp methods, but I'll couple them with an object pool. That way, when a temporary object is ready to be "cleaned", I can simply release it back into the object pool for later reuse and no memory is leaked. A pool of 100 or so should be more than enough for most applications and would have a reasonably manageable minimal memory footprint.

This approach still would not be amenable to pure procedures, so you likely won't want to take it. However, I thought it might be worth mentioning here in case anyone else is interested. Note that I have not actually tested it yet, or done more than sketch out the basic details.
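A minimal sketch of the object-pool idea, untested and with hypothetical names:

```fortran
module object_pool_m
   implicit none
   integer, parameter :: pool_size = 100

   type :: pool_slot
      class(*), allocatable :: object   ! the reusable temporary
      logical :: in_use = .false.
   end type pool_slot

   type(pool_slot) :: pool(pool_size)
contains
   function acquire() result(slot)
      ! Returns the index of a free slot, or 0 if the pool is exhausted
      ! (in which case the caller falls back to a fresh allocation).
      integer :: slot
      do slot = 1, pool_size
         if (.not. pool(slot)%in_use) then
            pool(slot)%in_use = .true.
            return
         end if
      end do
      slot = 0
   end function acquire

   subroutine release(slot)
      ! "Cleaning" a temporary just returns its slot for reuse:
      ! nothing is deallocated, so nothing can leak.
      integer, intent(in) :: slot
      if (slot > 0) pool(slot)%in_use = .false.
   end subroutine release
end module object_pool_m
```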

Stefano Zaghi
@szaghi
@cmacmackin Chris, thank you very much, this is interesting. Please keep me informed about it, in particular if you test it on your FACTUAL. Currently, I think I have found my nirvana coupling abstracts with pure math operators (that puts a restriction on abstraction, but gives a great performance boost), but your object-pool approach could come in handy for totally abstract non-pure operators. Thank you very much for your help!
Chris MacMackin
@cmacmackin
@szaghi How much of a performance boost is there from using pure procedures? Do you know what sorts of optimisations are used? This is something I've wondered about.
Stefano Zaghi
@szaghi
@cmacmackin Chris, this is just a feeling; I have not yet done accurate comparisons (just some small tests, and the largest were 1D tests), since the non-pure allocatable polymorphic version was not really usable in production due to the memory leaks. Today I can try a more rigorous analysis (with gcc 7.1), but in the 1D tests I did, the performance improvement seems visible. However, this could be not (only) related to the purity: now my math operators really work on plain real arrays, each operator (+, -, *, /, **) returns a real array, and polymorphic classes are totally out of this kind of operator (polymorphism returns into play, without allocatables, in the assignment), thus I think Fortran's intrinsically optimal handling of arrays can play a role (or I had a wrong feeling and the performance boost is not there :cry: ).
Chris MacMackin
@cmacmackin
@szaghi I just wonder because I was under the impression that PURE was mostly used for handling things like parallelisation. I'd have thought that most of the opportunities for parallelisation would occur within the type-bound operators and not in making parallel calls to the operators themselves. The big advantage I can see to your new approach, though, is that it would make it much easier to use abstract calculus with coarrays, since function results are not allowed to contain coarray components. In Scientific Software Design it was proposed that you would essentially have two versions of your types: one with coarray components and one without, where the non-coarray version would be used for function results. This greatly increases the amount of code needed, whereas just using arrays would be much simpler. The disadvantage is that it becomes harder to use new defined operators, such as .div., .grad., .curl., on function results because they wouldn't have the necessary information about grid layout.
Chris MacMackin
@cmacmackin

On a different note, you say you're using gcc 7.1. I compiled that today using the OpenCoarrays script. I wanted to see if it got rid of the memory leaks in my project. However, when I tried running my test suite, I found that it produced the error

Fortran runtime error: Recursive call to nonrecursive procedure 'cheb1d_scalar_grid_spacing'

When I examined the backtrace and the code, it seemed that a call to a totally different type-bound procedure got mixed up with the one called grid_spacing. This happened twice, which is what ended up producing the "recursion". I have no idea what could be wrong with the compiler to produce this. Is it working properly for you?

Stefano Zaghi
@szaghi
@cmacmackin I ran a simple test with 7.1. If you can wait a few minutes I can try a more serious test (memory leaks still seem to be here with 7.1...)
Chris MacMackin
@cmacmackin
Good to know.
Stefano Zaghi
@szaghi

@cmacmackin Chris, I have just run a more complex test with this

╼ stefano@zaghi(02:32 PM Thu May 04) on feature/add-riemann-2D-tests [!?] desk {gcc-7.1.0 - gcc 7.1.0 environment}
├───╼ ~/fortran/FORESEER 15 files, 840Kb
└──────╼ gfortran --version
GNU Fortran (GCC) 7.1.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

It seems to work exactly as in gcc 6.3

The test is in FORESEER, this one, and it uses a lot of OOP
Chris MacMackin
@cmacmackin
Okay, something must have gone wrong with how I compiled it. If it doesn't solve the memory leaks then I won't bother pursuing it any further.
Stefano Zaghi
@szaghi
Let me check the memory-leak issues with the dedicated tests, a few minutes again :smile:
@cmacmackin Chris, we are not very fortunate... the leaks seem to still be there
╼ stefano@zaghi(02:43 PM Thu May 04) on master desk {gcc-7.1.0 - gcc 7.1.0 environment}
├───╼ ~/fortran/leaks_hunter 3 files, 88Kb
└──────╼ scripts/compile.sh src/leaks_raiser_static_intrinsic.f90 

┌╼ stefano@zaghi(02:43 PM Thu May 04) on master [?] desk {gcc-7.1.0 - gcc 7.1.0 environment}
├───╼ ~/fortran/leaks_hunter 4 files, 100Kb
└──────╼ scripts/run_valgrind.sh 
==59798== Memcheck, a memory error detector
==59798== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==59798== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
...
==59798== HEAP SUMMARY:
==59798==     in use at exit: 4 bytes in 1 blocks
==59798==   total heap usage: 20 allocs, 19 frees, 12,012 bytes allocated
==59798==
==59798== Searching for pointers to 1 not-freed blocks
==59798== Checked 101,856 bytes
==59798==
==59798== 4 bytes in 1 blocks are definitely lost in loss record 1 of 1
==59798==    at 0x4C2AF1F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==59798==    by 0x40075C: __static_intrinsic_type_m_MOD_add_static_intrinsic_type (leaks_raiser_static_intrinsic.f90:24)
==59798==    by 0x40084D: MAIN__ (leaks_raiser_static_intrinsic.f90:37)
==59798==    by 0x40089F: main (leaks_raiser_static_intrinsic.f90:30)
==59798==
==59798== LEAK SUMMARY:
==59798==    definitely lost: 4 bytes in 1 blocks
==59798==    indirectly lost: 0 bytes in 0 blocks
==59798==      possibly lost: 0 bytes in 0 blocks
==59798==    still reachable: 0 bytes in 0 blocks
==59798==         suppressed: 0 bytes in 0 blocks
==59798==
==59798== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
Chris MacMackin
@cmacmackin
Was it only 4 bytes lost before? I'd almost worry that was just some issue with initialisation or something.
Stefano Zaghi
@szaghi
@cmacmackin Chris, this is a synthetic test designed to trigger the GNU memory leaks; you can check it in leaks_hunter
The test is very simple: it should report 0 bytes lost
In a few hours I should be able to compare the performance of polymorphic operators and plain-real ones
Damian Rouson
@rouson
I haven't followed this discussion in detail. As you guys know, I'll respond a lot more in calls than in text of any form. I just can't keep up with all the text flying by me every day. Maybe it's a sign of my age. What caught my eye was @cmacmackin mentioning guard and clean_tmp. Whoa... that's old-school. Presumably you picked this up from the 2003 paper by GW Stewart in ACM Fortran Forum -- not sure if Markdown syntax works here. Say it isn't so! If you're doing such things under the hood and are really confident that you have a scheme to get it right and that users of your code will never need it, then it's OK as a last resort. Otherwise, whatever led you down this path has to get fixed in the compiler or it's sure to come back to haunt users down the road. I did similar things in a paper from 2006, but in a limited setting (expression evaluation) with a clearly articulated (published) strategy that I'd like to think covered all the cases we cared about. We at least made an attempt to automate such matters in the papers that I've sent @szaghi, so I'd recommend that route, if it's applicable, over guard and clean_temp. I really hope the compiler situation hasn't set us back 14 years. That would be a travesty.
Damian Rouson
@rouson
Also, I don't think the way to think about PURE is in terms of whether the attribute in and of itself speeds up code. That's pretty unlikely. I think of PURE and DO CONCURRENT in terms of the discipline they impart on the programmer to do things that can aid in optimization. If you write functions that conform to the restrictions that PURE requires but don't mark them PURE, a sufficiently smart compiler can do all the same related optimizations anyway. In fact, the gfortran compiler tries to detect whether your procedure could have been marked PURE even if it wasn't, and it marks such procedures as implicitly pure internally. Likewise, if you do the things that DO CONCURRENT requires but write a regular DO loop, a sufficiently smart compiler will be able to optimize the code in the ways that DO CONCURRENT affords (and you will also find it easier to use other speedup strategies such as multithreading with OpenMP). Then the question becomes the reverse: if I violate the requirements of PURE and DO CONCURRENT, what compiler optimizations am I preventing? Framing it this way also shows how difficult the question is to answer, because one then has to ask, well... how badly am I going to violate the restrictions? The sky's the limit, and one can slow code down quite considerably that way if one wants to do so. For example, PURE doesn't allow for I/O. You can slow a code down as much as you want in direct proportion to the size of the file you read or write. It's kind of like when someone goes to the doctor and says, "It hurts when I do this." and the doctor responds simply, "Then don't do that."
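As an illustration of that "discipline" argument, here is a minimal generic example (not from the discussion above): both loops satisfy the independence requirements, but only the second states the contract explicitly.

```fortran
program do_concurrent_demo
   implicit none
   integer, parameter :: n = 1000
   real :: a(n), b(n), c(n)
   integer :: i

   a = 1.0
   b = 2.0

   ! Plain DO loop: iterations are independent, but the compiler must prove it.
   do i = 1, n
      c(i) = a(i) + b(i)
   end do

   ! DO CONCURRENT: the programmer asserts the independence up front,
   ! making the same optimizations available by contract.
   do concurrent (i = 1:n)
      c(i) = a(i) + b(i)
   end do

   print *, c(1), c(n)
end program do_concurrent_demo
```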
Stefano Zaghi
@szaghi

@rouson, Damian,
thank you for your reply.

I haven't followed this discussion in detail. As you guys know, I'll respond a lot more in calls than text of any form. I just can't keep up with all the text flying by me every day. Maybe it's a sign of my age.

:smile: On the contrary, my bad spoken English prevents me from calling you almost all the time...

What caught my eye was the mention of @cmacmackin mentioning guard and clean_tmp.

Good to know, I'll use those words when I really need to catch your attention :smile:

I don't think the way to think about PURE is in terms of whether the attribute in and of itself speeds up code ... if I violate the requirements of PURE and DO CONCURRENT, what compiler optimizations am I preventing.

This is exactly my point: what I would like to say to Chris is that the polymorphic allocatable version violates the pure conditions, thus it is likely blocking the optimizer, whereas the non-polymorphic operators version is pure (in its contents and with the explicit attribute), thus it is likely easier to optimize. I was not concerned about the declaration so much as about the actual contents. Moreover, I pointed out to Chris that the performance I gained is more likely due to the fact that the math operators now act on plain real arrays, thus the compiler optimizer could be even more favored.

The performance comparison for Chris has not yet started: my cluster and my workstation are crunching numbers this weekend, so it will come next week. However, I did a more "synthetic" test to evaluate the defined-operators overhead in different circumstances. I compared:

  • user-defined operators acting on/returning real arrays declared as automatic arrays, which are likely allocated on the stack;
  • user-defined operators acting on/returning real arrays declared as allocatable arrays, which are likely allocated on the heap;
  • user-defined operators acting on/returning polymorphic allocatable classes, which are likely allocated on the heap;
  • plain intrinsic array operators acting on real arrays, as the reference.

My results were dramatic: all user-defined operators have at least 50% overhead with respect to plain intrinsic operators, with, in general, the polymorphic version the worst, followed by the automatic-array one, and with the allocatable-array version the best. I would really like to know your opinions. My test results can be found online here and the test is this. For the sake of clarity I report the Fortran code below. I hope I have made some design mistakes in the test, because the overhead is really not negligible. Are these results expected for you?

User defined operators overhead hunter

! A DEFY (DEmystyfy Fortran mYths) test.
! Author: Stefano Zaghi
! Date: 2017-05-05
!
! License: this file is licensed under the Creative Commons Attribution 4.0 license,
! see http://creativecommons.org/licenses/by/4.0/ .

module arrays
   use, intrinsic :: iso_fortran_env, only : real64

   implicit none

   type :: array_automatic
      integer                   :: n
      real(real64), allocatable :: x(:)
      contains
         procedure, pass(lhs) :: add_automatic
         generic :: operator(+) => add_automatic
         procedure, pass(lhs) :: assign_automatic
         generic :: assignment(=) => assign_automatic
   endtype array_automatic

   type :: array_allocatable
      integer                   :: n
      real(real64), allocatable :: x(:)
      contains
         procedure, pass(lhs) :: add_allocatable
         generic :: operator(+) => add_allocatable
         procedure, pass(lhs) :: assign_allocatable
         generic :: assignment(=) => assign_allocatable
   endtype array_allocatable

   type, abstract :: array_polymorphic_abstract
      contains
         procedure(add_interface), pass(lhs), deferred :: add_polymorphic
         generic :: operator(+) => add_polymorphic
         procedure(assign_interface),      pass(lhs), deferred :: assign_polymorphic
         procedure(assign_real_interface), pass(lhs), deferred :: assign_polymorphic_real
         generic :: assignment(=) => assign_polymorphic, assign_polymorphic_real
   endtype array_polymorphic_abstract

   type, extends(array_polymorphic_abstract) :: array_polymorphic
      integer                   :: n
      real(real64), allocatable :: x(:)
      contains
         procedure, pass(lhs) :: add_polymorphic
         procedure, pass(lhs) :: assign_polymorphic
         procedure, pass(lhs) :: assign_polymorphic_real
   endtype array_polymorphic

   abstract interface
      pure function add_interface(lhs, rhs) result(opr)
      import :: array_polymorphic_abstract
      class(array_polymorphic_abstract), intent(in)  :: lhs
      class(array_polymorphic_abstract), intent(in)  :: rhs
      class(array_polymorphic_abstract), allocatable :: opr
      endfunction add_interface

      pure subroutine assign_interface(lhs, rhs)
      import :: array_polymorphic_abstract
      class(array_polymorphic_abstract), intent(inout) :: lhs
      class(array_polymorphic_abstract), intent(in)    :: rhs
      endsubroutine assign_interface

      pure subroutine assign_real_interface(lhs, rhs)
      import :: array_polymorphic_abstract, real64
      class(array_polymorphic_abstract), intent(inout) :: lhs
      real(real64),                      intent(in)    :: rhs(1:)
      endsubroutine assign_real_interface
   endinterface

   contains
      pure function add_automatic(lhs, rhs) result(opr)
      class(array_automatic), intent(in) :: lhs
      type(array_automatic),  intent(in) :: rhs
      real(real64)                       :: opr(1:lhs%n)

      opr = lhs%x + rhs%x
      endfunction add_automatic

      pure subroutine assign_automatic(lhs, rhs)
      class(array_automatic), intent(inout) :: lhs
      real(real64),           intent(in)    :: rhs(1:)

      lhs%n = size(rhs, dim=1)
      lhs%x = rhs
      endsubroutine assign_automatic

      pure function add_allocatable(lhs, rhs) result(opr)
      class(array_allocatable), intent(in) :: lhs
      type(array_allocatable),  intent(in) :: rhs
      real(real64), allocatable            :: opr(:)

      opr = lhs%x + rhs%x
      endfunction add_allocatable

      pure subroutine assign_allocatable(lhs, rhs)
      class(array_allocatable), intent(inout) :: lhs
      real(real64),             intent(in)    :: rhs(1:)

      lhs%n = size(rhs, dim=1)
      lhs%x = rhs
      endsubroutine assign_allocatable

      pure function add_polymorphic(lhs, rhs) result(opr)
      class(array_polymorphic),          intent(in)  :: lhs
      class(array_polymorphic_abstract), intent(in)  :: rhs
      class(array_polymorphic_abstract), allocatable :: opr

      allocate(array_polymorphic :: opr)
      select type(opr)
      class is(array_polymorphic)
         opr%n = lhs%n ! keep the size component defined for the later assignment
         select type(rhs)
         class is(array_polymorphic)
            opr%x = lhs%x + rhs%x
         endselect
      endselect
      endfunction add_polymorphic

      pure subroutine assign_polymorphic(lhs, rhs)
      class(array_polymorphic),          intent(inout) :: lhs
      class(array_polymorphic_abstract), intent(in)    :: rhs

      select type(rhs)
      class is(array_polymorphic)
         lhs%n = rhs%n
         lhs%x = rhs%x
      endselect
      endsubroutine assign_polymorphic

      pure subroutine assign_polymorphic_real(lhs, rhs)
      class(array_polymorphic), intent(inout) :: lhs
      real(real64),             intent(in)    :: rhs(1:)

      lhs%n = size(rhs, dim=1)
      lhs%x = rhs
      endsubroutine assign_polymorphic_real
endmodule arrays
program defy
   use, intrinsic :: iso_fortran_env, only : int64, real64
   use arrays, only : array_automatic, array_allocatable, array_polymorphic
   implicit none
   real(real64), allocatable :: a_intrinsic(:)
   real(real64), allocatable :: b_intrinsic(:)
   real(real64), allocatable :: c_intrinsic(:)
   type(array_automatic)     :: a_automatic
   type(array_automatic)     :: b_automatic
   type(array_automatic)     :: c_automatic
   type(array_allocatable)   :: a_allocatable
   type(array_allocatable)   :: b_allocatable
   type(array_allocatable)   :: c_allocatable
   type(array_polymorphic)   :: a_polymorphic
   type(array_polymorphic)   :: b_polymorphic
   type(array_polymorphic)   :: c_polymorphic
   integer(int64)            :: tic_toc(1:2)
   integer(int64)            :: count_rate
   real(real64)              :: intrinsic_time
   real(real64)              :: time
   integer                   :: N
   integer                   :: Nn
   integer                   :: i

   N = 100000
   Nn = N/100
   a_intrinsic   = [(real(i, kind=real64), i=1,N)]
   b_intrinsic   = [(real(i, kind=real64), i=1,N)]
   a_automatic   = [(real(i, kind=real64), i=1,N)]
   b_automatic   = [(real(i, kind=real64), i=1,N)]
   a_allocatable = [(real(i, kind=real64), i=1,N)]
   b_allocatable = [(real(i, kind=real64), i=1,N)]
   a_polymorphic = [(real(i, kind=real64), i=1,N)]
   b_polymorphic = [(real(i, kind=real64), i=1,N)]

   call system_clock(tic_toc(1), count_rate)
   do i=1, Nn
     c_intrinsic = a_intrinsic + b_intrinsic
   enddo
   call system_clock(tic_toc(2), count_rate)
   intrinsic_time = (tic_toc(2) - tic_toc(1)) / real(count_rate, kind=real64)
   print*, 'intrinsic: ', intrinsic_time

   call system_clock(tic_toc(1), count_rate)
   do i=1, Nn
     c_automatic = a_automatic + b_automatic
   enddo
   call system_clock(tic_toc(2), count_rate)
   time = (tic_toc(2) - tic_toc(1)) / real(count_rate, kind=real64)
   print*, 'automatic: ', time, ' + %(intrinsic): ', 100._real64 - intrinsic_time / time * 100

   call system_clock(tic_toc(1), count_rate)
   do i=1, Nn
     c_allocatable = a_allocatable + b_allocatable
   enddo
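   ! (The message is truncated here; the remaining timing blocks are
   !  reconstructed following the pattern of the previous two.)
   call system_clock(tic_toc(2), count_rate)
   time = (tic_toc(2) - tic_toc(1)) / real(count_rate, kind=real64)
   print*, 'allocatable: ', time, ' + %(intrinsic): ', 100._real64 - intrinsic_time / time * 100

   call system_clock(tic_toc(1), count_rate)
   do i=1, Nn
     c_polymorphic = a_polymorphic + b_polymorphic
   enddo
   call system_clock(tic_toc(2), count_rate)
   time = (tic_toc(2) - tic_toc(1)) / real(count_rate, kind=real64)
   print*, 'polymorphic: ', time, ' + %(intrinsic): ', 100._real64 - intrinsic_time / time * 100
endprogram defy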
Stefano Zaghi
@szaghi
@rouson, Damian, I forgot this...

I just can't keep up with all the text flying by me every day.

This is the price you pay for being the most experienced and the kindest Fortran programmer available :smile: To limit spam like mine you could only become less kind, but I hope this never happens!

Chris MacMackin
@cmacmackin
@rouson I find it odd that you feel guard_temp and clean_temp are "old school", because you explicitly mention them in chapter 5 of your (relatively) recent book. The 2011 and 2012 papers you sent @szaghi definitely offer a more elegant approach, but they rely on finalisation. Unfortunately, gfortran still doesn't fully support finalisation and doesn't perform it on function results. I don't see how I can use your automated process without it.
Damian Rouson
@rouson
That's because I consider my book to be old school too! My book was submitted to the publisher in August 2010, which is centuries ago in the Internet era. :D I've learned a lot since then and both the language and compilers have advanced a lot since then. If I recall correctly, the Fortran 2008 standard was published in October 2010, so the official language standard at the time the book was submitted was Fortran 2003. Back then, there was only one Fortran 2003 compliant compiler: IBM. In fact, there was no compiler in existence that could correctly compile the one Fortran 2008 example in the book: the coarray Burgers equation solver in chapter 12 -- not even the Cray compiler, and Cray invented coarrays. That was the only code in the book that we could not test before publishing. Fast forward to today and we have four Fortran 2003 compilers: IBM, Cray, Portland Group, and Intel. NAG is extremely close to full 2003 compliance (anything missing is probably minor and I imagine their next release will offer full 2003 compliance). And GNU is only missing one major 2003 feature: parameterized derived types (PDTs, which I expect gfortran developer Paul Richard Thomas will start implementing soon). Moreover, we now have two Fortran 2008 compilers: Cray and the Intel beta release. IBM is only missing one major 2008 feature: coarrays. And GNU is only missing one major 2008 feature: the aforementioned 2003 feature (PDTs). And the landscape is quite rosy even when one jumps forward to the upcoming Fortran 2015 standard. The major new features in Fortran 2015 are summarized in two Technical Specification (TS) documents: TS 29113 Further Interoperability with C and TS 18508 Additional Parallel Features. Four compilers already support most or all of TS 29113: IBM, Cray, Intel, and GNU. Two compilers already support parts of TS 18508: Cray and GNU. And it gets even better: GNU is only missing one new Fortran 2015 feature: teams (and I believe I've found a Ph.D. student who is likely to work on adding support for that feature, which will take a multi-year effort). And it gets even better than that: the 2015 standard makes Fortran the first mainstream language to offer support for fault tolerance, and last week's GCC 7.1 release supports that very feature: failed-image detection and simulation. Using the latter feature requires using an unreleased branch of OpenCoarrays so I haven't made any big announcements yet, but it's a huge deal for anyone interested in reaching exaflop performance on future platforms. In short, this is a new world! Think about this unrelated but interesting fact: a paper from 2003 was written before the multi-core era, and now we're exiting the multi-core era and entering the many-core era with Intel's Knights Landing chip having roughly 72 cores. The pace of change is mind-blowing. :O
Please send me a list of gfortran bug reports related to finalization and consider whether your organization can make a donation in support of fixing those bugs. We've got to move on from the old days.
Chris MacMackin
@cmacmackin
From a quick search, I have found the following open finalisation bug reports: 37336, 64290, 67471, 67444, 65347, 64290, 59694, 80524, 79311, 71798, 70863, 69298, 68778.
Some of those bugs are duplicates. I'm only a student, so I'm doubtful I'd be able to persuade anyone to make a donation. You never know, though--sometimes there is money left in a grant that's about to expire which they're looking to spend on something.
Truth be told, I'm getting really frustrated with Fortran. If I didn't already have so much effort invested in my Fortran code base, I'd probably switch to another language. There are so many bugs related to object oriented programming in gfortran and ifort, and I'm getting sick of having to work around them. Memory management is a massive pain and not something I want to be thinking about as a programmer. It is also extremely verbose and it takes considerably longer to write code in Fortran than in more concise languages.
Stefano Zaghi
@szaghi

@cmacmackin @rouson ,

Damian, you know how highly I think of you, but I disagree (with respect): the world may be changing, but it has not changed yet. Intel and GNU have so many bugs in their OOP support that claiming full 2003 or even 2008 standard compliance for those compilers is premature. Maybe the world will change next year, but in 2017 I am really in trouble doing OOP in Fortran.

I really would like to know your new idea about functional programming, but I am skeptical: if defined operators have as big an overhead as I showed above, how can functional programming be suitable for HPC? In HASTY I tried to do a really useful, but not so complex, thing with CAF and it is blocked by compiler bugs...

Chris,

Truth be told, I'm getting really frustrated with Fortran. If I didn't already have so much effort invested in my Fortran code base, I'd probably switch to another language. There are so many bugs related to object oriented programming in gfortran and ifort, and I'm getting sick of having to work around them. Memory management is a massive pain and not something I want to be thinking about as a programmer.

I am not as young as you, but my feeling is really the same: if I had not invested so much in Fortran, I would likely have switched to some other language two years ago. Probably I'll try to invest more in Python: I see more and more HPC courses about "optimizing Python for number crunching". Python's performance is the worst I could imagine, but OOP is really a "new world" in Python.

Cheers

Damian Rouson
@rouson
@cmacmackin and @szaghi, trust me that I feel your pain. At the peak of my frustrations around 2010, I was involved directly or indirectly in submitting 50-60 bug reports annually across six compilers. Part of why I encounter bugs less often now is that I lasted through that process, got reasonably speedy responses from some compiler teams, dropped the compilers from vendors that were insufficiently responsive, and went to great lengths to become crafty about funding compiler development. None of those things were straightforward or easy, but I saw them as necessary because Fortran has important features that no other language has and I care most about writing clean code. So much of what I saw in other languages seemed like a crime against humanity. The interpreted languages such as Python are factors of 2-3 slower at best and the compiled languages such as C and C++ lack even basic array manipulation facilities. And no language other than Fortran has a parallel programming model that works in distributed memory. And no other language has support for fault tolerance. To get distributed-memory parallelism and fault tolerance, you could go with MPI, but the MPI being written by almost every scientific programmer I've met will be slower, more complex, and less fault-tolerant than what a Fortran programmer can write with coarray Fortran. I hope you'll think more about how to contribute to gfortran, whether as a developer (almost all the developers are domain scientists -- few are computer scientists and none have any training in compiler development as far as I know) or through organizational funds when you reach a stage when that becomes an option via grants or contracts. GFortran has been developed primarily by volunteers and some gfortran developers would rather not accept pay because they prefer the freedom of being a volunteer, but some do accept pay and it makes a difference in getting bugs fixed in a timely manner. And it takes creativity. None of the projects I've used to pay developers had a line item in the budget that read, "Fix gfortran bugs." I had to figure out how to make it happen in support of objectives that did have a line in the budget.
Damian Rouson
@rouson
@szaghi, I don't have any great new idea about functional programming in Fortran so you'll be disappointed. I have a set of strategies that were inspired by functional programming and that I frequently employ to make the intention of the code more clear and potentially more optimizable. One is the defined operators and your latest news is discouraging with regard to the performance (recall that I worried that Abstract Calculus might be an anti-pattern for just this reason but you previously reported that Abstract Calculus did not hurt performance based on your experience with FOODIE so I wonder what changed). But I always knew there could be performance penalties associated with user-defined operators and I'm pretty sure I talk about some of those in my book (e.g., related to cache utilization and the ability of modern processors to perform a multiply and add in one clock cycle). Another idea inspired by functional programming relates to the ASSOCIATE statement. I don't think I want to go into detail in this forum just because the back-and-forth takes too much time, but I'd be glad to explain it in a call and it will be in my book. Another thing I'll cover will be the use of the functional-fortran library, of which you are aware. For now, that's it. There's no grand idea here. And then there is the use of PURE. As we all know, Fortran is not a functional programming language, but there are several ways in which Fortran programming can be influenced by functional programming concepts and that's what I mean when I talk about functional programming in Fortran.
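For readers who haven't used it, ASSOCIATE binds a local name to an expression for the duration of a block, somewhat like a "let" binding in functional languages; a minimal generic example (not Damian's specific technique):

```fortran
program associate_demo
   use, intrinsic :: iso_fortran_env, only : real64
   implicit none
   real(real64) :: u(5) = [1._real64, 2._real64, 3._real64, 4._real64, 5._real64]

   ! ASSOCIATE names the expression once; inside the block the name reads
   ! like a value binding, which can clarify the intention of the code.
   associate (kinetic_energy => 0.5_real64 * sum(u**2))
      print *, 'KE =', kinetic_energy
   end associate
end program associate_demo
```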
Damian Rouson
@rouson
My new book will have two new co-authors: Salvatore Filippone and Sameer Shende. Salvatore has more than 25 years of deep experience in parallel programming and Sameer has more than 15 years of experience in parallel performance analysis. The goal is to have almost every code in the book parallel and almost every code backed by performance analysis. The last thing I'll say -- and then I've got to move on to some other things for a while -- is: be careful trading one set of problems for another. For many reasons, you are likely to find more robust compilers for other languages, but you'll trade the compiler bugs for another set of problems in the form of low performance, or the ease with which you can shoot yourself in the foot, or the learning curve (it takes years to be a truly competent C++ programmer, for example, whereas the students in my classes become quite competent and even reach the leading edge of Fortran programming in the span of one academic quarter). That's a really powerful statement.
Stefano Zaghi
@szaghi

@rouson ,

Dear Damian, as always, you are too kind!

trust me that I feel your pain.

I know, but this does not alleviate the pain too much :smile:

I lasted through that process, got reasonably speedy responses from some compiler teams, dropped the compilers from vendors that were insufficiently responsive, and went to great lengths to become crafty about funding compiler development.

I'll try to follow your path, but in my reality searching for gfortran funding is more a dream than a challenge. These days I am evangelizing your idea and trying to make my colleagues who use gfortran for their research aware that it is ethically and practically important to contribute part of the research funding to the GNU project... but in Italy we do research with almost no funds.

Fortran has important features that no other language has and I care most about writing clean code. So much of what I saw in other languages seemed like a crime against humanity. The interpreted languages such as Python are factors of 2-3 slower at best and the compiled languages such as C and C++ lack even basic array manipulation facilities. And no language other than Fortran has a parallel programming model that works in distributed memory. And no other language has support for fault tolerance. To get distributed-memory parallelism and fault tolerance, you could go with MPI, but the MPI being written by almost every scientific programmer I've met will be slower, more complex, and less fault-tolerant than what a Fortran programmer can write with coarray Fortran.

I agree, this is why I selected Fortran, but currently this is all true only if I do not use OOP; when OOP comes into play, all the pain highlighted by Chris arises. In the end, for the reasons you summarized and for the effort I have already invested, I'll never stop using Fortran.

I hope you'll think more about how to contribute to gfortran, whether as a developer (almost all the developers are domain scientists -- few are computer scientists and none have any training in compiler development as far as I know) or through organizational funds...

If finding funds is a dream for me, the possibility that I can contribute to the development of gfortran is even more remote: I am not up to the task. I know very little about C, but the big issue is that writing a compiler is an art and I am not an artist, just an oompa loompa.

I don't have any great new idea about functional programming in Fortran so you'll be disappointed. I have a set of strategies that were inspired by functional programming and that I frequently employ to make the intention of the code more clear and potentially more optimizable. One is the defined operators and your latest news is discouraging with regard to the performance (recall that I worried that Abstract Calculus might be an anti-pattern for just this reason but you previously reported that Abstract Calculus did not hurt performance based on your experience with FOODIE so I wonder what changed).

Sure, I remember your surprise, but that benchmark was really different from yesterday's. In FOODIE I compared the Abstract Calculus Pattern with polymorphic allocatable functions (in which the ODE solver changes at runtime, as do all the operator results) against an identical test, but without abstract polymorphic operators and without changes of solvers at runtime. However, both versions use defined operators: the ACP one has polymorphic allocatable (impure) operators, the other has static (pure) operators returning a type. The performances were identical between the ACP and the non-abstract one, and this is also in line with the test I made yesterday. What is really different is the comparison between defined operators and intrinsic operators. For these reasons, yesterday I updated our paper (a draft will be sent to you soon) and I am planning to add a "performance mode" to FOODIE to allow users to select an operational mode:

  • for rapid ODE solver development she can safely select the normal mode;
  • for using FOODIE in production (heavy number crunching) she should select the performance mode.
This new performance mode puts on my shoulders (and on the developers of future ODE solvers) the burden of also writing the %integrate_performance version of each solver, but it should be very easy.
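A schematic sketch of the two-mode idea (hypothetical names and signatures, not FOODIE's actual API): the normal mode would go through the defined operators on abstract integrands, while the performance mode works directly on plain real arrays.

```fortran
module two_mode_solver_m
   use, intrinsic :: iso_fortran_env, only : real64
   implicit none

   type :: euler_solver
   contains
      procedure :: integrate              ! normal mode: defined operators on integrands
      procedure :: integrate_performance  ! performance mode: plain real arrays
   end type euler_solver
contains
   subroutine integrate(self, u, du, dt)
      ! Normal-mode placeholder: in the real library this update would be
      ! expressed through polymorphic defined operators on an integrand.
      class(euler_solver), intent(in)    :: self
      real(real64),        intent(inout) :: u(:)
      real(real64),        intent(in)    :: du(:), dt
      u = u + du * dt
   end subroutine integrate

   pure subroutine integrate_performance(self, u, du, dt)
      ! Performance mode: pure, no polymorphic temporaries, intrinsic array math.
      class(euler_solver), intent(in)    :: self
      real(real64),        intent(inout) :: u(:)
      real(real64),        intent(in)    :: du(:), dt
      u = u + du * dt
   end subroutine integrate_performance
end module two_mode_solver_m
```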
Stefano Zaghi
@szaghi

For many reasons, you are likely to find more robust compilers for other languages, but you'll trade the compiler bugs for another set of problems in the form of low performance or ease with which you can shoot yourself in the foot or learning curve (it takes years to be a truly competent C++ programmer, for example, whereas the students in my classes become quite competent and even at the leading edge of Fortran programming in the span of one academic quarter. That's a really powerful statement.

I agree; this is why I selected Fortran. When I started to play with CAF it took a few days to become productive, while I am still not able to be really efficient (namely, really asynchronous) with MPI after years. Fortran is still the most suitable choice for my math, but there is a lot of pain if we want to exploit OOP.

I think I'll book you soon for a talk, please speak slowly :smile: (tomorrow I'll meet Alessandro: I am really excited to see his exascale work)

Cheers

P.S. I am very happy to read that Filippone will be your co-author. Your new book promises a lot!

Stefano Zaghi
@szaghi

@rouson @cmacmackin ,

I played with the operators vs non-operators mode in FOODIE... it seems to confirm the overhead of defined operators, see this

stefano@thor(11:50 AM Sun May 07) on feature/add-performance-mode [!]
~/fortran/FOODIE 21 files, 2.5Mb
→ time ./build/tests/accuracy/oscillation/oscillation -s adams_bashforth_4 -Dt 0.05 --fast
adams_bashforth_4
    steps:   20000000    Dt:      0.050, f*Dt:      0.000, E(x):  0.464E-09, E(y):  0.469E-09

real    0m5.214s
user    0m4.996s
sys    0m0.216s

stefano@thor(11:51 AM Sun May 07) on feature/add-performance-mode [!]
~/fortran/FOODIE 21 files, 2.5Mb
→ time ./build/tests/accuracy/oscillation/oscillation -s adams_bashforth_4 -Dt 0.05
adams_bashforth_4
    steps:   20000000    Dt:      0.050, f*Dt:      0.000, E(x):  0.464E-09, E(y):  0.469E-09

real    0m10.535s
user    0m10.320s
sys    0m0.216s

I added the fast mode only to the Adams-Bashforth solver for now, but I'll add a similar mode for all solvers tomorrow; it is really simple, and to the end user the change is almost seamless.

See you soon, happy "domenica" :smile:

Damian Rouson
@rouson
@cmacmackin and @szaghi Do you monitor the gfortran mailing list? If so, you might have seen that one finalization bug was just fixed: 79311. It's an ICE so it presumably doesn't help with the memory leaks you're seeing, but it's at least one decrement to the finalization bug count. That's progress. I'll inquire with the developer about plans for the remaining bugs on the list.
Stefano Zaghi
@szaghi
:tada: a small progress for gfortran, but a big progress for poor Fortran men like Chris and me :smile:
Milan Curcic
@milancurcic
@szaghi I suggest we stop using the words "poor" and "Fortran" in the same sentence; it only perpetuates the false stigma that this language carries.
Stefano Zaghi
@szaghi
@milancurcic Hi Milan, sorry for my bad humor, I promise I'll be more careful in the future, hopefully without stigma :smile: