Stefano Zaghi
@szaghi

@cmacmackin Chris, I have just run a more complex test with this

╼ stefano@zaghi(02:32 PM Thu May 04) on feature/add-riemann-2D-tests [!?] desk {gcc-7.1.0 - gcc 7.1.0 environment}
├───╼ ~/fortran/FORESEER 15 files, 840Kb
└──────╼ gfortran --version
GNU Fortran (GCC) 7.1.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

It seems to work exactly as in gcc 6.3

The test is in FORESEER, this one, and it uses a lot of OOP
Chris MacMackin
@cmacmackin
Okay, something must have gone wrong with how I compiled it. If it doesn't solve the memory leaks then I won't bother pursuing it any further.
Stefano Zaghi
@szaghi
Let me check the memory leak issues with the dedicated tests, a few more minutes :smile:
@cmacmackin Chris, we are not very fortunate... the leaks seem to still be there
╼ stefano@zaghi(02:43 PM Thu May 04) on master desk {gcc-7.1.0 - gcc 7.1.0 environment}
├───╼ ~/fortran/leaks_hunter 3 files, 88Kb
└──────╼ scripts/compile.sh src/leaks_raiser_static_intrinsic.f90 

┌╼ stefano@zaghi(02:43 PM Thu May 04) on master [?] desk {gcc-7.1.0 - gcc 7.1.0 environment}
├───╼ ~/fortran/leaks_hunter 4 files, 100Kb
└──────╼ scripts/run_valgrind.sh 
==59798== Memcheck, a memory error detector
==59798== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==59798== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
...
==59798== HEAP SUMMARY:
==59798==     in use at exit: 4 bytes in 1 blocks
==59798==   total heap usage: 20 allocs, 19 frees, 12,012 bytes allocated
==59798==
==59798== Searching for pointers to 1 not-freed blocks
==59798== Checked 101,856 bytes
==59798==
==59798== 4 bytes in 1 blocks are definitely lost in loss record 1 of 1
==59798==    at 0x4C2AF1F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==59798==    by 0x40075C: __static_intrinsic_type_m_MOD_add_static_intrinsic_type (leaks_raiser_static_intrinsic.f90:24)
==59798==    by 0x40084D: MAIN__ (leaks_raiser_static_intrinsic.f90:37)
==59798==    by 0x40089F: main (leaks_raiser_static_intrinsic.f90:30)
==59798==
==59798== LEAK SUMMARY:
==59798==    definitely lost: 4 bytes in 1 blocks
==59798==    indirectly lost: 0 bytes in 0 blocks
==59798==      possibly lost: 0 bytes in 0 blocks
==59798==    still reachable: 0 bytes in 0 blocks
==59798==         suppressed: 0 bytes in 0 blocks
==59798==
==59798== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
Chris MacMackin
@cmacmackin
Was it only 4 bytes lost before? I'd almost worry that was just some issue with initialisation or something.
Stefano Zaghi
@szaghi
@cmacmackin Chris, this is a synthetic test designed to expose gfortran memory leaks; you can check it in leaks_hunter
The test is very simple: it must report 0 bytes lost
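For readers following along, the failing pattern can be sketched as follows. This is a minimal hypothetical reconstruction, not the actual leaks_hunter source (type and procedure names are invented): a derived type with an allocatable component returned from a defined operator, whose temporary function result older gfortran versions fail to finalize and free.

```fortran
! Hypothetical sketch of a "leak raiser": the temporary result of the
! defined operator (+) may never be freed by gfortran, because function
! results are not finalized (names invented for illustration).
module static_intrinsic_type_m
   implicit none
   type :: static_intrinsic_type
      real, allocatable :: x(:)
      contains
         procedure, pass(lhs) :: add => add_static_intrinsic_type
         generic :: operator(+) => add
   endtype static_intrinsic_type
contains
   function add_static_intrinsic_type(lhs, rhs) result(opr)
      class(static_intrinsic_type), intent(in) :: lhs
      type(static_intrinsic_type),  intent(in) :: rhs
      type(static_intrinsic_type)              :: opr
      opr%x = lhs%x + rhs%x
   endfunction add_static_intrinsic_type
endmodule static_intrinsic_type_m

program leaks_raiser_sketch
   use static_intrinsic_type_m
   implicit none
   type(static_intrinsic_type) :: a, b, c
   a%x = [1.0]
   b%x = [2.0]
   c = a + b   ! the temporary holding (a + b) is the leak candidate
   print*, c%x
endprogram leaks_raiser_sketch
```

Running such a program under `valgrind --leak-check=full` should report 0 bytes lost if the compiler correctly frees the operator's temporary result.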
In a few hours I should be able to compare the performance of polymorphic operators against real ones
Damian Rouson
@rouson
I haven't followed this discussion in detail. As you guys know, I'll respond a lot more in calls than text of any form. I just can't keep up with all the text flying by me every day. Maybe it's a sign of my age. What caught my eye was the mention of @cmacmackin mentioning guard and clean_tmp. Whoa... that's old-school. Presumably you picked this up from the 2003 paper by GW Stewart in ACM Fortran Forum -- not sure if Markdown syntax works here. Say it isn't so! If you're doing such things under the hood and are really confident that you have a scheme to get it right and that users of your code will never need it, then it's ok as a last resort. Otherwise, whatever led you down this path has to get fixed in the compiler or it's sure to come back to haunt users down the road. I did similar things in a paper from 2006, but in a limited setting (expression evaluation) with a clearly articulated (published) strategy that I'd like to think covered all the cases we cared about. We at least made an attempt to automate such matters in the papers that I've sent @szaghi so I'd recommend that route if it's applicable over guard and clean_temp. I really hope the compiler situation hasn't set us back 14 years. That would be a travesty.
Damian Rouson
@rouson
Also, I don't think the way to think about PURE is in terms of whether the attribute in and of itself speeds up code. That's pretty unlikely. I think of PURE and DO CONCURRENT in terms of the discipline they impart on the programmer to do things that can aid in optimization. If you write functions that conform to the restrictions that PURE requires but don't mark them PURE, a sufficiently smart compiler can do all the same related optimizations anyway. In fact, gfortran tries to detect whether your procedure could have been marked PURE even if it wasn't, and marks such procedures as implicitly pure internally. Likewise, if you do the things that DO CONCURRENT requires but write a regular DO loop, a sufficiently smart compiler will be able to optimize the code in the ways that DO CONCURRENT affords (and you will also find it easier to use other speedup strategies such as multithreading with OpenMP). Then the question becomes the reverse: if I violate the requirements of PURE and DO CONCURRENT, what compiler optimizations am I preventing? Framing it this way also shows how difficult the question is to answer, because one then has to ask, well... how badly am I going to violate the restrictions? The sky's the limit, and one can slow code down quite considerably that way if one wants to do so. For example, PURE doesn't allow for I/O. You can slow a code down as much as you want, in direct proportion to the size of the file you read or write. It's kind of like when someone goes to the doctor and says, "It hurts when I do this." and the doctor responds simply, "Then don't do that."
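Damian's point about implicit purity can be illustrated with a small sketch (a minimal example written for this discussion, not code from the thread): both functions below conform to the PURE restrictions; only one carries the attribute, but a compiler such as gfortran may treat the other as implicitly pure and apply the same optimizations.

```fortran
! Two functions with identical bodies: both satisfy the PURE constraints
! (no I/O, no side effects, intent(in) arguments); only one is marked PURE.
module pure_demo
   implicit none
contains
   pure function scaled_sum(a, x, y) result(z)   ! explicitly PURE
      real, intent(in) :: a, x(:), y(:)
      real             :: z(size(x))
      z = a*x + y
   endfunction scaled_sum

   function scaled_sum_unmarked(a, x, y) result(z)   ! conforms, just unmarked
      real, intent(in) :: a, x(:), y(:)
      real             :: z(size(x))
      z = a*x + y
   endfunction scaled_sum_unmarked
endmodule pure_demo
```

Conversely, adding a `print*` or modifying a module variable inside either body would make the function impossible to treat as pure, which is the kind of violation Damian describes.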
Stefano Zaghi
@szaghi

@rouson , Damian,
thank you for your reply.

I haven't followed this discussion in detail. As you guys know, I'll respond a lot more in calls than text of any form. I just can't keep up with all the text flying by me every day. Maybe it's a sign of my age.

:smile: On the contrary, my bad spoken English prevents me from calling you almost all the time...

What caught my eye was the mention of @cmacmackin mentioning guard and clean_tmp.

Good to know, I'll use that word when I really need to catch your attention :smile:

I don't think the way to think about PURE is in terms of whether the attribute in and of itself speeds up code ... if I violate the requirements of PURE and DO CONCURRENT, what compiler optimizations am I preventing.

This is exactly my point: what I would like to say to Chris is that the polymorphic allocatable version violates the pure conditions, thus it is likely hindering the optimizer, whereas the non-polymorphic operators version is pure (in its contents and with the explicit attribute), thus it is likely easier to optimize. I was not concerned about the declaration but rather about the actual contents. Moreover, I pointed out to Chris that the performance I gained is more likely due to the fact that now the math operators act on plain real arrays, so the compiler optimizer could be even more favored.

The performance comparison for Chris has not yet started: my cluster and my workstation are crunching numbers this weekend, so it will come next week. However, I did a more "synthetic" test to evaluate the defined operators' overhead in different circumstances. I compared:

  • user defined operators acting on/returning a real array declared as an automatic array, which is likely allocated on the stack;
  • user defined operators acting on/returning a real array declared as an allocatable array, which is likely allocated on the heap;
  • user defined operators acting on/returning a polymorphic allocatable class, which is likely allocated on the heap;
  • plain intrinsic array operators acting on real arrays, as the reference.

My results were dramatic: all user defined operators have at least 50% overhead with respect to plain intrinsic operators; in general, the polymorphic version is the worst, followed by the automatic arrays one, with the allocatable array version the best. I would really like to know your opinions. My test results can be found online here, and the test is this. For the sake of clarity I report the Fortran code below. I hope I have made some design mistakes in the test, because the overhead is really not negligible. Are these results expected, in your view?

User defined operators overhead hunter

! A DEFY (DEmystyfy Fortran mYths) test.
! Author: Stefano Zaghi
! Date: 2017-05-05
!
! License: this file is licensed under the Creative Commons Attribution 4.0 license,
! see http://creativecommons.org/licenses/by/4.0/ .

module arrays
   use, intrinsic :: iso_fortran_env, only : real64

   implicit none

   type :: array_automatic
      integer                   :: n
      real(real64), allocatable :: x(:)
      contains
         procedure, pass(lhs) :: add_automatic
         generic :: operator(+) => add_automatic
         procedure, pass(lhs) :: assign_automatic
         generic :: assignment(=) => assign_automatic
   endtype array_automatic

   type :: array_allocatable
      integer                   :: n
      real(real64), allocatable :: x(:)
      contains
         procedure, pass(lhs) :: add_allocatable
         generic :: operator(+) => add_allocatable
         procedure, pass(lhs) :: assign_allocatable
         generic :: assignment(=) => assign_allocatable
   endtype array_allocatable

   type, abstract :: array_polymorphic_abstract
      contains
         procedure(add_interface), pass(lhs), deferred :: add_polymorphic
         generic :: operator(+) => add_polymorphic
         procedure(assign_interface),      pass(lhs), deferred :: assign_polymorphic
         procedure(assign_real_interface), pass(lhs), deferred :: assign_polymorphic_real
         generic :: assignment(=) => assign_polymorphic, assign_polymorphic_real
   endtype array_polymorphic_abstract

   type, extends(array_polymorphic_abstract) :: array_polymorphic
      integer                   :: n
      real(real64), allocatable :: x(:)
      contains
         procedure, pass(lhs) :: add_polymorphic
         procedure, pass(lhs) :: assign_polymorphic
         procedure, pass(lhs) :: assign_polymorphic_real
   endtype array_polymorphic

   abstract interface
      pure function add_interface(lhs, rhs) result(opr)
      import :: array_polymorphic_abstract
      class(array_polymorphic_abstract), intent(in)  :: lhs
      class(array_polymorphic_abstract), intent(in)  :: rhs
      class(array_polymorphic_abstract), allocatable :: opr
      endfunction add_interface

      pure subroutine assign_interface(lhs, rhs)
      import :: array_polymorphic_abstract
      class(array_polymorphic_abstract), intent(inout) :: lhs
      class(array_polymorphic_abstract), intent(in)    :: rhs
      endsubroutine assign_interface

      pure subroutine assign_real_interface(lhs, rhs)
      import :: array_polymorphic_abstract, real64
      class(array_polymorphic_abstract), intent(inout) :: lhs
      real(real64),                      intent(in)    :: rhs(1:)
      endsubroutine assign_real_interface
   endinterface

   contains
      pure function add_automatic(lhs, rhs) result(opr)
      class(array_automatic), intent(in) :: lhs
      type(array_automatic),  intent(in) :: rhs
      real(real64)                       :: opr(1:lhs%n)

      opr = lhs%x + rhs%x
      endfunction add_automatic

      pure subroutine assign_automatic(lhs, rhs)
      class(array_automatic), intent(inout) :: lhs
      real(real64),           intent(in)    :: rhs(1:)

      lhs%n = size(rhs, dim=1)
      lhs%x = rhs
      endsubroutine assign_automatic

      pure function add_allocatable(lhs, rhs) result(opr)
      class(array_allocatable), intent(in) :: lhs
      type(array_allocatable),  intent(in) :: rhs
      real(real64), allocatable            :: opr(:)

      opr = lhs%x + rhs%x
      endfunction add_allocatable

      pure subroutine assign_allocatable(lhs, rhs)
      class(array_allocatable), intent(inout) :: lhs
      real(real64),             intent(in)    :: rhs(1:)

      lhs%n = size(rhs, dim=1)
      lhs%x = rhs
      endsubroutine assign_allocatable

      pure function add_polymorphic(lhs, rhs) result(opr)
      class(array_polymorphic),          intent(in)  :: lhs
      class(array_polymorphic_abstract), intent(in)  :: rhs
      class(array_polymorphic_abstract), allocatable :: opr

      allocate(array_polymorphic :: opr)
      select type(opr)
      class is(array_polymorphic)
         select type(rhs)
         class is(array_polymorphic)
            opr%n = lhs%n
            opr%x = lhs%x + rhs%x
         endselect
      endselect
      endfunction add_polymorphic

      pure subroutine assign_polymorphic(lhs, rhs)
      class(array_polymorphic),          intent(inout) :: lhs
      class(array_polymorphic_abstract), intent(in)    :: rhs

      select type(rhs)
      class is(array_polymorphic)
         lhs%n = rhs%n
         lhs%x = rhs%x
      endselect
      endsubroutine assign_polymorphic

      pure subroutine assign_polymorphic_real(lhs, rhs)
      class(array_polymorphic), intent(inout) :: lhs
      real(real64),             intent(in)    :: rhs(1:)

      lhs%n = size(rhs, dim=1)
      lhs%x = rhs
      endsubroutine assign_polymorphic_real
endmodule arrays
program defy
   use, intrinsic :: iso_fortran_env, only : int64, real64
   use arrays, only : array_automatic, array_allocatable, array_polymorphic
   implicit none
   real(real64), allocatable :: a_intrinsic(:)
   real(real64), allocatable :: b_intrinsic(:)
   real(real64), allocatable :: c_intrinsic(:)
   type(array_automatic)     :: a_automatic
   type(array_automatic)     :: b_automatic
   type(array_automatic)     :: c_automatic
   type(array_allocatable)   :: a_allocatable
   type(array_allocatable)   :: b_allocatable
   type(array_allocatable)   :: c_allocatable
   type(array_polymorphic)   :: a_polymorphic
   type(array_polymorphic)   :: b_polymorphic
   type(array_polymorphic)   :: c_polymorphic
   integer(int64)            :: tic_toc(1:2)
   integer(int64)            :: count_rate
   real(real64)              :: intrinsic_time
   real(real64)              :: time
   integer                   :: N
   integer                   :: Nn
   integer                   :: i

   N = 100000
   Nn = N/100
   a_intrinsic   = [(real(i, kind=real64), i=1,N)]
   b_intrinsic   = [(real(i, kind=real64), i=1,N)]
   a_automatic   = [(real(i, kind=real64), i=1,N)]
   b_automatic   = [(real(i, kind=real64), i=1,N)]
   a_allocatable = [(real(i, kind=real64), i=1,N)]
   b_allocatable = [(real(i, kind=real64), i=1,N)]
   a_polymorphic = [(real(i, kind=real64), i=1,N)]
   b_polymorphic = [(real(i, kind=real64), i=1,N)]

   call system_clock(tic_toc(1), count_rate)
   do i=1, Nn
     c_intrinsic = a_intrinsic + b_intrinsic
   enddo
   call system_clock(tic_toc(2), count_rate)
   intrinsic_time = (tic_toc(2) - tic_toc(1)) / real(count_rate, kind=real64)
   print*, 'intrinsic: ', intrinsic_time

   call system_clock(tic_toc(1), count_rate)
   do i=1, Nn
     c_automatic = a_automatic + b_automatic
   enddo
   call system_clock(tic_toc(2), count_rate)
   time = (tic_toc(2) - tic_toc(1)) / real(count_rate, kind=real64)
   print*, 'automatic: ', time, ' + %(intrinsic): ', 100._real64 - intrinsic_time / time * 100

   call system_clock(tic_toc(1), count_rate)
   do i=1, Nn
     c_allocatable = a_allocatable + b_allocatable
   enddo
   call system_clock(tic_toc(2), count_rate)
   time = (tic_toc(2) - tic_toc(1)) / real(count_rate, kind=real64)
   print*, 'allocatable: ', time, ' + %(intrinsic): ', 100._real64 - intrinsic_time / time * 100

   call system_clock(tic_toc(1), count_rate)
   do i=1, Nn
     c_polymorphic = a_polymorphic + b_polymorphic
   enddo
   call system_clock(tic_toc(2), count_rate)
   time = (tic_toc(2) - tic_toc(1)) / real(count_rate, kind=real64)
   print*, 'polymorphic: ', time, ' + %(intrinsic): ', 100._real64 - intrinsic_time / time * 100
endprogram defy
Stefano Zaghi
@szaghi
@rouson , Damian I forgot this...

I just can't keep up with all the text flying by me every day.

This is the price you pay when you are the most experienced and the kindest Fortran programmer available :smile: To limit spam like mine you could only become less kind, but I hope this never happens!

Chris MacMackin
@cmacmackin
@rouson I find it odd that you feel guard_temp and clean_temp are "old school", because you explicitly mention them in chapter 5 of your (relatively) recent book. The 2011 and 2012 papers you sent @szaghi definitely offer a more elegant approach, but they rely on finalisation. Unfortunately, gfortran still doesn't fully support finalisation and doesn't perform it on function results. I don't see how I can use your automated process without it.
Damian Rouson
@rouson
That's because I consider my book to be old school too! My book was submitted to the publisher in August 2010, which is centuries ago in the Internet era. :D I've learned a lot since then and both the language and compilers have advanced a lot since then. If I recall correctly, the Fortran 2008 standard was published in October 2010, so the official language standard at the time the book was submitted was Fortran 2003.

Back then, there was only one Fortran 2003 compliant compiler: IBM. In fact, there was no compiler in existence that could correctly compile the one Fortran 2008 example in the book: the coarray Burgers equation solver in chapter 12 -- not even the Cray compiler, and Cray invented coarrays. That was the only code in the book that we could not test before publishing.

Fast forward to today and we have four Fortran 2003 compilers: IBM, Cray, Portland Group, and Intel. NAG is extremely close to full 2003 compliance (anything missing is probably minor and I imagine their next release will offer full 2003 compliance). And GNU is only missing one major 2003 feature: parameterized derived types (PDTs, which I expect gfortran developer Paul Richard Thomas will start implementing soon). Moreover, we now have two Fortran 2008 compilers: Cray and the Intel beta release. IBM is only missing one major 2008 feature: coarrays. And GNU is only missing one major 2008 feature: the aforementioned 2003 feature (PDTs).

And the landscape is quite rosy even when one jumps forward to the upcoming Fortran 2015 standard. The major new features in Fortran 2015 are summarized in two Technical Specification (TS) documents: TS 29113 Further Interoperability with C and TS 18508 Additional Parallel Features. Four compilers already support most or all of TS 29113: IBM, Cray, Intel, and GNU. Two compilers already support parts of TS 18508: Cray and GNU. And it gets even better: GNU is only missing one new Fortran 2015 feature: teams (and I believe I've found a Ph.D. student who is likely to work on adding support for that feature, which will take a multi-year effort). And it gets even better than that: the 2015 standard makes Fortran the first mainstream language to offer support for fault tolerance, and last week's GCC 7.1 release supports that very feature: failed-image detection and simulation. Using the latter feature requires an unreleased branch of OpenCoarrays so I haven't made any big announcements yet, but it's a huge deal for anyone interested in reaching exaflop performance on future platforms.

In short, this is a new world! Think about this unrelated but interesting fact: a paper from 2003 was written before the multi-core era, and now we're exiting the multi-core era and entering the many-core era with Intel's Knights Landing chip having roughly 72 cores. The pace of change is mind-blowing. :O

Please send me a list of gfortran bug reports related to finalization and consider whether your organization can make a donation in support of fixing those bugs. We've got to move on from the old days.
Chris MacMackin
@cmacmackin
From a quick search, I have found the following open finalisation bug reports: 37336, 64290, 67471, 67444, 65347, 64290, 59694, 80524, 79311, 71798, 70863, 69298, 68778.
Some of those bugs are duplicates. I'm only a student, so I'm doubtful I'd be able to persuade anyone to make a donation. You never know, though--sometimes there is money left in a grant that's about to expire which they're looking to spend on something.
Truth be told, I'm getting really frustrated with Fortran. If I didn't already have so much effort invested in my Fortran code base, I'd probably switch to another language. There are so many bugs related to object oriented programming in gfortran and ifort, and I'm getting sick of having to work around them. Memory management is a massive pain and not something I want to be thinking about as a programmer. It is also extremely verbose and it takes considerably longer to write code in Fortran than in more concise languages.
Stefano Zaghi
@szaghi

@cmacmackin @rouson ,

Damian, you know how highly I think of you, but I disagree (with respect): the world could change, but it currently has not. Intel and GNU have so many bugs in OOP that claiming full support of the 2003 or even 2008 standard for those compilers is premature. Maybe the world will change next year, but in 2017 I am really in trouble doing OOP in Fortran.

I really would like to know your new ideas about functional programming, but I am skeptical: if defined operators have as big an overhead as I showed above, how can functional programming be suitable for HPC? In HASTY I tried to do a really useful, but not so complex, thing with CAF and it is blocked by compiler bugs...

Chris,

Truth be told, I'm getting really frustrated with Fortran. If I didn't already have so much effort invested in my Fortran code base, I'd probably switch to another language. There are so many bugs related to object oriented programming in gfortran and ifort, and I'm getting sick of having to work around them. Memory management is a massive pain and not something I want to be thinking about as a programmer.

I am not as young as you, but my feeling is really the same: if I had not invested so much in Fortran, I would likely have switched to some other language two years ago. Probably I'll try to invest more in Python: I see more and more HPC courses about "optimizing Python for number-crunching". Python performance is the worst I could imagine, but OOP is really a "new world" in Python.

Cheers

Damian Rouson
@rouson
@cmacmackin and @szaghi, trust me that I feel your pain. At the peak of my frustrations around 2010, I was involved directly or indirectly in submitting 50-60 bug reports annually across six compilers. Part of why I encounter bugs less often now is that I lasted through that process, got reasonably speedy responses from some compiler teams, dropped the compilers from vendors that were insufficiently responsive, and went to great lengths to become crafty about funding compiler development. None of those things were straightforward or easy, but I saw them as necessary because Fortran has important features that no other language has and I care most about writing clean code.

So much of what I saw in other languages seemed like a crime against humanity. The interpreted languages such as Python are factors of 2-3 slower at best and the compiled languages such as C and C++ lack even basic array manipulation facilities. And no language other than Fortran has a parallel programming model that works in distributed memory. And no other language has support for fault tolerance. To get distributed-memory parallelism and fault tolerance, you could go with MPI, but the MPI being written by almost every scientific programmer I've met will be slower, more complex, and less fault-tolerant than what a Fortran programmer can write with coarray Fortran.

I hope you'll think more about how to contribute to gfortran, whether as a developer (almost all the developers are domain scientists -- few are computer scientists and none have any training in compiler development as far as I know) or through organizational funds when you reach a stage when that becomes an option via grants or contracts. GFortran has been developed primarily by volunteers and some gfortran developers would rather not accept pay because they prefer the freedom of being a volunteer, but some do accept pay and it makes a difference in getting bugs fixed in a timely manner. And it takes creativity. None of the projects I've used to pay developers had a line item in the budget that read, "Fix gfortran bugs." I had to figure out how to make it happen in support of objectives that did have a line in the budget.
Damian Rouson
@rouson
@szaghi, I don't have any great new idea about functional programming in Fortran so you'll be disappointed. I have a set of strategies that were inspired by functional programming and that I frequently employ to make the intention of the code more clear and potentially more optimizable. One is the defined operators and your latest news is discouraging with regard to the performance (recall that I worried that Abstract Calculus might be an anti-pattern for just this reason but you previously reported that Abstract Calculus did not hurt performance based on your experience with FOODIE so I wonder what changed). But I always knew there could be performance penalties associated with user-defined operators and I'm pretty sure I talk about some of those in my book (e.g., related to cache utilization and the ability of modern processors to perform a multiply and add in one clock cycle). Another idea inspired by functional programming relates to the ASSOCIATE statement. I don't think I want to go into detail in this forum just because the back-and-forth takes too much time, but I'd be glad to explain it in a call and it will be in my book. Another thing I'll cover will be the use of the functional-fortran library, of which you are aware. For now, that's it. There's no grand idea here. And then there is the use of PURE. As we all know, Fortran is not a functional programming language, but there are several ways in which Fortran programming can be influenced by functional programming concepts and that's what I mean when I talk about functional programming in Fortran.
Damian Rouson
@rouson
My new book will have two new co-authors: Salvatore Filippone and Sameer Shende. Salvatore has more than 25 years of deep experience in parallel programming and Sameer has more than 15 years of experience in parallel performance analysis. The goal is to have almost every code in the book parallel and almost every code backed by performance analysis. The last thing I'll say -- and then I've got to move on to some other things for a while -- is be careful trading one set of problems for another. For many reasons, you are likely to find more robust compilers for other languages, but you'll trade the compiler bugs for another set of problems in the form of low performance or ease with which you can shoot yourself in the foot or learning curve (it takes years to be a truly competent C++ programmer, for example, whereas the students in my classes become quite competent and even at the leading edge of Fortran programming in the span of one academic quarter). That's a really powerful statement.
Stefano Zaghi
@szaghi

@rouson ,

Dear Damian, as always, you are too kind!

trust me that I feel your pain.

I know, but this does not alleviate the pain too much :smile:

I lasted through that process, got reasonably speedy responses from some compiler teams, dropped the compilers from vendors that were insufficiently responsive, and went to great lengths to become crafty about funding compiler development.

I'll try to follow your path, but in my reality searching for gfortran funding is more a dream than a challenge. These days I am evangelizing your idea and trying to make my colleagues who use gfortran for their research conscious that it is ethically and practically important to contribute to the GNU project with part of their research funding... but in Italy we do research with almost no funds.

Fortran has important features that no other language has and I care most about writing clean code. So much of what I saw in other languages seemed like a crime against humanity. The interpreted languages such as Python are factors of 2-3 slower at best and the compiled languages such as C and C++ lack even basic array manipulation facilities. And no language other than Fortran has a parallel programming model that works in distributed memory. And no other language has support for fault tolerance. To get distributed-memory parallelism and fault tolerance, you could go with MPI, but the MPI being written by almost every scientific programmer I've met will be slower, more complex, and less fault-tolerant than what a Fortran programmer can write with coarray Fortran.

I agree, this is why I selected Fortran, but currently this is all true only if I do not use OOP; when OOP comes into play, all the pain highlighted by Chris arises. In the end, for the reasons you summarized and for the efforts I have already invested, I'll never stop using Fortran.

I hope you'll think more about how to contribute to gfortran, whether as a developer (almost all the developers are domain scientists -- few are computer scientists and none have any training in compiler development as far as I know) or through organizational funds...

If finding funds is a dream for me, the possibility of my contributing to the development of gfortran is even more remote: I am not up to the task. I know very little about C, but the big issue is that writing a compiler is an art, and I am not an artist, just an oompa loompa.

I don't have any great new idea about functional programming in Fortran so you'll be disappointed. I have a set of strategies that were inspired by functional programming and that I frequently employ to make the intention of the code more clear and potentially more optimizable. One is the defined operators and your latest news is discouraging with regard to the performance (recall that I worried that Abstract Calculus might be an anti-pattern for just this reason but you previously reported that Abstract Calculus did not hurt performance based on your experience with FOODIE so I wonder what changed).

Sure, I remember your surprise, but that benchmark was really different from yesterday's one. In FOODIE I compared the Abstract Calculus Pattern with polymorphic allocatable functions (in which the ODE solver changes at runtime, as well as all the operators' results) against an identical test without abstract polymorphic operators and without changes of solvers at runtime. However, both versions use defined operators: the ACP one has polymorphic allocatable (impure) operators, the other has static (pure) operators returning a type. The performance was identical between the ACP and the non-abstract one, and this is in line with the test I made yesterday. What is really different is the comparison between defined operators and intrinsic operators. For these reasons, yesterday I updated our paper (a draft will be sent to you soon) and I am planning to add a "performance mode" to FOODIE to allow users to select an operational mode:

  • for rapid ODE solver development, she can safely select normal mode;
  • for using FOODIE in production (heavy number crunching), she should select performance mode.
This new performance mode puts on my shoulders (and on the developers of future ODE solvers) the burden of also writing the %integrate_performance version of each solver, but it should be very easy.
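To make the idea concrete, here is a minimal Fortran sketch of what such a dual-mode API could look like. All names here are hypothetical (this is not FOODIE's actual code, and the right-hand side is a toy du/dt = -u): the normal mode uses defined operators, each of which returns a function-result temporary, while the performance mode does the same update in place.

```fortran
! Hypothetical sketch of a dual-mode (normal vs performance) integrator.
module mode_sketch
  implicit none

  type :: field
    real :: u = 0.0
  contains
    procedure :: add
    procedure :: scale
    generic :: operator(+) => add
    generic :: operator(*) => scale
  end type

contains

  pure function add(lhs, rhs) result(res)
    class(field), intent(in) :: lhs
    type(field),  intent(in) :: rhs
    type(field) :: res
    res%u = lhs%u + rhs%u
  end function

  pure function scale(lhs, a) result(res)
    class(field), intent(in) :: lhs
    real,         intent(in) :: a
    type(field) :: res
    res%u = lhs%u * a
  end function

  pure function residual(self) result(r)
    type(field), intent(in) :: self
    type(field) :: r
    r%u = -self%u   ! toy right-hand side: du/dt = -u
  end function

  ! Normal mode: expressive, but every "+" and "*" builds a temporary.
  subroutine integrate(self, dt)
    type(field), intent(inout) :: self
    real,        intent(in)    :: dt
    self = self + residual(self) * dt
  end subroutine

  ! Performance mode: same math, updated in place, no temporaries.
  subroutine integrate_performance(self, dt)
    type(field), intent(inout) :: self
    real,        intent(in)    :: dt
    self%u = self%u - self%u * dt
  end subroutine

end module
```

With a scalar payload the difference is negligible, but when the component is the whole solution array the per-stage temporaries created by the defined operators are a plausible source of the roughly 2x gap reported in the benchmark below.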
Stefano Zaghi
@szaghi

For many reasons, you are likely to find more robust compilers for other languages, but you'll trade the compiler bugs for another set of problems in the form of low performance or ease with which you can shoot yourself in the foot or learning curve (it takes years to be a truly competent C++ programmer, for example, whereas the students in my classes become quite competent and even at the leading edge of Fortran programming in the span of one academic quarter). That's a really powerful statement.

I agree, this is why I selected Fortran. When I started to play with CAF, it took me only a few days to become productive, while after years I am still not able to be really efficient (namely, really asynchronous) with MPI. Fortran is still the most suitable choice for my math, but there is a lot of pain if we want to exploit OOP.

I think I'll book you soon for a talk, please speak slowly :smile: (tomorrow I'll meet Alessandro: I am really excited to see his exascale work)

Cheers

P.S. I am very happy to read that Filippone will be your co-author. Your new book promises a lot!

Stefano Zaghi
@szaghi

@rouson @cmacmackin ,

I played with operators vs non-operators mode in FOODIE... it seems to confirm the overhead of defined operators, see this:

stefano@thor(11:50 AM Sun May 07) on feature/add-performance-mode [!]
~/fortran/FOODIE 21 files, 2.5Mb
→ time ./build/tests/accuracy/oscillation/oscillation -s adams_bashforth_4 -Dt 0.05 --fast
adams_bashforth_4
    steps:   20000000    Dt:      0.050, f*Dt:      0.000, E(x):  0.464E-09, E(y):  0.469E-09

real    0m5.214s
user    0m4.996s
sys    0m0.216s

stefano@thor(11:51 AM Sun May 07) on feature/add-performance-mode [!]
~/fortran/FOODIE 21 files, 2.5Mb
→ time ./build/tests/accuracy/oscillation/oscillation -s adams_bashforth_4 -Dt 0.05
adams_bashforth_4
    steps:   20000000    Dt:      0.050, f*Dt:      0.000, E(x):  0.464E-09, E(y):  0.469E-09

real    0m10.535s
user    0m10.320s
sys    0m0.216s

I added the fast mode only to the Adams-Bashforth solver for now, but I'll add a similar mode for all solvers tomorrow; it is really simple and, for the end user, the change is almost seamless.

See you soon, happy "domenica" :smile:

Damian Rouson
@rouson
@cmacmackin and @szaghi Do you monitor the gfortran mailing list? If so, you might have seen that one finalization bug was just fixed: 79311. It's an ICE so it presumably doesn't help with the memory leaks you're seeing, but it's at least one decrement to the finalization bug count. That's progress. I'll inquire with the developer about plans for the remaining bugs on the list.
Stefano Zaghi
@szaghi
:tada: small progress for gfortran, but big progress for poor Fortran men like Chris and me :smile:
Milan Curcic
@milancurcic
@szaghi I suggest we stop using words "poor" and "Fortran" in the same sentence, it only perpetuates the false stigma that this language carries.
Stefano Zaghi
@szaghi
@milancurcic Hi Milan, sorry for my bad humor, I promise I'll be more careful in the future, hopefully without stigma :smile:
Milan Curcic
@milancurcic
@szaghi Thanks Stefano! I am convinced of your genuinely great intentions :)
Neil Carlson
@nncarlson
I've been loosely following the recent discussion with great empathy. I want to remind people that Fortran /= gfortran. There are better compilers out there than gfortran. It would be ideal to have a top-notch free Fortran compiler, but that's not where we are right now. I understand that everyone's situation and priorities are different, but it might be worthwhile considering using a different compiler.
Stefano Zaghi
@szaghi

@nncarlson Dear Neil, thank you for sharing your thoughts, it is appreciated.

If the idea that Fortran == gfortran was conveyed by me, my bad: it is not my thought, nor what I want to convey. In my view a good program must be tested with as many different compilers as possible to obtain cross-verification: compilers are programs like any others, thus they can be (and are) buggy like any others. To me, Fortran == iso-standard-xx.

My current feeling is, however, sad. Due to the everlasting lack of funds in my research institute I have to rely heavily on free compilers; access to commercial compilers is possible only when we buy core-hours at HPC facilities or when we obtain a grant there (once or twice a year, on average). So my view is strictly tied to Intel and GNU: both have serious OOP bugs, and this blocks me.

I tested PGI, but its F03/08 support is too limited and it has no CAF support at all; in some scenarios it was even very inefficient compared with Intel and GNU.

I used IBM XLF when I had a grant on a PowerPC cluster; it is a great compiler, but it is not an option for x86 GNU/Linux.

Others have said great things about Cray, but I have never had access to a Cray cluster.

Finally there is NAG, which seems great, but it is too expensive for me, and Cineca (the HPC center where I often obtain grants) does not provide it.

All this said: I agree with you, Fortran /= gfortran, but for someone like me Fortran ~= gfortran + ifort is a good approximation :cry:

Cheers

Neil Carlson
@nncarlson
@szaghi, I'm very curious to hear what your OOP issues with the Intel compiler are (perhaps off-line). That is our go-to production compiler now. It had many problems in the past (I reported many) but has greatly improved in the current version. I'm not aware of any current issues that affect me (well, perhaps one ...), but it sounds like it still has some significant problems that I should be looking to avoid.
Stefano Zaghi
@szaghi
@nncarlson Hi Neil, currently these are the issues frustrating me with Intel: the CAF issue and the ADT issue (this other one is similar to, or the same as, the other ADT issue). I am in the 2018 beta-testing program and these bugs do not seem to be fixed.
Jacob Williams
@jacobwilliams
Neil Carlson
@nncarlson
@jacobwilliams Actually the trick of boxing the polymorphic pointer also allows you to preserve the metadata associated with the polymorphic variable. An example:
Neil Carlson
@nncarlson
program main

  use iso_c_binding
  implicit none

  ! Wrap the polymorphic pointer in a non-polymorphic type: C_LOC is then
  ! taken of the box, not of the polymorphic variable itself.
  type box
    class(*), pointer :: p => null()
  end type
  type(box), target  :: pbox
  type(box), pointer :: qbox
  type(c_ptr) :: cp

  allocate(pbox%p, source=1)   ! dynamic type of p is integer
  cp = c_loc(pbox)             ! round trip through a C pointer...
  call c_f_pointer(cp, qbox)   ! ...and back to a Fortran pointer

  ! The dynamic type (the "metadata") survives the round trip.
  select type (q => qbox%p)
  type is (integer)
    print *, 'got integer', q
  class default
    print *, 'lost dynamic type'
  end select

end program
Jacob Williams
@jacobwilliams
Thanks @nncarlson ! I'm still not sure what is "legal" and what isn't with C_LOC but I like this approach (at least it works on both ifort and gfortran).
Neil Carlson
@nncarlson
It also works for NAG. Using C_LOC on a polymorphic variable is an error with NAG, and I too am not sure what is "legal". I got mixed signals from FortranFan and Steve Lionel (your Intel forum link).
Stefano Zaghi
@szaghi
@jacobwilliams @nncarlson C interop is still a mystery to me... may I ask for some details about when/where the box wrapper plus c_loc turns out to be useful? Cheers
Jacob Williams
@jacobwilliams
@szaghi I'm doing some experiments using it as a way to call object-oriented fortran code from Python. I'll try to post about it soon.
Neil Carlson
@nncarlson
My use case (https://github.com/nncarlson/yajl-fort/blob/master/src/yajl_fort.F90) involved a C library that needed callback functions, which I implemented in Fortran. One of the arguments to a callback was a user-supplied void pointer to "context data" that the callback needed; it's a pretty standard approach in the C world. My ideal callback was a type-bound procedure of a polymorphic type, whose data components provided the necessary context data. To get this to work with the C library, I passed the c_loc of a box wrapper around the polymorphic type pointer as the "context data". The function whose pointer I passed as the callback turned this pointer back into a box around the polymorphic type pointer, and then invoked the type-bound procedure that was the actual callback.
Not sure if any of that made sense -- perhaps it's better explained with an example.
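For what it's worth, here is a hedged Fortran sketch of the pattern Neil describes. Everything here is hypothetical (the `parser` type, `event_cb`, and the event code are invented, not yajl-fort's actual API): the C library stores a void* context at registration time and passes it back to a bind(c) callback, which recovers the polymorphic object through the box.

```fortran
! Hypothetical sketch: polymorphic context data behind a C callback.
module callback_sketch
  use iso_c_binding
  implicit none

  ! The "real" callback is a type-bound procedure; the type's data
  ! components are the context data.
  type, abstract :: parser
  contains
    procedure(on_event_i), deferred :: on_event
  end type

  abstract interface
    subroutine on_event_i(self, code)
      import :: parser
      class(parser), intent(inout) :: self
      integer, intent(in) :: code
    end subroutine
  end interface

  ! Non-polymorphic box: c_loc of a box target is what goes to C as void*.
  type :: box
    class(parser), pointer :: p => null()
  end type

contains

  ! Registered with the C library via c_funloc(event_cb); the library
  ! later calls it with the void* context it was given at registration.
  subroutine event_cb(ctx, code) bind(c)
    type(c_ptr), value :: ctx
    integer(c_int), value :: code
    type(box), pointer :: b
    call c_f_pointer(ctx, b)      ! void* back to the box...
    call b%p%on_event(int(code))  ! ...then dispatch on the dynamic type
  end subroutine

end module
```

The box is what makes this work: c_loc and c_f_pointer operate on the non-polymorphic wrapper, while the class(parser) pointer inside it keeps the dynamic type for the type-bound dispatch.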
Stefano Zaghi
@szaghi
@jacobwilliams Jacob, your use case is very interesting to me. I am now using ctypes in Python, but I am still not able to exploit an OOP Fortran class with it, only non-OOP procedures. If you go ahead with this, please share your results :smile:
@nncarlson Thank you very much for your clarification; the callback world is still far away for me, but I am looking at your code to learn it. Thank you again.