@rouson , Damian,
thank you for your reply.
I haven't followed this discussion in detail. As you guys know, I'll respond a lot more in calls than text of any form. I just can't keep up with all the text flying by me every day. Maybe it's a sign of my age.
:smile: On the contrary, my bad spoken English prevents me from calling you almost all the time...
What caught my eye was the mention of @cmacmackin mentioning guard and clean_tmp.
Good to know, I'll use that word when I really need to catch your attention :smile:
I don't think the way to think about PURE is in terms of whether the attribute in and of itself speeds up code ... if I violate the requirements of PURE and DO CONCURRENT, what compiler optimizations am I preventing.
This is exactly my point. What I would like to say to Chris is that the polymorphic allocatable version violates the pure conditions, thus it is likely hindering the optimizer, whereas the non-polymorphic operators version is pure (in its contents and with the explicit attribute), thus it is likely easier to optimize. I was not concerned about the declaration, but rather about the actual contents. Moreover, I pointed out to Chris that the performance I gained is more likely due to the fact that now the math operators act on plain real arrays, thus the compiler optimizer could be even more favored.
The performance comparison for Chris has not yet started: my cluster and my workstation are crunching numbers this weekend, so it will come next week. However, I did a more "synthetic" test to evaluate the defined-operators overhead in different circumstances. I compared:
My results were dramatic: all user-defined operators have at least 50% overhead with respect to plain intrinsic operators, with, in general, the polymorphic version being the worst, followed by the automatic-arrays one, and with the allocatable-array version being the best. I would really like to know your opinions. My test results can be found online here and the test is this. For the sake of clarity I report the Fortran code below. I hope I have made some design mistakes in the test, because the overhead is really not negligible. Are these results expected for you?
```fortran
! A DEFY (DEmystyfy Fortran mYths) test.
! Author: Stefano Zaghi
! Date: 2017-05-05
!
! License: this file is licensed under the Creative Commons Attribution 4.0 license,
! see http://creativecommons.org/licenses/by/4.0/ .
module arrays
use, intrinsic :: iso_fortran_env, only : real64
implicit none

type :: array_automatic
  integer :: n
  real(real64), allocatable :: x(:)
  contains
    procedure, pass(lhs) :: add_automatic
    generic :: operator(+) => add_automatic
    procedure, pass(lhs) :: assign_automatic
    generic :: assignment(=) => assign_automatic
endtype array_automatic

type :: array_allocatable
  integer :: n
  real(real64), allocatable :: x(:)
  contains
    procedure, pass(lhs) :: add_allocatable
    generic :: operator(+) => add_allocatable
    procedure, pass(lhs) :: assign_allocatable
    generic :: assignment(=) => assign_allocatable
endtype array_allocatable

type, abstract :: array_polymorphic_abstract
  contains
    procedure(add_interface), pass(lhs), deferred :: add_polymorphic
    generic :: operator(+) => add_polymorphic
    procedure(assign_interface), pass(lhs), deferred :: assign_polymorphic
    procedure(assign_real_interface), pass(lhs), deferred :: assign_polymorphic_real
    generic :: assignment(=) => assign_polymorphic, assign_polymorphic_real
endtype array_polymorphic_abstract

type, extends(array_polymorphic_abstract) :: array_polymorphic
  integer :: n
  real(real64), allocatable :: x(:)
  contains
    procedure, pass(lhs) :: add_polymorphic
    procedure, pass(lhs) :: assign_polymorphic
    procedure, pass(lhs) :: assign_polymorphic_real
endtype array_polymorphic

abstract interface
  pure function add_interface(lhs, rhs) result(opr)
    import :: array_polymorphic_abstract
    class(array_polymorphic_abstract), intent(in) :: lhs
    class(array_polymorphic_abstract), intent(in) :: rhs
    class(array_polymorphic_abstract), allocatable :: opr
  endfunction add_interface

  pure subroutine assign_interface(lhs, rhs)
    import :: array_polymorphic_abstract
    class(array_polymorphic_abstract), intent(inout) :: lhs
    class(array_polymorphic_abstract), intent(in) :: rhs
  endsubroutine assign_interface

  pure subroutine assign_real_interface(lhs, rhs)
    import :: array_polymorphic_abstract, real64
    class(array_polymorphic_abstract), intent(inout) :: lhs
    real(real64), intent(in) :: rhs(1:)
  endsubroutine assign_real_interface
endinterface

contains
  pure function add_automatic(lhs, rhs) result(opr)
    class(array_automatic), intent(in) :: lhs
    type(array_automatic), intent(in) :: rhs
    real(real64) :: opr(1:lhs%n)
    opr = lhs%x + rhs%x
  endfunction add_automatic

  pure subroutine assign_automatic(lhs, rhs)
    class(array_automatic), intent(inout) :: lhs
    real(real64), intent(in) :: rhs(1:)
    lhs%n = size(rhs, dim=1)
    lhs%x = rhs
  endsubroutine assign_automatic

  pure function add_allocatable(lhs, rhs) result(opr)
    class(array_allocatable), intent(in) :: lhs
    type(array_allocatable), intent(in) :: rhs
    real(real64), allocatable :: opr(:)
    opr = lhs%x + rhs%x
  endfunction add_allocatable

  pure subroutine assign_allocatable(lhs, rhs)
    class(array_allocatable), intent(inout) :: lhs
    real(real64), intent(in) :: rhs(1:)
    lhs%n = size(rhs, dim=1)
    lhs%x = rhs
  endsubroutine assign_allocatable

  pure function add_polymorphic(lhs, rhs) result(opr)
    class(array_polymorphic), intent(in) :: lhs
    class(array_polymorphic_abstract), intent(in) :: rhs
    class(array_polymorphic_abstract), allocatable :: opr
    allocate(array_polymorphic :: opr)
    select type(opr)
    class is(array_polymorphic)
      select type(rhs)
      class is(array_polymorphic)
        opr%x = lhs%x + rhs%x
      endselect
    endselect
  endfunction add_polymorphic

  pure subroutine assign_polymorphic(lhs, rhs)
    class(array_polymorphic), intent(inout) :: lhs
    class(array_polymorphic_abstract), intent(in) :: rhs
    select type(rhs)
    class is(array_polymorphic)
      lhs%n = rhs%n
      lhs%x = rhs%x
    endselect
  endsubroutine assign_polymorphic

  pure subroutine assign_polymorphic_real(lhs, rhs)
    class(array_polymorphic), intent(inout) :: lhs
    real(real64), intent(in) :: rhs(1:)
    lhs%n = size(rhs, dim=1)
    lhs%x = rhs
  endsubroutine assign_polymorphic_real
endmodule arrays

program defy
  use, intrinsic :: iso_fortran_env, only : int64, real64
  use arrays, only : array_automatic, array_allocatable, array_polymorphic
  implicit none
  real(real64), allocatable :: a_intrinsic(:)
  real(real64), allocatable :: b_intrinsic(:)
  real(real64), allocatable :: c_intrinsic(:)
  type(array_automatic)     :: a_automatic
  type(array_automatic)     :: b_automatic
  type(array_automatic)     :: c_automatic
  type(array_allocatable)   :: a_allocatable
  type(array_allocatable)   :: b_allocatable
  type(array_allocatable)   :: c_allocatable
  type(array_polymorphic)   :: a_polymorphic
  type(array_polymorphic)   :: b_polymorphic
  type(array_polymorphic)   :: c_polymorphic
  integer(int64)            :: tic_toc(1:2)
  integer(int64)            :: count_rate
  real(real64)              :: intrinsic_time
  real(real64)              :: time
  integer                   :: N
  integer                   :: Nn
  integer                   :: i

  N = 100000
  Nn = N/100
  a_intrinsic   = [(real(i, kind=real64), i=1,N)]
  b_intrinsic   = [(real(i, kind=real64), i=1,N)]
  a_automatic   = [(real(i, kind=real64), i=1,N)]
  b_automatic   = [(real(i, kind=real64), i=1,N)]
  a_allocatable = [(real(i, kind=real64), i=1,N)]
  b_allocatable = [(real(i, kind=real64), i=1,N)]
  a_polymorphic = [(real(i, kind=real64), i=1,N)]
  b_polymorphic = [(real(i, kind=real64), i=1,N)]

  call system_clock(tic_toc(1), count_rate)
  do i=1, Nn
    c_intrinsic = a_intrinsic + b_intrinsic
  enddo
  call system_clock(tic_toc(2), count_rate)
  intrinsic_time = (tic_toc(2) - tic_toc(1)) / real(count_rate, kind=real64)
  print*, 'intrinsic: ', intrinsic_time

  call system_clock(tic_toc(1), count_rate)
  do i=1, Nn
    c_automatic = a_automatic + b_automatic
  enddo
  call system_clock(tic_toc(2), count_rate)
  time = (tic_toc(2) - tic_toc(1)) / real(count_rate, kind=real64)
  print*, 'automatic: ', time, ' + %(intrinsic): ', 100._real64 - intrinsic_time / time * 100

  call system_clock(tic_toc(1), count_rate)
  do i=1, Nn
    c_allocatable = a_allocatable + b_allocatable
  enddo
```
I just can't keep up with all the text flying by me every day.
This is the price when you are the most experienced and the kindest Fortran programmer available :smile: To limit spam like mine you could only become less kind, but I hope this never happens!
`clean_temp` are "old school", because you explicitly mention them in chapter 5 of your (relatively) recent book. The 2011 and 2012 papers you sent @szaghi definitely offer a more elegant approach, but they rely on finalisation. Unfortunately, gfortran still doesn't fully support finalisation and doesn't perform it on function results. I don't see how I can use your automated process without it.
@cmacmackin @rouson ,
Damian, you know how highly I think of you, but I disagree (with respect): the world could change, but it currently has not. Intel and GNU have so many bugs in their OOP support that claiming full support of the 2003 or even 2008 standard for those compilers is premature. Maybe the world will change next year, but in 2017 I am really in trouble doing OOP in Fortran.
I really would like to know your new idea about functional programming, but I am skeptical: if defined operators have as big an overhead as I showed above, how can functional programming be suitable for HPC? In HASTY I tried to do a really useful, but not so complex, thing with CAF and it is blocked by compiler bugs...
Truth be told, I'm getting really frustrated with Fortran. If I didn't already have so much effort invested in my Fortran code base, I'd probably switch to another language. There are so many bugs related to object oriented programming in gfortran and ifort, and I'm getting sick of having to work around them. Memory management is a massive pain and not something I want to be thinking about as a programmer.
I am not as young as you, but my feeling is really the same: if I had not invested so much in Fortran, I would likely have moved to some other language two years ago. Probably, I'll try to invest more in Python: I see more and more HPC courses about "optimizing Python for number-crunching". Python performance is the worst I could imagine, but OOP is really a "new world" in Python.
Dear Damian, as always you are too kind!
trust me that I feel your pain.
I know, but this does not alleviate the pain too much :smile:
I lasted through that process, got reasonably speedy responses from some compiler teams, dropped the compilers from vendors that were insufficiently responsive, and went to great lengths to become crafty about funding compiler development.
I'll try to follow your path, but in my reality searching for gfortran funding is more a dream than a challenge. These days I am evangelizing your idea and trying to make my colleagues who use gfortran for their research aware that it would be ethically and practically important to contribute part of their research funding to the GNU project... but in Italy we do research with almost no funds.
Fortran has important features that no other language has and I care most about writing clean code. So much of what I saw in other languages seemed like a crime against humanity. The interpreted languages such as Python are factors of 2-3 slower at best and the compiled languages such as C and C++ lack even basic array manipulation facilities. And no language other than Fortran has a parallel programming model that works in distributed memory. And no other language has support for fault tolerance. To get distributed-memory parallelism and fault tolerance, you could go with MPI, but the MPI being written by almost every scientific programmer I've met will be slower, more complex, and less fault-tolerant than what a Fortran programmer can write with coarray Fortran.
I agree, this is why I selected Fortran, but currently this is all true only if I do not use OOP; when OOP comes into play, all the pain highlighted by Chris arises. In the end, for the reasons you summarized and for the effort I have already invested, I'll never stop using Fortran.
I hope you'll think more about how to contribute to gfortran, whether as a developer (almost all the developers are domain scientists -- few are computer scientists and none have any training in compiler development as far as I know) or through organizational funds...
If finding funds is a dream for me, the possibility that I could contribute to the development of gfortran is even more remote: I am not up to the task. I know very little about C, but the big issue is that writing a compiler is an art and I am not an artist, just an oompa loompa.
I don't have any great new idea about functional programming in Fortran so you'll be disappointed. I have a set of strategies that were inspired by functional programming and that I frequently employ to make the intention of the code more clear and potentially more optimizable. One is the defined operators and your latest news is discouraging with regard to the performance (recall that I worried that Abstract Calculus might be an anti-pattern for just this reason but you previously reported that Abstract Calculus did not hurt performance based on your experience with FOODIE so I wonder what changed).
Sure, I remember your surprise, but that benchmark was really different from yesterday's. In FOODIE I compared Abstract Calculus with polymorphic allocatable functions (in which the ODE solver changes at runtime, as do all the operator results) against an identical test, but without abstract polymorphic operators and without changing solvers at runtime. However, both versions use defined operators: the ACP one has polymorphic allocatable (impure) operators, the other has static (pure) operators returning a type. The performances were identical between the ACP and the non-abstract one, and this is also in line with the test I made yesterday. What is really different is the comparison between defined operators and intrinsic operators. For these reasons, yesterday I updated our paper (a draft will be sent to you soon) and I am planning to add a "performance mode" to FOODIE to allow users to select an operational mode:
`%integrate_performance` version of each solver, but it should be very easy.
For many reasons, you are likely to find more robust compilers for other languages, but you'll trade the compiler bugs for another set of problems in the form of low performance, the ease with which you can shoot yourself in the foot, or the learning curve (it takes years to be a truly competent C++ programmer, for example, whereas the students in my classes become quite competent and even at the leading edge of Fortran programming in the span of one academic quarter). That's a really powerful statement.
I agree, this is why I selected Fortran. When I started to play with CAF it took only a few days to become productive, while I am still not able to be really efficient (namely, really asynchronous) with MPI after years. Fortran is still the most suitable choice for my math, but there is a lot of pain if we want to exploit OOP.
I think I'll book you soon for a talk, please speak slowly :smile: (tomorrow I'll meet Alessandro: I am really excited to see his exascale work)
P.S. I am very happy to read that Filippone will be your co-author. Your new book promises a lot!
@rouson @cmacmackin ,
I played with operators vs non-operators mode in FOODIE... it seems to confirm the overhead of defined operators, see this:
```
stefano@thor(11:50 AM Sun May 07) on feature/add-performance-mode [!]
~/fortran/FOODIE 21 files, 2.5Mb
→ time ./build/tests/accuracy/oscillation/oscillation -s adams_bashforth_4 -Dt 0.05 --fast
adams_bashforth_4
steps: 20000000 Dt: 0.050, f*Dt: 0.000, E(x): 0.464E-09, E(y): 0.469E-09

real    0m5.214s
user    0m4.996s
sys     0m0.216s

stefano@thor(11:51 AM Sun May 07) on feature/add-performance-mode [!]
~/fortran/FOODIE 21 files, 2.5Mb
→ time ./build/tests/accuracy/oscillation/oscillation -s adams_bashforth_4 -Dt 0.05
adams_bashforth_4
steps: 20000000 Dt: 0.050, f*Dt: 0.000, E(x): 0.464E-09, E(y): 0.469E-09

real    0m10.535s
user    0m10.320s
sys     0m0.216s
```
I added the fast mode only to the Adams-Bashforth solver for now, but I'll add a similar mode for all solvers tomorrow; it is really simple, and for the end user the change is almost seamless.
See you soon, happy "domenica" :smile:
@nncarlson Dear Neil, thank you for sharing your thoughts, it is appreciated.
If the idea of `Fortran == gfortran` was conveyed by me, my bad: it is not my thought, nor do I want to convey it. In my view a good program must be tested with as many different compilers as possible to obtain cross-verification: compilers are programs like any others, thus they can be (and are) buggy like any others. To me, `Fortran == iso-standard-xx`.
My current feeling is, however, sad. Due to the everlasting lack of funds in my research institute I have to rely strongly on free compilers; access to commercial compilers is possible only when we buy core-hours at HPC facilities or when we obtain a grant from them (once or twice a year, on average). So, my view is strictly tied to Intel and GNU: both have serious bugs in their OOP support, and this blocks me.
I tested PGI, but it has too limited support for F03/08 and no support at all for CAF; it was even very inefficient (in some scenarios) compared with Intel and GNU.
I used IBM XLF when I had a grant on a PowerPC cluster; it is a great compiler, but it is not an option for x86 GNU/Linux.
Others have said great things about Cray, but I have never had access to a Cray cluster.
Finally, there is NAG, which seems great, but it is too expensive for me, and Cineca (the HPC center where I often obtain grants) does not provide it.
All that said means: I agree with you, `Fortran /= gfortran`, but for someone like me `Fortran ~= gfortran + ifort` is a good approximation :cry:
```fortran
program main
  use iso_c_binding
  type box
    class(*), pointer :: p => null()
  end type
  type(box), target  :: pbox
  type(box), pointer :: qbox
  type(c_ptr) :: cp
  allocate(pbox%p, source=1)
  cp = c_loc(pbox)
  call c_f_pointer(cp, qbox)
  select type (q => qbox%p)
  type is (integer)
    print *, 'got integer', q
  class default
    print *, 'lost dynamic type'
  end select
end program
```
`c_loc` of a box wrapper around the polymorphic type pointer as the "context data". The function whose pointer I passed as the callback turned this pointer back into a box around the polymorphic type pointer, and then invoked the type-bound procedure that was the actual callback.
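For comparison, the same "box the object, pass it through a void* context, unbox it inside the callback" trick is common outside Fortran too. Below is a hedged Python/ctypes sketch of the idea; all names (`invoke_with_context`, `on_event`) are illustrative, not from any real library:

```python
# Sketch: smuggle a dynamically-typed object through a void* "context"
# pointer and recover it inside a C-style callback, analogous to the
# Fortran box/c_loc/c_f_pointer trick above.
import ctypes

# C-callable callback type: void (*)(void *context)
CALLBACK = ctypes.CFUNCTYPE(None, ctypes.c_void_p)

def invoke_with_context(cb, context_ptr):
    # stand-in for a C library that stores a void* and later calls back
    cb(context_ptr)

payload = {"answer": 42}
boxed = ctypes.py_object(payload)  # the "box" holding the dynamic object
# keep `boxed` alive for as long as the library may call back!
context = ctypes.cast(ctypes.pointer(boxed), ctypes.c_void_p)

@CALLBACK
def on_event(ctx):
    # turn the void* back into a pointer-to-box and unwrap the payload
    box = ctypes.cast(ctx, ctypes.POINTER(ctypes.py_object))
    print("recovered:", box.contents.value)

invoke_with_context(on_event, context)
```

As in the Fortran version, the crucial point is that the box (not the object itself) is what the opaque pointer addresses, and the box must outlive every callback invocation.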
Hi @/all, just wanted to let you know that you can now try OpenCoarrays in the cloud via Binder. It is implemented as a kernel for Jupyter over at https://github.com/sourceryinstitute/jupyter-CAF-kernel. You can launch the binder (which also has Python, Julia, and R kernels installed) using this button.
Navigate to the index.ipynb file to run a demo. Or create a new notebook using the Coarray Fortran kernel and run your own experimental code, after seeing a few tutorial details in the index.ipynb file. If you just want to skip straight to that file use this link: https://bit.ly/TryCoarrays. To get to the full on binder instance, same as the button, go to https://bit.ly/CAF-Binder
`>kernel-name` at the top of the cell. So you could create a notebook with Python, Fortran, Julia, R, etc. cells. Perhaps this could be useful for computing some data in one language (Fortran) and then plotting and/or post-processing it in another language (Python or R).
I read this comment. It is very interesting for me: I would like to add some Autotools capabilities to FoBiS. I know about your long experience in the field, whereas my knowledge of Autotools is near zero. Can you point me to some good references about the right way to identify compilers and their features? For example, I am now trying to implement in FoBiS a simple feature that should check whether a compiler supports the iso_10646 character kind; my idea is that FoBiS creates on the fly a simple test that invokes `selected_char_kind`, prints the result, and captures it in order to understand whether the compiler supports it. Is this the right way (similar to the Autotools one)?
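The probe described above can be sketched as a small Python routine (FoBiS itself is Python, though this is my own minimal sketch, not FoBiS's actual API; the compiler name `gfortran` is an assumption and can be overridden):

```python
# Autotools-style feature probe: write a tiny Fortran program that
# prints selected_char_kind('ISO_10646'), compile and run it, and
# interpret the captured output. The intrinsic returns -1 when the
# requested character kind is unsupported.
import shutil
import subprocess
import tempfile
from pathlib import Path

PROBE_SOURCE = """\
program probe
  print '(I0)', selected_char_kind('ISO_10646')
end program probe
"""

def supports_iso_10646(compiler="gfortran"):
    """Return True/False, or None when the probe cannot run."""
    if shutil.which(compiler) is None:
        return None  # compiler not on PATH: probe is inconclusive
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "probe.f90"
        exe = Path(tmp) / "probe"
        src.write_text(PROBE_SOURCE)
        built = subprocess.run([compiler, "-o", str(exe), str(src)],
                               capture_output=True)
        if built.returncode != 0:
            return None  # probe did not even compile: inconclusive
        result = subprocess.run([str(exe)], capture_output=True, text=True)
        return int(result.stdout.strip()) >= 0

if __name__ == "__main__":
    print("iso_10646 support:", supports_iso_10646())
```

This mirrors Autoconf's compile-and-run tests: the feature is defined by what the compiler actually accepts and executes, not by its version string.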
Thank you in advance.
My best regards.