These are chat archives for buddhi1980/mandelbulber2

3rd
Apr 2016
Krzysztof Marczak
@buddhi1980
Apr 03 2016 08:39
@bermarte, could you give us the code for the fractal that was rendered here: http://www.fractalforums.com/index.php?action=gallery;sa=view;id=18872
Was it rendered as a primitive object, or is it an iterated formula?
It would be nice to implement it in Mandelbulber.
Sebastian Jennen
@zebastian
Apr 03 2016 10:12
@buddhi1980: have you disabled Coverity Scan? The last scan is from 25 days ago and I could not find how to start a scan.
I think there could be some issues with file_downloader.cpp, and it would be good to get the feedback from Coverity.
Krzysztof Marczak
@buddhi1980
Apr 03 2016 12:09
I'm updating the project in Coverity Scan manually.
Sebastian Jennen
@zebastian
Apr 03 2016 14:34
Thanks, it turned out I was right about file_downloader.
But I expected more than one reported error :P
Sebastian Jennen
@zebastian
Apr 03 2016 17:25
I did some more profiling:
sExtendedAux::operator=(sExtendedAux const&) is rather costly: 15-20% of the total cost with my current settings.
I think this only gets called in compute_fractal.cpp line 112: extendedAux[sequence] = extendedAux[lastSequnce];
Can we make this variable a pointer, like
sExtendedAux* extendedAux[NUMBER_OF_FRACTALS];
?
Krzysztof Marczak
@buddhi1980
Apr 03 2016 17:32
Unfortunately, we need to copy the data in line 112. Your proposal of changing it to sExtendedAux* extendedAux[NUMBER_OF_FRACTALS] will change nothing, because it will still be an array.
What I'm wondering now is whether we really need this array at all. I will check what happens if we use one common extendedAux for all formulas.
Krzysztof Marczak
@buddhi1980
Apr 03 2016 17:46
You can test it now. I have removed the array. Now it's just one common structure.
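A minimal sketch of the difference discussed above, not the actual Mandelbulber code. AuxSketch is a hypothetical stand-in for sExtendedAux, kNumFractals stands in for NUMBER_OF_FRACTALS with an assumed value, and doubling r_dz is only a placeholder for one formula iteration:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for sExtendedAux; the real structure has other fields.
struct AuxSketch {
  double r_dz = 1.0;
  double padding[16] = {};  // makes the per-iteration copy non-trivial in size
};

constexpr std::size_t kNumFractals = 4;  // assumed value, for illustration only

// Array variant: one slot per formula, so the state has to be carried over
// with a full struct copy whenever an iteration runs (the "line 112" pattern).
double iterateWithArray(const int *sequence, std::size_t n) {
  assert(n > 0);
  AuxSketch aux[kNumFractals];
  int last = sequence[0];
  for (std::size_t i = 0; i < n; ++i) {
    const int seq = sequence[i];
    assert(seq >= 0 && static_cast<std::size_t>(seq) < kNumFractals);
    aux[seq] = aux[last];  // whole-struct copy on every iteration
    aux[seq].r_dz *= 2.0;  // placeholder for the formula updating the state
    last = seq;
  }
  return aux[last].r_dz;
}

// Shared variant: a single common structure, so nothing is copied.
double iterateWithShared(const int *sequence, std::size_t n) {
  assert(n > 0);
  (void)sequence;  // the formula order no longer affects where state lives
  AuxSketch aux;
  for (std::size_t i = 0; i < n; ++i) {
    aux.r_dz *= 2.0;  // same placeholder update, applied in place
  }
  return aux.r_dz;
}
```

Both functions return the same value for the same sequence; the shared variant simply skips the copy. Switching the array to pointers would not remove the copy as long as each slot still needs its own state.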
Sebastian Jennen
@zebastian
Apr 03 2016 17:57
Works great, the total cost decreased and my current animation render went from 8 minutes pending to 7 minutes.
Sebastian Jennen
@zebastian
Apr 03 2016 18:38
We should add -Ofast to the mandelbulber.pro files, disabled by default, maybe with some more flags wrapped in some kind of definition:
hpc_mode (high performance computing) or something like that.
In my current animation render this is the difference between 12 hours and 4 hours.
People using the program in a render farm may be interested in compiling the program directly on the render nodes with this option.
Krzysztof Marczak
@buddhi1980
Apr 03 2016 18:39
Is -Ofast even faster than -ffast-math?
Sebastian Jennen
@zebastian
Apr 03 2016 18:50
Yes, the 12 hour reference was with -ffast-math, so the speedup was about 3x. But this probably depends heavily on what is rendered.
The drawback is that the generated binary is tightly bound to the CPU it was compiled for.
Krzysztof Marczak
@buddhi1980
Apr 03 2016 18:57
Should this flag be used instead of -O3?
Sebastian Jennen
@zebastian
Apr 03 2016 18:58
Here is the GCC reference on this:
It turns on -O3, -ffast-math and some other aggressive optimizations.
Krzysztof Marczak
@buddhi1980
Apr 03 2016 19:00
Now I'm recompiling the program to see the speed difference.
Sebastian Jennen
@zebastian
Apr 03 2016 19:00
It has got another drawback: compile time increases significantly.
Krzysztof Marczak
@buddhi1980
Apr 03 2016 19:06
I cannot see any speed difference. But I already used -march=native and -msse2.
I have the following line in bash.rc: export CXXFLAGS="-march=native -msse2" to optimize all applications that I compile.
Right now I am running this with:
/mandelbulber2/mandelbulber2/build-mandelbulber-Desktop-Release/mandelbulber2 -n -K -s 0 -e 2400 settings/hybrid\ 2\ animation3fastRender.fract
Sebastian Jennen
@zebastian
Apr 03 2016 19:30
Maybe you can give it a try and see whether -Ofast has any effect on this.
I added it to the, make sure to use the correct Release / Debug configuration and to re-run qmake. When you can see the flag in the compilation output, it has taken effect.
These were the issues I ran into when trying to change compilation flags...
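Besides reading the compilation output, the binary itself can report what it was built with. A minimal sketch, assuming GCC (or a compatible compiler) and its predefined macros __OPTIMIZE__, __FAST_MATH__ and __SSE2__; the function name buildFlagsSummary is made up for this example and is not part of Mandelbulber:

```cpp
#include <string>

// Reports which optimization-related predefined macros were set at compile time.
// GCC defines __OPTIMIZE__ when optimizing, __FAST_MATH__ when -ffast-math is
// active (which -Ofast implies), and __SSE2__ when SSE2 code generation is on.
std::string buildFlagsSummary() {
  std::string summary;
#ifdef __OPTIMIZE__
  summary += "optimize=on";
#else
  summary += "optimize=off";
#endif
#ifdef __FAST_MATH__
  summary += " fast-math=on";
#else
  summary += " fast-math=off";
#endif
#ifdef __SSE2__
  summary += " sse2=on";
#else
  summary += " sse2=off";
#endif
  return summary;
}
```

Printing this string once at startup would show whether a changed flag really reached the compiler, for example after forgetting to re-run qmake or building the wrong configuration.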
Krzysztof Marczak
@buddhi1980
Apr 03 2016 19:33
I have recreated the makefile with qmake to be sure that -Ofast is used. I have also seen it in the compiler output. Can you test whether you get the same speedup with -march=native and -msse2? By the way, what CPU do you have? I have an Intel i7.
Sebastian Jennen
@zebastian
Apr 03 2016 19:45
Me too, I have got an
Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
I tried with -march=native -msse2 instead of -Ofast.
It is even worse than not using any additional flags at all.
Krzysztof Marczak
@buddhi1980
Apr 03 2016 19:46
So I need to do more trials. But not today. Now I'm focusing on materials (actually redesigning the primitive object definitions).
Sebastian Jennen
@zebastian
Apr 03 2016 19:47
so it is:
'no additional flags' -> ~2 days 20 hours
-Ofast -> 20 hours
-march=native -msse2 -> 4 days 5 hours
Great, I want to fix some issues on GitHub, then I can support you with the material implementation.
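To compare the three estimates listed above on one scale, they can be converted to hours and divided. A small sketch; the helper names toHours and speedup are made up for this example:

```cpp
#include <cassert>

// Converts an estimate given as days + hours into hours.
inline double toHours(int days, int hours) {
  return days * 24.0 + hours;
}

// Ratio of baseline time to variant time: > 1 means the variant is faster.
inline double speedup(double baselineHours, double variantHours) {
  assert(variantHours > 0.0);
  return baselineHours / variantHours;
}
```

With the numbers from the list, 'no additional flags' is toHours(2, 20) = 68 hours and -march=native -msse2 is toHours(4, 5) = 101 hours. So -Ofast comes out at 68 / 20 = 3.4 times faster than the baseline, and -march=native -msse2 at roughly 0.67, i.e. slower. These are estimates from a single scene on one machine.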
Krzysztof Marczak
@buddhi1980
Apr 03 2016 21:20
With my current setup (-ffast-math, -march=native, -msse2) I already have an estimated time to end of 17h 14m 5.3s (i7-4790 3.6GHz), so my setup is not rendering slower, but yours just started rendering faster. Maybe that's why I cannot see any benefit from using -Ofast.