
Automated thermodynamic database development within the CALPHAD method using pycalphad


Hi @jan-janssen, I haven't had time to go back to this yet. Looking through some files and your data, it might be sufficient to just increase the weights for the fcc solubility and the Laves solubility. If you have the vanilla datasets from the ESPEI-datasets repo, try adding weights in the following files:

```
zpf/CU-MG-ZPF-CUMG2-FCC_A1-LAVES_C15-LIQUID-Bagnoud1978.json  set weight to 5.0
zpf/CU-MG-ZPF-FCC_A1-LAVES_C15-LIQUID-Jones1931.json          set weight to 10.0
```

If those weights work well, it would be great if you were able to submit a PR to the datasets repo to update those weights, and one to the ESPEI repo to update the documentation with the new images for the example. I agree with your previous comment that we can limit the run in the example to maybe 100 or 200 iterations, verified with the lnprob vs. iterations plot.
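In case it helps: dataset weights in ESPEI are just a top-level `weight` key in the dataset JSON. A minimal sketch of setting one programmatically (the stub dataset below is illustrative only; real ESPEI-datasets files have many more keys, like `components`, `phases`, `conditions`, `values`, and `reference`):

```python
import json

# Illustrative stub of an ESPEI dataset; NOT a complete, valid file.
dataset = {"components": ["CU", "MG"], "output": "ZPF"}

# A top-level "weight" key scales this dataset's contribution to the
# likelihood relative to other datasets.
dataset["weight"] = 5.0

print(json.dumps(dataset, sort_keys=True))
```

In practice you would `json.load` the real file, set `dataset["weight"]`, and `json.dump` it back out.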

I updated the weights - the results changed a little bit, but I am not sure if it is an improvement or not.

Hey @spike_k_gitlab, you can use pycalphad to plot the activity, along with your experimental data: https://pycalphad.org/docs/latest/examples/PlotActivity.html

@bocklund thank you for the reply. Also, when I try adding experimental Hmix data for the liquid phase following the Cu-Mg example, ESPEI throws an error: https://pastebin.com/ZW1T9AuL . Any ideas how it could be fixed?

You should be able to drag and drop the files into Gitter, but if you still have problems, you can email it to me at bjb54@psu.edu

Here's the modified database with that line commented out

Here is my lnprob trace.

You should see something like:

```
TRACE:root:Probability for initial parameters
TRACE:root:Likelihood - 2.44s - Thermochemical: -136.338. ZPF: -2363.344. Activity: -11834.342. Total: -14334.025.
TRACE:root:Proposal - lnprior: 0.0000, lnlike: -14334.0251, lnprob: -14334.0251
```

And your database

I'm wondering how the probabilities after the jumps compare to the probabilities of the initial set of parameters. We start off by perturbing the parameters, so it's normal that they start a little worse and converge to (and improve on) your initial set of parameters.
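Roughly, chain initialization looks like the sketch below. This is only an illustration: the 10% relative Gaussian scale is an assumption here, not necessarily what ESPEI uses internally.

```python
import numpy as np

# Each MCMC chain starts from the input parameters, perturbed by a small
# relative Gaussian. The 0.10 scale is assumed for illustration.
rng = np.random.default_rng(42)
params = np.array([-14000.0, -5.0])  # example (a, b) parameters
num_chains = 8
initial_positions = params * (
    1.0 + 0.10 * rng.standard_normal((num_chains, params.size))
)
print(initial_positions.shape)  # one perturbed start per chain
```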

```
TRACE:root:Probability for initial parameters
/home/semikk/anaconda3/lib/python3.7/site-packages/pycalphad/core/lower_convex_hull.py:136: RuntimeWarning: invalid value encountered in double_scalars
result_array_GM_values[it.multi_index] = new_energy / molesum
TRACE:root:Likelihood - 6.71s - Thermochemical: -1004.646. ZPF: -3424.380. Activity: -14400.540. Total: -18829.566.
TRACE:root:Proposal - lnprior: 0.0000, lnlike: -18829.5658, lnprob: -18829.5658
```

The log-likelihood for the activity data in that proposal is about 50x larger in magnitude than for the input, so the set of parameters corresponding to that proposal will be rejected, because its probability is much lower than the probability of any of your chains.
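As a rough illustration of why such a proposal is hopeless, here is a plain Metropolis-style acceptance check using the two `lnprob` values from the traces above. (This is only a sketch of the idea; ESPEI actually samples with emcee's ensemble sampler, which uses a different move.)

```python
import math

# Log-probabilities from the two trace snippets above
lnprob_current = -14334.0251   # initial parameter set
lnprob_proposal = -18829.5658  # proposed parameter set

# Metropolis-style rule: accept a worse proposal with probability exp(delta)
delta = lnprob_proposal - lnprob_current  # about -4495
accept_prob = min(1.0, math.exp(delta))   # underflows to exactly 0.0
print(accept_prob)
```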

Can you find any examples where the `Total` is close to your -15,500 or -16,000?

For more context: the output will contain the likelihoods for *all* proposed sets of parameters (accepted and rejected), but the ones you see in the plot of `-lnprob` are only the accepted parameters that are in the Markov chains.
Just based on how large the log-likelihood is there, my guess is that one or more of your parameters have diverged and you are sampling from very unreasonable parameters.

Can you see if the values of the parameters of the chains that have significant jumps in probability are reasonable? Plotting the parameters might be helpful, i.e. https://espei.org/en/latest/recipes.html#visualize-the-trace-of-each-parameter
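A minimal sketch of that kind of check, using a synthetic stand-in for the trace array (in a real run you would load ESPEI's `trace.npy`, which is shaped `(chains, iterations, parameters)`; the numbers here are made up):

```python
import numpy as np

# Synthetic stand-in for np.load('trace.npy'):
# shape is (chains, iterations, parameters).
rng = np.random.default_rng(0)
trace = rng.normal(size=(4, 50, 3))  # 4 chains, 50 iterations, 3 parameters

# A diverged parameter shows up as chains spread far apart at the end of
# the run compared to the start.
start_spread = trace[:, 0, :].std(axis=0)   # per-parameter spread, iter 0
end_spread = trace[:, -1, :].std(axis=0)    # per-parameter spread, last iter
print(start_spread.shape, end_spread.shape)
```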

That's okay. I'm pretty sure it's the activity data. Another way to check would be to start a new run for 1 iteration using the output database from your 1000 iteration runs as the input database, since the final database that you get is based on the parameter set with the highest probability.
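That "highest probability wins" selection can be sketched like this, with a small synthetic `(chains, iterations)` array standing in for ESPEI's `lnprob.npy` (values made up for illustration):

```python
import numpy as np

# Synthetic stand-in for np.load('lnprob.npy'): shape (chains, iterations)
lnprob = np.array([[-18829.6, -16000.2, -15487.9],
                   [-18829.6, -17250.0, -15902.3]])

# The exported database corresponds to the parameter set with the highest
# probability seen anywhere in the chains:
best_chain, best_iter = np.unravel_index(np.nanargmax(lnprob), lnprob.shape)
print(best_chain, best_iter, lnprob[best_chain, best_iter])
```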

But I think it would be more valuable to check into the parameters and see what their values are

Since the second one is a coefficient that's multiplied by `T` in the model, that's actually a really big change. A 40x change (-5 to -200) in this parameter would decrease the energy by about 195 kJ at 1000 K. Usually a good rule of thumb is that the `b` coefficient should be roughly 1/3000th of the `a` coefficient. So for `a = -14000`, `b ~= -5`.
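The arithmetic behind both points, as a quick check:

```python
# Rule of thumb: b should be roughly a / 3000
a = -14000.0
b_rule_of_thumb = a / 3000.0  # about -4.67, so b ~= -5 is reasonable

# Effect of b going from -5 to -200 at 1000 K: the b*T term in the
# Gibbs energy changes by (b_new - b_old) * T
T = 1000.0
delta_G = (-200.0 - (-5.0)) * T  # J/mol
print(round(b_rule_of_thumb, 2), delta_G)
```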
I'd double check your activity data, plotting with the pycalphad link I sent the other day, if possible.

I noticed in the files you sent me before that there are some unusual values, e.g. `Si-acr-75cha_1.json` has

```
"X_SI": [
0.650,
0.600,
0.500,
0.399,
0.303,
0.200,
0.099,
0.790,
0.700
]
```

and

```
"values": [[[
0.597,
0.513,
0.356,
0.196,
0.104,
0.078,
0.033,
0.004,
0.080
]]],
```

For a reference of pure Si, the activities should follow the trend in the mole fraction of Si. The last two compositions are for `X(SI)=0.790` and `X(SI)=0.700`, which don't match up well with the activities.
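A quick way to see the inconsistency, using the values from those two blocks (just a sanity-check sketch, not an ESPEI feature):

```python
# Activity of Si referenced to pure Si should increase with X(SI).
x_si = [0.650, 0.600, 0.500, 0.399, 0.303, 0.200, 0.099, 0.790, 0.700]
acr_si = [0.597, 0.513, 0.356, 0.196, 0.104, 0.078, 0.033, 0.004, 0.080]

# Sort by composition and check that the activities also increase.
pairs = sorted(zip(x_si, acr_si))
monotonic = all(a1 <= a2 for (_, a1), (_, a2) in zip(pairs, pairs[1:]))
print(monotonic)  # the last two points break the trend
```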

The ones with the pure Al reference state all seem okay, but you may want to check into some of the other pure Si ones, including `Si-acr-75cha_2.json` and `Si-acr-75cha_3.json` (maybe the activities are just out of order in this one?)
My current hypothesis is that some of the inconsistent data may be contributing an unrealistically large (negative) log-likelihood and that these non-physical data are dominating the likelihood. If that's true, then some of your parameters (like the second one above) may be able to wander parameter space until they find sets of values that drastically improve the log-likelihood for the non-physical activity values, but those are just unlucky guesses, and they cause the chains to get stuck there.

I have some alternate hypotheses we can explore if the bad-data hypothesis turns out to be wrong or doesn't fix the issue.