Just a silly idea, but I believe there is one thing that could be useful at some point in the development. It might be wise to add information about the PyFR version somewhere in the output of a simulation; that would help track and compare scalability, or any other improvements in the code, from version to version. This could be placed, e.g., in the header of the residual file or in some separate summary file that could also contain performance benchmarking information. What do you think?
Thanks for the suggestion. This is something we have discussed quite a lot. One issue is that you can never be sure that the version that is run is exactly the same as a release. The user may have made some ‘small tweaks’ (or even bigger changes), and the code would still tag results as being from e.g. release v1.4.0, when in fact they were not, which has the potential to cause significant confusion and erroneous comparisons.
We have also considered embedding the entire source code in every solution dump etc., such that solution files could, at some level, regenerate themselves, and this is something we may add going forwards. Although even that is not foolproof, since you may have a different dependency stack in the future.
How about storing the Git hash and the output of git diff? If you are on a public branch that might be a good compromise and a valid reference, no?
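Something along these lines is what I have in mind (a rough sketch only; the helper name is mine, and it assumes PyFR is being run from inside a Git checkout):

```python
import subprocess


def git_provenance(repo_dir):
    # Hypothetical helper: capture the current commit hash and the
    # working-tree diff of the checkout at repo_dir
    def run(*args):
        return subprocess.check_output(['git', '-C', repo_dir, *args],
                                       text=True)

    sha = run('rev-parse', 'HEAD').strip()
    diff = run('diff', 'HEAD')  # staged + unstaged changes vs HEAD

    return sha, diff
```

The pair could then be written into the header of each output file.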
We did consider this a couple of years back. It is a nice idea, but unfortunately gives rise to an annoying number of edge cases.

Firstly, the user may not be running PyFR from a Git checkout. It could be that they ran python setup.py install and are running it from elsewhere. (One workaround is to add code to setup.py to take a diff there, but there is still no guarantee that setup.py is being run from a checkout.)

Secondly, we would need to ensure that we grab both staged and unstaged differences from Git, along with any untracked files; otherwise we would miss out on newly added files. This is fiddly and can have unintended consequences. (Say you are doing some debugging and leave a 200 MiB profile file in the PyFR directory; now this critter will find itself inside all of your newly minted .pyfrs files.)
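To illustrate why it is fiddly, a complete capture would need something along these lines (a hypothetical sketch, not code we have):

```python
import pathlib
import subprocess


def working_tree_state(repo_dir):
    def run(*args):
        return subprocess.check_output(['git', '-C', repo_dir, *args],
                                       text=True)

    # Staged and unstaged modifications to tracked files
    tracked_diff = run('diff', 'HEAD')

    # Untracked files are invisible to git diff; they must be listed and
    # read separately, and this is where a stray 200 MiB profile file
    # would end up inside every .pyfrs file
    untracked = run('ls-files', '--others', '--exclude-standard').splitlines()
    blobs = {f: pathlib.Path(repo_dir, f).read_bytes() for f in untracked}

    return tracked_diff, blobs
```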
Thirdly, having a Git hash (+ diff) is no good unless you can find that hash. If the code is never pushed or backed up it is all too easy to lose everything. Hashes also do not hold up well against Git rebases, so you can end up thinking you've lost your code when you haven't.

There is also a privacy issue. However, this is more a matter of user education than a technical problem.
As such, my long-term preference is to look at getting PyFR to embed its source code in output files. Python knows what it is currently executing and where the code came from. Once you have the source you can then diff it against the currently released version of PyFR to get an idea of its genesis. Moreover, once you have the source code you have the source code. It therefore becomes possible to restart simulations and to post-process output files (hence the .pyfrs file becomes largely self-documenting, as it embeds the code required to process itself).
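To make the idea concrete, here is a rough sketch of what embedding might look like (the function, the 'src' dataset name and the use of a zip archive are all assumptions on my part, not an actual PyFR API): Python can locate the source of the package it is executing, and an HDF5-based .pyfrs file can carry it as an opaque blob.

```python
import io
import os
import zipfile

import h5py
import numpy as np

import pyfr


def bundle_own_source():
    # Locate the source tree of the pyfr package actually being
    # executed and zip it up in memory
    root = os.path.dirname(pyfr.__file__)

    buf = io.BytesIO()
    with zipfile.ZipFile(buf, 'w', zipfile.ZIP_DEFLATED) as zf:
        for dirpath, dirnames, filenames in os.walk(root):
            for fname in filenames:
                if fname.endswith('.py'):
                    fpath = os.path.join(dirpath, fname)
                    zf.write(fpath, os.path.relpath(fpath, root))

    return buf.getvalue()


# Hypothetical usage: attach the archive to a solution file
with h5py.File('solution.pyfrs', 'a') as f:
    f['src'] = np.void(bundle_own_source())
```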
I see all your points and understand how complicated the situation is. The idea given by Freddie of putting the source code into the output seems to be the best option, but I’m afraid it could make comparisons even more complicated due to information overload. Say you make some important step forward and the code becomes 2.0.0; that may make it attractive for me to reinstall the code, yet its performance in some particular application of interest to me may drop. It would be easier to have a clear and trackable versioning system, so I was wondering if at least the main branch of the code (developed by your original team) could carry a stamp (a hash, as suggested by Matthieu) to indicate the version of the code (or of a particular solver). At the moment, having installed both 1.3.0 and 1.4.0, I can’t distinguish which version of the code was used in simulations from the same project simply by checking the output; I need to write my own script for that.
I think we would still have the issues Freddie and I stated previously.
Since we distribute the source of releases, someone could, for example, change the source of v1.4.0 such that it performed differently, but it would still tag the files it produced as being from v1.4.0. I think this would lead to significant confusion.
If we embed the source in each output, then appropriate use of diff could identify if it was indeed run using a release, and if not, exactly what had changed.
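For instance (a sketch under the same assumptions as above; the extracted directory layout is hypothetical), difflib could report exactly what differs from a pristine release tree:

```python
import difflib
import pathlib


def diff_against_release(embedded_dir, release_dir):
    # Compare an extracted embedded source tree against a pristine
    # release tree, file by file
    embedded = pathlib.Path(embedded_dir)
    release = pathlib.Path(release_dir)

    for path in sorted(embedded.rglob('*.py')):
        rel = path.relative_to(embedded)
        ref = release / rel

        ours = path.read_text().splitlines(keepends=True)
        theirs = (ref.read_text().splitlines(keepends=True)
                  if ref.exists() else [])

        for line in difflib.unified_diff(theirs, ours,
                                         fromfile=f'release/{rel}',
                                         tofile=f'embedded/{rel}'):
            print(line, end='')
```

An empty diff would confirm the run really did use an unmodified release.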
I understand all of this and I agree with the idea of extensive output, as only that will be a reliable reference for the user. I wonder only if you have a vision of what you really want to present in the output? Do you want to print an algorithm, a set of equations, or simply a copy of the source files used in the calculation? Could you say more on this topic? Also, when do you want to introduce this feature into the code? Sorry if I misunderstood or missed something in the documentation, but I’m just a one-week user of your code and I want to get familiar with it.