I saw many changes related to .pyc files last week, so I had a look. I don't
fully understand these issues. Here are my notes to try to understand the
context ;-) I'm not requesting any change; I'm fine with the latest choices
made in Fedora.
There are different issues:
(1) Performance regression on importing .py files
(2) Getting reproducible .rpm binaries
(3) .pyc files are not fully reproducible
In the reverse order:
(3) should be fixed in Python; I reported the bug upstream:
(2) is a work in progress: Fedora builds are not reproducible yet.
(1) is the main question here.
First of all, the rationale for the .pyc file change in Python 3.7 is
described in PEP 552:
"Reproducibility is important for security"
But I'm not sure how important it is if it's only done halfway. Fedora
doesn't seem to support reproducible builds yet, even if recent changes
show that it's moving forward. Old documents about Fedora:
Debian is making good progress:
OpenSUSE is also working on that:
On Python 3.7 and newer, if SOURCE_DATE_EPOCH is set, py_compile writes
hash-based .pyc files: they don't contain a timestamp, but a hash of the source.
"The default is PycInvalidationMode.CHECKED_HASH if the
SOURCE_DATE_EPOCH environment variable is set, otherwise the default is
PycInvalidationMode.TIMESTAMP." says the py_compile documentation.
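To see this default in action, here is a small sketch for upstream CPython 3.7+ (a Fedora build with RPM_BUILD_ROOT set may behave differently). It compiles a throwaway module without passing invalidation_mode and reads the flags word of the .pyc header, which per PEP 552 has bit 0 = hash-based and bit 1 = check_source. The helper names pyc_flags and compile_with_default_mode are hypothetical.

```python
import os
import py_compile
import tempfile
from pathlib import Path


def pyc_flags(pyc_path):
    """Return the PEP 552 flags word (bytes 4-7 of the .pyc header)."""
    header = Path(pyc_path).read_bytes()[:16]
    return int.from_bytes(header[4:8], "little")


def compile_with_default_mode(source_date_epoch):
    """Hypothetical helper: compile a tiny module with py_compile's default
    invalidation mode, with or without SOURCE_DATE_EPOCH, and return flags."""
    saved = os.environ.pop("SOURCE_DATE_EPOCH", None)
    try:
        if source_date_epoch is not None:
            os.environ["SOURCE_DATE_EPOCH"] = str(source_date_epoch)
        with tempfile.TemporaryDirectory() as tmp:
            src = Path(tmp) / "mod.py"
            src.write_text("x = 1\n")
            # No invalidation_mode argument: let py_compile pick the default.
            pyc = py_compile.compile(str(src), cfile=str(Path(tmp) / "mod.pyc"),
                                     doraise=True)
            return pyc_flags(pyc)
    finally:
        # Restore the caller's environment.
        os.environ.pop("SOURCE_DATE_EPOCH", None)
        if saved is not None:
            os.environ["SOURCE_DATE_EPOCH"] = saved


print(compile_with_default_mode(None))        # 0: TIMESTAMP
print(compile_with_default_mode(1530000000))  # 3: CHECKED_HASH
```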
With a checked hash-based .pyc file, when the module is imported, the
content of the .py file is hashed and compared to the hash stored in the
.pyc file.
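The check can be re-done by hand as a rough sketch. It relies on the 16-byte PEP 552 header (magic, flags, 8-byte source hash) and on importlib.util.source_hash(), which computes the same hash py_compile stores. The helper hash_pyc_is_fresh is hypothetical and only approximates what importlib does internally.

```python
import importlib.util
import py_compile
import tempfile
from pathlib import Path


def hash_pyc_is_fresh(src, pyc):
    """Hypothetical helper: freshness check for a hash-based .pyc.

    PEP 552 header: magic (4 bytes), flags (4 bytes), source hash (8 bytes).
    """
    header = Path(pyc).read_bytes()[:16]
    if header[:4] != importlib.util.MAGIC_NUMBER:
        raise ValueError("bytecode from another Python version")
    flags = int.from_bytes(header[4:8], "little")
    if not flags & 0b01:
        raise ValueError("not a hash-based .pyc")
    # Hash the whole .py file and compare with the hash stored at compile time.
    return importlib.util.source_hash(Path(src).read_bytes()) == header[8:16]


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "mod.py"
    src.write_text("x = 1\n")
    pyc = py_compile.compile(
        str(src), cfile=str(Path(tmp) / "mod.pyc"), doraise=True,
        invalidation_mode=py_compile.PycInvalidationMode.CHECKED_HASH)
    flags = int.from_bytes(Path(pyc).read_bytes()[4:8], "little")
    fresh_before = hash_pyc_is_fresh(src, pyc)  # True
    src.write_text("x = 2\n")                   # same size, new content
    fresh_after = hash_pyc_is_fresh(src, pyc)   # False
```

Note that the whole source file must be read to compute the hash, which is where the extra import cost comes from.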
For timestamp-based .pyc files, no hashing is needed: the mtime (and size)
of the .py file recorded in the .pyc header is compared to the current
mtime (and size) of the .py file, and the .pyc is regenerated if they differ.
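A sketch of the timestamp check, using the PEP 552 layout for timestamp-based files (flags = 0, then 4-byte source mtime and 4-byte source size). The helper timestamp_pyc_is_fresh is hypothetical; it shows that any mtime mismatch, even moving the mtime backwards with unchanged content, makes the .pyc stale.

```python
import os
import py_compile
import tempfile
from pathlib import Path


def timestamp_pyc_is_fresh(src, pyc):
    """Hypothetical helper: the source mtime and size recorded in the .pyc
    header must equal the current ones (both truncated to 32 bits)."""
    header = Path(pyc).read_bytes()[:16]
    if int.from_bytes(header[4:8], "little") != 0:
        raise ValueError("not a timestamp-based .pyc")
    recorded_mtime = int.from_bytes(header[8:12], "little")
    recorded_size = int.from_bytes(header[12:16], "little")
    st = os.stat(src)
    return (recorded_mtime == (int(st.st_mtime) & 0xFFFFFFFF)
            and recorded_size == (st.st_size & 0xFFFFFFFF))


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "mod.py"
    src.write_text("x = 1\n")
    os.utime(src, (1_500_000_000, 1_500_000_000))
    pyc = py_compile.compile(
        str(src), cfile=str(Path(tmp) / "mod.pyc"), doraise=True,
        invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP)
    fresh_before = timestamp_pyc_is_fresh(src, pyc)  # True
    # Move the mtime backwards: the content is unchanged, yet the check fails.
    os.utime(src, (1_400_000_000, 1_400_000_000))
    fresh_after = timestamp_pyc_is_fresh(src, pyc)   # False
```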
Note: Python 3.7 also has a --check-hash-based-pycs command line option,
but it seems to be meant for specific use cases. (See also
redhat-rpm-config was modified 18 days ago in Fedora to set
SOURCE_DATE_EPOCH to the timestamp of the topmost changelog entry:
OpenSUSE had issues with reproducible Python builds and .pyc files:
For this reason, %clamp_mtime_to_source_date_epoch is still off by
default (in Fedora and OpenSUSE).
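One plausible way clamping can break timestamp-based .pyc files (an assumption for illustration, not taken from the OpenSUSE report): the .pyc is written during the build and records the build-time mtime of the .py file, and the mtime is clamped afterwards. The helpers clamp_mtime and recorded_source_mtime and both timestamps are hypothetical.

```python
import os
import py_compile
import tempfile
from pathlib import Path

SOURCE_DATE_EPOCH = 1_500_000_000  # hypothetical changelog timestamp
BUILD_TIME = 1_600_000_000         # hypothetical mtime of files created during the build


def clamp_mtime(path, source_date_epoch):
    """Hypothetical helper: if `path` is newer than SOURCE_DATE_EPOCH,
    set its mtime (and atime) to SOURCE_DATE_EPOCH."""
    if os.stat(path).st_mtime > source_date_epoch:
        os.utime(path, (source_date_epoch, source_date_epoch))


def recorded_source_mtime(pyc):
    """Source mtime stored in a timestamp-based .pyc header (bytes 8 to 11)."""
    return int.from_bytes(Path(pyc).read_bytes()[8:12], "little")


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "mod.py"
    src.write_text("x = 1\n")
    os.utime(src, (BUILD_TIME, BUILD_TIME))
    # Byte-compile first, as a package build would...
    pyc = py_compile.compile(
        str(src), cfile=str(Path(tmp) / "mod.pyc"), doraise=True,
        invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP)
    # ...then clamp the mtime of the .py file.
    clamp_mtime(src, SOURCE_DATE_EPOCH)
    recorded = recorded_source_mtime(pyc)
    current = int(os.stat(src).st_mtime)

print(recorded, current)  # 1600000000 1500000000: the .pyc now looks stale
```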
SOURCE_DATE_EPOCH was disabled in the Python 3.7 package
("%global source_date_epoch_from_changelog 0") because test_cmd_line_script,
test_multiprocessing_main_handling and test_runpy fail if SOURCE_DATE_EPOCH
is set. These tests have been fixed in Python 3.8.
glib2 sets the PYTHONHASHSEED=0 environment variable to work around one of
the remaining bugs for reproducible .pyc files: frozensets are not written
in a deterministic order in .pyc files:
This issue should be fixed in Python; I reported the bug upstream:
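A minimal sketch of why pinning the seed helps: marshal (the serialization format of code objects in .pyc files) writes set elements in iteration order, and the iteration order of strings depends on their randomized hashes. The word list and helper name marshal_in_child are hypothetical. With a fixed PYTHONHASHSEED the bytes are repeatable; with varying seeds, affected Python versions may produce different bytes (later releases may serialize sets deterministically, so the count printed below is not guaranteed).

```python
import marshal
import os
import subprocess
import sys

WORDS = frozenset(["alpha", "beta", "gamma", "delta",
                   "epsilon", "zeta", "eta", "theta"])

# Child program: marshal WORDS and write the raw bytes to stdout.
SNIPPET = ("import marshal, sys; "
           "sys.stdout.buffer.write(marshal.dumps(frozenset(%r)))"
           % sorted(WORDS))


def marshal_in_child(hashseed):
    """Marshal WORDS in a fresh interpreter using the given PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=str(hashseed))
    return subprocess.run([sys.executable, "-c", SNIPPET], env=env,
                          stdout=subprocess.PIPE, check=True).stdout


# Pinned seed, as glib2 does: two runs give identical bytes.
same_with_fixed_seed = marshal_in_child(0) == marshal_in_child(0)

# Different seeds: the element order, and thus the bytes, can differ.
outputs = {marshal_in_child(seed) for seed in range(1, 9)}
print(same_with_fixed_seed, len(outputs))
```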
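If SOURCE_DATE_EPOCH is set when building packages, py_compile writes
hash-based .pyc files instead of timestamp-based ones. Some people consider
this a performance regression, since checking a hash-based .pyc requires
reading and hashing the whole .py file at each import. So Python was
modified to force timestamp-based .pyc files during package builds, that
is, when the RPM_BUILD_ROOT environment variable is set.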
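The selection logic described above can be sketched as a small function. This is only an illustration of the behaviour, not the actual Fedora patch; default_invalidation_mode and the buildroot path are hypothetical, while PycInvalidationMode is the real py_compile enum.

```python
import os
from py_compile import PycInvalidationMode


def default_invalidation_mode(environ=os.environ):
    """Hypothetical sketch: SOURCE_DATE_EPOCH selects checked hash-based
    .pyc files, except inside an RPM build (RPM_BUILD_ROOT set), where
    timestamp-based .pyc files are forced."""
    if environ.get("SOURCE_DATE_EPOCH") and not environ.get("RPM_BUILD_ROOT"):
        return PycInvalidationMode.CHECKED_HASH
    return PycInvalidationMode.TIMESTAMP


print(default_invalidation_mode({}))                                    # TIMESTAMP
print(default_invalidation_mode({"SOURCE_DATE_EPOCH": "1530000000"}))  # CHECKED_HASH
print(default_invalidation_mode({"SOURCE_DATE_EPOCH": "1530000000",
                                 "RPM_BUILD_ROOT": "/tmp/buildroot"}))  # TIMESTAMP
```

Keeping the check on RPM_BUILD_ROOT means that outside rpmbuild, SOURCE_DATE_EPOCH keeps its upstream meaning.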
Night gathers, and now my watch begins. It shall not end until my death.