dustymabe reported a new issue against the project: `atomic-wg` that you are following: `` We have been trying to build Atomic Workstation for a few days without success. We consistently get a stale file handle error:
```
Committing: 100%
error: While writing rootfs to mtree: fstatat(18/f352b4dfa8c0892a1a89dd62a541c800399f4e96d43bc23ccf6c78fb66b6bd.filez): Stale file handle
```
It would be easy to blame NFS for this, but I think it warrants some investigation. Our Atomic Host ostree composes have been succeeding while the Atomic Workstation composes have been failing, so with all other things being equal, only Atomic Workstation fails. It also fails consistently at the same place, which makes me think it's not necessarily a networking issue. It could be either an issue with ostree/rpm-ostree or an NFS issue that ostree/rpm-ostree aggravates.
I do notice that Rawhide composes seem to be working fine. The only difference I can see there is a newer version of rpm-ostree. We should get the new version of rpm-ostree into Fedora 27 stable (it should be in the next run) and see if that helps.
Here are the error logs:
- [1](https://kojipkgs.fedoraproject.org/compose/updates/Fedora-27-updates-2017111...)
- [2](https://kojipkgs.fedoraproject.org/compose/updates/Fedora-27-updates-2017111...): this one succeeded, but there are missing objects in the repo:
  ```
  Nov 21 10:41:50 localhost.localdomain ostree[2755]: libostree HTTP error from remote fedora-workstation for https://kojipkgs.fedoraproject.org/compose/updates/atomic/deltas/rM/AH5m0iAva2gB_4KLzfVBD7RMQiHpdVZLhI3bdUxCs/superblock: Server returned HTTP 404
  Nov 21 10:41:50 localhost.localdomain ostree[2755]: libostree HTTP error from remote fedora-workstation for https://kojipkgs.fedoraproject.org/compose/updates/atomic/objects/8d/9d7dcc283355a3bd956c323febb2f4bd9a3de5f9c5bef71f2c491955b3ecf5.filez: Server returned HTTP 404
  ```
- [3](https://kojipkgs.fedoraproject.org/compose/updates/Fedora-27-updates-2017111...)
- [4](https://kojipkgs.fedoraproject.org/compose/updates/Fedora-27-updates-2017111...)
- [5](https://kojipkgs.fedoraproject.org/compose/updates/Fedora-27-updates-2017111...)
- [6](https://kojipkgs.fedoraproject.org/compose/updates/Fedora-27-updates-2017111...)
- [7](https://kojipkgs.fedoraproject.org/compose/updates/Fedora-27-updates-2017111...)
- [8](https://kojipkgs.fedoraproject.org/compose/updates/Fedora-27-updates-2017111...)
- [9](https://kojipkgs.fedoraproject.org/compose/updates/Fedora-27-updates-2017112...) ``
To reply, visit the link below or just reply to this email https://pagure.io/atomic-wg/issue/387
walters added a new comment to an issue you are following: `` pungi is writing directly into an archive repo NFS mounted here? Are there concurrent writes?
One tricky thing here is that a lot of what libostree is doing for local filesystem repos is almost an anti-pattern for NFS, mainly our use of the `tmp/` dir for staging. See also https://github.com/ostreedev/ostree/issues/1184
A lot of those issues go away with the "compose into bare-user, pull-local to archive" pattern. ``
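For readers unfamiliar with the "compose into bare-user, pull-local to archive" pattern, here is a minimal sketch that builds the command sequence and, optionally, runs it. It assumes the `ostree` and `rpm-ostree` CLIs are installed. The helper names (`compose_local_then_pull`, `run`), the paths, the treefile name and the ref are hypothetical placeholders, not what pungi actually does. The sketch also makes no attempt to preserve commit parentage between composes.

```python
import subprocess
from pathlib import Path
from typing import List


def compose_local_then_pull(workdir: str, archive_repo: str,
                            treefile: str, ref: str) -> List[List[str]]:
    """Hypothetical helper: argv lists for the
    "compose into bare-user, pull-local to archive" pattern.
    All paths and the ref are caller-supplied placeholders."""
    # Local (non-NFS) repo, so the compose's tmp/ staging is local too.
    local_repo = str(Path(workdir) / "repo-bare-user")
    return [
        # 1. Create a bare-user repo on local disk.
        ["ostree", "init", f"--repo={local_repo}", "--mode=bare-user"],
        # 2. Compose the tree into the local repo.
        ["rpm-ostree", "compose", "tree", f"--repo={local_repo}", treefile],
        # 3. Copy the resulting commit into the shared archive repo.
        ["ostree", "pull-local", f"--repo={archive_repo}", local_repo, ref],
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them in order."""
    for argv in commands:
        if dry_run:
            print(" ".join(argv))  # show what would be executed
        else:
            subprocess.run(argv, check=True)  # stop at the first failure


if __name__ == "__main__":
    cmds = compose_local_then_pull(
        workdir="/var/tmp/compose",                 # hypothetical local path
        archive_repo="/mnt/koji/ostree/repo",       # hypothetical NFS path
        treefile="fedora-atomic-workstation.json",  # hypothetical treefile
        ref="fedora/27/x86_64/workstation",         # hypothetical ref
    )
    run(cmds)
```

In this layout the compose's staging happens on local disk. The NFS-mounted archive repo is only written by the final `pull-local` step.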
dustymabe added a new comment to an issue you are following: ``
> pungi is writing directly into an archive repo NFS mounted here? Are there concurrent writes?
Yes and yes. It's being written into an NFS-mounted repo, and there are multiple composes running at the same time.
The main reason for using a networked repo is that the compose could happen on any Koji builder, and we also want each new commit we make to have a parent commit. It would be nice to use local storage for the compose and then use `pull-local`, but we haven't implemented that yet.
> One tricky thing here is that a lot of what libostree is doing for local filesystem repos is almost an anti-pattern for NFS, mainly our use of the `tmp/` dir for staging.
Where is the tmp dir located? Could it be made to use a local tmp vs one on NFS?
What is the recommendation based on that issue? Should we disable the fsync option?
> A lot of those issues go away with the "compose into bare-user, pull-local to archive" pattern.
Is the use of an archive repo a particular problem here? That is, would a bare-user repo be less likely to hit these problems? ``
dustymabe added a new comment to an issue you are following: `` I found one aarch64 run that failed in this same way during this time frame: [1](https://kojipkgs.fedoraproject.org/compose/updates/Fedora-27-updates-2017111...) ``
walters added a new comment to an issue you are following: ``
> Where is the tmp dir located? Could it be made to use a local tmp vs one on NFS?
In `$repo/tmp` - I suspect (not sure) that it's concurrency there that's the issue. There were some issues [fixed upstream](https://github.com/ostreedev/ostree/pull/1346) here. Changing pungi to use a local `bare-user` repo is effectively doing "local tmp". I am not sure rpm-ostree should be in the game of detecting and special casing NFS, but I'm not opposed to it either. We could probably add an rpm-ostree option to disable its use of a staging dir. ``
walters added a new comment to an issue you are following: `` I filed https://pagure.io/pungi/pull-request/805 - I think it will help but I'll still need to do some libostree work here. ``
dustymabe added a new comment to an issue you are following: ``
> In `$repo/tmp` - I suspect (not sure) that it's concurrency there that's the issue. There were some issues fixed upstream here.
Good to know. Do you think it would be worth backporting that to F27 to see if it solves the problem?
> Changing pungi to use a local `bare-user` repo is effectively doing "local tmp". I am not sure rpm-ostree should be in the game of detecting and special casing NFS, but I'm not opposed to it either. We could probably add an rpm-ostree option to disable its use of a staging dir.
I definitely agree that special-casing NFS is not desirable. Could we not just add an option to tell rpm-ostree which tmp staging dir to use? That is, use `/tmp` (a local filesystem) while the repo we operate on is `/nfs/mounted/repo`. Is that what `--workdir` does?
> I filed https://pagure.io/pungi/pull-request/805 - I think it will help but I'll still need to do some libostree work here.
Thanks! Just for dummies: why is using a bare-user repo attractive here, since we're going to end up putting the content in an archive repo anyway?
``
walters added a new comment to an issue you are following: `` It's [in the docs](https://ostree.readthedocs.io/en/latest/manual/buildsystem-and-repos/)
This is because OSTree has to re-checksum and recompress the content each time it's committed. (Most of the CPU time is spent in compression which gets thrown away if the content turns out to be already stored).
``
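To make the cost argument concrete, here is a toy model in Python. It is not libostree's implementation. `ToyRepo` is a hypothetical content-addressed store keyed by the SHA-256 of the uncompressed content, and zlib stands in for archive-mode compression. The model assumes, per the comment above, that committing into an archive repo compresses each file before discovering whether the object is already stored. It counts compressions for two cases:

* committing two similar trees directly into an archive repo;
* committing each tree into a fresh, empty bare-user repo and then pulling it into the archive.

```python
import hashlib
import zlib
from typing import Dict, Iterable


class ToyRepo:
    """Toy content-addressed store (NOT libostree). Objects are keyed by
    the SHA-256 of their uncompressed content."""

    def __init__(self, compress: bool):
        self.compress = compress          # True ~ "archive", False ~ "bare-user"
        self.objects: Dict[str, bytes] = {}
        self.compressions = 0             # CPU-heavy work we paid for

    def _store_form(self, content: bytes) -> bytes:
        if self.compress:
            self.compressions += 1
            return zlib.compress(content)
        return content

    def commit(self, files: Iterable[bytes]) -> None:
        # Assumed behavior: each file is converted (compressed for archive)
        # before we find out whether it is already stored.
        for content in files:
            stored = self._store_form(content)
            checksum = hashlib.sha256(content).hexdigest()
            self.objects.setdefault(checksum, stored)

    def pull_local(self, src: "ToyRepo") -> None:
        # Only objects this repo lacks are converted/compressed.
        for checksum, stored in src.objects.items():
            if checksum in self.objects:
                continue
            content = zlib.decompress(stored) if src.compress else stored
            self.objects[checksum] = self._store_form(content)


tree_v1 = [b"kernel", b"glibc", b"firefox"]
tree_v2 = [b"kernel", b"glibc", b"firefox-58"]  # one file changed

direct = ToyRepo(compress=True)
direct.commit(tree_v1)
direct.commit(tree_v2)
print("archive direct:", direct.compressions)  # prints "archive direct: 6"

archive = ToyRepo(compress=True)
for tree in (tree_v1, tree_v2):
    staging = ToyRepo(compress=False)  # fresh, empty bare-user repo each time
    staging.commit(tree)
    archive.pull_local(staging)
print("bare-user + pull-local:", archive.compressions)  # prints "bare-user + pull-local: 4"
```

In this model the bare-user repo never compresses anything. The archive repo compresses only the objects it does not already have, so the unchanged files are not recompressed on the second compose, even though the staging repo starts out empty each time.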
dustymabe added a new comment to an issue you are following: ``
> It's in the docs
> This is because OSTree has to re-checksum and recompress the content each time it's committed. (Most of the CPU time is spent in compression which gets thrown away if the content turns out to be already stored).
Yes, but in your pull request you are creating an empty temporary repo, so none of the content would already exist in it. Doesn't that mean we wouldn't save any operations?
Also I had a few other questions in https://pagure.io/atomic-wg/issue/387#comment-480937, if you don't mind answering. ``
walters added a new comment to an issue you are following: `` https://github.com/projectatomic/rpm-ostree/pull/1111 is likely to fix this. ``
jlebon added a new comment to an issue you are following: `` Bodhi update with patch above: https://bodhi.fedoraproject.org/updates/rpm-ostree-2017.10-3.fc27. ``
dustymabe added a new comment to an issue you are following: ``
> Bodhi update with patch above: https://bodhi.fedoraproject.org/updates/rpm-ostree-2017.10-3.fc27.
With that it works! https://kojipkgs.fedoraproject.org/compose/updates/Fedora-27-updates-2017112... ``
dustymabe added a new comment to an issue you are following: `` rpm-ostree-2017.10-3.fc27 just made it to stable, so we can close this, although I fully intend to review and get https://pagure.io/pungi/pull-request/805 merged as well. ``
The status of the issue: `atomic workstation failures with stale file handle` of project: `atomic-wg` has been updated to: Closed as Fixed by dustymabe.
atomic@lists.fedoraproject.org