Hi,
On 7/6/23 11:10, Aoife Moloney wrote:
Important process note: we are experimenting with using Fedora
(trimming stuff because this proposal is huge)
We intend to deploy the Endless OS metrics system. [https://blogs.gnome.org/wjjt/2023/07/05/endless-oss-privacy-preserving-metri... This blog post] contains a description of how the system works. We do not plan to deploy the eos-phone-home component in Fedora.
So, the following is just _my_ opinion, don't read more than that into it:
Having finally had a chance to look at the list of collected metrics, I'm a bit worried about just how much information is being (or can be) gathered by the project, as well as how frequently it is gathered.
Personally, I think it would benefit Fedora if questions such as "is anyone actually using this hardware/driver/package" could be answered. OTOH, the metrics presented above go far beyond that. I'm not sure why it's necessary to know how many times, or for how long, a particular application is used.
=== How will data collection be approved? ===
The proposal owners feel it is essential to ensure the Fedora community has ultimate oversight over metrics collection. Community control is required to maintain user trust. If this change proposal is approved, then we'll need new policies and procedures to ensure community oversight over metrics collection and ensure Fedora users can be confident that our metrics collection does not violate their privacy.
So, I would suggest that the intended metrics, as well as the collection interval, be included as part of this proposal, and that they not be changed without further community approval. Doing this would go a long way to convincing me, and likely others, that it's not worth the effort to manually rip the entire subsystem out of Fedora on my machines at the first chance.
If there is to be a "process" for changing them, then I think that needs to be documented here too, rather than hand-waved away.
We can say "we would never collect personally-identifiable data" and write software that really doesn't collect any such data, but this alone will never be enough to ensure user confidence. We will need a metrics collection policy that describes what sort of data may be collected by Fedora (anonymous, non-invasive), and what sort of data may not be collected. Such a policy does not exist currently. We will also want to ensure the Fedora community has ultimate control over which particular metrics are collected. One option is that each metric to be collected should be separately approved by FESCo. Collection of particular metrics in a particular data format is ultimately an engineering decision, and therefore FESCo seems like an appropriate approval point. Because FESCo members are elected regularly by the Fedora community, this also provides the community with ultimate control over metrics collection via the election process. But other oversight and approval structures would work too.
=== What data might we collect? ===
We are not proposing to collect any of these particular metrics just yet, because a process for Fedora community approval of metrics to be collected does not yet exist. That said, in the interests of maximum transparency, we wish to give you an idea of what sorts of metrics we might propose to collect in the future.
One of the main goals of metrics collection is to analyze whether Red Hat is achieving its goal to make Fedora Workstation the premier developer platform for cloud software development. Accordingly, we want to know things like which IDEs are most popular among our users, and which runtimes are used to create containers using Toolbx.
IMHO, the data shouldn't be collected more frequently than every 6 months or so, which would allow each collection to be presented to the user rather than just being uploaded in the background. Nor should it track _user_ actions, which I would differentiate from machine state (BIOS machine type, RAM, installed packages, application crashes, failed suspend/resume, that kind of thing).
But given coarse-grained tracking, why isn't it part of server/IoT/etc. as well, other than the current focus on GNOME? Surely knowing that only one user is running $APPLICATION on a server is useful too.
Metrics can also be used to inform user interface design decisions. For example, we want to collect the clickthrough rate of the recommended software banners in GNOME Software to assess which banners are actually useful to users. We also want to know how frequently panels in gnome-control-center are visited to determine which panels could be consolidated or removed, because there are other settings we want to add, but our usability research indicates that the current high quantity of settings panels already makes it difficult for users to find commonly-used settings.
(trimming)
=== User control ===
A new metrics collection setting will be added to the privacy page in gnome-initial-setup and also to the privacy page in gnome-control-center. This setting will be a toggle that will enable or disable metrics collection for the entire system. We want to ensure that metrics are never submitted to Fedora without the user's knowledge and consent, so the underlying setting will be off by default in order to ensure metrics upload is not unexpectedly turned on when upgrading from an older version of Fedora. However, we also want to ensure that the data we collect is meaningful, so gnome-initial-setup will default to displaying the toggle as enabled, even though the underlying setting will initially be disabled. (The underlying setting will not actually be enabled until the user finishes the privacy page, to ensure users have the opportunity to disable the setting before any data is uploaded.) This is to ensure the system is opt-out, not opt-in. This is essential because we know that opt-in metrics are not very useful. Few users would opt in, and these users would not be representative of Fedora users as a whole. We are not interested in opt-in metrics.
I also think it's useful here to describe _exactly_ how to disable/remove the component, as well as where the opt-in/out settings are stored in the filesystem, how to change them, and where the log of reported data for a given machine can be retrieved.
To make this a little more confusing, metrics collection is actually separate from uploading. Collection is always initially enabled, while uploading is always initially disabled. The graphical toggle enables or disables both at the same time. That is, a newly-installed Fedora system will always collect metrics locally at first, but the collected metrics will be deleted and never submitted to Fedora if the user disables the metrics collection toggle on the privacy page. If the user leaves the toggle enabled, then the collected metrics may be submitted only after finishing the privacy page.
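Since the collection/upload split is easy to misread, here is how I understand the behaviour described above, written as a toy Python model. This is only a sketch of my reading of the proposal text: the class, its fields, and its methods are all hypothetical and do not come from the Endless OS metrics code.

```python
from dataclasses import dataclass, field


@dataclass
class MetricsState:
    """Toy model of the proposal's described behaviour (all names hypothetical)."""

    collecting: bool = True          # collection is initially enabled
    uploading: bool = False          # uploading is initially disabled
    privacy_page_done: bool = False  # set once the user finishes the privacy page
    buffered: list = field(default_factory=list)

    def record(self, event: str) -> None:
        # Events are only stored locally while collection is enabled.
        if self.collecting:
            self.buffered.append(event)

    def finish_privacy_page(self, toggle_enabled: bool) -> None:
        # The graphical toggle enables or disables both settings at once.
        self.privacy_page_done = True
        self.collecting = toggle_enabled
        self.uploading = toggle_enabled
        if not toggle_enabled:
            # Opting out deletes what was collected; it is never submitted.
            self.buffered.clear()

    def upload(self) -> list:
        # Nothing may be submitted before the privacy page is finished,
        # or if the user disabled the toggle.
        if not (self.privacy_page_done and self.uploading):
            return []
        sent, self.buffered = self.buffered, []
        return sent
```

In this model, a fresh system buffers events immediately, `upload()` returns nothing until `finish_privacy_page(True)` has been called, and `finish_privacy_page(False)` discards the buffer. If that does not match the real behaviour, that is exactly the kind of detail I would like to see spelled out in the proposal.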
(trimmed rest)
Thanks for getting this far.