On Fri, 3 May 2013 14:35:45 +0200
Lukas Zapletal <lzap(a)redhat.com> wrote:
Hello,
I have two students interested in a diploma thesis called "Yum plugin
for suggesting packages based on usage":
http://bit.ly/18hrHbL
TL;DR - from an anonymized access log, create a database of suggested
packages using data mining techniques and provide a Yum plugin that
would suggest "Users of vim also installed: ctags, git, ..."
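For illustration only, the "users of X also installed Y" idea can be sketched as plain co-occurrence counting. This is a minimal sketch, not the method proposed in the thesis: the function name build_suggestions, the input shape (a dict from an anonymized client token to the package names that client downloaded), and the sample data are all assumptions made for the example.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_suggestions(installs_by_client, top_n=3):
    """Map each package to the packages most often seen alongside it.

    installs_by_client is a hypothetical structure:
    {anonymized client token: iterable of package names}.
    """
    co_counts = defaultdict(Counter)
    for packages in installs_by_client.values():
        # Count every ordered pair once per client, ignoring duplicates.
        for pkg, other in permutations(set(packages), 2):
            co_counts[pkg][other] += 1

    suggestions = {}
    for pkg, counts in co_counts.items():
        # Sort by descending count, then by name, so ties are deterministic.
        ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
        suggestions[pkg] = [name for name, _ in ranked[:top_n]]
    return suggestions


# Made-up sample data, only to show the call.
sample = {
    "client-a": ["vim", "ctags", "git"],
    "client-b": ["vim", "git"],
    "client-c": ["vim", "emacs"],
}
suggestions = build_suggestions(sample)
```

Note that this only works if downloads can be grouped per client in the first place, which is exactly the open question raised in the reply that follows.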
So can you explain how this would work?
How do we know that any particular person who installed yum installed
anything else? Are you using the IP address to try and see what each IP
user installed? I can think of... a lot of ways that won't work. ;)
Another approach might be to work on
https://fedorahosted.org/census/
This is the replacement for smolt, but never seems to have gotten very
far. It would be an application end users install.
I am gonna create a Fedora Feature wiki page shortly describing this
in more detail. Our goal is to offer this project for integration into
Fedora later on, at least provide Fedora packages for it.
To do that, we need a good source of data. It would be best to collect
access logs from one or two main Fedora mirrors. We would provide a
short Python script that parses the access logs, anonymizes the data
(salted hash of the IP address), and keeps only the relevant entries
(RPM files from the latest Fedora release or updates repositories).
That would be phase one, which should give us sample data.
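A rough sketch of what such a phase-one script might look like is below. It is not the students' script. It assumes the mirror writes Apache-style common/combined log lines and that release and updates paths contain "/releases/19/" or "/updates/19/"; the names anonymize_line and LOG_LINE, the 16-character token length, and the sample log line are all made up for the example. It uses a keyed hash (HMAC-SHA256) with a secret salt, since an unsalted hash of an IPv4 address can be reversed by trying every address.

```python
import hashlib
import hmac
import re

# Assumed Apache common/combined log layout:
# ip ident user [timestamp] "GET path protocol" status ...
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] '
    r'"GET (?P<path>\S+) [^"]*" (?P<status>\d{3}) '
)


def anonymize_line(line, salt, release="19"):
    """Return (client token, rpm file name), or None if the line is irrelevant."""
    match = LOG_LINE.match(line)
    if match is None or match.group("status") != "200":
        return None
    path = match.group("path")
    if not path.endswith(".rpm"):
        return None
    # Assumed path layout for the release and updates repositories.
    if f"/releases/{release}/" not in path and f"/updates/{release}/" not in path:
        return None
    # Keyed hash of the IP; the salt must stay secret and should be discarded
    # once collection ends so tokens cannot be recomputed later.
    digest = hmac.new(salt, match.group("ip").encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16], path.rsplit("/", 1)[-1]


# Made-up log line using a documentation-range IP address.
sample_line = (
    '192.0.2.7 - - [03/May/2013:14:35:45 +0200] '
    '"GET /pub/fedora/linux/releases/19/Everything/x86_64/os/Packages/v/'
    'vim-enhanced-7.3.944-1.fc19.x86_64.rpm HTTP/1.1" 200 1048576 '
    '"-" "urlgrabber/3.9.1 yum/3.4.3"'
)
record = anonymize_line(sample_line, salt=b"replace-with-random-secret")
```

Even with a secret salt, the per-client download pattern itself may identify a machine, so this sketch does not settle whether the output is safe to publish.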
We had a discussion about making our logs public a while back, and I
think that discussion ended with us saying the IP addresses wouldn't be
safe to publish, even hashed.
http://lists.fedoraproject.org/pipermail/infrastructure/2012-April/011658...
Phase two would be to integrate this script with logrotate; for one
Fedora release cycle (Fedora 19) the script would collect relevant
anonymized data into a file. The final suggested package database would
be created from this file (or maybe files, to allow us to move them on
the fly out of the stat directory).
The big (legal) question is whether we are able to provide this
anonymized data to the public, or whether we want to sign an NDA with
all people involved. I am CCing Tom for this question.
It's been asked before.
I want to be cautious about this. ;)
I need your help with connecting to relevant people. Any comments are
appreciated.
Many thanks and I hope this effort will lead to improving user
experience with Fedora packaging.
kevin