On Thu, 28 Feb 2019 at 00:06, Stephen John Smoogen <smooge(a)gmail.com> wrote:
On Wed, 27 Feb 2019 at 16:05, Jim Perrin <jperrin(a)redhat.com> wrote:
>
> How much heresy is involved in us using Amazon's elasticsearch service
> for this, so that we don't have yet-another-thing to maintain?
>
I was wondering how much data we are looking to shove there, whether that
data needs to be 'protected', and how fast we need it to be for us
to talk back and forth to the cloud. The heresy side I don't have any
say in.
For fedora-packages we want to store documents that contain package
information (see the current structure used:
https://github.com/fedora-infra/fedora-packages/blob/master/fedoracommuni...).
Currently in production we have 23849 documents in the Xapian database,
so I honestly don't think that will be much trouble for Elasticsearch.
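
To give an idea of what loading those documents could look like, here is a minimal sketch that builds a request body for Elasticsearch's bulk API (one action line followed by one source line per document, newline-delimited). The field names, the sample packages and the "packages" index name are assumptions for illustration only; the real structure is the one in the fedora-packages indexer linked above. The action line uses the typeless form, so older Elasticsearch versions may also need a document type.

```python
import json

# Hypothetical package documents; the real field names are defined by the
# fedora-packages indexer and may differ.
packages = [
    {"name": "guake", "summary": "Drop-down terminal for GNOME"},
    {"name": "kernel", "summary": "The Linux kernel"},
]


def build_bulk_payload(docs, index="packages"):
    """Return a newline-delimited JSON body for the _bulk endpoint."""
    lines = []
    for doc in docs:
        # Action line: index this document, using the package name as its id.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc["name"]}}))
        # Source line: the document itself.
        lines.append(json.dumps(doc))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


payload = build_bulk_payload(packages)
```

With about 24k documents, the same function could be called on batches of a few hundred documents at a time rather than on the whole set at once.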
Writing to the cluster should be restricted and I think the search
service should be public. Elasticsearch provides Security Privileges
(https://www.elastic.co/guide/en/x-pack/current/security-privileges.html)
that seem to fit that idea.
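
As a rough sketch of that split, the two roles below separate a public read-only role from a restricted indexing role. The role names and the "packages" index name are hypothetical; the privilege names are the index privileges listed on the Security Privileges page linked above. Each dict has the shape of a role definition body, and the helper is only a local sanity check, not part of Elasticsearch.

```python
import json

# Hypothetical role names and index name, for illustration only.
ROLES = {
    # Public search: may only read from the index.
    "packages_reader": {
        "indices": [{"names": ["packages"], "privileges": ["read"]}],
    },
    # Restricted indexer: may create the index and write documents to it.
    "packages_indexer": {
        "indices": [
            {"names": ["packages"], "privileges": ["create_index", "index", "delete"]}
        ],
    },
}

# Index privileges that allow modifying data or the index itself.
WRITE_PRIVILEGES = {
    "all", "manage", "write", "index", "create",
    "delete", "create_index", "delete_index",
}


def is_read_only(role):
    """Return True if the role grants no modifying index privilege."""
    granted = {p for entry in role["indices"] for p in entry["privileges"]}
    return granted.isdisjoint(WRITE_PRIVILEGES)


for name, role in ROLES.items():
    print(name, "read-only:", is_read_only(role))
    print(json.dumps(role, indent=2))
```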
Indexing does not have to be crazy fast: for example, fedora-packages
indexing currently takes between 2 and 3 hours, so I don't think
network latency will matter much here. Searching is a bit more
sensitive, since users usually don't want to wait more than a second
or so to get search results, but if we use the Elasticsearch
JavaScript library
(https://www.elastic.co/guide/en/elasticsearch/client/javascript-api/curre...)
and handle the search in the frontend, then it does not have to go via
our infrastructure.
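
For illustration, a search request body for that use case could look like the sketch below. It is written in Python only to build and print the JSON; the same body could be sent by a frontend client. The field names and the boost on "name" are assumptions, and the "1s" timeout simply mirrors the "about a second" expectation mentioned above.

```python
import json


def build_search_body(term, size=10):
    """Build a query DSL body searching several (hypothetical) fields."""
    return {
        "size": size,
        # Ask Elasticsearch to return whatever hits it has after one second.
        "timeout": "1s",
        "query": {
            "multi_match": {
                "query": term,
                # "^3" boosts matches on the package name over the other fields.
                "fields": ["name^3", "summary", "description"],
            }
        },
    }


body = build_search_body("terminal")
print(json.dumps(body, indent=2))
```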
>
>
> > On 2/27/19 4:19 AM, Stephen John Smoogen wrote:
> > > On Tue, 26 Feb 2019 at 14:39, Clement Verna <cverna(a)fedoraproject.org> wrote:
> > >>
> > >> Hi all,
> > >>
> > >> The fedora-packages [0] code base is showing its age. The code base and
> > >> the technology stack (Turbogears2 [1] web framework and the Moksha
> > >> [2] middleware) are currently not ready for Python3 and I am not
> > >> planning to do the work required to make them Python3 compatible, so the
> > >> application will stop working when Fedora 29 is EOL.
> > >>
> > >> In order to keep the service running, I have started a Proof Of
> > >> Concept (fedora-search [3]) to replace the backend of the application.
> > >> Fedora-search would be a REST API service offering a full-text search
> > >> API. Such a service would then be available for other applications to
> > >> use; fedora-packages would then become a frontend-only application
> > >> using the service provided by fedora-search.
> > >>
> > >> While the POC shows that this is a viable solution, I don't think that
> > >> we should be proceeding that way, for the simple reason that this adds
> > >> yet another code base to maintain. I think we should use this
> > >> opportunity to consider using Elasticsearch instead of maintaining our
> > >> own "search engine".
> > >>
> > >
> > > The main issues with getting elasticsearch working in the past were the following:
> > >
> > > 1. The number of systems needed to make it work. There is a large
> > > difference from their 'proof-of-concept see how great this is' to 'ok
> > > you want to do anything with load' setups in everything from storage
> > > to number of search nodes to network speeds. [The amount of hardware
> > > for the data we have was to start with 5-8 dedicated Dell systems,
> > > some amount of shared fast storage, and N virtual machines with a
> > > 10-40GB backbone.. or throwing all of Fedora Infrastructure at once
> > > into the cloud.. because feeding it from PHX2 to the cloud is
> > > expensive.]
> > >
> > > 2. Packaging of elasticsearch was a mess. At the time we had rules
> > > that all packages needed to be packaged in Fedora and follow Fedora
> > > packaging rules. [This one has been relaxed.]
> > >
> > > 3. Running of elasticsearch was a large service in itself. It doesn't
> > > take care of itself and we would need one or more people who know it
> > > well to keep it running. [This goes down the ladder.. the logstash
> > > backends are also full services.. ] Most of that was written in Java
> > > which no one on the team at the time had good experiences with.
> > >
> > > 4. A kibana/elasticsearch query expert. Just like any database, most
> > > of the queries you can make are the worst kind. They will take a lot
> > > more CPU/memory/time than they should, making just grepping for data
> > > faster.
> > >
> > > However, that was 3-5 years ago.. so a lot has changed since then.
> > >
> > >
> > >> I think that Elasticsearch offers quite a few advantages:
> > >> - Powerful Query language
> > >> - Python bindings
> > >> - Javascript bindings
> > >> - Can be deployed in our infrastructure or used as a service
> > >> - Can be useful for other applications ( docs.fp.o, pagure, ??)
> > >>
> > >> So what is the general feeling about using Elasticsearch in our
> > >> infrastructure? Should we look at deploying a cluster in our infra, or
> > >> should we approach the Council to see if we can get funding to have
> > >> this service hosted by Elastic?
> > >>
> > >> Thanks
> > >> Clément
> > >>
> > >> [0] - https://apps.fedoraproject.org/packages/
> > >> [1] - http://www.turbogears.org/
> > >> [2] - https://mokshaproject.github.io/mokshaproject.net/
> > >> [3] - https://github.com/fedora-infra/fedora-search
> > >> _______________________________________________
> > >> infrastructure mailing list -- infrastructure(a)lists.fedoraproject.org
> > >> To unsubscribe send an email to infrastructure-leave(a)lists.fedoraproject.org
> > >> Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html
> > >> List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
> > >> List Archives: https://lists.fedoraproject.org/archives/list/infrastructure@lists.fedora...
> > >
> > >
> > >
>
>
>
> --
> Stephen J Smoogen.