Shout out to my fellow Flocker, Matt...

-------- Original Message --------
Subject: RFC: Proposal for a more agile "Fedora.next" (draft of my Flock talk)
Date: Mon, 22 Jul 2013 09:38:54 -0400
From: Matthew Miller <mattdm@fedoraproject.org>
Reply-To: Development discussions related to Fedora <devel@lists.fedoraproject.org>
To: Fedora Development List <devel@lists.fedoraproject.org>

<snip>

Obviously, no-bundled-libs is a crucial part of the packaging guidelines
today. As a sysadmin, I know why it's important. This is not just a noble
goal, but also something that pragmatically makes systems better. But, it's
also keeping us from having software that people really use in Fedora. Chef
and Hadoop are two big examples. This hurts us more than it helps the world.
So, in some areas, we need a different approach.


The Big Data SIG is trying to adapt Hadoop 2.x for inclusion in Fedora for F20, and I'll be sharing our insights on this at Flock in a couple of weeks. In Matt's conceptual architecture I suppose Hadoop Common would live somewhere in the Ring 2-to-3 orbit. It is a core in its own right (it provides a distributed, replicated file system) in that an ever-growing software ecosystem has emerged around it, and the SIG would like Fedora to be the OS of choice for that ecosystem: stable enough for deployment, but also a feature-rich, current and productive environment for the developers in that evolving ecosystem. The Hadoop runtime is an orchestration of JVM-based daemons which can be viewed as system-level services, and thus it is an obvious candidate for well-defined integration with Fedora via packaging: correct permissions, systemd unit files, logs, etc.

However, at the root of that core is a set of older and deprecated Java dependencies (e.g., Jetty 6, Tomcat 5.5) which are expressed via the Apache Maven build tool. The "quick and dirty" label that another poster applied to a VERY popular build tool like Maven does it a disservice. The fact is that it is exceedingly popular in the Java development community and has been for some time. Anyway, the challenge for this project is reconciling its stable dependencies with the ever-changing bleeding edge that is typically found in the latest Fedora release. A lot of our effort so far has gone into the various API and build specification changes necessary to try to make Hadoop fit into Fedora.

So far, so good...sort of. We can make the basic use case and tests work with the modified dependencies, but in doing so we risk giving up parity with the Apache baseline (including the JRE) and potentially losing out to other so-called "dirty RPMs". Ideally, we wouldn't be forced into some of these adaptations and compromises if there were Fedora packaging alternatives that would give us (a SIG ring?) more control over the bundles needed by Hadoop, as opposed to the ones mandated by the latest Fedora release. Make no mistake: patches are fed from the SIG to the Hadoop community to try to bump the versions there. But the upstream project can't and won't chase an ever-vanishing point in the distance. They view their lower-level dependencies much like a stable OS such as RHEL, where any change should be deliberate.

I feel like Matt has at least kick-started the discussion around how Fedora could evolve to support orthogonal dependency models that more readily adapt to external projects like Hadoop. Not that our SIG has any profound answers. :-)

Thus, we are very interested in any packaging architecture proposals that could help relieve our initiative's pain points, and look forward to further constructive discussion of the same.

My $0.02,
\Pete


-- 
Peter MacKinnon
MRG Grid/Big Data
Red Hat Inc.
Raleigh, NC