On 27 February 2018 at 06:34, José Abílio Matos <jaomatos@gmail.com> wrote:

On Wednesday, 21 February 2018 03.10.31 WET Max Pyziur wrote:

> ggplot2, tibble, tidyr, dplyr. They seem to be popular and becoming more
> integral to R.
>
> As for the point about 435 on Ubuntu vs the ~140 on Fedora: I assume those
> 435 are reflective of popularity, frequency of usage, and maintenance. It
> would be ridiculous to put all 6,000 CRAN packages into the Fedora
> ecosystem.
>
> But consider perl and the number of packages that have been rpm'd, even
> though some are close to stale. The benefit of having a package is that it
> is built with the whole distro in mind.



This has pros and cons.  Pro is that you can install packages without
having to find and install supporting libraries.  Con is that newer
packages may require versions of supporting packages that are
newer than the version in the distro.  

> If you install packages local to a user, then they might not/probably are
> not available to other users (but to engage in self argument: how many
> other "users" have access to your own systems - desktop & laptop?)



At work we have an operational system with multiple users.  The current
version uses a bare C++ library, but the same library is wrapped by
an R package which is far easier to use than a system where every
change needs a C++ compiler.  The operational system uses a
number of very large 3rd party applications, including R, each expecting
specific versions of libraries.   Getting all these applications to play
nicely is difficult.  A few years ago, many of the potential conflicts
were avoided by static linking.  The recent trend in which every application
"calls home"  combined with transitions to https has caused many
developers to abandon static linking, so there are now more
conflicts than ever before.  The conflicts are not just differing versions
of libraries: many libraries (e.g., the proj4 projection library) also
provide data files.  Some applications ship non-standard versions of these
data files and expect users to set an environment variable that
points the library to the "special" versions of the data files.  This
breaks all the applications that were developed to work with the
standard data files.
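
One way to limit the damage is to set the variable only for the one
application that needs it, rather than in the login environment.  A minimal
Python sketch, assuming the variable in question is proj4's PROJ_LIB and
using a made-up vendor path; the helper name run_with_private_env is
hypothetical:

```python
import os
import subprocess
import sys


def run_with_private_env(cmd, overrides):
    """Run cmd with extra environment variables that only the child sees."""
    env = os.environ.copy()      # start from the caller's environment
    env.update(overrides)        # apply the application-specific settings
    return subprocess.run(cmd, env=env, capture_output=True,
                          text=True, check=True)


if __name__ == "__main__":
    # Hypothetical location of an application's "special" proj data files.
    special = {"PROJ_LIB": "/opt/vendor_app/share/proj"}
    # Stand-in for the real application: a child that reports what it sees.
    child = [sys.executable, "-c",
             "import os; print(os.environ.get('PROJ_LIB'))"]
    print("child sees: ", run_with_private_env(child, special).stdout.strip())
    print("parent sees:", os.environ.get("PROJ_LIB"))
```

The parent's environment is left untouched, so applications built against
the standard data files keep working when started from the same session.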

There are people looking to address some of these issues using
lightweight virtualization, but at present those techniques are
limited to ad-hoc projects and testing.

 

> Sure, there is little challenge to installing R packages using
> install.packages("SomePackageName"); my concern is more for the sake of
> consistency: if perl, python, php, etc., have their modules/function
> libraries built for Fedora, why not R?
>
> Curious, not kvetching,
>
> MP

 

BTW I think that CRAN now has over 10 000 packages so even 435 is less than 5%.

 

I maintain some of those packages in Fedora, and from practical experience I 

can tell you that one of the problems when packaging a new R package into 

Fedora is that every time you start you have to unravel all the dependencies. 

Sometimes you need to go 5 levels deep, with a net result of ~40-50 new 

packages that need to be added before adding the package you are interested in.
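
The unravelling described above amounts to a breadth-first walk over the
dependency graph.  A minimal Python sketch with made-up package names (the
graph, the "already packaged" set, and the function name
missing_dependencies are all assumptions for illustration, not real CRAN or
Fedora data):

```python
from collections import deque


def missing_dependencies(package, depends_on, already_packaged):
    """Map each not-yet-packaged transitive dependency to its depth."""
    depth = {}
    queue = deque([(package, 0)])
    while queue:
        name, level = queue.popleft()
        for dep in depends_on.get(name, []):
            # Skip the target itself, anything already in the distro
            # (its own dependencies are assumed to be packaged too),
            # and anything already scheduled.
            if dep == package or dep in already_packaged or dep in depth:
                continue
            depth[dep] = level + 1
            queue.append((dep, level + 1))
    return depth


if __name__ == "__main__":
    # Hypothetical dependency graph: package -> direct dependencies.
    depends_on = {
        "target": ["a", "b"],
        "a": ["c"],
        "b": ["c", "base"],
        "c": ["d"],
    }
    todo = missing_dependencies("target", depends_on, {"base"})
    for name, level in sorted(todo.items(), key=lambda item: item[1]):
        print(f"level {level}: {name}")
    print(len(todo), "new packages needed before 'target'")
```

With a real graph, the size of the returned map is the number of packages
that must be added first, and the largest depth is how many levels down
the unravelling goes.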

 


How often does a new R package require a version or configuration
of a library that differs from what the distro provides?  Libraries like hdf5 and gdal
have many configuration options.  In my experience, distro versions often omit
options needed for my work (remote sensing).  Too often you can easily install
an R package, only to discover that it fails for your "use case" and has to be
rebuilt using non-distro libraries.
 

Recently, some of the packages required by the packages you referred to above

are starting to show up in Fedora. But that is a process that takes time and requires energy.

 

I presented a talk at useR 2008:

https://jamatos.fedorapeople.org/talk-user2008.pdf

 

FWIW this problem is not specific to R; the same happens for python. There are 

packages where unraveling new dependencies is also a problem.


The talk mentions some of the issues with "suggested" packages.  Of course,
the first question is whether all the "suggested" packages can be built using
R's install.packages().  Creating distro packages is labor intensive.  Packages that
are very widely used should be in distro packages, but so should packages
that are less widely used but still popular and which require extra steps to
build (e.g., rely on 3rd party libraries so need dev packages outside the
normal "build-essential" lists, or which don't reliably detect installed distro
packages).  It would be nice to have such packages identified on CRAN so
package developers can work on the issues.

Many large organizations provide managed platforms for large applications.  
These platforms get distro packages, and are generally expected to have a 5-year
lifetime.  After year 2 of the 5, configuring large apps generally means installing
R packages, as well as updated or differently configured versions of key libraries,
from source.   As a result, large applications are being designed to provide
a full set of libraries and tools, including R, from the start.  The Anaconda Python
distributions are an example of this trend.   If this continues, distros used
for large applications will be installed with a minimal set of distro packages
and users will rely on applications to provide suitable libraries and tool
versions.

In my field (remote sensing) national space agencies (NASA, ESA, JAXA)
provide large packages to worldwide user communities.   Typically, the
applications are developed on the linux platform chosen by the
developing agency, so users may try to obtain the same platform.  In
practice, however, users may be constrained by their own organization's
standards or by a need to use packages from multiple agencies, and
may be forced to recompile a package before it can be used. 

It is interesting to compare linux to the other Unix (macOS).  With macOS
you get very few of the libraries and tools found on linux.   There are three
popular package systems (macports, fink, homebrew).  Any of these can
provide a consistent set of packages.  I have used macports, and sometimes
needed to create a local version of a package to get a configuration needed
by some use case. 

--
George N. White III