My goal is to queue up more tasks than I have cpus, and have #cpus of them run in parallel at a time.
So if I have 120 jobs to run, and 32 cores, I want to queue them all up and run 32 in parallel at a time.
Or maybe I need to set --ncpus=16 to schedule 16 parallel jobs instead of 32 (my scheduler is very simple and doesn't know about free memory).
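
For readers who want to see the idea concretely, here is a minimal sketch of "queue everything, run at most N at a time". It is not the scheduler discussed in this thread; the function name run_all and its ncpus parameter are made up for illustration, and it assumes each job is an argument list suitable for subprocess.run.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_all(commands, ncpus=None):
    """Run every command, with at most `ncpus` of them running at once.

    `commands` is a list of argument lists (hypothetical job format).
    Returns the exit codes in the same order as `commands`.
    """
    if ncpus is None:
        ncpus = os.cpu_count() or 1

    def run_one(cmd):
        # Each worker thread blocks on one child process, so the pool
        # size is what caps the number of jobs running in parallel.
        return subprocess.run(cmd).returncode

    with ThreadPoolExecutor(max_workers=ncpus) as pool:
        return list(pool.map(run_one, commands))
```

With 120 jobs and run_all(jobs, ncpus=16), all 120 are queued up front and a new one starts whenever one of the 16 running jobs exits. Like the scheduler described above, this sketch knows nothing about free memory.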

On Fri, Feb 4, 2022 at 10:24 AM John Mellor <john.mellor@gmail.com> wrote:
On 2022-02-04 09:04, Neal Becker wrote:
After this discussion, I needed a simple batch scheduling system.  I tried installing and starting condor on F35.  Never saw so many selinux problems.  Couldn't dnf remove it fast enough.

After a bit of searching, I found the system I wrote 11 years ago.  

I just finished updating it for py3, along with a few more tweaks.

It's a very simple system that runs on the local host and allows you to submit jobs.  It will schedule them up to the #cpus.  There are commands to list the queue, kill jobs, suspend and continue them.  Does just what I need.
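
As a rough illustration of how suspend, continue and kill commands can work on a local host, the sketch below uses ordinary POSIX signals on a child process. It is an assumption about one possible approach, not a description of how the system above is implemented; the helper names suspend, resume and kill are hypothetical.

```python
import os
import signal
import subprocess


def suspend(proc):
    """Pause a running job. SIGSTOP cannot be caught or ignored."""
    os.kill(proc.pid, signal.SIGSTOP)


def resume(proc):
    """Let a suspended job continue where it left off."""
    os.kill(proc.pid, signal.SIGCONT)


def kill(proc):
    """Ask the job to terminate (SIGTERM) and return its exit status."""
    proc.terminate()
    return proc.wait()
```

Here proc is a subprocess.Popen object for the job. This only works on POSIX systems, and it signals just the one process, not any children the job may have started.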

If it helps you too, that'd be great.

On Wed, Jan 26, 2022 at 2:04 PM Fred Erickson <fredferickson@gmail.com> wrote:
On Wed, 26 Jan 2022 12:59:23 -0500
Neal Becker <ndbecker2@gmail.com> wrote:

> I've needed this over the years but all the ones I've seen appeared
> much too complex for my simple use case.  I ended up writing my own
> using pyxmlrpc.  Unfortunately haven't used it for years and don't
> know if I could find it again (was uploaded to pypi at one time).
>
> Are any of these batch systems simple to install, use, and maintain?
>

I see batch was included with my f34 system and condor is provided in
the updates repo.


I'm confused.  Why do you feel the need for an overly-complicated job scheduler, emulating the mainframe scheduling mess?  Linux is a unix-like system.  Why don't you simply use the already-installed batch command?  It can easily handle thousands of simultaneously-scheduled batch jobs without bringing the system to its knees.

Unless your jobs are 100% cpu-bound, scheduling jobs by the number of cpus seems just wrong, leaving a lot of unused cpu cycles on the table.  If your jobs are i/o-bound, then disk or network load seems like it should be taken into account, not cpus.

Luckily, the overall system load is always calculated for you with no complicated mechanisms required. By default, the batch command will not start a job while the load average is above 1.5, so that you do not impact foreground processes very much.  You can also renice your batch job as required to reduce its impact while it runs even further.
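
To make the load-based approach concrete, here is a small sketch of gating a job start on the 1-minute load average. It only illustrates the idea and is not how batch or atd is implemented; the function name wait_for_load, the 60-second polling interval, and the injectable getloadavg and sleep parameters are assumptions made for the example.

```python
import os
import time


def wait_for_load(threshold=1.5, poll_seconds=60.0,
                  getloadavg=os.getloadavg, sleep=time.sleep):
    """Block until the 1-minute load average drops below `threshold`.

    Returns the load average that allowed the job to start.
    """
    while True:
        one_minute, _five, _fifteen = getloadavg()
        if one_minute < threshold:
            return one_minute
        # System is still busy: wait a while and look again.
        sleep(poll_seconds)
```

A caller would run wait_for_load() and then launch the job. Note that this gates only the start of a job; it does nothing about load once the job is running.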

If what you really need is a CI/CD system, then use the correct tools for the job.  Batch is generally not considered to be one of them.  Install Jenkins or CircleCI or any of a dozen tools that are built to do this right.

--

John Mellor

_______________________________________________
users mailing list -- users@lists.fedoraproject.org
To unsubscribe send an email to users-leave@lists.fedoraproject.org
Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: https://lists.fedoraproject.org/archives/list/users@lists.fedoraproject.org
Do not reply to spam on the list, report it: https://pagure.io/fedora-infrastructure


--
Those who don't understand recursion are doomed to repeat it