Cron works great when you don’t need to guarantee execution, e.g., when it’s acceptable for a run to be lost if a server goes down. Unfortunately, all the alternatives are pretty heavyweight, e.g., Jenkins, Azkaban, Airflow. I’ve been working on a job scheduler that strives to work like a distributed cron. It needs very little code because it leans heavily on Postgres (for distributed locking, parsing time-interval expressions, configuration storage, and log storage) and PostgREST (for the HTTP API). The application binary (~100 lines of Haskell) polls for new jobs, then checks out and executes tasks. The code is here if you’re interested:
It compiles to machine code, so deploying the binary is easy. That said, I’d like to add some tooling to simplify deploying and configuring Postgres and PostgREST.
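The “checks out tasks” step can be sketched as a single atomic state flip. This is a hedged illustration, not the project’s actual schema — the real thing presumably uses Postgres primitives such as advisory locks or SELECT ... FOR UPDATE SKIP LOCKED — with sqlite3 standing in only to keep the sketch self-contained and runnable:

```python
import sqlite3

# Minimal sketch of "check out a task" (hypothetical schema, sqlite3 as a
# stand-in for Postgres): an atomic UPDATE means only one worker wins.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, state TEXT)")
conn.execute("INSERT INTO jobs VALUES (1, 'queued')")
conn.commit()

def claim(job_id):
    # The WHERE clause only matches while the job is still queued, so a
    # second competing worker's UPDATE touches zero rows and the claim fails.
    cur = conn.execute(
        "UPDATE jobs SET state = 'running' "
        "WHERE id = ? AND state = 'queued'", (job_id,))
    conn.commit()
    return cur.rowcount == 1

print(claim(1), claim(1))  # first caller wins the job, second does not
```

In Postgres the same shape works across machines, because the UPDATE (or a SKIP LOCKED select) serializes the competing workers for you.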
Other toolsets include Paul Jarc's runwhen (http://code.dogmap.org/runwhen/) which is designed for first-class services individually scheduling themselves.
To be fair, all of these are node-local, not distributed, which is what the parent was talking about. (Some of the ones you mention are also very old and unmaintained.)
For a distributed cron, you need a more sophisticated, transactional system that can deal with node failures.
There are many cron replacements, but they generally don't tackle the main problems with the original cron, such as the inability to retry, queue, prevent overlapping jobs, enforce timeouts, report errors and metrics in a machine-readable way (spewing out emails is not a good solution), etc.
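To make those gaps concrete, here is a hedged, stdlib-only sketch (not any existing tool’s API) of three of the missing pieces — retry, a hard timeout, and a machine-readable result instead of an email:

```python
import subprocess

def run_job(cmd, retries=2, timeout=30):
    """Run cmd, retrying on failure, killing it on timeout, and returning
    a structured result instead of spewing out mail. Illustrative only."""
    for attempt in range(1, retries + 2):
        try:
            proc = subprocess.run(cmd, capture_output=True, timeout=timeout)
            if proc.returncode == 0:
                return {"status": "ok", "attempts": attempt}
        except subprocess.TimeoutExpired:
            pass  # treat a timeout like any other failure and retry
    return {"status": "failed", "attempts": attempt}

print(run_job(["true"]))
print(run_job(["false"], retries=1))
```

None of this is hard to write once; the complaint is that cron makes every user write it again.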
I think what you're describing may be a batch scheduling system, such as PBS or LSF.
Those seem to be rarer, since the legitimate use cases for a distributed system (what used to be called "grid computing") were rare.
Nowadays, I assume everyone just uses Yarn on Hadoop once they decide that scaling "up" ends at a mid-range 2U server. I don't know how good its actual time-of-day/calendar-based scheduling is, though.
I disagree. If you really look at the problem space, it turns out that "classic cron" is just a "batch scheduling system" that is poorly implemented.
For example: Want to run a backup every night? With cron, you run into several issues: A backup job could fail; how do you recover? A backup job could run (due to sudden latencies) for an unexpectedly long time, overlapping with the next scheduled time; how do you prevent "dogpiling"? How do you record the complete log about when each job ran, what it output, and whether it was successful or not? And so on. Or for that matter: What if the box that is supposed to schedule this job goes down?
These are fundamental systems operations tasks that you want Unix to solve. Unfortunately, cron leaves all the actual hard challenges unsolved; it's fundamentally just a forker. Cron, then, isn't really useful for much except extremely mundane node-local tasks such as cleaning up $TMP. I can think of relatively few tasks in a modern environment that can use cron without running into its deficiencies.
This means that, for example, backups tend to be handled by an overlapping system that actually has these things built in. This is a shame, because the Unix philosophy wisely encourages us to separate out concerns into complementary tools that fit together. Instead of a rich, modular, failure-tolerant system that only knows how to execute jobs, not what the jobs are, you get various monolithic tools that build all of the logic into themselves.
Not "everyone just uses Yarn" at all. For the projects I work on, we use distributed cronjobs on Kubernetes, which solve pretty much all of the problems with cron. For many people, both Hadoop and Kubernetes are overkill, though, yet they would benefit from a resilient batch scheduler.
> I disagree. If you really look at the problem space, it turns out that "classic cron" is just a "batch scheduling system" that is poorly implemented.
I'm a bit confused. It still sounds like you're describing a batch scheduling system and that cron isn't one (because it only implements a narrow function of such a system).
What features does PBS (not cron) lack that any of the re-inventions do have?
> Unfortunately, cron leaves all the actual hard challenges unsolved
> the Unix philosophy wisely encourages us to separate out concerns into complementary tools that fit together.
I'm also having trouble reconciling these two positions. Cron does one thing, which is forking on a schedule. To leave something like "dogpiling" for another utility (e.g. dotlockfile) to solve seems consistent with the Unix philosophy.
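For what it's worth, the dotlockfile-style guard really is small. A hedged sketch of the same idea using an advisory flock (the lock path is illustrative, not a convention):

```python
import fcntl
import os

def try_lock(path):
    """Take a non-blocking exclusive lock on path; return the fd on
    success, or None if another run already holds it (the dogpile case)."""
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return fd
    except BlockingIOError:
        os.close(fd)
        return None

# flock is tied to the open file description, so even two opens in one
# process conflict -- which is what lets us demonstrate it here.
first = try_lock("/tmp/demo-job.lock")
second = try_lock("/tmp/demo-job.lock")
print(first is not None, second is None)
```

The lock is released automatically when the process exits, so a crashed job can't wedge the schedule the way a stale dotfile can.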
> Not "everyone just uses Yarn" at all. For the projects I work on, we use distributed cronjobs on Kubernetes
That doesn't quite refute my point, as you can feel free to consider "Yarn" as merely a metaphor for the currently most-popular inbuilt scheduler of a currently popular distributed computing platform.
Thanks for sharing these. In my case, I needed a system capable of sending payments at precise intervals, ensuring that a single payment was sent exactly once, despite the job being shared by a group of servers (for redundancy and high availability). Postgres provides the central locking, so tasks can be handed out to the pool of workers, while also supporting failover by streaming to a warm-standby replica.
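The exactly-once guarantee can ride on a uniqueness constraint rather than on the workers being careful. A hedged stand-in (sqlite3 instead of Postgres, hypothetical table and column names), where Postgres's INSERT ... ON CONFLICT DO NOTHING plays the same role:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One row per payment intent; the PRIMARY KEY is what enforces
# "exactly once" even if several redundant workers race.
conn.execute("CREATE TABLE payments (intent_id TEXT PRIMARY KEY, cents INTEGER)")

def record_payment(intent_id, cents):
    # Postgres equivalent: INSERT ... ON CONFLICT (intent_id) DO NOTHING.
    cur = conn.execute(
        "INSERT OR IGNORE INTO payments VALUES (?, ?)", (intent_id, cents))
    conn.commit()
    return cur.rowcount == 1  # True only for the worker that should send it

print(record_payment("inv-42", 1000), record_payment("inv-42", 1000))
```

Each worker attempts the insert first and only sends the payment if its insert landed; the database, not the pool, decides the winner.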
When I was running both at the same time about five years ago, yeah, I personally found Jenkins to be more operationally expensive than a small postgresql instance. It required more memory, more CPU, and more time to set up and keep running as desired. Maybe things have improved or maybe it's just a YMMV thing, but yeah, I would have said the same thing as your parent without thinking twice.
Edit to add: oh yeah, I just read another comment that reminded me that it also ate up lots of disk space as well.
As someone who has administered both, I’d rather manage 10 Postgres instances than one Jenkins box. No question.
Edit: I should expound. Jenkins seems like it has a lot of clunky moving parts. It all works, and I’d rather use it than anything else, but it’s kind of like IKEA furniture: you use it because you have to, not necessarily because you want to.
It’s also incredibly difficult to automate. I can configure Postgres with a config file or two and easily use Ansible to get the exact same instance every time. Jenkins has to be dragged into Automation Alley kicking and screaming. I partially blame this on the fact that Jenkins has nontrivial amounts of configuration that’s done via GUI. I approach a long-running Jenkins instance with the same fear and dread I approach a Windows box that hasn’t been restarted in six months. I.e. the box is now a snowflake and trying to make it reproducible and automated is going to be a bad time.
I could go on, but as a devops critter, Postgres wins every time.
Jenkins is probably 100 times easier to deploy, configure, operate, and scale than Postgres+whatever, which is a horrifying statement, but still true. If what you have to do is schedule and run arbitrary jobs on any kind of machine, it's no contest. They aren't even related. Postgres is a relational database, and Jenkins is a single Java process that stores flat files on local disk, connects to remote nodes with SSH, and has a thousand plugins.
It's like comparing a missile with an airplane. One gets where it needs to go faster and more efficiently, and the other one transports people.
Two Ansible roles by the same author, supporting both CentOS and Ubuntu. Not hugely different in complexity, IMO. Installing Postgres on FreeBSD, though, is little more than:
pkg install postgresql10-server
sysrc postgresql_enable="YES"
service postgresql initdb
service postgresql start
I get that they install differently, but that's not the point. The point is that they do very different things, in different ways. Postgres is not a Jenkins replacement. Postgres is storage and querying. Its equivalent on Jenkins is XML files.
Jenkins is one of those things that looks good until you trip over its numerous bugs, such as the leaky implementation that hoses itself once a week, or the several million inodes it decides to gobble up, which makes backups a chore, etc.
Sure it works but from an admin perspective it’s horrid.
Is this really more horrid than having to hire a DBA to maintain, operate and upgrade the cron replacement? I can write a script to identify and remediate bugs in apps. I can't so easily write a script to diagnose badly used database apps and get the developers to rewrite their queries to not blow up the database servers. It's much easier to set up, lock down, and maintain Jenkins than Postgres+whatever, imo. But I guess everything can go to hell depending on how it's used.
>Is this really more horrid than having to hire a DBA to maintain, operate and upgrade the cron replacement?
Only you don't. I'm not sure why you seem afraid of Postgres, but millions of people use it, across tons of companies and even for personal projects, and it's trivial to setup and run. Oh, and all those plugins and stuff you mention? You don't have to use them if you don't need them; they don't even enter the picture.
Because I've managed database applications before?
> and it's trivial to setup and run
Jenkins is more trivial. It's one process. It doesn't depend on an external high-availability networked data service. Backup is 'cp -r SRC DEST', or a plugin if you're fancy. And, again, Postgres does not replace Jenkins, it's just the storage and querying. It adds a lot of complexity and service availability points of failure that Jenkins does not have.
Last I checked, the plugin to do backup didn't work.
Copying the Jenkins directory can be done, but restoring it on another fresh install will not work. There are many files that need to be deleted manually before the instance can start without error.
> The filesystem is a shitty database. Thought we’d all learned that by now.
How dare you :) "The" filesystem is a great database... for certain applications. Big binary blobs of video store, and even index, particularly well!
I think what the collective "we" haven't learned is to avoid trying to think about scale intuitively (rather than "doing the math") and to avoid extrapolating from the trivial scenario.
It's why "latency numbers every programmer should know" is still a thing.
It’s ok for the narrow set of use cases that it is good at. Outside of that it needs something that has some intelligence and ability to read optimise it.
Heavily locked stuff, lots of small things, huge number of locatable data entries, not so much. Which is Jenkins.
As for scale: incidentally, I worked on a very old filesystem-based store back when we had spindles to contend with. We had four racks of Sun disk array trays. The only way it performed well was keeping it to 2Gb spindles. I’m well aware of scale issues on file systems, perhaps more so than the Jenkins developers. We had to scale that to 4000 concurrent users.
> It’s ok for the narrow set of use cases that it is good at
My comment was partly in jest, and mostly hoping to spur conversation, not as a true disagreement with or criticism of your comment.
Still, saying "It's OK" for those narrow use cases may be too dismissive, even if "great" is an exaggeration. There are plenty of examples where DBMSes (especially relational ones) have fared poorly in comparison.
> Outside of that it needs something that has some intelligence
I fear I'm missing your point here. Certainly "the filesystem", as in the Unix syscall interface to a hierarchical arrangement of files, lacks intelligence, but that doesn't mean the specific underlying implementation must.
We've also come a long way from everything being the Berkeley Fast Filesystem; there are now many choices of underlying filesystem, including CoW ones like ZFS and Btrfs.
> and ability to read optimise it.
I assume by read optimization you don't just mean something like the buffer cache, but a user-specified index?
> Heavily locked stuff, lots of small things, huge number of locatable data entries, not so much. Which is Jenkins.
Does Jenkins use external locking? It would be odd in light of some of the comments elsewhere in the thread touting its advantage of being a single process. Of course, even if it's using only locking internal to itself, there's a good argument that its authors needlessly re-invented a DBMS (which we've seen happen when other niche-use databases get used for broader purposes).
I'm not sure a large number of tiny files is inherently problematic for a filesystem, merely problematic for existing implementations, and some are better at it than others. What about something like libferris (assuming perfectly spherical cows and ignoring the performance implications of FUSE for a moment), which can back a filesystem with an arbitrary database?
IOW, is Jenkins-using-the-filesystem an Ops optimization/tuning problem, or is it a more fundamental problem that can only be addressed with modifying its code?
> The only way it performed well was keeping it to 2Gb spindles.
I worked with the aforementioned hardware extensively, early in my career, but at a company that wrote a data warehousing (aka OLAP) RDBMS.
I'm reasonably confident in saying that your performance observations have almost nothing to do with the filesystem itself and everything to do with I/O performance in general. Large numbers of smaller spindles were absolutely required for good database performance and scalability.
> I’m well aware of scale issues on file systems
I didn't mean to suggest you didn't, rather the opposite, as "the collective we" was a euphemism meant to imply "everyone but us".
Anyway, my overall point is that you and I may be acutely aware of the real, practical problems with scaling filesystems, but we're rare. Since there's nothing fundamental/theoretical that makes the filesystem an obviously poor choice at modest scale (the definition of which increases as computing power increases), the lesson does not get learned by everyone.
Instead, because truly large scale becomes rarer and rarer as computers become more powerful (CPU more than I/O, of course, but then.. SSDs), every time the lesson is re-learned by an individual/company, they think it's a new, or at least unique problem, and we end up with a re-invention of Portable Batch System (a fairer characterization than re-invention of cron, IMO).
For me, operating means running with no single point of failure, with monitoring, and upgrading with very little or no outage. Running Postgres with HA is hell, and we are moving away from it to a cloud-managed (Postgres) database because it's one of the most complicated and error-prone parts of our infrastructure. Jenkins sucks because it needs GUI interaction on install (or at least it did years ago).
https://github.com/finix-payments/jobs