
So, for context, my experience is limited to trying to get a MariaDB Galera cluster running, specifically using the Bitnami image. So my issues might not apply to every single stateful app out there. I'm also running all of this on vSphere in our own data center, not in the cloud.

Swarm does not support dependencies between services. See [0]. It also does not support deploying replicas one at a time. See [1] where I'm asking for that support.

In the case of Galera, you need a master node to be up and running before you add any new nodes. I'm pretty sure that when you're initializing any kind of stateful clustered app, you'd want to start one node at a time to be safe. You can't do that in Swarm using a replicated service: all replicas start at the same time.

Using a service per instance might work, but you need to be sure you have storage figured out so that when you update your stack to add a new service, the initial service comes back up with the data it was initialized with. (When you redeploy a stack to add a new service, the existing services get restarted too. If I'm remembering what I found correctly.)
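For reference, a minimal sketch of the service-per-instance layout I mean, one service and one volume per Galera node (the Bitnami env var names here are from their docs as I remember them; treat this as an untested sketch, not a working config):

```yaml
services:
  galera-1:
    image: bitnami/mariadb-galera:10.11
    environment:
      # The first service bootstraps the cluster
      - MARIADB_GALERA_CLUSTER_BOOTSTRAP=yes
    volumes:
      - galera-1-data:/bitnami/mariadb
  galera-2:
    image: bitnami/mariadb-galera:10.11
    environment:
      # Later services join via the first one
      - MARIADB_GALERA_CLUSTER_ADDRESS=gcomm://galera-1,galera-2
    volumes:
      - galera-2-data:/bitnami/mariadb

volumes:
  galera-1-data:
  galera-2-data:
```

The catch described above still applies: adding galera-3 later means redeploying the stack, which touches the existing services too.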

Then there's updating services/replicas. You cannot have Swarm kill a service/replica until after the replacement is actually up and running. Which means you'll need to create a new volume every time you need to upgrade, otherwise you'll end up with two instances of your app using the same data.

To complicate things, as far as I can tell, Swarm doesn't yet support CSI plugins. So you're pretty much stuck with local or NFS storage. If you're using local storage when deploying new replicas/services, you'd better hope the first replica/service starts up on the same node it was on before...

All that combined means I haven't figured out how I can run a Galera cluster on Swarm. Even if I use a service per instance, updates are going to fail unless I do some deep customization on the Galera image I'm using to make it use unique data directories per startup. Even if I succeed in that, I'll still have to figure out how to clean out old data... I mean, I could manually add a new volume and service, then manually remove the old volume and service for each instance of Galera I'm running. But at that point, why bother with containers?

Anyway, I'm pretty sure I've done my research and am correct on all of this, but I'd be happy to be proven wrong. Swarm/Portainer/Traefik is a really really nice stack...

[0] https://github.com/moby/moby/issues/31333 [1] https://github.com/moby/moby/issues/43937



If you are interested in making this work within these constraints, I am sure there is a way to work around all of these issues.

About [0]/[1]: I guess you are right that this doesn't work out of the box, but it could possibly be worked around with a custom entrypoint that behaves differently depending on which slot the task is running in.
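Something like this is what I have in mind: Swarm expands service templates like `{{.Task.Slot}}` in environment values, so a wrapper entrypoint can branch on the replica's slot number (untested sketch; `TASK_SLOT` is a name I made up, and the Bitnami-specific handoff would need checking):

```shell
#!/bin/sh
# Sketch of a wrapper entrypoint. The service definition would set:
#   environment:
#     - TASK_SLOT={{.Task.Slot}}
# which Swarm templates to the replica's slot number (1, 2, ...).

TASK_SLOT="${TASK_SLOT:-1}"

if [ "$TASK_SLOT" = "1" ]; then
  # Slot 1 acts as the bootstrap node
  export MARIADB_GALERA_CLUSTER_BOOTSTRAP=yes
  echo "slot $TASK_SLOT: bootstrapping cluster"
else
  export MARIADB_GALERA_CLUSTER_BOOTSTRAP=no
  echo "slot $TASK_SLOT: joining existing cluster"
fi

# Then hand off to the image's real entrypoint, e.g.:
# exec /opt/bitnami/scripts/mariadb-galera/entrypoint.sh "$@"
```

Whether the Bitnami image tolerates the non-slot-1 replicas starting before slot 1 is healthy is a separate question, of course.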

> (Since when you restart a stack to add the new service, the old service will also get restarted. If I'm remembering what I found correctly.)

Are you sure the Docker Image digest did not change? Have you tried pinning an actual Docker Image digest?
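For clarity, what I mean by pinning is referencing the image by digest in the compose file, so a rebuild pushed under the same tag can't change what the service resolves to (the digest below is a placeholder; `docker images --digests` shows the real one):

```yaml
services:
  galera:
    # With @sha256:... present, the tag is informational only
    image: bitnami/mariadb-galera:10.11@sha256:<digest-goes-here>
```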

> Then there's updating services/replicas. You cannot have Swarm kill a service/replica until after the replacement is actually up and running. Which means you'll need to create a new volume every time you need to upgrade, otherwise you'll end up with two instances of your app using the same data.

Is this true even with "oder: stop-first"?

> To complicate things, as far as I can tell, Swarm doesn't yet support CSI plugins. So you're pretty much stuck with local or nfs storage. If you're using local storage when deploying new replicas/services, you better hope the first replica/service starts up on same node it was on before...

True, but there are still some volume plugins that work around this, and local storage should work if you use labels to pin the replicas to nodes.


Finally have time to look into your suggestions. Hopefully you check your comments every once in a while...

> Are you sure the Docker Image digest did not change? Have you tried pinning an actual Docker Image digest?

Mostly sure. Many of my tests only changed the docker-compose file, not the actual image. So even though GitLab was rebuilding the image, the image digest would not have changed. I'll try to find time to pin the digest just to double check, though.

> Is this true even with "oder: stop-first"?

Er, did you mean "order"? I only see `--update-order` as a flag on the `docker service update` command. I do not see it in the docker-compose specification. So far all my tests have been through Portainer's stack deployment feature. So all changes are in my docker-compose file.

Maybe it would just work if I stuck it in the deploy.update section? I'll try.

> True, but there are still some volume plugins that work around and local storage should work if you use labels to pin the replicas to nodes.

I have tried pinning specific services to specific nodes to make local storage work. And I've used labels to force only one replica per node when using replicas.
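For what it's worth, the pinning looks roughly like this in my compose files (the `galera` label name is just my own convention):

```yaml
services:
  galera-1:
    deploy:
      placement:
        constraints:
          # The node must be labelled first, e.g.:
          #   docker node update --label-add galera=node1 <node-name>
          - node.labels.galera == node1
```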

What volume plugins are you thinking of? I haven't found any that seem to be maintained, outside of local storage and NFS. And maybe some that would work if I were on some cloud host...

Anyway, thanks for giving me a couple things to try. :)


About the order: I mean the `order` in the deploy section:

```
deploy:
  mode: replicated
  replicas: 3
  update_config:
    order: stop-first
    parallelism: 1
  rollback_config:
    order: stop-first
    parallelism: 1
  restart_policy:
    condition: on-failure
```

For the volume plugins:

We are using Hetzner, and this one works great: https://github.com/costela/docker-volume-hetzner . Also, there exists one for glusterfs (https://github.com/chrisbecke/glusterfs-volume).


Thanks!

I also found the docs: https://docs.docker.com/compose/compose-file/deploy/#update_... Not sure how I missed that when I was looking at it before... :\

I'll look into those plugins.


I learned about a lot of things by watching videos by Bret Fisher. He has a lot of good resources on running Docker Swarm.



