• 1 Post
  • 24 Comments
Joined 8 months ago
cake
Cake day: November 15th, 2023

help-circle

  • Of course the problem is solved, but that doesn’t mean that the solution is easy. Also, distributed protocols still need to work on top of a complicated network and with real-life constraints in terms of performances (to list a few). A bug, misconfiguration, oversight and you have a problem.

    Just to make an example, I remember a Kafka cluster with 5 replicas completely shitting its pants for 6h to rebalance data during a planned maintenance where one node was brought offline. It caused one of the longest outages to date with the websites which relied on it offline. Was it our fault? Was it a misconfiguration? A bug? It doesn’t matter, it’s a complex system which was implemented and probably something was missed.

    Technology is implemented by people, complexity increased the chances of mistakes, not sure this can be argued.

    Making it harder to identify SPOF means you might miss your SPOF, and that means having liabilities, and having anyway scenarios where your system can crash, in addition for paying quite a lot to build a resilience that you don’t achieve.

    A single instance with 2 failure scenarios (disk failure and network failure) - to make an example - is not more fragile than a distributed system with 20 failure scenarios. Failure scenarios and SPOF can have compensating controls and be mitigated successfully. A complex system where these can’t be fully identified can’t have compensating control and residual risk might be much harder. So yes, a single disk can fail more likely than 3 disks at once, but this doesn’t give the whole picture.



  • I wish it worked like that, but I donct think it does. Connecting clouds means introducing many complex problems. Data synchronization and avoiding split-brain scenarios, a network setup way more complex, stateful storage that needs to take into account all the quirks and peculiarities of all services across all clouds, service accounts and permissions that need to be granted and segregated for all of them, and way more. You may gain resilience in some areas, but you introduce a lot more things that can fail, be misconfigured or compromised.

    Plus, a complex setup makes it harder by definition to identify SPOFs, especially considering it’s very likely nobody in the workforce is going to be an expert in all the clouds in use.

    To keep using your simile of the disks, a single disk with a backup might be a better solution for many people, considering you otherwise might need a RAID controller that can fail and all the knowledge to handle and manage a RAID array properly, in addition to paying 4 or 5 times the storage. Obviously this is just to make a point, I don’t actually think that RAID 5 vs JBOD introduces comparable complexity compared to what multi-cloud architecture does to single-cloud.







  • Is that what you get with Cloud? Because there are still a million ways to shoot yourself in the foot. The main difference is that the single genius doesn’t need to implement things him/herself, but decisions still need to be taken and fragile setups can still be built.

    Imagine an ec2 instance in a satellite account performing some business critical function with an instance role, whose custom IAM policy allows to do it in another account. Clouds are not giving you good engineering, they are giving you premade building blocks, you can absolutely still make a mess with those. Even more, the complexity and the immense portfolio of features can allow very creative ways to build very low-quality systems.

    I think you can have good, boring, simple systems built by engineers. With or without Cloud services.




  • Thanks!

    But I have to think the massive costs of cloud junk also pay a role in stuff like a calendar charging double digit annual fees for something that takes very little storage and very little computation (and you of course can’t just buy software any more).

    Absolutely agree. I did not even think about this aspect, but I think you are absolutely spot on. Building something with huge costs is something that ultimately gets passed to the users in addition to the rent-seeking aspect.

    I have no words for multi-cloud.

    You and me both. I have to work with it and the reality is, there is nobody who actually understands the whole thing. The level of complexity (and fragility, I might add) of it all is astonishing. And all of this to mitigate some (honestly) low risk of downtime from the cloud provider. I have lobbied a little bit against at work, but ultimately it has become a marketing tool to sell to customers, so goodbye any hope of rational evaluation…








  • but that also shows that most modern software is poorly written

    Does it? I mean, this is especially annoying with old software, maybe dynamically linked or PHP, or stuff like that. Modern tools (go, rust) don’t actually even have this problem. Dependencies are annoying in general, I don’t think it’s a property of modern software.

    Yes, that’s exactly point point. There are many options, yet people stick with Docker and DockerHub (that is everything but open).

    Who are these people? There are tons of registries that people use, github has its own, quay.io, etc. You also can simply publish Dockerfiles and people can build themselves. Ofc Docker has the edge because it was the first mainstream tool, and it’s still a great choice for single machine deployments, but it’s far from the only used. Kubernetes abandoned Docker as default runtime for years, for example… who are you referring to?

    Yes… maybe we just need some automation/orchestration tool for that. This is like saying that it’s way too hard to download the rootfs of some distro, unpack it and then use unshare to launch a shell on a isolated namespace… Docker as you said provides a convenient API but it doesn’t mean we can’t do the same for systemd.

    But Systemd also uses unshare, chroot, etc. They are at the same level of abstraction. Docker (and container runtimes) are simply specialized tools, while systemd is not. Why wouldn’t I use a tool that is meant for this when it’s available. I suppose bubblewrap does something similar too (used by Flatpak), and I am sure there are more.

    Completely proprietary… like QEMU/libvirt? :P

    Right, because organizations generally run QEMU, not VMware, Nutanix and another handful of proprietary platforms… :)


  • Most of the pro-Docker arguments go around security

    Actually Docker and the success of containers is mostly due to the ease of shipping code that carries its own dependencies and can be run anywhere. Security is a side-effect and definitely not the reason why containers picked-up.

    systemd can provide as much isolation a docker containers and 2) there are other container solutions that are at least as safe as Docker and nobody cares about them.

    Yes, and it’s much harder to achieve the same. In systemd you need to use 30 different options to get what using containers you achieve almost instantly and with much less hussle. I made an example on my blog where I decided to run blocky in Systemd and not in Docker. It’s just less convenient and accessible, harder to debug and also relies on each individual user to do it, while with containers a lot gets packed into the image and therefore harder to mess up.

    Docker isn’t totally proprietary

    There are a many container runtimes (CRI-O, podman, mirantis, containerd, etc.). Docker is just a convenient API, containers are fully implemented just with Linux native features (namespaces, seccomp, capabilities, cgroups) and images follow an open standard (OCI).

    I will avoid comment what looks like a rant, but I want to simply remind you that containers are the successor of VMs (virtualize everything!), platforms that were completely proprietary and in the hands of a handful of vendors, while containers use only native OS features and are therefore a step towards openness.