Build Infrastructure - The good, bad, and ugly

Summary

This is a "full disclosure," generally technical, mostly explanatory post about our build infrastructure.

Our transparency report shows Amazon AWS EC2 instances and x86-64 hardware.

This is for two reasons (TL;DR):

  • x86-64 is required to build Armbian from source
  • Amazon EC2 offers arm servers that help us better load balance our current builds, bringing more stability to our build infrastructure while also reducing our administrative overhead

Some Rough Information...

about our build infrastructure.

  • It takes 2-4 hours to build Plume on our current build lollipops
  • There is a USB bug that can cause Plume arm builds to fail at different points without warning
  • It takes ~30 minutes to build Plume on Amazon AWS EC2 arm instances
  • EC2 builds of Plume are stable
  • It takes ~100GB of base storage to build Armbian
  • Armbian storage can easily grow well above 200GB
    • We had ~200GB of storage used by Armbian builds because cleanups were failing and caching was set a little too aggressively (see the storage-check sketch after this list)
  • It takes at least 8 hours to build Armbian on our current build lollipop
  • Armbian builds average 10-14 hours on our current build lollipop
  • It takes about 5 hours, and usually less, to build Armbian on Amazon AWS EC2 x86-64 instances
  • Each build lollipop costs $200-300 in hardware and shipping to deploy (please note: this does NOT represent a typical hardware deployment)
    • Orange Pi PC or Orange Pi PC Plus board or Intel NUC
    • 250GB Samsung T5 SSD
    • 500GB Samsung T5 SSD
    • Noctua 40mm fan
    • 3d printed case
  • We build a lot of software via Jenkins beyond the above. See http://docker.lollipopcloud.solutions/ and http://dl.lollipopcloud.solutions/ for a sample of our build infrastructure output
  • We run our backups and website deployments through our build infrastructure for consistency
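
To keep the storage problem from creeping back up, something as simple as a pre-build storage check goes a long way. Below is a minimal sketch in Python of that idea; the cache path and the 200GB threshold are assumptions for illustration, not our actual Jenkins job configuration.

```python
# Minimal sketch of a pre-build storage check. The path and limit below are
# illustrative assumptions, not our real Jenkins job settings.
import os

ARMBIAN_CACHE = "/srv/armbian/cache"   # assumed cache location
LIMIT_BYTES = 200 * 1024**3            # ~200GB ceiling before intervening

def cache_size(path: str) -> int:
    """Return total bytes used under `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                pass  # file vanished mid-walk; ignore it
    return total

if __name__ == "__main__":
    used = cache_size(ARMBIAN_CACHE)
    print(f"Armbian cache: {used / 1024**3:.1f} GB used")
    if used > LIMIT_BYTES:
        # A real job would prune old toolchain/kernel caches here;
        # this sketch only flags the condition.
        raise SystemExit("cache over limit -- cleanup needed before the next build")
```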

What This Means

While the builds named above run... our entire build infrastructure can be tied up for 4-12 hours. Between the data transfers, builds and more, it's a lot for the little bit of hardware we have on hand. It works, but at the cost of time and stability.

We've logged into Jenkins to check the status of builds and seen a queue 12 items deep. Granted, some of these were small jobs, but the bigger ones like Armbian, Plume and others were causing a deep queue of builds to form.

It gets even worse when we're testing new deployments, fixing and re-running failed jobs, and handling similar day-to-day tasks.
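
For the curious, the queue depth is easy to check without clicking through the Jenkins UI. Below is a minimal sketch using Jenkins' standard JSON API; the URL and credentials are placeholders, not our real instance.

```python
# Minimal sketch: check the Jenkins build queue depth via the JSON API.
# The URL and credentials are placeholders.
import requests

JENKINS_URL = "https://jenkins.example.org"    # placeholder
AUTH = ("build-bot", "api-token-goes-here")    # placeholder credentials

def queue_depth() -> int:
    """Return how many jobs are currently waiting in the Jenkins build queue."""
    resp = requests.get(f"{JENKINS_URL}/queue/api/json", auth=AUTH, timeout=10)
    resp.raise_for_status()
    return len(resp.json().get("items", []))

if __name__ == "__main__":
    depth = queue_depth()
    print(f"{depth} build(s) waiting in the queue")
    if depth > 10:
        print("Queue is backed up -- the big Armbian/Plume jobs are probably running")
```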

Clearly we need to expand.

Why x86-64 and EC2?

x86-64

Simply put: it's required for some of our builds. Armbian in particular can only be built on x86-64 hardware without major changes. Rather than changing and basically overhauling the existing, well-developed Armbian build process, we've opted to just use x86-64 hardware. The great news is you can use a little Intel Atom compute stick to build Armbian if you choose. We started out with an Intel Atom compute stick and ultimately outgrew the hardware. We're currently investigating our options for bringing a new x86-64 build box online. It will probably be more powerful than a typical lollipop due to the hardware requirements of Armbian builds. Right now we've successfully tested EC2-based builds for Armbian, and they are much faster and more reliable.
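
A small pre-flight check is enough to keep the Armbian job from ever landing on an arm build node. The sketch below is illustrative only; the job name and exit handling are assumptions, not our actual Jenkins configuration.

```python
# Sketch of a pre-flight architecture check placed in front of an Armbian build
# job, so it refuses to run on arm nodes. Job name and exit handling are
# illustrative assumptions.
import platform
import sys

def require_x86_64(job_name: str) -> None:
    """Abort early if this node can't run the job (Armbian needs x86-64)."""
    machine = platform.machine()
    if machine not in ("x86_64", "AMD64"):
        sys.exit(f"{job_name}: refusing to build on {machine}; x86-64 node required")

if __name__ == "__main__":
    require_x86_64("armbian-build")
    print("Architecture OK -- handing off to the Armbian build scripts")
```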

We don't like deviating from "tiny computers" much but we also want to ensure we provide our users quality builds in a timely manner. Unfortunately this means we're likely going to be stuck with bigger x86-64 hardware for Armbian and possibly other builds. We are investigating but this is the most balanced path forward we've been able to find. See the Time, Stability, Cost section below for some additional details.

EC2

EC2 is a temporary option we're using for our build infrastructure. We didn't choose EC2 lightly. Our build infra can barely keep up with all the builds (what we and our contributors use day to day), and we were forced to find a way to grow our build infrastructure quickly. EC2 ended up being the cheapest, fastest path to grow it. It's a temporary solution while we investigate hardware purchases and other options that will work better for our project over the long term. By using EC2 we're incurring extra costs every time a large build runs, and we'd much prefer to move to dedicated hardware. We just don't know what that additional hardware will be at present.
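
One way to keep those extra costs in check is to only pay while a build is actually running. The sketch below shows, roughly, how a throwaway build instance could be launched and terminated around a single build using boto3; the AMI, instance type, and region are placeholders rather than our real provisioning code.

```python
# Hedged sketch of a per-build EC2 instance that is spun up for a big build and
# torn down as soon as it finishes. AMI ID, instance type, and region are
# placeholders, not our actual provisioning setup.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")   # assumed region

AMI_ID = "ami-0123456789abcdef0"   # placeholder build AMI
INSTANCE_TYPE = "a1.xlarge"        # example arm instance type

def launch_build_instance() -> str:
    """Start a throwaway build instance and return its ID once it is running."""
    resp = ec2.run_instances(
        ImageId=AMI_ID,
        InstanceType=INSTANCE_TYPE,
        MinCount=1,
        MaxCount=1,
    )
    instance_id = resp["Instances"][0]["InstanceId"]
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
    return instance_id

def terminate_build_instance(instance_id: str) -> None:
    """Tear the instance down as soon as the build finishes, to cap costs."""
    ec2.terminate_instances(InstanceIds=[instance_id])

if __name__ == "__main__":
    iid = launch_build_instance()
    try:
        print(f"{iid} is up -- Jenkins can now run the build on it")
        # ... build runs here ...
    finally:
        terminate_build_instance(iid)
```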

The silver lining with EC2 is that our big builds run really, really fast and are stable again, as is the rest of our build infrastructure.

Time, Stability, Cost

Some other concerns we've had over the last several months are time, stability and cost. We are a project made up of volunteers and our time is limited. We also much prefer our build infrastructure to be "hands off" and very stable. And cost: well, we prefer to save money as much as possible.

These are very difficult concerns to balance while also remaining true to the project ethos of "you can self-host on inexpensive, quality mini-computers." The build infrastructure in particular has pushed this ethos to the absolute limit. We can run our build infrastructure purely on mini-computers, but at the same time the costs associated with this approach are very high. It also increases our time requirements by 10x or more in some cases. Never mind that our build infrastructure can be destabilized quickly if we aren't careful about managing our builds (quantity, scheduling, etc.).

We are committed to our project ethos and that won't change. However, we will have to make concessions for the build infrastructure to help balance the triangle of time, stability and cost. Concessions like deploying a "big" x86-64 computer to handle our Armbian builds so they don't take over 10 hours to complete, and (likely) a "big" arm server to run more concurrent builds than a mini-computer can manage. The last month has been very difficult for us when it comes to the build infrastructure. We want to continue to grow our offerings, but we're quickly finding the limits of our free time, spending capacity (cost) and the stability of our build infrastructure.

Please know that this is ONLY for the build infrastructure. Our publicly facing services like our website, chat, ActivityPub instances and more will remain on computers that are similar in specs to what we officially support.

The Future

About 1 year ago we launched Lollipop Cloud. In that year...

  • Our community has grown
  • We have active users
  • We have community members who have remixed our approach for their own needs

This is more than we ever expected, and we are grateful.

Going forward we intend to keep adding services, improving our documentation and ensuring our build infrastructure is stable. We hope the above helps explain why we will likely have "big" lollipop hardware deployed very soon.