Thanks for taking the time to share your experience. I work on GitHub Actions and am familiar with the limitations you're experiencing. Below is more info on where we're at with each of these issues, using the numbers in your Table of Contents. Also, we have a lot of other new things planned for Actions as you can see in our roadmap. https://github.com/github/roadmap/projects/1?card_filter_que...
2.1 We're starting work on this in the next couple of months. We plan to ship it in early 2022.
2.2 We want to speed up the pace of GitHub Enterprise Server releases, but I don't have more info to share.
2.3 We're looking at ways to not require a GHEC account or "unified" license.
2.4 The limits are much higher with the GitHub hosted runners, but this is a current limit of self-hosted runners.
3.1 It's on our backlog. No date to share.
3.2 I haven't heard this before and thanks for sharing the scenario. We'll think about it.
3.3 This will ship in October.
3.4 We're doing some performance optimizations for GHES 3.4 that should fix this.
3.6 We have a couple API improvements coming later this year.
3.7 We're looking into this, but no dates to share.
We're dedicated to making Actions a great experience. As you would assume, I'm very excited about the future of Actions and getting feedback like this helps us make it better.
3.5 I was excited to see action composition until I realized that the composed action is still shown as a single step, and that it can only compose actions, not whole workflows.
I want to share whole jobs across my repos, matrices and all. I could wrap some common workflows into one action, but then all of the complexity would be hidden in a single step.
So I resorted to creating GHA templates and manually syncing them with a CLI tool[1], which isn't great, but at least I don't have to copy-paste changes across repos and I can keep a "main version" of each workflow.
Related to 3.3 (You can’t restart a single job of a workflow), is there any reason why the numeric id of a specific job run isn't available anywhere? There's GITHUB_RUN_ID, but that only identifies the workflow run, not the individual jobs, and this isn't unique across run/job restarts. Services like https://coveralls.io/ need an actually unique job run id, and they could also use the numeric id to link to the specific job run in a build matrix.
No, there's no reason why that isn't available. This is a very sensible request and we'll add this information to the context available to workflow runs.
I love GitHub Actions, so much so I’ve published lots of Actions on the Marketplace. I love that I can contribute to better pipelines for all.
That said, I have frustrations with the management of the GitHub Actions repositories: which is to say, there isn’t any. Most issues and Pull Requests languish, without acknowledgement or engagement.
I appreciate that managing issues and contributions is a full-time job, and your team probably doesn't have capacity right now. If you could magic up some capacity, that would be great, but a more realistic request would be to evaluate capacity and follow Terraform's lead in being open about pausing contributions, rather than keeping READMEs that encourage contributions which (despite the best of intentions) will never be reviewed.
Regarding 3.5 (actions calling other actions) – please continue investing in composite actions. They are a well-balanced abstraction for managing workflow complexity. I can see how they could become too powerful and encourage bad patterns (YAML programming), but that's a good problem to have.
We have recently switched to GitHub actions and in addition to the above, there are two others which impact us:
Sometimes the checkout action fetches the wrong commit! The SHA in the GitHub environment variable says one commit, but the actual code is different(!). Because we don't know why this happens we basically need to do an unshallow fetch of the whole repo to be sure we have what we expect.
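Roughly, the defensive version looks like this (a sketch, not our exact config; the step name is made up, and it assumes actions/checkout v2 and the standard GITHUB_SHA variable):

```yaml
steps:
  - uses: actions/checkout@v2
    with:
      fetch-depth: 0          # unshallow fetch of the whole repo
  - name: Verify checked-out commit
    run: |
      actual="$(git rev-parse HEAD)"
      if [ "$actual" != "$GITHUB_SHA" ]; then
        echo "Expected $GITHUB_SHA but checked out $actual" >&2
        exit 1
      fi
```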
Using autoscaling self hosted runners, it is not currently possible to instruct the agent to finalise the current job but to accept none after it. This is essential to avoid broken workflows while scaling in. Gitlab supports this via a signal, but there is no equivalent.
> Sometimes the checkout action fetches the wrong commit!
I'm a product manager at GitHub, and I'm investigating this now. Feel free to drop me an email (ethomson@github.com) if you can share more information about the workflow runs that were affected by this.
> Using autoscaling self hosted runners, it is not currently possible to instruct the agent to finalise the current job but to accept none after it.
We're building this functionality out right now, so that you can have self-hosted runners that will run a single job and we will let you subscribe to webhooks that will let you scale up/down your fleet of runners.
From an outside perspective it looks like neither of them is going away anytime soon and that they just want you to use the one that you feel more comfortable with.
It's pretty clear that Azure DevOps is going the way of the dodo. They haven't even finished migrating their "old" pipelines to the "new" pipelines, and I'm already seeing some things that are GitHub Actions only.
They never officially killed Silverlight so I doubt they'll kill Azure DevOps either. They most likely will at some point deemphasize Azure DevOps to favor GitHub Actions tho.
I wouldn't be surprised if Azure DevOps still pops up in random places for decades to come or if they decide to "unify" the two products without changing the underlying codebases (like OneDrive for business actually using Sharepoint under the hood).
Unlike Google, Microsoft seems to commit to legacy software to a fault. As long as there are enterprises paying good money for support, it'll probably not go away completely.
EDIT: Microsoft only stopped licensing OEM Windows for Workgroups 3.11 in 2008 (15 years after the first release of WfW 3.11). Apparently it was still being used by some embedded device manufacturers: https://docs.microsoft.com/en-us/archive/blogs/jcoyne/its-th...
In the company I work at, we've decided to use GitHub Actions for our CI pipelines instead of deploying any "on prem" solution. I've worked a lot in the past with Jenkins and Travis, and I've also played a bit at home with GitHub Actions. Now that I'm using it in real-world scenarios, I have to say I'm a bit disappointed. In my opinion, as soon as you try to do something a bit complex, you end up having to implement some nasty hack. I've also found that the GitHub Actions marketplace is a bit of a time sink: you have to spend a good amount of time browsing it to find something decent that hasn't been discontinued, isn't a pointless fork, and is actively maintained. This happens even for basic functionality.
I know it's a fairly recent platform, but I was expecting much more compared to what other services offer.
In my experience GHA far outclasses Jenkins and Travis. After GHA released composable actions (https://github.blog/changelog/2021-08-25-github-actions-redu...), there's not much of a comparison to be made. GitLab is more on par, but the docs are less accessible and they don't seem to have a long-term development strategy the way GHA does.
As far as the limitations in the blog post, they are real, but most of them are limitations in features that are not available to begin with in Jenkins or Travis.
> GitLab is more on par, but the docs are less accessible and...
Interesting - I find GitLab docs easy to use while I hate GH ones. It takes a long time for me to find the relevant part, even more to determine that this is really all there is, and then some extra to guess which interpretation of the text suits the reality. It has never occured to me that someone might prefer GH docs to GL ones. :)
Composable actions… which must be kept in an entirely separate repository. Yes, they technically have the feature now, but that's a really poor showing. I won't be using it for anything, because it takes the CI code away from where it's being run. It's a niche for people publishing generic actions, e.g. a language-specific cache, but most people are not in the business of producing actions for the marketplace. And I don't like that it's one repo per action.
I use a lot of composite actions in the same repo as well as composite actions across repos.
Eg. `workflow1` and `workflow2` both call `composite/action`:
.github/workflows/workflow1.yml
.github/workflows/workflow2.yml
.github/actions/composite/action.yml
The only missing bit is a little more support inside composite actions: `if` and a few other keywords. Also, it's a bit annoying to access private actions across repos.
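For reference, a minimal sketch of that layout (file and script names are illustrative; a local composite action is referenced by path, so the repo has to be checked out first, and composite `run` steps must declare a `shell`):

```yaml
# .github/actions/composite/action.yml
name: shared-setup
runs:
  using: composite
  steps:
    - run: ./scripts/setup.sh   # illustrative script path
      shell: bash               # required in composite run steps

# .github/workflows/workflow1.yml
# jobs:
#   build:
#     runs-on: ubuntu-latest
#     steps:
#       - uses: actions/checkout@v2            # check out the repo first
#       - uses: ./.github/actions/composite    # then reference the local action by path
```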
First, thank you. I have been looking for a way to do this and came to the comments to see if there were workarounds to use composite actions without putting them in a separate repo.
Which brings me to the second comment... this is proof the docs suck. I have looked and looked for any evidence that this exact scenario was possible. There are no examples or documentation of this feature anywhere that I could find.
Absolutely agree. The docs are either perfect (5%) or completely obscure (95%). In places I can't tell the difference between the GitHub Actions documentation and the API documentation, when what I want is functional examples. No clue how I built up my knowledge, but I'm comfortable now, after lots of trial and error.
For a lot of Github Actions, I actually follow the Github Roadmap through to implementation as the PRs actually have good examples in them… https://github.com/github/roadmap/projects/1
Nope. I have a repo consuming 10+ workflows, 3 shared actions, and a handful of private workflows (though the way private workflows work right now sucks, don't recommend it; it's basically just a git clone inside your GitHub Action).
This is all non-Enterprise, private repos; no clue how it behaves in GHEC/S.
On par? Gitlab CI is lightyears ahead of GHA and shows no sign of stopping: child pipelines, dynamic pipeline generation, very flexible conditionals, composable pipeline definitions, security, merge trains.
GHA is very basic, even simplest thing like individual job restart is not implemented.
However, having also used Buildkite, I'm not sure I could go back to GHA - it's slightly nicer to target, and at least as large a step towards reliability & predictability as Travis -> GHA was.
I use Buildkite all day as part of my job, and I have to say, it's amazing. The company is great to work with, the product is highly flexible for a variety of complex build workflows teams want to implement, and has scaled beautifully for 1000+ concurrent build machines split between on-prem and cloud. No affiliation, just a customer.
My company uses GitLab CI/CD, and I've tried to set up GitHub actions on a personal repo. I've found GitLab's docs to be excellent. GitHub was a bit more confusing, but I got there in the end.
This is why I like GitLab much more. They implemented a proper CI.
I think what GitHub was going for is to create some kind of ecosystem for proprietary 3rd party applications and use that as a revenue. That approach only crippled the whole functionality.
Maybe it's just me, but I refuse to use third-party actions in our build and deployment steps. It seems like such an obvious and avoidable attack surface to me, and one where an attacker can end up with root on our production instances and at least some access to all developer machines.
You can solve that by pinning commit hashes, so that nobody can change the code of the actions you use without your consent. You can then use Dependabot to automatically get PRs to update to the latest version of each action when it comes out. You still get the chance to review each PR before it goes in.
I wish GitHub would implement a security setting requiring repositories to do this within an organization.
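A sketch of what that looks like in practice (the action name and SHA below are made up; Dependabot's `github-actions` ecosystem handles the version bumps):

```yaml
# In a workflow: pin the action to a full commit SHA instead of a mutable tag
steps:
  - uses: some-org/some-action@5a4ac90012d0be2fb38bd78e4b4dbde5606d7042
    # a tag like @v2 can be repointed by the action's owner; a full SHA can't

# .github/dependabot.yml: get PRs that bump the pinned SHAs
# version: 2
# updates:
#   - package-ecosystem: github-actions
#     directory: "/"
#     schedule:
#       interval: weekly
```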
To be honest, I've always found the best approach with most of these systems is to check your magic in as shell scripts inside your repository.
Then you're much much more portable. Want to run tests? Run ".ci/tests.sh", want to generate artifacts "make", or ".ci/build.sh".
All systems, be they github actions, jenkins, gitlab-runners, and everything else allow you to clone/update your repository and run something from within it. Which keeps things mostly portable.
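As a sketch, the workflow file then shrinks to a thin shim (script paths as above; job and workflow names are illustrative):

```yaml
name: ci
on: [push]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - run: ./.ci/tests.sh    # all the real logic lives in the repo
      - run: ./.ci/build.sh
```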
I put together a simple github action a long time ago, but now of course I realize it is overkill:
I strongly agree. Another benefit with having whatever yaml based CI system simply run your basic shell scripts is that you can actually test your CI runs without a full build cycle.
The biggest limitation I'm hitting with Github Actions right now is that there's no real support for queueing jobs up. Github actions do support a 'concurrency group' primitive that will prevent two actions in the same concurrency group from running at the same time, but this only allows you to have one item executing and one item queued up. If you try to queue up another one, the first queued item will be cancelled.
In our case we've got some cypress tests that we want to run on a specific on-premise server every time we create a pull request. They take about 20 minutes to run, and we're creating a lot of pull requests, so you have to carefully check what github actions are executing before you create a pull request or push new changes to it. I'd love support for proper queues like what teamcity and other CI systems have.
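For anyone unfamiliar, the concurrency primitive in question looks like this (group name is illustrative): one run executes, at most one more waits, and any further run replaces (cancels) the one that was waiting.

```yaml
concurrency:
  group: cypress-${{ github.head_ref }}   # one group per PR branch
  cancel-in-progress: false               # don't kill the running job, just queue behind it
```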
- only saves cache at the end of a workflow, and only if the workflow succeeds. This could be solved with CircleCI's approach of having a save-cache step and a restore-cache step.
- Cache is super slow for self-hosted runners, so it makes more sense to have a local cache instead of using the action
- only 5 GB of cache storage. This was supposed to be increased via billable cache storage[1], but it's been on the backburner since July 2020
- In addition to the above, you can't use a different storage backend with the official action (which would allow storing over 5 GB of cache in your own storage). The best workaround is a user-provided action which uses the S3 API[2].
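For context, this is the shape of the official action the list above is describing (path and key are illustrative): the restore happens at this step, but the save only happens in an implicit post step at the end of a successful job, which is the first limitation listed.

```yaml
- uses: actions/cache@v2
  with:
    path: ~/.npm                                      # illustrative cache path
    key: npm-${{ hashFiles('**/package-lock.json') }}
    restore-keys: |
      npm-        # fall back to the newest partial match
```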
Caching is a joke. But then again, comparing GHA to GitLab CI is like comparing a pogo stick to a bicycle. Sure, both can be used to get there, but one ride is a bit... bumpy.
I'm heavily using cache on self hosted runners that are running in ec2 machines on aws. It's my experience that the cache isn't "super slow".
Because of a bug (in my action) I was pushing/pulling gigabytes of data into the cache. On large files I was seeing ~125 MB/s (that's megabytes, not megabits) download speeds.
I haven't used the GitHub hosted runners much so I'm not sure how it compares though.
The hosted runners seem to vary a lot in performance. Just today, an actions/checkout step on GitHub's hosted Linux box went down to 1 megabit per second and was going to take 30+ minutes to clone a 3 GB repository. Re-running the workflow an hour later had it running fine.
Wow, reading this and all the comments makes me realise GHA is really far behind Gitlab CI. They've at least been playing catch-up since the MS acquisition, having stagnated for years before that.
I really wonder why anyone would self-host GitHub. GitLab has a much more feature-rich, mature, and cheap offering (there's a perfectly usable free version). Yeah, someone might prefer GitHub's UX, but is it really worth it to pay for a worse product?
GitHub’s UX is worth it, IMO. GitLab tries to do too many things and it’s really hard to get my bearings (it’s like Azure DevOps’ problem times 10). And if GitHub Actions doesn’t work, it’s really easy to integrate third party CI solutions.
> GitHub’s UX is worth it, IMO. GitLab tries to do too many things and it’s really hard to get my bearings (it’s like Azure DevOps’ problem times 10).
That's subjective. GitLab does many more things, but nobody is forcing you to use them, and unused menu buttons don't matter much for the main workflows (commits, MRs, repo visualisation, wikis, CI/CD).
Yeah I liked Gitlab when we used it for everything, but once we started to integrate with other services (Jira etc) instead of the built in functionality it felt like you were fighting how they had designed it.
We've put a lot of effort into our integration with Jira. The docs [1] detail how it works and we have a blog [2] with some screenshots so you can see what it looks like in action.
> I really wonder why would anyone self-host GitHub. Gitlab has a much more feature rich, mature and cheap offering.
Before I understood your comment, I was going to reply "security, data security & control" as reasons orgs might choose to self-host rather than outsource to SaaS GitHub.
But you're probably arguing for self-hosted gitlab over self-hosted github.
Good question! I've worked somewhere with self-hosted GitHub, but they were using it prior to GitHub Actions/workflows, so it was a combination of self-hosted GitHub and N different CI and deployment automation tools (Bamboo! GoCD! uDeploy! Jenkins! Google Cloud Build! poor bloody operator manually copying the deployment payload onto a jump box because the bureaucracy won't approve automation!)
That's usually the wrong question, as people only make greenfield decisions once.
If, however, you've been running GHE for years and have it integrated with other tooling and workflows, and your developers understand it, the question is,
"Why would anyone rip all that out, replace it, reintegrate it and retrain everyone over a handful of features, most of which you don't care about?"
"Because there's one feature we really care about".
That's why we migrated to GitLab CI. We made some pretty complicated pipelines, with tens of stages, many of them dynamic, and it worked with minimal hassle. It was a great success story internally.
> "Why would anyone rip all that out, replace it, reintegrate it and retrain everyone over a handful of features, most of which you don't care about
Functioning CI/CD for free is certainly a huge feature to care about. For a user, moving between GitHub and GitLab is at worst a slight annoyance; the UX and DX are pretty similar, and most third-party tooling that isn't github.com-only (SaaS) supports both self-hosted GitHub and self-hosted GitLab (I'd even wager there are more tools that support self-hosted GitLab than GitHub, since the former is much more popular).
Holy shit wat? Gitlab CI is a clusterfuck that is impossible to use for all but the most simple use-cases. We have a monorepo at work, and GitLab CI never fucking works. Here are some examples:
- Fallback cache keys don't work (because it doesn't compose with cache changing cache ids). So each time someone clears the runner caches, the fallback cache stops working.
- `if`/`rules` and `needs` don't work together, and cause the build to fail with a *yaml build failure*. Wat? This is even a documented failure mode now. This is a huge issue for monorepos.
- gitlab-runner is full of weird behaviors, like `gitlab-runner unregister <runner-name>` failing if that runner was already deleted (through the GitLab UI) while not removing it from the local config. But `gitlab-runner` has a separate command, `gitlab-runner verify --delete`, which does just that...
It feels like every time I use it, I run into a bug. I've heavily used github actions in many projects of similar sizes, and have yet to come across a single bug.
A monorepo isn't "all but the most simple use-cases"; it's usually a fairly complex use case, and GitLab has a myriad of ways to make monorepo CI easier - dynamic pipelines, remote triggers, includes, etc.
> `if`/`rules` and `needs` don't work together, and cause the build to fail with a yaml build failure. Wat? This is even a documented failure mode now. This is a huge issue for monorepos
Could you elaborate on this? I have a fairly complex pipeline which has rules and needs, and it works without any yaml errors.
> Could you elaborate on this? I have a fairly complex pipeline which has rules and needs, and it works without any yaml errors.
Sorry, it was with `only:changes` and `needs`. Take a look at this issue[0] and this pipeline[1]. I've failed to find the failure mode in the documentation, so I suppose it may have been fixed since then - but we've developed an in-house workaround in the meantime that I'd trust a lot more than anything coming from Gitlab.
--------
> A monorepo isn't "all but the most simple use-cases", it's usually a fairly complex usecase, and Gitlab have a myriad of ways to make monorepo CI easier - dynamic pipelines, remote triggers, includes, etc.
And every single feature you've mentioned here has a bug when combined with something else. That's the problem with Gitlab CI: everything works in isolation, but nothing composes properly.
Take includes: they don't work with anchors, so you can't have generic template rules in the "main" file reused in the included files. This makes sense, though! Anchors are a YAML feature. So GitLab added their own pseudo-anchors, called `extends`. You'd assume a smart, context-aware merge would happen, but no! GitLab went with a dumb object merge. Because the `script` step is a list of strings, if both the parent and the child specify a `script`, only the child's will be used! GitLab has a `before_script` step which can work around the issue for single levels of inheritance, but anything more complex ends up in a dead end. This feels like a feature that was bolted on without any sort of design work.
The monorepo case is solved by a top-level job dispatching to a child pipeline based on path; no need to make a mess of rules in a single gitlab-ci.yml file. If child pipelines need common steps, they can be neatly included.
GitLab team-member here. Obviously coming in with a lot of bias, but I wanted to address how each point relates to GitLab CI/CD’s view of the world. I’m also thinking about writing a longer post with more details as I have a lot of thoughts (™) about this topic.
2.1 Caching isn’t available: GitLab has this everywhere.
2.2 GitHub Enterprise Server is behind GitHub Enterprise Cloud: GitLab ships the same code to GitLab.com as it does to our self-managed customers. This was a tough decision but has a lot of benefits, the central one being feature parity and scalability for self-managed folks.
2.3 Using Public GitHub.com Actions: This is a symptom more than the problem itself - relying on third-party plugins for build jobs is scary, and leads to many of the same issues we’ve seen in the Jenkins ecosystem - easy to get started, hard to maintain.
2.4 Dockerhub pull rate limiting: for self-hosted runners, you can use a registry mirror or Dependency Proxy to reduce your number of pulls from Docker Hub. The key is the entire platform has to be there to enable the right workflows.
3.1 No dropdowns for manually triggered jobs: GitLab also doesn’t have drop downs, but does have the ability to pre-fill these values.
3.2 Self-hosted runner default labels: I think this is also more of a symptom than a problem.
3.3 Being able to tag and use runners for specific tasks is key - so I understand the frustration and we’ve spent a lot of time on this.
3.4 You can’t restart a single job of a workflow: You can do this with GitLab.
3.5 Slow log output: I haven’t seen this be a problem, and is a benefit of our scalability features being built into the self-managed code.
3.6 You can’t have actions that call other actions: There are lots of ways to relate pipelines (parent/child, triggers. etc.) in GitLab.
3.7 Metrics and observability: The GitLab runner has Prometheus built in, and the dashboards we use to manage GitLab.com are partially public: https://dashboards.gitlab.com
3.8 Workflow YAML syntax can be confusing: This can be really hard to get right. I learned to stop worrying and love the YAML long ago, and I know we’ve gone through a lot of iterations to try to get this right.
I'd love to know where folks think I got this assessment wrong. And is there value in writing more about it?
Also, scheduled jobs are a joke, they are routinely one to several hours late. I have to use the API to manually trigger jobs (workflow_dispatch) whenever I need scheduling that is remotely time sensitive.
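In other words, treat `schedule` as best effort and keep a `workflow_dispatch` escape hatch; a sketch:

```yaml
on:
  schedule:
    - cron: '30 4 * * *'   # best effort: may run late, or occasionally not at all
  workflow_dispatch: {}    # can be fired on time via the REST API:
                           # POST /repos/{owner}/{repo}/actions/workflows/{workflow_id}/dispatches
```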
This was the first thing that came to mind for me too. I don't need it to be exact, sure, but the scheduled jobs are more than an hour off more often than not. Sometimes several hours as you mentioned. Frequently they would also just not run at all!
It was so bad I ended up scheduling my cron jobs to run 4 - 5 hours early and just sit there idling until the actual execution time came around haha.
Ultimately, the jobs not running at all became a real pain point; I gave up and paid $5/month for a box to sit there and run crond to save on the headache.
I have a few suspicions that it's intentionally done this way for balancing demand spikes and that jobs scheduled for the top of hour are worst affected.
GitHub Actions are a fantastic experience for serverless applications. I am working on a serverless project where we use GitHub Actions exclusively for CI/CD as well as running automated tests. We rely heavily on Lambda, S3, and DynamoDB. Our client app is static JS files we serve over CloudFront. GitHub Actions make our pipelines accessible to any developer on the team. Since we only pay for what we use with our serverless infrastructure, we can even deploy each pull request to the cloud rather inexpensively and leverage GitHub's environments to help manage the cleanup for us. This allows our team members to review and test changes in their browser before we pull them into our development branch. We additionally run Playwright E2E tests to verify that none of our critical user workflow scenarios have broken as a result of the PR changes. I love this development experience and would have a hard time going back to anything else.
Thank you for sharing. As I was reading through, I tried to understand what about the workflow was specific to GitHub Actions vs other CI automation.
> GitHub Actions make our pipelines accessible to any developer on the team
Do you reckon this accessibility is a combination of (i) storing the pipeline definitions in the application's source repo, where application developers can find them easily, not hidden/scattered elsewhere in other repos or behind management UIs, and (ii) a relatively simple and documented pipeline syntax?
The first example I can think of a tool that supported this workflow was Travis CI ~ 2011 - 2012. Appveyor offered similar capabilities quite early as well. Same workflow can be done with Gitlab, Google cloud build.
> we can even deploy each pull request to the cloud rather inexpensively and leverage GitHub's environments to help manage the cleanup for us. This allows our team members to review and test changes in their browser before we pull them into our development branch
Yeah, this kind of workflow is great. Another way this kind of workflow can be done is to create simple command line tools that developers can use to create and destroy temporary test environments running their speculative changes. In some cases, for rapid experimentation, it can be great to be able to spin up N temporary environments in parallel with different changes without tying it to pull requests. But I can see that tying the temporary environment lifecycle to the lifecycle of a PR might make it easier to share demos of proposed changes with reviewers.
Out of curiosity, how reliable do you find the environment cleanup is? I remember building a similar create-temp-environment / destroy-temp-environment workflow for ephemeral databases running in AWS RDS driven by jenkins pipelines. It took a few months of tweaking to figure out how to ensure the RDS databases got torn down correctly and not "leaked" even if the jenkins master or workers failed midway through pipeline execution. From memory we had a bunch of exception handling in a jenkins groovy scripted pipeline that would run on the master jenkins to try to do cleanup, and even that wouldn't work all of the time, so we had a second cleanup job on a cron-schedule to detect and kill leaked resources.
> Do you reckon this accessibility is a combination of
Yes, exactly. All of the build pipelines for a repository live in the .github folder in the root of the repo. It makes it easier for team members to feel comfortable making changes and submitting a PR for them. You can also set up an act container to test GitHub Actions changes locally before pushing (see https://github.com/nektos/act )
> Out of curiosity, how reliable do you find the environment cleanup is?
So far, environment cleanup has been reliable, though I have noticed it failing to clean up some provisioned resources once in a blue moon. I blame this more on our code than GitHub Actions. I periodically review our sandbox environments to ensure we didn't miss deleting anything.
Actually, most points in the article are the basis of why we created BuildJet.
We initially tried to solve these annoyances by creating a CI with speed and the YAML config as a USP. We got 4x the speed and a much better YAML config structure, but despite these improvements we noticed that people had a mental barrier to migrating to a new, unknown CI.
But like the OP, we always enjoyed the experience of using GitHub Actions, so with this in mind we decided to build BuildJet for GitHub Actions[1], which uses the same infrastructure but plugs right into GitHub Actions as a "self-hosted" runner, automatically set up for you with OAuth. This resulted in, on average, a 2x speed improvement for half the cost (due to us being close to the metal). Easy to install and easy to revert.
Just checked it out. I like the idea of BuildJet. Something dead simple like this is what I've been looking for.
The biggest problem is that I don't feel comfortable putting our IP out on a server managed by a small provider. If you had an offering where I could self-host, I'd be very interested.
BTW. You have a broken link to "Privacy" from the "Terms" page.
I think GH Actions is a pretty cool idea. I don't use them, myself, because, every time I count myself, I keep coming up "1."
When I left my last job and started working on my own, I set up things like CI/CD, JIRA, Jenkins, etc. These were the bread and butter of development in my old shop.
But they are "Concrete Galoshes"[0], and work very well for teams, as opposed to ICs. As a single developer, working alone, the infrastructure overhead just slowed me down, and, ironically, interfered with Quality.
When GH Actions were first announced (I can't remember, but they may have been in beta then), I set up several of them on my busier projects. They worked great, until I started to introduce some pivots, and I realized there was actually no advantage to them. I ran the tests manually anyway, and the Actions just gave me one more thing to tweak. It was annoying getting the failure messages when I knew damn well the project was fine; I'd just forgotten to tweak the Action. I introduce frequent changes in my work, and that suits me.
I don’t think you need to be in a team to get good mileage out of Actions.
For example, some open source code I need to publish to package indexes or marketplaces, and having an action do that whenever I publish a release has been very useful and saves a lot of time.
I really like GitHub Actions, but it still only feels appropriate for small projects. Unless I'm missing something, I didn't see a good mechanism for monorepos. I'm thinking in terms of there being shared pieces that get built/tested and then products that sit on top. The complication comes in test avoidance. I don't remember why I didn't like doing this all in one workflow with jobs (though it was going to require an orchestrator job setting variables to choose which downstream jobs to run). For chaining workflows/pipelines, when I looked, you could only trigger other workflows from master, which defeats the point.
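The closest partial workaround I'm aware of is path filters, which at least stop unrelated workflows from triggering (paths are illustrative); note they gate the whole workflow, not individual jobs:

```yaml
on:
  push:
    paths:
      - 'libs/shared/**'
      - 'products/api/**'
```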
Among my small, open source work, probably my biggest complaint is actions running in forks. Wastes a lot of resources on their side and limits my concurrent runners for projects in my personal space. For companies, depending on the setup, this would eat their compute minutes.
Also annoying that PR actions can't post to the PR. I can understand there are security limitations but it makes it so a lot of nice features don't exist for most people.
"You can’t have actions that call other actions" - I think it's possible to use the repository_dispatch trigger described at https://docs.github.com/en/actions/reference/events-that-tri... for this - you'd need a separate GitHub personal access token, but using that it should be possible to trigger a workflow in any other repository you own from an API call in another action.
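A sketch of that pattern (repo names and the `DISPATCH_PAT` secret are hypothetical; the default `GITHUB_TOKEN` won't work for cross-repo dispatches, hence the PAT):

```yaml
# In the downstream repo: a workflow that listens for the dispatch
on:
  repository_dispatch:
    types: [trigger-build]
---
# In the upstream repo: a step that fires it via the REST API
- name: Trigger downstream workflow
  run: |
    curl -X POST \
      -H "Authorization: token ${{ secrets.DISPATCH_PAT }}" \
      -H "Accept: application/vnd.github.v3+json" \
      https://api.github.com/repos/OWNER/downstream-repo/dispatches \
      -d '{"event_type": "trigger-build"}'
```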
The issue is that once you add a PAT, you've imbued the build with your permissions. You can bound the permissions by call type (so, for example, you can make a token that can't modify your profile info), but you can't bound them by repository. So if you need the build to trigger write actions on one repo, now it has them for all repos.
Agreed, this is an infuriating limitation of PATs - I would desperately like to be able to create a PAT with much more finely grained permissions - can only trigger this single REST API for this specific repo for example.
You can also create a brand new GitHub user account, grant it access to just a single repository and then create a PAT for that user account - annoying but it does at least let you scope down the permissions a bit.
My experience with GHA is that it can be awesome for things like small projects that want to enforce linters, unit tests, etc.
Once you get into more complex things - like building docker images, storing into an artifact repository, baking amis, running integration or end to end tests, etc, it can be a pain.
It was a great place for us to start but we've since moved to BuildKite.
The only thing we've ever trusted GH actions with is enforcing check builds pass before a PR is allowed to merge.
Everything else is managed via a custom tool we use for packaging & deploying our product.
Even our simple "run this 1 build command and ensure exit code == 0" action seems to have a semi-weekly issue like stuck "waiting for status" and other unexplained failures throughout. We don't want to put any more eggs into that particular basket right now.
We've been loving GHA: CI/CD-as-code, pull requests capture deploy history (CI, stage/restage, deploy), and labeling a PR with 'release' is enough to generate our on-prem + multi-cloud artifacts.
Our main gotchas are roughly:
- GH-hosted runners have too little RAM/HD for big docker software. They push you to self-hosted runners for that, which is fine in theory, but GHA/Azure doesn't actually support serverless runners, so that falls flat in practice. We don't want to be turning machines on/off, that's GHA's job. We experimented with GHA -> Packer -> Azure for serverless, but it was slow and Packer frequently leaves zombie machines, so we went back to tweaking the low-RAM runners provided by our enterprise plan.
- Security: We want contractors etc. to be able to run limited GHA CI jobs and use that quota, but not higher-trust GHA CD ones. This is tricky at a configuration level. Ex: it seems like we'd need to do funny things like keep a main repo for CI w/ CI secrets and a separate repo for CD w/ CD secrets, and only give trusted folks access to the CD-cred repo. We've thought of other possibilities as well, but in general, it's frightening.
- Big Docker images: We do spend more time than I'd like messing with optimizing Docker caching as GPU containers are embarrassingly big (we use dockerhub vs github's due to sizes/pricing/etc), think both multi-stage containers + multi-step jobs (monorepo/microservices). I think they're in a good position to speed that up!
I'm optimistic about these, but it's tricky to align them with MS/GH PM team priorities :)
Basically you can define deployment environments which have their own secrets and configure it so only authorized users can approve workflows which access those environments.
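A minimal sketch of what that looks like on the workflow side (environment and script names hypothetical; the required reviewers are configured in the repo's environment settings, not in the YAML):

```yaml
jobs:
  deploy:
    runs-on: ubuntu-latest
    # secrets are scoped to this environment, and any configured
    # reviewers must approve before the job is allowed to start
    environment: production
    steps:
      - run: ./deploy.sh   # hypothetical deploy script
        env:
          DEPLOY_KEY: ${{ secrets.DEPLOY_KEY }}
```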
Ah this is a good step. I hadn't looked since the env var vulns earlier this year. Add in GH runner domain/vpc safelisting and a lot of the basics get solved!
the first commenter was right on the env thing helping with ci/cd secret isolation
the open issue is that GHA env is vulnerable to typical npm/pip/conda/etc supply chain attacks. it lets them do funny network requests (both data exfil + integrity, and even quota wastage / DoS):
- GitHub hosted runners should allow inbound/outbound domain safelists, full isolation w internal repo safelisting, or something else like it
- and/or self-hosted runners should have an Azure serverless mode where the host owner can do their own presets. less good b/c trickier to maintain typical safelists
currently, GHA is default-allow for supply chain attacks =\
in theory folks can try docker-in-docker to limit network access, or the equivalent at a lower layer, and layer on internal repos to prevent 0-days, but this gets gnarly quickly and is more sanely handled by the managed CI env we are all paying for
I work on Flux, and for the reasons you mentioned we advocate for a strict separation between CI and CD, where CI is responsible for building and testing images and pushing out releases, and CD is responsible for production. There is no crossover between these roles, and this makes it very secure.
You use a policy CRD called ImagePolicy to declare what versions should be matched and deployed automatically. You can have them deployed directly to production, or if it makes you more comfortable with your contractors, to an "auto-PR" branch which simply queues them up for a release engineer with the required juice to approve and merge the changes to prod.
There is no deploy job in CI, since the deployment in prod is whatever happens to be on the main branch in the config repo. This is a protected branch which only releng can merge changes into, usually only when the release checks are satisfied. Flux (which scans any image repos) updates those manifests when there is a release, either directly or via pull request.
This strategy I think would be able to address your Security concerns regarding contractors, by reducing the responsibility of your CI system to only CI, and not to handle CD anymore.
This is the subject of my talk at KubeCon[1], although you might not be able to get that from the title, (I'm presenting this topic for a Jenkins audience, but the focus is on Flux and how Flux works) so I'm hoping it should be applicable to a broader audience, certainly inclusive of GitHub users :)
Yep -- and it goes deeper, even for CI. The CI/CD secrets vulnerability is just the most burning example.
Ex: Network segmentation for CI. We only expect CI to communicate with dockerhub, conda/pypi, and the CI service (ex: azure). Something similar for CD. That _should_ be settable via GHA, but isn't. In a world of weird npm/python scripts and github action marketplace... scary we can't. Self-hosted runners can in theory do this via custom network policies, but it's a PITA for something ~everyone should be doing out-of-the-box.
Edit: HN won't let me respond to the below. Imagine something like IP theft, wanting to mine our repo for everything labeled "security", dig into our version #'s for viable CVEs, or force-push some git commits (incl. history rewriting). Defense-in-depth says we shouldn't make that unnecessarily easy for anyone who plants a backdoor that runs during CI's `npm install` / `RUN xyz` phase. Network + RBAC are basically table stakes for almost everyone building enterprise software, so GHA doesn't have to reinvent the wheel here, just do it + make it friendly.
I think those problems solve themselves when you can say "CI doesn't need to communicate with production at all anymore" – the Flux controllers live on the cluster, or on one cluster which is in position to communicate with others on the network in case of multi-cluster. They do not have any outward facing interfaces (except one), they live on the Kube API. (The Flux API is the CRD within the Kubernetes API.)
The reason for CI to communicate with production is to handle deployments, right? So take away that responsibility, and now you can safely isolate GitHub Server and shunt it into a network that only has access to outside artifacts that are necessary, push access to an image registry, and that's it.
Deployment status notifications can go from Flux to CI, so you can read about your deployments via commit status messages, but the communication is one-way. Changes are always pulled into the cluster and never pushed. You can set up webhooks in both directions, of course, for faster round trips (it makes a big difference for devs when iterating to not have to do a context switch and manually trigger a deploy, or to wait seconds not minutes and get immediate feedback after git push) but many will also opt not to do this in order to further confine and constrain to limit attack surfaces, it works just as well so long as they can live with some 5-10 minute waits for polling to do the job.
Edit: heaven forbid we have a conversation, Hacker News won't allow fast replies :D thanks for giving examples
+100000. In any more-or-less serious setup, deployment must be decoupled from build. It brings some challenges, but deploy and build are two different cycles that you can't put on the same assembly line. One can trigger the other, but they should not be the same pipeline.
I've been testing out Github Actions for a few weeks now, for the most part I really like it, there are a few features missing but I think the fundamentals of the product are solid with the public catalogue of actions being the killer feature.
The biggest issue I have is around self-hosted runners.
1. There's no official auto-scaling runner option, so even if you're paying Github (aka Microsoft) for Enterprise - they're not going to support your auto-scaling EKS/GKE/EC2/whatever runners.
2. You can't register self-hosted runners without a Personal Access Token - the key word being _Personal_. Your automation code for provisioning runners should not rely on an individual's GitHub access token just to register; GitHub needs a system like GitLab has, where you can generate a registration token per organisation/team/repo that allows you to programmatically register runners.
We use https://github.com/actions-runner-controller/actions-runner-... to auto-scale on EKS. It also allows you to use a GitHub App for the runner registration instead of personal access tokens. Also it seems like the project receives support from GitHub, because they're getting early access to test out features.
Thanks, I had a play with this and submitted a PR to add some configuration options to the CRD.
It seemed decent but I hit two problems:
1. I didn't manage to get autoscaling to work - I suspect my helm templates might have been incorrect.
2. Docker-in-Docker (DIND) I know works, but one of the clients I'm working with has switched to containerd and the controller got a little confused by Docker-in-Containerd (DINC - you heard it here first!), I know really they should be using Kaniko/Buildah etc... but their devs aren't ready to make the change yet.
I think I recall seeing something on the GitHub forums where the recommended approach was to create a separate service user account and create all your non-personal PATs with that account.
Surprised there's no mention of the inability to view logs generated before the page was loaded until the job completes. This one drives me crazy when I have long running, silent activity.
My wishlist item would be more variants of Windows server versions so that we could build Windows containers for more versions of Windows. I realize the fault lies with Windows containers pinning the container base version to the host version, but I'm still stuck with the burden.
I think GitHub Actions got the model correct, using everything as events to trigger any number of workflows. This is far simpler to maintain than a single workflow with conditionals and wait states that you see with other systems.
Build engineer at a large Fortune 500 shop here - the largest impediment to us even enabling GitHub Actions is that you either disable them entirely for your org or allow repo admins to enable them _per repo_. We have several hundred repos in one org alone and we cannot simply enable them for everyone with admin access (for us if you create a repo, you get admin access).
Not OP, but in our case allowing essentially the entire company to start to use GHA would result in a huge load increase on our GHE deployment. Even with self-hosted runners per org, there is a lot of overhead that runs on the GitHub servers themselves. We are working towards rolling it out, but there are definitely some scalability issues.
If off-premises is an option for you, I'd like to suggest a service I run called SurplusCI (surplusci.com). It's cheaper than self-hosting most of the time and SHOULD be roughly zero toil for you.
Ah you're absolutely right -- I mistook the "overhead" as running some actions on regular GHA but they were talking about overhead on the on-prem runners that already exist.
Yeah it's pretty unlikely that they're running GHE for any other reason than needing to keep everything on prem.
> For the vast majority of use cases, the YAML syntax is sane and is similar to other CI systems. It gets super clumsy when you want to assign an output of a step to a variable that you can refer to later.
The verbosity of accessing output has the added benefit of making it much clearer that the 2 workflow steps are tightly interdependent on each other.
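For reference, the pattern under discussion — assigning a step output and consuming it later (step ids and values hypothetical, using the `set-output` workflow command syntax current as of this writing):

```yaml
steps:
  - id: version
    name: Compute version
    run: echo "::set-output name=tag::v1.2.3"
  - name: Use it in a later step
    run: echo "Releasing ${{ steps.version.outputs.tag }}"
```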
I’ve been loving GH Actions a ton but one issue I ran into is that I have a couple tests that get run and then if it’s a merge to prod branch, an image build action.
The problem is that I cannot make the image build action happen IFF both testing actions pass. I had to combine all three actions into one.
It works. But now there’s a “skipped” step that’s skipped 99% of the time and makes no sense for a lot of PRs. It also means I have a Frankenstein monster action that does three long lists of very different things. All just so I can make 3 depend on 1 and 2.
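For what it's worth, inside a single workflow file `needs` plus an `if` guard does express this dependency (job names and commands hypothetical) — the catch being exactly what's described above: all three jobs have to live in one file, and the guarded job still shows as "skipped" on non-prod refs:

```yaml
jobs:
  test-unit:
    runs-on: ubuntu-latest
    steps:
      - run: make test-unit        # hypothetical test command
  test-integration:
    runs-on: ubuntu-latest
    steps:
      - run: make test-integration
  build-image:
    needs: [test-unit, test-integration]   # runs only if both pass...
    if: github.ref == 'refs/heads/prod'    # ...and only on the prod branch
    runs-on: ubuntu-latest
    steps:
      - run: make image
```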
The other problem is that to develop and test an action, I have to just push to origin a thousand times. My kingdom for a CI system that _trivially_ enables me to install a single one-liner program that lets me locally test my actions at near 1:1 compatibility.
I was planning to migrate to GitHub from GitLab, but hearing about all these missing basic features (no caching, no restarting individual jobs?) I think I'll stick with GitLab for the foreseeable future.
GitLab isn't supported by the enterprise, so I basically have to maintain the whole thing myself. We also don't have a license, so it's missing some much needed features.
Apparently it’s still better than Github though xD
One extremely painful "gotcha" we encountered was trying to push code to a protected branch from inside a workflow.
With the default GITHUB_TOKEN, you can't push to protected branches. If you decide to use personal access tokens, you can push to protected branches, BUT that will trigger other workflows. That can cause an infinite loop of workflows.
We still couldn't figure out how to push code to a protected branch without triggering the same/other workflows.
> We still couldn't figure out how to push code to a protected branch without triggering the same/other workflows.
If you include `[skip ci]` anywhere in the commit message, that commit won't trigger any github actions. This is a built-in behavior; you don't have to manually check for that string in any downstream actions. I spent a long time trying to work around the same constraint before my colleague pointed out that `[skip ci]` was a thing.
One trick that might work is to include a token such as 'DO_NOT_TRIGGER_ACTIONS' in your automated commit message, then have your other actions skip any commits that include that token via an `if` condition in the workflow.
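A hedged sketch of that guard (the token string is from the comment above; the job name is hypothetical):

```yaml
jobs:
  build:
    # skip any commit our automation pushed
    if: "!contains(github.event.head_commit.message, 'DO_NOT_TRIGGER_ACTIONS')"
    runs-on: ubuntu-latest
    steps:
      - run: make build   # hypothetical
```

Note the quotes around the expression: YAML won't accept a scalar starting with `!` otherwise.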
Is it possible yet to (without hoops) have an action workflow that runs on pull requests to <default_branch> without having to name that branch explicitly so that you can have a write-once-run-everywhere-forever process across repos that might change their default branch name? Last I saw there was a template parameter, but those get populated at creation not at runtime.
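One partial workaround I've seen (untested sketch): leave `on: pull_request` unqualified and gate at the job level against the default branch carried in the event payload:

```yaml
on:
  pull_request:   # fires for PRs against any branch

jobs:
  ci:
    # only run when the PR targets whatever the default branch currently is
    if: github.event.pull_request.base.ref == github.event.repository.default_branch
    runs-on: ubuntu-latest
    steps:
      - run: make test   # hypothetical
```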
I've yet to come across a complete tutorial for setting up cached docker builds (one involving a rust compilation) within a Github Actions workflow. I've been figuring it out from pieces of info scattered across blog posts and github repos. How is anyone managing this today? How/where are you persisting cached objects?
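For comparison, the pattern I've pieced together from those scattered posts uses buildx with a local cache directory persisted via `actions/cache` (paths and cache keys hypothetical; treat this as a sketch, not gospel):

```yaml
- uses: docker/setup-buildx-action@v1
- uses: actions/cache@v2
  with:
    path: /tmp/.buildx-cache
    key: buildx-${{ github.sha }}
    restore-keys: |
      buildx-
- uses: docker/build-push-action@v2
  with:
    context: .
    cache-from: type=local,src=/tmp/.buildx-cache
    cache-to: type=local,dest=/tmp/.buildx-cache-new,mode=max
# swap the cache dirs so the saved cache doesn't grow without bound
- run: rm -rf /tmp/.buildx-cache && mv /tmp/.buildx-cache-new /tmp/.buildx-cache
```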
Has anyone got experience or resources for running a big iOS application on GHA? Is it possible, or is it pretty much a toy right now? Say you have a big CI pipeline like;
* Full clean build including dependencies (support Carthage, Cocoapods, SPM)
* Running multiple test suites that takes maybe 5+ hours for full suite?
> migrate every engineering team at Venmo to GitHub Actions
Oh jesus christ. I feel for you dude.
We evaluated GHA, and we still are trying to use it, but there is a barrage of problems and limitations, including cost, lack of functionality, and technical issues. It's really only suitable (at scale) for linting, or generating Changelogs, or something else trivial. I use it in my OSS projects to run tests, and it's okay for that (though impossible to just tail a build log when it's large)
Drone.io is still an amazingly effective system that matches GHA (and has _crazy_ features like build parameters) but is more flexible. Of course you'll have to pay for commercial licenses, but if it's between paying for GHA or Drone, I highly recommend Drone instead. Drone is stupidly easy to maintain (infrastructure-wise).
3.5 This shipped recently - https://github.blog/changelog/2021-08-25-github-actions-redu...