Lyrics.lol :: Orchestrate dependant workflows using AWS Step Functions by Emarosa


Emarosa

Orchestrate dependant workflows using AWS Step Functions

Our serverless data pipelines are orchestrated using AWS Step Functions. The pipelines became more complicated over time, and we had to make sure that the outputs of certain state machines were available before other state machines that process them further were started.
There are many solutions to this problem, and they apply regardless of what kind of workflows you run with Step Functions. In this post I'll show you the options we considered.
The following diagram illustrates the use case for this article: state machine C depends on two other state machines, A and B.

An example of dependent state machines
Option A: Schedule the state machines one after another
The basic concept is to schedule state machine C to start after state machines A and B have finished. To do that, you need to know when A and B start, how long they run, and, ideally, add a buffer on top. For example, if A and B take one hour, add a one-hour buffer so that C starts two hours after A and B start.
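The arithmetic behind this option can be sketched in a few lines of Python. This is only an illustration: the 02:00 UTC start time and the helper names `start_time_for_c` and `daily_cron` are assumptions made for the example, not part of our setup. The output uses the six-field cron format that EventBridge schedules expect.

```python
from datetime import datetime, timedelta


def start_time_for_c(upstream_start: datetime,
                     expected_runtime: timedelta,
                     buffer: timedelta) -> datetime:
    """Return when C should start: upstream start + expected runtime + buffer."""
    return upstream_start + expected_runtime + buffer


def daily_cron(start: datetime) -> str:
    """Format a daily EventBridge cron expression (evaluated in UTC).

    Fields: minutes hours day-of-month month day-of-week year.
    """
    return f"cron({start.minute} {start.hour} * * ? *)"


# Hypothetical numbers: A and B start at 02:00 UTC and take about one hour.
a_and_b_start = datetime(2024, 1, 1, 2, 0)
c_start = start_time_for_c(
    a_and_b_start,
    expected_runtime=timedelta(hours=1),
    buffer=timedelta(hours=1),
)
c_schedule = daily_cron(c_start)
print(c_schedule)  # cron(0 4 * * ? *)
```

Note that the runtime and the buffer are fixed guesses here, which is exactly the weakness of this option.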

Pipelines are scheduled separately
This is the most basic approach, and it should be seen as a temporary fix rather than a foolproof solution. It was also the first method we tried, because it is simple to set up with scheduled triggers.
It sounds reasonable, but it has two weaknesses. If state machines A and B take longer than anticipated, C will start too early. If A and B finish on time or even sooner, C will still not start any earlier, even though it could. As a result, the schedule never matches the actual runtimes exactly.
Option B: A state machine triggers the next one
As the name implies, when one state machine finishes, it triggers the next one. This addresses the disadvantages of option A: the second state machine is never started too early, and it never waits longer than necessary.

After completing successfully, state machine A triggers state machine C
As defined in the AWS Step Functions documentation, the trigger can be implemented with a task state that can invoke the target state machine
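A minimal sketch of such a trigger, written as a Python dictionary that serializes to an Amazon States Language definition. The `arn:aws:states:::states:startExecution` resource is the Step Functions service integration for starting another execution; the ARNs, the state names, and the `definition_for_a` helper are hypothetical placeholders for illustration.

```python
import json

# Hypothetical ARNs, used only for illustration.
STATE_MACHINE_C_ARN = "arn:aws:states:eu-west-1:123456789012:stateMachine:C"
PROCESS_LAMBDA_ARN = "arn:aws:lambda:eu-west-1:123456789012:function:process-a"


def definition_for_a(target_arn: str) -> dict:
    """Build a definition for A whose last state starts the target state machine."""
    return {
        "Comment": "State machine A: do the work, then trigger C",
        "StartAt": "ProcessData",
        "States": {
            "ProcessData": {
                "Type": "Task",
                "Resource": PROCESS_LAMBDA_ARN,
                "Next": "TriggerC",
            },
            "TriggerC": {
                "Type": "Task",
                # Fire and forget: A ends as soon as C has been started.
                "Resource": "arn:aws:states:::states:startExecution",
                "Parameters": {
                    "StateMachineArn": target_arn,
                    "Input": {"triggeredBy": "A"},
                },
                "End": True,
            },
        },
    }


definition = definition_for_a(STATE_MACHINE_C_ARN)
print(json.dumps(definition, indent=2))
```

Because `TriggerC` is only reached when `ProcessData` succeeds, C is started only after A has completed successfully. The role that A runs under would also need permission to call `states:StartExecution` on C.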
The main disadvantage is that a target state machine can only be triggered by one state machine. As a result, state machine C cannot wait for state machine B as well. In addition, state machine A needs to know about the state machines that follow it and must be adjusted accordingly. So, even if you haven't touched workflow A in a long time, you have to modify it as soon as state machine C is added.
Option C: Master state machine
“Overarching state machine” is another name for it. For this option, an additional state machine is created that orchestrates state machines A, B, and C.

Child state machines are triggered by the master state machine
The master state machine first triggers A and B in parallel, and then triggers C once both have completed.
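One way such a master state machine could look is sketched below, again as a Python dictionary that serializes to an Amazon States Language definition. It uses a `Parallel` state for A and B and the `startExecution.sync:2` service integration, which waits for the child execution to finish. The ARNs, the state names, and the helpers `start_child` and `master_definition` are assumptions made for this example.

```python
import json
from typing import Optional

# Hypothetical ARNs, used only for illustration.
ARN_A = "arn:aws:states:eu-west-1:123456789012:stateMachine:A"
ARN_B = "arn:aws:states:eu-west-1:123456789012:stateMachine:B"
ARN_C = "arn:aws:states:eu-west-1:123456789012:stateMachine:C"


def start_child(arn: str, next_state: Optional[str] = None) -> dict:
    """Task state that starts a child state machine and waits for it to finish."""
    state = {
        "Type": "Task",
        "Resource": "arn:aws:states:::states:startExecution.sync:2",
        "Parameters": {"StateMachineArn": arn},
    }
    if next_state is None:
        state["End"] = True
    else:
        state["Next"] = next_state
    return state


def master_definition(arn_a: str, arn_b: str, arn_c: str) -> dict:
    """Run A and B in parallel, then run C once both branches have completed."""
    return {
        "Comment": "Master state machine orchestrating A, B and C",
        "StartAt": "RunAAndB",
        "States": {
            "RunAAndB": {
                "Type": "Parallel",
                "Branches": [
                    {"StartAt": "RunA", "States": {"RunA": start_child(arn_a)}},
                    {"StartAt": "RunB", "States": {"RunB": start_child(arn_b)}},
                ],
                "Next": "RunC",
            },
            "RunC": start_child(arn_c),
        },
    }


master = master_definition(ARN_A, ARN_B, ARN_C)
print(json.dumps(master, indent=2))
```

The `Parallel` state only moves on to `RunC` after every branch has finished, and it fails as a whole if one branch fails, so C is not started on incomplete inputs.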
This is a good solution for simpler use cases, since you can orchestrate multiple state machines in whatever way you want. You can also add processing steps in between or start other state machines conditionally.
We turned down this option because not all of our pipelines are scheduled; some are triggered by external events (e.g. via SNS). External triggers are difficult to implement with this approach, and because we have many state machines, the master state machine would become very large. In the end, all pipelines would be connected in some way, and a mistake in the master state machine could break the entire setup.
Option D: Save the results and wait for them to appear
This is the most complex alternative from an architectural standpoint. Several components must be set up first, but after that it is a very generic and powerful solution.
The technique is to store state machine results in a DynamoDB table. When a state machine relies on those results, it can use the table's information to wait. In our example, state machines A and B write their results to the table, and state machine C is scheduled to check the results of A and B and wait until both have been updated.
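The waiting logic can be sketched as a small polling loop. This is a simplified illustration, not our implementation: the lookup is injected as a function so the example runs without AWS, and an in-memory dictionary stands in for the DynamoDB table. The idea that a result counts as "updated" when its timestamp is at or after a cutoff, as well as all names below, are assumptions made for the example.

```python
import time
from datetime import datetime, timezone
from typing import Callable, Iterable, Optional


def wait_for_results(
    dependencies: Iterable[str],
    fetch_updated_at: Callable[[str], Optional[datetime]],
    not_before: datetime,
    poll_seconds: float = 60,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until every dependency has a result updated at or after `not_before`.

    Returns True when all results are fresh, False if attempts run out.
    """
    pending = set(dependencies)
    for attempt in range(max_attempts):
        # Keep only the dependencies whose result is still missing or stale.
        pending = {
            name for name in pending
            if (updated := fetch_updated_at(name)) is None or updated < not_before
        }
        if not pending:
            return True
        if attempt < max_attempts - 1:
            sleep(poll_seconds)
    return False


# In-memory stand-in for the results table (assumption: one timestamp per state machine).
utc = timezone.utc
results_table = {"A": datetime(2024, 1, 1, 3, 0, tzinfo=utc)}
cutoff = datetime(2024, 1, 1, 0, 0, tzinfo=utc)


def fake_sleep(_seconds: float) -> None:
    """Simulate B finishing while C is waiting."""
    results_table["B"] = datetime(2024, 1, 1, 3, 30, tzinfo=utc)


ready = wait_for_results(["A", "B"], results_table.get, cutoff, sleep=fake_sleep)
print(ready)  # True
```

In a real setup the lookup would read an item from the DynamoDB table, and the loop could just as well be modelled inside state machine C itself with Wait and Choice states instead of Python code.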