Categories
Current Trends How It Works Technology

Bidirectional (2 Way) Sync Using Pipedream

What is Bidirectional Sync?

Bidirectional sync, also known as 2-way sync, is a type of data synchronization process that involves data being synced in both directions, meaning information can be transferred from one system to another and vice versa. It ensures that any changes made in either system are reflected in both systems. This type of synchronization is commonly used in cloud-based applications and services, where data needs to be shared across multiple devices and users.
While most automation workflows will generally flow one-way (such as in a DAG), there are occasions where you may want to keep data synchronized in a non-hierarchical manner between two different applications. In this article, we will use Trello and Google Sheets as a use cases.
In classical software engineering cases, this is accomplished using a server-client model where the server acts as the “single source of truth” from which the client updates. However, if we are referring to essentially two applications that have their own “servers”, what is the best way to keep information organized and synchronized between these two resources?

Challenges in Bidirectional Sync

When referring to bidirectional synchronization, there are a few key challenges that need to be weighed and balanced when designing a bidirectional sync system:

Maintaining Data Consistency

Bidirectional sync between two web applications can be challenging when it comes to maintaining data consistency. For example, if one application is updated, the changes must be reflected in the other application to keep the data consistent across both applications. This problem is trickier than most people realize due to issues like how fields are mapped, how individual “rows” or entries are keyed to each other, and how frequently a service can send/receive updates.

Conflict Resolution

When synchronizing data between two web applications, conflicts can arise in how the data is represented or stored. This can lead to discrepancies between the two applications and must be resolved in order to keep the data consistent. Depending on how robust the data pipeline is, it would be possible to maintain a clean connection if there is a transparent keying system as mentioned above.
Depending on the scale and how critical the data is, it may make sense to either have a database external to both services that store transactions or have one service act as the “single source of truth” for both services. That way, in a conflict, the system falls back to a single service.

Security

Bidirectional sync between two web applications can pose a security risk if not implemented properly. It’s important to ensure that both applications are secure and that any data transmitted between them is encrypted. Many low-code/serverless workflow automation pipelines exist that can accomplish this in a secure manner, including Pipedream, Make.com, and Zapier, to name a few. 

However, at scale, these options can become expensive to maintain. In many cases, it’s best to prototype with these serverless workflow automation tools, then invest in engineering a custom cloud solution based on the design of your low-code solution. We can help with the transition from low-code to full-scale cloud builds. Contact us today for details!

Performance

Bidirectional sync can be resource intensive, as data must be constantly transmitted between devices. This can lead to slow performance and a degraded user experience. This is often dependent on the scale of the data transferred and stored and HOW it’s transmitted. If each operation requires a complete synchronization of both data sources, then this will be much more computationally costly than single incremental updates as they occur. 

Strategies for Establishing Bidirectional Sync

The main thing to consider when setting up a bidirectional sycn workflow is to prevent the “infinite loop” problem in the case of a non-hierarchical system. The main strategies we can implement to prevent this would be:

Last Modified

When performing a sync, we would want to include data about when a “row” in our data was last updated/created. If it’s relatively recent (within the past X seconds or so) we would want to ignore an update and stop the workflow from continuing to prevent wasted resources. As an extension of this, we could also check if the values are different from each other and if there is no difference between the existing row and the incoming update, break the workflow cycle. 

Is User

Some servers and automation tools may allow us to check if the incoming change is coming from either a human user or a program/API and react accordingly. If the change is coming from a non-human actor, we could easily close the loop this way and prevent wasted computation resources. 

Toggle

In some situations, a toggle could be added in a server-stored value that would prevent a workflow or function from initiating if something like isUpdating is true. After the update is complete, this could be flipped to false to reallow flow between the two resources. 

Bidirectional Sync Use Case: Trello and Google Sheets

Google Sheets is a fantastic spreadsheet application that is popular for it’s ease of sharing, free access to anyone, plethora of plugins, and the ability to write pseudo-javascript to drastically increase it’s functionality.

Trello, one of the top project management tools, is able to organize tasks and just about anything related to a project’s needs. 

We were approached to test a bidirectional sync system between the two tools since they both have API access, and we decided to give it a go! The architecture we came up with involved using Pipedream as our server to communicate and relay changes between the two applications. For this project, we simply mapped the Title, Description, and Status fields between a Google Sheet and the Trello Cards on a given board, like so:

To do this, we created two workflows in Pipedream: one responsible for processing updates FROM Trello TO Google Sheets, and another FROM Google Sheets TO Trello. Initially, we also wanted to account for the “infinite loop” problem (where one update would create a continuous loop of updates), but it turned out this was not necessary for this task likely due to the onEdit() event on Google Sheets side only responding to changes initiated by humans (and not the API). 

Given Google Sheets only takes one dimensional data, there were some limitations with how many fields from a Trello Card could be included in Sheets due to parsing and data structure complexity. However, in this scenario, for full-scale access to all data structures, you could use Airtable or Notion as an alternative database since it has more complex data structures built in like lists and the ability to key to other tables. 

Challenges

In addition to the data structure issues above, while implementing this workflow in Pipedream, the batch updates from Google Sheets also proved to be problematic as Pipedream does not handle iteration by default in a single “run”. We had to essentially split the workflow into “receiving” and “processing” to be able to handle these edge cases. You can read more about the status of this on their GitHub.

Leave a Reply

Your email address will not be published. Required fields are marked *