When implementing any large system and designing its processes, there comes a time when you need to decide whether to send data to other systems (Push) or make it available for them to come and get (Pull). Looking at most of the systems around me, people have tended to design for the push model. They have processes that, as a last step, send a file somewhere, either for other systems to consume or, in the case of reports or CSV files, for users to access.
Perhaps the reason behind the push model preference is that the sender system is the instigator. We wonder how the receiver will know the information is ready if we don't send it to them.
Push is fraught with issues:
- security - to send a file to another team's server, the sending system needs access to the target system. Once you have sent a file somewhere, you cannot be sure what sort of access control is imposed on it.
- retention and archival - once you have sent a file, you don't know how long it will be retained. If there are retention requirements, you need to keep a copy, and that copy is likely duplicated at the target.
- issues on the target - target file servers frequently run out of space, and they are generally managed by someone else. Problem resolution then involves two teams: the target support team to fix the space issue, and the sender support team to re-run the process that sends the file. In a pull model, generation and storage of the output sit with one team, and the Receiver is not tightly coupled with the Sender (who now becomes a Producer rather than a Sender).
This is why I'm an advocate of the pull or self service model.
To address the issue of the receiver not knowing when the data is available, you can use a notification mechanism. Notification mechanisms such as messaging or email do not have the same security and reliability issues as file transfer. The sender only needs access to their own sending mechanism and not the target: you don't need access to my inbox to send me an email. Space is rarely an issue, because messaging systems tend to be monitored and notifications are small, so they tend not to be the straw that breaks the camel's back, whereas a large output file can often be the thing that fills the target system's disk. And when there is an issue with sending a message, there is generally a bounce or some other response.
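Here is a minimal sketch of pull plus notification, to make the shape concrete. Everything in it is an assumption for illustration: the names `publish` and `pull` are hypothetical, a local directory stands in for the producer's own file store, and an in-process `queue.Queue` stands in for whatever messaging or email system you actually use. The point is only that the notification carries a name, a size and a checksum, while the file itself never leaves the producer until the consumer comes to get it.

```python
import hashlib
import queue
import shutil
import tempfile
from pathlib import Path


def publish(store: Path, name: str, data: bytes, notifications: queue.Queue) -> dict:
    """Producer side: write the output to the producer's own store, then notify."""
    (store / name).write_bytes(data)
    message = {
        "name": name,
        "size": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
    }
    # Only this small message is sent; the file stays where the producer put it.
    notifications.put(message)
    return message


def pull(store: Path, message: dict, destination: Path) -> Path:
    """Consumer side: fetch the file named in the notification and verify it."""
    source = store / message["name"]
    target = destination / message["name"]
    shutil.copyfile(source, target)
    digest = hashlib.sha256(target.read_bytes()).hexdigest()
    if digest != message["sha256"]:
        raise ValueError(f"checksum mismatch for {message['name']}")
    return target


def demo() -> bytes:
    """Run one publish/notify/pull cycle using temporary directories."""
    with tempfile.TemporaryDirectory() as producer_dir, \
            tempfile.TemporaryDirectory() as consumer_dir:
        notifications: queue.Queue = queue.Queue()
        publish(Path(producer_dir), "report.csv", b"id,total\n1,42\n", notifications)
        message = notifications.get_nowait()  # consumer learns the file is ready
        fetched = pull(Path(producer_dir), message, Path(consumer_dir))
        return fetched.read_bytes()
```

In a real system the consumer would need read access to the producer's store, but the producer needs no access to anything the consumer owns, which is the reverse of the push arrangement.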
Hopefully that helps you in designing your solutions.