These are chat archives for jdubray/sam

4th
Nov 2017
Paolo Furini
@pfurini
Nov 04 2017 11:10
A question about consistency when doing distributed transactions that call into external, uncontrolled services (like a call to a Google service, or whatever). In your real-world practice, what's the best way to deal with this?
2-phase commit is not feasible with external services, and the easiest solution I see is a compensation pattern. How can I model a SAM service to include compensations in case of failures?
And, in your experience, how do you deal with compensating an external call that potentially has multiple side effects? That is, the call could trigger other actions on the external system, and we should be able to keep track of all of them, because if another subsequent action inside the "transaction" fails, we should be able to "compensate" all the side effects of the previously successful call.
Holger Winkelmann
@hwinkel
Nov 04 2017 11:25
Isn't this scenario a long-running process, which may fail, and which therefore needs to be modelled as such from a state perspective? If everything works well, the intermediate progress states of the external call usually won't be visible apart from some wait indicators. If it fails, this should be known and shown in the derived state in the UI, so the user can take the next action.
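Holger's point can be sketched roughly as follows: while the call is pending, its intermediate progress collapses into a wait indicator, and a failure surfaces in the derived state. This is a minimal TypeScript sketch. `CallModel`, `representation` and the action names are hypothetical and not part of any SAM library.

```typescript
// Hypothetical model of one long-running external call (names are illustrative).
type CallStatus = "idle" | "pending" | "succeeded" | "failed";

interface CallModel {
  status: CallStatus;
  error: string; // empty unless status is "failed"
}

// What the UI shows: a view derived from the model, not stored state.
interface CallView {
  showSpinner: boolean;
  message: string;
  allowedActions: string[];
}

// Derived state: while pending, progress collapses to a wait indicator;
// a failure is surfaced explicitly, together with the actions the user may take.
function representation(model: CallModel): CallView {
  if (model.status === "pending") {
    return { showSpinner: true, message: "Working...", allowedActions: [] };
  }
  if (model.status === "failed") {
    return {
      showSpinner: false,
      message: `Failed: ${model.error}`,
      allowedActions: ["retry", "cancel"],
    };
  }
  if (model.status === "succeeded") {
    return { showSpinner: false, message: "Done", allowedActions: [] };
  }
  return { showSpinner: false, message: "", allowedActions: ["start"] };
}
```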
Paolo Furini
@pfurini
Nov 04 2017 11:32
@hwinkel yes, you're right, we should distinguish between interactive and non-interactive processes. The interactive case can be modeled as you said, delegating the "compensating" actions to the interactive user. But for a non-interactive process, where you can't or don't want to deal with user actions, I'd need some pattern to inject the steps needed to compensate every single potentially failing action (I mean when dealing with uncontrolled external services)
some of these "actions" could even be modeled as a work queue to be processed later by a "human actor", for example some service personnel, but in that case the process should be put in an "on hold" state indefinitely, until someone takes the appropriate actions and "resumes" the process manually
but, again, some processes may not fit well with this approach, because they should "fail fast" and be retryable by the user of the system
Paolo Furini
@pfurini
Nov 04 2017 11:39
I'd like to know from @jdubray if he has some SAM examples (not coded, but conceptual models) that deal with such compensating architecture
Holger Winkelmann
@hwinkel
Nov 04 2017 11:40
The question would be: what is non-interactive? If something fails and it has no impact, just log it somehow for further analysis. If it has an impact, another user role needs to deal with this error or failure state anyway, and it becomes interactive for that other role. I know that goes beyond the case of first-hand user interaction, but IMHO it could not be solved in this context anyway: failure is failure and needs to be dealt with somehow.
Paolo Furini
@pfurini
Nov 04 2017 11:43
I agree with you, but every manual intervention you hand off to service personnel is time lost for the entire company. Not every failure can be compensated automatically, and that's a reality, but as long as an "automatic" compensation is possible, it should be modeled in the process
Victor Noël
@victornoel
Nov 04 2017 11:43
@pfurini I think you will find more answers to this kind of question in the SOA / orchestration community... well, I know @jdubray is knowledgeable in this domain, but I don't know if SAM itself has answers beyond how to implement the patterns once you've found them :)
it's true that there should be some "correct" way to do action's compensation in SAM
interesting discussion this :)
Paolo Furini
@pfurini
Nov 04 2017 11:48
@victornoel this is a link he shared some time ago in this room: https://www.youtube.com/watch?v=QDufAdzZZt0
and he said he was invited as an expert to the WS-CDL working group, so I imagine he has something to say in this regard ;)
Victor Noël
@victornoel
Nov 04 2017 11:53
yes I'm sure he does :)
Jean-Jacques Dubray
@jdubray
Nov 04 2017 12:03
:-)
Actually, yes it does... this is where it all started; I only started looking at how SAM would apply to front-end architecture in the fall of 2015.
The fundamental problem you want to solve is "state alignment": whether you have two communicating independent processes or two companies, each needs to keep a shared view of the state. (Multiparty state alignment is even more fun.)
Jean-Jacques Dubray
@jdubray
Nov 04 2017 12:10
At a simple level, some people use events to align state, but you can quickly see that if I simply publish an event about something that happened on my side, I have no guarantee that a subscriber will flip to a mutually agreed, known state. I can send some money, but there is no guarantee you will ship the goods in return. Events are so primitive when it comes to state alignment that it is not even funny. When you use them, you end up implementing ad hoc state alignment protocols, and that is very hard.
With that in mind, there are two known approaches to state alignment:
  • central coordination/orchestration
  • peer-to-peer choreography
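Here is a concrete version of the point about events. Once a publisher has sent an event, it has no agreed state with the subscriber. Even the simplest alignment protocol needs an explicit reply and a deadline. The sketch below is a minimal, hypothetical request/acknowledge exchange. It is not WS-CAF or any other specification, and all names are invented for illustration.

```typescript
// Hypothetical two-party alignment record: the sender never assumes the
// counterparty's state until it receives an explicit reply.
type AlignmentState = "proposed" | "agreed" | "rejected" | "expired";

interface Alignment {
  id: string;
  state: AlignmentState;
  proposedAt: number; // ms timestamp when the proposal was sent
}

function propose(id: string, now: number): Alignment {
  return { id, state: "proposed", proposedAt: now };
}

// Only a reply from the counterparty moves us to a mutually known state.
function onReply(a: Alignment, accepted: boolean): Alignment {
  if (a.state !== "proposed") return a; // late or duplicate reply: ignored
  return { ...a, state: accepted ? "agreed" : "rejected" };
}

// Without a reply before the deadline, we at least know we are NOT aligned.
function onTick(a: Alignment, now: number, timeoutMs: number): Alignment {
  if (a.state === "proposed" && now - a.proposedAt > timeoutMs) {
    return { ...a, state: "expired" };
  }
  return a;
}
```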
The work you want to look at, if you have some time, is WS-CAF. I can't think of a better specification, even though one of the key authors later turned to the dark side and became a RESTafarian, for reasons I still cannot understand; perhaps because no one paid any attention to his spec.
Paolo Furini
@pfurini
Nov 04 2017 12:16
yes, that's a question that comes up every time I try to embrace a "microservice" architecture. I know it is impossible to achieve causal or strong consistency if you deal with externally owned services, but every other pattern out there always seems to be a "better than nothing" solution. Eventual consistency could fit some processes, but it is not a panacea. The compensation pattern could fit well with peer-to-peer choreography, if the compensation steps are embedded in the process itself, and the process state is self-contained and kept consistent across nodes using, for example, STM
Jean-Jacques Dubray
@jdubray
Nov 04 2017 12:16
For a while people have tried to use BPEL to manage these orchestrations, but BPEL engines are clunky and buggy (Apache ODE should never be used)
I would not look at "compensation" in isolation. BPEL is quite well designed; its semantics are worth a look.
More than 10 years ago I analysed all these specs in an effort to better understand their semantics (for instance BPEL)
As you can see from this diagram, it's not something you can easily emulate with a few lines of code. It doesn't mean that you will have to use all the semantics every time, but as I said before, when your programming model is unbalanced, you have to reify semantics with what you have, and that's when things become painful, very painful in this case.
Paolo Furini
@pfurini
Nov 04 2017 12:21
coordinators are essential for keeping track of and coordinating services, but they can easily introduce a single point of failure into the overall system. Could there be a feasible way to have an embedded coordination unit that is passed along between nodes, using established technologies like distributed STM?
Holger Winkelmann
@hwinkel
Nov 04 2017 12:23
Hi, I agree SAM has (and maybe should have) no answer to the problem of state alignment. It could only help to visualize the state of the given transaction in question. I think anything that at least creates awareness of this problem, rather than hiding everything behind a synchronous, direct, model-changing REST call, is already an advantage over some current "solutions" and APIs. If we go down the road of BPEL etc., it will become an endless cross-ownership engineering approach. What I tried to say with the "manual" approach is that defining a role which interactively deals with the known failure state is better than crawling log files for an assumed, unknown failure state, which definitely consumes even more time.
Jean-Jacques Dubray
@jdubray
Nov 04 2017 12:23
correct, this is all a matter of compromise, performance being probably worse than SPoF. BlockChain being the extreme end of the spectrum.
@hwinkel I am not at all recommending to use BPEL (I am a semantics guy so I try to get the semantics right and not focus on the technologies much)
Holger Winkelmann
@hwinkel
Nov 04 2017 12:25
@jdubray understood, it was just an example saying the same thing.
What I like about SAM is that it creates an awareness of the required state.
Jean-Jacques Dubray
@jdubray
Nov 04 2017 12:27
It turns out that I had recommended using BPEL and Apache ODE to a client back in 2014 and even though I know the author of ODE personally (he lives in Seattle now) and I had what I thought was a good relationship with the vendor providing some support on ODE, it turned out that it didn't work, and neither of them told me. Just to be clear I had no financial interest in recommending ODE, at the time, I felt this was the right technology for the job, and the price was not high, since it was just support on FOSS.
So I had to go back to the drawing board and design an orchestration language from scratch. Understanding the scope of it, that was not something I could do in a month or two.
That's when I connected the dots and realized that all the discussions I have had about TLA+ and state machines were a game changer in that space.
Paolo Furini
@pfurini
Nov 04 2017 12:29
probably what I'd like to achieve is to model small units of work, or processes, using embedded SAM instances, and to be able to model the simplest compensation steps in the instance itself. Then the instance, with its embedded state, would be available on all deployed nodes, dynamically, using for example transactional memory (e.g. Narayana STM) and Hazelcast
Jean-Jacques Dubray
@jdubray
Nov 04 2017 12:29
SAM allows you to write orchestrations in plain Java and Javascript, which was and still is unheard of.
yes, I produced a library in Java and Javascript for that purpose but it does not have persistence.
Paolo Furini
@pfurini
Nov 04 2017 12:31
that's why I'd like to introduce Narayana, to be able to define some resilient state that the STM engine will recover in case of failures.
Holger Winkelmann
@hwinkel
Nov 04 2017 12:31
In a world where everybody is crying for stateless applications, APIs etc., the required application or transaction state is often missing. It seems people mix up stateless protocols with having no state in the application, and end up with stateless APIs that make the synchronous assumption that after a "200 OK" everything is fine. In reality, the "200 OK" only acknowledges the intent of the state or model change; there is no agreement that the transaction has actually happened. I know everybody here knows that, but in my day-to-day business I see these naive assumptions in numerous APIs and UIs.
Jean-Jacques Dubray
@jdubray
Nov 04 2017 12:32
@hwinkel yes, that has been my point over the last 20 years or so. Our industry tends to never look at the (minimum) semantics required to achieve something, and that's a very big problem.
Holger Winkelmann
@hwinkel
Nov 04 2017 12:33
E.g., look at the OpenStack APIs and compare them with the Kubernetes APIs.
Paolo Furini
@pfurini
Nov 04 2017 12:33
I agree, that's the problem with 90% of modern so-called REST(ful) APIs
and I'm tired of this approximation, so I'd like to make correctness a first-class citizen in my upcoming projects
Jean-Jacques Dubray
@jdubray
Nov 04 2017 12:34
I tried to talk to the RESTafarians about this for years; it really impacted my career negatively because I was perceived as very negative, when in essence most of my peers were selling snake oil.
Holger Winkelmann
@hwinkel
Nov 04 2017 12:35
Kubernetes mostly follows an intent model, whereas OpenStack follows the synchronous model and tries to solve the issues with message queues etc., which doesn't help with the fundamental problem of not knowing the actual state of a given model-change transaction
Jean-Jacques Dubray
@jdubray
Nov 04 2017 12:35
@pfurini it's not so hard, but it's not trivial. All the semantics you will find in BPEL and WS-CDL or WS-CAF somehow are legitimate and you need to be aware of them.
Paolo Furini
@pfurini
Nov 04 2017 12:36
if you want to deploy fully stateless APIs, then sooner or later you'll need a coordinating party. If that's completely programmed in your end product, for example a SPA, it could be feasible if you are able to do it right
Jean-Jacques Dubray
@jdubray
Nov 04 2017 12:36
@hwinkel I am not familiar with them, but I can imagine a little bit, orchestration is orchestration.
@pfurini yes, absolutely.
That's where SAM helps because the Model is the best place (and aligns) to manage coordinations.
SAM being composable, you can also delegate the coordination to a (more) central coordinator
Holger Winkelmann
@hwinkel
Nov 04 2017 12:38
Yes, but solved differently. E.g., Kubernetes follows an intent-based model for state changes (like a SAM action's proposal), and then a controller tries to fulfill the intent and reports back once finished, or reports a failure state.
this makes the whole thing very robust and introspectable if something goes wrong
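The intent-based style Holger describes can be sketched as a reconcile step. The API call only records the desired state. A controller then moves the actual state toward it and reports either progress or failure. This is a simplified, hypothetical TypeScript illustration: `Resource`, `setIntent` and `reconcile` are invented names, not the Kubernetes API.

```typescript
// Hypothetical intent-based resource: the API records the desired state (the
// intent); a controller reconciles the observed state toward it.
interface Resource {
  desiredReplicas: number; // intent, set by the API call
  actualReplicas: number;  // observed state
  phase: "Progressing" | "Ready" | "Failed";
  message: string;         // failure detail, empty otherwise
}

// Accepting the intent does not claim success; it only records what is wanted.
function setIntent(r: Resource, replicas: number): Resource {
  return { ...r, desiredReplicas: replicas, phase: "Progressing", message: "" };
}

// One reconciliation step. `startReplica` stands in for an external call that may fail.
function reconcile(r: Resource, startReplica: () => boolean): Resource {
  if (r.actualReplicas === r.desiredReplicas) {
    return { ...r, phase: "Ready", message: "" };
  }
  if (r.actualReplicas > r.desiredReplicas) {
    return { ...r, actualReplicas: r.actualReplicas - 1, phase: "Progressing" };
  }
  if (!startReplica()) {
    // The failure becomes visible state that anyone can inspect, not a lost log line.
    return { ...r, phase: "Failed", message: "could not start replica" };
  }
  return { ...r, actualReplicas: r.actualReplicas + 1, phase: "Progressing" };
}
```

Calling `reconcile` repeatedly, for example on a timer, retries from whatever state was last observed. This is what makes the approach introspectable when something goes wrong.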
Jean-Jacques Dubray
@jdubray
Nov 04 2017 12:39
absolutely, deciding the end state from the moment an event is triggered is the worst coding practice possible.
@pfurini I am sure you have seen this picture since I posted it before
But if you want to do things cleanly, you want to have a clear view of your consistency layer (which I call the service layer, but I am not hung up on names)
The STAR libraries I mentioned have a bit more semantics that makes it easier to write orchestrations. SAM is just a container at that point, I believe the right container for writing orchestration, but you would still have to code them pretty much from scratch.
But, please don't underestimate this question, writing orchestration is not easy. All I can tell you is that SAM/STAR will keep you on the right path.
Paolo Furini
@pfurini
Nov 04 2017 13:22
yeah, I know it's a complex matter, and I don't want to underestimate anything ;)
Jean-Jacques Dubray
@jdubray
Nov 04 2017 13:25
@pfurini At the same time, you want a progressive solution where simple cases can be handled with not too much effort/infrastructure, and progressively increase in complexity. EDA is attractive because it is simple enough to start, but it does not scale (in scope) with complexity.
I feel a SAM based approach offers a good compromise. It's not harder to adopt initially but will allow you to support the most complex scenarios, including scaling in volume.
Paolo Furini
@pfurini
Nov 04 2017 13:27
I was only thinking, for fun, that probably one could be able to achieve some lightweight DTC mechanism by being able to serialize/deserialize a SAM engine's state (for a single process), and that state could be even passed to/from consuming client, without maintaining server-state
I mean, all these stateless REST APIs have something in common: they pretend coordination is not a problem
Maybe that's not something feasible for monster processes, but for light to medium ones it could be achievable without too much overhead on the communications
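Here is a rough sketch of that idea. It assumes the process state of one SAM instance is small enough to travel with each request. The server rehydrates the state, applies one step and hands the new context back, so it keeps nothing between calls. All names are hypothetical. The checksum only catches accidental corruption; a real system would sign the context (e.g. with an HMAC) to prevent tampering.

```typescript
// Hypothetical process state for one SAM instance. The server keeps nothing
// between requests; the client echoes the serialized context back each time.
interface ProcessState {
  processId: string;
  step: number;
  completed: string[]; // steps already done, needed if we later must compensate
}

// Simple 32-bit polynomial hash to detect a truncated or corrupted context.
// Not a security measure: anyone can recompute it.
function checksum(s: string): number {
  let h = 0;
  for (let i = 0; i < s.length; i++) {
    h = (h * 31 + s.charCodeAt(i)) >>> 0;
  }
  return h;
}

function serialize(state: ProcessState): string {
  const body = JSON.stringify(state);
  return JSON.stringify({ body, sum: checksum(body) });
}

function deserialize(token: string): ProcessState {
  const { body, sum } = JSON.parse(token) as { body: string; sum: number };
  if (checksum(body) !== sum) throw new Error("context was modified or corrupted");
  return JSON.parse(body) as ProcessState;
}

// One stateless request: rehydrate, apply the step, return the new context.
function handle(token: string, stepName: string): string {
  const state = deserialize(token);
  const next: ProcessState = {
    ...state,
    step: state.step + 1,
    completed: [...state.completed, stepName],
  };
  return serialize(next);
}
```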
Jean-Jacques Dubray
@jdubray
Nov 04 2017 13:36
This is actually a perfectly valid way to manage coordination.
What people never understood, or simply overlooked, was that REST's so-called uniform interface matches the lifecycle of a document (create, update, ...). Hardly any business entity has such a simple lifecycle. There are a few, like a customer (kind of), or a session (though REST doesn't know about expiry).
If I remember correctly, WS-CAF supports a mode where you can pass the context in such a fashion. In the end, as long as you achieve state alignment (the state within one microservice is aligned with the state in another), there is no wrong method. As I mentioned, in some simple cases EDA works too.
Paolo Furini
@pfurini
Nov 04 2017 13:39
that's the point. You're right that when APIs are treated like "remote" actions in a SAM process, one can achieve greater correctness when coordinating a process. But then we could extend this bi-directionally, that is, a SAM process could span from clients to multiple services, and from multiple services to clients
Jean-Jacques Dubray
@jdubray
Nov 04 2017 13:39
What is difficult is to detect bugs, especially the rare ones. AWS, for instance, used TLA+ to debug something with a 1-in-10-million occurrence, and it was (if I recall) something like the 37th state transition in the lifecycle. You can't debug that with the human brain.
@pfurini absolutely. The killer feature in SAM you cannot find anywhere else (even in BPEL) is Safety conditions in the State function (aka the invariants of TLA+). Of course it takes time to also identify these safety conditions, but at least you get a sense that something unwanted happened then you can look at the log of actions and acceptors and more easily debug the system.
Instead of writing insipid unit tests, people should spend time writing safety conditions.
(I'll tweet that)
Victor Noël
@victornoel
Nov 04 2017 13:43
@jdubray do you have simple example of safety condition? is that something like an assert you would put in the state function that says "the current state of the system is incoherent, there is a bug somewhere"?
Paolo Furini
@pfurini
Nov 04 2017 13:43
Yes, that'd require experience in writing such safety conditions, but will give an in-depth knowledge of the problem domain, that's difficult to achieve otherwise
Jean-Jacques Dubray
@jdubray
Nov 04 2017 13:44
It's a negative assert, it is a state you know the system should not reach
Paolo Furini
@pfurini
Nov 04 2017 13:44
but how do you react, a log or sending an alert to an alerting system?
Victor Noël
@victornoel
Nov 04 2017 13:44
ok, and it is useful as a way to find bugs, not for the program's functional behaviour?
@pfurini I would suppose so yes...
or it could also throw an exception when running in dev mode or during tests
Jean-Jacques Dubray
@jdubray
Nov 04 2017 13:45
yes, primarily for finding bugs, but you can also implement "next-actions" that:
  • at a minimum notify someone
  • take a corrective action to avoid catastrophic behavior
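Below is a minimal sketch of a safety condition written as a negative assert and checked in the state function. It produces the two kinds of next-actions listed above. The invariant (no order is ever shipped unpaid) and every name are hypothetical examples, not taken from the SAM library.

```typescript
// Hypothetical order model; field names are illustrative.
interface Order {
  id: string;
  paid: boolean;
  shipped: boolean;
}

interface ShopModel {
  orders: Order[];
  log: string[]; // trace of accepted proposals, kept for debugging
}

type NextAction =
  | { kind: "notify"; detail: string }
  | { kind: "holdShipments"; ids: string[] };

// Safety condition (a negative assert): no order should ever be shipped unpaid.
function unsafeOrders(m: ShopModel): Order[] {
  return m.orders.filter((o) => o.shipped && !o.paid);
}

// Part of the state function: check safety conditions after every step and
// return the next-actions to run (normally none).
function safetyNextActions(m: ShopModel): NextAction[] {
  const bad = unsafeOrders(m);
  if (bad.length === 0) return [];
  const ids = bad.map((o) => o.id);
  return [
    // At a minimum, notify someone, attaching the action log as a trace.
    { kind: "notify", detail: `shipped unpaid: ${ids.join(",")} | trace: ${m.log.join(" > ")}` },
    // Optionally, take a corrective action to avoid catastrophic behavior.
    { kind: "holdShipments", ids },
  ];
}
```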
Paolo Furini
@pfurini
Nov 04 2017 13:46
so they could be used for "compensating" in a compensation pattern?
Victor Noël
@victornoel
Nov 04 2017 13:46
ok, I see, yes, the fact that it is in the state function gives it a good position to do something meaningful
Jean-Jacques Dubray
@jdubray
Nov 04 2017 13:46
But yes, in general, it is not part of the behavior of the program, as the name says, it's a safety condition
@victornoel absolutely
@pfurini I would not present it that way
it's really a bug detection mechanism
You should never reach a safety condition so you should not really define the behavior of your program based on it
But detecting it and getting a trace is huge for rare bugs
You can also add the safety condition later, once the bug is detected, it will help you trace it.
Paolo Furini
@pfurini
Nov 04 2017 13:47
ok, so expected failures should instead be modeled as primary actions?
Jean-Jacques Dubray
@jdubray
Nov 04 2017 13:48
yes
a typical business failure is you send a purchase order, but the inventory unexpectedly went to zero (someone broke the part you just ordered)
it was in the inventory, but it's not anymore
that's a great state alignment exercise between the order and inventory microservices, not to mention the customer.
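The order/inventory case can be sketched by treating the out-of-stock outcome as a first-class proposal that the model accepts, not as an exception. This is a hypothetical TypeScript sketch. The proposal names, and the policy of holding the order until the customer decides, are assumptions for illustration.

```typescript
// Hypothetical proposals presented to the order model. Out-of-stock is an
// expected business outcome, so it is a proposal like any other.
type Proposal =
  | { type: "orderPlaced"; orderId: string }
  | { type: "inventoryReserved"; orderId: string }
  | { type: "outOfStock"; orderId: string };

type OrderStatus = "placed" | "reserved" | "awaitingCustomerDecision";

interface OrderModel {
  orders: Record<string, OrderStatus>;
}

// Acceptors: the model decides what each proposal means for its own state.
function present(m: OrderModel, p: Proposal): void {
  if (p.type === "orderPlaced") {
    m.orders[p.orderId] = "placed";
  } else if (p.type === "inventoryReserved") {
    if (m.orders[p.orderId] === "placed") m.orders[p.orderId] = "reserved";
  } else if (p.orderId in m.orders) {
    // The part may already have been reserved and then broken: either way the
    // order waits on the customer (backorder or refund) instead of failing silently.
    m.orders[p.orderId] = "awaitingCustomerDecision";
  }
}

// Next-action predicate: orders awaiting a decision trigger a customer notification.
function nextActions(m: OrderModel): string[] {
  return Object.keys(m.orders)
    .filter((id) => m.orders[id] === "awaitingCustomerDecision")
    .map((id) => `notifyCustomer:${id}`);
}
```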
That's why I recommend not playing with SAM semantics and saying "oh, I don't need XXX". Yes, for any given case it's true that you can use fewer semantics, but the problem comes when your use case evolves and you need to support more complex requirements; that's when the structure of SAM comes into play.
Jean-Jacques Dubray
@jdubray
Nov 04 2017 13:53
I researched that topic a lot. I can't claim I have the ultimate answer, all I can tell you is that SAM is the best answer I can give you.
Paolo Furini
@pfurini
Nov 04 2017 13:54
a use case for a compensating action:
  • a user wants to register on my system, but my system needs an account on behalf of the user on other 2 platforms
  • make a call to platform A to create the user
  • make a call to platform B ...
  • write the user record(s) on own system
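Here is a minimal compensation sketch for this registration flow. It assumes each step pairs a call with a best-effort undo, for example deleting the account just created on platform A. The names are hypothetical, and the calls are synchronous for brevity. A real version would also have to record compensations that themselves fail, e.g. by putting the process "on hold" as discussed earlier.

```typescript
// Hypothetical step: `run` performs a call with external side effects;
// `compensate` is a best-effort undo of a successful run.
interface Step {
  name: string;
  run: () => boolean; // true on success
  compensate: () => void;
}

interface Outcome {
  ok: boolean;
  failedAt?: string;
  compensated: string[]; // names of the steps that were undone, in undo order
}

// Run the steps in order; on the first failure, compensate the completed ones in reverse.
function runWithCompensation(steps: Step[]): Outcome {
  const done: Step[] = [];
  for (const step of steps) {
    if (step.run()) {
      done.push(step);
      continue;
    }
    const compensated: string[] = [];
    for (const prev of done.reverse()) {
      prev.compensate();
      compensated.push(prev.name);
    }
    return { ok: false, failedAt: step.name, compensated };
  }
  return { ok: true, compensated: [] };
}
```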
Jean-Jacques Dubray
@jdubray
Nov 04 2017 13:55
yes absolutely, especially if these registrations could fail at a later time (or be revoked).
It's common practice to take an optimistic view and claim success of a transaction before it has been fully recorded. All these cases can be complex to unwind.
Paolo Furini
@pfurini
Nov 04 2017 13:56
to be able to attach the compensation action(s), I'd have to model failure conditions as primary intents in this case.
Jean-Jacques Dubray
@jdubray
Nov 04 2017 13:57
yes exactly. FSMs are not a good solution; they are unmaintainable and, for the most part, very hard to get right.
Paolo Furini
@pfurini
Nov 04 2017 13:59
but as you said, unwinding some external calls could be complex, and here a 2-phase commit can help only with the last step, because that's the one owned by the system
Jean-Jacques Dubray
@jdubray
Nov 04 2017 13:59
BPEL for instance was close but the "try/catch" model is too simplistic compared to Model/State in SAM. It's not just about a participant telling you something went wrong.
Victor Noël
@victornoel
Nov 04 2017 14:00
I'm not clear how the model/state in SAM can compare to a try/catch in something like bpel.. could you elaborate @jdubray?
Paolo Furini
@pfurini
Nov 04 2017 14:02
I think the model/state pattern can help in writing complex processes in a more readable way, for a programmer, that is, by writing code that speaks for itself
Jean-Jacques Dubray
@jdubray
Nov 04 2017 14:03
yes exactly, with try/catch you are pushing the burden of detecting error conditions on the microservice
that's not always true/possible
Imagine the case where you reserve a trip with the constraint that the total trip cost cannot be more than $2000. So 3rd party microservices would reserve tickets, car, hotel and return the (dynamic) price with the reservation. You can only make the decision to cancel once you know all the details.
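The trip example can be sketched with acceptors that simply record each priced reservation. A next-action predicate then decides, and it can only do so once every leg has reported back. This is a hypothetical TypeScript sketch. Cancelling would itself trigger compensating calls to each provider, and those calls are not shown.

```typescript
// Hypothetical trip model: each third-party reservation is presented with its
// dynamic price; the cap can only be checked once every leg has reported.
type Leg = "flight" | "car" | "hotel";

interface Reservation {
  leg: Leg;
  ref: string;   // the provider's reservation reference
  price: number; // dynamic price returned with the reservation
}

interface TripModel {
  budget: number;
  reservations: Partial<Record<Leg, Reservation>>;
}

type Decision =
  | { kind: "wait" }
  | { kind: "confirm"; total: number }
  | { kind: "cancelAll"; refs: string[]; total: number };

const LEGS: Leg[] = ["flight", "car", "hotel"];

// Acceptor: record what a provider reported; no decision is made here.
function present(m: TripModel, r: Reservation): void {
  m.reservations[r.leg] = r;
}

// Next-action predicate: undecided until all legs are known, then confirm
// or cancel every reservation if the total exceeds the budget.
function decide(m: TripModel): Decision {
  const rs: Reservation[] = [];
  for (const leg of LEGS) {
    const r = m.reservations[leg];
    if (r === undefined) return { kind: "wait" };
    rs.push(r);
  }
  const total = rs.reduce((sum, r) => sum + r.price, 0);
  if (total > m.budget) {
    return { kind: "cancelAll", refs: rs.map((r) => r.ref), total };
  }
  return { kind: "confirm", total };
}
```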

writing complex processes in a more readable way, for a programmer

That's the key

Jean-Jacques Dubray
@jdubray
Nov 04 2017 14:09
when you use BPEL, you need a BPEL expert. That takes quite a bit of effort to master, not to mention that it only works with well-behaved services/microservices
Now, if one of these services changes, how do you change the orchestration? It's really painful to write BPEL orchestrations.
Paolo Furini
@pfurini
Nov 04 2017 14:09
I prefer actionable models, and not declarative ones
Jean-Jacques Dubray
@jdubray
Nov 04 2017 14:10
It's really a question of semantics Cogent/Anemic and Monadic/Polyadic
A programming language is more cogent (ability to code logic) than BPEL but often too monadic (not enough semantics)
So in some cases BPEL will do a better job, but then it becomes a skills problem
Paolo Furini
@pfurini
Nov 04 2017 14:12
in the end every model must be executed. I prefer to add semantics to my actionable models
Jean-Jacques Dubray
@jdubray
Nov 04 2017 14:13
That's why I always try to move the discussion to the semantics level. Fewer semantics means more coding (and more things that can go wrong: bugs, maintenance...).
Sometimes it's very difficult. Without SAM I would argue it's very difficult to write Orchestrations in typical programming languages.
Paolo Furini
@pfurini
Nov 04 2017 14:13
I agree, but higher abstractions means more rigidity and less adaptability
I speak of BPEL and the like
Jean-Jacques Dubray
@jdubray
Nov 04 2017 14:14
Yes absolutely agreed, I'll take cogency over polyadism any day
that's why I no longer use Model-Driven-Software-Engineering.
I am not saying it is useless, but the argument between cogency and polyadism is settled, at least in my head.
Paolo Furini
@pfurini
Nov 04 2017 14:16
I know it might seem OT, but take for example the GoLang programming language. One of the main reasons of its spread lies in a strange mix of rigidity, simplicity, and readability
Jean-Jacques Dubray
@jdubray
Nov 04 2017 14:16
But without SAM, for me it was a toss up, I could end up writing lots of code to do something that would be easy(ier) with an MDSE approach.
@pfurini sorry, I don't understand?
Paolo Furini
@pfurini
Nov 04 2017 14:18
What I mean is that programmers and architects are tired of excessive complexity, and strive to find more expressive ways to achieve their goals
Go's success relies primarily on its perceived simplicity and readability, and easy onboarding for newcomers to a project
Jean-Jacques Dubray
@jdubray
Nov 04 2017 14:20
yes, agreed. Sorry I don't know much about GoLang
Paolo Furini
@pfurini
Nov 04 2017 14:22
the reality is that I don't know much about it myself, but some time ago I needed a feature that was not present in the docker-machine utility (written in Go). After cloning the repo and digging into the code for, say, 10 minutes, I was able to code the new feature myself and recompile it
I had never seen a line of Go before.
Jean-Jacques Dubray
@jdubray
Nov 04 2017 14:23
But you're good. It would take me months to train my old brain to parse a new language.
I'll meet a friend tomorrow who recently started working with go, I'll try to pick his (young) brain.
Paolo Furini
@pfurini
Nov 04 2017 14:25
but that's also the biggest drawback of languages like Go, or JS for example: the easier learning curve means there is potentially tons of poorly written code out there

It would take me months to train my old brain to parse a new language.

Ask your friend tomorrow. I'm sure he'll agree that you could start coding in Go in less than an hour

Jean-Jacques Dubray
@jdubray
Nov 04 2017 14:27
I will
Paolo Furini
@pfurini
Nov 04 2017 14:28
I'd be curious to see a SAM service written in Go. :smile:
Jean-Jacques Dubray
@jdubray
Nov 04 2017 14:29
you will...
I have to get ready for my day.
thank you for the discussion
Paolo Furini
@pfurini
Nov 04 2017 14:33
thank you for your willingness to help ;)
Jean-Jacques Dubray
@jdubray
Nov 04 2017 14:42
:+1: