Everything about these libraries that makes everything infinitely composable: https://github.com/transient-haskell/transient/blob/master/README.md
docker run -it -p 80:80 agocorona/tryhplay
and point the browser to http://<docker ip>/try/mybudget.hs/edit
Cool. Now I'm playing with some little modifications to make HTTP interfaces for Web programs much more meaningful.
This program:
main0 = keep $ initNode $ do
    showURL
    local $ choose [1..3 :: Int]
when executed with runghc Test.hs -p start/localhost:8000
produces this output, which is the URL of the point where showURL is located:
'http://localhost:8000/0/0/e/e/'
Opening this URL in the browser, the following is displayed:
SMore/0/10003000/1/e/
SMore/0/10003000/2/e/
SMore/0/10003000/3/e/
SLast/0/10003000/
If you follow https://gist.github.com/agocorona/3b20c32d4a1fc47f48342a213c1c1fce you will understand: each thread created by choose produces an output line, which contains everything logged by the program until the thread terminates. The last line, which closes the connection, is produced when there is no thread still running.
Any point in the cloud monad can be invoked over HTTP; the program executes the corresponding actions and the caller receives all the responses, so interoperability with any other language, system, etc. is guaranteed.
Until now the connections were not closed when the response was complete, and to receive the response the program needed an extra teleport. Now that is transparent and any cloud program receives proper responses.
The idea of blending all the modalities of programming under a single execution model is a massive simplification which demands some accommodation of the mind. I thought that it would be so good to have. The cloud monad allows so many things at the same time: logging, stopping and resuming execution, distributed computing, web nodes, creation of HTTP endpoints... The transient monad under it allows for all the rest of the effects: threading, events, streaming, unlimited states, early termination, parallelism, concurrency, backtracking, compensation of transactions, better exceptions.
All of this is based on one principle: continuations. It simplifies everything. We store continuations in the mind. When we have to stop a plan because we have to solve some details before doing the current step, we solve them and continue. That is the way the mind abstracts from details and thinks at a high level: because we use continuations.
True continuations remember the complete history of past steps, so the continuation can make use of any of the previous results. That happens with you and me when we execute a plan and interrupt it to do some other thing, or something necessary for the continuation of the plan. That also happens in languages which support continuations as first-class features.
But look at callbacks and stackless continuations in general: they are like doing a task and handing the rest of the tasks to somebody else, written on paper, together with the data that you have obtained and think he would need. The person who continues is in another environment and is amnesic. If not all the data is available, he has to ask you back, or he has to re-acquire the data from the environment or from some global state in which you stored more data.
That composes poorly. This means that, if the work needs many interruptions, to be continued by different amnesic people who take over your tasks, then such dispersion of data and processing, and possibly bad use of global storage, make the whole chain of work very fragile. We find here one of the shortcomings of bureaucracy and of non-composable code.
True continuations are as if you continued each step after an interruption without forgetting anything. There is no loss of data and therefore there is no communication problem, no discontinuity, no different environments.
When the task is shared among many actors, persons or threads, the problems of stackless continuations and callbacks are multiplied. Using true continuations is as if you could clone yourself with all your memory intact to do the different tasks. In programming, you can execute the continuation in different threads. That is what transient does.
Continuations, in human terms, are the historical memory of what we did up to a point plus the memory of the next steps to do. They are the key to thinking and acting at a high level, and also the key to programming at a high level: continuations, in one way or another, are used whenever the programmer needs to invoke a lower-level functionality and continue the program flow without being cluttered by those low-level details. Look at blocking system calls like file and stream operations, for example: they use continuations at the kernel level to resume your program when the file IO operation has finished. If you don't have these continuation abilities available, then you have to handle the continuation manually, using callbacks, contexts or global variables, with all the problems that entails.
Even function calls are continuations; they are the most lightweight continuations, executing in the same stack context. At the other side of the spectrum are the distributed continuations of transient, where a portion of the stack is serialized and transported, the continuation is executed, and the new components of the stack are returned to the caller, which continues the execution.
What is said about bureaucracy above can be mitigated by a close and true friendship. That makes the flow of information easy, because in the chain of work the caller gives all the information the continuator needs. They are all "a single person". That's why unstructured working teams outperform theoretically well-organized experts who work according to rules and procedures. But this hardly scales to big organizations or entire societies without authentic brotherhood among people who don't know each other. Christian brotherhood is necessary for a higher civilization.
If functional programming is programming with functions and functions are lightweight continuations then ... follow the reasoning.
Purity and type safety are poor selling points for functional programming; they only scratch the surface of real programming problems.
Functions are to pure composition what continuations are to effectful composition. The latter does hit the core of computing problems. Composing software modules that execute effects is what matters since, whether you like it or not, effectful code is 90% of software programs.
Continuations, however, had some problems. Hard continuations using C primitives like setjmp and longjmp are used by the OS ubiquitously to perform synchronous file IO and to handle multitasking, among other things. In these cases the stack, the instruction pointer and the IO information are stored and re-executed. Without that it wouldn't be possible to combine file IO operations with CPU-bound operations; multiuser and multiprocessing operating systems wouldn't be possible. That is, to begin with, a HUGE success in making pieces of code (programs) that can be developed independently and combined together to make bigger things. This agrees with some imperfect definition of COMPOSITION.
At the language level, they have been used in languages like Scheme and Smalltalk for making asynchronous things synchronous, notably in continuation-based web frameworks and continuation-based distributed computing. But storing and retrieving the continuation information demands time and storage space. When there are a lot of sessions the storage cannot stay in memory and must go to disk. These developments tend to save entire suspensions, which include the data and the code of the continuation, and that is huge.
The Haskell continuation monad (or the codensity monad, which is a variation) exposes k, the continuation, which is serializable in languages like the two above (generating big serialization chunks, as I said before) but not in Haskell. It also has an Applicative instance, but no Alternative composition. The instances do not consider the possibility that each member of the alternative, applicative or monadic combination runs in a different thread. As a consequence, if the continuation transformer includes the IO monad, it only allows combination by blocking: whenever IO is executed within the continuation monad inside an applicative, it blocks. This is not what is needed for composable concurrency.
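As an illustration of that blocking, here is a minimal sketch (not transient code), using only the standard ContT from the transformers package: the two slow actions combined applicatively run one after the other in the same thread, so the pair takes about two seconds instead of one.

import Control.Monad.Trans.Cont (ContT, evalContT)
import Control.Monad.IO.Class (liftIO)
import Control.Concurrent (threadDelay)

-- each action takes about one second
slow :: Int -> ContT r IO Int
slow n = liftIO $ threadDelay 1000000 >> return n

-- the Applicative of ContT sequences the IO: ~2 seconds total, no overlap
main = evalContT $ do
    pair <- (,) <$> slow 1 <*> slow 2
    liftIO $ print pair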
To summarize, there was no way to express algebraic combinations of parallel and concurrent pieces of software so that these pieces make use of past results in scope as if they were running in a single thread. There was no way to combine them algebraically with different asynchronous inputs (streams, sockets, mouse events, keyboard input), much less when all of this runs distributed among different nodes.
Transient does it.
However, these continuation-based services were created in the 70s. The new programming problems of the 90s, Web programming and mouse input, or in general the concurrent management of events, were not solved the same way. While the OS brought primitives like epoll which partially solved the problem of composition of inputs, in user space the frameworks for web programming, distributed computing (actor model) and GUI programming opted for callback-based schemas. Which is a hell.
The barrier to the adoption of composable frameworks for these problems has been the difficulty of making continuations compose well with each other at the language level unless you block. For example, if I want to read two files (which use continuations at the OS level) in a program, either I do it sequentially or I must fork another thread, which breaks composition. In transient, you can express this algebraically with

(async (read file1) <|> async (read file2)) >>= process

and each file will be processed in parallel. If you use <> instead of <|>, the two files will be concatenated and the whole content will be processed in one thread. The same construction can be used for sources of events:

((react onmouseEvent >> return "hello") <|> async (read file)) >>= process

and these expressions can be combined as well with the same operators.
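A minimal runnable sketch of the first expression, assuming async and keep are exported by Transient.Base (module layout may differ between transient versions); the file names and the process function are placeholders:

import Transient.Base
import Control.Applicative ((<|>))
import Control.Monad.IO.Class (liftIO)

-- read two files in parallel; each content reaches the same continuation,
-- each delivery possibly in its own thread
main = keep $ do
    content <- async (readFile "file1") <|> async (readFile "file2")
    process content
  where
    process :: String -> TransIO ()
    process = liftIO . putStrLn . take 80   -- placeholder: print a prefix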
The stack can be transported as a path of serialized variables separated by "/". This makes such continuations human readable, like HTTP GET paths.
main = keep $ initNode $ inputNodes <|> do
    let name = "WORLD"
    node <- local $ do
        ns <- getNodes
        guard (length ns > 1) <|> error "I need to know the other node"
        return $ ns !! 1
    resp <- runAt node $ do
        localIO $ print "producing response here"
        return $ "hello " ++ name
    localIO $ print resp
In the program above, the message sent to the remote node in runAt is:
SMore/0/30105000/e/f/w/("localhost",8001,[])/e/
And the response of the remote node is
SMore/30105000/40106000/"hello WORLD"/e/
SLast/30105000/40106000/
The first means: I'm the continuation 30105000. I call you at the beginning of your program (continuation 0); please consume these variables and continue executing. Note that the list of nodes from getNodes is out of scope and is not transported.
The latter means: I'm the continuation 40106000 and I call back your continuation 30105000, which is the one that called me. Please add "hello WORLD" to the stack of that continuation (it becomes the value of the variable resp) and continue executing. The "SLast..." response means: I have nothing more to send from 40106000; you can DELETE your stack context to save space. I deleted mine.
The latter is the solution for another problem: how to free continuation contexts when there are hundreds or thousands of them.
The fascinating thing is that the remote execution can be invoked by a simple HTTP client. With curl, for example, I can invoke any local statement of the program and receive its response.
If the second node is at localhost:8001:
>curl http://localhost:8001/0/0/e/f/w/("localhost",8001,[])/e/
SMore/30105000/40106000/"hello WORLD"/e/
SLast/30105000/40106000/
Streaming
Why SMore and SLast? The response in the above example has been received; why wait for SLast to close all the contexts? Because, unlike all RPC mechanisms, this continuation-based remote communication supports streaming and stays composable. There are no loops and no callbacks. There are no special constructs and no new operators, just a few more primitives. The number of responses can be anything from zero upwards, including infinite.
This
resp <- runAt node $ do
    i <- choose [1..5]   -- better: threads 0 $ choose [1..5]
    localIO $ print "producing response here"
    return $ "hello " ++ name ++ show i
will produce five SMore responses and the SLast message. Better prefix choose with threads 0, otherwise five threads will produce their responses simultaneously and compete to write them to the network socket. With threads 0 the main thread executes a loop and writes the responses one by one. Network streaming is solved, and a number of things like thread parallelism are involved in this and complement each other nicely when necessary.
choose is a combination of async composed with the alternative operator, which tries to use all the available threads.
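A minimal sketch of that idea (not the library's actual definition), assuming async and TransIO from Transient.Base: each element is injected asynchronously and the branches are merged with <|>, so each element may be handled by its own thread.

import Transient.Base
import Control.Applicative (empty, (<|>))
import Data.Typeable (Typeable)

-- sketch only: behaves roughly like choose, one asynchronous branch per element
choose' :: Typeable a => [a] -> TransIO a
choose' = foldr ((<|>) . async . return) empty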
Asynchronous communications
It also supports zero responses, in which case SLast is sent immediately:
runAt node $ do
    localIO $ print "no response produced here"
    empty
Generators
Generators in Python can yield sequences of values that can be iterated in for loops. Using generators, the fibonacci series is more or less:
def fibonacci():
    yield 0
    a = 0
    b = 1
    while True:
        yield b
        a, b = b, a + b

for num in fibonacci():
    print(num)
    if num > 10000:
        break
yield can be called at any point of the execution of the generator; in the example it is used in two locations. next can be used to iterate over the values of the generator. This is something that can be implemented using continuations, but it seems to me that this kind of coding is not right: it is imperative and uses loops explicitly. It is, however, an example of pull-like streaming. Normally imperative languages are push-like, since execution runs from top to bottom. Pure Haskell code is lazy of course, so it is pull-like. Monadic code is also push-like.
A transient program can return more than one value. Here the function fibonacci returns a value b and an infinite recursive set of values; if no thread limit is established it will try to return all of them at the same time using infinitely many threads. It is a more functional and recursive definition of the iterator, without using mutation. There are no loops. killBranch kills the thread(s) that do the work, the hard way. I need a more civilized way to end a thread, since killing is asynchronous and takes some time to take effect; probably something like endBranch.
sequential = threads 0
yield = async . return

mainfib = keep $ sequential $ do
    let fib a b = yield b <|> fib b (a + b)
        fibonacci = fib 0 1
    num <- fibonacci
    liftIO $ print num
    when (num > 10000) killBranch
A generator is like a coroutine, a cooperative thread which yields a result and stops. next continues the execution of the generator until the next yield. Generators, like iterators, do not compose: I cannot write for num in fibonacci and factorial. But in Transient I can write: num <- fibonacci <|> evens. Since a generator is pull based, it is passive; I cannot inject events from outside into the loop. For example, I cannot read from the keyboard except by blocking, and this is not good for composition (although generators do not compose anyway, as I said). On the other side, in transient I can write: num <- fibonacci <|> input (< 10000) "enter a number". Both sources will be injected into the computation.
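A minimal runnable sketch of that kind of composition, assuming choose from Transient.Indeterminism and keep' from Transient.Base; the two literal lists stand in for the fibonacci and evens sources:

import Transient.Base
import Transient.Indeterminism (choose)
import Control.Applicative ((<|>))
import Control.Monad.IO.Class (liftIO)

-- values coming from either source reach the same continuation
main = keep' $ do
    num <- choose [1, 1, 2, 3, 5, 8] <|> choose [0, 2 .. 10]
    liftIO $ print (num :: Int)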
main = keep $ initNode $ inputNodes <|> do
    local $ option "go" "start mining"
    nodes <- local getNodes
    curr <- foldr (<|>) mempty $ map (\n -> runAt n minecurrency) nodes   -- == clustered minecurrency
    localIO $ print ("new currency mined:", curr)

mineone :: IO Currency
minecurrency = local $ waitEvents mineone :: Cloud Currency
Example loops:
for loop:
forM [1..10] $ \x -> proc x
next
Equivalent to
mapM proc [1..10]
next
Executes the computation proc for each one of the values sequentially. For example:
forM ["file1","file2"...] $ \fil -> do
    content <- readFile fil
    process content
If we need to do it in parallel for each value of the loop:

fork $ choose [1..10] >>= proc    -- fork x = (abduce >> x) <|> return ()
next

Use threads to reduce the parallelism. This executes the loop in another set of threads; the current one executes next immediately.
If we want to wait and pass the list of results of the loop to next:

results <- collect 0 $ choose [1..10] >>= proc
next results
collect 0 waits until no thread activity is present in its argument and gathers all the results. proc is now a transient computation; it can return (yield) any number of results, depending on how you define it, so, in general, the length of the list gathered by collect is not the length of the list given to choose. If it is sure that proc will return just one result per input, then group instead of collect would do the job, and it is faster.
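A minimal sketch contrasting the two, assuming collect, group and choose come from Transient.Indeterminism; here proc is a placeholder that returns exactly one result per input, which is what makes group 10 a valid replacement for collect 0:

import Transient.Base
import Transient.Indeterminism (choose, collect, group)

-- placeholder: exactly one result per input
proc :: Int -> TransIO Int
proc = return . (* 2)

gatherAll :: TransIO [Int]
gatherAll = collect 0 $ choose [1..10] >>= proc    -- waits until no thread activity remains

gatherTen :: TransIO [Int]
gatherTen = group 10 $ choose [1..10] >>= proc     -- returns as soon as 10 results are gathered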
Resource management (again)
I have a problem with resource management when the resource is shared among different threads. Imagine a huge file that could be processed in chunks by different threads, or a list of results being filled by different threads or even by different distributed computations. Coding each particular case is easy, but not the general one. It is necessary to detect the extinction of thread activity so that the resource can be released with a final action: closing a file, returning a result, etc.
Until now I did it in some cases by catching runtime errors on MVars shared among threads, but detecting MVar readers with no active writers is very fragile and may not work when the code is relocated. I could use TVars, but I would like to use neither of them, in order to keep transient as simple and portable as possible. I also use periodic inspection (polling) of continuation states to know when to send an SLast message to a waiting node, in order to discard a continuation for which no node will call back. All of this could be replaced by a single and faster mechanism.
It is difficult because, in the general case, threads can spawn more threads. Moreover, a single thread can return many results (or no result), so simply counting the number of results to decide when it is finished is not possible.
Finally I managed to improve onFinish for this purpose, even when there are many threads involved. Now an onFinish will only close the resource when there is no thread activity after it.
This code spawns three threads with choose. I labeled them using labelState so that they could be identified better with showThreads. At the end of the execution of the three threads the finish message is printed. Provisionally, the finish message carries the identifier of the last thread which finalized its execution:
main = keep' $ do
    onFinish1 $ \r -> liftIO $ print ("finish", r)
    i <- choose [1..3]
    labelState $ fromString $ "thread from " ++ show i
    th <- liftIO myThreadId
    topState >>= showThreads
    liftIO $ print ("end", th)
The output is:
tests# runghc -w -i../src Test
---------Threads-----------
top 76
top 77
work,thread from 1 78 <--
("end",ThreadId 78)
---------Threads-----------
top 76
top 77
work,thread from 1 78 dead
work,thread from 2 79 <--
("end",ThreadId 79)
---------Threads-----------
top 76
top,thread from 3 77 <--
work,thread from 1 78 dead
work,thread from 2 79 dead
("end",ThreadId 77)
("finish","ThreadId 77")
tests#
In the same way I've created a binary onFinish which returns a result when all the threads of the first operand terminate. The schema is:
main = keep' $ do
    container <- create some mutable container to gather results
    results <- SomeProcessWith container `onFinish1'` \_ -> return container
This has direct applicability to collect, which until now worked with a fragile detection of MVar exceptions, and to the exit of keep', which worked the same way. The former, unary onFinish is applicable for sending the SLast message to remote nodes so that they garbage-collect their continuations.
With this solution, the scope in which the resource is visible matches the lifespan of the resource. There is no need to bracket the use of the resource by inverting the control with withResource. For example:
test = do
    result <- do
        onFinish1 $ \e -> liftIO $ print ("END", e)
        liftIO $ threadDelay 1000000
        return "something"
    liftIO $ print result
It seems that if result were a file descriptor it would escape the finalization, but that is not the case, since the finalizer works for all the execution flow which passed through onFinish.
> runghc test
"something"
("END","ThreadId 80")
Going all the time through unexplored territory implies expecting unexpected difficulties and setbacks. It takes time before problems are solved, initially with solutions that are a little messy under the hood. It takes a lot of testing of alternatives and refactoring to reach the point where there is no better solution.
A better solution = the smallest set of components that compose.