Allie
@cite-reader
That auth code looks correct, but the messaging pattern is broken.
Excel Arcelo
@excelarcelo
@cite-reader what do you mean broken?
Allie
@cite-reader
I'm looking for the place in the doc that explains this, but if my recollection is correct the pub-sub pattern deals in multi-part messages, where the first frame is the topic and the remainder is the payload. You're only sending a single-part message; I haven't tried that, but it's at least unexpected.
No, actually, I might be wrong about that.
Excel Arcelo
@excelarcelo
socket.rcv() is what is wrong
Allie
@cite-reader
Okay yeah I'm wrong, ignore that. I don't know what I was remembering.
Excel Arcelo
@excelarcelo
it's pub sub :D
Allie
@cite-reader
Yeah I was going to point that out but it's past midnight for me, I'm not at my best.
Excel Arcelo
@excelarcelo
removed that but the client still hasn't received a message
@cite-reader you must be tired
Allie
@cite-reader
yes.
Excel Arcelo
@excelarcelo
@cite-reader thank you, I'll just google for answers. Last thing: do you have documentation for pub-sub authentication?
Allie
@cite-reader
Not specifically, but the security mechanism is ignorant of the socket types involved. In the wire protocol, metadata isn't exchanged until the very last phase of the security handshake, so it'd be hard to treat pub-sub specially in the first place.
You might find it illuminating to open up Wireshark or something and look at what's happening when your subscriber tries to connect. It's not the most obvious protocol to follow but maybe.
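For reference, a minimal sketch of CURVE authentication on a PUB/SUB pair; the endpoint, the generated keys, and the allow-listed IP are all illustrative. Note that the ZAP authenticator never looks at the socket types involved:

import zmq
import zmq.auth
from zmq.auth.thread import ThreadAuthenticator

ctx = zmq.Context.instance()

# The authenticator handles IP allow-listing and CURVE key checks;
# nothing here depends on the sockets being PUB/SUB.
auth = ThreadAuthenticator(ctx)
auth.start()
auth.allow("127.0.0.1")  # IP whitelist, applied before any mechanism
auth.configure_curve(domain="*", location=zmq.auth.CURVE_ALLOW_ANY)

server_public, server_secret = zmq.curve_keypair()
client_public, client_secret = zmq.curve_keypair()

pub = ctx.socket(zmq.PUB)
pub.curve_secretkey = server_secret
pub.curve_publickey = server_public
pub.curve_server = True  # this side authenticates incoming peers
pub.bind("tcp://127.0.0.1:5556")

sub = ctx.socket(zmq.SUB)
sub.curve_secretkey = client_secret
sub.curve_publickey = client_public
sub.curve_serverkey = server_public  # subscriber must know the server's public key
sub.connect("tcp://127.0.0.1:5556")
sub.subscribe(b"")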
Excel Arcelo
@excelarcelo
hey everyone, can you whitelist an IP using pub-sub?
阿飞
@guocdfeifei
Hi everyone, I'm 阿飞 (Afei), an old man of 33. Very happy to join this big family.
郑嘉成
@CastleYeager
How do I test the latency and jitter of pyZMQ messages?
That is, write a publisher and a subscriber on the local machine and test the latency and jitter of one-way message delivery.
NumesSanguis
@NumesSanguis_gitlab
@CastleYeager You can add a timestamp to the message in your publisher, and compare it against the receive time in your subscriber. Then store those values in a .csv or so for further analysis.
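A minimal sketch along those lines; the endpoint, message count, and send rate are arbitrary, and the subscriber should be started before the publisher to dodge the slow-joiner problem:

import csv
import time

import zmq

ADDR = "tcp://127.0.0.1:5557"  # assumed local endpoint

def publisher(n=1000):
    pub = zmq.Context.instance().socket(zmq.PUB)
    pub.bind(ADDR)
    time.sleep(0.5)  # crude guard so the subscriber is connected before we send
    for seq in range(n):
        pub.send_json({"seq": seq, "sent_at": time.time()})
        time.sleep(0.001)

def subscriber(n=1000, out="latency.csv"):
    sub = zmq.Context.instance().socket(zmq.SUB)
    sub.connect(ADDR)
    sub.subscribe(b"")
    with open(out, "w", newline="") as f:
        w = csv.writer(f)
        w.writerow(["seq", "latency_s"])
        for _ in range(n):
            msg = sub.recv_json()
            # one-way latency per message; jitter is the spread of this column
            w.writerow([msg["seq"], time.time() - msg["sent_at"]])

Since both processes run on the same machine, time.time() is comparable across them; across hosts you would need synchronized clocks.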
I'm having an issue with ZeroMQ sockets not closing properly when they have been created in a subprocess/thread. Can someone take a look?:
https://stackoverflow.com/questions/64191495/run-multiple-python-files-in-separate-thread-process-from-python-pyzmq-sockets
郑嘉成
@CastleYeager
@NumesSanguis_gitlab Thx!!!
And I want to know: what is the header format of a pyzmq message?
Nikolay Alexandrov
@nikolayay

hey guys

super stressed out student here. my master's cloud coursework is to build a scalable, fault tolerant application to solve an 'embarrassingly parallel task'. I spent the last week gingerly digging myself into a hole of distributed systems and am experiencing severe conceptual overload. I reach out to you, kind people.

for the task I settled on Mandelbrot sets to make cool pictures.

the architecture I arrived at (lifted from the ZeroMQ guide) is as follows:

  1. ventilator splits task in chunks and fans out to workers
  2. worker completes task and pushes to sink
  3. sink reconstructs original task and writes to storage

as you can see there are a number of open questions:

  1. how do I keep chunks ordered after processing? (see the sketch after this message)
  2. will I get dogged by my professor for introducing single points of failure like the ventilator and sink? (perhaps replication is a solution?)
  3. to address scaling, what is my metric? does CPU usage work well enough? perhaps I should just send tasks out whole and stick to CPU?
  4. orchestration: I am attempting to use Kubernetes and it's going okay; mostly worried about replication and persisting results
  5. deployment: we got $150 AWS-bucks to flaunt, but I am unsure of how to go about deployment; it seemed easy until I realised that you have to allocate nodes(?)

I am not married to either the task I picked or the architecture at this point; it was really fun learning this stuff. If anybody has any advice on any of the topics pleaseeeee help me. Cheers!
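A minimal sketch of that pipeline with sequence-numbered chunks, which addresses the ordering question; the ports are arbitrary and render_chunk is a hypothetical per-chunk Mandelbrot routine:

import zmq

def ventilator(chunks):
    push = zmq.Context.instance().socket(zmq.PUSH)
    push.bind("tcp://*:5557")
    for seq, chunk in enumerate(chunks):
        # tag every chunk so the sink can restore the original order
        push.send_json({"seq": seq, "chunk": chunk})

def worker():
    ctx = zmq.Context.instance()
    pull = ctx.socket(zmq.PULL)
    pull.connect("tcp://localhost:5557")
    push = ctx.socket(zmq.PUSH)
    push.connect("tcp://localhost:5558")
    while True:
        task = pull.recv_json()
        result = render_chunk(task["chunk"])  # hypothetical computation
        push.send_json({"seq": task["seq"], "result": result})

def sink(n_chunks):
    pull = zmq.Context.instance().socket(zmq.PULL)
    pull.bind("tcp://*:5558")
    results = {}
    while len(results) < n_chunks:
        msg = pull.recv_json()
        results[msg["seq"]] = msg["result"]
    # reassemble in submission order regardless of arrival order
    return [results[i] for i in range(n_chunks)]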

Allie
@cite-reader
Two questions. First, has the course given you any prior exposure to AWS or are they throwing you in blind? Second, are you expected to build your own messaging infrastructure or would it fit the brief to outsource that to an AWS service as well?
Nikolay Alexandrov
@nikolayay
@cite-reader hi!

@cite-reader we had some teaching in AWS, but not a lot beyond starting an EC2 instance and running minikube on it; the course was mostly conceptual. So yeah, throwing you in with one eye closed, kind of

the brief is very unspecific; I thought to use ZeroMQ for messaging. To be frank, I wasn't sure AWS offered that capability

Allie
@cite-reader
Not knowing anything about the course, when I hear "scalable, fault-tolerant application to solve an embarrassingly parallel task deployed to AWS", the architecture that comes to mind is an SQS queue processed by a consumer deployed to AWS Lambda. It's of course not the only thing that will work, but it's reasonably canonical for the shape of the problem, and as a working engineer I'd want a specific, compelling reason before choosing something different. "The professor doesn't like it" is such a reason, so talk to them about it. The other reason that immediately comes to mind is chunk processing that could reasonably be expected to take longer than Lambda's hard timeout, but for Mandelbrot I doubt that's much of a concern.
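For concreteness, the consumer in that canonical shape is little more than a handler over SQS batch events; render_chunk and save_result here are hypothetical stand-ins for the per-chunk computation and persistence (e.g. to S3):

import json

def handler(event, context):
    # SQS -> Lambda delivers a batch of messages per invocation
    for record in event["Records"]:
        task = json.loads(record["body"])
        result = render_chunk(task)       # hypothetical per-chunk computation
        save_result(task["seq"], result)  # hypothetical persistence step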
Nikolay Alexandrov
@nikolayay
@cite-reader it's more that they don't want us to rely too much on the provider for functionality; you get more marks if you design things yourself, so I was hoping for something I can roll myself. I'm kinda looking at processing the Twitter API stream instead of Mandelbrot, because I had a hard time motivating a cloud architecture for the latter
@cite-reader what are your thoughts on reasonably implementing the brief without relying on the platform?
Nikolay Alexandrov
@nikolayay
@cite-reader they also expect us to implement some sort of fault injection to demonstrate the app still works
Allie
@cite-reader

I'd probably deploy workers to a spot fleet. With platform messaging systems disfavored, Lambda doesn't have the complexity advantage of being invoked for you, and with a source as noisy as Twitter the workers will likely spend no time idle, probably negating the cost advantage. Spot instances can be deeply discounted compared to equivalent on-demand pricing, at the cost of being preemptible, but you're already building fault tolerance, right? A spot fleet is like an autoscaling group, but can have spot instances in it.

Choosing an instance type can be tricky. Generally you want the newest class of instances that fits the workload because, as a rule, they provide equal or better performance than similarly-sized instances of previous generations at lower prices. The absolute newest generation (the t4g, m6g, etc. instance classes) makes that a little trickier because they're arm64 machines and pyzmq only provides wheels for x86 and amd64, but that likely isn't insurmountable. If you don't want to bother, look at the t3 or m5 etc. classes. (Maybe the a variants, which run AMD chips rather than Intel and are therefore somewhat cheaper. Not the a1 series, which is the first generation of arm instances; you want e.g. t3a for AMD chips.) The common wisdom is that t instances are mostly good for making you sad, but eeehhh... they're cheap and they come smaller than the m series does, so if the workload isn't super CPU-intensive I've found they're totally fine. If the workload does use lots of CPU, measured in terms of "the instance consumes burst credits," the m series is a better default option. You want your thing to scale out, so don't worry too much about choosing the right size of instance: going up one size is like taking two of the previous size and welding them together, and it's priced exactly accordingly, so choosing an instance type larger than your workers need in order to function is of little benefit.

If my understanding of the Twitter API is correct there's no performance benefit to having multiple ventilators, since each client associated with the same principal will see exactly the same stream. You could implement a hot standby in various ways, but I might not bother; deploying the ventilator into its own autoscaling group means AWS will bring it back automatically if it becomes unhealthy, so the hot standby can be a stretch goal for you if there's time remaining. This I wouldn't use a spot fleet for; preemption is fine for workers, which have small working sets and whose tasks can be easily retried, but the ventilator going down and needing recovery should be extraordinary.

Yassine Benyahia
@Wronskia
Hello everyone, I am wondering how to properly handle server errors. I am using a ROUTER-DEALER pattern: the server receives a job and does some computation before returning a response. How can I handle errors that occur during server-side computation and propagate them to the client (something similar to a 500 Internal Server Error)? For now it seems that if such a case happens the client just waits indefinitely
thanks a lot
my client looks like this
Yassine Benyahia
@Wronskia
import json

import numpy as np
import zmq


class Client:
    def __init__(self, host, port):
        self.zmq_context = zmq.Context()
        self.socket = self.zmq_context.socket(zmq.DEALER)
        # The identity must be set before connect() to take effect.
        self.socket.setsockopt_string(zmq.IDENTITY, "123")
        self.socket.connect(f"tcp://{host}:{port}")

    def recv_array(self, socket, flags=0, copy=False, track=False):
        """recv a numpy array: a JSON metadata frame, then the raw buffer"""
        md = socket.recv_json(flags=flags)
        msg = socket.recv(flags=flags, copy=copy, track=track)
        buf = memoryview(msg)
        A = np.frombuffer(buf, dtype=md["dtype"])
        return A.reshape(md["shape"])

    def run(self, request_data):
        # send_string on the socket itself; Client has no send() helper
        self.socket.send_string(json.dumps(request_data))
        result = self.recv_array(self.socket)
        return result
and the worker
import threading

import zmq


class Worker(threading.Thread):
    def __init__(self, zmq_context, encoder, _id):
        super().__init__()
        self.zmq_context = zmq_context
        self.worker_id = _id
        self.encoder = encoder

    def run(self):
        socket = self.zmq_context.socket(zmq.DEALER)
        socket.connect("inproc://backend")

        while True:
            # The ROUTER frontend prepends the client's socket ID,
            # so each request arrives as two frames.
            client_id = socket.recv()
            request = socket.recv()
            result = self.compute(request)  # compute() is defined elsewhere
            flags = 0
            md = dict(dtype=str(result.dtype), shape=result.shape)
            # Echo the client ID first so the ROUTER can route the reply.
            socket.send(client_id, flags | zmq.SNDMORE)
            socket.send_json(md, flags | zmq.SNDMORE)
            socket.send(result, flags, copy=False)
and the server
class Server(object):
    def __init__(self, embedding, model, port, num_workers=4):
        self.zmq_context = zmq.Context()
        self.port = port
        self.num_workers = num_workers
        self.encoder = Encoder()

    def start(self):
        socket_front = self.zmq_context.socket(zmq.ROUTER)
        socket_front.bind(f"tcp://0.0.0.0:{self.port}")

        # Backend socket to distribute work.
        socket_back = self.zmq_context.socket(zmq.DEALER)
        socket_back.bind("inproc://backend")

        # Start workers.
        for i in range(0, self.num_workers):
            worker = Worker(self.zmq_context, self.encoder, i)
            worker.start()
            logger.info(f"[WORKER-{i}]: ready and listening!")

        # Use built in queue device to distribute requests among workers.
        # What queue device does internally is,
        #   1. Read a client's socket ID and request.
        #   2. Send socket ID and request to a worker.
        #   3. Read a client's socket ID and result from a worker.
        #   4. Route result back to the client using socket ID.
        zmq.device(zmq.QUEUE, socket_front, socket_back)
Min RK
@minrk
Add to your message protocol a status field (could be in your existing md dict) that defines the message structure for success or failure. For example:
try:
    result = self.compute(request)
except Exception as e:
    # or transform the error more deliberately
    md = dict(
        status="error",
        error=str(e),
    )
    # empty result to ensure message consistency
    result = b""
else:
    # success
    md = dict(
        status="ok",
        dtype=str(result.dtype),
        shape=result.shape,
    )

socket.send(client_id, flags | zmq.SNDMORE)
socket.send_json(md, flags | zmq.SNDMORE)
socket.send(result, flags, copy=False)
So you always get a reply, even on errors. Then it's up to the client to handle that error reply appropriately.
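On the client side, the matching check might look like this: a sketch assuming the md dict above and the numpy import from the client code; the exception type raised is up to you.

def recv_array_or_raise(self, socket, flags=0, copy=False, track=False):
    md = socket.recv_json(flags=flags)
    msg = socket.recv(flags=flags, copy=copy, track=track)
    if md.get("status") == "error":
        # surface the server-side failure instead of parsing an empty buffer
        raise RuntimeError(md["error"])
    A = np.frombuffer(memoryview(msg), dtype=md["dtype"])
    return A.reshape(md["shape"])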
Yassine Benyahia
@Wronskia
Great, Thanks a lot @minrk
Min RK
@minrk
you may also want to deal with timeouts, to handle higher-level errors that prevent replies entirely (e.g. worker crashes)
Yassine Benyahia
@Wronskia
Makes sense, thanks! I guess I would have to do this in the client, using setsockopt
Min RK
@minrk
yes, or use a poller
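For illustration, the poller variant of the same client method might look like this (a sketch; the 5-second default timeout is arbitrary):

def run(self, request_data, timeout_ms=5000):
    self.socket.send_string(json.dumps(request_data))
    poller = zmq.Poller()
    poller.register(self.socket, zmq.POLLIN)
    # poll() takes milliseconds and returns the sockets that are ready
    if dict(poller.poll(timeout_ms)).get(self.socket):
        return self.recv_array(self.socket)
    raise TimeoutError(f"no reply within {timeout_ms} ms")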
Yassine Benyahia
@Wronskia
would something like this work
def run(self, request_data):
    # RCVTIMEO is in milliseconds and must be set before the blocking recv
    self.socket.setsockopt(zmq.RCVTIMEO, self.timeout)
    self.socket.send_string(json.dumps(request_data))
    try:
        result = self.recv_array(self.socket)
    except zmq.error.Again as e:
        # log e
        raise TimeoutError("timeout")

    return result
Min RK
@minrk
I expect so, yes
François Tessier
@hephtaicie
Hello everyone! I'm trying to build a simple scheduler using zmq (Python). On one side, there are multiple clients that submit requests to the scheduler (a single app). On the other side, there are servers that register with the scheduler and offer a list of available resources. When a client submits a request, the scheduler determines the resources to allocate to that client, then reaches out to the corresponding server to let it know that this set of resources is now allocated. The server returns info about the resources to the scheduler, which forwards it to the calling client.
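One way to sketch that shape in pyzmq is a single scheduler process with two ROUTER sockets, one facing clients and one facing servers, both of which connect with DEALER sockets. The ports, the REGISTER message, and the first-registered allocation policy are all placeholders:

import json
import zmq

ctx = zmq.Context.instance()

frontend = ctx.socket(zmq.ROUTER)  # clients (DEALER sockets) connect here
frontend.bind("tcp://*:6000")
backend = ctx.socket(zmq.ROUTER)   # resource servers (DEALER sockets) connect here
backend.bind("tcp://*:6001")

servers = {}  # server identity -> advertised resources

poller = zmq.Poller()
poller.register(frontend, zmq.POLLIN)
poller.register(backend, zmq.POLLIN)

while True:
    for sock, _ in poller.poll():
        if sock is backend:
            frames = backend.recv_multipart()
            if frames[1] == b"REGISTER":
                # a server announces itself and its resource list
                servers[frames[0]] = json.loads(frames[2])
            else:
                # [server_id, client_id, info]: forward the allocation
                # details back to the client that asked for them
                frontend.send_multipart([frames[1], frames[2]])
        else:
            client_id, request = frontend.recv_multipart()
            if not servers:
                frontend.send_multipart([client_id, b'{"error": "no servers"}'])
                continue
            # placeholder policy: hand the request to the first registered
            # server, passing the client identity along for the reply path
            server_id = next(iter(servers))
            backend.send_multipart([server_id, client_id, request])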