quocbao
@baonq-me
This problem can be solved easily if you redirect all client requests to a single server - the one that is the leader in Raft.
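(For illustration, a minimal sketch of that approach, assuming an application-level transport; handle_request and forward_to are hypothetical names, while _isLeader() and _getLeader() are existing SyncObj helpers:)

from pysyncobj import SyncObj, replicated

class Counter(SyncObj):
    def __init__(self, selfAddr, partners):
        super(Counter, self).__init__(selfAddr, partners)
        self.__value = 0

    @replicated
    def incr(self):
        self.__value += 1
        return self.__value

def handle_request(counter, request):
    # Only the Raft leader applies commands directly; everyone else redirects.
    if counter._isLeader():
        return counter.incr(sync=True)
    leader = counter._getLeader()        # TCPNode of the current leader, or None
    if leader is None:
        raise RuntimeError('no leader elected yet')
    return forward_to(leader, request)   # hypothetical application-level redirect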
Filipp Ozinov
@bakwc

Should one lock manager oversee all of the locks, or should there be a lock manager for each lock?

One lock manager is enough; it can handle multiple locks.

A better solution would be to add a @replicated method to your object that modifies edges, e.g.:
@replicated
def changeEdgeColor(self, edgeID = 12345, edgeColor = 'blue'):
You don't need any lock to execute replicated methods; they are always executed in the same order on all servers (but you need to handle the case where the edge with the given id has already been deleted - just skip the edit, for example - and all edgeID-s should be unique).
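(A minimal, self-contained sketch of that pattern; the Graph class, its fields and addEdge are illustrative placeholders, not code from the project discussed here:)

from pysyncobj import SyncObj, replicated

class Graph(SyncObj):
    def __init__(self, selfAddr, partners):
        super(Graph, self).__init__(selfAddr, partners)
        self.__edges = {}   # edgeID -> color, kept consistent by the replicated log

    @replicated
    def addEdge(self, edgeID, edgeColor='black'):
        self.__edges[edgeID] = edgeColor

    @replicated
    def changeEdgeColor(self, edgeID=12345, edgeColor='blue'):
        # Executed in the same order on all servers; no extra lock needed.
        if edgeID not in self.__edges:   # edge already deleted - just skip
            return False
        self.__edges[edgeID] = edgeColor
        return True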
Matt Hall
@mattall

Thank you both! I am using replicated sync for all of my graph-changing methods.

My server has multiple replicated classes, so forwarding to the leader is not an option, since the leaders of the different classes might not be on the same server. Basically, I let the server that caught the request handle responding to it, and make sure that it uses replicated methods for changing anything across the cluster.

The thing is that with the graph object, I want to make updates to multiple edges atomic. For instance if multiple clients are trying to change edges on different paths, and those paths overlap with a common edge, then I only want one client to have access to all of those edges at any time.

@replicated_sync
def changeEdgeColor(self, edgeID = 12345, edgeColor = 'blue'):
    ...
    return "Success!"

@replicated_sync
def changeManyEdges(self, list_of_edges = [1, 2, 3]):
    list_of_edges.sort()   # acquire locks in a consistent order
    for edge in list_of_edges:
        self.__LockManager.tryAcquire(edge)

    for edge in list_of_edges:
        self.changeEdgeColor(edge, 'blue')

    for edge in list_of_edges:
        self.__LockManager.release(edge)

One question here about the lock manager: it seems like tryAcquire does not require me to create the lock myself. Is that true?

if existingLock is None or existingLock[0] == clientID:
    self.__locks[lockID] = (clientID, currentTime)
    return True
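(For reference, a minimal sketch of how the batteries ReplLockManager is typically wired in; the addresses, lock name and autoUnlockTime value here are placeholders:)

from pysyncobj import SyncObj
from pysyncobj.batteries import ReplLockManager

lockManager = ReplLockManager(autoUnlockTime=75)   # locks expire if the holder disappears
syncObj = SyncObj('localhost:4321', ['localhost:4322', 'localhost:4323'],
                  consumers=[lockManager])

# The lock is created implicitly on first use: tryAcquire just records
# (clientID, currentTime) under the given lockID, as in the snippet above.
if lockManager.tryAcquire('edge-12345', sync=True):
    try:
        pass   # ... change the edge via replicated methods ...
    finally:
        lockManager.release('edge-12345', sync=True)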
Geert Audenaert
@FastGeert
Hi guys. I'm using pysyncobj in a gevent application where I do monkey.patch_all() as the first line that gets executed. From time to time I get the following stack trace, which messes up the application:
Traceback (most recent call last):
  File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.6/dist-packages/gevent/threading.py", line 177, in run
    super(Thread, self).run()
  File "/usr/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.6/dist-packages/pysyncobj/syncobj.py", line 500, in _autoTickThread
    self._onTick(self.__conf.autoTickPeriod)
  File "/usr/local/lib/python3.6/dist-packages/pysyncobj/syncobj.py", line 577, in _onTick
    res = self.__doApplyCommand(entry[0])
  File "/usr/local/lib/python3.6/dist-packages/pysyncobj/syncobj.py", line 701, in __doApplyCommand
    return self._idToMethod[funcID](*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/pysyncobj/syncobj.py", line 1389, in newFunc
    return func(self, *args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/pysyncobj/batteries.py", line 419, in prolongate
    for lockID in self.__locks.keys():
RuntimeError: dictionary changed size during iteration
Could this be related to the use of gevent?
Filipp Ozinov
@bakwc
Hi, @FastGeert, thanks for the report. It's a bug in our library; I created an issue: bakwc/PySyncObj#102
Filipp Ozinov
@bakwc
Fixed it, try the version from GitHub
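(For context, a standalone illustration of that failure mode - plain Python, not the actual PySyncObj patch: mutating a dict while iterating over its keys view raises exactly this RuntimeError, while iterating over a snapshot of the keys does not.)

locks = {'a': 1, 'b': 2}
try:
    for lockID in locks.keys():
        locks.pop(lockID)              # mutates the dict mid-iteration
except RuntimeError as e:
    print(e)                           # dictionary changed size during iteration

locks = {'a': 1, 'b': 2}
for lockID in list(locks.keys()):      # iterate over a copy of the keys
    locks.pop(lockID)                  # safe now
print(locks)                           # {}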
Geert Audenaert
@FastGeert
Great, will do.
I'm really thrilled by your project, btw. I was wondering why consumers cannot be added after the SyncObj is created?
MoayedHajiAli
@MoayedHajiAli
Hello everyone.
Hope everyone is fine.
I have a simple question about PySyncObj.
When sending too many RPS, the leader fails.
That happens in the provided benchmark as well.
It is weird to me though, as I would expect more packets to be dropped when sending too many requests, but not the leader to fail.
Can anyone please explain why this happens?
Filipp Ozinov
@bakwc
What do you mean by leader fails?
MoayedHajiAli
@MoayedHajiAli
Hi.
Thanks for your fast reply, and sorry for the delay in mine - I did not notice your reply.
I mean that when sending a high RPS in the RPS benchmark, some packets fail, right?
And the FAIL_REASON that is reported is, in a high percentage of cases, 5 (which indicates a LEADER_CHANGE).
However, it seems strange to me, because I think that the leader should stay the same regardless of the number of packets sent.
Given a sufficient timeout, the leader should maintain its state and never change, unless the leader itself fails for some reason.
Am I misunderstanding something?
MoayedHajiAli
@MoayedHajiAli
Another question: do you have an idea of how I might benchmark the throughput of a sync program?
Would sending the max RPS every second and then counting how many responses the client receives every second do the job?
MoayedHajiAli
@MoayedHajiAli
Hello.
Filipp Ozinov
@bakwc
1) Packets are not lost; they are stored in a queue (it uses TCP).

Would sending the max RPS every second and then counting how many responses the client receives every second do the job?

yep, that way is ok
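(A rough sketch of that kind of measurement, assuming a Counter-like replicated class and an already-running cluster; the names, the one-second window and the 2-second drain delay are arbitrary choices for illustration:)

import time
from pysyncobj import SyncObj, replicated, FAIL_REASON

class Counter(SyncObj):
    def __init__(self, selfAddr, partners):
        super(Counter, self).__init__(selfAddr, partners)
        self.__value = 0

    @replicated
    def incr(self):
        self.__value += 1

def benchmark(counter, duration=1.0):
    results = {'ok': 0, 'fail': 0}

    def onDone(res, err):
        if err == FAIL_REASON.SUCCESS:
            results['ok'] += 1
        else:
            results['fail'] += 1

    start = time.time()
    sent = 0
    while time.time() - start < duration:
        counter.incr(callback=onDone)   # fire asynchronously, count in the callback
        sent += 1
    time.sleep(2.0)                     # let in-flight requests finish
    return sent, results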

quocbao
@baonq-me
@MoayedHajiAli I think the result will depend on the network speed and the IOPS of the storage backend rather than on the algorithm itself.
MoayedHajiAli
@MoayedHajiAli
Thanks for your reply.
But still, I don't get why the leader changes when sending a high RPS.
I tracked the leader, and it actually changes sometimes when sending a high number of requests.
For example, in the RPS benchmark I get something like this for around 20000 RPS:
SUCCESS RATE: 0.169840984098
AVG DELAY: 4.72234106064
LOST RATE: 0.270159015902
ERRORS STATS: 2
1 0.212185504265
5 0.787814495735
Error number 5 indicates a LEADER_CHANGE, which seems weird to me.
I understand that not all the packets will succeed, but why is the leader changing?
Why does the leader crash?
In a normal connection, when sending too many requests, some of them will be dropped, or will not receive a response, or whatever. But the server will never crash.
In our case, the leader either crashes or times out, I am not sure. But the leader changes.
@baonq-me I understand that.
But as I am testing on the same machine (which means the network connection is reliable to some extent), the leader has no reason to change.
Filipp Ozinov
@bakwc
The packet queue fills up - the leader does not have enough time to process all the packets, so it can't respond in time.
packet 1 was sent at time 100
packet 2 was sent at time 100
packet 10000 was sent at time 120
after that, the packet that pings the leader was sent,
but the leader was busy processing packets 1-10000, so it answered the ping packet only a minute later
MoayedHajiAli
@MoayedHajiAli
Hmmm, I understand that, but how is this connected to the change of leader?
If we have a cluster of 3 servers and server 1 is the leader, then after sending the 15000 packets, for example, the leader will be busy processing packets 1-10000, and the other 5000 should wait or be answered with QUEUE_FULL (error number 1). But what happens is that a new election is started and server 2, for example, becomes the leader.
Is this normal? Shouldn't the leader maintain its state and never change unless we manually kill it?
Please note that when this happens, neither the server nor the connection crashes, but the leader still changes.
I tried increasing the timeout, but the same problem happened.
Filipp Ozinov
@bakwc
A leader election starts when the leader does not respond to pings. That is what happens when the leader gets a lot of messages - it fails to respond in time.
This is normal.
If the leader is overloaded, it cannot remain the leader, because it doesn't have enough processing power to handle all the messages - we need another, more powerful leader.
MoayedHajiAli
@MoayedHajiAli
Hmmm, now I get it.
I thought the heartbeat would get priority.
Thank you so much ^_^
schmidtfx
@schmidtfx
Howdy, is there a recommended way to redistribute load if a node goes down? In my scenario, I have n processes running, and the processes are opening websocket connections. Each process will be responsible for a subset of the required websockets. If a process goes down, I'd like to redistribute its subset.
Filipp Ozinov
@bakwc
You should use "distributed lock". Process should acquire lock before processing a subset of sockets. You should multiple processes responsible for one subset, one is processing and others are waiting untill lock will be freed.
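(A rough sketch of that pattern, assuming a ReplLockManager wired into a SyncObj as in the earlier snippet; process_sockets and the subset-based lock names are hypothetical application code:)

import time

def run_worker(lockManager, subset_id, sockets):
    lockID = 'subset-%d' % subset_id
    # Stand by until the subset's lock is free (e.g. its previous owner died
    # and autoUnlockTime expired), then take over that subset.
    while not lockManager.tryAcquire(lockID, sync=True):
        time.sleep(1.0)
    try:
        process_sockets(sockets)                 # hypothetical: serve this subset's websockets
    finally:
        lockManager.release(lockID, sync=True)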
Mike
@mikesneider
Hi everyone, I hope everyone is doing great!
I am trying to start an example project with PySyncObj. I'm trying to run the "counter" example, but when I run it, the message "Usage: .\counter.py self_port partner1_port partner2_port ..." shows up. Sorry, I am just learning about this. I'd appreciate your help.
Filipp Ozinov
@bakwc
Have you tried specifying the self port and the partner ports?
e.g. ./counter.py 12345 12346 12347
Mike
@mikesneider
Thank you very much, I just added some ports to the "sys.argv" list. Now I can run the example, but the function getLeader always returns None, so it never adds any value. Does anyone know why? When I run getStatus, I get this: {'version': '0.3.7', 'revision': 'deprecated', 'self': TCPNode('localhost:1234'), 'state': 0, 'leader': None, 'partner_nodes_count': 2, 'partner_node_status_server_localhost:1236': 0, 'partner_node_status_server_localhost:1235': 0, 'readonly_nodes_count': 0, 'log_len': 1, 'last_applied': 1, 'commit_idx': 1, 'raft_term': 0, 'next_node_idx_count': 0, 'match_idx_count': 0, 'leader_commit_idx': None, 'uptime': 8, 'self_code_version': 0, 'enabled_code_version': 0}
Filipp Ozinov
@bakwc
You need to run another 2 instances to make a cluster of 3 instances in total (don't forget to change the ports for each instance).
It doesn't work with 1 node.
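For example, using the ports from the status output above, run one instance per terminal, each listing its own port first and the partners after it:

./counter.py 1234 1235 1236
./counter.py 1235 1234 1236
./counter.py 1236 1234 1235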
Mike
@mikesneider
Thank you very much, I managed to run the example. Now I'm trying to understand every value returned by getStatus() - does someone know of a reference for this?
Filipp Ozinov
@bakwc
It's internal Raft info; you need to read the original Raft paper to understand every parameter.
Mike
@mikesneider
Many thanks!
Mike
@mikesneider
Hi, I have a question: how can I crash one of the instances on purpose?
Filipp Ozinov
@bakwc
press "Ctrl+C" in terminal?
Mike
@mikesneider
Oops, sounds logical, thank you!
MikeinBoulder
@MikeinBoulder
With ReplDict, is it possible to do nested dictionaries?
camillosir
@camillosir
Hi there, I'm new to pysyncobj, which I recently discovered, and it looks like a very good fit for my project: a small network of agents that collaborate on an optimization problem to minimize energy costs. I started playing around by looking at the test_syncobj and kvstorage modules. One important feature of my application is that the collaboration should still work when some agents go offline, even in the case where only 1 is alive. Now I observed that in this case the replicated dictionary (as in the kvstorage example) can no longer be used. More specifically, with 3 agents configured and only 1 alive, a set value is not executed until at least one more comes up. How could I get this to work with only a single node alive? Thanks for your help.
Pratik Poudel
@powerpratik
Hello, I am new to distributed systems. I tried implementing the examples and they worked, and now I am trying to build a new custom example. I have 2 questions. 1) Instead of using the terminal to run multiple nodes, I used two instances of the KVStorage class (which is in the example file). I am able to create the objects, but upon checking the status, all the nodes are down and no leader is elected. Can you help me with this? I know my question might be very simplistic, but I am new to this.
2) Is this Raft implementation available in a simulation environment? If yes, I would be very happy to be pointed in that direction, and if not, can you point out some of the things that need to be considered if we were to do so - like the time behaviour in a simulated environment vs. the raw implementation of PySyncObj?
Thanks for the help.
Lorenzo Meninato
@lmeninato
Is there a recommended way to set up a daemon process/thread for a PySyncObj node? I'd love to see an example of this. I can get it working by following the examples and running processes in different terminals. I really just want to set up a cluster and be able to send the cluster messages easily. If I do kv = KVStorage(host:port, partners) but host:port != the actual host and port of the current process, will this let me send requests to the cluster as a client?