Gutemberg Ribeiro
@galvesribeiro
humm
Veikko Eeva
@veikkoeeva
@galvesribeiro Evening.
Gutemberg Ribeiro
@galvesribeiro
:)
Gabriel Kliot
@gabikliot
@ashkan-saeedi-mazdeh , regarding courses I just meant watching or reading online courses from top universities. Not really attending in person.
@ashkan-saeedi-mazdeh, so back to distributed top-k: one idea is what I wrote already:
Divide the population into X partitions (let's say X is 100), calculate the top k on each partition separately, and aggregate those into a global one.
Gabriel Kliot
@gabikliot
More details: each player will calculate local scores. Every time his local top k changes, he will report his new score results to a first-level aggregator. This first-level aggregator will update its local top k. If that does not change, it does nothing. Otherwise, it updates the one global second-level aggregator.
That way you have a 2-level tree (actually 3 levels if you also consider the local player aggregation). That is called an aggregation tree. Very standard trick in information retrieval and big data algorithms.
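A minimal Python sketch of the aggregation tree described above, for illustration only. The `Aggregator` class, `build_tree`, `submit_score`, and the `player_id % X` partitioning rule are all hypothetical names and choices, not anything from Orleans or from the chat. Each node keeps the latest report from every child, recomputes its own top k, and notifies its parent only when that top k actually changes.

```python
import heapq


class Aggregator:
    """One node of the tree; keeps the top k of what its children report."""

    def __init__(self, name, k, parent=None):
        self.name = name
        self.k = k
        self.parent = parent
        self.reports = {}  # child name -> latest (score, player) entries from that child
        self.top = []      # current top k at this node, best first

    def report(self, child, entries):
        """Store a child's latest entries; propagate upward only if our top k changed."""
        self.reports[child] = list(entries)
        merged = heapq.nlargest(
            self.k, (e for lst in self.reports.values() for e in lst)
        )
        if merged == self.top:
            return False  # unchanged: do nothing
        self.top = merged
        if self.parent is not None:
            self.parent.report(self.name, merged)
        return True


def build_tree(x, k):
    """One global aggregator with x first-level aggregators below it."""
    root = Aggregator("global", k)
    leaves = [Aggregator(f"partition-{i}", k, parent=root) for i in range(x)]
    return root, leaves


def submit_score(leaves, player_id, score):
    """Assumed partitioning rule: player_id modulo the number of partitions."""
    leaf = leaves[player_id % len(leaves)]
    return leaf.report(player_id, [(score, player_id)])
```

Every player belongs to exactly one partition, so any entry in the global top k is also in the top k of its own partition. That is why merging the per-partition lists at the root is enough, whatever the relation between K and X.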
Rob Brink
@brinkrob
This only works if you are only interested in the global top k, right? For example, if k is 10, only the top 10 from each partition need to be reported up. But this would not work if you wanted a complete ordering, or if k > partition size? Or am I missing something?
Gabriel Kliot
@gabikliot
here is another good survey: "An Overview of Distributed Top-K Ranking Algorithms". Google it.
It is for global top k.
That is what he asked: global top K. Not global ordering.
It does not work for problems other than top K, since it was not meant to solve them. It only works for top K. If you have a different problem, like sorting a huge array, the solution will differ.
If K > X (X being the number of partitions), yes, it works, why not.
Let's say X is 100 and K is 1000. No problem.
Now @ashkan-saeedi-mazdeh asked for some courses/books about large data algorithms and distributed algorithms:
Gabriel Kliot
@gabikliot
IR - Information Retrieval:
1) http://web.stanford.edu/class/cs276/ Probably a good lecture is: "Guest speaker: Jeff Dean on the evolution of Google's search and retrieval system"
2) http://courses.ischool.berkeley.edu/i240/s13/schedule.php
Distributed data:
1) A Survey of Distributed Data Aggregation Algorithms - http://arxiv.org/abs/1110.0725 I think this is what you are looking for. One paper survey. Very good.
2) "Algorithms for Distributed and Streaming Data " http://people.cs.umass.edu/~mcgregor/bigdata.html
Looks like this should get you started. :-) Enjoy!
Rob Brink
@brinkrob
ok, my questions were born out of ignorance for how top-k worked... after some reading I get it. :)
Veikko Eeva
@veikkoeeva
Assembly unloading and Roslyn scripts loaded into domains. Might be of interest here too: https://gitter.im/dotnet/coreclr?at=55ddaf7b36e894436a9b27ae (@ReubenBond, @sergeybykov?).
Heh, I see I got a bit carried away with my spatials. :)
Veikko Eeva
@veikkoeeva
@ReubenBond To continue about entropy (cf. others), I wandered off a bit to think about load balancers. They need massive amounts of entropy to terminate TLS, so how do they get it? Can some be conserved with abbreviated handshakes or session tickets (one would need to look at the steps where entropy is needed, etc.)? But I'm rambling before going to bed, what I meant was to link this: http://www.securityweek.com/how-tap-hardware-random-number-generator-your-load-balancer.
Short and interesting.
Veikko Eeva
@veikkoeeva
Actually, funny thing, that brings to mind Maxwell's Demon, the little bastard sitting on the gate between two rooms and letting only one bit pass through at a time. Probably a somewhat interesting connection between physics (real world) and information technology (carries over to computational complexity). I think I once saw a paper wherein some people were devising a method to use scattering of light (photons perhaps) to calculate TSP optimally, but got into trouble on how to measure it.
All right, it's almost midnight. It's hot here and kids aren't sleeping well, so I'm off topic. But now into bed in earnest.
(With some hand-waving it's probably possible to make a connection from these to distributed systems and Orleans. :))
Reuben Bond
@ReubenBond
@gabikliot thanks for posting those links
Thanks, @veikkoeeva
Jakub Konecki
@jkonecki
@reubenbond by single partition I mean ES in a cluster environment works in master - slave mode, with one node writing the events and maintaining global order (each event gets a consecutive id) and asynchronously publishing to replica nodes. You get the scalability of reads as each replica can serve the immutable events, but you're limited to a single master node for writes. I believe ES has writes really optimised but still...
James Andrew-Smith
@james-andrewsmith
@jkonecki Wouldn't increasing the number of shards in ES distribute the writes between multiple master nodes?
Jakub Konecki
@jkonecki
@james-andrewsmith I don't think ES has sharding built in (you can obviously shard manually) - I think clustering has a single master + multiple catchup replicas. Unless sharding is part of the commercial offering. Sharding will help but you are still limited to the number of shards you can provision and your sharding function. In the case of table storage you don't need to worry about it, as each stream is its own partition.
James Andrew-Smith
@james-andrewsmith

@jkonecki We use the OSS version, I don't believe there is an enterprise one, just support.

When you create a new index you specify the number of shards (https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-create-index.html)

This cannot be changed later, although moving data into new indexes is pretty trivial. Indexes are relatively cheap so you can create some really interesting structures. For example Kibana creates an index for each day, which makes dropping old data (or reducing the replicas on old data) really easy.

I guess the point is you can even "partition" by index, although that wouldn't be my first thought. You can also specify the "routing" value, which determines which shard data will end up on (I do this for our tenants), so all their data is located on the same node.
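A rough Python sketch of why a routing value pins data to one shard, and why the shard count is awkward to change afterwards. It assumes the simplified rule "shard = hash(routing) modulo number of primary shards". The `shard_for` helper is hypothetical, and `zlib.crc32` is only a stand-in for whatever hash the real system uses, so the shard numbers it produces will not match a real cluster.

```python
import zlib


def shard_for(routing_value: str, num_primary_shards: int) -> int:
    """Map a routing value (for example a tenant id) to a shard number.

    Assumption: crc32 stands in for the real hash function.
    """
    digest = zlib.crc32(routing_value.encode("utf-8"))
    return digest % num_primary_shards


# The same routing value always lands on the same shard ...
tenant_shard = shard_for("tenant-a", 5)

# ... but the mapping depends on the shard count, so changing the count
# would move existing documents to different shards.
moved = [
    t for t in (f"tenant-{i}" for i in range(100))
    if shard_for(t, 5) != shard_for(t, 6)
]
```

Because the result depends on the number of shards, data written under one shard count cannot be found by the same rule under another, which fits the advice to move data into a new index instead.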
Naeem Khedarun
@naeemkhedarun
Hey I wonder if someone can help. I have an azure deployment with 1 Silo (in a worker role) and 1 Client (in an Azure website).
They are both in the same virtual network, with the worker role in a subnet
They are configured to use AzureTable, and I can see the Silo writing active states into there
But when the client connects I get this error:
System.TimeoutException: Response did not arrive on time in 00:00:30 for message: Request *cli/f48911bc@4439eba4->S10.0.0.8:30000:178365626TypeManagerId@S00000011 #5: Orleans.Runtime.ITypeManager:GetTypeCodeMap(). Target History is: <S10.0.0.8:30000:178365626:TypeManagerId:@S00000011
The IP and gateway port are correct, so I'm not sure why it's timing out
All the call does is return true, so it shouldn't be slow
James Andrew-Smith
@james-andrewsmith
@naeemkhedarun Have you configured the firewall on the worker?
That always bites me.
Naeem Khedarun
@naeemkhedarun
Hey @james-andrewsmith, you mean this: <InternalEndpoint name="OrleansProxyEndpoint" protocol="tcp" port="30000" />
Or Windows Firewall?
James Andrew-Smith
@james-andrewsmith
Windows firewall, I would just remote into the box and temporarily disable it to check.
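A small Python sketch of a less invasive first check: probe whether the gateway port accepts a TCP connection at all from the client's side of the network. The `can_connect` helper is hypothetical, and the address and port in the usage note are simply the ones that appear in the error message above. It only shows that something is listening and reachable, not that the Orleans gateway itself is healthy.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Run from a machine in the same virtual network, `can_connect("10.0.0.8", 30000)` returning False would point toward a firewall or endpoint problem rather than a slow call.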