Aaron Stannard
@Aaronontheweb
it will automatically launch the benchmark
after running the normal MSBuild / NUnit stuff
Natan Vivo
@nvivo
just downloaded.. let me try
Aaron Stannard
@Aaronontheweb
:+1:
Natan Vivo
@nvivo
ok. run the test
I'll take a look at the code, not sure if I can help, but I'll give it a shot
just to understand: what you're doing is replacing the LinkedList with something else that does the swap internally?
Aaron Stannard
@Aaronontheweb
I use a LinkedList on top of the custom TaskScheduler I wrote
(based off of an MSFT example)
Aaron Stannard
@Aaronontheweb
but the actual threadpool uses this:
internal sealed class WorkerQueue
{
    internal ConcurrentQueue<Action> WorkQueue = new ConcurrentQueue<Action>();
    internal readonly ManualResetEventSlim Event = new ManualResetEventSlim(false);

    public void AddWork(Action work)
    {
        WorkQueue.Enqueue(work);
        Event.Set();
    }
}
(it's further down the page)
Natan Vivo
@nvivo
hm, ok
Aaron Stannard
@Aaronontheweb
the task scheduler is just so we can schedule on top of this thread pool
doesn't really do much
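For reference, a custom `TaskScheduler` that just forwards queued tasks to an external queue can be very small; this is a hypothetical minimal sketch in the spirit of the MSFT sample mentioned above, not the actual Akka.NET code (the `ForwardingTaskScheduler` name and the `enqueue` delegate are illustrative):

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Minimal TaskScheduler that hands every queued task to a caller-supplied
// enqueue delegate (e.g. the dedicated thread pool's AddWork).
sealed class ForwardingTaskScheduler : TaskScheduler
{
    private readonly Action<Action> _enqueue;

    public ForwardingTaskScheduler(Action<Action> enqueue) => _enqueue = enqueue;

    // Called by the TPL when a Task is scheduled here; we wrap the
    // framework's TryExecuteTask so the pool thread runs the task.
    protected override void QueueTask(Task task) =>
        _enqueue(() => TryExecuteTask(task));

    // Allow inline execution; TryExecuteTask refuses to run a task twice.
    protected override bool TryExecuteTaskInline(Task task, bool taskWasPreviouslyQueued) =>
        TryExecuteTask(task);

    // Only used by debuggers; nothing to report in this sketch.
    protected override IEnumerable<Task> GetScheduledTasks() => Array.Empty<Task>();
}
```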
Max Gortman
@nayato
btw, CLR moved to concurrent queue to improve GC
it's easier for GC to walk queue segments compared to linked list nodes
Natan Vivo
@nvivo
so, the DedicatedThreadPool will accept work from multiple threads, but the worker is tied to a specific thread, correct?
Natan Vivo
@nvivo
got the idea here...
Roman Golenok
@shersh
Hey guys
How do you use a database in Akka.NET? Any samples? Do I need to create an actor for DB read operations, or should I perform the action inside the actor where it's needed?
Natan Vivo
@nvivo
@Aaronontheweb, @rogeralsing, I found some ways to optimize the queue. A simple thing is that Count is quite expensive in ConcurrentQueue, and it's being called twice in the PoolWorker. Just by calling it once you can improve perf by ~15%.
For more gains, the test needs to be changed, because as it is, it covers the worst possible scenario: messages are being queued and dequeued at the same time on all threads all the time. ConcurrentQueue is already very optimized for that, and it will be hard to beat. But if you change the pattern to add some messages, wait for a while, then add more, you can measure other improvements with some tricks.
I tested a variation of your swap pattern, where I basically return the entire ConcurrentQueue at once and create another; it locks only on Add and DequeueAll. I could see about 40% improvement doing that vs. dequeuing one item at a time.
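A minimal sketch of the swap variant described above, as I understand it (names are illustrative, not the actual benchmark code): both Add and DequeueAll take a short lock, and DequeueAll hands the whole queue to the caller while installing a fresh one, so draining happens outside any lock.

```csharp
using System.Collections.Concurrent;

public sealed class SwapQueue<T>
{
    private readonly object _gate = new object();
    private ConcurrentQueue<T> _queue = new ConcurrentQueue<T>();

    public void Add(T item)
    {
        lock (_gate) { _queue.Enqueue(item); }
    }

    // Swap the filled queue for a fresh one; the caller now owns the old
    // queue exclusively and can drain it without contending with producers.
    public ConcurrentQueue<T> DequeueAll()
    {
        lock (_gate)
        {
            var drained = _queue;
            _queue = new ConcurrentQueue<T>();
            return drained;
        }
    }
}
```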
Natan Vivo
@nvivo
well.. just some ideas...
Roger Johansson
@rogeralsing
Neat, that was what i tried to solve lockless (but failed due to stale reads)
Is the perf gain still true in a multi-producer scenario?
Or does the gain vanish then due to contention?
Bartosz Sypytkowski
@Horusiath
@rogeralsing @Aaronontheweb Do we have some release planned in the near future?
Roger Johansson
@rogeralsing
Not that I know of, unless @Aaronontheweb has something planned
Natan Vivo
@nvivo
@rogeralsing I didn't change the tests, so it remained one producer. But it seems most of the time spent in ConcurrentQueue is not actually waiting, but dealing with its internal structures.
What I did that helped is that I grabbed the ConcurrentQueue code from corefx and added it locally to the project, and run the profiler.
We can see that in the code that's on GitHub, ~30% of the time is spent on TryDequeue and another ~30% on Count
most of this time is just looping through segments inside the concurrent queue
so, it's not actually locks, just regular code running. it's hard to optimize it more for general use cases.
(when I say locks in concurrentqueue, I mean spinwait, not the lock keyword)
Natan Vivo
@nvivo
if you were to put all the overhead on adding and none on retrieving, the most optimal solution would be a simple List<T> with a predefined size, using locks/spinwait on Add. For dequeue you'd have a single method that locks/spinwaits and exchanges the list with a new one; you can then loop over the list in the pool directly as an array. The issue is that it's hard to know what size this array should be
so it becomes a trade-off vs. a linked list doing the same.
after that, the next step would be a linked list of arrays, and that's exactly what ConcurrentQueue is...
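The "all overhead on Add" variant might look like this hypothetical sketch (the `SwapList` name and lock-based synchronization are illustrative; the spinwait alternative would have the same shape): producers pay a lock per Add into a pre-sized List<T>, and the consumer pays one lock to swap the whole list out, then iterates it with no synchronization at all.

```csharp
using System.Collections.Generic;

public sealed class SwapList<T>
{
    private readonly object _gate = new object();
    private readonly int _capacity;
    private List<T> _items;

    // The hard part noted above: choosing this capacity up front.
    public SwapList(int capacity)
    {
        _capacity = capacity;
        _items = new List<T>(capacity);
    }

    public void Add(T item)
    {
        lock (_gate) { _items.Add(item); }
    }

    // One lock to exchange the filled list for a fresh pre-sized one;
    // the returned list can then be looped like an array, lock-free.
    public List<T> SwapOut()
    {
        lock (_gate)
        {
            var full = _items;
            _items = new List<T>(_capacity);
            return full;
        }
    }
}
```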
Natan Vivo
@nvivo
the point I made about the test is that unless you give it some time to fill a list with items without dequeuing, the gains from swapping the list are lost. in the test it keeps producing and swapping all the time, so the cost of swapping becomes higher. if you give it some time to fill a list and only swap once there's a minimum number of items, swapping becomes cheaper and these other solutions apply
another thing: if using ConcurrentQueue, replacing Count with IsEmpty is much cheaper, as IsEmpty only checks the local segment, while Count needs to loop through all of them
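A small sketch of that cheaper check (the `DrainAll` helper is hypothetical, just to show the pattern): `IsEmpty` only inspects the head segment, while `Count` walks every segment to sum their sizes, so polling `Count > 0` in a loop is the expensive form.

```csharp
using System.Collections.Concurrent;

static class Drain
{
    // Drains the queue and returns how many items were processed.
    public static int DrainAll(ConcurrentQueue<int> queue)
    {
        var processed = 0;
        // was: while (queue.Count > 0) — forces a full segment walk per poll
        while (!queue.IsEmpty)
        {
            if (queue.TryDequeue(out _))
                processed++;
        }
        return processed;
    }
}
```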
Roger Johansson
@rogeralsing
:+1:
Roger Johansson
@rogeralsing
another solution would be to add a true DequeueAll in a modified version of ConcurrentQueue. that would be feasible too, right? we could just make the ConcurrentQueue head point to where the last segment is, and then enumerate over all of the segments we just removed... haven't checked the code for it, but that should work, right?
Natan Vivo
@nvivo
It could work. One of the things I thought is just using that code and tweaking some parameter to match the usage. ConcurrentQueue has a lot of parameters inside that were probably calculated for the most general cases
as I understand it, what it does inside is create a 32-item array for each segment, and it grows by linking them. Each segment has its own lock, so you only lock when modifying at that 32-item boundary
From the profiler, even with the 4th iteration (10 million actions I guess?), the time it wasted on spinwait was about 4% only. 30% was just moving through segments
so, in theory reducing the # of segments (that is - increasing the segment size) could help...
but at some point I wonder if you're not optimizing only for the benchmark
in the case of DequeueAll, if it needs to loop and lock through all segments, there might not be a lot of gain. But it could get the entire segment as an array and work those before going to the next