    Gil Tene
    @giltene
    @ldemailly Sorry for the very late response (didn't look here in a while). The only proper way to compute response time for a service that serves things at e.g. 5 qps, in a scenario where clients come in with requests at 10 qps, is to model a never-ending, growing queue of started-but-not-yet-served requests. If you do anything else (e.g. back off when you can't serve them, cap the depth, or match the rate to what can actually get done), you are modeling a different scenario.
    Gil Tene
    @giltene
    The "does my respinse time grow linearly with test length when I try to push more than the server can handle" is a basic sanity check for all load generators. Any load generator that doe snot not exhibit this linear growth when claiming to genearte a rate that is higher than is sustainable by the server is exhibiting a basic bug. And that bug usually shows up at the micro-to-sub-second-levels as huge coordinated omission and completely wrong response time reoprting. E.g. if server dramatically slows down for 0.5 seconds and can only sustain 1/10th of the rate during that time, such load generators will "temporarily" adjust to what the server can do when the real world will actually queue things up. Similarly, when load generators attempt to model variable incoming rates (e.g. a spike of 1000 messages arrivign within 20 msec every second, rather than 1000 messeages per second) their coordinated omission backoff usually mis-reports reponse time behavior by orders of magnitude.
    Alec
    @ahothan
    @giltene sure, I'm going to ask my colleague to do a pull request as he did it under his name. The changes we made allow reporting the encoded histogram (base64) at the end of the wrk2 run and also reporting latency histograms at intervals (e.g. every 10 seconds), which lets us track latency variations over time using a TSDB and tools like grafana.
    Alec
    @ahothan
    I am also working on introducing Hdr into the network benchmarking industry, since all existing network traffic generators today (open source or commercial) do not use a good latency histogram implementation. Most of them are custom, developed in house, with very few features and overly primitive reporting formats: some report just min/max/avg, some others very basic and coarse fixed-size bucket dumps, some provide a limited list of percentile values - clearly insufficient for doing any form of smart aggregation. One simple example, without even talking about distributed runs, is the ability to collect latency histograms per flow of traffic: e.g. 1 histogram per direction for bi-directional traffic. Per-flow histograms and flow grouping are important concepts that are completely overlooked today due to the limitations of the histogram libraries used.
    Having low-cost histograms per flow group and the ability to get an aggregated view is pretty important, and it is not supported at all even in commercial traffic generators that cost as much as a nice high-end new car apiece. We can discuss this further in the Hdr channel...
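A rough sketch of the per-flow histogram idea (plain Lua with naive fixed-width buckets, purely for illustration; a real implementation would use HdrHistogram, which gives log-scale buckets, lossless merging, and a standard serialization format):

```lua
local BUCKET_US = 100   -- 100 microsecond buckets (illustration only)

local function new_histogram()
  return { buckets = {}, count = 0 }
end

local function record(h, latency_us)
  local b = math.floor(latency_us / BUCKET_US)
  h.buckets[b] = (h.buckets[b] or 0) + 1
  h.count = h.count + 1
end

-- Merge b into a: histograms add bucket-by-bucket, so per-flow histograms can
-- be rolled up into per-direction or per-group views after the run.
local function merge(a, b)
  for bucket, n in pairs(b.buckets) do
    a.buckets[bucket] = (a.buckets[bucket] or 0) + n
  end
  a.count = a.count + b.count
end

-- One histogram per flow, e.g. keyed by "src->dst".
local per_flow = {}
local function record_flow(flow_id, latency_us)
  per_flow[flow_id] = per_flow[flow_id] or new_histogram()
  record(per_flow[flow_id], latency_us)
end
```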
    Laurent Demailly
    @ldemailly
    @ahothan merging multiple histograms and a distributed mode are in the works in fortio. For now the export model of fortio is JSON (or a proto/Go struct)
    Laurent Demailly
    @ldemailly
    Gil thanks for the reply. It’s an interesting “what if” on issuing additional requests when the service already is slow/can’t keep up - in general I’m more interested in data before that stage but I will think about the option to switch from fixed connection at target rate to “making N req/sec no matter what” (a bit worried that will just mean the client/driver will OOM…)
    Alec
    @ahothan
    @ldemailly I'm curious to see how you will do the merge + serialization, why reinvent the wheel?
    Hdr has pretty much standardized the serialization format with many languages support
    "I’m more interested in data before that stage" - the problem is ... you might not even know when you are past that stage or not!
    Laurent Demailly
    @ldemailly
    I do know... I run at lower qps than max
    I also print warnings if more than 10% of the sleeps are 'negative' (i.e. it's falling behind)
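A minimal sketch of that pacing/warning idea (plain Lua, not fortio code; socket.gettime()/socket.sleep() from LuaSocket are assumed for timing, and send_request() is a hypothetical placeholder for the actual request):

```lua
local socket = require("socket")

local target_qps  = 100
local period      = 1 / target_qps
local total, late = 0, 0

local next_due = socket.gettime()
for i = 1, 1000 do
  send_request()                      -- hypothetical: issue one request
  total = total + 1
  next_due = next_due + period
  local sleep = next_due - socket.gettime()
  if sleep > 0 then
    socket.sleep(sleep)               -- on schedule: wait for the next slot
  else
    late = late + 1                   -- "negative sleep": we are behind schedule
  end
end

if late / total > 0.10 then
  print(string.format("warning: %.0f%% of sleeps were negative (falling behind target qps)",
                      100 * late / total))
end
```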
    Alec
    @ahothan
    you may not always know from the qps absolute value because it depends on the system under test; 10 qps may be super easy for some and super hard for others
    Laurent Demailly
    @ldemailly
    as I said, I run first at max to get an idea of the max, then run at 80% of that max
    Alec
    @ahothan
    what if the max varies over time?
    as soon as you slip on the interval (what you call negative sleep), you fall into the coordinated omission situation
    Alec
    @ahothan
    a feature that was useful for us is to be able to generate "rolling" histograms over time, basically to report a full latency histogram for the last interval (e.g. every 30 seconds) - not sure if fortio also supports it?
    this is for systems that can vary greatly over time due to some abnormal condition
    e.g. we use it to test failover situations (in networking you have redundant packet paths and routing will follow the alternate path if the main path is no longer passing traffic)
    or in storage benchmarking this can be used to see the impact of one node in a cluster going down
    path failover or http server redundancy will typically also apply for HTTP workloads
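A sketch of the rolling-histogram idea Alec describes above (plain Lua, reusing the toy new_histogram()/record() helpers sketched earlier; report() is a hypothetical hook that could print the interval or push it to a TSDB):

```lua
local interval       = 30            -- seconds covered by each reported histogram
local current        = new_histogram()
local interval_start = os.time()

local function on_latency(latency_us)
  record(current, latency_us)
  if os.time() - interval_start >= interval then
    report(current)                  -- hypothetical: emit the last interval only
    current = new_histogram()        -- reset so a spike isn't averaged away
    interval_start = os.time()
  end
end
```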
    Laurent Demailly
    @ldemailly
    if your system can’t do the fixed qps you target, there isn’t much point of that fixed qps target, and I have plenty of ways to detect that case.
    Marwan Rabbâa
    @waghanza
    hi @/all
    we are using wrk on https://github.com/tbrand/which_is_the_fastest (for the next release). we may consider using wrk2 - is there any plan to merge those two projects?
    Laurent Demailly
    @ldemailly
    Latency at max is really not that meaningful so indeed I suggest you use wrk2 instead (or fortio :p )
    Marwan Rabbâa
    @waghanza
    sorry I do not understand
    why not meaningful ?
    Laurent Demailly
    @ldemailly
    because at the maximum throughput, something is saturated and the latency just increases; you want to check latency at the “knee” of the latency-vs-qps curve, typically around 75-80% of the max (i.e. if your system can do 10k qps, check latency at 7.5k qps)
    Marwan Rabbâa
    @waghanza
    ok
    Gil Tene
    @giltene
    More specifically, you want to find the (highest) sustained throughput that can be carried without causing latency or response time to become unacceptable. You can’t find this by picking some % of the max observed throughput (at which latency will virtually always be unacceptable, if there is a latency requirement). In some systems that would happen at 80% of max throughput, in some at 5%, and in some at 99.9%.
    Gil Tene
    @giltene
    You need to test a range of throughputs to find the highest sustainable throughput. And you need to test at any given throughput for a prolonged period (long enough for side effects like accumulated background work to exhibit their symptoms), which on many systems means tens of minutes. The “classic” ramp-up technique (e.g. a 100-minute test ramping from 0 to 1000 clients at a ramp rate of 10 clients per minute) virtually always gives you wrong data, since it tends to detect “breakage” at a throughput that is much higher than what is actually sustainable (if you ran at that throughput for e.g. 100 minutes)
    Gil Tene
    @giltene
    Doing a quick set of tests to identify the likely “knee” point (finding things that break for sure is quick) is a good way to focus the remaining tests, but often when you continue to test for real, you will find that many systems have a hard time maintaining a throughput that is even 20% of where “things stopped breaking” in 2-minute tests without the common impacts of background accumulated-debt work causing service level failures (and e.g. flipped circuit breakers). Things like periodic journal or other buffer flushing, data merging (e.g. table compaction), re-indexing, cache or catalog refreshing, garbage collection of all sorts, and even exhausted scheduler quotas are all examples of accumulated debt that is paid after some time, and they have delayed effects that happen anywhere from tens of milliseconds to tens of minutes after the actual operations that incurred the debt have completed. Such accumulated debt will cause future operations to cross latency/response time requirement boundaries, which is why you need to keep going at a given throughput for quite a while if you wish to know that it is sustainable.
    Unfortunately, sustainable throughput (which, per the above, is much more time consuming to establish in experiments) is the thing you need to know in order to answer “how much can this instance take” or “how many instances do I need to carry a load of X” questions. A temporary max established in a two- or ten-minute test is useless for estimating that in most systems.
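One way to picture the search Gil describes, as a sketch (plain Lua; run_test(qps, seconds) is a hypothetical driver that runs a constant-rate load, e.g. wrk2 with -R <qps>, for the given duration and returns the measured p99 latency in ms):

```lua
local latency_sla_ms = 100
local test_seconds   = 30 * 60       -- long enough for accumulated-debt effects
local rates          = { 500, 1000, 2000, 4000, 8000 }

local sustainable = 0
for _, qps in ipairs(rates) do
  local p99 = run_test(qps, test_seconds)  -- hypothetical prolonged fixed-rate test
  print(string.format("rate=%d qps  p99=%.1f ms", qps, p99))
  if p99 <= latency_sla_ms then
    sustainable = qps                      -- still within the latency requirement
  else
    break                                  -- higher rates will only be worse
  end
end
print("highest sustainable throughput found: " .. sustainable .. " qps")
```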
    Andriy Plokhotnyuk
    @plokhotnyuk
    @giltene Gil, please share what you think about the following benchmarks, which aim to test the throughput of all contemporary web libraries and frameworks: https://github.com/TechEmpower/FrameworkBenchmarks
    Here are the results of the latest sprint for simple HTTP/1.1 with JSON serialization: https://www.techempower.com/benchmarks/#section=test&runid=3da523ee-fff1-45d8-9044-7feb532bf9ee&hw=ph&test=json
    A sample of raw logs with results for some of the frameworks from the top 10 in the previous chart: https://tfb-status.techempower.com/unzip/results.2018-06-28-04-18-09-819.zip/results/20180625165335/colossus/json/raw.txt
    Results of other sprints for both types of servers (physical and cloud) are gathered on this page: https://tfb-status.techempower.com/
    Samuel Williams
    @ioquatix
    How do you maintain constant throughput if the server is not fast enough?
    Gil Tene
    @giltene
    The constant throughput is for the load, not the server. It defines the model of when requests are supposed to be initiated, regardless of what the server can actually do. The response time of a request is then measured from when it was supposed to start to when it actually completed. If the server is not fast enough it will “fall behind”
    Gil Tene
    @giltene
    and the response times will start growing linearly with time (the longer you spend being slower than the incoming rate, the longer the response times for incoming requests get). If the slowness was temporary (as it often is, with e.g. glitches, pauses, short stalls), it will eventually recover. If it never catches up, response times grow to infinity. (Think of a line of people at a coffee shop, and a barista that can't make coffee as fast as people are coming in, making the line grow, and with it the “how long did it take to get my coffee” metric)
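A minimal sketch of that measurement model (plain Lua, not wrk2 source; socket.gettime()/socket.sleep() from LuaSocket are assumed, and send_request()/record_latency() are hypothetical placeholders): latency is taken from when the request was due under the constant configured rate, not from when it actually went out.

```lua
local socket = require("socket")

local target_qps = 50
local period     = 1 / target_qps
local start      = socket.gettime()

for i = 0, 999 do
  local intended = start + i * period      -- when request i is "due"
  local now = socket.gettime()
  if now < intended then
    socket.sleep(intended - now)           -- on schedule: wait for the slot
  end
  send_request()                           -- hypothetical blocking request
  local done_at = socket.gettime()
  -- Coordinated-omission-free latency: measured from the intended send time,
  -- so time spent stuck behind a slow server is charged to the request.
  record_latency(done_at - intended)       -- hypothetical histogram record
end
```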
    Samuel Williams
    @ioquatix
    You should put that explanation in the readme.
    Gil Tene
    @giltene
    Do you mean the coffee shop example? The readme already explains the technique used: “The model I chose for avoiding Coordinated Omission in wrk2 combines the use of constant throughput load generation with latency measurement that takes the intended constant throughput into account. Rather than measure response latency from the time that the actual transmission of a request occurred, wrk2 measures response latency from the time the transmission should have occurred according to the constant throughput configured for the run. When responses take longer than normal (arriving later than the next request should have been sent), the true latency of the subsequent requests will be appropriately reflected in the recorded latency stats.”
    Samuel Williams
    @ioquatix
    Coffee shop example
    I read the README and didn't get it, even the bit quoted above
    but 🤷‍♂️
    Fatima Tahir
    @Fatahir
    Hi, does anyone know how I can print all the requests and latencies generated by wrk2?
    Samuel Williams
    @ioquatix
    Fork the code and write some C :p
    Fatima Tahir
    @Fatahir
    I want to set each thread's address differently. Using wrk.lookup, what I understand is that it can only take a host and port. If I want to set a thread address like "localhost:8080/index.html", how can I set that on thread.addr? Also, I have four threads and each thread has a different address; how can I get the latency for each address? The latency of the overall script can be computed in the done function using latency:percentile, but I don't know how to compute the latency of each thread.
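Untested sketch of a wrk-style Lua script for the per-thread address part (the host/port/path values here are made up): setup() assigns each thread its own address via wrk.lookup, and the path is passed into the thread's environment with thread:set so request() can use it.

```lua
local counter = 0
local targets = {
  { host = "localhost", port = "8080", path = "/index.html" },
  { host = "localhost", port = "8081", path = "/api/health" },
}

function setup(thread)
  local t = targets[(counter % #targets) + 1]
  local addrs = wrk.lookup(t.host, t.port)  -- resolve host/port to addresses
  thread.addr = addrs[1]                    -- pin this thread to one address
  thread:set("path", t.path)                -- visible as a global inside the thread
  counter = counter + 1
end

function request()
  -- 'path' was injected per thread in setup(); method defaults to wrk.method
  return wrk.format(nil, path)
end
```

For per-address latency: done(summary, latency, requests) only receives one aggregate latency histogram, so the simplest workaround is probably to run a separate wrk2 instance per target address and compare their reports.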
    samcgardner
    @samcgardner
    Is there a good rule of thumb for the number of threads/connections to use with wrk/wrk2?