最新文章专题视频专题问答1问答10问答100问答1000问答2000关键字专题1关键字专题50关键字专题500关键字专题1500TAG最新视频文章推荐1 推荐3 推荐5 推荐7 推荐9 推荐11 推荐13 推荐15 推荐17 推荐19 推荐21 推荐23 推荐25 推荐27 推荐29 推荐31 推荐33 推荐35 推荐37视频文章20视频文章30视频文章40视频文章50视频文章60 视频文章70视频文章80视频文章90视频文章100视频文章120视频文章140 视频2关键字专题关键字专题tag2tag3文章专题文章专题2文章索引1文章索引2文章索引3文章索引4文章索引5123456789101112131415文章专题3
问答文章1 问答文章501 问答文章1001 问答文章1501 问答文章2001 问答文章2501 问答文章3001 问答文章3501 问答文章4001 问答文章4501 问答文章5001 问答文章5501 问答文章6001 问答文章6501 问答文章7001 问答文章7501 问答文章8001 问答文章8501 问答文章9001 问答文章9501
当前位置: 首页 - 科技 - 知识百科 - 正文

Redislatencyspikesandthe99thpercentile

来源:懂视网 责编:小采 时间:2020-11-09 12:59:14
文档

Redislatencyspikesandthe99thpercentile

Redislatencyspikesandthe99thpercentile:One interesting thing about the Stripe blog post about Redis is that they included latency graphs obtained during their tests. In order to persist on disk Redis requires to call the fork() system call. Usually forking using physical server
推荐度:
导读Redislatencyspikesandthe99thpercentile:One interesting thing about the Stripe blog post about Redis is that they included latency graphs obtained during their tests. In order to persist on disk Redis requires to call the fork() system call. Usually forking using physical server

One interesting thing about the Stripe blog post about Redis is that they included latency graphs obtained during their tests. In order to persist on disk Redis requires to call the fork() system call. Usually forking using physical server

One interesting thing about the Stripe blog post about Redis is that they included latency graphs obtained during their tests. In order to persist on disk Redis requires to call the fork() system call. Usually forking using physical servers, and most hypervisors, is fast even with big processes. However Xen is slow to fork, so with certain EC2 instance types (and other virtual servers providers as well), it is possible to have serious latency spikes every time the parent process forks in order to persist on disk. The Stripe graph is pretty clear in this regard.

img://antirez.com/misc/stripe-latency.png

As you can guess, if you perform a latency test during the fork, all the requests crossing the moment the parent process forks will be delayed up to one second (taking as example the graph above, not sure about what was the process size nor the EC2 instance). This will produce a number of samples with high latency, and will affect the 99th percentile result.

To change instance type, configuration, setup, or whatever in order to improve this behavior is a good idea, and there are use cases where even a single request having a too high latency is unacceptable. However apparently it is not obvious how latency spikes of 1 second every 30 minutes (or more, if you use AOF with the right rewrite triggers) is very different from latency spikes which are evenly distributed in the set of requests.

With evenly distributed spikes, if the generation of a page needs to perform a number of requests to a Redis server in order to create the output, it is very likely that a page view will incur in the latency penalty: this impacts the quality of service in a great way potentially, check this link: http://latencytipoftheday.blogspot.it/2014/06/latencytipoftheday-most-page-loads.html.

However 1 second of latency every 30 minutes run is a completely different thing. For once, the percentile with good latency gets better *as the number of requests increase*, since the more the requests are, the more this second of latency will be unlikely to get over-represented in the samples (if you have just 1 request per minute, and one of those requests happen to hit the high latency, it will affect the 99.99th percentile much more than what happens with 100 requests per second).

Second: most page views will be unaffected. The only users that will see the 1 second delay are the ones that make a request crossing the fork call. All the other requests will experience an extremely low probability of hitting a request that has a latency which is significantly worse than the average latency. Also note that a page view crossing the fork time, even when composed of 100 requests, can’t be delayed for more than a second, since the requests are completed as soon as the fork() call terminates.

The bottom line here is that, if there are hard latency requirements for each single request, it is clear that a setup where a request can be delayed 1 second from time to time is a big problem. However when the goal is to provide a good quality of service, the distribution of the latency spikes have a huge effect on the outcome. Redis latency spikes due to fork on Xen are isolated points in the line of time, so they affect a percentage of page views, even when the page views are composed of a big number of Redis requests, which is proportional to the latency spike total time percentage, which is, 1 second every 1800 seconds in this case, so only the 0.05% of the page views will be affected.

Latency characteristics are hard to capture with a single metric: the full percentile curve and the distribution of the spikes, can provide a better picture. In general good rule of thumbs are a good way to start a research, and in general it is true that the average latency is a poor metric. However to promote a rule of thumb into an absolute truth has also its disadvantages, since many complex things remain complex, and in need for close inspection, regardless of our willingness to over simplify them.

At the same time, fork delays in EC2 instances are one of the worst experiences for Redis users in one of the most popular runtime environments today, so I’m starting to test Redis with EC2 regularly now: we’ll soon have EC2 specific optimization pages on the Redis official documentation, and a way to operate master-slaves replicas with persistence disabled in a safer way.

If you need EC2 + Redis master with persistence disabled now, the simplest to deploy “quick fix” is to disable automatic restarts of Redis instances, and use Sentinel for failover, so that crashed masters will not automatically return available, and will be failed over by Sentinel. The system administrator can restart the master manually after checking that the failover was successful and there is a new active master.

EDIT: Make sure to check the Hacker News thread that contains interesting information about EC2, Xen and fork time: https://news.ycombinator.com/item?id=8532851. Also not all the EC2 instances are the same, and certain types provide great fork time on pair with bare metal systems: https://redislabs.com/blog/testing-fork-time-on-awsxen-infrastructure#.VFJQ-JPF8yF Comments

声明:本网页内容旨在传播知识,若有侵权等问题请及时与本网联系,我们将在第一时间删除处理。TEL:177 7030 7066 E-MAIL:11247931@qq.com

文档

Redislatencyspikesandthe99thpercentile

Redislatencyspikesandthe99thpercentile:One interesting thing about the Stripe blog post about Redis is that they included latency graphs obtained during their tests. In order to persist on disk Redis requires to call the fork() system call. Usually forking using physical server
推荐度:
标签: and 99 the
  • 热门焦点

最新推荐

猜你喜欢

热门推荐

专题
Top