Kibana 4 Still Does Not Support an "Other" Aggregation

Kibana 3 stopped being developed and maintained long ago and does not support newer versions of Elasticsearch, so like it or not, we all pretty much have to use Kibana 4 now. For me, though, Kibana 4 has a big problem: after selecting the Top N values for a pie chart, there is no way to group everything else into an "other" bucket. Without this feature, many charts lose much of their practical value, because the percentage shown for any given value is no longer correct.

An issue for this was filed and has been tracked on GitHub for a long time, but it is still unresolved.

Another disappointment in Kibana 4 is that vector maps have been dropped.

All we can do is hope these get fixed bit by bit...

Increased curl Response Times on CentOS 7

A while ago a colleague asked why the same curl command, run on CentOS 7, always showed about 150 ms of extra name-resolution latency when fetching a page, even when fetching localhost. Specifically, the command used was:

[root@centos7 ~]# curl -o /dev/null -s -w '%{time_namelookup}:%{time_connect}:%{time_starttransfer}:%{time_total}:%{speed_download}\n' http://localhost
0.150:0.151:0.151:0.151:47442.000

Compared with the result on CentOS 6, the gap is indeed large:

[root@centos6 ~]# curl -o /dev/null -s -w '%{time_namelookup}:%{time_connect}:%{time_starttransfer}:%{time_total}:%{speed_download}\n' http://localhost
0.001:0.001:0.001:0.001:114975.000

My colleague suspected that CentOS 7 had changed the way it resolves names, but when I fetched with wget instead, the latency on CentOS 7 was also low, which rules that out. The cause had to be in curl itself.

How I tracked it down:
Tracing with ltrace and its -T flag (which prints the time spent in each library call) showed that the time was going into the curl_easy_perform() function.
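
The command was roughly the following (a sketch; the original output screenshot is omitted here, and ltrace writes to stderr, hence the redirect):

# ltrace -T curl -o /dev/null -s http://localhost 2>&1 | grep curl_easy_perform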

After some Googling it turned out that this problem had already been fixed upstream: the growing delays of 50 ms, 100 ms, 150 ms... were replaced with 4 ms, 8 ms, 16 ms... The curl shipped in CentOS 7 is simply older and does not include that patch.

Today, with RHEL 7.2 released, I checked the changelog; the patch has been merged there as well.
https://bugzilla.redhat.com/show_bug.cgi?id=1130239

Full-Site HTTPS Is Gradually Catching On at Chinese Websites

While browsing Douban just now, I noticed that the Douban Music page I reached from a Google results page already starts with https. I then tried the main site (https://www.douban.com): it is also reachable now and supports TLS, and it finally no longer bounces you back to the unencrypted http page the way it did until recently.

I took a quick look in the browser console at how Douban loads its resources (https://music.douban.com/subject/26651208/); the screenshot is omitted here.

Compared with the same page requested over http (screenshot likewise omitted):

The differences between the two are actually interesting in a few places.

These days a web page is unlikely to reference resources under a single domain only (for performance, scalability and many other reasons), and that happens to be one of the biggest real-world obstacles to deploying https: every resource the page includes has to be served over https as well! In the CDN era, that means you need your CDN to support it. Concretely, for a URL like http://music.douban.com/subject/26651208/, if you want to serve it over https, then the img#.douban.com domains its images load from must be served over https too!

From the looks of it, the https version of Douban serves its js, css, images and other static resources from a new domain (doubanio.com), instead of doing the usual thing and simply adding https to the existing img#.douban.com static domains. Implementation-wise, this requires the backend to insert different domains into the page depending on the protocol. My guess is that Douban introduced the new doubanio.com domain for static resources precisely so that http access would be disturbed as little as possible.

Looking at the DNS CNAME records for doubanio.com shows the domain pointing at Tencent Cloud's CDN, so Douban's static resource sites get their CDN from Tencent Cloud. Honestly, I was quite surprised to see a domestic CDN finally supporting https (then again, maybe my knowledge here is just out of date, and 12306-style "certificate pollution" is a thing of the past). If I remember correctly, the img#.douban.com domains used to run on ChinaCache's CDN; why they moved away, I naturally have no idea :). I expect that once the https deployment is stable, Douban will point the static resources of its http pages at doubanio.com as well (from what I can observe, that trend has already started). That way, the whole migration to https becomes smooth and invisible to users.
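
For reference, the CNAME chain is easy to check yourself; the exact hostname below is only a guess, and the records will of course change over time:

# dig +short CNAME img3.doubanio.com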

Tencent Cloud's CDN speaks spdy/3.1, which is quite nice for performance. For security's sake, though, Douban should add an HSTS response header once this officially launches; after all, the whole point of the exercise is user safety, and an http default still lets ISPs hijack users every which way.
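
In Nginx, for example, HSTS is a one-line addition to the https server block (a sketch; the max-age value is a policy choice, and includeSubDomains is only safe if every subdomain really serves https):

server {
    listen 443 ssl;
    ...
    # Tell browsers to use https for this site for the next year.
    add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
}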

Another site I noticed in the middle of deploying https is JD.com.

It already uses a wildcard certificate, but the static-resource sites are not served over https, so the browser refuses to load them. A quick look shows the CDN still does not support it: requesting those hosts over https directly returns the 12306 certificate.
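
Checking which certificate a host actually presents is a one-liner; the hostname below is a placeholder, not JD's real static domain:

# echo | openssl s_client -connect static.example.com:443 -servername static.example.com 2>/dev/null | openssl x509 -noout -subject -issuer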

After all these years, some big Chinese websites are finally deploying https: Baidu and Alibaba earlier, and now Douban and JD.com. That is good news for users; at the very least it prevents a lot of man-in-the-middle attacks and hijacking, such as ISPs injecting ads (probably the real motivation for the e-commerce sites). Perhaps even more important, though, is the protection of the user data sitting in their databases.

Varnish: How Do You Know Your Backend Is Responding Properly?

You might use varnishstat to monitor backend health status, but that is not very direct. Instead, you can do this:

# varnishlog -g session -C -q 'ReqHeader:Host ~ "example.com"' | grep Status
--  RespStatus     200
--  RespStatus     200
--  RespStatus     200
--  RespStatus     200
--- BerespStatus   200
--- ObjStatus      200
--  RespStatus     200
--  RespStatus     200
--  RespStatus     200
--  RespStatus     200
--- BerespStatus   200
--- ObjStatus      200
--  RespStatus     200
--- BerespStatus   200
--- ObjStatus      200
--  RespStatus     200
--- BerespStatus   200
--- ObjStatus      200
--  RespStatus     200
--- BerespStatus   200
...

Convenient, right? You can tail and grep whatever you want from varnishlog's output.
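
The VSL query language can also do the status filtering itself, so if you only care about backend errors, something like this should work (adjust the threshold to taste):

# varnishlog -g request -q 'BerespStatus >= 500'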

Whitelisting IPs When Limiting Request Rates in Nginx and Varnish

Sometimes we want to exclude certain IP blocks from a web server's request-rate limiting. Here is how to do it in Nginx and in Varnish. The Nginx way needs some crappy hacks; Varnish, on the other hand, handles it really elegantly.

The Nginx way:

# In the http block we define a zone, and use the geo
# module to map client IP addresses to a variable.
http {
    ...
    limit_req_zone $binary_remote_addr zone=one:10m rate=1r/s;
    # The geo directive maps $remote_addr to the $block variable;
    # unlisted addresses fall through to "limited".
    geo $block {
        default          limited;
        10.10.10.0/24    blacklist;
        192.168.1.0/24   whitelist;
        include more_geoip_map.conf;
    }
}

# server block
server {
    ...
    location /wherever {
        if ($block = whitelist) { return 598; }
        if ($block = limited)   { return 599; }
        if ($block = blacklist) { return 403; }
        # Codes 598 and 599 jump to their named locations below
        # (error_page only accepts codes between 300 and 599).
        error_page 598 = @whitelist;
        error_page 599 = @limited;
    }

    # @whitelist has no limiting; it just passes
    # the requests to the backend.
    location @whitelist {
        proxy_pass http://backend;
        # Feel free to log to a separate file.
        #access_log /var/log/nginx/${host}_whitelist.access.log;
    }

    # insert limit_req here.
    location @limited {
        limit_req zone=one burst=1 nodelay;
        proxy_pass http://backend;
        # Feel free to log to a separate file.
        #access_log /var/log/nginx/${host}_limited.access.log;
    }
    ...
}

The Varnish way:

vcl 4.0;

import vsthrottle;

acl blacklist {
    "10.10.10.0/24";
}

acl whitelist {
    "192.168.1.0"/24;
}

sub vcl_recv {
    # Use client.ip as the identity to tell clients apart.
    set client.identity = client.ip;
    if (client.ip ~ blacklist) {
        return (synth(403, "Forbidden"));
    }
    # Everyone not on the whitelist gets at most 25 requests per 10s.
    if ((client.ip !~ whitelist) && vsthrottle.is_denied(client.identity, 25, 10s)) {
        return (synth(429, "Too Many Requests"));
    }
}

As you can see, unlike Nginx, Varnish has a proper if statement, and it works just the way you'd expect.
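
Either setup is easy to sanity-check from a shell: hammer the endpoint and watch the status codes turn into 429 (or 503 for the Nginx variant) once you cross the threshold. The hostname is a placeholder:

# for i in $(seq 1 40); do curl -s -o /dev/null -w '%{http_code}\n' http://example.com/wherever; done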

Advanced Request Limiting with Nginx (or OpenResty)

Nginx has the ngx_http_limit_req_module, which can be used to limit the request processing rate, but most people seem to use only its most basic feature: limiting the request rate by remote address, like this:

http {
    limit_req_zone $binary_remote_addr zone=one:10m rate=1r/s;
    ...
    server {
        ...
        location /search/ {
            limit_req zone=one burst=5;
        }
    }
}

This is the example configuration taken from Nginx's official documentation. The limit_req_zone directive takes the variable $binary_remote_addr as the key for limiting incoming requests. Keys fill a zone named one, defined by the zone parameter, which may use up to 10m of memory. The rate parameter says that the maximum request rate per $binary_remote_addr is 1 per second. In the /search/ location block, the limit_req directive then references the one zone, allowing bursts of no more than 5 requests.
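
You can watch the zone in action with a handful of concurrent requests. With rate=1r/s and burst=5, ten simultaneous requests should produce one immediate response, five delayed ones, and four 503 rejections (assuming the config above is serving on localhost):

# for i in $(seq 1 10); do curl -s -o /dev/null -w '%{http_code}\n' http://localhost/search/ & done; wait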

Everything seems great; we have configured Nginx to fend off rogue bots and spiders, right?

No! That configuration won't work in real life, and it should never be used in your production environment! Take the following circumstances as examples:

  • When users access your website from behind a NAT, they share the same public IP, so Nginx sees only a single $binary_remote_addr to limit on. Hundreds of users combined would be able to access your website just once per second!
  • When a botnet crawls your website, it uses a different IP address for each request. In this situation, limiting by $binary_remote_addr is again totally useless.

So what configuration should we use instead? We need a different variable as the key, or even several variables combined (since version 1.7.6, limit_req_zone's key can take multiple variables). Instead of the remote address, it is better to tell users apart by request headers such as User-Agent, Referer or Cookie. These are easy to access in Nginx, as they are exposed as built-in variables: $http_user_agent, $http_referer, $cookie_name (where name is the cookie's name), and so on.

For example, this is a better way to define a zone:

http {
    limit_req_zone $binary_remote_addr$http_user_agent zone=two:10m rate=90r/m;
}

It combines $binary_remote_addr and $http_user_agent, so different user agents behind a NATed network can be told apart. But it is still not perfect. First, multiple users may run the same browser and version, and therefore send the same User-Agent header! Second, the length of $http_user_agent is not fixed (unlike $binary_remote_addr); a long header takes up a lot of the zone's memory and may exhaust it.

To solve the first problem, we can mix more variables into the key; cookies are great for this, since each user sends their own unique cookie, e.g. $cookie_userid. That still leaves the second problem, though. The answer is to use hashes of the variables instead.

There is a third-party module called set-misc-nginx-module that we can use to generate hashes from variables; if you are using OpenResty, the module is already bundled. The configuration looks like this:

http {
    ...
    limit_req_zone $binary_remote_addr$cookie_hash$ua_hash zone=three:10m rate=90r/m;
    ...
    server {
        ...
        set_md5 $cookie_hash $cookie_userid;
        set_md5 $ua_hash $http_user_agent;
        ...
    }
}

It is fine that $cookie_hash and $ua_hash are used in the http block before being defined in the server block: Nginx evaluates variables lazily at request time, not while parsing the configuration. This configuration is much better now.

Now let's deal with the distributed botnet problem. We need to take $binary_remote_addr out of the key, and since such bots usually do not send a Referer header (if yours do, you will have to work out what else is unique about them), we can take advantage of that. This configuration should take care of it:

http {
    ...
    limit_req_zone $cookie_hash$referer_hash$ua_hash zone=four:10m rate=90r/m;
    ...
    server {
        ...
        set_md5 $cookie_hash $cookie_userid;
        set_md5 $referer_hash $http_referer;
        set_md5 $ua_hash $http_user_agent;
        ...
    }
}
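
Note that limit_req_zone only defines the zone; to actually enforce it, a limit_req directive still has to reference it in whatever location you want to protect. A sketch, with burst and nodelay as assumptions you should tune:

server {
    ...
    location /search/ {
        limit_req zone=four burst=10 nodelay;
    }
}

A quick way to see it bite is to replay requests that look like such a bot, i.e. a fixed User-Agent with no Cookie and no Referer header (placeholder URL):

# for i in $(seq 1 100); do curl -s -o /dev/null -w '%{http_code}\n' -A 'BadBot/1.0' http://localhost/search/; done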