Recently, tempted by a bargain, I moved this site to an Alibaba Cloud International Edition Hong Kong VPS. My strongest impression after the move is not the speed improvement (the Korean kdatacenter VPS I used before was also very fast over China Telecom's network) but that the Alibaba Cloud VPS is noticeably weaker in performance than other VPS hosts with the same 1GB of memory.

As soon as I scanned the site with a tool, the system load on the Alibaba Cloud Hong Kong VPS kept climbing, and sometimes the admin backend would not even open while I was writing an article. When I opened the server logs, I found a lot of abnormal IPs doing port scanning, SQL injection, probing for website backup files, AB stress testing... In short, there was a lot of non-human access behavior.

Analyzing IP behavior in logs is tiring work, especially when the logs run to hundreds of MB and are very slow to open as text files. This article shares two excellent server log analysis tools: ngxtop and GoAccess. ngxtop parses Nginx log files and displays the results in real time in an interface similar to the top command.

ngxtop can analyze existing log files and can also monitor server logs in real time. Right in the terminal, just like using the top command, you can quickly find the most active IP addresses in the logs, 403/500/404 errors, the ranking of requested pages, and so on. ngxtop is a lightweight tool, while GoAccess is more of a full-featured log statistics tool.

GoAccess is not only rich in graphics but also fast: it parses about 80,000 log records per second, and its WebSocket refreshes the statistics every 10 seconds. Besides the command line interface, GoAccess can also generate an HTML page that lets you see at a glance the number of visitors, requested files, 404 and other errors, visitor location, browser, operating system, referring URL, HTTP status codes, and more.

Server log analysis tools: ngxtop and GoAccess - real-time monitoring and visual management to quickly find the source of exceptions

In short, ngxtop and GoAccess are two powerful tools for analyzing server logs such as those of Apache and Nginx. Once you master them, the IPs that are potentially dangerous to the server have nowhere to hide. For a webmaster, every additional piece of software you learn to run yourself can save you another server expense, for example:

  1. Use Huginn to capture RSS and WeChat public account updates from any website - create a one-stop information reading platform
  2. Linux VPS mounts Google Drive and Dropbox-realizes VPS host data synchronization and backup
  3. Three free tools to help you detect the authenticity of VPS servers - VPS host performance and speed test methods

PS: Updated on October 21, 2017. Friends who are interested in using the Alibaba Cloud Hong Kong VPS can read my review first: Alibaba Cloud International Version Hong Kong Computer Room Speed and Performance Evaluation - Fast, but Disk IO and Memory Are the Bottleneck. If you also want to monitor the network stability of your VPS server, you can use this tool: Smokeping installation and configuration - a free open source network performance monitoring tool for visual master/slave deployment.

1. Installation and use of ngxtop

1.1  ngxtop installation

Project homepage:

  1. https://github.com/lebinh/ngxtop

ngxtop works with Nginx server logs. Install pip for your distribution first, then install ngxtop with it:

Fedora: yum install python-pip
CentOS/RHEL (install EPEL first, then): yum install python-pip
Debian/Ubuntu: apt-get install python-pip

pip install ngxtop

Or you can install it directly from source:

wget https://github.com/lebinh/ngxtop/archive/master.zip -O ngxtop-master.zip
unzip ngxtop-master.zip
cd ngxtop-master
python setup.py install

1.2  ngxtop usage

The basic usage of ngxtop is as follows:

ngxtop [options]
ngxtop [options] (print|top|avg|sum) <var>
ngxtop info

The options are as follows:

-l <file>: full path of the log file to parse (Nginx or Apache2)

-f <format>: log format

--no-follow: process the current contents of the log file instead of watching for lines newly written to it in real time

-t <seconds>: refresh interval

-n <number>: number of rows to display

-o <var>: sort order (default is the access count)

-a <exp> ..., --a <exp> ...: add expressions (must be aggregate expressions such as sum, avg, min, max) to the output

-v: verbose output

-c <file> or --config <file>: specify the nginx configuration file and detect the log format from it automatically

-i <filter-expression> or --filter <filter-expression>: only records that satisfy the expression are processed

-p <filter-expression> or --pre-filter <filter-expression>: filter expression that is checked in the pre-parsing phase

The variables available in ngxtop are: remote_addr, remote_user, time_local, request, request_path, status, body_bytes_sent, http_referer, http_user_agent. They are mainly used to analyze the client IP address, request path, HTTP status, referer, user agent, and so on.

1.3  ngxtop example

You can use ngxtop info to view the log file locations and the configuration file path that ngxtop detects on the local server. However, it only shows the default log files; log files in customized locations are not listed.

Find the ten request paths that most often return a 404 error:

ngxtop -l /data/wwwlogs/wzfou.com_nginx.log --no-follow top request_path --filter 'status == 404'

Find the ten IP addresses that send the most requests:

ngxtop -l /data/wwwlogs/wzfou.com_nginx.log --no-follow --group-by remote_addr

Find the ten IP addresses that use the most traffic (avg(bytes_sent) * count is the total number of bytes sent to each IP):

ngxtop -l /data/wwwlogs/wzfou.com_nginx.log --no-follow --group-by remote_addr --order-by 'avg(bytes_sent) * count'

Print the request, status, and http_referer of every request whose status code is 400 or higher:

ngxtop -l /data/wwwlogs/wzfou.com_nginx.log --no-follow -i 'status >= 400' print request status http_referer

Compute the average bytes_sent of requests with status code 200 whose request_path starts with wzfou:

ngxtop -l /data/wwwlogs/wzfou.com_nginx.log --no-follow avg bytes_sent --filter 'status == 200 and request_path.startswith("wzfou")'

With the commands above, you can quickly pin down the IP addresses behind abnormal requests. By default, ngxtop displays the first 10 records; add the parameter -n xxx to control how many are shown, for example:

ngxtop -l /data/wwwlogs/wzfou.com_nginx.log --no-follow --group-by remote_addr -n 20
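If ngxtop is not installed on the machine you happen to be logged in to, the two most common rankings above (busiest IPs, most frequent 404 paths) can be roughly cross-checked with standard tools. The following is only a minimal sketch, not part of ngxtop: the function names top_ips and top_paths_with_status are made up for illustration, and it assumes the combined log format with a well-formed request line, so that the whitespace-separated field 1 is the client IP, field 7 the request path, and field 9 the status code.

```shell
#!/bin/sh
# Rough cross-checks for the ngxtop rankings, using only awk, sort and head.
# Assumption: combined log format, request line of the form "METHOD PATH PROTO".

# top_ips LOGFILE [N]
# Print "count ip" for the N busiest client IPs (default 10).
top_ips() {
    awk '{ hits[$1]++ } END { for (ip in hits) print hits[ip], ip }' "$1" |
        sort -k1,1nr -k2,2 | head -n "${2:-10}"
}

# top_paths_with_status LOGFILE STATUS [N]
# Print "count path" for the N most requested paths that returned STATUS.
top_paths_with_status() {
    awk -v want="$2" '$9 == want { hits[$7]++ }
        END { for (p in hits) print hits[p], p }' "$1" |
        sort -k1,1nr -k2,2 | head -n "${3:-10}"
}

# Example calls (log path taken from the commands above):
#   top_ips /data/wwwlogs/wzfou.com_nginx.log 20
#   top_paths_with_status /data/wwwlogs/wzfou.com_nginx.log 404
```

Malformed request lines, which scanners often send, shift the field positions, so treat the numbers as an approximation and rely on ngxtop or GoAccess for the real analysis.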

ngxtop can also monitor the server log as it is being written. To display the Nginx log in real time, simply leave out the --no-follow parameter:

ngxtop -l /data/wwwlogs/wzfou.com_nginx.log --group-by remote_addr

2. Installation and use of GoAccess

2.1  GoAccess installation

GoAccess official website:

  1. https://goaccess.io/

GoAccess supports the analysis of server logs such as Apache, Nginx, Amazon S3, Elastic Load Balancing, and CloudFront. The installation command is as follows:

apt-get install libncursesw5-dev libgeoip-dev
wget http://tar.goaccess.io/goaccess-1.2.tar.gz
tar -xzvf goaccess-1.2.tar.gz
cd goaccess-1.2/
./configure --enable-utf8 --enable-geoip=legacy
make
make install

Depending on your needs, you can adjust the configure options when building GoAccess. They are as follows:

--enable-debug Compile with debug flags and turn off compiler optimizations.
--enable-utf8 Wide character support. Depends on Ncursesw module.
--enable-geoip=<legacy|mmdb> Geolocation support. Depends on MaxMind GeoIP module. legacy will use the original GeoIP database. mmdb will use the enhanced GeoIP2 database.
--enable-tcb=<memhash|btree> Tokyo Cabinet storage support. memhash will use Tokyo Cabinet's in-memory hash database. btree will use Tokyo Cabinet's on-disk B+Tree database.
--disable-zlib Disable the use of zlib compression on B+Tree databases.
--disable-bzip Disable the use of bzip2 compression on B+Tree databases.
--with-getline Use a dynamically expanding line buffer to parse full-line requests; otherwise a fixed-size (4096) buffer is used.
--with-openssl Enables communication between GoAccess and its WebSocket server to support OpenSSL.

2.2  GoAccess usage

The GoAccess syntax is as follows:

goaccess [filename] [ options ... ] [-c][-M][-H][-q][-d][...]

Commonly used parameters are described as follows:

-f --log-file=<logfile>

Specify the path to the input log file. If an input file is specified in the configuration file, it takes precedence over one specified on the command line via the -f parameter.

-l --log-debug=<filename>

Send all debugging information to the specified file. Requires GoAccess to have been built with --enable-debug.

-p --config-file=<configfile>

Use a custom configuration file. If this parameter is set, it takes priority over the global configuration file (if any).

--invalid-requests=<filename>

Log invalid requests to the specified file.

--no-global-config

Do not load the global configuration file. It is looked for in /usr/etc/, /etc/ or /usr/local/etc/, unless --sysconfdir=/dir was given when running ./configure.

-a --agent-list

Enable the list of user agents per host. Turning this on slows down parsing.

-d --with-output-resolver

Enable IP resolution when outputting HTML or JSON reports.

-e --exclude-ip <IP|IP-range>

Exclude an IPv4 or IPv6 address. Use a hyphen to specify an IP range (start-end).

-H --http-protocol=<yes|no>

Toggle the HTTP request protocol. When enabled, the request field contains the request protocol plus the actual request.

-M --http-method=<yes|no>

Toggle the HTTP request method. When enabled, the request field contains the request method plus the actual request.

-o --output=<json|csv>

Write the report to the given file; the file extension determines the output format.

-q --no-query-string

Ignore the query string of the request. That is: www.google.com/page.htm?query => www.google.com/page.htm
Note: Removing the query string can greatly reduce memory consumption, especially for timestamped requests.

-r --no-term-resolver

Disable IP resolution in the terminal output.

--444-as-404

Treat the non-standard status 444 as 404.

--4xx-to-unique-count

Add 4xx client errors to the unique visitors count.

--all-static-files

Include static files that contain a query string.

--date-spec=<date|hr>

Set the date display format: either the standard date format (default) or the date with the hour appended.
Only affects the visitors panel. Useful for analyzing visitor data at an hourly level. Display format example: 18/Dec/2010:19

--double-decode

Decode double-encoded values, including the user agent, request and referer.

--enable-panel=<PANEL>

Enable the specified panel. Panel list:

--hour-spec=<hour|min>

Set the time display format: either the standard hour format (default) or the hour with the minutes appended (in ten-minute steps).
Used by the time distribution panel. Useful for analyzing traffic spikes during specific time periods.

--ignore-crawlers

Ignore crawlers.

--ignore-panel=<PANEL>

Ignore the specified panel. Panel list:

--ignore-referer=<referer>

Exclude a referer from the statistics. Wildcards are supported, for example: *.domain.com ww?.domain.*

--ignore-status=<STATUS>

Ignore one or more status codes when parsing and displaying. To ignore several status codes, repeat this parameter once per code.

--num-tests=<number>

Set the number of lines to test, that is, lines of the access log checked against the given log/date/time format. The default is 10 lines. If set to 0, the parser does no testing and parses the entire file directly. If a line matches the given log/date/time format before the number is reached, the parser considers the log file valid; otherwise GoAccess returns EXIT_FAILURE and displays the relevant error messages.

--process-and-exit

Parse the log and exit without outputting data. Mainly useful when you only want to add data to the on-disk database without producing a report.

--real-os

Display the real operating system name, for example: Windows XP, Snow Leopard.

--sort-panel=<PANEL,FIELD,ORDER>

Sort the panels on initial load. The sort options are separated by commas, in the format: PANEL,METRIC,ORDER

--static-file <extension>

Add static file suffix. For example: .mp3. Suffix names are case-sensitive.

-g --std-geoip

Standard GeoIP database, low memory footprint.

--geoip-database <geocityfile>

Set the path to the GeoIP database file, for example GeoLiteCity.dat. The file has to be downloaded from maxmind.com. Both IPv4 and IPv6 are supported. Note: --geoip-city-data is an alias for --geoip-database.
Note: If using GeoIP2, you need to download the city/country database from MaxMind and set it via --geoip-database.

GoAccess log format. GoAccess also has a parameter for setting the server log format: --log-format <logformat>. The log-format keyword is followed by a space or a tab (\t) and then by the string that describes the log format.

If your log is in one of the predefined formats, the name from the table below can be used directly as the value of the GoAccess log/date/time format.

COMBINED     | Combined Log Format
VCOMBINED    | Combined Log Format with virtual host
COMMON       | Common Log Format
VCOMMON      | Common Log Format with virtual host
W3C          | W3C Extended Log File Format
SQUID        | Native Squid log format
CLOUDFRONT   | Amazon CloudFront web distribution
CLOUDSTORAGE | Google Cloud Storage
AWSELB       | Amazon Elastic Load Balancing
AWSS3        | Amazon Simple Storage Service (S3)

2.3  How to set the log format

First, check what your log_format is. The Nginx log_format directive is defined as follows:

Syntax: log_format name string ...;
Default value: log_format combined "...";
Context: http

name is the format name and string is the format definition. log_format has a built-in combined format that does not need to be declared and is equivalent to Apache's combined log format, as shown below:

log_format combined '$remote_addr - $remote_user [$time_local] '
                    '"$request" $status $body_bytes_sent '
                    '"$http_referer" "$http_user_agent"';

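Of course, you can also add whatever information you need to the server log. The following variables are allowed in the log format:

remote_addr, $http_x_forwarded_for: the client IP address
remote_user: the client user name
request: the requested URL and HTTP protocol
status: the response status
body_bytes_sent: the number of bytes sent to the client, not counting the response header; this variable is compatible with the "%B" parameter of the Apache module mod_log_config
bytes_sent: the total number of bytes sent to the client
connection: the connection serial number
connection_requests: the current number of requests made through one connection
msec: the time of the log write, in seconds with millisecond resolution
pipe: "p" if the request was pipelined, "." otherwise
http_referer: the page from which the request was linked
http_user_agent: information about the client browser
request_length: the request length (including request line, headers, and request body)
request_time: the request processing time in seconds with millisecond resolution, measured from the first byte read from the client until the log write after the last byte has been sent to the client
time_iso8601: local time in the ISO 8601 standard format
time_local: local time in the Common Log Format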

The following is an example of log_format setting:

http {
    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '"$status" $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for" '
                    '"$gzip_ratio" $request_time $bytes_sent $request_length';

    log_format srcache_log '$remote_addr - $remote_user [$time_local] "$request" '
                           '"$status" $body_bytes_sent $request_time $bytes_sent $request_length '
                           '[$upstream_response_time] [$srcache_fetch_status] [$srcache_store_status] [$srcache_expire]';

    # Cache open descriptors of log files whose names contain variables.
    open_log_file_cache max=1000 inactive=60s;

    server {
        # Match any host name with or without a leading "www."; $2 is the rest of the name.
        server_name ~^(www\.)?(.+)$;
        access_log logs/$2-access.log main;
        error_log logs/$2-error.log;

        location /srcache {
            access_log logs/access-srcache.log srcache_log;
        }
    }
}

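The GoAccess log-format tokens correspond to the server log fields as follows:

%t time field matching the time-format setting

%d date field matching the date-format setting

%h host (the client IP address, IPv4 or IPv6)

%r request line from the client

%m request method

%U URL path requested

%H request protocol

%s status code returned by the server

%b size of the content returned by the server

%R Referer HTTP request header

%u User-Agent HTTP request header

%D time taken to serve the request, in microseconds

%T time taken to serve the request, in seconds

%^ ignore this field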

Example of adapting the GoAccess log format. Suppose the server log format is defined as:

log_format access '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" $http_x_forwarded_for $request_time $upstream_response_time';

GoAccess's default configuration file goaccess.conf is located in /usr/local/etc. The default format is: log-format %h %^[%d:%t %^] "%r" %s %b "%R" "%u". Comparing this with the server log format above, the server log has three extra fields at the end: $http_x_forwarded_for, the request time $request_time, and the upstream response time $upstream_response_time.

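We can modify it as follows:

Original: log-format %h %^[%d:%t %^] "%r" %s %b "%R" "%u"
Modified: log-format %h %^[%d:%t %^] "%r" %s %b "%R" "%u" %^ %^ %T

The last three tokens are %^ %^ %T: %^ means ignore the field, and %T means "time taken to serve the request in seconds, with millisecond resolution". Matched in order against the three extra fields, this skips $http_x_forwarded_for and $request_time and reads $upstream_response_time as the time taken; to use $request_time instead, write %^ %T %^. With this change, the GoAccess format corresponds to our server log format.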
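To see why exactly three extra tokens are needed, it can help to look at what actually follows the quoted user agent in a log line. The sketch below is only an illustration: the log line is made up to match the access format above, and the helper name trailing_fields is hypothetical. It cuts off everything up to the last quoted string and then labels each remaining field with the Nginx variable it came from and the GoAccess token used for it in the modified format.

```shell
#!/bin/sh
# A made-up log line in the "access" format shown above.
line='203.0.113.7 - - [21/Oct/2017:10:00:00 +0800] "GET / HTTP/1.1" 200 512 "-" "curl/7.58.0" - 0.005 0.004'

# trailing_fields LINE
# Print whatever follows the last quoted string, i.e. the fields that come
# after "$http_user_agent". The separator is a double quote plus a space.
trailing_fields() {
    printf '%s\n' "$1" | awk -F'" ' '{ print $NF }'
}

# Split the tail on whitespace (intentionally unquoted) and print one row per
# field: nginx variable, sample value, GoAccess token that consumes it.
set -- $(trailing_fields "$line")
printf '%-26s %-6s %s\n' '$http_x_forwarded_for'   "$1" '%^'
printf '%-26s %-6s %s\n' '$request_time'           "$2" '%^'
printf '%-26s %-6s %s\n' '$upstream_response_time' "$3" '%T'
```

Running the same check against a few real lines from your own log is a quick way to confirm that the number and order of tokens at the end of log-format match what Nginx actually writes.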

2.4  GoAccess Example

If your server uses the Common Log Format, the Combined Log Format (with or without virtual host), the W3C format, Amazon CloudFront and so on, there is no need to define a log format by hand; just use the predefined name. Both LNMP and Oneinstack use the COMBINED format.

To analyze a log with GoAccess in the terminal:

goaccess -d -f /data/wwwlogs/wzfou.com_nginx.log --log-format=COMBINED

In the above command, -f specifies the path of the log to analyze and --log-format the log format; the LNMP default is COMBINED.

The hot keys are as follows:

F1 or h Main help page.

F5 Redraw the main window.

q Quit the program or the current window, or collapse the active module.

o or ENTER Expand the selected module or open a window.

0-9 and Shift + 0 Set the selected module to active.

j Scroll down within the expanded module.

k Scroll up within the expanded module.

c Set or change the color scheme.

^f Scroll forward one screen within the active module.

^b Scroll backward one screen within the active module.

TAB Switch to the next module (forward).

SHIFT + TAB Switch to the previous module (backward).

s Sort options for the active module.

/ Search across all modules (regular expressions are supported).

n Find the position of the next occurrence.

g Move to the first item or the top of the screen.

G Move to the last item or the bottom of the screen.

If you want to view the ten IP addresses with the most requests, press the number key 5 to open the details of that module.

The other number keys map to modules as follows:

Press 1 to navigate to "Visits per day"

Press 2 to navigate to "Most Requested URLs"

Press 3 to navigate to "Most Requested Static Files"

Press 4 to navigate to "Most Requested 404s"

Press 5 to navigate to "Most Requested User IPs"

Press 6 to navigate to "User's operating system"

Press 7 to navigate to "User's Browser"

Press 8 to navigate to Hourly Statistics

GoAccess output. The command is as follows:

Generate an HTML report:

# goaccess access.log -a -o report.html

Generate a JSON report:

# goaccess access.log -a -d -o report.json

Generate a CSV file:

# goaccess access.log --no-csv-summary -o report.csv

GoAccess is very flexible and supports real-time parsing and filtering. For example, to quickly diagnose a problem by monitoring the log in real time:

# tail -f access.log | goaccess -

Even better, you can also use tail -f to work with a pattern matching tool, such as: grep, awk, sed, etc.

# tail -f access.log | grep -i --line-buffered 'firefox' | goaccess --log-format=COMBINED -

Or you can parse the file from scratch with the pipe open and apply a filter at the same time:

# tail -f -n +0 access.log | grep --line-buffered 'Firefox' | goaccess -o out.html --real-time-html -

For example, to write the GoAccess report to an HTML file and then view it in a browser, you can use the following command:

goaccess -d -f /data/wwwlogs/wzfou.com_nginx.log --log-format=COMBINED -a > /data/wwwroot/howsvps.com/wzfou.html

The charts output by GoAccess are very beautiful, and you can also view detailed options in the charts. For details, you can also check out the Demo on the official website: https://rt.goaccess.io/

To write the GoAccess report to HTML and have it refresh in real time, use the -o option for the output file together with --real-time-html:

goaccess -d -f /data/wwwlogs/howsvps.com_nginx.log --log-format=COMBINED -a -o /data/wwwroot/howsvps.com/wzfou.html --real-time-html --port=9870 --daemonize

Started this way, GoAccess runs as a daemon and uses a WebSocket to keep a long-lived connection with the browser. It listens on port 7890 by default; the --port parameter selects a different port (9870 in the command above).

After specifying the port number, remember to open it in your VPS firewall:

iptables -A INPUT -p tcp -m tcp --dport 9870 -j ACCEPT

3. Summary

ngxtop is suitable for simple lookups, while GoAccess focuses on overall analysis and can even serve as a traffic statistics tool. If your website uses SSL, remember to set the ssl-cert and ssl-key options in the configuration file goaccess.conf when GoAccess serves the real-time HTML page.

Of course, we can also use crontab to have GoAccess regenerate the statistics HTML page periodically, which amounts to near-real-time online monitoring of the server logs. The entry is as follows (regenerate the HTML page every 5 minutes):

*/5 * * * * goaccess -d -f /data/wwwlogs/wzfou.com_nginx.log --log-format=COMBINED -a > /data/wwwroot/wzfou.com/wzfou.html
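One side effect of redirecting straight into the published file is that the file is truncated the moment the job starts, so a visitor who loads the page while GoAccess is still parsing a large log sees an empty or half-written report. A possible refinement, sketched here under assumptions, is to write to a temporary file in the same directory and rename it into place only when the command succeeds. The helper name publish_atomically and the wrapper script path are hypothetical; the goaccess arguments are the ones from the cron entry above.

```shell
#!/bin/sh
# publish_atomically TARGET CMD [ARGS...]
# Run CMD, capture its standard output in a temporary file next to TARGET,
# and rename that file over TARGET only if CMD exited successfully.
publish_atomically() {
    target=$1
    shift
    tmp=$(mktemp "${target}.XXXXXX") || return 1
    if "$@" > "$tmp"; then
        chmod 644 "$tmp"        # mktemp creates the file readable by its owner only
        mv -f "$tmp" "$target"  # same directory, so the rename replaces TARGET in one step
    else
        rm -f "$tmp"            # keep the previous report if the command failed
        return 1
    fi
}

# Intended use: save this as a script (hypothetical path
# /usr/local/bin/goaccess-report.sh) that ends with
#   publish_atomically /data/wwwroot/wzfou.com/wzfou.html \
#       goaccess -d -f /data/wwwlogs/wzfou.com_nginx.log --log-format=COMBINED -a
# and call that script from the crontab entry instead of goaccess itself.
```

The previous report stays online until the new one is complete, and a failed run leaves it untouched.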

If you are not sure how to write the GoAccess log format, you can use the conversion script nginx2goaccess: https://github.com/stockrt/nginx2goaccess. It is called like this:

Usage: ./nginx2goaccess.sh '<log_format>'

./nginx2goaccess.sh '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"'

The output can be used directly for the GoAccess log format settings (note that the key in goaccess.conf is spelled log-format, with a hyphen):

- Generated goaccess config:

time-format %T
date-format %d/%b/%Y
log_format %h - %^ [%d:%t %^] "%r" %s %b "%R" "%u"
