Versioner sammenlignet

Nøgle

  • Linjen blev tilføjet.
  • Denne linje blev fjernet.
  • Formatering blev ændret.

This little article is trying to find the "real user" browsing hits on my Homepage. In front of the homepage is an Apache2 acting as proxy.

All logs are gathered in splunkSplunk. In the next part, "crawlers" is a synomyn for Robots, Crawlers and Spiders - hence non-live machines.

...

Step - Filter page views

First, we need to include what actually is a "view" and not REST, saving pages, uploading stuff etc. For Confluence, all views are in one of 2 forms:

...

So, we try to eliminate them with exclusions:

Kodeblok
useragent!="*bot*" useragent!="*spider*" useragent!="*facebookexternalhit*" useragent!="*crawler*" useragent!="*Datadog Agent*"

...

Monitoring Tools monitoring can fill a lot in the logs; to control and identify these, I have ensured the monitoring tool is only monitoring at a special URL: /display/public/HealthCheckPage.

Hence, to filter it out exclude the monitring:

Kodeblok
uri!="/display/public/HealthCheckPage"

...

Kodeblok
root@splunkserver:/splunk/etc/apps/moseisleymonitoring/lookups# head robot_spiders.csv
clientip
"216.244.66.237"
"77.75.76.163"
"77.75.77.62"
"216.244.66.237"
"77.75.78.162"
"216.244.66.237"
"77.75.76.165"
"37.9.113.190"
"106.120.173.75"

To exclude these IP Addresses:

Kodeblok
NOT [| inputlookup robot_spiders.csv | fields clientip]

Step  - Eliminate all "hard hitting hosts"

...

Kodeblok
root@splunkserver:/splunk/etc/apps/moseisleymonitoring/lookups# head hard_hitting_hosts.csv
uri,clientip,count
"/display/ATLASSIAN/JIRA+as+CMDB/","188.163.74.19",125
"/display/ATLASSIAN/JIRA+as+CMDB/","37.115.189.113",138
"/display/ATLASSIAN/JIRA+as+CMDB/","37.115.191.27",121
"/display/ATLASSIAN/JIRA+as+CMDB/","46.118.159.224",101
"/display/public/HealthCheckPage","77.243.52.139",5732
"/display/slangereden/","5.9.155.37",118
"/display/slangereden/","66.249.64.19",140

To exclude these IP Addresses:

Kodeblok
NOT [| inputlookup hard_hitting_hosts.csv | fields clientip]

To sum up - Conclusion

The best search to get the "human" hitsfinal result is this splunk search:

Kodeblok
(uri="*/display/*" OR uri="*/viewpage.action/*") uri!="/display/public/HealthCheckPage" useragent!="*bot*" useragent!="*spider*" useragent!="*facebookexternalhit*" useragent!="*crawler*" useragent!="*Datadog Agent*" NOT [| inputlookup robot_spiders.csv | fields clientip] NOT [| inputlookup hard_hitting_hosts.csv | fields clientip]

...