Channel: Piwik Forums - Support & Bugs

Re: Piwik Log Analysis

I hope this is helpful! (This is still a work in progress, and I welcome feedback or help!)

This live regex tester was the most useful tool for working out the custom log format option:

http://ksamuel.pythonanywhere.com/

It helps if you know the valve pattern variables from server.xml (Tomcat), for example:

common - %h %l %u %t "%r" %s %b
combined - %h %l %u %t "%r" %s %b "%{Referer}i" "%{User-Agent}i"

In my case I used:
pattern='%h %S %t %s %b %D %m %U "%{User-Agent}i"'

To get a clue about what I was attempting, I pulled the formats currently defined in import_logs.py:

_HOST_PREFIX = '(?P<host>[\w\-\.]*)(?::\d+)? '
_COMMON_LOG_FORMAT = (
    '(?P<ip>\S+) \S+ \S+ \[(?P<date>.*?) (?P<timezone>.*?)\] '
    '"\S+ (?P<path>.*?) \S+" (?P<status>\S+) (?P<length>\S+)'
)
_NCSA_EXTENDED_LOG_FORMAT = (_COMMON_LOG_FORMAT +
    ' "(?P<referrer>.*?)" "(?P<user_agent>.*?)"'
)
_S3_LOG_FORMAT = (
    '\S+ (?P<host>\S+) \[(?P<date>.*?) (?P<timezone>.*?)\] (?P<ip>\S+) '
    '\S+ \S+ \S+ \S+ "\S+ (?P<path>.*?) \S+" (?P<status>\S+) \S+ (?P<length>\S+) '
    '\S+ \S+ \S+ "(?P<referrer>.*?)" "(?P<user_agent>.*?)"'
)
_ICECAST2_LOG_FORMAT = (_NCSA_EXTENDED_LOG_FORMAT +
    ' (?P<session_time>\S+)'
)

FORMATS = {
    'common': RegexFormat('common', _COMMON_LOG_FORMAT),
    'common_vhost': RegexFormat('common_vhost', _HOST_PREFIX + _COMMON_LOG_FORMAT),
    'ncsa_extended': RegexFormat('ncsa_extended', _NCSA_EXTENDED_LOG_FORMAT),
    'common_complete': RegexFormat('common_complete', _HOST_PREFIX + _NCSA_EXTENDED_LOG_FORMAT),
    'iis': IisFormat(),
    's3': RegexFormat('s3', _S3_LOG_FORMAT),
    'icecast2': RegexFormat('icecast2', _ICECAST2_LOG_FORMAT),
}

Then I pieced this together (note that the literal opening bracket before the date has to be escaped as \[):

(?P<host>[\w\-\.]*)(?::\d+)? \S+ \[(?P<date>.*?) (?P<timezone>.*?)\] (?P<status>\S+)? \S+ (?P<length>\S+) (?P<request>\S+) (?P<path>.*?) "(?P<user_agent>.*?)"
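One way to see how the pieces line up with the valve pattern is to assemble the regex token by token. This is only a minimal sketch: TOKEN_REGEX and valve_pattern_to_regex are hypothetical names made up for illustration, not part of import_logs.py, and the token meanings in the comments are taken from the Tomcat AccessLogValve documentation as I understand it. The group names mirror the regex in this post, with the literal [ before the date escaped as \[.

```python
import re

# Hypothetical mapping from Tomcat valve tokens to regex fragments.
# Group names follow the regex pieced together in this post.
TOKEN_REGEX = {
    '%h': r'(?P<host>[\w\-\.]*)(?::\d+)?',          # remote host or IP
    '%S': r'\S+',                                   # session ID, not captured
    '%t': r'\[(?P<date>.*?) (?P<timezone>.*?)\]',   # [date timezone]
    '%s': r'(?P<status>\S+)?',                      # HTTP status code
    '%b': r'\S+',                                   # bytes sent, not captured here
    '%D': r'(?P<length>\S+)',                       # processing time, named "length" in the post
    '%m': r'(?P<request>\S+)',                      # HTTP method
    '%U': r'(?P<path>.*?)',                         # requested URL path
    '"%{User-Agent}i"': r'"(?P<user_agent>.*?)"',   # quoted User-Agent header
}


def valve_pattern_to_regex(pattern):
    """Translate a space-separated valve pattern into a regex string."""
    return ' '.join(TOKEN_REGEX[token] for token in pattern.split(' '))


VALVE_PATTERN = '%h %S %t %s %b %D %m %U "%{User-Agent}i"'
LOG_REGEX = valve_pattern_to_regex(VALVE_PATTERN)

re.compile(LOG_REGEX)  # raises re.error if the fragments do not form a valid regex
print(LOG_REGEX)
```

Read against the valve pattern, the column right after the status is %b and the one after that is %D, so the length group in this regex ends up holding the processing time rather than the bytes sent. Moving the named group to the %b entry would be one way to capture bytes sent instead.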

Testing it against one log line:

Raw:
10.88.168.198 - [15/May/2013:19:55:38 +0000] 302 - 64 GET / "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.64 Safari/537.31"

match.group():
u'10.88.168.198 - [15/May/2013:19:55:38 +0000] 302 - 64 GET / "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.64 Safari/537.31"'

match.groupdict():
{u'date': u'15/May/2013:19:55:38', u'host': u'10.88.168.198', u'length': u'64', u'path': u'/', u'request': u'GET', u'status': u'302', u'timezone': u'+0000', u'user_agent': u'Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.64 Safari/537.31'}
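The same check can be reproduced locally without the online tester. A minimal sketch using only Python's standard re module; LOG_REGEX and LINE are just illustrative variable names, and the regex is the one from this post with the opening bracket written as \[. On Python 3 the values come back as plain strings rather than the u'...' form shown above.

```python
import re

# The custom regex from this post, split across lines for readability.
LOG_REGEX = (
    r'(?P<host>[\w\-\.]*)(?::\d+)? \S+ '
    r'\[(?P<date>.*?) (?P<timezone>.*?)\] '
    r'(?P<status>\S+)? \S+ (?P<length>\S+) '
    r'(?P<request>\S+) (?P<path>.*?) "(?P<user_agent>.*?)"'
)

# The sample log line from this post.
LINE = ('10.88.168.198 - [15/May/2013:19:55:38 +0000] 302 - 64 GET / '
        '"Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.31 '
        '(KHTML, like Gecko) Chrome/26.0.1410.64 Safari/537.31"')

match = re.match(LOG_REGEX, LINE)
if match is None:
    print('no match')
else:
    # Print each named group so it can be compared with the output above.
    for name, value in sorted(match.groupdict().items()):
        print('%s = %s' % (name, value))
```

If the line does not match, trimming the regex from the right one group at a time is a quick way to find the fragment that fails.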

Then I added it back to the import command as:

--log-format-regex='(?P<host>[\w\-\.]*)(?::\d+)? \S+ \[(?P<date>.*?) (?P<timezone>.*?)\] (?P<status>\S+)? \S+ (?P<length>\S+) (?P<request>\S+) (?P<path>.*?) "(?P<user_agent>.*?)"'

BOOM... logs imported. I'm still having an issue with the actual browser type, though; I'll post the final version once I have it.
