iwla

iwla Commit Details

Date: 2014-12-19 17:21:45
Author: Grégory Soutadé
Branch: dev, master
Commit: 54b7b59899565938c72e772dac37858f87acc3a4
Parents: d6bc6ea19285fdb6d4cbb9ed9698d861ccd3fb58
Message: Update documentation

Changes:
M default_conf.py (3 diffs)
M docs/main.md (2 diffs)
M tools/extract_doc.py (2 diffs)

File differences

default_conf.py
# Database filename per month
DB_FILENAME = 'iwla.db'
# Web server log format (nginx style). Default is what apache log
# Web server log format (nginx style). Default is apache log format
log_format = '$server_name:$server_port $remote_addr - $remote_user [$time_local] ' +\
'"$request" $status $body_bytes_sent ' +\
'"$http_referer" "$http_user_agent"'
......
post_analysis_hooks = []
display_hooks = []
# Extensions that are considered as a HTML page (or result)
# Extensions that are considered as a HTML page (or result) in opposite to hits
pages_extensions = ['/', 'htm', 'html', 'xhtml', 'py', 'pl', 'rb', 'php']
# HTTP codes that are cosidered OK
viewed_http_codes = [200, 304]
......
# CSS path (you can add yours)
css_path = ['%s/%s/%s' % (os.path.basename(resources_path[0]), 'css', 'iwla.css')]
# Extensions to compress in gzip during display build
# Files extensions to compress in gzip during display build
compress_output_files = []
# Path to locales files
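
The `log_format` value in this file uses nginx-style `$variable` placeholders. As a rough illustration of how such a format string could be turned into a line parser, the sketch below converts each placeholder into a named regular-expression group. The helper `format_to_regex` and the sample log line are assumptions made for this example; they are not taken from iwla's source, which may parse the format differently.

```python
import re

# Same format string as in default_conf.py, written as one expression.
log_format = ('$server_name:$server_port $remote_addr - $remote_user [$time_local] '
              '"$request" $status $body_bytes_sent '
              '"$http_referer" "$http_user_agent"')


def format_to_regex(fmt):
    """Hypothetical helper: build a regex with one named group per $variable."""
    # With a capture group, re.split alternates literal text and variable names.
    parts = re.split(r'\$(\w+)', fmt)
    pattern = ''
    for index, part in enumerate(parts):
        if index % 2 == 0:
            pattern += re.escape(part)          # literal text between variables
        else:
            pattern += '(?P<%s>.*?)' % part     # lazy match for one variable
    return re.compile('^' + pattern + '$')


# Invented sample line that follows the format above.
sample = ('example.com:80 1.2.3.4 - - [19/Dec/2014:17:21:45 +0100] '
          '"GET /index.html HTTP/1.1" 200 1234 "-" "Mozilla/5.0"')
fields = format_to_regex(log_format).match(sample).groupdict()
```

Lazy groups keep the sketch short, but they rely on the literal separators being unambiguous; a stricter parser would use a dedicated pattern for each variable.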
docs/main.md
Introduction
------------
iwla (Intelligent Web Log Analyzer) is basically a clone of [awstats](http://www.awstats.org). The main problem with awstats is that it's a very monolothic project with everything in one big PERL file. In opposite, iwla has be though to be very modulor : a small core analysis and a lot of filters. It can be viewed as UNIX pipes. Philosophy of iwla is : add, update, delete ! That's the job of each filters : modify statistics until final result.
iwla (Intelligent Web Log Analyzer) is basically a clone of [awstats](http://www.awstats.org). The main problem with awstats is that it's a very monolothic project with everything in one big PERL file. In opposite, iwla has be though to be very modular : a small core analysis and a lot of filters. It can be viewed as UNIX pipes. Philosophy of iwla is : add, update, delete ! That's the job of each filters : modify statistics until final result.
Nevertheless, iwla is only focused on HTTP logs. It uses data (robots definitions, search engines definitions) and design from awstats. Moreover, it's not dynamic, but only generates static HTML page (with gzip compression option).
Usage
-----
./iwla [-c|--clean-output] [-i|--stdin] [-f FILE|--file FILE] [-d LOGLEVEL|--log-level LOGLEVEL]
./iwla [-c|--clean-output] [-i|--stdin] [-f FILE|--file FILE] [-d LOGLEVEL|--log-level LOGLEVEL]
-c : Clean output (database and HTML) before starting
-i : Read data from stdin instead of conf.analyzed_filename
......
Basic usage
-----------
In addition to command line, iwla read parameters in _ default_conf.py _. User can override default values using _conf.py_ file. Each module requires its own parameters.
Main valued to edit are :
analyzed_filename : web server log
domaine_name : domain name to filter
pre_analysis_hooks
post_analysis_hooks
display_hooks
locale
In addition to command line, iwla read parameters in default_conf.py. User can override default values using _conf.py_ file. Each module requires its own parameters.
Main values to edit are :
* **analyzed_filename** : web server log
* **domaine_name** : domain name to filter
* **pre_analysis_hooks** : List of pre analysis hooks
* **post_analysis_hooks** : List of post analysis hooks
* **display_hooks** : List of display hooks
* **locale** : Displayed locale (_en_ or _fr_)
Then, you can then iwla. Output HTML files are created in _output_ directory by default. To quickly see it go in output and type
You can then launch iwla. Output HTML files are created in _output_ directory by default. To quickly see it go in output and type
python -m SimpleHTTPServer 8000
Open your favorite web browser at _http://localhost:8000_. Enjoy !
**Warning** : The order is hooks list is important : Some plugins may requires others plugins, and the order of display_hooks is the order of displayed blocks in final result.
Interesting default configuration values
----------------------------------------
DB_ROOT : Default database directory
DISPLAY_ROOT : Default HTML output directory
log_format : Web server log format (nginx style). Default is what apache log
time_format : Time format used in log format
pages_extensions : Extensions that are considered as a HTML page (or result)
viewed_http_codes : HTTP codes that are cosidered OK
count_hit_only_visitors : If False, doesn't cout visitors that doesn't GET a page but resources only (images, rss...)
multimedia_files : Multimedia extensions (not accounted as downloaded files)
css_path : CSS path (you can add yours)
compress_output_files : Extensions to compress in gzip during display build
* **DB_ROOT** : Default database directory (default ./output_db)
* **DISPLAY_ROOT** : Default HTML output directory (default ./output)
* **log_format** : Web server log format (nginx style). Default is apache log format
* **time_format** : Time format used in log format
* **pages_extensions** : Extensions that are considered as a HTML page (or result) in opposit to hits
* **viewed_http_codes** : HTTP codes that are cosidered OK (200, 304)
* **count_hit_only_visitors** : If False, doesn't cout visitors that doesn't GET a page but resources only (images, rss...)
* **multimedia_files** : Multimedia extensions (not accounted as downloaded files)
* **css_path** : CSS path (you can add yours)
* **compress_output_files** : Files extensions to compress in gzip during display build
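
To make the page/hit distinction concrete, here is a minimal sketch of one possible reading of `pages_extensions` and `viewed_http_codes`. The function `is_page` is hypothetical and iwla's real logic may differ, in particular in how the `'/'` entry is treated (assumed here to mean a URI that ends with a slash).

```python
# Values copied from default_conf.py.
pages_extensions = ['/', 'htm', 'html', 'xhtml', 'py', 'pl', 'rb', 'php']
viewed_http_codes = [200, 304]


def is_page(uri, status):
    """Hypothetical helper: True for a viewed page, False for a plain hit."""
    if status not in viewed_http_codes:
        return False
    path = uri.split('?', 1)[0]            # ignore the query string
    if path.endswith('/'):
        # Assumption: the '/' entry stands for directory-style URIs.
        return '/' in pages_extensions
    last_segment = path.rsplit('/', 1)[-1]
    if '.' not in last_segment:
        return False                       # no extension: a hit in this sketch
    return last_segment.rsplit('.', 1)[-1] in pages_extensions
```

Under these assumptions `/index.php?x=1` with status 200 counts as a page, while `/img/logo.png` counts only as a hit.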
Plugins
-------
As previously described, plugins acts like UNIX pipes : final statistics are constantly updated by each plugin to produce final result. We have three type of plugins :
Pre analysis plugins : Called before generating days statistics. They are in charge to filter robots, crawlers, bad pages...
Post analysis plugins : Called after basic statistics computation. They are in charge to enlight them with each own algorithms
Display plugins : They are in charge to produce HTML files from statistics.
As previously described, plugins acts like UNIX pipes : statistics are constantly updated by each plugin to produce final result. We have three type of plugins :
To use plugins, just insert their name in pre_analysis_hooks, post_analysis_hooks and display_hooks list.
* **Pre analysis plugins** : Called before generating days statistics. They are in charge to filter robots, crawlers, bad pages...
* **Post analysis plugins** : Called after basic statistics computation. They are in charge to enlight them with their own algorithms
* **Display plugins** : They are in charge to produce HTML files from statistics.
To use plugins, just insert their name in _pre_analysis_hooks_, _post_analysis_hooks_ and _display_hooks_ lists in conf.py.
Statistics are stored in dictionaries :
* **month_stats** : Statistics of current analysed month
* **valid_visitor** : A subset of month_stats without robots
* **days_stats** : Statistics of current analysed day
* **visits** : All visitors with all of its requests
* **meta** : Final result of month statistics (by year)
Create a Plugins
----------------
To create a new plugin, it's necessary to create a derived class of IPlugin (_iplugin.py) in the right directory (_plugins/xxx/your_plugin.py_).
To create a new plugin, it's necessary to create a derived class of IPlugin (_iplugin.py) in the right directory (_plugins/xxx/yourPlugin.py_).
Plugins can defines required configuration values (self.conf_requires) that must be set in conf.py (or can be optional). They can also defines required plugins (self.requires).
For display plugins, a lot of code has been wrote in _display.py_ that simplify the creation on HTML blocks, tables and graphs.
The two functions to overload are _load(self)_ that must returns True or False if all is good (or not). It's called after _init_. The second is _hook(self)_ that is the body of plugins.
Modules
-------
For display plugins, a lot of code has been wrote in _display.py_ that simplify the creation on HTML blocks, tables and bar graphs.
Plugins
=======
Optional configuration values ends with *.
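
The plugin contract described under "Create a Plugins" (a class derived from IPlugin, `load(self)` returning True or False, `hook(self)` doing the work) can be sketched as follows. Everything here is an assumption made for illustration: the `IPlugin` class is a stand-in because iplugin.py is not part of this diff, and `TotalPages`, `run_hooks`, the constructor argument and the `'pages'` key are invented names.

```python
class IPlugin(object):
    """Stand-in for the base class in iplugin.py (the real one is not shown here)."""

    def __init__(self):
        self.requires = []        # names of plugins this one depends on
        self.conf_requires = []   # configuration values expected in conf.py

    def load(self):
        return True

    def hook(self):
        pass


class TotalPages(IPlugin):
    """Hypothetical post analysis plugin that adds one value to the statistics."""

    def __init__(self, stats):
        super(TotalPages, self).__init__()
        self.stats = stats        # assumption: real plugins obtain stats from iwla

    def load(self):
        # Return False when the plugin cannot work, True otherwise.
        return 'visits' in self.stats

    def hook(self):
        visits = self.stats['visits']
        self.stats['total_pages'] = sum(v.get('pages', 0) for v in visits.values())


def run_hooks(plugins):
    """Call each plugin in list order, like stages of a UNIX pipe."""
    for plugin in plugins:
        if plugin.load():
            plugin.hook()


stats = {'visits': {'1.2.3.4': {'pages': 3}, '5.6.7.8': {'pages': 2}}}
run_hooks([TotalPages(stats)])
```

Because every plugin mutates the same dictionary, the order of the list decides what later plugins see, which is why the documentation warns about the order of the hook lists.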
tools/extract_doc.py
sys.exit(0)
package_name = filename.replace('/', '.').replace('.py', '')
sys.stdout.write('**%s**' % (package_name))
sys.stdout.write('\n\n')
# sys.stdout.write('-' * len(package_name))
sys.stdout.write('%s' % (package_name))
sys.stdout.write('\n')
# sys.stdout.write('\n\n')
sys.stdout.write('-' * len(package_name))
sys.stdout.write('\n\n')
sys.stderr.write('\tExtract doc from %s\n' % (filename))
......
else:
break
elif copy:
sys.stdout.write(line)
sys.stdout.write(' %s' % (line))
sys.stdout.write('\n\n')
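
As a small illustration of the heading this change appears to produce (the package name on one line, a row of dashes of the same length beneath it, then a blank line), the sketch below isolates that step. The function `write_heading` and the example path are assumptions for this example; the real script writes directly to `sys.stdout`.

```python
import io


def write_heading(filename, out):
    """Hypothetical helper: write a dash-underlined heading for a module path."""
    package_name = filename.replace('/', '.').replace('.py', '')
    out.write('%s' % (package_name))
    out.write('\n')
    out.write('-' * len(package_name))   # underline as long as the title
    out.write('\n\n')
    return package_name


buf = io.StringIO()
name = write_heading('plugins/display/example_plugin.py', buf)
lines = buf.getvalue().split('\n')
```

Writing to a file-like object instead of `sys.stdout` keeps the step easy to check in isolation.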
