Merge with remote:master

This commit is contained in:
Gregory Soutade 2014-12-21 15:38:50 +01:00
commit 182f7f9465
4 changed files with 1016 additions and 1 deletions

552
docs/index.md Normal file
View File

@ -0,0 +1,552 @@
iwla
====
Introduction
------------
iwla (Intelligent Web Log Analyzer) is basically a clone of [awstats](http://www.awstats.org). The main problem with awstats is that it's a very monolothic project with everything in one big PERL file. In opposite, iwla has be though to be very modular : a small core analysis and a lot of filters. It can be viewed as UNIX pipes. Philosophy of iwla is : add, update, delete ! That's the job of each filters : modify statistics until final result. It's written in Python.
Nevertheless, iwla is only focused on HTTP logs. It uses data (robots definitions, search engines definitions) and design from awstats. Moreover, it's not dynamic, but only generates static HTML page (with gzip compression option).
Usage
-----
./iwla [-c|--clean-output] [-i|--stdin] [-f FILE|--file FILE] [-d LOGLEVEL|--log-level LOGLEVEL]
-c : Clean output (database and HTML) before starting
-i : Read data from stdin instead of conf.analyzed_filename
-f : Read data from FILE instead of conf.analyzed_filename
-d : Loglevel in ['DEBUG', 'INFO', 'WARNING', 'ERROR', 'CRITICAL']
Basic usage
-----------
In addition to command line, iwla read parameters in default_conf.py. User can override default values using _conf.py_ file. Each module requires its own parameters.
Main values to edit are :
* **analyzed_filename** : web server log
* **domaine_name** : domain name to filter
* **pre_analysis_hooks** : List of pre analysis hooks
* **post_analysis_hooks** : List of post analysis hooks
* **display_hooks** : List of display hooks
* **locale** : Displayed locale (_en_ or _fr_)
Then, you can then iwla. Output HTML files are created in _output_ directory by default. To quickly see it go in output and type
python -m SimpleHTTPServer 8000
Open your favorite web browser at _http://localhost:8000_. Enjoy !
**Warning** : The order is hooks list is important : Some plugins may requires others plugins, and the order of display_hooks is the order of displayed blocks in final result.
Interesting default configuration values
----------------------------------------
* **DB_ROOT** : Default database directory (default ./output_db)
* **DISPLAY_ROOT** : Default HTML output directory (default ./output)
* **log_format** : Web server log format (nginx style). Default is apache log format
* **time_format** : Time format used in log format
* **pages_extensions** : Extensions that are considered as a HTML page (or result) in opposit to hits
* **viewed_http_codes** : HTTP codes that are cosidered OK (200, 304)
* **count_hit_only_visitors** : If False, doesn't cout visitors that doesn't GET a page but resources only (images, rss...)
* **multimedia_files** : Multimedia extensions (not accounted as downloaded files)
* **css_path** : CSS path (you can add yours)
* **compress_output_files** : Files extensions to compress in gzip during display build
Plugins
-------
As previously described, plugins acts like UNIX pipes : statistics are constantly updated by each plugin to produce final result. We have three type of plugins :
* **Pre analysis plugins** : Called before generating days statistics. They are in charge to filter robots, crawlers, bad pages...
* **Post analysis plugins** : Called after basic statistics computation. They are in charge to enlight them with their own algorithms
* **Display plugins** : They are in charge to produce HTML files from statistics.
To use plugins, just insert their name in _pre_analysis_hooks_, _post_analysis_hooks_ and _display_hooks_ lists in conf.py.
Statistics are stored in dictionaries :
* **month_stats** : Statistics of current analysed month
* **valid_visitor** : A subset of month_stats without robots
* **days_stats** : Statistics of current analysed day
* **visits** : All visitors with all of its requests
* **meta** : Final result of month statistics (by year)
Create a Plugins
----------------
To create a new plugin, it's necessary to create a derived class of IPlugin (_iplugin.py) in the right directory (_plugins/xxx/yourPlugin.py_).
Plugins can defines required configuration values (self.conf_requires) that must be set in conf.py (or can be optional). They can also defines required plugins (self.requires).
The two functions to overload are _load(self)_ that must returns True or False if all is good (or not). It's called after _init_. The second is _hook(self)_ that is the body of plugins.
For display plugins, a lot of code has been wrote in _display.py_ that simplify the creation on HTML blocks, tables and bar graphs.
Plugins
=======
Optional configuration values ends with *.
iwla
----
Main class IWLA
Parse Log, compute them, call plugins and produce output
For now, only HTTP log are valid
Plugin requirements :
None
Conf values needed :
analyzed_filename
domain_name
locales_path
compress_output_files*
Output files :
DB_ROOT/meta.db
DB_ROOT/year/month/iwla.db
OUTPUT_ROOT/index.html
OUTPUT_ROOT/year/month/index.html
Statistics creation :
meta :
last_time
start_analysis_time
stats =>
year =>
month =>
viewed_bandwidth
not_viewed_bandwidth
viewed_pages
viewed_hits
nb_visits
nb_visitors
month_stats :
viewed_bandwidth
not_viewed_bandwidth
viewed_pages
viewed_hits
nb_visits
days_stats :
day =>
viewed_bandwidth
not_viewed_bandwidth
viewed_pages
viewed_hits
nb_visits
nb_visitors
visits :
remote_addr =>
remote_addr
remote_ip
viewed_pages
viewed_hits
not_viewed_pages
not_viewed_hits
bandwidth
last_access
requests =>
[fields_from_format_log]
extract_request =>
extract_uri
extract_parameters*
extract_referer* =>
extract_uri
extract_parameters*
robot
hit_only
is_page
valid_visitors:
month_stats without robot and hit only visitors (if not conf.count_hit_only_visitors)
Statistics update :
None
Statistics deletion :
None
plugins.display.top_downloads
-----------------------------
Display hook
Create TOP downloads page
Plugin requirements :
post_analysis/top_downloads
Conf values needed :
max_downloads_displayed*
create_all_downloads_page*
Output files :
OUTPUT_ROOT/year/month/top_downloads.html
OUTPUT_ROOT/year/month/index.html
Statistics creation :
None
Statistics update :
None
Statistics deletion :
None
plugins.display.all_visits
--------------------------
Display hook
Create All visits page
Plugin requirements :
None
Conf values needed :
display_visitor_ip*
Output files :
OUTPUT_ROOT/year/month/all_visits.html
OUTPUT_ROOT/year/month/index.html
Statistics creation :
None
Statistics update :
None
Statistics deletion :
None
plugins.display.top_hits
------------------------
Display hook
Create TOP hits page
Plugin requirements :
post_analysis/top_hits
Conf values needed :
max_hits_displayed*
create_all_hits_page*
Output files :
OUTPUT_ROOT/year/month/top_hits.html
OUTPUT_ROOT/year/month/index.html
Statistics creation :
None
Statistics update :
None
Statistics deletion :
None
plugins.display.referers
------------------------
Display hook
Create Referers page
Plugin requirements :
post_analysis/referers
Conf values needed :
max_referers_displayed*
create_all_referers_page*
max_key_phrases_displayed*
create_all_key_phrases_page*
Output files :
OUTPUT_ROOT/year/month/referers.html
OUTPUT_ROOT/year/month/key_phrases.html
OUTPUT_ROOT/year/month/index.html
Statistics creation :
None
Statistics update :
None
Statistics deletion :
None
plugins.display.top_visitors
----------------------------
Display hook
Create TOP visitors block
Plugin requirements :
None
Conf values needed :
display_visitor_ip*
Output files :
OUTPUT_ROOT/year/month/index.html
Statistics creation :
None
Statistics update :
None
Statistics deletion :
None
plugins.display.top_pages
-------------------------
Display hook
Create TOP pages page
Plugin requirements :
post_analysis/top_pages
Conf values needed :
max_pages_displayed*
create_all_pages_page*
Output files :
OUTPUT_ROOT/year/month/top_pages.html
OUTPUT_ROOT/year/month/index.html
Statistics creation :
None
Statistics update :
None
Statistics deletion :
None
plugins.post_analysis.top_downloads
-----------------------------------
Post analysis hook
Count TOP downloads
Plugin requirements :
None
Conf values needed :
None
Output files :
None
Statistics creation :
None
Statistics update :
month_stats:
top_downloads =>
uri
Statistics deletion :
None
plugins.post_analysis.top_hits
------------------------------
Post analysis hook
Count TOP hits
Plugin requirements :
None
Conf values needed :
None
Output files :
None
Statistics creation :
None
Statistics update :
month_stats:
top_hits =>
uri
Statistics deletion :
None
plugins.post_analysis.referers
------------------------------
Post analysis hook
Extract referers and key phrases from requests
Plugin requirements :
None
Conf values needed :
domain_name
Output files :
None
Statistics creation :
None
Statistics update :
month_stats :
referers =>
pages
hits
robots_referers =>
pages
hits
search_engine_referers =>
pages
hits
key_phrases =>
phrase
Statistics deletion :
None
plugins.post_analysis.reverse_dns
---------------------------------
Post analysis hook
Replace IP by reverse DNS names
Plugin requirements :
None
Conf values needed :
reverse_dns_timeout*
Output files :
None
Statistics creation :
None
Statistics update :
valid_visitors:
remote_addr
dns_name_replaced
dns_analyzed
Statistics deletion :
None
plugins.post_analysis.top_pages
-------------------------------
Post analysis hook
Count TOP pages
Plugin requirements :
None
Conf values needed :
None
Output files :
None
Statistics creation :
None
Statistics update :
month_stats:
top_pages =>
uri
Statistics deletion :
None
plugins.pre_analysis.page_to_hit
--------------------------------
Pre analysis hook
Change page into hit and hit into page into statistics
Plugin requirements :
None
Conf values needed :
page_to_hit_conf*
hit_to_page_conf*
Output files :
None
Statistics creation :
None
Statistics update :
visits :
remote_addr =>
is_page
Statistics deletion :
None
plugins.pre_analysis.robots
---------------------------
Pre analysis hook
Filter robots
Plugin requirements :
None
Conf values needed :
page_to_hit_conf*
hit_to_page_conf*
Output files :
None
Statistics creation :
None
Statistics update :
visits :
remote_addr =>
robot
Statistics deletion :
None

View File

@ -4,7 +4,7 @@ iwla
Introduction
------------
iwla (Intelligent Web Log Analyzer) is basically a clone of [awstats](http://www.awstats.org). The main problem with awstats is that it's a very monolothic project with everything in one big PERL file. In opposite, iwla has be though to be very modular : a small core analysis and a lot of filters. It can be viewed as UNIX pipes. Philosophy of iwla is : add, update, delete ! That's the job of each filters : modify statistics until final result.
iwla (Intelligent Web Log Analyzer) is basically a clone of [awstats](http://www.awstats.org). The main problem with awstats is that it's a very monolothic project with everything in one big PERL file. In opposite, iwla has be though to be very modular : a small core analysis and a lot of filters. It can be viewed as UNIX pipes. Philosophy of iwla is : add, update, delete ! That's the job of each filters : modify statistics until final result. It's written in Python.
Nevertheless, iwla is only focused on HTTP logs. It uses data (robots definitions, search engines definitions) and design from awstats. Moreover, it's not dynamic, but only generates static HTML page (with gzip compression option).

460
docs/modules.md Normal file
View File

@ -0,0 +1,460 @@
iwla
----
Main class IWLA
Parse Log, compute them, call plugins and produce output
For now, only HTTP log are valid
Plugin requirements :
None
Conf values needed :
analyzed_filename
domain_name
locales_path
compress_output_files*
Output files :
DB_ROOT/meta.db
DB_ROOT/year/month/iwla.db
OUTPUT_ROOT/index.html
OUTPUT_ROOT/year/month/index.html
Statistics creation :
meta :
last_time
start_analysis_time
stats =>
year =>
month =>
viewed_bandwidth
not_viewed_bandwidth
viewed_pages
viewed_hits
nb_visits
nb_visitors
month_stats :
viewed_bandwidth
not_viewed_bandwidth
viewed_pages
viewed_hits
nb_visits
days_stats :
day =>
viewed_bandwidth
not_viewed_bandwidth
viewed_pages
viewed_hits
nb_visits
nb_visitors
visits :
remote_addr =>
remote_addr
remote_ip
viewed_pages
viewed_hits
not_viewed_pages
not_viewed_hits
bandwidth
last_access
requests =>
[fields_from_format_log]
extract_request =>
extract_uri
extract_parameters*
extract_referer* =>
extract_uri
extract_parameters*
robot
hit_only
is_page
valid_visitors:
month_stats without robot and hit only visitors (if not conf.count_hit_only_visitors)
Statistics update :
None
Statistics deletion :
None
plugins.display.top_downloads
-----------------------------
Display hook
Create TOP downloads page
Plugin requirements :
post_analysis/top_downloads
Conf values needed :
max_downloads_displayed*
create_all_downloads_page*
Output files :
OUTPUT_ROOT/year/month/top_downloads.html
OUTPUT_ROOT/year/month/index.html
Statistics creation :
None
Statistics update :
None
Statistics deletion :
None
plugins.display.all_visits
--------------------------
Display hook
Create All visits page
Plugin requirements :
None
Conf values needed :
display_visitor_ip*
Output files :
OUTPUT_ROOT/year/month/all_visits.html
OUTPUT_ROOT/year/month/index.html
Statistics creation :
None
Statistics update :
None
Statistics deletion :
None
plugins.display.top_hits
------------------------
Display hook
Create TOP hits page
Plugin requirements :
post_analysis/top_hits
Conf values needed :
max_hits_displayed*
create_all_hits_page*
Output files :
OUTPUT_ROOT/year/month/top_hits.html
OUTPUT_ROOT/year/month/index.html
Statistics creation :
None
Statistics update :
None
Statistics deletion :
None
plugins.display.referers
------------------------
Display hook
Create Referers page
Plugin requirements :
post_analysis/referers
Conf values needed :
max_referers_displayed*
create_all_referers_page*
max_key_phrases_displayed*
create_all_key_phrases_page*
Output files :
OUTPUT_ROOT/year/month/referers.html
OUTPUT_ROOT/year/month/key_phrases.html
OUTPUT_ROOT/year/month/index.html
Statistics creation :
None
Statistics update :
None
Statistics deletion :
None
plugins.display.top_visitors
----------------------------
Display hook
Create TOP visitors block
Plugin requirements :
None
Conf values needed :
display_visitor_ip*
Output files :
OUTPUT_ROOT/year/month/index.html
Statistics creation :
None
Statistics update :
None
Statistics deletion :
None
plugins.display.top_pages
-------------------------
Display hook
Create TOP pages page
Plugin requirements :
post_analysis/top_pages
Conf values needed :
max_pages_displayed*
create_all_pages_page*
Output files :
OUTPUT_ROOT/year/month/top_pages.html
OUTPUT_ROOT/year/month/index.html
Statistics creation :
None
Statistics update :
None
Statistics deletion :
None
plugins.post_analysis.top_downloads
-----------------------------------
Post analysis hook
Count TOP downloads
Plugin requirements :
None
Conf values needed :
None
Output files :
None
Statistics creation :
None
Statistics update :
month_stats:
top_downloads =>
uri
Statistics deletion :
None
plugins.post_analysis.top_hits
------------------------------
Post analysis hook
Count TOP hits
Plugin requirements :
None
Conf values needed :
None
Output files :
None
Statistics creation :
None
Statistics update :
month_stats:
top_hits =>
uri
Statistics deletion :
None
plugins.post_analysis.referers
------------------------------
Post analysis hook
Extract referers and key phrases from requests
Plugin requirements :
None
Conf values needed :
domain_name
Output files :
None
Statistics creation :
None
Statistics update :
month_stats :
referers =>
pages
hits
robots_referers =>
pages
hits
search_engine_referers =>
pages
hits
key_phrases =>
phrase
Statistics deletion :
None
plugins.post_analysis.reverse_dns
---------------------------------
Post analysis hook
Replace IP by reverse DNS names
Plugin requirements :
None
Conf values needed :
reverse_dns_timeout*
Output files :
None
Statistics creation :
None
Statistics update :
valid_visitors:
remote_addr
dns_name_replaced
dns_analyzed
Statistics deletion :
None
plugins.post_analysis.top_pages
-------------------------------
Post analysis hook
Count TOP pages
Plugin requirements :
None
Conf values needed :
None
Output files :
None
Statistics creation :
None
Statistics update :
month_stats:
top_pages =>
uri
Statistics deletion :
None
plugins.pre_analysis.page_to_hit
--------------------------------
Pre analysis hook
Change page into hit and hit into page into statistics
Plugin requirements :
None
Conf values needed :
page_to_hit_conf*
hit_to_page_conf*
Output files :
None
Statistics creation :
None
Statistics update :
visits :
remote_addr =>
is_page
Statistics deletion :
None
plugins.pre_analysis.robots
---------------------------
Pre analysis hook
Filter robots
Plugin requirements :
None
Conf values needed :
page_to_hit_conf*
hit_to_page_conf*
Output files :
None
Statistics creation :
None
Statistics update :
visits :
remote_addr =>
robot
Statistics deletion :
None

3
tools/gettext.sh Executable file
View File

@ -0,0 +1,3 @@
#!/bin/sh
pygettext -a -d iwla -o iwla.pot -k gettext *.py plugins/pre_analysis/*.py plugins/post_analysis/*.py plugins/display/*.py