iwla

Introduction

iwla (Intelligent Web Log Analyzer) is basically a clone of awstats. The main problem with awstats is that it is a very monolithic project, with everything in one big Perl file. In contrast, iwla has been designed to be very modular: a small analysis core and a lot of filters. It can be viewed as a chain of UNIX pipes. The philosophy of iwla is: add, update, delete! That is the job of each filter: modify the statistics until the final result is reached. It is written in Python.

However, iwla focuses only on HTTP logs. It reuses data (search engine definitions) and design from awstats. Moreover, it is not dynamic: it only generates static HTML pages (with an optional gzip compression).

Demo

A demonstration instance is available here

Usage

./iwla [-c|--config-file file] [-C|--clean-output] [-i|--stdin] [-f FILE|--file FILE] [-d LOGLEVEL|--log-level LOGLEVEL] [-r|--reset year/month] [-z|--dont-compress] [-p] [-P|--disable-display] [-D|--dry-run]
-c : Configuration file to use (default conf.py)
-C : Clean output (database and HTML) before starting
-i : Read data from stdin instead of conf.analyzed_filename
-f : Analyse this log file instead of conf.analyzed_filename. Multiple files can be specified (comma separated). gz files are accepted
-d : Loglevel in ['DEBUG', 'INFO', 'WARNING', 'ERROR', 'CRITICAL']
-r : Reset analysis to a specific date (month/year)
-z : Don't compress databases (bigger but faster, not compatible with compressed databases)
-p : Only generate display
-P : Don't generate display
-D : Dry run (don't write/update files to disk)

Basic usage

In addition to the command line, iwla reads parameters from default_conf.py. Users can override the default values in a conf.py file. Each module requires its own parameters.

Main values to edit are :

  • analyzed_filename : web server log
  • domaine_name : domain name to filter
  • pre_analysis_hooks : List of pre analysis hooks
  • post_analysis_hooks : List of post analysis hooks
  • display_hooks : List of display hooks
  • locale : Displayed locale (en or fr)
  • feeds : Address of your feeds files
  • count_hit_only_visitors : True/False. If False, visitors that only produce a hit (for a picture, ...) are not counted

You can also append an element to an existing default configuration list by using the "_append" suffix. For example, multimedia_files_append = ['xml'] or multimedia_files_append = 'xml' will append 'xml' to the current multimedia_files list.
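The "_append" behaviour can be pictured with the minimal sketch below. It is not iwla's actual implementation: the helper apply_user_conf, the plain dictionaries standing in for default_conf.py and conf.py, and the sample default values are all assumptions made for illustration.

```python
def apply_user_conf(defaults, user):
    """Merge user settings into defaults, honouring the "_append" suffix.

    Hypothetical helper: both arguments are plain dicts standing in for
    default_conf.py and conf.py.
    """
    merged = dict(defaults)
    for key, value in user.items():
        if key.endswith('_append'):
            target = key[:-len('_append')]
            # Accept either a single value or a list of values.
            extra = value if isinstance(value, list) else [value]
            # Build a new list so the defaults are left untouched.
            merged[target] = list(merged.get(target, [])) + extra
        else:
            # Plain keys simply override the default value.
            merged[key] = value
    return merged


# Illustrative values only, not the real defaults.
defaults = {'multimedia_files': ['png', 'jpg'], 'locale': 'en'}
user = {'multimedia_files_append': 'xml', 'locale': 'fr'}

conf = apply_user_conf(defaults, user)
print(conf['multimedia_files'])  # ['png', 'jpg', 'xml']
```

Both the string form and the list form end up extending the same list, which matches the two spellings given above.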

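Then, you can launch iwla. Output HTML files are created in the output directory by default. To have a quick look at them, go into output and type

python -m SimpleHTTPServer 8000

This module only exists in Python 2. With Python 3, the equivalent command is python3 -m http.server 8000.

Open your favorite web browser at http://localhost:8000. Enjoy!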

Warning: the order of the hooks lists matters. Some plugins may require other plugins, and the order of display_hooks is the order of the displayed blocks in the final result.

Interesting default configuration values

  • DB_ROOT : Default database directory (default ./output_db)
  • DISPLAY_ROOT : Default HTML output directory (default ./output)
  • log_format : Web server log format (nginx style). Default is apache log format
  • time_format : Time format used in log format
  • pages_extensions : Extensions that are considered as an HTML page (or result), as opposed to hits
  • viewed_http_codes : HTTP codes that are considered OK (200, 304)
  • count_hit_only_visitors : If False, don't count visitors that don't GET a page but only resources (images, rss...)
  • multimedia_files : Multimedia extensions (not accounted as downloaded files)
  • css_path : CSS path (you can add yours)
  • compress_output_files : File extensions to compress with gzip during display build
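As an illustration of how pages_extensions and viewed_http_codes separate pages from hits, here is a minimal sketch. The function classify, the sample extension list and the rule that a path ending with "/" counts as a page are assumptions for the example; how iwla handles these cases internally is not described here.

```python
import os
from urllib.parse import urlparse

# Illustrative values only; the real defaults live in default_conf.py.
pages_extensions = ['html', 'htm', 'php']
viewed_http_codes = [200, 304]


def classify(uri, status):
    """Return 'page', 'hit' or 'not_viewed' for one request (hypothetical helper)."""
    if status not in viewed_http_codes:
        return 'not_viewed'
    path = urlparse(uri).path  # drop the query string, if any
    if path.endswith('/'):
        # Assumption: a directory-like URL is served as a page.
        return 'page'
    extension = os.path.splitext(path)[1].lstrip('.').lower()
    return 'page' if extension in pages_extensions else 'hit'


print(classify('/index.html?lang=en', 200))  # page
print(classify('/img/logo.png', 200))        # hit
print(classify('/missing.html', 404))        # not_viewed
```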

Plugins

As previously described, plugins act like UNIX pipes: statistics are continuously updated by each plugin to produce the final result. There are three types of plugins:

  • Pre analysis plugins : Called before generating days statistics. They are in charge of filtering robots, crawlers, bad pages...
  • Post analysis plugins : Called after basic statistics computation. They are in charge of enriching the statistics with their own algorithms
  • Display plugins : They are in charge of producing HTML files from statistics.

To use plugins, just insert their file names (without the .py extension) in the pre_analysis_hooks, post_analysis_hooks and display_hooks lists in conf.py.
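The pipe analogy can be sketched as follows. This is a toy model, not iwla code: the three hook functions and the layout of the stats dictionary are hypothetical, and the lists hold functions here whereas the real lists in conf.py hold plugin file names. The basic statistics computation that iwla performs between the pre and post analysis stages is left out.

```python
# Hypothetical hooks: each one receives the shared statistics dictionary
# and modifies it in place (add, update, delete).
def drop_robots(stats):
    stats['visitors'] = [v for v in stats['visitors'] if not v['robot']]


def count_visitors(stats):
    stats['nb_visitors'] = len(stats['visitors'])


def render_summary(stats):
    stats['html'] = '<p>%d visitors</p>' % stats['nb_visitors']


pre_analysis_hooks = [drop_robots]
post_analysis_hooks = [count_visitors]
display_hooks = [render_summary]


def run(stats):
    """Call every hook in order; each stage sees the previous one's changes."""
    for hook in pre_analysis_hooks + post_analysis_hooks + display_hooks:
        hook(stats)
    return stats


stats = run({'visitors': [
    {'ip': '192.0.2.1', 'robot': False},
    {'ip': '192.0.2.2', 'robot': True},
]})
print(stats['html'])  # <p>1 visitors</p>
```

Swapping drop_robots and count_visitors would change the result, which is why the order of the hooks lists matters.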

Statistics are stored in dictionaries :

  • month_stats : Statistics of current analysed month
  • valid_visitors : A subset of month_stats without robots
  • days_stats : Statistics of current analysed day
  • visits : All visitors with all of their requests (only if 'keep_requests' is true or filtered)
  • meta : Final result of month statistics (by year)

Create a Plugin

To create a new plugin, subclass IPlugin (_iplugin.py) in the right directory (plugins/xxx/yourPlugin.py).

Plugins can define required configuration values (self.conf_requires) that must be set in conf.py (or can be optional). They can also define required plugins (self.requires).

There are two functions to overload. The first is load(self), which must return True if all is good and False otherwise; it is called after init. The second is hook(self), which is the body of the plugin.
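The overall shape of a plugin might look like the sketch below. To keep it runnable on its own, IPlugin is replaced by a small stand-in class; the real base class in _iplugin.py has more to it. The plugin name CountNotFound, its constructor argument and the layout of the requests are hypothetical, chosen only to show load(self) and hook(self) in action.

```python
class IPlugin:
    """Stand-in for the base class in _iplugin.py (simplified assumption)."""

    def __init__(self):
        self.requires = []       # plugins that must be enabled as well
        self.conf_requires = []  # configuration values expected in conf.py

    def load(self):
        return True

    def hook(self):
        pass


class CountNotFound(IPlugin):
    """Hypothetical post analysis plugin counting 404 responses."""

    def __init__(self, requests):
        super().__init__()
        # Assumption: requests is a list of dicts with a 'status' key.
        self.requests = requests
        self.result = None

    def load(self):
        # Return True when the plugin is ready, False to signal a problem.
        return isinstance(self.requests, list)

    def hook(self):
        # Body of the plugin: compute or update statistics.
        self.result = sum(1 for r in self.requests if r['status'] == 404)


plugin = CountNotFound([{'status': 200}, {'status': 404}, {'status': 404}])
if plugin.load():
    plugin.hook()
print(plugin.result)  # 2
```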

For display plugins, a lot of code has been written in display.py that simplifies the creation of HTML blocks, tables and bar graphs.

Plugins

Optional configuration values end with *.