iwla
====

Introduction
------------

iwla (Intelligent Web Log Analyzer) is essentially a clone of [awstats](http://www.awstats.org). The main problem with awstats is that it is a very monolithic project, with everything in one big Perl file. By contrast, iwla was designed to be very modular: a small analysis core and many filters. It can be viewed as a chain of UNIX pipes. The philosophy of iwla is: add, update, delete! That is the job of each filter: modify the statistics until the final result is reached. It is written in Python.

However, iwla focuses only on HTTP logs. It uses data (robot definitions, search engine definitions) and design from awstats. Moreover, it is not dynamic: it only generates static HTML pages (with optional gzip compression).

Usage
-----

    ./iwla [-c|--clean-output] [-i|--stdin] [-f FILE|--file FILE] [-d LOGLEVEL|--log-level LOGLEVEL]

    -c : Clean output (database and HTML) before starting
    -i : Read data from stdin instead of conf.analyzed_filename
    -f : Read data from FILE instead of conf.analyzed_filename
    -d : Log level in ['DEBUG', 'INFO', 'WARNING', 'ERROR', 'CRITICAL']

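The options listed above map naturally onto Python's argparse module. The following is only a sketch of such a parser, not iwla's actual code: the function name `build_parser` and the default log level are assumptions made for illustration. Calling `build_parser().parse_args(['-c', '-f', 'access.log'])` yields an object whose `clean_output` is true and whose `file` is `'access.log'`.

```python
import argparse

LOG_LEVELS = ['DEBUG', 'INFO', 'WARNING', 'ERROR', 'CRITICAL']


def build_parser():
    """Build a parser mirroring the documented iwla options (illustrative only)."""
    parser = argparse.ArgumentParser(prog='iwla')
    parser.add_argument('-c', '--clean-output', action='store_true',
                        help='clean output (database and HTML) before starting')
    parser.add_argument('-i', '--stdin', action='store_true',
                        help='read data from stdin instead of conf.analyzed_filename')
    parser.add_argument('-f', '--file', metavar='FILE',
                        help='read data from FILE instead of conf.analyzed_filename')
    # The 'INFO' default is an assumption; the documentation does not state one.
    parser.add_argument('-d', '--log-level', choices=LOG_LEVELS, default='INFO',
                        metavar='LOGLEVEL', help='one of %s' % LOG_LEVELS)
    return parser
```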
Basic usage
-----------

In addition to the command line, iwla reads its parameters from _default_conf.py_. Users can override the default values in a _conf.py_ file. Each module requires its own parameters.

The main values to edit are:

 * **analyzed_filename** : Web server log
 * **domain_name** : Domain name to filter
 * **pre_analysis_hooks** : List of pre analysis hooks
 * **post_analysis_hooks** : List of post analysis hooks
 * **display_hooks** : List of display hooks
 * **locale** : Displayed locale (_en_ or _fr_)

Then you can run iwla. Output HTML files are created in the _output_ directory by default. To view the result quickly, go to the _output_ directory and run:

    python -m SimpleHTTPServer 8000

(With Python 3, use _python3 -m http.server 8000_ instead.)

Open your favorite web browser at _http://localhost:8000_. Enjoy!

**Warning** : The order of the hooks lists is important: some plugins may require other plugins, and the order of display_hooks is the order of the displayed blocks in the final result.


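As an illustration of the settings and the ordering rule described above, a minimal _conf.py_ might look like the sketch below. The log path and domain are made-up placeholders, and writing hooks as short plugin names is an assumption; check _default_conf.py_ for the exact form expected. Note that _referers_diff_ is listed after _referers_ in display_hooks because it requires that plugin.

```python
# conf.py: overrides values from default_conf.py (illustrative sketch only)

analyzed_filename = '/var/log/apache2/access.log'  # placeholder path
domain_name = 'example.com'                        # placeholder domain
locale = 'en'

# Hook lists are processed in order; short plugin names are an assumption.
pre_analysis_hooks = ['page_to_hit', 'robots']
post_analysis_hooks = ['referers', 'top_pages', 'top_downloads', 'top_hits']
display_hooks = [
    'top_visitors',
    'all_visits',
    'referers',        # must come before referers_diff, which requires it
    'top_pages',
    'top_downloads',
    'top_hits',
    'referers_diff',
]
```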
Interesting default configuration values
----------------------------------------

 * **DB_ROOT** : Default database directory (default ./output_db)
 * **DISPLAY_ROOT** : Default HTML output directory (default ./output)
 * **log_format** : Web server log format (nginx style). Default is the Apache log format
 * **time_format** : Time format used in the log format
 * **pages_extensions** : Extensions that are considered as an HTML page (or result), as opposed to hits
 * **viewed_http_codes** : HTTP codes that are considered OK (200, 304)
 * **count_hit_only_visitors** : If False, don't count visitors that don't GET a page but only resources (images, RSS...)
 * **multimedia_files** : Multimedia extensions (not counted as downloaded files)
 * **css_path** : CSS path (you can add your own)
 * **compress_output_files** : File extensions to compress with gzip during display build

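To make the role of _pages_extensions_ and _viewed_http_codes_ more concrete, here is a rough sketch of how a request could be classified as a page or a hit, and as viewed or not. The sample extension list, the helper names and the rule that a path ending in `/` counts as a page are assumptions for illustration, not iwla's actual logic.

```python
import os
from urllib.parse import urlparse

# Sample values only; the real defaults live in default_conf.py.
pages_extensions = ['html', 'htm', 'php']
viewed_http_codes = [200, 304]


def is_page(uri):
    """Return True if the URI looks like a page rather than a plain hit."""
    path = urlparse(uri).path
    if path.endswith('/'):
        # Assumption: a directory index is served as a page.
        return True
    extension = os.path.splitext(path)[1].lstrip('.').lower()
    return extension in pages_extensions


def is_viewed(status):
    """Return True if the HTTP status code is considered OK."""
    return status in viewed_http_codes
```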
Plugins
-------

As previously described, plugins act like UNIX pipes: statistics are continuously updated by each plugin to produce the final result. There are three types of plugins:

 * **Pre analysis plugins** : Called before generating day statistics. They are in charge of filtering robots, crawlers, bad pages...
 * **Post analysis plugins** : Called after basic statistics computation. They are in charge of enriching the statistics with their own algorithms
 * **Display plugins** : They are in charge of producing HTML files from statistics.

To use plugins, just insert their names in the _pre_analysis_hooks_, _post_analysis_hooks_ and _display_hooks_ lists in conf.py.

Statistics are stored in dictionaries:

 * **month_stats** : Statistics of the month currently being analyzed
 * **valid_visitors** : A subset of month_stats without robots
 * **days_stats** : Statistics of the day currently being analyzed
 * **visits** : All visitors with all of their requests
 * **meta** : Final result of month statistics (by year)

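The pipe analogy can be sketched in a few lines: every stage receives the same statistics dictionaries and adds, updates or deletes entries before the next stage runs. The three functions below are hypothetical stand-ins for a pre analysis, a post analysis and a display plugin; they are plain functions rather than real iwla plugin classes, and the `output` key is invented for the example.

```python
def flag_hit_only(stats):
    """Pre analysis stage: mark visitors that never viewed a page."""
    for visitor in stats['visits'].values():
        visitor['hit_only'] = visitor['viewed_pages'] == 0


def select_valid_visitors(stats):
    """Post analysis stage: build on what the previous stage produced."""
    stats['valid_visitors'] = {
        addr: visitor for addr, visitor in stats['visits'].items()
        if not visitor['hit_only']
    }


def render_summary(stats):
    """Display stage: turn statistics into (very small) HTML output."""
    stats['output'] = '<p>%d valid visitors</p>' % len(stats['valid_visitors'])


def run_pipeline(stats, pre_hooks, post_hooks, display_hooks):
    """Run every hook in list order; each one mutates stats in place."""
    for hook in pre_hooks + post_hooks + display_hooks:
        hook(stats)
    return stats
```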
Create a Plugin
---------------

To create a new plugin, create a class derived from IPlugin (_iplugin.py_) in the right directory (_plugins/xxx/yourPlugin.py_).

Plugins can define required configuration values (self.conf_requires) that must be set in conf.py (or can be optional). They can also define required plugins (self.requires).

The two functions to override are _load(self)_ and _hook(self)_. _load(self)_ is called after _init_ and must return True if all is good, False otherwise. _hook(self)_ is the body of the plugin.

For display plugins, a lot of code has been written in _display.py_ to simplify the creation of HTML blocks, tables and bar graphs.

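The shape of a plugin can be sketched as follows. The `IPlugin` class here is a simplified stand-in written for this example, not the real class from _iplugin.py_: its constructor signature and the way a plugin reaches the statistics are assumptions, so the stand-in simply receives a dictionary. `PagesPerVisit` is a hypothetical post analysis plugin that adds one value to month_stats.

```python
class IPlugin(object):
    """Simplified stand-in for the base class in iplugin.py (assumed shape)."""

    def __init__(self, stats):
        self.stats = stats
        self.requires = []       # plugins that must be loaded before this one
        self.conf_requires = []  # configuration values this plugin needs

    def load(self):
        return True

    def hook(self):
        pass


class PagesPerVisit(IPlugin):
    """Hypothetical post analysis plugin: average pages per visit."""

    def load(self):
        # Nothing to prepare here; return False to signal a failed setup.
        return True

    def hook(self):
        month = self.stats['month_stats']
        visits = month.get('nb_visits', 0)
        pages = month.get('viewed_pages', 0)
        month['pages_per_visit'] = pages / visits if visits else 0.0
```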
Plugins
=======

Optional configuration values end with *.

iwla
----

    Main class IWLA
    Parse logs, compute statistics, call plugins and produce output
    For now, only HTTP logs are valid

    Plugin requirements :
        None

    Conf values needed :
        analyzed_filename
        domain_name
        locales_path
        compress_output_files*

    Output files :
        DB_ROOT/meta.db
        DB_ROOT/year/month/iwla.db
        OUTPUT_ROOT/index.html
        OUTPUT_ROOT/year/month/index.html

    Statistics creation :

        meta :
            last_time
            start_analysis_time
            stats =>
                year =>
                    month =>
                        viewed_bandwidth
                        not_viewed_bandwidth
                        viewed_pages
                        viewed_hits
                        nb_visits
                        nb_visitors

        month_stats :
            viewed_bandwidth
            not_viewed_bandwidth
            viewed_pages
            viewed_hits
            nb_visits

        days_stats :
            day =>
                viewed_bandwidth
                not_viewed_bandwidth
                viewed_pages
                viewed_hits
                nb_visits
                nb_visitors

        visits :
            remote_addr =>
                remote_addr
                remote_ip
                viewed_pages
                viewed_hits
                not_viewed_pages
                not_viewed_hits
                bandwidth
                last_access
                requests =>
                    [fields_from_format_log]
                    extract_request =>
                        extract_uri
                        extract_parameters*
                    extract_referer* =>
                        extract_uri
                        extract_parameters*
                robot
                hit_only
                is_page

        valid_visitors:
            month_stats without robot and hit only visitors (if not conf.count_hit_only_visitors)

    Statistics update :
        None

    Statistics deletion :
        None


plugins.display.all_visits
--------------------------

    Display hook

    Create All visits page

    Plugin requirements :
        None

    Conf values needed :
        display_visitor_ip*

    Output files :
        OUTPUT_ROOT/year/month/all_visits.html
        OUTPUT_ROOT/year/month/index.html

    Statistics creation :
        None

    Statistics update :
        None

    Statistics deletion :
        None


plugins.display.referers
------------------------

    Display hook

    Create Referers page

    Plugin requirements :
        post_analysis/referers

    Conf values needed :
        max_referers_displayed*
        create_all_referers_page*
        max_key_phrases_displayed*
        create_all_key_phrases_page*

    Output files :
        OUTPUT_ROOT/year/month/referers.html
        OUTPUT_ROOT/year/month/key_phrases.html
        OUTPUT_ROOT/year/month/index.html

    Statistics creation :
        None

    Statistics update :
        None

    Statistics deletion :
        None


plugins.display.top_downloads
-----------------------------

    Display hook

    Create TOP downloads page

    Plugin requirements :
        post_analysis/top_downloads

    Conf values needed :
        max_downloads_displayed*
        create_all_downloads_page*

    Output files :
        OUTPUT_ROOT/year/month/top_downloads.html
        OUTPUT_ROOT/year/month/index.html

    Statistics creation :
        None

    Statistics update :
        None

    Statistics deletion :
        None


plugins.display.top_hits
------------------------

    Display hook

    Create TOP hits page

    Plugin requirements :
        post_analysis/top_hits

    Conf values needed :
        max_hits_displayed*
        create_all_hits_page*

    Output files :
        OUTPUT_ROOT/year/month/top_hits.html
        OUTPUT_ROOT/year/month/index.html

    Statistics creation :
        None

    Statistics update :
        None

    Statistics deletion :
        None


plugins.display.top_pages
-------------------------

    Display hook

    Create TOP pages page

    Plugin requirements :
        post_analysis/top_pages

    Conf values needed :
        max_pages_displayed*
        create_all_pages_page*

    Output files :
        OUTPUT_ROOT/year/month/top_pages.html
        OUTPUT_ROOT/year/month/index.html

    Statistics creation :
        None

    Statistics update :
        None

    Statistics deletion :
        None


plugins.display.top_visitors
----------------------------

    Display hook

    Create TOP visitors block

    Plugin requirements :
        None

    Conf values needed :
        display_visitor_ip*

    Output files :
        OUTPUT_ROOT/year/month/index.html

    Statistics creation :
        None

    Statistics update :
        None

    Statistics deletion :
        None


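A display plugin of this kind essentially sorts visitors and renders table rows. The sketch below shows one possible way to do that with the fields documented for _visits_; the function name, the `max_displayed` parameter and the exact effect of _display_visitor_ip_ (appending the IP when it differs from the displayed name) are assumptions, and real plugins would rely on the helpers in _display.py_ instead of building HTML strings by hand.

```python
from html import escape


def top_visitors_rows(visits, max_displayed=10, display_visitor_ip=False):
    """Return HTML table rows for the visitors with the most viewed pages."""
    ranked = sorted(visits.values(),
                    key=lambda visitor: visitor['viewed_pages'],
                    reverse=True)
    rows = []
    for visitor in ranked[:max_displayed]:
        label = visitor['remote_addr']
        # Assumed behaviour: show the IP next to a resolved host name.
        if display_visitor_ip and visitor['remote_ip'] != visitor['remote_addr']:
            label += ' [%s]' % visitor['remote_ip']
        rows.append('<tr><td>%s</td><td>%d</td></tr>'
                    % (escape(label), visitor['viewed_pages']))
    return rows
```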
plugins.display.referers_diff
-----------------------------

    Display hook

    Highlight new and updated key phrases in all_key_phrases.html

    Plugin requirements :
        display/referers

    Conf values needed :
        None

    Output files :
        None

    Statistics creation :
        None

    Statistics update :
        None

    Statistics deletion :
        None


plugins.post_analysis.referers
------------------------------

    Post analysis hook

    Extract referers and key phrases from requests

    Plugin requirements :
        None

    Conf values needed :
        domain_name

    Output files :
        None

    Statistics creation :
        None

    Statistics update :
        month_stats :
            referers =>
                pages
                hits
            robots_referers =>
                pages
                hits
            search_engine_referers =>
                pages
                hits
            key_phrases =>
                phrase

    Statistics deletion :
        None


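As a rough illustration of what extracting referers and key phrases involves, the sketch below classifies a referer URL with the standard urllib.parse module. It is not the plugin's algorithm: iwla takes its search engine definitions from awstats, whereas this example simply assumes that any external referer carrying a `q` query parameter is a search engine. The function name and return values are invented for the example.

```python
from urllib.parse import urlparse, parse_qs


def classify_referer(referer, domain_name, search_params=('q',)):
    """Return (kind, key_phrase) for a referer URL.

    kind is 'internal', 'search_engine' or 'referer'; key_phrase is only
    set for search engines.
    """
    parsed = urlparse(referer)
    host = parsed.netloc.lower()
    if not host or host.endswith(domain_name):
        return ('internal', None)
    query = parse_qs(parsed.query)
    for param in search_params:
        if param in query:
            # parse_qs already decodes '+' and percent escapes.
            return ('search_engine', query[param][0])
    return ('referer', None)
```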
plugins.post_analysis.reverse_dns
---------------------------------

    Post analysis hook

    Replace IP by reverse DNS names

    Plugin requirements :
        None

    Conf values needed :
        reverse_dns_timeout*

    Output files :
        None

    Statistics creation :
        None

    Statistics update :
        valid_visitors:
            remote_addr
            dns_name_replaced
            dns_analyzed

    Statistics deletion :
        None


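A reverse DNS lookup with a timeout could be sketched as follows. This is one possible approach, not the plugin's implementation: it runs `socket.gethostbyaddr` in a worker thread so that the caller can give up after `timeout` seconds. The helper names, the default timeout and the exact meaning given to _dns_analyzed_ and _dns_name_replaced_ are assumptions. The `resolver` parameter exists only so the lookup can be replaced in tests.

```python
import socket
from concurrent.futures import ThreadPoolExecutor, TimeoutError


def reverse_dns(ip, timeout=0.5, resolver=socket.gethostbyaddr):
    """Return the host name for ip, or None on failure or timeout."""
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(resolver, ip)
    try:
        return future.result(timeout=timeout)[0]
    except (TimeoutError, OSError):
        # socket.herror and socket.gaierror are subclasses of OSError.
        return None
    finally:
        executor.shutdown(wait=False)


def replace_addr(visitor, **kwargs):
    """Update one visitor entry in place with its reverse DNS name."""
    name = reverse_dns(visitor['remote_ip'], **kwargs)
    visitor['dns_analyzed'] = True
    if name:
        visitor['remote_addr'] = name
        visitor['dns_name_replaced'] = True
    return visitor
```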
plugins.post_analysis.top_downloads
-----------------------------------

    Post analysis hook

    Count TOP downloads

    Plugin requirements :
        None

    Conf values needed :
        None

    Output files :
        None

    Statistics creation :
        None

    Statistics update :
        month_stats:
            top_downloads =>
                uri

    Statistics deletion :
        None


plugins.post_analysis.top_hits
------------------------------

    Post analysis hook

    Count TOP hits

    Plugin requirements :
        None

    Conf values needed :
        None

    Output files :
        None

    Statistics creation :
        None

    Statistics update :
        month_stats:
            top_hits =>
                uri

    Statistics deletion :
        None


plugins.post_analysis.top_pages
-------------------------------

    Post analysis hook

    Count TOP pages

    Plugin requirements :
        None

    Conf values needed :
        None

    Output files :
        None

    Statistics creation :
        None

    Statistics update :
        month_stats:
            top_pages =>
                uri

    Statistics deletion :
        None


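The three TOP plugins (downloads, hits, pages) all maintain a per-URI count in month_stats. A minimal sketch of that idea uses `collections.Counter`; the flat `uri` key and the `keep` predicate are simplifications introduced for this example (the documented request structure nests the URI under extract_request), and the plugins' real selection rules are not shown here.

```python
from collections import Counter


def update_top(top, requests, keep):
    """Increment the count of each request URI accepted by keep()."""
    for request in requests:
        if keep(request):
            top[request['uri']] += 1
    return top


# Example: count pages only, then read the most requested one.
top_pages = Counter()
sample_requests = [
    {'uri': '/a.html', 'is_page': True},
    {'uri': '/b.png', 'is_page': False},
    {'uri': '/a.html', 'is_page': True},
]
update_top(top_pages, sample_requests, lambda request: request['is_page'])
```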
plugins.pre_analysis.page_to_hit
--------------------------------

    Pre analysis hook

    Change pages into hits and hits into pages in statistics

    Plugin requirements :
        None

    Conf values needed :
        page_to_hit_conf*
        hit_to_page_conf*

    Output files :
        None

    Statistics creation :
        None

    Statistics update :
        visits :
            remote_addr =>
                is_page

    Statistics deletion :
        None


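The reclassification can be pictured as two lists of patterns applied to each request. The sketch below assumes that _page_to_hit_conf_ and _hit_to_page_conf_ are lists of regular expressions matched against the request URI; that format, the sample patterns and the flat `uri` key are assumptions made for illustration.

```python
import re

# Hypothetical patterns: treat feeds as hits and PDF files as pages.
page_to_hit_conf = [r'^.+/feed/?$']
hit_to_page_conf = [r'^.+\.pdf$']


def reclassify(request, page_to_hit=page_to_hit_conf, hit_to_page=hit_to_page_conf):
    """Flip request['is_page'] when the URI matches one of the patterns."""
    uri = request['uri']
    if request['is_page']:
        if any(re.match(pattern, uri) for pattern in page_to_hit):
            request['is_page'] = False
    elif any(re.match(pattern, uri) for pattern in hit_to_page):
        request['is_page'] = True
    return request
```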
plugins.pre_analysis.robots
---------------------------

    Pre analysis hook

    Filter robots

    Plugin requirements :
        None

    Conf values needed :
        page_to_hit_conf*
        hit_to_page_conf*

    Output files :
        None

    Statistics creation :
        None

    Statistics update :
        visits :
            remote_addr =>
                robot

    Statistics deletion :
        None


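One simple way to picture robot filtering is a user agent check that sets the _robot_ flag on a visitor. The sketch below is only that: iwla relies on robot definitions taken from awstats, while the marker list here is a made-up sample, and the `http_user_agent` field name assumes an nginx-style log format.

```python
# Made-up sample markers; iwla uses the robot definitions from awstats.
ROBOT_MARKERS = ['bot', 'crawler', 'spider']


def flag_robot(visitor):
    """Set visitor['robot'] if any of its requests has a robot-like user agent."""
    agents = {request.get('http_user_agent', '').lower()
              for request in visitor['requests']}
    visitor['robot'] = any(marker in agent
                           for agent in agents
                           for marker in ROBOT_MARKERS)
    return visitor
```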
