Source at commit 2180f9e7d4003ca4adb58c1da7bf3aa1f85023af created 6 years 8 months ago.
By Gregory Soutade, Update doc

iwla
====

Introduction
------------

iwla (Intelligent Web Log Analyzer) is basically a clone of [awstats](http://www.awstats.org). The main problem with awstats is that it is a very monolithic project, with everything in one big Perl file. By contrast, iwla has been designed to be very modular: a small analysis core and a lot of filters. It can be viewed as a chain of UNIX pipes. The philosophy of iwla is: add, update, delete! That is the job of each filter: modify the statistics until the final result is reached. It is written in Python.

Nevertheless, iwla is focused only on HTTP logs. It uses data (robot definitions, search engine definitions) and design from awstats. Moreover, it is not dynamic: it only generates static HTML pages (with an optional gzip compression).

Usage
-----

    ./iwla [-c|--clean-output] [-i|--stdin] [-f FILE|--file FILE] [-d LOGLEVEL|--log-level LOGLEVEL]

    -c : Clean output (database and HTML) before starting
    -i : Read data from stdin instead of conf.analyzed_filename
    -f : Read data from FILE instead of conf.analyzed_filename
    -d : Loglevel in ['DEBUG', 'INFO', 'WARNING', 'ERROR', 'CRITICAL']

Basic usage
-----------

In addition to the command line, iwla reads its parameters from _default_conf.py_. Users can override the default values in a _conf.py_ file. Each module requires its own parameters.

The main values to edit are :

 * **analyzed_filename** : web server log
 * **domain_name** : domain name to filter
 * **pre_analysis_hooks** : List of pre analysis hooks
 * **post_analysis_hooks** : List of post analysis hooks
 * **display_hooks** : List of display hooks
 * **locale** : Displayed locale (_en_ or _fr_)

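As an illustration, a minimal _conf.py_ overriding these values might look like the sketch below. The log path and the domain are placeholders, and the hook names are simply the plugin file names documented further down in this file; which hooks to enable is a choice left to the user.

```python
# conf.py: a sketch only. The path and the domain are placeholders (assumptions).
analyzed_filename = '/var/log/nginx/access.log'
domain_name = 'example.com'

# Plugin file names without the .py extension. Order matters.
pre_analysis_hooks = ['page_to_hit', 'robots']
post_analysis_hooks = ['referers', 'top_pages', 'top_downloads', 'top_hits', 'reverse_dns']
display_hooks = ['top_visitors', 'all_visits', 'referers', 'top_pages', 'top_hits', 'top_downloads']

locale = 'en'   # or 'fr'
```
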
Then, you can launch iwla. Output HTML files are created in the _output_ directory by default. To quickly view them, go into _output_ and type

    python -m SimpleHTTPServer 8000

With Python 3, the equivalent command is _python3 -m http.server 8000_. Open your favorite web browser at _http://localhost:8000_. Enjoy!

**Warning** : The order of the hook lists is important: some plugins may require other plugins, and the order of display_hooks is the order of the displayed blocks in the final result.


Interesting default configuration values
----------------------------------------

 * **DB_ROOT** : Default database directory (default ./output_db)
 * **DISPLAY_ROOT** : Default HTML output directory (default _./output_)
 * **log_format** : Web server log format (nginx style). Default is the Apache log format
 * **time_format** : Time format used in the log format
 * **pages_extensions** : Extensions that are considered as an HTML page (or result), as opposed to hits
 * **viewed_http_codes** : HTTP codes that are considered OK (200, 304)
 * **count_hit_only_visitors** : If False, don't count visitors that don't GET a page but only resources (images, RSS...)
 * **multimedia_files** : Multimedia extensions (not counted as downloaded files)
 * **css_path** : CSS path (you can add yours)
 * **compress_output_files** : File extensions to compress with gzip during display build

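To make the role of _log_format_, _time_format_, _pages_extensions_ and _viewed_http_codes_ more concrete, here is a rough sketch of how a log line could be parsed and classified. It is not iwla's actual parser: the format string, the option values and the helper names (`format_to_regex`, `classify`) are assumptions chosen for the example.

```python
import re
from datetime import datetime

# Assumed values in the spirit of the options above (not iwla's real defaults)
log_format = '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent'
time_format = '%d/%b/%Y:%H:%M:%S %z'
pages_extensions = ['/', 'html', 'php']
viewed_http_codes = [200, 304]


def format_to_regex(fmt):
    """Turn an nginx-style format into a regex with one named group per $variable."""
    pattern = re.escape(fmt)
    # re.escape turns "$name" into "\$name"; replace each one with a named group
    pattern = re.sub(r'\\\$(\w+)', r'(?P<\1>.*?)', pattern)
    return re.compile('^' + pattern + '$')


LINE_RE = format_to_regex(log_format)


def classify(line):
    """Return a small dict describing the request, or None if the line does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Assumes a well-formed "METHOD URI PROTOCOL" request field
    _method, uri, _protocol = fields['request'].split(' ', 2)
    path = uri.split('?', 1)[0]
    return {
        'remote_addr': fields['remote_addr'],
        'time': datetime.strptime(fields['time_local'], time_format),
        # A page is a request whose path ends with one of pages_extensions
        'is_page': path.endswith(tuple(pages_extensions)),
        # A request is "viewed" when its status is one of viewed_http_codes
        'viewed': int(fields['status']) in viewed_http_codes,
    }


sample = '203.0.113.7 - - [10/Oct/2014:13:55:36 +0200] "GET /blog/index.html HTTP/1.1" 200 2326'
result = classify(sample)
```
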
Plugins
-------

As previously described, plugins act like UNIX pipes: statistics are constantly updated by each plugin to produce the final result. There are three types of plugins :

 * **Pre analysis plugins** : Called before generating day statistics. They are in charge of filtering robots, crawlers, bad pages...
 * **Post analysis plugins** : Called after basic statistics computation. They are in charge of enriching them with their own algorithms
 * **Display plugins** : They are in charge of producing HTML files from statistics.

To use plugins, just insert their file names (without the _.py_ extension) in the _pre_analysis_hooks_, _post_analysis_hooks_ and _display_hooks_ lists in conf.py.

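The pipe analogy can be sketched in a few lines. The three functions below are hypothetical stand-ins, not real iwla plugins, and the real core also computes its own statistics between the pre and post analysis stages; the point is only that every hook receives the same statistics and modifies them in turn, in list order.

```python
# Hypothetical stand-ins illustrating the "add, update, delete" pipeline idea.

def drop_robots(stats):
    # Pre analysis stage: filter out visitors flagged as robots
    stats['visits'] = {ip: v for ip, v in stats['visits'].items() if not v['robot']}


def count_pages(stats):
    # Post analysis stage: enrich the statistics with a new value
    stats['viewed_pages'] = sum(v['viewed_pages'] for v in stats['visits'].values())


def render(stats):
    # Display stage: produce output from the statistics
    stats['html'] = '<p>%d pages viewed</p>' % stats['viewed_pages']


def run_pipeline(stats, pre_hooks, post_hooks, display_hooks):
    # Hooks run in list order, which is why the order of the lists matters
    for hook in pre_hooks + post_hooks + display_hooks:
        hook(stats)
    return stats


stats = {
    'visits': {
        '203.0.113.7': {'robot': False, 'viewed_pages': 3},
        '198.51.100.2': {'robot': True, 'viewed_pages': 40},
    }
}
out = run_pipeline(stats, [drop_robots], [count_pages], [render])
```
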
Statistics are stored in dictionaries :

 * **month_stats** : Statistics of the month currently analysed
 * **valid_visitors** : A subset of month_stats without robots
 * **days_stats** : Statistics of the day currently analysed
 * **visits** : All visitors with all of their requests
 * **meta** : Final result of month statistics (by year)

Create a Plugin
---------------

To create a new plugin, it's necessary to subclass IPlugin (_iplugin.py_) in the right directory (_plugins/xxx/yourPlugin.py_).

Plugins can define required configuration values (self.conf_requires) that must be set in conf.py (or can be optional). They can also define required plugins (self.requires).

The two functions to override are _load(self)_, which must return True if all is good and False otherwise, and _hook(self)_, which is the body of the plugin. _load_ is called after _init_.

For display plugins, a lot of code has been written in _display.py_ to simplify the creation of HTML blocks, tables and bar graphs.

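A plugin following this description could look roughly like the sketch below. To keep the example self-contained, `IPlugin` and `FakeIwla` are hypothetical stand-ins written for this example: the constructor signature, the way the plugin reaches the statistics and the plugin name itself are assumptions, so check _iplugin.py_ for the real interface.

```python
class IPlugin(object):
    """Hypothetical stand-in for the base class in iplugin.py (signature assumed)."""

    def __init__(self, iwla):
        self.iwla = iwla
        self.requires = []        # plugins that must be loaded before this one
        self.conf_requires = []   # configuration values that must be set

    def load(self):
        return True

    def hook(self):
        pass


class FakeIwla(object):
    """Hypothetical stand-in for the core: it only carries a statistics dictionary."""

    def __init__(self):
        self.month_stats = {'viewed_pages': 12, 'viewed_hits': 30}


class IWLAPostAnalysisPagesRatio(IPlugin):
    """Illustrative post analysis plugin: adds a pages/hits ratio to month_stats."""

    def __init__(self, iwla):
        super().__init__(iwla)
        self.conf_requires = ['domain_name']   # example of a required conf value

    def load(self):
        # Called after __init__; return False to report that the plugin cannot run
        return True

    def hook(self):
        # Body of the plugin: update the statistics in place
        stats = self.iwla.month_stats
        hits = stats['viewed_hits']
        stats['pages_ratio'] = stats['viewed_pages'] / hits if hits else 0.0


plugin = IWLAPostAnalysisPagesRatio(FakeIwla())
if plugin.load():
    plugin.hook()
```
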
Plugins
=======

Optional configuration values end with *.

iwla
----

    Main class IWLA
    Parse logs, compute statistics, call plugins and produce output
    For now, only HTTP logs are valid

    Plugin requirements :
        None

    Conf values needed :
        analyzed_filename
        domain_name
        locales_path
        compress_output_files*

    Output files :
        DB_ROOT/meta.db
        DB_ROOT/year/month/iwla.db
        OUTPUT_ROOT/index.html
        OUTPUT_ROOT/year/month/index.html

    Statistics creation :

    meta :
        last_time
        start_analysis_time
        stats =>
            year =>
                month =>
                    viewed_bandwidth
                    not_viewed_bandwidth
                    viewed_pages
                    viewed_hits
                    nb_visits
                    nb_visitors

    month_stats :
        viewed_bandwidth
        not_viewed_bandwidth
        viewed_pages
        viewed_hits
        nb_visits

    days_stats :
        day =>
            viewed_bandwidth
            not_viewed_bandwidth
            viewed_pages
            viewed_hits
            nb_visits
            nb_visitors

    visits :
        remote_addr =>
            remote_addr
            remote_ip
            viewed_pages
            viewed_hits
            not_viewed_pages
            not_viewed_hits
            bandwidth
            last_access
            requests =>
                [fields_from_format_log]
                extract_request =>
                    extract_uri
                    extract_parameters*
                extract_referer* =>
                    extract_uri
                    extract_parameters*
            robot
            hit_only
            is_page

    valid_visitors:
        month_stats without robot and hit only visitors (if not conf.count_hit_only_visitors)

    Statistics update :
        None

    Statistics deletion :
        None


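The _valid_visitors_ rule above can be read as a simple filter over _visits_. The function below is a sketch of that rule under the assumption that each visitor is a dictionary carrying the _robot_ and _hit_only_ flags listed above; the function name is made up for the example.

```python
def select_valid_visitors(visits, count_hit_only_visitors=False):
    """Keep visitors that are not robots; drop hit-only visitors unless configured otherwise."""
    valid = {}
    for remote_addr, visitor in visits.items():
        if visitor.get('robot', False):
            continue
        if not count_hit_only_visitors and visitor.get('hit_only', False):
            continue
        valid[remote_addr] = visitor
    return valid


visits = {
    '203.0.113.7': {'robot': False, 'hit_only': False},
    '198.51.100.2': {'robot': True, 'hit_only': False},
    '192.0.2.15': {'robot': False, 'hit_only': True},
}
strict = select_valid_visitors(visits)
lenient = select_valid_visitors(visits, count_hit_only_visitors=True)
```
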
plugins.display.all_visits
--------------------------

    Display hook

    Create All visits page

    Plugin requirements :
        None

    Conf values needed :
        display_visitor_ip*

    Output files :
        OUTPUT_ROOT/year/month/all_visits.html
        OUTPUT_ROOT/year/month/index.html

    Statistics creation :
        None

    Statistics update :
        None

    Statistics deletion :
        None


plugins.display.referers
------------------------

    Display hook

    Create Referers page

    Plugin requirements :
        post_analysis/referers

    Conf values needed :
        max_referers_displayed*
        create_all_referers_page*
        max_key_phrases_displayed*
        create_all_key_phrases_page*

    Output files :
        OUTPUT_ROOT/year/month/referers.html
        OUTPUT_ROOT/year/month/key_phrases.html
        OUTPUT_ROOT/year/month/index.html

    Statistics creation :
        None

    Statistics update :
        None

    Statistics deletion :
        None


plugins.display.top_visitors
----------------------------

    Display hook

    Create TOP visitors block

    Plugin requirements :
        None

    Conf values needed :
        display_visitor_ip*

    Output files :
        OUTPUT_ROOT/year/month/index.html

    Statistics creation :
        None

    Statistics update :
        None

    Statistics deletion :
        None


plugins.display.top_pages
-------------------------

    Display hook

    Create TOP pages page

    Plugin requirements :
        post_analysis/top_pages

    Conf values needed :
        max_pages_displayed*
        create_all_pages_page*

    Output files :
        OUTPUT_ROOT/year/month/top_pages.html
        OUTPUT_ROOT/year/month/index.html

    Statistics creation :
        None

    Statistics update :
        None

    Statistics deletion :
        None


plugins.display.top_hits
------------------------

    Display hook

    Create TOP hits page

    Plugin requirements :
        post_analysis/top_hits

    Conf values needed :
        max_hits_displayed*
        create_all_hits_page*

    Output files :
        OUTPUT_ROOT/year/month/top_hits.html
        OUTPUT_ROOT/year/month/index.html

    Statistics creation :
        None

    Statistics update :
        None

    Statistics deletion :
        None


plugins.display.top_downloads
-----------------------------

    Display hook

    Create TOP downloads page

    Plugin requirements :
        post_analysis/top_downloads

    Conf values needed :
        max_downloads_displayed*
        create_all_downloads_page*

    Output files :
        OUTPUT_ROOT/year/month/top_downloads.html
        OUTPUT_ROOT/year/month/index.html

    Statistics creation :
        None

    Statistics update :
        None

    Statistics deletion :
        None


plugins.pre_analysis.page_to_hit
--------------------------------

    Pre analysis hook
    Change pages into hits and hits into pages in statistics

    Plugin requirements :
        None

    Conf values needed :
        page_to_hit_conf*
        hit_to_page_conf*

    Output files :
        None

    Statistics creation :
        None

    Statistics update :
        visits :
            remote_addr =>
                is_page

    Statistics deletion :
        None


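The description above does not say what _page_to_hit_conf_ and _hit_to_page_conf_ contain. The sketch below assumes that both are lists of regular expressions matched against the request URI, and it works on a simplified, flat list of request dictionaries with hypothetical _uri_ and _is_page_ keys rather than on the real _visits_ structure.

```python
import re

# Assumption: both options are lists of regular expressions matched against the URI
page_to_hit_conf = [r'^.+/feed/?$']
hit_to_page_conf = [r'^.+\.pdf$']


def reclassify(requests, page_to_hit=page_to_hit_conf, hit_to_page=hit_to_page_conf):
    """Flip is_page for requests whose URI matches one of the configured patterns."""
    to_hit = [re.compile(p) for p in page_to_hit]
    to_page = [re.compile(p) for p in hit_to_page]
    for request in requests:
        uri = request['uri']
        if request['is_page']:
            # A page matching page_to_hit becomes a hit
            if any(r.match(uri) for r in to_hit):
                request['is_page'] = False
        elif any(r.match(uri) for r in to_page):
            # A hit matching hit_to_page becomes a page
            request['is_page'] = True
    return requests


requests = reclassify([
    {'uri': '/blog/feed/', 'is_page': True},
    {'uri': '/docs/manual.pdf', 'is_page': False},
    {'uri': '/index.html', 'is_page': True},
])
```
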
plugins.pre_analysis.robots
---------------------------

    Pre analysis hook

    Filter robots

    Plugin requirements :
        None

    Conf values needed :
        page_to_hit_conf*
        hit_to_page_conf*

    Output files :
        None

    Statistics creation :
        None

    Statistics update :
        visits :
            remote_addr =>
                robot

    Statistics deletion :
        None


plugins.post_analysis.referers
------------------------------

    Post analysis hook

    Extract referers and key phrases from requests

    Plugin requirements :
        None

    Conf values needed :
        domain_name

    Output files :
        None

    Statistics creation :
        None

    Statistics update :
        month_stats :
            referers =>
                pages
                hits
            robots_referers =>
                pages
                hits
            search_engine_referers =>
                pages
                hits
            key_phrases =>
                phrase

    Statistics deletion :
        None


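One way to picture the extraction of referers and key phrases is the sketch below. It is not the plugin's code: the `SEARCH_ENGINES` table is a tiny hypothetical stand-in for the awstats-derived search engine definitions, and the function name and return values are assumptions made for the example.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical stand-in for the search engine definitions:
# host substring -> name of the query-string parameter carrying the search terms
SEARCH_ENGINES = {'google.': 'q', 'bing.com': 'q', 'duckduckgo.com': 'q'}


def classify_referer(referer, domain_name):
    """Return (kind, value) where kind is 'internal', 'search_engine' or 'referer'."""
    parsed = urlparse(referer)
    host = parsed.netloc.lower()
    # Referers coming from our own domain are not interesting
    if host == domain_name or host.endswith('.' + domain_name):
        return ('internal', None)
    for needle, param in SEARCH_ENGINES.items():
        if needle in host:
            # parse_qs decodes "+" and percent-escapes in the key phrase
            phrases = parse_qs(parsed.query).get(param, [])
            return ('search_engine', phrases[0] if phrases else None)
    return ('referer', referer)


internal = classify_referer('http://www.example.com/page', 'example.com')
search = classify_referer('https://www.google.com/search?q=web+log+analyzer', 'example.com')
external = classify_referer('http://blog.example.org/post', 'example.com')
```
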
plugins.post_analysis.top_pages
-------------------------------

    Post analysis hook

    Count TOP pages

    Plugin requirements :
        None

    Conf values needed :
        None

    Output files :
        None

    Statistics creation :
        None

    Statistics update :
        month_stats:
            top_pages =>
                uri

    Statistics deletion :
        None


plugins.post_analysis.reverse_dns
---------------------------------

    Post analysis hook

    Replace IP by reverse DNS names

    Plugin requirements :
        None

    Conf values needed :
        reverse_dns_timeout*

    Output files :
        None

    Statistics creation :
        None

    Statistics update :
        valid_visitors:
            remote_addr
            dns_name_replaced
            dns_analyzed

    Statistics deletion :
        None


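The three fields updated above suggest a loop of the following shape. This is a sketch under assumptions: the function name and the injectable `resolver` parameter are made up so the example can run without network access, and _reverse_dns_timeout_ is deliberately not modelled here.

```python
import socket


def resolve_visitors(valid_visitors, resolver=socket.gethostbyaddr):
    """Replace remote_addr by its reverse DNS name when a lookup succeeds."""
    for visitor in valid_visitors.values():
        if visitor.get('dns_analyzed'):
            continue   # already looked up in a previous run
        try:
            name, _aliases, _addresses = resolver(visitor['remote_ip'])
            visitor['remote_addr'] = name
            visitor['dns_name_replaced'] = True
        except OSError:
            # socket.herror and socket.gaierror are subclasses of OSError
            pass
        visitor['dns_analyzed'] = True
    return valid_visitors


def fake_resolver(ip):
    # Offline stand-in with the same return shape as socket.gethostbyaddr
    if ip == '203.0.113.7':
        return ('host.example.com', [], [ip])
    raise OSError('unknown host')


visitors = {
    '203.0.113.7': {'remote_addr': '203.0.113.7', 'remote_ip': '203.0.113.7'},
    '198.51.100.2': {'remote_addr': '198.51.100.2', 'remote_ip': '198.51.100.2'},
}
resolve_visitors(visitors, resolver=fake_resolver)
```
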
plugins.post_analysis.top_hits
------------------------------

    Post analysis hook

    Count TOP hits

    Plugin requirements :
        None

    Conf values needed :
        None

    Output files :
        None

    Statistics creation :
        None

    Statistics update :
        month_stats:
            top_hits =>
                uri

    Statistics deletion :
        None


plugins.post_analysis.top_downloads
-----------------------------------

    Post analysis hook

    Count TOP downloads

    Plugin requirements :
        None

    Conf values needed :
        None

    Output files :
        None

    Statistics creation :
        None

    Statistics update :
        month_stats:
            top_downloads =>
                uri

    Statistics deletion :
        None
