Source at commit 4e02325733e5e8e4f5de2f0046e721f8da7abfff created 6 years 10 months ago.
By Gregory Soutade, Initial commit

iwla
====

Introduction
------------

iwla (Intelligent Web Log Analyzer) is basically a clone of [awstats](http://www.awstats.org). The main problem with awstats is that it is a very monolithic project, with everything in one big Perl file. In contrast, iwla has been designed to be very modular: a small core analysis and a lot of filters. It can be viewed as UNIX pipes. The philosophy of iwla is: add, update, delete! That is the job of each filter: modify the statistics until the final result is reached. It is written in Python.

Nevertheless, iwla focuses only on HTTP logs. It uses data (robot definitions, search engine definitions) and design from awstats. Moreover, it is not dynamic: it only generates static HTML pages (with an optional gzip compression).

Usage
-----

    ./iwla [-c|--clean-output] [-i|--stdin] [-f FILE|--file FILE] [-d LOGLEVEL|--log-level LOGLEVEL]

    -c : Clean output (database and HTML) before starting
    -i : Read data from stdin instead of conf.analyzed_filename
    -f : Read data from FILE instead of conf.analyzed_filename
    -d : Loglevel in ['DEBUG', 'INFO', 'WARNING', 'ERROR', 'CRITICAL']
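
The options above could be declared with Python's standard argparse module roughly as follows. This is only a sketch of the command line interface, not necessarily how iwla parses its arguments; the default log level of INFO is an assumption.

```python
import argparse

LOG_LEVELS = ['DEBUG', 'INFO', 'WARNING', 'ERROR', 'CRITICAL']

def build_parser():
    """Declare the four options listed above (sketch, not iwla's own code)."""
    parser = argparse.ArgumentParser(prog='iwla')
    parser.add_argument('-c', '--clean-output', action='store_true',
                        help='clean output (database and HTML) before starting')
    parser.add_argument('-i', '--stdin', action='store_true',
                        help='read data from stdin instead of conf.analyzed_filename')
    parser.add_argument('-f', '--file', metavar='FILE',
                        help='read data from FILE instead of conf.analyzed_filename')
    # The INFO default is an assumption made for this example.
    parser.add_argument('-d', '--log-level', metavar='LOGLEVEL',
                        choices=LOG_LEVELS, default='INFO',
                        help='one of ' + ', '.join(LOG_LEVELS))
    return parser

# Parse an explicit argument list instead of sys.argv so the example is repeatable.
args = build_parser().parse_args(['-c', '-f', 'access.log', '-d', 'DEBUG'])
print(args.clean_output, args.stdin, args.file, args.log_level)
```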

Basic usage
-----------

In addition to the command line, iwla reads its parameters from default_conf.py. Users can override the default values in a _conf.py_ file. Each module requires its own parameters.

The main values to edit are:

 * **analyzed_filename** : web server log
 * **domain_name** : domain name to filter
 * **pre_analysis_hooks** : List of pre analysis hooks
 * **post_analysis_hooks** : List of post analysis hooks
 * **display_hooks** : List of display hooks
 * **locale** : Displayed locale (_en_ or _fr_)

Then, you can launch iwla. Output HTML files are created in the _output_ directory by default. To view them quickly, go into the output directory and type

    python -m SimpleHTTPServer 8000

Open your favorite web browser at _http://localhost:8000_. Enjoy! (SimpleHTTPServer is the Python 2 module; with Python 3, the equivalent module is http.server.)

**Warning** : The order of the hook lists is important: some plugins may require other plugins, and the order of display_hooks is the order of the displayed blocks in the final result.
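
A minimal conf.py might look like the sketch below. The log path and domain are placeholders, and the hook names are assumed to be the last component of the plugin module names listed in the plugin reference (for example, robots for plugins.pre_analysis.robots); check the names your installation expects.

```python
# conf.py (sketch): values here override those in default_conf.py.

analyzed_filename = '/var/log/nginx/access.log'   # hypothetical log path
domain_name = 'example.com'                       # hypothetical domain

# Hook names are an assumption: last component of each plugin module name.
pre_analysis_hooks = ['page_to_hit', 'robots']
post_analysis_hooks = ['referers', 'top_pages', 'top_downloads', 'top_hits']
display_hooks = ['top_visitors', 'all_visits', 'referers',
                 'top_pages', 'top_downloads', 'top_hits']

locale = 'en'

# Optional sanity check: according to the plugin reference, these display
# plugins require the post analysis plugin of the same name.
needs_post_analysis = ['referers', 'top_pages', 'top_downloads', 'top_hits']
missing = [name for name in needs_post_analysis
           if name in display_hooks and name not in post_analysis_hooks]
assert not missing, 'enable these post analysis plugins first: %s' % missing
```

The final check is not part of iwla; it only illustrates the kind of ordering and dependency mistake the warning above is about.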


Interesting default configuration values
----------------------------------------

 * **DB_ROOT** : Default database directory (default ./output_db)
 * **DISPLAY_ROOT** : Default HTML output directory (default ./output)
 * **log_format** : Web server log format (nginx style). Default is the Apache log format
 * **time_format** : Time format used in the log format
 * **pages_extensions** : Extensions that are considered as an HTML page (or result), as opposed to hits
 * **viewed_http_codes** : HTTP codes that are considered OK (200, 304)
 * **count_hit_only_visitors** : If False, don't count visitors that don't GET a page but only resources (images, rss...)
 * **multimedia_files** : Multimedia extensions (not counted as downloaded files)
 * **css_path** : CSS path (you can add yours)
 * **compress_output_files** : File extensions to compress with gzip during display build
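
To make the page/hit distinction concrete, here is a small sketch of how pages_extensions and viewed_http_codes could be applied to one request. The helper names and the extension list are assumptions for illustration; iwla's own classification may differ in its details.

```python
import os

pages_extensions = ['html', 'htm', 'php']   # assumed example values
viewed_http_codes = [200, 304]              # codes quoted in the list above

def is_page(uri):
    """Return True if the URI looks like a page rather than a plain hit."""
    path = uri.split('?', 1)[0]             # ignore any query string
    if path.endswith('/'):                  # treat a directory index as a page
        return True
    extension = os.path.splitext(path)[1].lstrip('.').lower()
    return extension in pages_extensions

def is_viewed(status):
    """Return True if the HTTP status code counts as a successful view."""
    return status in viewed_http_codes

for uri, status in [('/index.html', 200), ('/img/logo.png', 200), ('/old.php', 404)]:
    print(uri, 'page' if is_page(uri) else 'hit',
          'viewed' if is_viewed(status) else 'not viewed')
```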

Plugins
-------

As previously described, plugins act like UNIX pipes: statistics are constantly updated by each plugin to produce the final result. There are three types of plugins:

 * **Pre analysis plugins** : Called before generating the daily statistics. They are in charge of filtering robots, crawlers, bad pages...
 * **Post analysis plugins** : Called after the basic statistics computation. They are in charge of enriching the statistics with their own algorithms
 * **Display plugins** : They are in charge of producing HTML files from the statistics.

To use plugins, just insert their names in the _pre_analysis_hooks_, _post_analysis_hooks_ and _display_hooks_ lists in conf.py.
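
The pipe analogy can be sketched with three plain functions that each modify a shared statistics dictionary in turn. Everything here (function names, the shape of the dictionary, the sample addresses) is hypothetical and only illustrates the add, update, delete idea; it is not iwla's actual plugin loader.

```python
def drop_robots(stats):
    """Pre analysis stage: delete visitors flagged as robots."""
    stats['visits'] = dict((addr, visitor)
                           for addr, visitor in stats['visits'].items()
                           if not visitor.get('robot'))

def count_pages(stats):
    """Post analysis stage: add an aggregate computed from the remaining visits."""
    stats['viewed_pages'] = sum(visitor['viewed_pages']
                                for visitor in stats['visits'].values())

def render(stats):
    """Display stage: turn the statistics into output text."""
    stats['html'] = '<p>%d pages viewed</p>' % stats['viewed_pages']

def run_pipeline(stats, pre_analysis_hooks, post_analysis_hooks, display_hooks):
    """Call every hook in list order; each one sees the previous one's changes."""
    for hook in pre_analysis_hooks + post_analysis_hooks + display_hooks:
        hook(stats)
    return stats

stats = {'visits': {
    '192.0.2.1': {'robot': False, 'viewed_pages': 3},
    '192.0.2.2': {'robot': True, 'viewed_pages': 40},
}}
run_pipeline(stats, [drop_robots], [count_pages], [render])
print(stats['html'])
```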

Statistics are stored in dictionaries:

 * **month_stats** : Statistics of the currently analysed month
 * **valid_visitors** : A subset of month_stats without robots
 * **days_stats** : Statistics of the currently analysed day
 * **visits** : All visitors with all of their requests
 * **meta** : Final result of month statistics (by year)
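
As an illustration of how these dictionaries relate, the sketch below derives a set of valid visitors from visits. It assumes each visitor entry carries the robot and hit_only flags listed in the plugin reference; the function name is hypothetical.

```python
def filter_valid_visitors(visits, count_hit_only_visitors=True):
    """Keep visitors that are not robots and, optionally, not hit-only."""
    valid = {}
    for addr, visitor in visits.items():
        if visitor.get('robot', False):
            continue                        # robots are never valid visitors
        if not count_hit_only_visitors and visitor.get('hit_only', False):
            continue                        # fetched resources only, no page
        valid[addr] = visitor
    return valid

visits = {
    '192.0.2.1': {'robot': True, 'hit_only': False},
    '192.0.2.2': {'robot': False, 'hit_only': True},
    '192.0.2.3': {'robot': False, 'hit_only': False},
}
print(sorted(filter_valid_visitors(visits, count_hit_only_visitors=False)))
```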

Create a Plugin
---------------

To create a new plugin, create a class derived from IPlugin (_iplugin.py_) in the right directory (_plugins/xxx/yourPlugin.py_).

Plugins can define required configuration values (self.conf_requires) that must be set in conf.py (or can be optional). They can also define required plugins (self.requires).

The two functions to override are _load(self)_, which is called after _init_ and must return True if all is good and False otherwise, and _hook(self)_, which is the body of the plugin.

For display plugins, a lot of code has been written in _display.py_ to simplify the creation of HTML blocks, tables and bar graphs.
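
The skeleton below shows the general shape of such a plugin. To stay runnable on its own it defines a stand-in IPlugin and a fake host object; both are assumptions, and the real base class in iplugin.py may expose a different constructor and different helpers. The BusiestVisitor plugin itself is hypothetical.

```python
class IPlugin(object):
    """Stand-in for the base class in iplugin.py; its real interface may differ."""
    def __init__(self, iwla):
        self.iwla = iwla
        self.requires = []        # plugins that must be enabled for this one
        self.conf_requires = []   # configuration values this plugin needs

    def load(self):
        return True

    def hook(self):
        pass


class FakeIwla(object):
    """Hypothetical host object exposing configuration and statistics."""
    def __init__(self, conf, visits):
        self.conf = conf
        self.visits = visits
        self.month_stats = {}


class BusiestVisitor(IPlugin):
    """Hypothetical post analysis plugin: record the visitor with most pages."""
    def __init__(self, iwla):
        super(BusiestVisitor, self).__init__(iwla)
        self.conf_requires = ['domain_name']

    def load(self):
        # Refuse to load when a required configuration value is missing.
        return all(name in self.iwla.conf for name in self.conf_requires)

    def hook(self):
        visits = self.iwla.visits
        if visits:
            busiest = max(visits, key=lambda addr: visits[addr]['viewed_pages'])
            self.iwla.month_stats['busiest_visitor'] = busiest


iwla = FakeIwla({'domain_name': 'example.com'},
                {'192.0.2.1': {'viewed_pages': 3}, '192.0.2.2': {'viewed_pages': 7}})
plugin = BusiestVisitor(iwla)
if plugin.load():
    plugin.hook()
print(iwla.month_stats)
```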

Plugins
=======

Optional configuration values end with \*.

iwla
----

    Main class IWLA
    Parse logs, compute statistics, call plugins and produce output
    For now, only HTTP logs are valid

    Plugin requirements :
        None

    Conf values needed :
        analyzed_filename
        domain_name
        locales_path
        compress_output_files*

    Output files :
        DB_ROOT/meta.db
        DB_ROOT/year/month/iwla.db
        OUTPUT_ROOT/index.html
        OUTPUT_ROOT/year/month/index.html

    Statistics creation :

    meta :
        last_time
        start_analysis_time
        stats =>
            year =>
                month =>
                    viewed_bandwidth
                    not_viewed_bandwidth
                    viewed_pages
                    viewed_hits
                    nb_visits
                    nb_visitors

    month_stats :
        viewed_bandwidth
        not_viewed_bandwidth
        viewed_pages
        viewed_hits
        nb_visits

    days_stats :
        day =>
            viewed_bandwidth
            not_viewed_bandwidth
            viewed_pages
            viewed_hits
            nb_visits
            nb_visitors

    visits :
        remote_addr =>
            remote_addr
            remote_ip
            viewed_pages
            viewed_hits
            not_viewed_pages
            not_viewed_hits
            bandwidth
            last_access
            requests =>
                [fields_from_format_log]
                extract_request =>
                    extract_uri
                    extract_parameters*
                extract_referer* =>
                    extract_uri
                    extract_parameters*
            robot
            hit_only
            is_page

    valid_visitors:
        month_stats without robot and hit only visitors (if not conf.count_hit_only_visitors)

    Statistics update :
        None

    Statistics deletion :
        None
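
The nesting of meta described above can be pictured as a Python dictionary. The values below are made up for illustration, and the key types used for year and month (integers here) are an assumption; yearly_total is a hypothetical helper, not part of iwla.

```python
# Illustrative values only; key types for year and month are assumptions.
meta = {
    'last_time': None,
    'start_analysis_time': None,
    'stats': {
        2015: {
            1: {'viewed_bandwidth': 1000, 'not_viewed_bandwidth': 200,
                'viewed_pages': 120, 'viewed_hits': 450,
                'nb_visits': 30, 'nb_visitors': 25},
            2: {'viewed_bandwidth': 800, 'not_viewed_bandwidth': 100,
                'viewed_pages': 80, 'viewed_hits': 300,
                'nb_visits': 20, 'nb_visitors': 18},
        },
    },
}

def yearly_total(meta, year, field):
    """Sum one field over every month recorded for the given year."""
    months = meta['stats'].get(year, {})
    return sum(month[field] for month in months.values())

print(yearly_total(meta, 2015, 'viewed_pages'))
```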


plugins.display.top_downloads
-----------------------------

    Display hook

    Create TOP downloads page

    Plugin requirements :
        post_analysis/top_downloads

    Conf values needed :
        max_downloads_displayed*
        create_all_downloads_page*

    Output files :
        OUTPUT_ROOT/year/month/top_downloads.html
        OUTPUT_ROOT/year/month/index.html

    Statistics creation :
        None

    Statistics update :
        None

    Statistics deletion :
        None


plugins.display.all_visits
--------------------------

    Display hook

    Create All visits page

    Plugin requirements :
        None

    Conf values needed :
        display_visitor_ip*

    Output files :
        OUTPUT_ROOT/year/month/all_visits.html
        OUTPUT_ROOT/year/month/index.html

    Statistics creation :
        None

    Statistics update :
        None

    Statistics deletion :
        None


plugins.display.top_hits
------------------------

    Display hook

    Create TOP hits page

    Plugin requirements :
        post_analysis/top_hits

    Conf values needed :
        max_hits_displayed*
        create_all_hits_page*

    Output files :
        OUTPUT_ROOT/year/month/top_hits.html
        OUTPUT_ROOT/year/month/index.html

    Statistics creation :
        None

    Statistics update :
        None

    Statistics deletion :
        None


plugins.display.referers
------------------------

    Display hook

    Create Referers page

    Plugin requirements :
        post_analysis/referers

    Conf values needed :
        max_referers_displayed*
        create_all_referers_page*
        max_key_phrases_displayed*
        create_all_key_phrases_page*

    Output files :
        OUTPUT_ROOT/year/month/referers.html
        OUTPUT_ROOT/year/month/key_phrases.html
        OUTPUT_ROOT/year/month/index.html

    Statistics creation :
        None

    Statistics update :
        None

    Statistics deletion :
        None


plugins.display.top_visitors
----------------------------

    Display hook

    Create TOP visitors block

    Plugin requirements :
        None

    Conf values needed :
        display_visitor_ip*

    Output files :
        OUTPUT_ROOT/year/month/index.html

    Statistics creation :
        None

    Statistics update :
        None

    Statistics deletion :
        None


plugins.display.top_pages
-------------------------

    Display hook

    Create TOP pages page

    Plugin requirements :
        post_analysis/top_pages

    Conf values needed :
        max_pages_displayed*
        create_all_pages_page*

    Output files :
        OUTPUT_ROOT/year/month/top_pages.html
        OUTPUT_ROOT/year/month/index.html

    Statistics creation :
        None

    Statistics update :
        None

    Statistics deletion :
        None


plugins.post_analysis.top_downloads
-----------------------------------

    Post analysis hook

    Count TOP downloads

    Plugin requirements :
        None

    Conf values needed :
        None

    Output files :
        None

    Statistics creation :
        None

    Statistics update :
        month_stats:
            top_downloads =>
                uri

    Statistics deletion :
        None


plugins.post_analysis.top_hits
------------------------------

    Post analysis hook

    Count TOP hits

    Plugin requirements :
        None

    Conf values needed :
        None

    Output files :
        None

    Statistics creation :
        None

    Statistics update :
        month_stats:
            top_hits =>
                uri

    Statistics deletion :
        None


plugins.post_analysis.referers
------------------------------

    Post analysis hook

    Extract referers and key phrases from requests

    Plugin requirements :
        None

    Conf values needed :
        domain_name

    Output files :
        None

    Statistics creation :
        None

    Statistics update :
        month_stats :
            referers =>
                pages
                hits
            robots_referers =>
                pages
                hits
            search_engine_referers =>
                pages
                hits
            key_phrases =>
                phrase

    Statistics deletion :
        None


plugins.post_analysis.reverse_dns
---------------------------------

    Post analysis hook

    Replace IP by reverse DNS names

    Plugin requirements :
        None

    Conf values needed :
        reverse_dns_timeout*

    Output files :
        None

    Statistics creation :
        None

    Statistics update :
        valid_visitors:
            remote_addr
            dns_name_replaced
            dns_analyzed

    Statistics deletion :
        None
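
A reverse DNS replacement of this kind can be sketched with the standard socket module. The function name and the exact semantics of the dns_name_replaced and dns_analyzed flags are assumptions; reverse_dns_timeout is not modeled here. The resolver is a parameter so the example below can run without network access.

```python
import socket

def replace_with_dns_name(visitor, resolver=socket.gethostbyaddr):
    """Replace visitor['remote_addr'] with its reverse DNS name when one exists."""
    if visitor.get('dns_analyzed'):
        return visitor                      # already processed earlier
    try:
        hostname = resolver(visitor['remote_ip'])[0]
    except socket.error:                    # also covers herror and gaierror
        hostname = None
    if hostname:
        visitor['remote_addr'] = hostname
        visitor['dns_name_replaced'] = True
    visitor['dns_analyzed'] = True
    return visitor

def fake_resolver(ip):
    """Stub with the same return shape as socket.gethostbyaddr."""
    return ('host.example.com', [], [ip])

visitor = {'remote_addr': '192.0.2.1', 'remote_ip': '192.0.2.1'}
print(replace_with_dns_name(visitor, resolver=fake_resolver))
```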


plugins.post_analysis.top_pages
-------------------------------

    Post analysis hook

    Count TOP pages

    Plugin requirements :
        None

    Conf values needed :
        None

    Output files :
        None

    Statistics creation :
        None

    Statistics update :
        month_stats:
            top_pages =>
                uri

    Statistics deletion :
        None
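
The three top_* post analysis plugins all produce a per-URI count. A minimal way to build such a count from visits is sketched below, using the requests, extract_request and extract_uri keys from the iwla statistics description; the function name and the caller-supplied keep predicate are hypothetical, and the real plugins may select requests differently.

```python
from collections import Counter

def count_top_uris(visits, keep):
    """Count requested URIs over all visitors, for requests accepted by keep()."""
    counts = Counter()
    for visitor in visits.values():
        for request in visitor['requests']:
            if keep(request):
                counts[request['extract_request']['extract_uri']] += 1
    return counts

visits = {
    '192.0.2.1': {'requests': [
        {'extract_request': {'extract_uri': '/index.html'}},
        {'extract_request': {'extract_uri': '/style.css'}},
    ]},
    '192.0.2.2': {'requests': [
        {'extract_request': {'extract_uri': '/index.html'}},
    ]},
}

def looks_like_page(request):
    """Hypothetical predicate: keep only URIs ending in .html."""
    return request['extract_request']['extract_uri'].endswith('.html')

top_pages = count_top_uris(visits, looks_like_page)
print(top_pages.most_common(10))
```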


plugins.pre_analysis.page_to_hit
--------------------------------

    Pre analysis hook
    Change page into hit and hit into page in statistics

    Plugin requirements :
        None

    Conf values needed :
        page_to_hit_conf*
        hit_to_page_conf*

    Output files :
        None

    Statistics creation :
        None

    Statistics update :
        visits :
            remote_addr =>
                is_page

    Statistics deletion :
        None


plugins.pre_analysis.robots
---------------------------

    Pre analysis hook

    Filter robots

    Plugin requirements :
        None

    Conf values needed :
        page_to_hit_conf*
        hit_to_page_conf*

    Output files :
        None

    Statistics creation :
        None

    Statistics update :
        visits :
            remote_addr =>
                robot

    Statistics deletion :
        None
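
One simple way to set the robot flag is to match each visitor's user agent against a list of patterns, as sketched below. The patterns, the function name and the http_user_agent field name are assumptions for illustration; iwla takes its robot definitions from the awstats data, and its plugin may apply further criteria.

```python
import re

# Hypothetical patterns; the real definitions come from the awstats data.
robot_patterns = [re.compile(pattern, re.IGNORECASE)
                  for pattern in ('bot', 'crawl', 'spider')]

def flag_robots(visits, user_agent_field='http_user_agent'):
    """Set visitor['robot'] when any of its requests has a robot-like user agent."""
    for visitor in visits.values():
        agents = [request.get(user_agent_field, '')
                  for request in visitor['requests']]
        visitor['robot'] = any(pattern.search(agent)
                               for agent in agents
                               for pattern in robot_patterns)
    return visits

visits = {
    '192.0.2.1': {'requests': [
        {'http_user_agent': 'Mozilla/5.0 (compatible; Googlebot/2.1)'}]},
    '192.0.2.2': {'requests': [
        {'http_user_agent': 'Mozilla/5.0 (X11; Linux x86_64) Firefox/35.0'}]},
}
flag_robots(visits)
print(dict((addr, visitor['robot']) for addr, visitor in visits.items()))
```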