iwla
====

Introduction
------------

iwla (Intelligent Web Log Analyzer) is basically a clone of [awstats](http://www.awstats.org). The main problem with awstats is that it is a very monolithic project, with everything in one big Perl file. In contrast, iwla has been designed to be very modular: a small analysis core and a lot of filters. It can be viewed as UNIX pipes. The philosophy of iwla is: add, update, delete! That is the job of each filter: modify the statistics until the final result is reached. It is written in Python.

Nevertheless, iwla focuses only on HTTP logs. It uses data (robot definitions, search engine definitions) and design from awstats. Moreover, it is not dynamic: it only generates static HTML pages (with an optional gzip compression).

Usage
-----

    ./iwla [-c|--config-file file] [-C|--clean-output] [-i|--stdin] [-f FILE|--file FILE] [-d LOGLEVEL|--log-level LOGLEVEL] [-r|--reset year/month] [-z|--dont-compress] [-p] [-D|--dry-run]

    -c : Configuration file to use (default conf.py)
    -C : Clean output (database and HTML) before starting
    -i : Read data from stdin instead of conf.analyzed_filename
    -f : Analyse this log file instead of conf.analyzed_filename. Multiple files can be specified (comma separated). gz files are accepted
    -d : Loglevel in ['DEBUG', 'INFO', 'WARNING', 'ERROR', 'CRITICAL']
    -r : Reset analysis to a specific date (month/year)
    -z : Don't compress databases (bigger but faster, not compatible with compressed databases)
    -p : Only generate display
    -D : Dry run (don't write/update files to disk)

Basic usage
-----------

In addition to the command line, iwla reads parameters from default_conf.py. Users can override the default values in a _conf.py_ file. Each module requires its own parameters.

The main values to edit are:

 * **analyzed_filename** : web server log
 * **domain_name** : domain name to filter
 * **pre_analysis_hooks** : List of pre analysis hooks
 * **post_analysis_hooks** : List of post analysis hooks
 * **display_hooks** : List of display hooks
 * **locale** : Displayed locale (_en_ or _fr_)

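As a rough illustration, a minimal _conf.py_ overriding these values might look like the sketch below. The log path and domain are placeholders, and the hook selection is only an example built from plugin names listed in the reference part of this document; adapt both to your setup.

```python
# conf.py: hypothetical example; path, domain and hook choices are placeholders

# Web server log to analyze
analyzed_filename = '/var/log/apache2/access.log'

# Domain name to filter
domain_name = 'example.com'

# Hooks are plugin file names without the .py extension; order matters
pre_analysis_hooks = ['page_to_hit', 'robots']
post_analysis_hooks = ['referers', 'top_pages', 'top_downloads', 'browsers']
display_hooks = ['top_visitors', 'all_visits', 'referers', 'top_pages',
                 'top_downloads', 'browsers']

# Displayed locale: 'en' or 'fr'
locale = 'en'
```

Each display hook in this sketch is preceded by the post analysis hook it depends on (for example, _display/referers_ needs _post_analysis/referers_).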
You can also append an element to an existing default configuration list by using the "_append" suffix. For example,

    multimedia_files_append = ['xml']

or

    multimedia_files_append = 'xml'

will append 'xml' to the current multimedia_files list.

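To make the "_append" behaviour concrete, here is a minimal sketch of how such a merge could work. It is not iwla's actual implementation: the function name `apply_overrides` and the default values used in the demo are assumptions made for illustration.

```python
def apply_overrides(defaults, user_conf):
    """Return a copy of defaults updated with user_conf, honoring *_append keys."""
    # Copy lists so the defaults are never modified in place
    merged = {key: (list(value) if isinstance(value, list) else value)
              for key, value in defaults.items()}
    suffix = '_append'
    for key, value in user_conf.items():
        if key.endswith(suffix):
            target = key[:-len(suffix)]
            # Accept both a list and a single element, as described above
            extra = value if isinstance(value, list) else [value]
            merged[target] = merged.get(target, []) + extra
        else:
            merged[key] = value
    return merged


# Hypothetical default values, for demonstration only
defaults = {'multimedia_files': ['png', 'jpg'], 'locale': 'en'}
merged = apply_overrides(defaults, {'multimedia_files_append': 'xml'})
print(merged['multimedia_files'])  # ['png', 'jpg', 'xml']
```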
Then, you can launch iwla. Output HTML files are created in the _output_ directory by default. To quickly view them, go into _output_ and type

    python -m SimpleHTTPServer 8000

(on Python 3, the equivalent command is _python3 -m http.server 8000_). Then open your favorite web browser at _http://localhost:8000_. Enjoy!

**Warning** : The order of the hooks lists is important: some plugins may require other plugins, and the order of display_hooks is the order of the displayed blocks in the final result.


Interesting default configuration values
----------------------------------------

 * **DB_ROOT** : Default database directory (default ./output_db)
 * **DISPLAY_ROOT** : Default HTML output directory (default _./output_)
 * **log_format** : Web server log format (nginx style). Default is apache log format
 * **time_format** : Time format used in log format
 * **pages_extensions** : Extensions that are considered as an HTML page (or result), as opposed to hits
 * **viewed_http_codes** : HTTP codes that are considered OK (200, 304)
 * **count_hit_only_visitors** : If False, don't count visitors that don't GET a page but only resources (images, rss...)
 * **multimedia_files** : Multimedia extensions (not counted as downloaded files)
 * **css_path** : CSS path (you can add yours)
 * **compress_output_files** : File extensions to compress with gzip during display build

Plugins
-------

As previously described, plugins act like UNIX pipes: statistics are constantly updated by each plugin to produce the final result. There are three types of plugins:

 * **Pre analysis plugins** : Called before generating days statistics. They are in charge of filtering robots, crawlers, bad pages...
 * **Post analysis plugins** : Called after basic statistics computation. They are in charge of enriching them with their own algorithms
 * **Display plugins** : They are in charge of producing HTML files from statistics.

To use plugins, just insert their file names (without the _.py_ extension) in the _pre_analysis_hooks_, _post_analysis_hooks_ and _display_hooks_ lists in conf.py.

Statistics are stored in dictionaries:

 * **month_stats** : Statistics of the currently analysed month
 * **valid_visitors** : A subset of month_stats without robots
 * **days_stats** : Statistics of the currently analysed day
 * **visits** : All visitors with all of their requests
 * **meta** : Final result of month statistics (by year)

Create a Plugin
---------------

To create a new plugin, it's necessary to subclass IPlugin (_iplugin.py_) in the right directory (_plugins/xxx/yourPlugin.py_).

Plugins can define required configuration values (self.conf_requires) that must be set in conf.py (or can be optional). They can also define required plugins (self.requires).

The two functions to override are _load(self)_, which must return True if all is good and False otherwise (it is called after _init_), and _hook(self)_, which is the body of the plugin.

For display plugins, a lot of code has been written in _display.py_ to simplify the creation of HTML blocks, tables and bar graphs.

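The overall shape of a plugin can be sketched as follows. This is only an illustration of the load/hook contract described above: the `IPlugin` class below is a simplified stand-in written for this example (the real one lives in _iplugin.py_ and its exact API is not shown here), and `FakeCore`, `CountRobots` and the format of the `requires` entry are hypothetical.

```python
class IPlugin:
    """Stand-in for the real base class in iplugin.py (its exact API is assumed)."""

    def __init__(self, iwla):
        self.iwla = iwla
        self.requires = []        # plugins that must be loaded first
        self.conf_requires = []   # configuration values that must be defined

    def load(self):
        return True

    def hook(self):
        pass


class FakeCore:
    """Hypothetical stand-in for the iwla core, exposing only a visits dict."""

    def __init__(self, visits):
        self.visits = visits


class CountRobots(IPlugin):
    """Hypothetical plugin: count visitors flagged as robots."""

    def __init__(self, iwla):
        super().__init__(iwla)
        # Notation borrowed from the reference lists; the real format may differ
        self.requires = ['pre_analysis/robots']
        self.robot_count = 0

    def load(self):
        # Return True when the plugin is ready to run, False otherwise
        return isinstance(self.iwla.visits, dict)

    def hook(self):
        # Body of the plugin: read (or modify) the shared statistics
        self.robot_count = sum(
            1 for visitor in self.iwla.visits.values() if visitor.get('robot'))


core = FakeCore({'192.0.2.1': {'robot': True}, '198.51.100.7': {'robot': False}})
plugin = CountRobots(core)
if plugin.load():
    plugin.hook()
print(plugin.robot_count)  # 1
```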
Plugins
=======

Optional configuration values end with *.

 * iwla.py
 * plugins/display/all_visits.py
 * plugins/display/browsers.py
 * plugins/display/feeds.py
 * plugins/display/filter_users.py
 * plugins/display/hours_stats.py
 * plugins/display/ip_to_geo.py
 * plugins/display/istats_diff.py
 * plugins/display/operating_systems.py
 * plugins/display/referers_diff.py
 * plugins/display/referers.py
 * plugins/display/robot_bandwidth.py
 * plugins/display/top_downloads_diff.py
 * plugins/display/top_downloads.py
 * plugins/display/top_hits.py
 * plugins/display/top_pages_diff.py
 * plugins/display/top_pages.py
 * plugins/display/top_visitors.py
 * plugins/display/track_users.py
 * plugins/post_analysis/browsers.py
 * plugins/post_analysis/feeds.py
 * plugins/post_analysis/filter_users.py
 * plugins/post_analysis/hours_stats.py
 * plugins/post_analysis/ip_to_geo.py
 * plugins/post_analysis/iptogeo.py
 * plugins/post_analysis/iptogeo.reset.py
 * plugins/post_analysis/operating_systems.py
 * plugins/post_analysis/referers.py
 * plugins/post_analysis/reverse_dns.py
 * plugins/post_analysis/top_downloads.py
 * plugins/post_analysis/top_hits.py
 * plugins/post_analysis/top_pages.py
 * plugins/pre_analysis/page_to_hit.py
 * plugins/pre_analysis/robots.py


iwla
----

    Main class IWLA
    Parse logs, compute statistics, call plugins and produce output
    For now, only HTTP logs are valid

    Plugin requirements :
        None

    Conf values needed :
        analyzed_filename
        domain_name
        locales_path
        compress_output_files
        excluded_ip

    Output files :
        DB_ROOT/meta.db
        DB_ROOT/year/month/iwla.db
        OUTPUT_ROOT/index.html
        OUTPUT_ROOT/year/_stats.html
        OUTPUT_ROOT/year/month/index.html

    Statistics creation :

        meta :
            last_time
            start_analysis_time
            stats =>
                year =>
                    month =>
                        viewed_bandwidth
                        not_viewed_bandwidth
                        viewed_pages
                        viewed_hits
                        nb_visits
                        nb_visitors

        month_stats :
            viewed_bandwidth
            not_viewed_bandwidth
            viewed_pages
            viewed_hits
            nb_visits

        days_stats :
            day =>
                viewed_bandwidth
                not_viewed_bandwidth
                viewed_pages
                viewed_hits
                nb_visits
                nb_visitors

        visits :
            remote_addr =>
                remote_addr
                remote_ip
                viewed_pages{0..31} # 0 contains total
                viewed_hits{0..31} # 0 contains total
                not_viewed_pages{0..31}
                not_viewed_hits{0..31}
                bandwidth{0..31}
                last_access
                requests =>
                    [fields_from_format_log]
                    extract_request =>
                        http_method
                        http_uri
                        http_version
                        extract_uri
                        extract_parameters*
                    extract_referer* =>
                        extract_uri
                        extract_parameters*
                robot
                hit_only
                is_page
                keep_requests

        valid_visitors:
            month_stats without robots and hit-only visitors (if not conf.count_hit_only_visitors)

    Statistics update :
        None

    Statistics deletion :
        None


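The relation between _visits_ and _valid_visitors_ described above can be sketched as follows. This is an illustrative reading of the rule, not code taken from iwla: the function name is hypothetical, and the visitor entries are reduced to the few fields needed here.

```python
def select_valid_visitors(visits, count_hit_only_visitors=False):
    """Keep visitors that are not robots and, unless configured otherwise, not hit-only."""
    valid = {}
    for remote_addr, visitor in visits.items():
        if visitor.get('robot'):
            continue
        if not count_hit_only_visitors and visitor.get('hit_only'):
            continue
        valid[remote_addr] = visitor
    return valid


# Simplified visitor entries: only the fields used by the filter
visits = {
    '192.0.2.1': {'remote_addr': '192.0.2.1', 'robot': False, 'hit_only': False},
    '192.0.2.2': {'remote_addr': '192.0.2.2', 'robot': True, 'hit_only': False},
    '192.0.2.3': {'remote_addr': '192.0.2.3', 'robot': False, 'hit_only': True},
}

print(sorted(select_valid_visitors(visits)))
# ['192.0.2.1']
print(sorted(select_valid_visitors(visits, count_hit_only_visitors=True)))
# ['192.0.2.1', '192.0.2.3']
```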
plugins.display.all_visits
--------------------------

    Display hook

    Create All visits page

    Plugin requirements :
        None

    Conf values needed :
        display_visitor_ip*

    Output files :
        OUTPUT_ROOT/year/month/all_visits.html
        OUTPUT_ROOT/year/month/index.html

    Statistics creation :
        None

    Statistics update :
        None

    Statistics deletion :
        None


plugins.display.browsers
------------------------

    Display hook

    Create browsers page

    Plugin requirements :
        post_analysis/browsers

    Conf values needed :
        max_browsers_displayed*
        create_browsers_page*

    Output files :
        OUTPUT_ROOT/year/month/browsers.html
        OUTPUT_ROOT/year/month/index.html

    Statistics creation :
        None

    Statistics update :
        None

    Statistics deletion :
        None


plugins.display.feeds
---------------------

    Display hook

    Display feeds parsers

    Plugin requirements :
        post_analysis/feeds

    Conf values needed :
        create_all_feeds_page*

    Output files :
        OUTPUT_ROOT/year/month/index.html
        OUTPUT_ROOT/year/month/all_feeds.html

    Statistics creation :
        None

    Statistics update :
        None

    Statistics deletion :
        None


plugins.display.filter_users
----------------------------

    Display hook

    Filter users

    Plugin requirements :
        None

    Conf values needed :
        create_filtered_page*

    Output files :
        OUTPUT_ROOT/year/month/index.html
        OUTPUT_ROOT/year/month/filtered_users.html

    Statistics creation :
        None

    Statistics update :
        None

    Statistics deletion :
        None


plugins.display.hours_stats
---------------------------

    Display hook

    Display statistics by hour/week day

    Plugin requirements :
        post_analysis/hours_stats

    Conf values needed :
        None

    Output files :
        OUTPUT_ROOT/year/month/index.html

    Statistics creation :
        None

    Statistics update :
        None

    Statistics deletion :
        None


plugins.display.ip_to_geo
-------------------------

    Display hook

    Add geo statistics

    Plugin requirements :
        post_analysis/ip_to_geo

    Conf values needed :
        create_geo_page*

    Output files :
        OUTPUT_ROOT/year/month/index.html

    Statistics creation :
        None

    Statistics update :
        None

    Statistics deletion :
        None


plugins.display.istats_diff
---------------------------

    Display hook interface

    Highlight new and updated statistics

    Plugin requirements :
        None

    Conf values needed :
        None

    Output files :
        None

    Statistics creation :
        None

    Statistics update :
        None

    Statistics deletion :
        None


plugins.display.operating_systems
---------------------------------

    Display hook

    Add operating systems statistics

    Plugin requirements :
        post_analysis/operating_systems

    Conf values needed :
        create_families_page*

    Output files :
        OUTPUT_ROOT/year/month/index.html

    Statistics creation :
        None

    Statistics update :
        None

    Statistics deletion :
        None


plugins.display.referers_diff
-----------------------------

    Display hook

    Highlight new and updated key phrases in all_key_phrases.html

    Plugin requirements :
        display/referers

    Conf values needed :
        None

    Output files :
        None

    Statistics creation :
        None

    Statistics update :
        None

    Statistics deletion :
        None


plugins.display.referers
------------------------

    Display hook

    Create Referers page

    Plugin requirements :
        post_analysis/referers

    Conf values needed :
        max_referers_displayed*
        create_all_referers_page*
        max_key_phrases_displayed*
        create_all_key_phrases_page*

    Output files :
        OUTPUT_ROOT/year/month/referers.html
        OUTPUT_ROOT/year/month/key_phrases.html
        OUTPUT_ROOT/year/month/index.html

    Statistics creation :
        None

    Statistics update :
        None

    Statistics deletion :
        None


plugins.display.robot_bandwidth
-------------------------------

    Display hook

    Display top 10 robot bandwidth use

    Plugin requirements :
        None

    Conf values needed :
        display_visitor_ip*
        create_all_robot_bandwidth_page*

    Output files :
        OUTPUT_ROOT/year/month/top_robots_bandwidth.html
        OUTPUT_ROOT/year/month/index.html

    Statistics creation :
        None

    Statistics update :
        None

    Statistics deletion :
        None


plugins.display.top_downloads_diff
----------------------------------

    Display hook

    Highlight new and updated downloads in top_downloads.html

    Plugin requirements :
        display/top_downloads

    Conf values needed :
        None

    Output files :
        None

    Statistics creation :
        None

    Statistics update :
        None

    Statistics deletion :
        None


plugins.display.top_downloads
-----------------------------

    Display hook

    Create TOP downloads page

    Plugin requirements :
        post_analysis/top_downloads

    Conf values needed :
        max_downloads_displayed*
        create_all_downloads_page*

    Output files :
        OUTPUT_ROOT/year/month/top_downloads.html
        OUTPUT_ROOT/year/month/index.html

    Statistics creation :
        None

    Statistics update :
        None

    Statistics deletion :
        None


plugins.display.top_hits
------------------------

    Display hook

    Create TOP hits page

    Plugin requirements :
        post_analysis/top_hits

    Conf values needed :
        max_hits_displayed*
        create_all_hits_page*

    Output files :
        OUTPUT_ROOT/year/month/top_hits.html
        OUTPUT_ROOT/year/month/index.html

    Statistics creation :
        None

    Statistics update :
        None

    Statistics deletion :
        None


plugins.display.top_pages_diff
------------------------------

    Display hook

    Highlight new and updated pages in top_pages.html

    Plugin requirements :
        display/top_pages

    Conf values needed :
        None

    Output files :
        None

    Statistics creation :
        None

    Statistics update :
        None

    Statistics deletion :
        None


plugins.display.top_pages
-------------------------

    Display hook

    Create TOP pages page

    Plugin requirements :
        post_analysis/top_pages

    Conf values needed :
        max_pages_displayed*
        create_all_pages_page*

    Output files :
        OUTPUT_ROOT/year/month/top_pages.html
        OUTPUT_ROOT/year/month/index.html

    Statistics creation :
        None

    Statistics update :
        None

    Statistics deletion :
        None


plugins.display.top_visitors
----------------------------

    Display hook

    Create TOP visitors block

    Plugin requirements :
        None

    Conf values needed :
        display_visitor_ip*

    Output files :
        OUTPUT_ROOT/year/month/index.html

    Statistics creation :
        None

    Statistics update :
        None

    Statistics deletion :
        None


plugins.display.track_users
---------------------------

    Display hook

    Track users

    Plugin requirements :
        None

    Conf values needed :
        tracked_ip
        create_tracked_page*

    Output files :
        OUTPUT_ROOT/year/month/index.html
        OUTPUT_ROOT/year/month/tracked_users.html

    Statistics creation :
        None

    Statistics update :
        None

    Statistics deletion :
        None


plugins.post_analysis.browsers
------------------------------

    Post analysis hook

    Detect browser information from requests

    Plugin requirements :
        None

    Conf values needed :
        None

    Output files :
        None

    Statistics creation :
        visits :
            remote_addr =>
                browser

        month_stats :
            browsers =>
                browser => count

    Statistics update :
        None

    Statistics deletion :
        None


plugins.post_analysis.feeds
---------------------------

    Post analysis hook

    Find feeds parsers (first hit in feeds conf value and no viewed pages if it's a robot)
    If there is only one hit per day to a feed, merge feeds parsers with the same user agent
    as it must be the same person with a different IP address.

    Plugin requirements :
        None

    Conf values needed :
        feeds
        feeds_referers*
        merge_one_hit_only_feeds_parsers*

    Output files :
        None

    Statistics creation :
        remote_addr =>
            feed_parser

    Statistics update :
        None

    Statistics deletion :
        None


plugins.post_analysis.filter_users
----------------------------------

    Post analysis hook

    Filter users with given user conditions

    Plugin requirements :
        None

    Conf values needed :
        filtered_users : list of filters
        filtered_ip : list of ip (string)
        create_filtered_page*

        A filter is a list of filter descriptions combined by the AND operator
        A filter description is a list of 3 elements :

        * Field to match in visits
        * Operator : '=', '==', '!=', '>', '>=', '<', '<=' for int values; '=', '==', '!=', 'in', 'match' for str values
        * Target value

        For easier configuration, you can use either 'remote_addr' or 'ip' as the field element

    Output files :
        None

    Statistics creation :
        visits :
            remote_addr =>
                filtered

    Statistics update :
        visits :
            remote_addr =>
                keep_requests

    Statistics deletion :
        None


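The filter format described above can be illustrated with a small evaluator. This is a sketch under stated assumptions, not the plugin's code: the function name is hypothetical, 'in' is assumed to mean "the target is a substring of the field value", 'match' is assumed to be a regular expression matched from the start of the value, and _viewed_pages_ is simplified to a plain integer.

```python
import operator
import re

# Operators for integer fields
INT_OPS = {'=': operator.eq, '==': operator.eq, '!=': operator.ne,
           '>': operator.gt, '>=': operator.ge,
           '<': operator.lt, '<=': operator.le}

# Operators for string fields
STR_OPS = {'=': operator.eq, '==': operator.eq, '!=': operator.ne,
           # Assumption: the target is a substring of the field value
           'in': lambda value, target: target in value,
           # Assumption: the target is a regex matched from the start of the value
           'match': lambda value, target: re.match(target, value) is not None}


def visitor_matches(visitor, filter_):
    """Return True if every [field, operator, target] description holds (AND)."""
    for field, op, target in filter_:
        value = visitor.get(field)
        if value is None:
            return False
        ops = INT_OPS if isinstance(value, int) else STR_OPS
        if op not in ops or not ops[op](value, target):
            return False
    return True


# Simplified visitor entry, for demonstration only
visitor = {'remote_addr': 'crawler.example.net', 'viewed_pages': 7}

print(visitor_matches(visitor, [['viewed_pages', '>=', 5],
                                ['remote_addr', 'in', 'example.net']]))  # True
print(visitor_matches(visitor, [['viewed_pages', '<', 5]]))              # False
```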
plugins.post_analysis.hours_stats
---------------------------------

    Post analysis hook

    Count pages, hits and bandwidth by hour/week day

    Plugin requirements :
        None

    Conf values needed :
        None

    Output files :
        None

    Statistics creation :
        month_stats:
            hours_stats =>
                00 .. 23 =>
                    pages
                    hits
                    bandwidth

            days_stats =>
                0 .. 6 =>
                    pages
                    hits
                    bandwidth

    Statistics update :
        None

    Statistics deletion :
        None


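The kind of aggregation listed above (pages, hits and bandwidth per hour and per week day) can be sketched like this. The request tuples and function names are hypothetical, and the week day numbering (0 for Monday, as returned by Python's `datetime.weekday()`) is an assumption; iwla may number days differently.

```python
from datetime import datetime


def new_bucket():
    return {'pages': 0, 'hits': 0, 'bandwidth': 0}


def count_by_hour_and_weekday(requests):
    """Aggregate (timestamp, is_page, size) tuples by hour and by week day."""
    hours = {hour: new_bucket() for hour in range(24)}
    days = {day: new_bucket() for day in range(7)}
    for when, is_page, size in requests:
        kind = 'pages' if is_page else 'hits'
        # Each request updates both its hour bucket and its week day bucket
        for bucket in (hours[when.hour], days[when.weekday()]):
            bucket[kind] += 1
            bucket['bandwidth'] += size
    return hours, days


requests = [
    (datetime(2024, 1, 1, 9, 30), True, 2048),    # a Monday
    (datetime(2024, 1, 1, 9, 45), False, 512),
    (datetime(2024, 1, 2, 23, 5), True, 1024),    # a Tuesday
]
hours, days = count_by_hour_and_weekday(requests)
print(hours[9])  # {'pages': 1, 'hits': 1, 'bandwidth': 2560}
print(days[1])   # {'pages': 1, 'hits': 0, 'bandwidth': 1024}
```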
plugins.post_analysis.ip_to_geo
-------------------------------

    Post analysis hook

    Get country code from IP address

    Plugin requirements :
        None

    Conf values needed :
        iptogeo_remote_addr*
        iptogeo_remote_port*

    Output files :
        None

    Statistics creation :
        geo =>
            country_code => count
        None

    Statistics update :
        valid_visitors:
            country_code

    Statistics deletion :
        None


plugins.post_analysis.iptogeo
-----------------------------



plugins.post_analysis.iptogeo.reset
-----------------------------------



plugins.post_analysis.operating_systems
---------------------------------------

    Post analysis hook

    Detect operating systems from requests

    Plugin requirements :
        None

    Conf values needed :
        None

    Output files :
        None

    Statistics creation :
        visits :
            remote_addr =>
                operating_system

        month_stats :
            operating_systems =>
                operating_system => count

            os_families =>
                family => count

    Statistics update :
        None

    Statistics deletion :
        None


plugins.post_analysis.referers
------------------------------

    Post analysis hook

    Extract referers and key phrases from requests

    Plugin requirements :
        None

    Conf values needed :
        domain_name

    Output files :
        None

    Statistics creation :
        None

    Statistics update :
        month_stats :
            referers =>
                pages => count
                hits => count
            robots_referers =>
                pages => count
                hits => count
            search_engine_referers =>
                pages => count
                hits => count
            key_phrases =>
                phrase => count

    Statistics deletion :
        None


plugins.post_analysis.reverse_dns
---------------------------------

    Post analysis hook

    Replace IP by reverse DNS names

    Plugin requirements :
        None

    Conf values needed :
        reverse_dns_timeout*

    Output files :
        None

    Statistics creation :
        None

    Statistics update :
        valid_visitors:
            remote_addr
            dns_name_replaced
            dns_analyzed

    Statistics deletion :
        None


plugins.post_analysis.top_downloads
-----------------------------------

    Post analysis hook

    Count TOP downloads

    Plugin requirements :
        None

    Conf values needed :
        None

    Output files :
        None

    Statistics creation :
        None

    Statistics update :
        month_stats:
            top_downloads =>
                uri => count

    Statistics deletion :
        None


plugins.post_analysis.top_hits
------------------------------

    Post analysis hook

    Count TOP hits

    Plugin requirements :
        None

    Conf values needed :
        None

    Output files :
        None

    Statistics creation :
        None

    Statistics update :
        month_stats:
            top_hits =>
                uri => count

    Statistics deletion :
        None


plugins.post_analysis.top_pages
-------------------------------

    Post analysis hook

    Count TOP pages

    Plugin requirements :
        None

    Conf values needed :
        None

    Output files :
        None

    Statistics creation :
        None

    Statistics update :
        month_stats:
            top_pages =>
                uri => count

    Statistics deletion :
        None


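The "uri => count" statistic maintained by the top_* plugins is essentially a counter keyed by URI. A minimal sketch, assuming a simplified request shape (the flat 'uri' and 'is_page' keys are chosen for this example and do not mirror iwla's exact request layout):

```python
from collections import Counter


def count_top_pages(requests, limit=3):
    """Count page requests per URI and return the most requested ones."""
    counts = Counter(req['uri'] for req in requests if req.get('is_page'))
    return counts.most_common(limit)


# Hypothetical, simplified requests
requests = [
    {'uri': '/index.html', 'is_page': True},
    {'uri': '/about.html', 'is_page': True},
    {'uri': '/index.html', 'is_page': True},
    {'uri': '/logo.png', 'is_page': False},   # a hit, not a page: ignored
]

print(count_top_pages(requests))
# [('/index.html', 2), ('/about.html', 1)]
```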
plugins.pre_analysis.page_to_hit
--------------------------------

    Pre analysis hook
    Change pages into hits and hits into pages in statistics

    Plugin requirements :
        None

    Conf values needed :
        page_to_hit_conf*
        hit_to_page_conf*

    Output files :
        None

    Statistics creation :
        None

    Statistics update :
        visits :
            remote_addr =>
                is_page

    Statistics deletion :
        None


plugins.pre_analysis.robots
---------------------------

    Pre analysis hook

    Filter robots

    Plugin requirements :
        None

    Conf values needed :
        None

    Output files :
        None

    Statistics creation :
        None

    Statistics update :
        visits :
            remote_addr =>
                robot
                keep_requests

    Statistics deletion :
        None