iwla
====

Introduction
------------

iwla (Intelligent Web Log Analyzer) is essentially a clone of [awstats](http://www.awstats.org). The main problem with awstats is that it is a very monolithic project, with everything in one big Perl file. In contrast, iwla has been designed to be very modular: a small analysis core and a lot of filters. It can be viewed as a chain of UNIX pipes. The philosophy of iwla is: add, update, delete! That is the job of each filter: modify the statistics until the final result is reached. It is written in Python.

However, iwla focuses only on HTTP logs. It uses data (robot definitions, search engine definitions) and design from awstats. Moreover, it is not dynamic: it only generates static HTML pages (with an optional gzip compression).

Usage
-----

    ./iwla [-c|--clean-output] [-i|--stdin] [-f FILE|--file FILE] [-d LOGLEVEL|--log-level LOGLEVEL] [-r|--reset year/month] [-z|--dont-compress] [-p] [-D|--dry-run]

    -c : Clean output (database and HTML) before starting
    -i : Read data from stdin instead of conf.analyzed_filename
    -f : Analyse this log file instead of conf.analyzed_filename; multiple files can be specified (comma separated), gz files are accepted
    -d : Log level in ['DEBUG', 'INFO', 'WARNING', 'ERROR', 'CRITICAL']
    -r : Reset analysis to a specific date (month/year)
    -z : Don't compress databases (bigger but faster, not compatible with compressed databases)
    -p : Only generate display
    -D : Dry run (don't write/update files to disk)

Basic usage
-----------

In addition to the command line, iwla reads its parameters from default_conf.py. Users can override the default values in a _conf.py_ file. Each module requires its own parameters.

Main values to edit are :

 * **analyzed_filename** : web server log
 * **domain_name** : domain name to filter
 * **pre_analysis_hooks** : List of pre analysis hooks
 * **post_analysis_hooks** : List of post analysis hooks
 * **display_hooks** : List of display hooks
 * **locale** : Displayed locale (_en_ or _fr_)

You can also append an element to an existing default configuration list by using the "_append" suffix. Example :

    multimedia_files_append = ['xml']

or

    multimedia_files_append = 'xml'

Both forms append 'xml' to the current multimedia_files list.

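
The effect of the "_append" suffix can be pictured with a small Python sketch. This is not iwla's actual loading code: the helper _apply_append_overrides_ and the sample default values are hypothetical, and only illustrate that a list and a single value both end up appended to the existing default list.

```python
def apply_append_overrides(defaults, user_conf):
    """Return a copy of defaults where every 'NAME_append' entry of
    user_conf has been appended to the list stored under 'NAME'.

    Hypothetical helper: it only mimics the behaviour described above.
    """
    merged = dict(defaults)
    for key, value in user_conf.items():
        if key.endswith('_append'):
            target = key[:-len('_append')]
            # Accept both a single element and a list of elements.
            extra = value if isinstance(value, list) else [value]
            merged[target] = list(merged.get(target, [])) + extra
        else:
            # Plain keys simply override the default value.
            merged[key] = value
    return merged


# Sample default value, chosen for the illustration only.
defaults = {'multimedia_files': ['png', 'jpg']}

print(apply_append_overrides(defaults, {'multimedia_files_append': ['xml']}))
print(apply_append_overrides(defaults, {'multimedia_files_append': 'xml'}))
```
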
Then, you can launch iwla. Output HTML files are created in the _output_ directory by default. To quickly see the result, go into _output_ and type

    python -m SimpleHTTPServer 8000

This command is for Python 2; with Python 3, the equivalent is _python3 -m http.server 8000_. Open your favorite web browser at _http://localhost:8000_. Enjoy !

**Warning** : The order of the hooks lists is important : some plugins may require other plugins, and the order of display_hooks is the order of the displayed blocks in the final result.


Interesting default configuration values
----------------------------------------

 * **DB_ROOT** : Default database directory (default _./output_db_)
 * **DISPLAY_ROOT** : Default HTML output directory (default _./output_)
 * **log_format** : Web server log format (nginx style). Default is apache log format
 * **time_format** : Time format used in log format
 * **pages_extensions** : Extensions that are considered as an HTML page (or result), as opposed to hits
 * **viewed_http_codes** : HTTP codes that are considered OK (200, 304)
 * **count_hit_only_visitors** : If False, don't count visitors that don't GET a page but resources only (images, rss...)
 * **multimedia_files** : Multimedia extensions (not accounted as downloaded files)
 * **css_path** : CSS path (you can add yours)
 * **compress_output_files** : File extensions to compress with gzip during display build

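
To make the role of log_format and time_format more concrete, here is a minimal Python 3 sketch of how an nginx style format string could be turned into a regular expression with one named group per variable. The format values, the sample log line and the helper _format_to_regex_ are assumptions made for the example; they are not taken from default_conf.py or from iwla's parser.

```python
import re
from datetime import datetime

# Example values for the illustration; check default_conf.py for the real defaults.
log_format = '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent'
time_format = '%d/%b/%Y:%H:%M:%S %z'


def format_to_regex(fmt):
    """Turn an nginx style format string into a regex with one named group per $variable."""
    pattern = ''
    # re.split with a capturing group keeps the variable names at odd indexes.
    for index, part in enumerate(re.split(r'\$(\w+)', fmt)):
        if index % 2:
            pattern += '(?P<%s>.+?)' % part
        else:
            pattern += re.escape(part)
    return re.compile('^' + pattern + '$')


line = '192.0.2.1 - - [10/Oct/2020:13:55:36 +0200] "GET /index.html HTTP/1.1" 200 2326'
fields = format_to_regex(log_format).match(line).groupdict()
# time_format tells how to decode the extracted time field.
when = datetime.strptime(fields['time_local'], time_format)

print(fields['remote_addr'], fields['request'], fields['status'], when.isoformat())
```
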
Plugins
-------

As previously described, plugins act like UNIX pipes : statistics are constantly updated by each plugin to produce the final result. There are three types of plugins :

 * **Pre analysis plugins** : Called before generating days statistics. They are in charge of filtering robots, crawlers, bad pages...
 * **Post analysis plugins** : Called after basic statistics computation. They are in charge of enriching them with their own algorithms
 * **Display plugins** : They are in charge of producing HTML files from statistics.

To use plugins, just insert their file names (without the _.py_ extension) in the _pre_analysis_hooks_, _post_analysis_hooks_ and _display_hooks_ lists in conf.py.

Statistics are stored in dictionaries :

 * **month_stats** : Statistics of the month currently analysed
 * **valid_visitors** : A subset of month_stats without robots
 * **days_stats** : Statistics of the day currently analysed
 * **visits** : All visitors with all of their requests
 * **meta** : Final result of month statistics (by year)

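
The pipe idea can be sketched in a few lines of Python: every hook receives the same dictionaries and modifies them in place, so each one sees what the previous ones produced. The sketch is deliberately simplified and does not reflect iwla's real classes: hooks are plain functions here, and the _user_agent_ field, the robot rule and the _output_ key are hypothetical.

```python
def run_hooks(hooks, stats):
    """Call each hook in order; every hook modifies the shared statistics in place."""
    for hook in hooks:
        hook(stats)
    return stats


def filter_robots(stats):
    # Pre analysis: flag visitors that look like robots (toy rule for the example).
    for visitor in stats['visits'].values():
        visitor['robot'] = 'bot' in visitor['user_agent'].lower()


def count_valid_visitors(stats):
    # Post analysis: enrich month_stats using what the previous hook computed.
    valid = [v for v in stats['visits'].values() if not v['robot']]
    stats['month_stats']['nb_visitors'] = len(valid)


def display_summary(stats):
    # Display: produce output from the final statistics (a string instead of HTML here).
    stats['output'] = '%d visitors' % stats['month_stats']['nb_visitors']


stats = {
    'visits': {
        '192.0.2.1': {'user_agent': 'Mozilla/5.0'},
        '192.0.2.2': {'user_agent': 'ExampleBot/1.0'},
    },
    'month_stats': {},
}
# The order matters: each hook relies on the work of the previous one.
run_hooks([filter_robots, count_valid_visitors, display_summary], stats)
print(stats['output'])
```
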
Create a plugin
---------------

To create a new plugin, subclass IPlugin (_iplugin.py_) in a file placed in the right directory (_plugins/xxx/yourPlugin.py_).

Plugins can define required configuration values (self.conf_requires) that must be set in conf.py (or can be optional). They can also define required plugins (self.requires).

There are two functions to overload. The first is _load(self)_, which must return True if all is good and False otherwise. It is called after _init_. The second is _hook(self)_, which is the body of the plugin.

For display plugins, a lot of code has been written in _display.py_ to simplify the creation of HTML blocks, tables and bar graphs.

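
As an illustration of the description above, the following Python sketch shows the general shape of a plugin. To stay self-contained it defines its own simplified stand-ins for IPlugin and for the core object, so everything except the names _requires_, _conf_requires_, _load_ and _hook_ is an assumption: the class _FakeIWLA_, the _summary_size_ configuration value and the way statistics are reached are all hypothetical. A real plugin would import IPlugin from _iplugin.py_ and follow its actual interface.

```python
class IPlugin(object):
    """Simplified stand-in for the real class from iplugin.py (assumption)."""

    def __init__(self, iwla):
        self.iwla = iwla
        self.requires = []        # plugins that must be loaded as well
        self.conf_requires = []   # configuration values that must be set

    def load(self):
        return True

    def hook(self):
        pass


class FakeIWLA(object):
    """Hypothetical stand-in for the core object handed to plugins."""

    def __init__(self, conf, month_stats):
        self.conf = conf
        self.month_stats = month_stats


class TopPagesSummary(IPlugin):
    """Hypothetical plugin keeping only the most viewed pages."""

    def __init__(self, iwla):
        super(TopPagesSummary, self).__init__(iwla)
        self.requires = ['post_analysis/top_pages']
        self.conf_requires = ['summary_size']  # hypothetical conf value

    def load(self):
        # Called after __init__: return False when the plugin cannot run.
        self.limit = self.iwla.conf.get('summary_size', 0)
        return self.limit > 0

    def hook(self):
        # Body of the plugin: read statistics and derive a result from them.
        top_pages = self.iwla.month_stats.get('top_pages', {})
        ranked = sorted(top_pages.items(), key=lambda item: item[1], reverse=True)
        self.summary = ranked[:self.limit]


core = FakeIWLA({'summary_size': 2},
                {'top_pages': {'/': 10, '/about': 3, '/blog': 7}})
plugin = TopPagesSummary(core)
if plugin.load():
    plugin.hook()
    print(plugin.summary)
```
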
Plugins
=======

Optional configuration values end with *.

 * iwla.py
 * plugins/display/all_visits.py
 * plugins/display/browsers.py
 * plugins/display/feeds.py
 * plugins/display/filter_users.py
 * plugins/display/hours_stats.py
 * plugins/display/ip_to_geo.py
 * plugins/display/istats_diff.py
 * plugins/display/operating_systems.py
 * plugins/display/referers_diff.py
 * plugins/display/referers.py
 * plugins/display/robot_bandwidth.py
 * plugins/display/top_downloads_diff.py
 * plugins/display/top_downloads.py
 * plugins/display/top_hits.py
 * plugins/display/top_pages_diff.py
 * plugins/display/top_pages.py
 * plugins/display/top_visitors.py
 * plugins/display/track_users.py
 * plugins/post_analysis/browsers.py
 * plugins/post_analysis/feeds.py
 * plugins/post_analysis/hours_stats.py
 * plugins/post_analysis/ip_to_geo.py
 * plugins/post_analysis/iptogeo.py
 * plugins/post_analysis/iptogeo.reset.py
 * plugins/post_analysis/operating_systems.py
 * plugins/post_analysis/referers.py
 * plugins/post_analysis/reverse_dns.py
 * plugins/post_analysis/top_downloads.py
 * plugins/post_analysis/top_hits.py
 * plugins/post_analysis/top_pages.py
 * plugins/pre_analysis/page_to_hit.py
 * plugins/pre_analysis/robots.py


iwla
----

    Main class IWLA
    Parse logs, compute statistics, call plugins and produce output
    For now, only HTTP logs are valid

    Plugin requirements :
        None

    Conf values needed :
        analyzed_filename
        domain_name
        locales_path
        compress_output_files
        excluded_ip

    Output files :
        DB_ROOT/meta.db
        DB_ROOT/year/month/iwla.db
        OUTPUT_ROOT/index.html
        OUTPUT_ROOT/year/_stats.html
        OUTPUT_ROOT/year/month/index.html

    Statistics creation :

        meta :
            last_time
            start_analysis_time
            stats =>
                year =>
                    month =>
                        viewed_bandwidth
                        not_viewed_bandwidth
                        viewed_pages
                        viewed_hits
                        nb_visits
                        nb_visitors

        month_stats :
            viewed_bandwidth
            not_viewed_bandwidth
            viewed_pages
            viewed_hits
            nb_visits

        days_stats :
            day =>
                viewed_bandwidth
                not_viewed_bandwidth
                viewed_pages
                viewed_hits
                nb_visits
                nb_visitors

        visits :
            remote_addr =>
                remote_addr
                remote_ip
                viewed_pages{0..31} # 0 contains total
                viewed_hits{0..31} # 0 contains total
                not_viewed_pages{0..31}
                not_viewed_hits{0..31}
                bandwidth{0..31}
                last_access
                requests =>
                    [fields_from_format_log]
                    extract_request =>
                        http_method
                        http_uri
                        http_version
                        extract_uri
                        extract_parameters*
                    extract_referer* =>
                        extract_uri
                        extract_parameters*
                robot
                hit_only
                is_page

        valid_visitors:
            month_stats without robot and hit only visitors (if not conf.count_hit_only_visitors)

    Statistics update :
        None

    Statistics deletion :
        None


plugins.display.all_visits
--------------------------

    Display hook

    Create All visits page

    Plugin requirements :
        None

    Conf values needed :
        display_visitor_ip*

    Output files :
        OUTPUT_ROOT/year/month/all_visits.html
        OUTPUT_ROOT/year/month/index.html

    Statistics creation :
        None

    Statistics update :
        None

    Statistics deletion :
        None


plugins.display.browsers
------------------------

    Display hook

    Create browsers page

    Plugin requirements :
        post_analysis/browsers

    Conf values needed :
        max_browsers_displayed*
        create_browsers_page*

    Output files :
        OUTPUT_ROOT/year/month/browsers.html
        OUTPUT_ROOT/year/month/index.html

    Statistics creation :
        None

    Statistics update :
        None

    Statistics deletion :
        None


plugins.display.feeds
---------------------

    Display hook

    Display feeds parsers

    Plugin requirements :
        post_analysis/feeds

    Conf values needed :
        create_all_feeds_page*

    Output files :
        OUTPUT_ROOT/year/month/index.html
        OUTPUT_ROOT/year/month/all_feeds.html

    Statistics creation :
        None

    Statistics update :
        None

    Statistics deletion :
        None


plugins.display.filter_users
----------------------------

    Display hook

    Filter users

    Plugin requirements :
        None

    Conf values needed :
        filtered_users : list of filters
        filtered_ip : list of ip (string)
        create_filtered_page*

    A filter is a list of filter descriptions combined with the AND operator
    A filter description is a list of 3 elements :

    * Field to match in visits
    * Operator : '=', '==', '!=', '>', '>=', '<', '<=' for int values, or '=', '==', '!=', 'in', 'match' for str values
    * Target value

    To ease configuration, either 'remote_addr' or 'ip' can be used in the field element

    Output files :
        OUTPUT_ROOT/year/month/index.html
        OUTPUT_ROOT/year/month/filtered_users.html

    Statistics creation :
        None

    Statistics update :
        None

    Statistics deletion :
        None


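
The filter format described for _filtered_users_ can be illustrated with a small Python sketch. It is not the plugin's code and rests on several assumptions: 'in' is read as a substring test, 'match' as a regular expression match, a visitor is filtered as soon as one filter of the list matches, and visitors are reduced to flat records whose _total_hits_ field is hypothetical.

```python
import operator
import re

INT_OPERATORS = {
    '=': operator.eq, '==': operator.eq, '!=': operator.ne,
    '>': operator.gt, '>=': operator.ge, '<': operator.lt, '<=': operator.le,
}
STR_OPERATORS = {
    '=': operator.eq, '==': operator.eq, '!=': operator.ne,
    # Assumed semantics: substring test and regular expression match.
    'in': lambda value, target: target in value,
    'match': lambda value, target: re.match(target, value) is not None,
}


def description_matches(visitor, description):
    """Evaluate one [field, operator, target] description against a visitor."""
    field, op, target = description
    value = visitor[field]
    # bool is excluded so that flags are not silently compared as integers.
    if isinstance(value, int) and not isinstance(value, bool):
        return INT_OPERATORS[op](value, target)
    return STR_OPERATORS[op](str(value), target)


def is_filtered(visitor, filtered_users):
    """True if at least one filter has all of its descriptions matching (AND)."""
    return any(all(description_matches(visitor, d) for d in one_filter)
               for one_filter in filtered_users)


filtered_users = [
    # Both descriptions of this filter must match.
    [['remote_addr', 'match', r'.*\.example\.com$'], ['total_hits', '>=', 100]],
    [['remote_addr', '==', '192.0.2.7']],
]
print(is_filtered({'remote_addr': 'crawler.example.com', 'total_hits': 120}, filtered_users))
print(is_filtered({'remote_addr': 'crawler.example.com', 'total_hits': 5}, filtered_users))
```
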
plugins.display.hours_stats
---------------------------

    Display hook

    Display statistics by hour/week day

    Plugin requirements :
        post_analysis/hours_stats

    Conf values needed :
        None

    Output files :
        OUTPUT_ROOT/year/month/index.html

    Statistics creation :
        None

    Statistics update :
        None

    Statistics deletion :
        None


plugins.display.ip_to_geo
-------------------------

    Display hook

    Add geo statistics

    Plugin requirements :
        post_analysis/ip_to_geo

    Conf values needed :
        create_geo_page*

    Output files :
        OUTPUT_ROOT/year/month/index.html

    Statistics creation :
        None

    Statistics update :
        None

    Statistics deletion :
        None


plugins.display.istats_diff
---------------------------

    Display hook interface

    Highlight new and updated statistics

    Plugin requirements :
        None

    Conf values needed :
        None

    Output files :
        None

    Statistics creation :
        None

    Statistics update :
        None

    Statistics deletion :
        None


plugins.display.operating_systems
---------------------------------

    Display hook

    Add operating systems statistics

    Plugin requirements :
        post_analysis/operating_systems

    Conf values needed :
        create_families_page*

    Output files :
        OUTPUT_ROOT/year/month/index.html

    Statistics creation :
        None

    Statistics update :
        None

    Statistics deletion :
        None


plugins.display.referers_diff
-----------------------------

    Display hook

    Highlight new and updated key phrases in all_key_phrases.html

    Plugin requirements :
        display/referers

    Conf values needed :
        None

    Output files :
        None

    Statistics creation :
        None

    Statistics update :
        None

    Statistics deletion :
        None


plugins.display.referers
------------------------

    Display hook

    Create Referers page

    Plugin requirements :
        post_analysis/referers

    Conf values needed :
        max_referers_displayed*
        create_all_referers_page*
        max_key_phrases_displayed*
        create_all_key_phrases_page*

    Output files :
        OUTPUT_ROOT/year/month/referers.html
        OUTPUT_ROOT/year/month/key_phrases.html
        OUTPUT_ROOT/year/month/index.html

    Statistics creation :
        None

    Statistics update :
        None

    Statistics deletion :
        None


plugins.display.robot_bandwidth
-------------------------------

    Display hook

    Display top 10 robot bandwidth use

    Plugin requirements :
        None

    Conf values needed :
        display_visitor_ip*
        create_all_robot_bandwidth_page*

    Output files :
        OUTPUT_ROOT/year/month/top_robots_bandwidth.html
        OUTPUT_ROOT/year/month/index.html

    Statistics creation :
        None

    Statistics update :
        None

    Statistics deletion :
        None


plugins.display.top_downloads_diff
----------------------------------

    Display hook

    Highlight new and updated downloads in top_downloads.html

    Plugin requirements :
        display/top_downloads

    Conf values needed :
        None

    Output files :
        None

    Statistics creation :
        None

    Statistics update :
        None

    Statistics deletion :
        None


plugins.display.top_downloads
-----------------------------

    Display hook

    Create TOP downloads page

    Plugin requirements :
        post_analysis/top_downloads

    Conf values needed :
        max_downloads_displayed*
        create_all_downloads_page*

    Output files :
        OUTPUT_ROOT/year/month/top_downloads.html
        OUTPUT_ROOT/year/month/index.html

    Statistics creation :
        None

    Statistics update :
        None

    Statistics deletion :
        None


plugins.display.top_hits
------------------------

    Display hook

    Create TOP hits page

    Plugin requirements :
        post_analysis/top_hits

    Conf values needed :
        max_hits_displayed*
        create_all_hits_page*

    Output files :
        OUTPUT_ROOT/year/month/top_hits.html
        OUTPUT_ROOT/year/month/index.html

    Statistics creation :
        None

    Statistics update :
        None

    Statistics deletion :
        None


plugins.display.top_pages_diff
------------------------------

    Display hook

    Highlight new and updated pages in top_pages.html

    Plugin requirements :
        display/top_pages

    Conf values needed :
        None

    Output files :
        None

    Statistics creation :
        None

    Statistics update :
        None

    Statistics deletion :
        None


plugins.display.top_pages
-------------------------

    Display hook

    Create TOP pages page

    Plugin requirements :
        post_analysis/top_pages

    Conf values needed :
        max_pages_displayed*
        create_all_pages_page*

    Output files :
        OUTPUT_ROOT/year/month/top_pages.html
        OUTPUT_ROOT/year/month/index.html

    Statistics creation :
        None

    Statistics update :
        None

    Statistics deletion :
        None


plugins.display.top_visitors
----------------------------

    Display hook

    Create TOP visitors block

    Plugin requirements :
        None

    Conf values needed :
        display_visitor_ip*

    Output files :
        OUTPUT_ROOT/year/month/index.html

    Statistics creation :
        None

    Statistics update :
        None

    Statistics deletion :
        None


plugins.display.track_users
---------------------------

    Display hook

    Track users

    Plugin requirements :
        None

    Conf values needed :
        tracked_ip
        create_tracked_page*

    Output files :
        OUTPUT_ROOT/year/month/index.html
        OUTPUT_ROOT/year/month/tracked_users.html

    Statistics creation :
        None

    Statistics update :
        None

    Statistics deletion :
        None


plugins.post_analysis.browsers
------------------------------

    Post analysis hook

    Detect browser information from requests

    Plugin requirements :
        None

    Conf values needed :
        None

    Output files :
        None

    Statistics creation :
        visits :
            remote_addr =>
                browser

        month_stats :
            browsers =>
                browser => count

    Statistics update :
        None

    Statistics deletion :
        None


plugins.post_analysis.feeds
---------------------------

    Post analysis hook

    Find feeds parsers (first hit in feeds conf value and no viewed pages if it's a robot)
    If there is only one hit per day to a feed, merge feeds parsers with the same user agent
    as it must be the same person with a different IP address.

    Plugin requirements :
        None

    Conf values needed :
        feeds
        feeds_referers*
        merge_one_hit_only_feeds_parsers*

    Output files :
        None

    Statistics creation :
        remote_addr =>
            feed_parser

    Statistics update :
        None

    Statistics deletion :
        None


plugins.post_analysis.hours_stats
---------------------------------

    Post analysis hook

    Count pages, hits and bandwidth by hour/week day

    Plugin requirements :
        None

    Conf values needed :
        None

    Output files :
        None

    Statistics creation :
        month_stats:
            hours_stats =>
                00 .. 23 =>
                    pages
                    hits
                    bandwidth

            days_stats =>
                0 .. 6 =>
                    pages
                    hits
                    bandwidth

    Statistics update :
        None

    Statistics deletion :
        None


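
The aggregation performed by this kind of hook can be sketched as follows in Python. It is a simplified illustration rather than the plugin's implementation: the request fields _time_ and _size_ are hypothetical names, and week days follow Python's convention (0 is Monday), which may differ from the one used by iwla.

```python
from datetime import datetime


def empty_counters():
    return {'pages': 0, 'hits': 0, 'bandwidth': 0}


def count_by_hour_and_week_day(requests):
    """Aggregate pages, hits and bandwidth per hour (0..23) and week day (0..6)."""
    hours_stats = {hour: empty_counters() for hour in range(24)}
    days_stats = {day: empty_counters() for day in range(7)}
    for request in requests:
        when = request['time']
        key = 'pages' if request['is_page'] else 'hits'
        # The same request feeds both the hour bucket and the week day bucket.
        for counters in (hours_stats[when.hour], days_stats[when.weekday()]):
            counters[key] += 1
            counters['bandwidth'] += request['size']
    return hours_stats, days_stats


requests = [
    {'time': datetime(2020, 10, 10, 13, 55), 'is_page': True, 'size': 2326},
    {'time': datetime(2020, 10, 10, 13, 56), 'is_page': False, 'size': 512},
    {'time': datetime(2020, 10, 12, 8, 0), 'is_page': True, 'size': 100},
]
hours_stats, days_stats = count_by_hour_and_week_day(requests)
print(hours_stats[13])
print(days_stats[0])
```
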
plugins.post_analysis.ip_to_geo
-------------------------------

    Post analysis hook

    Get country code from IP address

    Plugin requirements :
        None

    Conf values needed :
        iptogeo_remote_addr*
        iptogeo_remote_port*

    Output files :
        None

    Statistics creation :
        geo =>
            country_code => count

    Statistics update :
        valid_visitors:
            country_code

    Statistics deletion :
        None


plugins.post_analysis.iptogeo
-----------------------------



plugins.post_analysis.iptogeo.reset
-----------------------------------



plugins.post_analysis.operating_systems
---------------------------------------

    Post analysis hook

    Detect operating systems from requests

    Plugin requirements :
        None

    Conf values needed :
        None

    Output files :
        None

    Statistics creation :
        visits :
            remote_addr =>
                operating_system

        month_stats :
            operating_systems =>
                operating_system => count

            os_families =>
                family => count

    Statistics update :
        None

    Statistics deletion :
        None


plugins.post_analysis.referers
------------------------------

    Post analysis hook

    Extract referers and key phrases from requests

    Plugin requirements :
        None

    Conf values needed :
        domain_name

    Output files :
        None

    Statistics creation :
        None

    Statistics update :
        month_stats :
            referers =>
                pages => count
                hits => count
            robots_referers =>
                pages => count
                hits => count
            search_engine_referers =>
                pages => count
                hits => count
            key_phrases =>
                phrase => count

    Statistics deletion :
        None


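
As a rough illustration of what extracting referers and key phrases involves, the Python 3 sketch below classifies a few referer URLs and counts the search phrases they carry. It is built on assumptions: the _SEARCH_ENGINES_ table is hypothetical (iwla takes its search engine definitions from awstats), and using _domain_name_ to skip internal referers is only a plausible reading of why the plugin needs that value.

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs

# Hypothetical table: query parameter carrying the search terms for each engine.
SEARCH_ENGINES = {'www.google.com': 'q', 'duckduckgo.com': 'q', 'www.bing.com': 'q'}


def classify_referer(referer, domain_name):
    """Return (kind, key_phrase), kind being 'internal', 'search_engine' or 'referer'."""
    parsed = urlparse(referer)
    host = parsed.netloc.lower()
    if host == domain_name or host.endswith('.' + domain_name):
        return 'internal', None
    if host in SEARCH_ENGINES:
        # parse_qs decodes the query string, including '+' used for spaces.
        phrases = parse_qs(parsed.query).get(SEARCH_ENGINES[host], [])
        return 'search_engine', (phrases[0] if phrases else None)
    return 'referer', None


key_phrases = Counter()
referers = Counter()
for url in ['https://www.google.com/search?q=web+log+analyzer',
            'https://duckduckgo.com/?q=web+log+analyzer',
            'https://blog.example.org/post',
            'https://www.example.com/index.html']:
    kind, phrase = classify_referer(url, 'example.com')
    if kind == 'search_engine' and phrase:
        key_phrases[phrase] += 1
    elif kind == 'referer':
        referers[url] += 1

print(dict(key_phrases))
print(dict(referers))
```
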
plugins.post_analysis.reverse_dns
---------------------------------

    Post analysis hook

    Replace IP by reverse DNS names

    Plugin requirements :
        None

    Conf values needed :
        reverse_dns_timeout*

    Output files :
        None

    Statistics creation :
        None

    Statistics update :
        valid_visitors:
            remote_addr
            dns_name_replaced
            dns_analyzed

    Statistics deletion :
        None


plugins.post_analysis.top_downloads
-----------------------------------

    Post analysis hook

    Count TOP downloads

    Plugin requirements :
        None

    Conf values needed :
        None

    Output files :
        None

    Statistics creation :
        None

    Statistics update :
        month_stats:
            top_downloads =>
                uri => count

    Statistics deletion :
        None


plugins.post_analysis.top_hits
------------------------------

    Post analysis hook

    Count TOP hits

    Plugin requirements :
        None

    Conf values needed :
        None

    Output files :
        None

    Statistics creation :
        None

    Statistics update :
        month_stats:
            top_hits =>
                uri => count

    Statistics deletion :
        None


plugins.post_analysis.top_pages
-------------------------------

    Post analysis hook

    Count TOP pages

    Plugin requirements :
        None

    Conf values needed :
        None

    Output files :
        None

    Statistics creation :
        None

    Statistics update :
        month_stats:
            top_pages =>
                uri => count

    Statistics deletion :
        None


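
The "uri => count" update listed above is an incremental count: existing totals are kept and the requests of the current run are added to them. A minimal Python sketch of that idea follows; the flat _uri_ field and the choice to skip robots are assumptions made for the example, not a description of the plugin's code.

```python
from collections import Counter


def update_top_pages(top_pages, visits):
    """Return top_pages (uri => count) updated with the page requests found in visits."""
    counts = Counter(top_pages)
    for visitor in visits.values():
        if visitor['robot']:
            continue  # assumption: robots are not counted
        for request in visitor['requests']:
            if request['is_page']:
                counts[request['uri']] += 1
    return dict(counts)


month_stats = {'top_pages': {'/': 3}}
visits = {
    '192.0.2.1': {'robot': False, 'requests': [
        {'uri': '/', 'is_page': True},
        {'uri': '/style.css', 'is_page': False},
        {'uri': '/about', 'is_page': True},
    ]},
    '192.0.2.2': {'robot': True, 'requests': [{'uri': '/', 'is_page': True}]},
}
month_stats['top_pages'] = update_top_pages(month_stats['top_pages'], visits)
print(month_stats['top_pages'])
```
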
plugins.pre_analysis.page_to_hit
--------------------------------

    Pre analysis hook

    Change pages into hits and hits into pages in statistics

    Plugin requirements :
        None

    Conf values needed :
        page_to_hit_conf*
        hit_to_page_conf*

    Output files :
        None

    Statistics creation :
        None

    Statistics update :
        visits :
            remote_addr =>
                is_page

    Statistics deletion :
        None


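
One way to picture this hook is a pass over the requests that flips their _is_page_ flag when the URI matches a configured pattern. The Python sketch below assumes that _page_to_hit_conf_ and _hit_to_page_conf_ are lists of regular expressions matched against the URI; that format, the sample expressions and the flat _uri_ field are assumptions, so check the plugin source for the real behaviour.

```python
import re

# Assumed format: lists of regular expressions matched against the request URI.
page_to_hit_conf = [r'^.+/logo/?$']
hit_to_page_conf = [r'^.+/category/.+$']


def reclassify(requests, page_to_hit, hit_to_page):
    """Flip the is_page flag of requests whose URI matches one of the expressions."""
    to_hit = [re.compile(expr) for expr in page_to_hit]
    to_page = [re.compile(expr) for expr in hit_to_page]
    for request in requests:
        uri = request['uri']
        if request['is_page'] and any(r.match(uri) for r in to_hit):
            request['is_page'] = False
        elif not request['is_page'] and any(r.match(uri) for r in to_page):
            request['is_page'] = True
    return requests


requests = [
    {'uri': '/images/logo/', 'is_page': True},
    {'uri': '/blog/category/python', 'is_page': False},
    {'uri': '/index.html', 'is_page': True},
]
reclassify(requests, page_to_hit_conf, hit_to_page_conf)
print([request['is_page'] for request in requests])
```
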
plugins.pre_analysis.robots
---------------------------

    Pre analysis hook

    Filter robots

    Plugin requirements :
        None

    Conf values needed :
        page_to_hit_conf*
        hit_to_page_conf*

    Output files :
        None

    Statistics creation :
        None

    Statistics update :
        visits :
            remote_addr =>
                robot

    Statistics deletion :
        None

