iwla
====

Introduction
------------

iwla (Intelligent Web Log Analyzer) is essentially a clone of [awstats](http://www.awstats.org). The main problem with awstats is that it is a very monolithic project, with everything in one big Perl file. In contrast, iwla has been designed to be very modular: a small analysis core and a lot of filters. It can be viewed as a chain of UNIX pipes. The philosophy of iwla is: add, update, delete! That is the job of each filter: modify the statistics until the final result is reached. iwla is written in Python.

However, iwla only handles HTTP logs. It uses data (robot definitions, search engine definitions) and design from awstats. Moreover, it is not dynamic: it only generates static HTML pages (with an optional gzip compression).

Usage
-----

    ./iwla [-c|--clean-output] [-i|--stdin] [-f FILE|--file FILE] [-d LOGLEVEL|--log-level LOGLEVEL] [-r|--reset year/month] [-z|--dont-compress] [-p] [-D|--dry-run]

    -c : Clean output (database and HTML) before starting
    -i : Read data from stdin instead of conf.analyzed_filename
    -f : Analyse this log file instead of conf.analyzed_filename. Multiple files can be specified (comma separated). gz files are accepted
    -d : Loglevel in ['DEBUG', 'INFO', 'WARNING', 'ERROR', 'CRITICAL']
    -r : Reset analysis to a specific date (month/year)
    -z : Don't compress databases (bigger but faster, not compatible with compressed databases)
    -p : Only generate display
    -D : Dry run (don't write/update files to disk)

Basic usage
-----------

In addition to the command line, iwla reads its parameters from default_conf.py. Users can override the default values in a _conf.py_ file. Each module requires its own parameters.

The main values to edit are :

 * **analyzed_filename** : web server log
 * **domain_name** : domain name to filter
 * **pre_analysis_hooks** : List of pre analysis hooks
 * **post_analysis_hooks** : List of post analysis hooks
 * **display_hooks** : List of display hooks
 * **locale** : Displayed locale (_en_ or _fr_)

You can also append an element to an existing default configuration list by using the "_append" suffix. Example :

    multimedia_files_append = ['xml']

or

    multimedia_files_append = 'xml'

Either form appends 'xml' to the current multimedia_files list.

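The "_append" rule can be pictured as a small merge step. The following is a minimal sketch of the documented behaviour, not iwla's actual code: `apply_overrides` is a hypothetical helper and the sample default values are invented for the example.

```python
def apply_overrides(defaults, user_conf):
    """Merge user values into defaults, honouring the "_append" suffix.

    Hypothetical helper: it only illustrates the behaviour described above.
    """
    merged = dict(defaults)
    for key, value in user_conf.items():
        if key.endswith("_append"):
            target = key[: -len("_append")]
            # Accept both a single element and a list of elements.
            extra = value if isinstance(value, list) else [value]
            merged[target] = list(merged.get(target, [])) + extra
        else:
            # Plain keys simply replace the default value.
            merged[key] = value
    return merged


# Sample values only, not iwla's real defaults.
defaults = {"multimedia_files": ["png", "jpg"]}
print(apply_overrides(defaults, {"multimedia_files_append": "xml"}))
print(apply_overrides(defaults, {"multimedia_files_append": ["xml"]}))
```

Both calls yield the same list, which matches the two equivalent forms shown above.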
Then, you can launch iwla. Output HTML files are created in the _output_ directory by default. To quickly see the result, go into _output_ and type

    python -m SimpleHTTPServer 8000

(with Python 3, the equivalent command is _python3 -m http.server 8000_).

Open your favorite web browser at _http://localhost:8000_. Enjoy !

**Warning** : The order of the hooks lists is important : some plugins may require other plugins, and the order of display_hooks is the order of the displayed blocks in the final result.

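For reference, a _conf.py_ overriding the main values might look like the following. This is only a sketch: the log path and domain are placeholders, and the hook lists simply reuse plugin file names listed in this documentation. Which plugins to enable, and in which order, is an assumption to adapt to each site.

```python
# conf.py: user overrides for default_conf.py (illustrative values only)

analyzed_filename = '/var/log/apache2/access.log'  # placeholder path
domain_name = 'example.com'                        # placeholder domain

# Order matters: a plugin must come after the plugins it requires.
pre_analysis_hooks = ['page_to_hit', 'robots']
post_analysis_hooks = ['referers', 'top_pages', 'top_downloads',
                       'operating_systems', 'browsers']
# The order below is also the order of the blocks in the generated page.
display_hooks = ['top_visitors', 'all_visits', 'referers', 'top_pages',
                 'top_downloads', 'operating_systems', 'browsers']

locale = 'en'

# Extend a default list instead of replacing it.
multimedia_files_append = ['xml']
```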
Interesting default configuration values
----------------------------------------

 * **DB_ROOT** : Default database directory (default _./output_db_)
 * **DISPLAY_ROOT** : Default HTML output directory (default _./output_)
 * **log_format** : Web server log format (nginx style). Default is apache log format
 * **time_format** : Time format used in log format
 * **pages_extensions** : Extensions that are considered as an HTML page (or result), as opposed to hits
 * **viewed_http_codes** : HTTP codes that are considered OK (200, 304)
 * **count_hit_only_visitors** : If False, don't count visitors that never GET a page but only resources (images, rss...)
 * **multimedia_files** : Multimedia extensions (not accounted as downloaded files)
 * **css_path** : CSS path (you can add yours)
 * **compress_output_files** : File extensions to compress with gzip during display build

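To make the idea of an nginx-style log_format concrete, the sketch below turns such a format string into a regular expression with one named group per `$variable`. The format string is an assumption modelled on the common "combined" log layout, and `format_to_regex` is a hypothetical helper. iwla's real default and conversion may differ, so check default_conf.py.

```python
import re

# Assumed example format using nginx-style variables (not necessarily iwla's default).
log_format = ('$remote_addr - $remote_user [$time_local] "$request" '
              '$status $body_bytes_sent "$http_referer" "$http_user_agent"')


def format_to_regex(fmt):
    """Build a compiled regex where each $name becomes a named group."""
    pattern = re.escape(fmt)
    # re.escape turns "$name" into "\$name"; swap each one for a lazy named group.
    pattern = re.sub(r'\\\$(\w+)', r'(?P<\1>.*?)', pattern)
    return re.compile('^' + pattern + '$')


line = ('127.0.0.1 - - [10/Oct/2020:13:55:36 +0200] '
        '"GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0"')
fields = format_to_regex(log_format).match(line).groupdict()
print(fields['remote_addr'], fields['status'], fields['request'])
```

Lazy groups keep the sketch short. A stricter parser would use a dedicated pattern per field.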
Plugins
-------

As previously described, plugins act like UNIX pipes : statistics are successively updated by each plugin to produce the final result. There are three types of plugins :

 * **Pre analysis plugins** : Called before generating days statistics. They are in charge of filtering robots, crawlers, bad pages...
 * **Post analysis plugins** : Called after basic statistics computation. They are in charge of enriching the statistics with their own algorithms
 * **Display plugins** : They are in charge of producing HTML files from statistics.

To use plugins, just insert their file name (without the _.py_ extension) in the _pre_analysis_hooks_, _post_analysis_hooks_ and _display_hooks_ lists in conf.py.

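The pipe-like behaviour described above can be pictured with a toy pipeline. In this sketch every hook is a plain function that mutates a shared statistics dictionary. The hook names, the sample visits and the `run_pipeline` helper are all hypothetical and only mirror the three stages named in this section.

```python
def filter_robots(stats):
    # Pre analysis stage: flag visitors whose user agent looks like a crawler.
    for visitor in stats["visits"].values():
        visitor["robot"] = "bot" in visitor["user_agent"].lower()


def count_valid_visitors(stats):
    # Post analysis stage: derive a new statistic from the filtered data.
    stats["month_stats"]["nb_valid_visitors"] = sum(
        1 for v in stats["visits"].values() if not v["robot"])


def render_summary(stats):
    # Display stage: turn statistics into output (a string here, HTML in iwla).
    stats["output"] = "valid visitors: %d" % stats["month_stats"]["nb_valid_visitors"]


def run_pipeline(stats, pre, post, display):
    # Each stage runs in list order, so a hook sees what earlier hooks wrote.
    for hook in pre + post + display:
        hook(stats)
    return stats


stats = {
    "visits": {
        "1.2.3.4": {"user_agent": "Mozilla/5.0"},
        "5.6.7.8": {"user_agent": "ExampleBot/1.0"},
    },
    "month_stats": {},
}
run_pipeline(stats, [filter_robots], [count_valid_visitors], [render_summary])
print(stats["output"])
```

Swapping the order of the stages would break the later hooks, which is why the order of the hooks lists matters.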
Statistics are stored in dictionaries :

 * **month_stats** : Statistics of current analysed month
 * **valid_visitors** : A subset of month_stats without robots
 * **days_stats** : Statistics of current analysed day
 * **visits** : All visitors with all of their requests
 * **meta** : Final result of month statistics (by year)

Create a Plugin
---------------

To create a new plugin, you must subclass IPlugin (_iplugin.py_) in the right directory (_plugins/xxx/yourPlugin.py_).

Plugins can define required configuration values (self.conf_requires) that must be set in conf.py (or can be optional). They can also define required plugins (self.requires).

Two functions must be overloaded. The first is _load(self)_, which must return True if all is good and False otherwise; it is called after _init_. The second is _hook(self)_, which is the body of the plugin.

For display plugins, a lot of code has been written in _display.py_ to simplify the creation of HTML blocks, tables and bar graphs.

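A plugin following these rules could be shaped like the sketch below. To stay self-contained it defines a stand-in `IPlugin` base class, so the constructor signature and the way the plugin reaches configuration and statistics (`self.iwla.conf`, `self.iwla.visits`, `self.iwla.month_stats`) are assumptions, not the real interface from _iplugin.py_. The plugin itself and its `long_visit_min_pages` value are hypothetical.

```python
from types import SimpleNamespace


class IPlugin(object):
    """Stand-in for the base class in iplugin.py (assumed shape only)."""

    def __init__(self, iwla):
        self.iwla = iwla
        self.requires = []        # plugins that must run before this one
        self.conf_requires = []   # configuration values this plugin needs

    def load(self):
        return True

    def hook(self):
        pass


class LongVisits(IPlugin):
    """Hypothetical post analysis plugin: count visitors above a page threshold."""

    def __init__(self, iwla):
        super().__init__(iwla)
        self.conf_requires = ['long_visit_min_pages']  # hypothetical conf value

    def load(self):
        # Called after __init__: return False if the plugin cannot run.
        self.min_pages = self.iwla.conf.get('long_visit_min_pages')
        return isinstance(self.min_pages, int)

    def hook(self):
        # Body of the plugin: add a new entry to the month statistics.
        self.iwla.month_stats['long_visits'] = sum(
            1 for v in self.iwla.visits.values()
            if v['viewed_pages'][0] >= self.min_pages)


# Fake core object, only here so that the example can run on its own.
fake_iwla = SimpleNamespace(
    conf={'long_visit_min_pages': 3},
    visits={'1.2.3.4': {'viewed_pages': {0: 5}},
            '5.6.7.8': {'viewed_pages': {0: 1}}},
    month_stats={},
)
plugin = LongVisits(fake_iwla)
if plugin.load():
    plugin.hook()
print(fake_iwla.month_stats)
```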
Plugins
=======

Optional configuration values end with *.

 * iwla.py
 * plugins/display/all_visits.py
 * plugins/display/browsers.py
 * plugins/display/feeds.py
 * plugins/display/hours_stats.py
 * plugins/display/ip_to_geo.py
 * plugins/display/istats_diff.py
 * plugins/display/operating_systems.py
 * plugins/display/referers_diff.py
 * plugins/display/referers.py
 * plugins/display/robot_bandwidth.py
 * plugins/display/top_downloads_diff.py
 * plugins/display/top_downloads.py
 * plugins/display/top_hits.py
 * plugins/display/top_pages_diff.py
 * plugins/display/top_pages.py
 * plugins/display/top_visitors.py
 * plugins/display/track_users.py
 * plugins/post_analysis/browsers.py
 * plugins/post_analysis/feeds.py
 * plugins/post_analysis/hours_stats.py
 * plugins/post_analysis/ip_to_geo.py
 * plugins/post_analysis/iptogeo.py
 * plugins/post_analysis/operating_systems.py
 * plugins/post_analysis/referers.py
 * plugins/post_analysis/reverse_dns.py
 * plugins/post_analysis/top_downloads.py
 * plugins/post_analysis/top_hits.py
 * plugins/post_analysis/top_pages.py
 * plugins/pre_analysis/page_to_hit.py
 * plugins/pre_analysis/robots.py


iwla
----

    Main class IWLA
    Parse logs, compute statistics, call plugins and produce output
    For now, only HTTP logs are valid

    Plugin requirements :
        None

    Conf values needed :
        analyzed_filename
        domain_name
        locales_path
        compress_output_files
        excluded_ip

    Output files :
        DB_ROOT/meta.db
        DB_ROOT/year/month/iwla.db
        OUTPUT_ROOT/index.html
        OUTPUT_ROOT/year/_stats.html
        OUTPUT_ROOT/year/month/index.html

    Statistics creation :

    meta :
        last_time
        start_analysis_time
        stats =>
            year =>
                month =>
                    viewed_bandwidth
                    not_viewed_bandwidth
                    viewed_pages
                    viewed_hits
                    nb_visits
                    nb_visitors

    month_stats :
        viewed_bandwidth
        not_viewed_bandwidth
        viewed_pages
        viewed_hits
        nb_visits

    days_stats :
        day =>
            viewed_bandwidth
            not_viewed_bandwidth
            viewed_pages
            viewed_hits
            nb_visits
            nb_visitors

    visits :
        remote_addr =>
            remote_addr
            remote_ip
            viewed_pages{0..31} # 0 contains total
            viewed_hits{0..31} # 0 contains total
            not_viewed_pages{0..31}
            not_viewed_hits{0..31}
            bandwidth{0..31}
            last_access
            requests =>
                [fields_from_format_log]
                extract_request =>
                    http_method
                    http_uri
                    http_version
                    extract_uri
                    extract_parameters*
                extract_referer* =>
                    extract_uri
                    extract_parameters*
            robot
            hit_only
            is_page

    valid_visitors:
        month_stats without robot and hit only visitors (if not conf.count_hit_only_visitors)

    Statistics update :
        None

    Statistics deletion :
        None


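The `{0..31}` notation in the visits structure above means one counter per day of the month, with slot 0 holding the monthly total. The sketch below shows one way such counters can be kept consistent. The helper names are hypothetical and the visitor entry is reduced to two fields for brevity.

```python
def new_counter():
    # One slot per day of the month (1..31) plus slot 0 for the total.
    return {day: 0 for day in range(32)}


def new_visitor(remote_addr):
    # Reduced visitor entry: only two of the documented counters are kept.
    return {
        "remote_addr": remote_addr,
        "viewed_pages": new_counter(),
        "bandwidth": new_counter(),
    }


def record_page(visitor, day, size):
    # Update the day slot and the running total together.
    for slot in (day, 0):
        visitor["viewed_pages"][slot] += 1
        visitor["bandwidth"][slot] += size


visitor = new_visitor("1.2.3.4")
record_page(visitor, 3, 1200)
record_page(visitor, 3, 800)
record_page(visitor, 7, 500)
print(visitor["viewed_pages"][0], visitor["viewed_pages"][3], visitor["bandwidth"][0])
```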
plugins.display.all_visits
--------------------------

    Display hook

    Create All visits page

    Plugin requirements :
        None

    Conf values needed :
        display_visitor_ip*

    Output files :
        OUTPUT_ROOT/year/month/all_visits.html
        OUTPUT_ROOT/year/month/index.html

    Statistics creation :
        None

    Statistics update :
        None

    Statistics deletion :
        None


plugins.display.browsers
------------------------

    Display hook

    Create browsers page

    Plugin requirements :
        post_analysis/browsers

    Conf values needed :
        max_browsers_displayed*
        create_browsers_page*

    Output files :
        OUTPUT_ROOT/year/month/browsers.html
        OUTPUT_ROOT/year/month/index.html

    Statistics creation :
        None

    Statistics update :
        None

    Statistics deletion :
        None


plugins.display.feeds
---------------------

    Display hook

    Display feeds parsers

    Plugin requirements :
        post_analysis/feeds

    Conf values needed :
        create_all_feeds_page*

    Output files :
        OUTPUT_ROOT/year/month/index.html
        OUTPUT_ROOT/year/month/all_feeds.html

    Statistics creation :
        None

    Statistics update :
        None

    Statistics deletion :
        None


plugins.display.hours_stats
---------------------------

    Display hook

    Display statistics by hour/week day

    Plugin requirements :
        post_analysis/hours_stats

    Conf values needed :
        None

    Output files :
        OUTPUT_ROOT/year/month/index.html

    Statistics creation :
        None

    Statistics update :
        None

    Statistics deletion :
        None


plugins.display.ip_to_geo
-------------------------

    Display hook

    Add geo statistics

    Plugin requirements :
        post_analysis/ip_to_geo

    Conf values needed :
        create_geo_page*

    Output files :
        OUTPUT_ROOT/year/month/index.html

    Statistics creation :
        None

    Statistics update :
        None

    Statistics deletion :
        None


plugins.display.istats_diff
---------------------------

    Display hook interface

    Highlight new and updated statistics

    Plugin requirements :
        None

    Conf values needed :
        None

    Output files :
        None

    Statistics creation :
        None

    Statistics update :
        None

    Statistics deletion :
        None


plugins.display.operating_systems
---------------------------------

    Display hook

    Add operating systems statistics

    Plugin requirements :
        post_analysis/operating_systems

    Conf values needed :
        create_families_page*

    Output files :
        OUTPUT_ROOT/year/month/index.html

    Statistics creation :
        None

    Statistics update :
        None

    Statistics deletion :
        None


plugins.display.referers_diff
-----------------------------

    Display hook

    Highlight new and updated key phrases in all_key_phrases.html

    Plugin requirements :
        display/referers

    Conf values needed :
        None

    Output files :
        None

    Statistics creation :
        None

    Statistics update :
        None

    Statistics deletion :
        None


plugins.display.referers
------------------------

    Display hook

    Create Referers page

    Plugin requirements :
        post_analysis/referers

    Conf values needed :
        max_referers_displayed*
        create_all_referers_page*
        max_key_phrases_displayed*
        create_all_key_phrases_page*

    Output files :
        OUTPUT_ROOT/year/month/referers.html
        OUTPUT_ROOT/year/month/key_phrases.html
        OUTPUT_ROOT/year/month/index.html

    Statistics creation :
        None

    Statistics update :
        None

    Statistics deletion :
        None


plugins.display.robot_bandwidth
-------------------------------

    Display hook

    Display top 10 robot bandwidth use

    Plugin requirements :
        None

    Conf values needed :
        display_visitor_ip*
        create_all_robot_bandwidth_page*

    Output files :
        OUTPUT_ROOT/year/month/top_robots_bandwidth.html
        OUTPUT_ROOT/year/month/index.html

    Statistics creation :
        None

    Statistics update :
        None

    Statistics deletion :
        None


plugins.display.top_downloads_diff
----------------------------------

    Display hook

    Highlight new and updated downloads in top_downloads.html

    Plugin requirements :
        display/top_downloads

    Conf values needed :
        None

    Output files :
        None

    Statistics creation :
        None

    Statistics update :
        None

    Statistics deletion :
        None


plugins.display.top_downloads
-----------------------------

    Display hook

    Create TOP downloads page

    Plugin requirements :
        post_analysis/top_downloads

    Conf values needed :
        max_downloads_displayed*
        create_all_downloads_page*

    Output files :
        OUTPUT_ROOT/year/month/top_downloads.html
        OUTPUT_ROOT/year/month/index.html

    Statistics creation :
        None

    Statistics update :
        None

    Statistics deletion :
        None


plugins.display.top_hits
------------------------

    Display hook

    Create TOP hits page

    Plugin requirements :
        post_analysis/top_hits

    Conf values needed :
        max_hits_displayed*
        create_all_hits_page*

    Output files :
        OUTPUT_ROOT/year/month/top_hits.html
        OUTPUT_ROOT/year/month/index.html

    Statistics creation :
        None

    Statistics update :
        None

    Statistics deletion :
        None


plugins.display.top_pages_diff
------------------------------

    Display hook

    Highlight new and updated pages in top_pages.html

    Plugin requirements :
        display/top_pages

    Conf values needed :
        None

    Output files :
        None

    Statistics creation :
        None

    Statistics update :
        None

    Statistics deletion :
        None


plugins.display.top_pages
-------------------------

    Display hook

    Create TOP pages page

    Plugin requirements :
        post_analysis/top_pages

    Conf values needed :
        max_pages_displayed*
        create_all_pages_page*

    Output files :
        OUTPUT_ROOT/year/month/top_pages.html
        OUTPUT_ROOT/year/month/index.html

    Statistics creation :
        None

    Statistics update :
        None

    Statistics deletion :
        None


plugins.display.top_visitors
----------------------------

    Display hook

    Create TOP visitors block

    Plugin requirements :
        None

    Conf values needed :
        display_visitor_ip*

    Output files :
        OUTPUT_ROOT/year/month/index.html

    Statistics creation :
        None

    Statistics update :
        None

    Statistics deletion :
        None


plugins.display.track_users
---------------------------

    Display hook

    Track users

    Plugin requirements :
        None

    Conf values needed :
        tracked_ip
        create_tracked_page*

    Output files :
        OUTPUT_ROOT/year/month/index.html
        OUTPUT_ROOT/year/month/tracked_users.html

    Statistics creation :
        None

    Statistics update :
        None

    Statistics deletion :
        None


plugins.post_analysis.browsers
------------------------------

    Post analysis hook

    Detect browser information from requests

    Plugin requirements :
        None

    Conf values needed :
        None

    Output files :
        None

    Statistics creation :
        visits :
            remote_addr =>
                browser

        month_stats :
            browsers =>
                browser => count

    Statistics update :
        None

    Statistics deletion :
        None


plugins.post_analysis.feeds
---------------------------

    Post analysis hook

    Find feeds parsers (first hit in feeds conf value and no viewed pages if it's a robot)
    If there is only one hit per day to a feed, merge feeds parsers with the same user agent
    as it must be the same person with a different IP address.

    Plugin requirements :
        None

    Conf values needed :
        feeds
        merge_one_hit_only_feeds_parsers*

    Output files :
        None

    Statistics creation :
        remote_addr =>
            feed_parser

    Statistics update :
        None

    Statistics deletion :
        None


plugins.post_analysis.hours_stats
---------------------------------

    Post analysis hook

    Count pages, hits and bandwidth by hour/week day

    Plugin requirements :
        None

    Conf values needed :
        None

    Output files :
        None

    Statistics creation :
        month_stats:
            hours_stats =>
                00 .. 23 =>
                    pages
                    hits
                    bandwidth

            days_stats =>
                0 .. 6 =>
                    pages
                    hits
                    bandwidth

    Statistics update :
        None

    Statistics deletion :
        None


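Counting by hour and by week day boils down to two sets of buckets updated from each request's timestamp. The following is a minimal sketch under stated assumptions: the request fields (`time`, `is_page`, `size`) and the helper names are hypothetical, and week day 0 is taken to be Monday, following Python's `tm_wday` convention.

```python
import time


def new_bucket():
    return {"pages": 0, "hits": 0, "bandwidth": 0}


def count_by_hour_and_weekday(requests):
    hours = {h: new_bucket() for h in range(24)}  # 0 .. 23
    days = {d: new_bucket() for d in range(7)}    # 0 .. 6, assumed 0 = Monday
    for req in requests:
        t = req["time"]  # a time.struct_time
        kind = "pages" if req["is_page"] else "hits"
        # The same request feeds both its hour bucket and its week day bucket.
        for bucket in (hours[t.tm_hour], days[t.tm_wday]):
            bucket[kind] += 1
            bucket["bandwidth"] += req["size"]
    return hours, days


def parse(stamp):
    # Numeric-only format, so that parsing does not depend on the locale.
    return time.strptime(stamp, "%d/%m/%Y %H:%M:%S")


requests = [
    {"time": parse("10/10/2020 13:55:36"), "is_page": True, "size": 2000},
    {"time": parse("10/10/2020 13:55:37"), "is_page": False, "size": 300},
    {"time": parse("12/10/2020 08:01:00"), "is_page": True, "size": 1500},
]
hours, days = count_by_hour_and_weekday(requests)
print(hours[13], days[5], days[0])
```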
plugins.post_analysis.ip_to_geo
-------------------------------

    Post analysis hook

    Get country code from IP address

    Plugin requirements :
        None

    Conf values needed :
        iptogeo_remote_addr*
        iptogeo_remote_port*

    Output files :
        None

    Statistics creation :
        geo =>
            country_code => count

    Statistics update :
        valid_visitors:
            country_code

    Statistics deletion :
        None


plugins.post_analysis.iptogeo
-----------------------------



plugins.post_analysis.operating_systems
---------------------------------------

    Post analysis hook

    Detect operating systems from requests

    Plugin requirements :
        None

    Conf values needed :
        None

    Output files :
        None

    Statistics creation :
        visits :
            remote_addr =>
                operating_system

        month_stats :
            operating_systems =>
                operating_system => count

            os_families =>
                family => count

    Statistics update :
        None

    Statistics deletion :
        None


plugins.post_analysis.referers
------------------------------

    Post analysis hook

    Extract referers and key phrases from requests

    Plugin requirements :
        None

    Conf values needed :
        domain_name

    Output files :
        None

    Statistics creation :
        None

    Statistics update :
        month_stats :
            referers =>
                pages => count
                hits => count
            robots_referers =>
                pages => count
                hits => count
            search_engine_referers =>
                pages => count
                hits => count
            key_phrases =>
                phrase => count

    Statistics deletion :
        None


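As an illustration of key phrase extraction, the sketch below reads the search phrase from a referer URL and counts it. It is deliberately simplified: it assumes the phrase is carried by a single query parameter named `q`, whereas the introduction states that iwla relies on the search engine definitions taken from awstats. The function names are hypothetical.

```python
from urllib.parse import urlparse, parse_qs


def extract_key_phrase(referer, param="q"):
    """Return the search phrase carried by a referer URL, or None."""
    query = parse_qs(urlparse(referer).query)
    values = query.get(param)
    return values[0] if values else None


def count_key_phrases(referers):
    # Build a "phrase => count" dictionary, as in month_stats key_phrases.
    key_phrases = {}
    for referer in referers:
        phrase = extract_key_phrase(referer)
        if phrase:
            key_phrases[phrase] = key_phrases.get(phrase, 0) + 1
    return key_phrases


referers = [
    "https://www.google.com/search?q=web+log+analyzer&hl=en",
    "https://www.google.com/search?q=web+log+analyzer",
    "https://example.org/some/page.html",
]
print(count_key_phrases(referers))
```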
plugins.post_analysis.reverse_dns
---------------------------------

    Post analysis hook

    Replace IP by reverse DNS names

    Plugin requirements :
        None

    Conf values needed :
        reverse_dns_timeout*

    Output files :
        None

    Statistics creation :
        None

    Statistics update :
        valid_visitors:
            remote_addr
            dns_name_replaced
            dns_analyzed

    Statistics deletion :
        None


plugins.post_analysis.top_downloads
-----------------------------------

    Post analysis hook

    Count TOP downloads

    Plugin requirements :
        None

    Conf values needed :
        None

    Output files :
        None

    Statistics creation :
        None

    Statistics update :
        month_stats:
            top_downloads =>
                uri => count

    Statistics deletion :
        None


plugins.post_analysis.top_hits
------------------------------

    Post analysis hook

    Count TOP hits

    Plugin requirements :
        None

    Conf values needed :
        None

    Output files :
        None

    Statistics creation :
        None

    Statistics update :
        month_stats:
            top_hits =>
                uri => count

    Statistics deletion :
        None


plugins.post_analysis.top_pages
-------------------------------

    Post analysis hook

    Count TOP pages

    Plugin requirements :
        None

    Conf values needed :
        None

    Output files :
        None

    Statistics creation :
        None

    Statistics update :
        month_stats:
            top_pages =>
                uri => count

    Statistics deletion :
        None


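The "uri => count" statistic kept by the three TOP plugins is a plain frequency table. A minimal sketch for pages, assuming each request is a dictionary with hypothetical `uri` and `is_page` fields, could be:

```python
from collections import Counter


def top_pages(requests, limit=10):
    # Count only requests classified as pages, then keep the most frequent URIs.
    counts = Counter(r["uri"] for r in requests if r["is_page"])
    return counts.most_common(limit)


requests = [
    {"uri": "/index.html", "is_page": True},
    {"uri": "/index.html", "is_page": True},
    {"uri": "/about.html", "is_page": True},
    {"uri": "/logo.png", "is_page": False},
    {"uri": "/index.html", "is_page": True},
]
print(top_pages(requests))
```

The same pattern would apply to hits or downloads by changing the filter.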
plugins.pre_analysis.page_to_hit
--------------------------------

    Pre analysis hook
    Change pages into hits and hits into pages in statistics

    Plugin requirements :
        None

    Conf values needed :
        page_to_hit_conf*
        hit_to_page_conf*

    Output files :
        None

    Statistics creation :
        None

    Statistics update :
        visits :
            remote_addr =>
                is_page

    Statistics deletion :
        None


plugins.pre_analysis.robots
---------------------------

    Pre analysis hook

    Filter robots

    Plugin requirements :
        None

    Conf values needed :
        page_to_hit_conf*
        hit_to_page_conf*

    Output files :
        None

    Statistics creation :
        None

    Statistics update :
        visits :
            remote_addr =>
                robot

    Statistics deletion :
        None
