iwla Commit Details

Date: 2021-06-03 08:52:04
Author: Grégory Soutadé
Branch: dev
Commit: 0c2ac431d129539d2885cd799a74e02bab0d9e30
Parents: 4cd7712201f35ccdc2cfe896f0d61b66ff6a598d
Message: Be more strict with robots : requires at least 1 hit per viewed page

Changes:
M plugins/pre_analysis/robots.py (2 diffs)

File differences

plugins/pre_analysis/robots.py
  # super_hit['robot'] = 1
  # continue

- # 2) pages without hit --> robot
- if not super_hit['viewed_hits'][0] and super_hit['viewed_pages'][0]:
+ # 2) Less than 1 hit per page
+ if super_hit['viewed_pages'][0] and (super_hit['viewed_hits'][0] < super_hit['viewed_pages'][0]):
      self._setRobot(k, super_hit)
      continue

......
      self._setRobot(k, super_hit)
      continue

- # 4) pages without hit --> robot
- if not super_hit['viewed_hits'][0] and super_hit['viewed_pages'][0]:
-     self.logger.debug(super_hit)
-     self._setRobot(k, super_hit)
-     continue
-
  not_found_pages = 0
  for hit in super_hit['requests']:
      # 5) /robots.txt read
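
The effect of the change can be illustrated with a small standalone sketch. The helpers `is_robot_before`, `is_robot_after` and `make_visitor` are hypothetical and not part of iwla; they only reproduce the two conditions from the diff on a dictionary shaped like `super_hit`, assuming index 0 of each counter holds the value being tested, as in the plugin code.

```python
# Hypothetical helpers (not part of iwla) mirroring the two conditions in the diff.


def is_robot_before(super_hit: dict) -> bool:
    """Removed rule: pages were viewed but no hit at all was recorded."""
    return bool(not super_hit['viewed_hits'][0] and super_hit['viewed_pages'][0])


def is_robot_after(super_hit: dict) -> bool:
    """New rule: pages were viewed, with fewer hits than pages."""
    pages = super_hit['viewed_pages'][0]
    hits = super_hit['viewed_hits'][0]
    return bool(pages and hits < pages)


def make_visitor(pages: int, hits: int) -> dict:
    """Build a minimal stand-in for a super_hit entry (assumed shape)."""
    return {'viewed_pages': [pages], 'viewed_hits': [hits]}


if __name__ == '__main__':
    # (pages, hits) pairs chosen to show where the two rules differ.
    for pages, hits in [(3, 0), (3, 2), (3, 3), (0, 0)]:
        visitor = make_visitor(pages, hits)
        print(pages, hits, is_robot_before(visitor), is_robot_after(visitor))
```

A visitor with 3 viewed pages and 2 hits passes the old condition but is flagged by the new one, while a visitor with at least one hit per page is left alone. Visitors with no viewed pages are not matched by either condition and fall through to the later rules.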
