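Our company network sits behind a proxy server, so when I crawl with scrapy the requests get redirected to a warning page. After talking with our IT department, I learned that the proxy uses NTLM authentication, which requires a three-step handshake before access is granted.
I have searched Baidu extensively and tried many things myself, but it still fails. My middleware code is attached below.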
def process_request(self, request, spider):
    # Called for each request that goes through the downloader
    # middleware.
    # Must either:
    # - return None: continue processing this request
    # - or return a Response object
    # - or return a Request object
    # - or raise IgnoreRequest: process_exception() methods of
    #   installed downloader middleware will be called

    # First attempt, written following an online tutorial, but it did not work:
    # url = request.url
    # usr = getattr(spider, 'http_usr', '')
    # pwd = getattr(spider, 'http_pass', '')
    # s = requests.session()
    # response = s.get(url, auth=HttpNtlmAuth(usr, pwd))
    # return HtmlResponse(url, response.status_code, dict(response.headers), response.content)

    # Current attempt: load the page in the browser object attached to the spider,
    # scroll to the bottom a few times, then hand the rendered HTML back to scrapy.
    spider.browser.get(request.url)
    for i in range(5):
        spider.browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    return HtmlResponse(url=spider.browser.current_url, body=spider.browser.page_source, encoding="utf-8", request=request)
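
One direction I have not verified yet, sketched here under assumptions: instead of performing the NTLM handshake inside scrapy, run a local NTLM-aware forwarding proxy (for example CNTLM) that authenticates against the company proxy, and point scrapy at it. Scrapy reads the proxy to use from request.meta['proxy']. The class name LocalNtlmProxyMiddleware and the address http://127.0.0.1:3128 are hypothetical and depend on how the local proxy is configured.

```python
class LocalNtlmProxyMiddleware:
    """Hypothetical downloader middleware that routes requests through a local proxy.

    Assumption: a local NTLM-aware forwarding proxy (e.g. CNTLM) is already
    running and handles the NTLM handshake with the company proxy.
    """

    # Assumed listen address of the local forwarding proxy; adjust to your setup.
    LOCAL_PROXY = "http://127.0.0.1:3128"

    def process_request(self, request, spider):
        # Only set the proxy if the request does not already specify one.
        request.meta.setdefault("proxy", self.LOCAL_PROXY)
        # Returning None lets scrapy continue processing the request normally.
        return None
```

To try it, the class would be registered under DOWNLOADER_MIDDLEWARES in settings.py with its own module path. Whether this gets past the warning page depends on the local proxy being configured with valid domain credentials.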