How to use CrawlSpider from scrapy to click a link with javascript onclick?

Problem description

I want scrapy to crawl pages where the link to the next page looks like this:

<a href="#" onclick="return gotoPage('2');"> Next </a>

Will scrapy be able to interpret the javascript code in that link?

With the livehttpheaders extension I found out that clicking Next generates a POST with a really huge piece of "garbage" starting like this:

encoded_session_hidden_map=H4sIAAAAAAAAALWZXWwj1RXHJ9n
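
Scrapy does not execute JavaScript on its own, so the onclick handler is never run; one common approach is to reproduce the POST that the click triggers. The following is a minimal sketch, using only the Python standard library, of pulling the page number out of the onclick attribute and assembling a form body. The helper names and the field name `page` are assumptions made for illustration; the real field name has to be read from the POST captured in the browser, and the hidden value below is only the truncated sample shown above.

```python
import re
from urllib.parse import urlencode

# Matches onclick handlers such as: return gotoPage('2');
GOTO_PAGE_RE = re.compile(r"gotoPage\(\s*['\"]?(\d+)['\"]?\s*\)")


def extract_page_number(onclick):
    """Return the page argument of gotoPage(...) as a string, or None."""
    match = GOTO_PAGE_RE.search(onclick)
    return match.group(1) if match else None


def build_next_page_body(onclick, hidden_fields, page_field="page"):
    """Build a URL-encoded POST body for the next page.

    hidden_fields: hidden inputs copied from the current page's form,
    for example encoded_session_hidden_map.
    page_field: HYPOTHETICAL name of the field carrying the page number;
    check the real POST captured in the browser.
    """
    page = extract_page_number(onclick)
    if page is None:
        raise ValueError("no gotoPage(...) call found in onclick")
    form = dict(hidden_fields)  # copy so the caller's dict is not modified
    form[page_field] = page
    return urlencode(form)


# Truncated sample value from the captured request, used only for illustration.
body = build_next_page_body(
    "return gotoPage('2');",
    {"encoded_session_hidden_map": "H4sIAAAAAAAAALWZXWwj1RXHJ9n"},
)
```

In a Scrapy spider the same page value could instead be passed in the formdata argument of FormRequest.from_response, which pre-fills the form's existing fields, hidden inputs included, from the response.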

I am trying to build my spider on the CrawlSpider class, but I can't really figure out how to code it. With BaseSpider I used the parse() method to process the first URL, which happens to be a login form, where I did a POST with:

from scrapy.http import FormRequest

# Method of the spider class: submit the login form found on the first page.
def logon(self, response):
    login_form_data = {'email': 'user@example.com', 'password': 'mypass22', 'action': 'sign-in'}
    return [FormRequest.from_response(response, formnumber=0, formdata=login_form_data, callback=self.submit_next)]

Then I defined submit_next() to tell it what to do next. What I can't figure out is how to tell CrawlSpider which method to use on the first URL.

All requests in my crawl, except the first one, are POST requests. They alternate between two types: pasting some data, and clicking "Next" to go to the next page.

Recommended answer