Rendered/interactive JavaScript with gtk/webkit/jswebkit
from scrapy.http import Request, FormRequest, HtmlResponse

import gtk
import webkit
import jswebkit


class WebkitDownloader(object):
    def process_request(self, request, spider):
        # Render only plain requests; for a FormRequest this returns None,
        # so Scrapy continues with its normal download handling.
        if type(request) is not FormRequest:
            webview = webkit.WebView()
            # Leave the GTK main loop as soon as the page has finished loading.
            webview.connect('load-finished', lambda v, f: gtk.main_quit())
            webview.load_uri(request.url)
            gtk.main()  # blocks until 'load-finished' fires
            # Read the DOM back through the page's JavaScript context.
            js = jswebkit.JSContext(webview.get_main_frame().get_global_context())
            renderedBody = str(js.EvaluateScript('document.documentElement.innerHTML'))
            return HtmlResponse(request.url, body=renderedBody)
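The class above defines process_request, so it is presumably meant to be used as a Scrapy downloader middleware. A minimal sketch of enabling it in a project's settings follows. The module path myproject.middlewares is a hypothetical placeholder, and the priority 543 is an arbitrary choice rather than anything stated in the snippet.

```python
# settings.py (sketch)
# Assumption: WebkitDownloader lives in a module named "myproject.middlewares";
# replace this hypothetical path with wherever the class is actually defined.
DOWNLOADER_MIDDLEWARES = {
    # The number sets the middleware's position in the chain; 543 is arbitrary.
    "myproject.middlewares.WebkitDownloader": 543,
}
```

With a setting like this, every request passes through process_request before it is downloaded, and any request for which the method returns an HtmlResponse skips the normal download.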
Reposted from: http://hshgx.baihongyu.com/