Java Crawler: Scraping NetEase Cloud Music Hot Comments
Published: 2019-06-23


Target URL:

Information to scrape: the top 13 hot comments on NetEase Cloud Music

 

Fetching the page source with HttpURLConnection, as in earlier posts, and inspecting it shows that the source contains no hot-comment data.

package bok;

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class GC {
    public static void main(String[] args) throws Exception {
        // Note: everything after '#' is a URL fragment and is not sent to the server.
        URL url = new URL("http://music.163.com/#/song?id=409649818");
        HttpURLConnection httpURLConnection = (HttpURLConnection) url.openConnection();
        // Accumulate the page source line by line
        StringBuilder get = new StringBuilder();
        if (httpURLConnection.getResponseCode() == 200) {
            // try-with-resources closes the reader once the body has been read
            try (BufferedReader bufferedReader = new BufferedReader(
                    new InputStreamReader(httpURLConnection.getInputStream(), "UTF-8"))) {
                String read;
                while ((read = bufferedReader.readLine()) != null) {
                    get.append(read).append("\r\n");
                }
            }
            System.out.println(get);
        }
    }
}

 

Part of the fetched source is shown below:

{/if}
{else}
{/if}
...
{var alia=songAlia(x)}
${soil(x.name)}{if alia} - (${soil(alia)}){/if}
{if x.mvid>0}
MV
{/if}
...
{/if}
${dur2time(x.duration/1000)}{if x.ftype==2}{/if}
...
分享
{if canDel}
删除
{/if}
...
${getArtistName(x.artists, '', '', false, false, true)}
...
{/list}
...

Since the fetched source contains no hot-comment data, the only option is to analyze the network requests in the browser (F12 -> Network).

 

This shows that the request carrying the hot-comment data is http://music.163.com/weapi/v1/resource/comments/R_SO_4_409649818?csrf_token=

 

 

Here 409649818 is the song ID.
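Since only the song ID changes in that request URL, building the URL for another song can be sketched as below. This is a minimal sketch: the class name CommentUrl and the method forSong are hypothetical helpers that do not appear in the original code, and the prefix is simply copied from the request observed above.

```java
final class CommentUrl {
    // Endpoint prefix copied from the request observed in the Network panel.
    private static final String PREFIX =
            "http://music.163.com/weapi/v1/resource/comments/R_SO_4_";

    // Hypothetical helper: builds the hot-comment request URL for one song ID.
    static String forSong(long songId) {
        return PREFIX + songId + "?csrf_token=";
    }

    public static void main(String[] args) {
        // Prints the URL for the song used throughout this post.
        System.out.println(forSong(409649818L));
    }
}
```

The csrf_token parameter is left empty, exactly as in the observed request.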

 

 

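The form data, moreover, does not depend on the song. By this analysis it is a piece of information tied to the local Cookie, so a single captured set of form data can be reused to request the comments of different songs.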
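To illustrate what sending such form data involves, here is a rough sketch of form-encoding two fields using only the standard library. The class FormBody and the short placeholder values are assumptions made for illustration; the real params and encSecKey values are the long strings captured from the browser, and the article's own code delegates this step to HttpClient's UrlEncodedFormEntity.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

final class FormBody {
    // Percent-encodes one key or value for an application/x-www-form-urlencoded body.
    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }

    // Joins the fields as key=value pairs separated by '&', in insertion order.
    static String encode(Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (body.length() > 0) {
                body.append('&');
            }
            body.append(enc(e.getKey())).append('=').append(enc(e.getValue()));
        }
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("params", "a/b+c");   // placeholder, not a real captured value
        fields.put("encSecKey", "00ff"); // placeholder, not a real captured value
        // The same body string would be posted for any song URL.
        System.out.println(encode(fields));
    }
}
```

Characters such as '/' and '+', which occur in the captured params value, must be percent-encoded in the body, which is why the fields are not simply concatenated.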

 

The basic code is as follows:

package 网易云热评爬取;

import org.apache.http.HttpEntity;
import org.apache.http.NameValuePair;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.message.BasicNameValuePair;
import org.apache.http.util.EntityUtils;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MyClawer {
    public static void printHot(String u) throws Exception {
        // try-with-resources closes the client and the response
        try (CloseableHttpClient closeableHttpClient = HttpClients.createDefault()) {
            HttpPost httpPost = new HttpPost(u);
            httpPost.setHeader("User-Agent", "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.101 Safari/537.36");

            // Form data captured from the browser request; the same values are reused for every song
            List<NameValuePair> list = new ArrayList<>();
            list.add(new BasicNameValuePair("params", "RlBC7U1bfy/boPwg9ag7/a7AjkQOgsIfd+vsUjoMY2tyQCPFgnNoxHeCY+ZuHYqtM1zF8DWIBwJWbsCOQ6ZYxBiPE3bk+CI1U6Htoc4P9REBePlaiuzU4M3rDAxtMfNN3y0eimeq3LVo28UoarXs2VMWkCqoTXSi5zgKEKbxB7CmlBJAP9pn1aC+e3+VOTr0"));
            list.add(new BasicNameValuePair("encSecKey", "76a0d8ff9f6914d4f59be6b3e1f5d1fc3998317195464f00ee704149bc6672c587cd4a37471e3a777cb283a971d6b9205ce4a7187e682bdaefc0f225fb9ed1319f612243096823ddec88b6d6ea18f3fec883d2489d5a1d81cb5dbd0602981e7b49db5543b3d9edb48950e113f3627db3ac61cbc71d811889d68ff95d0eba04e9"));

            httpPost.setEntity(new UrlEncodedFormEntity(list));
            try (CloseableHttpResponse response = closeableHttpClient.execute(httpPost)) {
                HttpEntity entity = response.getEntity();
                String ux = EntityUtils.toString(entity, "utf-8");
                //System.out.println(ux);
                ArrayList<String> s = getBook(ux);

                // The loop body is missing from the original listing; printing each match is an assumption
                for (int i = 0; i < s.size(); i++) {
                    System.out.println(s.get(i));
                }
            }
        }
    }

    // Assumed entry point (also missing from the original listing), using the request URL found above
    public static void main(String[] args) throws Exception {
        printHot("http://music.163.com/weapi/v1/resource/comments/R_SO_4_409649818?csrf_token=");
    }

    // Signature reconstructed from the call in printHot: returns each distinct "content...\"}" match
    public static ArrayList<String> getBook(String read) {
        ArrayList<String> arrayList = new ArrayList<>();

        String con = "content(.*?)\"}";
        Pattern ah = Pattern.compile(con);
        Matcher mr = ah.matcher(read);
        while (mr.find()) {
            // Skip duplicates so each comment is listed once
            if (!arrayList.contains(mr.group())) {
                arrayList.add(mr.group());
            }
        }
        return arrayList;
    }
}
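The regex in getBook keeps the whole matched fragment, including the word "content" and the trailing characters. A possible variant that captures only the comment text with a group is sketched below. The sample string, including the likedCount field, is made up for illustration and is not a real response from the service; a proper JSON parser would be more robust, since this pattern breaks on comments that contain escaped quotes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ContentExtractor {
    // Captures the text between the quotes of each "content" field (assumed field layout).
    private static final Pattern CONTENT = Pattern.compile("\"content\":\"(.*?)\"");

    // Returns each distinct captured comment text, in order of first appearance.
    static List<String> extract(String json) {
        List<String> out = new ArrayList<>();
        Matcher m = CONTENT.matcher(json);
        while (m.find()) {
            String text = m.group(1);
            if (!out.contains(text)) {
                out.add(text);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up sample, not actual output of the comments endpoint.
        String sample = "[{\"content\":\"first\",\"likedCount\":3},"
                + "{\"content\":\"second\",\"likedCount\":1},"
                + "{\"content\":\"first\",\"likedCount\":3}]";
        System.out.println(extract(sample)); // prints [first, second]
    }
}
```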

 

Result of running the program:

 

Reposted from: http://usxqa.baihongyu.com/
