How to check whether a proxy IP you obtained is a high-anonymity proxy

Method 1

Check through a third-party endpoint

url = "http://httpbin.org/ip"  # if the returned IP includes your machine's own IP, the proxy is not anonymous
url = "http://httpbin.org/get?show_env=1"  # if the "origin" value is your machine's own IP, the proxy is not anonymous

Example:

# Response from http://httpbin.org/ip

{
  "origin": "117.136.0.213"
}

# Response from http://httpbin.org/get?show_env=1

{
  "args": {
    "show_env": "1"
  },
  "headers": {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
    "Accept-Encoding": "gzip, deflate",
    "Accept-Language": "zh-CN,zh;q=0.9",
    "Alexatoolbar-Alx-Ns-Ph": "AlexaToolbar/alx-4.0.3",
    "Host": "httpbin.org",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36",
    "X-Amzn-Trace-Id": "Root=1-5f5ebcfe-2047fe040ea2d1e594e5b6e0",
    "X-Forwarded-For": "117.136.0.213",
    "X-Forwarded-Port": "80",
    "X-Forwarded-Proto": "http"
  },
  "origin": "117.136.0.213",
  "url": "http://httpbin.org/get?show_env=1"
}

Source code for the site: https://github.com/postmanlabs/httpbin
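The comparison in Method 1 can be sketched roughly as follows. This is only an illustration under some assumptions: it uses the standard library's urllib instead of requests so it has no dependencies, the names classify_proxy, fetch_json and check_proxy are made up for this example, the three labels follow the common transparent / anonymous / high-anonymity convention, and the PROXY_HEADERS list is a guess rather than a complete set. Note that X-Forwarded-For shows up in the sample response above even without a proxy, so its mere presence is not treated as a signal; only the addresses inside it are compared.

```python
import json
import re
import urllib.request

HTTPBIN_URL = "http://httpbin.org/get?show_env=1"
# Header names that often reveal a proxy. Assumed list, not exhaustive.
PROXY_HEADERS = {"via", "proxy-connection", "forwarded", "x-proxy-id"}
# Simple IPv4 matcher; IPv6 is not handled in this sketch.
IPV4 = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")


def classify_proxy(real_ip, response):
    """Classify a proxy from the parsed httpbin JSON that was fetched through it."""
    headers = response.get("headers", {})
    texts = [str(response.get("origin", ""))] + [str(v) for v in headers.values()]
    seen_ips = {ip for text in texts for ip in IPV4.findall(text)}
    if real_ip in seen_ips:
        # Your own address reached the server, so the proxy is not anonymous.
        return "transparent"
    if any(name.lower() in PROXY_HEADERS for name in headers):
        # Your address is hidden, but the request still looks proxied.
        return "anonymous"
    return "high-anonymity"


def fetch_json(url, proxy=None, timeout=5):
    """Fetch url and parse JSON, optionally through an HTTP proxy given as 'ip:port'."""
    # An empty mapping disables any system-wide proxy for the direct request.
    mapping = {"http": "http://" + proxy} if proxy else {}
    opener = urllib.request.build_opener(urllib.request.ProxyHandler(mapping))
    with opener.open(url, timeout=timeout) as resp:
        return json.load(resp)


def check_proxy(proxy, timeout=5):
    """Look up the real IP directly, then classify the proxy (performs network requests)."""
    direct = fetch_json("http://httpbin.org/ip", timeout=timeout)
    real_ip = direct["origin"].split(",")[0].strip()
    return classify_proxy(real_ip, fetch_json(HTTPBIN_URL, proxy=proxy, timeout=timeout))
```

Calling check_proxy("ip:port") with one of your proxies would return one of the three labels; classify_proxy can also be used on its own with a response you have already saved.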

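Method 2

Test the proxy against a site that blocks proxy access, for example 孔夫子旧书网 (kongfz.com).

Steps:

Step 1: Access 孔夫子旧书网 at high frequency without a proxy until requests stop succeeding, which indicates that your machine's own IP has been blocked.

Step 2: Switch to the proxy. If the site loads normally, the proxy is high-anonymity, since a proxy that passed your blocked IP along would be rejected as well.

Code implementation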

import requests
import time
from threading import Lock, Thread

# Serializes writes so lines from different threads do not interleave
write_lock = Lock()


def get_gao_ni_ip(ip, file_save):
    """Request a 孔夫子旧书网 page through the proxy and save the proxy if the page loads."""
    url = "http://book.kongfz.com/175804/1038155437/"
    headers = {
        'Cookie': 'PHPSESSID=0d12c303a92043f13a3cc2c329e444f36b44ef71; shoppingCartSessionId=74c831996eb9a1009d79244d7d915040; kfz_uuid=f53edd56-8938-48af-a447-9a07bde47ffa; reciever_area=1006000000; Hm_lvt_bca7840de7b518b3c5e6c6d73ca2662c=1552367977; Hm_lvt_33be6c04e0febc7531a1315c9594b136=1552367977; kfz_trace=f53edd56-8938-48af-a447-9a07bde47ffa|10072231|834871367e51d410|-; acw_tc=65c86a0a15523697386136416e812159c1e7ce1072aea90b9eb27c93ee05cc; BIGipServerpool_nxtqzj=527099402.24615.0000; Hm_lpvt_bca7840de7b518b3c5e6c6d73ca2662c=1552371456; Hm_lpvt_33be6c04e0febc7531a1315c9594b136=1552371456',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36'
    }
    # ip is expected as a bare "ip:port" string; the scheme is added here
    proxies = {
        "http": "http://" + ip,
    }
    try:
        resp = requests.get(url=url, headers=headers, proxies=proxies, timeout=2)
        # print(resp.text)
        # The page title is only present when the real page was served
        if "胡适传论,上下。_胡明_孔夫子旧书网" in resp.text:
            print("proxy usable", ip)
            with write_lock:
                file_save.write(ip + '\n')
        else:
            print("error")
    except requests.RequestException:
        # Dead or slow proxies are simply skipped
        pass


if __name__ == '__main__':
    start_time = time.time()
    # File that stores the usable proxy addresses.
    # The path is truncated to D:\ in the original; append your own file name.
    file_save = open('D:\\', 'w', encoding='utf-8')

    # Input file with one proxy per line (path also truncated in the original)
    with open('D:\\', 'r', encoding='utf-8') as file_ips:
        ips_list = file_ips.readlines()

    thread_list = []
    total_num = 0
    for ip_one in set(ips_list):
        ip = ip_one.strip()
        if not ip:
            continue
        thread_ip = Thread(target=get_gao_ni_ip, args=[ip, file_save])
        thread_list.append(thread_ip)
        thread_ip.start()
        total_num += 1
        print(total_num)
        # Brief pause so thread start-up does not overload the CPU
        time.sleep(0.005)
    for i in thread_list:
        i.join()
    file_save.close()
    end_time = time.time()
    print((end_time - start_time), 'seconds')
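
The script above only covers step 2. Step 1 (requesting the page directly until the site stops serving it) might look something like the sketch below. The helper names requests_until_blocked and fetch_kongfz are invented for this example, the attempt cap and delay are arbitrary, and checking for the substring "孔夫子旧书网" in a UTF-8 decoded page is an assumption about what a successful response contains.

```python
import time
import urllib.request


def requests_until_blocked(fetch, max_attempts=500, delay=0.0):
    """Call fetch() repeatedly; return the attempt number of the first failure, or None."""
    for attempt in range(1, max_attempts + 1):
        if not fetch():
            return attempt
        time.sleep(delay)
    # The cap was reached without ever being refused.
    return None


def fetch_kongfz(url="http://book.kongfz.com/175804/1038155437/",
                 marker="孔夫子旧书网", timeout=5):
    """Return True if the page loads directly and contains the expected marker text."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return marker in resp.read().decode("utf-8", errors="replace")
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False
```

requests_until_blocked(fetch_kongfz) would then report how many direct requests went through before the first failure. A single failure can also be a network hiccup, so it is worth confirming the block in a browser before moving on to step 2.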

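Method 3

Search Baidu for the keyword "IP" and scrape the result page, which displays the IP address the request came from. Compare the IP shown for a request sent through the proxy with your machine's own IP. If they differ, the proxy is high-anonymity.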
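
The comparison step could be sketched as below. The source does not say which URL is requested or how the result page is structured, so this example leaves fetching to the caller and only works on the page text; extract_ipv4 and proxy_hides_ip are hypothetical names, and pulling addresses out with a regular expression is an assumption about how the IP appears on the page.

```python
import ipaddress
import re

IPV4_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def extract_ipv4(text):
    """Return the valid IPv4 addresses found in text, in order, without duplicates."""
    found = []
    for candidate in IPV4_PATTERN.findall(text):
        try:
            ipaddress.IPv4Address(candidate)
        except ValueError:
            # Looks like an address but is out of range, e.g. 999.1.1.1
            continue
        if candidate not in found:
            found.append(candidate)
    return found


def proxy_hides_ip(real_ip, page_via_proxy):
    """True if the page fetched through the proxy shows an IP and none of them is real_ip."""
    seen = extract_ipv4(page_via_proxy)
    return bool(seen) and real_ip not in seen
```

Here page_via_proxy is the HTML obtained through the proxy (for example resp.text from requests), and real_ip is the address the same page shows when requested without a proxy. A page with no address at all returns False, so a blocked or empty response is not mistaken for a working proxy.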

This article is the author's original work. Please credit the source when reposting. Thank you.

乱码三千: accumulating bit by bit. Welcome to the 乱码三千 tech blog.
