Subdomain Discovery with a Python Crawler | Recommended by 本际云


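Subdomain discovery with a Python crawler

Python can be applied to many kinds of problems, and subdomain discovery is one worth learning. The topic covers a fair amount of ground, so this article goes through it in detail: two crawler-based methods, a dictionary-based brute-force method, and the general steps for writing a crawler.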

Implementation methods

Crawling

1. ip138


import requests
from bs4 import BeautifulSoup

def search_2(domain):
    """Query site.ip138.com for the subdomains of the given domain."""
    res_list=[]
    headers={
        'Accept':'*/*',
        'Accept-Language':'en-US,en;q=0.8',
        'Cache-Control':'max-age=0',
        'User-Agent':'Mozilla/5.0(X11;Linux x86_64)AppleWebKit/537.36(KHTML,like Gecko)Chrome/48.0.2564.116 Safari/537.36',
        'Connection':'keep-alive',
        'Referer':'http://www.baidu.com/'
    }
    results=requests.get('https://site.ip138.com/'+domain+'/domain.htm',headers=headers,timeout=10)
    soup=BeautifulSoup(results.content,'html.parser')
    job_bt=soup.find_all('p')
    try:
        for i in job_bt:
            link=i.a.get('href')
            linkk=link[1:-1]  # strip the leading and trailing "/"
            res_list.append(linkk)
            print(linkk)
    except (AttributeError,TypeError):
        # the first <p> without an <a href=...> ends the loop
        pass
    print(res_list[:-1])  # the last link is dropped before printing
    return res_list[:-1]

if __name__=='__main__':
    search_2("jd.com")

2. Bing


import requests
from bs4 import BeautifulSoup

def search_1(site):
    """Collect the result links Bing returns for the query site:<site>."""
    Subdomain=[]
    headers={
        'Accept':'*/*',
        'Accept-Language':'en-US,en;q=0.8',
        'Cache-Control':'max-age=0',
        'User-Agent':'Mozilla/5.0(X11;Linux x86_64)AppleWebKit/537.36(KHTML,like Gecko)Chrome/48.0.2564.116 Safari/537.36',
        'Connection':'keep-alive',
        'Referer':'http://www.baidu.com/'
    }
    for page in range(1,16):  # walk through the first 15 result pages
        url="https://cn.bing.com/search?q=site%3A"+site+"&go=Search&qs=ds&first="+str((page-1)*10)+"&FORM=PERE"
        html=requests.get(url,headers=headers,timeout=10)
        soup=BeautifulSoup(html.content,'html.parser')
        job_bt=soup.find_all('h2')
        for h2 in job_bt:
            if h2.a is None:  # skip headings that carry no link
                continue
            link=h2.a.get('href')
            print(link)
            if link not in Subdomain:  # keep each link only once
                Subdomain.append(link)
        print(Subdomain)
    return Subdomain

if __name__=='__main__':
    search_1("jd.com")
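The Bing search above collects full result URLs, so the same host can appear many times under different paths. A minimal sketch of reducing such a list to unique hostnames is shown here, using only the standard library. The function name `extract_subdomains` and the sample URLs are illustrative assumptions, not part of the original article.

```python
from urllib.parse import urlparse

def extract_subdomains(urls, domain):
    """Return the sorted, de-duplicated hostnames in urls that belong to domain."""
    domain = domain.lower().strip(".")
    hosts = set()
    for url in urls:
        host = urlparse(url).hostname  # None for relative or malformed URLs
        if not host:
            continue
        host = host.lower()
        # keep the domain itself and anything ending in ".<domain>"
        if host == domain or host.endswith("." + domain):
            hosts.add(host)
    return sorted(hosts)

if __name__ == "__main__":
    # hypothetical sample of result links
    sample = [
        "https://www.jd.com/",
        "https://item.jd.com/1.html",
        "https://www.jd.com/allSort.aspx",
    ]
    print(extract_subdomains(sample, "jd.com"))
```

Matching on "." + domain rather than on the bare suffix keeps unrelated hosts that merely end with the same letters out of the result.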

Brute-forcing subdomains with a dictionary


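import socket
import time

def brute_force(url):
    """Try every word in dic.txt as a subdomain of url and print the ones that resolve."""
    with open('dic.txt') as f:  # dic.txt is the subdomain dictionary file
        for word in f:
            word=word.strip()
            if not word:
                continue
            zym_url=word+"."+url
            try:
                ip=socket.gethostbyname(zym_url)
                print(zym_url+"-->"+ip)
            except (socket.gaierror,UnicodeError):
                pass  # the name does not resolve
            time.sleep(0.1)

if __name__=='__main__':
    brute_force("jd.com")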
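Resolving one name at a time with a pause between lookups is slow for a large dictionary. The following is a sketch of the same idea run on a thread pool, not the article's method. The names `resolve` and `brute_force_concurrent`, the `resolver` parameter, and the default of 20 workers are all assumptions made for illustration.

```python
import socket
from concurrent.futures import ThreadPoolExecutor

def resolve(host):
    """Return the IPv4 address of host, or None if it does not resolve."""
    try:
        return socket.gethostbyname(host)
    except (socket.gaierror, UnicodeError):
        return None

def brute_force_concurrent(domain, words, resolver=resolve, workers=20):
    """Resolve word.domain for each word and return {hostname: ip} for the hits."""
    candidates = [w.strip() + "." + domain for w in words if w.strip()]
    found = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as candidates
        for host, ip in zip(candidates, pool.map(resolver, candidates)):
            if ip is not None:
                found[host] = ip
    return found
```

With the dictionary file from the example above, a call could look like `brute_force_concurrent("jd.com", open("dic.txt"))`. Passing a different `resolver` makes it possible to try the function without sending real DNS queries.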

Steps for writing a Python crawler

1. Define the request headers and the target site URL


headers={
    'User-Agent':"Mozilla/5.0(Windows NT 10.0)AppleWebKit/537.36(KHTML,like Gecko)Chrome/42.0.2311.135 Safari/537.36 Edge/12.10240"
}
url="https://site.ip138.com/"

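2. Send the request


import requests

# GET request; domain is the target to look up, for example "jd.com"
res=requests.get(url+domain,headers=headers)
# POST request; data holds the form fields to submit
res=requests.post(url+domain,headers=headers,data=data)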

3. Fetch and parse the data


soup=BeautifulSoup(res.content,'html.parser') # parse the content of res with the HTML parser
print(soup) # show the result

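4. Inspect the page source and extract the tag contents

Inspecting the page source shows that the content to extract sits in the p tags:


job_bt=soup.find_all('p')
print(job_bt) # show the result

Next, extract the value of the href attribute from the a tag inside each one: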


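res_list=[]
try:
    for i in job_bt:
        link=i.a.get('href')
        linkk=link[1:-1]  # strip the leading and trailing "/"
        res_list.append(linkk)
        print(linkk)
except (AttributeError,TypeError):
    pass  # the first <p> without a link ends the loop

Then slice off the last entry: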


res_list[:-1]
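The slicing steps above rely on every href having exactly one leading and one trailing slash, and on the last entry being the only one that is not a subdomain. A more defensive sketch of the same cleanup is shown here. The href shape `/www.jd.com/` is inferred from the `link[1:-1]` slice and has not been checked against the live page, and the function name `hrefs_to_hosts` is hypothetical.

```python
def hrefs_to_hosts(hrefs):
    """Turn hrefs such as "/www.jd.com/" into bare hostnames, skipping other links."""
    hosts = []
    for href in hrefs:
        if not href:  # <a> tags without an href yield None
            continue
        host = href.strip("/")
        # crude filter: a hostname contains a dot and no further path segments
        if "." in host and "/" not in host and host not in hosts:
            hosts.append(host)
    return hosts

if __name__ == "__main__":
    # hypothetical href values
    print(hrefs_to_hosts(["/www.jd.com/", "/item.jd.com/", "/about/us/", None]))
```

Filtering on the shape of each value removes the need to drop the last entry blindly, at the cost of a heuristic that may have to be adjusted to the actual markup.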

Crawler summary

Fetch the data and build the soup:


soup=BeautifulSoup(res.content,'html.parser') # parse the content of res with the HTML parser

Get all the text content from the document:


print(soup.get_text())

Find the links of all a tags in the document:


for link in soup.find_all('a'):
    print(link.get('href'))
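In the loop above, `link.get('href')` returns None for an a tag that has no href attribute, and the values it does return are often relative paths or in-page anchors. A small sketch of turning such values into absolute URLs with the standard library's `urljoin` follows. The function name `absolute_links`, the list of skipped prefixes, and the sample values are assumptions made for illustration.

```python
from urllib.parse import urljoin

def absolute_links(base_url, hrefs):
    """Resolve href values against base_url, skipping empty and non-navigational ones."""
    links = []
    for href in hrefs:
        if not href or href.startswith(("#", "javascript:", "mailto:")):
            continue
        links.append(urljoin(base_url, href))
    return links

if __name__ == "__main__":
    # hypothetical href values as they might come out of link.get('href')
    for url in absolute_links("https://site.ip138.com/jd.com/", ["/www.jd.com/", "#top", None]):
        print(url)
```

Here `base_url` should be the address of the page the links were scraped from, since relative paths are resolved against it.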

That concludes this article. I hope it proves helpful.

Original article by 小编小本本. If you repost it, please credit the source: https://www.benjiyun.com/yunzhujiyunwei/vps-yunwei/7247.html
