Extracting the domain name from a URL with Python

Author: print("") Category: python Published: 2018-06-07 13:39


The input file contains one URL per line:

webshell.txt

http://img.machine365.com/bhvpz13711.asp;.jpg
http://www.cdsrf.com.cn/upfile/image/diaosi.asp/bhvpz13711.jpg
http://www.tairoa.org.tw/uploadfiles/image/diaosi.asp/bhvpz13711.jpg
http://www.cnjing.com.cn/plus/mytag_js.php?aid=9527
http://www.bngs.sc.cn/yqcqt70388.asp;.jpg
http://img.machine365.com/yqcqt70388.asp;.jpg
http://www.sanjinhongyuntong.com/plus/mytag_js.php?aid=9527
http://www.feipin001.com/yqcqt70388.asp;.jpg

The code is as follows (Python 2):

import re
from urlparse import urlparse  # Python 2 module; in Python 3 it is urllib.parse

# Suffixes the script knows about; a host ending in anything else is left as-is.
topHostPostfix = (
    '.com','.la','.io','.co','.info','.net','.org','.me','.mobi',
    '.us','.biz','.xxx','.ca','.co.jp','.com.cn','.net.cn',
    '.org.cn','.mx','.tv','.ws','.ag','.com.ag','.net.ag',
    '.org.ag','.am','.asia','.at','.be','.com.br','.net.br',
    '.bz','.com.bz','.net.bz','.cc','.com.co','.net.co',
    '.nom.co','.de','.es','.com.es','.nom.es','.org.es',
    '.eu','.fm','.fr','.gs','.in','.co.in','.firm.in','.gen.in',
    '.ind.in','.net.in','.org.in','.it','.jobs','.jp','.ms',
    '.com.mx','.nl','.nu','.co.nz','.net.nz','.org.nz',
    '.se','.tc','.tk','.tw','.com.tw','.idv.tw','.org.tw',
    '.hk','.co.uk','.me.uk','.org.uk','.vg', ".com.hk")

# Compile once, outside the loop: one label followed by a known suffix at the end of the host.
regx = r'[^\.]+('+'|'.join([h.replace('.',r'\.') for h in topHostPostfix])+')$'
pattern = re.compile(regx,re.IGNORECASE)

# The with-block closes the file when the loop finishes.
with open('webshell.txt','r') as f:
    for line in f:
        for url in line.strip().split():
            host = urlparse(url).netloc
            m = pattern.search(host)
            # Fall back to the full host when no known suffix matches.
            res = m.group() if m else host
            # An empty host (e.g. a line without a scheme) is reported as unknown.
            print "unknown" if not res else res
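
On Python 3, where urlparse moved into urllib.parse, the same approach might look like the sketch below. The function name registered_domain and the shortened SUFFIXES tuple are assumptions made for illustration and are not part of the original script; the matching logic itself is unchanged.

```python
import re
from urllib.parse import urlparse

# Shortened suffix list for illustration only; the full tuple works the same way.
SUFFIXES = ('.com', '.com.cn', '.org.tw', '.net', '.org')

# One label followed by a listed suffix, anchored at the end of the host.
PATTERN = re.compile(
    r'[^.]+(' + '|'.join(re.escape(s) for s in SUFFIXES) + r')$',
    re.IGNORECASE,
)


def registered_domain(url):
    """Return the label plus listed suffix, or the whole host if no suffix matches."""
    host = urlparse(url).netloc
    m = PATTERN.search(host)
    return m.group() if m else host


if __name__ == '__main__':
    print(registered_domain('http://www.cdsrf.com.cn/upfile/image/diaosi.asp/bhvpz13711.jpg'))
    print(registered_domain('http://img.machine365.com/bhvpz13711.asp;.jpg'))
```

Wrapping the lookup in a function makes it easy to call on URLs from any source, not just a text file.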

Output:

machine365.com
cdsrf.com.cn
tairoa.org.tw
cnjing.com.cn
www.bngs.sc.cn
machine365.com
sanjinhongyuntong.com
feipin001.com
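
One line of the output, www.bngs.sc.cn, keeps its www prefix: the suffix tuple contains neither '.cn' nor '.sc.cn', so the pattern finds no match and the script falls back to the full host. A minimal Python 3 sketch of that behavior follows. The helpers build_pattern and strip_to_domain, and the choice to add '.sc.cn' and '.cn', are assumptions for illustration and do not appear in the original script.

```python
import re


def build_pattern(suffixes):
    """Compile 'one label + listed suffix', anchored at the end of the host."""
    alternatives = '|'.join(re.escape(s) for s in suffixes)
    return re.compile(r'[^.]+(' + alternatives + r')$', re.IGNORECASE)


def strip_to_domain(host, pattern):
    """Return the matched domain, or the unchanged host when nothing matches."""
    m = pattern.search(host)
    return m.group() if m else host


# A few suffixes from the original tuple, then the same list with two additions.
original = build_pattern(('.com', '.com.cn', '.org.tw'))
extended = build_pattern(('.com', '.com.cn', '.org.tw', '.sc.cn', '.cn'))

print(strip_to_domain('www.bngs.sc.cn', original))  # no listed suffix: host unchanged
print(strip_to_domain('www.bngs.sc.cn', extended))  # '.sc.cn' is now recognized
```

Because search returns the leftmost match and the pattern is anchored at the end of the host, the longer suffix wins here regardless of where it sits in the tuple.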

Well, that does it.
