Downloading Audiobooks with Python Multithreading
Experienced (unmarried) engineers rent an apartment near the office, sparing themselves the commute and saving a great deal of time. Others, for one reason or another, trek between home and work every day under the stars, and unfortunately I am one of them. Home and office are far apart, so my time on the bus eats up a quarter of my working hours, and Hangzhou has long been jokingly called China's Las Vegas (a Chinese pun: 賭城 "gambling city" sounds like 堵城 "gridlock city"); whenever traffic grinds to a halt, I can picture myself turning into a Transformer. For a programmer all that dead time is hard to bear, but since I cannot change my situation in the short term, I might as well put the time to good use. So I bought a big-screen Note II for reading PDFs, and my ears should not sit idle either; rather than English practice, I listen to novels. Back in school I loved the radio, especially storytelling and crosstalk, so I need a steady supply of audio novels. There are plenty of these online, but downloading them is a chore: to squeeze out more traffic and ad clicks, these sites make you open at least two pages before you reach the real download link. To cut down the overall download time, I wrote this small program so that I (and you) can download audio novels conveniently (and, really, any other kind of resource).
To be clear, I am not out to scrape large amounts of data; this is purely for fun and for learning, so the program does not crawl every link on a site aimlessly. Instead, you give it a single novel. Say I want to download the novel Childhood (《童年》): I find its index page on 我聽評(píng)書網(wǎng) (5tps.com) and let the program download all of its mp3 files. The details are in the code below; everything lives in the module crawler5tps:
1. First, set the start url and the directory to save files into
```python
# -*- coding: GBK -*-
import urllib, urllib2
import re, threading, os

baseurl = 'http://www.5tps.com'   # base url
down2path = 'E:/enovel/'          # saving path
save2path = ''                    # saving file name (full path)
```
2. Parse the download-page urls out of the start url
```python
def parseUrl(starturl):
    '''
    Parse the download-page links out of the start url.
    e.g. from 'http://www.5tps.com/html/8297.html' we can get
    'http://www.5tps.com/down/8297_52_1_1.html'
    '''
    global save2path
    rDownloadUrl = re.compile(".*?<A href=\'(/down/\w+\.html)\'.*")  # link of a download page
    # sample title line: <TITLE>有聲小說(shuō) 悶騷1 播音:劉濤 全集</TITLE>
    #rTitle = re.compile("<TITLE>.{4}\s{1}(.*)\s{1}.*</TITLE>")
    f = urllib2.urlopen(starturl)
    totalLine = f.readlines()
    # build the saving directory from the page title
    title = totalLine[3].split(" ")[1]
    if not os.path.exists(down2path + title):
        os.mkdir(down2path + title)
    save2path = down2path + title + "/"
    downUrlLine = [line for line in totalLine if rDownloadUrl.match(line)]
    downLoadUrl = []
    for dl in downUrlLine:
        # one line may carry several links; strip each match out
        # and rescan until none are left
        while True:
            m = rDownloadUrl.match(dl)
            if not m:
                break
            downUrl = m.group(1)
            downLoadUrl.append(downUrl.strip())
            dl = dl.replace(downUrl, '')
    return downLoadUrl
```
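The inner while loop exists because `re.match` only returns the first match on a line, so the code strips each hit out and rescans. As a side note, `re.findall` collects every non-overlapping match in one call; a minimal Python 3 sketch against a made-up index-page line:

```python
import re

# A hypothetical index-page line carrying two download links
line = "<A href='/down/8297_52_1_1.html'>1</A><A href='/down/8297_52_1_2.html'>2</A>"

# findall returns every non-overlapping capture at once, so no
# match/strip/rescan loop is needed
links = re.findall(r"<A href='(/down/\w+\.html)'", line)
print(links)  # ['/down/8297_52_1_1.html', '/down/8297_52_1_2.html']
```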
3. Parse the real download link out of each download page
```python
def getDownlaodLink(starturl):
    '''
    Find the real download link on each download page.
    e.g. from 'http://www.5tps.com/down/8297_52_1_1.html' we can get
    'http://180j-d.ysts8.com:8000/人物紀(jì)實(shí)/童年/001.mp3?\
    1251746750178x1356330062x1251747362932-3492f04cf54428055a110a176297d95a'
    '''
    downUrl = []
    gbk_ClickWord = '點(diǎn)此下載'  # "click here to download" (GBK-encoded anchor text)
    downloadUrl = parseUrl(starturl)
    # the real link sits in an <a> whose anchor text is the click word
    rDownUrl = re.compile('<a href=\"(.*)\"><font color=\"blue\">' + gbk_ClickWord + '.*</a>')
    for url in downloadUrl:
        realurl = baseurl + url
        print realurl
        for line in urllib2.urlopen(realurl).readlines():
            m = rDownUrl.match(line)
            if m:
                downUrl.append(m.group(1))
    return downUrl
```
4. Define the download function
```python
def download(url, filename):
    ''' download a single mp3 file '''
    print url
    urllib.urlretrieve(url, filename)
```
5. Create a thread class for the downloads
```python
class DownloadThread(threading.Thread):
    ''' download thread class '''
    def __init__(self, func, savePath):
        threading.Thread.__init__(self)
        self.function = func   # despite the name, this is the url to fetch
        self.savePath = savePath
    def run(self):
        download(self.function, self.savePath)
```
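The subclass above only forwards two values into `download`; `threading.Thread` can do that directly through `target` and `args`, with no subclass at all. A Python 3 sketch using a stand-in download function (`fake_download` is invented here, not part of the article's code):

```python
import threading

results = []

def fake_download(url, savePath):
    # stand-in for the real download(): just record what was requested
    results.append((url, savePath))

t = threading.Thread(target=fake_download,
                     args=('http://example.com/001.mp3', '1.mp3'))
t.start()
t.join()   # wait for the worker to finish
print(results)  # [('http://example.com/001.mp3', '1.mp3')]
```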
6. Start downloading
```python
if __name__ == '__main__':
    starturl = 'http://www.5tps.com/html/8297.html'
    downUrl = getDownlaodLink(starturl)
    aliveThreadDict = {}       # maps running thread -> url index
    downloadingUrlDict = {}    # url indices currently being downloaded
    i = 0
    # Note: 5tps (我聽評(píng)書網(wǎng)) allows at most three simultaneous downloads
    # of the same novel, but the connection is flaky, so to make sure we
    # get real mp3 files the thread count is capped at 2
    while i < len(downUrl) or downloadingUrlDict:
        while len(downloadingUrlDict) < 2 and i < len(downUrl):
            downloadingUrlDict[i] = i
            i += 1
        for urlIndex in downloadingUrlDict.values():
            if urlIndex not in aliveThreadDict.values():
                t = DownloadThread(downUrl[urlIndex], save2path + str(urlIndex + 1) + '.mp3')
                t.start()
                aliveThreadDict[t] = urlIndex
        for (th, urlIndex) in aliveThreadDict.items():
            if not th.isAlive():
                del aliveThreadDict[th]           # free the thread slot
                del downloadingUrlDict[urlIndex]  # this url is finished
    print 'Completed Download Work'
```
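The hand-rolled scheduling in step 6 (a cap of two live threads plus polling for finished ones) is exactly what Python 3's `concurrent.futures.ThreadPoolExecutor` provides out of the box: `max_workers=2` enforces the same cap, and the pool queues the rest. A sketch with invented urls and a stand-in download function:

```python
from concurrent.futures import ThreadPoolExecutor

def fake_download(url, filename):
    # stand-in for the real download(url, filename)
    return (url, filename)

urls = ['u1', 'u2', 'u3', 'u4']  # hypothetical mp3 links
with ThreadPoolExecutor(max_workers=2) as pool:
    # at most two downloads run at once; the pool queues the rest
    futures = [pool.submit(fake_download, u, '%d.mp3' % (i + 1))
               for i, u in enumerate(urls)]
    done = [f.result() for f in futures]
print(done)
```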
And that's it; let it download to its heart's content while I go code other projects, sigh >>>
After work I'll copy everything onto the Note and listen to novels while reading. Source code attached above.
Original post: http://www.cnblogs.com/wuren/archive/2012/12/24/2831100.html