Warning
This article was last updated on 2022-10-22; its contents may be out of date.
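ADSP (Algorithms + Data Structures = Programs) is an excellent podcast that regularly invites well-known people from the programming world to discuss technical topics. As of 2022-10-20 it has recorded exactly 100 episodes.
The site also offers audio downloads for offline listening. Being lazy (a programmer's privilege), I naturally wanted a script to automate the downloading. Without further ado, here is the code.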
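import os
import re
from datetime import datetime
from urllib.parse import urljoin

import bs4
import requests

BASE_URL = "https://adspthepodcast.com"
DOWNLOAD_DIR = "/home/william/Downloads/ADSP"
os.makedirs(DOWNLOAD_DIR, exist_ok=True)  # open() fails if the directory is missing

# Collect the episode links from the home page and blog pages 2 to 20.
links = []
for i in range(1, 21):
    if i == 1:
        url = BASE_URL
    else:
        url = f"{BASE_URL}/blog/page{i}/"
    rsp = requests.get(url)
    anchors = bs4.BeautifulSoup(rsp.text, 'html.parser').find_all('a')
    # Some <a> tags have no href, so guard against None.
    links.extend(a.get('href') for a in anchors if 'Episode' in (a.get('href') or ''))

for k in links:
    # urljoin copes with hrefs that do or do not start with a slash.
    url = urljoin(f"{BASE_URL}/", k)
    print(f"{datetime.now()} processing url ==> {url}")
    rsp = requests.get(url)
    soup = bs4.BeautifulSoup(rsp.text, 'html.parser')
    # Inspect the soup to see how the page is actually structured:
    # the download link turns out to sit in the first <section>.
    res = soup.find_all(name='section')[0].find('script').get('src')
    # Escape the dot so that only a literal ".js" is replaced.
    res = re.sub(r"\.js", ".mp3", res)
    # Extract the title: the last path segment without its leading id.
    # title = re.sub('.*(episode.*mp3).*', '\\1', res)
    title = '-'.join(res.split('?')[0].split('/')[-1].split('-')[1:])
    filename = os.path.join(DOWNLOAD_DIR, title)
    if os.path.isfile(filename):
        continue  # already downloaded
    mp3 = requests.get(res)
    with open(filename, 'wb') as f:
        f.write(mp3.content)
    print(f"{datetime.now()} saved file ==> {filename}")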
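Two steps in the script above are a little fragile: deriving the file name from the player script URL, and writing the file, since an interrupted run can leave a partial MP3 that the `os.path.isfile` check will then skip forever. Below is a minimal sketch of how those two steps might be pulled out into helpers. The names `mp3_url_and_name` and `save_atomically` are hypothetical, the `example.com` URL in the comments is made up, and the sketch assumes the URL path ends in `<id>-<slug>.js`, which is what the title extraction above already relies on.

```python
import os
import re
import tempfile
from urllib.parse import urlsplit


def mp3_url_and_name(script_src):
    """Turn a player-script URL into (mp3_url, file_name).

    Hypothetical helper; assumes the path ends in '<id>-<slug>.js'.
    """
    # Replace only a '.js' that ends the path (before '?' or end of string).
    mp3_url = re.sub(r"\.js(?=\?|$)", ".mp3", script_src, count=1)
    last_segment = urlsplit(mp3_url).path.rsplit("/", 1)[-1]
    # Drop the leading id: '123-episode-1.mp3' becomes 'episode-1.mp3'.
    _, _, name = last_segment.partition("-")
    # Fall back to the whole segment if there was no '-' to split on.
    return mp3_url, name or last_segment


def save_atomically(chunks, dest):
    """Write an iterable of byte chunks to dest via a temporary file.

    The file only appears under its final name after a complete write,
    so an "already downloaded" check never sees a partial file.
    """
    directory = os.path.dirname(dest) or "."
    os.makedirs(directory, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as f:
            for chunk in chunks:
                f.write(chunk)
        os.replace(tmp_path, dest)  # rename within the same directory
    except BaseException:
        os.unlink(tmp_path)  # clean up the partial file, then re-raise
        raise
```

With `requests`, the chunks could come from a streamed response, for example `save_atomically(requests.get(mp3_url, stream=True, timeout=30).iter_content(chunk_size=65536), filename)`, which also avoids holding a whole episode in memory.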