Downloading Short Videos Like Crazy

2018-04-12 13:25:28
0x01 Analyzing a Site's Videos
The short-video sites I've dealt with fall into two classes:
A: sites where the player encrypts the video stream
B: sites that play a plain mp4 directly (there are actually a lot of these)

How to sniff out the download address for class A:
Use the browser's built-in developer tools: open the Network tab and filter by the common video formats, as in the screenshot below.

The download address shows up directly.
Note: you'll often run into .ts files instead; players like ckplayer frequently serve this format. Don't panic: download all of the segments, then stitch them back into a single video with Format Factory.
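If you'd rather script the stitching step, here's a minimal sketch. It assumes the segments are plain, unencrypted MPEG-TS served at sequential URLs (the URL pattern and segment count are made-up placeholders); raw TS segments can simply be concatenated byte-for-byte:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Minimal sketch: fetch sequential, unencrypted .ts segments and join them.
# Plain MPEG-TS can be concatenated byte-for-byte; the URL pattern and
# segment count below are hypothetical placeholders.
import urllib.request

base = "http://www.xxx.com/video/seg%03d.ts"   # hypothetical segment pattern

with open("merged.ts", "wb") as out:
    for n in range(1, 51):                     # assume 50 segments
        try:
            out.write(urllib.request.urlopen(base % n, timeout=5).read())
            print("merged segment %d" % n)
        except Exception as e:                 # first missing segment marks the end
            print("stopping at segment %d: %s" % (n, e))
            break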


Class B video files are easy:
Just right-click and save the video. No walkthrough needed.


0x02 Batch Downloading
Sometimes you get greedy and want to pull down every video on a site.
Here's the approach and method I use.
Once you have the video download addresses, they fall into a few patterns:
www.xxx.com/video/aa001.mp4
www.xxx.com/video/aa002.mp4
www.xxx.com/video/aa00*.mp4

Download addresses like these can be grabbed directly; my tool of choice is Xunlei (迅雷).

Xunlei gives you two batch modes: a batch link download, and a wildcard (regex-style) pattern mode that handles the aa00*.mp4 form above.
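If you'd rather not click through a GUI, the same link list takes a few lines of Python to generate. A sketch, assuming the sequential aa001 through aa999 naming from the pattern above:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Expand the wildcard pattern above into a plain link list, one URL per line.
# The aa001-aa999 range is an assumption; urls.txt can then be pasted into
# Xunlei's batch dialog or fed to wget via: wget -i urls.txt
with open("urls.txt", "w") as f:
    for n in range(1, 1000):
        f.write("http://www.xxx.com/video/aa%03d.mp4\n" % n)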

URLs like those, with nothing obscuring the pattern, can be ripped in bulk straight away.

Then there are addresses like this:
www.xxx.com/video/9452ab54287bbccc989.mp4

That's a string with no discernible pattern. It can be worth a little time analyzing how it's generated, but if you can't figure it out, let it go.
For sites like that, here's an approach (a rough sketch follows below):
a: first write a crawler script that collects the URLs of every playback page
b: then write a script that parses each page for its actual playback address

This approach can solve basically every case.
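A minimal sketch of steps a and b, assuming a hypothetical listing page and regexes; both would have to be adapted to the target site's actual HTML:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Step a: crawl the listing page and collect playback-page URLs.
# Step b: fetch each playback page and regex out the real .mp4 address.
# The listing URL and both regexes are assumptions, not a real site's layout.
import re
import urllib.request

site = "http://www.xxx.com"

html = urllib.request.urlopen(site + "/list.html", timeout=10).read().decode("utf-8", "ignore")
pages = re.findall(r'href="(/play/[^"]+\.html)"', html)          # step a

for page in pages:
    body = urllib.request.urlopen(site + page, timeout=10).read().decode("utf-8", "ignore")
    m = re.search(r'https?://[^"\']+?\.mp4', body)               # step b
    if m:
        print(m.group(0))    # feed these into Xunlei / wget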

0x03 Results




Note: this is a very simple way of ripping videos; I'm just sharing my own approach, nothing technically deep.
TCV : 0

About the Author

busishen: 5 articles, 47 replies

Interested in low-level reverse engineering.

62 comments

  • #22
    2018-4-12 17:27

    > 算命縖子: [the Python batch-download script from #9]

    My nutrition can't keep up. t00ls really is full of old drivers.

  • #21
    2018-4-12 17:06

    I usually just regex the URLs out in the F12 console and save them into a txt file; then wget pulls everything down locally.

  • #20
    2018-4-12 17:03

    > 阿乾: What do you do with videos that come in segments, and encrypted on top of that?

    I've run into those, but rarely; segmented videos are seldom encrypted.

  • #19
    2018-4-12 17:02

    Just use IDM's link grabber and batch download.

  • #18
    2018-4-12 17:02

    > sleeve: Isn't there a fairly complete framework for this under Chrome?

    Haven't used one; I'll go look it up.

  • #17
    2018-4-12 17:01

    > joe: On Firefox, the Video Downloader Pro add-on finds the download address while the video is playing.

    Thanks, one more tool for the kit.

  • #16
    2018-4-12 16:33
    sleeve

    Isn't there a fairly complete framework for this under Chrome?

  • #15
    2018-4-12 16:28
    joe

    On Firefox, the Video Downloader Pro add-on finds the download address while the video is playing.

  • #14
    2018-4-12 16:22

    > 算命縖子: [the Python batch-download script from #9]

    Slick. I usually just use a website-copier tool.

  • #13
    2018-4-12 16:11

    Crawlers are everywhere these days.

  • #12
    2018-4-12 15:53
    阿乾

    What do you do with videos that come in segments, and encrypted on top of that?

  • #11
    2018-4-12 15:43

    > busishen: Beep, beep-beep

    Duanzi is dead; duan-friends live forever!!!

  • #10
    2018-4-12 15:28

    Old driver, let me on the bus. I didn't know Python back then, so I wrote one of these in C#.

  • #9
    2018-4-12 15:19
    算命縖子

    #!/usr/bin/env python
    # -*- coding: utf-8 -*-

    import urllib.request
    import urllib.error
    import re, os, socket, base64

    # The base URL is base64-encoded to slip past the forum filter.
    url = base64.b64decode('aHR0cHM6Ly8yMDE3MTJtcDQuODlzb3NvLmNvbS8=').decode('utf-8')

    # Ten cover images per group.
    jpgs = ['1_1.jpg', '1_2.jpg', '1_3.jpg', '1_4.jpg', '1_5.jpg',
            '1_6.jpg', '1_7.jpg', '1_8.jpg', '1_9.jpg', '1_10.jpg']

    # One date-stamped directory per day on the target site.
    urllist = [
        '20170102', '20170103', '20170104', '20170106', '20170107', '20170108', '20170109', '20170110',
        '20170111', '20170112', '20170113', '20170114', '20170115', '20170116', '20170220', '20170222',
        '20170223', '20170224', '20170225', '20170226', '20170227', '20170228', '20170301', '20170302',
        '20170303', '20170304', '20170305', '20170306', '20170307', '20170308', '20170309', '20170310',
        '20170311', '20170312', '20170316', '20170318', '20170322', '20170323', '20170324', '20170325',
        '20170326', '20170327', '20170329', '20170330', '20170331', '20170401', '20170402', '20170403',
        '20170404', '20170405', '20170406', '20170407', '20170408', '20170409', '20170410', '20170412',
        '20170413', '20170415', '20170416', '20170417', '20170418', '20170419', '20170420', '20170421',
        '20170422', '20170423', '20170424', '20170425', '20170426', '20170427', '20170428', '20170429',
        '20170430', '20170501', '20170502', '20170503', '20170504', '20170505', '20170506', '20170507',
        '20170508', '20170509', '20170510', '20170511', '20170512', '20170513', '20170514', '20170515',
        '20170516', '20170517', '20170518', '20170519', '20170520', '20170521', '20170522', '20170523',
        '20170524', '20170525', '20170526', '20170527', '20170528', '20170529', '20170530', '20170531',
        '20170601', '20170602', '20170603', '20170604', '20170605', '20170606', '20170607', '20170608',
        '20170609', '20170610', '20170611', '20170612', '20170613', '20170614', '20170615', '20170616',
        '20170617', '20170618', '20170619', '20170620', '20170621', '20170622', '20170623', '20170624',
        '20170625', '20170626', '20170627', '20170628', '20170629', '20170630', '20170701', '20170702',
        '20170704', '20170705', '20170706', '20170707', '20170708', '20170709', '20170710', '20170711',
        '20170712', '20170713', '20170714', '20170715', '20170716', '20170717', '20170718', '20170719',
        '20170720', '20170721', '20170722', '20170723', '20170724', '20170725', '20170726', '20170727',
        '20170728', '20170729', '20170730', '20170731', '20170801', '20170802', '20170803', '20170804',
        '20170805', '20170806', '20170807', '20170808', '20170809', '20170810', '20170811', '20170812',
        '20170813', '20170814', '20170815', '20170816', '20170817', '20170818', '20170819', '20170820',
        '20170821', '20170822', '20170823', '20170824', '20170825', '20170826', '20170827', '20170828',
        '20170829', '20170830', '20170831', '20170901', '20170902', '20170903', '20170904', '20170905',
        '20170906', '20170907', '20170908', '20170909', '20170910', '20170911', '20170912', '20170913',
        '20170914', '20170915', '20170916', '20170917', '20170918', '20170919', '20170920', '20170921',
        '20170922', '20170923', '20170924', '20170925', '20170926', '20170927', '20170928', '20170929',
        '20170930', '20171001', '20171002', '20171003', '20171004', '20171005', '20171006', '20171007',
        '20171008', '20171009', '20171010', '20171011', '20171012', '20171013', '20171014', '20171015',
        '20171016', '20171017', '20171018', '20171019', '20171020', '20171021', '20171022', '20171023',
        '20171024', '20171025', '20171026', '20171027', '20171028', '20171029', '20171030', '20171031',
        '20171101', '20171102', '20171103', '20171104', '20171105', '20171106', '20171107', '20171108',
        '20171109', '20171110', '20171111', '20171112', '20171113', '20171114', '20171115', '20171116',
        '20171117', '20171118', '20171119', '20171120', '20171121', '20171122', '20171123', '20171124',
        '20171125', '20171126', '20171127', '20171128', '20171129', '20171130', '20171201', '20171202',
        '20171203', '20171204', '20171205', '20171206', '20171207', '20171208', '20171209', '20171210',
        '20171211', '20171212', '20171213', '20171214', '20171215', '20171216', '20171225', '20171226',
        '20171227', '20171228', '20171229', '20171230', '20171231', '20180104', '20180105', '20180106',
        '20180107', '20180108', '20180109', '20180110', '20180111', '20180112', '20180113', '20180114',
        '20180115', '20180116', '20180117', '20180118', '20180119', '20180120', '20180121', '20180122',
        '20180123', '20180124', '20180125', '20180126', '20180127', '20180128', '20180129', '20180130',
        '20180131', '20180201', '20180202', '20180203', '20180204', '20180205', '20180206', '20180207',
        '20180208', '20180209', '20180210', '20180221', '20180222', '20180223', '20180224', '20180302',
        '20180303', '20180304', '20180305', '20180306', '20180307', '20180308', '20180309', '20180310',
        '20180311', '20180312', '20180313', '20180314', '20180315', '20180316', '20180317', '20180318',
        '20180319', '20180320', '20180321', '20180322', '20180323', '20180324', '20180325', '20180326',
        '20180327', '20180328', '20180329', '20180330', '20180331', '20180406', '20180407', '20180408',
        '20180409', '20180410']

    def jpg(url, j):
        '''Download each group's cover images and dump its video links.'''
        if not os.path.exists('jpg'):
            os.mkdir('jpg')
        os.mkdir('jpg\\%s' % (j))
        for y in range(1, 24):
            count = 1
            os.mkdir('jpg\\%s\\%s' % (j, y))
            for z in jpgs:
                try:
                    k = url + "/" + str(y) + "/" + z
                    #print(k)
                    xmlurl = url + "/" + str(y) + "/" + "1/xml/index.xml"
                    if count == 10:  # on a group's last image, grab its video list too
                        data = urllib.request.urlopen(xmlurl).read().decode()
                        redata = re.compile(r'http.*?\.mp4')
                        datas = redata.findall(data)
                        counts = 1
                        for i in datas:
                            # write each .mp4 link into a playable HTML stub
                            with open("jpg\\%s\\%s\\mp4_%s.html" % (j, y, counts), 'a') as f:
                                f.write('<video src="%s" controls="controls"' % (i))
                                f.write(r" width='100%' height='100%'></video>")
                            counts += 1
                        print("Video links saved as HTML files in the directory")
                    socket.setdefaulttimeout(5)  # skip any download that stalls for 5 seconds
                    urllib.request.urlretrieve(k, "jpg\\%s\\%s\\%s.jpg" % (j, y, count))
                    print("Downloading group %s, photo %s" % (y, count))
                    count += 1
                except urllib.error.HTTPError as e:
                    print(e, "Failed on group %s, photo %s; download continues!" % (y, count))
                    count += 1
                except urllib.error.URLError as e:
                    print(e, "Failed on group %s, photo %s; download continues!" % (y, count))
                    count += 1
                except socket.timeout as e:
                    print(e, "Failed on group %s, photo %s; download continues!" % (y, count))
                    count += 1

    for i in urllist:
        j = url + i
        jpg(j, i)

  • #8
    2018-4-12 15:06

    > snsqq: "Then merge the video files with Format Factory." How do you actually do that?

    Format Factory can merge video files; look through its feature list and give it a try.

  • #7
    2018-4-12 14:58
    snsqq

    "Then merge the video files with Format Factory." How do you actually do that?

  • #6
    2018-4-12 14:16

    > Jacob: Some people go hunting for the video API itself, then throw together their own front end and call someone else's API to watch for free.

    Yeah, I've worked with a few of those APIs. It does suggest another angle: analyze through the API first, then batch-download. That works too.

  • #5
    2018-4-12 14:13
    busishen

    Beep, beep-beep

  • #4
    2018-4-12 14:12
    Jacob

    Some people go hunting for the video API itself, then throw together their own front end and call someone else's API to watch for free.

  • #3
    2018-4-12 14:10

    Learned something; borrowing this.