Scraping Second-Hand Housing Images from 58.com (58同城) with Python

Python and Web Scraping | Haran | 2016-04-08

    from bs4 import BeautifulSoup
    import requests
    import os
    import urllib.request
    import random
    import time
    import re

    # Pool of User-Agent strings; one is chosen at random for each request.
    user_agent = [
        'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.11 (KHTML, like Gecko) '
        'Chrome/23.0.1271.64 Safari/537.11',
        'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) '
        'Chrome/47.0.2526.106 Safari/537.36',
        'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0',
        'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.17 (KHTML, like Gecko) '
        'Chrome/24.0.1312.56 Safari/537.17',
        'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:35.0) Gecko/20100101 Firefox/35.0',
    ]

    # Build the listing-page URLs: the first page has no suffix, later pages use pnN/.
    url = []
    for i in range(1000):
        if i == 0:
            url.append('http://gz.58.com/ershoufang/')
        else:
            url.append('http://gz.58.com/ershoufang/pn{0}/'.format(i))
    print("url is done!")

    b = 0  # number of images downloaded so far

    # Restrict the run to the first page; delete this line to crawl the full list.
    url = ['http://gz.58.com/ershoufang/']

    # The original listing had the shell-style commands `cd week7` and `cd douban`
    # here (valid in IPython, not in a plain script). Create that directory and
    # save the images into it instead.
    save_dir = os.path.join('week7', 'douban')
    os.makedirs(save_dir, exist_ok=True)

    for i in url:
        time.sleep(1)  # pause between requests
        agent = random.choice(user_agent)
        header = {
            'Connection': 'Keep-Alive',
            'Accept': 'text/html, application/xhtml+xml, */*',
            'Accept-Language': 'en-US,en;q=0.8,zh-Hans-CN;q=0.5,zh-Hans;q=0.3',
            'User-Agent': agent}
        soup = BeautifulSoup(requests.get(i, headers=header).text, "html.parser")
        # Each listing is a <tr> whose logr attribute starts with "j".
        items = soup('tr', logr=re.compile('^j'))
        if len(items) == 0:
            break  # no listings on this page: stop crawling
        for item in items:
            # The image URL is in the lazy_src attribute; the listing title
            # becomes the file name.
            urllib.request.urlretrieve(
                item.find('div', 'img_list').img.get('lazy_src'),
                os.path.join(save_dir, os.path.basename(
                    item.find('p', 'bthead').a.get_text().strip() + '.jpg')))
            # print(item.find('div', 'img_list').img.get('lazy_src'))
            b += 1
            print("Downloaded %d images" % b)
    print("Finished downloading %d pictures" % b)
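The listing above builds its page URLs inline and uses each listing title directly as a file name. A minimal sketch of both steps as small helpers follows, so they can be checked without touching the network. The names `build_page_urls` and `safe_filename` are hypothetical and not part of the original script. The URL pattern is taken from the listing. The set of characters replaced in file names is an assumption about what common file systems reject.

```python
import re

BASE_URL = 'http://gz.58.com/ershoufang/'  # taken from the listing above


def build_page_urls(pages):
    """Return the first `pages` listing URLs: the base page, then pn1/, pn2/, ..."""
    return [BASE_URL if i == 0 else '{0}pn{1}/'.format(BASE_URL, i)
            for i in range(pages)]


def safe_filename(title, ext='.jpg'):
    """Turn a listing title into a file name (hypothetical helper)."""
    # Replace path separators and other characters that Windows rejects.
    cleaned = re.sub(r'[\\/:*?"<>|]', '_', title)
    # Collapse runs of whitespace (including newlines) and trim the ends.
    cleaned = re.sub(r'\s+', ' ', cleaned).strip()
    # Fall back to a fixed name when nothing is left after cleaning.
    return (cleaned or 'untitled') + ext


print(build_page_urls(3))
print(safe_filename('  Tianhe 3/2 flat:\n near metro  '))
```

Titles scraped from a page can contain slashes or newlines, which is presumably why the original wraps the name in `os.path.basename`. Replacing the offending characters keeps the whole title rather than only the part after the last slash.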

If you have any questions, leave a comment at the bottom of the article or email me (haran.huang@ichdata.com).