What Is The Difference Between Python Crawler Modules Urllib And Requests

In the previous article (see the References section), we talked about how to use the Python Requests module to crawl and save web pages. In this article, we focus on the differences between the Python urllib and Requests modules.

1. Difference In Getting Web Page Data.

  1. Import different libraries. urllib is part of the Python standard library, while Requests is a third-party package that must be installed first (for example with pip install requests).
    # import the python urllib module.
    import urllib.request
    
    # import the python requests module.
    import requests
  2. The method for sending a request is different. urllib sends a request by calling urllib.request.urlopen(), which returns a response object. Requests provides one function per HTTP method, such as requests.get() and requests.post(), each of which returns a Response object.
    # urllib sends a web page request with the urlopen() method.
    resp = urllib.request.urlopen("https://www.google.com")
    
    # Requests sends a GET request with requests.get().
    resp = requests.get("https://www.google.com")
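  3. Data encapsulation is different. For more complex requests, such as ones that need custom headers or form data, passing a bare URL to urlopen() is not enough.
  4. With urllib, sites that have anti-crawler mechanisms (for example, ones that reject requests without a browser-like User-Agent) require you to wrap the URL, headers, and data in a urllib.request.Request object and pass that object to urlopen().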
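    import ssl
    import urllib.parse
    import urllib.request
    
    url = "https://www.google.com"
    
    headers = {
        "user-agent": "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.71 Safari/537.36"
    }
    
    # Form data must be URL-encoded and converted to bytes.
    # Passing data makes this a POST request, so the target URL must accept POST.
    data = bytes(urllib.parse.urlencode({"hello": "world"}), encoding="utf-8")
    
    req = urllib.request.Request(url=url, data=data, headers=headers)
    
    # SSL context for HTTPS; the default context verifies server certificates.
    ctx = ssl.create_default_context()
    resp = urllib.request.urlopen(req, context=ctx)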
  5. With Requests, none of this wrapping is needed. Just pass the headers dict through the headers parameter of requests.get() (form data goes through the data parameter of requests.post()).
    import requests
    
    headers = {
        "user-agent": "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.71 Safari/537.36"
    }
    
    resp=requests.get("https://www.google.com", headers=headers)
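To see what urlopen() hands back without relying on an outside website, here is a minimal sketch that starts a throwaway local server with the standard library's http.server module and fetches a page from it. The HelloHandler class and the page it serves are hypothetical. The response exposes the status code, the headers, and read(), which returns raw bytes; the comment in fetch_with_urllib notes the Requests attribute names for the same data.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that always answers with a tiny HTML page."""

    def do_GET(self):
        body = b"<html><head><title>Hello</title></head></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Silence the default per-request logging to stderr.
        pass


def fetch_with_urllib(url):
    """Return (status, content_type, body_bytes) using urlopen()."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        # Requests equivalents: resp.status_code, resp.headers, resp.content
        return resp.status, resp.headers["Content-Type"], resp.read()


# Port 0 asks the OS for any free port.
server = HTTPServer(("127.0.0.1", 0), HelloHandler)
thread = threading.Thread(target=server.serve_forever, daemon=True)
thread.start()

url = "http://127.0.0.1:%d/" % server.server_address[1]
status, content_type, body = fetch_with_urllib(url)
print(status, content_type, body)

server.shutdown()
server.server_close()
```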
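Because a urllib.request.Request object is built before anything is sent, you can check what urllib will transmit without contacting a server. This sketch repeats the header and form data from step 4 inside a hypothetical build_request helper and shows two details worth knowing: supplying data turns the request into a POST, and urllib stores header names capitalized, so the header is looked up as "User-agent".

```python
import urllib.parse
import urllib.request


def build_request(url, form, user_agent):
    """Hypothetical helper: wrap URL, headers and form data in a Request."""
    headers = {"user-agent": user_agent}
    # urlencode() produces "key=value&..."; Request data must be bytes.
    data = urllib.parse.urlencode(form).encode("utf-8")
    return urllib.request.Request(url=url, data=data, headers=headers)


ua = ("Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 "
      "(KHTML, like Gecko) Chrome/46.0.2490.71 Safari/537.36")
req = build_request("https://www.google.com", {"hello": "world"}, ua)

# Supplying data makes urllib send a POST instead of a GET.
print(req.get_method())
# Header names are normalized with str.capitalize(), so look up "User-agent".
print(req.get_header("User-agent"))
print(req.data)

# For a GET request, put the encoded parameters in the query string instead.
query = urllib.parse.urlencode({"q": "python urllib"})
get_req = urllib.request.Request(
    "https://www.google.com/search?" + query, headers={"user-agent": ua}
)
print(get_req.get_method(), get_req.full_url)
```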
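For comparison, Requests accepts form data as a plain dict and URL-encodes it for you, so the urllib POST from step 4 becomes requests.post(url, data=form, headers=headers). This sketch builds that request with requests.Request(...).prepare() so it can be inspected without any network traffic; it assumes the Requests package is installed.

```python
import requests

ua = ("Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 "
      "(KHTML, like Gecko) Chrome/46.0.2490.71 Safari/537.36")
headers = {"user-agent": ua}

# Build, but do not send, the same POST request as the urllib example.
# Sending it for real would be: requests.post(url, data=form, headers=headers)
prepared = requests.Request(
    "POST",
    "https://www.google.com",
    data={"hello": "world"},  # a dict is form-encoded automatically
    headers=headers,
).prepare()

print(prepared.method)
print(prepared.body)  # already URL-encoded; no manual bytes() call needed
print(prepared.headers["Content-Type"])
print(prepared.headers["User-Agent"])  # header lookup is case-insensitive

# GET parameters go in params=; Requests appends them to the URL.
get_prepared = requests.Request(
    "GET", "https://www.google.com/search", params={"q": "python requests"}
).prepare()
print(get_prepared.url)
```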

2. Difference In Parsing Web Data.

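  1. Neither urllib nor Requests parses HTML by itself; both hand you the page content, which you can then parse with bs4 (BeautifulSoup), the re module, or XPath via lxml. The practical difference is in how you get the text: urlopen().read() returns raw bytes that you must decode yourself, while a Requests response exposes resp.text, already decoded using the encoding Requests detected.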
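Here is a minimal sketch of parsing urllib output with the re module: it decodes raw bytes using the page's declared charset, then extracts the page title. The decode_body and extract_title helpers and the sample page are hypothetical; with Requests you would pass resp.text straight to extract_title.

```python
import re

# Non-greedy match of the first <title>...</title>; fine for a quick scrape,
# but a real HTML parser (bs4, lxml) is sturdier for messy markup.
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def decode_body(raw, charset=None):
    """Turn urllib's raw bytes into text.

    With a live response, charset would come from
    resp.headers.get_content_charset(); UTF-8 is an assumed fallback.
    """
    return raw.decode(charset or "utf-8", errors="replace")


def extract_title(html):
    """Return the stripped text of the first <title> element, or None."""
    match = TITLE_RE.search(html)
    return match.group(1).strip() if match else None


# Sample bytes standing in for urllib.request.urlopen(url).read().
raw = "<html><head><title> Café menu </title></head></html>".encode("latin-1")

html = decode_body(raw, "iso-8859-1")
print(extract_title(html))

# With Requests the decoding step is done for you:
#   html = requests.get(url).text
#   title = extract_title(html)
```

Decoding with the wrong charset (for example UTF-8 here) would turn "é" into a replacement character, which is why the declared charset matters when working with urllib's bytes.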

References

  1. Python3 urllib.request.urlopen Example
  2. How To Use Python Requests Module Example
