Typically if you scrape a web site, you could have encountered the truth that the web site returns shortened URLs to sources from different web sites.
As on this case, for instance, https://upflix.pl/r/Qb64Ar
this hyperlink consists of a website and a few random characters. The best way a shortened hyperlink works is that it redirects you to a different web page. Due to this fact, the status_code
that our question returns is 302
Typically it occurs that we’d like a full URL to get that may do that with a number of traces of Python code and the requests library.
pip set up requests
We’ll use the head
technique to carry out this perform
This technique is much like get
with the distinction that it doesn’t return any content material, solely headers.
response = requests.head(short_url)
After executing the question, we are able to verify the headers that had been returned.
There’s data right here similar to:
- date
- kind of web site content material
- character encoding
- FULL LINK
and plenty of different data you possibly can see beneath.
{'Date': 'Thu, 16 Nov 2023 00:43:13 GMT', 'Content material-Kind': 'textual content/html; charset=UTF-8', 'Connection': 'keep-alive', 'location': 'https://www.imdb.com/title/tt14060708/', 'differ': 'Origin', 'x-powered-by': 'PHP/7.3.33', 'x-frame-options': 'SAMEORIGIN', 'CF-Cache-Standing': 'DYNAMIC', 'Report-To': '{"endpoints":[{"url":"https://a.nel.cloudflare.com/report/v3?s=bgvCMcMQg1ZkjanlgqzemKUHHthalhb%2FAT72Q58O8a22eFmkeb%2FyeeIMfKkGFwt8WmkMB6dv28F1G2CdH134Kilk%2BcdQNweIZ3O%2FN9KlQf1A2VF%2Bm3yYT89rvjU%3D"}],"group":"cf-nel","max_age":604800}', 'NEL': '{"success_fraction":0,"report_to":"cf-nel","max_age":604800}', 'Strict-Transport-Safety': 'max-age=15552000; includeSubDomains; preload', 'X-Content material-Kind-Choices': 'nosniff', 'Server': 'cloudflare', 'CF-RAY': '826bb2ce597ebfda-WAW', 'alt-svc': 'h3=":443"; ma=86400'}
Full code
import requests
def get_full_url(short_url: str) -> Optionally available[str]
response = requests.head(short_url)
if response.status_code == 302:
headers = response.headers
return headers["location"]
return None