Python for Data Science #366

Diamond-Ruby · 2023-03-27T14:02:57Z

Hello there!
I'm trying to scrape data from the web for an analysis, but the code throws an error that I'm not able to fix. I've pasted the code and the error below. Can anyone help, please?

import requests
from bs4 import BeautifulSoup

base_url = "https://www.airlinequality.com/airline-reviews/british-airways"
pages = 10
page_size = 100

reviews = []

for i in range(1, pages + 1):

    print(f"Scraping page {i}")

    # Create URL to collect links from paginated data
    url = f"{base_url}/page/{i}/?sortby=post_date%3ADesc&pagesize={page_size}"

    # Collect HTML data from this page
    response = requests.get(url)

    # Parse content and collect the text of every review block
    content = response.content
    parsed_content = BeautifulSoup(content, 'html.parser')
    for para in parsed_content.find_all("div", {"class": "text_content"}):
        reviews.append(para.get_text())

    print(f"   ---> {len(reviews)} total reviews")

TimeoutError Traceback (most recent call last)
~\anaconda3\lib\site-packages\urllib3\connection.py in _new_conn(self)
173 try:
--> 174 conn = connection.create_connection(
175 (self._dns_host, self.port), self.timeout, **extra_kw

~\anaconda3\lib\site-packages\urllib3\util\connection.py in create_connection(address, timeout, source_address, socket_options)
94 if err is not None:
---> 95 raise err
96

~\anaconda3\lib\site-packages\urllib3\util\connection.py in create_connection(address, timeout, source_address, socket_options)
84 sock.bind(source_address)
---> 85 sock.connect(sa)
86 return sock

TimeoutError: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond

During handling of the above exception, another exception occurred:

NewConnectionError Traceback (most recent call last)
~\anaconda3\lib\site-packages\urllib3\connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
702 # Make the request on the httplib connection object.
--> 703 httplib_response = self._make_request(
704 conn,

~\anaconda3\lib\site-packages\urllib3\connectionpool.py in _make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
385 try:
--> 386 self._validate_conn(conn)
387 except (SocketTimeout, BaseSSLError) as e:

~\anaconda3\lib\site-packages\urllib3\connectionpool.py in _validate_conn(self, conn)
1041 if not getattr(conn, "sock", None): # AppEngine might not have .sock
-> 1042 conn.connect()
1043

~\anaconda3\lib\site-packages\urllib3\connection.py in connect(self)
357 # Add certificate verification
--> 358 self.sock = conn = self._new_conn()
359 hostname = self.host

~\anaconda3\lib\site-packages\urllib3\connection.py in _new_conn(self)
185 except SocketError as e:
--> 186 raise NewConnectionError(
187 self, "Failed to establish a new connection: %s" % e

NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x000002095A7CD550>: Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond

During handling of the above exception, another exception occurred:

MaxRetryError Traceback (most recent call last)
~\anaconda3\lib\site-packages\requests\adapters.py in send(self, request, stream, timeout, verify, cert, proxies)
488 if not chunked:
--> 489 resp = conn.urlopen(
490 method=request.method,

~\anaconda3\lib\site-packages\urllib3\connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
786
--> 787 retries = retries.increment(
788 method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]

~\anaconda3\lib\site-packages\urllib3\util\retry.py in increment(self, method, url, response, error, _pool, _stacktrace)
591 if new_retry.is_exhausted():
--> 592 raise MaxRetryError(_pool, url, error or ResponseError(cause))
593

MaxRetryError: HTTPSConnectionPool(host='www.airlinequality.com', port=443): Max retries exceeded with url: /airline-reviews/british-airways/page/1/?sortby=post_date%3ADesc&pagesize=100 (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x000002095A7CD550>: Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond'))

During handling of the above exception, another exception occurred:

ConnectionError Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_7652\3242930068.py in
14
15 # Collect HTML data from this page
---> 16 response = requests.get(url)
17
18 # Parse content

~\anaconda3\lib\site-packages\requests\api.py in get(url, params, **kwargs)
71 """
72
---> 73 return request("get", url, params=params, **kwargs)
74
75

~\anaconda3\lib\site-packages\requests\api.py in request(method, url, **kwargs)
57 # cases, and look like a memory leak in others.
58 with sessions.Session() as session:
---> 59 return session.request(method=method, url=url, **kwargs)
60
61

~\anaconda3\lib\site-packages\requests\sessions.py in request(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert, json)
585 }
586 send_kwargs.update(settings)
--> 587 resp = self.send(prep, **send_kwargs)
588
589 return resp

~\anaconda3\lib\site-packages\requests\sessions.py in send(self, request, **kwargs)
699
700 # Send the request
--> 701 r = adapter.send(request, **kwargs)
702
703 # Total elapsed time of the request (approximately)

~\anaconda3\lib\site-packages\requests\adapters.py in send(self, request, stream, timeout, verify, cert, proxies)
563 raise SSLError(e, request=request)
564
--> 565 raise ConnectionError(e, request=request)
566
567 except ClosedPoolError as e:

ConnectionError: HTTPSConnectionPool(host='www.airlinequality.com', port=443): Max retries exceeded with url: /airline-reviews/british-airways/page/1/?sortby=post_date%3ADesc&pagesize=100 (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x000002095A7CD550>: Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond'))


nivid26 · 2023-04-06T16:49:05Z

Hi,
Your code looks fine, but there is a problem establishing a connection between your computer and the website.
Check your internet connection and any firewall settings.
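
One way to test the connection outside of requests is to open a plain TCP connection to the site's host and port. The sketch below uses only the standard library. The helper name can_reach and the 5-second timeout are illustrative choices, not part of the original code.

```python
import socket
from urllib.parse import urlsplit


def can_reach(url, timeout=5.0):
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parts = urlsplit(url)
    # Fall back to the scheme's default port when the URL does not name one.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        with socket.create_connection((parts.hostname, port), timeout=timeout):
            return True
    except OSError as exc:
        # Timeouts, refused connections and DNS failures are all OSError subclasses.
        print(f"Cannot connect to {parts.hostname}:{port}: {exc}")
        return False


# Example (performs a real network connection when uncommented):
# can_reach("https://www.airlinequality.com/airline-reviews/british-airways")
```

If this also returns False, the failure happens before any HTTP request is sent, which points at the network path (connection, firewall, proxy) or the server rather than at the scraping code.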

pushpitkamboj · 2023-04-15T18:24:49Z

Hey, is your problem solved, or do you still need help?

SudhanAnnamalai · 2023-08-08T19:19:01Z

The error is probably on the server side.

Check the status of the website in your browser.
Check whether the paging format of your URL is right.
Check whether your internet connection is stable (an unstable connection can cause a timeout error).
You can also run the code in Google Colab to rule out firewall issues.

Hope this helps!
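
To check the paging format, it can help to build the URL from its parts and compare the result with what the browser shows. This is a minimal sketch that assumes the same path and query parameters as the original code. build_page_url is a hypothetical helper name.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.airlinequality.com/airline-reviews/british-airways"


def build_page_url(page, page_size=100):
    """Build the paginated review URL used in the original loop."""
    # urlencode percent-encodes the ':' in the sort key as %3A.
    query = urlencode({"sortby": "post_date:Desc", "pagesize": page_size})
    return f"{BASE_URL}/page/{page}/?{query}"


print(build_page_url(1))
```

The printed URL can be pasted into a browser. If the page loads there but the script still times out, the URL format is unlikely to be the cause.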

Chirag529 · 2023-08-13T04:55:24Z

Hello, you are getting a TimeoutError because a connection attempt did not receive a response within a certain time period. To resolve it, you can:

Double-check the URL you are trying to access.
Check your internet connection.
Check for any firewall or proxy server, as they might block the requests.
Add timeout handling and catch the error you are getting.
Try adding a User-Agent header, as some websites treat requests without one as suspicious and block them.
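
The last two suggestions could look roughly like this with requests. This is only a sketch: fetch_html is a hypothetical helper, the User-Agent string is a made-up example, and the timeout values (5 seconds to connect, 30 seconds to read) are arbitrary assumptions.

```python
import requests

# Example header only; any realistic browser-style string can be used here.
DEFAULT_HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; review-scraper-example)"}


def fetch_html(url, session=None, timeout=(5, 30)):
    """Return the page HTML as text, or None if the request fails."""
    if session is None:
        session = requests.Session()
    try:
        # timeout is a (connect, read) pair in seconds.
        response = session.get(url, headers=DEFAULT_HEADERS, timeout=timeout)
        # Turn 4xx/5xx responses into exceptions so they are handled below.
        response.raise_for_status()
    except requests.exceptions.RequestException as exc:
        # Covers connection errors, timeouts and HTTP error statuses.
        print(f"Request failed for {url}: {exc}")
        return None
    return response.text


# Example (performs a real HTTP request when uncommented):
# html = fetch_html("https://www.airlinequality.com/airline-reviews/british-airways")
```

Inside the scraping loop, a None result can be used to skip a page instead of letting the whole run stop on the first failed request.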
