Python for Data Science #366

Open
Diamond-Ruby opened this issue Mar 27, 2023 · 4 comments

Comments

@Diamond-Ruby

Hello there!
I'm trying to scrape data from the web for an analysis, but the code throws an error that I'm not able to fix. I'll paste the code and the error below. Can anyone help, please?

import requests
from bs4 import BeautifulSoup

base_url = "https://www.airlinequality.com/airline-reviews/british-airways"
pages = 10
page_size = 100

reviews = []

for i in range(1, pages + 1):

    print(f"Scraping page {i}")

    # Create URL to collect links from paginated data
    url = f"{base_url}/page/{i}/?sortby=post_date%3ADesc&pagesize={page_size}"

    # Collect HTML data from this page
    response = requests.get(url)

    # Parse content and keep the text of every review body on the page
    content = response.content
    parsed_content = BeautifulSoup(content, 'html.parser')
    for para in parsed_content.find_all("div", {"class": "text_content"}):
        reviews.append(para.get_text())

    print(f"   ---> {len(reviews)} total reviews")

TimeoutError Traceback (most recent call last)
~\anaconda3\lib\site-packages\urllib3\connection.py in _new_conn(self)
173 try:
--> 174 conn = connection.create_connection(
175 (self._dns_host, self.port), self.timeout, **extra_kw

~\anaconda3\lib\site-packages\urllib3\util\connection.py in create_connection(address, timeout, source_address, socket_options)
94 if err is not None:
---> 95 raise err
96

~\anaconda3\lib\site-packages\urllib3\util\connection.py in create_connection(address, timeout, source_address, socket_options)
84 sock.bind(source_address)
---> 85 sock.connect(sa)
86 return sock

TimeoutError: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond

During handling of the above exception, another exception occurred:

NewConnectionError Traceback (most recent call last)
~\anaconda3\lib\site-packages\urllib3\connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
702 # Make the request on the httplib connection object.
--> 703 httplib_response = self._make_request(
704 conn,

~\anaconda3\lib\site-packages\urllib3\connectionpool.py in _make_request(self, conn, method, url, timeout, chunked, **httplib_request_kw)
385 try:
--> 386 self._validate_conn(conn)
387 except (SocketTimeout, BaseSSLError) as e:

~\anaconda3\lib\site-packages\urllib3\connectionpool.py in _validate_conn(self, conn)
1041 if not getattr(conn, "sock", None): # AppEngine might not have .sock
-> 1042 conn.connect()
1043

~\anaconda3\lib\site-packages\urllib3\connection.py in connect(self)
357 # Add certificate verification
--> 358 self.sock = conn = self._new_conn()
359 hostname = self.host

~\anaconda3\lib\site-packages\urllib3\connection.py in _new_conn(self)
185 except SocketError as e:
--> 186 raise NewConnectionError(
187 self, "Failed to establish a new connection: %s" % e

NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x000002095A7CD550>: Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond

During handling of the above exception, another exception occurred:

MaxRetryError Traceback (most recent call last)
~\anaconda3\lib\site-packages\requests\adapters.py in send(self, request, stream, timeout, verify, cert, proxies)
488 if not chunked:
--> 489 resp = conn.urlopen(
490 method=request.method,

~\anaconda3\lib\site-packages\urllib3\connectionpool.py in urlopen(self, method, url, body, headers, retries, redirect, assert_same_host, timeout, pool_timeout, release_conn, chunked, body_pos, **response_kw)
786
--> 787 retries = retries.increment(
788 method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]

~\anaconda3\lib\site-packages\urllib3\util\retry.py in increment(self, method, url, response, error, _pool, _stacktrace)
591 if new_retry.is_exhausted():
--> 592 raise MaxRetryError(_pool, url, error or ResponseError(cause))
593

MaxRetryError: HTTPSConnectionPool(host='www.airlinequality.com', port=443): Max retries exceeded with url: /airline-reviews/british-airways/page/1/?sortby=post_date%3ADesc&pagesize=100 (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x000002095A7CD550>: Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond'))

During handling of the above exception, another exception occurred:

ConnectionError Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_7652\3242930068.py in
14
15 # Collect HTML data from this page
---> 16 response = requests.get(url)
17
18 # Parse content

~\anaconda3\lib\site-packages\requests\api.py in get(url, params, **kwargs)
71 """
72
---> 73 return request("get", url, params=params, **kwargs)
74
75

~\anaconda3\lib\site-packages\requests\api.py in request(method, url, **kwargs)
57 # cases, and look like a memory leak in others.
58 with sessions.Session() as session:
---> 59 return session.request(method=method, url=url, **kwargs)
60
61

~\anaconda3\lib\site-packages\requests\sessions.py in request(self, method, url, params, data, headers, cookies, files, auth, timeout, allow_redirects, proxies, hooks, stream, verify, cert, json)
585 }
586 send_kwargs.update(settings)
--> 587 resp = self.send(prep, **send_kwargs)
588
589 return resp

~\anaconda3\lib\site-packages\requests\sessions.py in send(self, request, **kwargs)
699
700 # Send the request
--> 701 r = adapter.send(request, **kwargs)
702
703 # Total elapsed time of the request (approximately)

~\anaconda3\lib\site-packages\requests\adapters.py in send(self, request, stream, timeout, verify, cert, proxies)
563 raise SSLError(e, request=request)
564
--> 565 raise ConnectionError(e, request=request)
566
567 except ClosedPoolError as e:

ConnectionError: HTTPSConnectionPool(host='www.airlinequality.com', port=443): Max retries exceeded with url: /airline-reviews/british-airways/page/1/?sortby=post_date%3ADesc&pagesize=100 (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x000002095A7CD550>: Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond'))
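The phrase "Max retries exceeded" can suggest that the script already retried several times. With the default `HTTPAdapter` in requests, however, the retry budget is zero, so the very first failed connection attempt is enough to raise this error. If the failures are transient, one option is to mount an adapter with an explicit urllib3 `Retry` policy on a `Session`. This is a minimal sketch, not a fix for a host that is blocked or unreachable; the function name `make_session` and the retry numbers are assumptions chosen for illustration.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_session(total_retries: int = 3, backoff: float = 0.5) -> requests.Session:
    """Return a Session that retries failed connections and some error status codes.

    Hypothetical helper: the defaults are illustrative, not values the site requires.
    """
    retry = Retry(
        total=total_retries,
        backoff_factor=backoff,  # exponentially growing sleep between attempts
        status_forcelist=[429, 500, 502, 503, 504],  # also retry on these HTTP status codes
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    # Apply the retry policy to every http:// and https:// URL requested via this session
    session.mount("https://", adapter)
    session.mount("http://", adapter)
    return session
```

In the question's loop, `session = make_session()` would be created once before the loop and `requests.get(url)` replaced with `session.get(url, timeout=10)`. Reusing one session also keeps pooled connections to the host open between pages.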

@nivid26

nivid26 commented Apr 6, 2023

Hi,
Your code looks good; the problem is establishing a connection between the website and your computer.
Check your internet connection and any firewall settings.
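To separate a network problem from a code problem, a quick check is whether a plain TCP connection to the site's HTTPS port (443) succeeds at all, with no scraping code involved. A minimal sketch using only the standard library; `can_connect` is a hypothetical helper name:

```python
import socket


def can_connect(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host name and tries each returned address in turn
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # DNS failures, refused connections and timeouts are all OSError subclasses
        return False
```

For example, `can_connect("www.airlinequality.com")` returning `False` on one network but `True` on another (such as a phone hotspot or Google Colab) would point at the local network or firewall rather than at the script.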

@pushpitkamboj

Hey, is your problem solved, brother, or do you still need help?

@SudhanAnnamalai

It is probably an error on the server side. Things to check:

  • Check the status of the intended website in a browser.
  • Check whether the paging format of your URL is right.
  • Check whether your internet connection is stable (an unstable connection can also cause timeout errors).
  • Run the code in Google Colab to rule out firewall issues on your machine.

Hope this helps!
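To check the paging format, it can help to print the exact URLs the loop will request and open a couple of them in a browser. The sketch below builds the same URL as the f-string in the question, using `urllib.parse.urlencode` so the `sortby` value is percent-encoded consistently. The helper name `build_page_url` is hypothetical, and whether the site still accepts the `/page/{n}/` path with the `sortby` and `pagesize` parameters is an assumption to confirm in the browser.

```python
from urllib.parse import urlencode


def build_page_url(base_url: str, page: int, page_size: int) -> str:
    """Build one paginated review URL in the same shape as the question's f-string."""
    # urlencode percent-encodes ':' as %3A, giving "sortby=post_date%3ADesc"
    query = urlencode({"sortby": "post_date:Desc", "pagesize": page_size})
    return f"{base_url.rstrip('/')}/page/{page}/?{query}"


base_url = "https://www.airlinequality.com/airline-reviews/british-airways"
for page in range(1, 4):
    # Print the first few URLs so they can be pasted into a browser and checked by hand
    print(build_page_url(base_url, page, 100))
```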

@Chirag529

Chirag529 commented Aug 13, 2023

Hello, you are getting a TimeoutError because the connection attempt didn't receive a response within a certain time period. To resolve it, you can:

  1. Double-check the URL you are trying to access.
  2. Check your internet connection.
  3. Check for any firewall or proxy server, as they might block the requests.
  4. Use timeout handling and catch the error you are getting.
  5. Try adding a User-Agent header, as some websites treat requests without one as suspicious and block them.
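Points 4 and 5 can be combined in a small helper. This is a sketch, assuming the `requests` library from the question; `fetch_page` is a hypothetical name, and the User-Agent string is a placeholder rather than a value the site is known to require. With an explicit timeout, a slow or unreachable host fails after a bounded time, and catching the exception means one bad page does not stop the whole loop.

```python
import requests

# Placeholder User-Agent (an assumption); replace with a descriptive value of your own.
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; review-scraper/0.1)"}


def fetch_page(url: str, timeout: float = 10.0):
    """Return the page body as bytes, or None if the request fails."""
    try:
        # timeout=(connect, read) in seconds; without one, requests may wait indefinitely
        response = requests.get(url, headers=HEADERS, timeout=(timeout, timeout))
        response.raise_for_status()  # turn 4xx/5xx responses into HTTPError
        return response.content
    except requests.exceptions.Timeout:
        # Checked first: ConnectTimeout subclasses both Timeout and ConnectionError
        print(f"Timed out: {url}")
    except requests.exceptions.ConnectionError as exc:
        # The exception class at the bottom of the traceback in the question
        print(f"Could not connect to {url}: {exc}")
    except requests.exceptions.HTTPError as exc:
        print(f"Bad status for {url}: {exc}")
    return None
```

In the question's loop, `response = requests.get(url)` and `content = response.content` would become `content = fetch_page(url)`, followed by `if content is None: continue` before parsing.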

Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants