
ERROR: Unable to connect to Kafka in Pipeline due to attempt to connect already-connected SSLSocket!, raising exit flag. #269

Open
BeamoINT opened this issue Sep 20, 2023 · 1 comment


@BeamoINT

I ran the Scrapy Cluster spider start command and got the error message below. I have no idea what could be causing it and have been troubleshooting for a while. I also have a few other questions, which I've listed after the error message. Thank you!

root@crawler:~/scrapy-cluster/crawler# scrapy runspider crawling/spiders/link_spider.py
2023-09-20 18:02:08,347 [sc-crawler] ERROR: Unable to connect to Kafka in Pipeline due to attempt to connect already-connected SSLSocket!, raising exit flag.
Unhandled error in Deferred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/scrapy/crawler.py", line 245, in crawl
    return self._crawl(crawler, *args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/scrapy/crawler.py", line 249, in _crawl
    d = crawler.crawl(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1905, in unwindGenerator
    return _cancellableInlineCallbacks(gen)
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1815, in _cancellableInlineCallbacks
    _inlineCallbacks(None, gen, status)
--- <exception caught here> ---
  File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 1660, in _inlineCallbacks
    result = current_context.run(gen.send, result)
  File "/usr/local/lib/python3.10/dist-packages/scrapy/crawler.py", line 134, in crawl
    self.engine = self._create_engine()
  File "/usr/local/lib/python3.10/dist-packages/scrapy/crawler.py", line 148, in _create_engine
    return ExecutionEngine(self, lambda _: self.stop())
  File "/usr/local/lib/python3.10/dist-packages/scrapy/core/engine.py", line 99, in __init__
    self.scraper = Scraper(crawler)
  File "/usr/local/lib/python3.10/dist-packages/scrapy/core/scraper.py", line 109, in __init__
    self.itemproc: ItemPipelineManager = itemproc_cls.from_crawler(crawler)
  File "/usr/local/lib/python3.10/dist-packages/scrapy/middleware.py", line 67, in from_crawler
    return cls.from_settings(crawler.settings, crawler)
  File "/usr/local/lib/python3.10/dist-packages/scrapy/middleware.py", line 44, in from_settings
    mw = create_instance(mwcls, settings, crawler)
  File "/usr/local/lib/python3.10/dist-packages/scrapy/utils/misc.py", line 188, in create_instance
    instance = objcls.from_crawler(crawler, *args, **kwargs)
  File "/root/scrapy-cluster/crawler/crawling/pipelines.py", line 134, in from_crawler
    return cls.from_settings(crawler.settings)
  File "/root/scrapy-cluster/crawler/crawling/pipelines.py", line 124, in from_settings
    sys.exit(1)
builtins.SystemExit: 1
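
For reference, a minimal standalone check along these lines (assuming kafka-python, which the pipeline uses to create its producer; the broker address and SSL flag below are placeholders for whatever is configured in localsettings.py, e.g. KAFKA_HOSTS) should exercise the same Kafka connection path outside of Scrapy:

from kafka import KafkaProducer

# Placeholder values - substitute the broker(s) from localsettings.py
# (KAFKA_HOSTS); drop security_protocol if the cluster is not using SSL.
BOOTSTRAP_SERVERS = "localhost:9092"

try:
    producer = KafkaProducer(
        bootstrap_servers=BOOTSTRAP_SERVERS,
        security_protocol="SSL",
    )
    print("Kafka connection OK")
    producer.close()
except Exception as exc:
    # Catch broadly: the "already-connected SSLSocket" message in the log
    # above is a plain ValueError from Python's ssl module, not a KafkaError.
    print("Kafka connection failed:", exc)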

Here are the other things I was wondering about Scrapy Cluster:

Does this command automatically start the crawler without anything having to be fed into it?

scrapy runspider crawling/spiders/link_spider.py

If so, is there a starting URL in the settings, and does it branch off from there to crawl multiple URLs from that seed URL? If you do have to feed a URL into it to start it, does it then automatically crawl other URLs from there? Sorry for so many questions, and thank you for your help!
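
In case it clarifies what I'm asking: my reading of the quickstart is that the spider idles until a request is fed to it through the kafka monitor, with something like the command below (the appid and crawlid values are just examples I made up), and I wanted to confirm that this is the intended flow:

python kafka_monitor.py feed '{"url": "http://example.com", "appid": "testapp", "crawlid": "abc123"}'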

@madisonb
Collaborator

This project does not support Python 3.10 yet. If you are referencing a proposed PR, please leave comments on #267
