Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unstable TLS connection #95

Open
fgi opened this issue Jun 18, 2020 · 8 comments
Open

Unstable TLS connection #95

fgi opened this issue Jun 18, 2020 · 8 comments

Comments

@fgi
Copy link

fgi commented Jun 18, 2020

Hello @boazsegev,

Here is an issue related to TLS connections, I suppose lost or reset state while in use.

System Information

  • OS: FreeBSD 12.1-RELEASE-p3 GENERIC amd64
  • Ruby: 2.7.1p83
  • Version: 0.7.40
  • OpenSSL: 1.1.1d-freebsd 10 Sep 2019

Description

I launch Iodine with following command in a FreeBSD service (run as root):
$GEMDIR/iodine -b X.X.X.X -p 443 -w 1 -tls-cert $CERTDIR/cert.pem -tls-key $CERTDIR/privkey.pem -public $DIR/public -v 2>$DIR/log/iodine.log -verbosity 5 &

Observed behaviors explained here could not be reproduced in HTTP, without TLS.

Behavior with static files

Static path is set at Iodine launch, Ruby should not be involved in static files delivery.

On some static file requests, the TLS connection fails to finish sending data and the process overloads CPU. The connection seems to stop working and is closed with a TLS cleanup after about 1min 37sec.

Example:
First try to load 2.32db9452.chunk.js file, all seems to be okay.

DEBUG (fio_tls_openssl.c:868): Attaching TLS read/write hook for 0x100c (server mode).
DEBUG (fio_tls_openssl.c:345): TLS ALPN set to: http/1.1 for 0x100c
DEBUG (fio_tls_openssl.c:767): Completed TLS handshake for 0x100c
 - - [Thu, 18 Jun 2020 13:16:57 GMT] "GET /webapp/static/js/2.32db9452.chunk.js HTTP/1.1" 200 200736b 0ms
 - - [Thu, 18 Jun 2020 13:16:57 GMT] "GET /favicon.ico HTTP/1.1" 200 15406b 249ms
DEBUG (fio_tls_openssl.c:647): TLS cleanup for 0x110a
DEBUG (fio.c:327): FD 17 re-initialized (state: 0x110b-closed).
DEBUG (fio_tls_openssl.c:647): TLS cleanup for 0xf0e
DEBUG (fio.c:327): FD 15 re-initialized (state: 0xf0f-closed).

A couple more reloads work as well.

 - - [Thu, 18 Jun 2020 13:17:13 GMT] "GET /webapp/static/js/2.32db9452.chunk.js HTTP/1.1" 200 200736b 443ms
 - - [Thu, 18 Jun 2020 13:17:14 GMT] "GET /favicon.ico HTTP/1.1" 200 15406b 80ms
 - - [Thu, 18 Jun 2020 13:17:17 GMT] "GET /webapp/static/js/2.32db9452.chunk.js HTTP/1.1" 200 200736b 237ms
 - - [Thu, 18 Jun 2020 13:17:17 GMT] "GET /favicon.ico HTTP/1.1" 200 15406b 215ms

One more time, 2.32db9452.chunk.js file delivery is stopped with about one half of data sent, the process overloads the CPU at 100% for about 25-30 seconds:

 - - [Thu, 18 Jun 2020 13:17:21 GMT] "GET /webapp/static/js/2.32db9452.chunk.js HTTP/1.1" 200 200736b 111ms
DEBUG (fio.c:327): FD 17 re-initialized (state: 0x110c-open).
DEBUG (fio_tls_openssl.c:868): Attaching TLS read/write hook for 0x110c (server mode).
DEBUG (fio_tls_openssl.c:345): TLS ALPN set to: http/1.1 for 0x110c
DEBUG (fio_tls_openssl.c:767): Completed TLS handshake for 0x110c

The log says that the file is delivered but it's incomplete.
The browser waits for the end of the file.

After 1 min and 37 sec, the connection seems to be reinitialized or closed:

 - - [Thu, 18 Jun 2020 13:17:22 GMT] "GET /favicon.ico HTTP/1.1" 200 15406b 0ms
DEBUG (fio_tls_openssl.c:647): TLS cleanup for 0x100c
DEBUG (fio.c:327): FD 16 re-initialized (state: 0x100d-closed).

The browser stops waiting, shows a HTTP code 200 but received file is not complete.

It seems to occur on files bigger than 100-150kb maybe. The one in the example is 200kb.
I could not reproduce it with smaller files, but I could with larger ones, like images.

Behavior with Websockets

I observed something similar with websockets, so I add it to this issue.
I open a websocket, exchange a small amount of data and suddenly or after about 15 minutes, the process starts running 100% of the CPU forever.
The previously opened websocket does not seem to respond.
It's hard to say what is going on in the websocket. I activated Ruby logs in all events but they show no error.
There is nothing in Iodine logs, the server is not crashed but is very slow until I restart it.

I hope this information is enough to analyse the issue.
Don't hesitate to ask if I can help.

Kindly,
Franck

@fgi fgi changed the title TLS connection unstable Unstable TLS connection Jun 18, 2020
@boazsegev
Copy link
Owner

boazsegev commented Jun 18, 2020

Thank you for opening this issue, @fgi.

I can't reproduce this on my MacOS. I tired downloading a 480Kb file multiple times and they were all successful (using curl).

If you upgrade to OpenSSL 1.1.1g, does the issue persist (that's the version I'm running against)?

Does this happen with a self-signed certificate? (iodine - tls)?

Sadly, I'm swamped and might no get to work on the issue for a while, but I'll do my best to speed things up as soon as I can reproduce the issue.

Kindly,
Bo.

P.S.

I tested using this Ruby script:

def download_file
   `curl --insecure https://localhost:3000/assets/logo/facil_logos_png.zip > tmp.zip`
end

while(true) 
  10.times { Thread.new() { download_file() }  }
  download_file()
  sleep(0.1)
end

I'm still looking. If I find something, I'll update this issue.

@fgi
Copy link
Author

fgi commented Jun 19, 2020

I upgraded OpenSSL to 1.1.1g and tried with a self-signed certificate.
Unfortunately, it's not better.

Kindly,
Franck

@boazsegev
Copy link
Owner

I'm still can't seem to debug the issue, as I can't recreate it (and I'm also bogged down with other work).

The whole of the implementation concern for OpenSSL is in the Read-Write hooks coded in fio_tls_openssl.c.

The requirements for these read-write hooks and the API is detailed in the fio.h file.

I think the best approach would be to replace the OpenSSL socket API based implementation with an OpenSSL BIO API implementation (buffered IO).

If anyone comes across this issue and can contribute, that would be great. Otherwise it might take longer than I like.

B.

@fgi
Copy link
Author

fgi commented Jun 24, 2020

I did more testing based on your Ruby script. I can't recreate the issue with a script using curl or wget.
I can only do it reloading manually a browser, Firefox of Chrome.
I'm still investigating, focusing on file loading in the browser.

@boazsegev
Copy link
Owner

Thanks for the update.

If this is a browser related issue, it might not have shown on my tests at all... does it occur with TLS disabled? Is it truly a TLS issue? It might be a pipelining issue 🤔

B.

@fgi
Copy link
Author

fgi commented Jun 25, 2020

It does not occur without TLS. Pipelining?

@boazsegev
Copy link
Owner

It does not occur without TLS

Good to know.

Pipelining?

It's possible to send a large number of requests without waiting for a response. This is called pipelining.

When facil.io / iodine detect a pipelining client (single connection), they will reschedule the client after a few requests to prevent a DOS attack (allowing other client to perform work).

Unlike a slow client attack, where the throttling is reported to the debug log, pipelining might be a valid client and isn't reported (I should probably change that).

I thought maybe an error occurred somewhere around that point.

That was assuming that when you quickly refresh the page a number of times, the browser's network layer might collect a larger number of requests into its buffer, pipelining all these requests (sending them through the same connection without waiting for a response).

@fgi
Copy link
Author

fgi commented Jun 25, 2020

Thank you for the explanation about pipelining.

Finally, I could recreate the issue with curl and reduced it to a minimalistic configuration.
On this server port, only Iodine runs to serve static files. There is no config.ru, no Ruby app.
Iodine launch command:

iodine -b 0.0.0.0 -p 4434 -w 1 -t 1 -tls -public ./public -v 2>./log/iodine.log -verbosity 5 &

You can run following curl command 4-5 times to observe interrupted transfers:

curl -kv 'https://213.32.77.185:4434/chunk.js'

I added a symbolic link to Iodine log file in public directory. You should be able to see it with:

curl -kv 'https://213.32.77.185:4434/iodine.log'

Strangely, when I launch this curl request in a terminal directly on the server, the issue does not occur.
The request seems to have to be remote.
I tried to disable the firewall, it changed nothing.

I hope it helps.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants