Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OH4.1M5: Very very slow first startup and initial exception (HTTP 404) #1616

Open
lolodomo opened this issue Dec 17, 2023 · 37 comments · Fixed by openhab/openhab-core#3931
Open

Comments

@lolodomo
Copy link
Contributor

lolodomo commented Dec 17, 2023

It takes a long time to get a first log and it started by a n exception !

19:39:24.466 [ERROR] [.internal.JSONResponseExceptionMapper] - Unexpected exception occurred while processing REST request.
javax.ws.rs.NotFoundException: HTTP 404 Not Found
        at org.apache.cxf.jaxrs.utils.SpecExceptions.toNotFoundException(SpecExceptions.java:89) ~[bundleFile:3.6.2]
        at org.apache.cxf.jaxrs.utils.ExceptionUtils.toNotFoundException(ExceptionUtils.java:128) ~[bundleFile:3.6.2]
        at org.apache.cxf.jaxrs.interceptor.JAXRSInInterceptor.processRequest(JAXRSInInterceptor.java:174) ~[bundleFile:3.6.2]
        at org.apache.cxf.jaxrs.interceptor.JAXRSInInterceptor.handleMessage(JAXRSInInterceptor.java:79) ~[bundleFile:3.6.2]
        at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307) ~[bundleFile:3.6.2]
        at org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121) ~[bundleFile:3.6.2]
        at org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:265) ~[bundleFile:3.6.2]
        at org.apache.cxf.transport.servlet.ServletController.invokeDestination(ServletController.java:234) ~[bundleFile:3.6.2]
        at org.apache.cxf.transport.servlet.ServletController.invoke(ServletController.java:208) ~[bundleFile:3.6.2]
        at org.apache.cxf.transport.servlet.ServletController.invoke(ServletController.java:160) ~[bundleFile:3.6.2]
        at org.apache.cxf.transport.servlet.CXFNonSpringServlet.invoke(CXFNonSpringServlet.java:225) ~[bundleFile:3.6.2]
        at org.apache.cxf.transport.servlet.AbstractHTTPServlet.handleRequest(AbstractHTTPServlet.java:304) ~[bundleFile:3.6.2]
        at org.apache.cxf.transport.servlet.AbstractHTTPServlet.doPost(AbstractHTTPServlet.java:217) ~[bundleFile:3.6.2]
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:517) ~[bundleFile:4.0.4]
        at org.apache.cxf.transport.servlet.AbstractHTTPServlet.service(AbstractHTTPServlet.java:279) ~[bundleFile:3.6.2]
        at org.ops4j.pax.web.service.spi.servlet.OsgiInitializedServlet.service(OsgiInitializedServlet.java:102) ~[bundleFile:?]
        at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:799) ~[bundleFile:9.4.52.v20230823]
        at org.eclipse.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1656) ~[bundleFile:9.4.52.v20230823]
        at org.ops4j.pax.web.service.spi.servlet.OsgiFilterChain.doFilter(OsgiFilterChain.java:100) ~[bundleFile:?]
        at org.ops4j.pax.web.service.jetty.internal.PaxWebServletHandler.doHandle(PaxWebServletHandler.java:320) ~[bundleFile:?]
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) ~[bundleFile:9.4.52.v20230823]
        at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:600) ~[bundleFile:9.4.52.v20230823]
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) ~[bundleFile:9.4.52.v20230823]
        at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235) ~[bundleFile:9.4.52.v20230823]
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1624) ~[bundleFile:9.4.52.v20230823]
        at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233) ~[bundleFile:9.4.52.v20230823]
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1440) ~[bundleFile:9.4.52.v20230823]
        at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188) ~[bundleFile:9.4.52.v20230823]
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:505) ~[bundleFile:9.4.52.v20230823]
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1594) ~[bundleFile:9.4.52.v20230823]
        at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186) ~[bundleFile:9.4.52.v20230823]
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1355) ~[bundleFile:9.4.52.v20230823]
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) ~[bundleFile:9.4.52.v20230823]
        at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:234) ~[bundleFile:9.4.52.v20230823]
        at org.ops4j.pax.web.service.jetty.internal.PrioritizedHandlerCollection.handle(PrioritizedHandlerCollection.java:96) ~[bundleFile:?]
        at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:722) ~[bundleFile:9.4.52.v20230823]
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) ~[bundleFile:9.4.52.v20230823]
        at org.eclipse.jetty.server.Server.handle(Server.java:516) ~[bundleFile:9.4.52.v20230823]
        at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:487) ~[bundleFile:9.4.52.v20230823]
        at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:732) [bundleFile:9.4.52.v20230823]
        at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:479) [bundleFile:9.4.52.v20230823]
        at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277) [bundleFile:9.4.52.v20230823]
        at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311) [bundleFile:9.4.52.v20230823]
        at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105) [bundleFile:9.4.52.v20230823]
        at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104) [bundleFile:9.4.52.v20230823]
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883) [bundleFile:9.4.52.v20230823]
        at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034) [bundleFile:9.4.52.v20230823]
        at java.lang.Thread.run(Unknown Source) [?:?]
19:39:25.362 [INFO ] [org.openhab.core.Activator           ] - Starting openHAB 4.1.0.M5 (build Milestone Build)

It was never as long before.
I am curious to know what is this HTTP request returning 404.
But finally the server started properly. After the initial exception, the time to start is similar as before.

@lolodomo
Copy link
Contributor Author

@openhab/core-maintainers for your information. I hope this is not the new IP scan ?

@lolodomo lolodomo changed the title OH4.1M5: Very very slow startup and initial exception OH4.1M5: Very very slow startup and initial exception (HTTP 404) Dec 17, 2023
@lolodomo
Copy link
Contributor Author

I do not get the same exception after a restart. So it happens apparently only at the very first start.
But I have the strange feeling that the server is blocked at a certain time of the startup and then unblocked after a delay.
I could compare the time to start between M3 and M5.

@lolodomo
Copy link
Contributor Author

lolodomo commented Dec 17, 2023

But I have the strange feeling that the server is blocked at a certain time of the startup and then unblocked after a delay.
I could compare the time to start between M3 and M5.

Time of startup is finally similar between M3 and M5, around 4 minutes for me.

So the problem I am reporting is only the time to start at the very first startup and the associated exception that is logged.

@openhab-bot
Copy link
Collaborator

This issue has been mentioned on openHAB Community. There might be relevant details there:

https://community.openhab.org/t/openhab-4-1-milestone-discussion/149502/146

@florian-h05
Copy link
Contributor

Even if that exception is thrown because a client requests a resource not available yet, I would say, that exception should be suppressed.

@kaikreuzer
Copy link
Member

I agree, @florian-h05. It definitely looks as if there was some HTTP request coming in while the system was starting up and couldn't yet serve the resources. I have created openhab/openhab-core#3931 to not log errors in such a case anymore.

@lolodomo Wrt start times, I am not sure how to best investigate this. So it only happens at the very first start? Can you somehow reproduce it on a test system? Was it maybe slow to install the remote add-ons because our artifactory might have had network issues?

@lolodomo
Copy link
Contributor Author

Was it maybe slow to install the remote add-ons because our artifactory might have had network issues?

I am using the kar file and remote is disabled. Hope that kar file is still used.

@lolodomo
Copy link
Contributor Author

It definitely looks as if there was some HTTP request coming in while the system was starting up

Strange but maybe you're right.
I will see if the problem happens again with RC1.

@florian-h05
Copy link
Contributor

I have upgrade to the snapshot yesterday evening and also experienced an extremely slow start-up.
It took about 4 minutes until any log output was generated, and then I received plenty of the error messages above.
Normally, a few of those were „normal“ because I have a few devices in the network regularly polling the REST API.
I guess the high number of those messages has something to do with the extremely slow startup and is only a symptom, not the root cause.

@kaikreuzer
Copy link
Member

I guess the high number of those messages has something to do with the extremely slow startup and is only a symptom, not the root cause.

Yes, I would guess so as well. With today's snapshot, you shouldn't see any such log messages anymore as openhab/openhab-core#3931 is now contained.

@florian-h05
Copy link
Contributor

Hmm, with the latest snapshot I am getting this error:

2023-12-20 11:16:32.751 [ERROR] [internal.JSONResponseExceptionMapper] - Unexpected exception occurred while processing REST request.
javax.ws.rs.ClientErrorException: HTTP 404 Not Found
	at org.apache.cxf.jaxrs.utils.SpecExceptions.toHttpException(SpecExceptions.java:118) ~[bundleFile:3.6.2]
	at org.apache.cxf.jaxrs.utils.ExceptionUtils.toHttpException(ExceptionUtils.java:168) ~[bundleFile:3.6.2]
	at org.apache.cxf.jaxrs.utils.JAXRSUtils.findTargetMethod(JAXRSUtils.java:673) ~[bundleFile:3.6.2]
	at org.apache.cxf.jaxrs.interceptor.JAXRSInInterceptor.processRequest(JAXRSInInterceptor.java:182) ~[bundleFile:3.6.2]
	at org.apache.cxf.jaxrs.interceptor.JAXRSInInterceptor.handleMessage(JAXRSInInterceptor.java:79) ~[bundleFile:3.6.2]
	at org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307) ~[bundleFile:3.6.2]

@kaikreuzer
Copy link
Member

Looks as if we have to catch yet another exception type...

@J-N-K
Copy link
Member

J-N-K commented Dec 20, 2023

I wonder why we use WebClientException in ItemResource l. 488 instead of returning a 404 response (like in other places).

@kaikreuzer
Copy link
Member

I don't see any good reason for it, @J-N-K.
I have changed it with openhab/openhab-core#3940.

@lolodomo
Copy link
Contributor Author

I have upgrade to the snapshot yesterday evening and also experienced an extremely slow start-up.
It took about 4 minutes until any log output was generated,

It was the same for me with M5, no log before a very long time.
If there is a new snapshot this evening, I will try again and compute how much time to get the very first log. And I will compare first start and next starts.

@lolodomo lolodomo changed the title OH4.1M5: Very very slow startup and initial exception (HTTP 404) OH4.1M5: Very very slow first startup and initial exception (HTTP 404) Dec 20, 2023
@florian-h05
Copy link
Contributor

I have tried out several recent snapshots, including the latest one 3792, and with none I have experienced a slow (first) startup again.
I have always cleaned the cache before starting a the newly installed snapshot.

@kaikreuzer
Copy link
Member

@lolodomo & @florian-h05 We have a new snapshot distro (3794) available now.

@florian-h05
Copy link
Contributor

When upgrading to that snapshot, startup took the same time as usual, great!

@lolodomo
Copy link
Contributor Author

Feedback in few minutes with snapshot 3794...

@lolodomo
Copy link
Contributor Author

lolodomo commented Dec 20, 2023

First startup: 3min23 to get the log entry "Starting openHAB 4.1.0 (build Build #3794)". 5m18 to have another log entry (LSP service). Around 8min20 to have everything loaded and started !!!

If I stop and restart the server: 45s to get the same first log entry and around 3min55 to start everything. This looks bigger than before I believe.
There is 40s while nothing happens in logs, just after starting the openHAB Cloud service or maybe this is the load of my rule file ? Note something interesting, some requests form the Netatmo binding were then interrupted.

18:47:38.912 [INFO ] [b.core.model.lsp.internal.ModelServer] - Started Language Server Protocol (LSP) service on port 5007
18:47:40.329 [INFO ] [del.core.internal.ModelRepositoryImpl] - Loading model 'maison.rules'
18:47:50.587 [INFO ] [.io.openhabcloud.internal.CloudClient] - Connected to the openHAB Cloud service (xxx)
18:48:29.556 [WARN ] [tty.util.ssl.SslContextFactory.config] - Trusting all certificates configured for Client@f1a324[provider=null,keyStore=null,trustStore=null]
18:48:29.561 [WARN ] [tty.util.ssl.SslContextFactory.config] - No Client EndPointIdentificationAlgorithm configured for Client@f1a324[provider=null,keyStore=null,trustStore=null]
18:48:50.446 [WARN ] [nal.handler.capability.HomeCapability] - Error getting Home informations: Request interrupted
18:48:50.438 [WARN ] [.handler.capability.WeatherCapability] - Error retrieving weather data 'xxx' : Request interrupted
18:48:50.539 [WARN ] [handler.capability.SecurityCapability] - Error retrieving last events for home 'xxx' : Request interrupted

I will compare normal start with 4.1 M3 later this evening.

@lolodomo
Copy link
Contributor Author

4.1 M3

  • 29s to get the banner
  • around 3min58s to load everything
  • around 45s to stop the server

Comparing a (re)start of 4.1 M3 and snapshot 3794 (meaning not the initial startup) is relatively similar in duration.

Stop of the server is 45s with 4.1 M3 while 56s with snapshot 3794.

Maybe the real problem is the 8 minutes for the very first startup of snapshot 3794. Remember that I use a kar file for addons so there is even nothing to download.

I will now compare every 4.1 milestones on my PC with a fully empty setup.

@J-N-K
Copy link
Member

J-N-K commented Dec 20, 2023

Could it be related to the installation of the discovery addons.xml? Since it is a boot feature, it is installed early. The file itself is contained in the .kar and requires the .kar to be processed before it can be installed.

@lolodomo
Copy link
Contributor Author

On a Windows PC for the very first startup with a fully empty config and no kar file until "Rule engine started." is displayed:

  • 4.1 M1: around 36s
  • 4.1 M5: around 31s
  • snapshot 3794: around 35s
    So very similar.

@J-N-K
Copy link
Member

J-N-K commented Dec 20, 2023

Yes, but that is without .kar. Then the file get's installed from the remote location where the source feature is immediately available.

@kaikreuzer
Copy link
Member

Since it is a boot feature, it is installed early. The file itself is contained in the .kar

What? We require an external feature for the whole core to start up? That's an issue, I would say - so far, the core is meant to be fully contained in the distro and not dependent on any externally sourced feature.
Is there a reason that this file has to be in a feature or could it simply be packaged into the distro?

@lolodomo
Copy link
Contributor Author

Yes, but that is without .kar. Then the file get's installed from the remote location where the source feature is immediately available.

If I try again on PC with snapshot 3794 but this time with the kar file in addons, still with an empty config, it takes around the same time (38s).

@lolodomo
Copy link
Contributor Author

I can try again on my RPI the very first startup without the kar file.

@lolodomo
Copy link
Contributor Author

lolodomo commented Dec 20, 2023

Very first startup on my RPI without the kar file and with setting remote set to true: yes, it is faster, but still around 7 minutes.

@lolodomo
Copy link
Contributor Author

For the second start, I get approximatively the same start time if I have initially the kar file or not. This is logic as I guess the kar file is not used at the second start.

@lolodomo
Copy link
Contributor Author

My SD card is now old, I don't know if it could explain such durations.

@J-N-K
Copy link
Member

J-N-K commented Dec 20, 2023

@kaikreuzer I don't know. The problem is that the file needs to be updated when the remote feature changes especially important for snapshots). This is done automatically when the feature is refreshed. The question is if there is a better place to put it. Unfortunately the feature never installed for me if I don't add it as a boot feature.

@lolodomo
Copy link
Contributor Author

Disabling the finders changes apparently nothing to the restart time.

@kaikreuzer
Copy link
Member

@J-N-K I was thinking of something like #1620.

The problem is that the file needs to be updated

I would actually prefer to package it in runtime/etc, which is then clearly a file that the user should not tamper with.
Is there any need to allow users to change the file or to put other files next to it?

@J-N-K
Copy link
Member

J-N-K commented Dec 20, 2023

No, I don't think the user needs to adjust something there. We just need to make sure that it is always the newest version (i.e. replaced on update).

@kaikreuzer
Copy link
Member

Then I would indeed suggest to move it to the runtime folder. I'm just preparing a PR for core.

@lolodomo
Copy link
Contributor Author

With snapshot 3796: 3min55s to get the first log and around 9min to have everything loaded at the very first startup.

I will buy and try with a new SD card next week.

@kaikreuzer
Copy link
Member

Thanks for testing so early. Pity that the change didn't do any noticeable difference. 😢
I guess we then leave it at this and do the RC1, hoping that it isn't a more general issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants