GBFS Forecast Extension #612

Open
1 of 3 tasks
tobsesHub opened this issue Mar 15, 2024 · 10 comments

Comments

@tobsesHub
Contributor

tobsesHub commented Mar 15, 2024

If you are new to the specification, please introduce yourself (name and organization/link to GBFS). It’s helpful to know who we're chatting with!

Hi, I'm Tobias Walter and I'm a software developer at Raumobil GmbH.
We develop MaaS software and offer, for example, information on renting bicycles and scooters on a map. We also offer routing for rental bikes.
This brings me to the context of this topic:
We at Raumobil GmbH are currently working on a research project that makes it possible to calculate routes with rental bikes for trips in the future.
To do this, we use forecast data that shows how likely it is that a free bike will be available in a certain area at a certain time. While working on this project, we are considering the possibility of extending the GBFS standard.

What is the issue and why is it an issue?

There is currently no way of representing forecast data for the availability of vehicles or stations in the future.
It could be interesting to extend the GBFS standard for this use case.

We have already discussed this a little in the Slack channel: https://mobilitydata-io.slack.com/archives/CNXA9ASBV/p1705512299616699
We can continue the discussion here and work towards a proposal for such an extension.

Please describe some potential solutions you have considered (even if they aren’t related to GBFS).

This would probably be a new optional file that could contain, for example:

  • a station_id
  • a geometry representing an area
  • a probability for an available vehicle in an area or at a station
  • a time span

There are a few other factors that could probably also be part of such a standard:

  • the day of the week
  • the month (winter time, summer time)
  • the weather.

Is your potential solution a breaking change?

  • Yes
  • No
  • Unsure
@tobsesHub
Contributor Author

Hi everybody, I have now created a first proposal. Please feel free to give me feedback.

specification

forecasts.json

This file would describe multiple forecasts. Each forecast gives a probability in percent that holds under certain conditions. These conditions can be a geographic area (as a polygon or geohash), a period between two dates, a day of the week, a month, and the weather. All of them are optional, as the calculation is likely to vary greatly depending on the application.

All these conditions make the format a little complicated. It might be better to reduce them for a first version.

| Field Name | REQUIRED | Type | Defines |
|---|---|---|---|
| forecasts | Yes | Array | Contains one object per forecast. |
| forecasts[].id | Yes | ID | Identifier of the forecast. |
| forecasts[].avaibility_probability | Yes | Float | The probability, in percent, that at least one vehicle will be available. The further fields define the conditions on which the probability depends. |
| forecasts[].station_id | No | ID | Identifier of a station. If this value is set, the probability value represents the probability that at least one vehicle is available at the station. |
| forecasts[].area_polygon | No | GeoJSON MultiPolygon | A GeoJSON MultiPolygon that describes the area in which a vehicle could be. If this value is set, the probability value represents the probability that at least one vehicle is available in this area. |
| forecasts[].area_geohash | No | Geohash String (https://en.wikipedia.org/wiki/Geohash) | A geohash that describes the area in which a vehicle could be. If this value is set, the probability value represents the probability that at least one vehicle is available in this area. |
| forecasts[].period | No | Array | If the probability relates to a period, period can be used. |
| forecasts[].period[].start_date | Conditionally REQUIRED | (not specified) | Start date of the period. If it is null, the forecast includes all dates until the end_date. If there is a period element, it needs at least a start_date or an end_date. |
| forecasts[].period[].end_date | Conditionally REQUIRED | (not specified) | End date of the period. If it is null, the forecast includes all dates since the start_date. If there is a period element, it needs at least a start_date or an end_date. |
| forecasts[].days_of_the_week | No | Array | If the probability relates to a day of the week, these values can be used. Valid values are: monday, tuesday, wednesday, thursday, friday, saturday, sunday, public_holiday. |
| forecasts[].months | No | Array | If the probability relates to a month of the year, these values can be used. Valid values are: january, february, march, april, may, june, july, august, september, october, november, december. |
| forecasts[].weather | No | Object | If the probability relates to the weather, this value can be used. |
| forecasts[].weather.min_temperature | No | Integer | Minimum temperature in degrees. |
| forecasts[].weather.max_temperature | No | Integer | Maximum temperature in degrees. |
| forecasts[].weather.min_precipitation | No | Non-negative integer | Minimum precipitation in mm. |
| forecasts[].weather.max_precipitation | No | Non-negative integer | Maximum precipitation in mm. |
| forecasts[].weather.minWindSpeed | No | Non-negative integer | Minimum wind speed in km/h. |
| forecasts[].weather.maxWindSpeed | No | Non-negative integer | Maximum wind speed in km/h. |

Alternatively, the weather object could be represented more simply and less specifically as an enum, e.g. with the values sunny, cloudy, rainy.

example

{
  "last_updated": "2024-04-26T13:34:13+02:00",
  "ttl": 60,
  "version": "3.0",
  "data": {
    "forecasts": [
      {
        "station_id": "b18e1952-16da-47d7-b0fd-a361cc6a8a94",
        "area_polygon": {
          "type": "MultiPolygon",
          "coordinates": [
            [
              [
                [
                  -122.655775,
                  45.516445
                ],
                [
                  -122.655705,
                  45.516445
                ],
                [
                  -122.655705,
                  45.516495
                ],
                [
                  -122.655775,
                  45.516495
                ],
                [
                  -122.655775,
                  45.516445
                ]
              ]
            ]
          ]
        },
        "area_geohash": "u0wn",
        "period": {
          "start_date": "2024-04-26T13:34:13+02:00",
          "end_date": "2024-04-26T18:34:13+02:00"
        },
        "days_of_the_week": [
          "monday"
        ],
        "months": [
          "may",
          "june"
        ],
        "weather": {
          "min_temperature": 20,
          "max_temperature": 30,
          "min_precipitation": 2,
          "max_precipitation": 5,
          "minWindSpeed": 5,
          "maxWindSpeed": 8
        },
        "weather_alternative": "sunny",
        "probability": 0.6116679853387422
      }
    ]
  }
}
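
As an illustration only, here is a minimal Python sketch of how a consumer might read such a file and look up the forecast for one station at a given time. It follows the JSON example above rather than the table (the key is `probability`, and `period` is a single object), and it treats a missing or null `start_date` or `end_date` as open-ended, as the field descriptions suggest. The function names `in_period` and `station_probability` are hypothetical and not part of the proposal.

```python
import json
from datetime import datetime
from typing import Optional

# Trimmed copy of the example feed above (assumption: same keys as the example).
FEED = json.loads("""
{
  "last_updated": "2024-04-26T13:34:13+02:00",
  "ttl": 60,
  "version": "3.0",
  "data": {
    "forecasts": [
      {
        "station_id": "b18e1952-16da-47d7-b0fd-a361cc6a8a94",
        "period": {
          "start_date": "2024-04-26T13:34:13+02:00",
          "end_date": "2024-04-26T18:34:13+02:00"
        },
        "probability": 0.6116679853387422
      }
    ]
  }
}
""")


def in_period(period: Optional[dict], when: datetime) -> bool:
    """Return True if `when` lies inside the period.

    A missing period, or a missing/null bound, is treated as open-ended.
    """
    if not period:
        return True
    start = period.get("start_date")
    end = period.get("end_date")
    if start and when < datetime.fromisoformat(start):
        return False
    if end and when > datetime.fromisoformat(end):
        return False
    return True


def station_probability(feed: dict, station_id: str, when: datetime) -> Optional[float]:
    """Return the first matching forecast probability, or None if nothing matches."""
    for forecast in feed["data"]["forecasts"]:
        if forecast.get("station_id") != station_id:
            continue
        if in_period(forecast.get("period"), when):
            return forecast["probability"]
    return None
```

`when` has to be a timezone-aware datetime, because the timestamps in the feed carry a UTC offset. The sketch ignores the weekday, month, and weather conditions; a real consumer would have to decide how to combine them.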

final words

So this is just my first proposal. I guess there are a lot of ideas for improving it, so please give me feedback or add your own proposals.

@sven4all
Contributor

Hi @tobsesHub,

It's an interesting idea, but I doubt that GBFS is the right place for this kind of data.

In my opinion, the goal of GBFS is to make it possible to communicate the raw data of shared mobility operators to parties that aggregate and integrate data.

It would make more sense to add this kind of intelligence on the integrator's side instead of the operator's side. That makes it easier to have consistent prediction behaviour across all operators.

@tobsesHub
Contributor Author

@sven4all Thanks, that's a good point.
Do you know of any standard where something like this would fit better?

@sven4all
Contributor

I am not aware of one at the moment. I think the best approach is to just get started. During the experimental phase of developing this, you will be the only integrator doing it. If there is a need in the future to combine the predictions of different integrators, that will be the moment to develop a new standard (or you will be the de facto standard because you were the first one).

@matt-wirtz

To widen the discussion: when it comes to operations and predicting where shared vehicles without time slot reservation will be available in the future, I think the operator himself has the most information and the best knowledge to make predictions. He is the only one who knows what his operations team is planning. They may plan to redistribute existing vehicles, put in new ones, etc. And he has (or should have) the most data on historical demand.
So I think that each operator knows his service and operations best and would therefore be a very good source for the predictions mentioned by @tobsesHub.

@tdelmas
Contributor

tdelmas commented May 6, 2024

  • Operators may have unique insights, so in some cases only they can publish a realistic forecast 👍
  • Weather may not have its place here (at least for a first version)
  • Why introduce a new concept, geohash, when the forecast could simply refer to a (new) kind of area (like a "forecast" area), or to an inline polygon as proposed with area_polygon?

@tobsesHub
Contributor Author

Okay, good points. I'm not sure at the moment.
There are probably several sources that could provide such forecast data.
The operator, because he has the data, but maybe also other service providers who want to offer such a service and specialize in forecasts.

@tdelmas
Yes, I also think the weather might be too much for the first version, although it has a big impact on the forecast.
The idea behind the geohash was that it can be cached better on the client side. If you have a lot of areas for the prediction data, using GeoJSON would also mean a large amount of data. With a geohash it would be much less.
But I think for the first version we could offer just the area_polygon.
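
To make the size argument concrete, here is a small, hedged Python sketch. It implements the standard geohash encoding (as described on the Wikipedia page linked in the proposal) and compares the serialized size of the example `area_polygon` with a geohash string for a point inside it. The function name `geohash_encode`, the chosen precision of 8 characters, and the sample point are assumptions for illustration; a geohash cell is a fixed grid rectangle, so it only approximates the polygon.

```python
import json

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat: float, lon: float, precision: int = 6) -> str:
    """Encode a WGS84 point as a geohash of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, bit_count = 0, 0
    use_lon = True  # bits alternate between longitude and latitude, longitude first
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base32 character
            chars.append(_BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


# The area_polygon from the example in the proposal.
EXAMPLE_POLYGON = {
    "type": "MultiPolygon",
    "coordinates": [[[
        [-122.655775, 45.516445],
        [-122.655705, 45.516445],
        [-122.655705, 45.516495],
        [-122.655775, 45.516495],
        [-122.655775, 45.516445],
    ]]],
}

# Serialized sizes in characters; the sample point lies inside the polygon.
polygon_size = len(json.dumps(EXAMPLE_POLYGON))
geohash_size = len(json.dumps(geohash_encode(45.51647, -122.65574, precision=8)))
print(polygon_size, geohash_size)
```

The geohash stays at a handful of characters per area regardless of shape, while the polygon grows with every vertex. The trade-off is that the area is then tied to the geohash grid.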

@testower
Contributor

testower commented May 27, 2024

As a consumer, how do you suggest the data should be used? I think this could be simplified greatly: given a certain time and location, will there or will there not be a vehicle available? Everything else is noise and will lead to different consumers showing different truths to users. Maybe a probability is OK, assuming it is shown to the user. However, journey planner engines don't work with probabilities.
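
One way a consumer could bridge that gap is to collapse the probability into a yes/no answer before handing it to the planner. The following Python sketch is only an illustration: the function name `treat_as_available` and the default threshold of 0.8 are hypothetical, and it assumes the probability is a fraction between 0 and 1, as in the JSON example.

```python
def treat_as_available(probability: float, threshold: float = 0.8) -> bool:
    """Collapse a forecast probability into the yes/no answer a planner needs.

    Assumes `probability` is a fraction in [0, 1]; the threshold is a consumer choice.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return probability >= threshold
```

Because each consumer would pick its own threshold, this also shows the concern raised here: the same feed can lead to different answers in different apps.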

@futuretap
Contributor

futuretap commented May 27, 2024

I'm a bit sceptical whether a purely probabilistic approach is a) useful for consumers and b) feasible for providers.

@matt-wirtz

I'm very optimistic that a forecast of availability is useful for consumers. I think there was a discussion on Slack where someone shared this screenshot from Velib, where they show forecasts for the next 30 and 60 minutes.
It's quite common for rental systems to have cyclical vehicle movement patterns that share common directions. For example, there might be a rail station where a lot of bikes are rented by commuters in the morning and brought back in the evening on the commute home. Around noon in particular, the probability of finding a bike close by might be very low. So when traveling in this area, I assume consumers would very much appreciate being told in their journey planning app that, at the time they arrive at this rail station, there is only a low probability that a rental bike will be available, even if a lot of rental bikes are available at the time of planning (maybe the evening before). Consumers can then plan to walk and reserve the necessary time for it.
[Image: screenshot_velib]
