Allow downloading of mbtiles per task #890

Open
susmina94 opened this issue Oct 10, 2023 · 16 comments · May be fixed by #1080

Comments

@susmina94
Collaborator

Problem:
If I download the mbtiles for the entire project area, the file is too large and takes up too much storage space.

Solution:
I want to download only the mbtiles for my assigned task.
This will save storage space and make the tiles easier to load.

@robsavoye
Collaborator

Considering multiple people are generating mbtiles, the server side should download all tiles in the project AOI, and then produce the smaller mbtiles for each task. I assume you are having a problem doing this on a local machine, and not the dev server?

@spwoodcock
Member

spwoodcock commented Oct 11, 2023

I have a PR in place to download different formats (mbtiles, pmtiles, sqlite).
Currently adding another PR to store tiles in S3 instead of locally.

Proposed Workflow

Online:

  • On project creation, all tiles are downloaded, packaged as PMTiles and stored in S3.
  • The PMTiles can be used as the basemap for online project viewing.

Offline:

  • User selects to download a subset of tiles to MBTile format, either via selecting a few task boundaries, or the entire project if they like.
  • Tiles in the bbox are extracted out of the PMTiles archive into MBTiles format for offline use by user.
    (either using the new pmtiles extract CLI, see protomaps/go-pmtiles#31 "Add pmtiles extract command", or a re-implementation in Python).
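
To illustrate the offline step, here is a minimal Python sketch of what "extract the tiles in a bbox into MBTiles" could look like. It is not the pmtiles CLI or any existing FMTM code: `lonlat_to_tile`, `tiles_in_bbox`, `write_mbtiles` and the `read_tile` callback are hypothetical names, and `read_tile` stands in for whatever actually reads a tile (a PMTiles reader, a disk cache, etc.). The table layout follows the MBTiles spec, which stores rows in TMS order, hence the Y flip.

```python
import math
import sqlite3


def lonlat_to_tile(lon, lat, zoom):
    """Convert a WGS84 lon/lat to XYZ (slippy map) tile indices."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    # Clamp so points on the map edge stay inside the valid range.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tiles_in_bbox(bbox, zoom):
    """List (z, x, y) tiles covering bbox = (min_lon, min_lat, max_lon, max_lat)."""
    min_lon, min_lat, max_lon, max_lat = bbox
    x_min, y_min = lonlat_to_tile(min_lon, max_lat, zoom)  # top-left corner
    x_max, y_max = lonlat_to_tile(max_lon, min_lat, zoom)  # bottom-right corner
    return [
        (zoom, x, y)
        for x in range(x_min, x_max + 1)
        for y in range(y_min, y_max + 1)
    ]


def write_mbtiles(path, tiles, read_tile, name="task basemap", fmt="jpg"):
    """Write tiles to an MBTiles file; read_tile(z, x, y) is a hypothetical
    callback returning tile bytes, or None when the tile is missing."""
    con = sqlite3.connect(path)
    con.execute("CREATE TABLE IF NOT EXISTS metadata (name TEXT, value TEXT)")
    con.execute(
        "CREATE TABLE IF NOT EXISTS tiles ("
        "zoom_level INTEGER, tile_column INTEGER, tile_row INTEGER, tile_data BLOB)"
    )
    con.execute(
        "CREATE UNIQUE INDEX IF NOT EXISTS tile_index "
        "ON tiles (zoom_level, tile_column, tile_row)"
    )
    con.executemany(
        "INSERT INTO metadata VALUES (?, ?)", [("name", name), ("format", fmt)]
    )
    written = 0
    for z, x, y in tiles:
        data = read_tile(z, x, y)
        if data is None:
            continue
        tms_y = (2 ** z - 1) - y  # MBTiles rows use the TMS scheme (Y flipped)
        con.execute(
            "INSERT OR REPLACE INTO tiles VALUES (?, ?, ?, ?)", (z, x, tms_y, data)
        )
        written += 1
    con.commit()
    con.close()
    return written
```

A per-task download would call `tiles_in_bbox` with the task boundary's bounding box for each zoom level wanted, then pass the combined list to `write_mbtiles`.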

@robsavoye
Collaborator

There is an in-between of online & offline. I have plenty of memory and storage, so I'd be working online, but not using S3 at all.

@spwoodcock
Member

spwoodcock commented Dec 26, 2023

Is this complete?

I didn't think so.

We need code to split an mbtiles or pmtiles for an entire project into individual task areas.

@manjitapandey

@spwoodcock spwoodcock reopened this Dec 27, 2023
@spwoodcock
Member

@robsavoye suggested we can just run basemapper for the whole aoi, then again for each individual task area.

It should use the cached tiles on disk for the task areas, so it will be very quick to complete.

This is a great idea & saves writing code for pmtile --> mbtile conversion.
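
A rough sketch of why the second pass is cheap, assuming a simple `z/x/y` on-disk cache. This is not basemapper's actual code or cache layout: `fetch_tiles` and the `download` callback are hypothetical names used only to show the idea.

```python
from pathlib import Path


def fetch_tiles(cache_dir, tiles, download):
    """Make sure every (z, x, y) tile is in the on-disk cache.

    download(z, x, y) is a hypothetical callback that returns tile bytes;
    it is only called for tiles that are not cached yet.
    Returns the number of cache hits.
    """
    hits = 0
    for z, x, y in tiles:
        path = Path(cache_dir) / str(z) / str(x) / f"{y}.jpg"
        if path.exists():
            hits += 1  # already downloaded during an earlier run
            continue
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(download(z, x, y))
    return hits
```

Running this once for the whole AOI fills the cache. Every task area is a subset of the AOI, so the per-task runs should find all of their tiles on disk and never call `download`.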

@spwoodcock
Member

Also, to do this, we need to generate all of the task area mbtiles in one go automatically (when a project is created?)

This is probably a good idea anyway, as long as it doesn't take up a large amount of S3 storage space.

@nrjadkry
Collaborator

@spwoodcock
I will be working on it, probably tomorrow or next week.
One question: what source should we use by default when we generate mbtiles on project creation?

@robsavoye
Collaborator

By default, I use ESRI.

@spwoodcock
Member

Excellent, thanks @nrjadkry

My thoughts are to:

  • Generate the entire AOI as PMTiles when a project is created.
  • Using the same tile cache, generate the MBTile files for each individual task.

Both can be done as a background task.

The advantage of the PMTiles is that it can be streamed when the user is online.

The MBTiles can be stored in the S3 and downloaded as needed before going offline.

Perhaps:
/orgid/projectid/basemap.pmtiles
/orgid/projectid/basemaps/taskid.mbtiles
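
As a sketch only, the layout above could be built with two small helpers. The function names are hypothetical, and the keys are written without a leading slash on the assumption that they are used as S3 object keys.

```python
def project_basemap_key(org_id, project_id):
    """Object key for the whole-AOI PMTiles archive (hypothetical helper)."""
    return f"{org_id}/{project_id}/basemap.pmtiles"


def task_basemap_key(org_id, project_id, task_id):
    """Object key for a single task's MBTiles file (hypothetical helper)."""
    return f"{org_id}/{project_id}/basemaps/{task_id}.mbtiles"
```

Keeping both behind helpers like these would make it easy to drop the org id from the path later without touching callers.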

@robsavoye
Collaborator

We don't want to cache the map tiles under org or project, as we'll get duplication. For the generated mbtiles basemap, it does make sense to store them under project id. Not sure if org is needed.

@spwoodcock
Member

Yeah we don't want to store the cached tiles - they get discarded afterwards and we only keep the generated tile archives (we should probably add the step to delete the tile output folder from the tmp dir after we have all the tile archives generated 👍)

As for the org id, it's probably not necessary, true - we already have the folder structure in place, but could consider changing it in the future (I thought the org id made the bucket root dir a bit neater, rather than 100s-1000s of project dirs in one).

@robsavoye
Collaborator

We will want to store map tiles for some areas, since downloading takes a long time, and we often have different projects covering the same area. Maybe even entire countries for our priority areas.

@spwoodcock
Member

spwoodcock commented Dec 29, 2023

True, that would be nice. We could have a persistent volume to keep a tile cache.

But I imagine it would fill up the server storage quite quickly.

We would need to weigh up if it's worth the extra cost for the convenience (based on how frequently project areas intersect).

@robsavoye
Collaborator

You'd be surprised, tiles are pretty small, 256x256 pixels. I have high res sat imagery from 3 providers for all of Colorado, and parts of surrounding states, plus several countries, like Nepal, all on my laptop, which is about 314GB. That's probably more than we'd need to cache long-term, but there are definitely areas of the world with multiple active mapping communities where downloading the tiles each time would be tedious. Or during disaster response, we'd cache the tiles for a few months till the initial mapping projects are done. And this would be a mix of OAM, Maxar, ESRI, and Bing, all covering the same regions. The downloading time can be days (weeks), so caching tiles for areas with more frequent mapping going on is a good idea.

I find that I prefer zoom level 18 for ODK Collect. 19 is better, but means much bigger files and longer download times. I do cache level 19 though in many areas where I want the extra level of detail.
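
For a feel of why level 19 is so much heavier than 18, here is a small sketch that counts the XYZ tiles covering a bounding box. Each extra zoom level splits every tile into four, so the count grows by up to 4x per level. The function names are hypothetical, and `avg_tile_bytes` is a number the caller has to supply from their own imagery, not a measured value.

```python
import math


def tile_count(bbox, zoom):
    """Count XYZ tiles covering bbox = (min_lon, min_lat, max_lon, max_lat)."""
    min_lon, min_lat, max_lon, max_lat = bbox
    n = 2 ** zoom

    def col(lon):
        return int((lon + 180.0) / 360.0 * n)

    def row(lat):
        lat_rad = math.radians(lat)
        return int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)

    cols = col(max_lon) - col(min_lon) + 1
    rows = row(min_lat) - row(max_lat) + 1  # row numbers grow southwards
    return cols * rows


def estimate_bytes(bbox, zoom, avg_tile_bytes):
    """Rough archive size: tile count times a caller-supplied average size."""
    return tile_count(bbox, zoom) * avg_tile_bytes


# Example bbox, chosen only for illustration.
bbox = (85.30, 27.70, 85.32, 27.72)
for zoom in (18, 19):
    print(zoom, tile_count(bbox, zoom))
```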

@spwoodcock
Member

spwoodcock commented Jan 1, 2024

Storage on AWS is approx $0.08/GB per month, so 300GB of data would cost $288 a year per instance (300 x 0.08 x 12).

If we make the data global and cover multiple providers, then we are talking $1000s spent on storing map tiles.
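
The arithmetic as a tiny sketch, assuming the $0.08/GB figure is a monthly price (that is what makes 300GB come out at $288 a year). The function name and default are illustrative only, not real AWS pricing lookups.

```python
def yearly_storage_cost(gigabytes, usd_per_gb_month=0.08):
    """Yearly storage cost in USD, assuming a flat per-GB monthly price."""
    return gigabytes * usd_per_gb_month * 12


print(round(yearly_storage_cost(300), 2))
```

Scaling the `gigabytes` argument up for global coverage and multiple imagery providers shows how quickly this grows.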

@susmina94
Collaborator Author

@NSUWAL123 I think Sujan has already prepared an API for downloading tiles per task and, as per the last discussion, we will be generating tiles using ESRI imagery by default.

But we do not have any UI on the frontend side for the users to download the tiles per task. I will suggest a design for the pop-up that includes the download icon and you can proceed ahead.
