
Increase Upload Speed #19

Open
DavidBerdik opened this issue Jan 3, 2019 · 145 comments

Comments

@DavidBerdik

Would it be possible to increase the upload and download speeds that can be obtained when using this project? I have noticed that download speeds are better than upload speeds, but both are still rather slow.

@stewartmcgown
Owner

stewartmcgown commented Jan 4, 2019 via email

@DavidBerdik
Author

The only suggestion I have that has not been implemented, as far as I can tell, is batching requests to the Drive API. Perhaps a certain number of chunks (100?) could be encoded and then sent to Drive for processing on a separate thread while encoding continues on the main thread. Time permitting, I will play with this idea, though I'm not sure time will permit.
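A minimal sketch of that pipelining idea. Here upload_chunk is a hypothetical stand-in for whatever call UDS makes to push one encoded fragment to Drive, and the bounded queue limits how far encoding can run ahead of uploading:

import base64
import queue
import threading

def pipeline(chunks, upload_chunk, buffer_size=100):
    # Encode on the main thread while a worker drains the queue and uploads.
    q = queue.Queue(maxsize=buffer_size)

    def uploader():
        while True:
            doc = q.get()
            if doc is None:  # sentinel: encoding is finished
                return
            upload_chunk(doc)  # hypothetical per-fragment Drive call

    worker = threading.Thread(target=uploader, daemon=True)
    worker.start()
    for chunk in chunks:
        q.put(base64.b64encode(chunk))  # encoding overlaps the upload in flight
    q.put(None)
    worker.join()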

@78Alpha
Contributor

78Alpha commented Feb 13, 2019

Not sure how great an idea it would be but...

Google has mentioned you can convert text documents to Google Drive format. Not sure if that would set it as a "0 space used doc", but it would allow for files of up to 50 MB to be uploaded and converted.

With the right threading, or processing, you could have one set of workers encoding, one uploading, and one converting. However, they would have to be synced up neatly, as I found that calling the API from multiple instances terminated the upload.

I attempted to multiprocess the upload, but when more than one "user" accesses anything it cuts the connection, so it was playing duck duck blackout with itself until stopped. Since every drive has a minimum of 15 GB, it could be set to upload up to 7.5 GB and then convert.

Uploading a solid file would at least be faster, but again, I'm not sure if it converts cleanly.

@DavidBerdik
Author

@78Alpha From what I can tell from my admittedly brief research, converting an uploaded file to a Google Doc does produce a "0 space used doc."

@stewartmcgown
Owner

I have been unable to convert even 8MB text files to Google Docs format. Have you had any verifiable experience with this?

@DavidBerdik
Author

DavidBerdik commented Feb 14, 2019

I have experience doing it with Word documents years ago, but never with txt files. From what I've read, though, it is supposed to be possible.

I suspect you all have seen this already, but I will post it anyway: https://support.google.com/drive/answer/37603?hl=en

@stewartmcgown
Owner

No, of course you can convert documents; that is exactly what my program does. But there is a technical limitation: Docs can contain only 10 million characters. Trying to convert large text files to Google Docs format fails every time.

I'm still open to other speed improvement suggestions, but I don't think this is the way forward.

@DavidBerdik
Author

Unless I am misinterpreting this, conversions do not have that limit: https://support.google.com/drive/answer/37603?hl=en

@stewartmcgown
Owner

I imagine that is to allow for word docs with images in them.

You can test the behaviour I'm talking about by creating a fake base64 txt file:

base64 /dev/urandom | head -c 40000000 > file.txt

and attempting to upload and convert it in your Google Drive. The error in the console is 'deadlineExceeded', which I assume means there is an internal time limit on how long a conversion can take on Google's servers.

@DavidBerdik
Author

DavidBerdik commented Feb 14, 2019

Yeah I see what you mean. I can't even get conversion to work properly through the web interface.

I have not had a chance to test converting documents that have images in them, but assuming that it works, it may be worth looking into modifying the project to do the following.

  1. Generate "images" that contain slightly less than 50MB worth of data.
  2. Add those "images" to a Word document.
  3. Upload the Word document to Drive.
  4. Convert to the native Docs format.
  5. Delete the original.

What are your thoughts on this?
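For steps 3-5 of the flow above, a minimal sketch using the Drive v3 Python client. It assumes an already-authenticated service object named drive (built with googleapiclient); setting the target mimeType on the created file is what requests the conversion:

from googleapiclient.http import MediaFileUpload

DOCX = 'application/vnd.openxmlformats-officedocument.wordprocessingml.document'

def upload_and_convert(drive, path):
    # Uploading a .docx with a Google Docs target mimeType asks Drive to convert it.
    media = MediaFileUpload(path, mimetype=DOCX, resumable=True)
    meta = {'name': path, 'mimeType': 'application/vnd.google-apps.document'}
    created = drive.files().create(body=meta, media_body=media, fields='id').execute()
    return created['id']  # once this exists, the original .docx can be deleted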

@78Alpha
Contributor

78Alpha commented Feb 15, 2019

With the images approach, wouldn't you be able to put random garbled data into a PNG-wrapped file and just upload it to Google Photos? A multi-gigabyte photo may be odd, but it could work.

@78Alpha
Contributor

78Alpha commented Feb 16, 2019

After attempting a few things, I found what works and what might not work. I made a file with text and turned it into a PNG file, not just changing the extension but hex editing it to have the header... This did not work well... it requires some other header manipulation, changing data chunks and adding a CRC to each chunk... manually writing it did not give good results...

However, the method I did find is a long one, but it did work. I converted a txt file to PNG; by that I mean I made a picture that showed the words "I am Text!". Hex editing the file shows those words are not stored as text in any way. Getting them back into text used OCR... so that method works, but you have to account for the OCR being able to read small characters, and of course for turning the text back into whatever file it is... I guess this is covered by base64? It turns even the headers into plain text, so it should be as easy as adding it to a file with the right extension afterwards. I'll have to test this out more; I haven't found a way to automate it, as I don't know how the sites I'm using do what they do. I suspect AI, and that is a large scope...

@DavidBerdik
Author

DavidBerdik commented Feb 16, 2019

I am not so sure that trusting OCR with this is a good idea. One thing that might be worth looking into regarding images, though, would be trying to pack data in a JPG.

https://twitter.com/David3141593/status/1057042085029822464?s=19

https://www.google.com/amp/s/www.theverge.com/platform/amp/2018/11/1/18051514/twitter-image-steganography-shakespeare-unzip-me

Update: Apparently the source code is available now. I might play with it this weekend if time permits. - https://twitter.com/David3141593/status/1057609354403287040?s=19
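For reference, the core of the "zip in a JPG" trick, as I understand it, is appending an archive to an intact image: zip readers locate the archive from the end of the file, so the result is both a valid JPEG and a valid zip. This naive sketch is only illustrative; services that re-encode images will strip the appended data, which is why the linked project embeds it more cleverly:

def pack_zip_in_jpg(cover_jpg, payload_zip, out_jpg):
    # The result still renders as a normal JPEG, but also unzips as an archive.
    with open(cover_jpg, 'rb') as f:
        jpg = f.read()
    with open(payload_zip, 'rb') as f:
        payload = f.read()
    with open(out_jpg, 'wb') as f:
        f.write(jpg + payload)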

@78Alpha
Contributor

78Alpha commented Feb 16, 2019

There's also the method of making several Google accounts and uploading a part of a zip to each. The limit would be 5, as UDS seems to only ever use 1/5 of the total bandwidth on any network. Having each file on its own account wouldn't drop the connection like multiprocess uploading did.

I've made my own script that auto-updates the ID whenever a file is deleted or the like, but I have to add in the details manually unless I want to make a self-evolving script. Instead of just saying "pull id" for one drive, it goes "pull my_picture" and pulls from each drive, or deletes from each, or pushes and loads the ID into a shared json...

However, seeing as how David got a really nice setup with jpg zips, it seems promising. I will test to see if it works on Drive, but Drive is very picky about "altered images". Best of luck, great concept, and awesome execution.

Edit:

After testing it out I managed to upload one of those "zips in jpg" files to the unlimited storage. However it is limited to about 64 kilobytes per jpg...

@DavidBerdik
Author

You were testing using Google Photos, right? Did you try putting altered images in a Word document, uploading to drive, and then converting? Perhaps that is doable?

Also, this is off topic, but I want to point out that I am not the same David who wrote the "zips in jpg" thing. I wish I was though. 😊

@78Alpha
Contributor

78Alpha commented Feb 17, 2019

My apologies for the misunderstanding. I'll be trying that next. Currently fiddling with Steghide to store things, but it needs a jpg large enough to hide the data, and good lord, I'm trying to make an 80,000 x 80,000 jpg on a little laptop... 4K images only offer 1.6 MB of space.

I'll edit this once I tested the word document.

@DavidBerdik
Author

DavidBerdik commented Feb 18, 2019

Good luck to you! Unfortunately I have not really had any time to play with any of this. I've only been able to theorize over what might be possible. School work has kept me busy even over the weekend.

Another way to handle this could be to create a 50MB (probably slightly less) bitmap file and use that for storing data. If you want to hide data in a bitmap while retaining the image, you can use least significant bit steganography, but since there is no incentive here to retain the appearance of the unaltered image, there's really no reason why we can't just overwrite the entire image with our bits and put the garbled-looking image in a document. Using MS Paint, I was able to generate a 48.7MB 256-color bitmap by setting its dimensions to 7150px by 7150px. The question here is whether Google does anything to bitmaps in Word documents that are converted to the native Docs format.
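A minimal sketch of that overwrite idea, assuming a pre-generated BMP like the one described. Bytes 10-13 of a BMP file header store the offset where the pixel array begins, so everything after that offset can be replaced wholesale without breaking the header:

import struct

def fill_bmp_pixels(bmp_path, payload, out_path):
    data = bytearray(open(bmp_path, 'rb').read())
    offset = struct.unpack_from('<I', data, 10)[0]  # start of the pixel array
    if len(payload) > len(data) - offset:
        raise ValueError('payload larger than the pixel array')
    data[offset:offset + len(payload)] = payload  # clobber the image content
    open(out_path, 'wb').write(bytes(data))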

In regards to generating Word documents with Python, here is the answer to that: https://python-docx.readthedocs.io/en/latest/
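A sketch of that with python-docx; the image paths would come from whatever produces the data-bearing images (a hypothetical step, not something in python-docx itself):

from docx import Document

def build_carrier_doc(image_paths, out_path='carrier.docx'):
    doc = Document()
    for path in image_paths:  # e.g. the generated data-bearing bitmaps
        doc.add_picture(path)
    doc.save(out_path)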

And no worries. I just want to make it clear that I am not trying to claim someone else's work as my own. I know what it feels like when someone does that and do not want to perpetuate it. 😊

Update: Here is the bitmap I created. Apparently GitHub does not take kindly to uploading 50MB bitmaps so I had to zip it.
Demo Image.zip

@78Alpha
Contributor

78Alpha commented Feb 18, 2019

I tested it out and it did not go well... Putting in a 2 MB text file required a nearly 20 MB image. Attempting to embed a larger amount of data required a bigger image, but I ran into a completely different problem: it consumes a ton of memory just to add the 2 MB to the image. I have an average 8 GB, and it requires 7 GB + 1 GB swap per image, and that is just JPEG... I tried doing it with a PNG, but the available software requires even more memory for PNG. Even though a PNG can hold more data, it demands significantly more memory: where JPEG took 8 GB, PNG was demanding 10 to 12 GB, freezing the system and crashing. It requires the same memory to extract, too... even though I had a test file, it was not happy about taking the file back out of it.

I also tested the Word document. Google converts all images to PNG format, destroying the data injected into them... But it did create a zero-space Docs file. To do it, you would need to have a PNG in the first place... However, the requirements for making the PNG are way too high to be useful: one image needs 10 GB of RAM but can only carry around 500 KB of data, and the image created would also be larger than a JPEG. That is a payload-to-container ratio of about 1:30; for JPEG it's 1:10, and for Stewart's UDS method it's 2:3. If you account for upload speed, images can upload at full speed but UDS is limited to 1/5 of total network bandwidth, so each image format gains at most a 5x advantage: PNG goes from 1:30 to an effective 1:6, JPEG from 1:10 to 1:2, while UDS (still at 1/5 bandwidth) stays at 2:3, or 1:1.5. His method is the most compact and still effectively the fastest. The image methods just allow for cloud syncing in such a way that you don't have to deal with an ID and can easily resume an upload.

The only methods I can see would be having multiple drives and uploading to each via separate processes, so it doesn't close the connection due to too many access attempts at the same time; or making offline Word docs, uploading those, and converting them (however, you would have to delete the original Word doc because it still takes up space).

And I used Steghide and OpenStego, if anyone is curious. Steghide is a command line tool while OpenStego is a GUI tool, and also the only one of the two that can work with PNG files.

And about uploading an image file with garbled data: I attempted that a while ago, giving the file all the headers necessary to appear to be an image, but Google, Twitter, etc. require the file to be thumbnailed in order to prove it is an image and not, of course, what we are trying to do. That's why a cover image is used. Google Photos refused to upload any image that it could not make a real thumbnail out of, but did work with ones that had real image data.

So... still at square one? The UDS method is still the fastest available...

I have also found that steg tools have serious problems handling part files (a zip split into multiple parts). The part file can be 1 KB, but if the data inside is supposed to be larger than 2 MB, the tool just will not put the zip into the image, because it thinks the file is larger than it actually is...

@DavidBerdik
Author

I am not sure that this adds much, but after a little investigation, I believe that the bitmap issue with the Word documents is actually not Google's fault, but rather Microsoft's. Here's how I found that.

  1. Created a new Word document and added my bitmap to it.
  2. Saved the document, closed it, and changed the extension to zip. (docx files are nothing more than glorified zip archives; see the snippet below.)
  3. Unzipped the file and found the image. It was in PNG form.
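For anyone who wants to verify this, a quick way to peek inside a .docx without renaming it ('document.docx' is a placeholder name):

import zipfile

with zipfile.ZipFile('document.docx') as z:
    print(z.namelist())  # embedded images live under word/media/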

I do have one more crazy idea. I am pretty sure that it is too impractical to be useful, but I will share it in the hopes that someone does find it valuable.

  1. Create an empty Word document.
  2. Create a new image file (the one that I am playing with is 7150px by 1px and the format should not matter).
  3. Set each pixel in the bitmap to white or black to indicate the bit setting in the file that is being uploaded. (0 = white, 1 = black, or the other way around if you prefer)
  4. Add the image to the Word document, save the Word document, and check the size.
  5. Repeat steps 2-4 while Word document size is under a certain threshold.
  6. Once size threshold is reached, upload document, convert to native Docs format, and delete original Word file.

I am of course aware of the drawbacks of treating each pixel as a bit instead of a byte, but using this method, I am not sure that each pixel could be reliably used as a byte. Since Word/Docs seems to like the PNG format, perhaps using bytes would be acceptable since we would not have to worry about what happens during conversion.

Does anyone have any thoughts on this? (Besides of course thinking that I am crazy.)
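A minimal sketch of step 3, using Pillow (my choice of library, not something from this thread), with one bit per pixel in a 7150px-by-1px grayscale strip:

from PIL import Image

def bits_to_row_image(data, width=7150):
    # Unpack bytes into bits, most significant bit first.
    bits = [(byte >> (7 - i)) & 1 for byte in data for i in range(8)]
    bits = bits[:width] + [0] * max(0, width - len(bits))  # one padded row
    img = Image.new('L', (width, 1))
    img.putdata([0 if b else 255 for b in bits])  # 1 = black, 0 = white
    return img

bits_to_row_image(b'example payload').save('strip.png')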

@78Alpha
Contributor

78Alpha commented Feb 21, 2019

Recently I tested multiprocessing in a range of ways: defined processes, self-mutating processes, OS processes, and starmap. All of them got caught up on a certain part, specifically "Requires x docs to make". It seems to stop the system from spawning any new processes, or automatically calls join() on a process it doesn't know the name of. Just running multiple instances in different terminals worked fine, at least for a small set of data; I once tried with very big files and it got caught up and just stopped both uploads. Using os.system to call UDS and whatever command I need also seems to cause a problem: it makes the group name "None", and for some reason that stops the whole thing, even when grouping does nothing at all... Trying to do it from an outside script led to... VERY weird results. UDS started treating letters like files; it would fail to upload anything except encoded non-existent data, which it uploaded with the name "."

I have run out of external ideas to speed it up... the only ways left would be to change parts of the core UDS, and that is way over my head.

And apparently there is a rate limit quota, so a single file can only be edited so fast. I found this out when messing around with the concurrent executor code that was commented out. And applying for extra quota requires asking Google for more... So... my old idea of making multiple Google accounts to access a file might be valid for multiplying speed; maybe I'll test that next...

@Asqii

Asqii commented Feb 21, 2019 via email

@DavidBerdik
Author

@78Alpha Have you bothered playing around with my latest crazy suggestion at all? If not, I will have a look at it myself when time permits.

@78Alpha
Contributor

78Alpha commented Feb 22, 2019

I am studying it at the moment. I haven't gotten to it, as I personally wouldn't know how to execute it. From what I see, it would still be bound to the 700 KB doc size limit, but you would be able to group files; however, they wouldn't be able to be part files at the moment...

It allows for more organization but reduces the amount of storable data per picture. I'll have to work with BMP a little to see how it handles data.

I attempted to use the BMP you uploaded; it was apparently too small to hide a 5 MB file, but again managed to hide a 2 MB file. It is starting to appear that 2 MB is the limit for a single file.

I found that files can continuously be pumped into images. I added a zip into an image to make a new image, then pumped data into that image... however, the time it takes grows dramatically. The first pass took 10 minutes; the second is at 5.7% and has already taken 2 hours.

@DavidBerdik
Author

Regarding the BMP thing I suggested, why was that limit present? Can't you edit all the bytes outside of the header without corrupting the image?

Regarding the Doc size limit, I tested that.

Using an online PNG generator (https://onlinepngtools.com/generate-random-png), I generated a bunch of PNG images and placed them in a Word document ("Word PNG Image Test.docx") that was about 48MB in size. I uploaded the document to Google Drive and converted it. The conversion was successful. I then downloaded the converted file and checked its size ("Word to Google Doc Conversion Test.docx"). It was 28.1 MB. Using the .zip extension trick, I unzipped the two files to compare the images, and although both sets were in PNG format, the images were not technically the same: the ones in the Google Drive version were more compressed.

I then tried creating a new Google Doc via the web UI and inserting all of the images from the original Word document as well as the converted document into the new Google Doc. This worked, but it took a while for saving to complete. After this, I downloaded the Doc-created file ("Google Doc Manual Image Insertion Test.docx"), which totaled 76.1 MB (note that this size is the sum of the previous two sizes). I then extracted this file using the zip trick and compared the hashes of the images to the hashes of the images in the documents they were sourced from, and they all matched.

So it looks like the best way to do this would be to insert the images directly into a Google Doc. Unfortunately, I cannot find official documentation on the maximum size of a native Google Doc, but according to an obscure Google Products forum post, the limit is 250MB. The three documents I created during this test are attached as RAR archive fragments.

GitHub does not allow RAR archives to be uploaded so I had to change the extension to zip. To extract these, change all of the extensions back to .rar and use WinRAR to extract.

Sample Documents.part1.zip
Sample Documents.part2.zip
Sample Documents.part3.zip
Sample Documents.part4.zip
Sample Documents.part5.zip
Sample Documents.part6.zip
Sample Documents.part7.zip

@78Alpha
Contributor

78Alpha commented Mar 1, 2019

I myself couldn't edit past the header bytes; it was far outside my field of expertise. I used one of the Kali Linux tools, Steghide, and it attempts to inject data in such a way that it will work on sites that try to generate a preview. Since it pushes everything into a single block in the image, I assume that's the limit; if I could input data per block, then the limit would be the number of blocks instead of the size (and of course RAM, when trying to open the image itself as a text document). That 250 MB limit seems very generous; I wonder who made a doc so big that the limit was set that high. I'll have to learn more about all this, but as long as the data is in big chunks, it could boost upload to full potential. I'll take a few days to learn more; if I can't learn what is needed, I might have to pass on trying an implementation myself.

@78Alpha
Contributor

78Alpha commented Mar 3, 2019

So, I looked into the whole thing a bit and learned a lot. When I first started out, I was using PNG images, and that's where I went wrong. PNG files are the hardest to work with, as they have checksums for each block, making it nearly impossible to inject data into them; knowing that is helpful, though. PNG files have the largest potential size (I downloaded a 500 MB one from the NASA site), but working with them is slow, tedious, and not very efficient...

I worked extensively with your BMP files too, but with Google Photos related tests. After doing some reading up, unlimited storage is for files of less than 16 MP (4920 x 3264). So I made a BMP of size 4920 x 3264 with a simple gradient. It is ~50 MB in size, much better than JPG, but not as good as PNG; however, it works. The BMP uploaded to Google Photos, takes no storage space, and could be downloaded and unzipped.

https://photos.app.goo.gl/RvRR7H4bhcwQcCRu5 (contains a 7zip file)

That is the picture. You can tell how full it is by the amount of random static in it, from bottom to top, so you can also add in more stuff if you want; it's a pain to find the end bytes, but it is possible. (Also, the data in there is a game of my own design, if that raises any concern.)

I attempted to copy the BMP bytes and create arbitrary images with Python, as Python has binascii to do stuff like that. However, when I wrote it up as a script it threw out a nonsensical error. I say nonsensical because I ran the same code from an interactive prompt and it worked flawlessly, so automation will be problematic...

I also tested your DOC idea. I added a very small jpg to a Word document, converted it to a zero-space file with Docs, and downloaded the doc with its images... And, well... it destroyed the data again. The image was only a 30 KB JPG, so it wasn't turned into a PNG; however, Google still tampered with the data such that it couldn't be extracted (or be seen as an archive).

Part files are also not working across multiple images, so... I'll be working with hex for a while...

@DavidBerdik
Author

Cool!

Over the weekend, I participated in a local hackathon with two friends (@SMyrick98 and @digicannon) and we tried to implement a prototype of the "bitmaps in Word documents" thing that I mentioned earlier. Unfortunately, we did not get everything working due to apparent inconsistencies in the bitmap standard, but time permitting, I believe we have plans to attempt to finish it. If that happens, I will share the work with their permission.

@78Alpha
Contributor

78Alpha commented Mar 4, 2019

A rough snippet of code to help out...

import binascii
import gc
import random

def generic():
    sequence = '1234567890ABCDEF'
    # Header bytes of a BMP file of size 16 MP
    base = binascii.unhexlify('424DD2E4DE02000000007A0000006C00000037130000BF0C0000010018000000000058E4DE02130B0000130B0000000000000000000042475273000000000000')
    with open("generic.bmp", 'wb') as byter:
        byter.write(base)
        for x in range(10):
            # Generate random noise to grow the file by ~6 MB per pass
            temp = ''.join(random.choice(sequence) for i in range(12000000))
            byter.write(binascii.unhexlify(temp))
            gc.collect()

The spacing is a GitHub thing; it doesn't seem happy about lines that start with indentation...

The code generates a BMP of 60 MB, and yes, it is based solely on size. I used the header bytes from a BMP I had on hand, so it always has the same width x height and appears as a BMP. Although it is 60 MB, that's because Google Photos was not happy with the 240 MB generated one, or the 120 MB one... but different services should have different limits. In theory, you should be able to make a multi-gigabyte BMP file that always has a resolution of 16 MP.
BM����zl7�� ��X���� � BGRs

is what is made from the bytes...

424DD2E4DE02000000007A0000006C00000037130000BF0C0000010018000000000058E4DE02130B0000130B0000000000000000000042475273000000000000

So... it could be modified to have part files in each image and then consolidated into a single BMP file. Not sure how clean that would be, but it means each DOC could hold a fully zipped file even if images are limited in raw data size. However, from my testing, taking part files out of images generates noise of a weird kind; it added data that never existed, corrupting the archives... I guess a cleaner way would be to add the part files to an image and close the file there, without extra noise, such that you can just ignore the headers and stitch the files into one big file.

Hopefully my blunders lead to discoveries for others.

@jhdscript

jhdscript commented Mar 4, 2019

I ran various tests:

  • Plain text
  • docx + convert
  • google sheet container

Speeds are always bad (250kbps max), so I think the only way to boost it is threading the process.

I looked at rclone, and with small files it works faster than all my tests :'/

Moreover, Google limits file creation to 3 per second :-(

@78Alpha
Contributor

78Alpha commented Mar 5, 2019

Here is a BMP tool; it might make the process of putting them into docs easier, and it makes the standard more uniform. Sadly it's limited to Python 2.7 right now; 3.7 was having a cow about reading hex and bytes.

https://github.com/78Alpha/BMPMan

The only advantage Google Photos has is the fact that it can make albums and continuously sync (at least from mobile). Still very manual, it just makes pictures... I added in a license just for reasons, and read it over, so I guess I have to state this:

DavidBerdik, under the LGPL v3, you have free rein over the nightmarish code I have created in the link above, if you like.

@kloklojul

Ah, I see; I've only read the last 3-4 and the first 20-30 messages. I'll check out your repo.

@DavidBerdik
Author

@kloklojul No problem. Enjoy!

@78Alpha
Contributor

78Alpha commented Feb 3, 2020

I'm the guy that worked on the generic bitmap prototype.

Iteration 1:
https://github.com/78Alpha/BMPMan
Iteration 2: https://github.com/78Alpha/BMPMOT

My method hasn't shown any corruption, but I'm the least capable programmer in this thread... So I got hard-stopped by the REST API for Photos.

Currently trying to convert BMPMOT to Rust; not sure if it'll improve performance, as I found that adding threads to the bitmap process hurt performance, since IO is the bottleneck.

For redundant bytes, that sounds like compression. Easy enough to point to 7zip or something else for best performance; might be a later option.

For the most part, I'm just using BMP* to learn other languages, it probably won't see another release.

@AnishDe12020

You need to work on the speed.
[screenshot of the upload in progress]
Look at the screenshot. I started this upload 50 minutes ago.

@stewartmcgown
Owner

stewartmcgown commented Jul 10, 2020 via email

@AnishDe12020

Ok at least try to upload bigger chunks of data. Like 5 MB at least.

@DavidBerdik
Author

Ok at least try to upload bigger chunks of data. Like 5 MB at least.

From the README:

A single google doc can store about a million characters. This is around 710KB of base64 encoded data.

Which means it's not possible for bigger chunks to be handled. This is a limitation of the Google Doc character limit. Perhaps you would prefer InfiniDrive, which is my implementation of the same "unlimited Drive storage" concept, but in a different way. - https://github.com/DavidBerdik/InfiniDrive

@AnishDe12020

Yes, David, I saw your project, but it does the same thing of uploading 710 KB fragments. So what is the advantage?

@DavidBerdik
Author

but it does the same thing of uploading 710 KB fragments

Actually, it is not. My approach to this problem uses 9.75MB of data per fragment. It is faster than this text-based solution, but still not as fast as doing a normal file upload on a high-speed network connection.

@AnishDe12020

It is fast as long as it is 1 fragment/second. And yeah, I did use your solution and it worked, but your first release with the ID approach instead of the file name approach was better.

@maitreerawat

It is fast as long as it is 1 fragment/second. And yeah, I did use your solution and it worked, but your first release with the ID approach instead of the file name approach was better.

I am finding it hard to understand why you would think that the first release was better. Perhaps your thoughts about it would enhance our perspective on our old technique.

@AnishDe12020

It is fast as long as it is 1 fragment/second. And yeah, I did use your solution and it worked, but your first release with the ID approach instead of the file name approach was better.

I am finding it hard to understand why you would think that the first release was better. Perhaps your thoughts about it would enhance our perspective on our old technique.

No, like, there would be an ID for the file we upload rather than a file name, which does not make sense.

@kloklojul

You could also check out my POC, but you would have to implement the file management yourself. The max upload size is 64MB per fragment. I have a much more elaborate version on my PC at home, which I can upload as soon as I get back to my hometown; it has very basic file management and Google auth login, but nothing you couldn't do yourself in 1-2 days. My approach stores bytes in pixels and then abuses the fact that you can upload unlimited pictures to Google Photos. The max file size of 64MB is because the max resolution that you can upload for free is 16MP before Google starts to compress your files and corrupts them. I heard there is also a project that hides data in bitmaps that survives compression; you might also be interested in that.
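The 64MB figure checks out if each pixel carries 4 payload bytes (my assumption, e.g. one byte per RGBA channel; the comment above doesn't say how many bytes per pixel the POC packs):

# 16 MP at 4 bytes per pixel is right around the quoted 64 MB ceiling.
pixels = 16_000_000
print(pixels * 4 / 1_000_000)  # -> 64.0 (MB)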

@78Alpha
Contributor

78Alpha commented Jul 12, 2020

The current state of projects is...

stewartmcgown: UDS, Docs method
Base64-encoded data in Docs files, uploaded from the local computer; limit of 710KB per chunk. Always unlimited.

DavidBerdik: InfiniDrive, PNG method in Docs
Data injected into PNG files placed in a Docs file; faster upload and higher data density per chunk. Compression changes to the doc may corrupt data.

78Alpha: BMPMOT, BMP spoofing for Photos
Data injected into BMP files; apparently the arbitrary limit of 50 MB was removed, so theoretically infinite size. Requires the Photos app to continuously upload without failure. Will fail if BMP is purged from the allowed file list due to age; compression changes will corrupt data, guaranteed. You can't search by file name in Photos for some reason, so management is a pain.

kloklojul: hidedatainpictures?
Currently all I see is a readme

@DavidBerdik
Author

No, like, there would be an ID for the file we upload rather than a file name, which does not make sense.

As @maitreerawat said, I do not understand your perspective on this. It was her idea, and we worked together (although she did most of it) to make the change, as we both agreed that working with file names directly is far more intuitive than referencing file IDs, which can quickly become hard to manage if you have many uploads to keep track of. If you can provide a strong argument as to why the file ID approach is superior, we can consider making it an option again, although I highly doubt such a justification will be possible.

That said, the open source nature of InfiniDrive means that if you don't like the way @maitreerawat and I decide to do something, you are welcome to fork the project and modify it. 🙂

My approach stores bytes in pixels

That's what mine does too! 😃

DavidBerdik: InfiniDrive, PNG method in Docs
Data injected into PNG files placed in a Docs file; faster upload and higher data density per chunk. Compression changes to the doc may corrupt data.

Regarding the compression issue, this was a major issue in earlier releases of InfiniDrive, but at this point, there are no known corruption bugs. Users of it are of course encouraged to keep an eye out for corruption, but from my fairly extensive testing, we are now able to consistently detect it and repair it when it occurs.

@stewartmcgown
Owner

stewartmcgown commented Jul 12, 2020 via email

@AnishDe12020

You could also check out my POC, but you would have to implement the file management yourself. The max upload size is 64MB per fragment. I have a much more elaborate version on my PC at home, which I can upload as soon as I get back to my hometown; it has very basic file management and Google auth login, but nothing you couldn't do yourself in 1-2 days. My approach stores bytes in pixels and then abuses the fact that you can upload unlimited pictures to Google Photos. The max file size of 64MB is because the max resolution that you can upload for free is 16MP before Google starts to compress your files and corrupts them. I heard there is also a project that hides data in bitmaps that survives compression; you might also be interested in that.

Can you help me understand what you meant?

@78Alpha
Contributor

78Alpha commented Oct 9, 2020

@stewartmcgown I haven't looked too much into Google's own limits, but I am looking now. It seems they have changed things a bit.

Image sizes have been upped:

Photos (BMP, GIF, HEIC, ICO, JPG, PNG, TIFF, WEBP, some RAW files): 200 MB
Videos (3GP, 3G2, ASF, AVI, DIVX, M2T, M2TS, M4V, MKV, MMV, MOD, MOV, MP4, MPG, MTS, TOD, WMV): 10 GB

And they officially state support for BMP now, and apparently RAW (which may be useful).

Looking back into this project now that I have some experience, I saw that my first pull request items (Alpha extensions) are still there, and they look... less clean than what I make now, so I thought I'd tune everything up (and add a GUI for fun).

I also have a C++ POC of BMPMOT/BMPMAN if @DavidBerdik is interested (it's ugly and barely works on files smaller than a chunk).

@DavidBerdik
Author

@78Alpha I would certainly be interested!

Image sizes have been upped

This makes me curious about if the Drive API's maximum size limit has been upped.

@78Alpha
Contributor

78Alpha commented Oct 10, 2020

After looking over the API, I still have no clue how to use it... But apparently it defaults to original quality, meaning it does take up storage.

So the only effective setup for images is still the Google Drive backup tool + an image creator.

@DavidBerdik
Author

@78Alpha Yeah it looks like there is no way around that: https://developers.google.com/photos/library/guides/api-limits-quotas

At the bottom of the page:

All media items uploaded to Google Photos using the API are stored in full resolution at original quality. They count toward the user's storage.

@78Alpha
Contributor

78Alpha commented Oct 14, 2020

https://github.com/googleapis/google-api-python-client/blob/master/docs/thread_safety.md

Looks like the APIs are not thread-safe, though it appears they may be thread-safe per file, or something I can't quite pin down.

I ran 4 UDS uploads at once (4 different instances). They all uploaded fine and didn't seem to drop any files. A rudimentary way of multithreading/processing would be running a session or spawning a process per file, as sketched below.

Going back to work, so I can't test it myself, but I assume it works like any threaded process: spawn the workers into a list, chunk out the files per thread, and wait for all threads to finish before the next set. Will have to try on my own.

And on the previously mentioned GUI: it may have to live in a separate branch, as it heavily modified everything to get working while also allowing CLI use, basically removing file selection in favor of batch folder upload (1 to MAX).
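A rough sketch of the process-per-file idea; make_drive_service and upload_one_file are hypothetical stand-ins, and the point is that each worker process builds its own client, since googleapiclient's httplib2-based objects shouldn't be shared across threads:

from concurrent.futures import ProcessPoolExecutor

def upload_file(path):
    service = make_drive_service()   # hypothetical: fresh credentials + client per process
    upload_one_file(service, path)   # hypothetical: UDS-style chunked upload of one file

def upload_batch(paths, workers=4):  # 4 matches the instance count tested above
    with ProcessPoolExecutor(max_workers=workers) as pool:
        list(pool.map(upload_file, paths))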


@DavidBerdik
Author

@78Alpha Even if they are thread safe, isn't rate limiting a problem? When I was messing with implementing it in InfiniDrive, it certainly was.

Also, did you build the GUI version? I started working on a GUI for InfiniDrive over the summer, but never got around to finishing it.

At some point, @maitreerawat said she's going to try messing with either encryption or compression, but I don't think she's had much time.

@78Alpha
Contributor

78Alpha commented Oct 15, 2020

Just threw together a template one; all I have left to do is connect the slots.

I haven't quite tested how much it can push at once. There may be a limit I haven't hit during testing.

@DavidBerdik
Author

@78Alpha I haven't managed to find the information anywhere directly from Google, but according to two different posts on StackOverflow, the API limit is 1,000 queries per 100 seconds.

Sources:

The second of these two claims that projects have a 1,000,000,000 queries/day maximum, and that the 1,000 queries per 100 seconds limit exists to reduce the chances of hitting that maximum very quickly; if you want, it can be increased.
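If anyone wants to stay under that ceiling client-side, a simple sliding-window throttle is enough; the numbers below are taken from the quota quoted above, and calling wait() before each API request is a sketch, not anything built into the Google client:

import time

class QueryThrottle:
    def __init__(self, max_calls=1000, period=100.0):
        self.max_calls, self.period = max_calls, period
        self.stamps = []

    def wait(self):
        # Drop timestamps older than the window, then sleep if the window is full.
        now = time.monotonic()
        self.stamps = [t for t in self.stamps if now - t < self.period]
        if len(self.stamps) >= self.max_calls:
            time.sleep(self.period - (now - self.stamps[0]))
        self.stamps.append(time.monotonic())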

@DavidBerdik
Author

It looks like we're about to lose our ability to use this trick on Google Photos.

Pour one out, everyone.

https://twitter.com/googlephotos/status/1326586112458936321

@78Alpha
Contributor

78Alpha commented Nov 12, 2020

That is a sad fact indeed...

@DavidBerdik
Author

DavidBerdik commented Nov 12, 2020

Actually, it seems that it kills off my InfiniDrive as well as this UDS project.

https://www.theverge.com/2020/11/11/21560810/google-photos-unlimited-cap-free-uploads-15gb-ending

Alongside photos, “Google Docs, Sheets, Slides, Drawings, Forms and Jamboard files” will also begin counting against storage caps. The reasoning is “to bring our policies more in line with industry standards,” Google says.

I should note that I haven't been able to find the source for this comment. If I find out for sure that this is true and not a mistake on The Verge's part, I'll be shutting down InfiniDrive. 😢

Edit: Found the source - https://support.google.com/docs/answer/9312312?hl=en#:~:text=After%20June%201%2C%202021&text=Files%20created%20or%20edited%20in,not%20count%20against%20your%20quota.

@kloklojul

fffffffffffuck feels bad man

@DavidBerdik
Author

My final InfiniDrive release was published a few hours ago. It's been fun sharing this learning experience with all of you. I hope that this will collectively drive us to look for new ways to store data.

https://github.com/DavidBerdik/InfiniDrive/releases/tag/v1.0.22
