After our next update, likely sometime next week, algo sizes will be limited. We are currently planning to cap uploads at 15 MB. Note that the vast majority of algos are code-only and use far less space than that, so this only affects users who upload a significant amount of data along with their algo.
We wanted to allow as much flexibility as possible to see what everyone would do with it, so we initially set the upload size limit extremely high. Unfortunately, the main limiting factor on the size of uploaded algos is now internet speed. Large algos cause network requests to time out and fail, and most users start seeing issues above ~20 MB. This also gave a small advantage to users with faster connections, who could upload larger algos, and it led to a poor user experience where players were encouraged to retry uploading the same large algo until it went through.
As always, let us know if you have any comments or concerns.
I’m concerned about the additional layer of difficulty this may add to ML algos, especially ones released before Python dependencies are supported. The Keras2Cpp implementation I’ve been using dumps the model and its weights into a rather large file that, even compressed, is around 21 MB on its own. The original HDF5 file I would now have to read from is 14.2 MB and only compresses by about 6-8%, so even if Keras, TensorFlow, and NumPy were supported, or I altered my implementation to read from the HDF5 file directly, there would still be immediate size limitations on an ML algo.
For now, we will set it to 50 MB. Since most users wouldn’t be able to upload an algo of that size anyway, I don’t expect it to impede anyone in any way. We will hold off on lowering it any further for now.
Thanks. If I get some spare time I’ll look into updating Keras2Cpp4Terminal to read directly from the HDF5 weight files and JSON model files to help minimize file sizes.
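If h5py ever becomes available, reading the raw weight arrays straight out of the HDF5 file could be fairly compact. A minimal sketch (the group layout inside the file varies by Keras version, so treat the traversal as a starting point rather than a drop-in loader):

```python
import h5py  # assumes h5py is installed, which it currently isn't on the servers

def load_weight_arrays(path):
    """Walk a Keras HDF5 weights file and pull out every array it contains."""
    weights = {}
    with h5py.File(path, "r") as f:
        def collect(name, obj):
            # Datasets are the actual weight tensors; groups are just layer folders.
            if isinstance(obj, h5py.Dataset):
                weights[name] = obj[()]  # read the whole array into memory
        f.visititems(collect)
    return weights
```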
You could also try compressing the files and then undoing the compression at the very start of the match. I’m not sure how much time this would actually take, but you do have some extra time at the start of the match to set up, so it might work, especially if there are a lot of files and you use threading.
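Something like this, for instance (the file names are made up, and I haven’t measured whether the threading actually buys much here):

```python
import zipfile
from concurrent.futures import ThreadPoolExecutor

ARCHIVES = ["weights.zip", "lookup_tables.zip"]  # hypothetical data archives

def extract(path):
    # Unpack one archive into the working directory at match start.
    with zipfile.ZipFile(path) as zf:
        zf.extractall()

# zlib releases the GIL while decompressing, so threads can overlap the work.
with ThreadPoolExecutor() as pool:
    list(pool.map(extract, ARCHIVES))
```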
Yeah, I mean zip your data files before zipping everything. It won’t do much (or anything?) unless you increase the compression level for the inner archive. It might help if the data files are more similar to each other than to everything else, which I suspect is true given their format compared to code files; no actual hard experience from me, just speculation.
You could also try different compression formats (again, for the inner files) to see if any give better results.
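The standard library makes this easy to test offline. A quick sketch comparing the built-in codecs at high compression levels on one of your data files (the file name is just a placeholder):

```python
import bz2, lzma, zlib
from pathlib import Path

data = Path("weights.bin").read_bytes()  # placeholder for an actual data file

# Compare the stdlib codecs; lzma usually wins on size but is the slowest.
candidates = [
    ("zlib", lambda d: zlib.compress(d, 9)),
    ("bz2", lambda d: bz2.compress(d, 9)),
    ("lzma", lambda d: lzma.compress(d)),
]
for name, compress in candidates:
    print(f"{name}: {len(compress(data)):,} bytes")
```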
I think it’s worth noting that I changed the model I’m using, and the raw file size of the dumped weights dropped from 55 MB to 1.5 MB, bringing the zip file size down to 1.5 MB. I guess it really just depends on how big the neural net is. For reference, this slimmed version of my model is big enough to take the map as input and output map placements, with a fairly small hidden layer connecting them.
The team suspects it should be feasible to do ML within 15 MB, which is part of the reason we targeted that number. We don’t want to break algos mid-season, though, so we will wait until the end of April before making a more dramatic change.
Did this change go through? I’m working on a Bayesian inference network that is currently packed into an ugly 31 MB JSON file. I could probably get it below 15 MB if I write or find a better way to pack the posterior/prior numbers, but if the limit is still 50 MB then I won’t waste my time on it.
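In case it helps, the win from ditching JSON for the numbers themselves can be large: a float costs roughly 10-20 characters as JSON text but only 4 bytes as a raw 32-bit value. A toy comparison (the values are made up, and this assumes float32 precision is enough for the probabilities):

```python
import json
from array import array

probs = [0.12345678] * 100_000  # stand-in for the posterior/prior table

json_size = len(json.dumps(probs).encode())
binary_size = len(array("f", probs).tobytes())  # 4 bytes per number
print(json_size, binary_size)  # the binary form should be several times smaller
```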
Is there any chance we could have some libraries pip-installed by default?
I just spent the last while getting my Keras model to run as a C++ executable. I feel like having to do this might dissuade many from trying to use ML.
Let’s open Pandora’s box…
Seriously, I would love to have more flexibility and better options for compiling, without needing to rewrite everything in C++ just to get the speed advantage.
That said, some people have invested a lot of time in this already…
So it would be fair to at least set fixed expectations (maybe updating the restrictions at the beginning of each season): a list of supported libs/tools, a fixed memory limit, a file size limit, and processor usage.
That way people can prioritise, and eventually try out something new.
For Python, I think the key tools are PyPy and NumPy, but I’m not sure how easy it would be to add them in the current implementation.