I researched the topic in the previous blog, so I now know how to query the API and what the response looks like. In this blog, I aim to start writing the program to get the scooter data.
Current problem: access token generation
Each time I want to query the locations, I need to supply an access token, which comes from submitting a session request with my authentication token (not to be confused with the access token). The access token expires after 15 minutes, so I will have to generate a new one after that.
If I plan to send location requests frequently, say every minute, I don't want to make a session request before every location request, as that effectively doubles the number of requests I make, which is not polite to the server[1]. Therefore I wish to find a way to get a new access token only when I need one.
[1] The term polite comes from web scraping, where it means not sending too many requests to a web server at the same time, to avoid overwhelming it. I am using the term to say that I wish to minimise the number of requests I send to the API server.
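One possible way to get a token only when needed is to cache it together with its expiry time. The sketch below is a minimal illustration, not the actual Voi client: request_new_token is a placeholder for whatever function performs the session request, and the 60-second safety margin is an assumption of mine. Only the 15-minute lifetime comes from the text above.

```python
import time
from typing import Callable, Optional

# From the text above: an access token is valid for 15 minutes.
TOKEN_LIFETIME_SECONDS = 15 * 60
# Assumed safety margin so a token is not used right as it expires.
REFRESH_MARGIN_SECONDS = 60


class AccessTokenCache:
    """Hands out a cached access token and renews it only when it is about to expire."""

    def __init__(self, request_new_token: Callable[[], str],
                 clock: Callable[[], float] = time.monotonic) -> None:
        # request_new_token is a placeholder for the real session request.
        self._request_new_token = request_new_token
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0
        self.session_requests = 0  # how many times a new token was requested

    def get(self) -> str:
        """Return a valid token, making a session request only if necessary."""
        now = self._clock()
        if self._token is None or now >= self._expires_at - REFRESH_MARGIN_SECONDS:
            self._token = self._request_new_token()
            self._expires_at = now + TOKEN_LIFETIME_SECONDS
            self.session_requests += 1
        return self._token
```

With one location request per minute, calling get() before each request would lead to roughly one session request every 14 minutes instead of one per location request.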
Literature review
I'm doing a literature review again because I want to make sure that what I'm going to do has not been done before. If I did the project first and then found out someone had already done exactly the same thing, I would have wasted all that time, although some might argue I would still have learned something while doing it.
On GitHub
I searched for projects on Voi scooters on GitHub instead of Google because I am looking for code. To my surprise, I found a decent number of results, which I wasn't able to do the last time I searched on Google. The projects I found are:
Leostx disassembled a Voi scooter and shared their findings 3 years ago. github.com/Leostx/Voi_scooter
Marcus made a simple Voi scooter tracking script. github.com/Dridia1/VoiTracking
Pierrick Paul published the code for a website that shows the live location of a lot of bikes and scooters, but I tried it and it doesn't show the location of Voi scooters in my city. github.com/PierrickP/multicycles
That project mentions that it uses the Fluctuo data flow. I looked into it: it is an API service that returns the locations of a lot of bikes and scooters. However, it is paid, so I'll pass. fluctuo.com
Johan Åkerman made a cookie clicker game about clicking on Voi scooters. Not very helpful to me. github.com/johan-akerman/VoiHunter
David Orlea made a smart home assistant script to find the nearest Voi scooter. It includes scripts to authenticate and get locations etc. I should check it out for inspiration. github.com/davidorlea/homeassistant-voi_nea..
Dennis Trautwein made a Go script to use the Voi API. I will have to check it for any Voi API calls I didn't know of. github.com/dennis-tra/voi-client-go
Seppo Walther made another Voi finder. github.com/seppowalther/swlt.voiscooters
On Google
The last time I searched on Google, most of the results were from the Voi website. This time I decided to add -site:voiscooters.com to the query so that results from that domain would be excluded, and it helped. These are some extra results I found on Google that might help:
Jon Ashcroft wrote about how he investigated the Voi API. ashcroft.dev/blog/unofficial-voi-scooters-api
David Fant wrote about how he abused the promo code for unlimited Voi credit, an issue which has since been fixed. fant.io/p/hacking-voi
Back to work
Now that I've learned what others have done on this subject, it is time to actually build my program. My plan is to have the program constantly query the API and save the results for further analysis.
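The overall shape of such a program could look like the sketch below. It is only an outline under my own assumptions: fetch and save are placeholders for the real API call and the real storage step, and max_requests exists so the loop can be tried out without running forever.

```python
import time
from typing import Callable


def poll(fetch: Callable[[], dict], save: Callable[[dict], None],
         interval_seconds: float, max_requests: int,
         sleep: Callable[[float], None] = time.sleep) -> int:
    """Call fetch() every interval_seconds and pass each result to save().

    Returns the number of requests that succeeded.
    """
    succeeded = 0
    for i in range(max_requests):
        try:
            save(fetch())
            succeeded += 1
        except Exception as error:  # keep polling even if one request fails
            print(f"request {i} failed: {error}")
        if i < max_requests - 1:
            sleep(interval_seconds)  # wait before the next request
    return succeeded
```

Passing fetch and save in as arguments keeps the loop independent of how the API is queried or how the results are stored, both of which are still open questions at this point.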
The challenges that I currently face
- How to save the result? Should I create a file per query? A file per minute? Per hour?
- How do I keep the result concise to save storage space?
- When should I request an access token?
- Voi scooters are turned off at night. What happens then?
One important thing, though, is that I should store my token in a separate file that is not committed to my git history, since it is private. Otherwise, others would be able to steal my identity and potentially do something bad. I learned this when I accidentally committed my Discord bot token. Luckily, Discord runs a scraping bot on GitHub that found my token before the bad guys did; it generated a new token for me and notified me about it.
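A minimal sketch of keeping the token out of the code could look like this. The file name secrets.json, the key auth_token and the environment variable VOI_AUTH_TOKEN are all names I made up for illustration; none of them come from the Voi API.

```python
import json
import os
from pathlib import Path

# Hypothetical names, chosen only for this example.
SECRET_FILE = Path("secrets.json")   # should be listed in .gitignore
ENV_VARIABLE = "VOI_AUTH_TOKEN"


def load_auth_token(secret_file: Path = SECRET_FILE) -> str:
    """Return the authentication token from the environment or a local file."""
    token = os.environ.get(ENV_VARIABLE)
    if token:
        return token
    if not secret_file.exists():
        raise FileNotFoundError(
            f"Set {ENV_VARIABLE} or create {secret_file} with an 'auth_token' entry"
        )
    with secret_file.open(encoding="utf-8") as handle:
        return json.load(handle)["auth_token"]
```

For this to help, the secret file has to be added to .gitignore before the first commit, so that git never tracks it.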
Remove a file completely from git history
I found out how to remove a file completely from the git history. This is useful because if I committed my token and removed it in a newer commit, it would still be in the history and people could access it. The way to completely remove it is to run git filter-branch --index-filter "git rm -rf --cached --ignore-unmatch path_to_file" HEAD, where path_to_file is the file that contains the secret, and then git push -f to force push the altered git history. A side effect is that it removes the file from the working directory too, so keep that in mind.
Making location requests
For now, my plan is to make a request every minute, which would be 1,440 requests per day. I should also process the data I get from the API since, as seen in the last blog, the response contains a lot of data I don't need. Each scooter has id, short, battery, location in lng and lat, zoneId, category, locked, lockType and lock status. But all I need is:
- short: since it corresponds to the id
- battery: would be nice to know the battery consumption
- location: obviously I'm trying to track it
- locked: I wasn't sure whether an unlocked scooter would still show up in the system (edit: it is confirmed that the API only returns locked scooters, so there is no need to store it)
And I will manually generate the following data:
- vehicle count: how many scooters' locations are returned
- timestamp: when I made the request, in ISO format for better readability
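The reduction described by the two lists above could be sketched as follows. The field names short, battery, lng, lat and locked are taken from the text, but the exact nesting of the response is an assumption: here each vehicle is assumed to be a dict whose location entry holds lng and lat. The locked field is kept only to match the sample output shown in this post.

```python
from datetime import datetime
from typing import Any, Optional


def reduce_response(vehicles: list[dict[str, Any]],
                    now: Optional[datetime] = None) -> dict[str, Any]:
    """Keep only the wanted fields and add a timestamp and a vehicle count."""
    time_stamp = (now or datetime.now()).isoformat()
    reduced = [
        {
            "short": vehicle["short"],
            "battery": vehicle["battery"],
            # Assumed layout: {"location": {"lng": ..., "lat": ...}}
            "lng": vehicle["location"]["lng"],
            "lat": vehicle["location"]["lat"],
            "locked": vehicle["locked"],
        }
        for vehicle in vehicles
    ]
    return {
        "time_stamp": time_stamp,
        "vehicle_count": len(reduced),
        "vehicle_data": reduced,
    }
```

If the real response nests the coordinates differently, only the two location lines would need to change.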
I made some requests. If I just store the reduced JSON with an indent of 4:
{
    "time_stamp": "2022-06-01T08:14:42.305285",
    "vehicle_count": ----,
    "vehicle_data": [
        {
            "short": "----",
            "battery": --,
            "lng": ----------------------,
            "lat": ----------------------,
            "locked": true
        },
        {
            "short": "----",
            "battery": --,
            "lng": ----------------------,
            "lat": ----------------------,
            "locked": true
        },
        ...
A single file is 225KB (230,787 bytes). With 1,440 such files a day, that would be about 317MB, which is a lot. I will try to cut down on this.
By storing each vehicle as a list without labels:
{
    "time_stamp": "2022-06-01T08:20:09.340059",
    "vehicle_count": ----,
    "data_format": [
        "short",
        "battery",
        "lng",
        "lat",
        "locked"
    ],
    "vehicle_data": [
        [
            "----",
            --,
            ----------------------,
            ----------------------,
            true
        ],
        [
            "----",
            --,
            ----------------------,
            ----------------------,
            true
        ],
        ...
I reduced the file size to 172KB (176,814 bytes).
By not using indent=4:
{"time_stamp": "2022-06-01T08:21:39.708911", "vehicle_count": ----, "data_format": ["short", "battery", "lng", "lat", "locked"], "vehicle_data": [["----", --, ---------...
the file size became 71.2KB (72,939 bytes). I never thought some white space could drive up the file size so much.
That will be about 100MB of data each day for 1,440 requests, or maybe only about 67MB since the scooters only operate from 6 am to 10 pm, and they stop when it is raining.
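The comparison can be reproduced on made-up data with Python's json module. The sketch below is only an illustration: the 500 vehicles are synthetic, so the byte counts it prints will not match the figures above, only their ordering. The last variant, separators=(",", ":"), is an extra option not tried above; it drops the spaces that json.dumps otherwise puts after commas and colons.

```python
import json

FIELDS = ["short", "battery", "lng", "lat", "locked"]


def to_rows(snapshot: dict) -> dict:
    """Turn labelled vehicle dicts into bare lists, described once by data_format."""
    return {
        "time_stamp": snapshot["time_stamp"],
        "vehicle_count": snapshot["vehicle_count"],
        "data_format": FIELDS,
        "vehicle_data": [[v[f] for f in FIELDS] for v in snapshot["vehicle_data"]],
    }


def size_in_bytes(obj: dict, indent=None, separators=None) -> int:
    """Size of the JSON text once encoded as UTF-8."""
    return len(json.dumps(obj, indent=indent, separators=separators).encode("utf-8"))


def daily_megabytes(bytes_per_file: int, files_per_day: int) -> float:
    """Daily volume in MB, counting 1 MB as 1024 * 1024 bytes."""
    return bytes_per_file * files_per_day / 1024 ** 2


# Made-up vehicles, only there to compare the formats against each other.
vehicles = [
    {"short": f"v{i:03d}", "battery": i % 101, "lng": 10.0 + i / 1000,
     "lat": 50.0 + i / 1000, "locked": True}
    for i in range(500)
]
snapshot = {"time_stamp": "2022-06-01T08:14:42.305285",
            "vehicle_count": len(vehicles), "vehicle_data": vehicles}

sizes = {
    "labelled, indent=4": size_in_bytes(snapshot, indent=4),
    "rows, indent=4": size_in_bytes(to_rows(snapshot), indent=4),
    "rows, no indent": size_in_bytes(to_rows(snapshot)),
    "rows, no spaces": size_in_bytes(to_rows(snapshot), separators=(",", ":")),
}
for label, size in sizes.items():
    print(f"{label}: {size} bytes")

# Daily totals for the 72,939-byte file measured above.
print(round(daily_megabytes(72_939, 24 * 60), 1))  # 100.2 (one request per minute, all day)
print(round(daily_megabytes(72_939, 16 * 60), 1))  # 66.8 (6 am to 10 pm only)
```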
I should group the records by hour so that I only generate 24 files a day, since having a large number of small files will slow down the operating system.
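One way to group records by hour is to append every snapshot as a single line to a file named after the hour of its timestamp. This is just one possible layout, not a settled design: the .jsonl extension, the file naming scheme and the one-snapshot-per-line format are my own assumptions.

```python
import json
from datetime import datetime
from pathlib import Path


def hourly_path(directory: Path, time_stamp: str) -> Path:
    """Map an ISO timestamp to a file name such as 2022-06-01T08.jsonl."""
    moment = datetime.fromisoformat(time_stamp)
    return directory / f"{moment:%Y-%m-%dT%H}.jsonl"


def append_snapshot(directory: Path, snapshot: dict) -> Path:
    """Append one snapshot as a single compact JSON line to its hourly file."""
    path = hourly_path(directory, snapshot["time_stamp"])
    directory.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(snapshot, separators=(",", ":")) + "\n")
    return path
```

Because each line is a complete JSON document, a file can later be read back line by line with json.loads without loading the whole hour at once.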
What I plan to do in the future
Now that I have calculated the file sizes, I will start coding and see how it goes in the next blog.