Let's Do It AI Project

How we've harnessed artificial intelligence to spot trash

Trash is everywhere. Left uncollected, it often ends up harming the environment. To clean it up efficiently, we need to focus on the places that are most affected. But how do we know where in the world that is?

By combining artificial intelligence with tens of thousands of publicly available photos, we can help solve this problem. We, Let’s Do It Foundation and SIFR in partnership with Microsoft, have developed an AI algorithm for detecting trash in geolocated images. Today, our algorithm is surveying images all over the world, locating trash on a global scale so that our network of cleanup teams can target the worst locations. We are thankful to Walter Yu at CalTrans, whose mentorship inspired its development, and to Microsoft for its support.

This is the story of how we achieved it.

What’s trash, anyway?

Trash is a rather complicated object to detect. Imagine a restaurant table with cans of soda on it, people having fun, eating and drinking. In that context, the cans are not trash. BUT when those same cans are lying on the street, they can most likely be considered trash.

This is the main challenge with any image-based trash detection algorithm. Not everything that LOOKS like trash IS trash. Trash is a word people use for an object that lacks purpose, and the purpose of an object is often not obvious in the images we use for teaching an algorithm to spot trash.

The plan

The machine learning project was divided into five steps:

  1. Collecting images – Luckily for us, this part was mostly done by the Let’s Do It World foundation and UC Riverside, who are experts at identifying and cleaning up trash. We had thousands of images collected through the World Cleanup App and scraped from Google Street View to use for our model.
  2. Selecting images – We strategically chose most of the images that went into the model. We started with a sample of images, trained the model, and analyzed the results. Based on the results, we determined what images we had to add to the next iteration of training.
  3. Object detection – This step was very time-consuming because it required a lot of manual work: marking all the trash in each of our selected images. We got help from the lovely volunteers from LDIW Foundation and UC Riverside, and together we manually annotated over 1,000 images so the model could learn what trash is.
  4. Training the machine learning model – This was done multiple times, each time adapting the parameters of the Mask R-CNN model to improve the results and adding new images to the training dataset.
  5. Result validation and testing – After each training cycle, the best model was chosen and tested by having it predict trash on test images. These images were not in the training dataset and were used to assess the accuracy of the model.
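Step 5 relies on test images that the model never saw during training. A minimal sketch of such a hold-out split is shown here; the 20% test fraction, the seed, and the file names are assumptions made for illustration, not values from the project.

```python
import random


def split_holdout(image_ids, test_fraction=0.2, seed=0):
    """Shuffle image ids reproducibly and split them into train and test sets."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    ids = sorted(set(image_ids))        # drop duplicates and fix the starting order
    random.Random(seed).shuffle(ids)    # seeded shuffle, so the split is repeatable
    n_test = max(1, round(len(ids) * test_fraction))
    return ids[n_test:], ids[:n_test]   # (train, test)


# Hypothetical file names standing in for annotated images.
images = [f"img_{i:04d}.jpg" for i in range(1000)]
train_ids, test_ids = split_holdout(images, test_fraction=0.2, seed=42)

print(len(train_ids), len(test_ids))            # 800 200
print(set(train_ids).isdisjoint(test_ids))      # True
```

Fixing the seed keeps the same images in the test set across training cycles, so results from different iterations stay comparable.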

How did it go? Read on to find out.

Finding the right tool

We started off testing the idea of trash detection with the YOLO (You Only Look Once) object detection system at the beginning of March 2019. We selected the initial weights available (tiny yolo) and trained the model to detect only one class: trash. This was done on about 40 images selected from the web.

The results were promising, but clearly not good enough:

For greater precision, we decided to use a Python implementation of Mask R-CNN to get the benefit of both object detection and segmentation. But this required even more manual annotation of the images: since Mask R-CNN uses polygons as its input, this time just dragging a box around trash was not enough. With the help of LDIW volunteers, we started annotating images with the VGG Image Annotator tool.
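To illustrate why polygons matter, here is a minimal sketch of how one annotated polygon can be turned into a pixel mask. It assumes a region shaped like a VGG Image Annotator polygon export (`all_points_x`, `all_points_y`); the coordinates are invented, and the pure-Python rasterizer is a stand-in for whatever the actual pipeline used.

```python
def point_in_polygon(x, y, xs, ys):
    """Even-odd ray casting test for point (x, y) against a polygon."""
    inside = False
    j = len(xs) - 1
    for i in range(len(xs)):
        # Does the edge (j -> i) cross the horizontal ray going right from (x, y)?
        if (ys[i] > y) != (ys[j] > y):
            x_cross = xs[i] + (y - ys[i]) * (xs[j] - xs[i]) / (ys[j] - ys[i])
            if x < x_cross:
                inside = not inside
        j = i
    return inside


def polygon_to_mask(xs, ys, height, width):
    """Return a height x width grid with 1 where the pixel centre is inside the polygon."""
    return [
        [1 if point_in_polygon(col + 0.5, row + 0.5, xs, ys) else 0
         for col in range(width)]
        for row in range(height)
    ]


# One region in the style of a VGG Image Annotator polygon export (values invented).
region = {"shape_attributes": {"name": "polygon",
                               "all_points_x": [1, 5, 5, 1],
                               "all_points_y": [1, 1, 4, 4]}}
shape = region["shape_attributes"]
mask = polygon_to_mask(shape["all_points_x"], shape["all_points_y"], height=6, width=8)

for row in mask:
    print("".join("#" if value else "." for value in row))
```

Pixels inside the outline are marked 1 and everything else stays 0, which is the extra information a segmentation model gets compared with a plain bounding box.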

Selecting which images to annotate and add was done rather methodically. Since annotating takes a lot of time, we started strategically choosing images that would benefit our goal the most.

The initial requirements for the AI tool were quite straightforward:

  1. Detect piles of garbage
  2. Do not detect humans

As a side note, we started from the COCO weights and continued training only the heads of the network, because earlier layers detect low-level features (edges and corners), while later layers detect higher-level features (car, person, cat) and, in our case, trash.
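The idea of training only the heads can be sketched as selecting layers by name and freezing the rest. The pattern below is modelled on the head-layer naming used in the matterport Mask R-CNN code as far as we know it, and the layer names are illustrative; treat both as assumptions and check them against the version you run.

```python
import re

# Assumed naming convention: head layers start with "mrcnn_", "rpn_" or "fpn_".
HEADS_PATTERN = r"(mrcnn_.*)|(rpn_.*)|(fpn_.*)"


def select_trainable(layer_names, pattern=HEADS_PATTERN):
    """Map each layer name to True if it should be trained, False if frozen."""
    return {name: re.fullmatch(pattern, name) is not None for name in layer_names}


# Illustrative layer names: two backbone layers followed by four head layers.
layers = ["conv1", "res2a_branch2a", "fpn_c5p5",
          "rpn_class_raw", "mrcnn_class_logits", "mrcnn_mask"]
trainable = select_trainable(layers)

for name, is_trainable in trainable.items():
    print(f"{name:20s} {'train' if is_trainable else 'frozen'}")
```

Freezing the backbone keeps the edge and corner detectors learned on COCO intact, so a small dataset only has to teach the later layers what trash looks like.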

During the initial testing, we decided that the model would be trained to detect piles of garbage rather than single pieces of trash on the ground. Our reasoning was that bigger piles are less likely to move (so they may be cleaned up by Let’s Do It World teams), and that trash piles should differ enough from the background to be easily detectable.

Learning what trash is

Unfortunately, in the beginning, our model did not really seem to get the idea of trash. With every new test image, we found new objects that needed to be added to the training dataset to stop the model detecting non-trash objects as trash.

As you can see, the model’s understanding of what trash is was limited to “Mixture of randomly placed colors with no structure on a somewhat solid background”. This means that almost anything too colorful or weirdly shaped was trash to the model, even if in real life it was not trash.

We trained Mask R-CNN with two classes: “Trash” and “Background”. For us, this meant that everything we didn't label, the model learned to recognize as background. We started adding images that included “humans and trash”, as well as “trash and cars”, “trash and animals”, “trash and streets” and so on, and the model gradually became better.

During development, the model’s output progressed from this:

… to this:

… to finally this:

And from this:

… to this:

… to finally this:

Taking it to the streets

At this point we had not yet incorporated the images from Google Street View. Our model was built from a collection of roughly 700 images, of which around 100 were selected from web searches and about 600 taken by volunteers at the Let’s Do It World events.

We trained the model for 115 epochs of 10 steps each, with a learning rate of 0.001. The parameters we set were quite similar to the Balloon example demonstrated on https://github.com/matterport/Mask_RCNN.
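The schedule above can be written down as a small settings object to make the arithmetic explicit. This is a sketch only: the class and field names are hypothetical, and the number of images per step is not stated in the text, so it is left as a parameter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingRun:
    """Schedule settings for one training run (field names are illustrative)."""
    epochs: int
    steps_per_epoch: int
    learning_rate: float
    num_classes: int = 2  # "Background" plus "Trash"

    @property
    def total_steps(self) -> int:
        # Number of weight updates over the whole run.
        return self.epochs * self.steps_per_epoch

    def images_seen(self, images_per_step: int) -> int:
        # images_per_step (the batch size) is an assumption supplied by the caller.
        return self.total_steps * images_per_step


first_run = TrainingRun(epochs=115, steps_per_epoch=10, learning_rate=0.001)
print(first_run.total_steps)      # 1150
print(first_run.images_seen(2))   # 2300, if each step used 2 images (assumed)
```

With only 10 steps per epoch, a single epoch covers a small part of the dataset, so an epoch here is closer to a checkpoint interval than to a full pass over the images.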

After some rounds of train-validate-repeat, we started testing the model on Google Street View images. We quickly discovered that these images were very different from the ones we had trained the model on. The quality of the images was slightly poorer, and there were many image defects because they were taken from a moving car.

There were also some objects that the model had not learned in the previous training: the model did not understand light flares, road surface markings, roadside posts, or the Google Street View car’s camera shadow.

Camera shadows are not trash:

Neither are light flares:

Road surface markings are also not trash:

After seeing the mistakes, we added some Google Street View images to the training dataset. We had nearly 50,000 images to choose from, and from these, we made a selection of images that the model had labelled incorrectly. In the end, we had 1,300 images in the training dataset. We trained for 200 epochs of 17 steps each, with a learning rate of 0.001.
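Picking out the images the model got wrong can be sketched as a comparison between model output and a human check. Everything in this example is assumed for illustration: the file names, the confidence scores, the 0.5 threshold, and the idea that a reviewer marks each image as containing trash or not.

```python
def select_hard_examples(predictions, reviewed, score_threshold=0.5):
    """Return ids where the model's trash / no-trash call disagrees with a reviewer.

    predictions: image id -> list of detection confidence scores
    reviewed:    image id -> True if a human saw trash in the image
    """
    hard = []
    for image_id, contains_trash in reviewed.items():
        scores = predictions.get(image_id, [])
        predicted_trash = any(score >= score_threshold for score in scores)
        if predicted_trash != contains_trash:
            hard.append(image_id)
    return sorted(hard)


# Hypothetical model output and human review for four street-level images.
predictions = {
    "gsv_001.jpg": [0.91],        # flagged as trash
    "gsv_002.jpg": [],            # nothing detected
    "gsv_003.jpg": [0.30, 0.75],  # flagged as trash
    "gsv_004.jpg": [0.20],        # below threshold, so not flagged
}
reviewed = {
    "gsv_001.jpg": False,  # e.g. a camera shadow: false positive
    "gsv_002.jpg": False,  # correct
    "gsv_003.jpg": True,   # correct
    "gsv_004.jpg": True,   # missed pile: false negative
}

hard = select_hard_examples(predictions, reviewed)
print(hard)  # ['gsv_001.jpg', 'gsv_004.jpg']
```

Annotating only the disagreements spends scarce annotation time on the cases the model currently handles worst, such as shadows, flares, and road markings.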

The results are now starting to look quite promising. To finish up the story about where we are now, here are a number of positive examples:

Next steps

  1. Find funding and partners with whom to verify our AI solution for the entire world;
  2. Ensure public access to the source code;
  3. Make 1 billion people aware of the trash problem.

We are looking for collaborations with anyone interested in developing the trash detection tool further with Let’s Do It Foundation. Join us on GitHub or contact Kristiina Kerge from Let’s Do It Foundation at [email protected]


During the process of developing this model, the team has looked through thousands of images that are just full of piles of trash: in the natural environment, on the street, in the sea, in countries around the world. We were able to detect the exact location of mismanaged waste.

Here are examples of locations from the USA

and Thailand

We have a big mess to clean up. And we hope that our model can help make the process of cleaning the world easier and faster and help to keep the planet clean.

The Team

Kristiina Kerge
Tech Innovation Lead
Let’s Do It Foundation
[email protected]
Win Cowger
Graduate Student, Environmental Sciences
University of California, Riverside
Kris Haamer
Programming the web and creating experiences across media
Kristin Ehala
Data Scientist
Kaarel Kivistik
AI architect
Taavi Tammiste
AI and ML Expert
Merili Vares
Executive Director
Let’s Do It Foundation

For more information, please contact Kristiina Kerge from Let’s Do It Foundation at [email protected]