This corpus is composed of two data sets: CleanSea, which contains real underwater debris images, and e-CleanSea, a synthetically generated debris image set. Both sets are described in detail below.
CleanSea is based on the JAMSTEC E-library of Deep-sea Images (J-EDI), which contains samples of fauna and flora, among other classes, collected in the seas surrounding Japan. Each image may contain one or more objects under different illumination and sea conditions. From this library, a total of 1,223 images containing samples of debris were manually selected. The samples were classified into 19 different types of waste: Can, Squared Can, Wood, Bottle, Plastic bag, Glove, Fishing Net, Tire, Wrapper, Washing machine, Metal Chain, Rope, Towel, Plastic Waste, Metal Waste, Pipe, Shoe, Bumper, and Basket. The labels “Plastic waste” and “Metal waste” refer to waste that is so degraded by ocean erosion that it is very difficult to assign to a specific object category. Some of these labels could have been combined; however, it was preferred to keep them separate.
The objects were manually labeled at the pixel level by marking their outlines. Many images contain several objects; each one was labeled separately with its own class. In total, 1,994 objects were labeled in the 1,223 images that make up the data set. To download the images associated with the annotation files of this dataset, please read the LICENSE AND ACCESS section below.
e-CleanSea was generated artificially by inserting debris elements into real underwater backgrounds. More precisely, the backgrounds were taken from the Underwater Image Enhancement Benchmark (UIEB) dataset, and the debris objects were taken directly from the CleanSea set. The resulting synthetic set comprises 990 underwater images containing a total of 3,135 debris objects.
The figure below shows examples of scenes extracted from the CleanSea (denoted as Real) and e-CleanSea (denoted as Synthetic) datasets. The top and bottom rows in each group correspond to the actual images and the annotations, respectively.
The next table compares the object distribution of the real CleanSea scenes with that of the generated collection:
Label | CleanSea | e-CleanSea | Label | CleanSea | e-CleanSea |
---|---|---|---|---|---|
Plastic Bag | 650 | 977 | Towel | 54 | 73 |
Can | 339 | 598 | Glove | 46 | 57 |
Packaging | 202 | 346 | Basket | 42 | 60 |
Bottle | 143 | 228 | Pipe | 42 | 67 |
Fishing net | 114 | 186 | Metal waste | 28 | 33 |
Plastic waste | 86 | 146 | Square can | 20 | 23 |
Rope | 79 | 113 | Tire | 16 | 25 |
Bumper | 68 | 95 | Shoe | 8 | 12 |
Wood | 62 | 96 | | | |
CleanSea and e-CleanSea are shared only for non-profit research or educational purposes. If you use this dataset or a part of it, please respect these terms of use and reference the original work in which it was published.
CleanSea images were obtained from the JAMSTEC E-library of Deep-sea Images (J-EDI): https://www.godac.jamstec.go.jp/jedi/e/. Individual conditions may apply to the attribution and handling of each image. The copyright notice can be found in the bottom left corner of each downloaded image.
If the J-EDI website is suspended, you can alternatively request the images by e-mail from the JAMSTEC Data Management Office (DMO, email: dmo@jamstec.go.jp).
If you need access information for each image, you can combine the J-EDI URL (https://www.godac.jamstec.go.jp/jedi/e/) and the image ID. The image ID is the file name without its extension. For example, the image ID for the file "HPD0400OUT0422.json" is "HPD0400OUT0422".
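As a minimal sketch of the rule above, the image ID can be derived from an annotation file name in Python. The helper name `image_id_from_filename` is hypothetical and not part of the dataset tooling. Since the exact way the ID is combined with the J-EDI URL is not specified here, the sketch stops at extracting the ID.

```python
from pathlib import Path


def image_id_from_filename(filename: str) -> str:
    """Return the image ID, i.e. the file name without its extension.

    Hypothetical helper for illustration; any leading directories are dropped.
    """
    return Path(filename).stem


if __name__ == "__main__":
    # File name taken from the example in the text above.
    print(image_id_from_filename("HPD0400OUT0422.json"))
```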
Background images used for e-CleanSea were obtained from the Underwater Image Enhancement Benchmark (UIEB) collection (https://li-chongyi.github.io/proj_benchmark.html).
The underwater debris images of CleanSea were acquired from JAMSTEC in RGB and in different sizes. The images are stored in JPG format. A mask giving the exact location of each object in the image is included. This labeling is stored in JSON files using the COCO annotation format. The annotations were created with the tool "LabelMe" (https://github.com/wkentaro/labelme).
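The following sketch shows one way to count labeled objects per class from such an annotation file. It assumes the JSON follows the standard COCO layout, with top-level `categories` and `annotations` lists linked through `category_id`; the actual files may differ in detail, and the function names and the tiny in-memory example are hypothetical.

```python
import json
from collections import Counter


def load_annotations(path: str) -> dict:
    """Read one annotation JSON file into a dictionary."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def count_objects_per_category(coco: dict) -> Counter:
    """Count annotated objects per category name.

    Assumes the standard COCO keys "categories" and "annotations".
    """
    id_to_name = {cat["id"]: cat["name"] for cat in coco["categories"]}
    return Counter(id_to_name[ann["category_id"]] for ann in coco["annotations"])


if __name__ == "__main__":
    # Tiny made-up annotation set, for illustration only.
    example = {
        "categories": [{"id": 1, "name": "Can"}, {"id": 2, "name": "Bottle"}],
        "annotations": [
            {"id": 10, "image_id": 0, "category_id": 1},
            {"id": 11, "image_id": 0, "category_id": 1},
            {"id": 12, "image_id": 0, "category_id": 2},
        ],
    }
    for name, count in count_objects_per_category(example).most_common():
        print(name, count)
```

Summing the counters obtained from all annotation files would give a per-class distribution comparable to the table above.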
In the case of e-CleanSea, the backgrounds and debris objects are provided as separate PNG images; executing the provided algorithm automatically generates both the images and the labeling (also in the COCO annotation format).
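To illustrate the general idea of inserting a debris object into a background, the sketch below alpha-blends an RGBA object onto an RGB background with NumPy and returns the object's binary mask. This is not the provided generation algorithm, whose details are not described here. It assumes the debris PNG carries an alpha channel and that the object fits entirely inside the background; the function name `paste_object` and its parameters are hypothetical.

```python
import numpy as np


def paste_object(background: np.ndarray, obj_rgba: np.ndarray, top: int, left: int):
    """Alpha-blend an RGBA object onto an RGB background.

    background: H x W x 3 uint8 array.
    obj_rgba:   h x w x 4 uint8 array (assumption: alpha channel present).
    Returns the composited uint8 image and a boolean H x W object mask.
    """
    h, w = obj_rgba.shape[:2]
    if top < 0 or left < 0 or top + h > background.shape[0] or left + w > background.shape[1]:
        raise ValueError("object must fit entirely inside the background")

    out = background.astype(np.float32)  # astype returns a copy
    alpha = obj_rgba[..., 3:4].astype(np.float32) / 255.0
    region = out[top:top + h, left:left + w]
    out[top:top + h, left:left + w] = alpha * obj_rgba[..., :3] + (1.0 - alpha) * region

    # Pixels with non-zero alpha belong to the inserted object.
    mask = np.zeros(background.shape[:2], dtype=bool)
    mask[top:top + h, left:left + w] = obj_rgba[..., 3] > 0
    return np.rint(out).astype(np.uint8), mask


if __name__ == "__main__":
    bg = np.zeros((4, 4, 3), dtype=np.uint8)         # black background
    obj = np.full((2, 2, 4), 255, dtype=np.uint8)    # opaque white square
    obj[0, 0, 3] = 0                                 # one fully transparent pixel
    image, mask = paste_object(bg, obj, top=1, left=1)
    print(int(mask.sum()), "object pixels")
```

The returned mask is the kind of per-object location information that could then be converted into an annotation entry.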
If you use these datasets or part of them, please cite the following publications [1] [2]:
@inproceedings{asferrer2022,
  author = {Alejandro Sanchez Ferrer and Antonio Javier Gallego and Jose Javier Valero-Mas and Jorge Calvo-Zaragoza},
  title = {The CleanSea Set: A Benchmark Corpus for Underwater Debris Detection and Recognition},
  booktitle = {Iberian Conference on Pattern Recognition and Image Analysis},
  year = {2022}
}

@article{asferrer2023,
  author = {Alejandro Sánchez-Ferrer and Jose J. Valero-Mas and Antonio Javier Gallego and Jorge Calvo-Zaragoza},
  title = {An experimental study on marine debris location and recognition using object detection},
  journal = {Pattern Recognition Letters},
  volume = {168},
  pages = {154-161},
  issn = {0167-8655},
  doi = {https://doi.org/10.1016/j.patrec.2022.12.019},
  year = {2023}
}
To download these datasets, use the following links: