As you may know, our mission at Monk is to inspect any car in the world and to extract as much info as possible out of the visual condition of a car, through a few smartphone pictures. As simple as that.
Therefore, we developed state-of-the-art computer vision and machine learning algorithms to detect and estimate any damage on a car body, whatever its stage in its lifecycle.
As we strive for full transparency of a car's history and composition, our car body analysis had to be augmented with the extra written information that we can find in cars: VIN, dashboard mileage or tire specs.
The technology required to read characters from pictures is called OCR, standing for Optical Character Recognition. OCR algorithms are a specific range of algorithms dedicated to character reading. Being able to read VINs automatically was important for us but not our core business.
We decided to tackle this case with an open-source OCR library. Choosing the right partner flowed from the source, according to our Head of Delivery, Nicolas Schuhl: “Providing state-of-the-art French Open source, we believed Mindee seriously compete with major OCR players in this use cases. Thus, DocTR, their OpenSource library, answered perfectly our needs: Pytorch and Tensorflow supports, latest Deep Learnings algorithms and incredible support.”
To summarize the OCR approach, the problem statement was:
- the input is a picture of a VIN written on a car
- the output is a 17-character string: the VIN
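To make the expected output concrete, here is a minimal sketch of a format check on the string returned by the OCR step. It assumes the usual VIN alphabet (digits and capital letters, without I, O and Q); the helper names `normalize` and `is_well_formed_vin` are hypothetical and not part of any library.

```python
import re

# Assumption: the usual VIN alphabet, i.e. digits and capital letters
# without I, O and Q (left out to avoid confusion with 1 and 0).
VIN_PATTERN = re.compile(r"[A-HJ-NPR-Z0-9]{17}")


def normalize(text: str) -> str:
    """Upper-case the text and drop whitespace and dashes (hypothetical cleanup step)."""
    return re.sub(r"[\s\-]", "", text.upper())


def is_well_formed_vin(text: str) -> bool:
    """Return True if the normalized text is exactly 17 allowed characters."""
    return VIN_PATTERN.fullmatch(normalize(text)) is not None
```

For example, `is_well_formed_vin("1M8GDM9AXKP042788")` returns `True`, while a 16-character string or one containing the letter `I` is rejected.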
While it may look quite simple, there are a few challenges to address:
- The input photos are mostly taken outdoors, with a lot of noise (brightness, water stains, shadows…) that can make the detection and the recognition of the VIN difficult
- Although the VIN is written in a pretty standard format, the fonts used are not standard, not always the same, and the letter spacing can vary a lot.
- A checksum validation method exists to validate VINs, but it does not work for all vehicles, so we rejected this post-processing solution.
- Last but not least, the VIN is not always the only text written in the photos, so a traditional OCR approach is not enough: we would need to add a post-processing layer to filter out the unwanted characters.
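For reference, the checksum mentioned in the list above can be sketched as follows. This is an illustration of the North American check-digit scheme (the 9th character of the VIN), not code from our pipeline; the function names are hypothetical. Since this scheme is not applied by every manufacturer outside that market, a VIN failing this test is not necessarily misread, which is consistent with the limitation noted above.

```python
# Letter-to-number transliteration used by the North American check digit.
TRANSLITERATION = dict(zip("ABCDEFGH", range(1, 9)))
TRANSLITERATION.update(zip("JKLMN", range(1, 6)))
TRANSLITERATION.update({"P": 7, "R": 9})
TRANSLITERATION.update(zip("STUVWXYZ", range(2, 10)))
TRANSLITERATION.update({str(d): d for d in range(10)})

# One weight per VIN position; position 9 (the check digit itself) weighs 0.
WEIGHTS = [8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2]


def expected_check_digit(vin: str) -> str:
    """Compute the check digit: weighted sum modulo 11, with 10 written as 'X'."""
    total = sum(TRANSLITERATION[c] * w for c, w in zip(vin, WEIGHTS))
    remainder = total % 11
    return "X" if remainder == 10 else str(remainder)


def has_valid_check_digit(vin: str) -> bool:
    """Return True if the 9th character matches the computed check digit."""
    vin = vin.upper()
    if len(vin) != 17 or any(c not in TRANSLITERATION for c in vin):
        return False
    return vin[8] == expected_check_digit(vin)
```

Used as a hard filter, `has_valid_check_digit` would discard correctly read VINs from vehicles that do not follow this scheme, so at most it could serve as a confidence hint.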
Learn more about Mindee’s scientific approach to developing these models here: https://www.linkedin.com/posts/mindee_how-to-train-text-detection-recognition-activity-6864632837279125504-dJwI