At StatsBomb, we pride ourselves on utilising the most cutting-edge technology to create better data. We have used Computer Vision (CV) in our event data collection since day one, using a combination of Human + Artificial Intelligence (AI) to provide the most accurate and detailed event data in the industry. Now, as we move into tracking data collection, we’re leveraging more and more AI to deliver highly accurate data at scale.
Today, our AI team are going to talk you through one of the cornerstones of our tracking data collection methodology – homography estimation.
What is homography estimation?
Homography estimation is a fundamental problem in computer vision that involves finding the transformation between two images of the same scene taken from different viewpoints. In simpler terms, it is what your brain automatically does to reconstruct the 3D scene when it sees two images like the ones below.
Our brain automatically identifies common points or areas of the image and associates them, reconstructing the scene by finding the relation between the images. That relation is what we call homography.
There are many applications of homography estimation that we all use in our daily lives such as:
- Image stitching to create panoramic images
- HDR imaging, which lets your phone capture well-exposed pictures under difficult lighting conditions
- Autonomous navigation, where it is used to estimate the position and orientation of a robot with respect to its environment, letting the robot navigate a complex environment without human intervention
Why is it so important in the sports analytics industry?
We know what a homography is, but why has it become so important in today's industry?
Have you ever wondered how it's possible to estimate how many kilometers a player has run in a match? Or how to compute player location heatmaps?
Exactly, all of these tasks can be done thanks to homography estimation. It allows us to accurately track the position and movement of players and the ball on the field, which is essential for sports analytics applications such as player tracking, shot analysis, and tactical analysis.
How can this be done? The trick is simple: we know that a homography relates two images of the same scene from different viewpoints. We can create a controlled template image of the pitch and transfer all the information we see in a match to that image. The perfect viewpoint for our case is a zenithal view of the field, since it is not subject to any perspective distortion.
Thanks to this, if we know the homography that relates our template image with any other one, we can transform our template to match the real image.
That looks like magic! What's even more interesting is that the transformation works in the other direction too. That is, we can project objects from the real-world image onto our template. For example, let's draw the area of the field that is visible in the previous image:
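As a rough sketch of those two directions, here is how a single point travels through a homography and back. The matrix `H` below is purely illustrative (in a real system it comes from the estimation step described later):

```python
import numpy as np

# Hypothetical homography mapping frame pixels -> template pixels.
# The numbers are made up for illustration only.
H = np.array([[0.8,    0.1,   -40.0],
              [0.05,   1.2,   -10.0],
              [0.0005, 0.001,   1.0]])

def project(H, x, y):
    """Map one 2D point through a homography using homogeneous coordinates.

    Column-vector convention: p' is proportional to H @ p (the transpose of
    the row form sometimes written p H = p').
    """
    px, py, w = H @ np.array([x, y, 1.0])
    return px / w, py / w  # perspective divide back to 2D

# Frame -> template (e.g. a player's feet at pixel (640, 360)).
tx, ty = project(H, 640.0, 360.0)

# Template -> frame: the inverse homography goes the other way.
H_inv = np.linalg.inv(H)
fx, fy = project(H_inv, tx, ty)

print(round(fx, 3), round(fy, 3))  # round trip recovers (640.0, 360.0)
```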
We can apply the same transformation to any object present in the original image. Let's assume, for instance, that we have spent some time tagging all the players that appear in an image:
Since we know the homography that relates this frame to our template, we can project each player's position once again in our controlled space.
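A minimal sketch of that projection step, vectorised over several players. The pixel coordinates and the matrix are invented; in practice the player positions come from a detector, and OpenCV's `cv2.perspectiveTransform` does the same job:

```python
import numpy as np

# Illustrative frame -> template homography (not a real calibration).
H = np.array([[0.9, 0.0,   -50.0],
              [0.1, 1.1,   -20.0],
              [0.0, 0.001,   1.0]])

# Tagged player positions in frame pixel coordinates, shape (N, 2).
players_px = np.array([[320.0, 400.0],
                       [700.0, 350.0],
                       [980.0, 420.0]])

def to_template(H, pts):
    """Project (N, 2) pixel points into template space via homography H."""
    homog = np.hstack([pts, np.ones((len(pts), 1))])  # homogeneous coords
    mapped = homog @ H.T                              # apply H to every row
    return mapped[:, :2] / mapped[:, 2:3]             # perspective divide

players_template = to_template(H, players_px)
print(players_template.shape)  # (3, 2): one template position per player
```

Repeating this for every frame of a video gives exactly the per-player track lines described above.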
So, if all we want is to track players, we just need to tag each player across the different frames of a video and determine the homography for each of those frames. That is how we obtain the track lines shown in the previous image.
Our controlled template is useful for establishing spatial relations. We can relate pixels to yards. For example, in reality, the distance between two contiguous yard lines is 5 yards. In our template image, that distance is represented by 20 pixels. Thus, we can establish that 1 yard corresponds to 4 pixels. This is how we can determine how many yards a player has run!
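Putting numbers on that: with the scale of 4 pixels per yard from the example above, a player's distance covered is just the summed length of their template-space track divided by the scale. A toy sketch with made-up track points:

```python
import math

PX_PER_YARD = 4.0  # template scale from the example: 20 px between 5-yard lines

# A player's track in template pixel coordinates, one point per frame.
track = [(100.0, 50.0), (104.0, 50.0), (104.0, 53.0), (112.0, 53.0)]

def distance_yards(track, px_per_yard=PX_PER_YARD):
    """Sum straight-line segment lengths in pixels, then convert to yards."""
    px = sum(math.dist(a, b) for a, b in zip(track, track[1:]))
    return px / px_per_yard

print(distance_yards(track))  # 3.75 yards: (4 + 3 + 8) px / 4 px-per-yard
```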
Having homographies for all images is the key step in building systems that can create a perceptual understanding of sports, facilitating multiple applications ranging from virtual reality graphics on the pitch to metrics measurements.
What do we need to compute homography?
As explained previously, the homography relates two images of the same scene that are taken from different viewpoints. This means that a point in image A can be located in image B by applying that transformation.
The transformation itself is represented by a 3x3 matrix. We will not delve too deeply into the technical details here, but as a simple intuition, you can think of it as follows: if we take the pixel coordinates of a player in the real image, $p_i$, and multiply them by the homography matrix $H$, we obtain the coordinates of that player in the template image, $p_t$; in short, $p_i H = p_t$.
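For the curious, the same relation written out in full homogeneous coordinates (row-vector form, with $s$ the arbitrary scale factor of homogeneous coordinates; dividing by the third component recovers the 2D pixel coordinates):

$$
s \begin{bmatrix} x_t & y_t & 1 \end{bmatrix}
=
\begin{bmatrix} x_i & y_i & 1 \end{bmatrix}
\begin{bmatrix}
h_{11} & h_{12} & h_{13} \\
h_{21} & h_{22} & h_{23} \\
h_{31} & h_{32} & h_{33}
\end{bmatrix}
$$

Because a homography is defined only up to scale, one entry (commonly $h_{33}$) can be fixed to 1, leaving eight unknowns.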
To compute a homography, we first need to find corresponding points in the two images. Mathematically, we need at least four pairs of corresponding points, with no three of them collinear, to recover a valid homography matrix.
StatsBomb has developed an excellent application that enables human labelers to quickly create four pairs of points and then automatically solve the system of equations that provides the homography matrix.
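A hedged sketch of that solving step, with invented point values (this is essentially what OpenCV's `cv2.getPerspectiveTransform` computes): fixing $h_{33} = 1$, each point pair contributes two linear equations, so four pairs give an exactly determined 8x8 system.

```python
import numpy as np

def homography_from_4_points(src, dst):
    """Solve the 8x8 linear system for H (with h33 = 1) from 4 point pairs.

    Column-vector convention: dst is proportional to H @ src.
    """
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b += [u, v]
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)

# Made-up correspondences: frame pixels -> template pixels
# (template at 4 px per yard: 120 yd x ~53.3 yd -> 480 x 213 px).
src = [(100, 100), (500, 120), (480, 400), (80, 380)]
dst = [(0, 0), (480, 0), (480, 213), (0, 213)]

H = homography_from_4_points(src, dst)

# Sanity check: H maps the first source point onto the first template point.
p = H @ np.array([100.0, 100.0, 1.0])
print(p[:2] / p[2])  # ≈ [0. 0.]
```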
This procedure is fast, but when you consider the incredibly large number of leagues, games and frames that need to be processed in a single week, you quickly realize that scaling and automating this task is essential. Without automation, it becomes a bottleneck for all other applications.
How does AI help in this?
AI has had a major impact on the field of computer vision, with applications in tasks such as object detection, image segmentation, and homography estimation. Traditional methods for homography estimation are often computationally expensive and can be inaccurate, especially in cases where the images are of poor quality or have been taken under challenging conditions. AI can be used to improve the accuracy and efficiency of homography estimation by learning from large datasets of images. Deep learning models, in particular, have been shown to be very effective for this task.
Deep learning is a subfield of artificial intelligence that focuses on training artificial neural networks to learn from large amounts of data. Supervised learning is a common technique used in deep learning, where the network is trained on labeled data to predict the correct output for new, unseen inputs.
There are many different ways to train a deep learning model to recover the homography matrix between an image and a pitch template. Our current model is trained to recognize characteristic locations on the pitch, which can then be used to solve the equation $p_i H = p_t$ to obtain the homography matrix.
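The article doesn't spell out the model internals, but the step after keypoint recognition can be sketched as follows. Assume, hypothetically, that the network returns the pixel location of each pitch landmark it recognises; each landmark has a fixed, known position in the template, so $H$ can be fitted to all of them by least squares (in production one would typically use `cv2.findHomography` with RANSAC to discard bad detections):

```python
import numpy as np

def fit_homography(src, dst):
    """Least-squares fit of H (with h33 = 1) to N >= 4 correspondences."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b += [u, v]
    h, *_ = np.linalg.lstsq(np.asarray(A, float), np.asarray(b, float),
                            rcond=None)
    return np.append(h, 1.0).reshape(3, 3)

# Hypothetical model output: pixel location of each recognised pitch landmark,
# paired with that landmark's fixed template position (all numbers made up).
detections = [
    ((112.0,  80.0), (  0.0,   0.0)),  # e.g. an endzone corner
    ((610.0,  95.0), (200.0,   0.0)),  # a yardline intersection
    ((640.0, 410.0), (200.0, 213.0)),
    (( 90.0, 390.0), (  0.0, 213.0)),
    ((360.0,  88.0), (100.0,   0.0)),
    ((365.0, 400.0), (100.0, 213.0)),
]
src, dst = zip(*detections)
H = fit_homography(src, dst)
```

Using more than four landmarks makes the estimate more robust to small errors in any single detection, which is one reason keypoint-based models predict many pitch locations rather than the bare minimum.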
At the beginning of training, the model's performance is poor. It makes many mistakes and is unable to distinguish between different characteristic locations. For example, it might confuse the endzone line with a yardline. However, as the model is exposed to more images, it begins to extract patterns that can be applied to new, unseen images. This allows it to predict the homography of an image it has never seen before. This process is called model training, and it is the key to our homography estimator's performance.
One of the main drawbacks of these models is their requirement for a large amount of labeled data to function properly. Fortunately, StatsBomb excels in this specific task and is the best in the market for it, thanks to our highly-trained human collection group.
Once our model has been trained, it is ready to be deployed and used. However, it is important to keep in mind that models are trained with past data. Changes to stadiums or matches played in extreme weather or special lighting conditions may not be present in our training data. As a result, our model may struggle in these new situations due to lack of experience. This is why we continually train our models, incorporating new data to ensure high-quality results.
This is the first in a series of articles from our AI team, as we aim to give you an insight into how AI is central to our data collection group. If you want to discuss any of the themes or ideas touched upon in the article, feel free to connect with our Machine Learning Engineer and author of the piece Miguel Méndez Pérez on Twitter.