Location Tracking With ArUco Markers in Python

In many situations we might want to track the location of something or someone: during the development of self-driving cars or autonomous robots (like the amazing vacuum cleaner that does the work for you), or even in laboratory settings when observing people’s behavior in front of a stimulus. Different methods exist to achieve location tracking, and here we will go through one of them, which uses ArUco markers from OpenCV in Python.

In the example below we see an object that has some markers on its surface, as seen by a camera which is moving in front of it. We have seen before how to get the pose of the object in camera coordinates and reproduce it in 3D. Here we are switching perspectives: we are tracking the location of the camera relative to the object (i.e. switching from camera to world coordinates).

Animation 1: Example of location tracking using ArUco Markers. Top-right is the visualization from a frontal perspective; Top-left is from the top; Bottom-right is from the side. Bottom-left is the actual video frame.
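The switch from camera to world coordinates can be sketched with plain NumPy. This is a minimal illustration of the standard pose inversion, not necessarily the exact code used in the program: the function name camera_position_in_world and the example values are made up for this sketch. It assumes the pose is already available as a 3x3 rotation matrix and a translation vector (in OpenCV, cv2.Rodrigues() converts a rotation vector into such a matrix).

```python
import numpy as np

def camera_position_in_world(rotation_matrix, tvec):
    """Invert an object pose to get the camera position in object (world) coordinates.

    The pose maps world points into the camera frame: p_cam = R @ p_world + t.
    The camera centre is p_cam = 0, which gives p_world = -R.T @ t.
    (Hypothetical helper, written only for this illustration.)
    """
    R = np.asarray(rotation_matrix, dtype=float).reshape(3, 3)
    t = np.asarray(tvec, dtype=float).reshape(3, 1)
    return -R.T @ t  # 3x1 column vector

# Made-up pose: a 90 degree rotation about the y axis and a translation of 2 units along z
R = np.array([[0.0, 0.0, 1.0],
              [0.0, 1.0, 0.0],
              [-1.0, 0.0, 0.0]])
t = np.array([[0.0], [0.0], [2.0]])

camera_position = camera_position_in_world(R, t)
print(camera_position.ravel())
```

The result is kept as a 3x1 column vector, so individual coordinates can be read as camera_position[0][0], camera_position[1][0] and camera_position[2][0].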

In this program, we are building on the same functions presented in the previous article, like which() and customAruco(), which find the index of an item (or items) in a list and create a custom marker dictionary, respectively.

The readObj() function, which loads a 3D object in .obj format, now has flip and scale arguments to apply simple transformations to the model.

The function used to merge images can now merge either horizontally or vertically, which is used later in the code to create the plot presented above.
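To give an idea of what such a merge involves, here is a rough sketch written for this article. It is not the implementation from the repository: the name merge_images_sketch and the black padding are assumptions, and only the horiz switch mirrors the argument used later in the code.

```python
import numpy as np

def merge_images_sketch(img1, img2, horiz=True):
    """Join two H x W x 3 uint8 images side by side (horiz=True) or stacked (horiz=False).

    Hypothetical helper: the smaller image is padded with black so that
    both images fit on a single canvas.
    """
    h1, w1 = img1.shape[:2]
    h2, w2 = img2.shape[:2]
    if horiz:
        canvas = np.zeros((max(h1, h2), w1 + w2, 3), dtype=np.uint8)
        canvas[:h1, :w1] = img1
        canvas[:h2, w1:w1 + w2] = img2
    else:
        canvas = np.zeros((h1 + h2, max(w1, w2), 3), dtype=np.uint8)
        canvas[:h1, :w1] = img1
        canvas[h1:h1 + h2, :w2] = img2
    return canvas

# Two small dummy images with different sizes
a = np.full((2, 3, 3), 255, dtype=np.uint8)
b = np.full((4, 2, 3), 128, dtype=np.uint8)

side_by_side = merge_images_sketch(a, b)           # shape (4, 5, 3)
stacked = merge_images_sketch(a, b, horiz=False)   # shape (6, 3, 3)
```

Padding instead of resizing keeps the pixels of each image untouched, at the cost of black borders when the sizes differ.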

We also have a slightly different plot3d() function, to allow for neater code when creating multiple plots. As the code proceeds we see how this function is used in a loop to create 3 plots with different perspectives:

# add plots to a list to update in a loop later
plots = []
for elev, azim in zip(elev_list, azim_list):
    plots.append(plot3d(vertices, triangles, elevation=elev, azimut=azim, axes=False, draw=True))

Another difference from previous programs is that we are applying a moving median to the location estimate, to reduce the noise in the data and get a smoother estimate. This is done by creating a deque() from the collections package, which acts like a ring buffer, or a list with a maximum length. As we keep appending the position estimated from each frame, the “buffer” reaches its maximum length and then starts discarding the oldest element each time a new one is added.

from collections import deque

# create a list with fixed length, for simple noise reduction
pos_list_len = 15
camera_position_list = deque(maxlen=pos_list_len)
# add current position to buffer

Then, when the buffer reaches its maximum length, we start using the median of all the values to get the estimate:

# gather enough data in list for noise reduction
if len(camera_position_list) == pos_list_len:
    x_nr = np.median([x[0][0] for x in camera_position_list])
    y_nr = np.median([x[1][0] for x in camera_position_list])
    z_nr = np.median([x[2][0] for x in camera_position_list])
    thisPosition = [x_nr, y_nr, z_nr]
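The two snippets above can be put together in a small stand-alone sketch. The helper smoothed_position, the shorter buffer of 5 elements and the sample values are made up for illustration; positions are assumed to be 3x1 column vectors, as implied by the x[0][0] indexing.

```python
from collections import deque
import numpy as np

pos_list_len = 5  # shorter than in the program, to keep the example small
camera_position_list = deque(maxlen=pos_list_len)

def smoothed_position(new_position):
    """Append a position to the buffer and return the per-axis median.

    Returns None until the buffer is full (hypothetical helper).
    """
    camera_position_list.append(np.asarray(new_position, dtype=float).reshape(3, 1))
    if len(camera_position_list) < pos_list_len:
        return None
    stacked = np.hstack(list(camera_position_list))  # 3 x pos_list_len
    return np.median(stacked, axis=1).tolist()

# Made-up position estimates; the third one is an obvious outlier
samples = [[1.0, 0.0, 5.0],
           [1.1, 0.0, 5.0],
           [9.0, 9.0, 9.0],
           [0.9, 0.0, 5.0],
           [1.0, 0.0, 5.0]]

results = [smoothed_position(s) for s in samples]
print(results[-1])  # → [1.0, 0.0, 5.0]
```

The first four calls return None because the buffer is not yet full. The outlier has no effect on the fifth result, which is the reason for using a median rather than a mean here. The trade-off is a delay: the smoothed estimate lags behind the real movement by roughly half the buffer length.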

Next, as we are plotting lines, we need both a starting and an ending point, so we make sure we have a location estimate from the last frame, and then we draw the segment on all three plots:

if lastPosition is not None:
    x = [lastPosition[0], thisPosition[0]]
    y = [lastPosition[1], thisPosition[1]]
    z = [lastPosition[2], thisPosition[2]]
    for pl in range(len(elev_list)):
        plots[pl].plot(x , y, z, "k", linewidth = 2)

Finally, at the end of each frame, we merge the different plots and the frame image, then show them and/or write them to the videoOut file:

# Display the resulting frame
merged_frame_1 = merge_images(canvas2rgb_array(plots[0].figure.canvas), canvas2rgb_array(plots[1].figure.canvas))
merged_frame_2 = merge_images(frame, np.rot90(canvas2rgb_array(plots[2].figure.canvas), 3))
merged_frame = merge_images(merged_frame_1, merged_frame_2, horiz = False)
if args.outputVideo is not None:

The full source code for this program can be found on my GitHub ArUco repository. It is a proof-of-concept, and there is a lot of room for improvement as far as accuracy is concerned. First off, a proper calibration of the camera needs to be done. Additionally, all units of measurement must be consistent (including the 3D object loaded at the beginning) for the position to be estimated more accurately.

Potential applications of location tracking are many, including those mentioned at the beginning of this article. It can also be a valuable method to track people’s location in a store or in a factory setting, simply by fitting a head-mounted camera (typically included in eye tracking glasses). When enriched with other metrics, we could get valuable information about the path participants take, how long they stay in specific areas and potential obstacles in the overall experience, metrics which are usually used in virtual environments.

If you have enjoyed this article and think it can be useful for your projects, feel free to drop me a line and I’d be happy to discuss different methodologies and metrics that fit your research. Don’t hesitate to write to me if you have any questions, or if you spot a problem with the code or something that can be improved, either through the contact form or via the social media links below.