While planning models are symbolic and precise, the real world is noisy and unstructured. This work aims to bridge that gap by aligning visualizations of planning states with the underlying state-space structure. We do so in the presence of noise and augmentations that simulate a commonly overlooked property of real environments: the existence of several semantically equivalent variations of the same state. First, we create a dataset that visualizes states for several common planning domains, where each state is rendered with variability or noise; for example, objects may change location or appearance in ways that preserve semantic meaning. Next, we train a contrastive learning model to predict the underlying states from the images. We then evaluate how the predictions for a given sequence of visualized states can be aligned with the problem's reachable state space, exploiting the known structure to improve them. We compare two methods for doing so: a greedy approach and the Viterbi algorithm, a well-established method for decoding observation sequences under a hidden Markov model. The results demonstrate that these alignment methods can correct errors in the model's predictions and significantly improve accuracy.
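As a rough illustration of the Viterbi-based alignment step, the sketch below decodes a sequence of per-frame state probabilities against a reachability graph. It is a minimal sketch under stated assumptions, not the paper's implementation: the function name, the uniform transition weights over reachable edges, and the NumPy representation of the state space are all illustrative choices.

```python
import numpy as np

def viterbi_align(log_probs, adjacency, log_prior=None):
    """Decode the most likely state sequence under reachability constraints.

    log_probs: (T, S) array of per-frame log-probabilities over S states,
               e.g. softmax outputs from the image model (assumed format).
    adjacency: (S, S) boolean array; adjacency[i, j] is True when state j
               is reachable from state i in one step (include self-loops
               so a sequence may remain in the same state).
    log_prior: optional (S,) log-probabilities over initial states.
    Returns the highest-scoring state-index sequence of length T.
    """
    T, S = log_probs.shape
    # Assumption: transitions are uniform over allowed edges, so each
    # allowed move contributes log 1 = 0 and each forbidden move -inf.
    trans = np.where(adjacency, 0.0, -np.inf)
    score = log_probs[0] + (log_prior if log_prior is not None else 0.0)
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        # Score of arriving at each state j from its best predecessor i.
        cand = score[:, None] + trans          # (S, S): i -> j
        back[t] = np.argmax(cand, axis=0)
        score = cand[back[t], np.arange(S)] + log_probs[t]
    # Backtrack from the best final state to recover the full path.
    path = [int(np.argmax(score))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]
```

By contrast, the greedy baseline would simply take `np.argmax(log_probs, axis=1)` frame by frame (optionally discarding transitions the graph forbids), which is why decoding over the whole sequence can recover from individual misclassifications that the greedy approach cannot.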