Verification is a core requirement for dialogue agents, as every in-scope conversation must be handled predictably. Current approaches to analyzing agent behaviour are time-consuming and tedious, leaving dialogue designers without a reliable picture of what their agents can handle. In this paper, we address this issue with a novel method for systematic testing called Conversation Alignment, which uses a tailored Beam Search algorithm to explore how well the agent handles given conversations. We also provide the dialogue designer with visual metrics that indicate where the majority of conversations fail. We evaluate our system by measuring how effectively it captures errors, by using it to find errors iteratively, and by scaling hyperparameters to examine the effect on performance. We show that Beam Search is more effective than Greedy Search in providing useful failure metrics to the dialogue designer, and that Conversation Alignment, applied iteratively, incrementally reduces the number of failed conversations.
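To make the idea concrete, the sketch below shows one way a beam search could align a target conversation against a dialogue agent. It is a minimal illustration, not the paper's implementation: it assumes the agent exposes a transition function returning scored candidate replies for a given state and user turn, and all names (`agent_step`, `match`, `beam_width`, the scoring) are illustrative assumptions.

```python
# Illustrative sketch: aligning one target conversation against a dialogue
# agent with beam search. The agent model, scoring, and all names here are
# assumptions for illustration, not taken from the paper.
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Hypothesis:
    """One partial alignment: the agent state reached and its score so far."""
    state: str
    score: float = 0.0
    trace: List[str] = field(default_factory=list)


def align_conversation(
    conversation: List[Tuple[str, str]],                      # (user turn, expected agent reply)
    agent_step: Callable[[str, str], List[Tuple[str, str, float]]],
    match: Callable[[str, str], float],                       # similarity of reply vs. expectation
    beam_width: int = 3,
) -> Hypothesis:
    """Return the best-scoring alignment of the conversation to the agent."""
    beam = [Hypothesis(state="START")]
    for user_turn, expected_reply in conversation:
        candidates = []
        for hyp in beam:
            # agent_step yields (next_state, agent_reply, transition_score) triples
            for next_state, reply, step_score in agent_step(hyp.state, user_turn):
                score = hyp.score + step_score + match(reply, expected_reply)
                candidates.append(
                    Hypothesis(next_state, score, hyp.trace + [next_state])
                )
        if not candidates:
            # No hypothesis can be extended: the conversation fails at this turn,
            # which is where a failure metric would be reported.
            return max(beam, key=lambda h: h.score)
        # Keep only the beam_width best partial alignments.
        beam = sorted(candidates, key=lambda h: h.score, reverse=True)[:beam_width]
    return max(beam, key=lambda h: h.score)
```

In this sketch, setting `beam_width` to 1 reduces the search to the Greedy Search baseline against which Beam Search is compared.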