Symbolic systems require hand-coded symbolic representations as input, which creates a knowledge acquisition bottleneck. Meanwhile, although deep learning has achieved significant success in many fields, its knowledge is encoded in a subsymbolic representation that is incompatible with symbolic systems. To bridge the gap between the two fields, one has to solve the symbol grounding problem: the question of how a machine can generate symbols automatically. We discuss our recent work, Latplan, an unsupervised architecture that combines deep learning and classical planning. Given only an unlabeled set of image pairs showing a subset of the transitions allowed in the environment (training inputs), Latplan learns a complete propositional PDDL action model of the environment. Later, when a pair of images representing the initial and goal states (planning inputs) is given, Latplan finds a plan to the goal state in a symbolic latent space and returns a visualized plan execution. We discuss several key ideas that made Latplan possible and that we hope will extend to many other symbolic paradigms beyond classical planning.