A large amount of procedural videos on the web show how to complete various tasks. These tasks can often be accomplished in different ways and step orderings, with some steps able to be performed simultaneously, while others are constrained to be completed in a specific order. Flow graphs can be used to illustrate the step relationships of a task. Current task-based methods try to learn a single flow graph for all available videos of a specific task. The extracted flow graphs tend to be too abstract, failing to capture detailed step descriptions. In this work, our aim is to learn accurate and rich flow graphs by extracting them from a single video. We propose Box2Flow, an instance-based method to predict a step flow graph from a given procedural video. In detail, we extract bounding boxes from videos, predict pairwise edge probabilities between step pairs, and build the flow graph with a spanning tree algorithm. Experiments on MM-ReS and YouCookII show our method can extract flow graphs effectively.