To support humans in their daily lives, robots must be able to learn autonomously, adapt to objects and environments, and perform appropriate actions. We tackled the task of cooking scrambled eggs with real ingredients. In this task, the robot must perceive the state of the egg and adjust its stirring movement on the fly, because the egg's state changes continuously as it is heated. Previous work found that handling such changing objects is challenging for two reasons. First, the sensory information is dynamic and contains both important and noisy components. Second, the modality that should be focused on changes from moment to moment. Together, these make it difficult to achieve both perception and motion generation in real time. We propose a predictive recurrent neural network with an attention mechanism that weighs each sensor input according to how important and reliable its modality is, enabling quick and efficient perception and motion generation. We validated the proposed technique on the physical humanoid robot Dry-AIREC.