In this paper, we present a unifying approach for learning and recognition of objects in unstructured environments through exploration. Taking inspiration from how young infants learn objects, we establish four principles for object learning. First, early object detection is based on an attention mechanism detecting salient parts in the scene. Second, motion of the object allows more accurate object localization. Next, acquiring multiple observations of the object through manipulation allows a more robust representation of the object. And last, object recognition benefits from a multi-modal representation. Using these principles, we developed a unifying method including visual attention, smooth pursuit of the object, and a multi-view and multi-modal object representation. Our results indicate the effectiveness of this approach and the improvement of the system when multiple observations are acquired from active object manipulation.