Automatic Sentence Based Image Description Generation Framework






Addis Ababa University


Sentence-based image description generation is a challenging task that involves several open problems in Natural Language Processing and Computer Vision. Most previous work on this task relies on visual clues and corpus statistics. Existing generation approaches employ both concept-to-text and text-to-text natural language generation methods: they either transfer text from the descriptions of a similar image or summarize documents retrieved for a new image, but they take little advantage of the semantic information inherent in the available image descriptions. As a result, these approaches are not capable of building novel descriptions. We focus on generating novel descriptions for unseen images. We present a generic approach that benefits from two sources simultaneously: visual data and available descriptions. Our approach works on syntactically and linguistically motivated phrases extracted from human-written descriptions. The proposed framework has three main components: the Image Engine, the Search Engine, and the Text Engine. The Image Engine extracts features from the training image dataset and passes them to the indexer sub-component; after indexing is complete, visual words are constructed by clustering local descriptors. The Search Engine extracts features from an unseen image and computes a similarity measure between the image features in the index and those of the unseen image. The Text Engine extracts syntactically and linguistically motivated phrases from the textual descriptions and generates a linguistic model, and each image is then associated with a linguistic model. Finally, the Text Engine assembles the phrases into a grammatically correct sentence. Experimental evaluation shows that our design mostly generates fluent and semantically correct descriptions. To validate the proposed approach, a Java-based prototype was developed.
We used LIRe and Lucene for low-level feature extraction and indexing, Stanford CoreNLP for phrase extraction, and SimpleNLG for sentence generation. The prototype was evaluated on sample test images using recall and precision; the experiments yield 60% recall and 75% precision.
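
The Search Engine step described above can be illustrated with a minimal Java sketch. It is not the prototype's code and does not use the LIRe or Lucene APIs: the index is a plain in-memory map, the feature vectors stand in for visual-word histograms, and cosine similarity is an assumed choice, since the abstract does not name the similarity measure. The class name, method names, and image file names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class NearestImageSketch {

    // Cosine similarity between two feature vectors of equal length.
    // Assumed measure: the abstract does not say which one the prototype uses.
    public static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("feature vectors differ in length");
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // an all-zero vector is treated as similar to nothing
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns the id of the indexed image whose features are closest to the query,
    // or null if the index is empty.
    public static String mostSimilar(Map<String, double[]> index, double[] query) {
        String bestId = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, double[]> entry : index.entrySet()) {
            double score = cosine(entry.getValue(), query);
            if (score > bestScore) {
                bestScore = score;
                bestId = entry.getKey();
            }
        }
        return bestId;
    }

    public static void main(String[] args) {
        // Hypothetical visual-word histograms for two training images.
        Map<String, double[]> index = new LinkedHashMap<>();
        index.put("dog.jpg", new double[] {4, 0, 1});
        index.put("beach.jpg", new double[] {0, 5, 2});

        // Hypothetical histogram extracted from an unseen image.
        double[] query = {3, 1, 1};
        System.out.println(mostSimilar(index, query)); // prints "dog.jpg"
    }
}
```

In the framework as described, the descriptions attached to the retrieved image would then supply the candidate phrases that the Text Engine assembles into a sentence.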



Text Engine, Image Engine, Search Engine, Image Index, Phrase Relevance Evaluation, Phrase Integration