Attention has shifted towards the role that visual modes of communication play in enhancing literacy, especially for English language learners (ELLs). Multimodality in English language learning involves the combined use of communication modes such as images, sounds, videos, and graphic representations (Denton and Sewell 2011). Studies on English learning have demonstrated that multimodal communicative modes are more captivating to learners and can add more value to student learning. As Royce (2007) points out, competence in English as a second language has been overemphasized without giving due consideration to the relations between the various semiotic modes, which is needed to determine the characteristics that bring visual-verbal coherence to multimodal texts. Kampf, Kastberg, and Maier (2007) observe that integrating verbal and visual discourses remains a major challenge in language teaching, hence the need to strengthen the connections between the different modes involved in ESL learning. This paper examines the importance of multimodal literacy for English language learners and possible ways of enhancing literacy for ELLs.

Following technological advancements, there has been a shift from traditional literacy practices towards new forms of literacy in language learning. New technologies provide a range of applications that can be used to generate new stylistic types, which have expanded the range of language expression (Wu 2010). Lim and O’Halloran (2011) observe that literacy should not be fragmented into separate forms such as visual, emotional, verbal, and digital literacy, a division that disregards the power of traditional literacy; instead, it should be viewed as a dynamic process of enhancing understanding through multimodal semiotic discourses. In multimodal literacy, discourse design is explored by examining the role of various semiotic resources such as language, images, and gestures.