Computer Engineering

Recent Submissions

Now showing 1 - 20 of 224
  • Item
    Testing the Invariance of Skills and Strategies Developed by Artificial Agents under Different Sensory Modalities
    (Addis Ababa University, 2024-06) Meseret Gebremichael; Menore Tekeba (PhD)
    Artificial intelligence (AI) is the intelligence exhibited by machines or software, as opposed to the intelligence of humans or animals. An AI agent must perceive its external environment in order to understand and execute tasks. The interaction between an agent and its environment is mediated by an input sensory modality (image, sound, lidar, or another input), and learning is measured by the rewards the agent collects during reinforcement learning. We used two algorithms, Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC), for the experiments. PPO is a reinforcement learning algorithm used to train agents to perform tasks in an environment through trial and error. It is designed to optimize policies, the strategies or behaviors that agents use to make decisions, and it seeks the optimal policy by iteratively updating the policy parameters based on experience collected from interactions with the environment. SAC is a reinforcement learning algorithm for training agents in environments with continuous action spaces; it extends the actor-critic framework and combines the advantages of policy optimization and value estimation methods. During the experiments, we developed the agents and the environment model, selected appropriate RL methods, and designed and implemented metrics of the agents' learning performance according to their nature. We demonstrate that agents' training performance varies under different sensory modalities, with the best outcomes observed when multiple modalities are combined. Despite these differences, the study underscores agents' adaptability across sensory inputs, advancing our understanding of cross-modal learning in AI. We also show that the skills and strategies an agent learns are largely invariant to its sensory modality, and we developed a simple goal-reaching game with different input sensory information to test the agent's skills and strategies under different perceptual modalities.
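    As a rough illustration of how the two algorithms could be trained and compared on a common task, the sketch below uses Stable-Baselines3; the environment name, time-step budget, and evaluation protocol are placeholders, not the thesis's actual goal-reaching game or metrics.

```python
# Minimal sketch (not the thesis's actual setup): train PPO and SAC agents on
# the same continuous-control task and compare mean episode rewards.
# Assumes gymnasium and stable-baselines3; "Pendulum-v1" is only a placeholder.
import gymnasium as gym
from stable_baselines3 import PPO, SAC
from stable_baselines3.common.evaluation import evaluate_policy

def train_and_evaluate(algo_cls, env_id="Pendulum-v1", steps=50_000):
    env = gym.make(env_id)
    model = algo_cls("MlpPolicy", env, verbose=0)
    model.learn(total_timesteps=steps)            # trial-and-error interaction
    mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=20)
    env.close()
    return mean_reward, std_reward

if __name__ == "__main__":
    for algo in (PPO, SAC):
        mean_r, std_r = train_and_evaluate(algo)
        print(f"{algo.__name__}: mean reward {mean_r:.1f} +/- {std_r:.1f}")
```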
  • Item
    Vision to Auditory Substitution for an Artificial Agent
    (Addis Ababa University, 2025-01) Semira Mohammed; Menore Tekeba (PhD)
    Sensory substitution technology converts raw visual input into auditory soundscapes, allowing individuals to “see” with sound. However, mastering this skill requires significant cognitive adaptation, extensive training, and practical application in realistic, everyday scenarios. Experiments with humans have shown the potential for auditory substitution of vision, but these efforts are limited by high costs, ethical concerns, and the risk of unintended side effects, such as impaired auditory skills. To address these challenges, this study develops a Vision-to-Auditory Sensory Substitution system for artificial agents. By simulating sensory substitution in a controlled reinforcement learning (RL) framework, this approach eliminates the need for human experimentation while retaining the ability to explore learning dynamics and decision-making behaviors. Using the Proximal Policy Optimization (PPO) algorithm, agents were trained in two OpenAI Gym environments—CarRacing-v2 and LunarLander-v2—to compare the performance of vision-based and auditory-based agents. The results demonstrate that auditory agents, despite inherent challenges in interpreting sound-encoded visual inputs, achieved mean rewards of 427.91 in the CarRacing-v2 environment and 259.85 in the LunarLander-v2 environment over 100 episodes. These findings highlight the potential of sensory substitution systems in enabling artificial agents to act effectively using auditory cues. This research contributes to advancing assistive technologies while addressing the limitations and risks of human-based sensory substitution experiments.
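    One way such a substitution could be wired into an RL pipeline is an observation wrapper that replaces the image observation with a 1-D "soundscape" vector before the PPO agent sees it. The sketch below is only illustrative; the column-energy encoding and bin count are assumptions, not the soundscape scheme used in the thesis.

```python
# Illustrative sketch only: a Gymnasium observation wrapper that replaces an
# image observation with a coarse 1-D "soundscape" vector (one amplitude per
# frequency bin / image column).  The real encoding in the thesis may differ;
# this just shows where substitution would sit in the PPO pipeline.
import numpy as np
import gymnasium as gym

class VisionToAudioWrapper(gym.ObservationWrapper):
    def __init__(self, env, n_bins=64):
        super().__init__(env)
        self.n_bins = n_bins
        self.observation_space = gym.spaces.Box(
            low=0.0, high=1.0, shape=(n_bins,), dtype=np.float32)

    def observation(self, obs):
        gray = obs.mean(axis=-1) / 255.0                        # H x W grayscale
        cols = np.array_split(gray.mean(axis=0), self.n_bins)   # per-column energy
        return np.array([c.mean() for c in cols], dtype=np.float32)

# Usage sketch: env = VisionToAudioWrapper(gym.make("CarRacing-v2"))
```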
  • Item
    A Top-Down Chart Parser for Ge’ez Sentences
    (Addis Ababa University, 2024-12) Nega Meareg; Surafel Lemma (PhD)
    Parsing is the process of breaking down a sentence into its parts of speech, such as verbs, nouns, prepositions, and adjectives. Parsing plays an important role in enhancing the performance of numerous natural language processing (NLP) tasks. This work designs a top-down chart parser for Ge’ez sentences using context-free grammar (CFG) rules. We reviewed various parsing approaches for different languages to achieve our objective. However, Ge’ez parsing remains challenging due to the absence of an annotated dataset. To address this gap, we collected a dataset for sentence parsing from the well-known book Mezumere Dawit. Given the lack of pre-existing labeled data, we collaborated with a language expert (Amanuel), an instructor in the Ge’ez Department at Aksum University, to ensure linguistic accuracy. The expert prepared the dataset in a suitable format, established grammatical rules for sentence construction based on verb and noun phrase structures, and manually parsed the sentences. This study presents a top-down parsing approach for Ge’ez sentences, addressing the challenges posed by the language’s unique morphological and syntactic structures. Using a dataset of 500 sentences sourced from the book Mezumere Dawit, the parser correctly parsed 470 sentences, achieving a parsing accuracy of 94%. The results were validated through manual parsing, in which 490 sentences were parsed manually, with 470 sentences matching the parser’s output. The performance of the parser was evaluated using standard metrics, including precision, recall, and F1 score. Experimental results show the effectiveness of the proposed method in parsing Ge’ez sentences. This study contributes a foundational step towards computational processing of the Ge’ez language, with potential applications in machine translation and historical analysis.
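    The sketch below illustrates the top-down expansion idea on a toy grammar; the rules and transliterated tokens are placeholders and do not reflect the Ge’ez CFG built for this work.

```python
# Toy top-down parser sketch (not the thesis's grammar): expand the start
# symbol S depth-first against CFG rules until the token sequence is covered.
GRAMMAR = {                      # placeholder rules; the real work uses a Ge'ez CFG
    "S":  [["NP", "VP"]],
    "NP": [["N"]],
    "VP": [["V", "NP"], ["V"]],
    "N":  [["dawit"], ["mezmur"]],
    "V":  [["zemere"]],
}

def parse(symbol, tokens):
    """Yield (tree, remaining_tokens) for every top-down expansion of symbol."""
    if symbol not in GRAMMAR:                      # terminal symbol
        if tokens and tokens[0] == symbol:
            yield symbol, tokens[1:]
        return
    for production in GRAMMAR[symbol]:             # try each rule in turn
        def expand(syms, rest):
            if not syms:
                yield [], rest
                return
            for sub, rest2 in parse(syms[0], rest):
                for tail, rest3 in expand(syms[1:], rest2):
                    yield [sub] + tail, rest3
        for children, rest in expand(production, tokens):
            yield (symbol, children), rest

sentence = ["dawit", "zemere", "mezmur"]            # placeholder transliterated tokens
trees = [t for t, rest in parse("S", sentence) if not rest]
print(trees[0] if trees else "no parse")
```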
  • Item
    A Deterministic Approach to Tri-Radical Amharic Verb Derivatives Generation
    (Addis Ababa University, 2022-02) Samrawit Kassaye; Yalemzewd Negash (PhD)
    Morphological synthesis or generation is the process of returning one or more surface forms from a sequence of underlying (lexical) forms. Today, synthesizers of different kinds have been developed for languages that have relatively wide international use. Amharic is the second most widely spoken Semitic language after Arabic, but it is under-exploited in the digital world. In this research, a rule-based approach is elaborated to morphologically derive or generate Amharic words from tri-radical verbs and finally build a rich Amharic lexicon. This work utilizes two data sources, namely an Amharic word list and tri-radical verbs. The Amharic word list file contains more than 450,000 unique Amharic words. The tri-radical verb data source contains more than 350 unique tri-radical verbs. The proposed method identifies rules from existing Amharic words after analyzing them against the tri-radical verbs. The new feature identified is applying index changing to the letters of tri-radical verbs. Index changing (adding a vowel to a consonant letter) is one of the approaches used in the morphological derivation of Amharic words from root (stem) words. After index changing of the tri-radical verbs, the index-changed words are searched for in the Amharic word list file. If an index-changed word is found directly, or as part of a word with a prefix and/or suffix, the pattern of the word with respect to the root verb and the index-changed word is captured. From the captured patterns, morphemes are extracted and rules are identified; 85,115 unique rules were identified. While identifying rules, the frequency of every rule is recorded in order to evaluate its efficiency, and a memory-based machine learning approach is applied to evaluate the frequency of the rules. Of the 85,115 rules, the prefixes of 29,776 rules and the suffixes of 32,401 rules are wrong, and 11,390 rules are discarded by the wrong index-changing process. The identified rules showed an accuracy of 0.99, an average precision of 0.88, and an average recall of 0.85. Based on these rules, a comprehensive set of derivatives for tri-radical Amharic verbs was generated, resulting in a rich Amharic lexicon.
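    A minimal sketch of the index-changing idea is shown below: the radicals of a tri-radical root are interleaved with a vowel template and combined with affixes. The root, templates, and affixes are illustrative placeholders, not rules extracted by this work.

```python
# Illustrative sketch (placeholder root/templates, not the extracted rules):
# derive surface forms from a tri-radical root by interleaving a vowel
# template with the radicals ("index changing") and attaching affixes.
def apply_template(radicals, vowels):
    """Interleave consonant radicals with a vowel pattern, e.g. s-b-r + e_e -> seber."""
    padded = vowels + [""] * (len(radicals) - len(vowels))
    return "".join(cons + vow for cons, vow in zip(radicals, padded))

def derive(radicals, rules):
    """Apply (prefix, vowel_template, suffix) rules to one tri-radical root."""
    return [prefix + apply_template(radicals, template) + suffix
            for prefix, template, suffix in rules]

root = ["s", "b", "r"]                      # placeholder tri-radical root
rules = [("", ["e", "e"], "e"),             # -> sebere
         ("te", ["e", "e"], "e"),           # -> tesebere
         ("", ["e", "a"], "i")]             # -> sebari
print(derive(root, rules))
```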
  • Item
    Automated Construction of a New Dataset for Histopathological Breast Cancer Images
    (Addis Ababa University, 2024-01) Kalkidan Kebede; Fitsum Assamnew (PhD)
    Cancer is a medical condition where cells grow uncontrollably and can spread to other parts of the body, posing a significant global health challenge. Among women worldwide, breast cancer is the most frequently diagnosed cancer and the leading cause of cancer-related deaths. Automated classification of breast cancer has been extensively studied, particularly in differentiating types, subtypes, and stages. However, simultaneous classification of subtypes with stages, such as Lobular Carcinoma In Situ (LCIS) and Invasive Lobular Carcinoma (ILC), remains challenging due to limited data availability. This research aims to address this gap by generating a new dataset that includes these unclassified subtypes with staging, utilizing existing datasets as primary sources. Labels for ductal and lobular carcinoma from the BreakHis dataset and invasive and in situ carcinoma labels from the Yan et al. dataset are used to train models for generating the new dataset. To achieve this, two separate ensemble models are trained using distinct datasets. The first ensemble model classifies ductal and lobular carcinoma using the BreakHis dataset. The second ensemble model classifies invasive and in situ carcinoma using the Yan et al. dataset. Both models are then used to extract a new dataset through soft voting techniques. The extracted labels include Ductal Carcinoma In Situ (DCIS), Invasive Ductal Carcinoma (IDC), LCIS, and ILC. This approach aims to provide a more comprehensive classification system by leveraging labels from both datasets. To validate the newly extracted labels, three pathologists were given randomly extracted images from the Yan et al. dataset test set. The pathologists agreed with the model outputs on 87.5% of the samples. Subsequently, the newly generated dataset was used to classify DCIS, IDC, LCIS, and ILC with an accuracy of 76.06%.
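    The sketch below illustrates the soft-voting and label-combination step on hypothetical probabilities; the arrays, class names, and the way the two label axes are merged are placeholders rather than the trained ensembles.

```python
# Sketch (placeholder arrays, not the trained ensembles): each ensemble votes
# softly within its own label axis, and the two axes are combined into one of
# DCIS, IDC, LCIS, ILC.
import numpy as np

def soft_vote(probs):
    """Average (n_models, n_samples, n_classes) probabilities across models."""
    return np.mean(probs, axis=0)

def combine_labels(p_type, p_stage):
    """p_type: P(ductal, lobular); p_stage: P(in situ, invasive) per sample."""
    type_names, stage_names = ["ductal", "lobular"], ["in situ", "invasive"]
    return [f"{stage_names[s]} {type_names[t]} carcinoma"
            for t, s in zip(p_type.argmax(axis=1), p_stage.argmax(axis=1))]

# Hypothetical soft-voted probabilities for 2 images:
p_type  = soft_vote(np.array([[[0.8, 0.2], [0.3, 0.7]],
                              [[0.7, 0.3], [0.2, 0.8]]]))
p_stage = soft_vote(np.array([[[0.9, 0.1], [0.2, 0.8]],
                              [[0.8, 0.2], [0.3, 0.7]]]))
print(combine_labels(p_type, p_stage))
# -> ['in situ ductal carcinoma' (DCIS-like), 'invasive lobular carcinoma' (ILC-like)]
```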
  • Item
    Spectrum Occupancy Prediction Using Deep Learning Algorithms
    (Addis Ababa University, 2024-07) Addisu Melkie; Getachew Alemu (PhD)
    The fixed spectrum allocation (FSA) policy causes a waste of valuable and limited natural resources because a significant portion of the spectrum allocated to users is unused. With the exponential growth of wireless devices and the continuous development of new technologies demanding more bandwidth, there is a significant spectrum shortage under current policies. Dynamic spectrum access (DSA), implemented in a cognitive radio network (CRN), is an emerging solution to meet the growing demand for spectrum; it promises to improve spectrum utilization by enabling secondary users (SUs) to utilize spectrum that is allocated to primary users (PUs) but left unused. CRNs provide capabilities for spectrum sensing, decision-making, sharing, and mobility. Spectrum sharing relies on spectrum usage patterns obtained from spectrum occupancy prediction to determine the channel state as “idle” or “busy”. This study addresses the limitations of previous studies by implementing a comprehensive approach that encompasses reliable spectrum sensing, identification of candidate spectrum bands, long-term adaptive prediction modeling, and quantification of the improvements achieved by the prediction model. A Long Short-Term Memory (LSTM) deep learning (DL) model is proposed to address the challenge of capturing temporal dynamics in sequential inputs. The LSTM model leverages a gating mechanism to regulate information flow within the network, allowing it to learn and model long-term temporal dependencies effectively. The dataset used for this study was obtained from a real-world spectrum measurement employing the Cyclostationary Feature Detection (CFD) approach in the GSM900 mobile network uplink band, spanning a frequency range of 902.5 to 915 MHz over five consecutive days. The dataset comprises a total of 225,000 data points. Analysis of the five-day spectrum measurement data yields an average spectrum utilization of 20.47%. The proposed model predicted the spectrum occupancy state 5 hours ahead with an accuracy of 99.45%, improved spectrum utilization from 20.47% to 98.28%, and reduced sensing energy to 29.39% compared to real-time sensing.
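    A minimal sketch of an LSTM occupancy predictor is shown below; the window length, layer sizes, and randomly generated idle/busy sequence are placeholders for the real GSM900 measurements and the thesis's configuration.

```python
# Sketch (placeholder data/hyperparameters): an LSTM that maps a window of past
# idle(0)/busy(1) sensing decisions to the probability the next slot is busy.
import numpy as np
import tensorflow as tf

WINDOW = 32
# Hypothetical occupancy sequence; the thesis uses real GSM900 uplink measurements.
seq = (np.random.rand(5000) < 0.2).astype("float32")
X = np.stack([seq[i:i + WINDOW] for i in range(len(seq) - WINDOW)])[..., None]
y = seq[WINDOW:]

model = tf.keras.Sequential([
    tf.keras.Input(shape=(WINDOW, 1)),
    tf.keras.layers.LSTM(64),                       # gated memory of past slots
    tf.keras.layers.Dense(1, activation="sigmoid")  # P(next slot is busy)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=3, batch_size=128, validation_split=0.2, verbose=0)
print("next-slot busy probability:", float(model.predict(X[-1:], verbose=0)[0, 0]))
```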
  • Item
    DEACT Hardware Solution to Rowhammer Attacks
    (Addis Ababa University, 2024-05) Tesfamichael Gebregziabher; Mohammed Ismail (Prof.); Fitsum Assamnew (PhD)
    Dynamic Random-Access Memory (DRAM) technology has advanced significantly, resulting in faster access times and increased storage capacities by shrinking memory cells and tightly packing them on a chip. However, as DRAM scaling continues, it presents new challenges and considerations that need to be addressed. Smaller memory cells and the proximity between them have led to circuit disturbance errors, such as the Rowhammer problem. These errors can be exploited by attackers to induce bit flips and gain unauthorized access to systems, posing a significant security threat. In this research, we propose DEACT, a counter-based hardware mitigation approach designed to tackle the Rowhammer problem in DRAM. It moves all frequently accessed rows to a safety sub-array, where hot rows are maintained so that further activations no longer disturb their original neighbors, effectively eliminating the vulnerability. Furthermore, our counter implementation requires a smaller chip area compared to existing solutions. We also introduce DDRSHARP, a cycle-accurate DRAM simulator that simplifies the configuration and evaluation of various DRAM standards. DDRSHARP provides over 1.8x reduction in simulation time compared to contemporary simulators. Its performance is optimized by avoiding infeasible iterations, minimizing branch instructions, caching repetitive calculations, and other optimizations.
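    The sketch below is a toy software model of the counter-based idea, not the hardware design: per-row activation counters with migration of hot rows to a safe sub-array once a placeholder threshold is crossed.

```python
# Toy software model (not the hardware design): per-row activation counters
# with migration of "hot" rows to a safe sub-array once a threshold is crossed.
from collections import defaultdict

THRESHOLD = 5            # placeholder; a real design derives this from DRAM specs

class DeactModel:
    def __init__(self):
        self.counters = defaultdict(int)
        self.safe_subarray = set()

    def activate(self, row):
        if row in self.safe_subarray:
            return "safe"                      # hot row already isolated
        self.counters[row] += 1
        if self.counters[row] >= THRESHOLD:
            self.safe_subarray.add(row)        # migrate before neighbors flip
            del self.counters[row]
            return "migrated"
        return "normal"

dram = DeactModel()
for _ in range(6):
    status = dram.activate(0x1A2B)
print(status, sorted(hex(r) for r in dram.safe_subarray))
# -> after repeated activations the row ends up in the safe sub-array
```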
  • Item
    Addressing User Cold Start Problem in Amharic YouTube Advertisement Recommendation Using BERT
    (Addis Ababa University, 2024-06) Firehiwot Kebede; Fitsum Assamnew (PhD)
    With the rapid growth of the internet and smart mobile devices, online advertising has become widely accepted across various social media platforms. These platforms employ recommendation systems to personalize advertisements for individual users. However, a significant challenge for these systems is the user cold-start problem, where recommending items to new users is difficult due to the lack of historical user preferences in a content-based recommendation system. To address this issue, we propose an Amharic YouTube advertisement recommendation system for unsigned YouTube users, for whom there is no user information such as past preferences or personal details. The proposed system uses content-based filtering techniques and leverages Sentence Bidirectional Encoder Representations from Transformers (SBERT) to establish sentence semantic similarity between YouTube video titles, descriptions, and advertisement titles. For this research, 4,500 data items were collected and preprocessed from YouTube via the YouTube API, along with 500 advertisement titles from advertising and promotional companies. Random samples from these datasets were annotated for evaluation purposes. Our proposed approach achieved 70% accuracy in recommending semantically related Amharic advertisements (Ads) for the corresponding YouTube videos with respect to the annotated data. At a 95% confidence interval, our system demonstrated an accuracy of 58% to 76% in recommending Ads that are relevant to new users who have no prior interaction history with the Ads on the platform. This approach also significantly enhances privacy by reducing the need for extensive data sharing.
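    The sketch below shows the SBERT matching step in outline: embed the video text and the ad titles, then rank ads by cosine similarity. The checkpoint name and Amharic strings are assumptions for illustration, not the model or data used in this work.

```python
# Sketch (placeholder model/strings): embed video text and ad titles with a
# multilingual SBERT checkpoint and rank ads by cosine similarity.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")  # assumed checkpoint

video_text = "የእግር ኳስ ጨዋታ ትንታኔ"          # placeholder YouTube title + description
ad_titles = ["የስፖርት ትጥቅ ቅናሽ", "አዲስ የባንክ አገልግሎት", "የእግር ኳስ ማሊያ ሽያጭ"]

video_emb = model.encode(video_text, convert_to_tensor=True)
ad_embs = model.encode(ad_titles, convert_to_tensor=True)

scores = util.cos_sim(video_emb, ad_embs)[0]          # similarity to each ad
ranked = sorted(zip(ad_titles, scores.tolist()), key=lambda x: -x[1])
for title, score in ranked:
    print(f"{score:.3f}  {title}")
```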
  • Item
    Multimodal Amharic Fake News Detection using CNN-BiLSTM
    (Addis Ababa University, 2024-06) Mekdim Tessema; Fitsum Assamnew
    With the growth of internet accessibility, the number of social media users in Ethiopia has increased rapidly. This created an easy ground for the transmission of information between people. On the flip side, it became a hub for the fabrication and propagation of fake news. Fake news that is available online has the potential to cause significant issues for both individuals and society as a whole. We propose a multimodal fake news detection approach for Amharic on social media that combines textual and visual features. Genuine and fake news data were collected from social media to create a multimodal Amharic news dataset. The collected data were preprocessed to retrieve textual and visual features using a Bidirectional Long Short-Term Memory (BiLSTM) network and a Convolutional Neural Network (CNN), respectively. The two sets of features were then concatenated and used to train our multimodal fake news detection model. Our proposed method achieved 90% accuracy and 94% precision. Compared to the state-of-the-art unimodal fake news detection for Amharic, our proposed model achieved a 4% increase in accuracy and a 7% increase in precision in fake news detection performance.
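    A minimal PyTorch sketch of the fusion idea is given below: a BiLSTM branch for token embeddings, a small CNN branch for the image, concatenation of the two feature vectors, and a two-class classifier on top. All layer sizes are placeholders, not the thesis's architecture.

```python
# Sketch (placeholder dimensions, not the thesis's architecture): BiLSTM text
# branch + CNN image branch, concatenated and classified as fake/genuine.
import torch
import torch.nn as nn

class MultimodalFakeNews(nn.Module):
    def __init__(self, vocab_size=20000, emb_dim=128, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.bilstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.cnn = nn.Sequential(                        # tiny image encoder
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())       # -> (batch, 32)
        self.classifier = nn.Linear(2 * hidden + 32, 2)  # fake / genuine

    def forward(self, tokens, image):
        _, (h, _) = self.bilstm(self.embed(tokens))      # h: (2, batch, hidden)
        text_feat = torch.cat([h[0], h[1]], dim=1)       # (batch, 2*hidden)
        img_feat = self.cnn(image)                       # (batch, 32)
        return self.classifier(torch.cat([text_feat, img_feat], dim=1))

model = MultimodalFakeNews()
logits = model(torch.randint(0, 20000, (4, 50)), torch.randn(4, 3, 128, 128))
print(logits.shape)   # torch.Size([4, 2])
```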
  • Item
    Anomaly-Augmented Deep Learning for Adaptive Fraud Detection in Mobile Money Transactions
    (2024-06) Melat Kebede; Bisrat Derebssa (PhD)
    Mobile Money, a revolutionary technology, enables individuals to manage their bank accounts entirely via their mobile devices, allowing for transactions like bill payments with unmatched ease and efficiency. This innovation has significantly reshaped financial landscapes, particularly in developing countries with limited access to traditional banking, by promoting financial inclusion and driving economic opportunity. However, the rapid growth of mobile money services has introduced significant challenges, such as fraud, where unauthorized individuals manipulate the system through various scams, creating serious risks that lead to financial losses and undermine trust in the system. We propose a fraud detection model that integrates deep learning techniques to identify fraudulent transactions and adapt to the dynamic behaviors of fraudsters in mobile money transactions. Given the private nature of financial data, we utilized a synthetic dataset generated using the PaySim simulator, which is based on a company in Africa. We evaluated three deep learning architectures, namely the Restricted Boltzmann Machine (RBM), Probabilistic Neural Network (PNN), and Multi-Layer Perceptron (MLP), for fraud detection, emphasizing feature engineering and class distribution. The MLP achieved 95.70% accuracy, outperforming the RBM (89.91%) and PNN (73.36%) across various class ratios and on both the original and feature-engineered datasets. Among various techniques for anomaly detection, the Auto-Encoder consistently outperformed others, such as the Isolation Forest and Local Outlier Factor, achieving an accuracy of 82.85%. Our hybrid model employed a feature augmentation approach, integrating prediction scores from an Autoencoder model as additional features. These scores were then fed into the Multi-Layer Perceptron (MLP) model along with the original dataset. This hybrid approach achieved 96.56% accuracy, 97.62% precision, 84.16% recall, and a 90.39% F1-score, outperforming the standalone MLP. The hybrid model achieved an accuracy of 73.33% on an unseen dataset, a 3.9% increase over the MLP model's 69.41% accuracy, demonstrating its enhanced ability to capture and adapt to evolving fraud patterns. This study finds that the hybrid model's enhanced performance highlights the significance of anomaly detection and feature engineering in improving fraud detection.
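    The sketch below illustrates the feature-augmentation step with scikit-learn stand-ins: fit a reconstruction model on the transaction features, append the per-sample reconstruction error as an anomaly-score column, and train an MLP on the augmented matrix. The data and model sizes are placeholders.

```python
# Sketch (synthetic data, scikit-learn stand-ins for the thesis's models):
# 1) fit an autoencoder-style regressor that reconstructs its input,
# 2) append per-transaction reconstruction error as an anomaly-score feature,
# 3) train an MLP classifier on the augmented feature matrix.
import numpy as np
from sklearn.neural_network import MLPRegressor, MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 8))                      # placeholder transaction features
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=0.5, size=2000) > 2.0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

ae = MLPRegressor(hidden_layer_sizes=(4,), max_iter=500, random_state=0)
ae.fit(X_tr, X_tr)                                  # bottleneck reconstruction

def augment(X_part):
    err = ((ae.predict(X_part) - X_part) ** 2).mean(axis=1, keepdims=True)
    return np.hstack([X_part, err])                 # anomaly score as extra column

clf = MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=0)
clf.fit(augment(X_tr), y_tr)
print("accuracy with anomaly-score feature:", clf.score(augment(X_te), y_te))
```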
  • Item
    Training Stability of Multi-modal Unsupervised Image-to-Image Translation for Low Image Resolution Quality
    (Addis Ababa University, 2023-05) Yonas Desta; Bisrat Derebssa (PhD)
    The ultimate objective of unsupervised image-to-image translation is to find the relationship between two distinct visual domains. The major difficulty of this task is that a single input image can map to several alternative outputs. In a multimodal unsupervised image-to-image translation model, there exists a common latent space representation shared across images from the different domains. The model exhibits one-to-many mapping and is able to produce several outputs from a single source image. One of the challenges with the multimodal unsupervised image-to-image translation model is training instability, which occurs when the model is trained on a dataset of low-quality images, such as 128x128. During this instability, the generator loss decreases slowly because the generator struggles to find a new equilibrium. To address this limitation, we propose spectral normalization as a weight normalization method that limits the fitting capacity of the network in order to stabilize the training of the discriminator. The Lipschitz constant is the single hyperparameter that was adjusted. Our experiments used two different datasets. The first dataset contains 5,000 images, and we conducted two separate experiments with 5 and 10 epochs. In 5 epochs, our proposed method reduced generator losses by 5.049% on average and discriminator losses by 2.882% on average. In addition, in 10 epochs, generator losses decreased by 5.032% and discriminator losses by 2.864% on average. The second dataset contains 20,000 images, and we again ran experiments with 5 and 10 epochs. Over 5 epochs, our proposed method reduced generator losses by 4.745% on average and discriminator losses by 2.787% on average. Furthermore, in 10 epochs, the average total training loss was reduced, with generator losses down by 3.092% and discriminator losses by 2.497%. In addition, during translation, our approach produces output images that are more realistic than those of the baseline multimodal unsupervised image-to-image translation model.
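    The sketch below shows where spectral normalization would be applied in PyTorch: each discriminator layer is wrapped with spectral_norm so that the largest singular value of its weight matrix is constrained during training. The discriminator itself is a placeholder, not the translation model's discriminator.

```python
# Sketch (placeholder discriminator, not the MUNIT-style architecture): spectral
# normalization constrains each layer's largest singular value, stabilizing
# discriminator training on low-resolution (e.g. 128x128) inputs.
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

def sn_conv(in_ch, out_ch):
    return spectral_norm(nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1))

discriminator = nn.Sequential(
    sn_conv(3, 64), nn.LeakyReLU(0.2),
    sn_conv(64, 128), nn.LeakyReLU(0.2),
    sn_conv(128, 256), nn.LeakyReLU(0.2),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    spectral_norm(nn.Linear(256, 1)),          # real/fake score
)

scores = discriminator(torch.randn(2, 3, 128, 128))
print(scores.shape)        # torch.Size([2, 1])
```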
  • Item
    Amharic Hateful Memes Detection on Social Media
    (Addis Ababa University, 2024-02) Abebe Goshime; Yalemzewd Negash (PhD)
    A hateful meme is defined as any expression that disparages an individual or a group on the basis of characteristics like race, ethnicity, gender, sexual orientation, nationality, religion, or other characteristics. It has grown to be a significant issue for all social media platforms. Ethiopia’s government has increasingly relied on the temporary closure of social media sites, but such measures cannot be a permanent solution, so an automatic detection system should be designed. These days, there are plenty of ways to communicate and hold conversations in chat spaces and on social media, such as text, image, audio, text with image, and image with audio. Memes are a new and exponentially growing form of content on social media that blends words and images to convey ideas; the message can become unclear to the audience if either component is absent. Previous research on the identification of hate speech in Amharic has focused primarily on textual content. We design a deep learning model that automatically filters hateful memes in order to reduce hateful content on social media. Our model consists of two fundamental components: one for textual features and the other for visual features. For textual features, we extract text from memes using optical character recognition (OCR). The OCR extraction works pixel-wise, and the morphologically complex nature of the Amharic language affects the performance of the system, producing incomplete or misspelled words; this can limit the detection of hateful memes. To work effectively with OCR-extracted text, we employed a word embedding method that captures the syntactic and semantic meaning of a word. An LSTM is used to learn long-distance dependencies between word sequences in short texts. The visual data was encoded using an ImageNet-trained VGG-16 convolutional neural network. In the experiments, the input to the Amharic hateful meme detection classifier combines textual and visual data. The maximum precision was 80.01 percent. When compared to state-of-the-art approaches using memes as a feature with CNN-LSTM, an average F-score improvement of 2.9% was attained.
  • Item
    Impact of Normalization and Informal Opinionated Features on Amharic Sentiment Analysis
    (Addis Ababa University, 2024-01) Abebaw Zewudu; Getachew Alemu (PhD)
    Sentiment analysis is the computational study of people’s ideas, attitudes, and feelings concerning an object, expressed via social media networks. To analyze the sentiment of such textual content, previous studies relied on formal lexicons and emoji with semantic and syntactic information as features. However, informal language is now used to express opinions the majority of the time. It is challenging to create embedding features from unlabeled Amharic text files due to morphological difficulties and the informal, unstructured nature of Amharic informal texts. Although normalization algorithms have been developed to convert informal language into its standard form, their impact on tasks such as sentiment analysis remains unknown. To address the challenge of Amharic sentiment analysis, we apply state-of-the-art solutions, such as normalization and the embedding of opinionated Amharic informal text with lowered word-frequency parameters as automatic features, in CNN-Bi-LSTM approaches. Using a combination of word and character n-gram embeddings, potential information is generated as word vectors from unlabeled Amharic informal text files. In the experiments, the maximum recall was 91.67 percent. When compared to state-of-the-art approaches using a formal lexicon and emoji as features on Bi-LSTM, an average recall improvement of 2.8 was attained. According to the results, labeling with a mix of informal lexicons, formal lexicons, and emoji achieves 1.9 better accuracy than labeling with just formal lexicons and emoji.
  • Item
    Machine Learning Approach for Morphological Analysis of Tigrigna Verbs
    (Addis Ababa University, 2018-10) Gebrearegay Kalayu; Getachew Alemu (PhD)
    Morphology, in linguistics, is the study of the forms of words; it deals with the internal structure of words and word formation. Morphological analysis is a basic task of natural language processing, defined as the process of segmenting words into morphemes and analyzing word formation. It is often an initial step for various types of text analysis in any language. Rule-based and machine learning approaches are the basic mechanisms for morphological analysis. The rule-based method is popular for this analysis but has limitations in terms of the effort and time needed, because the languages have many rules for a single word, especially in the case of verbs. It is also difficult to include all words that need independent rules, which limits the ability of rule-based approaches to accommodate words that are not in the systems' databases and can affect their efficiency. In this work, a system for morphological analysis of Tigrigna verbs is designed and implemented using a machine learning approach. It is intended to automatically segment a given input verb into morphemes and give their categories based on prefix-stem-suffix segmentation. It gives the inflectional categories based on the subject and object markers of verbs, which include gender, number, and person, by detecting the correct boundaries of the morphemes. The negative, causative, and passive prefixes are also considered. The data needed for training and testing was collected from scratch and annotated manually, as the language is under-resourced. After the annotation process, an automatic method was implemented in Java to preprocess the annotated verbs and produce a list of instances for training and testing. The instance-based algorithm was used with the overlap metric, with information gain weighting (IB1-IG) and without weighting (IB1) of the features. Experiments were performed by varying the number of nearest neighbors from one up to seventeen, where the accuracies were almost saturated for both IB1 and IB1-IG. The majority class voting and the inverse distance weighted decision methods were also compared in the experiments. The best performance was obtained with IB1 using both decision methods when the number-of-nearest-neighbors parameter was small. The performance decreased as the number of nearest neighbors increased for both decision methods, but showed higher variation in the case of majority class voting. Similarly, the performance with IB1-IG was also better for smaller numbers of nearest neighbors for both decision methods and decreased when the number of nearest neighbors increased, with a larger decrease in the case of majority voting. IB1 achieved better performance than IB1-IG. The highest accuracies of 91.56% and 89.15% were achieved using IB1 and IB1-IG, respectively, with a nearest-neighbors parameter of 1 for IB1 and 2 for IB1-IG. This encouraging result reveals that the instance-based algorithm is able to automate the morphological analysis of Tigrigna verbs.
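    A small scikit-learn analogue of the IB1 / IB1-IG comparison is sketched below: plain 1-nearest-neighbor classification versus the same classifier on features rescaled by mutual information with the class (an information-gain-style weighting). The data and parameters are placeholders.

```python
# Sketch (synthetic data): compare plain k-NN (IB1-like) with k-NN on features
# weighted by mutual information with the class (IB1-IG-like).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=1000, n_features=10, n_informative=4,
                           random_state=0)

knn = KNeighborsClassifier(n_neighbors=1)                      # IB1-like
plain = cross_val_score(knn, X, y, cv=5).mean()

weights = mutual_info_classif(X, y, random_state=0)            # information-gain proxy
weighted = cross_val_score(knn, X * weights, y, cv=5).mean()   # IB1-IG-like

print(f"plain k-NN: {plain:.3f}   IG-weighted k-NN: {weighted:.3f}")
```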
  • Item
    A Video Coding Scheme Based on Bit Depth Enhancement With CNN
    (Addis Ababa University, 2023-06) Daniel Getachew; Bisrat Derebssa (PhD)
    Raw or uncompressed videos consume a lot of resources in terms of storage and bandwidth. Video compression algorithms are used to reduce the size of a video, and many of them have been proposed over the years. Video coding schemes have also been proposed that work on top of existing video compression algorithms by applying downsampling prior to encoding and restoring the video to its original form after decoding, for further bitrate reduction. Downsampling can be done in spatial resolution or in bit depth. This paper presents a new video coding scheme that is based on bit-depth downsampling before encoding and uses a CNN to restore the video at the decoder. However, unlike previous approaches, the proposed approach exploits the temporal correlation that exists between consecutive frames of a video sequence by dividing the frames into key frames and non-key frames and applying bit-depth downsampling only to the non-key frames. The non-key frames are reconstructed using a CNN that takes the key frames and non-key frames as input at the decoder. Experimental results showed that the proposed bit-depth enhancement CNN model improved the quality of the restored non-key frames by an average of 1.6 dB PSNR over the previous approach before being integrated into the video coding scheme. When integrated into the video coding scheme, the proposed approach achieved better coding gain, with an average of -18.7454% in Bjøntegaard Delta measurements.
  • Item
    Amharic Speech Recognition System Using Joint Transformer and Connectionist Temporal Classification with External Language Model Integration
    (Addis Ababa University, 2023-06) Alemayehu Yilma; Bisrat Derebssa (PhD)
    Sequence-to-sequence (S2S) attention-based models are deep neural network models that have demonstrated remarkable results in automatic speech recognition (ASR) research. Among these models, the cutting-edge Transformer architecture has been extensively employed to solve a variety of S2S transformation problems, such as machine translation and ASR. This architecture does not use sequential computation, which distinguishes it from recurrent neural networks (RNNs) and gives it the benefit of a rapid iteration rate during the training phase. However, according to the literature, the overall training speed (convergence) of the Transformer is relatively slower than that of RNN-based ASR. Thus, to accelerate the convergence of the Transformer model, this research proposes a joint Transformer and connectionist temporal classification (CTC) model for an Amharic speech recognition system. The research also investigates appropriate recognition units: characters, subwords, and syllables for Amharic end-to-end speech recognition systems. In this study, the accuracy of character- and subword-based end-to-end speech recognition systems is compared and contrasted for the target language. For the character-based model with a character-level language model (LM), a best character error rate of 8.84% is reported, and for the subword-based model with a subword-level LM, a best word error rate of 24.61% is reported. Furthermore, the syllable-based end-to-end model achieves a 7.05% phoneme error rate and a 13.3% syllable error rate without integrating any language models (LMs).
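    The joint objective can be written as a weighted sum, L = λ·L_CTC + (1 − λ)·L_attention, of the CTC loss on encoder outputs and the cross-entropy loss on decoder outputs. The PyTorch sketch below uses placeholder tensor shapes and an assumed λ, not the thesis's model.

```python
# Sketch (placeholder shapes, not the thesis's model): joint loss
#   L = lambda * L_ctc + (1 - lambda) * L_attention
# combining CTC on encoder frames with cross-entropy on decoder predictions.
import torch
import torch.nn.functional as F

LAMBDA = 0.3                     # placeholder CTC weight
BATCH, T_ENC, T_DEC, VOCAB = 4, 120, 20, 300

enc_log_probs = F.log_softmax(torch.randn(T_ENC, BATCH, VOCAB), dim=-1)  # (T, N, C)
dec_logits = torch.randn(BATCH, T_DEC, VOCAB)
targets = torch.randint(1, VOCAB, (BATCH, T_DEC))     # label 0 reserved as CTC blank

ctc_loss = F.ctc_loss(enc_log_probs, targets,
                      input_lengths=torch.full((BATCH,), T_ENC),
                      target_lengths=torch.full((BATCH,), T_DEC),
                      blank=0)
att_loss = F.cross_entropy(dec_logits.reshape(-1, VOCAB), targets.reshape(-1))

joint_loss = LAMBDA * ctc_loss + (1 - LAMBDA) * att_loss
print(float(joint_loss))
```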
  • Item
    Explainable Rhythm-Based Heart Disease Detection from ECG Signals
    (Addis Ababa University, 2023-06) Dereje Degeffa; Fitsum Assamnew (PhD)
    Healthcare decision support systems must function with confidence, trust, and a functional understanding. Much research has been done to automate the identification and classification of cardiovascular conditions from electrocardiogram (ECG) signals. One such area of research is the use of deep learning (DL) for the classification of ECG signals. However, DL models do not provide information on why they reached their final decision, which makes it difficult to trust their output in a medical environment. In order to resolve this trust issue, research is being done to explain the decisions a DL model arrives at. Some approaches have improved the interpretability of DL models using the Shapley value (SHAP) technique; however, SHAP explanations are computationally expensive. In this research, we develop a deep learning model that detects five rhythm-based heart diseases and incorporates explainability. We employ the visual explainers Grad-CAM and Grad-CAM++ as the explainability framework; these explainers are relatively lightweight and can be executed quickly on a standard CPU or GPU. Our model was trained using 12-lead ECG signals from the large PTB-XL dataset. We used 3,229 ECG records to train the model, 404 ECG records to validate it, and 403 ECG records to test it. Our model was effective, with a classification accuracy of 0.96 and an F1 score of 0.88. To evaluate the explainability, we gave ten randomly selected outputs to two domain experts. The two experts agreed with at least 80% of the explanations given to them, and in the explanations that were not completely accepted by the experts, many of the 12 leads were still correctly explained. This shows that the use of visual explainability such as Grad-CAM++ could be useful in the diagnosis of heart diseases. The outcome of this evaluation suggests that our model's output is, on average over the ten sample cases, 80% correct and consistent with the evaluation of the two experts.
  • Item
    Comparative Study of Machine Learning Algorithms for Smishing SMS Detection Model from CDR Dataset
    (Addis Ababa University, 2023-05) Samson Akele; Yalemzewd Negash (PhD)
    Phishing is becoming a significant threat to online security; it spreads through a variety of channels, like email, SMS, or even phone calls, to gather crucial profile data about victims. Although numerous anti-phishing measures have been created to halt the spread of phishing, it remains an unresolved issue. Smishing is a phishing attack that uses a mobile device's Short Messaging Service (SMS) to obtain the victim's credentials. Employing an automated detection system helps improve identification and stop smishing before it affects targeted companies and third parties. A smishing SMS detection framework based on Call Detail Record (CDR) data is important for early monitoring by experts and service providers in screening this kind of phishing attack; it provides more accuracy, automates detection, and keeps individuals safe. Many mobile phone users are victimized every year after mistakenly interpreting the lures, so an accurate smishing detection system is helpful for organizations and related third parties that are highly affected by smishing. This thesis compares machine learning algorithms for a smishing SMS detection model. Six supervised machine learning classifiers, K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Decision Tree (DT), Naive Bayes (NB), Random Forest (RF), and Logistic Regression (LR), which are widely recommended by scholars, are compared on their performance in detecting smishing SMS, and the results obtained show that these algorithms are efficient at detecting smishing. Ten-fold cross-validation based on correlation algorithms is used for classification and implementation. The research collected CDR data from which 33 distinct features were extracted initially; relevant features were selected, unnecessary and irrelevant information was eliminated, and different preprocessing methods, such as feature selection and reshaping of the data, were performed for the purpose of this study. As a result, the RF algorithm with cross-validation (CV), which scored 90.1% accuracy, is determined to be the best classifier, followed by KNN and DT, which scored 89.6% and 88.8%, respectively. Using cross-validation, the SVM algorithm performs inaccurately and exceeds the desired detection delay by more than an hour during training time. This outcome reflects the RF algorithm's superior capacity to accurately handle vast amounts of data, form decision trees at random, and prevent overfitting by employing random subsets of features to create smaller trees.
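    The comparison protocol can be sketched with scikit-learn as below: each of the six classifiers is evaluated with 10-fold cross-validation on the same feature matrix. The synthetic data stands in for the CDR-derived features, and the default hyperparameters are assumptions.

```python
# Sketch (synthetic data standing in for CDR features): 10-fold cross-validated
# accuracy for the six classifiers compared in the thesis.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=2000, n_features=33, n_informative=10,
                           weights=[0.9, 0.1], random_state=0)

models = {
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(),
    "DT":  DecisionTreeClassifier(random_state=0),
    "NB":  GaussianNB(),
    "RF":  RandomForestClassifier(random_state=0),
    "LR":  LogisticRegression(max_iter=1000),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=10, scoring="accuracy")
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```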
  • Item
    Performance Analysis of POCO Framework Under Failure Scenario in SDN-Enabled Controller Placement
    (Addis Ababa University, 2023-02) Sefinew Getnet; Yalemzewd Negash (PhD)
    The explosive expansion of internet service providers and of present-day data traffic has been one of the main causes of the decline of deeply rooted traditional networks. One of the most recent advancements in networking is the introduction of software-defined networking (SDN), which emerged to make networks rebuildable and modifiable. The core idea of SDN is the separation of the control and data planes. When controlled by an SDN controller, SDN offers advantages in terms of flexibility, manageability, and efficiency in contrast to traditional networks. This thesis offers a performance analysis of the POCO framework under failure scenarios to address challenges such as rigidity, uncontrollability, and inadequately used network resources in the adoption of SDN networks. The network dataset of AAU was encoded and examined, and the selected platform for determining the optimal number of controllers and their ideal locations in the Metropolitan Area Network (MAN) was applied with the goal of minimizing latency and controller cost. The optimization is done in two scenarios, (I) controller placement in a failure-free setting and (II) controller placement when either nodes or controllers fail, using the Pareto Optimal Controller Placement (POCO) framework. First, the controller placements in the aforementioned scenarios for different numbers of K controllers were found and investigated. Following this, results for metrics such as controller-to-controller (C2C) and node-to-controller (N2C) latencies, fault tolerance (for either node or controller failures), and controller imbalance were obtained for both scenarios. From the results, it is recommended that the minimum number of controllers required to adopt SDN in the AAU MAN is two, considering the resiliency of controllers placed at Sidist Kilo and Arat Kilo. We also conclude that the POCO performance analysis under the failure-free scenario showed lower N2C latency compared to the worst-case scenario (node failure and up to K-1 controller failures), but the node-failure case showed lower C2C latency compared to the failure-free scenario.
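    The sketch below illustrates, on a toy topology with networkx, the kind of metric such a placement optimizes: for a candidate controller pair, the worst-case node-to-controller latency is the largest shortest-path distance from any node to its nearest controller. The topology, weights, and selection rule are placeholders, not the AAU MAN dataset or the POCO algorithm itself.

```python
# Sketch (toy topology, not the AAU MAN): evaluate a candidate controller pair
# by worst-case node-to-controller latency and controller-to-controller latency.
import itertools
import networkx as nx

G = nx.Graph()
G.add_weighted_edges_from([          # placeholder links with latency weights
    ("SidistKilo", "AratKilo", 1), ("AratKilo", "Lideta", 2),
    ("Lideta", "Tulu", 3), ("SidistKilo", "Tulu", 4), ("AratKilo", "Tulu", 2)])

dist = dict(nx.all_pairs_dijkstra_path_length(G, weight="weight"))

def evaluate(controllers):
    n2c = max(min(dist[v][c] for c in controllers) for v in G.nodes)
    c2c = max(dist[a][b] for a, b in itertools.combinations(controllers, 2))
    return n2c, c2c

best = min(itertools.combinations(G.nodes, 2), key=lambda cs: evaluate(cs))
print("best pair:", best, "-> (max N2C, max C2C) =", evaluate(best))
```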
  • Item
    Radiation Tolerant Power Converter Design for Space Applications
    (Addis Ababa University, 2022-07) Solomon Mamo; Leroux Paul (Prof.); Getachew Bekele (PhD); Valentijn De Smedt (Prof.)
    Radiation and extreme temperature are the main inhibitors to the use of electronic devices in space applications. Radiation challenges the normal and stable operation of power converters used as power supplies for onboard systems in satellites and spacecraft. Under these circumstances, special design approaches known as radiation hardening or radiation-tolerant design are employed. FPGAs are beneficial for developing low-cost, high-speed embedded digital controllers for power converters, but their components are highly susceptible to radiation-induced faults. In safety- and mission-critical systems, such as space systems, radiation-induced faults are a major concern. The majority of commercial off-the-shelf (COTS) FPGAs are not developed to function in high-radiation environments, with the exception of a handful of circuits that are radiation-hardened at the manufacturing process level at a very high cost overhead, making them less appealing from a performance and economic standpoint. Design-based techniques are another option for reaching the necessary level of reliability in a system design. This work investigates and designs a novel FPGA-based radiation-tolerant digital controller for DC-DC converters, with applications in space. The controller's radiation-induced failure modes were analyzed in order to develop a mitigation strategy, which included identifying the error modes and determining how existing mitigation approaches could be improved. A model-based design approach is presented for FPGA implementation and optimization of the radiation-tolerant digital controller. To validate the recommended solution strategies, fault injection campaigns are employed.