MLKD

Finished dissertations

Biologically inspired CNNs for Medical Imaging tasks Supervised by Arlindo L. Oliveira and Tiago Marques and authored by Daniela Carvalho Medical image data poses several challenges for computer vision algorithms: it spans multiple imaging modalities and biological tissues, it contains several sources of noise and variation, and labeled datasets are scarce. Some recent advances in computer vision models, such as vision transformers and self-supervised learning, have shown promising results in dealing with some of these challenges. However, it has not been tested whether the use of biologically inspired computations, another recent advance in computer vision with considerable improvements in robustness, also translates to gains in medical imaging tasks. The goal of this project is to adapt the VOneNet family, a hybrid CNN with a front-end inspired and constrained by the primate primary visual cortex (V1), to multiple computer vision neural network architectures used for medical imaging tasks and to test their performance in a wide range of related benchmarks.

Using large language models to interact with personal information systems Supervised by Arlindo L. Oliveira and authored by João Amoroso Large language models, such as ChatGPT and GPT-4, have shown remarkable abilities to interact in natural language. However, they cannot be used to access and learn from personal data stored in email records, note-taking systems, or photos and videos. The objective of this dissertation is to design a system that uses a large language model as the interface for personal data, using APIs and enabling the user to query, relate and retrieve information stored in different sub-systems, such as mailboxes, Google records and note-taking platforms such as Obsidian. The resulting system should be able to emulate the behavior of an intelligent assistant that has access to all stored personal data and, ultimately, to answer questions about that data in a way similar to the user who owns the data. Requisites: The student should have significant programming experience, and practical knowledge of machine learning languages and environments, such as PyTorch or TensorFlow. He/she should also have an interest in developing an understanding of large language models and LLM APIs. Notes: The selected student will have access to the facilities of INESC-ID and the MLKD group (https://mlkd.idss.inesc-id.pt/), including computing facilities that include four DELL PowerEdge C41402 servers, eight NVIDIA 32GB Tesla V100S and eight NVIDIA 64GB Tesla A100, among other computing servers (https://mlkd.idss.inesc-id.pt/cluster).

Causal models for determining the impact of commercial actions Supervised by Arlindo L. Oliveira and Filipa Marques and authored by Miguel Vicente In this thesis, causal models were developed to quantify the commercial impact of the different actions that result from the lead generation process driven by analytical models. The objective is to quantify the impact of data analysis models on the business, determining which specific actions had an impact on the final result, using causal models for that purpose. Real customer and sales history data from Fidelidade will be used, and the consequences of the different lead generation actions derived from analytics will be analyzed. Metrics will also be developed to evaluate the impact of scoring models on the final business result. Requisites: the candidate should have knowledge of and interest in data analysis, machine learning, and causal mechanisms. Having attended courses in these areas is recommended. Notes: This thesis will be developed in partnership with Fidelidade and co-supervised by Dr. Filipa Marques.

Using biological features to improve deep neural network models for vision Supervised by Arlindo L. Oliveira and Tiago Marques and authored by Lucas Alergy Convolutional neural networks and vision transformers represent the state of the art in artificial neural network (ANN) models for vision problems, such as classification, segmentation, and object detection. However, the performance of these models still falls behind human performance in many problems and is highly susceptible to image variation, lighting conditions, and deliberate attacks. Recent results have shown that it is possible to draw inspiration from the architecture and function of the visual cortex to improve the performance of ANNs and to make these systems more robust to a wide range of image perturbations. The objective of this dissertation is to study how structural and functional characteristics of the primate visual pathways can be used to derive new layers and optimization goals in deep neural networks that contribute to improving their robustness and performance in image classification tasks. Efficient coding algorithms, used in the retina and the primary visual cortex, and different connection patterns between layers are some of the approaches that will be tested. The novel models will be assessed both in terms of their performance in existing computer vision benchmarks and on how well their internal components and behavioral output match those of real primate brains using the Brain-Score platform. The work will be co-supervised by Tiago Marques, currently at the Champalimaud Foundation. The selected student will have access to the facilities of INESC-ID and the MLKD group (https://mlkd.idss.inesc-id.pt/), including computing facilities that include four DELL PowerEdge C41402 servers, eight NVIDIA 32GB Tesla V100S and eight NVIDIA 64GB Tesla A100, among other computing servers (https://mlkd.idss.inesc-id.pt/cluster)

Representation learning of animal behavior Supervised by Arlindo L. Oliveira and Adrien Jouary and authored by Gonçalo Goulart Oliveira Over the past decade, several methods have been developed that allow high-throughput automated quantification of animal behavior. Advances in computer vision make it possible to automatically track multiple body points, and continuous movements can be decomposed into a sequence of meaningful elementary units. In this project, we aim to build a latent variable model of a large dataset of zebrafish larva behavior. The behavior of each larva consists of a sequence of stereotypical tail movements. The model will be trained to predict future actions. Once the model is trained, we will explore transfer learning by using the representation from the model to detect the effect of drug treatment. For this, we will use a dataset of larva behavior in response to 10 pharmacological compounds at different concentrations. Our goal is to learn the internal state of the animal using this approach, which could be useful for studying the brain and improving the detection of drug-induced behavioral changes. Our approach holds promise for neuroscience and preclinical research, as careful measurements of animal behavior have proven to be an important complement to modern techniques for recording and manipulating neural circuits. Marques, J.C., Lackner, S., Félix, R. and Orger, M.B., 2018. Structure of the zebrafish locomotor repertoire revealed with unsupervised behavioral clustering. Current Biology, 28(2), pp.181-195. // Wiltschko, A.B., Tsukahara, T., Zeine, A., Anyoha, R., Gillis, W.F., Markowitz, J.E., Peterson, R.E., Katon, J., Johnson, M.J. and Datta, S.R., 2020. Revealing the structure of pharmacobehavioral space through motion sequencing. Nature neuroscience, 23(11), pp.1433-1443. // Oord, A.V.D., Li, Y. and Vinyals, O., 2018. Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748.

Using Large Language Models to solve the Abstraction and Reasoning Challenge Supervised by Arlindo L. Oliveira and authored by Guilherme Costa Current deep learning models, while adept at specific tasks, often struggle with human-like adaptability to new and varied challenges. This research delves into the creation of artificial intelligence systems that can mimic the generalization capabilities of human intelligence, particularly through the use of the Abstraction and Reasoning Corpus (ARC). ARC is a compilation of reasoning tasks that are deeply rooted in Knowledge Priors, which are essential human skills for effective problem-solving, such as counting. The proposed solution involves integrating a Large Language Model (LLM) with several DreamCoders, forming a Mixture of Experts (MoE) framework. In this framework, the LLM acts as a classifier, pinpointing the specific skills required for each ARC task. Following this identification, the problem is delegated to a specialized DreamCoder, each trained solely to tackle tasks within the identified skill set.

Using sequences of coronary angiograms to quantify the severity of stenosis Supervised by Arlindo L. Oliveira and authored by Mariana Serrão Automatic processing of images from coronary X-ray angiographies using deep learning techniques has been explored, but the ability to accurately estimate physiological indexes such as the instantaneous wave-free ratio (iFR) and/or the Fractional Flow Reserve (FFR) has not yet been demonstrated. The objective of this dissertation is to develop a methodology that can estimate the value of the iFR from sequences of angiographies with sufficient precision to avoid the need for invasive measurement methods, such as the commonly used insertion of a guidewire with a pressure sensor through a coronary catheter. The approach will apply deep learning techniques to frame sequences, using additional information from the sequence that cannot be obtained from single-frame analysis.

Using graph embeddings to explore deep neural network architectures Supervised by Arlindo L. Oliveira and authored by José Carreira Convolutional neural networks and vision transformers represent the state of the art in artificial neural network (ANN) models for vision problems, such as classification, segmentation, and object detection. Many different architectures exist, exhibiting significant variations in performance, complexity and training cost. Using the appropriate transformations, it is possible to generate graph (or hypergraph) representations of deep neural network architectures, and these representations can be embedded into appropriate spaces that may be more amenable to performance quantification. This dissertation will explore the idea that graph embeddings of deep neural network architectures (and, possibly, weights) can be used to explore the architecture space in more effective ways than is possible today. Requisites: The student should have significant programming experience, and practical knowledge of machine learning languages and environments, such as PyTorch or TensorFlow. Notes: The work will be developed in cooperation with research groups from the University of Tokyo and the Hong Kong Polytechnic, which have significant expertise in the graph embedding techniques that will be used in this work. The selected student will have access to the facilities of INESC-ID and the MLKD group (https://mlkd.idss.inesc-id.pt/), including computing facilities that include four DELL PowerEdge C41402 servers, eight NVIDIA 32GB Tesla V100S and eight NVIDIA 64GB Tesla A100, among other computing servers.

Stenosis detection in coronary X-ray angiographies Supervised by Arlindo L. Oliveira and Miguel Menezes and authored by Tomás Nunes Automatic processing of images from coronary X-ray angiographies using deep learning techniques has been explored, and the results show that it is possible to perform high-quality segmentation of relevant coronary arteries. Building on top of existing segmentation methods, based on deep convolutional neural networks, this dissertation will focus on the estimation of the value of the instantaneous wave-free ratio (iFR) and/or the Fractional Flow Reserve (FFR) index from segmented images. The objective is to develop a methodology that can estimate the value of the iFR using non-invasive procedures and that has sufficient sensitivity to avoid the need for invasive measurement methods, such as the insertion of a guidewire with a pressure sensor through a coronary catheter. Estimating the iFR and the FFR indexes is a difficult task, since imaging data, even after segmentation, will in many cases provide insufficient information. Exploration of the possible tradeoffs between positive predictive value and recall will play an essential role in the identification of the best approach. Co-supervisors: Miguel Nobre Menezes (20%), João Lourenço Silva (40%) Requisites: The student should have significant programming experience, and practical knowledge of machine learning languages and environments, such as PyTorch or TensorFlow. He/she should also have an interest in developing an understanding of medical image processing and cardiology. Notes: This work will be developed in cooperation with the Department of Cardiology of the School of Medicine of the University of Lisbon. The selected student will have access to the facilities of INESC-ID and the MLKD group (https://mlkd.idss.inesc-id.pt/), including computing facilities that include four DELL PowerEdge C41402 servers, eight NVIDIA 32GB Tesla V100S and eight NVIDIA 64GB Tesla A100, among other computing servers.
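
The tradeoff between positive predictive value (PPV) and recall mentioned in the entry above can be illustrated with a minimal sketch. It assumes a hypothetical classifier that outputs one score per case, where higher means more likely to be a significant stenosis; the function name, scores, and labels are invented for illustration and are not taken from the dissertation.

```python
from typing import List, Optional, Tuple


def ppv_and_recall(
    scores: List[float], labels: List[int], threshold: float
) -> Tuple[Optional[float], Optional[float]]:
    """Return (PPV, recall) when cases with score >= threshold are called positive.

    A value is None when its denominator is zero (no positive predictions,
    or no positive cases at all).
    """
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    ppv = tp / (tp + fp) if (tp + fp) > 0 else None
    recall = tp / (tp + fn) if (tp + fn) > 0 else None
    return ppv, recall


# Hypothetical scores and ground-truth labels (1 = significant stenosis).
example_scores = [0.9, 0.8, 0.4, 0.2]
example_labels = [1, 0, 1, 0]

for t in (0.3, 0.5, 0.85):
    print(t, ppv_and_recall(example_scores, example_labels, t))
```

Raising the threshold tends to increase PPV at the cost of recall, so choosing an operating point depends on how costly a missed lesion is compared with an unnecessary invasive measurement.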

Operation log monitoring using machine learning Supervised by Arlindo L. Oliveira and Fernando Silva and authored by José Velez Traditional monitoring techniques may no longer be able to handle the complexity of modern applications, infrastructures and environments. These techniques do not make the best use of the massive amounts of data being generated, so many alarms are created that are not necessarily indicative of a new incident. The main objective of this thesis is to improve monitoring and alarm generation by applying different Machine Learning algorithms and techniques to the rich and vast amount of available data, to accurately detect complex problems even if they are outside the boundaries of the monitored software, which is common in modern architectures such as microservices. The proposed work is framed within a critical IT application inside an international organization, in order to provide business and research value by solving a modern real-world problem. The case study in question consists of developing a monitoring solution using state-of-the-art production Machine Learning (ML) algorithms, based on modern Artificial Intelligence for IT Operations (AIOps) platforms, to detect anomalies and generate reliable alarms for complex faults in HERMES, a critical application of EDP.

Deep learning when data is scarce Supervised by Arlindo Manuel Limede de Oliveira and authored by Ana Pimenta Alves Current deep learning models require enormous amounts of data to be trained. Recent studies by DeepMind show that even models like GPT-3, which is trained with 300 billion tokens, may still be “significantly undertrained”. Simply gathering more data to keep increasing the models’ performance is not biologically reasonable (as humans don’t need such quantities of data to learn), is not possible for some tasks (where obtaining more data is very expensive), and widens the gap between the researchers with the most resources and the rest of the community. There are several approaches that try to avoid this data requirement: few-shot learning, self-supervision, using pre-trained models, and loss smoothing. The objective of this dissertation is to compare these approaches and analyze, in particular, their relative performance per dataset size. The selected student will have access to the facilities of INESC-ID and the MLKD group (https://mlkd.idss.inesc-id.pt/), including computing facilities that include two DELL PowerEdge C41402 servers and eight NVIDIA 32GB Tesla V100S, among other machines. The work will be developed using the PyTorch or TensorFlow programming platforms for machine learning and the Observable platform for data processing. The selected student will work within the scope of the Magellan project, and have access to the sources of data and financial resources made available by the project.

Data fusion and object recognition from sensor data Supervised by Arlindo Manuel Limede de Oliveira and authored by Francisco Honório With the increased opportunities for digitalization, cities will need to ensure that the conditions of public spaces are adequate for their functions. The use of sensor data (sound and images) to provide information about the quality of public spaces, ensuring safety and accessibility to all users, represents an important trend for smart cities. Smart cities will use models of spaces, trained using data obtained from sensors, to provide information about the characteristics of those spaces. The objective of this dissertation is to develop algorithms to collect, identify, and integrate sensor data and to create and train machine learning models that can process audio and image data to provide relevant information about public spaces. Models such as Mask R-CNN, Faster R-CNN, and YOLO will be assessed and trained using existing image databases, to perform real-time object detection. Once the models are trained, data from pilot sites will be used to test their performance. This project will be developed within the scope of project Magellan, developed in cooperation with Schréder and other research institutions. The selected student will have access to the facilities of INESC-ID and the MLKD group (https://mlkd.idss.inesc-id.pt/), including computing facilities that include two DELL PowerEdge C41402 servers and eight NVIDIA 32GB Tesla V100S, among other machines. The work will be developed using the PyTorch or TensorFlow programming platforms for machine learning and the Observable platform for data processing. The selected student will work within the scope of the Magellan project, and have access to the sources of data and financial resources made available by the project.

Using contrastive learning to learn representations from texts and images Supervised by Arlindo Manuel Limede de Oliveira and authored by Pedro Henriques Convolutional neural networks and transformer-based architectures have shown the ability to perform complex classification and inference tasks, for images and texts. Still, most of the existing systems rely on the use of massive datasets of annotated data, such as ImageNet. This restricts the applicability of the technology to areas where such massive datasets exist, or imposes large labeling costs. Recently, an approach to joint learning from texts and images, based on contrastive language-image pre-training (CLIP), has demonstrated the ability to learn from unlabeled data, and to generate systems that are competitive with those trained on labeled data. The objective of this dissertation is to apply a CLIP-like approach to data available in Portuguese media, and to assess the quality of the derived system in a set of tasks, such as image classification, image captioning, and multimodal question answering. The selected student will have access to the facilities of INESC-ID and the MLKD group (https://mlkd.idss.inesc-id.pt/), including computing facilities that include two DELL PowerEdge C41402 servers and eight NVIDIA 32GB Tesla V100S, among other machines. The work will be developed using the PyTorch or TensorFlow programming platforms.
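
As a rough illustration of what a CLIP-like objective computes, the sketch below implements a symmetric contrastive loss over a small batch of paired image and text embeddings in plain Python. It is a simplified sketch, not the dissertation's implementation: the embeddings are hypothetical hand-written vectors, and the temperature is fixed here as an assumption (in CLIP it is a learned parameter).

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cross_entropy(logits: Vector, target: int) -> float:
    """Numerically stable cross-entropy of one row of logits against a target index."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def clip_style_loss(
    image_embs: List[Vector], text_embs: List[Vector], temperature: float = 0.07
) -> float:
    """Symmetric contrastive loss: pair k of images and texts is the positive pair."""
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)
    # Cosine similarity matrix scaled by the (assumed fixed) temperature.
    logits = [
        [sum(a * b for a, b in zip(i, t)) / temperature for t in txts] for i in imgs
    ]
    image_to_text = sum(cross_entropy(logits[k], k) for k in range(n)) / n
    text_to_image = (
        sum(cross_entropy([logits[r][k] for r in range(n)], k) for k in range(n)) / n
    )
    return (image_to_text + text_to_image) / 2.0


# Hypothetical 2-D embeddings: aligned pairs give a low loss, swapped pairs a high one.
aligned = clip_style_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = clip_style_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)
```

In a real system the embeddings would come from trainable image and text encoders, and the loss would be minimized with a framework such as PyTorch or TensorFlow, so that matching image-text pairs end up close together without requiring class labels.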

Finding interesting regions in whole slide images using deep learning Supervised by: Arlindo Manuel Limede de Oliveira and Jonas Almeida. Developed in cooperation with the Data Science & Engineering Research Group of the National Cancer Institute. Authored by: Martim Afonso Digital pathology, the analysis of visual information generated from digitized specimen slides, is a rapidly growing area. Whole-Slide Imaging (WSI) enables medical samples to be processed and used in diagnostic medicine, leading to more efficient and scalable processes made possible by deep learning techniques. Whole-slide images, which have very high resolution, need to be handled using special techniques that enable specialists to rapidly focus on the Regions of Interest (RoI) and identify relevant features for the task at hand. The efficient manipulation of WSIs in a clinical setting requires efficient system-level protocols and effective feature detection algorithms that, combined, save time and increase the productivity of physicians by automating the triage of WSIs for RoIs. The objective of this dissertation is to develop effective deep learning techniques for the identification of the regions of interest in whole-slide images, and to integrate the resulting systems with client-based viewing software developed by NIH researchers. The system integration may or may not be included in the dissertation work, depending on the results obtained during the RoI identification phase. The selected student will have access to the facilities of INESC-ID and the MLKD group (https://mlkd.idss.inesc-id.pt/), including computing facilities that include two DELL PowerEdge C41402 servers and eight NVIDIA 32GB Tesla V100S, among other machines. The work will be developed using the PyTorch or TensorFlow programming platforms for machine learning and the Observable platform for data processing.

Deep learning on chaos game representation of genetic sequences Supervised by: Arlindo Manuel Limede de Oliveira and Susana Vinga. Developed in cooperation with the Data Science & Engineering Research Group of the National Cancer Institute, headed by Jonas Almeida. Authored by: Vincente Silvestre Deep learning (DL) has been applied with success to areas as diverse as computer vision, natural language processing and protein folding. The ability of deep learning architectures to derive the appropriate features for classification and inference has enabled these systems to reach unparalleled performance. However, the successful application of deep learning depends on the existence of an appropriate canonical representation with built-in structure, in one, two or more dimensions. The objective of this dissertation is to study the application of deep learning techniques to genomic data, using chaos game representation (CGR), an iterated function that generates bijective maps between symbolic sequences and Cartesian spaces. The dissertation will study the application of standard deep learning architectures, such as ResNet or EfficientNet, to the inference of genotype-phenotype correlation from the chaos game representation of genetic sequences and mutation signatures. Genetic sequence data for specific conditions and related information will be selected from the Cancer Genome Atlas (TCGA), Polygenic Score Catalog (PGS) databases and Genome-Wide Studies at NCI, and used to test the DL+CGR approach. The selected student will have access to the facilities of INESC-ID and the MLKD group (https://mlkd.idss.inesc-id.pt/), including computing facilities that include two DELL PowerEdge C41402 servers and eight NVIDIA 32GB Tesla V100S, among other machines. The work will be developed using the PyTorch or TensorFlow programming platforms.
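
To make the chaos game representation concrete, here is a minimal sketch of the iterated map for DNA sequences, plus a coarse count grid of the kind that could be fed to an image model. The corner assignment for A, C, G and T, the starting point at the center of the unit square, and the function names are illustrative assumptions; the dissertation may use different conventions.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]

# Assumed corner layout of the unit square; other conventions exist.
CORNERS: Dict[str, Point] = {
    "A": (0.0, 0.0),
    "C": (0.0, 1.0),
    "G": (1.0, 1.0),
    "T": (1.0, 0.0),
}


def cgr_points(sequence: str) -> List[Point]:
    """Map a DNA sequence to CGR points by repeatedly moving halfway to a corner."""
    x, y = 0.5, 0.5  # start at the center of the unit square
    points: List[Point] = []
    for base in sequence.upper():
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points


def cgr_grid(sequence: str, k: int = 3) -> List[List[int]]:
    """Count CGR points in a 2^k x 2^k grid, giving an image-like matrix."""
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    for x, y in cgr_points(sequence):
        col = min(int(x * size), size - 1)
        row = min(int(y * size), size - 1)
        grid[row][col] += 1
    return grid


print(cgr_points("AT"))
print(cgr_grid("ACGTACGT", k=2))
```

Because each step halves the distance to a corner, the cell a point lands in at resolution k is determined by the last k symbols read, which is what gives the resulting grid a structure that convolutional architectures can exploit.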

Efficient Algorithms for Medical Image Segmentation Supervised by Arlindo L. Oliveira and authored by José Martinho With the growth in cancer cases and the increasing expenditures in the healthcare system, it is necessary to automate processes, aiming for faster diagnosis and lower expenses. Although current technologies make it possible to capture high-resolution 3D images of organs, manual segmentation of organs and tumours is still a complex process that requires high expertise. State-of-the-art algorithms are already very accurate. However, they are very compute-intensive, leading to the need for expensive hardware and to wasted energy. Coupling state-of-the-art efficient feature extraction algorithms to the nnUNet segmentation framework, this work proposes novel efficient architectures for medical image segmentation. For some tasks, similar results were achieved using around 30% fewer Floating Point Operations (FLOPs) than the baseline nnUNet, also decreasing the inference time. Moreover, better performance than nnUNet was achieved using architectures with slightly longer inference time.

Using self-supervised contrastive learning to improve medical image analysis Supervised by Arlindo Manuel Limede de Oliveira and authored by Miguel Rasquinho Ferreira Deep learning architectures, which include convolutional neural networks and vision transformers, have made it possible to achieve human-like performance in several medical image analysis tasks. However, some fields, including medical image analysis, are limited by the lack of labeled data. Furthermore, the use of extensive amounts of labeled data to train deep neural network architectures leads to behaviors and peculiar characteristics of the classifiers that have no parallel in human vision. Self-supervised contrastive learning is a technique that can be effectively used to train systems when labeled data does not exist or is sparse. Furthermore, systems trained using self-supervised contrastive learning have the potential to exhibit behaviors that are more similar to the behavior of the primate visual system. The objective of this dissertation is to apply self-supervised contrastive learning techniques to problems in medical imaging, namely the detection of stroke in computed-tomography images of human brains. The results will be assessed both in terms of the performance attained and the similarity of the features derived to features that are present in the visual system of primate brains. The selected student will have access to the facilities of INESC-ID and the MLKD group (https://mlkd.idss.inesc-id.pt/), including computing facilities that include two DELL PowerEdge C41402 servers and eight NVIDIA 32GB Tesla V100S, among other machines. The work will be developed using the PyTorch or TensorFlow programming platforms.

Analysis of visual sensor data for monitoring of open spaces A Modular Architecture for Model-Based Deep Reinforcement Learning Supervised by Arlindo L. Oliveira and authored by João Novo The objective of this dissertation is to process data received from distributed sensor arrays in order to infer the level and characteristics of space usage in urban environments. The final objective is to derive detailed person and vehicle data from data obtained by light and sound sensors. The student will be integrated in a team developing a large scale project managed by Schréder Hyperion. Project Magellan – Localizable, interoperable, cyber-safe, resilient, distributed autonomous and connected urban infrastructure - has as its objective the development of a new paradigm of urban infrastructure, resilient, robust, interconnected, open and interoperable that will support future smart cities. Knowledge of machine learning, analytics and programming. Interest in data processing and interest in learning data analysis methods. The selected student will have access to the facilities of INESC-ID and the MLKD group (https://mlkd.idss.inesc-id.pt/), including computing facilities that include two DELL PowerEdge C41402 servers and eight NVIDIA 32GB Tesla V100S, among other machines.

Towards Improving Ischemic Stroke Functional Outcome Prediction with Computed Tomography Brain Scans Using Deep Learning Supervised by Arlindo L. Oliveira and Catarina Fonseca and authored by Gonçalo Oliveira Among non-transmissible diseases, stroke is the second leading cause of death and disability in the world. Quick diagnosis and prognosis are of paramount importance given the rapid degradation of the affected brain and the short time frame available for the recommended treatments. The collection of computed tomography brain scans is part of standard patient care. However, their examination is done manually by experts. Also, although the scans contain strong predictors of patient functional outcome, these are rarely considered by the clinical models currently in use, which mostly rely only on demographic and clinical patient variables. This work explores three different approaches to improve on these models, obtaining results comparable to the state of the art in their respective categories. In the tabular approach, machine learning classifiers use the same type of variables used by the clinical models to predict the functional outcome. In the imaging approach, the outcome is predicted directly and solely from the patient's brain scans, using deep artificial neural networks. Here, several architectures never before tried in this task are explored, including multiple instance learning models and Siamese networks that leverage a useful brain hemisphere symmetry bias. Finally, in the hybrid approach, both important clinical features and imaging information are leveraged and combined in a simpler and more interpretable manner than that of existing models.

Transaction-Based Entity Monitoring in a Client Due Diligence Context Supervised by Arlindo L. Oliveira and Jacopo Bono and authored by Oleksandr Stopchak Current solutions to Anti-money Laundering encompass three different components, namely, transaction monitoring, screening and Customer Due Diligence. These have been mainly based on rule systems and human analysts, which can lead to many false positive alerts and a large load on human resources. In this work, we explore a novel approach to aid CDD. To do this, we propose the usage of machine learning methods to calculate an entity’s risk based on its transactional behavior by leveraging historical transactions to generate a Risk Score. First we summarize the transaction behavior into an embedding using feature engineering. Then we calculate a risk score that quantifies the dissimilarity of an entity’s behavior to what is expected using Anomaly Detection techniques. Finally, with the use of explainability techniques we clarify the assigned risk score by showing the specifics of an entity’s behavior that contributed to the final assessment of our approach. With our proposed method, we can reduce the burden on human analysts by 1) using machine-learning based techniques that can identify incorrectly classified, and therefore, potentially illicit entities by comparing their transactional behavior to other entities with the same label; and 2) generating a report with information that can provide a reasonable explanation for an assigned RS in the form of visualizations.

Pretraining the Vision Transformer using self-supervised methods for vision-based deep reinforcement learning Supervised by Arlindo L. Oliveira and authored by Manuel Goulão The Vision Transformer architecture has been shown to be competitive in the computer vision (CV) space, where it has dethroned convolution-based networks in several benchmarks. Nevertheless, Convolutional Neural Networks (CNN) remain the preferred architecture for the representation module in Reinforcement Learning. In this work, we study pretraining a Vision Transformer using several state-of-the-art self-supervised methods and assess data-efficiency gains from this training framework. We propose a new self-supervised learning method called TOV-VICReg that extends VICReg to better capture temporal relations between observations by adding a temporal order verification task. Furthermore, we evaluate the resultant encoders with Atari games in a sample-efficiency regime, Procgen games for measuring generalization, and an imitation learning task for a fast and reliable comparison of the representations. Our data-efficiency results show that the vision transformer, when pretrained with TOV-VICReg, outperforms the other self-supervised methods and the non-pretrained vision transformer but still struggles to overcome a CNN. Our generalization results show some limitations in our method when used in more visually complex games, which leads to degradation of the generalization performance. Nevertheless, we were able to outperform a CNN in two of the ten Atari games where we perform a 100k-step evaluation and show a consistent data-efficiency gain in comparison to the non-pretrained vision transformer. Ultimately, we believe that such approaches in Deep Reinforcement Learning (DRL) might be the key to achieving new levels of performance as seen in natural language processing and computer vision.

Neural Models for Generating Clinically Accurate Chest X-Ray Reports Supervised by Arlindo L. Oliveira and Bruno Martins and authored by André Leite Image captioning models have been steadily improving, showing that artificial intelligence can achieve successful results in computer vision tasks. However, there are still some tasks within the range of image captioning that need more focus, including automatic clinical report generation. The automatic generation of radiology reports based on radiology images has received increasing attention in the last few years, motivated by the repetitive and exhausting work that these clinical reports demand. The artificial neural networks that address this task have changed over the years, starting as convolutional neural networks and moving to transformer-based models. However, existing methodologies focus more on one of two important aspects, namely the fluency and human readability of the generated text, than on the clinical efficiency of the model. Consequently, in this dissertation we propose a model capable of achieving competitive results regarding the human readability of the reports, as well as improving clinical efficiency. We propose to adapt the MedCLIP model to have an image-text encoder capable of concatenating both image and text. We further propose that this model works with the assistance of an Information Retrieval mechanism, which retrieves the reports closest to an input X-ray according to a similarity evaluation. On the MIMIC-CXR dataset, our model improves on both natural language processing metrics and clinical efficiency over well-established models. Finally, we further show that our model can lead to more human-readable reports, while keeping clinical accuracy, than most state-of-the-art models.

Old photo and image restoration using deep learning techniques Supervised by Arlindo L. Oliveira and authored by José Pereira There are multiple factors that can contribute to the degradation of an image. The process of recovering such images to their initial state is called Image Restoration. Many deep learning techniques have been proposed that claim to solve this problem. In this work, I select a few deep learning models, both single-degradation (focused on only one type of degradation, such as super-resolution methods) and mixed-degradation (tackling all the defects at the same time), that achieve state-of-the-art performance on different restoration tasks (Deblurring, Denoising, Super-Resolution, etc.), test them on a synthetically degraded dataset, and evaluate them according to two objective metrics (PSNR and SSIM) as well as subjectively, through human perception. These are then combined and compared with the state-of-the-art method in old photo restoration, which comprises an image-to-image translation framework based on deep latent space translation. This state-of-the-art approach outperformed all other methods and combinations of them by a large margin.
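For readers unfamiliar with the first of the two objective metrics, PSNR can be computed from the mean squared error between a reference image and its restored version. The sketch below is a generic illustration, not code from the dissertation; it assumes 8-bit grayscale images stored as nested lists, and the function name `psnr` is chosen here.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized grayscale images.

    Assumption: images are lists of rows of pixel intensities in [0, max_value].
    """
    ref_pixels = [p for row in reference for p in row]
    res_pixels = [p for row in restored for p in row]
    if len(ref_pixels) != len(res_pixels) or not ref_pixels:
        raise ValueError("images must be non-empty and have the same size")

    # Mean squared error over all pixels.
    mse = sum((a - b) ** 2 for a, b in zip(ref_pixels, res_pixels)) / len(ref_pixels)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)


reference = [[10, 20], [30, 40]]
off_by_one = [[11, 21], [31, 41]]
value = psnr(reference, off_by_one)
```

Higher values indicate a restoration closer to the reference; SSIM complements PSNR by comparing local structure rather than raw pixel error.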

Using a Siamese Network to Accurately Detect Ischemic Stroke in Computed Tomography Scans Supervised by Arlindo L. Oliveira and Catarina Fonseca and authored by Beatriz Vieira The diagnosis procedure of stroke, a leading cause of death in the world, involves the acquisition of images using computed tomography scans, which makes it possible to assess the severity of the incident and the type and location of the lesion. The fact that the brain has two hemispheres with a high level of anatomical similarity, exhibiting significant symmetry, has led to extensive research based on the assumption that a decrease in symmetry is directly related to the presence of pathologies. This work is focused on the analysis of the symmetry (or lack of it) of the two brain hemispheres, and on the use of this information for the classification of computed tomography brain scans of stroke patients. The objective is to contribute to the process of automatic identification of brain lesions caused by stroke events. To perform this task, we used the Siamese Network architecture, which uses two parallel neural networks that share the same weights. The composed network receives a pair of images (the original image and its mirrored version) and a label that indicates whether or not a stroke is present. The network then extracts the relevant features and classifies the images taking into account their similarity. The resulting network can be used to classify unseen scans, depending on the perceived level of symmetry, into one of two classes: evidence of stroke or absence of stroke. The accuracy of the proposed method is approximately 72%, significantly outperforming a standard convolutional network architecture, which was used as a baseline.
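The input construction described above (an image paired with its mirrored copy, both passed through branches that share weights) can be sketched without any trained model. In this illustration the shared branch is a stand-in that simply flattens the image, which is an assumption made only to keep the example self-contained; the dissertation uses learned neural network branches, and the function names are hypothetical.

```python
def mirror(image):
    """Flip an image (list of rows) left to right."""
    return [row[::-1] for row in image]


def embed(image):
    """Stand-in for the shared-weight branch: here it only flattens the pixels.

    Assumption: a real Siamese network would apply the same learned encoder
    to both inputs; using one function for both mimics the weight sharing.
    """
    return [float(p) for row in image for p in row]


def asymmetry(image):
    """Mean absolute difference between the embeddings of an image and its mirror."""
    left, right = embed(image), embed(mirror(image))
    return sum(abs(a - b) for a, b in zip(left, right)) / len(left)


symmetric_scan = [[1, 2, 1], [3, 4, 3]]
asymmetric_scan = [[1, 2, 9], [3, 4, 3]]
```

A perfectly symmetric input yields zero distance, while a lesion-like intensity change on one side raises it; a trained classifier would learn a decision rule on top of such a similarity signal.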

Siamese Transformer Networks for Improving Address Matching Supervised by Arlindo L. Oliveira and authored by André Duarte Address matching plays a very important role in the daily activities of post offices and companies responsible for processing and delivering packages. Address matching is a subtask of geocoding, and consists of pairing addresses, from multiple databases, that refer to the same place. Geocoding aims to assign physical coordinates (latitude and longitude) to an address so that the routes taken by delivery workers can be planned accurately. Errors in address matching are quite harmful to this type of company, in economic, environmental, or reputational terms. There are several methodological approaches to address matching. Some methodologies involve standardizing the address or even parsing the elements of the address, to then perform element-wise matching. These methodologies are not perfect and end up requiring manual correction of the address by a human. This dissertation contributes to the solution of this problem by presenting a model that successfully and efficiently performs the task of pairing Portuguese addresses. The proposed solution fits in the Deep Learning field and has its main focus on Siamese Neural Networks of Pre-Trained Transformers. In this field, there are already promising results for similar tasks, which demonstrate the viability of Deep Learning models for solving this kind of problem. The results obtained on a real address matching task show that the proposed solution is a promising approach. The model is able to map the addresses in this dataset with an accuracy never lower than 94% at the Artery level and 90% at the Door level.

A Modular Architecture for Model-Based Deep Reinforcement Learning Supervised by Arlindo L. Oliveira and authored by Tiago João Gaspar Ribeiro de Oliveira The model-based reinforcement learning (MBRL) paradigm, which uses planning algorithms, has recently achieved unprecedented results in the area of DRL. These agents are quite complex and involve multiple components, factors that can create challenges for research. In this work, we propose a modular software architecture (our implementation can be found in https://github.com/GaspTO/Modular_MBRL) suited for these types of agents, which makes possible the implementation of different algorithms and for each component to be easily configured (such as different exploration policies, search algorithms...). We illustrate the use of this architecture by implementing several algorithms and experimenting with agents created using different combinations of these. We also suggest a new simple search algorithm called averaged minimax that achieved good results in this work. Our experiments also show that the best algorithm combination is problem-dependent.

Improving the Performance of Deep Neural Networks in Vision Tasks with Attention Mechanisms Supervised by Arlindo L. Oliveira and authored by Rafael Gamanho Pedro There is no single precise definition of "attention" for neural networks. Broadly, attention mechanisms are neural network layers that aggregate information from the entire input data. How they do so depends on the specific problem addressed and on the input data, such as a phrase or an image. This work focuses on computer vision tasks. Attention mechanisms first gained traction in natural language processing, and their use in computer vision has been on the rise for a few years. This usage of attention mechanisms is somewhat recent and has been advancing quickly, with new architectures published often. This thesis aims to study and compare different attention mechanisms to improve performance in image classification tasks. Three use cases related to medical imaging are used to assess the benefits attention mechanisms bring to real-world scenarios. The results show that there are scenarios where attention mechanisms improve the performance on medical datasets. However, the performance increase was not as consistent as expected. The experiments also show that attention mechanisms need more data than their conventional counterparts.

Using Knowledge Graphs to model Digital Footprints Supervised by Arlindo L. Oliveira and authored by André Carlos Ruano Andrade Cavalheiro The new generations are born into a world where the internet is a natural extension of the real world. The online logs created throughout their lives might remain long after they are gone - detailed information about their everyday activity. Currently, corporations use this data to predict short-term actions in order to maximize the use of their services, which is but one of many use-cases that such an opportunity presents. Knowledge graphs, which have been the target of intensive research in recent years, were used in this work to model personal data. The project aims to create a framework for centralizing a person's logs originating from multiple sources on the web. Specifically, this work makes the following contributions: 1. Developed a framework to store a person's records into a usable and interpretable structure, providing a review of its possibilities and limitations with hopes of guiding future research. 2. Created a proof of concept made from a single user's data downloaded from five of the most widely used online platforms. 3. Performed experiments using pre-established models based on the concepts of metapaths, to explore the interactions between entities in the network and explore its semantic and structural value.

Evaluating generalization in Deep Reinforcement Learning with Procedural Generated Environments Supervised by Arlindo L. Oliveira and authored by Miguel Borges Freire Deep Reinforcement Learning agents, mainly those that learn from visual observations, often fail to transfer their knowledge to unseen environments. In games, standard Deep Reinforcement Learning protocols commonly promote testing on the same set of levels used in training. This practice leads an agent to easily overfit a given training set, failing to transfer its knowledge to out-of-distribution levels. To overcome this problem, we construct two separate training and test sets using procedurally generated environments from the Procgen Benchmark. We use this benchmark to measure the extent of overfitting and systematically study the effects of using regularization and data augmentation methods on the capacity of the agent to generalize. We found that, in general, using regularization and data augmentation improves generalization, with an efficacy that is dependent on the environment's dynamics. Furthermore, we study how network architectural decisions such as the depth and the width of the convolutional network, the usage of pooling layers, skip-connections, and modifications of the classification layer affect generalization. Finally, we empirically demonstrate that convolutional neural networks with small kernels in the early convolutional layers can accomplish the same generalization level as a deeper residual model.

Combining off and on-policy training in Deep Reinforcement Supervised by Arlindo L. Oliveira and authored by Alexandre João Gomes Borges MuZero is able to master both Atari games and board games by learning a model of the environment, that is then used with Monte Carlo Tree Search (MCTS) to decide what move to play in each position. During tree search, the algorithm simulates games by exploring several possible moves, and afterwards picks the action that corresponds to the most promising trajectory. Even though not all trajectories from these simulated games are useful, none of them are used for training. Using these trajectories would provide more data, more quickly, leading to faster convergence and sample efficiency. Recent work introduced an off-policy value target for AlphaZero that uses data from simulated games. Similarly, in this work, we propose a way to obtain off-policy targets by using data from simulated games in MuZero. We combine these off-policy targets with the on-policy targets already used in MuZero in several ways, and study the impact of these targets and their combinations in two environments with distinct characteristics.

Application of Deep Learning Techniques to the Diagnosis of Medical Images Supervised by Arlindo L. Oliveira and authored by Pedro Miguel Carreto Vaz Diabetic Retinopathy (DR) is the leading cause of visual disability worldwide. Although it is highly treatable when diagnosed in its earlier stages, there is currently a need for cheaper and more accurate ways to do so. Medical images have been used in diagnosis for a long time. Recent advancements in the computer vision field have shown remarkable results through the use of Convolutional Neural Networks, which have been able to reach state-of-the-art results in image segmentation. In this master's thesis, we implement a V-Net-like architecture in Python and study how image preprocessing techniques that highlight lesions associated with DR, as well as different optimization metrics, affect its results. The results show that the impact of these variables changes according to the lesion that we try to segment and that the V-Net is capable of obtaining good results for some of the segmentation problems.

MyWatson: A system for interactive access of personal records Supervised by Arlindo L. Oliveira and authored by Pedro Miguel dos Santos Duarte With the number of photos people take growing, it is getting increasingly difficult for an ordinary person to manage all the photos in their digital library, and finding a single specific photo in a large gallery is proving to be a challenge. In this thesis, the MyWatson system is proposed, a web application leveraging content-based image retrieval, deep learning, and clustering, with the objective of solving the image retrieval problem, focusing on the user. MyWatson is developed on top of the Django framework, a high-level Python Web framework, and revolves around automatic tag extraction and a friendly user interface that allows users to browse their picture gallery and search for images via query by keyword. MyWatson’s features include the ability to upload and automatically tag multiple photos at once using Google’s Cloud Vision API, and to detect and group faces according to their similarity by using a convolutional neural network, built on top of Keras and TensorFlow, as a feature extractor, together with a hierarchical clustering algorithm to generate several groups of clusters. Besides discussing state-of-the-art techniques, presenting the APIs and technologies used, and explaining the system’s architecture in detail, a heuristic evaluation of the interface is corroborated by the results of questionnaires answered by the users. Overall, users expressed interest in the application and the need for features that help them better manage a large collection of photos.

Automatic Annotation of Unstructured Fields in Medical Databases Supervised by Arlindo L. Oliveira and Maria Luísa Torres Ribeiro Marques da Silva Coheur. Authored by Margarida Andreia Rosa Correia The increased use of systems based on Electronic Health Records has caused an enormous increase in the information available electronically, which can be processed by Data Mining techniques, leading to relevant findings. The expectation was that this information would become easy to access, analyze and share. However, the text present in clinical notes is written in natural language and is, thus, unstructured and difficult to process automatically. These clinical notes might contain data pertinent to the health of the patient. In this thesis, with the help of Natural Language Processing and Information Extraction techniques, we present a system that, given a clinical note, extracts relevant named entities from it, such as names of diseases, symptoms, treatments, diagnoses and drugs, generating structured information from unstructured free text. In addition, in order to avoid privacy issues, and considering that these clinical notes might contain references to names of patients, doctors or other health professionals, we also present an anonymization step. Finally, we add a module that automatically corrects typos in these medical notes. Final results show that the system, in general, is able to recognize and interpret medical entities.

Biological Data Processing Using Grid Technologies Supervised by Arlindo L. Oliveira and authored by Sérgio Mendes Costa At present there is a growing interest in the development of systems in which scientific analysis with high computing or data storage and processing requirements can be performed. Cluster and Grid computing technologies have emerged as the best support infrastructures for this type of system. The biological sciences are among those that have benefited most from the advancement of these technologies, namely in the study of gene expression mechanisms. In that sense, the discovery of transcription factor binding sites and the analysis of gene expression data are particularly relevant. In the first case, we usually search for short segments of DNA, known as motifs, that are well conserved. In the second case, we usually analyze microarray data using data mining techniques like biclustering. In the context of this thesis, efficient algorithms for motif inference in gene promoter regions and for the analysis of gene expression data were made available in the hermes cluster of Instituto Gulbenkian de Ciência. The algorithms were developed in the context of the BioGrid - Parallel Algorithms for Gene Annotation project. During this work, the necessary tasks of implementing, installing and testing were performed, as well as the development of Web interfaces and documentation for every program. In addition, a study was conducted in which the model-based testing technique was used to evaluate the software. The algorithms created in the context of the BioGrid project are now available in a reliable, integrated and user-friendly system for a large community of Bioinformatics users.

Modelling and Inference of Gene Regulatory Networks Supervised by Arlindo L. Oliveira and authored by José Miguel Ranhada Vellez Caldas A current problem in biology is how to find adequate models for the dynamics of gene regulatory networks. Recent technological advancements allow for the measurement of gene mRNA levels, in a population of cells, over a period of time. Given a particular gene regulatory network, time series for its components, and a parametrizable mathematical model, optimization algorithms may be used to fit the model's parameters to the observed dynamics. This is useful for validating both the hypothetical network and its model, and for providing new insights about the underlying biological system. In this thesis I analyze two case studies: the SOS DNA damage repair network in E. coli and a hypothetical network for the transcriptional regulation of the gene Flr1's response to oxidative stress in yeast, induced by the drug Mancozeb. For the SOS network, I use a known piecewise-linear model and the parameter inference algorithm BFGS. I compare two adaptations of piecewise linear models, obtaining a general form that encompasses both, and I describe a new version of an optimization algorithm that may be used for inferring parameters in that model. These results are applied to the Flr1 network. Both models are used to extract information that is confirmed by biological literature.

Pathological Analysis of Tissues using Deep Neural Networks Supervised by Arlindo L. Oliveira and João Cassis. Authored by Xavier Abreu Dias Pathological images or biopsy images are samples of tissues from a specific location of a human or animal body. Pathological analysis is necessary whenever there are lesions or symptoms indicative of a certain disease, like cancer or the presence of bacteria in tissues. Nowadays, a large number of biopsies are requested every day and sent for analysis by pathologists in order to make a diagnosis. This process can be difficult and time-consuming, and it requires experience in detecting abnormal tissues. With the advances of technology, powerful scanners have been developed that can magnify 40× and digitize whole slide images, being able to see at the 250 µm scale. The state-of-the-art supervised Deep Learning methods applied to slide image classification or disease detection mostly use deep annotations (rich annotations), i.e. specific information about where the disease is located, if present. This dissertation aims to contribute a semi-supervised architecture that enables models to be built, using Multiple Instance Learning and Online Hard Example Mining, from weakly annotated datasets (whose annotations indicate only whether or not the whole slide shows the disease). The whole architecture presented in this dissertation consists of an application of these semi-supervised methods to a deep architecture with an attention module. The whole architecture is fit on a set of biopsy images provided by Hospital da Luz (Lisbon), some of which contain Helicobacter pylori, achieving an accuracy of 91.67% and capturing all positive instances.

Deep Convolutional Encoder-Decoder Architectures for Clinically Relevant Coronary Artery Segmentation Supervised by Arlindo L. Oliveira and Mário Alexandre Teles de Figueiredo. Authored by João Lourenço Coelho da Silva X-ray coronary angiography is a crucial clinical procedure for the diagnosis and treatment of coronary artery disease, which accounts for roughly 16% of global deaths every year. However, the images acquired in this procedure have low resolution and poor contrast, making lesion detection and assessment challenging. Accurate coronary artery segmentation not only helps mitigate these problems, but also allows the extraction of relevant anatomical features for further analysis by quantitative methods. Although automated segmentation of coronary arteries has been proposed before, previous approaches have used non-optimal segmentation criteria, leading to less useful results. Most methods either segment only the major vessel, discarding important information from the remaining ones, or segment the whole coronary tree, based mostly on contrast information, producing a noisy output that includes vessels that are not relevant for diagnostic or therapeutic purposes. In this work, vessels are segmented according to their clinical relevance, using a segmentation criterion developed in collaboration with expert cardiologists. Additionally, the catheter, whose diameter is known and provides a scale factor that may be useful for diagnosis, is segmented simultaneously. To derive the optimal approach, an extensive comparative study of encoder-decoder architectures was conducted. Based on the UNet++, a new computationally efficient and high-performing decoder architecture is proposed, the EfficientUNet++.
Combined with EfficientNet encoders, the EfficientUNet++ establishes a line of efficient and high-performing segmentation models, whose best-performing member achieves a generalized dice score of 0.9202 +/- 0.0356, and artery and catheter class dice scores of 0.8858 +/- 0.0461 and 0.7627 +/- 0.1812, respectively.
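The Dice scores quoted above measure the overlap between a predicted segmentation and the ground truth. The sketch below computes the plain per-class Dice coefficient on flattened label maps; it is a generic illustration and not the dissertation's evaluation code. In particular, the generalized Dice score reported in the abstract additionally weights classes, which this sketch does not do. The class indices (0 background, 1 artery, 2 catheter) are assumptions.

```python
def dice_score(predicted, target, cls):
    """Per-class Dice coefficient: 2 * |P ∩ T| / (|P| + |T|).

    Assumption: `predicted` and `target` are flat sequences of integer class
    labels of equal length, one label per pixel.
    """
    if len(predicted) != len(target):
        raise ValueError("label maps must have the same number of pixels")
    pred_hits = [p == cls for p in predicted]
    target_hits = [t == cls for t in target]
    intersection = sum(1 for p, t in zip(pred_hits, target_hits) if p and t)
    total = sum(pred_hits) + sum(target_hits)
    # By convention, a class absent from both maps counts as a perfect match.
    return 1.0 if total == 0 else 2 * intersection / total


# Hypothetical 4-pixel label maps: 0 = background, 1 = artery, 2 = catheter.
predicted = [0, 1, 1, 2]
target = [0, 1, 2, 2]
artery_dice = dice_score(predicted, target, 1)
catheter_dice = dice_score(predicted, target, 2)
```

A score of 1 means perfect overlap and 0 means none, which is why thin structures such as the catheter, where a few misplaced pixels matter more, tend to score lower and vary more.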

Automated Assessment of Coronary Artery Stenosis in X-ray Angiography using Deep Neural Networks Supervised by Arlindo L. Oliveira and Mário Alexandre Teles de Figueiredo. Authored by Dinis Lourenço Tavares Rodrigues Several methods and measures exist for the quantitative severity assessment of coronary artery stenosis, leading to distinct management of treatment procedures. It is of utmost importance to properly identify and classify all possible stenoses in an individual. A three-step deep-learning framework was designed to automate the detection and assessment of stenosis severity. This study showcases a new clinically obtained dataset of properly de-identified X-ray invasive coronary angiography (ICA) sequences of 438 patients from Hospital de Santa Maria. For each sequence, the frames filled with radio-opaque contrast, which define full stenosis visibility, were annotated; stenosis bounding boxes were annotated by an expert physician on reference frames and then propagated to every frame using image processing techniques. The transfer learning dynamics of deep neural networks are exploited for supervised learning at each step: CNNs are employed for angle view selection of the Left/Right Coronary Artery (LCA/RCA), achieving 0.97 accuracy; single-shot detectors are used for stenosis detection, achieving 0.83/0.81 mAR for LCA/RCA, respectively; and a new region-of-interest boost approach with CNNs is explored for stenosis severity regression of the RCA. Our method showcases the importance of transfer learning in stenosis severity assessment with limited data, achieving considerable performance. To the best of the author's knowledge, this is the first time that the instantaneous wave-free ratio (iFR) was used as a metric for stenosis severity assessment tasks using deep learning techniques.

Applying Deep Learning to Medical Images Supervised by Arlindo L. Oliveira and Mário Alexandre Teles de Figueiredo. Authored by Ricardo Jorge da Silva Diniz Deep convolutional networks have recently been embraced by the academic community as a competitive solution for visual recognition tasks. Among these networks, fully convolutional neural networks (FCNNs) have been gaining traction, as they drop the traditional fully-connected layers of CNNs in favor of more convolutional layers. The original fully convolutional network, using layer skipping, was capable of achieving great results when provided with enough samples. This architecture was extended into the U-Net, which outperforms the FCNN while being both faster and less computationally demanding. Both architectures are designed to work with 2D input images. However, most medical images, such as ultrasounds and MRIs, are 3D. Built upon the underlying principles behind the U-Net and the FCNN, the V-Net was created. It is a volumetric FCNN which introduces a new objective function, discards pooling layers in favor of more convolutional layers, and performs residual propagation. V-Nets have achieved good performance across all visual recognition tasks, being comparable to the state-of-the-art solutions while requiring a fraction of the processing time. In this thesis, several variants of U-Net and V-Net are implemented to, firstly, attest to their good performance on visual segmentation tasks of medical data and, secondly, assess how the objective function, the kernels’ receptive fields, residual propagation, activation functions and the optimization method impact the model’s performance. A secondary objective of this thesis is to bridge the gap between theoretical knowledge and practical implementations by analyzing Google’s TensorFlow API, which was designed specifically for machine learning based on distributed computing.

Imputation Techniques for Clinical Data of Ischemic Stroke Patients Supervised by Arlindo L. Oliveira and Alexandre Paulo Lourenço Francisco. Authored by Filipa de Matos Marques In the 21st century, approximately 880 thousand people living in Europe suffer an ischemic stroke every year. Predicting the patient’s outcome is key to choosing the course of treatment. In this master's thesis, the functional outcome, given by a binary version of the modified Rankin Scale, was predicted at two points in time: three months and one year after the stroke took place. Often, the data provided by health organisations to conduct these studies is incomplete, which can impair the results. Thus, the need arises to choose a proper way to handle the missing data. Here, missing values were imputed with six different methods and the classifiers were then trained with seven distinct machine learning models. The areas under the receiver operating characteristic curve for the best classifiers, at the three-month and one-year marks, are 0.8217 and 0.7537, respectively. Moreover, no statistically significant difference was found between the performance of the distinct imputation methods for each machine learning model.
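The area under the receiver operating characteristic curve reported above can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that pairwise definition; it is a generic illustration with made-up labels and scores, not the dissertation's evaluation code, and the function name `roc_auc` is chosen here.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as half).

    Assumption: labels are 0 (e.g. good outcome) or 1 (e.g. poor outcome) and
    scores are the classifier's predicted scores for class 1.
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")

    # Count how often a positive case outranks a negative one.
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical predictions for four patients.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
```

The quadratic loop is fine for small cohorts; for large datasets a rank-based formulation gives the same value more efficiently.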

WeatherIST - iOS application for detailed weather prevision in Continental Portugal Supervised by Arlindo L. Oliveira and authored by Tiago João Alves Duarte This work is the result of a need to bring the meteorological forecast system developed by METEO-IST to an iOS application. METEO-IST is a computational weather server owned by IST that calculates with great accuracy the different weather conditions (rain, wind, humidity, etc.) anywhere within the Portuguese continental territory. The predictions are calculated frequently (every 15 minutes) and exhibit great precision, distinguishing the system from other global meteorological systems that are currently available. Currently, the system makes the forecasts available through the group's website. At the request of several users, the objective was to create a native iOS application for the iPhone that provides these forecasts by taking advantage of the device’s capabilities.

Predicting Frequency and Claims of Health Insurance with Machine Learning techniques Supervised by Arlindo L. Oliveira and Luís Miguel Veiga Vaz Caldas de Oliveira. Authored by Pedro Octávio Couto Gonçalves In the health insurance industry, policies are typically one-year contracts that are renewed at the end of the twelve months. At Multicare, this renewal starts to be negotiated at the end of the first nine months of the current annuity. At this point it is necessary to predict how the present annuity will end, i.e., there is a need to forecast the loss ratio of the last three months of the annuity considering the loss ratios of the first nine months. This problem is currently handled using a time series algorithm, ARIMA, that forecasts future loss ratios considering only the past ones and ignoring all other external information that can also prove useful in predicting the behavior of the insured population, both in terms of frequency of usage of the insurance and in terms of the cost of medical acts. This study incorporates a wide variety of external variables coming from different sources into the traditional datasets of Multicare and compares several types of tree-based machine learning models, aiming to find the ones that lead to better performance in predicting claims and costs of the insured population. The main contribution of this work is the proposal of a new prediction model for the claims and costs of the insured population of health insurance and its comparison with the model that is currently in production at Multicare, based on ARIMA time series.