Using graph embeddings to explore deep neural network
architectures
Supervised by Arlindo L. Oliveira and authored by José
Carreira
Convolutional neural networks and vision transformers
represent the state of the art in artificial neural network
(ANN) models for vision problems, such as classification,
segmentation, and object detection. Many different
architectures exist, exhibiting significant variations in
performance, complexity and training cost. Using the
appropriate transformations, it is possible to generate graph
(or hypergraph) representations of deep neural network
architectures, and these representations can be embedded into
appropriate spaces that may be more amenable to performance
quantification. This dissertation will explore the idea that
graph embeddings of deep neural network architectures (and,
possibly, weights) can be used to explore the architecture
space in more effective ways than is possible today.
Requisites: The student should have significant programming
experience, and practical knowledge of machine learning
languages and environments, such as PyTorch or TensorFlow.
Notes: The work will be developed in cooperation with
research groups from the University of Tokyo and the Hong Kong
Polytechnic, which have significant expertise in the graph
embedding techniques that will be used in this work. The
selected student will have access to the facilities of
INESC-ID and the MLKD group (https://mlkd.idss.inesc-id.pt/),
including computing facilities that include four DELL
PowerEdge C41402 servers, eight NVIDIA 32GB Tesla V100S and
eight NVIDIA 64GB Tesla A100, among other computing servers.
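As a rough illustration of the idea of turning an architecture into a graph and then into a vector, the sketch below uses a deliberately simple hand-crafted embedding (operation counts, depth, number of merge nodes). It is only a stand-in for the learned graph-embedding techniques the proposal refers to; the toy residual block, the `OP_TYPES` list and the function names are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical operation vocabulary; a real study would define its own.
OP_TYPES = ["conv", "relu", "add", "pool", "linear"]


def longest_path_length(nodes, edges):
    """Length (in edges) of the longest path in a layer DAG."""
    successors = {name: [] for name in nodes}
    in_degree = {name: 0 for name in nodes}
    for src, dst in edges:
        successors[src].append(dst)
        in_degree[dst] += 1
    depth = {name: 0 for name in nodes}
    ready = [name for name in nodes if in_degree[name] == 0]
    while ready:  # Kahn-style topological traversal
        current = ready.pop()
        for nxt in successors[current]:
            depth[nxt] = max(depth[nxt], depth[current] + 1)
            in_degree[nxt] -= 1
            if in_degree[nxt] == 0:
                ready.append(nxt)
    return max(depth.values()) if depth else 0


def embed_architecture(nodes, edges):
    """Map a layer graph to a fixed-length vector.

    nodes: dict mapping layer name -> operation type
    edges: list of (source layer, destination layer) pairs
    """
    counts = Counter(nodes.values())
    op_histogram = [counts.get(op, 0) for op in OP_TYPES]
    incoming = Counter(dst for _, dst in edges)
    merge_nodes = sum(1 for name in nodes if incoming[name] > 1)
    return op_histogram + [longest_path_length(nodes, edges), merge_nodes]


# Toy residual block followed by a classifier head (illustrative only).
toy_nodes = {
    "conv1": "conv", "relu1": "relu", "conv2": "conv", "add": "add",
    "relu2": "relu", "pool": "pool", "fc": "linear",
}
toy_edges = [
    ("conv1", "relu1"), ("relu1", "conv2"), ("conv2", "add"),
    ("conv1", "add"),  # skip connection
    ("add", "relu2"), ("relu2", "pool"), ("pool", "fc"),
]
print(embed_architecture(toy_nodes, toy_edges))
```

Vectors of this kind can be compared with ordinary distance measures, which is the property that makes an embedded architecture space easier to search than the raw graphs.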
Stenosis detection in coronary X-ray angiographies
Supervised by Arlindo L. Oliveira and Miguel Menezes and
authored by Tomás Nunes
Automatic processing of images from coronary X-ray
angiographies using deep learning techniques has been
explored, and the results show that it is possible to perform
high-quality segmentation of relevant coronary arteries.
Building on top of existing segmentation methods, based on
deep convolutional neural networks, this dissertation will be
focused on the estimation of the value of the instantaneous
wave-free ratio (iFR) and/or the Fractional Flow Reserve (FFR)
index from segmented images. The objective is to develop a
methodology that can estimate the value of the iFR using
non-invasive procedures and that has sufficient sensitivity to
avoid the need for invasive measurement methods, such as the
insertion of a pressure-sensor guidewire through a coronary
catheter. Estimating the iFR and the FFR
indexes is a difficult task, since imaging data, even after
segmentation, will provide insufficient information, in many
cases. Exploration of the possible tradeoffs between positive
predictive value and recall will play an essential role in the
identification of the best approach. Co-supervisors: Miguel
Nobre Menezes (20%), João Lourenço Silva (40%). Requisites: The
student should have significant programming experience, and
practical knowledge of machine learning languages and
environments, such as PyTorch or TensorFlow. The student should
also be interested in developing an understanding of medical
image processing and cardiology. Notes: This work will be
developed in cooperation with the department of cardiology of
the School of Medicine of the University of
Lisbon. The selected student will have access to the
facilities of INESC-ID and the MLKD group
(https://mlkd.idss.inesc-id.pt/), including computing
facilities that include four DELL PowerEdge C41402 servers,
eight NVIDIA 32GB Tesla V100S and eight NVIDIA 64GB Tesla
A100, among other computing servers.
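The tradeoff between positive predictive value and recall mentioned above can be made concrete with a small sketch. It assumes a hypothetical model that outputs one score per lesion and a binary ground-truth label (1 meaning a hemodynamically significant lesion); the scores, labels and thresholds below are invented for illustration and are not results from this work.

```python
def ppv_and_recall(scores, labels, threshold):
    """Positive predictive value and recall when predicting positive
    for every score at or above the threshold."""
    tp = fp = fn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if predicted_positive and label:
            tp += 1
        elif predicted_positive and not label:
            fp += 1
        elif not predicted_positive and label:
            fn += 1
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    return ppv, recall


# Invented example data: model scores and ground-truth labels.
example_scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
example_labels = [1, 1, 0, 1, 0, 0]

for threshold in (0.35, 0.5, 0.7):
    ppv, recall = ppv_and_recall(example_scores, example_labels, threshold)
    print(f"threshold={threshold:.2f}  PPV={ppv:.2f}  recall={recall:.2f}")
```

Lowering the threshold catches more true lesions (higher recall) at the cost of more false alarms (lower positive predictive value), which is the balance the dissertation would need to explore.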
Operation log monitoring using machine learning
Supervised by Arlindo L. Oliveira and Fernando Silva and authored
by José Velez
Traditional monitoring techniques may no longer be able to
handle the complexity of modern applications, infrastructures
and environments. They do not make the best use of the
massive amounts of data being generated, and so raise many
alarms that are not necessarily indicative of a new
incident. The main objective of this thesis is to improve
monitoring and alarm generation by applying different Machine
Learning algorithms and techniques to this rich and vast
amount of data, in order to accurately detect complex problems
even if they lie outside the boundaries of the monitored
software, which is common in modern architectures such as
microservices. The proposed work is framed within a critical IT
application inside an international organization, in order to
provide business and research value by solving a real-world
modern problem. The case study in question consists in
developing a monitoring solution using state-of-the-art
production Machine Learning (ML) algorithms, based on
modern Artificial Intelligence for IT Operations (AIOps)
platforms, to detect anomalies and generate reliable alarms
for complex faults in HERMES, a critical application of EDP.
Deep learning when data is scarce
Supervised by Arlindo Manuel Limede de Oliveira and authored
by Ana Pimenta Alves
Current deep learning models require enormous amounts of data
to be trained. Recent studies by DeepMind show that even
models like GPT-3, which was trained on 300 billion tokens,
may still be “significantly undertrained”. Simply gathering
more data to keep increasing the models’ performance is not
biologically reasonable (as humans don’t need such quantities
of data to learn), is not possible for some tasks (where
obtaining more data is very expensive) and widens the gap
between the researchers with the most resources and the rest
of the community. There are several approaches that try to
avoid this data requirement: few-shot learning,
self-supervision, using pre-trained models, and loss
smoothing. The objective of this dissertation is to compare
these approaches and analyze in particular their relative
performance per dataset size. The selected student will have
access to the facilities of INESC-ID and the MLKD group
(https://mlkd.idss.inesc-id.pt/), including computing
facilities that include two DELL PowerEdge C41402 servers and
eight NVIDIA 32GB Tesla V100S, among other machines. The work
will be developed using the PyTorch or TensorFlow programming
platforms for machine learning and the Observable platform for
data processing. The selected student will work within the
scope of the Magellan project, and have access to the sources
of data and financial resources made available by the project.
Data fusion and object recognition from sensor data
Supervised by Arlindo Manuel Limede de Oliveira and authored
by Francisco Honório
With the increased opportunities for digitalization, cities
will need to ensure that the conditions of public spaces are
adequate for their functions. The use of sensor data (sound
and images) to provide information about the quality of public
spaces, ensuring safety and accessibility to all users,
represents an important trend for smart cities. Smart
cities will use models for spaces, trained using data obtained
from sensors and used to provide information about the
characteristics of the spaces. The objective of this
dissertation is to develop algorithms to collect, identify,
and integrate sensor data and to create and train machine
learning models that can process audio and image data to
provide relevant information about public spaces. Models such
as Mask R-CNN, Faster R-CNN, and YOLO will be assessed and
trained using existing image databases, to perform real-time
object detection. Once trained, data from pilot sites will be
used to test the performance of the models. This project will
be developed within the scope of project Magellan, developed
in cooperation with Schréder and other research institutions.
The selected student will have access to the facilities of
INESC-ID and the MLKD group (https://mlkd.idss.inesc-id.pt/),
including computing facilities that include two DELL PowerEdge
C41402 servers and eight NVIDIA 32GB Tesla V100S, among other
machines. The work will be developed using the PyTorch or
TensorFlow programming platforms for machine learning and the
Observable platform for data processing. The selected student
will work within the scope of the Magellan project, and have
access to the sources of data and financial resources made
available by the project.
Using contrastive learning to learn representations from texts
and images
Supervised by Arlindo Manuel Limede de Oliveira and authored
by Pedro Henriques
Convolutional neural networks and transformer-based
architectures have shown the ability to perform complex
classification and inference tasks, for images and texts.
Still, most of the existing systems rely on the use of massive
datasets of annotated data, such as ImageNet. This restricts
the applicability of the technology to areas where such
massive datasets exist, or imposes large labeling costs. Joint
learning from texts and images is one alternative. Recently, an approach based on
contrastive language-image pre-training (CLIP) has
demonstrated the ability to learn from unlabeled data, and to
generate systems that are competitive with those trained on
labeled data. The objective of this dissertation is to apply a
CLIP-like approach to data available in Portuguese media, and
to assess the quality of the derived system in a set of tasks,
such as image classification, image captioning, and multimodal
question answering. The selected student will have access to
the facilities of INESC-ID and the MLKD group
(https://mlkd.idss.inesc-id.pt/), including computing
facilities that include two DELL PowerEdge C41402 servers and
eight NVIDIA 32GB Tesla V100S, among other machines. The work
will be developed using the PyTorch or TensorFlow programming
platforms.
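The contrastive objective behind CLIP-like training can be sketched in a few lines. The version below is a plain-Python illustration of a symmetric cross-entropy loss over image-text similarity scores; the tiny embeddings and the fixed `temperature` value are assumptions for the example, and a real system would compute the embeddings with trained encoders in PyTorch or TensorFlow.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def diagonal_cross_entropy(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        peak = max(row)  # subtract the maximum for numerical stability
        log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def clip_style_loss(image_embeddings, text_embeddings, temperature=0.07):
    """Symmetric contrastive loss over a batch of matching pairs.

    Pair i consists of image_embeddings[i] and text_embeddings[i];
    every other combination in the batch is treated as a negative.
    """
    images = [l2_normalize(v) for v in image_embeddings]
    texts = [l2_normalize(v) for v in text_embeddings]
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in texts]
        for img in images
    ]
    transposed = [list(column) for column in zip(*logits)]
    image_to_text = diagonal_cross_entropy(logits)
    text_to_image = diagonal_cross_entropy(transposed)
    return 0.5 * (image_to_text + text_to_image)


# Invented two-pair batch: aligned pairs should give a much lower loss.
aligned = clip_style_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = clip_style_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned, swapped)
```

Minimizing this loss pulls each image embedding towards the embedding of its own caption and away from the other captions in the batch, which is how captions act as the training signal in place of manual labels.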
Finding interesting regions in whole slide images using deep
learning
Supervised by: Arlindo Manuel Limede de Oliveira and Jonas
Almeida. Developed in cooperation with the Data Science &
Engineering Research Group of the National Cancer Institute.
Authored by: Martim Afonso
Digital pathology, the analysis of visual information
generated from digitized specimen slides, is a rapidly growing
area. Whole-Slide Imaging (WSI) enables medical samples to be
processed and used in diagnostic medicine, leading to more
efficient and scalable processes made possible by deep
learning techniques. Whole-slide images, which have very high
resolution, need to be handled using special techniques that
enable specialists to rapidly focus on the Regions of Interest
(RoI) and identify relevant features for the task at hand. The
efficient manipulation of WSIs in a clinical setting
requires efficient system-level protocols and effective
feature detection algorithms that, combined, save time and
increase the productivity of physicians by automating the
triage of WSIs for RoIs. The objective of this dissertation is
to develop effective deep learning techniques for the
identification of the regions of interest in whole-slide
images, and to integrate the resulting systems with
client-based viewing software developed by NIH researchers.
The system integration may or may not be included in the
dissertation work, depending on the results obtained during
the RoI identification phase. The selected student will have
access to the facilities of INESC-ID and the MLKD group
(https://mlkd.idss.inesc-id.pt/), including computing
facilities that include two DELL PowerEdge C41402 servers and
eight NVIDIA 32GB Tesla V100S, among other machines. The work
will be developed using the PyTorch or TensorFlow programming
platforms for machine learning and the Observable platform for
data processing.
Deep learning on chaos game representation of genetic
sequences
Supervised by: Arlindo Manuel Limede de Oliveira and Susana
Vinga. Developed in cooperation with the Data Science &
Engineering Research Group of the National Cancer Institute,
headed by Jonas Almeida. Authored by: Vincente Silvestre
Deep learning (DL) has been applied with success to areas as
diverse as computer vision, natural language processing and
protein folding. The ability of deep learning architectures to
derive the appropriate features for classification and
inference enabled these systems to reach unparalleled
performance. However, the successful application of deep
learning depends on the existence of an appropriate canonical
representation with built-in structure, in one, two or more
dimensions. The objective of this dissertation is to study the
application of deep learning techniques to genomic data, using
chaos game representation (CGR), an iterated function that
generates bijective maps between symbolic sequences and
Cartesian spaces. The dissertation will study the application
of standard deep learning architectures, such as ResNet or
EfficientNet, to the inference of genotype-phenotype
correlation from the chaos game representation of genetic
sequences and mutation signatures. Genetic sequence data for
specific conditions and related information will be selected
from the Cancer Genome Atlas (TCGA), Polygenic Score Catalog
(PGS) databases and Genome-Wide Studies at NCI, and used to
test the DL+CGR approach. The selected student will have
access to the facilities of INESC-ID and the MLKD group
(https://mlkd.idss.inesc-id.pt/), including computing
facilities that include two DELL PowerEdge C41402 servers and
eight NVIDIA 32GB Tesla V100S, among other machines. The work
will be developed using the PyTorch or TensorFlow programming
platforms.
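To make the chaos game representation concrete, here is a minimal sketch of the usual iterated map for DNA: start at the centre of the unit square and, for each nucleotide, move halfway towards that nucleotide's corner. The corner assignment below is one common convention and is an assumption here, as are the function names and the choice to skip ambiguous symbols; the binned count grid is the kind of 2D input an image network such as ResNet could consume.

```python
# One common corner convention (an assumption; others exist).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(sequence):
    """Return the CGR trajectory of a DNA sequence as (x, y) points."""
    x, y = 0.5, 0.5  # start at the centre of the unit square
    points = []
    for base in sequence.upper():
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N
        corner_x, corner_y = CORNERS[base]
        x, y = (x + corner_x) / 2, (y + corner_y) / 2
        points.append((x, y))
    return points


def cgr_image(sequence, k=3):
    """Bin the CGR points into a 2^k x 2^k grid of counts."""
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    for x, y in cgr_points(sequence):
        column = min(int(x * size), size - 1)
        row = min(int(y * size), size - 1)
        grid[row][column] += 1
    return grid


for row in cgr_image("ACGTTGCAACGT", k=2):
    print(row)
```

Each cell of the grid collects the positions whose preceding k nucleotides form one particular k-mer, so the image is a spatially organized k-mer frequency table.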
Efficient Algorithms for Medical Image Segmentation
Supervised by Arlindo L. Oliveira and authored by José
Martinho
With the growth in cancer cases and the increasing
expenditures in the healthcare system, it is necessary to
automate processes, aiming for faster diagnosis and
lower expenses. Although current technologies make it possible to
capture high-resolution 3D images of organs, manual
segmentation of organs and tumours is still a complex process
that requires high expertise. State-of-the-art algorithms are
already very accurate. However, they are very
compute-intensive, requiring expensive
hardware and wasting energy. Coupling state-of-the-art
efficient feature extraction algorithms to the nnUNet
segmentation framework, this work proposes novel efficient
architectures for medical image segmentation. For some tasks,
similar results were achieved using around 30% less Floating
Point Operations (FLOPs) than the baseline nnUNet, also
decreasing the inference time. Moreover, better performance
than nnUNet was achieved using architectures with slightly
longer inference time.
Using self-supervised contrastive learning to improve medical
image analysis
Supervised by Arlindo Manuel Limede de Oliveira and authored
by Miguel Rasquinho Ferreira
Deep learning architectures, which include convolutional
neural networks and vision transformers, have made it possible
to achieve human-like performance in several medical image
analysis tasks. However, some fields, including medical image
analysis, are limited by the lack of labeled data.
Furthermore, the use of extensive amounts of labeled data to
train deep neural network architectures leads to behaviors and
peculiar characteristics of the classifiers that have no
parallel in human vision. Self-supervised contrastive learning
is a technique that can be effectively used to train systems
when labeled data does not exist or is sparse. Furthermore,
systems trained using self-supervised contrastive learning
have the potential to exhibit behaviors that are more similar
to the behavior of the primate visual system. The objective
of this dissertation is to apply self-supervised contrastive
learning techniques to problems in medical imaging, namely the
detection of stroke in computed-tomography images of human
brains. The results will be assessed both in terms of the
performance attained and the similarity of the features
derived to features that are present in the visual system of
primate brains. The selected student will have access to the
facilities of INESC-ID and the MLKD group
(https://mlkd.idss.inesc-id.pt/), including computing
facilities that include two DELL PowerEdge C41402 servers and
eight NVIDIA 32GB Tesla V100S, among other machines. The work
will be developed using the PyTorch or TensorFlow programming
platforms.
Analysis of visual sensor data for monitoring of open spaces
Supervised by Arlindo L. Oliveira and authored by João Novo
The objective of this dissertation is to process data received
from distributed sensor arrays in order to infer the level and
characteristics of space usage in urban environments. The
final objective is to derive detailed person and vehicle data
from data obtained by light and sound sensors. The student
will be integrated in a team developing a large scale project
managed by Schréder Hyperion. The objective of Project Magellan
(Localizable, interoperable, cyber-safe, resilient, distributed
autonomous and connected urban infrastructure) is the
development of a new paradigm of urban infrastructure that is
resilient, robust, interconnected, open and interoperable, and that
will support future smart cities. Requisites: Knowledge of machine
learning, analytics and programming, and an interest in data
processing and in learning data analysis methods. The
selected student will have access to the facilities of
INESC-ID and the MLKD group (https://mlkd.idss.inesc-id.pt/),
including computing facilities that include two DELL PowerEdge
C41402 servers and eight NVIDIA 32GB Tesla V100S, among other
machines.
Towards Improving Ischemic Stroke Functional Outcome
Prediction with Computed Tomography Brain Scans Using Deep
Learning
Supervised by Arlindo L. Oliveira and Catarina Fonseca and
authored by Gonçalo Oliveira
Among non-transmissible diseases, stroke is the second leading
cause of death and disability in the world. Quick
diagnosis and prognosis are of paramount importance given the
rapid degradation of the affected brain and short time frame
available for the recommended treatments. The collection of
computed tomography brain scans is part of the standard
patient care. However, their examination is manually done by
experts. Also, despite containing strong patient functional
outcome predictor features, these are rarely considered by the
currently used clinical models that mostly only use
demographic and clinical patient variables. This work explores
three different approaches to improve on these models,
obtaining results comparable to the state of the art in their
respective categories. In the tabular approach, machine
learning classifiers use the same type of variables used by
the clinical models to predict the functional outcome. In the
imaging approach, the outcome is directly predicted solely
from the patient's brain scans, using deep artificial neural
networks. Here several architectures never before tried in
this task are explored, including multiple instance learning
models and Siamese networks that leverage a useful brain
hemisphere symmetry bias. Finally, in the hybrid approach,
both important clinical features and imaging information are
leveraged and combined in a simpler and more interpretable
manner than that of existing models.
Transaction-Based Entity Monitoring in a Client Due Diligence
Context
Supervised by Arlindo L. Oliveira and Jacopo Bono and authored
by Oleksandr Stopchak
Current solutions to Anti-Money Laundering encompass three
different components, namely, transaction monitoring,
screening and Customer Due Diligence (CDD). These have been mainly
based on rule systems and human analysts, which can lead to
many false positive alerts and a large load on human
resources. In this work, we explore a novel approach to aid
CDD. To do this, we propose the usage of machine learning
methods to calculate an entity’s risk based on its
transactional behavior by leveraging historical transactions
to generate a Risk Score (RS). First, we summarize the transaction
behavior into an embedding using feature engineering. Then we
calculate a risk score that quantifies the dissimilarity of an
entity’s behavior to what is expected using Anomaly Detection
techniques. Finally, with the use of explainability techniques
we clarify the assigned risk score by showing the specifics of
an entity’s behavior that contributed to the final assessment
of our approach. With our proposed method, we can reduce the
burden on human analysts by 1) using machine-learning-based
techniques that can identify incorrectly classified, and
therefore, potentially illicit entities by comparing their
transactional behavior to other entities with the same label;
and 2) generating a report with information that can provide a
reasonable explanation for an assigned RS in the form of
visualizations.
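The three steps described above (summarize transactions, score dissimilarity, explain the score) can be sketched as follows. This is only an illustration: the three summary features, the z-score-based dissimilarity and the example amounts are assumptions standing in for the feature engineering, anomaly detection and explainability techniques actually used in the work.

```python
from statistics import mean, pstdev

FEATURE_NAMES = ["transaction_count", "mean_amount", "max_amount"]


def summarize(amounts):
    """Condense a list of transaction amounts into a small feature vector."""
    return [len(amounts), mean(amounts), max(amounts)]


def risk_score(entity_features, peer_features):
    """Score how far an entity lies from peers that share its label.

    Returns the mean absolute z-score and the per-feature z-scores,
    which serve as a simple explanation of the overall score.
    """
    contributions = {}
    for j, name in enumerate(FEATURE_NAMES):
        column = [peer[j] for peer in peer_features]
        center, spread = mean(column), pstdev(column)
        # A feature with no spread among peers carries no signal here.
        z = abs(entity_features[j] - center) / spread if spread > 0 else 0.0
        contributions[name] = z
    return mean(contributions.values()), contributions


# Invented peer group and two entities to compare against it.
peers = [summarize(a) for a in ([10, 12, 11], [9, 11], [10, 10, 12, 11], [11, 9, 10])]
typical_score, _ = risk_score(summarize([10, 11, 10]), peers)
unusual_score, why = risk_score(summarize([10, 500, 900, 15, 700]), peers)
print(typical_score, unusual_score)
print(max(why, key=why.get))
```

The per-feature values play the role of the explanation: an analyst can see which aspect of the behavior drove the score instead of receiving only a single number.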
Pretraining the Vision Transformer using self-supervised
methods for vision-based deep reinforcement learning
Supervised by Arlindo L. Oliveira and authored by Manuel
Goulão
The Vision Transformer architecture has been shown to be
competitive in the computer vision (CV) space where it has
dethroned convolution-based networks in several benchmarks.
Nevertheless, Convolutional Neural Networks (CNN) remain the
preferential architecture for the representation module in
Reinforcement Learning. In this work, we study pretraining a
Vision Transformer using several state-of-the-art
self-supervised methods and assess data-efficiency gains from
this training framework. We propose a new self-supervised
learning method called TOV-VICReg that extends VICReg to
better capture temporal relations between observations by
adding a temporal order verification task. Furthermore, we
evaluate the resultant encoders with Atari games in a
sample-efficiency regime, Procgen games for measuring
generalization and an imitation learning task for a fast and
reliable comparison of the representations. Our
data-efficiency results show that the vision transformer, when
pretrained with TOV-VICReg, outperforms the other
self-supervised methods and the non-pretrained vision
transformer but still struggles to overcome a CNN. Our
generalization results show some limitations of our method
when used in more visually complex games, which leads to
degradation of the generalization performance. Nevertheless,
we were able to outperform a CNN in two of the ten Atari games
where we perform a 100k steps evaluation and show a consistent
data-efficiency gain in comparison to the non-pretrained
vision transformer. Ultimately, we believe that such
approaches in Deep Reinforcement Learning (DRL) might be the
key to achieving new levels of performance as seen in natural
language processing and computer vision.
Neural Models for Generating Clinically Accurate Chest X-Ray
Reports
Supervised by Arlindo L. Oliveira and Bruno Martins and
authored by André Leite
Image captioning models have steadily improved their performance,
showing that artificial intelligence can
achieve successful results in computer vision tasks. However,
there are still some tasks within the range of image
captioning that need more focus, including the automatic
clinical report generation. The automatic generation of
radiology reports based on radiology images has gathered an
increasing amount of focus in the last few years. This is
supported by the repetitive and exhaustive work that these
clinical reports demand. The artificial neural networks that
address this task have changed over the years, starting
as convolutional neural networks and moving to
transformer-based models. However, existing
methodologies tend to favor one of two important aspects, the
fluency and human readability of the
generated text, over the other, the clinical efficiency of the model.
Consequently, in this dissertation we propose a model capable
of achieving competitive results regarding the human
readability of the reports, as well as improving clinical
efficiency. We propose to adapt the MedCLIP model to have an
image-text encoder capable of concatenating both image and
text. We further propose that this model works with the
assistance of an Information Retrieval mechanism that retrieves
the reports closest to an input
X-ray according to a similarity evaluation. On the MIMIC-CXR
dataset, our model has improved on both natural language
processing metrics and clinical efficiency, over
well-established models. Finally, we further show that our
model can lead to more human-readable reports, while keeping
clinical accuracy, over most state-of-the-art models.
Old photo and image restoration using deep learning techniques
Supervised by Arlindo L. Oliveira and authored by José Pereira
There are multiple factors that can contribute to the
degradation of an image. The process of recovering such images
to their initial state is called Image Restoration. Nowadays,
many deep learning techniques have been proposed that claim to
solve this problem. In this work, I select a few deep learning
models that achieve state-of-the-art performance on different
restoration tasks (Deblurring, Denoising, Super-Resolution,
etc.), covering both single degradation (models that focus on
only one type of degradation, such as super-resolution methods)
and mixed degradation (models that tackle all the defects at
the same time). I test them on
a synthetically degraded dataset and evaluate them according
to two objective metrics (PSNR and SSIM) as well as
subjectively, through human perception. These are then
combined and compared with the state-of-the-art method in old
photo restoration which comprises an image-to-image
translation framework based on deep latent space translation.
This state-of-the-art approach outperformed all other methods
and combinations of them by a large margin.
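Of the two objective metrics mentioned above, PSNR is simple enough to state directly: it is 10 times the base-10 logarithm of the squared maximum pixel value divided by the mean squared error between the reference and the restored image. The sketch below assumes images given as flat lists of 8-bit pixel values; the function name and the example pixels are illustrative.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in decibels between two images,
    each given as a flat list of pixel intensities."""
    if len(reference) != len(restored):
        raise ValueError("images must have the same number of pixels")
    squared_errors = [(a - b) ** 2 for a, b in zip(reference, restored)]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Invented four-pixel example: two pixels are off by 10 grey levels.
print(round(psnr([100, 100, 100, 100], [110, 90, 100, 100]), 2))
```

Higher values mean the restored image is numerically closer to the reference. PSNR does not always agree with perceived quality, which is one reason to pair it with SSIM and with subjective evaluation.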
Using a Siamese Network to Accurately Detect Ischemic Stroke
in Computed Tomography Scans
Supervised by Arlindo L. Oliveira and Catarina Fonseca and
authored by Beatriz Vieira
The diagnosis procedure of stroke, a leading cause of death in
the world, involves the acquisition of images using computed
tomography scans, making possible the assessment of the
severity of the incident and the type and location of the
lesion. The fact that the brain has two hemispheres with a
high level of anatomical similarity, exhibiting significant
symmetry, has led to extensive research based on the
assumption that a decrease in symmetry is directly related to
the presence of pathologies. This work is focused on the
analysis of the symmetry (or lack of it) of the two brain
hemispheres, and on the use of this information for the
classification of computed tomography brain scans of stroke
patients. The objective is to contribute to the process of
automatic identification of brain lesions caused by stroke
events. To perform this task, we used the Siamese Network
architecture, which uses two parallel neural networks that
share the same weights. The composed network receives a pair of
images (the original image and the mirrored one) and a label
that indicates whether a stroke is present. The network then
extracts the relevant features and classifies the images
taking into account their similarity. The resulting network
can be used to classify unseen scans, depending on the
perceived level of symmetry, into one of two classes:
evidence of stroke or absence of stroke. The accuracy of the
proposed method is approximately 72%, significantly
outperforming a standard convolutional network architecture,
which was used as a baseline.
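The core of the Siamese setup described above, one encoder with shared weights applied to a scan and to its mirror image, can be illustrated with a toy example. The linear encoder, the untrained weights and the tiny 2x4 "images" are assumptions for the sketch; the actual work uses deep networks trained with labels and a classification stage on top of the compared features.

```python
import math


def mirror(image):
    """Flip an image (a list of pixel rows) left to right."""
    return [list(reversed(row)) for row in image]


def encode(image, weights):
    """Toy linear encoder: one dot product per output feature."""
    flat = [pixel for row in image for pixel in row]
    return [sum(w * p for w, p in zip(weight_row, flat)) for weight_row in weights]


def asymmetry_score(image, weights):
    """Distance between the features of an image and of its mirror.

    The same weights are used for both branches, which is what makes
    the arrangement Siamese.
    """
    original_features = encode(image, weights)
    mirrored_features = encode(mirror(image), weights)
    return math.sqrt(
        sum((a - b) ** 2 for a, b in zip(original_features, mirrored_features))
    )


# Untrained, hypothetical weights for a 2x4 image (8 pixels, 2 features).
toy_weights = [[1, 2, 3, 4, 5, 6, 7, 8], [1, 0, 0, 0, 0, 0, 0, 0]]
symmetric_scan = [[1, 2, 2, 1], [3, 0, 0, 3]]
lesion_scan = [[1, 2, 2, 1], [3, 0, 0, 9]]  # one side differs
print(asymmetry_score(symmetric_scan, toy_weights))
print(asymmetry_score(lesion_scan, toy_weights))
```

A perfectly symmetric input yields identical features in both branches and a score of zero, while a one-sided change produces a positive distance that a trained classifier could use as evidence of a lesion.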
Siamese Transformer Networks for Improving Address Matching
Supervised by Arlindo L. Oliveira and authored by André Duarte
Address matching plays a very important role in the daily
activities of post offices and companies responsible for
processing and delivering packages. Address matching is a
subtask of geocoding, and consists in pairing addresses, from
multiple databases, that refer to the same place. Geocoding
aims to assign physical coordinates (latitude and longitude)
to an address so that delivery routes
can be planned accurately. Errors in address matching are
quite harmful to companies of this type, in economic,
environmental, or reputational terms. There are several
methodological approaches to perform address matching. Some
methodologies involve standardizing the address
or even parsing the elements of the address, to then perform
elementwise matching. These methodologies are not perfect and
end up needing the manual correction of the address by a
human. This dissertation contributes to the solution of this
problem by presenting a model that performs the task of pairing
Portuguese addresses successfully and efficiently. The
proposed solution fits in the Deep Learning field and has its
main focus on Siamese Neural Networks of Pre-Trained
Transformers. In this field, there are already promising
results for similar tasks, which prove the viability of Deep
Learning models for solving this kind of problem. The obtained
results on a real address matching task proved that the
proposed solution is a promising approach. The model is able
to map the addresses in this dataset with an accuracy never
lower than 94% on Artery level and 90% on Door level.
A Modular Architecture for Model-Based Deep Reinforcement
Learning
Supervised by Arlindo L. Oliveira and authored by Tiago João
Gaspar Ribeiro de Oliveira
The model-based reinforcement learning (MBRL) paradigm, which
uses planning algorithms, has recently achieved unprecedented
results in the area of deep reinforcement learning (DRL). These
agents are quite complex and
involve multiple components, factors that can create
challenges for research. In this work, we propose a modular
software architecture (our implementation can be found at
https://github.com/GaspTO/Modular_MBRL) suited for these types
of agents, which makes it possible to implement
different algorithms and to easily configure each component
(such as different exploration policies, search
algorithms...). We illustrate the use of this architecture by
implementing several algorithms and experimenting with agents
created using different combinations of these. We also suggest
a new simple search algorithm called averaged minimax that
achieved good results in this work. Our experiments also show
that the best algorithm combination is problem-dependent.
Improving the Performance of Deep Neural Networks in Vision
Tasks with Attention Mechanisms
Supervised by Arlindo L. Oliveira and authored by Rafael
Gamanho Pedro
There is no single precise definition of "attention" for
neural networks. Broadly, attention mechanisms are neural
network layers that aggregate information from the entire
input data. How they aggregate it depends on the specific problem
addressed and on the input data, such as a phrase
or an image. This work focuses on computer vision tasks.
Attention mechanisms have gained traction in natural language
processing, and their use in computer vision has been on the
rise for a few years. This usage of attention mechanisms is
somewhat recent and has been advancing quickly, with new
architectures published often. This thesis aims to study and
compare different attention mechanisms to improve the
performance in image classification tasks. Three use cases
related to medical imaging are used to assess the benefits
attention mechanisms bring to real-world scenarios. The
results show that there are scenarios where attention
mechanisms improve the performance on medical datasets.
However, the performance increase was not as consistent as
expected. The experiments also show that attention mechanisms
need more data than their conventional counterparts.
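As an illustration of a layer that aggregates information from the entire input, the sketch below implements scaled dot-product self-attention, one widely used form of attention. It is not necessarily one of the mechanisms compared in this thesis, and for brevity it uses the inputs directly as queries, keys and values, whereas practical layers apply learned projections first.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of numbers."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention without learned projections.

    tokens: list of equal-length feature vectors (for images, these
    could be patch or pixel features). Each output vector is a weighted
    average of all input vectors.
    """
    dim = len(tokens[0])
    outputs = []
    for query in tokens:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        weights = softmax(scores)
        outputs.append(
            [sum(w * value[j] for w, value in zip(weights, tokens)) for j in range(dim)]
        )
    return outputs


# Invented example with three 2-dimensional tokens.
for row in self_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]):
    print([round(x, 3) for x in row])
```

Because every output depends on every input position, the layer has a global receptive field from the first layer onwards, in contrast with the local windows of a convolution.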
Using Knowledge Graphs to model Digital Footprints
Supervised by Arlindo L. Oliveira and authored by André Carlos
Ruano Andrade Cavalheiro
The new generations are born into a world where the internet
is a natural extension of the real world. The online logs
created throughout their lives, which contain detailed information about their
everyday activity, might remain long after they are gone.
Currently, corporations use this data to predict short-term
actions in order to maximize the use of their services, which
is but one of many use-cases that such an opportunity
presents. Knowledge graphs, which have been the target of
intensive research in recent years, were used in this work to
model personal data. The project aims to create a framework
for centralizing a person's logs originating from multiple
sources on the web. Specifically, this work makes the
following contributions: 1. Developed a framework to store a
person's records into a usable and interpretable structure,
providing a review of its possibilities and limitations with
hopes of guiding future research. 2. Created a proof of
concept made from a single user's data downloaded from five of
the most widely used online platforms. 3. Performed
experiments using pre-established models based on the concepts
of metapaths, to explore the interactions between entities in
the network and explore its semantic and structural value.
Evaluating generalization in Deep Reinforcement Learning with
Procedural Generated Environments
Supervised by Arlindo L. Oliveira and authored by Miguel
Borges Freire
Deep Reinforcement Learning agents, mainly those that learn
from visual observations, often fail to transfer their
knowledge to unseen environments. In games, standard Deep
Reinforcement Learning protocols commonly promote testing in
the same set of levels used in training. This practice leads
an agent to easily overfit a given training set, failing to
transfer its knowledge to out-of-distribution levels. To
overcome this problem, we construct two separate training and
test sets using procedurally generated environments from the
Procgen Benchmark. We use this benchmark to measure the extent
of overfitting and systematically study the effects of using
regularization and data augmentation methods on the capacity
of the agent to generalize. We found that, in general, using
regularization and data augmentation improves generalization,
with an efficacy that is dependent on the environment's
dynamics. Furthermore, we study how network architectural
decisions such as the depth and the width of the convolutional
network, the usage of pooling layers, skip-connections, and
modifications of the classification layer affect
generalization. Finally, we empirically demonstrate that
convolutional neural networks with small kernels in the early
convolutional layers can accomplish the same generalization
level as a deeper residual model.
Combining off and on-policy training in Deep Reinforcement
Supervised by Arlindo L. Oliveira and authored by Alexandre
João Gomes Borges
MuZero is able to master both Atari games and board games by
learning a model of the environment, which is then used with
Monte Carlo Tree Search (MCTS) to decide what move to play in
each position. During tree search, the algorithm simulates
games by exploring several possible moves, and afterwards
picks the action that corresponds to the most promising
trajectory. Even though not all trajectories from these
simulated games are useful, none of them are used for
training. Using these trajectories would provide more data,
more quickly, leading to faster convergence and better sample
efficiency. Recent work introduced an off-policy value target
for AlphaZero that uses data from simulated games. Similarly,
in this work, we propose a way to obtain off-policy targets by
using data from simulated games in MuZero. We combine these
off-policy targets with the on-policy targets already used in
MuZero in several ways, and study the impact of these targets
and their combinations in two environments with distinct
characteristics.
Application of Deep Learning Techniques to the Diagnosis of
Medical Images
Supervised by Arlindo L. Oliveira and authored by Pedro Miguel
Carreto Vaz
Diabetic Retinopathy (DR) is the leading cause of visual
disability worldwide. Although it is highly treatable when
diagnosed in its earlier stages, there is currently a need for
cheaper and more accurate ways to do so. Medical images have
been used in diagnosis for a long time. Recent advancements in
the computer vision field have shown remarkable results
through the use of Convolutional Neural Networks, which have
been able to reach state-of-the-art results in image
segmentation. In this master's thesis, we implement a
V-Net-like architecture in Python and study how image
preprocessing techniques that highlight lesions associated
with DR and different optimization metrics affect its results.
The results show that the impact of these variables changes
according to the lesion that we try to segment and that the
V-Net is capable of obtaining good results for some of the
segmentation problems.
MyWatson: A system for interactive access of personal records
Supervised by Arlindo L. Oliveira and authored by Pedro Miguel
dos Santos Duarte
With the number of photos people take growing, it’s getting
increasingly difficult for a common person to manage all the
photos in their digital library, and finding a single specific
photo in a large gallery is proving to be a challenge. In this
thesis, the MyWatson system is proposed, a web application
leveraging content-based image retrieval, deep learning, and
clustering, with the objective of solving the image retrieval
problem, focusing on the user. MyWatson is developed on top of
the Django framework, a high-level Python Web framework, and
revolves around automatic tag extraction and a friendly user
interface that allows users to browse their picture gallery
and search for images via query by keyword. MyWatson’s
features include the ability to upload and automatically tag
multiple photos at once using Google’s Cloud Vision API,
detect and group faces according to their similarity by
utilizing a convolutional neural network, built on top of Keras
and TensorFlow, as a feature extractor, and a hierarchical
clustering algorithm to generate several groups of clusters.
Besides discussing state-of-the-art techniques, presenting the
utilized APIs and technologies and explaining the system’s
architecture in detail, a heuristic evaluation of the
interface is corroborated by the results of questionnaires
answered by the users. Overall, users manifested interest in
the application and the need for features that help them
achieve a better management of a large collection of photos.
Automatic Annotation of Unstructured Fields in Medical
Databases
Supervised by Arlindo L. Oliveira and Maria Luísa Torres
Ribeiro Marques da Silva Coheur. Authored by Margarida Andreia
Rosa Correia
The increased use of systems based on Electronic Health
Records has caused an enormous increase in the information
available electronically, which can be processed by Data Mining
techniques, leading to relevant findings. The expected result
was that this information would become easy to access, analyze and
share. However, the text present in the clinical notes is
written in natural language, and is, thus, unstructured, and
difficult to automatically process. These clinical notes might
contain pertinent data for the health of the patient. In this
thesis, with the help of Natural Language Processing and
Information Extraction techniques, we present a system that,
given a clinical note, extracts relevant named entities from
it, such as names of diseases, symptoms, treatments, diagnoses
and drugs, generating structured information from unstructured
free text. In addition, in order to avoid privacy issues and
considering that these clinical notes might contain references
to names of patients, doctors or other health professionals,
we also present an anonymization step. Finally, we add a
module that automatically corrects typos from these medical
notes. Final results show that the system, in general, is able
to recognize and interpret medical entities.
Biological Data Processing Using Grid Technologies
Supervised by Arlindo L. Oliveira and authored by Sérgio
Mendes Costa
At present there is a growing interest in the development of
systems in which scientific analysis with high computing or
data storage and processing requirements can be performed. The
cluster and Grid computing technologies have emerged as the
best support infrastructures for this type of system.
Biological sciences are among those that have benefited
most from the advancement of these technologies, namely in the
study of gene expression mechanisms. In that sense, the
discovery of transcription factor binding sites and the
analysis of gene expression data are particularly relevant. In
the first case, we usually search for short segments of DNA,
known as motifs, that are well conserved. In the second case,
we usually analyze microarray data using data mining
techniques like biclustering. In the context of this thesis,
efficient algorithms for motif inference in gene promoter
regions and for the analysis of gene expression data were made
available in the hermes cluster of Instituto Gulbenkian de
Ciência. The algorithms were developed in the context of the
BioGrid - Parallel Algorithms for Gene Annotation project.
During this work, the necessary tasks of implementing,
installing and testing were performed, as well as the
development of Web interfaces and documentation for every
program. In addition to that, a study was conducted in which
the model-based testing technique was used to evaluate the
software. The algorithms created in the context of the BioGrid
project are now available in a reliable, integrated and
user-friendly system for a large community of Bioinformatics
users.
Modelling and Inference of Gene Regulatory Networks
Supervised by Arlindo L. Oliveira and authored by José Miguel
Ranhada Vellez Caldas
A current problem in biology is how to find adequate models
for the dynamics of gene regulatory networks. Recent
technological advancements allow for the measurement of gene
mRNA levels, in a population of cells, over a period of time.
Given a particular gene regulatory network, time series for
its components, and a parametrizable mathematical model,
optimization algorithms may be used to fit the model's
parameters to the observed dynamics. This is useful for
validating both the hypothetical network and its model, and
for providing new insights about the underlying biological
system. In this thesis I analyze two case studies: the SOS DNA
damage repair network in E. coli and a hypothetical network
for the transcriptional regulation of the gene Flr1's response
to oxidative stress in yeast, induced by the drug Mancozeb.
For the SOS network, I use a known piecewise-linear model and
the parameter inference algorithm BFGS. I compare two
adaptations of piecewise linear models, obtaining a general
form that encompasses both, and I describe a new version of an
optimization algorithm that may be used for inferring
parameters in that model. These results are applied to the
Flr1 network. Both models are used to extract information that
is confirmed by biological literature.
Pathological Analysis of Tissues using Deep Neural Networks
Supervised by Arlindo L. Oliveira and João Cassis. Authored by
Xavier Abreu Dias
Pathological images or biopsy images are samples of tissues
from a specific location of a human or animal body.
Pathological analysis is necessary whenever there are any
lesions or any indicative symptoms for a certain disease, like
cancer or the presence of bacteria in tissues. Nowadays, a
large number of biopsies are requested every day and sent for
analysis by pathologists in order to make a diagnosis. This
process can be difficult, time-consuming, and requires
experience in detecting abnormal tissues. With advances in
technology, powerful scanners have been developed that can
magnify 40× and digitize whole slide images, resolving detail
at the 250 µm scale. State-of-the-art supervised Deep Learning
methods applied to slide image classification or disease
detection mostly use deep annotations (rich annotations), i.e.
specific information on where the disease is located, if
present. This dissertation aims to
contribute a semi-supervised architecture that enables models
to be built, using Multiple Instance Learning and Online Hard
Example Mining, from weakly annotated datasets (in which the
annotation only indicates whether or not the whole slide shows
the disease). The architecture presented in this dissertation
applies these semi-supervised methods to a deep architecture
with an attention module. It is fitted on a set of biopsy
images provided by Hospital da Luz (Lisbon), some of which
contain Helicobacter pylori, achieving an accuracy of 91.67%
and detecting all positive instances.
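The two techniques named in this abstract can be sketched in a few lines. The example below is only an illustration under stated assumptions: it uses max pooling as the Multiple Instance Learning aggregator (the simplest choice, whereas the thesis uses an attention module), and the function names `bag_score` and `hardest_instances` are hypothetical.

```python
def bag_score(instance_scores):
    """Max-pooling MIL: a bag (whole slide) is scored by its most
    suspicious instance (tile), so only a slide-level label is needed."""
    return max(instance_scores)


def hardest_instances(losses, k):
    """Online hard example mining: return the indices of the k instances
    with the largest loss, hardest first, to prioritise them in training."""
    order = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    return order[:k]


# Hypothetical per-tile disease probabilities for one slide.
tile_scores = [0.05, 0.10, 0.92, 0.30]
slide_is_positive = bag_score(tile_scores) >= 0.5

# Hypothetical per-tile losses; the two hardest tiles are selected.
tile_losses = [0.2, 1.5, 0.7, 0.1]
selected = hardest_instances(tile_losses, k=2)
```

With this aggregator, a slide is flagged as positive as soon as a single tile is scored above the threshold, which matches the weak, slide-level form of the annotations.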
Deep Convolutional Encoder-Decoder Architectures for
Clinically Relevant Coronary Artery Segmentation
Supervised by Arlindo L. Oliveira and Mário Alexandre Teles de
Figueiredo. Authored by João Lourenço Coelho da Silva
X-ray coronary angiography is a crucial clinical procedure for
the diagnosis and treatment of coronary artery disease, which
accounts for roughly 16% of global deaths every year.
However, the images acquired in this procedure have low
resolution and poor contrast, making lesion detection and
assessment challenging. Accurate coronary artery segmentation
not only helps mitigate these problems, but also allows the
extraction of relevant anatomical features for further
analysis by quantitative methods. Although automated
segmentation of coronary arteries has been proposed before,
previous approaches have used non-optimal segmentation
criteria, leading to less useful results. Most methods either
segment only the major vessel, discarding important
information from the remaining ones, or segment the whole
coronary tree, based mostly on contrast information, producing
a noisy output that includes vessels that are not relevant for
diagnostic or therapeutic purposes. In this work, vessels are
segmented according to their clinical relevance, using a
segmentation criterion developed in collaboration with expert
cardiologists. Additionally, the catheter, whose diameter is
known and provides a scale factor that may be useful for
diagnosis, is segmented simultaneously. To derive the optimal
approach, an extensive comparative study of encoder-decoder
architectures was conducted. Based on the UNet++, a new
computationally efficient and high-performing decoder
architecture is proposed, the EfficientUNet++. Combined with
EfficientNet encoders, the EfficientUNet++ establishes a line
of efficient and high-performing segmentation models, whose
best-performing member achieves a generalized dice score of
0.9202 +/- 0.0356, and artery and catheter class dice scores
of 0.8858 +/- 0.0461 and 0.7627 +/- 0.1812, respectively.
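For readers unfamiliar with the metric, the per-class Dice score reported above can be computed as in the following minimal sketch. It assumes label maps flattened into lists of integer class labels; the label values and the function name `dice_score` are illustrative assumptions, and the generalized Dice score, which aggregates over classes with class weights, is not reproduced here because the abstract does not state the weighting used.

```python
def dice_score(pred, target, cls):
    """Dice score for one class: 2 * |P ∩ T| / (|P| + |T|).

    pred, target: flat sequences of integer class labels, same length.
    Returns 1.0 when the class is absent from both (a common convention).
    """
    pred_mask = [p == cls for p in pred]
    target_mask = [t == cls for t in target]
    intersection = sum(p and t for p, t in zip(pred_mask, target_mask))
    denominator = sum(pred_mask) + sum(target_mask)
    if denominator == 0:
        return 1.0
    return 2.0 * intersection / denominator


# Hypothetical labels: 0 = background, 1 = artery, 2 = catheter.
predicted = [0, 1, 1, 2]
reference = [0, 1, 2, 2]
artery_dice = dice_score(predicted, reference, cls=1)
catheter_dice = dice_score(predicted, reference, cls=2)
```

A score of 1 means the predicted and reference masks for that class coincide exactly, and a score of 0 means they do not overlap at all.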
Automated Assessment of Coronary Artery Stenosis in X-ray
Angiography using Deep Neural Networks
Supervised by Arlindo L. Oliveira and Mário Alexandre Teles de
Figueiredo. Authored by Dinis Lourenço Tavares Rodrigues
Several methods for quantitative severity assessment of
coronary artery stenosis exist as well as different measures,
leading to distinct management of treatment procedures. It is
of utmost importance to properly identify and classify all
possible stenoses in an individual. A deep-learning three-step
framework implementation was designed to automate the
detection and assessment of stenosis severity. This study
showcases a new clinically obtained dataset of properly
de-identified X-ray invasive coronary angiography (ICA)
sequences of 438 patients from Hospital de Santa Maria. For
each sequence, frames filled with radio-opaque contrast were
annotated to define full stenosis visibility. Stenosis
bounding boxes were annotated by an expert physician on
reference frames and then propagated to each frame using
image processing techniques. Transfer learning with deep
neural networks is exploited for supervised learning at each
step: CNNs for angle view selection of the Left/Right Coronary
Artery (LCA/RCA), achieving 0.97 accuracy; single-shot
detectors for stenosis detection, achieving 0.83/0.81 mAR for
LCA/RCA, respectively; and a new region-of-interest boost
approach with CNNs for stenosis severity regression of the
RCA. Our method showcases the
importance of transfer learning in stenosis severity
assessment with limited data, achieving considerable
performance. To the best of the author's knowledge, this is
the first time that iFR was used as a metric for stenosis
severity assessment tasks using deep learning techniques.
Applying Deep Learning to Medical Images
Supervised by Arlindo L. Oliveira and Mário Alexandre Teles de
Figueiredo. Authored by Ricardo Jorge da Silva Diniz
Deep convolutional networks have recently been embraced by the
academic community as a competitive solution for visual
recognition tasks. Among these networks, the fully
convolutional neural networks have been gaining traction as
they drop the traditional fully-connected layers of CNNs in
favor of more convolutional layers. The original fully
convolutional network, using layer skipping, was capable of
achieving great results when provided enough samples. This
architecture was extended into the U-Net, which outperforms the
FCNN while being both faster and less computationally
demanding. Both architectures are designed to work
with 2D input images. However, most medical images, such as
ultrasounds and MRIs, are 3D. Built upon the underlying
principles behind the U-Net and the FCNN, the V-Net was
created. It is a volumetric FCNN which introduces a new
objective function, discards pooling layers in favor of more
convolutional layers and performs residual propagation. V-Nets
have achieved a good performance across all visual recognition
tasks, being comparable to the state-of-the-art solutions
while requiring a fraction of the processing time. In this
thesis, several variants of U-Net and V-Net are implemented to,
firstly, attest to their good performance on visual
segmentation tasks of medical data, and, secondly, to assess
how the objective function, kernels’ receptive fields,
residual propagation, activation functions and optimization
method impact the model’s performance. A secondary objective
of this thesis is to bridge the gap between theoretical
knowledge and practical implementations by analyzing Google’s
TensorFlow API, which was designed specifically for
distributed computing based machine learning.
Imputation Techniques for Clinical Data of Ischemic Stroke
Patients
Supervised by Arlindo L. Oliveira and Alexandre Paulo Lourenço
Francisco. Authored by Filipa de Matos Marques
In the 21st century, every year, approximately 880 thousand
people living in Europe suffer an ischemic stroke. Predicting
the patient’s outcome is key to choosing the course of
treatment. In this master's thesis, the functional outcome,
measured by the binary version of the modified Rankin Scale,
was predicted at two points in time: three months and one year
after the stroke took place. Often, the data provided by health
organisations to conduct these studies is incomplete, which can
impair the results. Thus, the need arises to choose a proper
way to handle the missing data. Here, missing values were
imputed with six different methods and the classifiers were
then trained with seven distinct machine learning models. The
areas under the receiver operating characteristic curve for
the best classifiers, at the three-month and one-year marks,
are 0.8217 and 0.7537, respectively. Moreover, no
statistically significant difference was found between the
performance of the distinct imputation methods for each
machine learning model.
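The area under the receiver operating characteristic curve quoted above has a simple interpretation: it is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. A minimal sketch of that computation follows; the function name `roc_auc` and the toy data are assumptions made for illustration and are not taken from the thesis.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: sequence of 0/1 outcomes; scores: classifier outputs.
    Counts the fraction of (positive, negative) pairs in which the
    positive case is ranked higher, with ties counted as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: three of the four positive/negative pairs are ranked correctly.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The pairwise form is quadratic in the number of cases, which is fine for illustration; library implementations typically use a sort-based computation instead.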
WeatherIST - iOS application for detailed weather prevision in
Continental Portugal
Supervised by Arlindo L. Oliveira and authored by Tiago João
Alves Duarte
This work is the result of a need to move the meteorological
forecast system developed by METEO-IST to an iOS application.
METEO-IST is a weather computational server owned by IST that
calculates with great accuracy the different weather
conditions (rain, wind, humidity, etc.) anywhere within the
Portuguese continental territory. The predictions are
calculated frequently (at 15-minute intervals) and
exhibit great precision, distinguishing the system from other global
meteorological systems that are currently available. Currently
the system makes the forecasts available through the group's
website. At the request of several users, the objective was to
create a native iOS application for the iPhone that provides
these forecasts by taking advantage of the device’s
capabilities.
Predicting Frequency and Claims of Health Insurance with
Machine Learning techniques
Supervised by Arlindo L. Oliveira and Luís Miguel Veiga Vaz
Caldas de Oliveira. Authored by Pedro Octávio Couto Gonçalves
In the health insurance industry, policies are typically
one-year contracts that are renewed after these twelve months. In
Multicare, this renewal starts to be negotiated at the end of
the first nine months of the current annuity. At this point it
is necessary to set a prediction of how the present annuity
will end, i.e., there is a need to forecast the loss ratio of
the last three months of the annuity considering the loss
ratios of the first nine months. This problem is currently
handled using a time series algorithm, ARIMA, that forecasts
future loss ratios considering only the past ones and ignoring
all other external information that can also prove useful in
predicting the behaviors of the insured population, both in
terms of frequency of usage of the insurance and in terms of
the cost of medical acts. This study incorporates a wide
variety of external variables coming from different sources into
the traditional datasets of Multicare and performs a
comparison between several types of tree-based machine
learning models, aiming to find the ones that lead to better
performances in predicting claims and costs of the insured
population. The main contribution of this work is the proposal
of a new prediction model for the claims and costs of the
insured population of health insurance and its inevitable
comparison with the model that is currently in production in
Multicare, based on ARIMA time series.