CS and AI Talks

This is a bi-weekly seminar series by Ph.D. students and postdoctoral researchers. The idea is to learn about current topics in CS and AI. It is open to anyone in the world who wants to listen to a presentation or give a talk, as long as the topic relates to CS.

Meeting Room & Passcode: 318218

Join our Slack

Speaker Application form

Organizers

Berrenur Saylam & Burak Suyunu

Speakers and Topics

15 June 2023, 19.00 UTC+3h, Poster

Exploiting Clustering Patterns in Training Sets to Improve Classification Performance of Fully Connected Layers by Tolga Ahmet Kalaycı

Abstract: Fully connected layers are used in almost all neural network architectures ranging from multilayer perceptrons to deep neural networks. These layers allow any kind of interaction between features without making any assumption about the structure of the data. Thanks to this property, with sufficient complexity, fully connected layers are expected to learn any kind of pattern. Practical experience has revealed that this theoretical potential is often not realized. The success of convolutional and recursive layers and the findings of many studies have shown that the intrinsic structure of a dataset holds great potential to improve performance on a classification problem. These layers take advantage of the inductive bias based on the spatial or sequential structure of specific data types such as text, image, and video. Leveraging clustering to explore and exploit this intrinsic structure in classification problems has also been the subject of various studies. In this thesis, two different methods are proposed. Both methods aim to improve the classification performance of fully connected layers by feeding them prior information about the clustering structure embedded in the training dataset. The first method is a regularization method that focuses on improving the classification results in case of high variance. The second method concentrates on making better predictions in case of high bias.

Bio: Tolga Ahmet Kalaycı received his B.S. degree from the Department of Industrial Engineering, Istanbul Technical University in 2008, and his M.Sc. degree from the Department of Industrial Engineering, Boğaziçi University in 2011. He conducted his Ph.D. studies on fully connected layers and graduated in 2023 from the Department of Industrial Engineering, Istanbul Technical University. In this presentation, he will present two papers that cover his thesis research.

26 April 2023, 21.00 UTC+3h, Poster

Conformational Variability Analysis of Biomolecules in Cryogenic Electron Tomography by Mohamad Harastani

Abstract: Cryogenic electron tomography (cryo-ET) allows visualizing biomolecular complexes in situ. 3D data of biomolecules produced using cryo-ET are noisy, suffer from spatial anisotropies, and are difficult to analyze individually. Biomolecules are flexible, and analyzing their conformational variability is necessary to understand their functional mechanisms. Standard cryo-ET data processing methods average multiple copies of individual biomolecules to obtain structures at higher resolutions and, by using classification, treat biomolecular conformational variability as discrete rather than continuous. In my thesis, I introduced the first two cryo-ET data processing methods for analyzing continuous biomolecular conformational variability, HEMNMA-3D and TomoFlow. HEMNMA-3D analyzes experimental data with the motion directions simulated by Normal Mode Analysis and allows the discovery of a large range of biomolecular motions. TomoFlow extracts motions from the data using the computer vision technique of Optical Flow. I showed the potential of these two methods on experimental cryo-ET data of nucleosome conformational variability in cells. The two methods show coherent results, shedding light on the conformational variability of nucleosomes in cells.

Bio: Mohamad obtained his Ph.D. in BioImage Informatics from Sorbonne University, Paris in 2022. He is currently a postdoctoral researcher at the IGBMC, Strasbourg. He works on developing image analysis methods and software to analyze biomedical images and address biological questions.

12 April 2023, 20.30 UTC+3h, Poster

End-to-End Deep Multi-Modal Physiological Authentication With Smartbands by Deniz Ekiz

Abstract: The number of fitness tracker users increases every day. Most applications require authentication to protect privacy-sensitive operations. Biometrics such as face images have been used widely as login tokens, but they have privacy issues. Moreover, occlusions like the face masks used for COVID may reduce their effectiveness. Smartbands can track heart rate, movements, and electrodermal activity. They have been widely used for health-related applications. The use of smartbands for authentication is in the exploratory stage. Physiological signals gathered from smartbands may be used to create a multi-modal and multi-sensor authentication system. The popularity of smartbands enables us to deploy new applications without a need to buy additional hardware.

Bio: Deniz Ekiz received the M.S. degree from the Department of Computer Engineering, Boğaziçi University, Turkey, in 2019, where he is currently pursuing the Ph.D. degree. His research is focused on the applications of wearable technology.

29 March 2023, 20.30 UTC+3h, Poster

The Lawful Use of Data in Machine Learning by Osman Gazi Güçlütürk

Abstract: Training machine learning models requires processing large datasets of different types. Depending on the type of a given data point in a dataset or the jurisdiction in which the model will be used, the applicable legal framework changes. Models requiring text-based or image-based inputs, such as ChatGPT and DALL·E, could raise concerns in terms of copyright law, whereas profiling models could violate privacy and/or personal data protection rules. Given the highly fragmented and complex nature of our current regulatory framework and the cross-border nature of data transfers in machine learning applications, it is impossible to exhaustively map every type of data to its applicable legal framework. Such an analysis should be made on a case-by-case basis. Still, there are some general legal frameworks that should be taken into account in almost all applications. In this talk, we will explore some of these generally applicable rules under Turkish law in an attempt to assess whether and under what conditions a given piece of data can be used lawfully to train machine learning models.

Bio: Osman Gazi Güçlütürk graduated from Galatasaray University, Faculty of Law in 2014. Subsequently, he obtained his M.A. from Ankara University, his LL.M. from LSE, and his MJur from the University of Oxford, in that order. In 2021, Dr. Güçlütürk completed his Ph.D. at Galatasaray University with his thesis on the lawful use of data in machine learning-based artificial intelligence systems. Dr. Güçlütürk is currently working as an Assistant Professor of IT Law at Boğaziçi University, Faculty of Law, and as a Visiting Fellow at Yale Law School’s Information Society Project.

Our meetings were postponed due to the Kahramanmaraş-centered earthquakes that took place on February 6, 2023.

25 January 2023, 21.00 UTC+3h, Poster

A Local Search Approach To Boolean Satisfiability Problems: WalkSAT by Mehmet Akif Çördük

Abstract: In this talk, we will discuss the NP-complete Boolean Satisfiability (SAT) problem. We will briefly introduce and compare different types of approaches such as DPLL, CDCL (conflict-driven clause learning), and WalkSAT. We will then cover the implementation details of the WalkSAT algorithm and discuss its advantages.

Bio: Akif is an alumnus of Boğaziçi University. He is part of the Developer Technology Engineering team of NVIDIA Europe, Middle East, and Africa. Previously, he wrote his master’s thesis on GPU-based pricing engines, and later continued working on the same topic. Akif has a background in computer engineering and software engineering. He’s interested in parallel algorithms, GPU performance optimization, and combinatorial optimization.

5 January 2023, 19.00 UTC+3h, Poster

Natural Language Processing in Law & Judicial Judgement Prediction by Handan Güler

Abstract: There has been growing interest in predicting the outcomes of court cases for a number of reasons, including long prosecution processes and high litigation costs. Furthermore, the number of precedent cases is continuously growing, making it difficult for law professionals to spot relevant ones. Recent developments in Natural Language Processing (NLP) and Deep Learning (DL) have increased the number of their applications in the legal domain, where the focus is mainly on text classification, information extraction, information retrieval, and building models to predict litigation outcomes. In this talk, we will explore NLP and DL applications in law through an in-depth review of the journal article “Natural language processing in law: Prediction of outcomes in the higher courts of Turkey” by Mumcuoğlu et al., published in 2021. We will discuss how the legal system corpus was handled and how the collected data was pre-processed, and we will talk about the various word embedding methods used to convert text into vectors. Furthermore, we will observe the performance of prediction models using Gated Recurrent Units, Long Short-term Memory, and Bidirectional Long Short-term Memory, which were employed as classification methods in the subject study. We will also discuss the results of two more studies, by Wang et al. (2020) and Chen et al. (2019), which use the same legal dataset to predict judgement in judicial cases.

Bio: In 2006, Handan Güler graduated from the Boğaziçi University Civil Engineering Undergraduate Program, and in 2010, she received an M.Sc. in Building/Architectural Engineering from Politecnico di Milano. In addition, she completed a non-thesis master’s program in Construction Management at Boğaziçi University in 2017. She is currently a Ph.D. student in the Civil Engineering Department at Boğaziçi University. Her research focuses on construction disputes and the prediction of dispute occurrence using Machine Learning and Deep Learning techniques. Furthermore, she has been working professionally since 2011, and she is currently the Assistant Contract Manager in the Marmaray Project.

15 December 2022, 19.00 UTC+3h, Poster

Targeted Drug Design: A Language-based Approach by Gökçe Uludoğan

Abstract: The development of novel compounds targeting proteins of interest is one of the most important tasks in the pharmaceutical industry. Recently, deep generative models have been applied to targeted molecular design and have shown promising results. In this talk, we will present a language-based formulation of target-specific drug design and a deep generative model exploiting pretrained biochemical language models.

Bio: Gökçe Uludoğan is currently a Ph.D. student in the Computer Engineering Department at Boğaziçi University. She received her B.S. and M.S. degrees in 2018 and 2021 from Boğaziçi University. Her research interests include deep learning, cheminformatics, and natural language processing.

25 November 2022, 21.00 UTC+3h, Poster

Detecting Suicidal Ideation on Forums by Ahmet Emre Aladağ

Abstract: Some people with suicidal ideation express their intentions on online platforms such as Twitter, Facebook or Reddit. In this talk, we’ll be exploring how we can utilize Text Mining and Machine Learning methods to predict potential suicidal ideation. Such a prediction system may allow providing prompt support before it becomes too late for the person with suicidal ideation.

Bio: Emre is a Ph.D. candidate in the Computer Science Department at Boğaziçi University. He works on predicting psychological attributes from user-generated data. He is interested in Text Mining, Machine Learning, and Computational Social Science.

11 November 2022, 19.00 UTC+3h, Poster

A Semi-supervised Dependency Parsing Approach for Low-resource Languages by Şaziye Betül Özateş

Abstract: Code-switching dependency parsing stands as a challenging task due to both the scarcity of necessary resources and the structural difficulties embedded in code-switched languages. In this talk, we present novel sequence labeling models to be used as auxiliary tasks for dependency parsing of code-switched text in a semi-supervised scheme. We then discuss how auxiliary task usage enhances the performance of an LSTM-based dependency parsing model and compare its success with that of an XLM-R-based model with significantly more computational and space complexity.

Bio: Şaziye Betül Özateş is a computer scientist who specializes in natural language processing and machine learning. She completed her B.S. and M.S. studies in 2012 and 2014 in the Computer Engineering Department at Boğaziçi University where she was also a member of TABILAB. She recently received her Ph.D. degree under the guidance of Prof. Arzucan Özgür and Prof. Tunga Güngör in computer engineering from Boğaziçi University. From 2020 to 2022, she was a researcher in the Institute of Natural Language Processing at the University of Stuttgart. Her research focuses on improving automatic processing of low-resource natural languages with the help of various deep learning methods. Currently, she continues her research at KUIS-AI Center as a post-doctoral research fellow.

21 October 2022, 21.00 UTC+3h, Poster

Topic Modeling: Clustering with Labels by Burak Suyunu

Abstract: Topic models are often used to organize and interpret large and unstructured corpora of text documents. They try to explain the topics that constitute the semantic infrastructure of the document sets and try to find the distributions of these topics for the documents. Because topic models are unsupervised, their outputs have to be interpretable for the model to be considered successful. However, the results of a topic model are usually weakly correlated with human interpretation.

In this talk, we will first provide a background on how topic models work and explain traditional topic modeling approaches such as NMF, LSA, and LDA. Then we will present a semi-supervised topic model called Theme Supervised Nonnegative Matrix Factorization that can benefit from labeled documents to improve and facilitate the interpretation of the topics. We will finish the talk with recent topic models such as BERTopic and Top2Vec that utilize word embeddings, language models, dimensionality reduction, and clustering techniques.

Bio: Burak Suyunu received his B.S. and M.S. degrees in computer engineering from Boğaziçi University. He is currently pursuing his Ph.D. in computer engineering while working as a teaching assistant at Boğaziçi University. His research interests include bioinformatics, NLP, and machine learning.

5 October 2022, 21.00 UTC+3h, Poster

Wearable Computing for Health by Berrenur Saylam

Abstract: Wearable devices have become part of daily life because of their low cost, small size, and computational power. They provide continuous measurements of physiological changes in the human body. While it is possible to extract knowledge about activity recognition, it is also possible to detect changes and measure behaviors, thoughts, and feelings based on the body’s response to those conditions. This talk will introduce the topic with current possible application domains and share case study results.

Bio: Berrenur Saylam is currently a Ph.D. student in the Computer Engineering Department at Boğaziçi University. She received her B.S. in Computer Engineering and Industrial Engineering from Galatasaray University and her M.S. in Computer Engineering from ENS de Lyon and Galatasaray University. Her research interests include wearable computing, machine learning, and federated learning.

5 September 2022, 21.00 UTC+3h, Poster

Multiplicity in the Partitioning of Signed Graphs by Nejat Arınık

Abstract: According to the structural balance theory, a signed graph is considered structurally balanced when it can be partitioned into a number of modules such that positive edges are located inside the modules and negative ones are in-between them. In practice, real-world networks are rarely perfectly balanced. When this is not the case, one wants to measure the magnitude of the imbalance and to identify the set of edges related to the network imbalance. The Correlation Clustering (CC) problem is precisely defined as finding the partition with minimal imbalance.

Signed graph partitioning is an important task with many applications, as finding a balanced partition helps in understanding the system modeled by the graph. However, the standard approach used in the literature is to find a single partition and focus the rest of the analysis on it, as if it were sufficient to fully characterize the studied system. Yet, it may not reflect the meso-structure of the network, and one may need to seek other partitions to build a better picture. Although this need to look for multiplicity is extremely important from the end user’s perspective, only a few works have taken it into consideration in their analysis so far.

One particular situation where we want to relax this traditional single-partition assumption to allow searching for multiple partitions arises in the context of the CC problem. When solving an instance of this problem, several or even many optimal partitions may coexist. If multiple optimal partitions coexist, one can then wonder how different or diverse they are. Put differently, we want to know what we lose when considering only one partition while there might be multiple ones. To answer these questions, one should ideally enumerate the space of optimal partitions completely and then analyze it. To this end, we propose a new efficient solution space enumeration method and a cluster analysis-based framework in order to first enumerate the space of optimal partitions and then empirically study this space. Based on our empirical study, our main finding is the identification of 4 different situations: 1) unique solution; 2) single class of similar solutions; 3) several classes of similar solutions; 4) multiple solutions without a clear clustering structure.

Bio: Nejat Arınık is currently a post-doctoral researcher at TETIS in Montpellier, France. He received his Ph.D. in Computer Science from Avignon University in France in 2021. Before his Ph.D., he received his B.S. and M.S. degrees in computer engineering from Galatasaray University and INSA Lyon, respectively. His current research interests include data mining, complex network analysis, and operations research.

30 June 2022, 19.00 UTC+3h

Metamorphic Relations via Relaxations: An Approach to Obtain Oracles for Action-Policy Testing by Hasan Ferit Enişer

Abstract: Testing is a promising way to gain trust in a learned action policy, in particular if the policy is a neural network. A “bug” in this context constitutes undesirable or fatal policy behavior, for example, satisfying a failure condition. But how do we distinguish whether such behavior is due to bad policy decisions, or whether it is actually unavoidable under the given circumstances? This requires knowledge about optimal solutions, which defeats the scalability of testing. Related problems occur in software testing when the correct program output is not known. Metamorphic testing addresses this issue through metamorphic relations, specifying how a given change to the input should affect the output, thus providing an oracle for the correct output. Yet, how do we obtain such metamorphic relations for action policies? Here, we show that the concept of relaxations, which is well explored in the Artificial Intelligence community, can serve this purpose. We also design fuzzing strategies for test-case generation. In experiments on three single-agent games, our technology is able to effectively identify true bugs, i.e., avoidable failures of the policy under test, which has not been possible until now.

Bio: I am a third-year Ph.D. student at MPI-SWS in Germany. Broadly, I work at the intersection of software testing/verification and artificial intelligence. I develop testing and verification techniques to ensure the dependability of AI-enabled systems. I am advised by Dr. Maria Christakis and work in the “Practical Formal Methods” group at MPI-SWS. I graduated from Boğaziçi University in 2017 with an M.Sc. in computer science. Before that, I did my B.Sc. studies in computer science at the same university.

16 June 2022, 20.30 UTC+3h

The Hitchhiker’s Guide to the Computational Genomics by Hakime Öztürk

Abstract: Technological advances in genomics have led to a rapid growth of biological data from large numbers of samples. The application of modern machine learning methods, such as deep learning, allows utilizing these very large data sets to find hidden structure within them and to make accurate predictions. In this talk, we first provide a background on various genomics research questions and then discuss some of the current research papers on the application of deep learning methodologies that aim to address these.

Bio: Hakime Öztürk is currently a post-doctoral researcher at the German Cancer Research Center (DKFZ). She received her Ph.D. in Computer Engineering from Boğaziçi University in 2019. Her research interests include the application of machine learning methodologies to biological problems.

26 May 2022, 20.30 UTC+3h

Federated Learning and Applications by Cihat Keçeci

Abstract: The increasing amount of distributed and private data enables the training of sophisticated machine learning models. Conventional machine learning models generally need a centralized dataset for training. However, the data may be distributed among different users, and it may not be possible to collect the dataset on a centralized machine due to privacy or other requirements. Federated Learning enables distributed training of machine learning models by sharing only the model parameters through a central server while keeping the client data private. In this talk, we will provide an introduction to Federated Learning by examining the common problems in Federated Learning scenarios along with their proposed solutions.

Bio: Cihat Keçeci received his B.S. (Hons.) and M.S. degrees in electrical and electronics engineering from Boğaziçi University, İstanbul, Turkey. He was with the Informatics and Information Security Research Center (BİLGEM) of the Scientific and Technological Research Council of Turkey (TÜBİTAK). He is currently pursuing his Ph.D. degree in electrical and computer engineering at Texas A&M University, College Station, TX, USA. His research interests include machine learning, smart grids, and wireless communications.

28 April 2022, 20.30 UTC+3h

Behavioral Biometrics for User Authentication and Identification by Sümeyye Ağaç

Abstract: Nowadays, mobile and wearable devices and technologies have become a part of our daily lives because of their commoditization and comfort. Traditional user authentication systems use passwords or PINs to protect user-sensitive data and provide a personalized user experience in several services (e.g., personal emails, banks, and social networking accounts) on such devices. For security reasons, these passwords must be unique for each service, and each password must be long enough and composed of different character types, making it difficult to remember. With the help of biometrics (biological metrics), we move from “something the user knows” to “something the user is”. In this talk, we will discuss how systems can recognize their users based on their unique characteristics rather than on what they know. More precisely, we will concentrate on authentication and identification using behavioral biometrics (e.g., gait, keystroke dynamics), how they are used, and their advantages and disadvantages, and finally give future directions on this topic.

Bio: Sümeyye Ağaç received her B.S. and M.S. degrees in Computer Engineering from Galatasaray University, Turkey, in 2016 and 2019, respectively. Currently, she is a Ph.D. student in the Computer Engineering Department of Boğaziçi University in Turkey. Her current research interests are human activity recognition, wearable computing, and user authentication.

7 April 2022, 20.30 UTC+3h

Local Search Heuristics and Meta Heuristics by Mehmet Akif Çördük

Abstract: Many combinatorial optimization problems are NP-hard and have very large search spaces. It is often impractical to solve large instances with exact methods. Heuristics and meta-heuristics help explore the search space efficiently and find good local minima. In this talk, we will explore local search heuristics and meta-heuristics with a focus on vehicle routing algorithms. We will also talk about parallelization strategies and show possible implementations on the CUDA architecture.