Domagoj Matijević

Associate Professor

Vice-Dean for Relations with Public Institutions and Industry

domagoj@mathos.hr

+385-31-224-825

9 (entrance through lobby 4) (ground floor)

Google Scholar Profile

School of Applied Mathematics and Informatics

Josip Juraj Strossmayer University of Osijek

Research Interests

Machine Learning
Optimization
Application of ML in bioinformatics

Degrees

PhD in CS, Max-Planck-Institute for Computer Science (Algorithms and Complexity), Saarbrücken, 2007.
- Geometric Optimization and Querying — Exact and Approximate (1.3MB)
  Thesis for obtaining the degree of a Doctor of the Engineering Sciences (Dr.-Ing.) of the natural-technical faculties of Saarland University
MS in CS, Saarland University, Germany, 2002.
BS in Mathematics and Computer Science, Department of Mathematics, University of Osijek, Croatia, 2001.

Publications

Journal Publications

D. Krupić, D. Matijević, N. Šuvak, J. Maltar, D. Ševerdija, Evaluating the agreement between human preferences, GPT-4V and Gemini Pro Vision assessments: Can AI recognize what people might like?, Computers in Human Behavior: Artificial Humans 6/ (2025)

Abstract

This study aims to introduce a methodology for assessing the agreement between AI and human ratings, specifically focusing on visual large language models (LLMs). This paper presents empirical findings on the alignment between ratings generated by GPT-4 Vision (GPT-4V) and Gemini Pro Vision with human subjective evaluations of environmental visuals. Using photographs of restaurant interior design and food, the study estimates the degree of agreement with human preferences. The intraclass correlation reveals that GPT-4V, unlike Gemini Pro Vision, achieves moderate agreement with participants’ general restaurant preferences. Similar results are observed for rating food photos. Additionally, there is good agreement in categorizing restaurants into low-cost, mid-range and exclusive categories based on interior quality. Finally, differences in ratings were observed at the subsample level based on age, gender, and socioeconomic status across the human sample and LLMs. The results of repeated-measures ANOVAs indicate varying degrees of alignment between humans and LLMs across different sociodemographic characteristics. Overall, GPT-4V currently demonstrates limited ability to provide meaningful ratings of visual stimuli compared to human ratings and performs better in this task compared to Gemini Pro Vision.
B. Borozan, T. Prusina, L. Borozan, D. Ševerdija, F. Rojas Ringeling, D. Matijević, S. Canzar, Optimal marker genes for c-separated cell types with SepSolve , Genome Research 35/ (2025), 2770-2780

Abstract

The identification of cell types in single-cell RNA-seq studies relies on the distinct expression signature of marker genes. A small set of target genes is also needed to design probes for targeted spatial transcriptomic experiments and to target proteins in single-cell spatial proteomics or for cell sorting. Although traditional approaches have relied on testing one gene at a time for differential expression between a given cell type and the rest, more recent methods have highlighted the benefits of a joint selection of markers that together distinguish all pairs of cell types simultaneously. However, existing methods either consider all pairs of individual cells, which becomes intractable even for medium-sized data sets, or ignore intra-cell-type expression variation entirely by collapsing all cells of a given type to a single representative. Here, we address these limitations and propose to find a small set of genes such that cell types are c-separated in the selected dimensions, a notion introduced previously in learning a mixture of Gaussians. To this end, we formulate a linear program that naturally takes into account expression variation within cell types without including each pair of individual cells in the model, leading to a highly stable set of marker genes that allow to accurately discriminate between cell types and that can be computed to optimality efficiently.
D. Matijević, A Python Package for Well-Separated Pair Decomposition, Journal of Open Research Software 12/1 (2024)

Abstract

Well-separated pair decomposition (WSPD) is a well-known geometric decomposition used for encoding distances, introduced in 1995 by Paul B. Callahan and S. Rao Kosaraju in a seminal paper. We implemented this remarkable decomposition following the nontrivial algorithm for computing the partial fair split tree of a point set presented in the original article. Our implementation is done in C++. In addition to that, we made further effort to publish it as a Python package on PyPI. By doing so, we made our software easily accessible on Windows, Linux, or macOS to researchers and students worldwide.
M. Blažević, S. Canzar, K. Elbassioni, D. Matijević, Anti Tai mapping for unordered labeled trees, Information Processing Letters 185/ (2024)

Abstract

The well-studied Tai mapping between two rooted labeled trees defines a one-to-one mapping between nodes in the first tree and the nodes in the second tree that preserves ancestor relationship. For unordered trees the problem of finding a maximum-weight Tai mapping is known to be NP-complete. In this work, we define an anti Tai mapping as a binary relation between two unordered labeled trees such that any two violate ancestor relationship and thus cannot be part of the same Tai mapping. Finding a maximum-weight anti Tai mapping arises in the cutting plane method for solving the maximum-weight Tai mapping problem via integer programming. We give an efficient polynomial-time algorithm for finding a maximum-weight anti Tai mapping for the case when one of the two trees is a path and further show how to extend this result in order to provide a polynomially computable lower bound on the optimal anti Tai mapping for two unordered labeled trees.
S. Canzar, V. Hoan Do, S. Jelić, S. Laue, D. Matijević, T. Prusina, Metric multidimensional scaling for large single-cell datasets using neural networks, Algorithms for Molecular Biology 19/21 (2024)

Abstract

Metric multidimensional scaling is one of the classical methods for embedding data into low-dimensional Euclidean space. It creates the low-dimensional embedding by approximately preserving the pairwise distances between the input points. However, current state-of-the-art approaches only scale to a few thousand data points. For larger data sets such as those occurring in single-cell RNA sequencing experiments, the running time becomes prohibitively large and thus alternative methods such as PCA are widely used instead. Here, we propose a simple neural network-based approach for solving the metric multidimensional scaling problem that is orders of magnitude faster than previous state-of-the-art approaches, and hence scales to data sets with up to a few million cells. At the same time, it provides a non-linear mapping between high- and low-dimensional space that can place previously unseen cells in the same embedding.
L. Borozan, F. Rojas Ringeling, S. Kao, E. Nikonova, P. Monteagudo, D. Matijević, M.L. Spletter, S. Canzar, Counting pseudoalignments to novel splicing events , Bioinformatics 39/7 (2023)

Abstract

Motivation Alternative splicing (AS) of introns from pre-mRNA produces diverse sets of transcripts across cell types and tissues, but is also dysregulated in many diseases. Alignment-free computational methods have greatly accelerated the quantification of mRNA transcripts from short RNA-seq reads, but they inherently rely on a catalog of known transcripts and might miss novel, disease-specific splicing events. By contrast, alignment of reads to the genome can effectively identify novel exonic segments and introns. Event-based methods then count how many reads align to predefined features. However, an alignment is more expensive to compute and constitutes a bottleneck in many AS analysis methods. Results Here, we propose fortuna, a method that guesses novel combinations of annotated splice sites to create transcript fragments. It then pseudoaligns reads to fragments using kallisto and efficiently derives counts of the most elementary splicing units from kallisto’s equivalence classes. These counts can be directly used for AS analysis or summarized to larger units as used by other widely applied methods. In experiments on synthetic and real data, fortuna was around 7× faster than traditional align and count approaches, and was able to analyze almost 300 million reads in just 15 min when using four threads. It mapped reads containing mismatches more accurately across novel junctions and found more reads supporting aberrant splicing events in patients with autism spectrum disorder than existing methods. We further used fortuna to identify novel, tissue-specific splicing events in Drosophila. Availability and implementation fortuna source code is available at https://github.com/canzarlab/fortuna.
D. Ševerdija, T. Prusina, L. Borozan, D. Matijević, Efficient Sentence Representation Learning via Knowledge Distillation with Maximum Coding Rate Reduction, CIT. Journal of Computing and Information Technology 31/4 (2023)

Abstract

Addressing the demand for effective sentence representation in natural language inference problems, this paper explores the utility of pre-trained large language models in computing such representations. Although these models generate high-dimensional sentence embeddings, a noticeable performance disparity arises when they are compared to smaller models. The hardware limitations concerning space and time necessitate the use of smaller, distilled versions of large language models. In this study, we investigate the knowledge distillation of Sentence-BERT, a sentence representation model, by introducing an additional projection layer trained on the novel Maximum Coding Rate Reduction (MCR2) objective designed for general-purpose manifold clustering. Our experiments demonstrate that the distilled language model, with reduced complexity and sentence embedding size, can achieve comparable results on semantic retrieval benchmarks, providing a promising solution for practical applications.
R. Čorić, D. Matijević, D. Marković, PollenNet - a deep learning approach to predicting airborne pollen concentrations, Croatian Operational Research Review 14/1 (2023), 1-13

Abstract

The accurate short-term forecasting of daily airborne pollen concentrations is of great importance in public health. Various machine learning and statistical techniques have been employed to predict these concentrations. In this paper, an RNN-based method called PollenNet is introduced, which is capable of predicting the average daily pollen concentrations for three types of pollen: ragweed (Ambrosia), birch (Betula), and grass (Poaceae). Moreover, two strategies incorporating measurement errors during the training phase are introduced, making the method more robust. The data for experiments were obtained from the RealForAll project, where pollen concentrations were gathered using a Hirst-type 7-day volumetric spore trap. Additionally, five types of meteorological data were utilized as input variables. The results of our study demonstrate that the proposed method outperforms standard models typically used for predicting pollen concentrations, specifically the pollen calendar method, pollen predictions based on patterns, and the naive approach.
D. Matijević, D. Vukičević, The Connection of the Generalized Robinson–Foulds Metric with Partial Wiener Indices, Acta Biotheoretica 71/5 (2023)

Abstract

In this work we propose the partial Wiener index as one possible measure of branch- ing in phylogenetic evolutionary trees. We establish the connection between the gen- eralized Robinson–Foulds (RF) metric for measuring the similarity of phylogenetic trees and partial Wiener indices by expressing the number of conflicting pairs of edges in the generalized RF metric in terms of partial Wiener indices. To do so we compute the minimum and maximum value of the partial Wiener index W(T, r, n), where T is a binary rooted tree with root r and n leaves. Moreover, under the Yule probabilistic model, we show how to compute the expected value of W(T, r, n). As a direct consequence, we give exact formulas for the upper bound and the expected number of conflicting pairs. By doing so we provide a better theoretical understand- ing of the computational complexity of the generalized RF metric.
D. Matijević, Well-Separated Pair Decompositions for High-Dimensional Datasets, Algorithms 16/5 (2023)

Abstract

Well-separated pair decomposition (WSPD) is a well known geometric decomposition used for encoding distances, introduced in a seminal paper by Paul B. Callahan and S. Rao Kosaraju in 1995. WSPD compresses O(n^2) pairwise distances of n given points from R^d in O(n) space for a fixed dimension d. However, the main problem with this remarkable decomposition is the “hidden” dependence on the dimension d, which in practice does not allow for the computation of a WSPD for any dimension d>2 or d>3 at best. In this work, I will show how to compute a WSPD for points in R^d and for any dimension d. Instead of computing a WSPD directly in R^d, I propose to learn nonlinear mapping and transform the data to a lower-dimensional space R^d′, d′=2 or d′=3, since only in such low-dimensional spaces can a WSPD be efficiently computed. Furthermore, I estimate the quality of the computed WSPD in the original R^d space. My experiments show that for different synthetic and real-world datasets my approach allows that a WSPD of size O(n) can still be computed for points in R^d for dimensions d much larger than two or three in practice.
S. Jelić, S. Laue, D. Matijević, P. Wijerama, A Fast Parallel Implementation of a PTAS for Fractional Packing and Covering Linear Programs, International Journal of Parallel Programming 43/5 (2015), 840-875

Abstract

We present a parallel implementation of the randomized (1+ε) approximation algorithm for packing and covering linear programs presented by Koufogiannakis and Young (2007). Their approach builds on ideas of the sublinear time algorithm of Grigoriadis and Khachiyan’s (Oper Res Lett 18(2):53–58, 1995) and Garg and Könemann’s (SIAM J Comput 37(2):630–652, 2007) non-uniform-increment amortization scheme. With high probability it computes a feasible primal and dual solution whose costs are within a factor of 1+ε of the optimal cost. In order to make their algorithm more parallelizable we also implemented a deterministic version of the algorithm, i.e. instead of updating a single random entry at each iteration we updated deterministically many entries at once. This slowed down a single iteration of the algorithm but allowed for larger step-sizes which lead to fewer iterations. We use NVIDIA’s parallel computing architecture CUDA for the parallel environment. We report a speedup between one and two orders of magnitude over the times reported by Koufogiannakis and Young (2007).
S. Funke, T. Malamatos, D. Matijević, N. Wolpert, Conic nearest neighbor queries and approximate Voronoi diagrams, Computational Geometry - Theory and Applications 48/2 (2015), 76-86

Abstract

Given a cone C and a set S of n points in R^d, we want to preprocess S into a data structure so that we can find fast an approximate nearest neighbor to a query point q with respect to the points of S contained in the translation of C with apex at q. We develop an approximate conic Voronoi diagram of O˜(n/eps^d) size that supports conic nearest neighbor queries in O(log⁡(n/eps)) time. Our preprocessing uses only the well-separated pair decomposition and a compressed quadtree. Previous results were restricted to simplicial cones and achieved polylogarithmic or higher query times. By increasing space to O˜(n/eps^{2d}) our data structure further supports queries for any cone direction and angle.
D. Matijević, D. Ševerdija, G. Martinović, Efficient Implementations of Guarding 1.5D Terrains, Croatian Operational Research Review 6/1 (2015), 79-89

Abstract

In 1.5D Terrain Guarding Problem we are given an $x$-monotone polygonal line defined by $k$ vertices and a set $G$ of points from the terrain, i.e. guards, and a set $N$ of points from the terrain which are to be seen (guarded) by guards. We deal with a weighted version of the guarding problem where guards $G$ have weights and the goal is to find a minimum weight subset of $G$ to cover, and a version of the problem where points from $N$ have demands, and the goal is to find the smallest subset from $G$ such that every point in $N$ is seen by the demanded number of guards. Both problems are NP-hard and have a factor $5$ approximation (cite{journals/alg/Elbassioni08}, cite{journal/Elbassioni12}). We show that if $(1+epsilon)$-approximate solver to the corresponding linear program is a computer, for any $epsilon>0$, an extra $1+epsilon$ factor will appear in the final approximation factor for both problems. We compare our parallel implementation based on GPU and CPU threads with textsc{Gurobi} solver and conclude that our algorithm outperforms Gurobi solver on large and dense inputs typically by one order of magnitude.
N. Čerkez, R. Čorić, M. Đumić, D. Matijević, Finding an optimal seating arrangement for employees traveling to an event, Croatian Operational Research Review 6/2 (2015), 419-427

Abstract

The paper deals with modelling a specific problem called the Optimal Seating Arrangement (OSA) as an Integer Linear Program and demonstrated that the problem can be efficiently solved by combining branch-and-bound and cutting plane methods. OSA refers to a specific scenario that could possibly happen in a corporative environment, i.e. when a company endeavors to minimize travel costs when employees travel to an organized event. Each employee is free to choose the time to travel to and from an event and it depends on personal reasons. The paper differentiates between using different travel possibilities in the OSA problem, such as using company assigned or a company owned vehicles, private vehicles or using public transport, if needed. Also, a user-friendly web application was made and is available to the public for testing purposes.
K. Elbassioni, S. Jelić, D. Matijević, The relation of Connected Set Cover and Group Steiner Tree, Theoretical Computer Science 438/ (2012), 96-101

Abstract

We report that the Connected Set Cover (CSC) problem is just a special case of the Group Steiner Tree (GST) problem. Based on that we obtain the first algorithm for CSC with polylogarithmic approximation guarantee as well as the first approximation algorithms for the weighted version of the problem and the version with requirements. Moreover, we argue that the inapproximability result of GST will carry on to the weighted version of the CSC problem.
R. Beier, S. Funke, D. Matijević, P. Sanders, Energy-Efficient Paths In Radio Networks, Algorithmica 61/2 (2011), 289-319

Abstract

We consider a radio network consisting of n stations represented as the complete graph on a set of n points in the Euclidean plane with edge weights $omega(p, q) = |pq|^delta +C_p$, for some constant $delta > 1 and nonnegative offset costs $C_p$. Our goal is to find paths of minimal energy cost between any pair of points that do not use more than some given number $k$ of hops. We present an exact algorithm for the important case when $delta = 2$, which requires $O(kn log n)$ time per query pair $(p, q)$. For the case of an unrestricted number of hops we describe a family of algorithms with query time $O(n^(1+alpha))$, where $alpha > 0$ can be chosen arbitrarily. If we relax the exactness requirement, we can find an approximate $(1+eps)$ solution in constant time by querying a data structure which has linear size and which can be build in $O(n log n)$ time. One tool we employ might be of independent interest: For any pair of points $(p, q)$ we can report in constant time the cluster pair (A,B) representing $(p, q)$ in a well-separated pair decomposition of P.
K. Elbassioni, E. Krohn, D. Matijević, J. Mestre, D. Ševerdija, Improved Approximations for Guarding 1.5-Dimensional Terrains, Algorithmica 60/2 (2011), 451-463

Abstract

We present a 4-approximation algorithm for the problem of placing the fewest guards on a 1.5D terrain so that every point of the terrain is seen by at least one guard. This improves on the currently best approximation factor of 5. Unlike most of the previous techniques, our method is based on rounding the linear programming relaxation of the corresponding covering problem. Besides the simplicity of the analysis, which mainly relies on decomposing the constraint matrix of the LP into totally balanced matrices, our algorithm, unlike previous work, generalizes to the weighted and partial versions of the basic problem.
D. Matijević, R. Osbild, Finding the Theta-Guarded Region, Computational Geometry - Theory and Applications 43/2 (2010), 207-218

Abstract

We are given a finite set of points (emph{guards}) in the plane. A Theta-cone is a cone with apex angle Theta. We call a Theta-cone empty, if it does not contain any guards in its interior. A point p is called Theta-guarded, if every Theta-cone with apex located at p is non-empty. The set of all Theta-guarded points is called the Theta-guarded region, or the $Theta$-region for short. The rationale behind this model is that a point is well-guarded only if it is guarded from all sides. In this paper we show the bound for the complexity of the Theta-region as well as present the algorithms for computing it.
J. Maue, P. Sanders, D. Matijević, Goal Directed Shortest Path Queries Using Precomputed Cluster Distances, ACM Journal of Experimental Algorithmics 14/3 (2009)

Abstract

We demonstrate how Dijkstra's algorithm for shortest path queries can be accelerated by using precomputed shortest path distances. Our approach allows a completely flexible tradeoff between query time and space consumption for precomputed distances. In particular, sublinear space is sufficient to give the search a strong "sense of direction". We evaluate our approach experimentally using large, real-world road networks.
S. Laue, D. Matijević, Approximating k-hop Minimum Spanning Trees in Euclidean metrics, Information Processing Letters 107/3-4 (2008), 96-101

Abstract

In the minimum-cost k-hop spanning tree (k-hop MST) problem, we are given a set of n points in a metric space, a positive small integer k and a root point r. We are interested in computing a rooted spanning tree of minimum cost such that the longest root-leaf path in the tree has at most k edges. We present a polynomial-time approximation scheme for the plane. Our algorithm is based on Arora's et al. technique for the Euclidean k-median problem.
S. Funke, D. Matijević, P. Sanders, Constant Time Queries for Energy Efficient Paths in Multi-Hop Wireless Networks, CIT. Journal of Computing and Information Technology 16/2 (2008), 119-130

Abstract

We investigate algorithms for computing energy efficient paths in ad-hoc radio networks. We demonstrate how advanced data structures from computational geometry can be employed to preprocess the position of radio stations in such a way that approximately energy optimal paths can be retrieved in constant time, i.e., independent of the network size. We put particular emphasis on actual implementations which demonstrate that large constant factors hidden in the theoretical analysis are not a big problem in practice.
F. Eisenbrand, S. Funke, A. Karrenbauer, D. Matijević, Energy-Aware Stage Illumination, International Journal of Computational Geometry & Applications 18/1-2 (2008), 107-129

Abstract

Consider the following illumination problem: given a stage represented by a line segment and a set of lightsources represented by a set of points in the plane, assign powers to the lightsources such that every point on the stage receives a sufficient amount, let's say one unit, of light while minimizing the overall power consumption. By assuming that the amount of light arriving from a fixed lightsource decreases rapidly with the distance from the lightsource, this becomes an interesting optimization problem. We propose to reconsider the classical illumination problems as known from computational geometry literature under this light attenuation model. This paper examines the simple problem introduced above and presents different solutions, based on convex optimization, discretization and linear programming, as well as a purely combinatorial approximation algorithm.

Refereed Proceedings

T. Prusina, B. Borozan, D. Matijević, L. Borozan, J. Maltar, S. Canzar, Accelerating Marker Gene Selection in Scrna-Seq, 2025 MIPRO 48th ICT and Electronics Convention, Opatija, Croatia, 2025, 1257-1262

Abstract
Single-cell RNA sequencing (scRNA-seq) generates vast amounts of data, enabling gene expression profiling at the resolution of individual cells. Accurate classification of cells by their tissue of origin relies on selecting informative marker genes that distinguish cell types while minimizing redundancy. However, identifying an optimal subset of genes presents a challenging combinatorial optimization problem in scRNA-seq analysis. In this paper, we focus on enhancing scGeneFit, a state-of-the-art marker gene selection method. We introduce optimized solvers and nearest-neighbor search techniques to improve both accuracy and computational efficiency. Through systematic benchmarking on real-world datasets, we demonstrate that our modifications yield superior performance compared to the original implementation.
B. Borozan, L. Borozan, D. Ševerdija, F. Rojas Ringeling, D. Matijević, Optimal Marker Genes for c-Separated Cell Types, RECOMB 2025, Seoul, Republika Koreja, 2025

Abstract
The identification of cell types in single-cell RNA-seq studies relies on the distinct expression signature of marker genes. A small set of target genes is also needed to design probes for targeted spatial transcriptomics experiments and to target proteins in single-cell spatial proteomics or for cell sorting. While traditional approaches have relied on testing one gene at a time for differential expression between a given cell type and the rest, more recent methods have highlighted the benefits of a joint selection of markers that together distinguish all pairs of cell types simultaneously. However, existing methods either impose constraints on all pairs of individual cells which becomes intractable even for medium-sized datasets, or ignore intra-cell type expression variation entirely by collapsing all cells of a given type to a single representative. Here we address these limitations and propose to find a small set of genes such that cell types are c-separated in the selected dimensions, a notion introduced previously in learning a mixture of Gaussians. To this end, we formulate a linear program that naturally takes into account expression variation within cell types without including each pair of individual cells in the model, leading to a highly stable set of marker genes that allow to accurately discriminate between cell types and that can be computed to optimality efficiently.
D. Ševerdija, T. Prusina, A. Jovanović, L. Borozan, J. Maltar, D. Matijević, Compressing Sentence Representation with Maximum Coding Rate Reduction (Best paper award in AIS - Artificial Intelligence Systems track), ICT and Electronics Convention (MIPRO), 2023 46th MIPRO, Opatija, Hrvatska, 2023

Abstract
In most natural language inference problems, sentence representation is needed for semantic retrieval tasks. In recent years, pre-trained large language models have been quite effective for computing such representations. These models produce high-dimensional sentence embeddings. An evident performance gap between large and small models exists in practice. Hence, due to space and time hardware limitations, there is a need to attain comparable results when using the smaller model, which is usually a distilled version of the large language model. In this paper, we assess the model distillation of the sentence representation model Sentence-BERT by augmenting the pre-trained distilled model with a projection layer additionally learned on the Maximum Coding Rate Reduction (MCR2) objective, a novel approach developed for general purpose manifold clustering. We demonstrate that the new language model with reduced complexity and sentence embedding size can achieve comparable results on semantic retrieval benchmarks.
B. Borozan, L. Borozan, D. Ševerdija, D. Matijević, S. Canzar, Fortuna Detects Novel Splicing in Drosophila scRNASeq Data, ICT and Electronics Convention (MIPRO), 2023 46th MIPRO, Opatija, Hrvatska, 2023, 410-415

Abstract
Recent developments in single-cell RNA sequencing techniques (scRNASeq) have made large quantities of sequenced data available across numerous species and tissues. Alternative splicing (AS) of pre-mRNA introns varies between tissues and even between cell-types and can be altered in disease. The study of novel AS, using standard RNASeq data, has been extensively studied for many years, while similar work on scRNASeq data has been scarce, despite its potential to offer a broader insight into cell-type specific processes. In this paper, we propose a novel pipeline that uses fortuna, a method that efficiently classifies and quantifies novel AS events, to process scRNASeq samples. Due to its short lifespan, high number of progeny, low maintenance cost, and intricate alternative splicing patterns similar in complexity to those of mammals, Drosophila Melanogaster (fruit fly) is a species of particular interest to researchers. Therefore, we experimentally evaluate our pipeline on real-world Drosophila single-cell data samples from the Fly Cell Atlas.
J. Maltar, D. Matijević, Optimization techniques for image representation in visual place recognition, 2022 45th Jubilee International Convention on Information, Communication and Electronic Technology (MIPRO), Opatija, 2022, 877-882

Abstract
In visual place recognition we aim to match a given query image from a query database with the most appropriate reference image from a reference database. One of the main issues is how to represent a place. Although an ordinary RGB representation can represent a place, various, either handcrafted or learned representations such as deep convolutional neural networks achieve better quantitative results. By using optimization techniques, both convex and non-convex, we can adapt a place representation such that it fits into the problem of visual place recognition. Therefore, in this paper we examine numerous optimization techniques and incorporate them in the context of our problem. Quantitatively, in terms of the area under a curve (AUC) measure, conducted experiments show how such optimized representation outperforms unoptimized one.
A. Jovanović, I. Alqassem, N. Chappell, S. Canzar, D. Matijević, Predicting RNA splicing branchpoints, 2022 45th Jubilee International Convention on Information, Communication and Electronic Technology (MIPRO), Opatija, 2022, 383-388

Abstract
RNA splicing is a process where introns are removed from pre-mRNA, resulting in mature mRNA. It requires three main signals, a donor splice site (5’ss), an acceptor splice site (3’ss) and a branchpoint (BP). Splice site prediction is a well-studied problem with several reliable prediction tools. However, branchpoint prediction is a harder problem, mainly due to varying nucleotide motifs in the branchpoint area and the existence of multiple branch-points in a single intron. An RNN based approach called LaBranchoR was introduced as the state-of-the-art method for predicting a single BP for each 3’ss. In this work, we explore the fact that previous research reported that 95% of introns have multiple BPs with an estimated average of 5 to 6 BPs per intron. To that end, we extend the existing encoder in the LaBranchoR network with a PointerNetwork decoder. We train our new encoder-decoder model, named RNA PtrNets, on 70-nucleotide-long annotated sequences taken from three publicly available datasets. We evaluate its accuracy and demonstrate how well the predictor can generate multiple branchpoints on the given datasets.
V. Hoan Do, M. Blažević, P. Monteagudo, L. Borozan, K. Elbassioni, S. Laue, F. Rojas Ringeling, D. Matijević, S. Canzar, Dynamic pseudo-time warping of complex single-cell trajectories, 23nd Annual International Conference on Research in Computational Molecular Biology, The George Washington University, 2019, 294-297

Abstract
Single-cell RNA sequencing enables the construction of trajectories describing the dynamic changes in gene expression underlying biological processes such as cell differentiation and development. The comparison of single-cell trajectories under two distinct conditions can illuminate the differences and similarities between the two and can thus be a powerful tool. Recently developed methods for the comparison of trajectories rely on the concept of dynamic time warping (dtw), which was originally proposed for the comparison of two time series. Consequently, these methods are restricted to simple, linear trajectories. Here, we adopt and theoretically link arboreal matchings to dtw and propose an algorithm to compare complex trajectories that more realistically contain branching points that divert cells into different fates. We implement a suite of exact and heuristic algorithms suitable for the comparison of trajectories of different characteristics in our tool Trajan. Trajan automatically pairs similar biological processes between conditions and aligns them in a globally consistent manner. In an alignment of singlecell trajectories describing human muscle differentiation and myogenic reprogramming, Trajan identifies and aligns the core paths without prior information. From Trajan’s alignment, we are able to reproduce recently reported barriers to reprogramming. In a perturbation experiment, we demonstrate the benefits in terms of robustness and accuracy of our model which compares entire trajectories at once, as opposed to a pairwise application of dtw. Trajan is available at https://github.com/canzarlab/Trajan.
L. Borozan, D. Matijević, S. Canzar, Properties of the generalized Robinson-Foulds metric, 42nd International Convention - MIPRO 2019, Opatija, 2019, 330-335

Abstract
Comparing hierarchical structures is a problem with many applications in various fields of biology. In this work we address the problem of comparing phylogenetic trees and quantifying their dissimilarities. The most commonly applied measure of similarity between phylogenetic trees is the Robinson Foulds (RF) metric. The Jaccard-Robinson-Foulds (JRF) metric (of order k) has been recently proposed as a generalization of the RF metric that preserves its widely appreciated properties but increases its resolution and robustness. Here, we conduct thorough experimental analysis of the JRF metric and variations thereof on both real world and simulated data. Our main aim is to deepen the understanding of the properties of this generalized RF metric in comparison to the classical RF metric and other matching based distance measures. To compute the JRF distance between trees, we employ the recently proposed branch-and-cut solver Trajan.
S. Jelić, D. Matijević, The relation of Connected Set Cover and Group Steiner Tree, Conference on Applied Mathematics and Scientific Computing 2011, Trogir, 2011

Abstract
Let $U$ be the universe of elements which have to be covered, $mathcal{S}$ family of subsets of $U$ and $G=(mathcal{S},E)$ connected graph on vertex set $mathcal{S}$. We say that subfamily $mathcal{R}subseteqmathcal{S}$ is emph{connected set cover} (CSC) if every $uin U$ is covered by at least one set from $mathcal{R}$ and subgraph $G[mathcal{R}]$ induced by $mathcal{R}$ is connected. The problem is to find connected set cover with respect to $(U,mathcal{S},G)$ with minimum number of sets (vertices). On the other hand, suppose that we are given a graph $G$ with edge weight function $w:E(G)rightarrowR^+$ and family of subsets of vertices $mathcal{G} = {g_1,g_2,ldots,g_k},quad g_isubset V$ which will be called groups. The well known and well studied emph{Group Steiner Tree} (GST) is to find a subtree $T$ that minimizes the weight function $sum_{ein E(T)}w(e)$ such that $V(T)cap g_ineqemptyset$ for all $iin{1,ldots,k}$. We showed in our work that CSC is equivalent to the variant of GST when all edge weights equal to $1$. Hence, all algorithms for GST immediately apply for CSC problem as well. As a result, we obtain an approximation algorithm for CSC problem with approximation ratio $O(log^2mloglog mlog n)$ where $n$ is the size of universe $U$ and $m$ is the size of family $mathcal{S}$. This is the first polylogarithmic approximation algorithm for CSC problem. Natural generalization of CSC problem is to associate the nonnegative weight function with sets in $mathcal{S}$. Weighted CSC problem assumes finding of connected set cover that minimizes the total weight of subfamily $mathcal{R}$. We showed that this problem can be solved by reduction to the fault-tolerant version of Group Steiner problems for which $O(sqrt{m}log m)$ approximation algorithm is known. We also consider generalization of CSC problem where each element $u$ from universe has requirement $r_u$ on number of sets covering element $u$ associated. We showed the reduction of this problem to the variant of GST problem with requirements associated to the groups for which $O(log^2 mloglog mlog(Rcdot n))$ approximation algorithm is known, where $R$ dentes the largest requirement.
D. Matijević, G. Martinović, P. Taler, DISTRIBUTER - The Distributed System for Efficient Execution of Parallel Programs, 33rd International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), Opatija, 2010
K. Elbassioni, D. Matijević, D. Ševerdija, I. Ali, Guarding 1.5D Terrains with Demands, 26th European Workshop on Computational Geometry (EuroCG), Dortmund, 2010
K. Elbassioni, D. Matijević, J. Mestre, D. Ševerdija, Improved approximations for guarding 1.5-dimensional terrains, Symposium on Theoretical Aspects of Computer Science (STACS), Freiburg, 2009

Abstract
We present a 4-approximation algorithm for the problem of placing a fewest guards on a 1.5D terrain so that every point of the terrain is seen by at least one guard. This improves on the currently best approximation factor of 5. Our method is based on rounding the linear programming relaxation of the corresponding covering problem. Besides the simplicity of the analysis, which mainly relies on decomposing the constraint matrix of the LP into totally balanced matrices, our algorithm, unlike previous work, generalizes to the weighted and partial versions of the basic problem.
H. Bast, S. Funke, D. Matijević, TRANSIT: Ultrafast Shortest-Path Queries with Linear-Time Preprocessing, The Shortest Path Problem: Ninth DIMACS Implementation Challenge, Rutgers University, Piscataway, NJ , 2009, 175-192

Abstract
We introduce the concept of transit nodes, as a means for preprocessing a road network, with given coordinates for each node and a travel time for each edge, such that point-to-point shortest-path queries can be answered extremely fast. The transit nodes are a set of nodes, as small as possible, with the property that every shortest path that is non-local in the sense that it covers a certain not too small euclidean distance passes through at least on of these nodes. With such a set and precomputed distances from each node in the graph to its few, closest transit nodes, every non-local shortest path query becomes a simple matter of combining information from a few table lookups. For the US road network, which has about 24 million nodes and 58 million edges, we achieve a worst-case query processing time of about 10 microseconds (not milliseconds) for 99% of all queries. This improves over the best previously reported times by two orders of magnitude.
S. Laue, D. Matijević, Approximating k-hop Minimum Spanning Trees in Euclidean metrics, 19th Canadian Conference on Computational Geometry (CCCG), Ottawa, 2007
H. Bast, S. Funke, D. Matijević, P. Sanders, D. Schultes, In Transit to Constant Time Shortest-Path Queries in Road Networks, 9th Workshop on Algorithm Engineering and Experimentation (ALENEX), New Orleans, 2007
J. Maue, P. Sanders, D. Matijević, Goal Directed Shortest Path Queries Using Precomputed Cluster Distances, 5th International Workshop on Experimetal Algorithms (WEA), Menorca Island, 2006, 316-327
F. Eisenbrand, S. Funke, A. Karrenbauer, D. Matijević, Energy-Aware Stage Illumination, 21st ACM Symposium on Computational Geometry (SoCG), Pisa, Italy, 2005
S. Funke, D. Matijević, P. Sanders, Constant Time Queries for Energy Efficient Paths in Multi-Hop Wireless Networks, AlgorithmS for Wireless And mobile Networks (A_SWAN), Boston, 2004
S. Funke, D. Matijević, P. Sanders, Approximating Energy Efficient Paths in Wireless Multi-Hop Networks, 11th Annual European Symposium on Algorithms (ESA), Budapest, 2003, 230-241

Others

L. Borozan, D. Matijević, S. Canzar, Combinatorial optimization algorithms for (pseudo)alignment in bioinformatics (2021)

Abstract
The field of bioinformatics is a fast growing interdisciplinary field with a strong contribution from mathematics and computer science. This thesis will deal with mathematical problems and algorithmic challenges from that field. Its first focus will be the comparison of hierarchic structures, mainly phylogenetic trees, which is used to explain various biological processes such as the evolution of the species. We will study mathematical models and algorithmic techniques which quantify the distance between such structures as means of determining the similarities or dissimilarities between them. The focus will be given to formulating the problem based on matching in the context of integer linear programming. Our goal will be to find a novel solution which respects the ancestry relations defined by those hierarchical structures and is often overlooked in the current research. Our main result will be given in a form of a software tool - Trajan, which will be tested on both the real world and simulated data. The second focus of the thesis will come from the problem of sequencing the RNA molecule. It is a combinatorial process of reconstruction of the RNA molecule from short nucleotide sequences which is used to analyze the transcriptome of a biological sample. Many recent studies consider a problem of quantification and classification of unannotated splicing events which often occur due to the mutations caused by abnormal state of the organism, e.g. cancer. We will present another software tool, called fortuna, which brings together high accuracy and fast running times to the analysis of the alternative splicing events unlike any of the well established competitor tools.
D. Matijević, D. Ševerdija, S. Jelić, L. Borozan, Uparena optimizacijska metoda, Math.e : hrvatski matematički elektronski časopis 30/ (2016)

Abstract
U ovom članku analiziramo metode gradijentnog i zrcalnog spusta u području konveksne optimizacije s danim naglaskom na njihove brzine konvergencije. Nadalje, uparujući dvije spomenute metode dobivamo takozvanu uparenu metodu čija analiza konvergencije pokazuje ubrzanje u odnosu na gradijentnu i zrcalnu metodu, te bilo koju drugu nama poznatu metodu prvoga reda.
D. Matijević, S. Poljak, Fourierov red i Fourierova transformacija, Math.e : hrvatski matematički elektronski časopis 19/ (2011)

Abstract
U ovom radu objasniti ćemo kako svaku periodičku funkciju možemo napisati kao sumu (ne nužno konačnu) sinusa različitih amplituda, faza i frekvencija – Fourierov red. U drugom dijelu rada motivirati ćemo formulu za Fourierovu transformaciju preko Fourierova reda te objasniti formulu za diskretnu Fourierovu transformaciju.
D. Matijević, D. Ševerdija, Problem vidljivosti, Osječki matematički list (2010), prihvaćen za objavljivanje

Abstract
Za dvije točke kažemo da vide jedna drugu ukoliko ne postoji prepreka koja bi presjecala segment koji ih spaja. Na temelju geometrijskog modela predstaviti ćemo klasične probleme vidljivosti kao što su problem galerije, problem utvrde i problem čuvanja terena. Iznosimo osnovne rezultate vezane uz spomenute probleme i neke od varijacija tih problema.
S. Jelić, D. Matijević, Stage Illumination Problem (2009)

Abstract
Consider the following illumination problem: given a stage represented by a horizontal line segment and a set of light sources represented by a set of points in the plane above, assign powers to the light sources such that every point on the stage receives a sufficient amount (say one unit) of light while minimizing the overall power consumption. Under the assumption that the amount of light arriving from a fixed light source decreases rapidly with the distance from the light source, this becomes an interesting optimization problem. Two approximation algorithms based on linear programming are used in this Demonstration.

Books

D. Matijević, N. Truhar, Uvod u računarstvo, Odjel za matematiku, Sveučilište Josipa Jurja Strossmayera u Osijeku, Osijek, 2012.

Technical Reports

A. Descoeudres, T. Prusina, N. Schmidt, V. Hoan Do, S. Mages, J. Klughammer, D. Matijević, S. Canzar, An explanatory benchmark of spatial domain detection reveals key drivers of method performance (2026)

Abstract
The spatial organization of cells within tissues is critical for understanding biological function and disease, and spatial transcriptomics enables genome-wide mapping of this organization. Numerous computational methods aim to identify spatial domains, yet their performance is often evaluated on limited datasets, leading to conflicting conclusions. Here, we present an explanatory benchmark of 26 spatial domain detection methods across 63 tissue sections from six spatial transcriptomics technologies, supplemented by over 1,000 semi-synthetic datasets that systematically vary resolution, gene panel size, and tissue architecture. By jointly analyzing real and semi-synthetic data across this broad parameter space, our benchmark uncovers systematic performance differences and sources of variability that are obscured in standard evaluations. Although most spatial methods outperform non-spatial baselines, their performance depends strongly on data resolution and cellular heterogeneity. To enable systematic analysis beyond individual methods, we introduce a modular, plug-and-play benchmarking framework that facilitates method refinement and component exchange. Using this framework, an ablation study of neural network-based approaches shows that the choice of preprocessing and clustering often has a larger impact on performance than ar-chitectural novelty alone. Together, these results provide a principled foundation for informed method selection and offer guidance for the development of robust and scalable spatial domain detection tools as spatial transcriptomics technologies continue to advance.
L. Borozan, D. Matijević, ARDENA: SCENARIO-DRIVEN CONTROL OF REAL-TIME LLM AGENTS (2026)

Abstract
Abstract. Large language models (LLMs) have enabled increasingly capable conversational agents, but reliably controlling their behavior in real-time interactive environments remains a significant challenge. Existing approaches often rely on model fine-tuning or alignment procedures that are difficult to adapt to changing interaction requirements. This paper introduces layered scenario-driven LLM control, a framework that enables runtime behavior control through structured prompting. By combining persistent context with scenario-specific constraints, the approach allows agent behavior to be modified during interaction without changing the underlying model. The framework is implemented in ARDena, a real-time multimodal embodied agent that integrates speech interaction, visual perception, tool use, and avatar-based response generation. The proposed approach is evaluated with respect to control effectiveness, response latency, and operational stability. The results demonstrate that scenario definitions alone can produce substantially different interaction behaviors while maintaining stable real-time operation, highlighting the effectiveness of scenario-driven prompting for controlling LLM agents.

Projects

Advanced Algorithms and Optimization Models Supported by Mathematical Theory (OptimaAI)

(1 October 2025 – 30 September 2029); AI algorithms and optimization models in bioinformatics, natural language processing, visual tracking, and industrial optimization.
GPU based implementation for computing the solution to the Quasitriangular Matrix Equation

Project run in 2010/11 in collaboration with Prof. Ninoslav Truhar, Petar Taler and Zoran Tomljanović , Dept. of Mathematics, University of Osijek
Project was supported by NVIDIA Corporation through the Academic Partnership Program.
Fast and Efficient Kinetic Spanners

Project leader (on Croatian side): Domagoj Matijevic, Dept. of Mathematics, University of Osijek
Project leader (on German side) : Soeren Laue, Lehrstuhl fuer Theoretische Informatik II, University of Jena
Project was funded in 2010 by the German Academic Exchange Service and Croatian Ministry of Science, Education and Sports

Teaching

Konzultacije (Office Hours): Utorak (Tue) 10:00-11:00. Konzultacije su moguće i po dogovoru.

Teme za diplomske radove (Master thesis topics):

Teme se dogovaraju izravno na konzultacijama s nastavnikom.

Present and past courses

I am currently teaching:

I also taught:

Semantika programskih jezika (Semantics of programming languages – \lambda-calculus)
Složenost algoritama (Algorithms Complexity)
Uvod u računarstvo (Introduction to Computer Science)
Uvod u programiranje (Introduction to programming)
Programiranje i programsko inženjerstvo (Programming and software engineering)
Ugrađeni sustavi (Embedded Systems)
Klijentsko web programiranje (Client-side web programming)
Linearna optimizacija (Linear Optimization)
Izračunljiva geometrija (Computational Geometry)
Algoritmi i strukture podataka (Algorithms and Data Structures)
Operacijska istraživanja (Operational Research)
Računalne mreže i usluge (Data networks and services)
Modeliranje i dizajniranje baza podata (Modelling and designing of data-bases)

Sveučilišni prijediplomski studiji

Sveučilišni diplomski studiji

Ostalo

Diseminacija

Projekti

Časopisi

Podrška

Studentske sekcije