Luka Borozan
Teaching Assistant, Department of Mathematics, Josip Juraj Strossmayer University of Osijek, Trg Ljudevita Gaja 6, Osijek, HR-31000, Croatia

Research Interests
 Computational molecular biology
 Discrete and convex optimization
 Linear programming
Degrees
 PhD in Mathematics, Faculty of Science - Department of Mathematics, University of Zagreb, Croatia, 2021.
 MSc in Mathematics and Computer Science, Department of Mathematics, University of Osijek, Croatia, 2015.
 BSc in Mathematics, Department of Mathematics, University of Osijek, Croatia, 2013.
Publications
Preprints
 Compressing Sentence Representation with Maximum Coding Rate Reduction
Ševerdija D, Prusina T, Jovanović A, Borozan L, Maltar J, Matijević D.
arXiv, 2023 (accepted for publication at MIPRO 2023).
Journal Publications
 L. Borozan, F. Rojas Ringeling, S. Kao, E. Nikonova, P. Monteagudo, D. Matijević, M.L. Spletter, S. Canzar, Counting pseudoalignments to novel splicing events, Bioinformatics 39/7 (2023)
Motivation: Alternative splicing (AS) of introns from pre-mRNA produces diverse sets of transcripts across cell types and tissues, but is also dysregulated in many diseases. Alignment-free computational methods have greatly accelerated the quantification of mRNA transcripts from short RNA-seq reads, but they inherently rely on a catalog of known transcripts and might miss novel, disease-specific splicing events. By contrast, alignment of reads to the genome can effectively identify novel exonic segments and introns. Event-based methods then count how many reads align to predefined features. However, an alignment is more expensive to compute and constitutes a bottleneck in many AS analysis methods. Results: Here, we propose fortuna, a method that guesses novel combinations of annotated splice sites to create transcript fragments. It then pseudoaligns reads to fragments using kallisto and efficiently derives counts of the most elementary splicing units from kallisto’s equivalence classes. These counts can be directly used for AS analysis or summarized to larger units as used by other widely applied methods. In experiments on synthetic and real data, fortuna was around 7× faster than traditional align-and-count approaches, and was able to analyze almost 300 million reads in just 15 min when using four threads. It mapped reads containing mismatches more accurately across novel junctions and found more reads supporting aberrant splicing events in patients with autism spectrum disorder than existing methods. We further used fortuna to identify novel, tissue-specific splicing events in Drosophila. Availability and implementation: fortuna source code is available at https://github.com/canzarlab/fortuna.
 Đ. Borozan, L. Borozan, Analyzing total-factor energy efficiency in Croatian counties: evidence from a non-parametric approach, Central European Journal of Operations Research 26/3 (2018), 673–694
Using energy efficiently has become a top-priority concern, which requires an adequate policy reaction bearing in mind both energy conservation and efforts to combat adverse climate changes. The paper explored the total-factor energy efficiency and change trends in technical efficiency in the Croatian counties during the period 2001–2013. Employing data envelopment analysis, the overall technical, pure technical and scale efficiency are assessed. Considering the empirical results, we have concluded the following. Technical inefficiency is generated almost equally by the pure technical effect and an incorrect production scale. The overall geographical distribution of the technical efficiency scores points to the presence of spatial concentration, i.e., a dualistic pattern (centre vs. periphery) in the production process. The differences between the best practice and the worst technical efficiency scores indicate the presence of significant disparities among Croatian counties. The years with deteriorating electricity efficiency seem to coincide with the important economic/energy changes that happened in Croatia. Finally, subnational governments may play an important role in energy efficiency policies.
 D. Marković, L. Borozan, On Parameter Estimation by Nonlinear Least Squares in Some Special Two-Parameter Exponential Type Models, Applied Mathematics & Information Sciences 9/6 (2015), 2925–2931
Two-parameter growth models of exponential type f(t; a, b) = g(t) exp(a + b h(t)), where a and b are unknown parameters and g and h are some known functions, are frequently employed in many different areas such as biology, finance, statistics, medicine, etc. The unknown parameters must be estimated from the data (w_i, t_i, y_i), i = 1,...,n, where t_i denote the values of the independent variable, y_i are the respective estimates of the regression function f, and w_i > 0 are some data weights. A very popular and widely used method for parameter estimation is the method of least squares. In practice, to avoid using nonlinear regression, problems of this kind are commonly transformed into linear ones, which is not statistically justified. In this paper we show that for strictly positive g and strictly monotone h the original nonlinear problem has a solution. A generalization in the l_p norm (1 ≤ p < ∞) and some illustrative examples are also given.
Refereed Proceedings
 D. Ševerdija, T. Prusina, A. Jovanović, L. Borozan, J. Maltar, D. Matijević, Compressing Sentence Representation with Maximum Coding Rate Reduction (Best paper award in AIS - Artificial Intelligence Systems track), ICT and Electronics Convention (MIPRO), 2023 46th MIPRO, Opatija, Croatia, 2023
In most natural language inference problems, sentence representation is needed for semantic retrieval tasks. In recent years, pre-trained large language models have been quite effective for computing such representations. These models produce high-dimensional sentence embeddings. An evident performance gap between large and small models exists in practice. Hence, due to space and time hardware limitations, there is a need to attain comparable results when using the smaller model, which is usually a distilled version of the large language model. In this paper, we assess the model distillation of the sentence representation model Sentence-BERT by augmenting the pre-trained distilled model with a projection layer additionally learned on the Maximum Coding Rate Reduction (MCR2) objective, a novel approach developed for general-purpose manifold clustering. We demonstrate that the new language model with reduced complexity and sentence embedding size can achieve comparable results on semantic retrieval benchmarks.
 B. Borozan, L. Borozan, D. Ševerdija, D. Matijević, S. Canzar, Fortuna Detects Novel Splicing in Drosophila scRNA-Seq Data, ICT and Electronics Convention (MIPRO), 2023 46th MIPRO, Opatija, Croatia, 2023, 410–415
Recent developments in single-cell RNA sequencing techniques (scRNA-Seq) have made large quantities of sequenced data available across numerous species and tissues. Alternative splicing (AS) of pre-mRNA introns varies between tissues and even between cell types and can be altered in disease. Novel AS has been studied extensively for many years using standard RNA-Seq data, while similar work on scRNA-Seq data has been scarce, despite its potential to offer a broader insight into cell-type-specific processes. In this paper, we propose a novel pipeline that uses fortuna, a method that efficiently classifies and quantifies novel AS events, to process scRNA-Seq samples. Due to its short lifespan, high number of progeny, low maintenance cost, and intricate alternative splicing patterns similar in complexity to those of mammals, Drosophila melanogaster (fruit fly) is a species of particular interest to researchers. Therefore, we experimentally evaluate our pipeline on real-world Drosophila single-cell data samples from the Fly Cell Atlas.
 V. Hoan Do, M. Blažević, P. Monteagudo, L. Borozan, K. Elbassioni, S. Laue, F. Rojas Ringeling, D. Matijević, S. Canzar, Dynamic pseudo-time warping of complex single-cell trajectories, 23rd Annual International Conference on Research in Computational Molecular Biology, The George Washington University, 2019, 294–297
Single-cell RNA sequencing enables the construction of trajectories describing the dynamic changes in gene expression underlying biological processes such as cell differentiation and development. The comparison of single-cell trajectories under two distinct conditions can illuminate the differences and similarities between the two and can thus be a powerful tool. Recently developed methods for the comparison of trajectories rely on the concept of dynamic time warping (dtw), which was originally proposed for the comparison of two time series. Consequently, these methods are restricted to simple, linear trajectories. Here, we adopt and theoretically link arboreal matchings to dtw and propose an algorithm to compare complex trajectories that more realistically contain branching points that divert cells into different fates. We implement a suite of exact and heuristic algorithms suitable for the comparison of trajectories of different characteristics in our tool Trajan. Trajan automatically pairs similar biological processes between conditions and aligns them in a globally consistent manner. In an alignment of single-cell trajectories describing human muscle differentiation and myogenic reprogramming, Trajan identifies and aligns the core paths without prior information. From Trajan’s alignment, we are able to reproduce recently reported barriers to reprogramming. In a perturbation experiment, we demonstrate the benefits in terms of robustness and accuracy of our model which compares entire trajectories at once, as opposed to a pairwise application of dtw. Trajan is available at https://github.com/canzarlab/Trajan.
 L. Borozan, D. Matijević, S. Canzar, Properties of the generalized Robinson-Foulds metric, 42nd International Convention - MIPRO 2019, Opatija, 2019, 330–335
Comparing hierarchical structures is a problem with many applications in various fields of biology. In this work we address the problem of comparing phylogenetic trees and quantifying their dissimilarities. The most commonly applied measure of similarity between phylogenetic trees is the Robinson-Foulds (RF) metric. The Jaccard-Robinson-Foulds (JRF) metric (of order k) has been recently proposed as a generalization of the RF metric that preserves its widely appreciated properties but increases its resolution and robustness. Here, we conduct a thorough experimental analysis of the JRF metric and variations thereof on both real-world and simulated data. Our main aim is to deepen the understanding of the properties of this generalized RF metric in comparison to the classical RF metric and other matching-based distance measures. To compute the JRF distance between trees, we employ the recently proposed branch-and-cut solver Trajan.
 Đ. Borozan, L. Borozan, The stationarity of per capita electricity consumption in Croatia allowing for structural break(s), 13th International Symposium on Operational Research, Bled, Slovenia, 2015, 337–342
Understanding the stationarity properties of electricity consumption provides valuable insights for energy policymakers and practitioners. The paper examines the unit root properties of per capita electricity consumption for Croatian counties using the panel unit root tests with structural break(s) during the period 2001–2013. The results indicate that the series of most counties are non-stationary processes, and that statistically significant structural break(s) happened only in a few of them. Hence, the impacts of shocks on per capita electricity consumption are permanent and have a long memory for a majority of them. Moreover, their behaviors are path-dependent.
Others
 L. Borozan, D. Matijević, S. Canzar, Combinatorial optimization algorithms for (pseudo)alignment in bioinformatics (2021)
Bioinformatics is a fast-growing interdisciplinary field with a strong contribution from mathematics and computer science. This thesis will deal with mathematical problems and algorithmic challenges from that field. Its first focus will be the comparison of hierarchical structures, mainly phylogenetic trees, which is used to explain various biological processes such as the evolution of species. We will study mathematical models and algorithmic techniques which quantify the distance between such structures as a means of determining the similarities or dissimilarities between them. The focus will be on formulating the problem as a matching problem in the context of integer linear programming. Our goal will be to find a novel solution which respects the ancestry relations defined by those hierarchical structures, an aspect often overlooked in current research. Our main result will be given in the form of a software tool, Trajan, which will be tested on both real-world and simulated data. The second focus of the thesis will come from the problem of sequencing the RNA molecule. It is a combinatorial process of reconstruction of the RNA molecule from short nucleotide sequences which is used to analyze the transcriptome of a biological sample. Many recent studies consider the problem of quantification and classification of unannotated splicing events, which often occur due to mutations caused by an abnormal state of the organism, e.g. cancer. We will present another software tool, called fortuna, which brings together high accuracy and fast running times in the analysis of alternative splicing events, unlike any of the well-established competitor tools.
 D. Matijević, D. Ševerdija, S. Jelić, L. Borozan, Uparena optimizacijska metoda [A coupled optimization method], Math.e : hrvatski matematički elektronski časopis 30 (2016)
In this article we analyze the gradient descent and mirror descent methods in convex optimization, with emphasis on their convergence rates. Furthermore, by coupling the two methods we obtain a so-called coupled method, whose convergence analysis shows an acceleration over the gradient and mirror methods, as well as over any other first-order method known to us.
Technical Reports
 T. Prusina, D. Matijević, L. Borozan, J. Maltar, A. Jovanović, Compressing Sentence Representation with Maximum Coding Rate Reduction (2023)
In most natural language inference problems, sentence representation is needed for semantic retrieval tasks. In recent years, pre-trained large language models have been quite effective for computing such representations. These models produce high-dimensional sentence embeddings. An evident performance gap between large and small models exists in practice. Hence, due to space and time hardware limitations, there is a need to attain comparable results when using the smaller model, which is usually a distilled version of the large language model. In this paper, we assess the model distillation of the sentence representation model Sentence-BERT by augmenting the pre-trained distilled model with a projection layer additionally learned on the Maximum Coding Rate Reduction (MCR2) objective, a novel approach developed for general-purpose manifold clustering. We demonstrate that the new language model with reduced complexity and sentence embedding size can achieve comparable results on semantic retrieval benchmarks.
Projects
 Development of an interactive virtual environment (Department of Mathematics, University of Osijek - University of Osijek), project leader: Luka Borozan, 1.5.2021 – 1.5.2025
 Application of optimization methods in biomedicine (Department of Mathematics, University of Osijek - Ministry of Science and Education, Programme of Scientific and Technological Cooperation between the Republic of Croatia and the Republic of Serbia), project leaders: Slobodan Jelić, Dušan Jakovetić, 1.1.2019 – 1.7.2022
 The parameter estimation problem in some two-parameter monotone mathematical models (Department of Mathematics, University of Osijek - University of Osijek), project leader: Darija Marković, 25.9.2013 – 24.9.2014
Professional Activities
Teaching
Functional Programming
Modern Computer Systems
Operating Systems
Semantics of Programming Languages
Mathematical Logic in Computer Science
Previous years:
Linear Algebra II
Mathematics (PFOS)
Integral Calculus
Mathematics I (GFOS)
Applications of Differential and Integral Calculus II
Introduction to Measure Theory
Introduction to Integration Theory
Vector Spaces
Elementary Mathematics
Introduction to Data Structures and Algorithms
Functional Programming
Bioinformatics
Office Hours: By appointment via email.