Cosmin E. Oancea

Associate Professor
HIPERFIT Centre, DIKU, Office 01-0-017
University of Copenhagen
Nørre Campus,
Universitetsparken 5,
DK-2100 Copenhagen Ø

Phone: +45 23 82 80 86

Research Interests

I have research interests in a variety of topics from the computer systems field, including programming language design and implementation, optimizing compilers for highly parallel systems, high-performance implementation of AI algorithms, parallel algorithms, memory management, and computer algebra.

2011-Present @DIKU, University of Copenhagen

Co-architected the core data-parallel language of Futhark and its optimizing compiler. Futhark is a purely functional, map-reduce language supporting (parallel) bulk operations on regular arrays. Futhark currently targets efficient execution on GPGPUs and seeks a common ground that combines the strengths of the functional and imperative approaches. Futhark has been used in interdisciplinary collaborations, e.g., in remote sensing, finance, and image processing, and has been the subject of a substantial number of MSc and BSc theses.

2009-2011 collaborating with Lawrence Rauchwerger at Parasol Lab, Texas A&M University

2007-2009 collaborating with Alan Mycroft at the Computer Laboratory, University of Cambridge

2000-2006 PhD studies, Advisor Stephen M. Watt, The University of Western Ontario

Selected PC Work

PC: SC’22, ICPP (’18, ’19, ’22), PPoPP (’17, ’19, ’21, ’23), EuroPar’20, PACT (’15, ’16, ’19, ’23), IPDPS (’18 Track Co-Chair, ’25), FHPC’17 (Co-Chair), ICS (’14, ’23). Local organizer of PLDI’24 in Copenhagen (with Fritz Henglein).

Academic Awards and Honors (Selected)

DIKU Teacher of the Year Award (2013 and 2015)

Selected Refereed Papers

  1. Cosmin E. Oancea and Stephen M. Watt. "GPU Implementations for Midsize Integer Addition and Multiplication". Accepted for publication in Springer LNCS; currently available on arXiv.

  2. Lotte M. Bruun, Ulrik S. Larsen, Nikolaj H. Hinnerskov and Cosmin E. Oancea. "Reverse-Mode AD of Multi-Reduce and Scan in Futhark", In Procs. of Symposium on Implementation and Application of Functional Languages (IFL'23), 2024. Publication rights licensed to ACM. doi, PDF (author copy)

  3. D. Serykh, S. Oehmcke, C. Oancea, D. Masiliunas, Jan Verbesselt, Y. Cheng, S. Horion, F. Gieseke and N. Hinnerskov. "Seasonal-Trend Time Series Decomposition on Graphics Processing Units", In Procs. of BigData, 2023.

  4. P. Munksgaard, C. E. Oancea and T. Henriksen. "Compiling a functional array language with non-semantic memory information", In Procs. of Symposium on Implementation and Application of Functional Languages (IFL’22), 2023. PDF

  5. R. Schenck, O. Rønning, T. Henriksen and C. E. Oancea. "AD for an Array Language with Nested Parallelism", In Procs. of Int. Conf. for High Performance Computing, Networking, Storage and Analysis (SC), 2022. PDF

  6. P. Munksgaard, T. Henriksen, P. Sadayappan and C. E. Oancea. "Memory Optimizations in an Array Language", In Procs. of Int. Conf. for High Performance Computing, Networking, Storage and Analysis (SC), 2022. PDF

  7. P. Munksgaard, S. Breddam, T. Henriksen, F. Gieseke and C. E. Oancea. "Dataset Sensitive Autotuning of Multi-Versioned Code based on Monotonic Properties", best paper award, 22nd Int. Symposium on Trends in Functional Programming (TFP), 2021. PDF

  8. W. Pawlak, M. Hlava, M. Metaksov and C. E. Oancea. "Acceleration of Lattice Models for Pricing Portfolios of Fixed-Income Derivatives", In Procs. of Int. Workshop on Libraries, Languages and Compilers for Array Programming (ARRAY), 2021. PDF

  9. C. E. Oancea, T. Robroek and F. Gieseke. "Approximate Nearest-Neighbour Fields via Massively-Parallel Propagation-Assisted KD Trees", IEEE Int. Conf. on Big Data, special track of Machine Learning and Big Data (MLDB), 2020. PDF

  10. T. Henriksen, S. Hellfritzsch, P. Sadayappan and C. E. Oancea. "Compiling Generalized Histograms for GPU", International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2020. PDF

  11. F. Gieseke, S. Rosca, T. Henriksen, J. Verbesselt and C. E. Oancea. "Massively-Parallel Change Detection for Satellite Time Series Data with Missing Values", IEEE 36th International Conference on Data Engineering (ICDE), pages 385-396, 2020. PDF

  12. W. Pawlak, M. Elsman and C. E. Oancea. "A Functional Approach to Accelerating Monte Carlo based American Option Pricing", 31st International Symposium on Implementation and Application of Functional Languages (IFL'19). Singapore. September, 2019. PDF

  13. T. Henriksen, F. Thorøe, M. Elsman and C. E. Oancea. "Incremental Flattening for Nested Data Parallelism", International Symposium on Principles and Practice of Parallel Programming (PPoPP), pp 53–67, Washington D.C., US, 2019. PDF

  14. M. Elsman, T. Henriksen, D. Annenkov and C. E. Oancea, "Static Interpretation of Higher-order Modules in Futhark: Functional GPU Programming in the Large", Proc. ACM Program. Lang. (ICFP’18), pp 97:1–97:30, St. Louis, US, 2018. PDF

  15. T. Henriksen, M. Elsman and C. E. Oancea. "Modular Acceleration: Tricky Cases of Functional High-Performance Computing", Procs. of Workshop on Functional High-Performance Computing (FHPC), St. Louis, US, 2018. PDF

  16. T. Henriksen, N. G. W. Serup, M. Elsman, F. Henglein and C. E. Oancea. "Futhark: Purely Functional GPU-programming with Nested Parallelism and In-place Array Updates", Int. Conf. on Programming Language Design and Implementation (PLDI), Barcelona, Spain, 2017. PDF

  17. F. Gieseke, C. E. Oancea and C. Igel. "Bufferkdtree: A Python library for massive nearest neighbor queries on multi-many-core devices", Knowledge-Based Systems Journal, 120:13, 2017.

  18. T. Henriksen, M. Dybdal, H. Urms, A. S. Kiehn, D. Gavin, H. Abelskov, M. Elsman and C. E. Oancea. "APL on GPUs: A TAIL from the Past, Scribbled in Futhark", 5th Int. Workshop on Functional High Performance Computing (FHPC), Nara, Japan, 2016. PDF

  19. T. Henriksen, K. F. Larsen and C. E. Oancea. "Design and GPGPU Performance of Futhark’s Redomap Construct", 3rd Int. Workshop on Libraries, Languages and Compilers for Array Programming (ARRAY), pp. 17-24, Santa Barbara, US, 2016. PDF

  20. C. Andreetta, V. Begot, J. Berthold, M. Elsman, F. Henglein, T. Henriksen, M. Nordfang and C. E. Oancea. "FinPar: A Parallel Financial Benchmark", ACM Journal Trans. Archit. Code Optim. (TACO), vol. 13(2), pp. 18.1–18.27, 2016.

  21. F. C. Gieseke, C. E. Oancea, A. Mahabal, C. Igel and T. Heskes. "Bigger buffer k-d trees on multi-many-core systems", Big Data Deep Learning in High Performance Computing, Springer, pp. 172-180, 2016.

  22. C. E. Oancea and L. Rauchwerger. "Scalable conditional-induction variables (CIV) analysis", 13th IEEE/ACM International Symposium on Code Generation and Optimization (CGO'15), San Francisco, USA, pp. 213-224, 2015. PDF

  23. Fabian C. Gieseke, Justin Heinermann, Cosmin E. Oancea and Christian Igel. "Buffer k-d trees: processing massive nearest neighbor queries on GPUs", 31st Int. Conf. on Machine Learning (ICML'14), Beijing, China, 2014. PDF

  24. Fabian C. Gieseke, Kai L. Polsterer, Cosmin E. Oancea and Christian Igel. "Speedy Greedy Feature Selection: Better Redshift Estimation via Massive Parallelism", 22nd European Symposium on Artificial Neural Networks (ESANN), pp. 87-92, Belgium, 2014.

  25. Troels Henriksen, Martin Elsman, and Cosmin E. Oancea. "A Hybrid Approach to Size Inference in Futhark", 3rd ACM-SIGPLAN Workshop on Functional High-Performance Computing (FHPC), Gothenburg, Sweden, September 2014. PDF

  26. Troels Henriksen and Cosmin E. Oancea. "Bounds Checking: An Instance of Hybrid Analysis", ACM SIGPLAN Int. Workshop on Libraries, Languages and Compilers for Array Programming (ARRAY). Edinburgh, UK, June 2014. PDF

  27. Troels Henriksen and Cosmin E. Oancea. "A T2 Graph-Reduction Approach To Fusion", 2nd ACM SIGPLAN Workshop on Functional High-Performance Computing. Boston, Massachusetts. September 2013. PDF

  28. Cosmin E. Oancea, Christian Andreetta, Jost Berthold, Alain Frisch, and Fritz Henglein. "Financial software on GPUs: between Haskell and Fortran", 1st ACM SIGPLAN Workshop on Functional High-Performance Computing (FHPC’12). Copenhagen 2012. PDF

  29. Cosmin E. Oancea and Lawrence Rauchwerger. "Logical Inference Techniques for Loop Parallelization", 33rd ACM-SIGPLAN Conf. on Prog. Lang. Design and Implem. (PLDI'12), pp 509-520, June 2012. PDF

  30. Cosmin E. Oancea and Lawrence Rauchwerger. "A Hybrid Approach to Proving Memory Reference Monotonicity", 24th Int. Workshop on Languages and Compilers for Parallel Computing (LCPC'11), LNCS, Vol 7146, pp 61-75, Sept 2013. PDF

  31. Cosmin E. Oancea and Stephen M. Watt. "An Architecture for Generic Extensions", Science of Computer Programming Journal, Elsevier, Vol 76(4), pp 258-277, doi:10.1016/j.scico.2009.09.008, 2011.

  32. Cosmin E. Oancea, Alan Mycroft and Tim Harris. "A Lightweight In-Place Implementation for Software Thread-Level Speculation", 21st ACM Symposium on Parallelism in Algorithms and Architectures (SPAA'09), August 2009, Calgary, Canada. PDF

  33. Cosmin E. Oancea, Alan Mycroft and Stephen M. Watt. "A New Approach to Parallelising Tracing Algorithms", International Symposium on Memory Management (ISMM'09), June 2009, Dublin, Ireland. PDF

  34. Cosmin E. Oancea and Alan Mycroft. "Set-Congruence Dynamic Analysis for Software Thread-Level Speculation", Languages and Compilers for Parallel Computing (LCPC'08) 21st Annual Workshop, August 2008, Edmonton, Canada. PDF

  35. Cosmin E. Oancea and Alan Mycroft. "Software Thread-Level Speculation: an Optimistic Library Implementation", International Workshop on Multicore Software Engineering (IWMSE'08), pp 23-32 (ACM Digital Library), May 2008, Leipzig, Germany. PDF

  36. Cosmin E. Oancea and Stephen M. Watt. "Generic Library Extension in a Heterogeneous Environment", Library-Centric Software Design LCSD'06, pp. 25-35, October 2006, Portland, USA. PDF

  37. Cosmin E. Oancea. "Parametric Polymorphism for Software Component Architectures and Related Optimizations", PhD Thesis, Department of Computer Science, The University of Western Ontario, Advisor: Stephen M. Watt, July 2005, London, ON, Canada. PDF

  38. Cosmin E. Oancea and Stephen M. Watt. "Parametric Polymorphism for Software Component Architectures", ACM Object-Oriented Programming, Systems, Languages and Applications OOPSLA'05, pp. 147 - 166, October 2005, San Diego, USA. PDF

  39. Cosmin E. Oancea and Stephen M. Watt. "Domains and Expressions: An Interface Between Two Approaches to Computer Algebra," ACM International Symposium on Symbolic and Algebraic Computation ISSAC'05, pp. 261 - 269, July 2005, Beijing, China. PDF

  40. Cosmin E. Oancea, Jason W. Selby, Mark Giesbrecht, and Stephen M. Watt. "Distributed Models of Thread Level Speculation", International Conference on Parallel and Distributed Processing Techniques and Applications PDPTA'05, pp. 920-927, June 2005, Las Vegas, USA. PDF

  41. Cosmin E. Oancea, Clare So, and Stephen M. Watt. "Generalization in Maple", Maple Conference, pp. 277-382, July 2005, Waterloo, Canada.

  42. Yannis Chicha, Michael Lloyd, Cosmin E. Oancea, and Stephen M. Watt. "Parametric Polymorphism for Computer Algebra Software Components", 6th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing SYNASC'04, pp. 119 - 130, Sept. 2004, Timisoara, Romania. PDF

  43. Cosmin E. Oancea and Stephen M. Watt. "A Framework for Using Aldor Libraries with Maple", EACA'04 satellite conference to the ISSAC'04, pp. 219 - 224, June 2004, Universidad de Santander, Spain. PDF

Selected Supervised Students

  • Troels Henriksen. "Exploiting functional invariants to optimise parallelism: a dataflow approach", MSc Thesis, Department of Computer Science (DIKU), University of Copenhagen, February 2014. PDF

  • Troels Henriksen. "Design and Implementation of the Futhark Programming Language", PhD Thesis, Department of Computer Science (DIKU), University of Copenhagen, December 2017. PDF

  • Niels G. W. Serup. "Memory Block Merging in Futhark", MSc Thesis, Department of Computer Science (DIKU), University of Copenhagen, November 2017. PDF

  • Niels G. W. Serup. "Extending Futhark with a write construct", MSc Project, Department of Computer Science (DIKU), University of Copenhagen, June 2016. PDF

  • Rasmus Wriedt Larsen. "Generating Efficient Code for Futhark's Segmented Redomap", MSc Thesis, Department of Computer Science (DIKU), University of Copenhagen, March 2017. PDF

  • Mette Marie Kowalski. "Designing and Accelerating a Generic FFT Library in Futhark", BSc Thesis, Department of Computer Science (DIKU), University of Copenhagen, June 2018. PDF

  • Mikkel Storgaard Knudsen. "FShark: Futhark programming in FSharp", MSc Thesis, Department of Computer Science (DIKU), University of Copenhagen, August 2018. PDF

  • Marek Hlava and Martin Metaksov. "Accelerated Interest Rate Option Pricing using Trinomial Trees", MSc Thesis, Department of Computer Science (DIKU), University of Copenhagen, August 2018. PDF

  • Many more; see Selected Student Projects on the Futhark webpage.

Teaching at DIKU

  • 2014-present: Programming Massively Parallel Hardware (PMPH), 7.5 ECTS, MSc elective course, attended by 41 students in the 2023-2024 edition. Lecture notes: "Lecture Notes for the Software Track of the PMPH Course", Cosmin Oancea, 2023.

  • 2019-present: Data-Parallel Programming (DPP), 7.5 ECTS, MSc elective course, attended by 20 students in the 2023-2024 edition.

  • 2021-present: Grundlaeggende Datalogi (Fundamentals of Computer Science), 15 ECTS, 1st year BSc mandatory course in the KOM-IT department.

  • 2012-2019: Implementation of Programming Languages (IPS), 7.5 ECTS, 2nd year BSc mandatory course.

    2009-2011 @Parasol Laboratory, Department of Computer Science and Engineering, Texas A&M University

    In the imperative context, purely static analysis has been successfully applied to (auto-)parallelizing loops with simple control flow and affine array indexing, but is often conservative when these assumptions do not hold. On the other hand, purely dynamic analysis of memory-reference traces, e.g., thread-level speculation, can be applied aggressively, but incurs significant (and often non-scalable) overheads.

    In collaboration with Prof. Lawrence Rauchwerger, I have studied how to unify static and dynamic analysis into a hybrid compiler framework for the Fortran77 language, which succeeds in detecting and optimizing parallelism for a large class of difficult loops at negligible (or in the worst case scalable) runtime overheads.

    The idea has been to use static, interprocedural analysis (i) to model loop parallelism as an equation on abstract sets (of array references), and (ii) to extract from it a cascade of sufficient conditions for parallelism, of increasing time complexity, that are verified at runtime until one succeeds. An evaluation of ~30 benchmarks from the SPEC and PERFECT-Club suites (~1000 loops) demonstrates that the approach is viable and outperforms commercial compilers by a significant margin.
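    The cascade idea can be illustrated with a minimal Python sketch. It is not the Fortran77 framework itself, which derives its conditions statically from set equations over array references; here the loop `A[w[i]] = A[r[i]] + 1`, the two predicates, and all function names are hypothetical, chosen only to show a cheap sufficient test tried before a costlier exact one, with sequential execution as the fallback.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple

# Hypothetical loop under analysis:   for i in range(n): A[w[i]] = A[r[i]] + 1
# Its iterations are independent iff, for all i != j:
#   w[i] != w[j]   (no output dependence) and
#   w[i] != r[j]   (no flow/anti dependence).


def cheap_test(w: Sequence[int], r: Sequence[int]) -> bool:
    """O(n) time, O(1) extra space; sufficient but not necessary.
    Writes strictly increasing (hence pairwise distinct) and the write
    interval does not overlap the read interval."""
    if len(w) == 0:
        return True
    if any(w[k] >= w[k + 1] for k in range(len(w) - 1)):
        return False
    return max(r) < w[0] or min(r) > w[-1]


def exact_test(w: Sequence[int], r: Sequence[int]) -> bool:
    """O(n log n) time, O(n) space; checks the independence condition exactly."""
    pairs = sorted((loc, i) for i, loc in enumerate(w))
    locs = [loc for loc, _ in pairs]
    if any(locs[k] == locs[k + 1] for k in range(len(locs) - 1)):
        return False  # two iterations write the same element
    for j, loc in enumerate(r):
        k = bisect_left(locs, loc)
        if k < len(locs) and locs[k] == loc and pairs[k][1] != j:
            return False  # iteration j reads an element written by another iteration
    return True


# Predicates ordered by increasing cost; the first one that holds wins.
CASCADE = [("cheap", cheap_test), ("exact", exact_test)]


def run_loop(A: List[int], w: Sequence[int], r: Sequence[int]) -> Tuple[str, List[int]]:
    out = list(A)
    for name, test in CASCADE:
        if test(w, r):
            # Independence proven: any schedule is legal. Doing all reads
            # before all writes stands in for a parallel execution here.
            vals = [A[j] + 1 for j in r]
            for loc, v in zip(w, vals):
                out[loc] = v
            return name, out
    # No predicate succeeded: run the original sequential loop.
    for wi, ri in zip(w, r):
        out[wi] = out[ri] + 1
    return "sequential", out
```

    For instance, with w = [1, 2, 3] and r = [0, 1, 2] both predicates fail (iteration 1 reads the element written by iteration 0), so the loop runs sequentially and an all-zero array of length 8 becomes [0, 1, 2, 3, 0, 0, 0, 0].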

    2007-2009 @Computer Laboratory, University of Cambridge

    In collaboration with Prof. Alan Mycroft, I have studied topics related to dynamic extraction and optimization of parallelism. In particular:

    2000-2006 @the Computer Science Department, The University of Western Ontario, ORCCA Lab

    As a PhD student, advised by Prof. Stephen M. Watt, I studied various aspects of language interoperability. The original motivation for my research was the observation that, although parametric polymorphism was already mainstream, the software component architectures of that time, such as CORBA, JNI, and DCOM, lagged behind common programming practice and did not support parametric polymorphism across language boundaries.

    In this context, I investigated how to reconcile different binding times and parametrization semantics in a range of representative languages, such as C++, Java, Aldor, and Maple, and identified a common ground that could be suitably mapped to different language bindings. This work resulted in two frameworks: Alma, which allows interoperability between two very different computer algebra systems (Aldor and Maple), and GIDL (Generic Interface Definition Language), a more systematic solution for parametric polymorphism, designed as a generic extension framework that can easily be adapted to work on top of various component architectures.