Olli Saarikivi / CV

Experience

Microsoft AI, Redmond, USA

   
Member of Technical Staff 22 Jan 2024 – present

Pretraining team focusing on synthetic data.

Microsoft Research, Redmond, USA

   
Senior Researcher 18 Sept 2020 – 21 Jan 2024
Postdoctoral Researcher 2 Apr 2018 – 17 Sept 2020
Research Intern 6 June 2016 – 9 Sept 2016
Research Intern 2 Feb 2015 – 8 May 2015

Joined Sébastien Bubeck’s Physics of AGI team in mid-2023 to build the Phi small language models. I worked on synthetic data generation and LLM-based data filtering, ran ablations, helped organize early coding-capability efforts, added necessary features to our PyTorch inference stack, and worked to ensure our evaluations were uncontaminated.

Built MSCCL, a programmable communication library for GPUs, which delivered significant speedups for machine learning workloads both internally and at a key partner. I wrote compilers, led the language design, and built algorithm synthesizers.

Designed a novel derivative-based regex matcher and shipped it as the first guaranteed-linear-time engine in .NET 7. Partnered with CredScan to help them achieve a double-digit end-to-end throughput improvement.

Partnered with AzureML to ship a novel gradient aggregation technique for massively distributed ML training into Uber’s Horovod training library.

Led the CHET/EVA compiler projects to make homomorphic encryption accessible to non-experts.

Developed a stream comprehension compiler and adapted the approach to improve query compilation in the SCOPE language.

Aalto University, Espoo, Finland

   
Doctoral Student 1 Sept 2013 – 31 Mar 2018
Research Assistant 1 June 2010 – 31 Aug 2013
Research Assistant 1 June 2009 – 31 Aug 2009

Extended a Java dynamic symbolic execution tool to support multi-threaded programs.

Developed a verification tool for C programs on LLVM and participated in SV-COMP.

Optofidelity Ltd.

   
Trainee 17 June 2008 – 29 June 2008
Trainee 5 June 2007 – 24 Aug 2007

Nokia Research Center

   
Trainee 26 June 2006 – 28 July 2006
Trainee 1 Sept 2004 – 31 May 2006

Education

Aalto University, Department of Computer Science

   
Doctor of Science (Technology) Sept 2013 – Mar 2018
Master of Science (Technology) Sept 2011 – Aug 2013

Helsinki University, Department of Computer Science

   
Bachelor of Science Sept 2007 – Apr 2011

Publications

Marah Abdin, Sam Jacobs, Ammar Awan, Jyoti Aneja, Ahmed Awadallah, Hany Awadalla, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Harkirat Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Martin Cai, Caio Teodoro Mendes, Weizhu Chen, Vishrav Chaudhary, Parul Chopra, Allie Del Giorno, Gustavo de Rosa, Matthew Dixon, Ronen Eldan, Dan Iter, Amit Garg, Abhishek Goswami, Suriya Gunasekar, Emman Haider, Junheng Hao, Russell Hewett, Jamie Huynh, Mojan Javaheripi, Xin Jin, Piero Kauffmann, Nikos Karampatziakis, Dongwoo Kim, Mahmoud Khademi, Lev Kurilenko, James Lee, Yin Tat Lee, Yuanzhi Li, Chen Liang, Weishung Liu, Eric Lin, Zeqi Lin, Piyush Madan, Arindam Mitra, Hardik Modi, Anh Nguyen, Brandon Norick, Barun Patra, Daniel Perez-Becker, Thomas Portet, Reid Pryzant, Heyang Qin, Marko Radmilac, Corby Rosset, Sambudha Roy, Olatunji Ruwase, Olli Saarikivi, Amin Saied, Adil Salim, Michael Santacroce, Shital Shah, Ning Shang, Hiteshi Sharma, Xia Song, Masahiro Tanaka, Xin Wang, Rachel Ward, Guanhua Wang, Philipp Witte, Michael Wyatt, Can Xu, Jiahang Xu, Sonali Yadav, Fan Yang, Ziyi Yang, Donghan Yu, Chengruidong Zhang, Cyril Zhang, Jianwen Zhang, Li Lyna Zhang, Yi Zhang, Yue Zhang, Yunan Zhang, Xiren Zhou. Phi-3 technical report: a highly capable language model locally on your phone. Whitepaper, 2024. arXiv

Suriya Gunasekar, Yi Zhang, Jyoti Aneja, Caio César Teodoro Mendes, Allie Del Giorno, Sivakanth Gopi, Mojan Javaheripi, Piero Kauffmann, Gustavo de Rosa, Olli Saarikivi, Adil Salim, Shital Shah, Harkirat Singh Behl, Xin Wang, Sébastien Bubeck, Ronen Eldan, Adam Tauman Kalai, Yin Tat Lee, Yuanzhi Li. Textbooks are all you need. Whitepaper, 2023. arXiv

Margus Veanes, Thomas Ball, Gabriel Ebner, Olli Saarikivi. Symbolic automata: ω-regularity modulo theories. Preprint, 2023. arXiv

Olli Saarikivi, Margus Veanes, Stephen Toub, Daniel Moseley, Jose Perez Rodriguez. Finite automaton construction using regular expression derivatives to simulate behavior of a backtracking engine. US Patent 11,983,223, 2024. Google Patents

Abhinav Jangda, Saeed Maleki, Maryam Mehri Dehnavi, Madan Musuvathi, Olli Saarikivi. A framework for fine-grained synchronization of dependent GPU kernels. CGO 2024. DOI arXiv

Zhiqi Lin, Youshan Miao, Guanbin Xu, Cheng Li, Olli Saarikivi, Saeed Maleki, Fan Yang. Tessel: boosting distributed execution of large DNN models via flexible schedule search. HPCA 2024. DOI arXiv

Dan Moseley, Mario Nishio, Jose Perez Rodriguez, Olli Saarikivi, Stephen Toub, Margus Veanes, Tiki Wan, Eric Xu. Derivative based nonbacktracking real-world regex matching with backtracking semantics. PLDI 2023. DOI

Meghan Cowan, Saeed Maleki, Madanlal Musuvathi, Olli Saarikivi, Yifan Xiong. MSCCLang: Microsoft collective communication language. ASPLOS 2023. DOI arXiv

Aashaka Shah, Vijay Chidambaram, Meghan Cowan, Saeed Maleki, Madan Musuvathi, Todd Mytkowicz, Jacob Nelson, Olli Saarikivi, Rachee Singh. TACCL: guiding collective algorithm synthesis using communication sketches. NSDI 2023. USENIX arXiv

Abhinav Jangda, Jun Huang, Guodong Liu, Amir Hossein Nodehi Sabet, Saeed Maleki, Youshan Miao, Madanlal Musuvathi, Todd Mytkowicz, Olli Saarikivi. Breaking the computation and communication abstraction barrier in distributed machine learning workloads. ASPLOS 2022. DOI arXiv

Zixian Cai, Zhengyang Liu, Saeed Maleki, Madanlal Musuvathi, Todd Mytkowicz, Jacob Nelson, Olli Saarikivi. Synthesizing optimal collective algorithms. PPoPP 2021. DOI arXiv

Gurbinder Gill, Roshan Dathathri, Saeed Maleki, Madan Musuvathi, Todd Mytkowicz, Olli Saarikivi. Distributed training of embeddings using graph analytics. IPDPS 2021. DOI arXiv

Sangeeta Chowdhary, Wei Dai, Kim Laine, Olli Saarikivi. EVA improved: compiler and extension library for CKKS. WAHC 2021. DOI

Madanlal Musuvathi, Kim Laine, Kristin Lauter, Hao Chen, Olli Saarikivi, Saeed Maleki, Roshan Dathathri, Todd Mytkowicz. Homomorphic evaluation of tensor programs. US Patent 11,177,935, 2021. Google Patents

Saeed Maleki, Madan Musuvathi, Todd Mytkowicz, Olli Saarikivi, Tianju Xu, Vadim Eksarevskiy, Jaliya Ekanayake, Emad Barsoum. Scaling distributed training with adaptive summation. MLSys 2021. PDF arXiv

Lenka Turoňová, Lukáš Holík, Ondřej Lengál, Olli Saarikivi, Margus Veanes, Tomáš Vojnar. Regex matching with counting-set automata. OOPSLA 2020. DOI

Roshan Dathathri, Blagovesta Kostova, Olli Saarikivi, Wei Dai, Kim Laine, Madan Musuvathi. EVA: an encrypted vector arithmetic language and compiler for efficient homomorphic computation. PLDI 2020. DOI

Lukáš Holík, Ondřej Lengál, Olli Saarikivi, Lenka Turoňová, Margus Veanes, Tomáš Vojnar. Succinct determinisation of counting automata via sphere construction. APLAS 2019. DOI arXiv

Roshan Dathathri, Olli Saarikivi, Hao Chen, Kim Laine, Kristin Lauter, Saeed Maleki, Madanlal Musuvathi, Todd Mytkowicz. CHET: an optimizing compiler for fully-homomorphic neural-network inferencing. PLDI 2019. DOI arXiv

Olli Saarikivi, Margus Veanes, Tiki Wan, Eric Xu. Symbolic regex matcher. TACAS 2019. DOI

Olli Saarikivi, Margus Veanes. Minimization of symbolic transducers. CAV 2017. DOI

Olli Saarikivi, Margus Veanes, Todd Mytkowicz, Madan Musuvathi. Fusing effectful comprehensions. PLDI 2017. DOI

Olli Saarikivi, Margus Veanes. Translating C# to branching symbolic transducers. LPAR 2017 Short Presentations. PDF

Olli Saarikivi, Hernán Ponce de León, Kari Kähkönen, Keijo Heljanko, Javier Esparza. Minimizing test suites with unfoldings of multithreaded programs. TECS 16(2), 2017. DOI

Olli Saarikivi, Keijo Heljanko. LCTD: Tests-guided proofs for C programs on LLVM (competition contribution). TACAS 2016. DOI

Olli Saarikivi, Keijo Heljanko. LCTD: Test-guided proofs for C programs on LLVM. JLAMP 85(6), 2016. DOI

Kari Kähkönen, Olli Saarikivi, Keijo Heljanko. Unfolding based automated testing of multithreaded programs. ASE 22(4), 2015. DOI

Hernán Ponce de León, Olli Saarikivi, Kari Kähkönen, Keijo Heljanko, Javier Esparza. Unfolding based minimal test suites for testing multithreaded programs. ACSD 2015. DOI

Olli Saarikivi, Keijo Heljanko. Reporting races in dynamic partial order reduction. NFM 2015. DOI

Olli Saarikivi. Test-guided proofs for C programs on LLVM. Master’s Thesis, Aalto University, Finland, 2013. PDF

Kari Kähkönen, Olli Saarikivi, Keijo Heljanko. LCT: A parallel distributed testing tool for multithreaded Java programs. PDMC 2012. DOI

Kari Kähkönen, Olli Saarikivi, Keijo Heljanko. Using unfoldings in automated testing of multithreaded programs. ASE 2012. DOI

Olli Saarikivi, Kari Kähkönen, Keijo Heljanko. Improving dynamic partial order reductions for concolic testing. ACSD 2012. DOI

Kari Kähkönen, Tuomas Launiainen, Olli Saarikivi, Janne Kauttio, Keijo Heljanko, Ilkka Niemelä. LCT: An open source concolic testing tool for Java programs. BYTECODE 2011. PDF