{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# The Julia programming language\n", "\n", "### A look at arrays and Trilinos integration\n", "\n", "
\n", "### FOSDEM 2018\n", "
\n", "Bart Janssens" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Outline\n", "* A bit about Julia\n", "\n", "* Then something about integration\n", "\n", "\n", " \n", " \n", "\n", "
\n", "* Notebook available at https://github.com/barche/fosdem2018\n", "* Run on https://juliabox.com\n", " \n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# About Julia\n", "## Why another language?\n", "* Solve the \"two language problem\":\n", " - Prototype in a simple language\n", " - Write production code in a fast language" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## So what is it?\n", "* High-level programming language for scientific computing\n", "* Dynamic language\n", "* Strongly typed, with user-defined types and generics\n", "* JIT-compiled using LLVM\n", "* Native interface to C\n", "* Central concept: **multiple dispatch**" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## A simple function" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "hideOutput": true, "hidePrompt": true }, "outputs": [ { "data": { "text/plain": [ "add (generic function with 1 method)" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "function add(a,b)\n", " return a + b\n", "end" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Shorter version:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "add (generic function with 1 method)" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "add(a,b) = a+b" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "3" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "add(1,2)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "3.0" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "add(1.0, 2.0)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Where are the types?" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Int64" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "typeof(1)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "Int64" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "typeof(add(1,2))" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "slideshow": { "slide_type": "fragment" } }, "outputs": [ { "data": { "text/plain": [ "Float64" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "typeof(add(1.0,2))" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## A look at the assembly\n", "Each combination of types for the arguments compiles a different version of the code:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\t.text\n", "Filename: In[2]\n", "\tpushq\t%rbp\n", "\tmovq\t%rsp, %rbp\n", "Source line: 1\n", "\tleaq\t(%rdi,%rsi), %rax\n", "\tpopq\t%rbp\n", "\tretq\n", "\tnopw\t(%rax,%rax)\n" ] } ], "source": [ "@code_native add(1,2)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\t.text\n", "Filename: In[2]\n", "\tpushq\t%rbp\n", "\tmovq\t%rsp, %rbp\n", "Source line: 1\n", "\tvaddsd\t%xmm1, %xmm0, %xmm0\n", "\tpopq\t%rbp\n", "\tretq\n", "\tnopw\t(%rax,%rax)\n" ] } ], "source": [ "@code_native add(1.0,2.0)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Equivalence with C++\n", "The Julia function `add(a,b) = a+b` is equivalent to the following C++ function:\n", "\n", "```c++\n", "template\n", "auto add(A a, B b) -> decltype(a + b)\n", "{\n", " return a + b;\n", "}\n", "```\n", "\n", "* Template function valid for any type `A` and `B`\n", "* C++ compiles a new version for every combination of types\n", "* Automatic (static) computation of the return type (`decltype` annotation)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Multiple dispatch\n", "* In Julia, not writing types means any type can be used\n", "* It still tracks the types and computes the correct return type\n", "* Each combination of argument types yields a newly compiled version of the function\n", "* Type computation can happen dynamically (at runtime) or statically (during JIT compilation)\n", "* The static type computation is equivalent to the C++ version\n", "* Deciding which function to call is **multiple dispatch** (static or dynamic)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Types" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "# Define a type\n", "struct MyNumber\n", " n::Int\n", "end" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "MyNumber(2)" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Create an instance\n", "const mynum = MyNumber(2)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## What about our `add` function?" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "ename": "LoadError", "evalue": "\u001b[91mMethodError: no method matching +(::MyNumber, ::Int64)\u001b[0m\nClosest candidates are:\n +(::Any, ::Any, \u001b[91m::Any\u001b[39m, \u001b[91m::Any...\u001b[39m) at operators.jl:424\n +(\u001b[91m::Complex{Bool}\u001b[39m, ::Real) at complex.jl:247\n +(\u001b[91m::Char\u001b[39m, ::Integer) at char.jl:40\n ...\u001b[39m", "output_type": "error", "traceback": [ "\u001b[91mMethodError: no method matching +(::MyNumber, ::Int64)\u001b[0m\nClosest candidates are:\n +(::Any, ::Any, \u001b[91m::Any\u001b[39m, \u001b[91m::Any...\u001b[39m) at operators.jl:424\n +(\u001b[91m::Complex{Bool}\u001b[39m, ::Real) at complex.jl:247\n +(\u001b[91m::Char\u001b[39m, ::Integer) at char.jl:40\n ...\u001b[39m", "", "Stacktrace:", " [1] \u001b[1madd\u001b[22m\u001b[22m\u001b[1m(\u001b[22m\u001b[22m::MyNumber, ::Int64\u001b[1m)\u001b[22m\u001b[22m at \u001b[1m./In[2]:1\u001b[22m\u001b[22m", " [2] \u001b[1minclude_string\u001b[22m\u001b[22m\u001b[1m(\u001b[22m\u001b[22m::String, ::String\u001b[1m)\u001b[22m\u001b[22m at \u001b[1m./loading.jl:522\u001b[22m\u001b[22m" ] } ], "source": [ "add(mynum, 2)" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "add (generic function with 2 methods)" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "add(a::MyNumber, b) = MyNumber(a.n + b)" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "data": { "text/html": [ "2 methods for generic function add:" ], "text/plain": [ "# 2 methods for generic function \"add\":\n", "add(a::MyNumber, b) in Main at In[13]:1\n", "add(a, b) in Main at In[2]:1" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "methods(add)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "MyNumber(4)" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "add(mynum, 2)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Interoperability\n", "* Built-in: C\n", "* Through packages:\n", " - C++: [Cxx.jl](https://github.com/Keno/Cxx.jl) and [CxxWrap.jl](https://github.com/JuliaInterop/CxxWrap.jl)\n", " - Python: [PyCall.jl](https://github.com/JuliaPy/PyCall.jl)\n", " - R, Matlab, ..." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Calling C functions\n", "Using `ccall`:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1.0" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ccall(:fabs, Float64, (Float64,),-1)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "The overhead is low:" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "using BenchmarkTools" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 2.832 ns (0 allocations: 0 bytes)\n" ] }, { "data": { "text/plain": [ "1.0" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@btime ccall(:fabs, Float64, (Float64,),-1.0)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 1.808 ns (0 allocations: 0 bytes)\n" ] }, { "data": { "text/plain": [ "1.0" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@btime abs(-1.0)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## MPI\n", "* The [MPI.jl](https://github.com/JuliaParallel/MPI.jl) package uses `ccall` to provide access to MPI from within Julia.\n", "* Code is executed using `mpirun julia myscript.jl`\n", "* Next, we will compare between C and Julia using a simple array sum using `MPI_Reduce`" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### C\n", "\n", "```c\n", "#include \n", "#include \n", "#include \n", "#include \n", "\n", "int main(int argc, char** argv)\n", "{\n", " int rank, size;\n", " double* array = NULL;\n", " const int N = 1024*1024*100;\n", " double mysum[1];\n", " double globalsum[1];\n", " struct timeval t1, t2;\n", "\n", " MPI_Init(&argc, &argv);\n", " MPI_Comm_rank(MPI_COMM_WORLD, &rank);\n", " MPI_Comm_size(MPI_COMM_WORLD, &size);\n", "\n", " const int myN = N / size;\n", "\n", " array = (double*)malloc(sizeof(double)*myN);\n", " for(int i = 0; i != myN; ++i)\n", " array[i] = rank+1;\n", "\n", " gettimeofday(&t1, NULL);\n", "\n", " mysum[0] = 0.0;\n", " for(int i = 0; i != myN; ++i)\n", " mysum[0] += array[i];\n", "\n", " MPI_Reduce(mysum, globalsum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);\n", "\n", " gettimeofday(&t2, NULL);\n", " double time_spent = (double) (t2.tv_usec - t1.tv_usec) / 1000000 + (double) (t2.tv_sec - t1.tv_sec);\n", "\n", " double expected_result = 0;\n", " for(int i = 0; i != size; ++i)\n", " expected_result += (i+1)*myN;\n", "\n", " if(rank == 0) {\n", " if(expected_result != globalsum[0])\n", " printf(\"Wrong result, expected %f, got %f\\n\", expected_result, globalsum[0]);\n", " printf(\"C Sum time: %f s\\n\", time_spent);\n", " }\n", "\n", " free(array);\n", " MPI_Finalize();\n", " return 0;\n", "}\n", "\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Julia" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 0.124784 seconds (3.26 k allocations: 186.788 KiB)\n", " 0.101493 seconds (13 allocations: 704 bytes)\n", " 0.097252 seconds (13 allocations: 704 bytes)\n" ] } ], "source": [ "using MPI\n", "using Base.Test\n", "\n", "const N = 1024*1024*100\n", "\n", "MPI.Init()\n", "rank = MPI.Comm_rank(MPI.COMM_WORLD)\n", "size = MPI.Comm_size(MPI.COMM_WORLD)\n", "\n", "const myN = N รท size # Integer division\n", "const array = fill(rank+1.0,myN)\n", "\n", "function mpi_sum(a)\n", " mysum = 0.0\n", " for x in a\n", " mysum += x\n", " end\n", " return MPI.Reduce(mysum, +, 0, MPI.COMM_WORLD)\n", "end\n", "\n", "expected = sum(myN*(1:size))\n", "for i in 1:3\n", " rank == 0 ? (@time @test expected == mpi_sum(array)) : mpi_sum(array)\n", "end\n", "\n", "MPI.Finalize()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Timings on 4 cores\n", "\n", "#### C\n", "0.022 s - 0.065 s\n", "\n", "#### Julia\n", "0.024 s - 0.027 s\n", "\n", "Both are equivalent, but the Julia code is a lot simpler\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## CxxWrap.jl\n", "* Uses `ccall` to access pointers to C++ functions using C\n", "* Interfacing code is written in C++\n", "* Julia functions generated automatically\n", "* Approach similar to Boost.Python" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "#### Some timings\n", "Loop over 50 M elements doing `double` multiplication:\n", "```\n", "Julia:\n", " 0.066912 seconds (4 allocations: 160 bytes)\n", "ccall in Julia loop:\n", " 0.104013 seconds (4 allocations: 160 bytes)\n", "CxxWrap.jl call in Julia loop:\n", " 0.106241 seconds (4 allocations: 160 bytes)\n", "C++, loop in the C++ code:\n", " 0.064130 seconds (4 allocations: 160 bytes)\n", "Julia cfunction callback in C++ loop\n", " 0.277433 seconds (4 allocations: 160 bytes)\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## [Trilinos.jl](https://github.com/barche/Trilinos.jl)\n", "\n", "* Interface to some parts of the [Trilinos](https://trilinos.org/) C++ library\n", "* Focus on the Tpetra matrix library and iterative solvers\n", "* Uses CxxWrap.jl\n", "* Example: 2D Laplace in C++ and Julia" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Laplace example:\n", "Solve the equation $\\nabla^2 \\varphi + f = 0 $ with $f = 2h^2((1-x^2)+(1-y^2))$ and 0 on the boundary:\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### C++ code\n", "```c++\n", "template\n", "void fill_laplace2d(MatrixT& A, const CartesianGrid& g)\n", "{\n", " Teuchos::TimeMonitor local_timer(*fill_time);\n", " const auto& rowmap = *A.getRowMap();\n", " const local_t n_my_elms = rowmap.getNodeNumElements();\n", "\n", " // storage for the per-row values\n", " global_t row_indices[5] = {0,0,0,0,0};\n", " scalar_t row_values[5] = {4.0,-1.0,-1.0,-1.0,-1.0};\n", "\n", " for(local_t i = 0; i != n_my_elms; ++i)\n", " {\n", " const global_t global_row = rowmap.getGlobalElement(i);\n", " const local_t row_n_elems = laplace2d_indices(row_indices, global_row, g);\n", " row_values[0] = 4.0 - (5-row_n_elems);\n", " A.replaceGlobalValues(global_row, Teuchos::ArrayView(row_indices,row_n_elems), Teuchos::ArrayView(row_values,row_n_elems));\n", " }\n", "}\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Julia code\n", "```julia\n", "function fill_laplace2d!(A, g::CartesianGrid)\n", " rowmap = Tpetra.getRowMap(A)\n", " n_my_elms = Tpetra.getNodeNumElements(rowmap)\n", "\n", " # storage for the per-row values\n", " row_indices = [0,0,0,0,0]\n", " row_values = [4.0,-1.0,-1.0,-1.0,-1.0]\n", "\n", " for i in 0:n_my_elms-1\n", " global_row = Tpetra.getGlobalElement(rowmap,i)\n", " row_n_elems = laplace2d_indices!(row_indices, global_row, g)\n", " row_values[1] = 4.0 - (5-row_n_elems)\n", " Tpetra.replaceGlobalValues(A, global_row, Teuchos.ArrayView(row_indices,row_n_elems), Teuchos.ArrayView(row_values,row_n_elems))\n", " end\n", "end\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Timing comparison for 1000 x 1000 matrix (in ms)\n", "| |C++ |Julia|\n", "|------------------|-----|-----|\n", "|Graph construction|190.7|115.2|\n", "|Source term |41.17|32.01|\n", "|Matrix filling |137.1|102.5|\n", "|Dirichlet setup |0.899|0.761|\n", "|Check time |41.06|29.42|" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Conclusions\n", "\n", "* Julia is a fast high-level language\n", "* It offers excellent interoperability to reuse existing libraries\n", "* It delivers on the promise of matching C/C++ performance" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## There is much more...\n", "* Generic types and functions\n", "* Julia parallel programming\n", "* Metaprogramming: Julia expression trees\n", "* Macros\n", "* Generated functions (for e.g. loop-unrolling)\n", "* [CUDAnative.jl](https://github.com/JuliaGPU/CUDAnative.jl): Write CUDA kernels in Julia\n", "* Many more excellent packages" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Detailed examples" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Some dispatching benchmarks\n" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [], "source": [ "using BenchmarkTools" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 1.554 ns (0 allocations: 0 bytes)\n" ] }, { "data": { "text/plain": [ "3" ] }, "execution_count": 22, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@btime add(1,2)" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "sum_arr1 (generic function with 1 method)" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Test function with loop\n", "function sum_arr1(array)\n", " result = 0\n", " for x in array\n", " result += x\n", " end\n", " return result\n", "end" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### The test arrays:" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "data": { "text/plain": [ "3-element Array{Int64,1}:\n", " 1\n", " 2\n", " 3" ] }, "execution_count": 24, "metadata": {}, "output_type": "execute_result" } ], "source": [ "const arr1 = [1,2,3]" ] }, { "cell_type": "code", "execution_count": 25, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3-element Array{Float64,1}:\n", " 1.0\n", " 2.0\n", " 3.0" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "const arr2 = [1.0, 2.0, 3.0]" ] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3-element Array{Any,1}:\n", " 1 \n", " 2.0\n", " 3 " ] }, "execution_count": 26, "metadata": {}, "output_type": "execute_result" } ], "source": [ "const arr3 = Any[1, 2.0, 3]" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### The results:" ] }, { "cell_type": "code", "execution_count": 27, "metadata": { "slideshow": { "slide_type": "-" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 4.621 ns (0 allocations: 0 bytes)\n" ] }, { "data": { "text/plain": [ "6" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@btime sum_arr1(arr1)" ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 67.244 ns (8 allocations: 128 bytes)\n" ] }, { "data": { "text/plain": [ "6.0" ] }, "execution_count": 28, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@btime sum_arr1(arr2)" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 80.644 ns (2 allocations: 32 bytes)\n" ] }, { "data": { "text/plain": [ "6.0" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@btime sum_arr1(arr3)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Fixing the problem" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "sum_arr2 (generic function with 1 method)" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "function sum_arr2(array)\n", " result = zero(eltype(array)) # 0 here was only valid for Int\n", " for x in array\n", " result += x\n", " end\n", " return result\n", "end" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 4.114 ns (0 allocations: 0 bytes)\n" ] }, { "data": { "text/plain": [ "6" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@btime sum_arr2(arr1)" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 4.621 ns (0 allocations: 0 bytes)\n" ] }, { "data": { "text/plain": [ "6.0" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "@btime sum_arr2(arr2)" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [], "source": [ "# Won't work:\n", "#@btime sum_arr2(arr3)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Implementing a Julia array over a Kokkos view from Trilinos\n", "\n", "The Julia type:\n", "```julia\n", "\"\"\"\n", "Expose a view using its raw data pointer\n", "\"\"\"\n", "struct PtrView{ScalarT,N,LayoutT} <: AbstractArray{ScalarT,N}\n", " ptr::Ptr{ScalarT}\n", " size::NTuple{N,Int}\n", " stored_view # prevent GC of the actual view\n", "end\n", "\n", "```" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Implementing a Julia array over a Kokkos view from Trilinos\n", "\n", "The functions to implement:\n", "\n", "```julia\n", "# Zero-based indexing\n", "using CustomUnitRanges: filename_for_zerorange\n", "include(filename_for_zerorange)\n", "\n", "# Indexing directives\n", "Base.IndexStyle{ScalarT,N,LayoutT}(::Type{PtrView{ScalarT,N,LayoutT}}) = IndexLinear()\n", "Base.indices{ScalarT,N}(v::PtrView{ScalarT,N,LayoutLeft}) = map(ZeroRange, v.size)\n", "Base.indices{ScalarT,N}(v::PtrView{ScalarT,N,LayoutLeft},d) = ZeroRange(v.size[d])\n", "\n", "# Operator []\n", "Base.getindex{ScalarT}(v::PtrView{ScalarT,1,LayoutLeft}, i::Integer) = unsafe_load(v.ptr,i+1)\n", "Base.setindex!{ScalarT}(v::PtrView{ScalarT,1,LayoutLeft}, value, i::Integer) = unsafe_store!(v.ptr,value,i+1)\n", "Base.getindex{ScalarT,N}(v::PtrView{ScalarT,N,LayoutLeft}, i::Integer) = unsafe_load(v.ptr,i)\n", "Base.setindex!{ScalarT,N}(v::PtrView{ScalarT,N,LayoutLeft}, value, i::Integer) = unsafe_store!(v.ptr,value,i)\n", "\n", "# Construction of array with the same shape\n", "Base.similar{ScalarT,N,LayoutT}(v::PtrView{ScalarT,N,LayoutT}) = Array{ScalarT,N}(length.(indices(v)))\n", "```" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Julia 0.6.0", "language": "julia", "name": "julia-0.6" }, "language_info": { "file_extension": ".jl", "mimetype": "application/julia", "name": "julia", "version": "0.6.0" } }, "nbformat": 4, "nbformat_minor": 2 }