{{Short
{{Infobox programming language
| name = Cuneiform
| logo = G18225.png
| founder =
| status = Active
| latest release version = 3.0.
| latest release date = {{release date|2018|
| latest preview version =
| latest preview date =
| typing = [[Static typing|static]], simple types
| implementations =
| dialects =
| influenced_by = [
| influenced =
| operating system = [[Linux]], [[
| programming language = [[Erlang (programming language)|Erlang]]
| license = [[Apache License]] 2.0
| website = {{URL|
| file_ext = .cfl
| year = 2013
'''Cuneiform''' is an [[open source software|open-source]] [[Scientific workflow system|workflow language]]
for large-scale scientific data analysis.<ref>{{Cite web|url=https://github.com/joergen7/cuneiform|title = Joergen7/Cuneiform|website = [[GitHub]]|date = 14 October 2021}}</ref><ref>{{Cite journal
| last1 = Brandt | first1 = Jörgen
| last2 = Bux | first2 = Marc N.
It is a [[Type system#STATIC|statically typed]] [[Functional programming|functional programming language]] promoting [[parallel computing]]. It features a versatile [[foreign function interface]] that lets users integrate software written in many external programming languages. At the organizational level, Cuneiform provides facilities such as [[Conditional (computer programming)|conditional branching]] and [[Recursion|general recursion]], which make it [[Turing completeness|Turing-complete]]. Cuneiform thus attempts to close the gap between scientific workflow systems like [[Apache Taverna|Taverna]], [[KNIME]], or [[Galaxy (computational biology)|Galaxy]] and large-scale data analysis programming models like [[MapReduce]] or [[Pig (programming tool)|Pig Latin]], while offering the generality of a functional programming language.
Cuneiform is implemented in distributed [[Erlang (programming language)|Erlang]]. If run in distributed mode it drives a [[POSIX]]-compliant distributed file system like [[Gluster]] or [[Ceph (software)#CephFS|Ceph]] (or a [[Filesystem in Userspace|FUSE]] integration of some other file system, e.g., [[Apache Hadoop#HDFS|HDFS]]). Alternatively, Cuneiform scripts can be executed on top of [[HTCondor]] or [[Apache Hadoop|Hadoop]].<ref>{{cite web|title=Scalable Multi-Language Data Analysis on Beam: The Cuneiform Experience by Jörgen Brandt|url=http://beta.erlangcentral.org/videos/scalable-multi-language-data-analysis-on-beam-the-cuneiform-experience-by-jorgen-brandt/#.WBLlE2hNzIU|website=Erlang Central|
{{Cite journal
| last1 = Bux | first1 = Marc
| url = http://www.di.fc.ul.pt/~bessani/publications/dmah15-bbc.pdf
}}
</ref><ref>{{cite web|title=Scalable Multi-Language Data Analysis on Beam: The Cuneiform Experience|url=http://www.erlang-factory.com/euc2016/jorgen-brandt|website=Erlang-factory.com|
Cuneiform is influenced by the work of Peter Kelly, who proposes functional programming as a model for scientific workflow execution.<ref>{{cite journal
| year = 2009
| title = Lambda calculus as a workflow model
| journal = Concurrency
| volume = 21
| issue = 16
| pages = 1999–2017
| doi = 10.1002/cpe.1448| s2cid = 10833434
}}</ref><ref> {{cite journal
| title = Workflows and extensions to the Kepler scientific workflow system to support environmental sensor data access and analysis
| pages = 42–50
| year = 2010
| doi = 10.1016/j.ecoinf.2009.08.008
}}
</ref>
In this, Cuneiform is distinct from related workflow languages based on [[dataflow programming]] like [[Swift (parallel scripting language)|Swift]]
{{cite journal
| title = Nextflow enables reproducible computational workflows
| pages = 316–319
| year = 2017
| doi = 10.1038/nbt.3820 | pmid = 28398311 | s2cid = 9690740 }}
</ref>
==External software integration==
External tools and libraries (e.g., [[R (programming language)|R]] or [[Python (programming language)|Python]] libraries) are integrated via a [[foreign function interface]]. In this respect Cuneiform resembles, e.g., [[KNIME]], which allows the use of external software through snippet nodes, or [[Apache Taverna|Taverna]], which offers [[BeanShell]] services for integrating [[Java (programming language)|Java]] software. Defining a task in a foreign language makes it possible to use the API of an external tool or library. This way, tools can be integrated directly, without writing a wrapper or reimplementing the tool.<ref>{{cite web|title=A Functional Workflow Language Implementation in Erlang|url=http://www.erlang-factory.com/static/upload/media/1448992381831050cuneiformberlinefl2015.pdf|
Currently supported foreign programming languages are:
{{div col}}
* [[Bash (Unix shell)|Bash]]
* [[Elixir (programming language)|Elixir]]
* [[Erlang (programming language)|Erlang]]
* [[Java (programming language)|Java]]
* [[JavaScript]]
* [[MATLAB]]
* [[GNU Octave]]
* [[R (programming language)|R]]
* [[Racket (programming language)|Racket]]
{{div col end}}
Foreign language support for [[AWK]] and [[gnuplot]] is a planned addition.
==Type system==
Cuneiform provides a simple, statically checked type system.<ref>
| last2 = Reisig | first2 = Wolfgang
| last3 = Leser | first3 = Ulf
| journal = [[Journal of Functional Programming]]
| volume = 27
| year = 2017
| doi = 10.1017/S0956796817000119 | s2cid = 6128299 }}
</ref> While Cuneiform provides lists as [[compound data
===Base data types===
As base data types Cuneiform provides Booleans, strings, and files. Herein, files are used to exchange data in arbitrary format between foreign functions.
===Records and pattern matching===
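Cuneiform provides records as compound data types. The example below defines a variable <code>r</code> as a record with two fields: <code>a1</code>, a string, and <code>a2</code>, a Boolean.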
<syntaxhighlight lang="swift">
let r : <a1 : Str, a2 : Bool> =
<a1 = "my string", a2 = true>;
</syntaxhighlight>
Records can be accessed either via projection or via [[pattern matching]]. The example below extracts the two fields <code>a1</code> and <code>a2</code> from the record <code>r</code>.
<syntaxhighlight lang="swift">
let a1 : Str = ( r|a1 );
let <a2 = a2 : Bool> = r;
</syntaxhighlight>
===Lists and list processing===
Furthermore, Cuneiform provides lists as compound data types. The example below shows the definition of a variable <code>xs</code> being a file list with three elements.
<syntaxhighlight lang="erlang">
let xs : [File] =
['a.txt', 'b.txt', 'c.txt' : File];
</syntaxhighlight>
Lists can be processed with the for and fold operators. The for operator can be given multiple lists, which it consumes element-wise (similar to <code>for/list</code> in [[Racket (programming language)|Racket]], <code>mapcar</code> in [[Common Lisp]], or <code>zipwith</code> in [[Erlang (programming language)|Erlang]]).
The example below shows how to map over a single list, the result being a file list.
<syntaxhighlight lang="ruby">
for x <- xs do
process-one( arg1 = x )
: File
end;
</syntaxhighlight>
The example below shows how to zip two lists, the result also being a file list.
<syntaxhighlight lang="ruby">
for x <- xs, y <- ys do
process-two( arg1 = x, arg2 = y )
: File
end;
</syntaxhighlight>
Finally, lists can be aggregated by using the fold operator. The following example sums up the elements of a list.
<syntaxhighlight lang="text">
fold acc = 0, x <- xs do
add( a = acc, b = x )
end;
</syntaxhighlight>
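As a rough analogy only (the sketch below is Python, not Cuneiform), the three list operations above correspond to a map, a zip-with, and a left fold. The functions <code>process_one</code>, <code>process_two</code>, and <code>add</code> are hypothetical stand-ins for the tasks named in the examples, and the list <code>ns</code> is an assumed numeric input for the fold.

```python
from functools import reduce

# Hypothetical stand-ins for the tasks used in the Cuneiform examples.
def process_one(arg1):
    return arg1 + ".out"

def process_two(arg1, arg2):
    return arg1 + "+" + arg2

def add(a, b):
    return a + b

xs = ["a.txt", "b.txt", "c.txt"]
ys = ["d.txt", "e.txt", "f.txt"]

# Analogue of: for x <- xs do process-one( arg1 = x ) : File end
mapped = [process_one(arg1=x) for x in xs]

# Analogue of: for x <- xs, y <- ys do process-two( arg1 = x, arg2 = y ) : File end
# The lists are consumed element-wise (zip-with), not as a cross product.
zipped = [process_two(arg1=x, arg2=y) for x, y in zip(xs, ys)]

# Analogue of: fold acc = 0, x <- ns do add( a = acc, b = x ) end
ns = [1, 2, 3]  # assumed numeric input
total = reduce(lambda acc, x: add(a=acc, b=x), ns, 0)
```

Python's <code>zip</code> silently truncates to the shortest input; the sketch makes no claim about how Cuneiform treats lists of unequal length.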
==Parallel execution==
For example, the following Cuneiform program allows the applications of <code>f</code> and <code>g</code> to run in parallel while <code>h</code> is dependent and can be started only when both <code>f</code> and <code>g</code> are finished.
{{sxhl|lang=erlang|1=
let output-of-f : File = f();
let output-of-g : File = g();
h( f = output-of-f, g = output-of-g );
}}
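The dependency structure of this program can be sketched in Python as an analogy. This is not how the Cuneiform runtime is implemented; the bodies of <code>f</code>, <code>g</code>, and <code>h</code> are hypothetical placeholders, and a thread pool stands in for whatever scheduler actually runs the tasks.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical placeholder tasks; in Cuneiform these would be foreign functions.
def f():
    return "output-of-f"

def g():
    return "output-of-g"

def h(f, g):
    return f"h({f}, {g})"

with ThreadPoolExecutor(max_workers=2) as pool:
    # f and g do not depend on each other, so both are submitted at once.
    future_f = pool.submit(f)
    future_g = pool.submit(g)
    # h needs both results, so it starts only after f and g have finished.
    result = h(f=future_f.result(), g=future_g.result())
```

The point of the sketch is that the ordering follows from data dependencies alone: nothing in the program states explicitly that <code>f</code> and <code>g</code> may overlap.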
The following Cuneiform program creates three parallel applications of the function <code>f</code> by mapping <code>f</code> over a three-element list:
{{sxhl|lang=erlang|1=
let xs : [File] =
['a.txt', 'b.txt', 'c.txt' : File];
: File
end;
}}
Similarly, the applications of <code>f</code> and <code>g</code> are independent in the construction of the record <code>r</code> and can, thus, be run in parallel:
{{sxhl|lang=erlang|1=
let r : <a : File, b : File> =
<nowiki><a = f(), b = g()></nowiki>;
}}
==Examples==
A hello-world script:
<syntaxhighlight lang="ruby">
def greet( person : Str ) -> <out : Str>
in Bash *{
  out="Hello $person"
}*
( greet( person = "world" )|out );
</syntaxhighlight>
This script defines a task <code>greet</code> in [[Bash (Unix shell)|Bash]] which prepends <code>"Hello "</code> to its string argument <code>person</code>.
The function produces a record with a single string field <code>out</code>.
Command line tools can be integrated by defining a task in [[Bash (Unix shell)|Bash]]:
<syntaxhighlight lang="ruby">
def samtoolsSort( bam : File ) -> <sorted : File>
in Bash *{
samtools sort -m 2G $bam -o $sorted
}*
</syntaxhighlight>
In this example, a task <code>samtoolsSort</code> is defined.
It calls the tool [[SAMtools]], consuming an input file in BAM format and producing a sorted output file, also in BAM format.
==Release history==
{| class="wikitable"
! Version !! Appearance !! Implementation Language !! Distribution Platform !! Foreign Languages
|-
!
|
| [[
| [[Apache Hadoop]]
| Bash,
|-
! 2.0.x
| Mar. 2015
| [[Java (programming language)|Java]]
| [[HTCondor]], [[Apache Hadoop]]
| Bash, BeanShell, Common Lisp, MATLAB, GNU Octave, Perl, Python, R, Scala
|-
! 2.2.x
| Bash, Perl, Python, R
|-
!
|
| [[
| Distributed Erlang
| Bash,
|}
Cuneiform's surface syntax was revised twice, as reflected in the major version number.
===Version 1===
In its first draft, published in May 2014, Cuneiform was closely related to [[Make (software)|Make]] in that it constructed a static [[data dependency]] graph which the interpreter traversed during execution. The major difference from later versions was the lack of conditionals, recursion, and static type checking. Files were distinguished from strings by prefixing single-quoted string values with a tilde <code>~</code>. The script's query expression was introduced with the <code>target</code> keyword. Bash was the default foreign language. [[Function application]] had to be performed using an <code>apply</code> form that took <code>task</code> as its first keyword argument. One year later, this surface syntax was replaced by a streamlined but similar version.
The following example script downloads a reference genome from an FTP server.
<pre>
declare download-ref-genome;
deftask download-fa( fa : ~path ~id ) *{
wget $path/$id.fa.gz
gunzip $id.fa.gz
mv $id.fa $fa
}*
ref-genome-path = ~'ftp://hgdownload.cse.ucsc.edu/goldenPath/hg19/chromosomes';
ref-genome-id = ~'chr22';
ref-genome = apply(
task : download-fa
path : ref-genome-path
id : ref-genome-id
);
target ref-genome;
</pre>
</pre>
===Version 3===
Compared to earlier drafts, the current version of Cuneiform's surface syntax attempts to close the gap to mainstream functional programming languages. It features a simple, statically checked type system and introduces records in addition to lists as a second type of compound [[data structure]]. Booleans are a separate base data type.
The following script untars a file, resulting in a file list.
<pre>
def untar( tar : File ) -> <fileLst : [File]>
in Bash *{
tar xf $tar
fileLst=`tar tf $tar`
}*
let hg38Tar : File =
'hg38/hg38.tar';
let <fileLst = faLst : [File]> =
untar( tar = hg38Tar );
faLst;
</pre>