Julia – Programming Language

Julia is a high performance, high-level programming language. It is very popular because of its high speed, machine learning packages and its expressive syntax.

It combines the good parts of Python, R, Ruby, Matlab, and Perl, and it runs nearly as fast as C. Besides, it’s super easy to use Python and R packages within Julia. I will show you how shortly.

In this article, I will focus on the basics of Julia for Data Science.


Content

  1. What is Julia?
  2. Why Julia?
  3. Features of Julia
  4. Installing Julia
  5. Important Packages in Julia for Data Science
  6. Basic Syntax in Julia
  7. Strings in Julia
  8. Data Structures in Julia
  9. Installing Packages in Julia

1. What is Julia?

Julia is a free, open-source, high-level, high-performance programming language released under the MIT license. It is much talked about because of its high speed and computational power.

While it is a general purpose language and can be used to write any application, high-performance numerical analysis and computational science are among the major applications of Julia.

It supports concurrent, parallel and distributed computing and direct calling of C and Fortran libraries without glue code. Let me explain what glue code is. Glue code unites programs or software components that would not be compatible otherwise.
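
The direct C calling described above can be sketched with Julia’s built-in ccall. This is a minimal example, assuming the C standard library function strlen is visible to the running Julia process (typical on Linux and macOS; other platforms may need an explicit library name).

```julia
# Call the C function strlen directly, with no wrapper or glue code.
# Assumption: strlen can be resolved in the current process.
len = ccall(:strlen, Csize_t, (Cstring,), "Julia")

# Csize_t is an unsigned integer type; convert it for friendlier printing
println("strlen returned ", Int(len))
```

The arguments to ccall are the function name, the return type, a tuple of argument types, and then the actual argument values.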

Julia uses a JIT (just-in-time) compiler that generates native machine code. Together with multiple dispatch, the compiler produces specialized versions of a function for the argument types it is called with, which makes it easier to compile the code into an efficient form.

Julia is supported by multiple tools: editors (Vim, Emacs, etc.) as well as IDEs (Juno, Visual Studio Code, etc.).

2. Why Julia?

According to Julia’s creators (Jeff Bezanson, Stefan Karpinski, Viral B. Shah and Alan Edelman):

“We want a language that’s open source, with a liberal license. We want the speed of C with the dynamism of Ruby. We want a language that’s homoiconic, with true macros like Lisp, but with an obvious, familiar mathematical notation like Matlab. We want something as usable for general programming as Python, as easy for statistics as R, as natural for string processing as Perl, as powerful for linear algebra as Matlab, as good at glueing programs together as the shell. Something that is dirt simple to learn yet keeps the most serious hackers happy. We want it interactive and we want it compiled.”

Julia is a work straight out of MIT, a high-level language that has a syntax as friendly as Python and performance as competitive as C. Julia “aims to create an unprecedented combination of ease-of-use, power, and efficiency in a single language.”

Sold?

Let me put it in simple words:

R is very famous for statistical modelling, as it has a wide range of statistical packages. Python is very famous for its expressive syntax, which is easy to read and learn, and it works well with other programming languages. C/C++ is very famous for its high execution speed.

What if I told you Julia meets all these criteria? Having doubts about it, right?

Let’s see it with a benchmark by julialang.org – https://julialang.org/assets/benchmarks/benchmarks.svg

Julia Benchmarks

The figure above shows the performance of Julia and 10 other programming languages relative to C. Lower is better. The benchmarks cover low-level tasks rather than complex ones.

3. Features of Julia

Julia has hundreds of different features. I have listed a few of the most important ones, which make Julia so popular.

  • Free and open source
  • Excels at technical computing
  • Multiple dispatch
  • Parallelism
  • Call C functions directly
  • Call Python functions using the PyCall package
  • Automatic generation of efficient, specialized code for different argument types
  • Lisp-like macros and other meta-programming facilities
  • Efficient support for Unicode, including but not limited to UTF-8
  • Seamless package and dependency handling
  • Automated memory management
  • Easy to read and write syntax
  • JIT (Just In Time) compiler
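
To make the multiple dispatch item above more concrete, here is a minimal sketch. The function name describe is hypothetical and chosen only for illustration; Julia picks which method to run based on the types of all the arguments.

```julia
# One function name, several methods: Julia selects the method
# from the types of the arguments (multiple dispatch).
# `describe` is a made-up name used only for this illustration.
describe(x::Integer) = "integer: $x"
describe(x::AbstractFloat) = "float: $x"
describe(x::AbstractString) = "string of length $(length(x))"

# A two-argument method that reuses the one-argument methods
describe(x, y) = "two arguments: " * describe(x) * " and " * describe(y)

println(describe(10))
println(describe(10.1))
println(describe("Julia"))
println(describe(1, "a"))
```

The compiler can generate specialized code for each combination of argument types, which ties in with the "specialized code for different argument types" item in the list.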

4. Installing Julia

There are different ways to install Julia.

If you don’t want to install it on your system, that’s perfectly fine as well. You can use Julia directly in the cloud. Let’s go through all the different ways one by one.

But first let me show you how to use Julia in the cloud.

1. Julia in the cloud – JuliaBox

This is the simplest option, with no setup required. Open JuliaBox and log in using a Google, LinkedIn or GitHub account. You can also create a JuliaBox account and log in with that. It’s totally up to you. As of now there’s no difference in the features provided by the different log-ins.

Once logged in, create a new notebook.

It will open a new Jupyter notebook. It’s exactly the same as a local Jupyter notebook. And you are ready to write your first code in Julia.

2. Command Line Installation –

You can download the latest version of Julia from here for your operating system. It will install the Julia terminal on your local machine.

3. IJulia (Jupyter) –

I personally prefer IPython notebooks when writing Python code. In the same way, you can use IJulia notebooks, which offer similar features. You can install IJulia notebooks by following these simple steps.

4. IDE –

Juno is the best-suited IDE available right now for Julia. You can download Juno from here.

5. Important Packages in Julia for Data Science

28000+ packages are available in Julia at this moment, which makes Julia a multi-purpose language. I have listed a few of the most important packages for statistical analysis and machine learning.

  1. DataFrames: Reading and working with data in Excel-style tables.
  2. CSV: Reading and writing CSV files in Julia
  3. Bokeh: Visualizing and Plotting in Julia
  4. StatsBase: Basic functions for statistics including descriptive statistics and moments, counting and ranking, etc.
  5. TimeSeries : Functions for time series analysis
  6. MultivariateStats: Linear regression, multidimensional scaling, etc.
  7. HypothesisTests: Popular hypothesis tests handling confidence interval, p value, and various parametric and non-parametric tests
  8. ScikitLearn: Julia’s implementation of the ScikitLearn API
  9. AutoMLPipeline : Package that makes it trivial to create and evaluate machine learning pipeline architectures.
  10. Clustering: Various clustering algorithms, such as K-means, K-medoids, affinity propagation, and Density-based Spatial Clustering of Applications with Noise (DBSCAN).
  11. BackpropNeuralNet: Neural network implementation in Julia
  12. Interact.jl: Interactive widgets such as dropdowns, sliders and checkboxes to use with Julia code.
  13. Flux: ML library for Julia
  14. MXNet: MXNet deep learning framework for Julia.
  15. DifferentialEquations: Multi-language suite for high-performance solvers of differential equations
  16. JuMP: Modeling language for Mathematical Optimization (linear, mixed-integer, conic, semidefinite, nonlinear)
  17. Plots: Powerful convenience for Julia visualizations and data analysis
  18. PyCall: Package to call Python functions from the Julia language
  19. Turing: Bayesian inference with probabilistic programming.
  20. TensorFlow: A Julia wrapper for TensorFlow
  21. Cxx: The Julia C++ Interface
  22. Optim: Optimization functions for Julia
  23. PackageCompiler: Compile your Julia Package
  24. Distributions: A Julia package for probability distributions and associated functions.

For more packages, refer to the following link: http://pkg.julialang.org/

The Julia Observer gives a nice explorer-style view of major Julia packages by functionality and domain.

6. Basic Syntax in Julia

As I said earlier, Julia has a very simple syntax, which makes it easy to read and learn. Let me walk you through a few of the most basic operations in Julia.

6.1 Print Operations

The println() function is used to print a value/string to the console in Julia.

println("It's so simple")              # Print It's so simple

6.2 Assign value to variables

The assignment operator used in julia is an equal sign =

Write the variable on the left side, followed by the equal sign = and then the value you want to assign to it. This works just like in most other programming languages.

int_variable = 10                      # Assign an integer value to a variable
println(int_variable)

string_variable = "This is a string"   # Assign a string value to a variable
println(string_variable)

float_variable = 10.1                  # Assign a floating-point value to a variable
println(float_variable)
10
This is a string
10.1

Notice, you don’t need to specify the type of variable while assigning the value. You just need to assign the value to the variable. Julia automatically identifies the type of variable based on the value provided.
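
To check the type that Julia inferred, you can pass each variable to typeof(). The sketch below reuses the variables from the example above; the integer type shown in the comment assumes a 64-bit machine.

```julia
int_variable = 10
string_variable = "This is a string"
float_variable = 10.1

# typeof() returns the type Julia chose for each value
println(typeof(int_variable))      # Int64 on a 64-bit machine
println(typeof(string_variable))   # String
println(typeof(float_variable))    # Float64
```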

6.3 Commenting Operations

Comments in Julia work similarly to those in Python and R. The # (pound/hash) character starts a single-line comment.

#= =# is used to write multi-line comments.
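
A short sketch of both comment styles follows. The variable name total is only an illustration.

```julia
# This is a single-line comment
total = 2 + 3   # a comment can also follow code on the same line

#=
This is a multi-line comment.
Nothing inside this block is executed,
so the next line does not change total.
total = 100
=#

println(total)
```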

6.4 Basic maths operations

add      = 2 + 3              # Addition
println("Addition ",add)

subtract = 3 - 2              # Subtraction
println("Subtraction ",subtract)

multiply = 3 * 2              # Multiplication
println("Multiplication ",multiply)

quotient = 10 / 3             # Division 
println("Division ",quotient)

inv_div  = 10 \ 3             # Inverse Division 
println("Inverse Division ",inv_div)

power    = 3 ^ 2              # Power
println("Power ",power)

modulus  = 10 % 3             # Remainder/Modulus
println("Remainder/Modulus ",modulus)
Addition 5
Subtraction 1
Multiplication 6
Division 3.3333333333333335
Inverse Division 0.3
Power 9
Remainder/Modulus 1

6.5 Updating Operators

Every arithmetic operator has an updating form, which applies the operation and assigns the result back to the variable. They all work in a similar manner.

Ex. += -= *= /= \= ÷= %= ^=

# Update x by adding 5 to x
x = 2
x += 5
println(x)
7
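
The other updating operators behave the same way. Here is a small sketch that chains several of them; the variable names are arbitrary.

```julia
x = 10
x -= 4      # x = x - 4   -> 6
x *= 3      # x = x * 3   -> 18
x ÷= 4      # x = x ÷ 4   -> 4 (integer division)
x ^= 2      # x = x ^ 2   -> 16
x %= 5      # x = x % 5   -> 1
println(x)

# /= uses floating-point division, so the variable becomes a Float64
y = 7
y /= 2
println(y)
```

Note that y starts as an integer and ends up holding 3.5, because an updating operator simply rebinds the variable to the result of the operation.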

7. Strings in Julia

Strings are finite sequences of characters.

7.1 String formation

" " or """ """ are used to create a string in Julia.

' ' is used to create only a single character in Julia, unlike in R and Python. It cannot be used to form a string.

string_1 = "In double quotes"       # Assign a string to a variable using double quotes `" "`

string_2 = """In triple quotes"""   # Assign a string to a variable using triple quotes `""" """`

There are a couple of functional differences between strings enclosed in double quotes and in triple quotes.

In the latter case, you can use quotation marks within your string. But inside a string written with " ", Julia can’t tell an inner quotation mark from the one that ends the string.

# Use " " to formulate a string having special characters like " "

"Here, we get an "error" because it's difficult to understand where this string ends "
syntax: cannot juxtapose string literal
# Use """ """ to formulate a string having special characters like " "

""" Wow, no "errors"!!! """         
" Wow, no \"errors\"!!! "

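As the echoed result above hints, a quotation mark can also be placed inside an ordinary " " string by escaping it with a backslash. A minimal sketch, with illustrative variable names:

```julia
# Escape inner quotation marks with a backslash in a " " string
escaped = "Wow, no \"errors\"!!!"

# The same text written with triple quotes needs no escaping
triple = """Wow, no "errors"!!!"""

println(escaped)
println(escaped == triple)
```

Both forms produce the same string, so the choice is mostly about readability.
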
7.2 Characters

' ' Apostrophes define a character, but NOT a string.

character = 'c'                    # Assign a character to a variable using apostrophes ' '
typeof(character)                  # Check the type of the variable
Char

Woah! It’s a character, not a string.

But wait, let’s see what happens if a string is placed between apostrophes.

# Assign a string to a variable using apostrophe ' '

character = 'This is not a character' 
syntax: invalid character literal

This raises an error in the console.

Now that you have created a string, how about joining different strings?

7.3 Interpolation of Strings

The $ sign is used to insert existing variables into a string. It offers the same functionality as f-strings in Python.

Let’s see this with the help of an example.

# Define few variables 
subject = "Machine Learning"

ml_experience = 3
software_experience = 5

# Insert values of existing variables and print the statements
println("Hi, I am good in $subject.")
println("I have $ml_experience years of relevant experience in Machine Learning and $(ml_experience + software_experience) years of total experience.")
Hi, I am good in Machine Learning.
I have 3 years of relevant experience in Machine Learning and 8 years of total experience.

7.4 Concatenation of Strings

You can concatenate strings in various ways. Let’s see them one by one.

7.4.1 string() function

# Define few variables 
subject = "Machine Learning"
ml_experience = "3 years"

# Concatenate various strings to formulate a single string using string() function
string("Hi, I am good in ", subject, ". I have ", ml_experience, " of relevant experience in Machine Learning")
"Hi, I am good in Machine Learning. I have 3 years of relevant experience in Machine Learning"

7.4.2 * operator

# Concatenate various strings to formulate a single string using * operator
"Hi, I am good in "*subject*". I have "*ml_experience*" of relevant experience in Machine Learning"
"Hi, I am good in Machine Learning. I have 3 years of relevant experience in Machine Learning"

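One practical difference between the two approaches: string() accepts values of any type, while the * operator only joins strings and characters. The sketch below illustrates this; the variable names are made up for the example.

```julia
years = 3   # an integer, not a string

# string() converts non-string arguments for you
sentence = string("I have ", years, " years of experience")
println(sentence)

# With *, the number has to be converted explicitly
same_sentence = "I have " * string(years) * " years of experience"
println(sentence == same_sentence)

# Joining a string and an integer with * raises a MethodError
err = try
    "I have " * years
catch e
    e
end
println(err isa MethodError)
```
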
8. Data Structures in Julia

Variables work fine when dealing with small amounts of data. But once you start working on huge datasets, you need to store data in structures like tuples, arrays and dictionaries.


As an overview, tuples and arrays are both ordered sequences of elements (so we can index into them). Dictionaries and arrays are both mutable.

We’ll explain this more below!


Let’s see some of the most basic ones.

8.1 Tuples

Tuples are created by enclosing elements in parentheses (). Tuple elements are accessed by index. In other words, tuples are ordered sequences of elements.

Tuples are immutable: you can’t update a tuple once it is created.

Julia uses 1-based indexing like R, not 0-based indexing like Python.

# Create a tuple
ds_requirements = ("statistics", "ml", "domain knowledge", "programming")
println(ds_requirements, "\n")

# Index a tuple
ds_requirements[1]
println("First element of the tuple is : ", ds_requirements[1])
("statistics", "ml", "domain knowledge", "programming")

First element of the tuple is : statistics
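
The immutability mentioned above can be checked directly. In this small sketch, trying to assign to an element of the tuple raises a MethodError, which is caught so the script keeps running.

```julia
ds_requirements = ("statistics", "ml", "domain knowledge", "programming")

# length() and the `end` keyword work on tuples
println(length(ds_requirements))
println(ds_requirements[end])

# Tuples are immutable: assigning to an element raises an error
result = try
    ds_requirements[1] = "maths"
catch e
    e
end
println(result isa MethodError)
```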

NamedTuples

As the name suggests, tuples whose elements have a name in addition to an index are called Named Tuples. These are very useful when the data is related and we later want to look it up by that relation.

There’s not much difference in syntax compared to the tuple syntax. You need to use = to assign a value to each name in the tuple.

ds_requirements = (basic_statistics = "Linear Algebra", basic_ml = "Linear Regression", domain_knowledge = "Bussiness Study", programming = "Julia")
(basic_statistics = "Linear Algebra", basic_ml = "Linear Regression", domain_knowledge = "Bussiness Study", programming = "Julia")
# Name indexing of named tuples
ds_requirements.programming
"Julia"
# Positional indexing of named tuples
ds_requirements[4]
"Julia"

Named Tuples are like tuples in that they are immutable. If you wish to use a mutable, relation-based structure, you can use dictionaries in Julia.
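
Because a named tuple cannot be changed in place, one option is to build a new one with merge(). The sketch below uses a shortened version of the named tuple from above; the replacement value is only an illustration.

```julia
ds_requirements = (basic_ml = "Linear Regression", programming = "Julia")

# keys() and values() list the names and the stored values
println(keys(ds_requirements))
println(values(ds_requirements))

# merge() returns a NEW named tuple; the original stays untouched
updated = merge(ds_requirements, (programming = "Python",))
println(updated.programming)
println(ds_requirements.programming)
```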

8.2 Dictionaries

A Julia dictionary is an unordered collection of elements. Each element of a dictionary is a key-value pair.

You can create a dictionary using the Dict() function. You can create an empty dictionary and fill it later, or pass key-value pairs when creating it.

The => operator is used to assign a value to a key.

# Create a dictionary 
ds_requirements = Dict("basic_statistics" => "Linear Algebra", "basic_ml" => "Linear Regression", "domain_knowledge" => "Bussiness Study", "programming" => "Julia")
ds_requirements
Dict{String,String} with 4 entries:
  "basic_ml"         => "Linear Regression"
  "programming"      => "Julia"
  "basic_statistics" => "Linear Algebra"
  "domain_knowledge" => "Bussiness Study"

As I mentioned earlier, dictionaries are unordered collections of elements, which means you can’t index them by position. You must be wondering how to access the values, then.

Let’s see how to access the values in dictionaries.

# Key Indexing of Dictionaries
ds_requirements["programming"]
"Julia"

You must be wondering what goes wrong when a dictionary is indexed by position. The problem is that when we pass an index to the dictionary, Julia thinks we are trying to find the value associated with that key (the index number).

ds_requirements[4]
KeyError: key 4 not found

Stacktrace:
 [1] getindex(::Dict{String,String}, ::Int64) at ./dict.jl:477
 [2] top-level scope at In[64]:1

As I already mentioned, a dictionary can be updated later on as well.

# Add a new key "intermediate_ml" and its value
ds_requirements["intermediate_ml"] = "Ensamble Algorithms"
ds_requirements
Dict{String,String} with 5 entries:
  "intermediate_ml"  => "Ensamble Algorithms"
  "basic_ml"         => "Linear Regression"
  "programming"      => "Julia"
  "basic_statistics" => "Linear Algebra"
  "domain_knowledge" => "Bussiness Study"

Once a dictionary is created, a key-value pair can also be deleted with the pop!() function.

# Delete the key-value pair for "intermediate_ml"
pop!(ds_requirements, "intermediate_ml")
ds_requirements
Dict{String,String} with 4 entries:
  "basic_ml"         => "Linear Regression"
  "programming"      => "Julia"
  "basic_statistics" => "Linear Algebra"
  "domain_knowledge" => "Bussiness Study"

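The workflow of starting from an empty dictionary and looking keys up safely can be sketched as follows. The dictionary name skills and its contents are hypothetical, chosen only for this example.

```julia
# Start with an empty dictionary with String keys and String values
skills = Dict{String,String}()
skills["programming"] = "Julia"
skills["basic_ml"] = "Linear Regression"

# haskey() checks for a key; get() returns a default instead of raising a KeyError
println(haskey(skills, "programming"))
println(get(skills, "deep_learning", "not listed"))

# pop!() removes a pair and returns its value; delete!() just removes the pair
removed = pop!(skills, "basic_ml")
println(removed)
delete!(skills, "programming")
println(isempty(skills))
```
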
8.3 Arrays

Similar to tuples and dictionaries, arrays are used to store multiple values in a single variable.

Arrays are mutable as well as ordered. You can index them like tuples and update them like dictionaries.

You can create arrays by enclosing the elements in [ ] brackets. You can create an empty array and update it later on as well.

# Create an array
ml_algo = ["Linear Regression", "Logistic Regression", "SVM", "XGBoost"]
4-element Array{String,1}:
 "Linear Regression"
 "Logistic Regression"
 "SVM"
 "XGBoost"

The 1 in Array{String,1} means this is a one dimensional vector. An Array{String,2} would be a 2d matrix, etc. The String is the type of each element.

You can store mixed data in an array as well; the element type then becomes Any. Let’s try to store integers as well as strings in an array.

mixed_array = [1, 2, 3, "Stats", "Julia", "ML"]
6-element Array{Any,1}:
 1
 2
 3
 "Stats"
 "Julia"
 "ML"

Now that we have created an array, let’s see how to access its values.

# Indexing of arrays
ml_algo[1]
"Linear Regression"
# Update arrays using index
ml_algo[1] = "AdaBoost"
ml_algo
4-element Array{String,1}:
 "AdaBoost"
 "Logistic Regression"
 "SVM"
 "XGBoost"

A new element can be added to the end of an array using the push!() function, and the last element of an array can be removed using the pop!() function.

# Add a new element "Linear Regression" to the ml_algo array
push!(ml_algo, "Linear Regression")
5-element Array{String,1}:
 "AdaBoost"
 "Logistic Regression"
 "SVM"
 "XGBoost"
 "Linear Regression"
# Remove the last element "Linear Regression" from the ml_algo array
pop!(ml_algo)
ml_algo
4-element Array{String,1}:
 "AdaBoost"
 "Logistic Regression"
 "SVM"
 "XGBoost"

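Creating an empty array and filling it later, as mentioned at the start of this section, can be sketched like this. The array name scores and its values are made up for the example.

```julia
# An empty array that will hold Float64 values
scores = Float64[]

push!(scores, 0.91)          # add one element
push!(scores, 0.87, 0.95)    # push! also accepts several elements at once

println(length(scores))
println(scores[end])         # `end` refers to the last index

last_score = pop!(scores)    # pop! returns the element it removes
println(last_score)
println(length(scores))
```
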
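Multidimensional arrays can also be created and used in Julia. A simple way to hold two-dimensional data is to nest arrays, as below. Note that the result is an array of arrays (Array{Array{String,1},1}) rather than a true 2D Array{String,2}.

# Create a nested array (an array of arrays)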
ml_algo = [["Linear Regression", "SVM Regressor"],["k-means clustering", "PAM"]]
2-element Array{Array{String,1},1}:
 ["Linear Regression", "SVM Regressor"]
 ["k-means clustering", "PAM"]
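
The nested form above is an array of arrays. For comparison, here is a sketch of a true two-dimensional array, where spaces separate the columns and a semicolon separates the rows. The name algo_matrix is only an illustration.

```julia
# A 2x2 matrix: spaces separate columns, ; separates rows
algo_matrix = ["Linear Regression" "SVM Regressor"; "k-means clustering" "PAM"]

println(size(algo_matrix))      # (rows, columns)
println(ndims(algo_matrix))     # number of dimensions
println(algo_matrix[2, 1])      # row 2, column 1
```

With a true 2D array you index with two subscripts, [row, column], instead of chaining two single subscripts as you would with nested arrays.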

9. Installing Packages in Julia

You have seen different packages for data science in Julia. But how do you access these packages? Let’s see.

9.1 Installing Julia Package

# Install Package "DataFrames"
using Pkg
Pkg.add("DataFrames")

You have seen how to install Julia packages. But I had mentioned that Julia is compatible with Python and R as well. Let’s see how to call a Python package in Julia.

9.2 Call Python package in Julia

# Add the PyCall package to Julia
using Pkg
Pkg.add("PyCall")


# Use math.sin() function of python in Julia
using PyCall
math = pyimport("math")
math.sin(math.pi / 4)
0.7071067811865475

Conclusion

So, now you should have a fair idea of what Julia is, how to install it and how to get started with some basic coding in Julia. Next, I will see you with more Data Science oriented topics in Julia.