NumPy Tutorial – Your First NumPy Guide to Build Python Coding Foundations
This is part 1 of the NumPy tutorial, covering the core aspects of data manipulation and analysis with NumPy's ndarrays. NumPy is the most fundamental and powerful package for data manipulation and scientific computing in Python.
NumPy is the backbone of data science in Python. This tutorial covers arrays, indexing, reshaping, and random numbers — all the basics you need to work with data. By the end, you’ll know how to create, inspect, and work with NumPy arrays like a pro.
Numpy Tutorial Part 1: Introduction to Arrays. Photo by Bryce Canyon.
Also Read:
Numpy Tutorial – Gentle Introduction [Part 1] [This Article]
Numpy – Vital Functions for Data Analysis [Part 2]
Contents
1. Introduction to numpy
2. How to create a numpy array?
3. How to inspect the size and shape of a numpy array?
4. How to extract specific items from an array?
   4.1 How to reverse the rows and the whole array?
   4.2 How to represent missing values and infinity?
   4.3 How to compute mean, min, max on the ndarray?
5. How to create a new array from an existing array?
6. Reshaping and flattening multidimensional arrays
   6.1 What is the difference between flatten() and ravel()?
7. How to create sequences, repetitions, and random numbers?
   7.1 How to create repeating sequences?
   7.2 How to generate random numbers?
   7.3 How to get the unique items and the counts?
1. Introduction to Numpy
NumPy is the most important Python package for working with numbers and data. If you plan to do data analysis or machine learning, you need to learn it well.
Why? Because pandas is built on top of NumPy. Scikit-learn relies on it heavily too. These are the main tools you’ll use for data work and ML projects.
So what does NumPy actually give you?
At its core, NumPy gives you the ndarray — short for n-dimensional array. You can store many items of the same data type in an ndarray. The tools built around this array object make math and data tasks fast and easy.
You might think: “I can already store numbers in a Python list. I can do math with list comprehensions and for-loops. Why do I need NumPy?”
Great question. NumPy arrays have big advantages over lists. Let me show you by first creating a NumPy array.
2. How to create a numpy array?
There are many ways to create a NumPy array. The most common way is to pass a Python list to np.array.
Let’s start with a simple 1D array.
import numpy as np
# Create a 1d array from a list
list1 = [0, 1, 2, 3, 4]
arr1d = np.array(list1)
# Print the array and its type
print(type(arr1d))
print(arr1d)
The key difference between an array and a list? Arrays handle vectorized operations. A Python list does not.
This means when you apply an operation, it runs on every item in the array — not on the array object itself.
For example, try adding 2 to every item in a list:
list1 + 2 # TypeError — you can't do this with a list
That fails. But with a NumPy array, it just works:
import numpy as np
arr1d = np.array([0, 1, 2, 3, 4])
# Add 2 to each element
print(arr1d + 2)
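For comparison, here is a minimal sketch of the same "add 2" done the list way, with a list comprehension, next to the vectorized NumPy version. The variable names are illustrative only.

```python
import numpy as np

list1 = [0, 1, 2, 3, 4]
arr1d = np.array(list1)

# List approach: loop over every item explicitly
list_plus_2 = [x + 2 for x in list1]

# NumPy approach: the operation is applied to every element at once
arr_plus_2 = arr1d + 2

# Vectorized operations also work element by element between two arrays
arr_squared = arr1d * arr1d

print(list_plus_2)   # [2, 3, 4, 5, 6]
print(arr_plus_2)    # [2 3 4 5 6]
print(arr_squared)   # [ 0  1  4  9 16]
```

Note that on a list, `list1 * 2` does not double the values; it repeats the list.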
One thing to note: once you create a NumPy array, you can’t grow its size. You’d have to create a new array. Lists don’t have this limit — you can append freely.
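To see what "create a new array" means in practice, here is a small sketch using np.append, which builds and returns a new, larger array rather than extending the original. If you build an array item by item, collecting the values in a list and converting once at the end is usually cheaper than calling np.append in a loop, since each call copies the whole array.

```python
import numpy as np

arr1d = np.array([0, 1, 2, 3, 4])

# np.append does not grow arr1d in place; it returns a brand-new array
arr_longer = np.append(arr1d, 5)

print(arr1d)       # [0 1 2 3 4]  (unchanged)
print(arr_longer)  # [0 1 2 3 4 5]

# A list, by contrast, grows in place
list1 = [0, 1, 2, 3, 4]
list1.append(5)
print(list1)       # [0, 1, 2, 3, 4, 5]
```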
But NumPy has many more strengths. Let’s keep going.
You can also pass a list of lists to create a 2D array (like a matrix):
import numpy as np
# Create a 2d array from a list of lists
list2 = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
arr2d = np.array(list2)
print(arr2d)
You can set the data type with the dtype argument. Common NumPy dtypes include: 'float', 'int', 'bool', 'str', and 'object'.
For tighter memory control, use 'float32', 'float64', 'int8', 'int16', or 'int32'.
import numpy as np
list2 = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
# Create a float 2d array
arr2d_f = np.array(list2, dtype='float')
print(arr2d_f)
See those decimal points? That tells you it’s a float. You can convert to a different type with astype:
import numpy as np
list2 = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
arr2d_f = np.array(list2, dtype='float')
# Convert to int
print(arr2d_f.astype('int'))
# Chain conversions: float -> int -> str
print(arr2d_f.astype('int').astype('str'))
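One detail to watch when converting floats to ints: astype('int') drops the fractional part (it truncates toward zero) instead of rounding. A short sketch with made-up values:

```python
import numpy as np

arr_f = np.array([1.7, -1.7, 2.5])

# astype('int') truncates toward zero, it does not round
print(arr_f.astype('int'))            # [ 1 -1  2]

# Round first if you want the nearest integer
print(np.round(arr_f).astype('int'))  # [ 2 -2  2]
```

np.round rounds halves to the nearest even number, which is why 2.5 becomes 2.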
Here’s an important rule: every item in a NumPy array must share the same data type. Lists don’t have this rule.
If you need to mix types (like numbers and strings), set dtype='object':
import numpy as np
# Boolean array: any nonzero value becomes True
arr2d_b = np.array([1, 0, 10], dtype='bool')
print("Boolean array:", arr2d_b)
# Object array: can hold mixed types
arr1d_obj = np.array([1, 'a'], dtype='object')
print("Object array:", arr1d_obj)
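What happens if you mix types without setting dtype='object'? NumPy picks one common type for the whole array, converting items as needed. A quick sketch:

```python
import numpy as np

# Mixed ints and floats: everything is upcast to float
mixed_num = np.array([1, 2.5])
print(mixed_num, mixed_num.dtype)        # [1.  2.5] float64

# Numbers and strings without dtype='object': the numbers become strings
mixed_str = np.array([1, 'a'])
print(mixed_str, mixed_str.dtype.kind)   # ['1' 'a'] U

# With dtype='object', each item keeps its own Python type
mixed_obj = np.array([1, 'a'], dtype='object')
print([type(x).__name__ for x in mixed_obj])  # ['int', 'str']
```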
You can always convert an array back to a Python list with tolist():
import numpy as np
arr1d_obj = np.array([1, 'a'], dtype='object')
print(arr1d_obj.tolist())
To sum up, the main differences between NumPy arrays and Python lists:
- Arrays support vectorized operations. Lists don’t.
- Arrays have a fixed size after creation. You must create a new array to change it.
- Every array has exactly one dtype. All items must match that type.
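- For numeric data, NumPy arrays use much less memory than equivalent Python lists, because they store raw values in one block instead of pointers to separate Python objects.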
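The memory point is easy to check. This sketch compares the raw data size of an int64 array (arr.nbytes) with a rough estimate for a list holding the same integers. The list figure from sys.getsizeof is an approximation, not an exact accounting, and it varies between Python versions, so no exact number is shown for it.

```python
import sys
import numpy as np

n = 1000
arr = np.arange(n, dtype='int64')
lst = list(range(n))

# An int64 array stores each value in exactly 8 bytes
array_bytes = arr.nbytes  # 8000

# A list stores pointers, and each item is a separate Python int object.
# Rough estimate only: it ignores details such as small-int caching.
list_bytes = sys.getsizeof(lst) + sum(sys.getsizeof(x) for x in lst)

print("Array data bytes:", array_bytes)
print("List bytes (approx.):", list_bytes)
```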
3. How to inspect the size and shape of a numpy array?
Every array has traits that tell you about its shape and layout. Let me walk you through the key ones.
Take arr2d: we built it from a list of lists, so it has 2 axes (rows and columns, like a matrix). A list of lists of lists would give 3 axes, like a cube.
When someone hands you a NumPy array, what do you check first? Here are the five things I always look at:
- Number of dimensions (ndim)
- Items in each dimension (shape)
- Data type (dtype)
- Total item count (size)
- A few sample values (through indexing)
import numpy as np
# Create a 2d array with 3 rows and 4 columns
list2 = [[1, 2, 3, 4], [3, 4, 5, 6], [5, 6, 7, 8]]
arr2 = np.array(list2, dtype='float')
print("Array:\n", arr2)
print('Shape:', arr2.shape)
print('Datatype:', arr2.dtype)
print('Size:', arr2.size)
print('Num Dimensions:', arr2.ndim)
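The same checks work for the 3-axis case built from a list of lists of lists. A short sketch with made-up values:

```python
import numpy as np

# A list of lists of lists gives a 3d array: 2 blocks, each with 2 rows and 3 columns
list3 = [[[1, 2, 3], [4, 5, 6]],
         [[7, 8, 9], [10, 11, 12]]]
arr3 = np.array(list3)

print('Shape:', arr3.shape)          # (2, 2, 3)
print('Num Dimensions:', arr3.ndim)  # 3
print('Size:', arr3.size)            # 12

# One index per dimension: block 1, row 0, column 2
print('Sample value:', arr3[1, 0, 2])  # 9
```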
4. How to extract specific items from an array?
You can pull out parts of an array using indexing, starting at 0 — just like Python lists.
But here’s the difference: NumPy arrays accept one index per dimension, separated by commas. Lists can’t do this.
import numpy as np
list2 = [[1, 2, 3, 4], [3, 4, 5, 6], [5, 6, 7, 8]]
arr2 = np.array(list2, dtype='float')
# Extract the first 2 rows and first 2 columns
print(arr2[:2, :2])
Note: if you try list2[:2, :2] on a regular list, you’ll get an error. Multi-dimensional indexing is a NumPy feature.
NumPy also supports boolean indexing. You create a True/False mask, and only the True positions are kept:
import numpy as np
arr2 = np.array([[1, 2, 3, 4], [3, 4, 5, 6], [5, 6, 7, 8]], dtype='float')
# Create a boolean mask: which values are greater than 4?
b = arr2 > 4
print("Boolean mask:\n", b)
# Filter: keep only values where mask is True
print("Filtered values:", arr2[b])
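Masks can also be combined and used for assignment. Python's and/or keywords don't work element-wise on arrays, so the operators &, | and ~ are used instead, and each comparison needs parentheses because & binds more tightly than >. A minimal sketch:

```python
import numpy as np

arr2 = np.array([[1, 2, 3, 4], [3, 4, 5, 6], [5, 6, 7, 8]], dtype='float')

# Combine conditions with & (and), | (or), ~ (not)
between = arr2[(arr2 > 2) & (arr2 < 6)]
print("Between 2 and 6:", between)   # [3. 4. 3. 4. 5. 5.]

# Masks can be used on the left side too: set every value above 6 to 0
arr2[arr2 > 6] = 0
print(arr2)
```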
4.1 How to reverse the rows and the whole array?
Reversing works the same way as with lists — use the ::-1 slice. For a full reversal of a 2D array, apply it to both axes:
import numpy as np
arr2 = np.array([[1, 2, 3, 4], [3, 4, 5, 6], [5, 6, 7, 8]], dtype='float')
# Reverse only the rows
print("Rows reversed:\n", arr2[::-1])
# Reverse both rows and columns
print("Fully reversed:\n", arr2[::-1, ::-1])
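To reverse only the columns, put ::-1 on the second axis instead. NumPy's np.flip function expresses the same operations by axis; this sketch checks that both spellings agree.

```python
import numpy as np

arr2 = np.array([[1, 2, 3, 4], [3, 4, 5, 6], [5, 6, 7, 8]], dtype='float')

# Reverse only the columns (keep the row order)
cols_reversed = arr2[:, ::-1]
print("Columns reversed:\n", cols_reversed)

# np.flip gives the same results as the slicing idioms
assert np.array_equal(np.flip(arr2, axis=0), arr2[::-1])
assert np.array_equal(np.flip(arr2, axis=1), arr2[:, ::-1])
assert np.array_equal(np.flip(arr2), arr2[::-1, ::-1])  # no axis: flip every axis
```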
4.2 How to represent missing values and infinity?
Use np.nan for missing values and np.inf for infinity. Let’s insert some into our array and then replace them:
import numpy as np
arr2 = np.array([[1, 2, 3, 4], [3, 4, 5, 6], [5, 6, 7, 8]], dtype='float')
# Insert nan and inf
arr2[1, 1] = np.nan
arr2[1, 2] = np.inf
print("With nan and inf:\n", arr2)
# Replace nan and inf with -1
# Important: don't use arr2 == np.nan (it won't work!), because NaN never compares equal to anything, not even itself
missing_bool = np.isnan(arr2) | np.isinf(arr2)
arr2[missing_bool] = -1
print("After replacement:\n", arr2)
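The sketch below demonstrates the NaN comparison pitfall on a small made-up array, then builds an equivalent "missing" mask with np.isfinite (which is False for nan, inf and -inf) and replaces values with np.where, which returns a new array instead of modifying the original.

```python
import numpy as np

# NaN is not equal to anything, including itself
print(np.nan == np.nan)  # False

arr = np.array([1.0, np.nan, np.inf, -np.inf, 5.0])
print(arr == np.nan)     # [False False False False False]
print(np.isnan(arr))     # [False  True False False False]

# ~np.isfinite(arr) marks the same items as np.isnan(arr) | np.isinf(arr)
missing = ~np.isfinite(arr)
assert np.array_equal(missing, np.isnan(arr) | np.isinf(arr))

# Replace without touching the original array
cleaned = np.where(missing, -1, arr)
print(cleaned)           # [ 1. -1. -1. -1.  5.]
```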
4.3 How to compute mean, min, max on the ndarray?
Every ndarray has built-in methods for basic stats. Here’s how to use them on the whole array, and also by row or column.
import numpy as np
arr2 = np.array([[1, 2, 3, 4], [3, -1, -1, 6], [5, 6, 7, 8]], dtype='float')
# Overall stats
print("Mean:", arr2.mean())
print("Max:", arr2.max())
print("Min:", arr2.min())
Need the min for each row or column? Use np.amin with the axis parameter. Axis 0 goes down columns. Axis 1 goes across rows.
import numpy as np
arr2 = np.array([[1, 2, 3, 4], [3, -1, -1, 6], [5, 6, 7, 8]], dtype='float')
print("Column-wise minimum:", np.amin(arr2, axis=0))
print("Row-wise minimum:", np.amin(arr2, axis=1))
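The same axis argument also works on the array methods shown earlier, so arr2.min(axis=0) is equivalent to np.amin(arr2, axis=0). A short sketch:

```python
import numpy as np

arr2 = np.array([[1, 2, 3, 4], [3, -1, -1, 6], [5, 6, 7, 8]], dtype='float')

# Array methods accept the same axis argument as np.amin
col_min = arr2.min(axis=0)    # one value per column
row_max = arr2.max(axis=1)    # one value per row
row_mean = arr2.mean(axis=1)  # one value per row

print("Column-wise min:", col_min)  # [ 1. -1. -1.  4.]
print("Row-wise max:", row_max)     # [4. 6. 8.]
print("Row-wise mean:", row_mean)
```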
What if you want a custom function applied row-wise? That’s where np.apply_along_axis comes in — we’ll cover that in Part 2.
Here’s another useful one, the cumulative sum. Without an axis argument, np.cumsum flattens the array first, so the result is 1D:
import numpy as np
arr2 = np.array([[1, 2, 3, 4], [3, -1, -1, 6], [5, 6, 7, 8]], dtype='float')
print("Cumulative sum:", np.cumsum(arr2))
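Passing axis keeps the 2D shape and computes a running total along rows or columns. A minimal sketch:

```python
import numpy as np

arr2 = np.array([[1, 2, 3, 4], [3, -1, -1, 6], [5, 6, 7, 8]], dtype='float')

# Without axis: flattened first, 12 running totals
flat_cs = np.cumsum(arr2)
print(flat_cs.shape)   # (12,)

# axis=1: running total across each row, shape stays (3, 4)
row_cs = np.cumsum(arr2, axis=1)
print("Row-wise cumsum:\n", row_cs)

# axis=0: running total down each column
col_cs = np.cumsum(arr2, axis=0)
print("Column-wise cumsum:\n", col_cs)
```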
5. How to create a new array from an existing array?
This is a common trap. When you take a basic slice of an array (with start:stop:step) and assign it to a new name, the new name is just a view of the source. It points to the same data in memory.
Change the view, and the original changes too. Watch:
import numpy as np
arr2 = np.array([[1, 2, 3, 4], [3, -1, -1, 6], [5, 6, 7, 8]], dtype='float')
# This is a VIEW, not a copy
arr2a = arr2[:2, :2]
arr2a[0, 0] = 100 # This changes arr2 as well!
print("Original after view change:\n", arr2)
To avoid this, use .copy(). This creates a completely separate array:
import numpy as np
arr2 = np.array([[100, 2, 3, 4], [3, -1, -1, 6], [5, 6, 7, 8]], dtype='float')
# This is a COPY — changes won't affect arr2
arr2b = arr2[:2, :2].copy()
arr2b[0, 0] = 999
print("Original after copy change:\n", arr2)
print("Copy:\n", arr2b)
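Not every way of selecting items gives a view. Basic slicing does, while boolean masks and lists of indices ("fancy" indexing) always return a new array. np.shares_memory lets you check; a quick sketch:

```python
import numpy as np

arr2 = np.array([[100, 2, 3, 4], [3, -1, -1, 6], [5, 6, 7, 8]], dtype='float')

view = arr2[:2, :2]             # basic slicing -> view
copied = arr2[:2, :2].copy()    # explicit copy
masked = arr2[arr2 > 4]         # boolean indexing -> new array
picked = arr2[[0, 2]]           # list of row indices -> new array

for name, sub in [("view", view), ("copy", copied), ("mask", masked), ("fancy", picked)]:
    print(name, "shares memory with arr2:", np.shares_memory(arr2, sub))
```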
6. Reshaping and Flattening Multidimensional arrays
Reshaping moves items into a new layout while keeping all the data intact.
Flattening turns any multi-axis array into a flat 1D array. Let me show both.
import numpy as np
arr2 = np.array([[100, 2, 3, 4], [3, -1, -1, 6], [5, 6, 7, 8]], dtype='float')
# Reshape from 3x4 to 4x3
print("Reshaped (4x3):\n", arr2.reshape(4, 3))
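Two details are worth knowing: the new shape must hold exactly the same number of items, and one dimension can be given as -1 so NumPy infers it from the total size. A minimal sketch:

```python
import numpy as np

arr2 = np.array([[100, 2, 3, 4], [3, -1, -1, 6], [5, 6, 7, 8]], dtype='float')

# -1 lets NumPy work out that dimension from the total size (12 items)
print(arr2.reshape(2, -1).shape)   # (2, 6)
print(arr2.reshape(-1).shape)      # (12,)

# 5 x 3 = 15 does not match 12 items, so this raises ValueError
try:
    arr2.reshape(5, 3)
except ValueError as err:
    print("ValueError:", err)
```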
6.1 What is the difference between flatten() and ravel()?
Both turn a multi-axis array into 1D. But there’s a key difference.
flatten() returns a copy. Changes to the flattened array won’t affect the original.
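ravel() returns a view whenever it can. Changes to the raveled array will then affect the source, and it uses less memory since it skips the copy step. If the data can't be viewed as 1D directly (for example, a transposed array), ravel() returns a copy instead.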
import numpy as np
arr2 = np.array([[100, 2, 3, 4], [3, -1, -1, 6], [5, 6, 7, 8]], dtype='float')
# flatten() returns a copy
b1 = arr2.flatten()
b1[0] = 999
print("After changing flatten result:")
print("Original unchanged:\n", arr2)
# ravel() returns a view
b2 = arr2.ravel()
b2[0] = 101
print("\nAfter changing ravel result:")
print("Original changed:\n", arr2)
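The "whenever it can" part matters. A transposed array is stored column by column, so ravel() cannot walk it in row order without copying. This sketch checks each case with np.shares_memory:

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)

# Regular row-major array: ravel() returns a view
ravel_regular = np.shares_memory(arr, arr.ravel())

# Transposed array: ravel() has to copy
ravel_transposed = np.shares_memory(arr, arr.T.ravel())

# flatten() always copies
flatten_regular = np.shares_memory(arr, arr.flatten())

print(ravel_regular, ravel_transposed, flatten_regular)  # True False False
```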
7. How to create sequences, repetitions and random numbers using numpy?
np.arange creates custom number sequences. It works like Python’s range, but returns an ndarray:
import numpy as np
print("Default (0 to 4):", np.arange(5))
print("0 to 9:", np.arange(0, 10))
print("0 to 9, step 2:", np.arange(0, 10, 2))
print("10 to 1, decreasing:", np.arange(10, 0, -1))
With np.arange, you set the start, stop, and step. But what if you want exactly N numbers between two values?
That’s what np.linspace is for. You tell it how many numbers you want, and it figures out the spacing:
import numpy as np
# 10 evenly spaced numbers between 1 and 50
print(np.linspace(start=1, stop=50, num=10, dtype=int))
Note: forcing dtype=int drops the fractional part of each value, so the spacing isn't perfectly even.
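Another difference from np.arange: arange stops before the stop value, while linspace includes it by default. Passing endpoint=False makes linspace exclude it. A short sketch:

```python
import numpy as np

# arange excludes the stop value
a_range = np.arange(0, 1, 0.25)                     # 0, 0.25, 0.5, 0.75

# linspace includes the stop value by default
lin_incl = np.linspace(0, 1, num=5)                 # 0, 0.25, 0.5, 0.75, 1

# ...unless endpoint=False is passed
lin_excl = np.linspace(0, 1, num=4, endpoint=False) # 0, 0.25, 0.5, 0.75

print(a_range)
print(lin_incl)
print(lin_excl)
```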
There’s also np.logspace, which spaces values on a log scale. By default it uses base 10, so start=1 means 10^1 and stop=50 means 10^50:
import numpy as np
np.set_printoptions(precision=2)
# 10 values from 10^1 to 10^50
print(np.logspace(start=1, stop=50, num=10, base=10))
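A handy way to think about it: np.logspace(start, stop, num) equals base raised to np.linspace(start, stop, num). This sketch uses smaller exponents so the values are easy to read, plus a base-2 example.

```python
import numpy as np

# 10^1, 10^2, 10^3
log_vals = np.logspace(start=1, stop=3, num=3, base=10)
same = 10 ** np.linspace(1, 3, num=3)
print(log_vals)                     # 10, 100, 1000
print(np.allclose(log_vals, same))  # True

# Powers of 2 from 2^0 to 2^4
pow2 = np.logspace(0, 4, num=5, base=2)
print(pow2)                         # 1, 2, 4, 8, 16
```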
Need arrays filled with zeros or ones? Use np.zeros and np.ones:
import numpy as np
print("Zeros:\n", np.zeros([2, 2]))
print("Ones:\n", np.ones([2, 2]))
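Both functions take a shape and an optional dtype (float by default). Close relatives include np.full for any constant value and np.zeros_like / np.ones_like, which copy the shape and dtype of an existing array. A short sketch:

```python
import numpy as np

# Integer zeros instead of the default float
int_zeros = np.zeros((2, 3), dtype='int')
print(int_zeros.dtype.kind, int_zeros.shape)   # i (2, 3)

# np.full fills a new array with any constant
sevens = np.full((2, 2), 7)
print(sevens)

# ones_like matches the shape and dtype of an existing array
template = np.array([[1.5, 2.5], [3.5, 4.5]])
ones_t = np.ones_like(template)
print(ones_t)
```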
7.1 How to create repeating sequences?
Two handy functions here: np.tile repeats the whole array, while np.repeat repeats each element.
import numpy as np
a = [1, 2, 3]
# Tile: repeat the whole array twice
print('Tile: ', np.tile(a, 2))
# Repeat: repeat each element twice
print('Repeat:', np.repeat(a, 2))
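On a 2D array, np.repeat flattens the input unless you pass an axis, and np.tile accepts a tuple saying how many times to repeat along each axis. A minimal sketch:

```python
import numpy as np

m = np.array([[1, 2], [3, 4]])

# Repeat each row twice
rep_rows = np.repeat(m, 2, axis=0)
print(rep_rows)
# [[1 2]
#  [1 2]
#  [3 4]
#  [3 4]]

# Without axis, the array is flattened first
rep_flat = np.repeat(m, 2)
print(rep_flat)   # [1 1 2 2 3 3 4 4]

# Tile the whole block twice down the rows, once across the columns
tiled = np.tile(m, (2, 1))
print(tiled)
# [[1 2]
#  [3 4]
#  [1 2]
#  [3 4]]
```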
7.2 How to generate random numbers?
NumPy’s random module creates random numbers of any shape. Here are the most common functions:
import numpy as np
# Uniform random numbers between [0, 1)
print("Uniform 2x2:\n", np.random.rand(2, 2))
# Normal distribution (mean=0, variance=1)
print("Normal 2x2:\n", np.random.randn(2, 2))
# Random integers between [0, 10)
print("Random ints 2x2:\n", np.random.randint(0, 10, size=[2, 2]))
# Single random number
print("Single random:", np.random.random())
# Random choices from a list
print("Random vowels:", np.random.choice(['a', 'e', 'i', 'o', 'u'], size=10))
Every time you run these functions, you get different numbers. To get the same random numbers each time, set a seed:
import numpy as np
# Method 1: Using RandomState
rn = np.random.RandomState(100)
print("With RandomState:\n", rn.rand(2, 2))
# Method 2: Using seed
np.random.seed(100)
print("With seed:\n", np.random.rand(2, 2))
Both give the same results. The seed can be any number — just use the same seed each time to get the same output.
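Newer NumPy code (version 1.17 and later) often uses the Generator API from np.random.default_rng instead of RandomState or np.random.seed. It uses a different random number engine, so the numbers will not match the RandomState(100) output even with the same seed. A sketch of the common calls:

```python
import numpy as np

rng = np.random.default_rng(100)   # seeded Generator

print(rng.random((2, 2)))                 # uniform in [0, 1)
print(rng.standard_normal((2, 2)))        # mean 0, variance 1
print(rng.integers(0, 10, size=(2, 2)))   # ints in [0, 10)
print(rng.choice(['a', 'e', 'i', 'o', 'u'], size=5))

# Same seed, same sequence
a = np.random.default_rng(100).random(3)
b = np.random.default_rng(100).random(3)
print(np.array_equal(a, b))   # True
```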
7.3 How to get the unique items and the counts?
Use np.unique to find distinct values. Set return_counts=True to see how many times each value appears:
import numpy as np
np.random.seed(100)
arr_rand = np.random.randint(0, 10, size=10)
print("Random array:", arr_rand)
# Get unique values and their counts
uniqs, counts = np.unique(arr_rand, return_counts=True)
print("Unique items:", uniqs)
print("Counts: ", counts)
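To look up a count by value, you can zip the two results into a dictionary. For small non-negative integers, np.bincount is an alternative that counts every value from 0 up to the maximum, including values that never appear (count 0). A sketch that cross-checks the two:

```python
import numpy as np

np.random.seed(100)
arr_rand = np.random.randint(0, 10, size=10)
uniqs, counts = np.unique(arr_rand, return_counts=True)

# Pair each unique value with its count
freq = dict(zip(uniqs.tolist(), counts.tolist()))
print(freq)

# bins[v] is the number of times v appears in arr_rand
bins = np.bincount(arr_rand)
assert all(bins[v] == c for v, c in freq.items())

# The counts always add up to the number of items
assert counts.sum() == arr_rand.size
```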
8. Conclusion
That wraps up Part 1 of the NumPy series. You now know how to create arrays, check their shape, slice and index them, reshape and flatten, and make sequences and random numbers.
Next up: advanced NumPy for data analysis, where we’ll cover the key functions you need for real data work.