Menu

Pandas DateTime

Let’s understand how to work with Date-Time values in Pandas.

While working with real time data, we often come across date or time values. The raw data itself might be represented as a string of text, but you will want to convert it to a datetime format in order to work with it. These are common while working on a project and pandas can handle such values with ease. It provides all the features needed to work with `datetime` data.

In this article, you will learn how to create a pandas datetime series, different features of date-time accessors, and how to convert the column type from string to pandas date-time objects.

pandas.date_range

  • Syntax: pandas.date_range(start=None, end=None, periods=None, freq=None, tz=None, normalize=False, name=None, closed=None, **kwargs)Purpose: Return a fixed frequency DatetimeIndex.
  • Parameters:
    • start: str or datetime-like Left bound for generating dates.
    • end: str or datetime-like Right bound for generating dates.
    • periods: int Number of periods to generate.
  • Returns : DatetimeIndex _For full list of parameters, refer to the official documentation

Create Pandas DateTime Series

To create a pandas datetime series in pandas, you can use the date_range function. It takes the parameter start as the starting date, periods as the number of values, and end as the ending date.

Specifying any type of date, start or end with periods parameter will return a pandas datetime series

A. Create Pandas Datetime series with start and periods parameter.

This will create the series with dates starting from start date till the number of periods defined.

# import pandas
import pandas as pd

# Create dates series with 10 periods
pd.date_range(start='25/05/2021', periods = 5)
DatetimeIndex(['2021-05-25', '2021-05-26', '2021-05-27', '2021-05-28',
               '2021-05-29'],
              dtype='datetime64[ns]', freq='D')

B. Create Pandas Datetime series with start and end parameter.

This will create a series consisting of all the dates between the start and end dates.

pd.date_range(start='25/05/2021', end='29/05/2021')
DatetimeIndex(['2021-05-25', '2021-05-26', '2021-05-27', '2021-05-28',
               '2021-05-29'],
              dtype='datetime64[ns]', freq='D')

C. Create Pandas Datetime series with end and periods parameter.

You will still get a series in such a case, but dates will start from the end date minus periods.

pd.date_range(end='29/05/2021', periods=5)
DatetimeIndex(['2021-05-25', '2021-05-26', '2021-05-27', '2021-05-28',
               '2021-05-29'],
              dtype='datetime64[ns]', freq='D')

D. Create Pandas Datetime series with start, end and periods parameter.

Case 1: Periods = difference between the start and end date

This will create a series consisting of all the dates between the start and end date.

pd.date_range(start='25/05/2021', end='29/05/2021', periods=5)
DatetimeIndex(['2021-05-25', '2021-05-26', '2021-05-27', '2021-05-28',
               '2021-05-29'],
              dtype='datetime64[ns]', freq=None)

Case 2: When number of Periods is not equal to the difference between the start and end date

This will create a linearly spaced series between the start and end date. To maintain the gap between the dates, it can go up to hours, minutes, and seconds level.

pd.date_range(start='25/05/2021', end='29/05/2021', periods=10)
DatetimeIndex(['2021-05-25 00:00:00', '2021-05-25 10:40:00',
               '2021-05-25 21:20:00', '2021-05-26 08:00:00',
               '2021-05-26 18:40:00', '2021-05-27 05:20:00',
               '2021-05-27 16:00:00', '2021-05-28 02:40:00',
               '2021-05-28 13:20:00', '2021-05-29 00:00:00'],
              dtype='datetime64[ns]', freq=None)

Create DataFrame with Pandas DateTime series

date_range function returns the required pandas datetime series. To include this series as a part of the dataframe, create a new column and assign the pandas datetime series of the same length as the dataframe.

Step 1. Create a DataFrame

df = pd.DataFrame({
                    'LinkedIn Followers': [45, 34, 23, 8, 21],
                    'Facebook Followers': [32, 43, 23, 50, 21],
                    'Twitter': [14, 42, 21, 12, 45]
                  }, index=['Abhishek', 'Saumya', 'Ayushi', 'Saksham', 'Rajveer']
                 )
df
Create Databases

Step 2. Create a Datetime series of the same length as DataFrame and assign it to column of DataFrame

dates = pd.date_range(start='25/05/2021', end='26/05/2021', periods=len(df))
df['date'] = dates
df
Creating pandas datetime series

What are the Pandas DateTime Accessor Methods

Datetime accessors allow access to various pandas datetime attributes such as day, month, year for dates and minutes, hour and seconds for time. To use a datetime accessor, chain the .dt prefix to the pandas datetime type column. Then you can access any attribute for date or time.

1. Extracting Day, Month, and Year

Consider the dataframe created in the previous section-

The ‘date’ column is a pandas datetime series. Add .dt accessor to ‘date’ column and after that, you can add .year, .month or .date to access the attributes.

#  Create a new column for the year, month, and day attributes

df['year'] = df.date.dt.year
df['month'] = df.date.dt.month
df['day'] = df.date.dt.day


df[['date', 'year', 'month', 'day']]
Pandas Datetime accessor

2. Extracting Day and Month name

The day name (Monday, Tuesday..) and month name (January, February..) can also be accessed easy using .dt accessor’s functions .day_name() and .month_name().

# Create a new column for day and month names.

df['day_name'] = df.date.dt.day_name()
df['month_name'] = df.date.dt.month_name()

df[['date', 'day_name', 'month_name']]
Pandas Datetime accessor

 

 

3. Extracting day of year and day of week

Using the .dt accessor, you can find out the position of date in the year (.dayofyear) and week (.dayofweek). The position of date in the year means how many days have passed until that day with reference to 1st January. The position in the week is with reference to Monday, denoted as 0, and Sunday as 6.

# Create a new column for the day of year and day of week

df['day_of_year'] = df.date.dt.dayofyear
df['day_of_week'] = df.date.dt.dayofweek

df[['date', 'day_of_year', 'day_of_week']]
Pandas Datetime accessor

4. Check if a given year is a Leap year

.dt.is_leap_year returns the boolean value indicating if the date belongs to leap year.

# Create a new column for leap year
df['is_leap_year'] = df.date.dt.is_leap_year
df[['date', 'is_leap_year']]
Pandas Datetime accessor

5. Extracting hour, minutes, and seconds

Following the same lines of accessing date attributes, you can access the hour, minutes and seconds attribute of time using .hour, .minute, and .second chained with .dt accessor.

# Create a new column for hour, minute, and second attributes
df['hour'] = df.date.dt.hour
df['minute'] = df.date.dt.minute
df['second'] = df.date.dt.second

df[['date', 'hour', 'minute', 'second']]
Pandas Datetime accessor

Convert column type from string to datetime format

Method 1: Using pandas.to_datetime()

You can convert the column consisting of datetime values in string format into datetime type using the to_datetime() function. It takes the column or string which needs to be converted into datetime format.

pandas.to_datetime

  • Syntax: pandas.to_datetime(arg, errors=’raise’, dayfirst=False, yearfirst=False, utc=None, format=None, exact=True, unit=None, infer_datetime_format=False, origin=’unix’, cache=True)Purpose: Convert argument to datetime.
  • Parameters:
    • arg:int, float, str, datetime, list, tuple, 1-d array, Series, DataFrame/dict-like The object to convert to a datetime.
    • errors:{‘ignore’, ‘raise’, ‘coerce’}, default ‘raise’ For invalid parsing, ‘raise’ will raise an exception, ‘coerce’ will set as NaT, and ‘ignore’ will return the input
    • dayfirst:bool, default False If True, parses dates with the day first, eg 10/11/12 is parsed as 2012-11-10.
    • yearfirst:bool, default False If True parses dates with the year first, eg 10/11/12 is parsed as 2010-11-12. If both dayfirst and yearfirst are True, yearfirst is preceded (same as dateutil)
    • format:str, default None The strftime to parse time, eg “%d/%m/%Y”
  • Returns datetime. If parsing succeeded. Return type depends on the input _For full list of parameters, refer to the offical documentation.

Data Type of Dates before change

# Dates in string format
datesStr = pd.Series(['2021-05-25 ', '2021-05-26 ',
            '2021-05-27 ', '2021-05-28' ,
            '2021-05-29 ', '2021-05-30 ',
            '2021-05-31 ', '2021-06-01 ',
            '2021-06-02 ', '2021-06-23 '], name='Dates')

# Create a dataframe
df = pd.DataFrame(datesStr)

# Data type of Dates 
df.dtypes
Dates    object
dtype: object

Data Type of Dates after change

# Using pd.to_datetime() for conversion
df['Dates'] = pd.to_datetime(df.Dates)

# Data type of dates
df.dtypes
Dates    datetime64[ns]
dtype: object

Method 2: Using .astype function

astype function can cast the pandas object into the specified data type. You can pass datetime64[ns], data type for datetime format to convert the string type column to datetime.

# Dates in string format
datesStr = pd.Series(['2021-05-25 ', '2021-05-26 ',
            '2021-05-27 ', '2021-05-28' ,
            '2021-05-29 ', '2021-05-30 ',
            '2021-05-31 ', '2021-06-01 ',
            '2021-06-02 ', '2021-06-23 '], name='Dates')

# Create a dataframe
df = pd.DataFrame(datesStr)

# Conversion of string type to datetime
df['Dates'] = df.Dates.astype('datetime64[ns]')

# Data type of dates
df.dtypes
Dates    datetime64[ns]
dtype: object

Practical Tips

  1. While dealing with datetime values, always check the data type of the column containing dates. It should be datetime64[ns].
  2. For better performance, make the column containing dates as an index and then parse them as datetime values. Applying operations on index based datetime series is faster as compared to other columns. It will also allow you to directly access the datetime attributes without the .dt accessor.

Test your knowledge

Q1: What will happen if the number of periods defined in the date range function is greater than the difference between the start date and end date?

Answer: Datetime series will be created with differences in hour, minutes, and seconds. If the periods parameter is much larger, then it can go up to nanoseconds.

Q2: How can you check if the dates of the column belong to leap year or not?

Answer: By using the .dt.is_leap_year attribute which returns the boolean result about leap year status.

Q3: How can you determine your current age?

_Hint: pd.todatetime(‘today’) returns the current time stamp datetime object

Answer: Suppose your DOB is ‘1999-12-25’. Then, convert this date and present date to datetime format and subtract the years.

import pandas as pd

presentDate = pd.to_datetime('today')
DOB = pd.to_datetime('1999-12-25')

print("Current Age: {}".format(presentDate.year - DOB.year))

Q4: Answer the following questions using the given dataset.

import pandas as pd

#  DataFrame
df = pd.DataFrame({
                    'LinkedIn Followers': [45, 34, 23, 8, 21, 53, 32],
                    'Facebook Followers': [32, 43, 23, 50, 21, 31, 53],
                    'Twitter': [14, 42, 21, 12, 45, 32, 23]
                  }, index=['Abhishek', 'Saumya', 'Ayushi', 'Saksham', 'Rajveer', 'Mukul', 'Ritik']
                 )

# DateTime series for dates column
df['dates'] = ['2021-05-28', '2021-05-28',
               '2021-05-28', '2021-05-29',
               '2021-05-29', '2021-05-30',
               '2021-05-30']


df

Q4.1: Check if the ‘dates’ column is of the right data type (datetime64[ns]). If not then convert it to datetime format.

Answer:

df.dates.dtype
dtype('O')

The ‘dates’ column needs to be converted to datetime format

# conversion of string type dates to datetime format
df['dates'] = pd.to_datetime(df.dates)

Q4.2: Who gained the maximum number of Twitter followers on Friday?

Answer:

# Extracting date names 
df['day_name'] = df.dates.dt.day_name()

# Filtering results by Friday and then sorting by Twitter
sortedResult = df[df['day_name'] == 'Friday'].sort_values(by='Twitter', ascending=False)

# Displaying the name of the person
sortedResult.index[0]
'Saumya'

This article was contributed by Shreyansh B.

Course Preview

Machine Learning A-Z™: Hands-On Python & R In Data Science

Free Sample Videos:

Machine Learning A-Z™: Hands-On Python & R In Data Science

Machine Learning A-Z™: Hands-On Python & R In Data Science

Machine Learning A-Z™: Hands-On Python & R In Data Science

Machine Learning A-Z™: Hands-On Python & R In Data Science

Machine Learning A-Z™: Hands-On Python & R In Data Science