10 Tricks for Converting Numbers and Strings to Datetime in Pandas

Pandas tips and tricks to help you get started with Data Analysis

Photo by Sanah Suvarna on Unsplash

When doing data analysis, it is important to ensure correct data types. Otherwise, you may get unexpected results or errors. Datetime is a common data type in data science projects, and the data is often saved as numbers or strings. During data analysis, you will likely need to explicitly convert them to a datetime type.

This article will discuss how to convert numbers and strings to a datetime type. More specifically, you will learn how to use the Pandas built-in methods to_datetime() and astype() to deal with the following common problems:

  1. Converting numbers to datetime
  2. Converting strings to datetime
  3. Handling day first format
  4. Dealing with custom datetime format
  5. Handling parse errors
  6. Handling missing values
  7. Assembling datetime from multiple columns
  8. Converting multiple columns at once
  9. Parsing date column when reading a CSV file
  10. Difference between astype() and to_datetime()

Please check out the Notebook for the source code.

1. Converting numbers to datetime

Pandas has 2 built-in methods, astype() and to_datetime(), that can be used to convert numbers to datetime. For instance, to convert numbers denoting seconds to datetime:

    import pandas as pd

    df = pd.DataFrame({'date': [1470195805, 1480195805, 1490195805],
                       'value': [2, 3, 4]})

When using to_datetime(), we need to call it from Pandas and set the argument unit='s':

    >>> pd.to_datetime(df['date'], unit='s')
    0   2016-08-03 03:43:25
    1   2016-11-26 21:30:05
    2   2017-03-22 15:16:45
    Name: date, dtype: datetime64[ns]

When using astype(), we need to call it from a Series (the date column) and pass in 'datetime64[s]':

    >>> df['date'].astype('datetime64[s]')
    0   2016-08-03 03:43:25
    1   2016-11-26 21:30:05
    2   2017-03-22 15:16:45
    Name: date, dtype: datetime64[ns]

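Similarly, we can convert numbers denoting other units (D, s, ms, us, ns) to datetime, for instance, numbers denoting days. (In pandas 2.0 and later, astype() only accepts the s, ms, us, and ns units, so the astype('datetime64[D]') call below works only in older versions; to_datetime() with unit='D' works in both.)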

    df = pd.DataFrame({'date': [1470, 1480, 1490],
                       'value': [2, 3, 4]})
    >>> pd.to_datetime(df['date'], unit='D')
    0   1974-01-10
    1   1974-01-20
    2   1974-01-30
    Name: date, dtype: datetime64[ns]
    >>> df['date'].astype('datetime64[D]')
    0   1974-01-10
    1   1974-01-20
    2   1974-01-30
    Name: date, dtype: datetime64[ns]
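To check the numeric conversions yourself, here is a minimal, self-contained sketch that uses to_datetime() only, since it behaves the same way in pandas 1.x and 2.x. The millisecond value is an extra, illustrative input (the first timestamp multiplied by 1000) showing that unit only changes how each number is scaled; the reference point is the default origin, the Unix epoch 1970-01-01.

```python
import pandas as pd

# The same epoch numbers as above, interpreted in different units
seconds = pd.to_datetime(pd.Series([1470195805, 1480195805, 1490195805]), unit='s')
days = pd.to_datetime(pd.Series([1470, 1480, 1490]), unit='D')

# Illustrative extra input: the first timestamp expressed in milliseconds
millis = pd.to_datetime(pd.Series([1470195805000]), unit='ms')

print(seconds)
print(days)
print(millis)
```

Both unit='s' on the original number and unit='ms' on the number times 1000 land on the same instant, 2016-08-03 03:43:25.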

2. Converting strings to datetime

Often, you'll find that dates are represented as strings. In Pandas, strings are shown as object, which is the internal Pandas lingo for the string.

    >>> df = pd.DataFrame({'date': ['3/10/2015', '3/11/2015', '3/12/2015'],
                           'value': [2, 3, 4]})
    >>> df.dtypes
    date     object
    value     int64
    dtype: object

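Both to_datetime() and astype() can be used to convert strings to datetime. With astype(), spell out the unit ('datetime64[ns]'), because pandas 2.0 and later reject a bare 'datetime64'.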

    >>> pd.to_datetime(df['date'])
    0   2015-03-10
    1   2015-03-11
    2   2015-03-12
    Name: date, dtype: datetime64[ns]
    >>> df['date'].astype('datetime64[ns]')
    0   2015-03-10
    1   2015-03-11
    2   2015-03-12
    Name: date, dtype: datetime64[ns]
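The following sketch runs both conversions on the same strings and checks that they agree. It uses the explicit 'datetime64[ns]' unit for astype() so that it also runs on pandas 2.x.

```python
import pandas as pd

df = pd.DataFrame({'date': ['3/10/2015', '3/11/2015', '3/12/2015'],
                   'value': [2, 3, 4]})

# Route 1: to_datetime() parses the month-first strings
via_to_datetime = pd.to_datetime(df['date'])

# Route 2: astype() with an explicit unit (a bare 'datetime64' fails on pandas 2.x)
via_astype = df['date'].astype('datetime64[ns]')

print(via_to_datetime.dt.day.tolist())
print((via_to_datetime == via_astype).all())
```

Comparing element-wise with == checks the timestamps themselves, so the check still holds if the two results ever end up with different datetime resolutions.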

3. Handling day first format

By default, to_datetime() will parse strings in month-first (MM/DD, MM DD, or MM-DD) format, and this arrangement is relatively unique to the United States.

In most of the rest of the world, the day is written first (DD/MM, DD MM, or DD-MM). If you would like Pandas to consider the day first instead of the month, you can set the argument dayfirst to True.

    df = pd.DataFrame({'date': ['3/10/2000', '3/11/2000', '3/12/2000'],
                       'value': [2, 3, 4]})
    df['date'] = pd.to_datetime(df['date'], dayfirst=True)


Alternatively, you can pass a custom format to the argument format.
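A small sketch showing that dayfirst=True and an explicit day-first format string give the same timestamps for the example dates. The format '%d/%m/%Y' is an assumption about how these particular strings are laid out.

```python
import pandas as pd

dates = pd.Series(['3/10/2000', '3/11/2000', '3/12/2000'])

# dayfirst=True reads "3/10/2000" as 3 October 2000
day_first = pd.to_datetime(dates, dayfirst=True)

# An explicit format says the same thing without any guessing
explicit = pd.to_datetime(dates, format='%d/%m/%Y')

print(day_first)
print((day_first == explicit).all())
```

The pandas documentation describes dayfirst as a preference rather than a strict rule, so an explicit format is the safer choice when the layout is known in advance.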

4. Handling custom datetime format

By default, strings are parsed with the Pandas built-in parser, which is based on dateutil.parser.parse. Sometimes, your strings might be in a custom format, for example, YYYY-d-m HH:MM:SS. Pandas to_datetime() has an argument called format that allows you to pass a custom format:

    df = pd.DataFrame({'date': ['2016-6-10 20:30:0',
                                '2016-7-1 19:45:30',
                                '2013-10-12 4:5:1'],
                       'value': [2, 3, 4]})
    df['date'] = pd.to_datetime(df['date'], format="%Y-%d-%m %H:%M:%S")
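Here is a self-contained version of the same conversion with the resulting timestamps printed. It shows that %d, %m, %H, %M and %S all accept single-digit values, and that putting %d before %m is what turns '2016-6-10' into 6 October rather than 10 June.

```python
import pandas as pd

raw = pd.Series(['2016-6-10 20:30:0', '2016-7-1 19:45:30', '2013-10-12 4:5:1'])

# Year, then day, then month; single-digit fields are accepted
parsed = pd.to_datetime(raw, format='%Y-%d-%m %H:%M:%S')

for ts in parsed:
    print(ts)
# 2016-10-06 20:30:00
# 2016-01-07 19:45:30
# 2013-12-10 04:05:01
```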

5. Handling parse errors

If a date cannot be parsed into a valid timestamp, we will get a parsing error (a ValueError) when converting. For instance, with the invalid string a/11/2000:

    df = pd.DataFrame({'date': ['3/10/2000', 'a/11/2000', '3/12/2000'],
                       'value': [2, 3, 4]})
    # Raises a parsing error because of 'a/11/2000'
    df['date'] = pd.to_datetime(df['date'])

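to_datetime() has an argument called errors that allows you to ignore the error or force an invalid value to NaT. Note that errors='ignore' simply returns the input unchanged, and it has been deprecated since pandas 2.2.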

    df['date'] = pd.to_datetime(df['date'], errors='ignore')
    df


And to force an invalid value to NaT:

    df['date'] = pd.to_datetime(df['date'], errors='coerce')

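A self-contained sketch of both behaviors. The default errors='raise' stops with an exception, which is caught here as ValueError because the parser errors Pandas raises subclass it; errors='coerce' keeps going and marks the bad entry as NaT. The exact error message differs between pandas versions, so the sketch only prints it.

```python
import pandas as pd

raw = pd.Series(['3/10/2000', 'a/11/2000', '3/12/2000'])

# errors='raise' (the default): the invalid string stops the conversion
try:
    pd.to_datetime(raw)
    raised = False
except ValueError as exc:  # parsing errors are ValueError subclasses
    raised = True
    print('Could not parse:', exc)

# errors='coerce': the invalid string becomes NaT, the rest are converted
coerced = pd.to_datetime(raw, errors='coerce')
print(coerced.isna().tolist())  # [False, True, False]
```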

6. Handling missing values

In Pandas, missing values are given the value NaN, short for "Not a Number".

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'date': ['3/10/2000', np.nan, '3/12/2000'],
                       'value': [2, 3, 4]})

When converting a column with missing values to datetime, both to_datetime() and astype() change NumPy's NaN to Pandas' NaT, which allows the column to have a datetime type.

    >>> df['date'].astype('datetime64[ns]')
    0   2000-03-10
    1          NaT
    2   2000-03-12
    Name: date, dtype: datetime64[ns]
    >>> pd.to_datetime(df['date'])
    0   2000-03-10
    1          NaT
    2   2000-03-12
    Name: date, dtype: datetime64[ns]

Alternatively, we can replace NumPy NaN with another value (for example, replacing NaN with '3/11/2000'):

    df = pd.DataFrame({'date': ['3/10/2000', np.nan, '3/12/2000'],
                       'value': [2, 3, 4]})
    df['date'] = df['date'].fillna('3/11/2000').astype('datetime64[ns]')
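A minimal sketch of both options on the same three values: leaving NaN in place gives NaT after conversion, while fillna() first replaces it with a concrete (purely illustrative) date before astype() runs.

```python
import numpy as np
import pandas as pd

raw = pd.Series(['3/10/2000', np.nan, '3/12/2000'])

# Option 1: keep the gap; NaN becomes NaT ("Not a Time")
with_nat = pd.to_datetime(raw)

# Option 2: fill the gap first, then convert with an explicit unit
filled = raw.fillna('3/11/2000').astype('datetime64[ns]')

print(with_nat.isna().tolist())  # [False, True, False]
print(filled.dt.day.tolist())    # [10, 11, 12]
```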

To learn more about working with missing values:

7. Assembling a datetime from multiple columns

to_datetime() can also be used to assemble a datetime from multiple columns. The keys (column labels) can be common abbreviations like ['year', 'month', 'day', 'hour', 'minute', 'second', 'ms', 'us', 'ns'] or plurals of the same.

    df = pd.DataFrame({'year': [2015, 2016],
                       'month': [2, 3],
                       'day': [4, 5],
                       'hour': [10, 11]})

To create a datetime column from a subset of columns:

    >>> pd.to_datetime(df[['month', 'day', 'year']])
    0   2015-02-04
    1   2016-03-05
    dtype: datetime64[ns]

To create a datetime column from the entire DataFrame:

    >>> pd.to_datetime(df)
    0   2015-02-04 10:00:00
    1   2016-03-05 11:00:00
    dtype: datetime64[ns]
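A sketch that extends the example with a hypothetical minute column. It also shows that the order of the selected columns does not matter, because Pandas matches them by label, and that leaving out one of year, month, or day raises a ValueError.

```python
import pandas as pd

parts = pd.DataFrame({'year': [2015, 2016],
                      'month': [2, 3],
                      'day': [4, 5],
                      'hour': [10, 11],
                      'minute': [15, 45]})  # 'minute' is a hypothetical extra column

# Columns are matched by label, so their order in the selection is irrelevant
stamps = pd.to_datetime(parts[['minute', 'hour', 'day', 'month', 'year']])
print(stamps)

# year, month and day are mandatory; dropping 'day' raises a ValueError
try:
    pd.to_datetime(parts[['year', 'month']])
    missing_day_failed = False
except ValueError as exc:
    missing_day_failed = True
    print(exc)
```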

8. Converting multiple columns at once

So far, we have been converting data types one column at a time. There is a DataFrame method, also called astype(), that allows us to convert multiple column data types at once. It is time-saving when you have a bunch of columns you want to change.

    df = df.astype({
        'date_start': 'datetime64[ns]',
        'date_end': 'datetime64[ns]'
    })
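The snippet above assumes a DataFrame that already has date_start and date_end columns. The sketch below builds a hypothetical one (the booking data and the nights column are made up for illustration) so the conversion can be run end to end.

```python
import pandas as pd

# Hypothetical booking data with two string date columns
df = pd.DataFrame({'date_start': ['2021-01-01', '2021-02-15'],
                   'date_end': ['2021-01-05', '2021-02-20'],
                   'guests': [2, 4]})

# One astype() call converts both columns; 'guests' is left untouched
df = df.astype({'date_start': 'datetime64[ns]', 'date_end': 'datetime64[ns]'})

# Datetime columns support arithmetic, e.g. the length of each stay
df['nights'] = (df['date_end'] - df['date_start']).dt.days
print(df)
```

If you need the parsing options of to_datetime() (format, errors) for several columns, df[cols].apply(pd.to_datetime) is an alternative that calls to_datetime() once per column.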

9. Parsing date column when reading a CSV file

If you want to set the data type of a date column when reading a CSV file, you can use the argument parse_dates when loading data with read_csv().

Note that the data type datetime64 is not supported by the dtype argument, so we should use the parse_dates argument instead:

    import pandas as pd

    df = pd.read_csv(
        'dataset.csv',
        dtype={
            # datetime64[ns] is not supported here
            'value': 'float16'
        },
        parse_dates=['date']
    )
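Since dataset.csv is not included here, this sketch feeds read_csv() an in-memory CSV through io.StringIO; the column values are hypothetical. It reads value as float32 rather than float16 only to keep the example on a commonly used dtype.

```python
import io

import pandas as pd

# In-memory stand-in for 'dataset.csv' (hypothetical contents)
csv_text = """date,value
2021-03-01,1.5
2021-03-02,2.25
"""

df = pd.read_csv(io.StringIO(csv_text),
                 dtype={'value': 'float32'},  # ordinary dtypes go through dtype
                 parse_dates=['date'])        # date columns go through parse_dates
print(df.dtypes)
```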

To learn more about parsing date columns with Pandas read_csv():

10. Difference between astype('datetime64') and to_datetime()

astype() is the common method to convert data types from one to another. The method is supported by both Pandas DataFrame and Series. If you need to convert a bunch of columns, astype() should be the first choice, as it:

  • can convert multiple columns at once
  • has the best performance

However, astype() won't work for a column with invalid data. For instance, with the invalid date string a/11/2000, astype() would raise a parsing error. As of Pandas 0.20.0, this error can be suppressed by setting the argument errors='ignore', but your original data will be returned untouched.

The Pandas to_datetime() function can handle these values more gracefully. Rather than failing, we can set the argument errors='coerce' to coerce invalid values to NaT.

In addition, it can be very hard to use astype() when dealing with a custom datetime format. Pandas to_datetime() has an argument called format and offers more flexibility for custom conversion.
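A short sketch that puts the two methods side by side on the invalid string a/11/2000. The explicit format='%m/%d/%Y' is an assumption about the layout of the valid entries, and the Series is built with dtype=object so that astype() sees plain Python strings.

```python
import pandas as pd

raw = pd.Series(['3/10/2000', 'a/11/2000', '3/12/2000'], dtype=object)

# astype() has no coerce option: one unparseable string fails the whole column
try:
    raw.astype('datetime64[ns]')
    astype_failed = False
except ValueError:
    astype_failed = True

# to_datetime() can turn the bad value into NaT and keep the rest
converted = pd.to_datetime(raw, format='%m/%d/%Y', errors='coerce')

print(astype_failed)  # True
print(converted)
```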

Conclusion

We have seen how to convert a Pandas data column to a datetime type with astype() and to_datetime(). to_datetime() is the simplest way and offers error handling and more flexibility for custom conversion, while astype() has better performance and can convert multiple columns at once.

I hope this article will help you save time in learning Pandas. I recommend you check out the documentation for the astype() and to_datetime() APIs to learn about other things you can do.

Thanks for reading. Please check out the notebook for the source code and stay tuned if you are interested in the practical aspect of machine learning.

You may be interested in some of my other Pandas articles:

  • 10 tricks to convert data to a numeric type in Pandas
  • Pandas json_normalize() you should know for flattening JSON
  • All Pandas cut() you should know for transforming numerical data into categorical data
  • Using Pandas method chaining to improve code readability
  • How to do a Custom Sort on Pandas DataFrame
  • All the Pandas shift() you should know for data analysis
  • When to use Pandas transform() function
  • Pandas concat() tricks you should know
  • All the Pandas merge() you should know
  • Working with datetime in Pandas DataFrame
  • Pandas read_csv() tricks you should know
  • 4 tricks you should know to parse date columns with Pandas read_csv()

More tutorials can be found on my Github


Source: https://towardsdatascience.com/10-tricks-for-converting-numbers-and-strings-to-datetime-in-pandas-82a4645fc23d
