10 Tricks for Converting Numbers and Strings to Datetime in Pandas
Pandas tips and tricks to help you get started with Data Analysis
When doing data analysis, it is important to ensure correct data types. Otherwise, you may get unexpected results or errors. Datetime is a common data type in data science projects, and the data is often saved as numbers or strings. During data analysis, you will likely need to explicitly convert them to a datetime type.
This article will discuss how to convert numbers and strings to a datetime type. More specifically, you will learn how to use the Pandas built-in methods to_datetime() and astype() to deal with the following common problems:
- Converting numbers to datetime
- Converting strings to datetime
- Handling day first format
- Dealing with custom datetime format
- Handling parse error
- Handling missing values
- Assembling datetime from multiple columns
- Converting multiple columns at once
- Parsing date column when reading a CSV file
- Difference between astype() and to_datetime()
Please check out the Notebook for the source code.
1. Converting numbers to datetime
Pandas has two built-in methods, astype() and to_datetime(), that can be used to convert numbers to datetime. For instance, to convert numbers that denote seconds to datetime:
import pandas as pd
import numpy as np  # used later for np.nan
df = pd.DataFrame({'date': [1470195805, 1480195805, 1490195805],
                   'value': [2, 3, 4]})
When using to_datetime(), we need to call it from Pandas and set the argument unit='s':
>>> pd.to_datetime(df['date'], unit='s')
0   2016-08-03 03:43:25
1   2016-11-26 21:30:05
2   2017-03-22 15:16:45
Name: date, dtype: datetime64[ns]
When using astype(), we need to call it from a Series (the date column) and pass in 'datetime64[s]':
>>> df['date'].astype('datetime64[s]')
0   2016-08-03 03:43:25
1   2016-11-26 21:30:05
2   2017-03-22 15:16:45
Name: date, dtype: datetime64[ns]
Similarly, we can convert numbers that denote other units (D, s, ms, us, ns) to datetime. For instance, numbers that denote days:
df = pd.DataFrame({'date': [1470, 1480, 1490],
                   'value': [2, 3, 4]})
>>> pd.to_datetime(df['date'], unit='D')
0   1974-01-10
1   1974-01-20
2   1974-01-30
Name: date, dtype: datetime64[ns]
>>> df['date'].astype('datetime64[D]')
0   1974-01-10
1   1974-01-20
2   1974-01-30
Name: date, dtype: datetime64[ns]
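As a minimal sketch of another unit from that list, the snippet below converts millisecond timestamps with unit='ms'. The millisecond values are hypothetical: they are simply the second-based numbers from the first example multiplied by 1000, so the resulting datetimes should match that example.

```python
import pandas as pd

# Hypothetical millisecond timestamps: the earlier second-based values times 1000.
ms_values = pd.Series([1470195805000, 1480195805000, 1490195805000])

# unit='ms' tells Pandas the integers count milliseconds since the Unix epoch.
from_ms = pd.to_datetime(ms_values, unit='ms')
print(from_ms)
```

Picking the wrong unit does not necessarily raise an error; it can silently produce dates that are far off, so it is worth checking one converted value by eye.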
2. Converting strings to datetime
Often, you'll find that dates are represented as strings. In Pandas, strings are shown as object, which is the internal Pandas lingo for strings.
>>> df = pd.DataFrame({'date':['3/10/2015','3/11/2015','3/12/2015'],
...                    'value': [2, 3, 4]})
>>> df.dtypes
date     object
value     int64
dtype: object
Both to_datetime() and astype() can be used to convert strings to datetime.
>>> pd.to_datetime(df['date'])
0   2015-03-10
1   2015-03-11
2   2015-03-12
Name: date, dtype: datetime64[ns]
>>> df['date'].astype('datetime64[ns]')
0   2015-03-10
1   2015-03-11
2   2015-03-12
Name: date, dtype: datetime64[ns]
3. Handling day first format
By default, to_datetime() will parse strings in month first (MM/DD, MM DD, or MM-DD) format, an arrangement that is relatively unique to the United States.
In most of the rest of the world, the day is written first (DD/MM, DD MM, or DD-MM). If you would like Pandas to consider the day first instead of the month, you can set the argument dayfirst to True.
df = pd.DataFrame({'date': ['3/10/2000', '3/11/2000', '3/12/2000'],
                   'value': [2, 3, 4]})
df['date'] = pd.to_datetime(df['date'], dayfirst=True)
Alternatively, you can pass a custom format to the argument format.
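As a minimal sketch of the format-based alternative, assuming the same day-first strings as in the example above, the format string '%d/%m/%Y' states the field order explicitly instead of relying on dayfirst:

```python
import pandas as pd

dates = pd.Series(['3/10/2000', '3/11/2000', '3/12/2000'])

# %d/%m/%Y: day first, then month, then a four-digit year.
parsed = pd.to_datetime(dates, format='%d/%m/%Y')

# With day-first parsing the month varies and the day stays fixed.
print(parsed.dt.month.tolist())
print(parsed.dt.day.tolist())
```

An explicit format is stricter than dayfirst=True: a string that does not match the pattern raises an error rather than being reinterpreted.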
4. Handling custom datetime format
By default, strings are parsed using the Pandas built-in parser from dateutil.parser.parse. Sometimes, your strings might be in a custom format, for example, YYYY-d-m HH:MM:SS. Pandas to_datetime() has an argument called format that allows you to pass a custom format:
df = pd.DataFrame({'date': ['2016-6-10 20:30:0',
                            '2016-7-1 19:45:30',
                            '2013-10-12 4:5:1'],
                   'value': [2, 3, 4]})
df['date'] = pd.to_datetime(df['date'], format="%Y-%d-%m %H:%M:%S")
5. Handling parse error
If a date does not meet the timestamp limitations, we will get a parsing error (ParserError) when converting. For instance, an invalid string a/11/2000:
df = pd.DataFrame({'date': ['3/10/2000', 'a/11/2000', '3/12/2000'],
                   'value': [2, 3, 4]})
# Raises a parsing error (ParserError)
df['date'] = pd.to_datetime(df['date'])
to_datetime() has an argument called errors that allows you to ignore the error or force an invalid value to NaT.
df['date'] = pd.to_datetime(df['date'], errors='ignore')
df
And to force an invalid value to NaT:
df['date'] = pd.to_datetime(df['date'], errors='coerce')
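Coercing hides which inputs were bad, so it can help to look them up afterwards. The following is a minimal sketch, not part of the original notebook, that reuses the invalid string from this section; the names raw, parsed and failed are illustrative only.

```python
import pandas as pd

raw = pd.Series(['3/10/2000', 'a/11/2000', '3/12/2000'])

# Invalid strings become NaT instead of raising an error.
parsed = pd.to_datetime(raw, errors='coerce')

# Values that were present in the input but are NaT after parsing failed to convert.
failed = raw[parsed.isna() & raw.notna()]
print(failed.tolist())
```

Keeping the raw column around until the failed values have been reviewed avoids losing data silently.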
6. Handling missing values
In Pandas, missing values are given the value NaN, short for "Not a Number".
df = pd.DataFrame({'date': ['3/10/2000', np.nan, '3/12/2000'],
                   'value': [2, 3, 4]})
When converting a column with missing values to datetime, both to_datetime() and astype() change NumPy's NaN to Pandas' NaT, which allows the column to be a datetime.
>>> df['date'].astype('datetime64[ns]')
0   2000-03-10
1          NaT
2   2000-03-12
Name: date, dtype: datetime64[ns]
>>> pd.to_datetime(df['date'])
0   2000-03-10
1          NaT
2   2000-03-12
Name: date, dtype: datetime64[ns]
Alternatively, we can replace NumPy's NaN with another value (for example, replacing NaN with '3/11/2000'):
df = pd.DataFrame({'date': ['3/10/2000', np.nan, '3/12/2000'],
                   'value': [2, 3, 4]})
df['date'] = df['date'].fillna('3/11/2000').astype('datetime64[ns]')
To learn more about working with missing values
7. Assembling a datetime from multiple columns
to_datetime() can also be used to assemble a datetime from multiple columns. The keys (column labels) can be common abbreviations like ['year', 'month', 'day', 'minute', 'second', 'ms', 'us', 'ns'] or plurals of the same.
df = pd.DataFrame({'year': [2015, 2016],
                   'month': [2, 3],
                   'day': [4, 5],
                   'hour': [10, 11]})
To create a datetime column from a subset of columns:
>>> pd.to_datetime(df[['month','day','year']])
0   2015-02-04
1   2016-03-05
dtype: datetime64[ns]
To create a datetime column from the entire DataFrame:
>>> pd.to_datetime(df)
0   2015-02-04 10:00:00
1   2016-03-05 11:00:00
dtype: datetime64[ns]
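Real datasets often use column labels that to_datetime() does not recognise. One way to handle that, sketched here with hypothetical labels 'yr', 'mo' and 'dy', is to rename the columns to the expected keys before assembling:

```python
import pandas as pd

# Hypothetical column labels that to_datetime() would not recognise as date parts.
raw = pd.DataFrame({'yr': [2015, 2016], 'mo': [2, 3], 'dy': [4, 5]})

# Map the labels to the keys to_datetime() expects, then assemble.
parts = raw.rename(columns={'yr': 'year', 'mo': 'month', 'dy': 'day'})
assembled = pd.to_datetime(parts)
print(assembled)
```

rename() returns a new DataFrame, so the original labels in raw are left untouched.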
8. Converting multiple columns at once
So far, we have been converting the data type one column at a time. There is a DataFrame method, also called astype(), that allows us to convert multiple column data types at once. It is time-saving when you have a bunch of columns you want to change.
df = df.astype({
    'date_start': 'datetime64[ns]',
    'date_end': 'datetime64[ns]'
})
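The snippet above assumes a DataFrame that already has date_start and date_end columns. A self-contained sketch, with made-up ISO-formatted sample values, might look like this:

```python
import pandas as pd

# Made-up sample data: both date columns start out as strings (object dtype).
df = pd.DataFrame({
    'date_start': ['2000-03-10', '2000-03-11'],
    'date_end': ['2000-04-10', '2000-04-12'],
    'value': [2, 3],
})

# A dict passed to DataFrame.astype() converts only the listed columns.
df = df.astype({
    'date_start': 'datetime64[ns]',
    'date_end': 'datetime64[ns]',
})
print(df.dtypes)
```

Columns not named in the dict, such as value here, keep their original type.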
9. Parsing date column when reading a CSV file
If you want to set the data type for a date column when reading a CSV file, you can use the argument parse_dates when loading data with read_csv(). Note that the data type datetime64 is not supported by dtype, so we should use the parse_dates argument instead:
df = pd.read_csv(
    'dataset.csv',
    dtype={
        # datetime64[ns] is not supported
        'value': 'float16'
    },
    parse_dates=['date']
)
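Because dataset.csv is not included here, the following sketch reads the same kind of data from an in-memory string instead. The CSV content is invented for illustration, and float64 is used in place of float16 only to keep the example simple.

```python
import io

import pandas as pd

# Invented CSV content standing in for dataset.csv.
csv_text = "date,value\n2000-03-10,2\n2000-03-11,3\n2000-03-12,4\n"

df = pd.read_csv(
    io.StringIO(csv_text),
    dtype={'value': 'float64'},  # ordinary dtypes go through dtype
    parse_dates=['date'],        # date columns go through parse_dates
)
print(df.dtypes)
```

Checking df.dtypes right after loading is a quick way to confirm that the date column was actually parsed rather than left as object.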
To learn more about parsing date columns with Pandas read_csv():
10. Difference between astype('datetime64') and to_datetime()
astype() is the common method to convert data from one type to another. The method is supported by both Pandas DataFrame and Series. If you need to convert a bunch of columns, astype() should be the first choice as it:
- can convert multiple columns at once
- has the best performance (shown in the screenshot below)
However, astype() won't work for a column with invalid data, for instance an invalid date string a/11/2000. If we try to use astype(), we would get a parsing error (ParserError). As of Pandas 0.20.0, this error can be suppressed by setting the argument errors='ignore', but your original data will be returned untouched.
The Pandas to_datetime() function can handle these values more gracefully. Rather than fail, we can set the argument errors='coerce' to coerce invalid values to NaT.
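The contrast can be sketched side by side. The exact exception class depends on the Pandas version, so this example assumes it is a ValueError (or, defensively, a TypeError) and catches both rather than naming a specific parser error.

```python
import pandas as pd

dates = pd.Series(['3/10/2000', 'a/11/2000', '3/12/2000'])

# astype() fails on the invalid string.
try:
    dates.astype('datetime64[ns]')
    astype_failed = False
except (ValueError, TypeError) as exc:
    astype_failed = True
    print(f"astype() failed with {type(exc).__name__}")

# to_datetime() with errors='coerce' turns the invalid string into NaT instead.
coerced = pd.to_datetime(dates, errors='coerce')
print(coerced)
```

A reasonable pattern is to use astype() for columns known to be clean and to_datetime() with errors='coerce' for columns that may contain invalid values.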
In addition, it can be very hard to use astype() when dealing with a custom datetime format. The Pandas to_datetime() has an argument called format and offers more possibilities for custom conversion.
Conclusion
We have seen how we can convert a Pandas data column to a datetime type with astype() and to_datetime(). to_datetime() is the simplest way and offers error handling and more possibilities for custom conversion, while astype() has better performance and can convert multiple columns at once.
I hope this article will help you to save time in learning Pandas. I recommend you check out the documentation for the astype() and to_datetime() API to learn about other things you can do.
Thanks for reading. Please check out the notebook for the source code and stay tuned if you are interested in the practical aspect of machine learning.
You may be interested in some of my other Pandas articles:
- 10 tricks to convert data to a numeric type in Pandas
- Pandas json_normalize() you should know for flattening JSON
- All Pandas cut() you should know for transforming numerical data into categorical data
- Using Pandas method chaining to improve code readability
- How to do a Custom Sort on Pandas DataFrame
- All the Pandas shift() you should know for data analysis
- When to use Pandas transform() function
- Pandas concat() tricks you should know
- All the Pandas merge() you should know
- Working with datetime in Pandas DataFrame
- Pandas read_csv() tricks you should know
- 4 tricks you should know to parse date columns with Pandas read_csv()
More tutorials can be found on my Github
Source: https://towardsdatascience.com/10-tricks-for-converting-numbers-and-strings-to-datetime-in-pandas-82a4645fc23d