Ethereum is the second-largest cryptocurrency by market capitalization, behind Bitcoin. Its main purpose is to provide a platform for executing decentralized smart contracts. At the time of writing, its market capitalization was larger than that of big companies such as Netflix, Disney, Walmart, and Twitter.
Because of its volatility, predicting cryptocurrency prices is particularly difficult. However, by applying general time series methods, we can at least produce a forecast that accounts for this inherent volatility. This project sits at the intersection of time series analysis, finance, and data science.
Like stock prices, cryptocurrency prices are time series data, so many machine learning and statistical forecasting algorithms can be applied to them.
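A time series is usually assumed to have four main components:
Trend: the long-term upward or downward movement of the series.
Seasonality: patterns that repeat over a fixed period, such as a week or a year.
Cyclicality: longer rises and falls that do not follow a fixed period.
Irregularity: the random noise left over after the other components are accounted for.
These are the main components, but not the only ones.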
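To make the trend, seasonal, and irregular components concrete, here is a minimal sketch of a classical additive decomposition, written in plain Python on a small hypothetical series. The helper name `decompose_additive` and the period of 4 are illustrative assumptions. The cyclical component is not separated out here (it ends up inside the trend), and this is not how Prophet fits its model.

```python
from statistics import mean


def decompose_additive(series, period):
    """Classical additive decomposition: series = trend + seasonal + residual.

    Assumes an even period, so the trend is a centered "2 x period" moving average.
    Points within period // 2 of either end get no trend value (None).
    """
    if period % 2 or len(series) < 2 * period:
        raise ValueError("need an even period and at least two full cycles")
    n = len(series)
    half = period // 2
    trend = [None] * n
    for t in range(half, n - half):
        window = series[t - half : t + half + 1]  # period + 1 points
        # The two end points get half weight, so the weights sum to `period`.
        total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        trend[t] = total / period

    # Average the detrended values at each position within the cycle.
    by_position = {k: [] for k in range(period)}
    for t in range(n):
        if trend[t] is not None:
            by_position[t % period].append(series[t] - trend[t])
    raw = [mean(by_position[k]) for k in range(period)]

    # Center the seasonal pattern so it sums to zero over one cycle.
    offset = mean(raw)
    pattern = [s - offset for s in raw]
    seasonal = [pattern[t % period] for t in range(n)]

    # Whatever trend and seasonality do not explain is the irregular part.
    residual = [
        None if trend[t] is None else series[t] - trend[t] - seasonal[t]
        for t in range(n)
    ]
    return trend, seasonal, residual


# Hypothetical series: linear trend 10 + 2t plus a repeating 4-step pattern.
true_pattern = [3, -1, -3, 1]
series = [10 + 2 * t + true_pattern[t % 4] for t in range(12)]
trend, seasonal, residual = decompose_additive(series, period=4)
print(trend[2])      # 14.0
print(seasonal[:4])  # [3.0, -1.0, -3.0, 1.0]
```

Because this toy series contains no noise, the decomposition recovers the trend and the repeating pattern exactly, and every residual is zero.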
Of note, Prophet is a popular library built specifically for forecasting time series data. It is straightforward to use, produces forecasts automatically, and can still be tuned by hand. It is an additive regression model with four principal components:
A piecewise linear or logistic growth curve trend.
A yearly seasonal component modeled using Fourier series.
A weekly seasonal component using dummy variables.
A user-provided list of important holidays.
To start this project, the following libraries were imported:
import pandas as pd
import yfinance as yf
from datetime import datetime
from datetime import timedelta
import plotly.graph_objects as go
from prophet import Prophet
from prophet.plot import plot_plotly, plot_components_plotly
import warnings
warnings.filterwarnings('ignore')
pd.options.display.float_format = '${:,.2f}'.format
To collect Ethereum price data, I used the yfinance library, a market data downloader for Yahoo! Finance. I also used the today() method of the datetime class from the datetime module, so the end date is updated to the current date whenever this notebook is run.
Ethereum started trading in 2015, so January 1st, 2016 will be set as the start date.
today = datetime.today().strftime('%Y-%m-%d')
start_date = '2016-01-01'
eth_df = yf.download('ETH-USD', start=start_date, end=today)
eth_df.tail()
Date | Open | High | Low | Close | Adj Close | Volume |
---|---|---|---|---|---|---|
2024-04-15 | $3,156.83 | $3,277.56 | $3,026.54 | $3,101.60 | $3,101.60 | 21925843181 |
2024-04-16 | $3,101.14 | $3,127.16 | $2,997.75 | $3,084.92 | $3,084.92 | 19441391169 |
2024-04-17 | $3,084.92 | $3,123.67 | $2,918.55 | $2,984.73 | $2,984.73 | 17711869375 |
2024-04-18 | $2,984.71 | $3,094.84 | $2,956.13 | $3,066.03 | $3,066.03 | 15183777035 |
2024-04-19 | $3,065.95 | $3,127.11 | $2,868.80 | $3,059.28 | $3,059.28 | 20399982867 |
Running the above code, we can see that the data has date, open, high, low, close, adjusted close price, and volume.
The open price will be used as our price value. The other price columns aren't needed for the Prophet model, so they will be dropped later.
To get an overview of the data, I ran info().
eth_df.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 2354 entries, 2017-11-09 to 2024-04-19
Data columns (total 6 columns):
 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 0   Open       2354 non-null   float64
 1   High       2354 non-null   float64
 2   Low        2354 non-null   float64
 3   Close      2354 non-null   float64
 4   Adj Close  2354 non-null   float64
 5   Volume     2354 non-null   int64
dtypes: float64(5), int64(1)
memory usage: 128.7 KB
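The output shows 2,354 daily rows from 2017-11-09 to 2024-04-19, so the Yahoo! Finance history for ETH-USD actually begins later than the requested start date of 2016-01-01. Additionally, the data was screened for null values.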
eth_df.isnull().sum()
Open         0
High         0
Low          0
Close        0
Adj Close    0
Volume       0
dtype: int64
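isnull() only catches missing values inside existing rows; it cannot detect dates that are absent altogether. Since cryptocurrencies trade every day, one extra check is to compare the row count with the number of calendar days in the range. Below is a minimal stdlib sketch of that check, using the first date, last date, and row count reported by info() above; the helper names are hypothetical.

```python
from datetime import date


def expected_daily_rows(first: date, last: date) -> int:
    """Number of calendar days from first to last, inclusive."""
    return (last - first).days + 1


def count_missing_days(first: date, last: date, n_rows: int) -> int:
    """Daily rows absent from the range, assuming there are no duplicate dates."""
    return expected_daily_rows(first, last) - n_rows


# Range and row count as reported by eth_df.info() above.
first_day = date(2017, 11, 9)
last_day = date(2024, 4, 19)
print(expected_daily_rows(first_day, last_day))       # 2354
print(count_missing_days(first_day, last_day, 2354))  # 0
```

A result of zero means the daily series has no gaps. With pandas, the same check on the actual index could be written as pd.date_range(eth_df.index.min(), eth_df.index.max(), freq="D").difference(eth_df.index), which lists any missing dates.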
The Prophet model needs a date column; however, Date is not listed among the columns. To understand why, I looked at the columns of the data.
eth_df.columns
Index(['Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume'], dtype='object')
The output confirms that Date is not a regular column: it is the DataFrame's index, as the DatetimeIndex line in the info() output indicates. So I reset the index, which saves Date as a column.
eth_df.reset_index(inplace=True)
eth_df.columns
Index(['Date', 'Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume'], dtype='object')
Prophet expects its input data frame to have two columns named “ds” and “y”: the dates and the values to forecast, which here are the Date and Open columns, respectively.
So I copied these two columns into a new data frame and used the rename function to change the column names.
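# copy() gives an independent frame, avoiding pandas' SettingWithCopyWarning on the in-place rename
df = eth_df[["Date", "Open"]].copy()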
new_names = {
"Date": "ds",
"Open": "y",
}
df.rename(columns=new_names, inplace=True)
Now, the data is ready for Prophet.
Next, I plotted the open price with the plotly library, which produces an interactive chart.
# plot the open price
x = df["ds"]
y = df["y"]
fig = go.Figure()
fig.add_trace(go.Scatter(x=x, y=y))
# Set title
fig.update_layout(
    title_text="Time series plot of Ethereum Open Price",
)
# Add range-selector buttons and a range slider to the date axis
fig.update_layout(
    xaxis=dict(
        rangeselector=dict(
            buttons=list(
                [
                    dict(count=1, label="1m", step="month", stepmode="backward"),
                    dict(count=6, label="6m", step="month", stepmode="backward"),
                    dict(count=1, label="YTD", step="year", stepmode="todate"),
                    dict(count=1, label="1y", step="year", stepmode="backward"),
                    dict(step="all"),
                ]
            )
        ),
        rangeslider=dict(visible=True),
        type="date",
    )
)
fig.show()
The plot shows two major price spikes that might strongly influence the Prophet model.
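The size of the price fluctuations also grows over the years as the price level rises. This suggests a multiplicative time series, in which seasonal swings scale with the trend, rather than an additive one, in which they stay roughly constant in size. This is why the model below uses seasonality_mode="multiplicative".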
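For reference, the additive and multiplicative forms can be written as follows, in the notation of the Prophet paper. Here g(t) is the trend, s(t) the seasonal terms, h(t) the holiday effects (zero in this project, since no holidays are supplied), and ε_t the error. In multiplicative mode, the seasonal and holiday terms act as proportions of the trend, which matches Prophet's forecast columns: yhat = trend × (1 + multiplicative_terms) + additive_terms. Each seasonal term is a Fourier series with period P (365.25 days for yearly seasonality, 7 for weekly). This is a simplified summary, not a transcription of the library's internals.

```latex
\begin{aligned}
\text{additive:}\quad       y(t) &= g(t) + s(t) + h(t) + \varepsilon_t \\
\text{multiplicative:}\quad y(t) &= g(t)\,\bigl(1 + s(t) + h(t)\bigr) + \varepsilon_t \\
s(t) &= \sum_{n=1}^{N} \left( a_n \cos\frac{2\pi n t}{P} + b_n \sin\frac{2\pi n t}{P} \right)
\end{aligned}
```

In the multiplicative form, a yearly effect of s(t) = 0.05 adds 5% of the trend value, so the same seasonal effect is worth more dollars when the price is higher.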
First, the model is defined and then fitted to the data frame.
m = Prophet(
seasonality_mode="multiplicative"
)
m.fit(df)
11:24:39 - cmdstanpy - INFO - Chain [1] start processing
11:24:40 - cmdstanpy - INFO - Chain [1] done processing
<prophet.forecaster.Prophet at 0x11b75f010>
Next, I created a data frame of dates extending one year (365 days) past the last observation, for which the Prophet model will make predictions.
future = m.make_future_dataframe(periods = 365)
future.tail()
 | ds |
---|---|
2714 | 2025-04-15 |
2715 | 2025-04-16 |
2716 | 2025-04-17 |
2717 | 2025-04-18 |
2718 | 2025-04-19 |
The last date, 2025-04-19, is one year after the last observation in the data (2024-04-19, which was also the day the notebook was run).
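make_future_dataframe extends the dates from the last date in the history, not from the calendar date of the run, and by default it also keeps the historical dates (include_history=True). That is why the table's last index is 2718: 2,354 historical rows plus 365 future rows gives 2,719 rows, numbered 0 to 2718. Below is a small stdlib sketch that reproduces this arithmetic; the helper future_dates is hypothetical and only mimics the daily, future-only part of the frame.

```python
from datetime import date, timedelta


def future_dates(last_observed: date, periods: int) -> list[date]:
    """Daily dates after last_observed (the future part of a daily future frame)."""
    return [last_observed + timedelta(days=i) for i in range(1, periods + 1)]


history_rows = 2354  # rows reported by eth_df.info()
future_only = future_dates(date(2024, 4, 19), 365)
print(future_only[0], future_only[-1])      # 2024-04-20 2025-04-19
print(history_rows + len(future_only) - 1)  # 2718, the last index in future.tail()
```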
forecast = m.predict(future)
forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail()
 | ds | yhat | yhat_lower | yhat_upper |
---|---|---|---|---|
2714 | 2025-04-15 | $4,681.13 | $2,779.76 | $6,362.98 |
2715 | 2025-04-16 | $4,682.38 | $2,885.08 | $6,349.59 |
2716 | 2025-04-17 | $4,718.12 | $2,856.96 | $6,402.38 |
2717 | 2025-04-18 | $4,722.19 | $2,830.83 | $6,403.03 |
2718 | 2025-04-19 | $4,719.08 | $2,873.91 | $6,412.13 |
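In this table, yhat is the point forecast, while yhat_lower and yhat_upper bound Prophet's uncertainty interval, which covers 80% by default (the interval_width parameter). Prophet obtains these bounds by simulating many possible future values and taking percentiles of the simulations. The sketch below shows only that percentile step, on made-up Gaussian samples. It does not reproduce Prophet's trend simulation, and the helper names are hypothetical. The interpolation matches NumPy's default percentile method.

```python
import random


def percentile(sorted_values, q):
    """Percentile q (0-100) of an already sorted list, with linear interpolation."""
    pos = (len(sorted_values) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def percentile_interval(samples, interval_width=0.80):
    """Central interval containing `interval_width` of the samples."""
    values = sorted(samples)
    tail = 100 * (1 - interval_width) / 2  # about 10 for an 80% interval
    return percentile(values, tail), percentile(values, 100 - tail)


# Hypothetical simulated values for one future date (not real Prophet output).
random.seed(42)
simulated = [random.gauss(100.0, 15.0) for _ in range(1000)]
lower, upper = percentile_interval(simulated)
print(f"80% interval: {lower:.1f} to {upper:.1f}")
```

A wider interval_width gives wider bounds, which is why the roughly $2,800 to $6,400 range in the table should be read as a rough band rather than a guarantee.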
We can also get the price prediction for the next day.
next_day = (datetime.today() + timedelta(days=1)).strftime('%Y-%m-%d')
forecast[forecast['ds'] == next_day]['yhat'].item()
3361.1787164774996
Prophet has built-in plotly functions that can help us easily visualize our forecast.
plot_plotly(m, forecast)
Our forecasting model includes growth curve trend, weekly seasonal, and yearly seasonal components, which can be visualized like this:
plot_components_plotly(m, forecast)
Our model suggests that:
The price of Ethereum will continue on an upward trend.
The yearly component is lowest in the mid-to-later part of the year, and the weekly component is lowest on Saturday.
The yearly component is highest around May, and the weekly component is highest on Thursday.
Moving forward, I plan to compare this price prediction model with alternatives such as ARIMA or deep learning (LSTM) models, and to evaluate their performance with metrics such as R-squared or RMSE.
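Both metrics are simple to compute once actual and predicted prices are lined up by date, ideally over a period the model was not fitted on. Below is a minimal stdlib sketch with hypothetical helper names and toy numbers. In practice, scikit-learn's mean_squared_error and r2_score provide the same metrics.

```python
from math import sqrt


def _check(actual, predicted):
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and the same length")


def rmse(actual, predicted):
    """Root mean squared error, in the same units as the prices."""
    _check(actual, predicted)
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (undefined if actual is constant)."""
    _check(actual, predicted)
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Toy numbers, not real ETH prices.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
print(rmse(actual, predicted))       # 0.5
print(r_squared(actual, predicted))  # 0.8
```

RMSE is expressed in dollars and is easy to interpret for a single asset. R-squared is unitless, but on trending price series it can look high even for weak forecasts, so the two metrics are best read together.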