Predicting Ethereum Price with Python¶

Ethereum is the second-largest cryptocurrency by market capitalization, behind Bitcoin. Its main purpose is to execute decentralized smart contracts. Its current market capitalization is larger than that of big companies like Netflix, Disney, Walmart and Twitter.

Predicting cryptocurrency price¶

Due to its volatility, predicting cryptocurrency prices is particularly difficult. However, by applying general time series principles, we can at least build a picture that accounts for this inherent volatility. This endeavor sits at the intersection of time series analysis, finance, and data science.

Time Series data¶

Much like stock prices, cryptocurrency prices are time-series data. Because of this, there are a multitude of algorithms in machine learning that can be leveraged for this analysis.

Time series data is commonly assumed to have four main components:

  • trend (ex: increase in prices, pollution, decrease in sales…)
  • seasonal (ex: seasons, festivals, religious activities, climate…)
  • cyclical (ex: business cycles)
  • irregular (unexpected events like natural disasters or accidents)

These are the main components, but not the only ones.
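As a rough illustration of how these components combine, the sketch below builds a synthetic daily series from a linear trend, a weekly seasonal pattern, and a small irregular term, then recovers the trend with a centered 7-day moving average. All names and numbers are made up for illustration and are not taken from the Ethereum data; an additive combination is assumed.

```python
import math
import random

random.seed(0)  # make the irregular component reproducible

PERIOD = 7      # hypothetical weekly season
N_DAYS = 70

# Build the series as trend + seasonal + irregular (an additive model).
trend = [100 + 2.0 * t for t in range(N_DAYS)]
seasonal = [10 * math.sin(2 * math.pi * t / PERIOD) for t in range(N_DAYS)]
irregular = [random.gauss(0, 0.5) for _ in range(N_DAYS)]
series = [tr + s + e for tr, s, e in zip(trend, seasonal, irregular)]


def centered_moving_average(values, window):
    """Average each point with its neighbours; `window` must be odd."""
    half = window // 2
    return [
        sum(values[i - half : i + half + 1]) / window
        for i in range(half, len(values) - half)
    ]


# A window equal to the seasonal period averages the seasonal term away,
# leaving an estimate of the trend for the interior points.
trend_estimate = centered_moving_average(series, PERIOD)
```

The estimate is shorter than the series because the first and last three days do not have a full window around them.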

Libraries needed¶

To start this project, the following libraries were imported:

  • pandas
  • yfinance
  • datetime
  • plotly
  • prophet

Note that Prophet is a popular library designed specifically for forecasting time series data. It is straightforward to use and customizable, and it aims to produce accurate, reasonable forecasts. It is an additive regression model with four principal components:

  • A piecewise linear or logistic growth curve trend. Prophet automatically detects changes in trends by selecting changepoints from the data.
  • A yearly seasonal component modeled using Fourier series.
  • A weekly seasonal component using dummy variables.
  • A user-provided list of important holidays.
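To make the Fourier-series idea more concrete, here is a minimal sketch of how sine and cosine terms can encode a yearly pattern. It is not Prophet's internal code: the helper `fourier_features`, the number of harmonics, and the weights are all assumptions chosen for illustration.

```python
import math


def fourier_features(t_days, period, order):
    """Return [sin, cos] pairs for harmonics 1..order at time t_days."""
    features = []
    for n in range(1, order + 1):
        angle = 2 * math.pi * n * t_days / period
        features.append(math.sin(angle))
        features.append(math.cos(angle))
    return features


# Hypothetical settings: a yearly period of 365.25 days and 3 harmonics.
YEAR = 365.25
row = fourier_features(t_days=100, period=YEAR, order=3)

# A seasonal curve is a weighted sum of these features; the weights below
# are made up, whereas a fitted model would estimate them from data.
weights = [0.5, -0.2, 0.1, 0.3, 0.0, -0.1]
seasonal_value = sum(w * f for w, f in zip(weights, row))
```

Because every feature repeats after one period, the resulting seasonal curve repeats yearly as well; more harmonics allow a more detailed shape.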

Loading the libraries¶

In [1]:
import pandas as pd
import yfinance as yf
from datetime import datetime
from datetime import timedelta
import plotly.graph_objects as go
from prophet import Prophet
from prophet.plot import plot_plotly, plot_components_plotly
import warnings

warnings.filterwarnings('ignore')

pd.options.display.float_format = '${:,.2f}'.format

Getting the data¶

In order to capture the data on Ethereum prices, I used the yfinance library, which is a Yahoo! Finance market data downloader.

Additionally, I used the today function from the datetime library so that the end date is updated whenever this notebook is run.

Ethereum's price history begins in late 2015, so January 1st, 2016 is set as the start date.

In [2]:
today = datetime.today().strftime('%Y-%m-%d')
start_date = '2016-01-01'

# Pass the date range by keyword so the arguments are unambiguous
eth_df = yf.download('ETH-USD', start=start_date, end=today)

eth_df.tail()
[*********************100%%**********************]  1 of 1 completed
Out[2]:
                 Open       High        Low      Close  Adj Close       Volume
Date
2024-04-15  $3,156.83  $3,277.56  $3,026.54  $3,101.60  $3,101.60  21925843181
2024-04-16  $3,101.14  $3,127.16  $2,997.75  $3,084.92  $3,084.92  19441391169
2024-04-17  $3,084.92  $3,123.67  $2,918.55  $2,984.73  $2,984.73  17711869375
2024-04-18  $2,984.71  $3,094.84  $2,956.13  $3,066.03  $3,066.03  15183777035
2024-04-19  $3,065.95  $3,127.11  $2,868.80  $3,059.28  $3,059.28  20399982867

Running the above code we can see that the data has date, open, high, low, close, adjusted close price, and volume.

The open price will be used as our price value. The other price columns aren't needed for the Prophet model, so they will be dropped later.

To perform a bit of analysis on the data, I ran info().

In [3]:
eth_df.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 2354 entries, 2017-11-09 to 2024-04-19
Data columns (total 6 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Open       2354 non-null   float64
 1   High       2354 non-null   float64
 2   Low        2354 non-null   float64
 3   Close      2354 non-null   float64
 4   Adj Close  2354 non-null   float64
 5   Volume     2354 non-null   int64  
dtypes: float64(5), int64(1)
memory usage: 128.7 KB

Additionally, null values were screened.

In [4]:
eth_df.isnull().sum()
Out[4]:
Open         0
High         0
Low          0
Close        0
Adj Close    0
Volume       0
dtype: int64

A date column is needed for the Prophet model, but it is not listed as one of the columns. To understand why, I looked at the columns of the data.

In [5]:
eth_df.columns
Out[5]:
Index(['Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume'], dtype='object')

Based on the above output, it is evident that the date is stored as the index of the data frame rather than as a regular column. As such, I reset the index so that the date becomes a column.

In [6]:
eth_df.reset_index(inplace=True)
eth_df.columns
Out[6]:
Index(['Date', 'Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume'], dtype='object')

Prophet expects the input data frame to have two columns, “ds” and “y”, which here are the date and open columns respectively.

As such, I put the two columns into a new data frame and used the rename function to change the column names.

In [7]:
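new_names = {
    "Date": "ds",
    "Open": "y",
}

# Select the two columns and rename them in one step; this returns a new
# data frame instead of modifying a slice of eth_df in place.
df = eth_df[["Date", "Open"]].rename(columns=new_names)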

Now, the data is ready for Prophet.

Additionally, I coded a visualization of the price column using the plotly library which provides interactivity.

In [8]:
# plot the open price

x = df["ds"]
y = df["y"]

fig = go.Figure()

fig.add_trace(go.Scatter(x=x, y=y))

# Set title
fig.update_layout(
    title_text="Time series plot of Ethereum Open Price",
)

fig.update_layout(
    xaxis=dict(
        rangeselector=dict(
            buttons=list(
                [
                    dict(count=1, label="1m", step="month", stepmode="backward"),
                    dict(count=6, label="6m", step="month", stepmode="backward"),
                    dict(count=1, label="YTD", step="year", stepmode="todate"),
                    dict(count=1, label="1y", step="year", stepmode="backward"),
                    dict(step="all"),
                ]
            )
        ),
        rangeslider=dict(visible=True),
        type="date",
    )
)

Looking at the plot, there are two major spikes that might influence our Prophet model.

We can also see that the price fluctuations grow larger as the years go on. This suggests that the series behaves multiplicatively rather than additively, which is reflected in the seasonality mode chosen for the model below.
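The difference between the two kinds of series can be sketched on synthetic data. In the hypothetical example below, a seasonal factor multiplies an exponentially growing trend, so the seasonal swing widens over time, while on a log scale it stays constant. All values are made up for illustration.

```python
import math

PERIOD = 7
N_DAYS = 70

# Hypothetical multiplicative series: the seasonal factor scales the trend.
trend = [100 * 1.05 ** t for t in range(N_DAYS)]
factor = [1 + 0.1 * math.sin(2 * math.pi * t / PERIOD) for t in range(N_DAYS)]
series = [tr * f for tr, f in zip(trend, factor)]


def swing(values):
    """Peak-to-trough range of a list of values."""
    return max(values) - min(values)


# Detrend the first and last week and compare the size of the seasonal swing.
first_week = [s - tr for s, tr in zip(series[:PERIOD], trend[:PERIOD])]
last_week = [s - tr for s, tr in zip(series[-PERIOD:], trend[-PERIOD:])]

# On the log scale the seasonal effect no longer depends on the trend level.
log_first = [math.log(s / tr) for s, tr in zip(series[:PERIOD], trend[:PERIOD])]
log_last = [math.log(s / tr) for s, tr in zip(series[-PERIOD:], trend[-PERIOD:])]

print(swing(first_week) < swing(last_week))             # does the swing grow?
print(math.isclose(swing(log_first), swing(log_last)))  # is it constant in logs?
```

A widening swing like this is the kind of pattern that would motivate a multiplicative seasonality setting rather than the additive default.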

Prophet model¶

First, the model is defined and then fitted to the data frame.

In [9]:
m = Prophet(
    seasonality_mode="multiplicative" 
)

m.fit(df)
11:24:39 - cmdstanpy - INFO - Chain [1] start processing
11:24:40 - cmdstanpy - INFO - Chain [1] done processing
Out[9]:
<prophet.forecaster.Prophet at 0x11b75f010>

Next, a future data frame is created that extends the dates by 365 days, giving the Prophet model a full year of dates to make predictions for.

In [10]:
future = m.make_future_dataframe(periods = 365)
future.tail()
Out[10]:
ds
2714 2025-04-15
2715 2025-04-16
2716 2025-04-17
2717 2025-04-18
2718 2025-04-19

We can see that the final date is one year after today’s date.
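Conceptually, building the future data frame with daily frequency keeps the historical dates and appends the requested number of further days. The sketch below mimics that behaviour with the standard library only; `extend_dates` is a hypothetical helper written for illustration, not part of Prophet, and the three-day history is made up.

```python
from datetime import date, timedelta


def extend_dates(history, periods):
    """Return the historical dates followed by `periods` further daily dates."""
    last = history[-1]
    future = [last + timedelta(days=i) for i in range(1, periods + 1)]
    return history + future


# Hypothetical three-day history ending on 2024-04-19.
history = [date(2024, 4, 17), date(2024, 4, 18), date(2024, 4, 19)]
all_dates = extend_dates(history, periods=365)

print(len(all_dates))   # historical dates plus 365 future dates
print(all_dates[-1])    # 365 days after the last historical date
```

Since the historical dates are kept, the model later produces fitted values for the past as well as forecasts for the future.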

Model predictions¶

In [11]:
forecast = m.predict(future)
forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail()
Out[11]:
              ds       yhat  yhat_lower  yhat_upper
2714  2025-04-15  $4,681.13   $2,779.76   $6,362.98
2715  2025-04-16  $4,682.38   $2,885.08   $6,349.59
2716  2025-04-17  $4,718.12   $2,856.96   $6,402.38
2717  2025-04-18  $4,722.19   $2,830.83   $6,403.03
2718  2025-04-19  $4,719.08   $2,873.91   $6,412.13

We can also get the price prediction for the next day.

In [12]:
next_day = (datetime.today() + timedelta(days=1)).strftime('%Y-%m-%d')

forecast[forecast['ds'] == next_day]['yhat'].item()
Out[12]:
3361.1787164774996

Forecast plots¶

Prophet has built-in Plotly functions that help us easily visualize our forecast.

In [13]:
plot_plotly(m, forecast)

Forecast components¶

Our forecasting model includes growth curve trend, weekly seasonal, and yearly seasonal components which can be visualized like this:

In [14]:
plot_components_plotly(m, forecast)

Summary¶

Our model suggests that:

  • The price of Ethereum will follow an upward trend.

  • The price of ETH tends to be lowest in the mid-to-later part of the year and on Saturdays.

  • ETH tends to be most expensive around May and on Thursdays.
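One rough way to sanity-check a day-of-week pattern against raw prices is to average the price by weekday. The sketch below does this with the standard library on made-up prices; `mean_by_weekday` is a hypothetical helper, and on real data the trend would likely need to be removed first so that it does not dominate the averages.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def mean_by_weekday(dates, prices):
    """Average price for each weekday name present in the data."""
    buckets = defaultdict(list)
    for d, p in zip(dates, prices):
        buckets[WEEKDAYS[d.weekday()]].append(p)
    return {day: mean(values) for day, values in buckets.items()}


# Made-up prices for two full weeks starting Monday 2024-04-01.
start = date(2024, 4, 1)
dates = [start + timedelta(days=i) for i in range(14)]
prices = [100, 101, 102, 105, 103, 98, 99,
          110, 111, 112, 115, 113, 108, 109]

averages = mean_by_weekday(dates, prices)
cheapest = min(averages, key=averages.get)
priciest = max(averages, key=averages.get)
print(cheapest, priciest)
```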

Moving forward, I plan to experiment with alternatives such as the ARIMA model or deep learning (LSTM) models for forecasting, and then compare their performance using metrics like R-squared or RMSE.
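For reference, both metrics are simple to compute by hand. The following is a minimal sketch using made-up held-out prices and forecasts; in practice the `actual` values would come from a period the model was not trained on.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Made-up held-out prices and forecasts, for illustration only.
actual = [100.0, 110.0, 120.0, 130.0]
predicted = [102.0, 108.0, 123.0, 129.0]

print(rmse(actual, predicted))
print(r_squared(actual, predicted))
```

RMSE is expressed in the same units as the price, which makes it easy to interpret, while R-squared is unitless and can be compared across series of different scales.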
