Creating a custom gym (OpenAI) environment for algorithmic trading

Gym is a library that provides environments for reinforcement learning algorithms. Plenty of environments are included in the library, such as classic control tasks, 2D and 3D robots, and Atari games. An environment exposes a few public functions, including ‘step’, ‘reset’, and ‘render’. When the ‘step’ function is called, it returns four values, namely ‘observation’, ‘reward’, ‘done’, and ‘info’. The ‘observation’ object holds state information about the environment, such as position, velocity, etc. When an agent takes an action (i.e., calls the step function), it receives a ‘reward’ as a result. The goal is to maximize the total reward until the end of the episode. When the end of the episode is reached, the ‘done’ flag becomes ‘True’. The ‘info’ object is a dictionary that can be used for debugging purposes. If we want to reset the environment variables, we simply call the ‘reset’ function. When we want to see a graphical interface, we call the ‘render’ function.

Although the library contains a lot of useful environments, it does not have an environment for algorithmic trading. In this post, our goal is to build a custom gym environment for algorithmic trading. The environment will be a custom class that inherits from the ‘gym.Env’ class and contains the basic functions (step, reset, render) as well as some auxiliary methods. We can start by installing gym by executing the following line in your conda prompt:

pip install gym
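To see the reset/step contract described above in isolation, here is a minimal sketch that does not depend on gym at all. `CountdownEnv` and `run_episode` are hypothetical names (not part of gym); the toy class simply follows the same classic interface, where `step` returns the four values ‘observation’, ‘reward’, ‘done’, and ‘info’. Newer gym releases changed these signatures, so treat this only as an illustration of the interface used in this post.

```python
class CountdownEnv:
    """Hypothetical toy environment (not part of gym) using the classic interface:
    reset() -> observation and step(action) -> (observation, reward, done, info)."""

    def __init__(self, length=5):
        self.length = length  # number of steps per episode
        self.t = 0

    def reset(self):
        self.t = 0
        return self.length - self.t  # observation: steps remaining

    def step(self, action):
        self.t += 1
        observation = self.length - self.t
        reward = 1.0 if action == 1 else 0.0  # only action 1 is rewarded
        done = self.t >= self.length          # episode ends after `length` steps
        info = {"t": self.t}                  # free-form debugging information
        return observation, reward, done, info

    def render(self, mode="human"):
        print(f"t={self.t}")


def run_episode(env, policy):
    """Run one episode with `policy` and return the total reward."""
    observation = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = policy(observation)
        observation, reward, done, info = env.step(action)
        total_reward += reward
    return total_reward


if __name__ == "__main__":
    print(run_episode(CountdownEnv(length=5), lambda obs: 1))  # 5.0
```

The same loop (reset once, then call step until ‘done’ is True) is how an agent drives any environment with this interface, including the trading environment below.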

I’m not going to go through every detail of the custom class; instead, I will point out some important things. There will be three actions (-1, 0, 1) in our action space: ‘-1’ means ‘sell’, ‘0’ means ‘sit’ (no action), and ‘1’ means ‘buy’. The feature names will be defined by the user. For instance, the user could choose ‘rate of change’, ‘bollinger bandwidth’, ‘percent b’, ‘rsi’, etc. The feature names will be held in the variable ‘data_features’. Our observation space will be the concatenation of ‘data_features’ and ‘position’. The position variable (pos) will be ‘1’ if a buy action is taken and ‘0’ if a sell action occurs. All the data required for an episode will be contained in a pandas DataFrame (df); that is, each row will contain the ‘price’ and the ‘data_features’, for instance [‘close price’, ‘rate of change’, ‘bandwidth’, ‘rsi’, ‘percent b’].

At each step, we will calculate the immediate reward, the gain (profit), the account, and the total number of coins. Initially, we will have 100 dollars. If the gain is 50 at the end of the episode, this means we made a 50 percent profit and our money has grown to 150 dollars.

The immediate reward calculation algorithm is simple. If the action is ‘buy’, the reward is calculated as ‘next price’ – ‘current price’, and if the action is ‘sell’, it is computed as -1 * (‘next price’ – ‘current price’). If the action is ‘sit’, which means no action, then we calculate the reward according to our position: if the position is 1, the reward is computed as ‘next price’ – ‘current price’, and if the position is 0, it is computed as -1 * (‘next price’ – ‘current price’). Finally, we convert the immediate reward to a percentage by dividing it by the ‘current price’ and multiplying it by 100. We also have a buy or sell cost: if there is a buy or sell, we subtract a trade cost from the reward. Notice that the ‘sit’ action has no cost.
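The reward rule above can be written as a small standalone function. This is a sketch that mirrors the description rather than the class itself; the function and argument names (`immediate_reward`, `pos`, `price_next`) are illustrative, `pos` is the position after the action has been applied, and `trade_cost` is subtracted in percentage points, as in the class.

```python
def immediate_reward(action, pos, price, price_next, trade_cost=0.5):
    """Percentage reward for one step, following the rules described above.

    action: -1 (sell), 0 (sit) or 1 (buy)
    pos:    position after the action, 1 (holding coins) or 0 (holding cash)
    """
    change = price_next - price
    if action == 1:       # buy: rewarded when the price rises
        reward = change
    elif action == -1:    # sell: rewarded when the price falls
        reward = -change
    elif pos == 1:        # sit while holding coins
        reward = change
    else:                 # sit while holding cash
        reward = -change
    reward = reward / price * 100              # percent of the current price
    return reward - abs(action) * trade_cost   # only buy/sell pay the cost


if __name__ == "__main__":
    print(immediate_reward(1, 1, 100.0, 102.0))  # buy just before a 2% rise
```

For example, buying at 100 when the next close is 102 earns 2 percent minus the 0.5 trade cost, i.e. 1.5, while sitting in cash while the price drops from 100 to 98 earns 2 with no cost.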
At this stage, you can also plug in your own immediate reward calculation algorithm. Now I’m going to share the Python code of the custom class, ‘binanceEnvironment’.

import pandas as pd
import numpy as np
import gym

class binanceEnvironment(gym.Env):
    metadata = {'render.modes': ['human']}

    def __init__(self):
        self.actions = [-1, 0, 1] # sell, sit, buy
        self.action_size = len(self.actions) # action size
        self.action_space = gym.spaces.Discrete(self.action_size) # action ids 0, 1, 2 -> sell, sit, buy
        self.data_features = None # holds feature names
        self.state_size = None # input state size
        self.df = None # data frame which holds data features and close price
        self.data_len = 0 # number of rows in the data frame
        self.initial_account = 100 # dollars
        self.offset = 20 # no episode starts before the offset
        self.rand_start_index = 0 # random starting index of the episode
        self.pos = np.array([0]) # position: 1 - holding coins, 0 - holding cash
        self.account = self.initial_account # running account (cash)
        self.coins = 0 # total number of coins
        self.done = 0 # flag indicating the end of the simulation
        self.sim_len = 0 # simulation length
        self.t = 0 # time index
        self.gain = 0 # running profit (percent)
        self.episode = 0 # episode number
        self.episode_old = 0 # previous episode number
        self.trades = 0 # number of trades
        self.observation_space = None # observation space (set in init_file)
        self.input_shape = None # input shape for the deep network
        self.trade_cost = 0 # trading cost
        self.current_index = 0 # current time index
        self.is_random = True # random start

    def init_file(self, file_name, data_features, trade_cost=0.5, is_random=True):
        self.is_random = is_random
        self.data_features = data_features
        self.state_size = len(self.data_features) + 1 # features plus position
        self.df = pd.read_csv(file_name)
        self.data_len = self.df.shape[0]
        self.sim_len = self.data_len
        self.input_shape = self.__get_state().shape
        self.observation_space = gym.spaces.Box(low=-np.inf, high=np.inf, shape=self.input_shape, dtype=np.float32)
        self.trade_cost = trade_cost

    def step(self, action_id):
        # if an illegal action is taken, then correct it
        action_id = self.correct_action(action_id)
        # update pos according to action
        self.pos[0] = self.pos[0] + self.actions[action_id]

        # get current and next indexes
        self.current_index = self.rand_start_index + self.t
        next_index = self.current_index + 1

        # retrieve next state line from the data frame
        next_state_line = self.df[self.data_features].iloc[next_index]

        # get current and next prices
        price = self.df.iloc[self.current_index]['close']
        price_next = self.df.iloc[next_index]['close']

        # update account and total number of coins according to action
        if self.actions[action_id] == 1: # if action is buy
            self.coins = self.coins + self.account / price
            self.account = 0
        elif self.actions[action_id] == -1: # if action is sell
            self.account = self.account + price * self.coins
            self.coins = 0

        # this variable keeps whether any buy or sell action is taken
        action_taken = np.abs(self.actions[action_id])
        # update action history
        self.df.loc[self.current_index, 'action'] = self.actions[action_id]
        # calculate current asset
        asset = self.__calculate_asset(price)

        # gain is the profit from the beginning (percent)
        self.gain = (asset - self.initial_account) / self.initial_account * 100
        # update profit history
        self.df.loc[self.current_index, 'profit'] = self.gain

        # ----- immediate reward calculation algorithm ---------
        reward = 0
        if self.actions[action_id] == 1: # buy
            reward = price_next - price
        elif self.actions[action_id] == -1: # sell
            reward = -1.0 * (price_next - price)
        else: # sit: the reward depends on the current position
            if self.pos[0] == 0:
                reward = -1.0 * (price_next - price)
            else:
                reward = price_next - price
        reward = reward / price

        reward_coeff = 100 # percent

        reward = reward_coeff * reward

        # increment number of trades if any buy or sell action is taken
        self.trades = self.trades + action_taken

        # discount reward by trade cost if any buy or sell action is taken
        reward = reward - action_taken * self.trade_cost

        # increment time
        self.t = self.t + 1
        done = False
        if self.current_index == self.sim_len - 2:
            done = True
        # construct next state (we also add our position (pos: 0 or 1) to the state)
        next_state = np.append(next_state_line.to_numpy(), self.pos, axis=None).reshape(1, self.state_size)

        return next_state, reward, done, {}

    def reset(self):
        if self.is_random:
            # episode starts between offset and 100 (upper limit)
            self.rand_start_index = np.random.randint(self.offset, 100)
            print('start index: {}'.format(self.rand_start_index))
        else:
            self.rand_start_index = 0
        self.pos = np.array([0])
        self.account = self.initial_account
        self.coins = 0
        self.done = 0
        self.sim_len = self.data_len
        self.t = 0
        self.gain = 0
        self.trades = 0
        self.episode = self.episode + 1 # increment episode number
        self.current_index = 0
        self.df['action'] = 0 # reset action history
        self.df['profit'] = 0.0 # reset profit history (float, since gain is a float)
        return self.__get_state()

    def render(self, mode='human', close=False):
        if self.current_index % 100 == 0:
            if self.episode_old != self.episode:
                self.episode_old = self.episode
            print("episode: {}, time: {}, gain: {:.2f}, trades: {}"
                  .format(self.episode, self.current_index, self.gain, self.trades))

    def __calculate_asset(self, price):
        if self.pos[0] == 0:
            return self.account
        else:
            return self.account + self.coins * price

    def __get_state(self):
        state_line = self.df[self.data_features].iloc[self.rand_start_index + self.t]

        state = np.append(state_line.to_numpy(), self.pos, axis=None).reshape(1, self.state_size)

        return state

    def correct_action(self, action_id):
        # selling with no position, or buying while already holding, becomes 'sit' (action id 1)
        if self.pos[0] == 0 and (self.actions[action_id] == -1):
            action_id = 1
        elif self.pos[0] == 1 and (self.actions[action_id] == 1):
            action_id = 1
        return action_id
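The account bookkeeping inside ‘step’ can also be pulled out into two plain functions, which makes the example of 100 dollars growing to 150 dollars (a 50 percent gain) easy to check by hand. The helper names `apply_trade` and `gain_percent` are hypothetical. Like the class, the sketch spends all cash on a buy, sells the whole coin balance on a sell, and keeps trade costs out of the account (they only reduce the reward).

```python
def apply_trade(action, account, coins, price):
    """Return (account, coins) after a buy (1), sell (-1) or sit (0), without fees."""
    if action == 1:                        # buy: spend all cash on coins
        return 0.0, coins + account / price
    if action == -1:                       # sell: convert all coins back to cash
        return account + coins * price, 0.0
    return account, coins                  # sit: nothing changes


def gain_percent(account, coins, price, initial_account=100.0):
    """Profit since the start of the episode, in percent."""
    asset = account + coins * price        # cash plus market value of the coins
    return (asset - initial_account) / initial_account * 100


if __name__ == "__main__":
    account, coins = apply_trade(1, 100.0, 0.0, 20.0)       # buy 5 coins at 20
    account, coins = apply_trade(-1, account, coins, 30.0)  # sell them at 30
    print(account, gain_percent(account, coins, 30.0))      # 150.0 50.0
```

Because the coin balance is zero whenever the position is 0, `account + coins * price` gives the same asset value as the two branches of ‘__calculate_asset’.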

Now, we need to follow a few steps in order to use our custom class. In your working directory, create a folder named ‘binanceLib’. In this folder, create a file named ‘’, and copy and paste the code above into it. In the same folder, you should also create another file named ‘’. This file should include the line below:

from binanceLib.binanceEnvironment import binanceEnvironment

Now, let us talk about how to use the custom class. In your working (training or testing) file, first import some gym methods. Then, register and create the environment. Finally, our environment is ready for algorithmic trading. The sample usage code is given below:

from gym.envs.registration import register, make
import gym

ENV_NAME = 'binanceEnvironment-v0'

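# remove a previous registration (e.g. when the script is re-run in the same session)
if ENV_NAME in gym.envs.registry.env_specs:
    del gym.envs.registry.env_specs[ENV_NAME]

# register the custom class; 'binanceLib' re-exports binanceEnvironment
register(id=ENV_NAME, entry_point='binanceLib:binanceEnvironment')

env = make(ENV_NAME) # make the environment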

file_name = 'ltcusdt-1hour-out.csv'
feature_list = ['rate of change','bandwidth','rsi','percent b','coefficient of variation']
trade_cost = 0.5

env.init_file(file_name, feature_list, trade_cost)
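‘init_file’ expects a CSV file with a ‘close’ column plus one column per name in ‘feature_list’. The post does not show how that file is built, so here is one possible sketch with pandas. The function name `add_features`, the window lengths (20 for the Bollinger bands, 10 for the rate of change, 14 for the RSI), and the simple-moving-average form of the RSI are all assumptions, not the author's settings. The first rows contain NaNs until the rolling windows fill up, so the sketch drops them.

```python
import numpy as np
import pandas as pd


def add_features(df, window=20, roc_period=10, rsi_period=14):
    """Return a copy of `df` (which must have a 'close' column) with feature columns added.

    Window lengths are assumptions, and RSI uses simple rolling means
    rather than Wilder's smoothing.
    """
    close = df["close"]
    mean = close.rolling(window).mean()
    std = close.rolling(window).std()
    upper, lower = mean + 2 * std, mean - 2 * std  # Bollinger bands (2 std)

    out = df.copy()
    out["rate of change"] = (close / close.shift(roc_period) - 1) * 100
    out["bandwidth"] = (upper - lower) / mean
    out["percent b"] = (close - lower) / (upper - lower)
    out["coefficient of variation"] = std / mean

    delta = close.diff()
    avg_gain = delta.clip(lower=0).rolling(rsi_period).mean()
    avg_loss = (-delta).clip(lower=0).rolling(rsi_period).mean()
    out["rsi"] = 100 - 100 / (1 + avg_gain / avg_loss)
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    prices = 100 + rng.normal(0, 1, 300).cumsum()  # synthetic random-walk closes
    frame = add_features(pd.DataFrame({"close": prices}))
    frame = frame.dropna().reset_index(drop=True)  # drop warm-up rows with NaNs
    print(frame.head())
```

Writing the result with `frame.to_csv(..., index=False)` would give a file whose column names match ‘feature_list’ above; with real data you would start from the exchange's close prices instead of the synthetic series.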

That’s all for today. Enjoy your trading!


6 thoughts on “Creating a custom gym (OpenAI) environment for algorithmic trading”

  1. If you get this error
    AttributeError: module ‘binanceLib’ has no attribute ‘binanceEnvironment’
    try to rename the init file
    mv binanceLib/ binanceLib/

  2. Very cool project! But looking at the profit calculation, you don't seem to take the trading cost into account. Trading costs are a percentage of the total traded amount, so you incur this cost every time you trade.

    Proposed solution:
    # update account and total number of coins according to action
    if (self.actions[action_id] == 1): # if action is buy
        self.coins = self.coins + (self.account / price) * (1 - self.trade_cost/100)
        self.account = 0
    elif (self.actions[action_id] == -1): # if action is sell
        self.account = self.account + (price * self.coins) * (1 - self.trade_cost/100)
        self.coins = 0

    As you will notice, increasing the trading cost heavily influences the results. Binance uses a 0.1% trading fee, but you might want to work with 0.3% to 0.4% to account for slippage. Good luck!

    1. I calculate it at the end. For instance, if your profit is 30% and the number of trades is 150, then I say 130$ – 150*0.1$ = 115$.

  3. I get this error

    TypeError: len is not well defined for symbolic Tensors. (activation_3/Softmax:0) Please call `x.shape` rather than `len(x)` for shape information.
