In this post, I briefly present deep Q-learning, which combines reinforcement learning (RL) with deep neural networks. I then show how I applied this technique to algorithmic trading, and at the end of the post I share the Python implementation of the training procedure. Before starting, I strongly recommend reading the previous posts listed below, since you will need the classes and functions presented there. Follow the steps in this order:

- To obtain the raw data, see the post ‘How to obtain raw trading data from binance’
- For feature extraction, see ‘A complete class for feature extraction’
- For the gym environment, see ‘Creating a custom gym (OpenAI) environment for algorithmic trading’

Q-learning is a model-free reinforcement learning technique. Its goal is to maximize the cumulative future reward by selecting appropriate actions given the observed states. If the process is a Markov decision process with finitely many states and actions, Q-learning converges to the optimal policy, provided every state-action pair keeps being explored and the learning rate decays suitably. The letter ‘Q’ stands for the quality of an action. The Q table in the algorithm is updated according to the following rule:

```
Q(s_t, a_t) <- (1 - alpha) * Q(s_t, a_t) + alpha * (r_t + gamma * max_a Q(s_{t+1}, a))
alpha: learning rate
gamma: discount factor
a_t: current action
s_t: current state
s_{t+1}: next state
r_t: current reward
```
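To make the update rule concrete, here is a minimal sketch of one tabular update in plain Python. The dictionary-based table, the toy states `s0`/`s1`, and the `buy`/`sell` actions are assumptions made only for illustration; they are not part of the trading environment used later.

```python
from collections import defaultdict


def q_update(q_table, state, action, reward, next_state, actions, alpha, gamma):
    """Apply one tabular Q-learning update in place and return the new value."""
    # max_a Q(s_{t+1}, a): best value reachable from the next state
    best_next = max(q_table[(next_state, a)] for a in actions)
    # r_t + gamma * max_a Q(s_{t+1}, a)
    target = reward + gamma * best_next
    # Blend the old estimate with the target using the learning rate alpha
    q_table[(state, action)] = (1 - alpha) * q_table[(state, action)] + alpha * target
    return q_table[(state, action)]


# Hypothetical toy problem: unseen (state, action) pairs start at 0.0
actions = ["buy", "sell"]
q_table = defaultdict(float)
q_table[("s1", "buy")] = 2.0

new_value = q_update(q_table, "s0", "buy", reward=1.0, next_state="s1",
                     actions=actions, alpha=0.5, gamma=0.9)
print(new_value)  # approximately 1.4
```

Here the target is 1.0 + 0.9 * 2.0 = 2.8, and blending it with the old estimate of 0.0 at alpha = 0.5 gives about 1.4.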

Notice that this algorithm requires discrete states, because it keeps one table entry per state-action pair. For problems with a small number of discrete states, this approach works. However, in most real-world problems, such as financial trading, the states are not discrete and there are far too many of them to enumerate, so the classical Q-learning algorithm is not applicable. We can still adapt the technique to such problems through function approximation, and one option is an artificial neural network. The problem with a neural network is that it is a non-linear approximator, which may cause divergence and instability during learning. To overcome this, Google’s DeepMind team combined deep Q-learning with a technique called ‘experience replay’ in 2013: past transitions are stored in a memory and the network is trained on random samples drawn from it. Instead of hand-crafted feature extraction, they used a convolutional neural network, an architecture inspired by the receptive fields of the visual cortex. With this approach, they later reported human-level performance on many Atari 2600 games.
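The idea behind experience replay can be sketched in a few lines. This is an illustrative buffer only, not the keras-rl implementation; the class name `ReplayBuffer` and the tiny capacity are assumptions chosen for the example.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity):
        # A deque with maxlen drops the oldest transition once it is full
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform random sampling breaks the correlation between consecutive steps
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


# Hypothetical usage: store 8 transitions in a buffer that holds only 5
buffer = ReplayBuffer(capacity=5)
for t in range(8):
    buffer.add(state=t, action=t % 3, reward=0.0, next_state=t + 1, done=False)

batch = buffer.sample(batch_size=3)
print(len(buffer), batch)
```

After the loop only the five most recent transitions remain, and each training step would draw a random mini-batch from them instead of learning from the latest observation alone.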

In this post, we are not going to use convolutional neural networks to extract the features. Instead, we will use our own features, as described in the previous posts. However, we are still going to use deep Q-networks (DQN) for algorithmic trading. First, we are going to construct a simple neural network model. You will need the Keras and keras-rl libraries. If you don’t have them, install them by typing the following lines in your terminal or conda prompt:

```
pip install keras
pip install keras-rl
```

In your working directory, create a file named ‘binanceCreateModel.py’. Using your favorite editor, copy and paste the following code into this file:

```
from keras.models import Sequential
from keras.layers import Dense, Activation, Flatten, Dropout


def create_model(env):
    dropout_prob = 0.8  # aggressive dropout: fraction of units dropped
    num_units = 256  # number of neurons in each hidden layer

    model = Sequential()
    # The leading 1 matches window_length=1 in the agent's memory
    model.add(Flatten(input_shape=(1,) + env.input_shape))
    model.add(Dense(num_units))
    model.add(Activation('relu'))
    model.add(Dense(num_units))
    model.add(Dropout(dropout_prob))
    model.add(Activation('relu'))
    # One output per action. Note: DQN heads are commonly 'linear',
    # since softmax bounds the Q-values to [0, 1].
    model.add(Dense(env.action_size))
    model.add(Activation('softmax'))
    model.summary()  # summary() prints directly and returns None
    return model
```

As you can see, our simple network contains three dense layers, with 256 units in each hidden layer. We apply dropout regularization only after the second dense layer. The dropout rate is very aggressive because the input trading data is very noisy. By selecting such a value, we aim to minimize overfitting and prevent the neural network from memorizing specific states. You may want to try other network architectures here.
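To see what a rate of 0.8 means in practice, here is a small standard-library sketch of inverted dropout, the scheme Keras applies during training: each unit is zeroed with probability `rate` and the survivors are scaled by 1 / (1 - rate). The helper `dropout` and the constant activations are assumptions for illustration only.

```python
import random


def dropout(values, rate, rng):
    """Inverted dropout: zero each value with probability `rate`, rescale the rest."""
    keep_prob = 1.0 - rate
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
activations = [1.0] * 10000  # hypothetical layer output
dropped = dropout(activations, rate=0.8, rng=rng)

kept_fraction = sum(1 for v in dropped if v != 0.0) / len(dropped)
mean_activation = sum(dropped) / len(dropped)
print(kept_fraction, mean_activation)
```

Roughly one unit in five survives each training step, while the rescaling keeps the average activation close to its original value, so nothing needs to change at test time.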

Now it is time to create the training script. As mentioned previously, we are going to use a DQN agent. We don’t need to implement this algorithm ourselves because the ‘keras-rl’ library already provides it. In your working folder, create a file named ‘binanceTrain.py’. Using your editor, copy and paste the following code:

```
from binanceFeatures import FeatureExtractor
from binanceCreateModel import create_model
from keras.optimizers import Adam
from rl.agents.dqn import DQNAgent
from rl.policy import EpsGreedyQPolicy
from rl.memory import SequentialMemory
import gym
from gym.envs.registration import register, make

ENV_NAME = 'binanceEnvironment-v0'

# Remove a stale registration so the script can be re-run in the same session.
if ENV_NAME in gym.envs.registry.env_specs:
    del gym.envs.registry.env_specs[ENV_NAME]

register(
    id=ENV_NAME,
    entry_point='binanceLib:binanceEnvironment',
    max_episode_steps=10000,
)
env = make(ENV_NAME)

input_file = 'ltcusdt-1hour.csv'
output_file = 'ltcusdt-1hour-out.csv'
w_file_name = 'ltcusdt-1hour-weights.h5f'

# Extract the features from the raw data and write them to output_file.
feature_extractor = FeatureExtractor(input_file, output_file)
feature_extractor.extract()
feature_list = feature_extractor.get_feature_names()

trade_cost = 0.5
env.init_file(output_file, feature_list, trade_cost)

num_training_episodes = 30

model = create_model(env)
memory = SequentialMemory(limit=5000, window_length=1)
policy = EpsGreedyQPolicy()
dqn = DQNAgent(model=model, nb_actions=env.action_size, memory=memory,
               nb_steps_warmup=50, target_model_update=1e-2, policy=policy)
dqn.compile(Adam(lr=1e-3), metrics=['mse'])

# Each episode runs through the whole simulation once.
dqn.fit(env, nb_steps=num_training_episodes * env.sim_len, visualize=True, verbose=0)

# Here, we save the final weights.
dqn.save_weights(w_file_name, overwrite=True)
```
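Two settings in the script deserve a closer look: the epsilon-greedy policy and `target_model_update=1e-2`. As I understand keras-rl, a value below 1 is treated as a soft update, where the target network moves a small fraction toward the online network at every step. The sketch below illustrates both ideas with plain Python lists; the helper names `epsilon_greedy` and `soft_update` and the sample Q-values are hypothetical and not taken from keras-rl.

```python
import random


def epsilon_greedy(q_values, eps, rng):
    """Pick a random action with probability eps, otherwise the greedy one."""
    if rng.random() < eps:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def soft_update(target_weights, online_weights, tau):
    """Move each target weight a fraction tau toward the online weight."""
    return [(1.0 - tau) * t + tau * w for t, w in zip(target_weights, online_weights)]


rng = random.Random(42)
q_values = [0.1, 0.7, 0.2]  # hypothetical Q-values for three actions

# With eps=0 the policy is purely greedy and always picks action 1
greedy_action = epsilon_greedy(q_values, eps=0.0, rng=rng)

# With eps=0.1 the greedy action still dominates, but others get explored
choices = [epsilon_greedy(q_values, eps=0.1, rng=rng) for _ in range(1000)]
greedy_share = choices.count(1) / len(choices)

# One soft update with tau=1e-2 moves the target 1% of the way
target = soft_update([0.0, 0.0], [1.0, -1.0], tau=1e-2)
print(greedy_action, greedy_share, target)
```

The small exploration rate keeps the agent trying non-greedy trades occasionally, and the slow-moving target network keeps the learning targets from shifting too quickly.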

Finally, we are ready to train our agent! Notice that the training process runs for 30 episodes. You may try different values, but make sure that you don’t overtrain, since the agent may then simply memorize the training data. When training completes, the script automatically saves the learned weights to a file. In the next post, we are going to test our network and see how much money we are going to earn. Enjoy your trading!

Hi,

I am getting the error

TypeError: len is not well defined for symbolic Tensors. (activation_3/Softmax:0) Please call `x.shape` rather than `len(x)` for shape information.

for the line below. I tried with tensorflow 1.14 and tensorflow 2.0 as well (got a different error with 2.0). Could you please share the versions of the packages you used?

dqn.fit(env, nb_steps=num_training_episodes * env.sim_len, visualize=True, verbose=0)

Cheers.