Making investment decisions under uncertainty has always been a challenge for investors. An extensive body of literature proposes methods within the portfolio optimization framework. Most theoretical and practical approaches focus on two perspectives: return and risk. However, optimizing with respect to these variables is notoriously difficult because financial time series are stochastic and hard to predict. To avoid such difficulties, investors can invest in portfolios managed by professionals according to predetermined criteria. One common approach is to invest in index funds. These funds provide a generic risk and return profile through market-capitalization weighting, which makes them a good fit for many investors. In this study, we seek to outperform this weighting method. Accordingly, we propose two reinforcement learning agents that solve the portfolio optimization problem under the Markov decision process framework. These models have no prior knowledge of the stock market environment; they learn directly by trial and error and output a policy that serves as the portfolio weighting model. We train our models on a dataset composed of price indicators and fundamental company information. We tested the proposed models on the BIST30 constituents over a one-year period. We show that these agents can generate 45% greater returns and a Sharpe ratio that is 1.17 higher than that of the BIST30 market index. While generating higher returns, we also encourage the agents to diversify the portfolio through a modified reward function, which increases real-life applicability.