Solar Power Prediction

Renewable energy is sustainable and beneficial for the future of the world, and it is the leading candidate for solving the world's future energy problem. This project gives a simple demonstration of why machine learning algorithms matter for supplying renewable energy as efficiently as possible.

solar_page.png

This project builds a machine learning regression model that predicts the energy produced by solar power, one type of renewable energy. The models are designed to predict, as accurately as possible, the power a solar panel is expected to produce. By comparing these predictions with the actual production, unexpected performance deficiencies of the panel can be detected and resolved.

The dataset used in the project contains the real-time power generation (Power(kW)) of a solar panel belonging to Enerjisa, recorded at 1-hour intervals between 01.01.2019 and 14.08.2021.

ss2.png

In addition to the real-time power generation, some SCADA data of the wind turbine was shared for the date range 01.01.2019 – 14.12.2021.

ss3.png

MAIN STEPS OF THE PROJECT:

  1. Preprocessing:

  • Missing values are detected and handled with a time-series-based filling technique. (The data in this project is time-series data, so its order matters; any missing value is filled with the value nearest to it in time.)

  • Some data types are converted to the appropriate ones (object to datetime, object to float, etc.).

  • Power generation data is a time series, so its dates must be continuous; if any date is missing, important patterns in the data can be missed. Checks are therefore applied to verify the continuity of the data.
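
The preprocessing steps above can be sketched as follows. This is a minimal, dependency-free illustration and not the project's actual code: the sample rows, the timestamp format and the helper names (parse, missing_timestamps, fill_nearest) are all assumptions made for the example.

```python
from datetime import datetime, timedelta

# Hypothetical raw rows: (timestamp text, power text); "" marks a missing value.
raw = [
    ("01.01.2019 10:00", "12.5"),
    ("01.01.2019 11:00", ""),
    ("01.01.2019 13:00", "20.0"),  # the 12:00 row is absent from the data
    ("01.01.2019 14:00", "18.0"),
]


def parse(rows):
    """Convert text columns to datetime and float (None for a missing value)."""
    return [
        (datetime.strptime(ts, "%d.%m.%Y %H:%M"), float(v) if v else None)
        for ts, v in rows
    ]


def missing_timestamps(rows, step=timedelta(hours=1)):
    """Return the timestamps that break the expected hourly continuity."""
    present = {ts for ts, _ in rows}
    gaps, current, end = [], min(present), max(present)
    while current <= end:
        if current not in present:
            gaps.append(current)
        current += step
    return gaps


def fill_nearest(rows):
    """Fill each missing value with the known value closest in time.

    When two known readings are equally close, the earlier one is used.
    """
    known = [(ts, v) for ts, v in rows if v is not None]
    filled = []
    for ts, v in rows:
        if v is None:
            _, v = min(known, key=lambda kv: abs(kv[0] - ts))
        filled.append((ts, v))
    return filled


rows = parse(raw)
gaps = missing_timestamps(rows)   # timestamps missing from the hourly grid
filled = fill_nearest(rows)       # no None values remain
```

Here the 11:00 reading is filled from 10:00 (one hour away) rather than 13:00 (two hours away), and the continuity check reports 12:00 as a missing timestamp.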

  2. Feature Engineering:

  • New features are generated from the existing features using different calculations and combinations.

  • Some of the new features:

  • time features (Day, Dayofweek, Weekofyear, etc.): they carry the time-series information.

  • wind_direction_sin, wind_direction_cos: a ‘WindDirection’ column exists in the data, but I noticed that solar panels are generally mounted at an angle, so the vertical and horizontal components of the wind direction can carry more information than the raw direction.

  • Is_day: checks whether the current time is daytime. I considered it a candidate for one of the most discriminative features, because outside daytime the energy generation is zero, so it can prevent noisy values at those times.

  • Sun_angle: solar panels generate energy only during daytime, but the amount depends on the time of day and changes hour by hour. The sun angle is therefore extracted with an API called suntime.
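
The features listed above could be derived roughly as in the sketch below. It is an illustration under stated assumptions, not the project's code: it does not call suntime, and instead approximates the sun's elevation with a common textbook formula that treats each timestamp as local solar time. The latitude value and all function names are hypothetical.

```python
import math
from datetime import datetime

LATITUDE_DEG = 39.0  # hypothetical site latitude; the source does not give one


def time_features(ts):
    """Calendar features taken directly from the timestamp."""
    return {
        "Day": ts.day,
        "Dayofweek": ts.weekday(),          # Monday = 0
        "Weekofyear": ts.isocalendar()[1],  # ISO week number
    }


def wind_direction_features(direction_deg):
    """Encode a compass angle so that 359 and 1 degrees end up close together."""
    rad = math.radians(direction_deg)
    return {
        "wind_direction_sin": math.sin(rad),
        "wind_direction_cos": math.cos(rad),
    }


def sun_elevation_deg(ts, latitude_deg=LATITUDE_DEG):
    """Approximate solar elevation in degrees (assumes ts is local solar time)."""
    day_of_year = ts.timetuple().tm_yday
    declination = -23.44 * math.cos(math.radians(360.0 / 365.0 * (day_of_year + 10)))
    hour_angle = 15.0 * (ts.hour + ts.minute / 60.0 - 12.0)
    lat, dec, ha = (math.radians(x) for x in (latitude_deg, declination, hour_angle))
    sin_elev = math.sin(lat) * math.sin(dec) + math.cos(lat) * math.cos(dec) * math.cos(ha)
    sin_elev = max(-1.0, min(1.0, sin_elev))  # guard against rounding error
    return math.degrees(math.asin(sin_elev))


def build_features(ts, wind_direction_deg):
    """Combine all engineered features for one hourly record."""
    elevation = sun_elevation_deg(ts)
    features = time_features(ts)
    features.update(wind_direction_features(wind_direction_deg))
    features["Sun_angle"] = elevation
    features["Is_day"] = int(elevation > 0)  # sun above the horizon
    return features


noon = build_features(datetime(2019, 6, 21, 12, 0), wind_direction_deg=90.0)
midnight = build_features(datetime(2019, 6, 21, 0, 0), wind_direction_deg=270.0)
```

In this sketch Is_day is derived from the sign of the elevation, so the two features stay consistent with each other. A sunrise/sunset lookup would serve the same purpose.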

  3. Feature Selection:

  • After feature engineering, we have the final set of columns that are candidates for training a model. A correlation matrix is created from all candidate features. I checked it to detect highly correlated variables and, at the same time, to see which predictor features (inputs) correlate most with the target variable (output).

  • For each pair of highly correlated features, one of the two is dropped; features with very low correlation with the output variable are dropped as well.
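
The selection rule described above can be sketched as follows. The thresholds (0.9 and 0.1), the toy columns and the tie-breaking choice of keeping the feature that correlates more strongly with the target are assumptions for illustration; the source only states that one feature of each highly correlated pair is dropped.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def select_features(columns, target, high=0.9, low=0.1):
    """Drop weak predictors, then one feature from each highly correlated pair."""
    target_corr = {name: pearson(vals, target) for name, vals in columns.items()}

    # Step 1: remove features that barely correlate with the output.
    kept = [name for name in columns if abs(target_corr[name]) >= low]

    # Step 2: within each highly correlated pair, drop the weaker predictor.
    dropped = set()
    for i, a in enumerate(kept):
        for b in kept[i + 1:]:
            if a in dropped or b in dropped:
                continue
            if abs(pearson(columns[a], columns[b])) > high:
                weaker = a if abs(target_corr[a]) < abs(target_corr[b]) else b
                dropped.add(weaker)
    return [name for name in kept if name not in dropped]


# Hypothetical toy data: two near-duplicate features and one noise feature.
target = [0.0, 4.0, 9.0, 15.0, 20.0, 26.0]
columns = {
    "feature_a": [0, 10, 20, 30, 40, 50],
    "feature_b": [1, 11, 19, 31, 41, 49],
    "feature_noise": [1, -1, -1, 1, 1, -1],
}
selected = select_features(columns, target)
```

On this toy data feature_noise is removed for its weak correlation with the target, and feature_b is removed as the weaker member of the near-duplicate pair, leaving only feature_a.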

ss_corr.png

  4. Modeling:

  • Train and test data are separated.

  • To choose the best model, I used an AutoML framework that takes parameters such as the metric (RMSE for this project), the estimator list (I chose models that generally give high accuracy or low loss: lgbm, xgboost, catboost) and the task (regression for this project). After many iterations of trying and comparing parameter combinations, the AutoML function returns the best model together with its best hyperparameters.

  • After this process, the best estimator was LightGBM.

ss4.png
  • Training is finished; on evaluation, the trained model's RMSE is 17.19.

ss5.png
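
For reference, the train/test separation and the RMSE metric can be sketched as below. The source does not say how the data was split, so the chronological split, the 20% test fraction and the sample values are assumptions; only the RMSE formula itself is standard.

```python
import math


def chronological_split(rows, test_fraction=0.2):
    """Keep the most recent rows for testing so the model never sees the future."""
    n_test = max(1, round(len(rows) * test_fraction))
    return rows[:-n_test], rows[-n_test:]


def rmse(y_true, y_pred):
    """Root mean squared error, expressed in the same unit as the target (kW)."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical hourly records, already sorted by time.
records = list(range(10))
train, test = chronological_split(records)

# Hypothetical actual vs. predicted power values: every prediction is off by 2 kW.
score = rmse([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 32.0, 38.0])
```

Because RMSE is in the unit of the target, a value such as 17.19 reads as a typical error of about 17 kW, with large errors weighted more heavily than small ones.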