

Contact Us

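  • Network Communication and Database Technology Laboratory
  • Tel: 0411-62274392
  • Address: School of Software, Dalian University of Technology, No. 321 Tuqiang Street, Dalian Economic and Technological Development Zone, Dalian
  • Postcode: 116620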
  • E-mail:ubinec@163.com

Ongoing Projects

Deep-Learning-Based Analysis and Prediction of Employee Resignation Causes (Industry Collaboration)

1 Data Pre-processing

The motivations behind resignation are often complex, and factors such as income, peer relationships, company prospects and career planning differ significantly from one company to another. Hence, we will collect multi-source heterogeneous big data from the following three directions:

1.1 Attendance & Entrance data processing

Since each employee has multiple records, we first group the records of each employee by ID and delete unusable data. Then the time and other attributes are one-hot encoded, so that each value is represented by integer (0/1) indicators.
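The grouping and one-hot steps above can be sketched in plain Python. This is a minimal illustration, not the project's pipeline: the field names `emp_id` and `weekday`, and the rule that a row with any missing field is "unusable", are hypothetical assumptions.

```python
from collections import defaultdict


def group_by_employee(records, id_key="emp_id"):
    """Group raw rows by employee ID, skipping unusable rows."""
    groups = defaultdict(list)
    for row in records:
        # Hypothetical rule: a row is unusable if the ID is absent or any field is None.
        if id_key not in row or any(value is None for value in row.values()):
            continue
        groups[row[id_key]].append(row)
    return dict(groups)


def one_hot(values):
    """Encode categorical values as 0/1 integer indicator vectors."""
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    encoded = []
    for value in values:
        vector = [0] * len(categories)
        vector[index[value]] = 1
        encoded.append(vector)
    return categories, encoded


rows = [
    {"emp_id": 1, "weekday": "Mon"},
    {"emp_id": 1, "weekday": "Tue"},
    {"emp_id": 2, "weekday": None},  # dropped as unusable
    {"emp_id": 2, "weekday": "Mon"},
]
groups = group_by_employee(rows)
categories, vectors = one_hot([row["weekday"] for row in groups[1]])
print(categories, vectors)  # ['Mon', 'Tue'] [[1, 0], [0, 1]]
```

Sorting the category set keeps the column order of the encoding stable between runs.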

To represent changes in employee status dynamically, we divide each employee's attendance information into three stages in chronological order: the span from the first record to the last record is divided evenly into three stages. Employees with very little attendance information are discarded, because their records are not informative enough.
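A minimal sketch of the three-stage split follows. It assumes the span between the first and last record is cut into three windows of equal length (splitting into three equal-sized groups of records would be an equally plausible reading), and `MIN_RECORDS` is a hypothetical cut-off for "very little" attendance information.

```python
from datetime import datetime, timedelta

MIN_RECORDS = 9  # hypothetical cut-off for "very little" attendance information


def split_into_stages(timestamps, n_stages=3, min_records=MIN_RECORDS):
    """Split one employee's attendance timestamps into equal-length time windows.

    Returns None when the employee has too few records to be kept.
    """
    if len(timestamps) < min_records:
        return None
    ordered = sorted(timestamps)
    start, end = ordered[0], ordered[-1]
    width = (end - start) / n_stages  # timedelta / int -> timedelta
    stages = [[] for _ in range(n_stages)]
    for t in ordered:
        if width == timedelta(0):
            k = 0  # all records share one timestamp
        else:
            # timedelta / timedelta -> float; the last record is clamped into the final stage
            k = min(int((t - start) / width), n_stages - 1)
        stages[k].append(t)
    return stages


base = datetime(2024, 1, 1, 9, 0)
days = [base + timedelta(days=d) for d in range(12)]  # 12 daily records
stages = split_into_stages(days)
print([len(stage) for stage in stages])  # [4, 4, 4]
```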

We then extract attributes for each stage from the attendance information. Using the original information together with the generated features, we construct the following feature variables for each stage:

totalCount: Total count of attendances.

intervMeanTime: Average number of attendances.

earlyCount: Earliest attendance time – prescribed limit time.

timeMean: Average of (attendance time – prescribed limit time).

meanM: Average number of attendances per month.

fnCount1: totalCount(Stage 2) – totalCount(Stage 1).

fnCount2: totalCount(Stage 3) – totalCount(Stage 2).
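Several of the listed features can be computed from the staged timestamps as sketched below. This is illustrative only: the 09:00 limit time, measuring offsets in minutes, and counting months as distinct calendar months in a stage are all hypothetical assumptions, and intervMeanTime is left out.

```python
from datetime import datetime, time

LIMIT_TIME = time(9, 0)  # hypothetical prescribed ("limit") time


def minutes_from_limit(t, limit=LIMIT_TIME):
    """Signed minutes between an attendance time of day and the limit time."""
    return (t.hour * 60 + t.minute) - (limit.hour * 60 + limit.minute)


def stage_features(stage):
    """Features for one stage, given a non-empty list of attendance datetimes."""
    offsets = [minutes_from_limit(t) for t in stage]
    months = {(t.year, t.month) for t in stage}
    return {
        "totalCount": len(stage),
        "earlyCount": min(offsets),               # earliest attendance minus limit
        "timeMean": sum(offsets) / len(offsets),  # mean of (attendance - limit)
        "meanM": len(stage) / len(months),        # attendances per calendar month
    }


def employee_features(stages):
    """Per-stage features plus the cross-stage differences fnCount1 and fnCount2."""
    feats = [stage_features(s) for s in stages]
    out = {}
    for i, f in enumerate(feats, start=1):
        for name, value in f.items():
            out[f"{name}_stage{i}"] = value
    out["fnCount1"] = feats[1]["totalCount"] - feats[0]["totalCount"]
    out["fnCount2"] = feats[2]["totalCount"] - feats[1]["totalCount"]
    return out


s1 = [datetime(2024, 1, 2, 8, 50), datetime(2024, 1, 3, 9, 10)]
s2 = [datetime(2024, 2, 1, 9, 0)]
s3 = [datetime(2024, 3, 1, 8, 30), datetime(2024, 3, 2, 9, 0), datetime(2024, 4, 1, 9, 30)]
f = employee_features([s1, s2, s3])
print(f["fnCount1"], f["fnCount2"])  # -1 2
```

The same functions would apply unchanged to entrance records, assuming they share the timestamp format.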

Entrance data have attributes similar to those of attendance data, so they are processed in the same way as described above.

1.2 Employee, Resume & Personal data processing 

We will develop a text-reading engine to extract resume information and filter out invalid entries during preprocessing. We then merge the employee data, resume data and personal data into a single data model.
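The merging step can be sketched as an inner join on a shared employee ID. The key name `emp_id` and the choice to keep only employees present in all three sources are assumptions; the resume-parsing engine itself is not modeled here.

```python
def merge_sources(employee, resume, personal, key="emp_id"):
    """Inner-join three lists of dict records on a shared employee ID."""
    by_id = [{row[key]: row for row in source} for source in (employee, resume, personal)]
    common = set(by_id[0]) & set(by_id[1]) & set(by_id[2])
    merged = []
    for emp_id in sorted(common):
        row = {}
        for table in by_id:
            # If two sources share a field name, the later source wins.
            row.update(table[emp_id])
        merged.append(row)
    return merged


employees = [{"emp_id": 1, "dept": "QA"}, {"emp_id": 2, "dept": "Dev"}]
resumes = [{"emp_id": 1, "degree": "BSc"}, {"emp_id": 2, "degree": "MSc"}]
personal = [{"emp_id": 2, "age": 31}]
rows = merge_sources(employees, resumes, personal)
print(rows)  # [{'emp_id': 2, 'dept': 'Dev', 'degree': 'MSc', 'age': 31}]
```

An outer join would keep employee 1 with missing personal fields instead; which behaviour fits depends on how incomplete records are handled later.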

1.3 Relevant data processing

We will collect data related to employees and the software industry from the Internet, including the industry average salary, the average promotion period and socioeconomic indicators. The attributes of these data are then integrated into the fused dataset. Finally, we collect employees' social-networking information to capture information correlated with their emotional state.

1.4 Data splitting

We aggregate all the relevant data collected. To train and select the learning models more effectively, the data are divided into three subsets: a training set, a validation set and a test set. The training set is the largest, while the validation and test sets are smaller. As shown in Figure 4 (Upper), each subset is used in a different learning stage, which helps prevent the model from over-fitting. Data for new employees, by contrast, serve only as model input for prediction and are not involved in the training stage.
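A reproducible three-way split might look like the sketch below. The 70/15/15 proportions, the fixed seed and the `resigned` label field are hypothetical; unlabeled new-employee records are filtered out before splitting so they never enter training.

```python
import random


def split_dataset(samples, train_pct=70, val_pct=15, seed=0):
    """Shuffle samples and split them into training, validation and test lists."""
    items = list(samples)
    random.Random(seed).shuffle(items)  # fixed seed keeps the split reproducible
    n_train = len(items) * train_pct // 100
    n_val = len(items) * val_pct // 100
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # the remainder forms the test set
    return train, val, test


# Hypothetical records: historical employees carry a label, new hires do not.
history = [{"id": i, "resigned": i % 2} for i in range(100)]
new_hires = [{"id": 100, "resigned": None}]
labeled = [r for r in history + new_hires if r["resigned"] is not None]
train_set, val_set, test_set = split_dataset(labeled)
print(len(train_set), len(val_set), len(test_set))  # 70 15 15
```

Integer percentages avoid floating-point rounding surprises when computing the split sizes.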


2 Employee Portrait Learning Model & Resignation Risk Prediction Model

Static data are suitable for linear regression, random forest and neural network methods, whereas dynamic data such as attendance information and changes in salary and position are better suited to Markov chain methods. As shown in Figure 4 (Lower), labeled employee data are used for training and unlabeled data for prediction.
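The Markov chain idea for dynamic data can be illustrated with a minimal first-order sketch. It assumes, hypothetically, that each employee's dynamic data have already been discretized into one state per stage (here made-up attendance levels "low", "normal" and "high"). Transition probabilities are maximum-likelihood estimates from counts, and the row for the current state gives the distribution over the next stage. This is a generic estimator, not the project's model.

```python
from collections import Counter, defaultdict


def fit_transitions(sequences):
    """Estimate first-order Markov transition probabilities from state sequences."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    probs = {}
    for state, counter in counts.items():
        total = sum(counter.values())
        probs[state] = {nxt: n / total for nxt, n in counter.items()}
    return probs


def next_state_distribution(transitions, state):
    """Distribution over the next state; empty if the state was never observed."""
    return transitions.get(state, {})


sequences = [
    ["normal", "normal", "low"],
    ["normal", "low", "low"],
    ["high", "normal", "normal"],
]
P = fit_transitions(sequences)
print(next_state_distribution(P, "normal"))  # {'normal': 0.5, 'low': 0.5}
```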

In addition, the decision model in the system is better suited to reinforcement learning. We therefore combine a variety of machine learning methods to process the multi-source heterogeneous big data sets. The model input includes personal information and all correlated fused data. After the data are processed, the deep learning model outputs the predicted resignation risk and the recommended degree of compensation measures.