Abstract:
A correlation ranking procedure is proposed for the selection of factors in principal component-artificial neural network (PC-ANN) modeling. The model was applied to the evaluation of the aqueous solubility (-logS) of diverse organic molecules. Experimental -logS values for these molecules range from about -0.380 (oxalic acid) to 10.410 (2,2',3,3',4,5,5',6,6'-PCB) -log units. Ten different Sh indices were calculated for each molecule. Principal component analysis of the Sh data matrix showed that seven PCs could explain 99.97% of the variance in the matrix. The extracted PCs were used as the predictor variables (inputs) for the PCR and ANN models. The ANN model could explain 97.63% of the variance in the solubility data, while the value obtained from the PCR procedure was 84.27%. For the PCR studies, the data set was divided into a training set of 320 compounds for model building and an external prediction set of 60 compounds for model validation. Both subsets were chosen to ensure that a diverse set of compounds was present. For the ANN studies, a cross-validation set of 50 compounds was chosen, leaving 270 compounds in the training set, while the prediction set remained the same. Models to predict solubility were constructed using PCR and PC-ANN, with errors comparable to the experimental errors of the solubility data. The root-mean-square errors (RMSerror) associated with the calibration, prediction, and validation set compounds used for the PC-ANN model were 0.314, 0.450, and 0.314 -logS units, respectively.
Keywords: QSPR, Topological Indices, Aqueous Solubility, PCR, PC-ANN.
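The workflow described in the abstract (PCA of a descriptor matrix, ranking of the PCs against the response, then regression on the selected PCs) can be illustrated with a minimal NumPy sketch. Everything in it is an assumption for illustration only: the data are synthetic random numbers rather than Sh indices or measured -logS values, the ranking criterion is taken to be the absolute Pearson correlation between each PC and the response (the abstract does not state the exact criterion), the choice of three retained PCs is arbitrary, and only the PCR branch is shown, not the ANN. The function names (`fit_pca`, `project`, `correlation_ranking`, `fit_pcr`, `rmse`) are hypothetical helpers. Only the 320/60 split sizes, the ten descriptors, and the seven extracted PCs follow the abstract.

```python
import numpy as np

def fit_pca(X, n_components):
    """Autoscale X and return scaling parameters, loadings, and explained-variance ratios."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    Xs = (X - mean) / std
    _, s, Vt = np.linalg.svd(Xs, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    return mean, std, Vt[:n_components].T, explained[:n_components]

def project(X, mean, std, loadings):
    """Project new samples onto the PCs using the training-set scaling."""
    return ((X - mean) / std) @ loadings

def correlation_ranking(scores, y):
    """Order PC indices by decreasing |Pearson r| with the response (assumed criterion)."""
    r = np.array([np.corrcoef(scores[:, j], y)[0, 1] for j in range(scores.shape[1])])
    order = np.argsort(-np.abs(r))
    return order, r

def fit_pcr(scores, y, selected):
    """Least-squares regression of y on the selected PC scores plus an intercept."""
    A = np.column_stack([np.ones(len(y)), scores[:, selected]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_pcr(scores, coef, selected):
    return coef[0] + scores[:, selected] @ coef[1:]

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Synthetic stand-in data: 380 "compounds", 10 "indices" (NOT real solubility data).
rng = np.random.default_rng(0)
latent = rng.normal(size=(380, 3))
X = latent @ rng.normal(size=(3, 10)) + 0.05 * rng.normal(size=(380, 10))
y = 2.0 * latent[:, 0] - 1.0 * latent[:, 1] + 0.1 * rng.normal(size=380)

# 320 training / 60 external prediction compounds, as in the abstract.
X_train, X_test = X[:320], X[320:]
y_train, y_test = y[:320], y[320:]

mean, std, loadings, explained = fit_pca(X_train, n_components=7)
T_train = project(X_train, mean, std, loadings)
T_test = project(X_test, mean, std, loadings)

# Rank the seven PCs by correlation with the response, keep the top three (arbitrary).
order, r = correlation_ranking(T_train, y_train)
selected = order[:3]

coef = fit_pcr(T_train, y_train, selected)
rmse_train = rmse(y_train, predict_pcr(T_train, coef, selected))
rmse_test = rmse(y_test, predict_pcr(T_test, coef, selected))
print("ranked PCs:", order, "train RMSE:", rmse_train, "prediction RMSE:", rmse_test)
```

The point of ranking by correlation rather than by eigenvalue is that a PC carrying a large share of descriptor variance is not necessarily informative about the response, so the order in which PCs enter the model can differ from their variance order.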
Correlation ranking procedure for factor selection in PC-ANN modeling and application to aqueous solubility evaluation