How to combine numerical and categorical columns in Spark PCA
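from pyspark.ml.feature import VectorAssembler

# Combine the input columns into a single vector column named "features".
# VectorAssembler accepts only numeric, boolean, and vector columns, so a
# string (categorical) column in this list makes the transform fail.
vectorAssembler = VectorAssembler(
    inputCols=['Clump_Thickness', 'Cell_Size', 'Cell_Shape',
               'Marginal_Adhesion', 'Epithelial_Cell_Size', 'Normal_Nucleoli',
               'Bland_Chromatin', 'Bare_Nuclei', 'Mitoses', 'Class'],
    outputCol='features')
transformed_df = vectorAssembler.transform(df)
transformed_df = transformed_df.select("features")
transformed_df.show()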