Dimensionality reduction: one way to deal with big files by decreasing their size.

[image]

Now, about the last image above, I ask: what might PCA component 1 (X axis) and component 2 (Y axis) actually be?

They come from the covariance matrix: the principal components are the eigenvectors of the data's covariance matrix.

[image]

So, how do we decide which is principal component 1 and which is principal component 2?

It is decided by the eigenvalues: the eigenvector with the largest eigenvalue (the direction capturing the most variance) becomes PC1, the next largest becomes PC2, and so on:

[image]

(Euclidean means the straight-line, i.e., shortest, distance.)
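To make that ordering concrete, here is a minimal numpy sketch (toy data of my own, not from these notes): build the covariance matrix, eigendecompose it, and sort the eigenvectors by eigenvalue, largest first.

```python
import numpy as np

rng = np.random.default_rng(0)
# toy correlated 2-D data (made up for illustration)
X = rng.normal(size=(100, 2)) @ np.array([[3.0, 1.0], [1.0, 0.5]])

Xc = X - X.mean(axis=0)                 # center the data first
cov = np.cov(Xc, rowvar=False)          # 2x2 covariance matrix

eigvals, eigvecs = np.linalg.eigh(cov)  # eigh: for symmetric matrices
order = np.argsort(eigvals)[::-1]       # sort by eigenvalue, largest first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

print("PC1 direction:", eigvecs[:, 0], "explains", eigvals[0] / eigvals.sum())
print("PC2 direction:", eigvecs[:, 1], "explains", eigvals[1] / eigvals.sum())
```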

Hopefully no math headache from that; the library code is below. Still, having understood the math process is good.

[image]

[image]
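A minimal sketch of what such code looks like, assuming scikit-learn's PCA on the iris dataset (my choice of example data, not necessarily what the screenshots used): 4 features squeezed down to 2 components.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)             # 150 rows x 4 features
X_scaled = StandardScaler().fit_transform(X)  # PCA is scale-sensitive

pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)            # 150 rows x 2 principal components

print(X.shape, "->", X_2d.shape)
print("variance explained by PC1, PC2:", pca.explained_variance_ratio_)
```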

So, here we see visually how PCA reduces data. NOTE: dimensionality reduction is not simply dropping columns, which is what I thought; it projects the data onto new axes built from combinations of the original features.

If that's the case, then here only principal component 1 is shown, yet in the stratum one, two PCA components were built and plotted. To fix this loophole in my understanding: PCA yields as many components as you ask for; a plot of PC1 alone is one-dimensional, while the stratum example kept two components to draw a 2-D scatter.

_PS: heatmaps, t-SNE plots, and multi-dimensional scaling (MDS) are alternatives to PCA for making sense of high-dimensional data.

YouTube: Zach Star, Josh Starmer_
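One of those alternatives in code, as a minimal sketch assuming scikit-learn's TSNE on iris (my choice of dataset, not from these notes):

```python
from sklearn.datasets import load_iris
from sklearn.manifold import TSNE

X, y = load_iris(return_X_y=True)
# t-SNE embeds the 4-D points into 2-D, preserving local neighborhoods
X_embedded = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
print(X.shape, "->", X_embedded.shape)   # (150, 4) -> (150, 2)
```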



Nothing to do with the above context, the posts below are about stratification: breaking a population down into smaller subsets, i.e., samples, such that each subset almost proportionately reflects the population's distribution of certain criteria/features.

E.g., if I have more red balls (about 90%) and fewer green balls (about 10%), then in each smaller bowl I put red and green balls in a similar ratio (a code sketch is below). Stratification should be done when the population is very disproportionately distributed.
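A minimal sketch of that with scikit-learn's train_test_split and its stratify argument (the ball labels here are made up to match the example above):

```python
import numpy as np
from sklearn.model_selection import train_test_split

balls = np.array(["red"] * 90 + ["green"] * 10)   # 90% red, 10% green
X = np.arange(len(balls)).reshape(-1, 1)          # dummy feature column

# stratify=balls keeps the red/green ratio similar in both splits
X_small, X_rest, y_small, y_rest = train_test_split(
    X, balls, test_size=0.8, stratify=balls, random_state=0
)
print("small bowl:", dict(zip(*np.unique(y_small, return_counts=True))))
```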


another context - xgboostA2Z

[image]


logistic-Regressn-2

Logistic-Regressn-moreThan2



Data leakage in models fools you so badly that you are proud, about to yell that the model is 90% accurate and whatnot,

but in fact that 90% accuracy was not because the ML accurately traced the underlying function, aka the equation that maps X to Y.

I will give a simple example: let's say we are trying to predict whether Joe says yes or no if we ask him to eat something,

based on X:

then, Y, aka Joe's answer (yes or no) to eating something, =

f('askingHimEatPizzaOrRice', 'AreWeAskingHimAloneOrGroup', 'HisAnsofWillYouDevourSomeFood')

If we train this above equation on the training data's X and Y, then this ML model will give ~100% accuracy on the test data.

WHY? Because whether he will eat or not is already pre-answered by whether he will devour some food. This type of leakage leaks the correct prediction, i.e., the ground truth Y, into the X of the training data.

So, knowing whether a feature can be an X rather than a Y, or whether it can or can't be an X at all, or pre-checking whether this X is in fact just an aliased Y, is very important. This is called feature engineering or feature selection (a toy demo is below).
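A toy sketch of the Joe example, assuming scikit-learn's LogisticRegression (the feature names and data are invented here for illustration): with the aliased-Y feature in X the test accuracy is near-perfect, and once it's dropped the accuracy collapses to chance.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500
pizza_or_rice = rng.integers(0, 2, n)    # 'askingHimEatPizzaOrRice'
alone_or_group = rng.integers(0, 2, n)   # 'AreWeAskingHimAloneOrGroup'
y = rng.integers(0, 2, n)                # Joe's actual yes/no answer
devour_answer = y.copy()                 # 'HisAnsofWillYouDevourSomeFood' = aliased Y!

X_leaky = np.column_stack([pizza_or_rice, alone_or_group, devour_answer])
X_clean = np.column_stack([pizza_or_rice, alone_or_group])

for name, X in [("with leaky feature", X_leaky), ("without it", X_clean)]:
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    acc = LogisticRegression().fit(X_tr, y_tr).score(X_te, y_te)
    print(f"{name}: test accuracy = {acc:.2f}")
```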

BTW, our 2019 mistake was, as far as I could infer: leaking info from the future into the past; it was time-series data.

Other leakage causes are:



another context: [image]

These above algos are present in scikit-learn, which I stick to, so I don't care about alternative libraries like Keras, PyTorch, etc.

[image]

Everything is written right there in the compiler, you know. Read it patiently, Vivek. [image]