
Chapter 4  Partial Least Squares Analysis

4.1 Basic Concepts
4.2 NIPALS and SIMPLS Algorithms
4.3 Programming Method of Standard Partial Least Squares
4.4 Example Application
4.5 Stack Partial Least Squares

4.1 Basic Concepts

4.1.1 Partial Least Squares

Consider the general setting of a linear PLS algorithm to model the relation between two data sets (blocks of variables). Denote by X ⊂ R^N an N-dimensional space of variables representing the first block and similarly by Y ⊂ R^M an M-dimensional space representing the second block of variables. PLS models the relations between these two blocks by means of score vectors.

After observing n data samples from each block of variables, PLS decomposes the (n×N) matrix of zero-mean variables X and the (n×M) matrix of zero-mean variables Y into the form

X = T P^T + E,  Y = U Q^T + F    (4.1.1)

Graphically, this can be shown as in Fig. 4.1, where T, U are (n×p) matrices of the p extracted score vectors (components, latent vectors), the (N×p) matrix P and the (M×p) matrix Q represent matrices of loadings, and the (n×N) matrix E and the (n×M) matrix F are the matrices of residuals. The PLS method, which in its classical form is based on the nonlinear iterative partial least squares (NIPALS) algorithm, finds weight vectors w, c such that

[cov(t, u)]^2 = [cov(Xw, Yc)]^2 = max_{|r|=|s|=1} [cov(Xr, Ys)]^2

where cov(t, u) = t^T u/n denotes the sample covariance between the score vectors t and u. The NIPALS algorithm starts with a random initialization of the Y-space score vector u and repeats the following sequence of steps until convergence:

(1) w = X^T u/(u^T u)
(2) ||w|| → 1
(3) t = Xw
(4) c = Y^T t/(t^T t)
(5) ||c|| → 1
(6) u = Yc

Note that u = y if M = 1; that is, Y is a one-dimensional vector that we denote by y. In this case the NIPALS procedure converges in a single iteration.

It can be shown that the weight vector w also corresponds to the first eigenvector of the following eigenvalue problem:

X^T Y Y^T X w = λ w    (4.1.3)

The X- and Y-space score vectors t and u are then given as

t = Xw,  u = Yc

where the weight vector c is defined in steps (4) and (5) of NIPALS. Similarly, eigenvalue problems for the extraction of t, u or c estimates can be derived. The user then solves one of these eigenvalue problems, and the other score or weight vectors are readily computable using the relations defined in NIPALS.
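As a minimal illustration of the iteration above, the following Python sketch extracts one pair of weight and score vectors from centered data matrices X and Y using NumPy only; the tolerance, iteration cap and random initialization are illustrative choices, not prescribed by the text.

```python
import numpy as np

def nipals_component(X, Y, max_iter=500, tol=1e-10):
    """One NIPALS component: weight vectors w, c and score vectors t, u.
    X: (n, N) zero-mean predictor block; Y: (n, M) zero-mean response block."""
    rng = np.random.default_rng(0)
    u = rng.standard_normal(X.shape[0])      # random initialization of the Y-space score
    for _ in range(max_iter):
        w = X.T @ u / (u @ u)                # step (1)
        w /= np.linalg.norm(w)               # step (2): ||w|| -> 1
        t = X @ w                            # step (3)
        c = Y.T @ t / (t @ t)                # step (4)
        c /= np.linalg.norm(c)               # step (5): ||c|| -> 1
        u_new = Y @ c                        # step (6)
        if np.linalg.norm(u_new - u) < tol:  # stop once u has stabilized
            u = u_new
            break
        u = u_new
    return w, c, t, u
```

If Y has a single column, the loop converges after the first pass, in line with the remark above that u = y when M = 1.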

4.1.2 Forms of Partial Least Squares

PLS is an iterative process. After the extraction of the score vectors t, u, the matrices X and Y are deflated by subtracting their rank-one approximations based on t and u. Different forms of deflation define several variants of PLS.

Using equation (4.1.1), the vectors of loadings p and q are computed as coefficients of regressing X on t and Y on u, respectively:

p = X^T t/(t^T t),  q = Y^T u/(u^T u)

4.1.2.1 PLS Mode A

PLS Mode A is based on rank-one deflation of the individual block matrices using the corresponding score and loading vectors. In each iteration of PLS Mode A the X and Y matrices are deflated as

X ← X − t p^T,  Y ← Y − u q^T

4.1.2.2 PLS1, PLS2

PLS1 (one of the blocks of data consists of a single variable) and PLS2 (both blocks are multidimensional) are used as PLS regression methods. These variants of PLS are the most frequently used PLS approaches.

The relationship between X and Y is asymmetric. Two assumptions are made: i) the score vectors {ti}, i = 1, …, p, are good predictors of Y, where p denotes the number of extracted score vectors (PLS iterations); ii) a linear inner relation between the score vectors t and u exists, that is,

U = T D + H    (4.1.5)

where D is a (p×p) diagonal matrix and H denotes the matrix of residuals.

The asymmetric assumption of the predictor-predicted variable(s) relation is transformed into a deflation scheme in which the predictor-space (say X) score vectors {ti}, i = 1, …, p, are good predictors of Y. These score vectors are then used to deflate Y; that is, a component of the regression of Y on t is removed from Y at each iteration of PLS:

X ← X − t p^T,  Y ← Y − t t^T Y/(t^T t) = Y − t c^T

4.1.2.3 PLS-SB

As outlined at the end of the previous paragraph, the computation of all eigenvectors of equation (4.1.3) at once would define another form of PLS. This computation involves a sequence of implicit rank-one deflations of the overall cross-product matrix. This form of PLS is accordingly denoted PLS-SB. In contrast to PLS1 and PLS2, the extracted score vectors {ti}, i = 1, …, p, are in general not mutually orthogonal.

4.1.3 PLS Regression

As mentioned in the previous section, PLS1 and PLS2 can be used to solve linear regression problems. Combining assumption (4.1.5) of a linear relation between the score vectors t and u with the decomposition of the Y matrix, equation (4.1.1) can be written as

Y = T D Q^T + (H Q^T + F)

This defines the equation

Y = T C^T + F*    (4.1.6)

where C^T = D Q^T now denotes the (p×M) matrix of regression coefficients and F* = H Q^T + F is the residual matrix. Equation (4.1.6) is simply the decomposition of Y using ordinary least squares regression with orthogonal predictors T.

We now consider orthonormalised score vectors t, that is, T^T T = I, and the matrix C = Y^T T of the weight vectors c not scaled to length one. It is useful to redefine equation (4.1.6) in terms of the original predictors X. To do this, we use the relationship

T = X W (P^T W)^{-1}

where P is the matrix of loading vectors defined in equation (4.1.1). Plugging this relation into equation (4.1.6), we obtain

Y = X W (P^T W)^{-1} C^T + F* = X B + F*

For a better understanding of these matrix equations, they are also given in graphical representation in Fig. 4.2.

where B represents the matrix of regression coefficients

B = W (P^T W)^{-1} C^T = X^T U (T^T X X^T U)^{-1} T^T Y

For the last equality, the relations among T, U, W and P are used [12,10,17]. Note that different scalings of the individual score vectors t and u do not influence the B matrix. For training data the estimate of PLS regression is

Ŷ = X B = T T^T Y

and for testing data we have

Ŷt = Xt B = Tt T^T Y

where Xt and Tt = Xt X^T U (T^T X X^T U)^{-1} represent the matrices of testing data and score vectors, respectively.
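To make the regression formulas concrete, the small NumPy sketch below assembles B = W (P^T W)^{-1} C^T from quantities extracted by a PLS run (with orthonormalised scores and C = Y^T T) and applies it to centered data. The function names and argument layout are illustrative assumptions, not the book's own code.

```python
import numpy as np

def pls_regression_coefficients(W, P, C):
    """B = W (P^T W)^{-1} C^T: (N, M) matrix of PLS regression coefficients.
    W, P are (N, p) weight/loading matrices, C = Y^T T is (M, p)."""
    return W @ np.linalg.solve(P.T @ W, C.T)

def pls_predict(X_centered, B, y_mean):
    """Prediction for already-centered data: Y_hat = X B, shifted back by the training mean of Y."""
    return X_centered @ B + y_mean
```

With orthonormalised scores T, the training-set prediction X B coincides with T T^T Y, which can serve as a quick numerical check of an implementation.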

4.1.4 Statistics

From the matrices of residuals Eh and Fh, sums of squares can be calculated as follows: the total sum of squares over a matrix, the sums of squares over rows, and the sums of squares over columns. These sums of squares can be used to construct variance-like estimators. The statistical properties of these estimators have not yet undergone a rigorous mathematical treatment, but some properties can be understood intuitively.

Sums of squares over the columns indicate the importance of a variable for a certain component. Sums of squares over the rows indicate how well the objects fit the model; this can be used as an outlier detection criterion. Illustrations are given in Fig. 4.3(a) for variable statistics and in Fig. 4.3(b) for sample statistics.

An advantage of PLS is that these statistics can be calculated for every component. This is an ideal means of following the model-building process. The evolution of these statistics can be followed (as shown in Fig. 4.3(a) and (b)) as more and more components are calculated, so that an idea of how the different objects and variables fit can be obtained. In combination with a criterion for model dimensionality, the statistics can be used to estimate which objects and variables contribute mainly to the model and which contribute mainly to the residual.
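The row- and column-wise sums of squares described above are straightforward to compute; a minimal NumPy sketch follows (variable names are illustrative).

```python
import numpy as np

def residual_sums_of_squares(E):
    """Sums of squares of a residual matrix E (e.g. Eh or Fh after h components)."""
    total = np.sum(E ** 2)               # total sum of squares over the matrix
    per_row = np.sum(E ** 2, axis=1)     # per object: large values flag potential outliers
    per_col = np.sum(E ** 2, axis=0)     # per variable: how much of it remains unexplained
    return total, per_row, per_col
```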

4.2 NIPALS and SIMPLS Algorithms

4.2.1 NIPALS

In this section, the transpose of a matrix is represented by the superscript ' (prime).

4.2.1.1 Theory

The PLS algorithm as described in this section will be called the "standard" PLS algorithm. It has been presented in detail elsewhere [3-6]; for some alternative implementations of PLS see e.g. references [7-9]. The first step in standard PLS is to center the data matrices X and Y, giving X0 and Y0, respectively. Then a set of A orthogonal X-block factor scores T = [t1, t2, …, tA] and companion Y-block factor scores U = [u1, u2, …, uA] are calculated factor by factor. The first PLS factors t1 and u1 are weighted sums of the centered variables: t1 = X0 w1 and u1 = Y0 q1, respectively. Usually the weights are determined via the NIPALS algorithm, which is the iterative sequence

w1 ∝ X0' u1,  t1 = X0 w1,  q1 ∝ Y0' t1,  u1 = Y0 q1

repeated until convergence of t1.

Once the first X-block factor t1 is obtained, one proceeds with deflating the data matrices. This yields new data sets X1 and Y1, which are the matrices of residuals obtained after regressing all variables on t1.

4.2.1.2 NIPALS-PLS Factors in Terms of Original Variables

Each of the weight vectors wa, a = 2, 3, …, A, used for defining the associated factor scores applies to a different matrix of residuals Xa−1, and not to the original centered data X0. This obscures the interpretation of the factors, mainly because one loses sight of what is in the depleted matrices Xa as one goes to higher dimensions, a ≥ 1. Some X variables are used in the first factors, others only much later. The relation between factors and variables is better displayed by the loadings pa (a = 1, 2, …, A). Indeed, the weight vectors, collected in the p×A matrix W, have found less use in interpreting PLS regression models than the loading vectors.

It is therefore advantageous to re-express the NIPALS-PLS factors ta in terms of the original centered data X0, say

ta = X0 ra    (4.2.13)

or, collecting the alternative weight vectors in a p×A matrix R = [r1, r2, …, rA],

T = X0 R    (4.2.14)

The factor scores T computed via NIPALS-PLS, i.e. via depleted X matrices, can be expressed exactly as linear combinations of the centered X variables, since all deflated matrices Xa and factor scores ta, a = 1, 2, …, A, lie in the column space of X0. Thus, R can be computed from the regression of T on X0:

R = (X0' X0)⁻ X0' T = X0⁺ T    (4.2.15)

where P = [p1, p2, …, pA] is the (p×A) matrix of factor loadings, the superscript ⁻ indicates any generalized inverse, and ⁺ indicates the unique Moore-Penrose pseudo-inverse. We also have the relation

P' R = IA

This holds since rb' pa = rb' X0' ta/(ta' ta) = tb' ta/(ta' ta) = δab. Here IA is the (A×A) identity matrix and δab is Kronecker's delta. Thus R is a generalized inverse of P'. Another expression for R is

R = W (P' W)⁻¹    (4.2.17)

which follows from the observation that R and W share the same column space and that P'R should be equal to the identity matrix.

The explicit computation of the (pseudo-)inverse matrices in equations (4.2.15) and (4.2.17) detracts somewhat from the PLS-NIPALS algorithm, which is otherwise very straightforward. Höskuldsson gives the following recurrent relation,

starting with r1 = w1. However, this relation depends on the tridiagonal structure of P'P and is only correct for univariate Y = y (m = 1, PLS1). Equations (4.2.19) and (4.2.20) form a set of updating formulas that is generally applicable, starting with G1 = Ip. Note that the vectors ra are not normalized, in contrast to the weight vectors wa. Thus in equation (4.2.13), neither ta nor ra are normalized.

When the R weights are available, a closed-form multiple regression-type prediction model can be obtained more readily:

Ŷ0 = X0 B_PLS

Here, B_PLS = R diag(b) Q' = W (P'W)⁻¹ diag(b) Q' is the p×m set of biased multivariate regression coefficients obtained via PLS regression.

4.2.2 SIMPLS

The modification we propose leads to the direct computation of the weights R. In this way we avoid the construction of the deflated data matrices X1, X2, …, XA and Y1, Y2, …, YA and by-pass the calculation of the weights W. The explicit computation of matrix inverses as in equation (4.2.15) or (4.2.17) is also circumvented. The newly defined R is similar, but not identical, to the "standard" R introduced in equation (4.2.14). In fact, our new R contains normalized weight vectors, just as W in standard PLS.

Thus, the task we face is to compute weight vectors ra and qa (a = 1, 2, …, A) that can be applied directly to the centered data:

ta = X0 ra,  ua = Y0 qa

The weights should be determined so as to maximize the covariance of the score vectors ta and ua under some constraints. (The term covariance will be used somewhat loosely and interchangeably with the terms cross-product or inner product; they merely differ by a scalar factor n−1.) Specifically, four conditions control the solution:

(1) maximization of covariance: ra' X0' Y0 qa = max;
(2) normalization of the weights ra: ra' ra = 1;
(3) normalization of the weights qa: qa' qa = 1;
(4) orthogonality of the t scores: tb' ta = 0 for b < a.

4.2.2.2 SIMPLS Algorithm

It is expedient to compute Sa+1 from its predecessor Sa. To achieve this, the projection onto the column space of Pa will be carried out as a sequence of orthogonal projections. For this we need an orthonormal basis of Pa, say Va = [v1, v2, …, va]. Va may be obtained from a Gram-Schmidt orthonormalization of Pa, i.e.

va ∝ pa − Va−1 (Va−1' pa)    (4.2.32)

starting with V1 = v1 ∝ p1. An additional simplification is possible when the response is univariate (m = 1, PLS1). In this case, one may employ the orthogonality properties of P, viz. pb' pa = 0 for b ≤ a−2. These properties carry over to the orthonormalized loadings V, i.e. pb' va = 0 for b ≤ a−2. Thus, orthogonality of pa with respect to Va−2 is automatically taken care of and equation (4.2.32) simplifies to

va ∝ pa − va−1 (va−1' pa)

The projection onto the subspace spanned by the first a loading vectors, Pa (Pa' Pa)⁻¹ Pa', can now be replaced by Va Va', and the projection onto the orthogonal complement Pa⊥ by Ip − Va Va' = ∏_{b=1}^{a} (Ip − vb vb'). Thus, utilizing the orthonormality of V, the product matrices Sa (a = 1, 2, …) are steadily depleted by projecting out the perpendicular directions va:

Sa+1 = Sa − va (va' Sa)

4.2.2.3 Fitting, Prediction and Residual Analysis

For the development of the theory and algorithm of SIMPLS it was convenient to choose normalized weight vectors ra. This choice, however, is in no way essential. We now switch to a normalization of the scores ta instead, since this considerably simplifies some of the ensuing formulas. The code given in the Appendix already uses the latter normalization scheme. Thus we redefine ra = ra/|X0 ra| and ta = ta/|ta|, giving unit-length score vectors ta and an orthonormal T: T'T = IA.

Predicted values for the calibration samples are now obtained as

Ŷ0 = T T' Y0

For new objects we employ the straightforward prediction formula

ŷ* = ȳ + x0* R Q'

The factor scores ta* = x0* ra and the leverage h* = Σ_a (ta*)² may be computed for diagnostic purposes, e.g. to assess whether or not the new object lies within the region covered by the training objects.

4.2.2.4 Detailed SIMPLS Algorithm
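A compact Python sketch of SIMPLS as described in the preceding subsections is given below (weights applied directly to the centered data, deflation of the cross-product matrix S rather than of X and Y). The function signature, the return values, and the use of numpy.linalg.svd to obtain the dominant singular direction are illustrative choices, not the book's exact listing.

```python
import numpy as np

def simpls(X, Y, A):
    """SIMPLS sketch: X is (n, N), Y is (n, M), A is the number of factors.
    Returns R, T, P, Q, B such that T = X0 @ R and B = R @ Q.T (centered data)."""
    n = X.shape[0]
    Y = Y.reshape(n, -1)
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    X0, Y0 = X - x_mean, Y - y_mean                 # center both blocks
    S = X0.T @ Y0                                   # cross-product matrix
    N, M = X0.shape[1], Y0.shape[1]
    R, T = np.zeros((N, A)), np.zeros((n, A))
    P, Q = np.zeros((N, A)), np.zeros((M, A))
    V = np.zeros((N, A))                            # orthonormal basis of the loadings
    for a in range(A):
        _, _, vt = np.linalg.svd(S, full_matrices=False)
        q = vt[0]                                   # dominant right singular direction of S
        r = S @ q                                   # X-weights applied to centered data
        t = X0 @ r
        t_norm = np.linalg.norm(t)
        t, r = t / t_norm, r / t_norm               # normalize scores (and r accordingly)
        p_a = X0.T @ t                              # X-loadings
        q_a = Y0.T @ t                              # Y-loadings / regression weights
        v = p_a - V[:, :a] @ (V[:, :a].T @ p_a)     # orthogonalize against previous loadings
        v /= np.linalg.norm(v)
        S = S - np.outer(v, v @ S)                  # deflate the cross-product matrix
        R[:, a], T[:, a], P[:, a], Q[:, a], V[:, a] = r, t, p_a, q_a, v
    B = R @ Q.T                                     # regression coefficients for centered data
    return R, T, P, Q, B, x_mean, y_mean
```

As a quick consistency check, T.T @ T should be numerically close to the identity matrix and X0 @ B should equal the fitted values T @ Q.T.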

4.3 Programming Method of Standard Partial Least Squares

4.3.1 Cross-validation

Learning the parameters of a prediction function and testing it on the same data is a methodological mistake: a model that would just repeat the labels of the samples that it has just seen would have a perfect score but would fail to predict anything useful on yet-unseen data. This situation is called overfitting. To avoid it, it is common practice when performing a (supervised) machine learning experiment to hold out part of the available data as a test set X_test, y_test. Note that the word "experiment" is not intended to denote academic use only, because even in commercial settings machine learning usually starts out experimentally.

When evaluating different settings ("hyperparameters") for estimators, there is still a risk of overfitting on the test set because the parameters can be tweaked until the estimator performs optimally. This way, knowledge about the test set can "leak" into the model and evaluation metrics no longer report on generalization performance. To solve this problem, yet another part of the dataset can be held out as a so-called "validation set": training proceeds on the training set, after which evaluation is done on the validation set, and when the experiment seems to be successful, final evaluation can be done on the test set.

However, by partitioning the available data into three sets, we drastically reduce the number of samples which can be used for learning the model, and the results can depend on a particular random choice for the pair of (train, validation) sets.

A solution to this problem is a procedure called cross-validation (CV for short). A test set should still be held out for final evaluation, but the validation set is no longer needed when doing CV.

4.3.1.1 Cross-validation Iterators for i.i.d. Data

The following cross-validators can be used in such cases.

Note: While i.i.d. data is a common assumption in machine learning theory, it rarely holds in practice. If one knows that the samples have been generated using a time-dependent process, it is safer to use a time-series-aware cross-validation scheme. Similarly, if we know that the generative process has a group structure (samples collected from different subjects, experiments, measurement devices), it is safer to use group-wise cross-validation.

1. K-Fold

In the basic approach, called K-Fold CV, the training set is split into k smaller sets (other approaches are described below, but generally follow the same principles). The following procedure is followed for each of the k "folds":

(1) A model is trained using k−1 of the folds as training data;

(2) the resulting model is validated on the remaining part of the data.

Example of 2-fold cross-validation on a dataset with 4 samples:
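A minimal sketch using scikit-learn's KFold (the toy data are chosen only for illustration):

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.array(["a", "b", "c", "d"])
kf = KFold(n_splits=2)
for train_idx, test_idx in kf.split(X):
    print("train:", train_idx, "test:", test_idx)
# train: [2 3] test: [0 1]
# train: [0 1] test: [2 3]
```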

A visualization of the cross-validation behavior is shown in Fig. 4.4. Note that K-Fold is not affected by classes or groups.

2. Repeated K-Fold

RepeatedKFold repeats K-Fold n times. It can be used when one requires to run K-Fold n times, producing different splits in each repetition.

Example of 2-fold K-Fold repeated 2 times:
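A short sketch with scikit-learn's RepeatedKFold (the random_state value is arbitrary):

```python
import numpy as np
from sklearn.model_selection import RepeatedKFold

X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
rkf = RepeatedKFold(n_splits=2, n_repeats=2, random_state=0)
for train_idx, test_idx in rkf.split(X):
    print("train:", train_idx, "test:", test_idx)
```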

3. Leave One Out (LOO)

LeaveOneOut (or LOO) is a simple cross-validation. Each learning set is created by taking all the samples except one, the test set being the sample left out. Thus, for n samples, we have n different training sets and n different test sets. This cross-validation procedure does not waste much data, as only one sample is removed from the training set:
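A minimal sketch with scikit-learn's LeaveOneOut (toy data for illustration):

```python
from sklearn.model_selection import LeaveOneOut

X = [1, 2, 3, 4]
loo = LeaveOneOut()
for train_idx, test_idx in loo.split(X):
    print("train:", train_idx, "test:", test_idx)
```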

4. Leave P Out (LPO)

LeavePOut is very similar to LeaveOneOut, as it creates all the possible training/test sets by removing p samples from the complete set. For n samples, this produces C(n, p) train-test pairs. Unlike LeaveOneOut and K-Fold, the test sets will overlap for p > 1.

Example of Leave-2-Out on a dataset with 4 samples:
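A minimal sketch with scikit-learn's LeavePOut (toy data for illustration):

```python
from sklearn.model_selection import LeavePOut

X = [1, 2, 3, 4]
lpo = LeavePOut(p=2)
for train_idx, test_idx in lpo.split(X):
    print("train:", train_idx, "test:", test_idx)
```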

5. Random Permutations Cross-validation a.k.a. Shuffle & Split

The ShuffleSplit iterator will generate a user-defined number of independent train/test dataset splits. Samples are first shuffled and then split into a pair of train and test sets.

It is possible to control the randomness for reproducibility of the results by explicitly seeding the random_state pseudo-random number generator.

Here is a usage example:
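A minimal sketch with scikit-learn's ShuffleSplit (parameter values chosen for illustration):

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit

X = np.arange(10)
ss = ShuffleSplit(n_splits=5, test_size=0.25, random_state=0)
for train_idx, test_idx in ss.split(X):
    print("train:", train_idx, "test:", test_idx)
```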

A visualization of the cross-validation behavior is shown in Fig. 4.5. Note that ShuffleSplit is not affected by classes or groups.

4.3.1.2 Cross-validation Iterators with Stratification Based on Class Labels

Some classification problems can exhibit a large imbalance in the distribution of the target classes: for instance, there could be several times more negative samples than positive samples. In such cases it is recommended to use stratified sampling, as implemented in StratifiedKFold and StratifiedShuffleSplit, to ensure that relative class frequencies are approximately preserved in each train and validation fold.

1. Stratified K-Fold

StratifiedKFold is a variation of K-Fold which returns stratified folds: each fold contains approximately the same percentage of samples of each target class as the complete set.

Example of stratified 3-fold cross-validation on a dataset with 10 samples from two slightly unbalanced classes:
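A minimal sketch with scikit-learn's StratifiedKFold (toy data for illustration):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

X = np.ones(10)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])  # two slightly unbalanced classes
skf = StratifiedKFold(n_splits=3)
for train_idx, test_idx in skf.split(X, y):
    print("train:", train_idx, "test:", test_idx)
```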

A visualization of the cross-validation behavior is shown in Fig. 4.6.

2. Stratified Shuffle Split

StratifiedShuffleSplit is a variation of ShuffleSplit which returns stratified splits, i.e. it creates splits by preserving the same percentage for each target class as in the complete set.
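A minimal sketch with scikit-learn's StratifiedShuffleSplit (parameter values chosen for illustration):

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

X = np.ones(10)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
sss = StratifiedShuffleSplit(n_splits=3, test_size=0.5, random_state=0)
for train_idx, test_idx in sss.split(X, y):
    print("train:", train_idx, "test:", test_idx)
```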

A visualization of the cross-validation behavior is shown in Fig. 4.7 (StratifiedShuffleSplit).

4.3.1.3 Cross-validation Iterators for Grouped Data

The i.i.d. assumption is broken if the underlying generative process yields groups of dependent samples. Such a grouping of data is domain specific. An example would be medical data collected from multiple patients, with multiple samples taken from each patient. Such data is likely to be dependent on the individual group. In our example, the patient id for each sample will be its group identifier.

1. Group K-Fold

GroupKFold is a variation of K-Fold which ensures that the same group is not represented in both the testing and training sets. For example, if the data is obtained from different subjects with several samples per subject, and if the model is flexible enough to learn from highly person-specific features, it could fail to generalize to new subjects. GroupKFold makes it possible to detect this kind of overfitting situation.

Imagine you have three subjects, each with an associated number from 1 to 3:
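A minimal sketch with scikit-learn's GroupKFold (the toy data and group labels are illustrative):

```python
import numpy as np
from sklearn.model_selection import GroupKFold

X = np.array([0.1, 0.2, 2.2, 2.4, 2.3, 4.55, 5.8, 8.8, 9, 10])
y = np.array(["a", "b", "b", "b", "c", "c", "c", "d", "d", "d"])
groups = np.array([1, 1, 1, 2, 2, 2, 3, 3, 3, 3])  # subject id for each sample
gkf = GroupKFold(n_splits=3)
for train_idx, test_idx in gkf.split(X, y, groups=groups):
    print("train:", train_idx, "test:", test_idx)
```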

Each subject is in a different testing fold, and the same subject is never in both testing and training. Notice that the folds do not have exactly the same size due to the imbalance in the data.

A visualization of the cross-validation behavior is shown in Fig. 4.8 (GroupKFold).

2. Leave One Group Out

LeaveOneGroupOut is a cross-validation scheme which holds out the samples according to a third-party provided array of integer groups. This group information can be used to encode arbitrary domain-specific pre-defined cross-validation folds.

Each training set is thus constituted by all the samples except the ones related to a specific group.

For example, in the case of multiple experiments, LeaveOneGroupOut can be used to create a cross-validation based on the different experiments: we create a training set using the samples of all the experiments except one:
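A minimal sketch with scikit-learn's LeaveOneGroupOut (toy data and experiment labels are illustrative):

```python
from sklearn.model_selection import LeaveOneGroupOut

X = [1, 5, 10, 50, 60, 70, 80]
y = [0, 1, 1, 2, 2, 2, 2]
groups = [1, 1, 2, 2, 3, 3, 3]  # experiment id for each sample
logo = LeaveOneGroupOut()
for train_idx, test_idx in logo.split(X, y, groups=groups):
    print("train:", train_idx, "test:", test_idx)
```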

3. Leave P Groups Out

LeavePGroupsOut is similar to LeaveOneGroupOut, but removes samples related to P groups for each training/test set.
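A minimal sketch with scikit-learn's LeavePGroupsOut (toy data for illustration):

```python
import numpy as np
from sklearn.model_selection import LeavePGroupsOut

X = np.arange(6)
y = [1, 1, 1, 2, 2, 2]
groups = [1, 1, 2, 2, 3, 3]
lpgo = LeavePGroupsOut(n_groups=2)
for train_idx, test_idx in lpgo.split(X, y, groups=groups):
    print("train:", train_idx, "test:", test_idx)
```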

4. Group Shuffle Split

The GroupShuffleSplit iterator behaves as a combination of ShuffleSplit and LeavePGroupsOut, and generates a sequence of randomized partitions in which a subset of groups are held out for each split.

Here is a usage example:
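A minimal sketch with scikit-learn's GroupShuffleSplit (parameter values and toy data are illustrative):

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

X = np.array([0.1, 0.2, 2.2, 2.4, 2.3, 4.55, 5.8, 0.001])
y = np.array(["a", "b", "b", "b", "c", "c", "c", "a"])
groups = np.array([1, 1, 2, 2, 3, 3, 4, 4])
gss = GroupShuffleSplit(n_splits=4, test_size=0.5, random_state=0)
for train_idx, test_idx in gss.split(X, y, groups=groups):
    print("train:", train_idx, "test:", test_idx)
```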

A visualization of the cross-validation behavior is shown in Fig. 4.9 (GroupShuffleSplit).

4.3.2 Procedure of NIPALS

4.3.2.1 Inner Loop of the Iterative NIPALS Algorithm

This provides an alternative to svd(X'Y): it returns the first left and right singular vectors of X'Y. See the PLS class for the meaning of the parameters. It is similar to the power method for determining the eigenvectors and eigenvalues of X'Y.
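A hedged sketch of such an inner loop follows: an alternating, power-method-style iteration that converges to the first left/right singular vectors of X'Y. The tolerance, iteration cap and starting vector are illustrative choices.

```python
import numpy as np

def nipals_inner_loop(X, Y, max_iter=500, tol=1e-06):
    """Approximate the first left/right singular vectors of X'Y by alternating regressions."""
    y_score = Y[:, [0]]                      # start from the first column of Y
    x_weights = np.zeros((X.shape[1], 1))
    y_weights = np.zeros((Y.shape[1], 1))
    for _ in range(max_iter):
        x_weights_new = X.T @ y_score / (y_score.T @ y_score)
        x_weights_new /= np.linalg.norm(x_weights_new)
        x_score = X @ x_weights_new
        y_weights = Y.T @ x_score / (x_score.T @ x_score)
        y_weights /= np.linalg.norm(y_weights)
        y_score = Y @ y_weights
        if np.linalg.norm(x_weights_new - x_weights) < tol:   # weights have stabilized
            x_weights = x_weights_new
            break
        x_weights = x_weights_new
    return x_weights, y_weights
```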

4.3.2.2 Center X and Y

4.3.2.3 NIPALS

This class implements the generic PLS algorithm; the constructor's parameters allow one to obtain a specific implementation such as:

This implementation uses the PLS Wold 2-blocks algorithm based on two nested loops:

(i) The outer loop iterates over components.

(ii) The inner loop estimates the weight vectors. This can be done with two algorithms: (a) the inner loop of the original NIPALS algorithm, or (b) an SVD on the residual cross-covariance matrices.

4.4 Example Application

4.4.1 Demo of PLS

The software environment is Python 2.7 on a Microsoft Windows 7 operating system. Cross-validation and the train/test split are performed using the sklearn package. Dataset loading is done using the scipy package, and the other programs can be implemented by the reader.

4.4.2 Corn Dataset

In this section the corn dataset is used for the experiments. The number of latent variables of PLS is allowed to take values in the set [1, 15], and it is determined by 10-fold cross-validation. No pre-processing methods were used other than mean-centering. Table 4.1 shows the training error, cross-validation error, prediction error, and the number of latent variables of the PLS model for moisture, oil, protein, and starch content, directly using the corn dataset.
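A sketch of how this selection could be carried out with scikit-learn is given below. The file name "corn.mat" and the variable keys inside it are assumptions about how the corn dataset might be stored locally; only the procedure itself (mean-centering only, latent variables 1 to 15, 10-fold cross-validation on the RMSE) follows the text.

```python
import numpy as np
from scipy.io import loadmat                      # dataset loading via scipy, as stated above
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import KFold, cross_val_score

# Hypothetical layout of the corn data file: spectra in 'X', one property column in 'y'.
data = loadmat("corn.mat")
X, y = data["X"], data["y"].ravel()

rmsecv = []
for n_lv in range(1, 16):                         # latent variables 1..15
    pls = PLSRegression(n_components=n_lv, scale=False)   # mean-centering only, no scaling
    scores = cross_val_score(pls, X, y,
                             cv=KFold(n_splits=10, shuffle=True, random_state=0),
                             scoring="neg_mean_squared_error")
    rmsecv.append(np.sqrt(-scores.mean()))        # RMSECV for this number of latent variables

best = int(np.argmin(rmsecv)) + 1
print("optimal number of latent variables:", best)
```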

In this section, the number of latent variables of the PLS algorithm is selected by the 10-fold cross-validation method. The RMSECV of the PLS model is given in Fig. 4.10 to Fig. 4.12, respectively. (Fig. 4.10: selection of the optimal number of latent variables for the PLS model on the m5spec instrument; Fig. 4.11: the same for the mp5spec instrument; Fig. 4.12: the same for the mp6spec instrument.)
