




Removing Stop Words
Problem description
Given the bullet-screen (danmaku) comments of a TV series, remove the stop words from the comments and output the 10 most frequent words as a list.

Sample data:

    index  contents              likeCount  tv_name
    0      二刷的朋友有嗎          201        1
    1      我希望一切能重來        3          1
    2      這段眼神變化的太妙了    9          1
    3      良心啊,一小時          184        1
    4      基本都好                20         1
    ...    ...                     ...        ...
    59995  這個葉爸有點東西        227        12
    59996  眼鏡掉在案發現場了      90         12
    59997  俺的眼睛掉在廠里了      10         12
    59998  他不戴假發你更不習慣    17         12
    59999  那是什么藥呀            33         12

Expected output

    詞語 (word)  詞頻 (frequency)
    孩子         2030
    爬山         1913
    嚴良         1511
    真的         1407
    一個         1305
    媽媽         939
    演技         902
    一起         865
    普普         846
    感覺         782

Problem analysis

    Question                                                          Answer
    How can a sentence be split into words?                           cut()
    How can the comment table be combined with the stop-word table?   merge()
    How can word frequencies be counted?                              value_counts()

Hint
Use the cut() function from the jieba library to segment the comments and convert the result to a DataFrame. Merge it with the stop-word DataFrame, keep only the words that are not in the stop-word table, and count how often each remaining word occurs. This gives the result the task asks for.

Program code

    import pandas as pd
    import jieba

    # Load the comment table.
    data = pd.read_csv(r"D:\pydata\項目四\某電視劇彈幕信息.csv")
    # Read the stop-word file and split it on whitespace into a list.
    with open(r"D:\pydata\項目四\停用詞.txt", "r", encoding="utf-8") as f:
        stop_word = f.read().split()
    stop_word = pd.DataFrame(stop_word, columns=["stopword"])
    # Join all comments into one string, segment it, one word per row.
    word = pd.DataFrame(jieba.cut("".join(data.iloc[:, 1].astype(str))), columns=["word"])
    # Left merge: stopword is NaN for words that are not stop words.
    word = pd.merge(word, stop_word, left_on=["word"], right_on=["stopword"], how="left")
    # Keep non-stop-words that are longer than one character.
    word = word.query("stopword.isnull() and word.str.len() > 1", engine="python")["word"]
    word = word.value_counts()
    print(word.head(10))

pandas provides a large number of functions and methods for processing data quickly and conveniently.

jieba is a Chinese word-segmentation library for Python, characterized by high performance, high accuracy, and extensibility.

Step by step

data = pd.read_csv(r"D:\pydata\項目四\某電視劇彈幕信息.csv") reads the comment table shown under "Sample data" into a DataFrame.

Reading the stop-word file and calling split() gives the following list:

    ['$', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', ..., '非獨', '靠', '順', '順著', '首先', '!', ',', ':', ';', '?']

stop_word = pd.DataFrame(stop_word, columns=["stopword"]) turns that list into the stop-word table:

         stopword
    0    $
    1    0
    2    1
    3    2
    4    3
    ...  ...
    741  !
    742  ,
    743  :
    744  ;
    745  ?

word = pd.DataFrame(jieba.cut("".join(data.iloc[:, 1].astype(str))), columns=["word"]) works in three stages. First, "".join(...) concatenates the comment column into a single string:

    “二刷的朋友有嗎我希望一切能重來這段眼神變化的太妙了良心啊,一小時……好了警官你是下一個好一個不戴眼鏡的斯文敗類兒子你啥時候學習啊居然還不說實話?我不戴假發更厲害演完這部電影,伊能靜開始怕了你看我還有機會嗎這個葉爸有點東西眼鏡掉在案發現場了俺的眼睛掉在廠里了他不戴假發你更不習慣那是什么藥呀”

Second, jieba.cut() segments the string into words:

    ['二刷', '的', '朋友', '有', '嗎', '我', '希望', ..., '他', '不戴', '假發', '你', '更不', '習慣', '那是', '什么', '藥', '呀']

Third, pd.DataFrame(...) stores the words one per row:

            word
    0       二刷
    1       的
    2       朋友
    3       有
    4       嗎
    ...     ...
    339467  那
    339468  是
    339469  什么
    339470  藥
    339471  呀
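The merge-and-filter idea from the hint can be tried in isolation on a few hand-written words. The sketch below is only an illustration: the word list stands in for the jieba.cut() output and the three-entry stop-word table is invented, so neither comes from the course files.

```python
import pandas as pd

# Hypothetical, already-segmented words standing in for the jieba.cut() output.
word = pd.DataFrame({"word": ["二刷", "的", "朋友", "有", "嗎", "朋友", "希望"]})
# Hypothetical miniature stop-word table.
stop_word = pd.DataFrame({"stopword": ["的", "有", "嗎"]})

# Left merge keeps every word; stopword is NaN where no stop word matched.
merged = pd.merge(word, stop_word, left_on="word", right_on="stopword", how="left")

# Keep non-stop-words longer than one character.
kept = merged[merged["stopword"].isna() & (merged["word"].str.len() > 1)]["word"]

# Count the survivors and turn the top entries into a plain Python list.
top_words = kept.value_counts().head(10).index.tolist()
print(top_words)
```

Because the task asks for a list, the last step takes the index of the value_counts() result and calls tolist() on it; printing head(10) directly would show a Series with the counts instead.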
word = pd.merge(word, stop_word, left_on=["word"], right_on=["stopword"], how="left") attaches the stop-word table to the word table. Because the merge is a left merge, every word is kept; the stopword column holds the matching stop word, or NaN when the word is not a stop word:

            word   stopword
    0       二刷   NaN
    1       的     的
    2       朋友   NaN
    3       有     有
    4       嗎     嗎
    ...     ...    ...
    339467  那     那
    339468  是     是
    339469  什么   什么
    339470  藥     NaN
    339471  呀     呀

word = word.query("stopword.isnull() and word.str.len() > 1", engine='python')["word"] keeps the rows whose stopword is NaN and whose word is longer than one character:

            word      stopword
    0       二刷      NaN
    2       朋友      NaN
    6       希望      NaN
    9       重來      NaN
    10      這段      NaN
    ...     ...       ...
    339451  案發現場  NaN
    339455  眼睛      NaN
    339458  廠里      NaN
    339462  戴假發    NaN
    339466  習慣      NaN

word = word.value_counts() counts how often each remaining word occurs, sorted from most to least frequent:

    詞語 (word)  數量 (count)
    孩子         2023
    爬山         1911
    嚴良         1511
    真的         1407
    一個         1305
    ...          ...
    卡爾         1
    胡成         1
    親熱         1
    碰過         1
    案發現場     1

Task summary
The merge() function joins related rows from two DataFrames, matched on columns or on the index, into single rows of a new DataFrame. Its options make it flexible enough for the needs of practical work.

Try it yourself
Given the bullet-screen comments of a TV series, remove the stop words from the comments and output, as a list, the 10 most frequent words in the comments of episode 1. The result is the following list:

    ['爬山', '真實', '一起', '電影', '豐田', '不錯', '感覺', '欺負', '秦昊', '真的']

Production team
Produced by: 劉學, 重慶市九龍坡職業教育中心 (Chongqing Jiulongpo Vocational Education Center)

Selecting Men's Favorite Movies
Presenter: 劉學, 重慶市九龍坡職業教育中心

Problem description
There are three tables: "users" (user information), "ratings" (ratings), and "movies" (movie information). Their fields are listed below. Find the information for the 10 movies that men like best.

    users:    UserID (user id), Gender (gender), Age (age), Occupation (occupation), Zip-code (postal code)
    movies:   MovieID (movie id), Title (title), Genres (genres)
    ratings:  UserID (user id), MovieID (movie id), Rating (rating), Timestamp (timestamp)
Expected output

    MovieID  Title                                       Genres       Rating
    787      Gate of Heavenly Peace, The (1995)          Documentary  5.0
    985      Small Wonders (1996)                        Documentary  5.0
    3233     Smashing Time (1967)                        Comedy       5.0
    3280     Baby, The (1973)                            Horror       5.0
    3172     Ulysses (Ulisse) (1954)                     Adventure    5.0
    439      Dangerous Game (1993)                       Drama        5.0
    130      Angela (1995)                               Drama        5.0
    3656     Lured (1947)                                Crime        5.0
    1830     Follow the Bitch (1998)                     Comedy       5.0
    989      Schlafes Bruder (Brother of Sleep) (1995)   Drama        5.0

Problem analysis

    Question                                                  Answer
    Which tables does the final output draw on?               All three tables are needed
    How are the tables combined?                              merge()
    How are the movies rated highest by men obtained?         Merge the tables first, then aggregate

Hint
First merge the ratings table with the user table and compute, for male users, the ID and average rating of each movie. Then merge that result with the movie table. Finally, sort by rating in descending order to obtain the movies men like best.

Program code

    import pandas as pd

    # The .dat files use '::' as the field separator and have no header row.
    movies = pd.read_table(r"D:\pydata\項目四\movies.dat", sep='::', header=None,
                           names=['MovieID', 'Title', 'Genres'],
                           engine='python', encoding='iso-8859-15')
    ratings = pd.read_table(r"D:\pydata\項目四\ratings.dat", sep='::', header=None,
                            names=['UserID', 'MovieID', 'Rating', 'Timestamp'],
                            engine='python', encoding='iso-8859-15')
    users = pd.read_table(r"D:\pydata\項目四\users.dat", sep='::', header=None,
                          names=['UserID', 'Gender', 'Age', 'Occupation', 'Zip-code'],
                          engine='python', encoding='iso-8859-15')

    info = pd.merge(ratings, users, on="UserID", how="inner")   # ① attach user data to each rating
    info = info[info["Gender"] == "M"]                          # ② keep ratings given by men
    info = info.groupby("MovieID")["Rating"].mean()             # ③ average rating per movie
    res = pd.merge(movies, info, on="MovieID")                  # ④ attach title and genres
    res = res.sort_values(by="Rating", ascending=False)         # ⑤ highest average first
    res = res.round({"Rating": 2})
    print(res.head(10))
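The five steps in the hint can be followed on a handful of rows where the averages are easy to check by hand. This is a minimal sketch with invented in-memory tables (three users, three movies, five ratings), not the course data; the only change from the program above is reset_index(), used here so that the grouped result is an ordinary two-column DataFrame before the second merge.

```python
import pandas as pd

# Hypothetical miniature tables; all values are invented for illustration.
users = pd.DataFrame({"UserID": [1, 2, 3], "Gender": ["F", "M", "M"]})
movies = pd.DataFrame({"MovieID": [10, 20, 30], "Title": ["A", "B", "C"]})
ratings = pd.DataFrame({
    "UserID":  [1, 2, 2, 3, 3],
    "MovieID": [10, 10, 20, 20, 30],
    "Rating":  [5, 3, 4, 5, 2],
})

# Attach the user's gender to every rating, then keep the men's ratings.
info = pd.merge(ratings, users, on="UserID", how="inner")
info = info[info["Gender"] == "M"]

# Average rating per movie; reset_index() yields columns MovieID and Rating.
mean_rating = info.groupby("MovieID")["Rating"].mean().reset_index()

# Attach titles and sort so the best-rated movie comes first.
res = pd.merge(movies, mean_rating, on="MovieID")
res = res.sort_values(by="Rating", ascending=False)
print(res)
```

In this toy data user 1 is female, so her rating of 5 for movie 10 is excluded and movie 10 averages 3.0 from the one male rating. Computing a count next to the mean, for example with .agg(["mean", "count"]), would show how many ratings each average rests on.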
Step by step

Data in the movies table:

          MovieID  Title                                 Genres
    0     1        Toy Story (1995)                      Animation|Children's|Comedy
    1     2        Jumanji (1995)                        Adventure|Children's|Fantasy
    2     3        Grumpier Old Men (1995)               Comedy|Romance
    3     4        Waiting to Exhale (1995)              Comedy|Drama
    4     5        Father of the Bride Part II (1995)    Comedy
    ...   ...      ...                                   ...
    3878  3948     Meet the Parents (2000)               Comedy
    3879  3949     Requiem for a Dream (2000)            Drama
    3880  3950     Tigerland (2000)                      Drama
    3881  3951     Two Family House (2000)               Drama
    3882  3952     Contender, The (2000)                 Drama|Thriller

Data in the ratings table:

             UserID  MovieID  Rating  Timestamp
    0        1       1193     5       978300760
    1        1       661      3       978302109
    2        1       914      3       978301968
    3        1       3408     4       978300275
    4        1       2355     5       978824291
    ...      ...     ...      ...     ...
    1000204  6040    1091     1       956716541
    1000205  6040    1094     5       956704887
    1000206  6040    562      5       956704746
    1000207  6040    1096     4       956715648
    1000208  6040    1097     4       956715569

Data in the users table:

          UserID  Gender  Age  Occupation  Zip-code
    0     1       F       1    10          48067
    1     2       M       56   16          70072
    2     3       M       25   15          55117
    3     4       M       45   7           02460
    4     5       M       25   20          55455
    ...   ...     ...     ...  ...         ...
    6035  6036    F       25   15          32603
    6036  6037    F       45   1           76006
    6037  6038    F       56   1           14706
    6038  6039    F       45   0           01060
    6039  6040    M       25   6           11106

info = pd.merge(ratings, users, on="UserID", how="inner") gives the ratings table joined with the user table:

    UserID  MovieID  Rating  Timestamp  Gender  Age  Occupation  Zip-code
    1       1193     5       978300760  F       1    10          48067
    1       661      3       978302109  F       1    10          48067
    1       914      3       978301968  F       1    10          48067
    1       3408     4       978300275  F       1    10          48067
    1       2355     5       978824291  F       1    10          48067
    ...     ...      ...     ...        ...     ...  ...         ...
    6040    1091     1       956716541  M       25   6           11106
    6040    1094     5       956704887  M       25   6           11106
    6040    562      5       956704746  M       25   6           11106
    6040    1096     4       956715648  M       25   6           11106
    6040    1097     4       956715569  M       25   6           11106

info = info[info["Gender"] == "M"] keeps only the rows for male users:

    UserID  MovieID  Rating  Timestamp  Gender  Age  Occupation  Zip-code
    2       1357     5       978298709  M       56   16          70072
    2       3068     4       978299000  M       56   16          70072
    2       1537     4       978299620  M       56   16          70072
    2       647      3       978299351  M       56   16          70072
    2       2194     4       978299297  M       56   16          70072
    ...     ...      ...     ...        ...     ...  ...         ...
    6040    1091     1       956716541  M       25   6           11106
    6040    1094     5       956704887  M       25   6           11106
    6040    562      5       956704746  M       25   6           11106
    6040    1096     4       956715648  M       25   6           11106
    6040    1097     4       956715569  M       25   6           11106

info = info.groupby("MovieID")["Rating"].mean() gives the average rating that male users gave each movie:

    MovieID
    1       4.130552
    2       3.175238
    3       2.994152
    4       2.482353
    5       2.888298
    ...     ...
    3948    3.641838
    3949    4.174107
    3950    3.681818
    3951    4.043478
    3952    3.787986

res = pd.merge(movies, info, on="MovieID") attaches the male average rating to each movie:

    MovieID  Title                                 Genres                        Rating
    1        Toy Story (1995)                      Animation|Children's|Comedy   4.130552
    2        Jumanji (1995)                        Adventure|Children's|Fantasy  3.175238
    3        Grumpier Old Men (1995)               Comedy|Romance                2.994152
    4        Waiting to Exhale (1995)              Comedy|Drama                  2.482353
    5        Father of the Bride Part II (1995)    Comedy                        2.888298
    ...      ...                                   ...                           ...
    3948     Meet the Parents (2000)               Comedy                        3.641838
    3949     Requiem for a Dream (2000)            Drama                         4.174107
    3950     Tigerland (2000)                      Drama                         3.681818
    3951     Two Family House (2000)               Drama                         4.043478
    3952     Contender, The (2000)                 Drama|Thriller                3.787986

res = res.sort_values(by="Rating", ascending=False) gives the sorted table:

    MovieID  Title                                      Genres       Rating
    787      Gate of Heavenly Peace, The (1995)         Documentary  5.0
    985      Small Wonders (1996)                       Documentary  5.0
    3233     Smashing Time (1967)                       Comedy       5.0
    3280     Baby, The (1973)                           Horror       5.0
    3172     Ulysses (Ulisse) (1954)                    Adventure    5.0
    ...      ...                                        ...          ...
    3460     Hillbillys in a Haunted House (1967)       Comedy       1.0
    834      Phat Beach (1996)                          Comedy       1.0
    3136     James Dean Story, The (1957)               Documentary  1.0
    3904     Uninvited Guest, An (2000)                 Drama        1.0
    684      Windows (1980)                             Drama        1.0

res = res.round({"Rating": 2}) rounds the ratings to two decimal places:

    MovieID  Title                                         Genres       Rating
    787      Gate of Heavenly Peace, The (1995)            Documentary  5.00
    985      Small Wonders (1996)                          Documentary  5.00
    3233     Smashing Time (1967)                          Comedy       5.00
    3280     Baby, The (1973)                              Horror       5.00
    3172     Ulysses (Ulisse) (1954)                       Adventure    5.00
    439      Dangerous Game (1993)                         Drama        5.00
    130      Angela (1995)                                 Drama        5.00
    3656     Lured (1947)                                  Crime        5.00
    1830     Follow the Bitch (1998)                       Comedy       5.00
    989      Schlafes Bruder (Brother of Sleep) (1995)     Drama        5.00
    3517     Bells, The (1926)                             Crime|Drama  5.00
    2931     Time of the Gypsies (Dom za vesanje) (1989)   Drama        4.83
    3245     I Am Cuba (Soy Cuba/Ya Kuba) (1964)           Drama        4.75
    598      Window to Paris (1994)                        Comedy       4.67
    53       Lamerica (1994)                               Drama        4.67

res.head(10) returns the first 10 rows, which is the table shown under "Expected output".

Task summary
The merge() function joins related rows from two DataFrames, matched on columns or on the index, into single rows of a new DataFrame. Its options make it flexible enough for the needs of practical work.

Try it yourself
Using the "movies" (movie information), "users" (user information), and "ratings" (ratings) tables, find the 10 movies that women like least. The result is shown below.

    MovieID  Title                                               Genres               Rating
    3695     Toxic Avenger Part III: The Last Temptation of...   Comedy|Horror        1.0
    75       Big Bully (1996)                                    Comedy|Drama         1.0
    1439     Meet Wally Sparks (1997)                            Comedy               1.0
    2207     Jamaica Inn (1939)                                  Drama                1.0
    2256     Parasite (1982)                                     Horror|Sci-Fi        1.0
    3899     Circus (2000)                                       Comedy               1.0
    3027     Slaughterhouse 2 (1988)                             Horror               1.0
    3592     Time Masters (Les Maîtres du Temps) (1982)          Animation|Sci-Fi     1.0
    3574     Carnosaur 3: Primal Species (1996)                  Horror|Sci-Fi        1.0
    2039     Cheetah (1989)                                      Adventure|Children's 1.0

Production team
Produced by: 劉學, 重慶市九龍坡職業教育中心

Counting the Participants in Each Contest Event
Presenter: 劉學, 重慶市九龍坡職業教育中心

Problem description
After the school skills contest was launched, the teacher received a sign-up sheet from every class. How can the number of participants in each event be counted quickly? The sign-up sheet files the teacher received are shown in the figure.

Expected output
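    比賽項目 (event)          人數 (participants)
    2019C程序設計             52
    2019VF數據庫              48
    2020C程序設計             44
    2020VF數據庫              41
    三維動畫                  3
    二維動畫制作              18
    二維動畫制作(2021級)      129
    圖像處理(2021級)          168
    圖文混排                  147
    幻燈片制作                129
    表格處理                  113
    視頻剪輯(2021級)          129

Problem analysis

    Question                                                                    Answer
    How can the data from several files be read into one DataFrame?            Append them one after another
    Which column should the data be grouped by to count each event's entries?  "比賽項目"

Hint
First read the first data table in the class-sheet folder, then append the other tables after it, and finally, through gro…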
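The hint breaks off in the source, so the following is only a sketch of one way the described steps could be carried out, not the course's own program. It assumes the grouping is done with groupby() on the "比賽項目" column, stacks the sheets with pd.concat() (DataFrame.append() was removed in pandas 2.0), and uses two invented in-memory sheets with a hypothetical "姓名" (name) column in place of the real files.

```python
import pandas as pd

# Hypothetical sign-up sheets for two classes; names and rows are invented.
class_a = pd.DataFrame({"姓名": ["s1", "s2", "s3"],
                        "比賽項目": ["圖文混排", "表格處理", "圖文混排"]})
class_b = pd.DataFrame({"姓名": ["s4", "s5"],
                        "比賽項目": ["表格處理", "圖文混排"]})

# Stack the sheets vertically; ignore_index renumbers the rows from 0.
all_entries = pd.concat([class_a, class_b], ignore_index=True)

# One row per sign-up, so the size of each group is the participant count.
counts = all_entries.groupby("比賽項目").size()
print(counts)
```

With real files, each sheet would first be read into its own DataFrame (for example inside a loop over the folder) and the resulting list passed to pd.concat() in one call.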