Normal University Student: My Biostatistics Research at Harvard
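Student Z is a senior majoring in statistics at a key normal university in mainland China. She hopes to enter a top US university to study statistics or business analytics. To that end, she joined a research program at a top US university to strengthen her academic background, broaden her horizons, and gain real knowledge. With guidance from a research mentor at a top US school, she secured a research opportunity in biostatistics at Harvard University.

Because interdisciplinary study is so common, biostatistics master's and PhD programs admit many students with backgrounds in mathematics, statistics, and computer science every year.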
Summer Research Statement Report
I am honored to have had the opportunity to take part in the summer research program at Harvard Medical School, and I would like to report on my research project. In recent years machine learning has become very popular within artificial intelligence, and it is also a new tool for improving predictive performance. Because machine learning is closely related to my major, I decided before coming to the United States to learn some machine learning algorithms as quickly as possible and apply them to a medical problem, even though I did not yet know what my own research topic would be. Before leaving, I also thought about the difficulties the project might involve. The first is data acquisition and preprocessing: data is one of the key factors in successful machine learning. As machine learning professor N, a member of an Amazon AI team, once said, no matter how good an algorithm is, the best way to drive progress in machine learning is to obtain large amounts of data. The second is improving the algorithm. Both expectations proved correct during my month of research.
After the first meeting, I understood that the project gave me a great deal of freedom: from choosing the topic and forming hypotheses to collecting data and solving the problem, every step was up to me, and my mentor's rich background could help me at every stage. My mentor even said I could choose a topic from finance if I wanted. Perhaps that was a little joke, ha. Because I had no medical background, I spent some time early in the project learning about tumors and genes so that I could choose a topic well. In the end I settled on tumor gene identification, carried out mainly on the MATLAB platform.
After settling on the research content, I analyzed the existing data. These gene data have few samples but very high dimensionality, and the samples are already labeled. Given these characteristics, and based on a literature review, I chose a support vector machine (SVM) model for training and classification. (A note on the data: it was originally sent to me by another student, but there were some problems with it at the beginning, so I added a second data set. All data in the project come from the TCGA database.)

I then hit the first difficulty: data preprocessing. Data quality affects how well the SVM classifies later on, so I spent a lot of time on this stage, which had three steps. First, I normalized the data so that all genes are on the same scale, which removes differences between them as far as possible. Second, I removed irrelevant and redundant genes, so that the remaining genes are ones that are mutated or mutating and not duplicates of one another. To remove irrelevant genes I used an information-index method, which accounts well for the effect of variance on the classification result; it is based on the common signal-to-noise ratio method. To remove redundant genes I used correlation-coefficient redundancy elimination, which decides whether to discard a gene based on its similarity to the other genes; the final classification results show that this feature extraction is very effective. Third, I applied principal component analysis to the genes. After these three steps only 134 genes remained, which greatly reduced the dimensionality and gave the expected result.
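The report does not include code, and the exact "information index" it used is not specified. As a rough illustration of the first two preprocessing steps only, here is a minimal pure-Python sketch (the original work was done in MATLAB). It assumes z-score normalization, the classic two-class signal-to-noise score |mean1 - mean0| / (sd1 + sd0) for ranking genes, and a hypothetical correlation cut-off `max_corr = 0.9` for discarding redundant genes. PCA is left out, and all function names are illustrative.

```python
import math
import statistics

def zscore_genes(X):
    """Rescale each gene (column of X) to mean 0 and unit sample standard deviation."""
    stats = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        stats.append((statistics.mean(col), statistics.stdev(col)))
    # A constant gene carries no information, so it is mapped to all zeros.
    return [[(row[j] - m) / s if s > 0 else 0.0 for j, (m, s) in enumerate(stats)]
            for row in X]

def snr_scores(X, y):
    """Signal-to-noise score |mean1 - mean0| / (sd1 + sd0) for each gene.

    y holds 0/1 class labels; each class needs at least two samples.
    """
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        spread = statistics.stdev(pos) + statistics.stdev(neg)
        gap = abs(statistics.mean(pos) - statistics.mean(neg))
        if spread > 0:
            scores.append(gap / spread)
        else:
            # Zero within-class spread: perfectly separating if the means differ.
            scores.append(math.inf if gap > 0 else 0.0)
    return scores

def pearson(u, v):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mu, mv = statistics.mean(u), statistics.mean(v)
    du = [a - mu for a in u]
    dv = [b - mv for b in v]
    den = math.sqrt(sum(a * a for a in du) * sum(b * b for b in dv))
    return sum(a * b for a, b in zip(du, dv)) / den if den > 0 else 0.0

def select_genes(X, y, top_k, max_corr=0.9):
    """Keep the top_k genes by SNR, skipping any gene too correlated with one already kept."""
    scores = snr_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    columns = [[row[j] for row in X] for j in range(len(X[0]))]
    kept = []
    for j in ranked[:top_k]:
        if all(abs(pearson(columns[j], columns[k])) < max_corr for k in kept):
            kept.append(j)
    return kept

# Toy example: genes 2 and 3 track gene 0 closely, so they are dropped as redundant.
X = [[1.0, 5.0, 2.0, 1.1],
     [1.2, 5.1, 2.4, 1.4],
     [3.0, 5.0, 5.0, 3.1],
     [3.2, 4.9, 5.4, 3.3]]
y = [0, 0, 1, 1]
print(select_genes(zscore_genes(X), y, top_k=4))  # prints [0, 1]
```

In this sketch genes are scored on whatever matrix is passed in; computing the scores on the training split only would keep test-set labels out of the gene selection.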
After preprocessing, the data set was randomly split into a training set and a test set. I first trained the SVM model on the training set to determine the sample types, and the training accuracy reached 98.8889%, a good result. The model was then used for prediction, and its classification accuracy on the test set was 99.2063%. With that, the project was complete, and the classification performance was as good as I had hoped.
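The report does not say which SVM implementation, kernel, or parameters were used in MATLAB. To illustrate the workflow of a random train/test split, SVM training, and accuracy measurement, here is a minimal pure-Python sketch that swaps in a linear SVM trained with the Pegasos algorithm (stochastic sub-gradient descent on the L2-regularized hinge loss). The regularization strength `lam`, the number of `epochs`, and `test_frac = 0.3` are hypothetical choices, and the helper names are illustrative.

```python
import random

def train_test_split(X, y, test_frac=0.3, seed=0):
    """Shuffle sample indices and hold out a fraction of them as the test set."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = round(len(X) * test_frac)
    test, train = idx[:n_test], idx[n_test:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])

def augment(x):
    """Append a constant 1.0 so the bias is learned as an extra weight."""
    return list(x) + [1.0]

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos: stochastic sub-gradient descent on the L2-regularized hinge loss.

    Labels must be -1 or +1. Returns a weight vector whose last entry is the bias.
    """
    rng = random.Random(seed)
    data = [augment(x) for x in X]
    w = [0.0] * len(data[0])
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(data)), len(data)):  # one random pass
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, data[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularization shrink
            if margin < 1.0:  # hinge loss is active: move toward this sample
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, data[i])]
    return w

def predict(w, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, augment(x))) >= 0 else -1

def accuracy(w, X, y):
    return sum(predict(w, x) == label for x, label in zip(X, y)) / len(y)

# Toy, linearly separable data: two mirrored clusters.
pos = [[2.0, 2.0], [3.0, 1.5], [2.5, 3.0], [1.5, 2.5], [3.0, 3.0]]
neg = [[-a, -b] for a, b in pos]
X, y = pos + neg, [1] * 5 + [-1] * 5
w = train_linear_svm(X, y)
print(f"{accuracy(w, X, y):.4%}")  # prints 100.0000%
```

Reporting accuracy with four decimal places mirrors the report's figures; on small test sets each misclassified sample moves the percentage by a large step, so the split and seed matter.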
In fact, before settling on the SVM algorithm I also tried the lasso algorithm and a neural network. Like principal component analysis, the lasso reduces dimensionality, and both work well for feature selection. The BP (backpropagation) neural network is a commonly used prediction algorithm, but after lasso processing alone the data were still noisy and high-dimensional, and the BP network did not reach the desired accuracy in training. In the end I found that the SVM suits the characteristics of gene data, and that carrying out extensive, effective preprocessing before applying the classifier leads to better results.
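The report does not show how the lasso was run. As an illustration of why the lasso can serve as a feature selector, here is a minimal pure-Python sketch of the lasso fitted by cyclic coordinate descent with soft-thresholding: any coefficient whose correlation with the partial residual falls below the penalty `alpha` is set exactly to zero, and that feature drops out. The penalty value, sweep count, and function names are hypothetical, and the sketch assumes centered features and response.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma, returning exactly 0.0 inside [-gamma, gamma]."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def lasso_cd(X, y, alpha, n_sweeps=100):
    """Minimize (1/(2n)) * ||y - X b||^2 + alpha * ||b||_1 by cyclic coordinate descent.

    Assumes the columns of X and the response y are centered, so no intercept is fitted.
    """
    n, d = len(X), len(X[0])
    b = [0.0] * d
    resid = list(y)  # residual y - X b, starting from b = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(d)]
    for _ in range(n_sweeps):
        for j in range(d):
            if col_sq[j] == 0:
                continue  # all-zero column: coefficient stays 0
            # Correlation of feature j with the partial residual that excludes feature j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * b[j]) for i in range(n)) / n
            new_bj = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new_bj
    return b

def selected_features(b):
    """Indices of features with a nonzero lasso coefficient."""
    return [j for j, coef in enumerate(b) if coef != 0.0]

# Toy data: y depends only on feature 0, so feature 1 is zeroed out.
X = [[1.0, 1.0], [-1.0, 1.0], [2.0, -1.0], [-2.0, -1.0]]
y = [3.0, -3.0, 6.0, -6.0]
b = lasso_cd(X, y, alpha=0.5)
print(selected_features(b))  # prints [0]
```

Larger `alpha` values zero out more coefficients, so the penalty acts as the knob that controls how many genes survive.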
In this project, the main improvements to the SVM approach were concentrated in data processing: feature selection and extraction worked well, and classifier training then also gave good results. Two aspects need further study. First, although the classification results are good, the process is very time-consuming, especially the gene elimination step, which takes up to an hour; I hope to speed it up in the future. Second, the data used here are publicly available. If the model were applied in hospitals, with real, more complex, and much larger databases, it is an open question whether this processing method would still give satisfactory results. Research on SVM analysis of gene expression data therefore still has much work ahead.
Teaching differs a lot between China and the United States, even at university level. I had taken part in some projects with my teachers in China, but there I mostly followed the teacher's lead step by step. This project, by contrast, gave me a great deal of freedom, and because I wanted to apply many algorithms, I sometimes lost my sense of direction. I overturned my earlier ideas many times, always looking for a new and more suitable approach, and this caused some problems with time management. Also, when talking with my dear mentor, I sometimes felt that my ideas had not yet taken shape clearly enough to discuss, which is something I need to change in my future studies. Boston is a very attractive city where science and technology have become a pillar industry, the direction in which many cities now hope to develop. In my free time I always met interesting people and came across interesting things in the library or on campus, which makes me look forward to my future studies. Beyond the knowledge I gained, my spoken English also improved, thanks not only to my mentor but also to my host family. Finally, the project was completed with the help of my dear mentor L, L, teacher Z, and teacher L. Thank you all very much!
Appendix: the student's weekly summaries during the research period
Week 3
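Continuing from my previous weekly report and focusing on statistical methods, I decided to use a machine learning algorithm to classify and predict the ALK gene in lung cancer. The first task is feature extraction, which is usually done with methods such as principal component analysis. I had once read a paper on building a fiscal evaluation system that used the lasso algorithm to group features well, so I decided to use it to extract features better suited to analysis. After extraction, the feature set was much smaller than before. For the subsequent classification I chose a neural network, an algorithm with very broad applicability, but the classification output did not meet expectations: prediction accuracy did not improve much. Possible reasons are, first, that the data set is too one-sided and may be too small, and second, that I chose the wrong algorithm. Next week I will try a different machine learning method for classification, hoping to improve prediction accuracy.
Week 4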
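Because the lasso plus neural network model from last week did not improve prediction accuracy much, this week I switched to a different machine learning method, the SVM, for classification. Since I had not studied this model before, I first spent some time learning it. I now have preliminary classification results, but I am still not entirely clear about how to set and use some of the parameters, so I have asked for a little more time to obtain more accurate results. In experiments, not getting the ideal classification result is itself a normal outcome; if several methods give the same result, that indicates the result genuinely reflects this data set. It is still a bit early to draw that conclusion, though, and I need some more time to check whether anything has gone wrong.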