Level 4, Assignment 1: Translating an Article by a Master

01-29

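This round's assignment is to translate an article related to the great Hadley Wickham. I chose the page about the dplyr package from his personal website. My skills are limited and the result is pretty clumsy, but I know my future self won't be clumsy. Let's all live in the future together!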

dplyr

Overview

dplyr is a grammar of data manipulation, providing a consistent set of verbs that help you solve the most common data manipulation challenges:

mutate() adds new variables that are functions of existing variables
select() picks variables based on their names.
filter() picks cases based on their values.
summarise() reduces multiple values down to a single summary.
arrange() changes the ordering of the rows.
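As a minimal sketch of how the five verbs chain together with the pipe, here they are applied to a tiny made-up data frame. The `scores` data, its columns and the pass threshold of 160 are hypothetical, invented only for this illustration:

```r
library(dplyr)

# Hypothetical data, made up purely for illustration
scores <- data.frame(
  student = c("Ann", "Bob", "Cid", "Dee"),
  class   = c("A", "A", "B", "B"),
  math    = c(90, 72, 85, 60),
  reading = c(80, 88, 70, 95),
  stringsAsFactors = FALSE
)

passed <- scores %>%
  mutate(total = math + reading) %>%  # new variable computed from existing ones
  select(student, class, total) %>%   # keep columns by name
  filter(total >= 160) %>%            # keep rows by value (hypothetical threshold)
  arrange(desc(total))                # sort rows, largest total first
print(passed)

overall <- scores %>%
  summarise(mean_math = mean(math))   # collapse all rows into one summary row
print(overall)
```

Each verb takes a data frame as its first argument and returns a new data frame, which is what lets `%>%` feed the result of one verb straight into the next.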

These all combine naturally with group_by() which allows you to perform any operation "by group". You can learn more about them in vignette("dplyr"). As well as these single-table verbs, dplyr also provides a variety of two-table verbs, which you can learn about in vignette("two-table").
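A short sketch of both ideas, again on hypothetical data (`scores`, `teachers` and all column names are invented for illustration). `group_by()` makes the following `mutate()` compute `mean(math)` separately within each class, and `left_join()`, one of dplyr's two-table verbs, looks up each student's teacher by matching on the shared `class` column:

```r
library(dplyr)

# Hypothetical tables, invented for illustration
scores <- data.frame(
  student = c("Ann", "Bob", "Cid", "Dee"),
  class   = c("A", "A", "B", "B"),
  math    = c(90, 72, 85, 60),
  stringsAsFactors = FALSE
)
teachers <- data.frame(
  class   = c("A", "B"),
  teacher = c("Ms. Lee", "Mr. Kim"),
  stringsAsFactors = FALSE
)

# Grouped mutate: class_mean is the mean of math within each class
by_class <- scores %>%
  group_by(class) %>%
  mutate(class_mean = mean(math), diff = math - class_mean) %>%
  ungroup()

# Two-table verb: keep every row of by_class and add the matching teacher
with_teacher <- left_join(by_class, teachers, by = "class")
print(with_teacher)
```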

dplyr is designed to abstract over how the data is stored. That means as well as working with local data frames, you can also work with remote database tables, using exactly the same R code. Install the dbplyr package then read vignette("databases", package = "dbplyr").
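A minimal sketch of that idea, assuming the dbplyr, DBI and RSQLite packages are installed. An in-memory SQLite database stands in for a real remote server here, and the built-in `mtcars` data is copied into it only so there is something to query:

```r
library(dplyr)
library(dbplyr)  # assumed installed; supplies the database backend for dplyr verbs

# Assumption: RSQLite is installed; ":memory:" creates a throwaway database
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
copy_to(con, mtcars, "mtcars")      # upload a local data frame as a table

remote <- tbl(con, "mtcars") %>%    # lazy reference to the table; no rows pulled yet
  filter(cyl == 4) %>%              # the same verbs used on local data frames
  select(mpg, cyl, wt)

show_query(remote)                  # prints the SQL that dplyr generated
local_result <- collect(remote)     # runs the query and brings the rows into R
DBI::dbDisconnect(con)
```

Until `collect()` is called, `remote` only describes a query; the filtering and column selection are carried out by the database, not in R.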

If you are new to dplyr, the best place to start is the data import chapter in R for data science.

Installation

# The easiest way to get dplyr is to install the whole tidyverse:
install.packages("tidyverse")

# Alternatively, install just dplyr:
install.packages("dplyr")

# Or the development version from GitHub:
# install.packages("devtools")
devtools::install_github("tidyverse/dplyr")

If you encounter a clear bug, please file a minimal reproducible example on github. For questions and other discussion, please use the manipulatr mailing list.

Usage

library(dplyr)

starwars %>% 
  filter(species == "Droid")
#> # A tibble: 5 x 13
#>    name height  mass hair_color  skin_color eye_color birth_year gender
#>   <chr>  <int> <dbl>      <chr>       <chr>     <chr>      <dbl>  <chr>
#> 1 C-3PO    167    75       <NA>        gold    yellow        112   <NA>
#> 2 R2-D2     96    32       <NA> white, blue       red         33   <NA>
#> 3 R5-D4     97    32       <NA>  white, red       red         NA   <NA>
#> 4 IG-88    200   140       none       metal       red         15   none
#> 5   BB8     NA    NA       none        none     black         NA   none
#> # ... with 5 more variables: homeworld <chr>, species <chr>, films <list>,
#> #   vehicles <list>, starships <list>

starwars %>% 
  select(name, ends_with("color"))
#> # A tibble: 87 x 4
#>             name hair_color  skin_color eye_color
#>            <chr>      <chr>       <chr>     <chr>
#> 1 Luke Skywalker      blond        fair      blue
#> 2          C-3PO       <NA>        gold    yellow
#> 3          R2-D2       <NA> white, blue       red
#> 4    Darth Vader       none       white    yellow
#> 5    Leia Organa      brown       light     brown
#> # ... with 82 more rows

starwars %>% 
  mutate(name, bmi = mass / ((height / 100) ^ 2)) %>%
  select(name:mass, bmi)
#> # A tibble: 87 x 4
#>             name height  mass      bmi
#>            <chr>  <int> <dbl>    <dbl>
#> 1 Luke Skywalker    172    77 26.02758
#> 2          C-3PO    167    75 26.89232
#> 3          R2-D2     96    32 34.72222
#> 4    Darth Vader    202   136 33.33007
#> 5    Leia Organa    150    49 21.77778
#> # ... with 82 more rows

starwars %>% 
  arrange(desc(mass))
#> # A tibble: 87 x 13
#>                    name height  mass hair_color       skin_color
#>                   <chr>  <int> <dbl>      <chr>            <chr>
#> 1 Jabba Desilijic Tiure    175  1358       <NA> green-tan, brown
#> 2              Grievous    216   159       none     brown, white
#> 3                 IG-88    200   140       none            metal
#> 4           Darth Vader    202   136       none            white
#> 5               Tarfful    234   136      brown            brown
#> # ... with 82 more rows, and 8 more variables: eye_color <chr>,
#> #   birth_year <dbl>, gender <chr>, homeworld <chr>, species <chr>,
#> #   films <list>, vehicles <list>, starships <list>

starwars %>%
  group_by(species) %>%
  summarise(
    n = n(),
    mass = mean(mass, na.rm = TRUE)
  ) %>%
  filter(n > 1)
#> # A tibble: 9 x 3
#>    species     n     mass
#>      <chr> <int>    <dbl>
#> 1    Droid     5 69.75000
#> 2   Gungan     3 74.00000
#> 3    Human    35 82.78182
#> 4 Kaminoan     2 88.00000
#> 5 Mirialan     2 53.10000
#> # ... with 4 more rows

Overview

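The dplyr package is a grammar of data manipulation, providing a consistent set of verbs that help you solve the most common data manipulation challenges: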

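·mutate() adds new variables that are functions of existing variables
·select() picks variables based on their names
·filter() picks cases (rows) based on their values
·summarise() reduces multiple values down to a single summary
·arrange() changes the ordering of the rows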

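These all combine naturally with group_by(), which allows you to perform any operation "by group". You can learn more about them in vignette("dplyr"). As well as these single-table verbs, dplyr also provides a variety of two-table verbs, which you can learn about in vignette("two-table").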

If you are new to dplyr, the best place to start is the data import chapter in R for Data Science.

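# The easiest way to get dplyr is to install the whole tidyverse:
install.packages("tidyverse")

# Alternatively, install just dplyr:
install.packages("dplyr")

# Or the development version from GitHub:
# install.packages("devtools")
devtools::install_github("tidyverse/dplyr")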

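If you encounter a clear bug, please file a minimal reproducible example on GitHub. For questions and other discussion, please use the manipulatr mailing list.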

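library(dplyr)

starwars %>% 
  filter(species == "Droid")
#> # A tibble: 5 x 13
#>    name height  mass hair_color  skin_color eye_color birth_year gender
#>   <chr>  <int> <dbl>      <chr>       <chr>     <chr>      <dbl>  <chr>
#> 1 C-3PO    167    75       <NA>        gold    yellow        112   <NA>
#> 2 R2-D2     96    32       <NA> white, blue       red         33   <NA>
#> 3 R5-D4     97    32       <NA>  white, red       red         NA   <NA>
#> 4 IG-88    200   140       none       metal       red         15   none
#> 5   BB8     NA    NA       none        none     black         NA   none
#> # ... with 5 more variables: homeworld <chr>, species <chr>, films <list>,
#> #   vehicles <list>, starships <list>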

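starwars %>% 
  select(name, ends_with("color"))
#> # A tibble: 87 x 4
#>             name hair_color  skin_color eye_color
#>            <chr>      <chr>       <chr>     <chr>
#> 1 Luke Skywalker      blond        fair      blue
#> 2          C-3PO       <NA>        gold    yellow
#> 3          R2-D2       <NA> white, blue       red
#> 4    Darth Vader       none       white    yellow
#> 5    Leia Organa      brown       light     brown
#> # ... with 82 more rows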

starwars %>% 
  mutate(name, bmi = mass / ((height / 100) ^ 2)) %>%
  select(name:mass, bmi)
#> # A tibble: 87 x 4
#>             name height  mass      bmi
#>            <chr>  <int> <dbl>    <dbl>
#> 1 Luke Skywalker    172    77 26.02758
#> 2          C-3PO    167    75 26.89232
#> 3          R2-D2     96    32 34.72222
#> 4    Darth Vader    202   136 33.33007
#> 5    Leia Organa    150    49 21.77778
#> # ... with 82 more rows

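starwars %>% 
  arrange(desc(mass))
#> # A tibble: 87 x 13
#>                    name height  mass hair_color       skin_color
#>                   <chr>  <int> <dbl>      <chr>            <chr>
#> 1 Jabba Desilijic Tiure    175  1358       <NA> green-tan, brown
#> 2              Grievous    216   159       none     brown, white
#> 3                 IG-88    200   140       none            metal
#> 4           Darth Vader    202   136       none            white
#> 5               Tarfful    234   136      brown            brown
#> # ... with 82 more rows, and 8 more variables: eye_color <chr>,
#> #   birth_year <dbl>, gender <chr>, homeworld <chr>, species <chr>,
#> #   films <list>, vehicles <list>, starships <list>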


starwars %>%
  group_by(species) %>%
  summarise(
    n = n(),
    mass = mean(mass, na.rm = TRUE)
  ) %>%
  filter(n > 1)
#> # A tibble: 9 x 3
#>    species     n     mass
#>      <chr> <int>    <dbl>
#> 1    Droid     5 69.75000
#> 2   Gungan     3 74.00000
#> 3    Human    35 82.78182
#> 4 Kaminoan     2 88.00000
#> 5 Mirialan     2 53.10000
#> # ... with 4 more rows

That's the end of the translation. I know it's pretty rough, but for me right now, finishing it matters more.