RStudio datasets

9/13/2023

The iris dataset is a built-in dataset in R that contains measurements of 4 different attributes (in centimeters) for 50 flowers from each of 3 different species.

This tutorial explains how to explore and summarize a dataset in R, using the iris dataset as an example.

Load the Iris Dataset

Since the iris dataset is a built-in dataset in R, we can load it by using the following command:

    data(iris)

We can take a look at the first six rows of the dataset by using the head() function:

    #view first six rows of iris dataset
    head(iris)

The output shows the columns Sepal.Length, Sepal.Width, Petal.Length, Petal.Width and Species.

We can use the summary() function to quickly summarize each variable in the dataset:

    #summarize iris dataset
    summary(iris)

For each of the numeric variables (Sepal.Length, Sepal.Width, Petal.Length, Petal.Width) we can see the following information:

- Min: The minimum value.
- 1st Qu: The value of the first quartile (25th percentile).
- Median: The median value.
- Mean: The mean value.
- 3rd Qu: The value of the third quartile (75th percentile).
- Max: The maximum value.

For the only categorical variable in the dataset (Species) we see a frequency count of each value:

- setosa: This species occurs 50 times.
- versicolor: This species occurs 50 times.
- virginica: This species occurs 50 times.

We can use the dim() function to get the dimensions of the dataset in terms of number of rows and number of columns:

    #display rows and columns
    dim(iris)

We can see that the dataset has 150 rows and 5 columns.

We can also use the names() function to display the column names of the data frame:

    #display column names
    names(iris)

    [1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"

We can also create some plots to visualize the values in the dataset.

For example, we can use the hist() function to create a histogram of the values for a certain variable:

    #create histogram of values for sepal length
    hist(iris$Sepal.Length)

We can also use the plot() function to create a scatterplot of any pairwise combination of variables:

    #create scatterplot of sepal width vs. sepal length
    plot(iris$Sepal.Width, iris$Sepal.Length)

We can also use the boxplot() function to create a boxplot by group:

    #create boxplot of sepal length by species
    boxplot(Sepal.Length ~ Species, data = iris)

The x-axis displays the three species and the y-axis displays the distribution of values for sepal length for each species.

This type of plot allows us to quickly see that the sepal length tends to be largest for the virginica species and smallest for the setosa species.

Comparisons on a vector are performed element by element:

    temp <- v > 3  # if v is, for example, c(1, 2, 3, 4, 5), this will make temp equal to the logical vector c(FALSE, FALSE, FALSE, TRUE, TRUE) by performing comparisons on each element of v

There are lots of things you can do with the subset() command; see ?subset. It's easy to sort data frames and to create new variables based on existing ones.

Making a data set visible on the main search path:

    attach(my.dataset)  # we now don't need to use my.dataset$X, my.dataset$Y; we can just use X and Y directly
    # Note that to change variables in the dataset, you still need to assign to my.dataset$var
    # (otherwise a new variable called var is created that simply "overlies" the one in the dataset).
    # By the way, get used to the R convention: my.dataset is just a variable name; the dot doesn't mean anything special.
    search()            # shows the current search path (will now include my.dataset)
    detach(my.dataset)  # when we've finished with it
    # Another way, which has no residual effects, is with(); see ?with.

Other important object manipulation functions:

    ls()   # list all objects (if you know UNIX, this will be familiar)
    rm(x)  # removes object "x" (if you know UNIX, this will be familiar)

Typing stuff in; note also that filenames and URLs are often interchangeable:

    x <- scan()          # type in numbers, separated by spaces or newlines; hit Enter twice to finish
    x <- scan(filename)  # do the same but reading from a file on disk
    x <- scan("...")     # the same, but from a URL (live)

Editing a variable, matrix, or data frame:

    y <- edit(x)
    fix(x)  # equivalent to x <- edit(x)

In the R Commander, you can click the Data set button to select a data set, and then click the Edit data set button. For more advanced data manipulation in R Commander, explore the Data menu, particularly the Data / Active data set and Data / Manage variables in active data set menus.

Often, you need to transform data between "wide" format (e.g. one row per subject; multiple observations/columns per subject) and "long" format (one observation per row). There are several methods; reshape() is powerful. But let's glance at long-to-wide transformation:

    Indometh  # one of the built-in R datasets

We want to group the observations together by some variable that identifies an individual (group of observations). The reshape() arguments express this as follows:

- Keep SUBJECT as the identifying variable, one per row (idvar).
- Columns are labelled with TIME (timevar).
- VALUES of CONC are spread "wide" (v.names).
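The long-to-wide transformation of Indometh described in these notes can be sketched with base R's reshape(). The original call is not preserved in the text, so this is one plausible version under stated assumptions: the object names long and wide are illustrative, and the as.data.frame() step is a precaution that reduces Indometh to a plain data frame before reshaping.

```r
# Indometh ships with R and has three columns: Subject, time, conc.
# as.data.frame() drops its extra groupedData classes (a precaution, not a requirement of the notes).
long <- as.data.frame(Indometh)

wide <- reshape(
  long,
  idvar     = "Subject",  # identifying variable: one row per subject
  timevar   = "time",     # each distinct time becomes its own column
  v.names   = "conc",     # the values that are spread "wide"
  direction = "wide"
)

dim(wide)    # one row per subject; one column for Subject plus one per time point
names(wide)  # "Subject" followed by one conc.<time> column per time point
```

Going the other way (wide to long) uses the same function with direction = "long"; see ?reshape for the arguments it expects in that case.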

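The notes on attach() mention another way of reaching a data set's variables that has no residual effects, but the example itself is not preserved. One base-R option that fits that description is with(); the sketch below assumes that is what was meant, and uses a small made-up data frame in place of my.dataset.

```r
# A small hypothetical data frame standing in for my.dataset
my.dataset <- data.frame(X = c(1, 2, 3, 4), Y = c(2, 4, 6, 8))

path_before <- search()

# with() evaluates the expression using the data frame's columns as variables
mean_ratio <- with(my.dataset, mean(Y / X))

# The search path is unchanged, so there is nothing to detach() afterwards
path_unchanged <- identical(path_before, search())

# Changing the data set still needs an explicit assignment to my.dataset$...
my.dataset$Z <- with(my.dataset, X + Y)
```

Because nothing is added to the search path, there is no risk of a forgotten detach() or of a stray variable overlying a column of the same name.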