Posts

Building an R Package: Jviz

For this project, I developed the skeleton of an R package, which I named Jviz. The intent is for this package to help beginner R users easily summarize, explore, and visualize data. I came up with this idea because writing long and complex code just to understand a dataset can be complicated. The goal of the package is to make basic data analysis and visualization easy, efficient, and beginner-friendly. This package is intended for students, new analysts, and anyone who wants to start learning data visualization with R. There will be functions like quick_summary(), which is meant to easily summarize a dataset; plot_distribution(), which graphs the distribution of a variable; plot_relationship(), for comparing two variables; and group_avg(), for calculating averages by group. In the DESCRIPTION file, I added the fields stating the package's name, version, and author. I added ggplot2 and dplyr in the Imports field because they support data visualization and manipulat...
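As a rough illustration of the kind of helper described above, a group_avg()-style function could be sketched in base R as follows. This is a hypothetical sketch and not the actual Jviz code: the argument names (data, value_col, group_col) and the "avg_" column prefix are assumptions made for the example.

```r
# Hypothetical sketch of a group_avg()-style helper (not the actual Jviz code).
# The argument names and output column naming are assumptions for illustration.
group_avg <- function(data, value_col, group_col) {
  stopifnot(is.data.frame(data),
            value_col %in% names(data),
            group_col %in% names(data))
  # aggregate() applies mean() to value_col within each level of group_col
  result <- aggregate(data[[value_col]],
                      by = list(data[[group_col]]),
                      FUN = mean, na.rm = TRUE)
  names(result) <- c(group_col, paste0("avg_", value_col))
  result
}

# Example with the built-in mtcars data: average mpg by number of cylinders
avg_mpg <- group_avg(mtcars, "mpg", "cyl")
print(avg_mpg)
```

Using only base R keeps the sketch free of dependencies; a version built on dplyr would be equally reasonable given the package's Imports.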

Module 9 Assignment

For this assignment, I chose the Guns dataset from the given list, and I used it to compare different variables in R using base R graphics, lattice, and ggplot2. The dataset allowed me to compare variables such as income, violent crime rates, and other categories. For the first comparison, I plotted income against violent crimes, because both are continuous variables that don't overlap. I also used gun law categories to compare distributions across different groups. These visualizations help show patterns within the data, as well as how different visualization tools show these patterns in different ways. Base R Visualization: This scatter plot shows the relationship between income and violent crime. It helps visualize whether higher income levels are associated with lower or higher crime rates. This histogram shows the distribution of violent crimes in the dataset. It helps us identify how crime rates are spread and whether there are any extreme values....

Module #8 Assignment

For this week's assignment, I worked with the given dataset containing four variables for a set of students. The dataset included both males and females, and it also included ages, grades, and names. The assignment tasked me with importing the file into R, calculating the mean grade by sex, filtering the dataset for names containing the letter "i", and exporting the results to a CSV file. The first step was importing the dataset into R using read.table(). After importing the data, I used the ddply() function from the plyr package, which we were tasked with installing for this assignment, to group the dataset by sex and calculate the average of the grade column. This summarized the comparison between male and female students instead of having to look through them individually. After generating the mean, we get the following output: Sex Grade_Average 1 Female 86.9375 2 Male 80.2500 After that I converted the dataset into a data frame and filtered u...
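The same steps can be sketched in base R. This is a minimal sketch under stated assumptions, not the code from the post: the students data frame below is invented stand-in data, aggregate() is swapped in for plyr's ddply(), the name match is assumed to be case-insensitive, and the output goes to a temporary file.

```r
# Hypothetical stand-in for the assignment's dataset (all values are invented).
students <- data.frame(
  Name  = c("Raul", "Lina", "Tom", "Maria", "Kevin", "Ann"),
  Age   = c(21, 22, 20, 23, 21, 22),
  Sex   = c("Male", "Female", "Male", "Female", "Male", "Female"),
  Grade = c(80, 90, 70, 95, 85, 88),
  stringsAsFactors = FALSE
)

# Mean grade by sex, using base R's aggregate() instead of plyr::ddply()
grade_by_sex <- aggregate(Grade ~ Sex, data = students, FUN = mean)
names(grade_by_sex)[2] <- "Grade_Average"
print(grade_by_sex)

# Keep only rows whose name contains the letter "i"
# (case-insensitive matching is an assumption of this sketch)
with_i <- students[grepl("i", students$Name, ignore.case = TRUE), ]
print(with_i)

# Write the filtered rows to a CSV file (a temporary file in this sketch)
out_file <- tempfile(fileext = ".csv")
write.csv(with_i, out_file, row.names = FALSE)
```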

R Object: S3 vs. S4

For this week's assignment, I used a dataset I am familiar with, mtcars. This dataset contains information about 32 cars. After loading the dataset using the data() function, I tested whether generic functions and OO systems applied to it. Checking the class with class(mtcars) returned "data.frame", which R implements using the S3 object system, making mtcars an S3 object. Since it's an S3 object, generic functions such as summary() and print() can be applied to it. The main difference between S3 and S4 is structure and formality. S3 is more informal and flexible; it allows objects to be assigned a class without a strict definition. S4, on the other hand, is more formal: it requires explicit class definitions using the setClass() function and object creation using new(). Overall, we can see that mtcars is an S3 object, as shown by the generic functions that can be applied to it; S4 would ...
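The contrast between the two systems can be sketched in a few lines. The class names car_s3 and CarS4 and their fields are hypothetical, chosen only for this illustration.

```r
library(methods)  # provides setClass(), new(), and isS4()

# mtcars is a data.frame, which is an S3 class
data(mtcars)
class(mtcars)   # "data.frame"
isS4(mtcars)    # FALSE

# S3: a class is just an attribute, and a method is a function named generic.class
# (car_s3 is a hypothetical class made up for this sketch)
car_s3 <- structure(list(name = "Mazda RX4", mpg = 21), class = "car_s3")
print.car_s3 <- function(x, ...) {
  cat("S3 car:", x$name, "with", x$mpg, "mpg\n")
  invisible(x)
}
print(car_s3)   # dispatches to print.car_s3()

# S4: the class must be formally defined with setClass() before new() can use it
# (CarS4 is likewise hypothetical)
setClass("CarS4", slots = c(name = "character", mpg = "numeric"))
car_s4 <- new("CarS4", name = "Mazda RX4", mpg = 21)
isS4(car_s4)    # TRUE
car_s4@mpg      # slots are accessed with @ rather than $
```

Passing a value of the wrong type, such as mpg = "fast", to new() would raise an error, which is the kind of strictness S3 does not enforce.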

Module 6: Doing Math Part 2

In this module, we are tasked with performing math with different matrices.

In step 1, we are tasked with making the following two matrices and computing A+B and A-B.

    A = matrix(c(2,0,1,3), ncol=2)
    B = matrix(c(5,2,4,-1), ncol=2)
    > A+B
         [,1] [,2]
    [1,]    7    5
    [2,]    2    2
    > A-B
         [,1] [,2]
    [1,]   -3   -3
    [2,]   -2    4

Running the code gives the outputs shown for A+B and A-B.

For step 2, we are tasked with building a matrix of size four with the values 4, 1, 2, 3 on the diagonal using the diag() function. We can use the following code to create this matrix.

    > diag(c(4,1,2,3))
         [,1] [,2] [,3] [,4]
    [1,]    4    0    0    0
    [2,]    0    1    0    0
    [3,]    0    0    2    0
    [4,]    0    0    0    3

After running the diag() function with the given numbers, we do get a matrix with 4, 1, 2, 3 on the diagonal, as the instructions require. In step 3, we are tasked with generating a matrix of 5 with a diago...
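One detail worth spelling out for steps 1 and 2 is that matrix() fills its values column by column, which is why c(2,0,1,3) puts 2 and 0 in the first column of A. A short sketch that rebuilds the matrices and checks the results (the variable names sum_AB, diff_AB, and D are introduced here only for illustration):

```r
A <- matrix(c(2, 0, 1, 3), ncol = 2)
B <- matrix(c(5, 2, 4, -1), ncol = 2)

# matrix() fills column by column, so the first column of A is c(2, 0)
A[, 1]

# Addition and subtraction are element-wise and need matching dimensions
stopifnot(identical(dim(A), dim(B)))
sum_AB  <- A + B
diff_AB <- A - B
print(sum_AB)
print(diff_AB)

# diag() with a vector builds a square matrix with that vector on the diagonal
D <- diag(c(4, 1, 2, 3))
dim(D)    # 4 x 4
diag(D)   # diag() on a matrix extracts the diagonal again
```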

Module 5: Doing Math

For this assignment, we are tasked with finding the inverse of a matrix using the given values. As the hint stated, we can start with the following for step 1:

Step 1: In this step, we create the matrices.

    A <- matrix(1:100, nrow=10)
    B <- matrix(1:1000, nrow=10)

Step 2: After creating the matrices, we can use the dim() function to check their shape.

    > dim(A)
    [1] 10 10
    > dim(B)
    [1]  10 100

After running this function, we can see that matrix A is a square 10x10 matrix and matrix B is not, because it is 10x100.

Step 3: Next, we can use det() to find the determinant of matrix A.

    > det(A)
    [1] 0

The result is 0, which means the matrix is singular and has no inverse.

Step 4: To double check, I tried finding the inverse using the solve() function.

    > solve(A)
    Error in solve.default(A) : Lapack routine dgesv: system is exactly singular: U[6,6] = 0

Step 5: I also attempted to double check matrix B even though it is not ...
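The reason A is singular is that its columns are linearly dependent: each column of matrix(1:100, nrow=10) is the first column plus a constant. A minimal sketch that checks this with the matrix rank and catches the solve() error instead of stopping the script (the helper is_square() and the names rank_A and inv_A are introduced here for illustration):

```r
A <- matrix(1:100, nrow = 10)
B <- matrix(1:1000, nrow = 10)

# Only a square matrix can have an inverse (is_square is a hypothetical helper)
is_square <- function(m) nrow(m) == ncol(m)
is_square(A)   # TRUE
is_square(B)   # FALSE

# The rank from a QR decomposition counts the linearly independent columns;
# an invertible 10x10 matrix would need rank 10
rank_A <- qr(A)$rank
rank_A

# Wrap solve() in tryCatch() so the error is reported without halting the script
inv_A <- tryCatch(
  solve(A),
  error = function(e) {
    message("solve(A) failed: ", conditionMessage(e))
    NULL
  }
)
is.null(inv_A)   # TRUE when no inverse could be computed
```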

Module 4: Programming Structure

The first image is a boxplot that compares the patients' blood pressures based on the assessment made by the first doctor, which was categorized as good or bad. The plot shows that patients who were rated as bad generally have more variable blood pressure than those who were rated as good. The histogram in the second image displays the overall distribution of the blood pressure values across all the assessed patients. Most of the patients fall within a safe range of blood pressure, while some have very high readings. Patients with these higher readings may need closer medical care. Together, the boxplot and the histogram provide a good summary of both the different assessments by the first doctor and the overall pattern in the blood pressure measurements. The R code used for this assignment and to generate these visualizations has been posted on the GitHub readme, and the link will be attached. https://github.com/jonathangonzalezz/r-...