Last modified: Sat Mar 23 2019 14:48:53 GMT+0800 (Malay Peninsula Standard Time)
Chapter 8. Data Analysis
The example in this chapter requires above average database knowledge and will not provide a step by step process to achieve our final goal. You may have to create your own view and controller components and perform modification for the routes. Reading the documentation and understand what each method perform is required and crucial. If you are new to Ruby on Rails, do not attempt this example.
Data is always free and available anywhere. We can use the data available in the internet to turn it into a more readable form that can provide us useful information. In this chapter, daru
gem will be used and discussed. daru
is a library for storage, analysis, manipulation and visualization of data in Ruby. The complete documentation for daru
can be found here.
In this chapter, we will use an open source data that shows each California State University employee’s salary in the year. All the code discussed in this chapter can be placed in the same controller. I chose to place it in my pages_controller.rb
.
Do note that this example has already been presented as a class project by the author. Do not duplicate this project and submit as your class project without seeking the permission of the author. You may put yourself into trouble due to this is the author’s original work.
8.1 Initial Setup
To begin, add the gem daru
to your gemfile as shown in Table 8.1.1.
#MyApp/gemfile
gem 'daru'
8.2 Import Data From CSV File
According to the documentation, there are a few ways that you can import your data. In this example, instead of reading from modal or database, data will be imported from .csv
(Microsoft Excel or Spreadsheet) file. Figure 8.2.1 shows a spreadsheet that contains employee that works for California State University (CSU) that is saved in .csv
extension.
Figure 8.2.1: A sample .csv
file.
Since this program will be large (eventually), each of the operation will be created as a method. This allows us to find our code in the method quickly and correctly. Table 8.2.1 shows a method, load_csv
is created to load the .csv
file. YOUR-DATA.csv
is a spreadsheet that is located in the data folder at the root app directory.
#MyApp/controllers/pages_controller.rb
def load_csv
@data_frame = Daru::DataFrame.from_csv('data/YOUR-DATA.csv')
end
8.3 Output The Table
Outputting the result in the view is very straight forward. Before we are able to print out the result, we have to include the method that we created in Table 8.2.1 to our method for the page. Table 8.3.1 shows a result method is created and have the load_csv
method included.
#MyApp/controllers/pages_controller.rb
def result
load_csv
end
There is nothing much that we have to do. By default, @data_frame
method will output the result in table but in the form of HTML code. To print the HTML code on Rails’ view, we have to include the raw method shown in Table 8.3.2.
#MyApp/views/pages/result.html.erb
<%=raw @data_frame %>
Figure 8.3.1 shows 3000 rows and 11 columns of data are processed by daru, Daru::DataFrame(3000x11)
. The table will not show all the data that you have. In fact, it will show the first 30 rows of data and the last row of data.
Figure 8.3.1: Table printed in the view
8.4 Group By And Mean
To get the mean of each job, we first have to group all the job title by using group_by
method. Then, we can use the method .mean
to get the mean of each job. Table 8.4.1 shows the group_by
and mean method.
#MyApp/controllers/pages_controller.rb
def mean_group_by_job
@group_by_job = @data_frame.group_by('Job Title')
@output = @group_by_job.mean
end
Figure 8.4.1 shows the result group by job title. There are a total of 180 rows of data, indicating 180 types of different jobs.
Figure 8.4.1: Result of the view
8.5 Filter By (String)
To get only the job title we want, we can use predefined method, .eq(param)
. param is the parameter that we want to look for. Table 8.5.1 shows the .eq()
method which contains a method @filter_job_title
that user can indicate any job title that they want. Do note that ‘Job Title’ indicates that we are using the ‘Job Title’ column in the table.
#MyApp/controllers/pages_controller.rb
def filter data
@output = @data_frame.where(@data_frame['Job Title'].eq(@filter_job_title))
end
Figure 8.5.1 shows the result group filtered by job title ACCOUNTANT II
. There are a total of 4 rows of data that contains ACCOUNTANT II
.
Figure 8.5.1: Result of the view
8.6 Filter By (Value)
To get only the job pay in the range based on the columns we want, we can use predefined method shown in Table 8.6.1.
Table 8.6.1: Predefined Method
Comparator | Method | Description |
---|---|---|
lt | < | returns true for each entry less than the supplied object |
lteq | <= | returns true for each entry less than or equal to the supplied object |
mt | > | returns true for each entry more than the supplied object |
mteq | >= | returns true for each entry more than or equal to the supplied object |
Since we have a few different column of data, we can only choose to filter one column at a time and get the data based on the input range we wish to. Ideally, user can input one or two values to indicate the minimum and the maximum range of values they are looking for.
Table 8.6.2 shows a method to filter the value that is greater than the value specified. @filter_option
is a column that we would like to have the @greater_than
value filter in.
#MyApp/controllers/pages_controller.rb
def filter data
@output = @data_frame.where(@data_frame[@filter_option].gt(@greater_than.to_i))
end
Figure 8.6.1 shows the result group filtered by job title ACCOUNTANT II
and the total pay is filter by the value greater than 70000
. There is only one result that based on the constraint that we defined.
Figure 8.6.1: Result of the view
8.7 Sort By (Value)
To sort the values based on the column we choose, we can use predefined method, .sort(param)
to sort in ascending
or descending
order.
The process is very straight forward. We pass in a parameter, @sort_by
, to indicate which column we would like to have it sorted based on. Then we included true or false for sorting it in ascending
order. Table 8.6.2 shows a sample code for sorting in descending
order.
#MyApp/controllers/pages_controller.rb
def sort_data
@output = @data_frame.sort([@sort_by], ascending: [false])
end
Figure 8.7.1 shows the data in the table is sorted descending
order based on the Base Pay column.
Figure 8.7.1 Result of the view