Last modified: Sat Mar 23 2019 14:48:53 GMT+0800 (Malay Peninsula Standard Time)

Chapter 8. Data Analysis

The example in this chapter requires above average database knowledge and will not provide a step by step process to achieve our final goal. You may have to create your own view and controller components and perform modification for the routes. Reading the documentation and understand what each method perform is required and crucial. If you are new to Ruby on Rails, do not attempt this example.

Data is always free and available anywhere. We can use the data available in the internet to turn it into a more readable form that can provide us useful information. In this chapter, daru gem will be used and discussed. daru is a library for storage, analysis, manipulation and visualization of data in Ruby. The complete documentation for daru can be found here.

In this chapter, we will use an open source data that shows each California State University employee’s salary in the year. All the code discussed in this chapter can be placed in the same controller. I chose to place it in my pages_controller.rb.

Do note that this example has already been presented as a class project by the author. Do not duplicate this project and submit as your class project without seeking the permission of the author. You may put yourself into trouble due to this is the author’s original work.

8.1 Initial Setup

To begin, add the gem daru to your gemfile as shown in Table 8.1.1.


gem 'daru'

8.2 Import Data From CSV File

According to the documentation, there are a few ways that you can import your data. In this example, instead of reading from modal or database, data will be imported from .csv (Microsoft Excel or Spreadsheet) file. Figure 8.2.1 shows a spreadsheet that contains employee that works for California State University (CSU) that is saved in .csv extension.

Figure 8.2.1: A sample .csv file.

Since this program will be large (eventually), each of the operation will be created as a method. This allows us to find our code in the method quickly and correctly. Table 8.2.1 shows a method, load_csv is created to load the .csv file. YOUR-DATA.csv is a spreadsheet that is located in the data folder at the root app directory.


def load_csv  
  @data_frame = Daru::DataFrame.from_csv('data/YOUR-DATA.csv')  

8.3 Output The Table

Outputting the result in the view is very straight forward. Before we are able to print out the result, we have to include the method that we created in Table 8.2.1 to our method for the page. Table 8.3.1 shows a result method is created and have the load_csv method included.


def result

There is nothing much that we have to do. By default, @data_frame method will output the result in table but in the form of HTML code. To print the HTML code on Rails’ view, we have to include the raw method shown in Table 8.3.2.


<%=raw @data_frame %>

Figure 8.3.1 shows 3000 rows and 11 columns of data are processed by daru, Daru::DataFrame(3000x11). The table will not show all the data that you have. In fact, it will show the first 30 rows of data and the last row of data.

Figure 8.3.1: Table printed in the view

8.4 Group By And Mean

To get the mean of each job, we first have to group all the job title by using group_by method. Then, we can use the method .mean to get the mean of each job. Table 8.4.1 shows the group_by and mean method.


def mean_group_by_job  
  @group_by_job = @data_frame.group_by('Job Title')  
  @output = @group_by_job.mean   

Figure 8.4.1 shows the result group by job title. There are a total of 180 rows of data, indicating 180 types of different jobs.

Figure 8.4.1: Result of the view

8.5 Filter By (String)

To get only the job title we want, we can use predefined method, .eq(param). param is the parameter that we want to look for. Table 8.5.1 shows the .eq() method which contains a method @filter_job_title that user can indicate any job title that they want. Do note that ‘Job Title’ indicates that we are using the ‘Job Title’ column in the table.


def filter data
  @output = @data_frame.where(@data_frame['Job Title'].eq(@filter_job_title))  

Figure 8.5.1 shows the result group filtered by job title ACCOUNTANT II. There are a total of 4 rows of data that contains ACCOUNTANT II.

Figure 8.5.1: Result of the view

8.6 Filter By (Value)

To get only the job pay in the range based on the columns we want, we can use predefined method shown in Table 8.6.1.

Table 8.6.1: Predefined Method

Comparator Method Description
lt < returns true for each entry less than the supplied object
lteq <= returns true for each entry less than or equal to the supplied object
mt > returns true for each entry more than the supplied object
mteq >= returns true for each entry more than or equal to the supplied object

Since we have a few different column of data, we can only choose to filter one column at a time and get the data based on the input range we wish to. Ideally, user can input one or two values to indicate the minimum and the maximum range of values they are looking for.

Table 8.6.2 shows a method to filter the value that is greater than the value specified. @filter_option is a column that we would like to have the @greater_than value filter in.


def filter data
  @output = @data_frame.where(@data_frame[@filter_option].gt(@greater_than.to_i))  

Figure 8.6.1 shows the result group filtered by job title ACCOUNTANT II and the total pay is filter by the value greater than 70000. There is only one result that based on the constraint that we defined.

Figure 8.6.1: Result of the view

8.7 Sort By (Value)

To sort the values based on the column we choose, we can use predefined method, .sort(param) to sort in ascending or descending order.

The process is very straight forward. We pass in a parameter, @sort_by, to indicate which column we would like to have it sorted based on. Then we included true or false for sorting it in ascending order. Table 8.6.2 shows a sample code for sorting in descending order.


def sort_data
  @output = @data_frame.sort([@sort_by], ascending: [false])  

Figure 8.7.1 shows the data in the table is sorted descending order based on the Base Pay column.

Figure 8.7.1 Result of the view

results matching ""

    No results matching ""