Building a shiny app with drag and drop data interface

Introduction

Data visualization is an important aspect of the data science work flow. This app enables the analyst to understand the data in question. In this post, we will build an application which will allow the user to drag and drop a .csv file and get fast summaries of the data.

Background

The idea of this post stems from this StackoverFlow question.

Javascript

Javascript allows us to import data into the shiny application via drag and drop.The code of the same is as follows

var datasets = {};
var dragOver = function(e) { e.preventDefault(); };
var dropData = function(e) {
    e.preventDefault();
    handleDrop(e.dataTransfer.files);
};
var handleDrop = function(files) {
    for (var i = 0, f; f = files[i]; i++) {
    var reader = new FileReader();

    reader.onload = (function(file) {
        return function(e) {
        datasets[file.name.toLowerCase()] = e.target.result;
        Shiny.onInputChange("mydata", datasets);
        var div = document.createElement("div");
        var src = "https://cdn0.iconfinder.com/data/icons/office/512/e42-512.png";
        div.id = "datasets";
        div.innerHTML = [
            "<img class='thumb' src='", src, "' title='", encodeURI(file.name),
            "'/>", "<br>", file.name, "<br>"].join('');
        document.getElementById("drop-area").appendChild(div);
        };
    })(f);
    reader.readAsText(f);
    }
};
// debug
var printData = function(data) {
    var div = document.createElement("div");
    div.innerHTML = datasets[data];
    document.getElementById("data-output").appendChild(div);
};
Raw;

The javascript code is saved as a javascript file (.js). The file should be saved within a folder called www.

UI

The ui.R section of the application implements the user interface. Currently, there are three visualizations that have been implemented. These visualizations are

  • bar graphs for grouped data
  • scatter plots to visualize how one variable varies with the other
  • box plots to look at grouped distributions
  • and ring plots to look at the proportions of factor variables

The bar and box plots are present in the same section.

The visualizations are implemented using the plotly library. The plotly library allows for integration with the ggplot2 graphing library via the ggplotly() function.

The code for the same is as follows:

library(shiny)
library(shinyjs)
library(V8)
appCSS <- "
#loading {
  position: fixed;
left: 50%;
top: 50%;
z-index: 1;
width: 150px;
height: 150px;
margin: -75px 0 0 -75px;
border: 16px solid #f3f3f3;
border-radius: 50%;
border-top: 16px solid #3498db;
width: 120px;
height: 120px;
-webkit-animation: spin 2s linear infinite;
animation: spin 2s linear infinite;
}
@-webkit-keyframes spin {
0% { -webkit-transform: rotate(0deg); }
100% { -webkit-transform: rotate(360deg); }
}
@keyframes spin {
0% { transform: rotate(0deg); }
100% { transform: rotate(360deg); }
}
"


jsfile <- "getdata.js"
cssfile <- "style.css"
js_scroll_file <-'scroll.js'
ui <- shinyUI(
  fluidPage(

    tags$head(tags$link(rel="stylesheet", href=cssfile, type="text/css"),
              tags$script(src=jsfile),
              tags$script(src=js_scroll_file)),
    sidebarLayout(

      sidebarPanel(
        fluidRow(
          h5("Drop Datasets in the box below"),
          div(class="col-xs-12", id="drop-area", ondragover="dragOver(event)", 
              ondrop="dropData(event)")),width=2

      ),
      mainPanel(
        useShinyjs(),
        inlineCSS(appCSS),
        shinyjs::extendShinyjs(text = "shinyjs.refresh = function() { location.reload(); }"),


        hidden(div(id='loading')),

          hidden(div(style="width=1000px",
            id='app-content',

          fluidRow(HTML('<h1> Welcome to the Data Summarizer </h1>'),actionButton("refresh", "Refresh to enter new data")),
        fluidRow(uiOutput("table_summary")),
        uiOutput('tables'),

        fluidRow(column(width=4,HTML("<h3>Scatter Plot</h3>"),uiOutput('num_buttons_1'),uiOutput('num_buttons_2'),uiOutput('grouped_ungrouped')),


        column(width=4,HTML('<h3>Bar and Box Plots</h3>'),uiOutput('factor_button'),
                                                                                            uiOutput('numeric_input'),
                                                                                            uiOutput('output_type')
                                                                                            ),
        column(width=4,HTML('<h3>Ring Plots</h3>'),uiOutput('factor_button_for_ring_chart'))),




       uiOutput("plots_first_row")
        #uiOutput('plots_second_row')

      )
          )

      )
    )

  )
)

The appCSS function helps in creating a transition between operations. The javascript and css files are referenced using tags$script and tags$link respectively. This is similar to the “ and href='test.css' in HTML.

server.R

Every application needs a back end functionality which processes the data in question. For that very purpose, a shiny application needs a server.R file. This file creates a bridge to the ui.R file via reactive elements which change based on user interaction.

library(ggplot2)
library(lazyeval)
library(tidyverse)
library(plotly)
library(V8)
#options(error = 999)
appCSS <- "
#loading {
  position: fixed;
left: 50%;
top: 50%;
z-index: 1;
width: 150px;
height: 150px;
margin: -75px 0 0 -75px;
border: 16px solid #f3f3f3;
border-radius: 50%;
border-top: 16px solid #3498db;
width: 120px;
height: 120px;
-webkit-animation: spin 2s linear infinite;
animation: spin 2s linear infinite;
}
@-webkit-keyframes spin {
0% { -webkit-transform: rotate(0deg); }
100% { -webkit-transform: rotate(360deg); }
}
@keyframes spin {
0% { transform: rotate(0deg); }
100% { transform: rotate(360deg); }
}
"



plotTheme <- function(base_size = 12) {
  theme(
    text = element_text( color = "black"),
    plot.title = element_text(size = 12,colour = "black",hjust=0.5),
    plot.subtitle = element_text(face="italic"),
    plot.caption = element_text(hjust=0),
    axis.ticks = element_blank(),
    panel.background = element_blank(),
    panel.grid.major = element_line("grey80", size = 0.1),
    panel.grid.minor = element_blank(),
    strip.background = element_rect(fill = "grey80", color = "white"),
    strip.text = element_text(size=8),
    axis.title = element_text(size=15),
    axis.text = element_text(size=15),
    axis.title.x = element_text(hjust=1,size=15),
    axis.title.y = element_text(hjust=1,size=15),
    plot.background = element_blank(),
    legend.background = element_blank(),
    legend.title = element_text(colour = "black", face = "bold"),
    legend.text = element_text(colour = "black", face = "bold"),
    axis.text.y = element_text(size=15),
    axis.text.x = element_text(vjust=-1,angle=90,size=15))
}
server <- function(input, output, session) {


  observeEvent(input$mydata, {
    show("app-content")




    len = length(input$mydata)
    output$tables <- renderUI({
      table_list <- lapply(1:len, function(i) {
        tableName <- names(input$mydata)[[i]]
        tableOutput(tableName)
      })
      do.call(tagList, table_list)
      #data_output <- read.csv(text=input$mydata[[name]])
      #tableOutput(head(data_output))
    })
    output$table_summary <- renderUI({

      h3("Table Summary")
    })
    observeEvent(input$refresh, {

      show(id="loading", anim = TRUE, animType = "fade")
      Sys.sleep(1.5)

      hide(id = "loading", anim = TRUE, animType = "fade") 
      shinyjs::js$refresh()
    })

    output$num_buttons_1 <- renderUI({
      data_output <- read.csv(text=input$mydata[[name]])
      nums <- sapply(data_output, is.numeric)
      data_nums <- data_output[,nums]

      nums_names <- names(nums[nums==TRUE])
      selectInput('numerical_input_1',label="X-axis",choices=nums_names,selected = nums_names[1])


    })
    output$output_type <- renderUI({


      radioButtons('aggregate_method','Aggregate Function',choices = c('Mean'='mean','Median'='median','Distribution'='dist'),selected = 'mean',inline = T)
    })
    output$num_buttons_2 <- renderUI({
      data_output <- read.csv(text=input$mydata[[name]])
      nums <- sapply(data_output, is.numeric)
      data_nums <- data_output[,nums]
      nums_names <- names(nums[nums==TRUE])
      selectInput('numerical_input_2',label="Y-axis",choices=nums_names,selected = nums_names[2])


    })
    output$factor_button <- renderUI({

      data_output <- read.csv(text=input$mydata[[name]])
      factor_variables <- sapply(data_output,is.factor)
      factor_variables <- factor_variables[factor_variables==TRUE]
      names_factors <- names(factor_variables)
      selectInput('factor_input',label="Input the Factor Variable",choices = names_factors,selected = names_factors[1])

    })

    output$numeric_input  <- renderUI({

      data_output <- read.csv(text=input$mydata[[name]])
      numeric_variables <- sapply(data_output,is.numeric)
      numeric_variables % 
          group_by_(group.var) %>%
          summarise_(n=interp(~median(v),v=as.name(metric.var)))

      }else{
        df %>%
          select_(group.var,metric.var)

      }


    }
    output$grouped_plots <- renderPlot({

      data_output <-read.csv(text=input$mydata[[name]])

      validate(
        need(input$factor_input_for_ring!='', "Ooops...This data set does not have a factor variable."))
      data_group <- group_function(data_output,input$factor_input,input$numeric_input)



      data_group <- as.data.frame(data_group)
      #print(names(data_group))
      if(input$aggregate_method=='dist'){

        p <- ggplot(data=data_group,aes(x=data_group[,names(data_group)[1]],y=data_group[,names(data_group)[2]]))+
          geom_boxplot()+labs(x=names(data_group)[1],y=names(data_group)[2])

      }else{
        p <- ggplot(data=data_group,aes(x=data_group[,names(data_group)[1]],y=data_group[,names(data_group)[2]]))+
          geom_bar(stat='identity')+labs(x=names(data_group)[1],y=names(data_group)[2])

      }
      show(id="loading", anim = TRUE, animType = "fade")
      Sys.sleep(1.5)

      hide(id = "loading", anim = TRUE, animType = "fade") 

      print(p+plotTheme())



    })


    output$scatter <- renderPlot({

      data_output <- read.csv(text=input$mydata[[name]])

      data_nums <- data_output
      # nums_names <- names(nums)




      title_text <- paste("Relationship between",input$numerical_input_1,'and',input$numerical_input_2)
      if(input$grouped_ungrouped=='ungrouped'){
        p % summarise(n=)
        p <- ggplot(data=data_nums,aes(x=data_nums[,input$numerical_input_1],y=data_nums[,input$numerical_input_2],
                                       color=as.factor(data_nums[,input$factor_input])))+
          geom_point()+labs(title=title_text,x=input$numerical_input_1,y=input$numerical_input_2)+
          guides(fill=guide_legend(title=input$factor_input))+theme(legend.position = 'none')

      }

      show(id="loading", anim = TRUE, animType = "fade",'Graphs are loading...')
      Sys.sleep(1.5)

      hide(id = "loading", anim = TRUE, animType = "fade",'Graphs are loading...') 
      tryCatch(print(p+plotTheme()))



    })
    output$grouped_ungrouped <- renderUI({


      radioButtons(inputId = 'grouped_ungrouped',label="Grouped/Ungrouped",choices=c('Grouped'='grouped',
                                                                                     'Ungrouped'='ungrouped'),selected = 'ungrouped')



    })
    output$factor_button_for_ring_chart <- renderUI({

      data_output <- read.csv(text=input$mydata[[name]])
      factor_variables <- sapply(data_output,is.factor)
      factor_variables  <- names(factor_variables[factor_variables==TRUE])
      selectInput('factor_input_for_ring',label="Input Variable",selected = factor_variables[1],choices=factor_variables)

    })

    output$ring_chart % mutate(n=n/sum(n))
        p %
          group_by_(input$factor_input_for_ring) %>%
          summarize(count = n()) %>%
          rename_('group'=input$factor_input_for_ring) %>%
          plot_ly(labels = ~group, values = ~count) %>%
          add_pie(hole = 0.6) %>%
          layout(title = paste('Proportions of',input$factor_input_for_ring),  showlegend = FALSE,
                 xaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE),
                 yaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE)) %>%
          config(displayModeBar=FALSE,sendData=FALSE,displaylogo=FALSE)

        p












    })
    output$bar_and_box_plot_header <- renderUI({

      if(input$aggregate_method=="dist"){

        HTML('<h3> Box Plots </h3>')
      }
      else{
        HTML('<h3> Bar Plots </h3>')
      }

    })
    output$plots_first_row <- renderUI({




      fluidRow(

        HTML("<br>"),
        HTML("<br>"),
        HTML("<h3>Scatter Plots</h3>"),
         plotOutput('scatter'),

         HTML("<br>"),
         HTML("<br>"),
        uiOutput('bar_and_box_plot_header'),


         plotOutput('grouped_plots'),
         HTML("<br>"),
         HTML("<br>"),
        HTML("<h3>Ring Plots</h3>"),

         plotlyOutput('ring_chart'),
         HTML("<br>"),
         HTML("<br>"),
         HTML('Go Back To The Top')

      )



    })


    for (name in names(input$mydata)) {
      output[[name]] <- renderTable(head(read.csv(text=input$mydata[[name]]),4))
    }
  })

}

The renderUI() functions allow for responsive change based on user input. This function can be used to change the functionality of the app based on user interactivity.

Additionally, the user can click the Refresh to enter new data button to drag and drop new data. The box and bar plots do not occur together; they are based on user input. The user can also toggle between Grouped and Ungrouped radio buttons to get a sense of grouped plots and ungrouped plots. To get a feel of the applications , there are three demos with popular data sets.

Demo

Iris Flowers Data set

mtcars Data Set

Boston Housing Data

  • For datasets that do not contain factor variables, an error message is displayed ie Oops... This dataset does not have factor variable.
  • The user can re-use the application with another dataset by clicking the refresh button.

Limitations

  • The data set has to be in the .csv format
  • Currently the app does not handle time series data.

Thanks for reading! Feel free to leave a comment below!
You can get in touch with me on Twitter or you can email me at padhokshaja@gmail.com . The code can be found here.

Wanna play around with the app? Click here or run the following command on RStudio "shiny::runGitHub( "Data-Summariser", "adhok")".

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s