Wednesday, 11 December 2013

Registering and executing statistical models



The nanobi analytics platform comes with a few built-in predictive models: time series forecasting, classification, association rules, regression and outlier detection.

These can be used like any other analytic, with declaratively specified options; no R or SAS programming is needed.

However, for advanced statistical modelling requirements, there is a way to register bespoke models. After the nanomart is created, register the model and then execute it. The results are seen in the usual way (analytics and hives).

These operations can be done through the UI or programmatically using the REST web services. This post shows the pertinent web services.

It is assumed that authentication is already done and a token is available. The auth token can then be used to call the usual metadata APIs to get AppIDs and MartIDs.

Armed with this, we can start working with model scripts.

Getting a list of mart scripts and models

GET
<server url>/DataObjectServer/data/do?ot=ast&an=nbmdc_nanomarts_row_id&av=<mart id>&o==&tokenid=<auth token>

Example response:
{
                "ast":
                [
                {
                                "row_id": "fdffcd9d-f50a-4b14-9b4d-712abe49c34c",
                                "is_applied": "n",
                                "describe_name": "",
                                "icon_content": null,
                                "si_id": "106319b8-d03f-402c-a58e-e993d872a233",
                                "icon_path": null,
                                "updated_by_user_id": null,
                                "file_name": "ms_8b244f93-1bdd-43ea-9fef-d0f7cd50eae0_1_vbwyKFHdDYBEkfM.R",
                                "created_by_user_id": "a15cd765-ead8-4794-a32d-ab17bb826ae7",
                                "subscript_type": "R Script",
                                "nbmdc_assets": "my r code",
                                "script_type": "batchcommand",
                                "description": "my r code",
                                "sequence": "1",
                                "nbmdc_nanomarts_row_id": "8b244f93-1bdd-43ea-9fef-d0f7cd50eae0",
                                "created_datetime": "2013-12-10 04:39:58",
                                "active_flag": "y"
                },
                {
                                "row_id": "3cf08efb-c714-473d-bf69-a294fac6caad",
                                "is_applied": "y",
                                "describe_name": "",
                                "icon_content": null,
                                "si_id": "106319b8-d03f-402c-a58e-e993d872a233",
                                "icon_path": null,
                                "updated_by_user_id": null,
                                "file_name": "ms_8b244f93-1bdd-43ea-9fef-d0f7cd50eae0_0_kQTgGZHUMhEFJUg.R",
                                "created_by_user_id": "a15cd765-ead8-4794-a32d-ab17bb826ae7",
                                "subscript_type": "R Script",
                                "nbmdc_assets": "testing r",
                                "script_type": "batchcommand",
                                "description": "testing r",
                                "sequence": "0",
                                "nbmdc_nanomarts_row_id": "8b244f93-1bdd-43ea-9fef-d0f7cd50eae0",
                                "created_datetime": "2013-12-09 07:31:14",
                                "active_flag": "y"
                }
                ]
}

Note: Not that you care, but if you are wondering, "ast" is short for "asset". A model script is internally stored as a mart asset.

Registering a mart script

POST <server url>/NanoClientApplication/martMaintanence
Parameters:
scriptDescription   my r code
script_type batchcommand
subscript_type R Script
scriptSequence 1
via url
fileUpload
fileUrl url/t.R
is_applied n
created_by_user_id a15cd765-ead8-4794-a32d-ab17bb826ae7
martid 8b244f93-1bdd-43ea-9fef-d0f7cd50eae0
si_id 106319b8-d03f-402c-a58e-e993d872a233
token 0e9a4d40-e695-453b-854c-0330313236b3

Response:
[{
                "statusCode": "0",
                "statusMessage": "Maintenance script executed."
}]


Note: While executing the script, the following built-in parameters are supported. They need not be explicitly passed to the script.
$mart_db
$mart_user
$mart_pwd
$mart_db_url

These parameters can be used in the script and are internally replaced by their values.
Additional command line parameters can be passed when executing the script.

Executing a script

POST
<server url>/DataObjectServer/data/do/serverCommand/execute
Parameters:
tokenid   0e9a4d40-e695-453b-854c-0330313236b3
parameter {"id":"fdffcd9d-f50a-4b14-9b4d-712abe49c34c"}

Response:
[{
                "statusCode": "0",
                "statusMessage":
"Successfully executed command.status file url:server/statusFiles/nNFGqJAqSTrtDtF.out"
}]






Wednesday, 30 October 2013

Insurance e-repository - Leveraging Analytics
Following on the heels of the UIDAI initiative comes Insurance Repository, another innovative service.
Currently, a customer has to submit the entire set of KYC documents every time he or she takes up a new policy. Insurance Repository helps dematerialise the policy details, obviating the need to submit a new set of documents.
Under the new facility, a customer enrols with an insurance repository services vendor for an e-insurance account, which gives the customer a unique ID. All the KYC-related information is fed into this account. After this, any new policy details are electronically transmitted by the insurer to this account.
Payment and agent details can be viewed through this platform. The customer can also file complaints through this site.
However, the initiative has both pros and cons.

DEMYSTIFYING OPTIONS

Although it may take time to dematerialise customers' old policies, which are in paper format, it's a good initiative as it helps reduce administrative overheads.
This should also help reduce premium rates to some extent in the long run. As an early-bird offer, insurers should give a small incentive for opening an e-insurance account.
It can track policies across insurers, so that customers can compare the services in a better manner.
Policy terms and conditions have always been printed in small-sized font. Insurers should now have enough space to print in bigger sizes. The Insurance Regulatory and Development Authority (IRDA) can also think about how to overcome the challenges that were faced earlier due to lack of space and the costs involved.
Undelivered policies are a big problem for insurance companies. E-insurance is one way of dealing with this. Sometimes, undelivered policies could actually be dummy documents used in rackets to make false claims.
Money, time and effort are wasted to store insurance policies, adding to administrative costs. The loss of policies has its costs: inserting an advertisement in the papers, preparing indemnity bonds, and so on. All this can be avoided through e-insurance.
The insurance sector is still struggling to overcome the penetration challenges even after a decade. One reason for this is the difficulty in tracking customer behaviour as the data is only available with the company, or with the IRDA. It’s too big a project for the IRDA to handle.
Now it is possible to keep track of customers across companies in the form of small data marts that can be analysed by the IRDA, the insurance repository services or the insurers. This kind of data analysis can help study customer behaviour across products, their investment patterns, loyalty, the influence of the agent/distributor/online purchase, premium payment patterns, and so on. This in turn can help customers and insurers plan well across segments. Advanced techniques such as data mining, self-organising maps and so on can help categorise the customers.
All these analyses can be done without loss of privacy and without revealing competitive information as it is a study only at the macro level; no sensitive information is given out. This, in turn, can help customers, the IRDA, insurance companies, third party administrators, brokers and agents frame new strategies to improve the industry.

DAMAGING PROPORTIONS

It is better to keep track of the proportion of usage based on customer profile analysis; disproportionate usage would sound a warning.
There have been instances when a customer has been issued a policy without his/her own knowledge.
In such cases, fraudsters misuse documents submitted for some other purpose, like opening a bank account or applying for a credit card. Or, forged documents are used to compromise the KYC.
Since e-insurance accounts call for one-time KYC, there is potential for fraud. The way to get around this is to impose more stringent norms for KYC so that authenticity is verified in multiple ways like calling customers and visiting them in person.
Also, the process of dematerialising old paper policies should be speeded up. Efforts to integrate the e-insurance demat account with those held with SEBI and banks are worth considering.
With the e-insurance account becoming like a bank account, and insurance policies having a good investment component, sharing of access details with agents or third parties could be dangerous. Hence, insurers and repository companies should adopt best practices from credit card companies and online banking services.
Although the internet has good reach across India, not all rural people have access to it. Financial literacy is yet to catch up.
Hence, much-discussed goals in the insurance sector, such as rural penetration and micro-insurance policies, may have to wait a while to reach the demat stage.
However, insurers and insurance repository companies need to keep a watch on rural usage.
Earlier, insurance policies were delivered to the residence of the rural customer. Under the pretence of getting them the all-inclusive e-insurance account, fraudsters could easily fool rural customers. Flagging such scenarios can help act in the interest of rural beneficiaries.
An overall check by pooling the accounts to avoid duplicate e-insurance accounts would be a good initiative.
There are instances of people having multiple PAN cards. That should not be repeated for e-insurance accounts. Severe penalties can act as a deterrent.

Tuesday, 22 October 2013

Display target Vs achievement metrics in a bubble chart

This new chart type is ideal for displaying target vs achievement comparisons.

Choose the measures and dimensions.


Set the chart type.




Voila!




On hive




Real time Data loading

We've added the ability to load data using a web service. This can be used for real-time integration.

URL
/DataObjectServer/nanomartservices/nanomartstore/nanomart/table/jsonload

Parameters:
token:<Authentication_Token>
tableid:<Mart_table_id>
modetype:  <append/replace/merge>
misdate:
mis_date_column:
errortolerance:
automisdateupdate:
data:<json formatted data>


Example of data to be loaded:

{
"columns" : ["txn_cd", "client_cd", "security", "location", "client_id", "txn", "qty", "rate", "brokerage", "ser_tax"],
"rows" : [
 ["1", "P0001003", "Cl001", "BLR", "719", "Buy", "50.000", "133.350", "0.120", "1.210"],
 ["2", "P0001004", "Cl002", "BLR", "720", "Sell", "59.000", "108.750", "0.080", "1.090"]
 ]
}

Support for 'DEFAULT' clause in the Mart definition SQL



The mart definition script can now include a DEFAULT clause. The data file being uploaded need not include a column that has a DEFAULT set.
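For example, a mart table definition along these lines (the table and column names here are purely illustrative, not taken from the product) lets the upload file omit the last two columns:

create table sales_txn (
    txn_id       numeric IDENTITY primary key,
    txn_amount   numeric not null,
    load_channel varchar(20) not null DEFAULT 'FILE',  -- not present in the upload file
    load_date    datetime not null DEFAULT getdate()   -- filled in automatically at load time
);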




The mart table will have the column filled automatically.





Supporting additional mart indices for performance tuning

Mart maintenance scripts can be used for creating (or maintaining) indices, which is sometimes a handy way to do performance tuning.

This mart has no index other than the primary key index.



We write the mart maintenance script.
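A maintenance script of this kind would typically be a single index-creation statement; the table and column names below are illustrative:

-- speed up date-range queries on the daily sales table
create index idx_ds_daily_sales_date on DS_DAILY_SALES (SALES_DATE);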



Register this script.





The new index is created.



In appstore environments, the "Apply to Subscribers" checkbox can be checked so that app subscribers also get this index.

Implementing SCD type 2 logic when loading a dimension table

To know about SCD, read this:

http://en.wikipedia.org/wiki/Slowly_changing_dimension


Consider the case where we have to implement SCD type 2 on an employee dimension.


The employee attributes are: code, name, spouse name, designation, manager name.


Any change in designation or manager name triggers a new record in the employee dimension.


The MERGE statement can be used to implement SCD 2. 


drop table scd2_emp_dim; -- employee dimension table

drop table scd2_emp_dim_stg; -- stage table

create table scd2_emp_dim (

    emp_wid numeric IDENTITY primary key, -- surrogate key
    emp_code varchar(100) not null,  -- natural key
    emp_name varchar(100) not null, 
    emp_spouse_name varchar(100) not null,
    emp_desig varchar(100) not null,
    emp_mgr_name  varchar(100) not null,
    constraint scd2_emp_dim_uk unique (emp_code, emp_desig, emp_mgr_name) -- to ensure one record per emp code+scd columns
);

create table scd2_emp_dim_stg (

    emp_code varchar(100) primary key,
    emp_name varchar(100) not null,
    emp_spouse_name varchar(100) not null,
    emp_desig varchar(100) not null,
    emp_mgr_name  varchar(100) not null
);

delete from scd2_emp_dim_stg;

insert into scd2_emp_dim_stg (emp_code, emp_name, emp_spouse_name, emp_desig, emp_mgr_name) values ('e1','e1','s1','d1','boss');
insert into scd2_emp_dim_stg (emp_code, emp_name, emp_spouse_name, emp_desig, emp_mgr_name) values ('e2','e2','s2','d2','boss');
insert into scd2_emp_dim_stg (emp_code, emp_name, emp_spouse_name, emp_desig, emp_mgr_name) values ('e3','e3','s3','d3','boss');
insert into scd2_emp_dim_stg (emp_code, emp_name, emp_spouse_name, emp_desig, emp_mgr_name) values ('e4','e4','s4','d4','boss');
insert into scd2_emp_dim_stg (emp_code, emp_name, emp_spouse_name, emp_desig, emp_mgr_name) values ('e5','e5','s5','d5','boss');
insert into scd2_emp_dim_stg (emp_code, emp_name, emp_spouse_name, emp_desig, emp_mgr_name) values ('e6','e6','s6','d6','boss');
insert into scd2_emp_dim_stg (emp_code, emp_name, emp_spouse_name, emp_desig, emp_mgr_name) values ('e7','e7','s7','d7','boss');


merge into scd2_emp_dim t1 using scd2_emp_dim_stg t2 on (t1.emp_code = t2.emp_code AND t1.emp_desig = t2.emp_desig AND t1.emp_mgr_name = t2.emp_mgr_name)

when matched then
    update set t1.emp_name = t2.emp_name, t1.emp_spouse_name = t2.emp_spouse_name
when not matched then
    insert (emp_code, emp_name, emp_spouse_name, emp_desig, emp_mgr_name) 
    values (t2.emp_code, t2.emp_name, t2.emp_spouse_name, t2.emp_desig, t2.emp_mgr_name);

select * from scd2_emp_dim;


Since the dimension was empty so far, all records got inserted.


delete from scd2_emp_dim_stg;

insert into scd2_emp_dim_stg (emp_code, emp_name, emp_spouse_name, emp_desig, emp_mgr_name) values ('e2','e2','s2','d2','boss');
insert into scd2_emp_dim_stg (emp_code, emp_name, emp_spouse_name, emp_desig, emp_mgr_name) values ('e3','e3 new','s3 new','d3','boss');
insert into scd2_emp_dim_stg (emp_code, emp_name, emp_spouse_name, emp_desig, emp_mgr_name) values ('e4','e4','s4','d4 new','boss');
insert into scd2_emp_dim_stg (emp_code, emp_name, emp_spouse_name, emp_desig, emp_mgr_name) values ('e5','e5','s5','d5','boss new');
insert into scd2_emp_dim_stg (emp_code, emp_name, emp_spouse_name, emp_desig, emp_mgr_name) values ('e6','e6','s6','d4 new','boss new');
insert into scd2_emp_dim_stg (emp_code, emp_name, emp_spouse_name, emp_desig, emp_mgr_name) values ('e7','e7 new','s7 new','d7 new','boss new');
insert into scd2_emp_dim_stg (emp_code, emp_name, emp_spouse_name, emp_desig, emp_mgr_name) values ('e8','e8','s8','d8','boss');

merge into scd2_emp_dim t1 using scd2_emp_dim_stg t2 on (t1.emp_code = t2.emp_code AND t1.emp_desig = t2.emp_desig AND t1.emp_mgr_name = t2.emp_mgr_name)

when matched then
    update set t1.emp_name = t2.emp_name, t1.emp_spouse_name = t2.emp_spouse_name
when not matched then
    insert (emp_code, emp_name, emp_spouse_name, emp_desig, emp_mgr_name) 
    values (t2.emp_code, t2.emp_name, t2.emp_spouse_name, t2.emp_desig, t2.emp_mgr_name);

select * from scd2_emp_dim;

    
The new records got inserted. Existing employees got a new entry if any of the SCD columns changed (designation, manager name). Existing employees that changed only in non-SCD columns, like name or spouse name, were simply updated.


This is a minimalistic SCD 2 scheme. A current-indicator column can be added fairly easily, as can effective start and end dates (see the sketch below). Also, records present in the dimension table but missing from the stage were left untouched; the MERGE statement can be enhanced to mark those as soft-deleted.
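A minimal sketch of the current-indicator idea (the column names curr_flag, eff_start_dt and eff_end_dt are assumptions, not part of the scheme above):

alter table scd2_emp_dim add curr_flag char(1) not null default 'y';
alter table scd2_emp_dim add eff_start_dt date not null default getdate();
alter table scd2_emp_dim add eff_end_dt date null;

-- after each MERGE, close out superseded versions: for every employee,
-- only the row with the highest surrogate key stays current
update t1 set curr_flag = 'n', eff_end_dt = getdate()
from scd2_emp_dim t1
where t1.curr_flag = 'y'
and t1.emp_wid < (select max(t2.emp_wid) from scd2_emp_dim t2 where t2.emp_code = t1.emp_code);

Soft deletes can be handled the same way: a flag column on the dimension, updated for employee codes that no longer appear in the stage table.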

Specifying filters when the value has special characters like an apostrophe

If the analytic filter is specified as:
dimension_level = 'KPI's' 

we get an error.


This works:


dimension_level = 'KPI''s' 


An extra single quote in front of the apostrophe (doubling it) escapes the special character.


V1.6 will have a new, visual way to apply filters, so there will be almost no need to type them.

Maintaining the data load history table manually after initial load

Until this feature is available in the UI, here's how we can maintain the data load history table manually after the initial load.

1. Load the entire file (or load via DB Connect). Pick a blank load period (or a specific value that you note down).

2. Note the mart id of the mart being loaded into.

3. Select * from nbmdc_dataload_history where nbmdc_nanomarts_row_id = <Your mart id> and tablename = <Your table name>
Note down the row_id of this. Use it below.

4. Change the row_id, mart table schema/name, mart table date column name in the statement below and run.

insert into nbmdc_dataload_history (
row_id    ,nbmds_users_row_id    ,nbmdc_nanomarts_row_id    ,nbmdc_nanomart_tables_row_id    ,icon_path    ,mis_date    ,loaded_date_of_data    ,updated_by_user_id,
    operation_mode    ,records_added    ,status_message    ,active_flag    ,tablename    ,martname    ,icon_content    ,created_by_user_id    ,records_deleted,
    records_updated    ,status_code    ,error_filepath    ,records_failed    ,time_taken    ,si_id    ,nbmdm_tenants_row_id    ,file_name
    )
select
t1.row_id  + '-' + cast((row_number() over (order by sysdatetime())) as varchar) row_id    ,
t1.nbmds_users_row_id    ,t1.nbmdc_nanomarts_row_id    ,t1.nbmdc_nanomart_tables_row_id    ,t1.icon_path    ,
t2.txn_date mis_date    ,
t1.loaded_date_of_data    ,
t1.updated_by_user_id,    t1.operation_mode    ,t1.records_added    ,t1.status_message    ,t1.active_flag    ,t1.tablename    ,t1.martname    ,t1.icon_content    ,
t1.created_by_user_id    ,t1.records_deleted    ,t1.records_updated    ,t1.status_code    ,t1.error_filepath    ,t1.records_failed    ,t1.time_taken    ,t1.si_id    ,t1.nbmdm_tenants_row_id    ,
t1.file_name
from nbmdc_dataload_history t1, qSFpsKqaycnaRKe.last_week_numbers t2
where t1.row_Id = '31334174-6ae2-4c55-bf4e-94b43bee401d';

-- delete the old entry as it is no longer required
delete from nbmdc_dataload_history
where row_Id = '31334174-6ae2-4c55-bf4e-94b43bee401d';

Doing minute-wise analysis

The lowest grain of the built-in Day dimension is day. We can do analysis at a lower grain by adding appropriate dimensions. If we want to display sales by hour and we have the transaction timestamp in the sales fact table, we can do this analysis.

First we add a column called "Minutes since midnight" to the fact table: the transaction timestamp converted into the number of minutes elapsed since midnight. Joining on this number improves performance.
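Assuming the fact table's timestamp column is called txn_timestamp (an illustrative name), the new column can be populated with an expression like:

-- minutes elapsed since midnight, 0 to 1439
datepart(hour, txn_timestamp) * 60 + datepart(minute, txn_timestamp)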


The Minutes dimension table has these columns.


MINUTES_SINCE_MIDNIGHT

TIME_IN_AMPM_FORMAT
TIME_IN_24_HRS_FORMAT
TIME_IN_HOUR_BUCKET
TIME_IN_30_MINS_BUCKET

Data appear like this:





The entire file is kept here and can be used in any app.


Go ahead and create the measures and dimensions. These kinds of analyses can then be displayed.




Identifying close friends from call records

The data sleuths at telcos do this all the time: identifying close friends from call records. They then use that information to target people with advertisements.

Consider a call records table  that has the caller number, receiver number and call start time. Find those cases where the caller calls the same number first and last.

create table phone_records (
src  numeric,
dest numeric,
call_st_time datetime
)


insert into phone_records (src,dest,call_st_time) values (1,100,'2013-01-01 08:00:00')
insert into phone_records (src,dest,call_st_time) values (1,200,'2013-01-01 08:10:00')
insert into phone_records (src,dest,call_st_time) values (1,300,'2013-01-01 08:20:00')
insert into phone_records (src,dest,call_st_time) values (1,100,'2013-01-01 08:30:00')
insert into phone_records (src,dest,call_st_time) values (2,100,'2013-01-01 08:40:00')
insert into phone_records (src,dest,call_st_time) values (3,100,'2013-01-01 08:50:00')
insert into phone_records (src,dest,call_st_time) values (3,100,'2013-01-01 08:51:00')
insert into phone_records (src,dest,call_st_time) values (3,100,'2013-01-01 08:52:00')
insert into phone_records (src,dest,call_st_time) values (4,100,'2013-01-01 08:53:00')
insert into phone_records (src,dest,call_st_time) values (4,200,'2013-01-01 08:54:00')

Here caller 1 has called the same number first and last. So have 2 and 3. But 4 has not.
Here's the query that can extract this information in one shot:

select src, (case when first_num = last_num then 'Y' else 'N' end) is_match from
(select src, max(case when rn_asc = 1 then dest else -1 end) first_num, max(case when rn_desc = 1 then dest else -1 end) last_num
from
(select * from (
select src, dest, call_st_time,
rank() over (partition by src order by call_st_time) rn_asc, rank() over (partition by src order by call_st_time desc) rn_desc
from phone_records) v1
where rn_asc = 1 or rn_desc = 1) v2 group by src)  v3

Showing different date formats

The default format for date fields is YYYY-MM-DD, and that's how they appear in hives. If a different format is required, we have to use an expression. If it's a dimension attribute, use the level expression with the CONVERT function to display the desired format.
If it's a measure (necessarily a MAX rollup, since SUM is not possible and the COUNT case is immaterial), create an additional computed measure on top of the base measure. Screenshots below.
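For instance, assuming a base date field referenced as {Open Date} (an illustrative name, following the platform's computed-expression syntax), expressions like these give common formats:

convert(varchar(10), {Open Date}, 103)   -- DD/MM/YYYY
convert(varchar(10), {Open Date}, 105)   -- DD-MM-YYYY
convert(varchar(11), {Open Date}, 106)   -- DD Mon YYYY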

The codes for the various formats are here:

http://www.sql-server-helper.com/tips/date-formats.aspx

1. We will be able to set this formatting option in analytic/hive in the future.

2. The mechanism used here is SQL Server specific. The functions are different in Oracle.

Screenshots:

Default format



Measure definition






New format




Showing the number of days between two dates

The example below shows two dates for each dimension member: an open date and a close date. Suppose we also want to show the number of days between the two. Define a computed measure that uses the base measures and the DATEDIFF function to get the number of days.
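A computed measure along these lines does it, assuming the base measures are referenced as {Open Date} and {Close Date}:

-- number of days between the two dates
datediff(day, {Open Date}, {Close Date})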



Output.



Displaying custom date and number formatting

The default date format is YYYY-MM-DD. The example below shows how to display dates in a different format. The measure on the chart is also shown in thousands (K) rather than as the raw value.

Original chart



New measure defined as





The dimension date field updated as:





The chart using these looks like this:





The measure now appears in thousands and the date format is DD-MM-YYYY.
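The expressions behind these are roughly as follows; the field names {Sales Amount} and {Sales Date} are illustrative:

{Sales Amount} / 1000                     -- measure shown in thousands (K)
convert(varchar(10), {Sales Date}, 105)   -- date shown as DD-MM-YYYY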


How to set up a height balanced stacked bar chart

Height-balanced stacked bar charts are often used to show a percentage breakup. This allows easy comparison across dimension members: the total height of every stacked bar is identical.



Setting the max memory of SQL Server


This may be required to ensure SQL Server doesn't take up all the available memory on the server. The default value is effectively unlimited.
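One way to set it from T-SQL (the 4096 MB cap below is just an example; size it to the server's RAM):

exec sp_configure 'show advanced options', 1;
reconfigure;
exec sp_configure 'max server memory (MB)', 4096;  -- cap SQL Server at 4 GB
reconfigure;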

UI feature enable/disable options

A parameter in nanoclientapplication.properties controls which features are available.

uiFeatureFlag={"maintenanceScripts":"true","appstorepublishsubscribe":"true", "customAdminuseractivate":"true", "alertinhive":"false", "embeddashboardinhive":"true", "enablelogo":"true"}



For example:

customAdminuseractivate controls whether customer admins can activate a user who has been deactivated. This is required where email is not an option.

Displaying the number of seconds in hh:mm:ss format

Original analytic.




We added a computed measure on top of the base measure holding the number of seconds, as follows:

cast(floor(({Seconds})/3600) as varchar(10))+':'+
cast(floor((({Seconds})%3600)/60) as varchar(10))+':'+
cast(((({Seconds})%3600)%60) as varchar(10))



Final output:



Displaying comparative sales information

Requirement: display sales numbers day-wise. Next to each day's sale, display the amount sold on the same day of the previous week. Also display the average of the last four same-day-of-week sales.

For instance, if we are looking at the sales numbers for Sep 27th (which is a Friday), the report should display the number for Sep 20th, along with the average of the numbers for Sep 20, Sep 13, Sep 6 and Aug 30.


This is one way:


select t1.CAFE_ID,t1.SALES_DATE,

sum(t1.GROSS_SALE) GROSS_SALE,
lag(sum(t1.GROSS_SALE),7,0) over (order by t1.CAFE_ID,t1.SALES_DATE) as GROSS_SALE_LAST_WEEK,
floor(
((lag(sum(t1.GROSS_SALE),7,0) over (order by t1.CAFE_ID,t1.SALES_DATE) +
lag(sum(t1.GROSS_SALE),14,0) over (order by t1.CAFE_ID,t1.SALES_DATE) +
lag(sum(t1.GROSS_SALE),21,0) over (order by t1.CAFE_ID,t1.SALES_DATE) +
lag(sum(t1.GROSS_SALE),28,0) over (order by t1.CAFE_ID,t1.SALES_DATE))/4)) GROSS_SALE_LAST_4_WEEK_AVG
from DS_DAILY_SALES t1
group by t1.CAFE_ID,t1.SALES_DATE;

This uses the LAG function to look back seven rows, i.e. one week.


However, on days when a store was closed there is no record for that cafe, so the logic of looking back 7 rows in the window doesn't always work. Even a single day off will make the report wrong.


An alternative:


select v1.CAFE_ID,v1.SALES_DATE,v1.GROSS_SALE, 

coalesce(v2.GROSS_SALE ,0) GROSS_SALE_LAST_WEEK,
floor((coalesce(v2.GROSS_SALE ,0)+coalesce(v3.GROSS_SALE ,0)+coalesce(v4.GROSS_SALE ,0)+coalesce(v5.GROSS_SALE ,0))/4) GROSS_SALE_LAST_4_WEEK_AVG
from
(select CAFE_ID,SALES_DATE,sum(GROSS_SALE) GROSS_SALE
from DS_DAILY_SALES
group by CAFE_ID,SALES_DATE) v1 left outer join (
select CAFE_ID,SALES_DATE,sum(GROSS_SALE) GROSS_SALE
from DS_DAILY_SALES
group by CAFE_ID,SALES_DATE) v2 on (v1.CAFE_ID = v2.CAFE_ID and v1.SALES_DATE = DATEADD(DAY,7,v2.SALES_DATE))
left outer join (
select CAFE_ID,SALES_DATE,sum(GROSS_SALE) GROSS_SALE
from DS_DAILY_SALES
group by CAFE_ID,SALES_DATE) v3 on (v1.CAFE_ID = v3.CAFE_ID and v1.SALES_DATE = DATEADD(DAY,14,v3.SALES_DATE))
left outer join (
select CAFE_ID,SALES_DATE,sum(GROSS_SALE) GROSS_SALE
from DS_DAILY_SALES
group by CAFE_ID,SALES_DATE) v4 on (v1.CAFE_ID = v4.CAFE_ID and v1.SALES_DATE = DATEADD(DAY,21,v4.SALES_DATE))
left outer join (
select CAFE_ID,SALES_DATE,sum(GROSS_SALE) GROSS_SALE
from DS_DAILY_SALES
group by CAFE_ID,SALES_DATE) v5 on (v1.CAFE_ID = v5.CAFE_ID and v1.SALES_DATE = DATEADD(DAY,28,v5.SALES_DATE))

This version uses left outer joins, so missing records don't make the report wrong.

Friday, 4 October 2013

Education Analytics - Need of the hour....



The role of education in the contemporary world is not mere knowledge creation but creating a ‘bent of mind’ that lets students adapt to a competitive environment. Not all science students end up as scientists, nor do all engineering students end up as engineers in their own discipline. A chemical engineer lands up as a Java programmer and a science graduate works for a courier firm, yet they still apply the basic concepts taught in their schools and colleges exceptionally well; that is the ‘bent of mind’ at work. In the previous century, the rank system was used to analyse students (rank as a form of analytics); then the grade system emerged (grade as a measure to analyse and categorise). With the growing complexity of measures and opportunities, richer analytics is now required to understand student aspirations, map their potential to the opportunities available, and estimate their probability of success.

The world is becoming more dynamic, volatile and complex today. It is a universal truth that “Education is the key for the growth of any individual in this complex world”. Students join the educational stream aspiring to become successful relative to their friends, family, peers and others.

The purpose of education now is to create the required ‘bent of mind’ for students to adapt and take up opportunities. This is what makes a chemistry graduate a successful accounting employee and an architect a successful Java programmer.

How long can we run successfully on the ‘bent of mind’ concept alone? The existing educational system was created long ago, largely to fill clerical jobs. Thanks to computerisation, many aspects of teaching have been computerised. Innovative teaching models like computer-based training have evolved over the past two decades, taking teaching methods to new heights. Objective-based learning has taken centre stage, and even competitive exams have been oriented towards objective evaluation. The point is that the world has recognised the need for objective-based learning, preparing students to win in competition and marking a clear shift from earlier models of teaching.

In summary, teaching methodology and the drive among the majority of students have considerably improved.

A couple of decades back, information about opportunities could be accessed only through word of mouth, newspapers or the library. Now a lot of public information is available on the web. Given this technological evolution, why can't we use technology to map opportunities to the individual potential of students, so they can excel in their aspirations?

Anyone can observe that various sports have started using analytics as a crucial element to beat the competition. Take cricket: a batsman's batting style is analysed from various angles, and the bowler plans his attack on the basis of those findings.

So, what does this analytics do?

·         Evaluates students across a set of more than 100 skills. It is also customisable: start with simply marks and attendance and expand to more skills.
·         Helps students, parents and teachers identify the strengths and weaknesses of the students and channel their efforts accordingly.
·         Gives a snapshot overview of student activities.
·         The analytics will be available on mobiles and can be used at any place and at any time.
·         Easy upload options: most schools use only Excel sheets, and the same Excel sheets can be uploaded for richer analytics.
·         Analytics can be expanded to integrate advanced statistical predictive models as well.
·         Improves the brand image of the school or college amongst the parent community.