Smoothing data by bin means, min-max normalization, z-score normalization and normalization by decimal scaling in Python

Problem Statement

The following data (in increasing order) are given for the attribute ‘age’: 13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70.
(a) Use smoothing by bin means to smooth these data, using a bin depth of 3.
(b) Use min-max normalization to transform the value 35 for age onto the range
[0.0, 1.0].
(c) Use z-score normalization to transform the value 35 for age, where the standard
deviation of age is 12.94 years.
(d) Use normalization by decimal scaling to transform the value 35 for age.
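As a quick hand check of parts (b) to (d), the standard formulas are worked out below. They use min = 13, max = 70, a sum of 809 over n = 27 ages (so the mean is 809/27 ≈ 29.963) and the given standard deviation of 12.94. For (d), j = 2 because 70/10² = 0.7 is below 1 while 70/10 = 7 is not.

```latex
\begin{aligned}
\text{(b) min-max: } v' &= \frac{v - \min_A}{\max_A - \min_A} = \frac{35 - 13}{70 - 13} = \frac{22}{57} \approx 0.386 \\
\text{(c) z-score: } v' &= \frac{v - \bar{A}}{\sigma_A} = \frac{35 - 809/27}{12.94} \approx \frac{5.037}{12.94} \approx 0.389 \\
\text{(d) decimal scaling: } v' &= \frac{v}{10^{j}} = \frac{35}{10^{2}} = 0.35
\end{aligned}
```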

● Algorithm

Input: the data set as data and the bin depth as depth
Output: the results of smoothing by bin means, min-max normalization, z-score normalization and normalization by decimal scaling
Data Structure: data is a list that stores the predefined (or user-entered) values; depth is an integer that stores the bin depth.
Description: Uses the mean() function from Python’s statistics module to compute the mean of the data for z-score normalization.

Step 1 :    Start
Step 2 :    data ← 13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35,
35, 36, 40, 45, 46, 52, 70
depth ← 3

Step 3 :    ans ← call bin with data and depth
Step 4 :    minAns ← call minMaxNor with 35 and data
Step 5 :    Display “The data after bin means smoothing: ”, ans
Step 6 :    Display “After doing min-max normalization: ”, minAns
Step 7 :    zNormalization ← call zNor with 35, the mean of data and 12.94
Display “After doing z-score normalization: ”, zNormalization

Step 8 :    decimalNormalization ← call decNor with 35 and the maximum value of data
Display “After doing normalization by decimal scaling: ”, decimalNormalization

Step 9 :    Stop

Algorithm for bin

Input: a list called list that holds the data, sorted in increasing order
An integer called depth that holds the bin depth

Output: Returns the smoothed data as a new list
Data Structure: maxSize, an integer that stores the length of list; newList, a list that stores the smoothed data.
Step 1 :    Start
Step 2 :    maxSize ← length of the list named list
Step 3 :    i ← 0
Step 4 :    Repeat steps 5 to 12 while i less than (maxSize-depth)+1
Step 5 :    sum ← 0
Step 6 :   Repeat step 7 for j ← i and j less than i+depth
Step 7 :    sum ← sum+list[j]
Step 8 :    ans ← sum divided by depth
Step 9 :    Round ans to 2 decimal places
Step 10 :   Repeat step 11 for j ← i and j less than i+depth
Step 11 :   append ans to newList
Step 12 :   i ← i+depth
Step 13 :   if maxSize mod depth is not equal to 0 (a last, shorter bin remains)
Then
i. sum ← 0
ii. Repeat step 13.iii for j ← i and j less than maxSize
iii. sum ← sum+list[j]
iv. ans ← sum divided by (maxSize − i)
v. Round ans to 2 decimal places
vi. Repeat step 13.vii for j ← i and j less than maxSize
vii. append ans to newList
Step 14 : Return newList
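The two passes above (full bins, then one shorter leftover bin) can be folded into a single loop, because a Python slice that runs past the end of a list simply comes back shorter. The sketch below assumes equal-depth bins, sorts its input itself instead of trusting the caller, and uses the hypothetical name smooth_by_bin_means; it is an illustration, not the program's bin function.

```python
def smooth_by_bin_means(values, depth):
    """Sort the values, cut them into bins of `depth` values each and
    replace every value by its bin's mean, rounded to 2 decimal places.
    A final bin with fewer than `depth` values is averaged on its own."""
    ordered = sorted(values)  # assumption: sort here rather than require sorted input
    smoothed = []
    for start in range(0, len(ordered), depth):
        bin_values = ordered[start:start + depth]  # the last slice may be shorter
        bin_mean = round(sum(bin_values) / len(bin_values), 2)
        smoothed.extend([bin_mean] * len(bin_values))
    return smoothed


ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
        33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]
print(smooth_by_bin_means(ages, 3))
```

For the age data this gives the same list as the program output, e.g. the first bin (13, 15, 16) becomes 44/3 ≈ 14.67.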

Algorithm for minMaxNor

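Input: a list called list that holds the data, sorted in increasing order

A number called num, the value to normalize
Output: Returns num mapped onto the range [0.0, 1.0]
Data Structure: ans, a floating-point number that stores the answer
Step 1 :    Start
Step 2 :    ans ← (num − list[0]) / (list[len(list)−1] − list[0]), rounded to 3 decimal places (because the list is sorted, list[0] is the minimum and list[len(list)−1] is the maximum)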
Step 3 :    Return ans
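minMaxNor maps only onto [0.0, 1.0] and relies on the list being sorted. A more general sketch of the usual formula v' = (v − min)/(max − min) · (new_max − new_min) + new_min, using min() and max() so the order of the data does not matter, might look like this (the name min_max_normalize and the zero-range check are my own assumptions):

```python
def min_max_normalize(v, values, new_min=0.0, new_max=1.0):
    """Map v linearly from [min(values), max(values)] onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # every value is identical, so the formula would divide by zero
        raise ValueError("min-max normalization needs at least two distinct values")
    return (v - lo) / (hi - lo) * (new_max - new_min) + new_min


ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
        33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]
print(round(min_max_normalize(35, ages), 3))  # 0.386
```

Passing new_min=-1.0 and new_max=1.0 maps the same age onto [-1.0, 1.0] instead.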

Algorithm for zNor

Input: the mean of the list as mean
The standard deviation as stdDv
A number called num
Output: Returns the z-score of num

Step 1 :    Start
Step 2 :    Return (num − mean)/stdDv, rounded to 3 decimal places
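The value 12.94 in the problem statement matches the sample standard deviation (divisor n − 1) of these 27 ages; the population standard deviation (divisor n) is about 12.70. The sketch below checks this with statistics.stdev and statistics.pstdev and then computes the z-score. z_score is a hypothetical helper that does the same thing as zNor without rounding.

```python
import statistics

ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
        33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]


def z_score(v, mean, sd):
    """Number of standard deviations that v lies above (or below) the mean."""
    return (v - mean) / sd


mean_age = statistics.mean(ages)          # 809 / 27, about 29.963
sample_sd = statistics.stdev(ages)        # divides by n - 1
population_sd = statistics.pstdev(ages)   # divides by n

print(round(sample_sd, 2), round(population_sd, 2))  # 12.94 12.7
print(round(z_score(35, mean_age, sample_sd), 3))    # 0.389
```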

 

Algorithm for decNor

Input: the maximum value of the list as maxNum

A number called num
Output: Returns num scaled down by a power of 10
Data Structure: digit, an integer that stores the number of digits in maxNum; div, an integer that stores the divisor

Step 1 :    Start
Step 2 :    digit ← number of digits in the integer part of |maxNum|
Step 3 :    div ← 10 to the power digit
Step 4 :    Return num/div
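Decimal scaling is usually defined as v' = v / 10^j, where j is the smallest integer such that max(|v'|) < 1. Counting the characters of str(maxNum) gives that j for a positive whole number such as 70, but not for a float such as 70.0 (str gives "70.0", four characters) or a negative maximum. The sketch below finds j straight from the definition over the whole list; the name decimal_scaling and the restriction to j ≥ 0 are my own assumptions.

```python
def decimal_scaling(v, values):
    """Return (v / 10**j, j), where j is the smallest non-negative integer
    such that every |x| / 10**j for x in values is below 1."""
    max_abs = max(abs(x) for x in values)
    j = 0
    # Increase j until the largest magnitude drops below 1
    while max_abs / 10 ** j >= 1:
        j += 1
    return v / 10 ** j, j


ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
        33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]
print(decimal_scaling(35, ages))  # (0.35, 2)
```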

'''
Problem Statement
Following data (in increasing order) is provided for the attribute ‘age’: 13, 15,
16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46,
52, 70.
(a) Use smoothing by bin means to smooth these data, using a bin depth of 3.
(b) Use min-max normalization to transform the value 35 for age onto the range
[0.0, 1.0].
(c) Use z-score normalization to transform the value 35 for age, where the standard
deviation of age is 12.94 years.
(d) Use normalization by decimal scaling to transform the value 35 for age.
'''


import statistics

def bin(data, depth):
    maxSize = len(data)
    i = 0
    newList = []
    # Full bins: each holds exactly `depth` consecutive values
    while i < maxSize - depth + 1:
        # Sum of a bin
        total = 0
        for j in range(i, i + depth):
            total += data[j]
        # Smooth the bin: every value becomes the bin mean
        ans = round(total / depth, 2)
        for j in range(i, i + depth):
            newList.append(ans)
        i += depth

    # If the number of values is not a multiple of depth,
    # average the leftover values as one shorter bin
    if maxSize % depth != 0:
        total = 0
        for j in range(i, maxSize):
            total += data[j]
        ans = round(total / (maxSize - i), 2)
        for j in range(i, maxSize):
            newList.append(ans)
    return newList


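def minMaxNor(num, data):
    # min() and max() work even if the data are not sorted
    ans = round((num - min(data)) / (max(data) - min(data)), 3)
    return ans

def zNor(num, mean, stdDv):
    return round((num - mean) / stdDv, 3)

def decNor(num, maxNum):
    # Count the digits of the integer part of |maxNum|, so 70.0 or -70 give 2
    digit = len(str(int(abs(maxNum))))
    div = pow(10, digit)
    return num / div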


data = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]
depth = 3

# To use your own data instead, uncomment the code below
'''
dataSetNo = int(input("Enter the number of data values:\t"))
data = []
for _ in range(dataSetNo):
    data.append(float(input("Enter data:\t")))
data.sort()  # smoothing by bin means expects the data in increasing order
depth = int(input("Enter the bin depth:\t"))
'''

ans = bin(data, depth)
minAns = minMaxNor(35, data)
print("The data after bin means smoothing : \n", ans)
print("\nAfter doing min-max normalization : \t", minAns)
zNormalization = zNor(35, statistics.mean(data), 12.94)
print("\nAfter doing z-score normalization : \t", zNormalization)
decimalNormalization = decNor(35, max(data))
print("\nAfter doing normalization by decimal scaling :\t", decimalNormalization)

Output:

The data after bin means smoothing :
[14.67, 14.67, 14.67, 18.33, 18.33, 18.33, 21.0, 21.0, 21.0, 24.0, 24.0, 24.0, 26.67, 26.67, 26.67,
33.67, 33.67, 33.67, 35.0, 35.0, 35.0, 40.33, 40.33, 40.33, 56.0, 56.0, 56.0]
After doing min-max normalization : 0.386
After doing z-score normalization : 0.389
After doing normalization by decimal scaling : 0.35

● Discussion:

Here I used Python’s built-in statistics module. Its mean() function computes the mean of the data for z-score normalization, which kept the program short and saved writing that calculation by hand.
