● Problem Statement
The following data (in increasing order) are provided for the attribute ‘age’: 13, 15,
16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46,
52, 70.
(a) Use smoothing by bin means to smooth these data, using a bin depth of 3.
(b) Use min-max normalization to transform the value 35 for age onto the range
[0.0, 1.0].
(c) Use z-score normalization to transform the value 35 for age, where the standard
deviation of age is 12.94 years.
(d) Use normalization by decimal scaling to transform the value 35 for age.
● Algorithm
Input : A data set of elements, data, and the bin depth, depth
Output: Displays the results of smoothing by bin means, min-max normalization, z-score
normalization and normalization by decimal scaling.
Data Structure: data is a list that stores the user-entered or predefined values. depth is
an integer that stores the depth of each bin.
Description: The mean function from Python's statistics module is used to compute the
mean of the data, which keeps the program short.
Step 1 : Start
Step 2 : data ← 13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35,
35, 36, 40, 45, 46, 52, 70
depth ← 3
Step 3 : ans ← call bin with data and depth
Step 4 : minAns ← call minMaxNor with 35 and data
Step 5 : Display “The data after bin means smoothing : ” ans
Step 6 : Display “After doing min-max normalization : ” minAns
Step 7 : zNormalization ← call zNor with 35, the mean of data and 12.94
Display “After doing z-score normalization : ”, zNormalization
Step 8 : decimalNormalitation ← call decNor with 35 and the maximum value of data
Display “After doing normalization by decimal scaling : ”, decimalNormalitation
Step 9 : Stop
Algorithm for bin
Input: list, the list of data values to smooth
depth, an integer that stores the depth of each bin
Output : Returns the list of smoothed values
Data Structure: maxSize is an integer that stores the length of list; newList is a list that
stores the smoothed data.
Step 1 : Start
Step 2 : maxSize ← length of the list named list
Step 3 : i ← 0, newList ← empty list
Step 4 : Repeat steps 5 to 12 while i is less than (maxSize-depth)+1
Step 5 : sum ← 0
Step 6 : Repeat step 7 for j from i while j is less than i+depth
Step 7 : sum ← sum+list[j]
Step 8 : ans ← sum divided by depth
Step 9 : Round ans to 2 decimal places
Step 10 : Repeat step 11 for j from i while j is less than i+depth
Step 11 : append ans to newList
Step 12 : i ← i+depth
Step 13 : If maxSize mod depth is not equal to 0 (a partial bin is left over)
Then
i. sum ← 0
ii. Repeat step 13.iii for j from i while j is less than maxSize
iii. sum ← sum+list[j]
iv. ans ← sum divided by (maxSize - i)
v. Round ans to 2 decimal places
vi. Repeat step 13.vii for j from i while j is less than maxSize
vii. append ans to newList
Step 14 : Return newList
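The steps above can also be sketched more compactly with list slicing. This is only an illustrative sketch, not the program used in this report: the function name smooth_by_bin_means is hypothetical, and it assumes the input is already sorted and that a shorter final bin is smoothed by its own mean, as in Step 13.

```python
def smooth_by_bin_means(values, depth):
    """Replace each value with the mean of its equal-depth bin (assumes sorted input)."""
    smoothed = []
    for start in range(0, len(values), depth):
        chunk = values[start:start + depth]        # the last chunk may be shorter than depth
        chunk_mean = round(sum(chunk) / len(chunk), 2)
        smoothed.extend([chunk_mean] * len(chunk))  # one smoothed value per original value
    return smoothed


# Seven values with depth 3: two full bins and one leftover value.
print(smooth_by_bin_means([13, 15, 16, 16, 19, 20, 20], 3))
# [14.67, 14.67, 14.67, 18.33, 18.33, 18.33, 20.0]
```

Slicing handles the leftover bin automatically, so the separate Step 13 branch is not needed in this form.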
Algorithm for minMaxNor
Input: list, the list of data values, sorted in increasing order
num, the value to normalize
Output : Returns num normalized onto the range [0.0, 1.0]
Data Structure : ans is a real number that stores the answer
Step 1: Start
Step 2 : ans ← (num-list[0])/(list[len(list)-1]-list[0]), rounded to 3 decimal places
Step 3 : Return ans
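The formula in Step 2 takes the minimum and maximum from the first and last elements, so it depends on the list being sorted, and it always maps onto [0.0, 1.0]. A minimal sketch of the general form is given below, assuming the standard min-max formula with a target range [new_min, new_max]; the function name min_max_normalize and its parameters are hypothetical and do not appear in the program.

```python
def min_max_normalize(value, old_min, old_max, new_min=0.0, new_max=1.0):
    """Map value from [old_min, old_max] onto [new_min, new_max]."""
    if old_max == old_min:
        raise ValueError("old_min and old_max must differ")
    scaled = (value - old_min) / (old_max - old_min)   # position within the old range, 0..1
    return scaled * (new_max - new_min) + new_min


ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
        33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]

# min() and max() remove the need for the list to be sorted.
print(round(min_max_normalize(35, min(ages), max(ages)), 3))   # 0.386
```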
Algorithm for zNor
Input: mean, the mean value of the list
stdDv, the standard deviation
num, the value to normalize
Output : Returns the z-score of num
Step 1: Start
Step 2 : Return (num-mean)/stdDv, rounded to 3 decimal places
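The problem supplies the standard deviation (12.94) as a constant. As a cross-check, the sketch below computes it from the data with statistics.stdev, which returns the sample standard deviation (divisor n-1); for this data set that value rounds to 12.94. The names z_score, mu and sigma are hypothetical and are used only for this illustration.

```python
import statistics

ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
        33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]


def z_score(value, mean, std_dev):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / std_dev


mu = statistics.mean(ages)
sigma = statistics.stdev(ages)           # sample standard deviation (n - 1 divisor)

print(round(sigma, 2))                     # 12.94
print(round(z_score(35, mu, sigma), 3))    # 0.389
```

Using statistics.pstdev (population standard deviation, divisor n) instead would give a slightly smaller value, so the given 12.94 corresponds to the sample form.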
Algorithm for decNor
Input: maxNum, the maximum value of the list
num, the value to normalize
Output : Returns num scaled by a power of ten
Data Structure : div is an integer that stores the power of ten used as the divisor
digit is an integer that stores the number of digits in maxNum
Step 1: Start
Step 2 : digit ← len(str(maxNum))
Step 3 : div ← 10 to the power digit
Step 4 : Return num/div
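Step 2 counts the characters of maxNum, which gives the right exponent for a positive integer such as 70 but would miscount for a negative or fractional maximum. A hedged alternative is sketched below, assuming the usual definition of decimal scaling: divide by 10^j, where j is the smallest integer that makes the largest absolute value less than 1. The function name decimal_scale and the parameter max_abs are hypothetical.

```python
def decimal_scale(value, max_abs):
    """Divide value by 10**j, with j the smallest integer such that max_abs / 10**j < 1."""
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return value / 10 ** j


print(decimal_scale(35, 70))   # 0.35
```

For a data set that contains negative values, max_abs would be the largest absolute value, for example max(abs(v) for v in data).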
'''
Problem Statement
Following data (in increasing order) is provided for the attribute ‘age’: 13, 15,
16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46,
52, 70.
(a) Use smoothing by bin means to smooth these data, using a bin depth of 3.
(b) Use min-max normalization to transform the value 35 for age onto the range
[0.0, 1.0].
(c) Use z-score normalization to transform the value 35 for age, where the standard
deviation of age is 12.94 years.
(d) Use normalization by decimal scaling to transform the value 35 for age.
'''
import statistics


def bin(list, depth):
    maxSize = len(list)
    i = 0
    newList = []
    # Process every full bin of size depth
    while i < maxSize - depth + 1:
        sum = 0
        # Sum of a bin
        for j in range(i, i + depth):
            sum += list[j]
        # Smoothing a bin
        ans = sum / depth
        ans = round(ans, 2)
        # Smoothed data in a list
        for j in range(i, i + depth):
            newList.append(ans)
        i += depth
    # If the number of elements in list is not a multiple of depth,
    # smooth the leftover elements as one smaller bin
    if maxSize % depth != 0:
        sum = 0
        for j in range(i, maxSize):
            sum += list[j]
        ans = sum / (maxSize - i)
        ans = round(ans, 2)
        # Smoothed data in a list
        for j in range(i, maxSize):
            newList.append(ans)
    return newList


def minMaxNor(num, list):
    # list must be sorted: list[0] is the minimum, list[-1] the maximum
    ans = round((num - list[0]) / (list[len(list) - 1] - list[0]), 3)
    return ans


def zNor(num, mean, stdDv):
    return round((num - mean) / stdDv, 3)


def decNor(num, maxNum):
    # Number of digits in the (positive integer) maximum
    digit = len(str(maxNum))
    div = pow(10, digit)
    return num / div


list = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
        33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]
depth = 3

# If user-entered data is needed, uncomment the code below
'''
dataSetNo = int(input("Enter the number of data values:\t"))
list = []
for _ in range(dataSetNo):
    # Whole-number ages; decNor counts the digits of an integer maximum
    list.append(int(input("Enter data:\t")))
list.sort()
depth = int(input("Enter the bin depth:\t"))
'''

ans = bin(list, depth)
minAns = minMaxNor(35, list)
print("The data after bin means smoothing : \n", ans)
print("\nAfter doing min-max normalization : \t", minAns)
zNormalization = zNor(35, statistics.mean(list), 12.94)
print("\nAfter doing z-score normalization : \t", zNormalization)
decimalNormalitation = decNor(35, max(list))
print("\nAfter doing normalization by decimal scaling :\t", decimalNormalitation)
Output:
The data after bin means smoothing :
[14.67, 14.67, 14.67, 18.33, 18.33, 18.33, 21.0, 21.0, 21.0, 24.0, 24.0, 24.0, 26.67, 26.67, 26.67,
33.67, 33.67, 33.67, 35.0, 35.0, 35.0, 40.33, 40.33, 40.33, 56.0, 56.0, 56.0]
After doing min-max normalization : 0.386
After doing z-score normalization : 0.389
After doing normalization by decimal scaling : 0.35
● Discussion:
Here I used Python's statistics module for its mean function instead of computing the
mean by hand. This made the program shorter and saved time.