● Problem Statement
Suppose that a hospital tested the age and body fat data for 18 randomly selected adults
with the following results:
Age 23 23 27 27 39 41 47 49 50
%fat 9.5 26.5 7.8 17.8 31.4 25.9 27.4 27.2 31.2
Age 52 54 54 56 57 58 58 60 61
%fat 34.6 42.5 28.8 33.4 30.2 34.1 32.9 41.2 35.7
(a) Normalize the two attributes age and %fat based on z-score normalization.
(b) Calculate the correlation coefficient (Pearson’s product moment coefficient). Are
these two attributes positively or negatively correlated? Compute their covariance.
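For reference, these are the standard definitions of the quantities asked for in (a) and (b). Here x = age, y = %fat and n = 18. The covariance is written in its population form, dividing by n. A sample form dividing by n - 1 is also common. The same choice arises for the standard deviation s_x, and the program below computes it with n - 1. The sign of r (and of the covariance) tells whether the attributes are positively or negatively correlated.

```latex
\begin{aligned}
z_i &= \frac{x_i - \bar{x}}{s_x} \\
r_{x,y} &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
               {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} \\
\operatorname{Cov}(x, y) &= \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})
\end{aligned}
```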
● Algorithm
Input: the data sets dataSetAgeArray and dataSetFatArray,
and dataSetNo, the number of elements in each list
Output: the z-score-normalized age and %fat values and the Pearson correlation coefficient
Data Structure: dataSetAgeArray and dataSetFatArray are lists that store the
predefined (or user-entered) data.
Description: uses functions from Python's statistics module and the scipy package to
compute the mean, standard deviation and correlation coefficient with less code.
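The Description relies on functions from Python's statistics module. It matters which standard deviation they compute. statistics.stdev is the sample standard deviation (it divides by n - 1), while statistics.pstdev divides by n. The small sketch below uses only the ages from the problem to show the difference. Swapping in pstdev would shift the normalized values slightly. The variable names here are illustrative and do not come from the original program.

```python
import statistics

# Ages from the problem statement
ages = [23, 23, 27, 27, 39, 41, 47, 49, 50,
        52, 54, 54, 56, 57, 58, 58, 60, 61]

mean_age = statistics.mean(ages)         # arithmetic mean
sample_sd = statistics.stdev(ages)       # divides by n - 1 (what the report's program uses)
population_sd = statistics.pstdev(ages)  # divides by n

print(round(mean_age, 2), round(sample_sd, 2), round(population_sd, 2))
# → 46.44 13.22 12.85
```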
Step 1 : Start
Step 2 : dataSetNo ← 18
Step 3 : dataSetAgeArray ← 23,23,27,27,39,41,47,49,50,52,54,54,56,57,58,58,60,61
Step 4 : dataSetFatArray ← 9.5,26.5,7.8,17.8,31.4,25.9,27.4,
27.2,31.2,34.6,42.5,28.8,33.4,30.2,34.1,32.9,41.2,35.7
Step 5 : Display “Calculating z-score normalization”; newAgeNor ← [ ]; newFatNor ← [ ]
Step 6 : meanValueAge ← call mean function with dataSetAgeArray from the statistics package
Step 7 : stDVAge ← call stdev function with dataSetAgeArray from the statistics package
Step 8 : meanValueFat ← call mean function with dataSetFatArray from the statistics package
Step 9 : stDVFat ← call stdev function with dataSetFatArray from the statistics package
Step 10 : Repeat steps 11 to 14 for i ← 0 to dataSetNo - 1
Step 11 : normalizedData ← call zNor with dataSetAgeArray[i],meanValueAge,stDVAge
Step 12 : add normalizedData to newAgeNor
Step 13 : normalizedData ← call zNor with dataSetFatArray[i],meanValueFat,stDVFat
Step 14 : add normalizedData to newFatNor
Step 15 : Display “After doing z-score normalization on Ages” newAgeNor
Step 16 : Display “After doing z-score normalization on Fat” newFatNor
Step 17 : corr ← call pearsonr from scipy.stats with dataSetAgeArray, dataSetFatArray
Step 18 : Display “Pearsons correlation: ” corr[0]
Step 19 : Stop
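Steps 10 to 14 apply zNor to every element of both lists. The same loop can be written more compactly with a list comprehension. normalize_all is a hypothetical helper name. Like the program, it uses the sample standard deviation and rounds to 2 decimal places. The first few values agree with the Output section below.

```python
import statistics

ages = [23, 23, 27, 27, 39, 41, 47, 49, 50,
        52, 54, 54, 56, 57, 58, 58, 60, 61]
fat = [9.5, 26.5, 7.8, 17.8, 31.4, 25.9, 27.4, 27.2, 31.2,
       34.6, 42.5, 28.8, 33.4, 30.2, 34.1, 32.9, 41.2, 35.7]


def normalize_all(values):
    """Hypothetical helper: z-score every value (sample stdev), rounded to 2 places."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [round((v - mean) / sd, 2) for v in values]


newAgeNor = normalize_all(ages)
newFatNor = normalize_all(fat)
print(newAgeNor[:3])  # → [-1.77, -1.77, -1.47]
print(newFatNor[:3])  # → [-2.08, -0.25, -2.27]
```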
Algorithm for zNor
Input: mean, the mean value of the list
stdDv, the standard deviation of the list
num, the value (age or %fat) to be normalized
Output: the z-score of num
Step 1: Start
Step 2 : Return (num - mean) / stdDv, rounded to 2 decimal places
Step 3 : Stop
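As a hand check of zNor, take the first age (23). Use the mean of the 18 ages and their sample standard deviation (the one statistics.stdev returns). The intermediate sums are computed from the data in the problem statement. The result agrees with the first value in the program's output.

```latex
\begin{aligned}
\bar{x} &= \frac{836}{18} \approx 46.44, \qquad
s_x = \sqrt{\frac{\sum_{i}(x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{2970.44}{17}} \approx 13.22 \\
z_{23} &= \frac{23 - 46.44}{13.22} \approx -1.77
\end{aligned}
```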
● Source Code
import statistics
from scipy.stats import pearsonr


def zNor(num, mean, stdDv):
    # z-score of num, rounded to 2 decimal places
    return round((num - mean) / stdDv, 2)


# Predefined data from the problem statement
dataSetNo = 18
dataSetAgeArray = [23, 23, 27, 27, 39, 41, 47, 49, 50, 52, 54, 54, 56, 57, 58, 58, 60, 61]
dataSetFatArray = [9.5, 26.5, 7.8, 17.8, 31.4, 25.9, 27.4, 27.2, 31.2,
                   34.6, 42.5, 28.8, 33.4, 30.2, 34.1, 32.9, 41.2, 35.7]

# Alternative (disabled): read the data from the user instead of using the lists above
'''
dataSetNo = int(input("Enter the number of data points:\t"))
dataSetAgeArray = []
dataSetFatArray = []
for data in range(dataSetNo):
    dataSetAgeArray.append(float(input("Enter age: ")))
    dataSetFatArray.append(float(input("Enter %fat: ")))
'''

print("\nCalculating z-score normalization")
newAgeNor = []
newFatNor = []

# Mean and sample standard deviation (divides by n - 1) of each attribute
meanValueAge = statistics.mean(dataSetAgeArray)
stDVAge = statistics.stdev(dataSetAgeArray)
meanValueFat = statistics.mean(dataSetFatArray)
stDVFat = statistics.stdev(dataSetFatArray)

# Normalize every age and %fat value
for i in range(dataSetNo):
    normalizedData = zNor(dataSetAgeArray[i], meanValueAge, stDVAge)
    newAgeNor.append(normalizedData)
    normalizedData = zNor(dataSetFatArray[i], meanValueFat, stDVFat)
    newFatNor.append(normalizedData)

print("After doing z-score normalization on Ages\n", newAgeNor)
print("After doing z-score normalization on Fat\n", newFatNor)

# pearsonr's result behaves like (r, p-value); corr[0] is the correlation coefficient r
corr = pearsonr(dataSetAgeArray, dataSetFatArray)
print("Pearsons correlation:\t%.2f" % corr[0])
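Part (b) also asks for the covariance, which the source code above does not compute. The sketch below does it in plain Python with no scipy. Its variable names are illustrative and not taken from the original program. It computes both the population form, which divides by n, and the sample form, which divides by n - 1 to match statistics.stdev.

```python
ages = [23, 23, 27, 27, 39, 41, 47, 49, 50,
        52, 54, 54, 56, 57, 58, 58, 60, 61]
fat = [9.5, 26.5, 7.8, 17.8, 31.4, 25.9, 27.4, 27.2, 31.2,
       34.6, 42.5, 28.8, 33.4, 30.2, 34.1, 32.9, 41.2, 35.7]

n = len(ages)
mean_age = sum(ages) / n
mean_fat = sum(fat) / n

# Sum of products of each pair's deviations from the two means
cross = sum((a - mean_age) * (f - mean_fat) for a, f in zip(ages, fat))

cov_population = cross / n    # population covariance (divide by n)
cov_sample = cross / (n - 1)  # sample covariance (divide by n - 1)

print(round(cov_population, 2), round(cov_sample, 2))  # → 94.46 100.02
```

Either form of the covariance is positive, which is consistent with the positive correlation coefficient.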
● Output:
Calculating z-score normalization
After doing z-score normalization on Ages
[-1.77, -1.77, -1.47, -1.47, -0.56, -0.41, 0.04, 0.19, 0.27, 0.42, 0.57, 0.57, 0.72, 0.8, 0.87, 0.87,
1.03, 1.1]
After doing z-score normalization on Fat
[-2.08, -0.25, -2.27, -1.19, 0.28, -0.31, -0.15, -0.17, 0.26, 0.63, 1.48, 0.0, 0.5, 0.15, 0.57, 0.44,
1.34, 0.75]
Pearsons correlation: 0.82
Process finished with exit code 0
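The output reports r = 0.82 but does not state the direction asked for in (b). Pearson's r also equals the sum of products of the unrounded sample z-scores, divided by n - 1. The sketch below recomputes r that way in plain Python and reports its sign. zscores is a hypothetical helper name.

```python
import math

ages = [23, 23, 27, 27, 39, 41, 47, 49, 50,
        52, 54, 54, 56, 57, 58, 58, 60, 61]
fat = [9.5, 26.5, 7.8, 17.8, 31.4, 25.9, 27.4, 27.2, 31.2,
       34.6, 42.5, 28.8, 33.4, 30.2, 34.1, 32.9, 41.2, 35.7]


def zscores(values):
    """Hypothetical helper: unrounded z-scores using the sample standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


za = zscores(ages)
zf = zscores(fat)
# Pearson's r as the sum of z-score products divided by n - 1
r = sum(a * f for a, f in zip(za, zf)) / (len(ages) - 1)

direction = "positively" if r > 0 else "negatively" if r < 0 else "not linearly"
print(f"r = {r:.2f}; age and %fat are {direction} correlated")
# → r = 0.82; age and %fat are positively correlated
```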
● Discussion:
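Here I have used Python's statistics module and the scipy package for their functions
(mean, stdev and pearsonr). Using these functions made the program shorter and saved a lot
of time. The correlation coefficient of 0.82 is positive, so age and %fat are positively correlated.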