import warnings
warnings.filterwarnings("ignore")
Out[2]: 1 225
2 81
Name: surv_status, dtype: int64
The code below was taken from the following link:
--[https://stackoverflow.com/questions/17679089/pandas-dataframe-groupby-two-columns-and-get-counts]
I wanted to see how the data looks when it is grouped by multiple columns. After trying many column combinations, I settled on
surv_status & axil_nodes:
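A minimal sketch of such a two-column grouping, assuming a DataFrame named `haber` with `surv_status` and `axil_nodes` columns. The rows below are made-up sample values for illustration, not the real data set:

```python
import pandas as pd

# Hypothetical sample rows for illustration only (not the real data set).
haber = pd.DataFrame({
    'surv_status': [1, 1, 1, 1, 2, 2, 2],
    'axil_nodes':  [0, 0, 3, 7, 0, 9, 9],
})

# Count the records for every (surv_status, axil_nodes) combination.
counts = (
    haber.groupby(['surv_status', 'axil_nodes'])
         .size()
         .reset_index(name='count')
)
print(counts)
```

Each row of `counts` is one combination of the two columns together with the number of records that share it.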
Observation 2:
The grouped counts suggest that survival status is related to the number of Axil Nodes.
For surv_status = 1 (assuming that it corresponds to Survival Records): out of 225 records in total, 190 have axil_nodes <= 5.
That is close to 84% of Survival Records.
For surv_status = 2 (assuming that it corresponds to Non Survival Records): out of 81 records in total, 35 have axil_nodes > 5.
That is close to 43% of Non Survival Records.
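The shares in Observation 2 can be computed per class with a boolean mask. This is a sketch that assumes the same `haber` column names and uses made-up sample rows. The last lines only redo the arithmetic on the counts reported above:

```python
import pandas as pd

# Hypothetical sample rows for illustration only (not the real data set).
haber = pd.DataFrame({
    'surv_status': [1, 1, 1, 1, 2, 2, 2, 2],
    'axil_nodes':  [0, 2, 5, 8, 1, 6, 9, 12],
})

# Fraction of records with axil_nodes <= 5 within each survival class.
low_node_share = (haber['axil_nodes'] <= 5).groupby(haber['surv_status']).mean()
print(low_node_share)

# Percentages implied by the counts reported in Observation 2.
survived_low_pct = round(190 / 225 * 100, 1)
not_survived_high_pct = round(35 / 81 * 100, 1)
print(survived_low_pct, not_survived_high_pct)
```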
Observation 3: The patients' ages range from 30 to 83. People with fewer than 5 Axil Nodes have a better chance of survival. Out of
281 people with fewer than 5 Axil Nodes, 190 survived and 92 didn't. (Survived % : 67 & Non Survived % : 32)
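The survival split among low-node patients can be checked by filtering first and then normalising the class counts. This is a sketch on made-up sample rows, assuming the same column names. Note that it uses a strict `< 5`, whereas Observation 2 uses `<= 5`:

```python
import pandas as pd

# Hypothetical sample rows for illustration only (not the real data set).
haber = pd.DataFrame({
    'surv_status': [1, 1, 1, 2, 1, 2],
    'axil_nodes':  [0, 1, 4, 2, 10, 7],
})

# Keep only the patients with fewer than 5 axil nodes.
low_nodes = haber[haber['axil_nodes'] < 5]

# Share of each survival class among those patients.
outcome_share = low_nodes['surv_status'].value_counts(normalize=True)
print(len(low_nodes))
print(outcome_share)
```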
2D Scatter Plot :
In [4]: sn.set_style('whitegrid')
sn.FacetGrid(haber, hue = 'surv_status', height = 5) \
.map(plt.scatter, 'age', 'axil_nodes')
plt.legend()
plt.title('2-D Scatter Plot')
plt.show()
Observation 4:
A good number of people were diagnosed with cancer while having no Axil Nodes (axil_nodes = 0). Most of them survived, but a few didn't.
All of the people who underwent treatment at the age of 30 or 31 survived.
Pair Plot :
In [5]: plt.close()
sn.set_style('whitegrid')
sn.pairplot(haber,vars = ['age', 'op_year','axil_nodes'] ,height = 3)
plt.suptitle('Pair-Plots')
plt.show()
Observation 5:
Axil Nodes looks like a better attribute for the analysis than age & op_year.
plt.plot(surv_status_1['axil_nodes'], np.zeros_like(surv_status_1['axil_nodes']),'o')
#plt.plot(surv_status_2['axil_nodes'], np.zeros_like(surv_status_2['axil_nodes']),'o')
plt.title('1-D Scatter Plot Survival Records')
plt.xlabel('Axil Nodes')
plt.ylabel('Units')
plt.show()
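The cell above uses `surv_status_1` (and, in the commented-out line, `surv_status_2`) without showing how they are defined. Presumably they are the per-class subsets of `haber`. A sketch under that assumption, with made-up sample rows:

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows for illustration only (not the real data set).
haber = pd.DataFrame({
    'surv_status': [1, 2, 1, 1, 2],
    'axil_nodes':  [0, 4, 1, 13, 22],
})

# Assumed definitions: one subset per survival class.
surv_status_1 = haber.loc[haber['surv_status'] == 1]
surv_status_2 = haber.loc[haber['surv_status'] == 2]

# A y-value of zero for every record puts all points on one horizontal
# line, which is what makes the scatter plot one-dimensional.
y_zeros = np.zeros_like(surv_status_1['axil_nodes'])
print(len(surv_status_1), len(surv_status_2))
```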
Observation 6:
There are a lot of survival records with fewer than 18 Axil Nodes (approx.).
Observation 7:
There are a lot of non survival records with fewer than 15 Axil Nodes (approx.).
Observation 8:
We can clearly see that there are a lot of survivors who had Axil Nodes = 0.
Observation 9:
The two classes overlap too much here to separate them, but there were many Non Survival Records between the ages of 40 & 50.
Observation 10:
The bulk of the surgeries were performed between 1958 & 1968.
*******************SURVIVAL DATA*********************
Counts : [0.09951691 0.00531401 0.00241546 0.00096618 0.00048309]
Sum : 0.10869565217391305
PDF : [0.91555556 0.04888889 0.02222222 0.00888889 0.00444444]
Bin Edges : [ 0. 9.2 18.4 27.6 36.8 46. ]
CDF : [0.91555556 0.96444444 0.98666667 0.99555556 1. ]
*******************NON SURVIVAL DATA*********************
Counts : [0.0688509 0.01780627 0.00712251 0.00118708 0.00118708]
Sum : 0.09615384615384613
PDF : [0.71604938 0.18518519 0.07407407 0.01234568 0.01234568]
Bin Edges : [ 0. 10.4 20.8 31.2 41.6 52. ]
CDF : [0.71604938 0.90123457 0.97530864 0.98765432 1. ]
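The output above is consistent with a five-bin `np.histogram` call using `density=True`, followed by a normalisation and a cumulative sum. The reported Sum equals one over the bin width (1 / 9.2 ≈ 0.1087 and 1 / 10.4 ≈ 0.0962). A sketch of that computation, assuming this is how the values were produced and using made-up node counts:

```python
import numpy as np

# Hypothetical node counts for illustration only (not the real data set).
axil_nodes = np.array([0, 0, 0, 1, 1, 2, 3, 4, 8, 12, 20, 46])

# density=True returns bar heights whose area (height * bin width) is 1,
# so for equal-width bins their plain sum is 1 / bin_width, not 1.
counts, bin_edges = np.histogram(axil_nodes, bins=5, density=True)

# Dividing by the sum turns the heights into per-bin probabilities (PDF).
pdf = counts / counts.sum()

# The running total of the PDF is the CDF; its last value is 1.
cdf = np.cumsum(pdf)

print('Counts    :', counts)
print('Sum       :', counts.sum())
print('PDF       :', pdf)
print('Bin Edges :', bin_edges)
print('CDF       :', cdf)
```

Read this way, the first PDF entry in the output above says that about 91.6% of survival records fall in the first bin (fewer than 9.2 nodes), against about 71.6% of non survival records (fewer than 10.4 nodes).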
Box Plots :
In [16]: sn.boxplot(x = 'surv_status', y = 'axil_nodes',data = haber)
plt.title('Box Plots : Axil_Nodes VS Surv_Status')
plt.show()
Observation 11:
Patients who had no axil nodes (Axil Nodes = 0) had a greater chance of survival.
Observation 12:
The plot shows that all of the people who were treated before the age of 34 survived.
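A claim like this can also be checked directly on the data rather than read off a plot. This is a sketch on made-up sample rows, assuming `age` and `surv_status` columns with 2 marking non survival:

```python
import pandas as pd

# Hypothetical sample rows for illustration only (not the real data set).
haber = pd.DataFrame({
    'age':         [30, 31, 33, 34, 40, 52],
    'surv_status': [1, 1, 1, 2, 1, 2],
})

# Youngest age among the non survival records.
youngest_non_survivor = haber.loc[haber['surv_status'] == 2, 'age'].min()

# True if every record below the age cutoff is a survival record.
age_cutoff = 34
under_cutoff = haber[haber['age'] < age_cutoff]
all_survived = bool((under_cutoff['surv_status'] == 1).all())

print(youngest_non_survivor, all_survived)
```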
Violin Plots :
In [19]: sn.violinplot(x = 'surv_status', y = 'op_year', data = haber)
plt.title('Violin Plots : Surv_Status VS Op_Year')
plt.show()
Observation 13:
The plot shows that most of the surgeries took place between 1960 and 1965.
Conclusion :
People with fewer Axil Nodes have a good chance of survival.
All of the people who were treated before the age of 34 survived.
Most of the surgeries took place between 1960 and 1965.