Estimating standard deviation of very coarsely binned data

To confirm: that has fixed it.

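I used a rather hacky workaround: before binning, I add one dummy datum per bin to the raw data, and after counting I subtract one from each bin count. That way every bin appears in the result, even if no real data fall in it.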

There are still rather a lot of divergences and warnings, but I think I can live with that for the time being.

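import numpy as np
import pandas as pd

def data_to_bincounts(data, cutpoints):
    # Bin edges including the outer bounds: 0.0, the cutpoints, then V0.
    # V0 (the upper edge of the last bin) is assumed to be defined elsewhere.
    edges = np.append(0.0, np.append(cutpoints, V0))

    # one dummy datum at the midpoint of every bin, so no bin is ever empty
    xdummy = (edges[:-1] + edges[1:]) / 2

    xf = np.append(data, xdummy)
    # categorise each datum into the correct bin
    bins = np.digitize(xf, bins=cutpoints)
    # bin counts, still including one dummy per bin
    counts = pd.DataFrame({"bins": bins}).groupby(by="bins")["bins"].agg("count")

    # remove the dummy count from every bin
    return counts - 1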
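
For anyone who would rather avoid the dummy data, here is a possible alternative sketch, not what I actually ran. It assumes the only goal is to keep empty bins as explicit zeros, and uses `np.bincount` with `minlength` for that. The function name `bincounts_with_empty_bins` is made up for illustration, and it returns a NumPy array rather than a pandas Series.

```python
import numpy as np

def bincounts_with_empty_bins(data, cutpoints):
    """Count data per bin, keeping empty bins as explicit zeros."""
    cutpoints = np.asarray(cutpoints)
    # bin index for each datum: 0 .. len(cutpoints)
    bins = np.digitize(data, bins=cutpoints)
    # minlength pads the result so empty trailing bins are reported as 0
    return np.bincount(bins, minlength=len(cutpoints) + 1)

counts = bincounts_with_empty_bins([0.5, 0.7, 3.2], cutpoints=[1.0, 2.0, 3.0])
print(counts)  # [2 0 0 1]
```

This does not need `V0` or the bin midpoints at all, since nothing is added to the data.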