When the observed variable has an extra singleton dimension incorrect results

can be produced. This can lead to unnoticed and subtle bugs. There should be

some form of warning/error when the observed variable does not have the same

number of dimensions as would be observed if data were sampled from the model.

This example illustrates the problem in a very simple example of observing

a single example from a Gaussian likelihood with Gaussian prior:

```
import numpy as np
import pymc3 as pm
n_iter = 1000
N = 10
np.random.seed(1234)
X = np.random.randn(N, 1)
with pm.Model() as model_bugged:
mu = pm.Normal('mu', 0.0, sd=3.0, shape=(N,))
pm.Normal('X', mu=mu, tau=2.0, observed=X)
with model_bugged:
tr_bugged = pm.sampling.sample(draws=n_iter)
with pm.Model() as model_ok:
mu = pm.Normal('mu', 0.0, sd=3.0, shape=(N,))
pm.Normal('X', mu=mu, tau=2.0, observed=X.squeeze(1))
with model_ok:
tr_ok = pm.sampling.sample(draws=n_iter)
exact_tau = (1.0 / 9.0) + 2.0
exact_sd = np.sqrt(1.0 / exact_tau)
exact_mean = X.squeeze(1) * (2.0 / exact_tau)
print 'exact answer'
print exact_mean
print exact_sd * np.ones(N)
print 'ok'
print np.mean(tr_ok['mu'], axis=0)
print np.std(tr_ok['mu'], axis=0)
print 'bugged'
print np.mean(tr_bugged['mu'], axis=0)
print np.std(tr_bugged['mu'], axis=0)
```

which for me produces

```
exact answer
[ 0.44662279 -1.12829276 1.35730134 -0.29619653 -0.68266301 0.84047015
0.81434692 -0.60302227 0.01487025 -2.1246489 ]
[ 0.6882472 0.6882472 0.6882472 0.6882472 0.6882472 0.6882472
0.6882472 0.6882472 0.6882472 0.6882472]
ok
[ 0.48265565 -1.11080441 1.37796336 -0.28009225 -0.70884742 0.8216728
0.82825643 -0.60005764 0.00556801 -2.11422877]
[ 0.66225099 0.66790124 0.65845662 0.69003781 0.6511985 0.6877904
0.68203232 0.69729027 0.69290222 0.67772646]
bugged
[-0.13922189 -0.14672933 -0.13457985 -0.15271475 -0.14627176 -0.13330782
-0.14730323 -0.13669795 -0.14992282 -0.14559449]
[ 0.22075163 0.22182957 0.2211531 0.2246049 0.22193467 0.21384687
0.23333297 0.2288796 0.23979859 0.21362418]
```

I cannot think of an example where this type of behavior would be desirable.

One could even argue that the correct likelihood for the (N,1) observation is 0 since generated data is always (N,).