Tuesday, October 8, 2019
Monday, October 7, 2019
Axis aligned artifacts
![]() |
Left: Original data distribution. Right: Learned co-displacement, darker is lower. |
![]() |
Notice the echoes around (10,-10) and (-10, 10) |
There are minor artifacts created by choosing axis aligned cuts in RRCF, similar to what was noted with IsoForest.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
import numpy as np | |
import pandas as pd | |
import rrcf | |
import matplotlib.pyplot as plt | |
include_anomaly=False | |
# Construct data with two modes and full of anomalies | |
X1 = np.random.multivariate_normal([0,0], [[1,0],[0,1]], 1000) | |
X2 = np.random.multivariate_normal([10,10], [[1,0],[0,1]], 1000) | |
if include_anomaly: | |
XA = np.random.uniform(-5, 15, size=200).reshape((100, 2)) | |
X = np.concatenate([X1, X2, XA]) | |
else: | |
X = np.concatenate([X1, X2]) | |
# plot the original data | |
fig, ax = plt.subplots() | |
ax.plot(X[:,0], X[:,1], '.') | |
fig.show() | |
num_trees = 300 | |
tree_size = 256 | |
n = X.shape[0] | |
# Construct forest | |
forest = [] | |
while len(forest) < num_trees: | |
# Select random subsets of points uniformly from point set | |
ixs = np.random.choice(n, size=(n // tree_size, tree_size), | |
replace=False) | |
# Add sampled trees to forest | |
trees = [rrcf.RCTree(X[ix], index_labels=ix) for ix in ixs] | |
forest.extend(trees) | |
# prepare grid for codisp measurement | |
xvals, yvals = np.arange(-10, 20, 0.5), np.arange(-10, 20, 0.5) | |
nx, ny = len(xvals), len(yvals) | |
xv, yv = np.meshgrid(xvals, yvals) | |
codisp = np.zeros((nx, ny)) | |
# measure codisp across space | |
for i in range(nx): | |
for j in range(ny): | |
temp = [] | |
for tree in forest: | |
point = np.array([xv[i,j], yv[i,j]]) | |
tree.insert_point(point, index='test') | |
temp.append(tree.codisp('test')) | |
tree.forget_point('test') | |
codisp[i,j] = np.mean(temp) | |
# plot codisp | |
fig, axs = plt.subplots(ncols=2, sharex=True, sharey=True) | |
axs[0].plot(X[:,0], X[:,1], '.', ms=2) | |
axs[0].set_aspect(1) | |
axs[1].imshow(codisp, origin='lower', | |
extent = [np.min(xvals), np.max(xvals), np.min(yvals), np.max(yvals)]) | |
fig.show() | |
fig.savefig("bias.png") |
Friday, October 4, 2019
State of Astro-informatics
I had a glance through "Realizing the potential of astrostatistics and astroinformatics" by Eadie et al. (2019). While I do not feel qualified or informed to comment on the suggestions, I can summarize them quickly. There are three problems:
- Education: Most astronomers are not trained in code development resulting in maybe good but fragile code. Similarly, most computer scientists don't have the astronomy background or connections.
- Funding: Grants for methodology improvement are scarce. I wonder if these things can be funded from the computer science side of things in collaborations.
- Quality: Astro-informatics lacks support of state-of-the-art methodology as it stands.
I was much more interested in the final section about potential themes in research:
- Nonlinear dimensionality reduction.
- Sparsity.
- Deep learning.
I find the last theme incredibly broad and am unclear exactly how they mean it. It seems they're most interested in hierarchical representations of data. I would also claim that anomaly detection/clustering is important for reducing the volume of data.
Subscribe to:
Posts (Atom)