EN.601.461/661 – Computer Vision

Fall 2020

Homework

Due: 11:59 PM, Sunday, October 4, 2020

All solutions (e.g. your code, write-ups, output images) should be zipped up as HW1_yourJHED.zip and submitted on Gradescope, where ‘yourJHED’ is your JHED ID (e.g. ghager1). If you have hand-written the written assignment, please scan it and attach the .pdf in the submission zip.

Only basic Python, NumPy, and OpenCV functions are allowed, unless otherwise specified. This rule applies in particular to the use of OpenCV. Basic image I/O functions such as cv2.imread, cv2.imwrite, etc. are allowed. If you are unsure whether a function is allowed, please ask on Piazza before assuming! You will get no credit for using a “magic” function to answer any question where you should be writing code yourself. When in doubt, ASK!

About Piazza usage: Piazza is a great tool for collaboration and questions. However, never post big chunks of source code as a public post. Use it to talk about ideas, concepts, and implementation details.

Please use Python 3 for the programming problems. You will be provided with template Python files containing functions for you to complete. You can implement your own helper functions if necessary. However, do NOT change the input and output signature of the original functions in the instructions. For all functions that take or return images, make sure to handle the actual arrays. Do not pass filenames around.

Minor updates in GREEN

Written Assignment (35 pts)

1) Consider a pinhole camera with perspective projection.

a. (10 points) In class we talked about approximating perspective projection by assuming a constant depth. Let’s assume we have a camera with a 10mm focal length lens, a target at a distance of 1 meter, and we’re viewing an object that has a depth (distance front to back) of X mm centered at 1 meter.

i. How large can we make X with less than 5% difference in the projection between true perspective and the linear approximation?

ii. We talked about depth of field and blur related to defocus. Let’s assume our image sensor has 100 pixels/mm. Can you estimate how large the diameter of the aperture can be to ensure that the entire range of distances you calculated above would be in focus?
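
To get a feel for the comparison in part (a), the following is a minimal sketch that evaluates both projections for one point. It assumes the usual pinhole model x = f·X/Z and takes "difference" to mean the error relative to the true perspective projection; that reading, and the function name, are assumptions for illustration, not part of the assignment.

```python
def perspective_vs_constant_depth(f_mm, x_mm, z_mm, z0_mm):
    """Compare true perspective with a constant-depth approximation.

    f_mm  : focal length
    x_mm  : lateral offset of the scene point from the optical axis
    z_mm  : true depth of the scene point
    z0_mm : reference depth used by the approximation
    """
    true_proj = f_mm * x_mm / z_mm      # true perspective projection
    approx_proj = f_mm * x_mm / z0_mm   # every point treated as if at depth z0
    # Assumed error measure: difference relative to the true projection.
    rel_diff = abs(approx_proj - true_proj) / abs(true_proj)
    return true_proj, approx_proj, rel_diff


# A point 20 mm behind a 1000 mm reference depth, seen through a 10 mm lens.
true_proj, approx_proj, rel_diff = perspective_vs_constant_depth(10.0, 100.0, 1020.0, 1000.0)
```

Under these assumptions the relative difference reduces to |Z/Z0 - 1|, so it does not depend on the lateral offset or on the focal length.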

b. (10 points) In class, we discussed how to find the vanishing point of lines. Suppose I now consider a family of lines that all lie in a single plane. Intuitively, all of the lines will vanish on a common line (also known as the horizon line). What is the equation for this line as a function of the plane?

i. Recall a plane can be written in the form Ax+By+Cz+D = 0, where (A,B,C) is a unit vector. The coordinate frame is located at the pinhole with the z-axis pointing towards the image; you may assume a unit focal length and disregard that term in the projection equations.

ii. Work out a simple case where B=C=D=0 and A=1, which defines a plane extending horizontally from the optical center. Write down three different line directions in this plane and work out where they vanish. Include your work in your submission.

iii. Second, do the same with the plane A=C=D=0 and B=1. Again include your work in your submission.
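
A numerical check can help when working through cases like these by hand. The sketch below assumes unit focal length and the projection (x, y) = (X/Z, Y/Z), so the vanishing point of a 3D direction (dx, dy, dz) is taken as the limit of the projection of a point moving along that direction. The function names and the example line are hypothetical.

```python
import numpy as np


def project(point):
    """Perspective projection with unit focal length: (X, Y, Z) -> (X/Z, Y/Z)."""
    X, Y, Z = point
    return np.array([X / Z, Y / Z])


def vanishing_point(direction):
    """Limit of the projection of p0 + t*d as t grows; None if d is parallel to the image plane."""
    dx, dy, dz = direction
    if dz == 0:
        return None  # the line never converges to a finite image point
    return np.array([dx / dz, dy / dz])


# Hypothetical example: a line through p0 with direction d.
p0 = np.array([3.0, -1.0, 5.0])
d = np.array([0.0, 1.0, 2.0])
vp = vanishing_point(d)
far_projection = project(p0 + 1e9 * d)  # a point very far along the line
```

The starting point p0 drops out of the limit, which is why parallel lines share one vanishing point.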

c. (5 points) Following on the above, define a general relationship that relates the parameters of the plane to the parameters of the vanishing points of all lines that fall in that plane. This is challenging, but a good chance to refresh your geometry and algebra skills in anticipation of the next section of the course.

2) (10 pts) In the slides for class, we showed the Fourier transforms for sin(kx) and cos(kx) but we never looked at the case where there was an offset of the form sin(kx + b) where b is not a multiple of 2π. Show that we can recover b using the atan of the imaginary and real components of the Fourier transform (hint: recall the trig expansion for sin(a + b)).
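
A derivation can be sanity-checked numerically. The sketch below is one such check under stated assumptions: the signal is sampled over a whole number of periods, the transform is NumPy's discrete FFT with its exp(-i·2π·kn/N) sign convention, and the coefficient is read at the bin matching the sinusoid's frequency. With that convention the coefficient at bin k0 is proportional to sin(b) - i·cos(b), so the argument order passed to arctan2 below is specific to these assumptions.

```python
import numpy as np


def recover_phase_offset(signal, k0):
    """Estimate b for signal[n] = sin(2*pi*k0*n/N + b) from its DFT coefficient at bin k0."""
    coeff = np.fft.fft(signal)[k0]
    # Under NumPy's convention: coeff = (N/2) * (sin(b) - 1j*cos(b)).
    return np.arctan2(coeff.real, -coeff.imag)


N, k0, b = 64, 5, 0.7            # hypothetical test values
n = np.arange(N)
signal = np.sin(2 * np.pi * k0 * n / N + b)
b_est = recover_phase_offset(signal, k0)
```

A transform defined with a different sign or normalization convention changes which component carries sin(b) and which carries cos(b), so the written answer should follow the convention used in class.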

Programming Assignment (65 pts)

1) Our goal is to develop a vision system that characterizes two-dimensional objects in images. Given an image such as two_objects.png, we would like our vision system to determine how many objects are in the image, to compute the type of shape they are, and to compute their positions and orientations.

The task is divided into two parts, a and b, each corresponding to a Python function. You need to complete the code snippets marked with “#TODO” in p1n2.py. In this problem, you are provided with a driver program in the main function. The code can be run with the following command, which specifies the image to process and the thresh_val described in a):

    python3 p1n2.py two_objects 128

a. (5 points) Write a Python function that converts a gray-level image to a binary one using a threshold value. The binary image should be 255 if intensity >= thresh_val else 0.

    def binarize(gray_image, thresh_val):
        # TODO
        return binary_image
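
The thresholding rule above maps directly onto an element-wise NumPy comparison. The following is a minimal sketch, assuming the input is a 2D NumPy array and that a uint8 output is wanted; the name binarize_sketch is used only to keep it distinct from the template function.

```python
import numpy as np


def binarize_sketch(gray_image, thresh_val):
    """Return 255 where intensity >= thresh_val and 0 elsewhere (uint8 output assumed)."""
    return np.where(gray_image >= thresh_val, 255, 0).astype(np.uint8)


gray = np.array([[0, 127], [128, 255]], dtype=np.uint8)
binary = binarize_sketch(gray, 128)
```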

b. (10 points) Write a Python function that takes a labeled image and computes a list of object global shape attributes (which in a real application could be used to locate and identify an object).

    def get_attribute(labeled_image):
        # TODO
        return attribute_list

Each element of the attribute list should be a dictionary with the following keys: position, orientation, and roundedness. The position should be a dictionary with keys ‘x’ and ‘y’. The origin is defined as the upper left pixel of the image. Please use radians for orientation, and all numbers are floats.
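
One common way to obtain these attributes is from the first and second moments of a region. The sketch below handles a single binary mask; for a labeled image, the same computation could be repeated for each nonzero label. It assumes orientation is the angle of the axis of least second moment measured from the x-axis in image coordinates (y pointing down), and roundedness is the ratio E_min/E_max. Those definitions are assumptions here and should be checked against the ones given in class.

```python
import numpy as np


def region_attributes_sketch(mask):
    """Centroid, orientation, and roundedness of the nonzero pixels in a 2D mask."""
    ys, xs = np.nonzero(mask)
    x_bar, y_bar = xs.mean(), ys.mean()
    dx, dy = xs - x_bar, ys - y_bar

    # Second moments about the centroid.
    a = np.sum(dx * dx)
    b = 2.0 * np.sum(dx * dy)
    c = np.sum(dy * dy)

    # Axis of least second moment: tan(2*theta) = b / (a - c).
    theta = 0.5 * np.arctan2(b, a - c)

    def second_moment(t):
        return a * np.sin(t) ** 2 - b * np.sin(t) * np.cos(t) + c * np.cos(t) ** 2

    e_min = second_moment(theta)
    e_max = second_moment(theta + np.pi / 2)
    roundedness = e_min / e_max if e_max > 0 else 1.0

    return {
        "position": {"x": float(x_bar), "y": float(y_bar)},
        "orientation": float(theta),
        "roundedness": float(roundedness),
    }


# Hypothetical test region: a thin horizontal bar.
mask = np.zeros((24, 40), dtype=np.uint8)
mask[10:13, 5:31] = 1
attrs = region_attributes_sketch(mask)
```

An elongated region gives a roundedness near 0, while a disc-like region gives a value near 1.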

2) These global attributes are not enough to distinguish e.g. a triangle from a rectangle from a circle. We will now extend the exercise above with a Hough transform method to detect straight lines that are part of a shape.

a. (5 pts) First you need to find the locations of edge points in the image. Complete function detect_edges. The input to the function is a 2-dimensional uint8 array representing a grayscale image. The output should be a binary image with the edges detected. You should use a derivative of Gaussian filter and threshold the magnitudes. You should make the filter sigma value and threshold parameters:

    def detect_edges(image, sigma, threshold):
        # TODO
        return edge_image

(5 points extra credit: implement hysteresis thresholding, in which case your call will have an upper and a lower threshold)
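
The derivative-of-Gaussian step can be sketched with separable 1D kernels in plain NumPy. The version below is only an illustration under several assumptions not stated in the assignment: the kernel is truncated at a radius of 3·sigma, borders are handled by replicating edge pixels, and the output uses 255 for edge pixels. It does not implement hysteresis.

```python
import numpy as np


def detect_edges_sketch(image, sigma, threshold):
    """Threshold the gradient magnitude of a Gaussian-smoothed image."""
    radius = max(1, int(np.ceil(3 * sigma)))          # assumed truncation radius
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    g = np.exp(-x ** 2 / (2 * sigma ** 2))
    g /= g.sum()                                      # smoothing kernel
    dg = -x / sigma ** 2 * g                          # derivative-of-Gaussian kernel

    # Replicate border pixels so that 'valid' convolution returns the input size.
    padded = np.pad(image.astype(np.float64), radius, mode="edge")

    def conv(arr, kernel, axis):
        return np.apply_along_axis(
            lambda v: np.convolve(v, kernel, mode="valid"), axis, arr)

    gx = conv(conv(padded, dg, 1), g, 0)   # derivative along x, smoothing along y
    gy = conv(conv(padded, g, 1), dg, 0)   # smoothing along x, derivative along y
    magnitude = np.hypot(gx, gy)
    return np.where(magnitude >= threshold, 255, 0).astype(np.uint8)


# Hypothetical test image: a vertical step edge between columns 9 and 10.
step = np.zeros((20, 20), dtype=np.uint8)
step[:, 10:] = 200
edges = detect_edges_sketch(step, sigma=1.0, threshold=50.0)
```

Because the filter is separable, two 1D passes per gradient component replace one 2D convolution, which keeps the code short and reasonably fast.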

b. (10 points) Next, you need to use the Hough transform to detect lines and associate them with the objects they came from. For this purpose you can use the OpenCV function HoughLines. This function will return the coordinates of lines found in the images. You will need to go back and figure out which edge pixels associate with these lines (giving you a line segment) and then associate them with a figure from part 1. That is, if you have a rectangle, you should find four line segments and associate that fact with the information you computed in part 1 for that rectangle. To do so, add an array of line parameters consisting of angle, distance from origin, and length (in pixels) to the dictionary for each object of part 1. For the length, the simplest approach is to use the line equation, identify edge pixels within a given distance of the line, and count. Adding a fitting step and some connected component analysis will possibly provide better results.

    def get_edge_attribute(labeled_image, edge_image):
        # TODO
        return attribute_list

(Extra Credit (5 points): Do the same to find circles using HoughCircles and add this to your edge attributes).

3) Test your functions.

a. (5 pts) First, test your functions on a set of shapes in the images many_objects_1.png and many_objects_2.png. The output should be the list of attributes of each object in the image computed by 1 and 2 above.

b. (10 pts) Now, use the main program we have provided to load the images in a training data folder we have created. Each image contains one object. Build a data structure that stores the name of the file along with the attributes of the object found in the image. Implement a matching function in p3.py:

    def best_match(object_database, test_object):
        # TODO
        return object_name

This function will accept the list of objects that were in the training data folder, and return the best matching object. It is up to you to decide how to match objects, whether by size, roundedness, number of lines, the length of the lines, etc. Our driver program will call your match function and print and display your match for the images in the test files many_objects_1.png and many_objects_2.png. Note that for grading purposes, we will also test your functions on held-out data, so don’t overtune your function to just work on our supplied test data!
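
As one illustration of what a matching function might look like, the sketch below picks the database entry with the smallest feature distance. The database layout (a list of dictionaries with 'name', 'roundedness', and 'num_lines' keys) and the unweighted distance are hypothetical choices made for this example; the assignment leaves both up to you.

```python
def best_match_sketch(object_database, test_object):
    """Return the name of the database entry closest to test_object (hypothetical layout)."""
    def distance(entry):
        # Assumed features: roundedness (float) and number of detected lines (int).
        return (abs(entry["roundedness"] - test_object["roundedness"])
                + abs(entry["num_lines"] - test_object["num_lines"]))

    return min(object_database, key=distance)["name"]


database = [
    {"name": "circle", "roundedness": 0.95, "num_lines": 0},
    {"name": "rectangle", "roundedness": 0.40, "num_lines": 4},
]
match = best_match_sketch(database, {"roundedness": 0.45, "num_lines": 4})
```

Features on very different scales usually need weighting or normalization before they are summed, which is worth keeping in mind given the held-out test data.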

4) Your task here is to implement normalized cross correlation for simple template matching. We provide data/face.png and data/letter.png as templates, while data/king.png and data/text.png are images to be matched. You need to complete the code snippets marked with “#TODO” in p4.py.

a. (10 points) Implement normalized cross-correlation in function normxcorr2. The function should assume that input images are 2-dimensional arrays where each element is a floating-point number between 0.0 and 1.0. If you load the images as 3-channel color images in your driver program, don’t forget to convert them to grayscale and map the values to the correct range before passing them to the function. When dealing with image boundaries, there are several commonly used styles: “full”, “valid”, and “same” (see this page for more explanation). To make things simpler, here we use the “valid” style and do not pad the search image. In other words, calculation is performed only at locations where the template is fully inside the search image. The function should return a 2-dimensional float array representing the correlation map of matching scores.

    def normxcorr2(template, image):
        # TODO
        return scores

b. (5 points) Use your normalized cross-correlation function to find where the face in face.png appears in king.png. Complete function find_matches. This function should take a template image and a search image, both as uint8 color images. Ignore the thresh argument for now. Make a copy of the input images, do the pre-processing described in (a), and call the normxcorr2 you just implemented to compute the matching scores. After you have the scores, find the best match and determine where in the original image it corresponds to. Return a 2-tuple (x, y) representing the coordinates of the upper left corner of the matched region.

    def find_matches(template, image, thresh=None):
        # TODO
        return coords, match_image

c. (5 points) Use your normalized cross-correlation function to find all occurrences of letter.png in text.png. Extend your find_matches function so it has exactly the same behavior as in (b) when thresh=None but finds multiple matches when given a threshold. To be specific, when given a threshold, the function should return a list of 2-tuples representing all matches together with the visualization result. Experiment with different threshold values. In your driver program, save the output image to output/text.png when you are satisfied with the results.

    def find_matches(template, image, thresh=None):
        # TODO
        return coords, match_image
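
The “valid”-style normalized cross-correlation described in part (a) can be sketched with a direct double loop. This version subtracts the mean of the template and of each image window before normalizing, which is one common definition of NCC; whether the course expects mean subtraction is an assumption to verify. The names normxcorr2_sketch and the random test data are for illustration only.

```python
import numpy as np


def normxcorr2_sketch(template, image):
    """Zero-mean normalized cross-correlation, evaluated only where the template fits."""
    th, tw = template.shape
    ih, iw = image.shape
    t = template - template.mean()
    t_norm = np.sqrt(np.sum(t ** 2))

    scores = np.zeros((ih - th + 1, iw - tw + 1), dtype=np.float64)
    for r in range(scores.shape[0]):
        for c in range(scores.shape[1]):
            window = image[r:r + th, c:c + tw]
            w = window - window.mean()
            denom = t_norm * np.sqrt(np.sum(w ** 2))
            if denom > 0:                    # flat windows keep a score of 0
                scores[r, c] = np.sum(t * w) / denom
    return scores


# Hypothetical data: cut the template out of a random image at a known location.
rng = np.random.default_rng(0)
image = rng.random((30, 40))
template = image[12:20, 25:33].copy()
scores = normxcorr2_sketch(template, image)

# Best match as (x, y) of the upper left corner; row index is y, column index is x.
row, col = np.unravel_index(np.argmax(scores), scores.shape)
best_xy = (int(col), int(row))

# With a threshold, every location scoring at least thresh is reported.
thresh = 0.99
all_xy = [(int(c), int(r)) for r, c in np.argwhere(scores >= thresh)]
```

Because no padding is used, an index in the score map is already the upper left corner of the matched region in the search image, so no offset correction is needed.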