Posts from July 2019

Day 18

The desktop was supposed to be repaired today, but no one showed up to fix it. Since I couldn't run the program or train my model, I couldn't do much today, but I worked on the code as best I could by separating out different parts and running "simulations" to make sure they would work (e.g., the graphs). The code for printing out images should also work, but I'm not sure, because there might be an issue with the output tensor dimensions from the model. Finally, I added a PSNR calculation to the testing code. I also started looking at the pix2pix code that I'll be comparing my code with, although that might not be part of the final presentation. I went to a PhD dissertation defense today (titled Ultrafast Laser Polishing for Optical Fabrication). The research was interesting and was presented really well, so even a person who knew nothing about the topic (me) could follow along.
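For reference, a minimal sketch of a PSNR calculation in PyTorch, assuming the output and target images are tensors scaled to [0, 1]; the function name and scaling are my own placeholders, not necessarily what my testing code uses:

    import torch

    def psnr(output, target, max_val=1.0):
        # mean squared error between the two images
        mse = torch.mean((output - target) ** 2)
        if mse == 0:
            return float("inf")  # identical images
        # PSNR = 10 * log10(MAX^2 / MSE)
        return (10 * torch.log10(max_val ** 2 / mse)).item()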

Day 17

Today my advisors got a more detailed description of the error in the model code, which indicated that there was an issue with the size and cropping of the images. After changing the dimensions to multiples of 256 (required by the way I defined the U-Net structure), it now works! The next roadblock was the computer's memory, which apparently can no longer handle running the code. I tried running my training code this morning with my model, but it was really slow and had been killed by the time I got back from lunch and the master's thesis defense I went to. I ran it again, but even after two hours it was still on the first epoch, and eventually it was also killed because the CPU didn't have enough memory. I'm using the CPU because the GPU doesn't have enough memory either (even less when other people are using it) and the computer in the lab is still broken. I emailed my advisor, though, and I think we'll work out an alternative tomorrow. I also found out that training can take 2-3 days, which is long.
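Forcing the inputs to a size the U-Net can handle can be as simple as a resize-and-crop transform; this is only a sketch using torchvision, and 256x256 is just the example size, not necessarily what my preprocessing does:

    from torchvision import transforms

    # Resize the shorter side to 256, then center-crop to 256x256 so the
    # height and width are exact multiples of 256 (example values only).
    preprocess = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(256),
        transforms.ToTensor(),
    ])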

Day 16

Joe went over everyone's presentation outlines today (except Jocelyn and Hannah's, which we'll see tomorrow). I started off the day trying to solve the problem I had on Friday. I printed out all the tensor sizes and figured out where the error was occurring and why (the encoder and decoder aren't symmetrical). It has something to do with the U-Net structure and the skip connections, but I'm still not sure of the exact cause, so I've been experimenting with different parts of the code to see if I can get them to line up. I also made my generator code more uniform and used the same kernel size, stride, and padding throughout, which should help as well. Besides that, I worked more on the training and testing code. I'm currently running the training code on a similar model to see if it works, and so far it seems fine, although it makes the computer screen freeze every minute. I moved over to the computer next to it, but its GPU is too old for PyTorch, so I decided to write this blog post.
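A small sketch of why the uniform settings help (the channel counts here are placeholders, not my actual network): with kernel_size=4, stride=2, padding=1, every down block exactly halves the height and width and every up block doubles them, so the encoder and decoder feature maps line up for the skip connections.

    import torch
    import torch.nn as nn

    down = nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1)          # halves H and W
    up = nn.ConvTranspose2d(256, 64, kernel_size=4, stride=2, padding=1)   # doubles H and W

    x = torch.randn(1, 64, 128, 128)
    enc = down(x)                            # (1, 128, 64, 64)
    dec = up(torch.cat([enc, enc], dim=1))   # concatenated skip doubles channels to 256
    print(dec.shape)                         # torch.Size([1, 64, 128, 128]) -- matches x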

Day 15

At our morning meeting today an RIT student came in to present his senior project. He's studying different environmental factors that can affect OCR (optical character recognition) so that a neural network can predict whether a photo can be processed by OCR before actually running it (saving time if the photo isn't good enough). It was pretty interesting. I spent my entire day debugging. Yesterday I left off with the program reporting that there were zero files in the directory I specified, but today I wrote additional code that made it work. Then I had issues accessing the actual training set because the data augmentation wasn't working, but eventually I got that to work too. I fixed some other small errors and tried to run it, but I guess it was too much to handle because the computer froze. Hoping it was just processing things slowly, I went to lunch. During lunch I went to the last seminar, which was pretty long (1 hr 25 min) but really interesting. It was on imaging
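A sketch of the kind of dataset class that makes the "zero files" problem visible right away; the class name, file pattern, and folder layout are hypothetical, not my actual code:

    import glob
    from PIL import Image
    from torch.utils.data import Dataset

    class ImageFolderDataset(Dataset):
        # Lists the files itself, so a wrong path shows up immediately as len() == 0.
        def __init__(self, root, transform=None):
            self.files = sorted(glob.glob(f"{root}/*.png"))
            self.transform = transform

        def __len__(self):
            return len(self.files)

        def __getitem__(self, idx):
            img = Image.open(self.files[idx]).convert("RGB")
            return self.transform(img) if self.transform else img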

Day 14

Today Joe brought us to the Drone Lab, where we learned about their projects and equipment. The largest drone currently there can carry up to 15 lbs, so they can mount a cooled thermal camera on it for imaging. After that we went onto the roof of the building; you can see the Rochester skyline from up there. Today I cleaned up the model and (I think) it's mostly done. I'm happy with how it looks, but the real test will be later when I finish the training code, which is what I worked on afterward. I had most of the foundation done yesterday, so the only major addition was code that will print out graphs for the loss, etc., which I'll need in my results eventually. I also learned about the peak signal-to-noise ratio (PSNR), which can be used to judge the accuracy of the GANs (since they're translating images, there aren't clear-cut "correct" answers like there are in classification). After the training code, I'll work on the testing, which shouldn't be too bad. Also, today's Nationa
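A sketch of the graph-printing part, assuming the training loop appends one loss value per epoch to plain Python lists; the names and file path are placeholders:

    import matplotlib.pyplot as plt

    def plot_losses(gen_losses, disc_losses, path="losses.png"):
        # gen_losses / disc_losses: one value per epoch, collected during training
        plt.figure()
        plt.plot(gen_losses, label="generator loss")
        plt.plot(disc_losses, label="discriminator loss")
        plt.xlabel("epoch")
        plt.ylabel("loss")
        plt.legend()
        plt.savefig(path)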

Day 13

Today we went to David's lab, where he explained his project; he's also in machine vision, but he's working on continuous learning, which is the focus of his lab. I spent most of the day reading and learning more online about coding GANs. After getting a better understanding, I continued solving errors and editing the model code. It's definitely different from what I did during the leaf classification exercise, since this time I'm not using a pretrained model. It also seems that I don't have to code two separate model classes for the generators (or discriminators); instead I can define two instances of each in the training code (sketched below). I tried to go to the seminar at noon, but none of us could find it; someone told us it might start at 12:30 instead, but by that point I had eaten lunch and was back to coding. After a while I did need a break, so I worked some more on my outline and found images/diagrams for my presentation.
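Roughly what I mean, with placeholder model bodies (the real networks would be much bigger): one Generator class and one Discriminator class, instantiated twice each for the two translation directions.

    import torch.nn as nn

    class Generator(nn.Module):            # placeholder body; the real U-Net goes here
        def __init__(self):
            super().__init__()
            self.net = nn.Conv2d(3, 3, kernel_size=3, padding=1)
        def forward(self, x):
            return self.net(x)

    class Discriminator(nn.Module):        # placeholder body
        def __init__(self):
            super().__init__()
            self.net = nn.Conv2d(3, 1, kernel_size=3, padding=1)
        def forward(self, x):
            return self.net(x)

    # One definition of each, two instances of each in the training script:
    G_a_to_b, G_b_to_a = Generator(), Generator()
    D_a, D_b = Discriminator(), Discriminator()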

Day 12

Today's field trip was to my lab. I showed the other interns the code I wrote for the leaf classification program, explained each part (not sure how well), and ran it for them. Since I didn't have much to show for my final project, I just described what it was. Then I worked some more on coding the generators and discriminators of the GAN, although right now I only have one of each, so I guess I'll find out soon how complicated adding another pair will be. I'm guessing that they won't be that difficult to code, but figuring out how to incorporate them into the overall program might be more challenging. There aren't many resources specifically about dual learning GANs, but I've found a couple of useful ones. I also started working on the training code, which is still in the "rough draft" stage, and began writing my outline for the final presentation, which is due on Monday. The upcoming blog posts will probably not be that exciting since I'm just working on

Day 11

There wasn't a meeting today, so I went straight to my lab. This morning I got another paper (on DualGAN) and my final project: coding a dual learning GAN to translate thermal images to RGB, as well as comparing how the results differ when using unpaired/unlabeled inputs versus paired/labeled inputs. Paired/labeled data typically results in higher accuracy; in reality, however, obtaining large amounts of it can be time-consuming, difficult, or impossible, so unpaired/unlabeled data is preferable in many applications as long as the accuracy isn't too much lower. Besides the usual adversarial loss, a dual learning GAN gets around the problem of unpaired/unlabeled inputs by using a cycle-consistency loss or, as the DualGAN paper calls it, a reconstruction loss. You can think of it as translating something forward and back and comparing how close the result is to the original (a rough sketch is below). So there are two pairs of generators and discriminators at work rather than just one. I think that after I co
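A rough sketch of that reconstruction idea, assuming G_ab and G_ba are the two generators and using an L1 penalty; the DualGAN and CycleGAN papers use similar terms, but the exact form here is only illustrative, not my final loss:

    import torch.nn as nn

    l1 = nn.L1Loss()

    def reconstruction_loss(G_ab, G_ba, real_a, real_b):
        rec_a = G_ba(G_ab(real_a))   # A -> B -> A, should land back near real_a
        rec_b = G_ab(G_ba(real_b))   # B -> A -> B, should land back near real_b
        return l1(rec_a, real_a) + l1(rec_b, real_b)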

Day 10

Today we visited the MRI lab where Akul is working, and he explained the EPR machine to us and how it can be used to identify different materials. As I mentioned yesterday, I didn't have anything definite to work on since I finished my leaf classification program. My advisors actually didn't think I was coming today, so I didn't have the laptop to use in the morning. Once they arrived I was able to finish typing my summary with the results from the program. Besides that, I read some more about machine learning and image-to-image translation, and also looked at some code for generative adversarial networks. In the gap before my advisors arrived, I ended up going to the Remote Sensing Lab and helping Jocelyn and Hannah with their sieving. They have to make a lot of weight measurements and use the sieve shaker in the 4th-floor lab. It was a fun change from my usual work, although an unfortunate hornet situation made us retreat back to their lab downstairs. I also went to the seminar during lunch.

Day 9

Today we went to Jocelyn and Hannah's lab and saw the GRIT machine (goniometer at RIT). Their project involves comparing wet and dry sieving techniques and analyzing grain size distribution. As for me, I finished my program!! I played around some more with the hyperparameters and network structure, and I'm pretty happy with the results. I don't think anyone wants to see all 50 epochs, so here are the "highlights" that were printed at the end (the final program doesn't print these long decimals):

Best Training Loss: 0.18281204998493195
Best Training Accuracy: 0.724375
Best Validation Loss: 0.1286986619234085
Best Validation Accuracy: 0.8325

After that I worked on the testing code. At first I tried to loop over the testing DataLoader, but I couldn't open and print the images since they were already transformed. So I decided to use the files themselves: I made three more directories for the three folders in the testing folder, looped through them to
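A sketch of the sort of test loop I'm describing; the folder names are placeholders, and `model` and `preprocess` are assumed to be the trained network and the same transform used during training:

    import glob
    import torch
    from PIL import Image

    def evaluate(model, preprocess, test_root="test"):
        classes = ["healthy", "disease_1", "disease_2"]   # placeholder folder names
        correct = total = 0
        model.eval()
        with torch.no_grad():
            for label, name in enumerate(classes):
                for path in glob.glob(f"{test_root}/{name}/*.jpg"):
                    img = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
                    pred = model(img).argmax(dim=1).item()
                    correct += int(pred == label)
                    total += 1
        return correct / total if total else 0.0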

Day 8

At our meeting today Varun showed us his project, which is making realistic road/city scenes to test an algorithm's ability to identify vehicles in different conditions. Then I went back to my leaf classification. Since I had finished coding the basic structure for the training and validation, I played around with the hyperparameters and network layers. I don't remember if I did this yesterday or today, but I switched to using cross-entropy loss for the loss function and Adam for the optimizer. I also took out a ReLU and a Dropout layer, put back the LogSoftmax, and changed up the learning rate. I still need to test other options too. After experimenting some more and making it worse, I decided to calculate the accuracy a different way (I don't think I had it right the first time), which, combined with some other changes, fixed most of my problems, and the accuracy is now around 75%. I then switched to working on the testing portion, which was more time-consuming than expected; I over
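A hedged sketch of that loss/optimizer combination (the learning rate and the `model` argument are placeholders). One detail worth noting: PyTorch's nn.CrossEntropyLoss applies log-softmax internally and expects raw logits, while a network that ends in LogSoftmax pairs with nn.NLLLoss instead.

    import torch.nn as nn
    import torch.optim as optim

    def build_training_objects(model, lr=1e-3):
        # CrossEntropyLoss expects raw logits (it applies log-softmax itself);
        # swap in nn.NLLLoss() if the model already ends in LogSoftmax.
        criterion = nn.CrossEntropyLoss()
        optimizer = optim.Adam(model.parameters(), lr=lr)
        return criterion, optimizer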

Day 7

I went over my problem statement this morning, and then we went on a field trip to Brian's lab. We learned about the equipment and some of the experiments going on in visual perception, which was pretty cool. I spent the rest of the day working on the leaf classification algorithm again. I mostly just ran the program repeatedly and looked at the errors to fix it, which mainly consisted of editing or rearranging the code a few lines at a time. Finally, I got the program to train itself and output data (the epoch, training loss, training accuracy, validation loss, and validation accuracy) in the correct format. The program takes a long time to train, and accuracy is still really low, but hopefully I'll be able to improve it by making some changes to the network structure. The goal is to get the accuracy to around 85%. After that, I'll figure out how to test the program and output the images and predictions. And after that is done, I'll hopefully start working on my actual project.

Day 6

I had my road test this morning (I passed!), which meant I missed the 8:45 a.m. meeting, but I'll be going over my problem statement tomorrow instead. For the remainder of the morning I read a research paper my advisors wrote on using unsupervised learning techniques for simultaneous depth and camera pose estimation. I also got feedback on the summary paper I wrote last week, so I made some corrections to that. I spent the rest of the day working on the leaf classification program from Friday. It's different from anything I've ever coded before and I still have a long way to go, but I think I made a lot of progress today. It's difficult to tell exactly how much, which can be a little frustrating, but after a lot of Googling and experimentation I at least have a general idea of how to do it. I wrote a "rough draft" (not sure what to call it) and right now I'm in the process of fixing/refining it. I got the network done today: I used a pre-trained VGG-19, but I had to remove the last layer
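One common way to do that kind of swap with torchvision, sketched here; freezing the convolutional features and the exact layer being replaced are assumptions on my part, and the three output classes match the healthy-plus-two-diseases setup:

    import torch.nn as nn
    from torchvision import models

    model = models.vgg19(pretrained=True)
    for param in model.features.parameters():
        param.requires_grad = False                    # keep the pre-trained features fixed
    # Replace the final 1000-way ImageNet layer with a 3-class leaf classifier.
    model.classifier[6] = nn.Linear(model.classifier[6].in_features, 3)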

Problem Statement

Amy Ruan

Image-to-image translation is a widely applicable area of computer vision; with it, street scenes can be generated from label maps, color pictures from grayscale ones, full objects from outlines, and photographic images from paintings. In short, image-to-image translation is the transformation of one type of scene representation into another. The development of generative adversarial networks (GANs) has facilitated this process, since a GAN consists of two networks, the generator and the discriminator, that are trained to compete against each other until the generator has learned to mimic the data it was trained on; in image-to-image translation, this results in clear and realistic output images.

Image-to-image translation and GANs have the potential to solve many problems. Autonomous vehicles, for example, must be able to perceive their surroundings even in the dark. While methods exist that translate nighttime photographs into RGB
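For reference, the standard adversarial objective from the original GAN formulation, in which the discriminator D and generator G compete over real data x and noise (or source-domain) inputs z, can be written as follows; the networks described above may use a variant of this loss.

    \min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))]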

Day 5

First Friday! Today I finished the remaining articles on convolutional neural networks (CNNs). They were especially useful for further understanding gradient descent and backpropagation. I also started working on the problem that the professor visiting from Cornell left yesterday. My advisors have already completed it, so I'm just doing it as an exercise to get used to the coding. I'm using a CNN (specifically VGG-19, though I think next I'm going to learn how to code the network myself) to classify images of leaves as either healthy or having one of two diseases. It has been a good learning experience so far, since it's one thing to understand the concepts and another to apply them. My actual project is going to be modifying code my advisors have written for image-to-image translation in an attempt to make it more accurate. I'm glad we figured it out so that I can finish my problem statement. Other than that, the rest of the interns and I went to a seminar during lunch about hyper

Day 4

Today we watched a video during our morning meeting about one of Joe's imaging science classes and their project for ImagineRIT. It was really interesting to see the behind-the-scenes work and teamwork, especially since I went to ImagineRIT for the first time this year. My advisors were gone for most of the day at a conference with another professor, so I worked by myself. I started writing my problem statement, although I'm not entirely sure what my project is yet. I also finished my paper today and turned it in, which is good news; I ended up with 19 citations. Once I was done with that, I switched to reading informational articles. The first one was on image classification and nearest neighbor classifiers. The second was on linear classification and loss functions, and focused mainly on SVM and Softmax classifiers (and hinge loss and cross-entropy loss, respectively). The last one I read was on optimization and gradient descent. There are three more, and then I'll have a solid foundation.
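For reference, the two losses those articles pair with the SVM and Softmax classifiers are usually written as follows, for a single example with class scores s, correct class y_i, and margin Delta (these are the standard textbook forms, not anything specific to my code):

    L_i = \sum_{j \neq y_i} \max(0,\, s_j - s_{y_i} + \Delta)      (multiclass hinge loss)
    L_i = -\log \frac{e^{s_{y_i}}}{\sum_j e^{s_j}}                 (cross-entropy loss)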

Day 3

Today I got to talk more with my advisors about their research. From my understanding, they're using image-to-image translation to turn thermal, nighttime, and similar photographs into regular RGB ones that they can use to perform depth and camera pose estimation, which can then be used to create 3D scene reconstructions. All of this is mainly for autonomous driving, but it can be used in other applications as well. I also went to a seminar during lunch on food production; it really made me think about the popcorn I was eating. A couple of other interns went with me, and some undergraduate students from a research program were there as well. With the rest of the time, I continued reading articles online and writing my summary paper. By the end I had read or skimmed a total of ten articles. I thought I had finished my paper by the end of the day, but then I found out that I needed more sources for my related-works section. I guess that means more reading tomorrow...

Day 2

Today we started our routine: first we had a morning meeting (and got our prizes from yesterday's video), and then we split up to go to each of our labs for the rest of the day. I got four more papers today; they were all about image-to-image translation, which is an application of generative adversarial networks. Reading them made the topic make a lot more sense, and it's really interesting. A couple of examples of image-to-image translation: turning a painting of a landscape into a photograph, turning an outline into an entire picture, turning a daytime scene into a nighttime one, and even turning a horse into a zebra. Besides the main article about image-to-image translation with conditional adversarial networks, there was one about using cycle-consistency loss and unpaired data, one focused on translating artwork by using patches and memory banks, and one on using multimodal image-to-image translation to get several different but still realistic outputs. I have to write a short summary paper

Day 1

Hello! It was an exhausting but exciting first day. After a short meeting in the Freshman Imaging Lab (and after getting our RIT IDs, computer accounts, and parking permits), the six other interns and I were set loose on a campus scavenger hunt. We split up the tasks; Jocelyn and I walked around campus, asking passersby (and people at the Campus Center especially) whether they recognized any of the objects we needed. We found some major landmarks, but also a lot of smaller ones (e.g. a smiley face on a lamppost). After meeting back at the Center for Imaging Science, all of us attempted to locate the mannequin (worth a lot of points), which we knew was somewhere in the building. We asked every person we could find, but everyone either told us where they thought the mannequin might be, shook their head and wished us luck, or gave us a concerned "No...?" But we still ended up with a fairly successful video, even if the mannequin wasn't included. After the video was judged, we split up again to meet with e