Final Project

COMP 250 Winter 2024

posted: Wednesday, April 10, 2024
due: Sunday, April 28, 2024 at 23:59

General instructions

  • Submission instructions
    - Please note that the submission deadline for the final project is very strict. No submissions will be accepted after the deadline.
    - As always you can submit your code multiple times but only the latest submission will be kept. We encourage you to submit a first version a few days before the deadline (computer crashes do happen and Ed Lesson may be overloaded during rush hours).
  • These are the files you should be submitting on Ed:
    - MyWebGraph.java
    - Sorting.java
    - SearchEngine.java

Do not submit any other files, especially .class files. Any deviation from these requirements may lead to lost marks.

  • Starter code is provided for this project. Do not change any of the class names, file names, or method headers. You can add helper methods or fields if you wish. Note also that for this project, you are NOT allowed to import any other class (all import statements other than the one provided in the starter code will be removed). Any failure to comply with these rules will give you an automatic 0.
  • The project shall be graded automatically. Requests to evaluate the project manually shall not be entertained, so please make sure that you follow the instructions closely or your code may fail to pass the automatic tests.
  • Whenever you submit your files to Ed, you will see the results of some exposed tests counting for 70% of the project grade. These tests are a smaller version of the tests we will be using to grade your work. If your code fails those tests, it means that there is a mistake somewhere. Even if your code passes those tests, it may still contain some errors. We will test your code on a much more challenging set of examples. We highly encourage you to test your code thoroughly before submitting your final version.
  • By next week we will share with you a Minitester class that you can run to test if your methods are correct. This class is a subset of the exposed tests on Ed, counting for 40% of the project grade. Please note that these tests are only a subset of what we will be running on your submissions. We encourage you to modify and expand this class. You are welcome to share your tester code with other students on Ed. Try to identify tricky cases. Do not hand in your tester code.

  • You will automatically get 0 if your code does not compile.
  • Failure to comply with any of these rules will be penalized. If anything is unclear, it is up to you to clarify it by asking someone directly during office hours, or on the discussion board on Ed.

Learning Objectives

This final project is designed to provide you with a hands-on opportunity to apply key concepts learned throughout the course, focusing on hash tables, graphs, and sorting algorithms. By completing this project, you will achieve several learning objectives:

  • Graph Representation and Traversal: You will gain practical experience in creating and navigating graph structures. This includes understanding how vertices and edges are organized within a graph and implementing traversal algorithms such as breadth-first search (BFS) or depth-first search (DFS). These concepts are foundational to various computer science applications, including network analysis and pathfinding algorithms.
  • Hash Tables for Efficient Data Retrieval: To keep the project contained in size you will not be implementing your own hash table, but you will be able to leverage hash tables to efficiently store and retrieve data within your search engine.
  • Algorithmic Efficiency: Implementing sorting algorithms with time complexity of O(n log n), such as MergeSort or QuickSort, reinforces your understanding of algorithmic efficiency and computational complexity.
  • Integration of Data Structures and Algorithms: This project integrates multiple data structures (graphs, hash tables) and algorithms (graph traversal, sorting) into a cohesive system. You will develop a deeper understanding of how different components interact and contribute to the overall functionality and performance of a software system.
  • Preparation for Advanced Courses: The project introduces concepts that form the basis for more advanced coursework in computer science. You will start to learn more advanced algorithms and data structures in COMP 251.

By engaging with this project, you will not only reinforce your technical skills but also enhance your problem-solving abilities, critical thinking, and software development practices, all of which will help prepare you for your future CS courses.

… pages in order of importance. Do all search engines give the same results? Not necessarily. Search engines use different spiders to crawl and use different proprietary algorithms to index the data. Each index is therefore a search engine's representation of how it sees the web. Also, the algorithms to rank and search the data are different, so every search engine has its own approach to finding what you're trying to find. Finally, personalisation adapts the search to a specific computer/user. The results may be based on your geographical location, what else you've searched for, and what results were preferred by other users searching for the same thing, for example. Search engines might use and weigh all these factors in a unique way, which will lead to different search results.

The starter code contains four classes:

  • XmlParser: This class has two methods: one to read the content of the XML database (which will serve as a proxy for the web for our project) given a url, and the other to extract information from it.
    - getContent(String url) returns an ArrayList of Strings corresponding to the set of words in the web page located at the given url. You will need to use this method while crawling in order to build your word index.
    - getLinks(String url) returns an ArrayList of Strings containing all the hyperlinks going out of the given url. You will need this method to build the graph representing the data found while crawling.

You should NOT modify this class at all.

  • MyWebGraph: This class implements the data structure needed to store the data collected during the crawling phase. It has an inner class called WebVertex which is used to store data related to a specific web page. You should NOT modify this class at all.
  • SearchEngine: This is one of the classes in which you will have to add your code. You will be implementing methods that perform the three tasks described above.
  • Sorting: This is a utility class containing methods that implement sorting algorithms. You will be asked to implement one of them.
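
To see how these pieces are meant to fit together, here is a minimal crawling sketch. It is not part of the handout: it assumes an already-constructed XmlParser instance (construction details are not shown in this excerpt), uses only the methods described in this document (including MyWebGraph's visited flags as the "seen" set), and uses fully qualified java.util names because the project forbids extra import statements.

    // Hypothetical sketch: breadth-first crawl starting from a seed url.
    void crawlSketch(XmlParser parser, MyWebGraph graph, String seedUrl) {
        java.util.LinkedList<String> toVisit = new java.util.LinkedList<>();
        graph.addVertex(seedUrl);
        graph.setVisited(seedUrl, true);
        toVisit.add(seedUrl);
        while (!toVisit.isEmpty()) {
            String url = toVisit.remove();
            java.util.ArrayList<String> words = parser.getContent(url);
            // ... record each word -> url association in the word index here ...
            for (String link : parser.getLinks(url)) {
                graph.addVertex(link);          // no effect if the vertex already exists
                graph.addEdge(url, link);       // hyperlink = directed edge url -> link
                if (!graph.getVisited(link)) {  // enqueue each page only once
                    graph.setVisited(link, true);
                    toVisit.add(link);
                }
            }
        }
    }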

Your task

In this project you are going to write a search engine program that will:

  1. explore a portion of the web (which we’ll simulate through a database),
  2. build an index of the words contained in each web page,
  3. analyze the structure of the web graph to establish which web page should be ranked higher,
  4. use this analysis to perform simple searches where a query word is entered by the user and a sorted list of relevant web pages available is returned.

Although we are going to apply these tools using a local database, what you will develop is a simplified version of what is used at Google to answer actual web queries.

To be able to implement the search engine program described above, we need a graph data structure that will allow us to store all the information related to the web pages. Such a class is provided, but some of its methods still need to be implemented.

[17 points] The class MyWebGraph is an implementation of a directed graph using adjacency lists. You will be using this type of data structure to store the information your program collects in the crawling phase. Each node in the graph will store a String corresponding to the url of a webpage. The class has the following field:

  • A HashMap storing all the vertices in the graph. Note that we'll be labelling each vertex with the String corresponding to the url of the webpage represented by this vertex.

The class also has the following public methods, which have been provided to you and should not be modified:

  • The constructor MyWebGraph() which does not take any input and initializes the field with an empty HashMap. This constructor creates an empty graph.
  • A getNeighbors() method which takes as input a String representing a url and returns an ArrayList of all the hyperlinks contained in the specified url.
  • A getOutDegree() method which takes as input a String representing a url and returns the number of hyperlinks in the specified page.
  • A setPageRank() method which takes as input a String representing a url and a double representing its rank. The method assigns the input number as the rank of the specified page.
  • A getPageRank() method which takes as input a String representing a url and returns the rank of the specified page.
  • A setVisited() method which takes as input a String representing a url and a boolean. The method assigns the input boolean to the visited field of the specified page.
  • A getVisited() method which takes as input a String representing a url and returns whether or not the specified page has been visited.

To complete the class, you should implement the following methods:

[5 points] An addVertex() method which takes as input a String representing a url. The method adds the corresponding vertex to the graph, if such a vertex is not already there. The method returns a boolean indicating whether or not the graph has been modified as a result of this operation.

[5 points] An addEdge() method which takes as input two Strings, i.e. two urls. The method adds to the graph an edge from the vertex labelled with the first input to the vertex labelled with the second input. Note that an edge can be added only if it is connecting two vertices that already belong to the graph. The method returns a boolean indicating whether or not the graph has been modified as a result of this operation.

[2 points] A getVertices() method which returns the list of all the urls represented by vertices in the graph.
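
To make the expected behaviour concrete, here is one possible shape for these three methods. This is a sketch under stated assumptions: the HashMap field is called vertexList here, and WebVertex is assumed to keep its out-links in an ArrayList<String> called links and to have a constructor taking the url; the actual names in the starter code may differ.

    // Assumed field: maps each url to its vertex (name is a placeholder).
    private java.util.HashMap<String, WebVertex> vertexList;

    public boolean addVertex(String url) {
        if (vertexList.containsKey(url)) return false;  // already present: no change
        vertexList.put(url, new WebVertex(url));        // assumed constructor
        return true;                                    // the graph was modified
    }

    public boolean addEdge(String s, String t) {
        // An edge can only connect two vertices that already belong to the graph.
        if (!vertexList.containsKey(s) || !vertexList.containsKey(t)) return false;
        WebVertex v = vertexList.get(s);
        if (v.links.contains(t)) return false;          // edge already there: no change
        v.links.add(t);                                 // directed edge s -> t
        return true;
    }

    public java.util.ArrayList<String> getVertices() {
        return new java.util.ArrayList<>(vertexList.keySet());
    }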

… the web graph. It will only be called after crawlAndIndex has been executed. Here are the principles we will be using to assign a rank to a web page:

  • Good web pages are cited by many other pages. If we think about it in terms of the webgraph, this means that we should prefer web pages (i.e. vertices) with a large in-degree.
  • Web pages that link to a large number of other pages are less valuable. In terms of webgraph, this means that we will value less web pages (i.e. vertices) with a large out-degree.
  • A link from a web page is more valuable if the web page is itself a good one. In graph terms, the higher the rank of a web page (i.e. vertex), the more valuable an in-edge from it would be.

Note that the rank of a page depends solely on the structure of the graph we have created while crawling. To represent the ideas just described, let

  • pr(w) be the page rank of a vertex w
  • out(w) be the out-degree of a vertex w
  • w1, w2, ..., wk be all the vertices in the graph that have an out-edge going into v.

Then, the following equation determines the page rank of a vertex v:

pr(v) = (1 − d) + d ∗ ( pr(w1)/out(w1) + pr(w2)/out(w2) + · · · + pr(wk)/out(wk) )

The constant d is called the damping factor and it is added for technical reasons to account for the probability that an imaginary surfer who is randomly clicking on links will eventually stop clicking. For this project, we will be using a damping factor of 0.5.

As you may notice, pr(v) is defined as a function of the pr(wi). If N is the total number of vertices in the graph (v1, v2, ..., vN), then we have a system of N linear equations in N variables. The unknown values, i.e. our variables, are the pr(vi). To determine their values, we should solve the system of linear equations described above. To do so, we could use linear algebra, but the implementation of Gaussian elimination runs in O(n^3). To improve the efficiency, we can instead use an iterative algorithm that approximates the result. The idea is the following:

  • Start by initializing pr(vi) to 1 for all 1 ≤ i ≤ N.
  • Repeat the following until convergence:
    - compute pr(vi) for all i using the formula above.

This single step is what the helper method computeRanks should be doing. The method takes as input an ArrayList representing the urls in the web graph and returns an ArrayList representing the newly computed ranks for those urls. Note that the double in the output list is matched to the url in the input list using their position in the list. That is, the page rank of the url stored in position i in the input list can be found in position i in the output list.

Convergence is reached when |pr^(k−1)(vi) − pr^k(vi)| < ε for all i, where pr^j(vi) represents the value of pr(vi) computed in the j-th iteration, and ε is the input to the method. Note that you can assume that the web graph does not contain self-loops and every vertex in the graph has at least one outgoing edge.
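
As a concrete illustration, here is one possible shape for this update step and the surrounding convergence loop. This is a sketch only: the graph field name (internet) and the driver method name (assignRanksSketch) are hypothetical, and fully qualified java.util names are used since the project forbids extra import statements.

    // internet: hypothetical MyWebGraph field holding the crawled graph.
    // One update step: compute the new rank of every url from the current ranks.
    // Position i of the output holds the new rank of urls.get(i).
    public java.util.ArrayList<Double> computeRanks(java.util.ArrayList<String> urls) {
        double d = 0.5;  // damping factor fixed at 0.5 for this project
        java.util.ArrayList<Double> newRanks = new java.util.ArrayList<>();
        for (String v : urls) {
            double sum = 0.0;
            // Sum pr(w)/out(w) over every vertex w that has an edge w -> v.
            for (String w : urls) {
                if (internet.getNeighbors(w).contains(v)) {
                    sum += internet.getPageRank(w) / internet.getOutDegree(w);
                }
            }
            newRanks.add((1 - d) + d * sum);
        }
        return newRanks;
    }

    // Hypothetical driver: initialize all ranks to 1, then iterate until
    // every rank changes by less than epsilon between two iterations.
    public void assignRanksSketch(java.util.ArrayList<String> urls, double epsilon) {
        for (String url : urls) internet.setPageRank(url, 1.0);
        boolean converged = false;
        while (!converged) {
            java.util.ArrayList<Double> newRanks = computeRanks(urls);
            converged = true;
            for (int i = 0; i < urls.size(); i++) {
                String url = urls.get(i);
                if (Math.abs(internet.getPageRank(url) - newRanks.get(i)) >= epsilon) {
                    converged = false;
                }
                internet.setPageRank(url, newRanks.get(i));
            }
        }
    }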

Consider the following web graph (the figure itself is not reproduced in this excerpt; working backwards from the computations below, it has four vertices A, B, C, D with directed edges A→B, A→C, B→A, B→C, B→D, C→A, and D→C):

Suppose we'd like to compute the page rank assigned to each vertex with an ε equal to 1. We start by assigning value 1 to each of them:
  • pr(A) = 1
  • pr(B) = 1
  • pr(C) = 1
  • pr(D) = 1

Then we need to use the following formula (same as the formula above, but with damping factor of 0.5) to recompute the page ranks for each vertex, until convergence.

pr(v) = 1/2 + 1/2 ∗ ( pr(w1)/out(w1) + pr(w2)/out(w2) + · · · + pr(wk)/out(wk) )

After the first iteration we get:

  • pr(A) = 1/2 + 1/2 ∗ (1/3 + 1) = 7/6 ≈ 1.166
  • pr(B) = 1/2 + 1/2 ∗ (1/2) = 3/4 = 0.75
  • pr(C) = 1/2 + 1/2 ∗ (1/2 + 1/3 + 1) = 17/12 ≈ 1.416
  • pr(D) = 1/2 + 1/2 ∗ (1/3) = 4/6 ≈ 0.666
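
As a sanity check, these numbers can be reproduced by running the earlier computeRanks sketch on the graph inferred from this example (all names below follow the hypothetical sketches above, not necessarily the starter code):

    // Build the inferred example graph and set every initial rank to 1.
    MyWebGraph g = new MyWebGraph();
    for (String u : new String[]{"A", "B", "C", "D"}) g.addVertex(u);
    g.addEdge("A", "B"); g.addEdge("A", "C");
    g.addEdge("B", "A"); g.addEdge("B", "C"); g.addEdge("B", "D");
    g.addEdge("C", "A"); g.addEdge("D", "C");
    for (String u : new String[]{"A", "B", "C", "D"}) g.setPageRank(u, 1.0);
    // One call to computeRanks should now return approximately
    // [1.166, 0.75, 1.416, 0.666] for the urls [A, B, C, D].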