- Given the `devtools_scicomp_project_2025` repository of the first Practical Lecture, from the `main` branch modify the `README` file and add your name, email address and the course you are enrolled in. Add the change, amend the last commit and push.
- Create a new branch titled `knn_classifier` from the `main` branch. This is the branch we will work on. Do not use Python libraries for today's session, only built-in data types.
- In the `devtools_scicomp` environment, install `PyYAML` and add it to the `requirements.txt` file.
- Inside the `src/pyclassify/utils.py` file, implement a function called `distance`. This function should:
  - take two inputs, `point1` and `point2`, both of type `list[float]`;
  - return the Euclidean distance between `point1` and `point2`. You can refer to the Euclidean distance formula for this or write some test cases (a sketch is shown below).
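A minimal sketch of such a function, using only built-in data types as required for this session (the exact formulation is up to you):

```python
def distance(point1: list[float], point2: list[float]) -> float:
    """Euclidean distance between two points of equal length, built-ins only."""
    return sum((a - b) ** 2 for a, b in zip(point1, point2)) ** 0.5
```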
- Inside the `src/pyclassify/utils.py` file, implement a function called `majority_vote`. This function should:
  - take one input, `neighbors`, which is a `list[int]` of class labels;
  - return the most frequent class label among `neighbors` (see the sketch below).
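A possible sketch, again restricted to built-ins (ties between equally frequent labels are broken arbitrarily here):

```python
def majority_vote(neighbors: list[int]) -> int:
    """Return the most frequent class label among the neighbors."""
    counts = {}
    for label in neighbors:
        counts[label] = counts.get(label, 0) + 1
    # Pick the label with the highest count; ties are resolved arbitrarily.
    return max(counts, key=counts.get)
```

For example, `majority_vote([1, 0, 0, 0])` returns `0`, which matches the test case mentioned in the testing section below.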
- Inside the `src/pyclassify/` directory, create a file called `classifier.py`. Inside it, create a Python class named `kNN`. The `kNN` class should:
  - have a constructor that takes `k`, which specifies the number of nearest neighbors;
  - have a method `_get_k_nearest_neighbors`, which takes `X`, `y` (the dataset values), and `x` (a new point to be classified). This method should return a list of `y` values (labels) of the `k` nearest neighbors of `x`.
- In `kNN`, override the `__call__` method. It takes two inputs: `data`, a tuple containing `X` (the feature matrix) and `y` (the labels), and `new_points`, a list of new points to be classified. It should return a list of predicted classes for all points in `new_points`. The main algorithm is as follows: for each point in `new_points`:
  - use the `_get_k_nearest_neighbors` method to get the neighbors;
  - use the `majority_vote` function from `utils.py` (a sketch of the whole class is shown below).
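A minimal sketch of the class under these specifications. The type check in the constructor, the relative import and the plain sort over all training points are assumptions, not requirements:

```python
from .utils import distance, majority_vote  # assumes classifier.py sits next to utils.py


class kNN:
    def __init__(self, k: int):
        # Assumed validation: the tests below check the constructor for valid types.
        if not isinstance(k, int) or k <= 0:
            raise ValueError("k must be a positive integer")
        self.k = k

    def _get_k_nearest_neighbors(self, X, y, x):
        # Sort the training points by their distance to x and keep the labels
        # of the k closest ones.
        order = sorted(range(len(X)), key=lambda i: distance(X[i], x))
        return [y[i] for i in order[:self.k]]

    def __call__(self, data, new_points):
        X, y = data
        predictions = []
        for x in new_points:
            neighbors = self._get_k_nearest_neighbors(X, y, x)
            predictions.append(majority_vote(neighbors))
        return predictions
```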
- Inside the `src/pyclassify/__init__.py` file, import `kNN`:

  ```python
  __all__ = ['kNN']

  from .classifier import kNN
  ```
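With this in place, the class can be imported directly from the package, e.g. `from pyclassify import kNN`, both in the tests and in `scripts/run.py`.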
Set up the tests:

- Inside the `test/test_.py` file, implement functions called `test_distance` and `test_majority_vote`. The `test_distance` function should test the distance properties, while `test_majority_vote` should test that the algorithm implemented is correct (for example, given `[1, 0, 0, 0]` the algorithm should return `0`).
- Inside `test/test_.py`, check the constructor of `kNN` (valid types). A sketch of these tests is shown below.
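A possible sketch of these tests (runnable with pytest; the specific values are only examples, and the last check assumes the constructor rejects invalid `k` values):

```python
import pytest

from pyclassify import kNN
from pyclassify.utils import distance, majority_vote


def test_distance():
    # Identity: the distance from a point to itself is zero.
    assert distance([0.0, 0.0], [0.0, 0.0]) == 0.0
    # Symmetry.
    assert distance([1.0, 2.0], [4.0, 6.0]) == distance([4.0, 6.0], [1.0, 2.0])
    # Known value: a 3-4-5 right triangle.
    assert abs(distance([0.0, 0.0], [3.0, 4.0]) - 5.0) < 1e-12


def test_majority_vote():
    assert majority_vote([1, 0, 0, 0]) == 0
    assert majority_vote([1, 1, 0]) == 1


def test_constructor():
    knn = kNN(k=3)
    assert knn.k == 3
    # If your constructor validates its input, a check like this applies:
    with pytest.raises(Exception):
        kNN(k='not an integer')
```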
Set up experiments:

- Inside `src/pyclassify/utils.py`, add the following function, used to read yaml files:
files:def read_config(file):
filepath = os.path.abspath(f'{file}.yaml')
with open(filepath, 'r') as stream:
kwargs = yaml.safe_load(stream)
return kwargs
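Note that the function appends the `.yaml` extension itself, so with the `experiments/config.yaml` file described below the call would look like:

```python
kwargs = read_config('./experiments/config')
print(kwargs['k'], kwargs['dataset'])
```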
- Inside the `shell/submit.sh` file, write the following lines of code, which download the [Ionosphere](https://archive.ics.uci.edu/dataset/52/ionosphere) dataset and put it in a directory called `./data`. Explore the dataset and in `src/pyclassify/utils.py` create a function named `read_file` which reads the dataset file and returns the features and labels as separate lists (a sketch is shown after the script).
  ```bash
  URL="https://archive.ics.uci.edu/static/public/52/ionosphere.zip"
  DEST_DIR="data"
  ZIP_FILE="ionosphere.zip"

  # Step 1: Download the archive
  echo "Downloading ionosphere.zip from $URL..."
  curl -o "$ZIP_FILE" "$URL"

  # Step 2: Create the 'data' directory if it doesn't exist
  if [ ! -d "$DEST_DIR" ]; then
      echo "Creating directory: $DEST_DIR"
      mkdir "$DEST_DIR"
  fi

  # Step 3: Extract the archive and move the data file
  echo "Extracting $ZIP_FILE..."
  unzip "$ZIP_FILE"

  if [ -f "ionosphere.data" ]; then
      echo "Moving ionosphere.data to $DEST_DIR"
      mv ionosphere.data "$DEST_DIR"/
  else
      echo "Error: ionosphere.data not found after extraction."
      exit 1
  fi

  # Step 4: Clean up the archive and the extra files shipped with it
  echo "Cleaning up: Removing $ZIP_FILE"
  rm "$ZIP_FILE"
  rm Index ionosphere.names

  echo "Download and extraction completed successfully."
  ```
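For `read_file`, recall that `ionosphere.data` is a comma-separated file with the numeric features first and the class label (`g` or `b`) as the last field on each line. A hedged sketch, with the 0/1 label encoding being an arbitrary choice:

```python
def read_file(filename: str) -> tuple[list[list[float]], list[int]]:
    """Read the ionosphere.data file and return (features, labels)."""
    features, labels = [], []
    with open(filename, 'r') as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            *values, label = line.split(',')
            features.append([float(v) for v in values])
            # Assumed encoding: 'g' (good) -> 1, 'b' (bad) -> 0.
            labels.append(1 if label == 'g' else 0)
    return features, labels
```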
- Inside `experiments/config.yaml`, insert the following:

  ```yaml
  k: 5
  dataset: ./data/ionosphere.data
  ```
- Inside `scripts/run.py`, import `kNN` and `read_config` (remember the package is called `pyclassify`). The run file needs to:
  - read the `config.yaml` file;
  - perform `kNN` classification on the test data and print the computed accuracy (a sketch is shown below).

  **Note**: remember to install the package before running `run.py`!
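A minimal sketch of `scripts/run.py` under these specifications. The 80/20 sequential train/test split is not prescribed above and is only illustrative (a random shuffle before splitting may be preferable):

```python
from pyclassify import kNN
from pyclassify.utils import read_config, read_file

if __name__ == '__main__':
    # Read the experiment configuration (read_config appends the .yaml extension).
    config = read_config('./experiments/config')

    # Load the dataset and make a simple, illustrative 80/20 split.
    X, y = read_file(config['dataset'])
    split = int(0.8 * len(X))
    X_train, y_train = X[:split], y[:split]
    X_test, y_test = X[split:], y[split:]

    # Classify the test points and report the accuracy.
    knn = kNN(k=config['k'])
    predictions = knn((X_train, y_train), X_test)
    accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)
    print(f'Accuracy: {accuracy:.3f}')
```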
The repository with the correct structure and commits is available here: GitHub repo