Empowering Educators: Automated Assignment Scoring via Azure OpenAI Service ChatGPT

This post has been republished via RSS; it originally appeared at: Microsoft Tech Community - Latest Blogs.

About the Author

Cyrus Wong is the senior lecturer of the Department of Information Technology (IT) of the Hong Kong Institute of Vocational Education (Lee Wa..., and he focuses on teaching public cloud technologies. He is a Microsoft Learn for Educators Ambassador and a Microsoft Azure MVP from Hong Kong.

cyruswong_0-1684816943067.png

 

Introduction:

Discover how AI technology can streamline and enhance the grading process for educators with Azure OpenAI ChatGPT. Cyrus Wong, a senior lecturer at the Hong Kong Institute of Vocational Education, shares insights on leveraging AI to automate assignment evaluation and improve the quality of teaching. By utilizing AI-powered tools, such as Azure OpenAI Service ChatGPT, educators can save time on grading tasks and focus on creating engaging educational experiences for their students.

 

In this context, Cyrus Wong demonstrates how Azure OpenAI Service ChatGPT can be employed for automated assignment assessment, specifically evaluating essays submitted in Microsoft Word and Adobe PDF formats. The process involves converting the content into text using libraries like docx2txt and PyPDF2, and then utilizing Azure OpenAI ChatGPT 3.5 to assess the answers. The AI model provides marks, comments, and insights into potential copying from the internet or generative AI usage, allowing educators to efficiently grade assignments.

With this powerful combination of AI technology and grading automation, educators can significantly reduce their workload, gain valuable insights into student responses, and deliver meaningful feedback to enhance the learning experience. Explore the possibilities of Azure OpenAI ChatGPT and revolutionize the way assignments are evaluated in the educational landscape.


Problem:

As an educator, particularly in Asia, it is often necessary to assign a significant amount of homework to students in order to satisfy parental expectations and maintain a school's reputation. While I won't delve into the implications of this cultural norm, I must admit that being a teacher in this environment can be challenging due to the immense workload involved in grading and providing feedback on countless assignments. This situation essentially multiplies the workload a hundredfold, akin to a self-inflicted Distributed Denial of Service (DDoS) attack.


Educators often find themselves preoccupied with grading assignments, leaving little time to enhance the quality of their teaching. Instead of focusing primarily on assessment, it is crucial for teachers to allocate time for devising engaging and meaningful educational experiences for their students.


Solution:

Educators should consider utilizing AI technology, such as Azure OpenAI Service ChatGPT, to assist with or automate grading tasks. This AI demonstrates consistent and reliable reasoning and processing capabilities, delivering impressive results in automatic assignment evaluation. Furthermore, it can analyze student responses, providing valuable insights to help enhance their learning experience.

Here is one use case where Azure OpenAI Service can be employed for automated assignment assessment.

Demo: Evaluating Essays in Microsoft Word and Adobe PDF Formats from Moodle LMS

This approach involves grading student assignments submitted in Microsoft Word and Adobe PDF formats, where students provide an essay about their internships. However, due to improper LMS configuration by my colleague, some students have submitted files in other formats such as ZIP, 7Z, and RAR. To address this issue, we must implement code to handle these discrepancies and manually fix certain submissions. To streamline the process, it is essential to restrict the accepted file types within your Learning Management System (LMS).

We will not delve into the details of these steps, as they can be avoided through proper configuration of assignments within the LMS.

We assume that each student's submission is stored in a separate folder labelled with their name, and that each folder contains at least one Word or PDF file.
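The post does not show how the submissions DataFrame is built. A minimal sketch, assuming the Moodle export has already been unzipped into one sub-folder per student (the root folder name "submissions" is hypothetical), could produce the Student, Path, ContainsDocxFile and ContainsPdfFile columns used by the filtering code below:

import os
import glob
import pandas as pd

def build_submissions_df(root_folder):
    """Scan one sub-folder per student and flag which file types it contains."""
    rows = []
    for student in sorted(os.listdir(root_folder)):
        path = os.path.join(root_folder, student)
        if not os.path.isdir(path):
            continue
        files = glob.glob(os.path.join(path, "*"))
        rows.append({
            "Student": student,
            "Path": path,
            "ContainsDocxFile": any(f.lower().endswith(".docx") for f in files),
            "ContainsPdfFile": any(f.lower().endswith(".pdf") for f in files),
            # Archive submissions (ZIP/7Z/RAR) are only flagged here; they still need
            # to be extracted or fixed manually, as noted above.
            "ContainsArchive": any(f.lower().endswith((".zip", ".7z", ".rar")) for f in files),
        })
    return pd.DataFrame(rows)

df = build_submissions_df("submissions")

The exact folder layout of a Moodle export may differ, so treat this only as a starting point.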

cyruswong_1-1684816977236.png

cyruswong_2-1684816977236.png

For Microsoft Word assignments, we utilize the docx2txt library to convert the content into text and store it in a Pandas DataFrame column labeled "Answers."

 

def filter_df_by_contains_docx(df):
    return df[df["ContainsDocxFile"] == True]

words_df = filter_df_by_contains_docx(df)
paths = words_df["Path"].values

def get_all_docx_files(path):
    import glob
    return glob.glob(path + "/*.docx")

import docx2txt
from functools import reduce

students_words_files = list(map(get_all_docx_files, paths))  # List of lists of Word files

file_contents = []
for word_files in students_words_files:
    # Concatenate the extracted text of every Word file in the student's folder.
    file_contents.append(reduce(lambda x, y: x + y, map(lambda f: docx2txt.process(f), word_files), "\n\n"))

words_df.loc[:, "Sources"] = students_words_files
words_df.loc[:, "Answers"] = file_contents

 

For PDF files, we utilize the PyPDF2 library to convert the content into text and store it in a Pandas DataFrame column labeled "Answers."

 

def filter_df_by_contains_pdf(df):
    return df[df["ContainsPdfFile"] == True]

pdfs_df = filter_df_by_contains_pdf(df)
paths = pdfs_df["Path"].values

def get_all_pdf_files(path):
    import glob
    return glob.glob(path + "/*.pdf")

import PyPDF2
from functools import reduce

def convert_pdf_all_pages_to_txt(path):
    # Extract the text of every page and join the pages with blank lines.
    with open(path, 'rb') as pdf_file:
        reader = PyPDF2.PdfReader(pdf_file)
        text = ""
        for page in reader.pages:
            text += page.extract_text()
            text += "\n\n"
    return text

students_pdf_files = list(map(get_all_pdf_files, paths))  # List of lists of PDF files

file_contents = []
for pdf_files in students_pdf_files:
    # Concatenate the extracted text of every PDF file in the student's folder.
    file_contents.append(reduce(lambda x, y: x + y, map(convert_pdf_all_pages_to_txt, pdf_files), "\n\n"))

pdfs_df.loc[:, "Sources"] = students_pdf_files
pdfs_df.loc[:, "Answers"] = file_contents

 

After that, we can utilize Azure OpenAI Service ChatGPT 3.5 to evaluate the answers.

 

import os
import json
import openai
openai.api_type = "azure"
openai.api_base = os.getenv("AZURE_OPENAI_ENDPOINT") 
openai.api_version = "2023-03-15-preview"
openai.api_key = os.getenv("AZURE_OPENAI_KEY")

def get_json_chatGpt(student, prompt):
    response = openai.ChatCompletion.create(
        engine="gpt-35-turbo", # engine = "deployment_name".
        messages=[
            {"role": "system", "content": "You are a teaching assistant."},
            {"role": "user", "content": prompt},      
        ],
        temperature=0.9,
        max_tokens=1600,
        top_p=0.0,
        frequency_penalty=0,
        presence_penalty=0,
        stop=None
    )
    # print(response)
    # print(response['choices'][0]['message']['content'])
    write_text_to_file(f"tmp/{student}.json", json.dumps(response))
    tokens = response['usage']['total_tokens']
    return json.loads(response['choices'][0]['message']['content']) , tokens

def grade_answer(student,student_answer, marking_scheme):    
    prompt=marking_scheme.replace("<ANSWER></ANSWER>", student_answer)
    retry = 0
    while True:
        try:
            content, tokens = get_json_chatGpt(student,prompt)
            break             
        except Exception as e:            
            if retry < 2:                
                retry += 1
                print(e)
                print("retry: " + str(retry))
                continue            
            return 0, "Error", 0, 0, True, 0, True
    marks = content['marks']
    comments = content['comments']       
    copyFromInternet = content['copyFromInternet']
    generativeAI = content['generativeAI']        
    manualReview = content['manualReview']     
    return marks, comments, copyFromInternet, generativeAI, manualReview, tokens, False    

def grade_answers(df_answers, marking_scheme):
    for index, row in df_answers.iterrows():      
        student = row["Student"]
        print(student)
        answer = row["Answers"]
       
        marks, comments, copyFromInternet, generativeAI, manualReview, tokens, error = grade_answer(student, answer, marking_scheme)
        df_answers.loc[index, "Marks"] = marks
        df_answers.loc[index, "Comments"] = comments
        df_answers.loc[index, "CopyFromInternet"] = copyFromInternet
        df_answers.loc[index, "GenerativeAI"] = generativeAI
        df_answers.loc[index, "ChatGptTokens"] = tokens     
        df_answers.loc[index, "ManualReview"] = manualReview
        df_answers.loc[index, "Error"] = error
    return df_answers

marking_scheme = read_text_file("marking_scheme.txt")

df_marked = grade_answers(df_answers, marking_scheme)
df_marked.to_excel("data/marks.xlsx", index=False)
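The snippet above calls two small helpers, write_text_to_file and read_text_file, that are not shown in the post. A minimal sketch of what they might look like:

def write_text_to_file(path, text):
    """Save text (e.g. the raw ChatGPT response) for later inspection."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

def read_text_file(path):
    """Load a text file, e.g. the marking scheme prompt template."""
    with open(path, "r", encoding="utf-8") as f:
        return f.read()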

 

 

The Prompt design for the marking scheme

 

Act as a teacher, give marks and comments for a writing assignment.

Assignment Background:
Kindly note that you are NOT required to answer all these questions. 
Job Responsibility and Roles 
    What is my job responsibility? 
    How do I view my job role in the workplace? 
What (or who) has influenced my approach to my job duty? 
What makes me feel good about being a staff in your workplace? 
Workplace experience 
    Any workplace experience that I have accumulated? 
    Any workplace experience I can recall is quite memorable?
Learning in Workplace
    Any technical skills (Hardware / Software) that I have learnt? 
    Any soft skills that I have learnt?
   Any skills that I have learnt are related to your programme “Higher Diploma in Cloud & Data Center Administration”?
Comments on Workplace and Suggestions
    Any comments on your workplace? 
    Any suggestions you can make to your workplace supervisor / mentor for your workplace to create a positive impact (e.g. improve efficiency, reduce human error ... etc)?
About WLA (Workplace Learning & Assessment)
    Any difficulties when I perform WLA
    Anything I have learnt through school or workplace can help my WLA
Evaluation
    What do I feel about job performance? If I give a mark (total 100) to myself, what is my result?
    Any achievements I have made so far?
    How does my workplace supervisor/mentor see my work?
    How do my workplace colleagues see my work?
Career Path / Future goals
    What is the career path of my current post? 
    What further goals do you want to achieve at my current job?
Appendices (Optional)
    Any cert or record of compliments I received from my workplace?
    Any evidence to support my statements? 

Comments on the following student answer in 150 words.
The text delimited by 20 equals sign is the student answer.
====================   
<ANSWER></ANSWER>
====================

Instructions:
Give a “mark” from 0-70 for the above student answer. 

Rules:
1. Job Responsibility and Roles                                 10 marks
2. Workplace experience                                         10 marks
3. Learning in Workplace                                        10 marks
4. Comments on Workplace and Suggestions                        10 marks
5. Job Evaluation (Self / Mentor / Colleague)                  10 marks
6. Career Path / Future goals                                   10 marks
7. Format (With at least 800 words but less than 2000 words)   10 marks
8. If the answer does not show anything meaningful give 0 marks and set manualReview to true.

"copyFromInternet" is 0 - 1.0 which is the likelihood of copying from internet. 
"generativeAI" is 0 - 1.0 which is the likelihood of the answer is using AI generate. 

"comments" includes.
1. Explain the mark calculation in "comments"
2. Shows the marks for each rule line by line in "comments"
3. "comments" in Encouraging Style.
4. Less than 200 words in "comments".
5. Escape all special characters in "comments" by following 6 rules:
    replaces \b to \\b
    replaces \n to \\n
    replaces \r to \\r
    replaces \t to \\t
    replaces \" to \\"
    replaces \ to \\

Do not include any explanations, only provide a RFC8259 compliant JSON response following this format without deviation in the following format.
++++++++++++++++++++++++++++++++++++++++++++++++++
{
    "marks": 30,
    "copyFromInternet": 0.1,
    "generativeAI": 0.5,    
    "manualReview": false,
    "comments": "This is good!"    
}
++++++++++++++++++++++++++++++++++++++++++++++++++
Do not return anything after the JSON Object.
The JSON response:

 

The prompt design is built upon Microsoft Learn's prompt engineering techniques, which include the following steps:

  1. Create a "System message" with the directive, "Act as a teacher, give marks and comments for a writing assignment."
  2. "Provide grounding context" by introducing an "Assignment Background."
  3. "Add clear syntax" by specifying a delimiter and injecting the student's answer between the markers.
  4. "Start with clear instructions" by presenting a marking rubric.
  5. Apply "Chain of thought prompting" by asking the model to explain the mark calculation and to show the marks for each rule line by line in the "comments" field, which reduces the chance of marking inaccuracies.
  6. "Specify the output structure" to obtain valid JSON output. Because the model occasionally produces invalid JSON strings in the "comments" field, we reapply "Start with clear instructions" with the directives "Escape all special characters in 'comments'" and "Do not return anything after the JSON Object."

We highly recommend reviewing Microsoft Learn's prompt engineering techniques, as they are effective in addressing a wide range of issues. Another intriguing observation is ChatGPT's ability to estimate whether content was copied from the internet or generated by AI; however, we currently treat these findings only as a reference. Occasionally, ChatGPT may return invalid JSON because it disregards the instruction, "Do not return anything after the JSON Object." To address this issue, we have implemented a retry mechanism and set the temperature to 0.9, so that the returned value varies slightly upon retry.
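If retries alone are not enough, another option (not part of the original post) is to strip any trailing text and parse only the first JSON object in the response before handing it to json.loads. A minimal sketch:

import json

def extract_first_json_object(text):
    """Return the first balanced {...} object found in text, parsed as JSON.

    Useful when the model appends extra commentary after the JSON despite the
    instruction "Do not return anything after the JSON Object".
    """
    start = text.find("{")
    if start == -1:
        raise ValueError("No JSON object found in response")
    depth = 0
    in_string = False
    escaped = False
    for i, ch in enumerate(text[start:], start):
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return json.loads(text[start:i + 1])
    raise ValueError("Unbalanced JSON object in response")

get_json_chatGpt could then call extract_first_json_object(response['choices'][0]['message']['content']) instead of json.loads directly.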

 

Sample outputs:

For 84 marks

“Great job! Your answer is well-structured and clearly showcases your experience as a Business Analyst Intern at Codefreesoft. Your reflection on your experience and the skills you developed is commendable. Your understanding of the workplace environment is insightful, and your suggestions for improvement are practical and show a willingness to improve. Your evaluation of your skills and job performance is honest and demonstrates self-awareness. Overall, your answer is comprehensive and meets all the requirements of the questions. Keep up the good work!”

For 48 marks

“This is a well-written reflective journal that showcases the individual's learning experiences as an IT system administrator. The answer highlights the importance of effective communication, collaboration, and continuous learning in the field of data analytics. The individual has reflected on their workplace experience and demonstrated an understanding of the tools and software used in the industry. The answer is well-organized and meets the required word count. However, there is room for improvement in terms of providing more specific examples to support their points and expanding on their suggestions for improving workplace efficiency. Overall, the individual has shown a positive attitude towards their career path and future goals. It is clear that they are committed to continuous learning and development in the field of data analytics.”


In-depth Answer Analysis Using Azure OpenAI Embedding

An embedding is a special format of data representation that can be easily utilized by machine learning models and algorithms. An embedding is an information-dense representation of the semantic meaning of a piece of text. Each embedding is a vector of floating-point numbers, such that the distance between two embeddings in the vector space is correlated with the semantic similarity between the two inputs in their original format. For example, if two texts are similar, their vector representations should also be similar.
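To make that last point concrete, a small cosine-similarity helper (an illustrative addition, not part of the original post) can compare two answer embeddings directly:

import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors (closer to 1.0 = more similar)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

Two essays describing the same internship duties in similar wording should score noticeably higher than two unrelated essays.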

 

We follow the tutorial Explore Azure OpenAI Service embeddings and document search from Microsoft Learn to obtain embeddings for each student's answer.
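The embedding call itself is not shown in the post. A minimal sketch, assuming the same openai 0.x Azure configuration as above, an ada (version 2) embedding deployment named "text-embedding-ada-002", and that df_marked still carries the Student and Answers columns, could produce the ada_v2 column used by the clustering and visualization code below:

import openai

def get_embedding(text, engine="text-embedding-ada-002"):  # engine = your embedding deployment name
    # Collapse newlines before embedding, as in the Microsoft Learn tutorial.
    text = text.replace("\n", " ")
    return openai.Embedding.create(input=[text], engine=engine)["data"][0]["embedding"]

# One row per student, with the student name as the index and the vector in "ada_v2".
df_embeddings = df_marked.set_index("Student").copy()
df_embeddings["ada_v2"] = df_embeddings["Answers"].apply(get_embedding)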


Utilizing K-means Clustering for Students' Answers

 

import numpy as np
from sklearn.cluster import KMeans

matrix = np.array(df_embeddings["ada_v2"].to_list())
n_clusters = 7
kmeans = KMeans(n_clusters=n_clusters, init="k-means++", random_state=42, n_init='auto')
kmeans.fit(matrix)
labels = kmeans.labels_
df_embeddings["Cluster"] = labels
df_embeddings.head()

 

The clustering technique groups students with relatively similar answers into a single cluster, which helps us identify problematic cases within the smaller clusters.
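A quick, hypothetical follow-up (not shown in the post) is to list the cluster sizes and the students in the smallest cluster, since those are the cases most worth a manual look:

# Cluster sizes, smallest first: the tiny clusters are the ones worth inspecting manually.
cluster_sizes = df_embeddings["Cluster"].value_counts().sort_values()
print(cluster_sizes)

# Students in the smallest cluster (assuming student names are the DataFrame index).
smallest_cluster = cluster_sizes.index[0]
print(df_embeddings[df_embeddings["Cluster"] == smallest_cluster].index.tolist())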


Visualizing Data to Improve Comprehension of Student Performance


T-distributed Stochastic Neighbor Embedding.

t-SNE [1] is a tool to visualize high-dimensional data. It converts similarities between data points to joint probabilities and tries to minimize the Kullback-Leibler divergence between the joint probabilities of the low-dimensional embedding and the high-dimensional data. t-SNE has a cost function that is not convex, i.e. with different initializations we can get different results.

 

import numpy as np
import seaborn as sns
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

plt.rcParams['figure.figsize'] = (15, 10)

tsne = TSNE(n_components=2, perplexity=5, random_state=42, init='random', learning_rate=200)
vis_dims2 = tsne.fit_transform(matrix)

x = [x for x, y in vis_dims2]
y = [y for x, y in vis_dims2]

palette = sns.color_palette("inferno", 20).as_hex()

for category, color in enumerate(palette):
    xs = np.array(x)[df_embeddings["Cluster"] == category]
    ys = np.array(y)[df_embeddings["Cluster"] == category]
    plt.scatter(xs, ys, color=color, alpha=0.1)

    # Mark each cluster centre with an 'x'.
    avg_x = xs.mean()
    avg_y = ys.mean()
    plt.scatter(avg_x, avg_y, marker='x', color=color, s=100)

plt.title("Embeddings visualized using t-SNE")

 

 

cyruswong_3-1684817335805.png

 

Implementing Principal Component Analysis for Visualizing Embedding and Clustering Results

Principal component analysis (PCA) performs linear dimensionality reduction using Singular Value Decomposition of the data to project it into a lower-dimensional space.

 

from sklearn.decomposition import PCA

pca_df = df_embeddings.copy()
matrix = pca_df["ada_v2"].to_list()
pca = PCA(n_components=3)
vis_dims = pca.fit_transform(matrix)
pca_df["embed_vis"] = vis_dims.tolist()
pca_df

 

Calculate the proportion of the total variance captured by the three principal components.

 

print(str(sum(pca.explained_variance_ratio_)*100)+"%")

 

Analyzing the Change in Explained Variance Ratio

 

import numpy as np
nums = np.arange(14)

var_ratio = []
for num in nums:
  pca = PCA(n_components=num)
  pca.fit(matrix)
  var_ratio.append(np.sum(pca.explained_variance_ratio_))

import matplotlib.pyplot as plt

plt.figure(figsize=(4,2),dpi=150)
plt.grid()
plt.plot(nums,var_ratio,marker='o')
plt.xlabel('n_components')
plt.ylabel('Explained variance ratio')
plt.title('n_components vs. Explained Variance Ratio')

 


cyruswong_4-1684817415810.png

3D plot of the three major principal components

 

%matplotlib widget
import matplotlib.pyplot as plt
import numpy as np

fig = plt.figure(figsize=(10, 10))
ax = fig.add_subplot(projection='3d')
cmap = plt.get_cmap("tab20")

clusters = sorted(pca_df["Cluster"].unique().tolist())

# Plot each cluster individually so that we can set a label name per cluster.
for i, clusterId in enumerate(clusters):
    sub_matrix = np.array(pca_df[pca_df["Cluster"] == clusterId]["embed_vis"].to_list())

    x = sub_matrix[:, 0]
    y = sub_matrix[:, 1]
    z = sub_matrix[:, 2]
    colors = [cmap(i / len(clusters))] * len(sub_matrix)
    ax.scatter(x, y, zs=z, zdir='z', c=colors, label=clusterId)

    # Annotate each point with the student name (the DataFrame index).
    students = pca_df[pca_df["Cluster"] == clusterId].index.values.tolist()
    for j, txt in enumerate(students):
        ax.text(x[j], y[j], z[j], txt, size=8, zorder=1, color='k')

ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z')

 

cyruswong_5-1684817444288.png

Source code: AzureOpenAIChatGTPAutoGrader 

 

Conclusion

We are amazed by the power of Azure OpenAI Service ChatGPT!

It helped me re-grade 70 students' long essay assignments in Microsoft Word and Adobe PDF formats in just 10 minutes and for less than US$0.50. It was worth spending a few hours on Python to save thousands of hours in the future! It can actually perform better than me, because I would get exhausted, and humans are prone to mistakes when doing something repetitive for a long time. Next time, let the AI go first and keep humans for quality assurance.
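As a rough sanity check on that cost figure (assuming the 2023 list price of about US$0.002 per 1,000 gpt-35-turbo tokens, which may differ from current Azure pricing): an 800- to 2,000-word essay plus the marking-scheme prompt and the JSON response comes to roughly 2,500-4,000 tokens, so 70 submissions land in the region of 200,000-280,000 tokens, i.e. around US$0.40-0.55, in the same ballpark as the reported total.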

 

By utilizing embeddings, we are able to gain valuable insights into students' answers, which allows us to better understand their thought processes, knowledge levels, and areas that may require additional support or instruction.

 

Project collaborators include Shing Seto, Stanley Leung, Ka Ka Leung, Xu Yuan and Hang Ming (Leo) Kwok from the IT114115 Higher Diploma in Cloud and Data Centre Administration.
